Bulk URL Checker with uv: Validate Website Accessibility in Python

Learn how to build a powerful URL checker script using uv that validates multiple websites concurrently, detects broken links, and generates detailed reports.

Broken links suck. They annoy users, hurt your SEO, and make you look unprofessional. I wrote this script to check hundreds of URLs at once because manually clicking through links is a waste of time.

This tool checks URLs in parallel, categorizes what went wrong, and saves the broken ones to a file. I use it for auditing sites, checking external links, and monitoring APIs. It’s simple but gets the job done.

New to uv?

If you’re new to uv or want to learn how to set up full Python projects, start with our comprehensive guide Getting Started with uv: Setting Up Your Python Project in 2025 before diving into this advanced script.

What This Script Does

  • Checks multiple URLs at once: Uses ThreadPoolExecutor to run requests in parallel
  • Fixes URLs without protocols: Automatically adds HTTPS if missing
  • Catches different error types: Timeouts, connection errors, HTTP errors
  • Shows response times: See how fast each URL responds
  • Reads from files: Load URLs from a text file
  • Saves broken links: Writes problematic URLs to a file for review
  • Shows progress: Real-time counter while checking
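The concurrency behind the first bullet is just ThreadPoolExecutor plus as_completed. Here's the pattern in isolation with a stand-in check function (the URLs are made up) so you can see the shape before reading the full script:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_check(url):
    # Stand-in for the real HTTP request so this runs offline
    return {'url': url, 'status': 'OK'}

urls = ['https://a.example', 'https://b.example', 'https://c.example']

with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to its URL, then collect results as they finish
    futures = {executor.submit(fake_check, u): u for u in urls}
    results = [f.result() for f in as_completed(futures)]

print(len(results))  # 3
```

Results come back in completion order, not submission order, which is why the script tracks progress with a counter instead of an index into the input list.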

The Script

Save this as url_checker.py:

#!/usr/bin/env -S uv run
# /// script
# dependencies = [
#     "requests",
# ]
# ///

import requests
from urllib.parse import urlparse
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
import sys

def check_url(url, timeout=10):
    """
    Check if a URL is accessible and return status information.

    Args:
        url (str): The URL to check
        timeout (int): Timeout in seconds (default: 10)

    Returns:
        dict: Contains url, status, error_type, and response_time
    """
    # Add https:// if no scheme is provided
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url

    start_time = time.time()

    try:
        response = requests.get(url, timeout=timeout, allow_redirects=True)
        response_time = time.time() - start_time

        return {
            'url': url,
            'status': 'OK',
            'status_code': response.status_code,
            'error_type': None,
            'response_time': round(response_time, 2)
        }

    except requests.exceptions.Timeout:
        return {
            'url': url,
            'status': 'TIMEOUT',
            'status_code': None,
            'error_type': 'Connection timeout',
            'response_time': timeout
        }

    except requests.exceptions.ConnectionError as e:
        return {
            'url': url,
            'status': 'CONNECTION_ERROR',
            'status_code': None,
            'error_type': f'Connection error: {str(e)[:100]}...',
            'response_time': time.time() - start_time
        }

    except requests.exceptions.RequestException as e:
        return {
            'url': url,
            'status': 'ERROR',
            'status_code': None,
            'error_type': f'Request error: {str(e)[:100]}...',
            'response_time': time.time() - start_time
        }

def read_urls_from_file(filename):
    """Read URLs from a text file, one per line."""
    urls = []
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            for line in file:
                url = line.strip()
                if url and not url.startswith('#'):  # Skip empty lines and comments
                    urls.append(url)
        return urls
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return []
    except Exception as e:
        print(f"Error reading file: {e}")
        return []

def check_urls_batch(urls, timeout=10, max_workers=10):
    """
    Check multiple URLs concurrently.

    Args:
        urls (list): List of URLs to check
        timeout (int): Timeout per request in seconds
        max_workers (int): Maximum number of concurrent threads

    Returns:
        list: List of results for each URL
    """
    results = []

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all tasks
        future_to_url = {executor.submit(check_url, url, timeout): url for url in urls}

        # Process completed tasks
        for i, future in enumerate(as_completed(future_to_url), 1):
            result = future.result()
            results.append(result)

            # Progress indicator
            print(f"Checked {i}/{len(urls)} URLs: {result['url']} - {result['status']}")

    return results

def main():
    # Configuration
    filename = input("Enter the filename containing URLs (or press Enter for 'urls.txt'): ").strip()
    if not filename:
        filename = 'urls.txt'

    timeout = input("Enter timeout in seconds (or press Enter for 10): ").strip()
    timeout = int(timeout) if timeout.isdigit() else 10

    print(f"\nReading URLs from '{filename}'...")
    urls = read_urls_from_file(filename)

    if not urls:
        print("No URLs found to check.")
        return

    print(f"Found {len(urls)} URLs to check.")
    print(f"Using timeout: {timeout} seconds")
    print("-" * 50)

    # Check all URLs
    results = check_urls_batch(urls, timeout=timeout)

    # Separate problematic URLs
    problematic_urls = [r for r in results if r['status'] != 'OK']
    working_urls = [r for r in results if r['status'] == 'OK']

    print("\n" + "=" * 50)
    print("SUMMARY")
    print("=" * 50)
    print(f"Total URLs checked: {len(results)}")
    print(f"Working URLs: {len(working_urls)}")
    print(f"Problematic URLs: {len(problematic_urls)}")

    if problematic_urls:
        print("\n" + "=" * 50)
        print("PROBLEMATIC URLs")
        print("=" * 50)

        # Group by error type
        timeout_urls = [r for r in problematic_urls if r['status'] == 'TIMEOUT']
        connection_error_urls = [r for r in problematic_urls if r['status'] == 'CONNECTION_ERROR']
        other_error_urls = [r for r in problematic_urls if r['status'] == 'ERROR']

        if timeout_urls:
            print(f"\nTIMEOUT ERRORS ({len(timeout_urls)}):")
            for result in timeout_urls:
                print(f"  - {result['url']}")

        if connection_error_urls:
            print(f"\nCONNECTION ERRORS ({len(connection_error_urls)}):")
            for result in connection_error_urls:
                print(f"  - {result['url']}")
                print(f"    Error: {result['error_type']}")

        if other_error_urls:
            print(f"\nOTHER ERRORS ({len(other_error_urls)}):")
            for result in other_error_urls:
                print(f"  - {result['url']}")
                print(f"    Error: {result['error_type']}")

        # Save problematic URLs to file
        with open('problematic_urls.txt', 'w', encoding='utf-8') as f:
            f.write("# Problematic URLs found during check\n")
            f.write(f"# Checked on: {time.strftime('%Y-%m-%d %H:%M:%S')}\n\n")

            if timeout_urls:
                f.write("# TIMEOUT ERRORS\n")
                for result in timeout_urls:
                    f.write(f"{result['url']}\n")
                f.write("\n")

            if connection_error_urls:
                f.write("# CONNECTION ERRORS\n")
                for result in connection_error_urls:
                    f.write(f"{result['url']}\n")
                f.write("\n")

            if other_error_urls:
                f.write("# OTHER ERRORS\n")
                for result in other_error_urls:
                    f.write(f"{result['url']}\n")

        print(f"\nProblematic URLs saved to 'problematic_urls.txt'")

    if working_urls:
        print(f"\nWORKING URLs ({len(working_urls)}):")
        for result in working_urls:
            print(f"  ✓ {result['url']} (Status: {result['status_code']}, Time: {result['response_time']}s)")

if __name__ == "__main__":
    print("URL Connection Checker")
    print("=" * 30)
    main()

How It Works

  • check_url(): Checks one URL, handles errors, times the response
  • read_urls_from_file(): Loads URLs from a text file, skips comments and empty lines
  • check_urls_batch(): Runs multiple checks in parallel using threads
  • main(): Handles user input, runs the checks, prints results

Error Types

  • OK: URL works fine
  • TIMEOUT: Took too long to respond
  • CONNECTION_ERROR: DNS or connection issues
  • ERROR: Other request failures
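Every result dict carries one of these strings in its 'status' key, so a post-run tally is a one-liner with Counter. A small sketch using the result shape check_url() returns (the URLs are made up):

```python
from collections import Counter

# Hypothetical results in the shape check_url() returns
results = [
    {'url': 'https://ok.example', 'status': 'OK'},
    {'url': 'https://ok2.example', 'status': 'OK'},
    {'url': 'https://slow.example', 'status': 'TIMEOUT'},
    {'url': 'https://gone.example', 'status': 'CONNECTION_ERROR'},
]

counts = Counter(r['status'] for r in results)
print(counts['OK'], counts['TIMEOUT'])  # 2 1
```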

Running It

With uv, you just run the file. No setup needed.

Quick Start

  1. Create a URL file (urls.txt):
# URLs to check
https://www.google.com
https://www.github.com
https://www.stackoverflow.com
https://nonexistent-website-12345.com
https://httpstat.us/500
https://httpstat.us/404
bitdoze.com
example.com
  2. Run it:
uv run url_checker.py
  3. Follow the prompts:
    • Press Enter for urls.txt or type a different filename
    • Press Enter for 10 second timeout or enter your own

Sample Output

URL Connection Checker
==============================
Enter the filename containing URLs (or press Enter for 'urls.txt'):
Enter timeout in seconds (or press Enter for 10):

Reading URLs from 'urls.txt'...
Found 8 URLs to check.
Using timeout: 10 seconds
--------------------------------------------------
Checked 1/8 URLs: https://www.google.com - OK
Checked 2/8 URLs: https://www.github.com - OK
Checked 3/8 URLs: https://www.stackoverflow.com - OK
Checked 4/8 URLs: https://nonexistent-website-12345.com - CONNECTION_ERROR
Checked 5/8 URLs: https://httpstat.us/500 - OK
Checked 6/8 URLs: https://httpstat.us/404 - OK
Checked 7/8 URLs: https://bitdoze.com - OK
Checked 8/8 URLs: https://example.com - OK

==================================================
SUMMARY
==================================================
Total URLs checked: 8
Working URLs: 7
Problematic URLs: 1

==================================================
PROBLEMATIC URLs
==================================================

CONNECTION ERRORS (1):
  - https://nonexistent-website-12345.com
    Error: Connection error: HTTPSConnectionPool(host='nonexistent-website-12345.com', port=443)...

Problematic URLs saved to 'problematic_urls.txt'

WORKING URLs (7):
  ✓ https://www.google.com (Status: 200, Time: 0.15s)
  ✓ https://www.github.com (Status: 200, Time: 0.23s)
  ✓ https://www.stackoverflow.com (Status: 200, Time: 0.18s)
  ✓ https://httpstat.us/500 (Status: 500, Time: 1.02s)
  ✓ https://httpstat.us/404 (Status: 404, Time: 0.98s)
  ✓ https://bitdoze.com (Status: 200, Time: 0.45s)
  ✓ https://example.com (Status: 200, Time: 0.32s)

Tips and Tricks

Custom Settings

Run with your own file and timeout:

uv run url_checker.py
# Enter: my_links.txt
# Enter: 5

Organize URL Files

Use separate files for different checks:

APIs (apis.txt):

https://api.github.com
https://httpbin.org/get

Social (social.txt):

https://twitter.com/myhandle
https://linkedin.com/in/me

Speed It Up

For lots of URLs, increase workers:

results = check_urls_batch(urls, timeout=timeout, max_workers=20)
  • 1-50 URLs: 5-10 workers, roughly 10-30 seconds
  • 51-200 URLs: 10-15 workers, roughly 30-60 seconds
  • 200+ URLs: 15-25 workers, roughly 1-3 minutes
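If you'd rather not hand-tune that number, one heuristic (my own, not part of the script) is to scale workers with the list size inside fixed bounds:

```python
def pick_workers(n_urls, floor=5, ceiling=25):
    # Roughly one worker per 10 URLs, clamped to [floor, ceiling]
    return max(floor, min(ceiling, n_urls // 10))

print(pick_workers(40))    # 5
print(pick_workers(150))   # 15
print(pick_workers(1000))  # 25
```

Then pass it through: check_urls_batch(urls, timeout=timeout, max_workers=pick_workers(len(urls))).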

Reading the Output

  • 200: Works fine; nothing to do
  • 301/302: Redirect; update the link if it's permanent
  • 404: Not found; fix or remove the link
  • 500: Server error; contact the site owner
  • Timeout: Too slow; check your connection or increase the timeout
  • Connection Error: DNS or network issue; check the URL spelling

Output File

problematic_urls.txt gets created with broken links organized by error type.
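Handy side effect: the '#' headers in that file are exactly what read_urls_from_file() skips, so the report can be fed straight back in for a re-check. Here's the same filtering rule applied to a made-up report, self-contained:

```python
# Sample contents in the format the script writes
report = """# Problematic URLs found during check
# TIMEOUT ERRORS
https://slow.example

# CONNECTION ERRORS
https://gone.example
"""

# Same rule as read_urls_from_file(): drop blank lines and '#' comments
retry_urls = [line.strip() for line in report.splitlines()
              if line.strip() and not line.strip().startswith('#')]
print(retry_urls)  # ['https://slow.example', 'https://gone.example']
```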

Use Cases

  • Site audits: Check your external links
  • SEO: Validate backlinks
  • API monitoring: Check endpoint health
  • Competitor tracking: Monitor if competitor sites are down

Customizations

Add User-Agent

Some sites block scripts without a user agent:

headers = {'User-Agent': 'Mozilla/5.0 (compatible; url-checker/1.0)'}  # example string
response = requests.get(url, headers=headers, timeout=timeout, allow_redirects=True)
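If you're checking many URLs on the same host, a shared requests.Session also reuses TCP connections, which can speed things up. A sketch (the User-Agent string is just an example):

```python
import requests

# A Session pools connections and carries default headers on every request
session = requests.Session()
session.headers.update({'User-Agent': 'url-checker/1.0 (example)'})

def check_url_with_session(url, timeout=10):
    # Same call shape as check_url(), routed through the pooled session
    response = session.get(url, timeout=timeout, allow_redirects=True)
    return response.status_code
```

One caution: requests doesn't formally document Session as thread-safe, so with ThreadPoolExecutor the conservative choice is one session per worker thread.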

Export to CSV

import csv

def save_to_csv(results, filename='results.csv'):
    fieldnames = ['url', 'status', 'status_code', 'error_type', 'response_time']
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
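To sanity-check the row shape without touching disk, the same DictWriter can target an in-memory buffer (fake result data below):

```python
import csv
import io

fieldnames = ['url', 'status', 'status_code', 'error_type', 'response_time']
results = [
    {'url': 'https://ok.example', 'status': 'OK', 'status_code': 200,
     'error_type': None, 'response_time': 0.21},
]

# Write to a StringIO buffer instead of a file to inspect the output
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(results)
print(buf.getvalue().splitlines()[0])  # url,status,status_code,error_type,response_time
```

Note that DictWriter renders None as an empty cell, so error_type is blank for working URLs.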

CI/CD Integration

# GitHub Actions - check URLs weekly
name: URL Check
on:
  schedule:
    - cron: "0 9 * * 1"

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      # The script prompts twice; feed it two Enters so it uses the defaults
      - run: printf '\n\n' | uv run url_checker.py
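One caveat for CI: main() always finishes with exit code 0, so the job stays green even when links are broken. If you want a red build on failures, compute an exit status from the problematic list (my addition, not part of the script above) and raise SystemExit(status) at the end of main():

```python
# Stand-in for the list main() builds; in the script it comes from results
problematic_urls = [{'url': 'https://gone.example', 'status': 'CONNECTION_ERROR'}]

# A nonzero exit marks the Actions step as failed
status = 1 if problematic_urls else 0
print(status)  # 1
```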

Wrap Up

That’s it. A simple script that checks URLs fast and tells you what’s broken. No virtual environments to manage, no dependencies to install manually. Just uv run and go.

I use this regularly to keep sites clean. Works for me, should work for you too.