proxywhirl.exports

Export functionality for generating web dashboard data.

This module provides functions to export proxy data and statistics for consumption by the web dashboard.

Functions

export_for_web(db_path, output_dir[, include_stats, ...])

Export data for the web dashboard.

generate_proxy_lists(storage, output_dir[, max_age_hours])

Generate proxy list text files and metadata.json from database.

generate_rich_proxies(storage[, include_geo, ...])

Generate rich proxy data from database.

generate_stats_from_files(proxy_dir)

Generate statistics from proxy list files and rich proxy data.

parse_proxy_url(url)

Parse proxy URL to extract IP and port.

Module Contents

async proxywhirl.exports.export_for_web(db_path, output_dir, include_stats=True, include_rich_proxies=True, include_proxy_lists=True, max_age_hours=72)

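Export data for the web dashboard. Depending on the include_* flags, this writes stats.json, proxies-rich.json, and the proxy list text files with metadata.json to output_dir.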

Parameters:
  • db_path (pathlib.Path) – Path to SQLite database

  • output_dir (pathlib.Path) – Directory to write output files

  • include_stats (bool) – Whether to generate stats.json

  • include_rich_proxies (bool) – Whether to generate proxies-rich.json

  • include_proxy_lists (bool) – Whether to generate text files and metadata.json

  • max_age_hours (int) – Only include proxies validated within the last max_age_hours hours. Default: 72 (36 runs on a 2-hour schedule). Set to 0 to include all proxies regardless of age.

Returns:

Mapping of output type to file path.

Return type:

dict[str, Path]
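
A minimal usage sketch, assuming proxywhirl is installed and a database already exists. The wrapper `run_export` and the paths `proxywhirl.db` and `web-data` are placeholders, not values taken from this reference. The keys of the returned mapping are not listed here, so the sketch simply prints whatever comes back.

```python
import asyncio
from pathlib import Path


async def run_export(db_path: Path, output_dir: Path) -> dict:
    """Call export_for_web with the documented defaults spelled out."""
    # Imported lazily so this sketch can be loaded without proxywhirl installed.
    from proxywhirl.exports import export_for_web

    outputs = await export_for_web(
        db_path=db_path,
        output_dir=output_dir,
        include_stats=True,
        include_rich_proxies=True,
        include_proxy_lists=True,
        max_age_hours=72,
    )
    # The result maps an output type to the path of the file that was written.
    for kind, path in outputs.items():
        print(f"{kind}: {path}")
    return outputs


# Placeholder paths; uncomment to run against a real database.
# asyncio.run(run_export(Path("proxywhirl.db"), Path("web-data")))
```

Because the function is a coroutine, it has to be awaited or driven with `asyncio.run`.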

async proxywhirl.exports.generate_proxy_lists(storage, output_dir, max_age_hours=72)

Generate proxy list text files and metadata.json from database.

Creates:
  • http.txt, https.txt, socks4.txt, socks5.txt (one proxy per line)

  • all.txt (combined with headers)

  • proxies.json (structured JSON with metadata)

  • metadata.json (counts and timestamp)

Parameters:
  • storage (proxywhirl.storage.SQLiteStorage) – SQLiteStorage instance to query

  • output_dir (pathlib.Path) – Directory to write output files

  • max_age_hours (int) – Only include proxies validated within the last max_age_hours hours. Default: 72 (36 runs on a 2-hour schedule). Set to 0 to include all proxies regardless of age.

Returns:

Mapping of protocol name to proxy count.

Return type:

dict[str, int]
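
The per-protocol file layout and the returned counts can be illustrated with a small stand-alone sketch. It is not the library's implementation: `write_protocol_lists` is a hypothetical helper, the `ip:port` line format is an assumption, and only the four per-protocol text files are written (all.txt, proxies.json, and metadata.json are left out).

```python
import tempfile
from pathlib import Path

PROTOCOLS = ("http", "https", "socks4", "socks5")


def write_protocol_lists(proxies: dict[str, list[str]], output_dir: Path) -> dict[str, int]:
    """Write one <protocol>.txt per protocol and return protocol -> count."""
    output_dir.mkdir(parents=True, exist_ok=True)
    counts: dict[str, int] = {}
    for protocol in PROTOCOLS:
        entries = proxies.get(protocol, [])
        # One proxy per line; a protocol with no entries yields an empty file.
        text = "\n".join(entries) + ("\n" if entries else "")
        (output_dir / f"{protocol}.txt").write_text(text, encoding="utf-8")
        counts[protocol] = len(entries)
    return counts


# Demonstration with made-up addresses in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    counts = write_protocol_lists(
        {"http": ["1.2.3.4:8080", "5.6.7.8:3128"], "socks5": ["9.9.9.9:1080"]},
        Path(tmp),
    )
    http_lines = (Path(tmp) / "http.txt").read_text(encoding="utf-8").splitlines()
```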

async proxywhirl.exports.generate_rich_proxies(storage, include_geo=True, geo_sample_size=5000, max_age_hours=72)

Generate rich proxy data from database.

Parameters:
  • storage (proxywhirl.storage.SQLiteStorage) – SQLiteStorage instance to query

  • include_geo (bool) – Whether to include country data (slower)

  • geo_sample_size (int) – Maximum number of IPs to geolocate (lookups are rate limited)

  • max_age_hours (int) – Only include proxies validated within the last max_age_hours hours. Default: 72 (36 runs on a 2-hour schedule). Set to 0 to include all proxies regardless of age.

Returns:

Proxies with metadata and aggregations.

Return type:

dict[str, Any]
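
The `max_age_hours` window shared by the export functions can be read as a cutoff timestamp. A minimal sketch of that reading, assuming timezone-aware UTC timestamps; `validation_cutoff` and `is_fresh` are hypothetical helpers, not part of proxywhirl:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def validation_cutoff(max_age_hours: int, now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the oldest acceptable validation time, or None for no limit."""
    if max_age_hours == 0:
        # 0 disables the age filter entirely.
        return None
    now = now or datetime.now(timezone.utc)
    return now - timedelta(hours=max_age_hours)


def is_fresh(last_validated: datetime, max_age_hours: int, now: Optional[datetime] = None) -> bool:
    """True if the proxy was validated inside the window (or no window applies)."""
    cutoff = validation_cutoff(max_age_hours, now)
    return cutoff is None or last_validated >= cutoff


# Fixed reference time so the example is reproducible.
now = datetime(2024, 1, 4, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=10)
stale = now - timedelta(hours=100)
```

With the default of 72, `recent` passes the filter and `stale` does not; with 0, both pass.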

proxywhirl.exports.generate_stats_from_files(proxy_dir)

Generate statistics from proxy list files and rich proxy data.

Parameters:

proxy_dir (pathlib.Path) – Path to directory containing proxy list files

Returns:

Dashboard statistics including health, performance, validation, geographic, and source ranking data.

Return type:

dict[str, Any]

proxywhirl.exports.parse_proxy_url(url)

Parse proxy URL to extract IP and port.

Parameters:

url (str) – Full proxy URL (e.g., "http://1.2.3.4:8080")

Returns:

Tuple of (ip, port)

Return type:

tuple[str, int]
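
For illustration, equivalent parsing can be sketched with the standard library. This is not proxywhirl's implementation: `split_proxy_url` is a hypothetical stand-in, and raising `ValueError` when the host or port is missing is an assumption about error handling.

```python
from urllib.parse import urlsplit


def split_proxy_url(url: str) -> tuple[str, int]:
    """Return (ip, port) from a full proxy URL such as http://1.2.3.4:8080."""
    parts = urlsplit(url)
    # hostname drops any credentials; port is already converted to int.
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"not a proxy URL with host and port: {url!r}")
    return parts.hostname, parts.port


ip, port = split_proxy_url("http://1.2.3.4:8080")
```

Note that the port comes back as an `int`, matching the documented `tuple[str, int]` return type.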