Proxy Lifecycle
A proxy in ProxyWhirl goes through a well-defined lifecycle from discovery to ejection. Understanding this flow helps you diagnose issues and tune behavior.
The Six Stages
```mermaid
flowchart LR
    A[Fetch] --> B[Validate]
    B --> C[Enrich]
    C --> D[Rotate]
    D --> E[Monitor]
    E -->|healthy| D
    E -->|degraded| F[Eject]
    F -->|circuit half-open| D
```
1. Fetch
Proxies enter the system via ProxyFetcher, which pulls from hundreds of built-in sources (ALL_SOURCES) or user-defined endpoints.
Each source has a ProxySourceConfig specifying URL, expected format, and protocol. Fetching happens concurrently – the CLI fetch command supports --concurrency for parallel source downloads.
Four parsers cover the supported source formats: JSONParser, CSVParser, PlainTextParser, and HTMLTableParser. For JavaScript-rendered pages, BrowserRenderer (Playwright) renders the page before it is parsed.
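The fetch-and-parse step can be illustrated with a small standard-library sketch. It is not ProxyFetcher's implementation: the helper names `parse_plain_text` and `fetch_all`, the `ip:port`-per-line format, and the semaphore-based limit (standing in for the `--concurrency` option) are assumptions. The `download` callable is a placeholder for whatever HTTP client you use.

```python
import asyncio
import re
from typing import Awaitable, Callable, Iterable

# Hypothetical pattern for "host:port" lines in a plain-text proxy list.
_LINE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\s*$")


def parse_plain_text(body: str, protocol: str = "http") -> list[str]:
    """Turn an 'ip:port per line' body into proxy URLs, skipping junk lines."""
    urls = []
    for line in body.splitlines():
        match = _LINE.match(line)
        if match:
            host, port = match.groups()
            urls.append(f"{protocol}://{host}:{port}")
    return urls


async def fetch_all(
    sources: Iterable[str],
    download: Callable[[str], Awaitable[str]],
    concurrency: int = 5,
) -> list[str]:
    """Download every source with at most `concurrency` downloads in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def fetch_one(url: str) -> list[str]:
        async with semaphore:  # blocks while `concurrency` downloads are running
            body = await download(url)
        return parse_plain_text(body)

    results = await asyncio.gather(*(fetch_one(url) for url in sources))
    # Flatten and de-duplicate while keeping first-seen order.
    return list(dict.fromkeys(url for batch in results for url in batch))
```

Passing `download` in as a parameter keeps the sketch testable offline: a stub coroutine that returns canned text can stand in for real network calls.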
2. Validate
ProxyValidator tests each proxy by making a real HTTP request (default target: httpbin.org/ip).
Validation checks:
- Connectivity: Can we establish a connection through the proxy?
- Protocol support: Does the proxy handle HTTP/HTTPS/SOCKS correctly?
- Response correctness: Does the response contain the expected IP?
- Latency: How fast does the proxy respond? (recorded for performance-based strategies)
Proxies that fail validation are discarded before entering the pool. This prevents wasting rotation slots on dead proxies.
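As a rough illustration of the connectivity, correctness, and latency checks, here is a standard-library sketch. It is not ProxyValidator's code. The function names, the 10-second timeout, and the rule that the reported origin IP must equal the proxy's host are assumptions (some proxies exit from a different address). `urllib` only speaks HTTP(S) proxies, so SOCKS support would need a different client.

```python
import http.client
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class ValidationResult:
    ok: bool
    latency_s: Optional[float]
    reason: str = ""


def check_ip_response(proxy_url: str, body: bytes) -> bool:
    """True if an httpbin-style {"origin": ...} body reports the proxy's host IP."""
    try:
        origin = json.loads(body)["origin"]
    except (ValueError, KeyError, TypeError):
        return False  # not JSON, or not the expected shape
    # httpbin may report several comma-separated addresses.
    seen = {part.strip() for part in str(origin).split(",")}
    return urlsplit(proxy_url).hostname in seen


def validate(proxy_url: str, target: str = "http://httpbin.org/ip",
             timeout: float = 10.0) -> ValidationResult:
    """Make one real request through an HTTP(S) proxy and time it."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    start = time.perf_counter()
    try:
        with opener.open(target, timeout=timeout) as response:
            body = response.read()
    except (OSError, http.client.HTTPException) as exc:  # refused, timed out, bad reply
        return ValidationResult(False, None, f"request failed: {exc}")
    latency = time.perf_counter() - start
    if not check_ip_response(proxy_url, body):
        return ValidationResult(False, latency, "unexpected origin IP")
    return ValidationResult(True, latency)
```

Splitting the response check out of `validate()` lets the correctness rule be tested without network access.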
3. Enrich
Validated proxies are enriched with metadata:
- Geolocation: Country and region codes via IP geolocation (geo.py)
- Protocol detection: HTTP, HTTPS, SOCKS4, SOCKS5
- Anonymity level: Transparent, anonymous, elite (based on header analysis)
- Source tracking: Which source provided this proxy
Enrichment data powers context-aware strategies (geo-targeted, cost-aware).
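The transparent/anonymous/elite split is commonly decided by sending a request through the proxy to an endpoint that echoes back the headers it received, then inspecting them. The sketch below applies that common heuristic. The header list, the token matching, and the function name are assumptions, not ProxyWhirl's exact rules.

```python
import re
from enum import Enum
from typing import Mapping


class Anonymity(Enum):
    TRANSPARENT = "transparent"
    ANONYMOUS = "anonymous"
    ELITE = "elite"


# Headers that commonly reveal a request went through a proxy (assumed list).
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded", "x-real-ip", "proxy-connection")


def classify_anonymity(echoed_headers: Mapping[str, str], real_ip: str) -> Anonymity:
    """Classify a proxy from the request headers a judge endpoint echoed back."""
    headers = {name.lower(): value for name, value in echoed_headers.items()}
    for value in headers.values():
        # Compare whole tokens so "1.2.3.4" does not match "11.2.3.45".
        if real_ip in re.split(r"[,;=\s]+", value):
            return Anonymity.TRANSPARENT  # the client's own IP leaked through
    if any(name in headers for name in PROXY_HEADERS):
        return Anonymity.ANONYMOUS  # IP hidden, but proxy use is visible
    return Anonymity.ELITE  # no trace of the client IP or of a proxy
```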
4. Rotate
This is the steady state. For each request, the rotation strategy (RotationStrategy.select()) picks the next proxy. Selection considers:
- Pool health: Only healthy proxies are candidates
- Strategy logic: Round-robin index, random choice, EMA scores, session mappings, etc.
- Selection context: Session IDs, target countries, previously failed proxies
After each request, record_result() feeds the outcome back to the strategy for adaptive learning.
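The select/record_result feedback loop can be illustrated with a small EMA-scoring strategy. This is a hypothetical sketch, not ProxyWhirl's RotationStrategy interface. The class names, the smoothing factor `alpha`, the neutral starting score of 0.5, and the assumption that the caller passes in an already health-filtered list are all illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class SelectionContext:
    """Hypothetical per-request context; only previously failed proxies are used."""
    failed: set[str] = field(default_factory=set)


class EmaStrategy:
    """Pick the healthy proxy with the best exponentially weighted success rate."""

    def __init__(self, alpha: float = 0.3, initial_score: float = 0.5) -> None:
        self.alpha = alpha                  # weight given to the newest outcome
        self.initial_score = initial_score  # score for proxies with no history
        self.scores: dict[str, float] = {}

    def select(self, healthy: Iterable[str],
               context: Optional[SelectionContext] = None) -> Optional[str]:
        failed = context.failed if context else set()
        candidates = [p for p in healthy if p not in failed]
        if not candidates:
            return None
        # max() returns the first of equal scores, so ties follow pool order.
        return max(candidates, key=lambda p: self.scores.get(p, self.initial_score))

    def record_result(self, proxy: str, success: bool) -> None:
        previous = self.scores.get(proxy, self.initial_score)
        outcome = 1.0 if success else 0.0
        self.scores[proxy] = self.alpha * outcome + (1 - self.alpha) * previous
```

A larger `alpha` reacts faster to recent failures; a smaller one smooths over occasional blips.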
5. Monitor
Proxies are continuously monitored via two mechanisms:
- Passive monitoring (every request): Each request’s success/failure and response time update the proxy’s health metrics. The circuit breaker tracks failures in a rolling window.
- Active monitoring (periodic): HealthMonitor runs scheduled health checks (default: every 300 seconds) against a configurable target URL. This catches proxies that died between requests.
Health states:
- HEALTHY: Responding normally
- DEGRADED: Elevated failure rate or latency
- DEAD: Circuit breaker open, excluded from rotation
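The passive mechanism and the three health states can be sketched as a sliding time window of request outcomes. The class names, the 60-second window, and the 20% failure-rate and 2-second mean-latency thresholds are assumptions for illustration, not ProxyWhirl defaults. DEAD is taken directly from the circuit breaker's state, as the list above describes. The injectable `clock` makes the window testable without sleeping.

```python
import time
from collections import deque
from enum import Enum
from typing import Callable


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    DEAD = "dead"


class PassiveHealth:
    """Track request outcomes in a sliding time window (hypothetical thresholds)."""

    def __init__(self, window_s: float = 60.0, degraded_rate: float = 0.2,
                 slow_s: float = 2.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.degraded_rate = degraded_rate
        self.slow_s = slow_s
        self.clock = clock
        # Each entry is (timestamp, success, latency in seconds).
        self.events: deque[tuple[float, bool, float]] = deque()

    def record(self, success: bool, latency_s: float) -> None:
        self.events.append((self.clock(), success, latency_s))
        self._evict()

    def _evict(self) -> None:
        # Drop outcomes older than the window.
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def status(self, circuit_open: bool = False) -> HealthStatus:
        if circuit_open:
            return HealthStatus.DEAD  # DEAD is driven by the circuit breaker
        self._evict()
        if not self.events:
            return HealthStatus.HEALTHY  # no recent evidence of trouble
        failure_rate = sum(not ok for _, ok, _ in self.events) / len(self.events)
        mean_latency = sum(lat for _, _, lat in self.events) / len(self.events)
        if failure_rate > self.degraded_rate or mean_latency > self.slow_s:
            return HealthStatus.DEGRADED
        return HealthStatus.HEALTHY
```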
6. Eject (and Recover)
When a proxy’s circuit breaker opens (its failures exceed the threshold), the proxy is ejected from rotation. No requests are attempted – should_attempt_request() returns False immediately.
But ejection isn’t permanent. Once the circuit breaker timeout elapses, the breaker goes half-open and lets a single test request through. If it succeeds, the proxy re-enters rotation (stage 4). If it fails, the breaker reopens and the timeout restarts.
This automatic recovery is what distinguishes ProxyWhirl from simple blacklisting. See Circuit Breakers for the full state machine.
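To make the eject-and-recover loop concrete, here is a minimal circuit breaker with CLOSED, OPEN, and HALF_OPEN states. It borrows the should_attempt_request() name from the text, but the class, its constructor parameters (failure threshold, window, timeout), and the record_success()/record_failure() methods are hypothetical. See Circuit Breakers for how ProxyWhirl actually configures its breaker.

```python
import time
from collections import deque
from enum import Enum
from typing import Callable


class CircuitState(Enum):
    CLOSED = "closed"        # in rotation
    OPEN = "open"            # ejected, no requests attempted
    HALF_OPEN = "half_open"  # timeout elapsed, one trial request allowed


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, window_s: float = 60.0,
                 timeout_s: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # opens once this many failures are in the window
        self.window_s = window_s
        self.timeout_s = timeout_s
        self.clock = clock
        self.state = CircuitState.CLOSED
        self.failures: deque[float] = deque()  # timestamps of recent failures
        self.next_test_time = 0.0
        self._trial_in_flight = False

    def should_attempt_request(self) -> bool:
        if self.state is CircuitState.CLOSED:
            return True
        if self.state is CircuitState.OPEN:
            if self.clock() < self.next_test_time:
                return False  # still ejected
            self.state = CircuitState.HALF_OPEN
        if self._trial_in_flight:
            return False  # only one request may probe a half-open circuit
        self._trial_in_flight = True
        return True

    def record_success(self) -> None:
        self._trial_in_flight = False
        if self.state is CircuitState.HALF_OPEN:
            self.state = CircuitState.CLOSED  # trial passed: back into rotation
            self.failures.clear()

    def record_failure(self) -> None:
        self._trial_in_flight = False
        now = self.clock()
        if self.state is CircuitState.OPEN:
            return  # late result from a request started before ejection
        if self.state is CircuitState.HALF_OPEN:
            self._open(now)  # trial failed: reopen and restart the timeout
            return
        self.failures.append(now)
        while self.failures and self.failures[0] < now - self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state = CircuitState.OPEN
        self.next_test_time = now + self.timeout_s
```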
Persistence Across Restarts
The lifecycle state persists via SQLiteStorage:
- Proxy metadata: URL, protocol, geolocation, source
- Health history: Success rate, average latency, last check time
- Circuit breaker state (optional): Current state, failure window, next test time
On application restart, proxies load from storage with their last-known health state, avoiding a cold start where all proxies are tested from scratch.
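A restart-safe store along these lines can be sketched with Python's built-in sqlite3 module. The table layout, column names, and helper functions are hypothetical and do not describe SQLiteStorage's actual schema. They only show how metadata, health history, and an optional breaker snapshot can be written and reloaded so the pool starts warm.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields
from typing import Optional


@dataclass
class ProxyRecord:
    url: str
    protocol: str
    country: Optional[str]
    source: Optional[str]
    success_rate: float
    avg_latency_s: float
    last_checked: float                  # Unix timestamp of the last health check
    circuit_state: Optional[str] = None  # optional breaker snapshot


COLUMNS = [f.name for f in fields(ProxyRecord)]


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxies ("
        "url TEXT PRIMARY KEY, protocol TEXT NOT NULL, country TEXT, source TEXT, "
        "success_rate REAL, avg_latency_s REAL, last_checked REAL, circuit_state TEXT)"
    )
    return conn


def save(conn: sqlite3.Connection, record: ProxyRecord) -> None:
    placeholders = ", ".join("?" for _ in COLUMNS)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            f"INSERT OR REPLACE INTO proxies ({', '.join(COLUMNS)}) VALUES ({placeholders})",
            astuple(record),
        )


def load_all(conn: sqlite3.Connection) -> list[ProxyRecord]:
    rows = conn.execute(f"SELECT {', '.join(COLUMNS)} FROM proxies ORDER BY url")
    return [ProxyRecord(*row) for row in rows]
```

Using the proxy URL as the primary key means each save overwrites the previous snapshot for that proxy, so the table always holds the last-known state to reload at startup.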
Data Flow Summary

| Stage | Component | Input | Output |
|---|---|---|---|
| Fetch | | Source URLs | Raw proxy URLs |
| Validate | | Raw URLs | Validated proxies |
| Enrich | | Validated proxies | Enriched proxies with metadata |
| Rotate | | | Selected proxy |
| Monitor | | Request outcomes | Health status updates |
| Eject/Recover | | Failure counts | State transitions |
Further Reading
- Getting Started – quickstart with fetching and validation
- Python API – ProxyFetcher, ProxyValidator, HealthMonitor API
- Circuit Breakers – circuit breaker state machine details
- Cache Architecture – how cache interacts with the lifecycle