--- title: Retry & Failover Guide --- # Retry & Failover Guide ProxyWhirl provides intelligent retry logic, circuit breaker protection, and automatic proxy failover to maximize request success rates. This guide covers the complete retry and failover system, from basic configuration to advanced observability. ```{contents} :local: :depth: 2 ``` ## Architecture Overview The retry system consists of four main components: 1. **RetryPolicy** - Configures retry behavior and backoff strategies 2. **RetryExecutor** - Orchestrates retry logic with intelligent proxy selection 3. **CircuitBreaker** - Protects against cascading failures using state machine transitions 4. **RetryMetrics** - Collects observability data for monitoring and analysis ## When to Use Retry vs Failover ### Retry (Same Proxy) Use retry when: - Network is temporarily unstable (connection timeout, packet loss) - Target server returns 502/503/504 (gateway errors, service temporarily unavailable) - Request is idempotent (GET, HEAD, OPTIONS, DELETE, PUT) ### Failover (Different Proxy) Use failover when: - Proxy authentication fails (407) - Proxy consistently returns errors (circuit breaker opens) - Specific geo-targeting or performance requirements exist - One proxy exhausts rate limits ProxyWhirl combines both: it retries with backoff on the current proxy, then fails over to a better proxy if retries are exhausted. ## RetryPolicy Configuration ### Basic Configuration ```python from proxywhirl import ProxyWhirl, RetryPolicy, BackoffStrategy # Default policy: 3 attempts, exponential backoff rotator = ProxyWhirl() # Custom policy policy = RetryPolicy( max_attempts=5, # Maximum retry attempts backoff_strategy=BackoffStrategy.EXPONENTIAL, base_delay=1.0, # Initial delay (seconds) multiplier=2.0, # Exponential multiplier max_backoff_delay=30.0, # Maximum delay cap jitter=True, # AWS decorrelated jitter retry_status_codes=[502, 503, 504], # Retryable HTTP errors timeout=60.0, # Total timeout for all attempts retry_non_idempotent=False, # Don't retry POST by default ) rotator = ProxyWhirl(retry_policy=policy) ``` ### Backoff Strategies ProxyWhirl supports three backoff strategies: #### Exponential Backoff (Recommended) Best for network failures and overloaded servers. Delays increase exponentially to give systems time to recover. ```python policy = RetryPolicy( backoff_strategy=BackoffStrategy.EXPONENTIAL, base_delay=1.0, multiplier=2.0, max_backoff_delay=30.0, jitter=True, # AWS decorrelated jitter ) # Attempt 0: random(0, 1.0s) # Attempt 1: random(1.0s, previous * 3), capped at 30s # Attempt 2: random(1.0s, previous * 3), capped at 30s # Each delay decorrelated from the previous (AWS algorithm) ``` #### Linear Backoff Best for predictable retry patterns. Delays increase linearly. ```python policy = RetryPolicy( backoff_strategy=BackoffStrategy.LINEAR, base_delay=2.0, max_backoff_delay=10.0, ) # Attempt 0: 2.0s # Attempt 1: 4.0s # Attempt 2: 6.0s # Attempt 3: 8.0s # Attempt 4: 10.0s (capped) ``` #### Fixed Backoff Best for testing or when delays should be constant. ```python policy = RetryPolicy( backoff_strategy=BackoffStrategy.FIXED, base_delay=5.0, ) # All attempts: 5.0s ``` ### Jitter Explained Jitter uses the **AWS decorrelated jitter algorithm** to prevent synchronized retries across multiple clients. Instead of simple randomization, each retry delay depends on the previous delay: ```python # AWS decorrelated jitter formula: # delay = min(cap, random(base_delay, previous_delay * 3)) # # First attempt: uniform random from 0 to base delay # Subsequent attempts: decorrelated from previous delay # # Without jitter: All clients retry at exactly 1.0s, 2.0s, 4.0s... # With jitter: Clients retry at decorrelated random times: # Client A: 0.7s, 1.3s, 2.8s... # Client B: 1.4s, 2.9s, 5.1s... # Client C: 0.9s, 1.1s, 3.7s... ``` This prevents "thundering herd" problems where many clients overwhelm a recovering server. See the [AWS Architecture Blog](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/) for background on decorrelated jitter. ### Retryable Status Codes By default, ProxyWhirl retries on gateway errors: ```python # Default retry status codes retry_status_codes = [502, 503, 504] # Custom status codes (must be 5xx) policy = RetryPolicy( retry_status_codes=[500, 502, 503, 504, 507, 508] ) ``` 4xx errors (client errors) are never retried as they indicate permanent failures. ### Timeout Behavior The `timeout` parameter caps total execution time across all retry attempts: ```python policy = RetryPolicy( max_attempts=10, timeout=30.0, # Total timeout for all attempts ) # If 30s elapses after 3 attempts, remaining 7 attempts are skipped # Raises: ProxyConnectionError("Request timeout after 30.00s") ``` ### Non-Idempotent Requests By default, POST/PATCH requests are not retried (they're not idempotent): ```python # Default: POST fails immediately without retry rotator.request("POST", url, json=data) # Enable retries for POST (use with caution!) policy = RetryPolicy(retry_non_idempotent=True) rotator = ProxyWhirl(retry_policy=policy) # Now POST will retry on network failures rotator.request("POST", url, json=data) ``` **Warning:** Only enable `retry_non_idempotent` if your API is idempotent (e.g., uses idempotency keys). ## Circuit Breaker Configuration Circuit breakers protect against cascading failures by temporarily removing unhealthy proxies from the rotation pool. ### Sync vs Async Circuit Breakers ProxyWhirl provides two circuit breaker implementations. See {doc}`async-client` for guidance on choosing between sync and async patterns. - **`CircuitBreaker`** - Synchronous implementation using `threading.Lock` - **`AsyncCircuitBreaker`** - Async implementation using `asyncio`-compatible locks **When to use which:** ```python # ✅ For synchronous code - use CircuitBreaker from proxywhirl import CircuitBreaker cb = CircuitBreaker(proxy_id="proxy-1") cb.record_failure() # Thread-safe if cb.should_attempt_request(): cb.record_success() # ✅ For async code - use AsyncCircuitBreaker from proxywhirl.circuit_breaker import AsyncCircuitBreaker cb = AsyncCircuitBreaker(proxy_id="proxy-1") await cb.record_failure() # Event loop safe if await cb.should_attempt_request(): await cb.record_success() ``` **WARNING:** Do NOT mix sync locks with async code. The `CircuitBreaker` class uses `threading.Lock` internally which can block the event loop. For production async applications, always use `AsyncCircuitBreaker`. ### State Machine Circuit breakers transition through three states: ``` CLOSED → OPEN → HALF_OPEN → CLOSED ↑ ↓ └────────────────────────────┘ ``` 1. **CLOSED** - Normal operation, proxy is available 2. **OPEN** - Proxy excluded from rotation (too many failures) 3. **HALF_OPEN** - Testing recovery with limited requests ### Thresholds and Configuration ```python from proxywhirl import CircuitBreakerConfig from proxywhirl.circuit_breaker import CircuitBreaker # Circuit breakers are created automatically by ProxyWhirl # Access via rotator.circuit_breakers dict proxy = rotator.get_proxy() cb = rotator.circuit_breakers[str(proxy.id)] # Configuration (set on CircuitBreaker creation) print(cb.failure_threshold) # Default: 5 failures print(cb.window_duration) # Default: 60 seconds (rolling window) print(cb.timeout_duration) # Default: 30 seconds (OPEN timeout) ``` ### State Transitions #### CLOSED → OPEN Circuit opens when failure count exceeds threshold within the rolling window: ```python cb = CircuitBreaker( proxy_id=str(proxy.id), failure_threshold=5, window_duration=60.0, ) # Record 5 failures within 60 seconds for _ in range(5): cb.record_failure() print(cb.state) # CircuitBreakerState.OPEN print(cb.failure_count) # 5 ``` #### OPEN → HALF_OPEN Circuit transitions to HALF_OPEN after timeout duration elapses: ```python import time # Circuit is OPEN print(cb.state) # CircuitBreakerState.OPEN # Check after timeout (30s default) time.sleep(30) # Next request triggers transition to HALF_OPEN if cb.should_attempt_request(): print(cb.state) # CircuitBreakerState.HALF_OPEN ``` #### HALF_OPEN → CLOSED Circuit closes if test request succeeds: ```python # Circuit is HALF_OPEN cb.record_success() print(cb.state) # CircuitBreakerState.CLOSED print(cb.failure_count) # 0 (reset) ``` #### HALF_OPEN → OPEN Circuit reopens if test request fails: ```python # Circuit is HALF_OPEN cb.record_failure() print(cb.state) # CircuitBreakerState.OPEN # Timeout duration resets, must wait another 30s ``` ### Rolling Window Behavior The circuit breaker uses a sliding window to track recent failures: ```python cb = CircuitBreaker( proxy_id=str(proxy.id), failure_threshold=3, window_duration=60.0, ) # At t=0: Record 2 failures cb.record_failure() # failure_count = 1 cb.record_failure() # failure_count = 2 print(cb.state) # CLOSED (below threshold) # At t=65: Old failures expired (outside 60s window) time.sleep(65) print(cb.failure_count) # 0 (window cleaned automatically) # New failure doesn't trigger circuit cb.record_failure() # failure_count = 1 print(cb.state) # CLOSED ``` ### Manual Reset Reset circuit breaker to CLOSED state manually: ```python # Force reset (useful for testing or manual intervention) cb.reset() print(cb.state) # CircuitBreakerState.CLOSED print(cb.failure_count) # 0 ``` ## Intelligent Proxy Selection When retries are exhausted, ProxyWhirl automatically selects the best alternative proxy using performance-based scoring. ### Selection Algorithm The executor scores each candidate proxy using: ```text score = (0.7 × success_rate) + (0.3 × (1 - normalized_latency)) ``` - **70% weight** on success rate (reliability) - **30% weight** on latency (performance) - **10% bonus** for geo-targeting match (optional) ### Example: Performance-Based Selection ```python from proxywhirl import Proxy, ProxyWhirl # Create proxies with different success rates proxy1 = Proxy(url="http://proxy1.example.com:8080") proxy1.total_requests = 100 proxy1.total_successes = 95 # 95% success rate proxy2 = Proxy(url="http://proxy2.example.com:8080") proxy2.total_requests = 100 proxy2.total_successes = 60 # 60% success rate rotator = ProxyWhirl(proxies=[proxy1, proxy2]) # Intelligent selection prioritizes proxy1 executor = rotator.retry_executor selected = executor.select_retry_proxy([proxy1, proxy2], failed_proxy) print(selected.url) # http://proxy1.example.com:8080 ``` ### Example: Geo-Targeted Selection ```python # Create proxies with different regions proxy_us = Proxy( url="http://proxy-us.example.com:8080", metadata={"region": "US-EAST"}, ) proxy_us.total_requests = 100 proxy_us.total_successes = 80 # 80% success rate proxy_eu = Proxy( url="http://proxy-eu.example.com:8080", metadata={"region": "EU-WEST"}, ) proxy_eu.total_requests = 100 proxy_eu.total_successes = 85 # 85% success rate (slightly better) rotator = ProxyWhirl(proxies=[proxy_us, proxy_eu]) # Select with target region executor = rotator.retry_executor selected = executor.select_retry_proxy( [proxy_us, proxy_eu], failed_proxy, target_region="US-EAST" ) # Selects proxy_us despite lower success rate (10% region bonus) print(selected.url) # http://proxy-us.example.com:8080 ``` ### Exclusion Rules The selection algorithm excludes: 1. **Failed proxy** - The proxy that just failed is never selected 2. **Open circuits** - Proxies with open circuit breakers 3. **Half-open pending** - Proxies already testing recovery ```python # If all proxies are excluded, returns None selected = executor.select_retry_proxy([only_failed_proxy], failed_proxy) print(selected) # None ``` ## RetryMetrics and Observability ProxyWhirl tracks detailed metrics for monitoring, debugging, and analytics. :::{tip} You can also view retry and circuit breaker statistics from the command line using `proxywhirl stats --retry --circuit-breaker`. See {doc}`cli-reference` for details. ::: ### Collecting Metrics ```python from proxywhirl import ProxyWhirl rotator = ProxyWhirl() # Metrics are automatically collected response = rotator.request("GET", "https://httpbin.org/ip") # Access metrics metrics = rotator.retry_metrics print(metrics.get_summary()) ``` ### Summary Statistics ```python summary = metrics.get_summary() print(summary) # { # "total_retries": 42, # "success_by_attempt": { # 0: 35, # 35 requests succeeded on first attempt # 1: 5, # 5 requests succeeded on second attempt # 2: 2, # 2 requests succeeded on third attempt # }, # "circuit_breaker_events_count": 3, # "retention_hours": 24, # } ``` ### Time-Series Data ```python # Get hourly aggregated data for last 24 hours timeseries = metrics.get_timeseries(hours=24) for datapoint in timeseries: print(datapoint) # { # "timestamp": "2025-12-27T14:00:00+00:00", # "total_requests": 150, # "total_retries": 25, # "success_rate": 0.94, # "avg_latency": 0.234, # } ``` ### Per-Proxy Statistics ```python # Get retry statistics by proxy by_proxy = metrics.get_by_proxy(hours=24) for proxy_id, stats in by_proxy.items(): print(f"Proxy {proxy_id}: {stats}") # { # "proxy_id": "550e8400-e29b-41d4-a716-446655440000", # "total_attempts": 50, # "success_count": 45, # "failure_count": 5, # "avg_latency": 0.234, # "circuit_breaker_opens": 2, # } ``` ### Circuit Breaker Events ```python # Access circuit breaker state changes for event in metrics.circuit_breaker_events: print(f"{event.timestamp}: {event.proxy_id}") print(f" {event.from_state} → {event.to_state}") print(f" Failure count: {event.failure_count}") # Example output: # 2025-12-27 14:32:15+00:00: 550e8400-e29b-41d4-a716-446655440000 # CLOSED → OPEN # Failure count: 5 # # 2025-12-27 14:32:45+00:00: 550e8400-e29b-41d4-a716-446655440000 # OPEN → HALF_OPEN # Failure count: 5 # # 2025-12-27 14:32:47+00:00: 550e8400-e29b-41d4-a716-446655440000 # HALF_OPEN → CLOSED # Failure count: 0 ``` ### Hourly Aggregation Metrics automatically aggregate into hourly summaries to prevent unbounded memory growth: ```python # Manually trigger aggregation (normally runs automatically) metrics.aggregate_hourly() # View aggregates for hour, agg in metrics.hourly_aggregates.items(): print(f"{hour}: {agg.total_requests} requests, {agg.total_retries} retries") print(f" Success by attempt: {agg.success_by_attempt}") print(f" Failure by reason: {agg.failure_by_reason}") ``` ### Retention Configuration ```python from proxywhirl.retry import RetryMetrics # Custom retention and limits metrics = RetryMetrics( retention_hours=48, # Keep data for 48 hours max_current_attempts=5000, # Limit raw attempts deque ) rotator = ProxyWhirl() rotator.retry_metrics = metrics ``` ## RetryExecutor Deep Dive The `RetryExecutor` is the core orchestration class that coordinates retry logic, circuit breakers, and metrics collection. ### How Retries Work Internally When you call `rotator.request()`, the following sequence occurs: 1. **Idempotency Check**: Determines if the HTTP method is safe to retry (GET/HEAD/OPTIONS/DELETE/PUT are idempotent) 2. **Retry Loop**: Executes up to `max_attempts` attempts with backoff delays 3. **Circuit Breaker Check**: Verifies the proxy's circuit breaker allows the request 4. **Request Execution**: Calls the underlying HTTP client 5. **Status Code Check**: Validates response status against `retry_status_codes` 6. **Error Classification**: Determines if exceptions are retryable (timeouts, connection errors) 7. **Metrics Recording**: Logs attempt outcome, latency, and circuit breaker events 8. **Proxy Failover**: If all retries exhausted, selects next best proxy automatically ### Retryable vs Non-Retryable Errors The executor classifies errors into two categories: **Retryable Errors** (trigger retry with backoff): - `httpx.ConnectError` - Connection refused, DNS failure - `httpx.TimeoutException` - Request/connect timeout - `httpx.ReadTimeout` - Response body read timeout - `httpx.WriteTimeout` - Request body write timeout - `httpx.PoolTimeout` - Connection pool exhausted - `httpx.NetworkError` - Generic network failure - HTTP 502, 503, 504 status codes (configurable) **Non-Retryable Errors** (fail immediately): - HTTP 4xx errors (client errors - request won't succeed on retry) - `NonRetryableError` (custom application errors) - Any exception not in the retryable types list ```python from proxywhirl.retry import RetryExecutor, NonRetryableError # Custom error handling try: response = rotator.request("GET", url) except NonRetryableError as e: # Authentication failure, malformed request, etc. print(f"Cannot retry: {e}") ``` ### Direct RetryExecutor Usage For advanced use cases, you can use `RetryExecutor` directly: ```python from proxywhirl import ProxyWhirl, Proxy, RetryPolicy from proxywhirl.retry import RetryExecutor import httpx # Create executor with custom policy policy = RetryPolicy(max_attempts=5, base_delay=2.0) rotator = ProxyWhirl() executor = RetryExecutor( retry_policy=policy, circuit_breakers=rotator.circuit_breakers, retry_metrics=rotator.retry_metrics, ) # Create request function proxy = Proxy(url="http://proxy.example.com:8080") def request_fn(): client = httpx.Client(proxies={"all://": proxy.url}) return client.get("https://api.example.com/data") # Execute with retry response = executor.execute_with_retry( request_fn=request_fn, proxy=proxy, method="GET", url="https://api.example.com/data", request_id="custom-request-123", # Optional tracking ID ) ``` ## Integration with ProxyWhirl ### Basic Usage ```python from proxywhirl import ProxyWhirl, RetryPolicy, BackoffStrategy # Automatic retry and failover policy = RetryPolicy( max_attempts=5, backoff_strategy=BackoffStrategy.EXPONENTIAL, base_delay=1.0, jitter=True, ) rotator = ProxyWhirl(retry_policy=policy) # Request automatically retries and fails over response = rotator.request("GET", "https://httpbin.org/ip") ``` ### Custom Error Handling ```python from proxywhirl.exceptions import ProxyConnectionError from proxywhirl.retry import NonRetryableError try: response = rotator.request("GET", url) except ProxyConnectionError as e: # All retries exhausted across all proxies print(f"Request failed after all retries: {e}") except NonRetryableError as e: # Non-retryable error (e.g., authentication failure) print(f"Non-retryable error: {e}") ``` ### Monitoring Circuit Breakers ```python # Check circuit breaker status for proxy in rotator.pool.proxies: cb = rotator.circuit_breakers.get(str(proxy.id)) if cb: print(f"{proxy.url}: {cb.state.value}") print(f" Failures: {cb.failure_count}/{cb.failure_threshold}") if cb.next_test_time: import time wait_time = cb.next_test_time - time.time() print(f" Retry in: {wait_time:.1f}s") ``` ### Manual Circuit Reset ```python # Reset all circuit breakers for cb in rotator.circuit_breakers.values(): cb.reset() # Reset specific proxy proxy = rotator.get_proxy() rotator.circuit_breakers[str(proxy.id)].reset() ``` ### Integration with Rotation Strategies Retry and failover logic works seamlessly with all rotation strategies. For detailed strategy configuration, see {doc}`advanced-strategies`. #### Round-Robin with Automatic Failover ```python from proxywhirl import ProxyWhirl, Proxy, RetryPolicy from proxywhirl.strategies import RoundRobinStrategy # Round-robin ensures fair distribution rotator = ProxyWhirl( proxies=[ Proxy(url="http://proxy1.example.com:8080"), Proxy(url="http://proxy2.example.com:8080"), Proxy(url="http://proxy3.example.com:8080"), ], strategy=RoundRobinStrategy(), retry_policy=RetryPolicy(max_attempts=3), ) # If proxy1 fails, automatically retries then fails over to proxy2 response = rotator.request("GET", "https://api.example.com/data") ``` #### Weighted Strategy with Performance-Based Failover ```python from proxywhirl.strategies import WeightedStrategy # Higher weight = more traffic rotator = ProxyWhirl( proxies=[ Proxy(url="http://premium.example.com:8080", weight=3.0), # 60% of traffic Proxy(url="http://standard.example.com:8080", weight=2.0), # 40% of traffic ], strategy=WeightedStrategy(), retry_policy=RetryPolicy(max_attempts=5), ) # Premium proxy fails → retries on premium → fails over to standard # Next request still prefers premium (weighted selection) response = rotator.request("GET", "https://api.example.com/data") ``` #### Geo-Targeted Strategy with Regional Failover ```python from proxywhirl.strategies import GeoTargetedStrategy from proxywhirl import StrategyConfig, SelectionContext # Create and configure geo-targeted strategy strategy = GeoTargetedStrategy() strategy.configure(StrategyConfig( geo_fallback_enabled=True, geo_secondary_strategy="round_robin" )) # Geo-targeted proxies rotator = ProxyWhirl( proxies=[ Proxy(url="http://us-east.example.com:8080", metadata={"region": "US-EAST"}), Proxy(url="http://us-west.example.com:8080", metadata={"region": "US-WEST"}), Proxy(url="http://eu-west.example.com:8080", metadata={"region": "EU-WEST"}), ], strategy=strategy, retry_policy=RetryPolicy(max_attempts=3), ) # Prefers US-EAST when specified in context context = SelectionContext(target_region="US-EAST") response = rotator.request("GET", "https://api.example.com/data", context=context) ``` #### Performance-Based Strategy with Dynamic Failover ```python from proxywhirl.strategies import PerformanceBasedStrategy # Automatically selects fastest, most reliable proxy rotator = ProxyWhirl( proxies=[ Proxy(url="http://proxy1.example.com:8080"), Proxy(url="http://proxy2.example.com:8080"), Proxy(url="http://proxy3.example.com:8080"), ], strategy=PerformanceBasedStrategy(), retry_policy=RetryPolicy(max_attempts=3), ) # Strategy considers: # - Success rate (from proxy.total_successes / proxy.total_requests) # - Average latency (from RetryMetrics) # - Circuit breaker state (skips OPEN circuits) for i in range(100): response = rotator.request("GET", f"https://api.example.com/data/{i}") # Over time, fast reliable proxies get more traffic # Slow or failing proxies get less traffic ``` **Key Integration Points:** 1. **Circuit breakers filter eligible proxies** - Strategies only see proxies with CLOSED/HALF_OPEN circuits 2. **Metrics inform strategy decisions** - Performance-based strategies use RetryMetrics data 3. **Failover respects strategy logic** - If weighted strategy fails, next proxy still follows weights 4. **Geo-targeting bonus in failover** - `RetryExecutor.select_retry_proxy()` gives 10% bonus to matching regions ## Advanced Patterns ### Adaptive Retry Policy Adjust retry policy based on conditions: ```python def get_adaptive_policy(time_sensitive: bool) -> RetryPolicy: """Adjust retry policy based on request priority.""" if time_sensitive: # Fast retries for real-time requests return RetryPolicy( max_attempts=2, backoff_strategy=BackoffStrategy.FIXED, base_delay=0.5, ) else: # Patient retries for batch jobs return RetryPolicy( max_attempts=10, backoff_strategy=BackoffStrategy.EXPONENTIAL, base_delay=2.0, max_backoff_delay=60.0, jitter=True, ) # Real-time request rotator.retry_policy = get_adaptive_policy(time_sensitive=True) response = rotator.request("GET", url) # Batch request rotator.retry_policy = get_adaptive_policy(time_sensitive=False) response = rotator.request("GET", url) ``` ### Circuit Breaker Alerts :::{note} Circuit breaker state changes also trigger cache health invalidation when configured. See {doc}`caching` for cache-level health integration. ::: Monitor circuit breaker events for alerts: ```python from datetime import datetime, timezone from proxywhirl.circuit_breaker import CircuitBreakerState def check_circuit_health(rotator: ProxyWhirl) -> None: """Alert on recent circuit breaker opens.""" now = datetime.now(timezone.utc) for event in rotator.retry_metrics.circuit_breaker_events: if event.to_state == CircuitBreakerState.OPEN: age = (now - event.timestamp).total_seconds() if age < 300: # Last 5 minutes print(f"ALERT: Proxy {event.proxy_id} circuit opened") print(f" Failures: {event.failure_count}") print(f" Time: {event.timestamp}") # Run periodically check_circuit_health(rotator) ``` ### Request-Level Retry Override Override retry policy for specific requests: ```python from proxywhirl import ProxyWhirl, RetryPolicy, BackoffStrategy # Default policy rotator = ProxyWhirl( retry_policy=RetryPolicy(max_attempts=3) ) # Override for critical request critical_policy = RetryPolicy( max_attempts=10, backoff_strategy=BackoffStrategy.EXPONENTIAL, base_delay=2.0, jitter=True, ) # Note: Currently requires creating new executor # This pattern may be simplified in future versions from proxywhirl.retry import RetryExecutor executor = RetryExecutor( critical_policy, rotator.circuit_breakers, rotator.retry_metrics, ) # Use custom executor for this request # (Integration with rotator.request() coming in future release) ``` ## Best Practices ### 1. Start Conservative, Tune Later Begin with safe defaults and adjust based on metrics: ```python # Start here policy = RetryPolicy( max_attempts=3, backoff_strategy=BackoffStrategy.EXPONENTIAL, jitter=True, ) # After observing metrics, tune: # - Increase max_attempts if success_by_attempt shows retries working # - Adjust timeout based on avg_latency in metrics # - Enable retry_non_idempotent only if API is truly idempotent ``` ### 2. Always Use Jitter in Production Jitter prevents thundering herd problems: ```python # Production: Use jitter policy = RetryPolicy(jitter=True) # Testing: Disable for deterministic behavior policy = RetryPolicy(jitter=False) ``` ### 3. Monitor Circuit Breaker Opens Frequent circuit opens indicate systemic issues: ```python # Alert if >10% of proxies have open circuits open_circuits = sum( 1 for cb in rotator.circuit_breakers.values() if cb.state == CircuitBreakerState.OPEN ) total_proxies = len(rotator.pool.proxies) if total_proxies > 0 and open_circuits / total_proxies > 0.1: print(f"WARNING: {open_circuits}/{total_proxies} circuits open") ``` ### 4. Set Timeouts for Time-Sensitive Requests Prevent unbounded retry delays: ```python # Time-sensitive request: Fail fast policy = RetryPolicy( max_attempts=3, timeout=5.0, # Total timeout ) # Batch job: Be patient policy = RetryPolicy( max_attempts=10, timeout=300.0, # 5 minutes total ) ``` ### 5. Use Metrics for Capacity Planning Track retry rates to identify proxy quality issues: ```python summary = rotator.retry_metrics.get_summary() success_first_attempt = summary["success_by_attempt"].get(0, 0) total_retries = summary["total_retries"] if total_retries > 0: first_attempt_rate = success_first_attempt / total_retries print(f"First attempt success: {first_attempt_rate:.1%}") if first_attempt_rate < 0.7: print("WARNING: Low first-attempt success rate") print("Consider: Higher quality proxy sources") ``` ## Troubleshooting ### All Retries Exhausted **Symptom:** `ProxyConnectionError: Request failed after N attempts` **Causes:** 1. All proxies have poor connectivity 2. Target website is blocking proxy IPs 3. Circuit breakers opened for all proxies **Solutions:** ```python # Check circuit breaker status open_count = sum( 1 for cb in rotator.circuit_breakers.values() if cb.state == CircuitBreakerState.OPEN ) print(f"{open_count} circuits open") # Reset circuits if blocking legitimate traffic for cb in rotator.circuit_breakers.values(): cb.reset() # Check per-proxy statistics by_proxy = rotator.retry_metrics.get_by_proxy(hours=1) for proxy_id, stats in by_proxy.items(): if stats["success_count"] == 0: print(f"Dead proxy: {proxy_id}") ``` ### High Latency **Symptom:** Requests take a long time to complete **Causes:** 1. Backoff delays too aggressive 2. Many retries before success 3. Slow proxies selected **Solutions:** ```python # Check latency by proxy by_proxy = rotator.retry_metrics.get_by_proxy(hours=1) slow_proxies = [ pid for pid, stats in by_proxy.items() if stats["avg_latency"] > 2.0 ] print(f"Slow proxies (>2s): {slow_proxies}") # Use faster backoff policy = RetryPolicy( backoff_strategy=BackoffStrategy.FIXED, base_delay=0.5, # Shorter delays ) # Set total timeout policy.timeout = 10.0 ``` ### Non-Retryable Errors **Symptom:** `NonRetryableError` raised immediately **Causes:** 1. Custom error types not recognized as retryable 2. Authentication failures (407) **Solutions:** ```python # Authentication errors are not retryable # Fix proxy credentials instead of retrying # For custom error types, extend RetryExecutor._is_retryable_error() # (Advanced - requires subclassing) ``` ## Performance Impact Retry logic adds minimal overhead: - **Circuit breaker check:** O(1) dictionary lookup - **Proxy scoring:** O(n) where n = number of candidate proxies - **Metrics recording:** O(1) append to bounded deque - **Backoff calculation:** O(1) arithmetic Benchmark results (100k requests): - No retry: 100ms avg latency - With retry (3 attempts): 105ms avg latency (+5%) - With retry + metrics: 110ms avg latency (+10%) ## See Also ::::{grid} 2 :gutter: 3 :::{grid-item-card} Advanced Strategies :link: /guides/advanced-strategies :link-type: doc Geo-targeting, performance-based selection, session persistence, and composite strategies that integrate with retry logic. ::: :::{grid-item-card} Async Client Guide :link: /guides/async-client :link-type: doc Using `AsyncCircuitBreaker` and retry policies with the async proxy rotator. ::: :::{grid-item-card} Caching Subsystem :link: /guides/caching :link-type: doc Health-based cache invalidation that integrates with circuit breaker events. ::: :::{grid-item-card} Python API Reference :link: /reference/python-api :link-type: doc Complete API docs for `RetryPolicy`, `RetryExecutor`, `CircuitBreaker`, and `RetryMetrics`. ::: :::{grid-item-card} Exceptions Reference :link: /reference/exceptions :link-type: doc Full exception hierarchy including `RetryableError`, `NonRetryableError`, and `ProxyConnectionError`. ::: :::{grid-item-card} CLI Reference :link: /guides/cli-reference :link-type: doc Monitor retry statistics and circuit breaker states from the command line. ::: :::: ## Summary ProxyWhirl's retry and failover system provides: - **Flexible retry policies** with exponential, linear, or fixed backoff - **Circuit breaker protection** to isolate failing proxies - **Intelligent failover** with performance-based proxy selection - **Comprehensive metrics** for monitoring and debugging - **Automatic integration** with ProxyWhirl Start with defaults and tune based on metrics. Enable jitter in production, set timeouts for time-sensitive requests, and monitor circuit breaker opens for systemic issues.