# Cache API Reference Complete reference for ProxyWhirl's multi-tier caching system with L1 (memory), L2 (disk), and L3 (SQLite) support. :::{seealso} For cache configuration patterns and optimization tips, see [Caching](../guides/caching.md). For TOML cache configuration options, see [Configuration](configuration.md). ::: ```python from proxywhirl.cache import ( CacheManager, CacheConfig, CacheEntry, HealthStatus, CacheTierType, MemoryCacheTier, DiskCacheTier, SQLiteCacheTier, CredentialEncryptor, ) # L2BackendType and JsonlCacheTier are available via submodule imports: from proxywhirl.cache.models import L2BackendType from proxywhirl.cache.tiers import JsonlCacheTier ``` ## Overview The cache subsystem provides three-tier storage for proxies with automatic promotion, credential encryption, TTL management, and health-based invalidation. It supports graceful degradation when tiers fail and provides comprehensive statistics for monitoring. **Architecture:** - **L1 (Memory)**: Fast in-memory cache using OrderedDict with LRU eviction - **L2 (Disk)**: Configurable persistent cache with two backend options: - **JSONL** (default): File-based using sharded JSON Lines files, human-readable, portable, best for <10K entries - **SQLite**: Database-based with indexed lookups, faster for >10K entries with O(log n) performance - **L3 (SQLite)**: Full database cache with SQL indexing, health history tracking, and complete queryability ## Data Models ### CacheEntry Container for a single cached proxy with metadata, TTL, and health tracking. ```{eval-rst} .. py:class:: CacheEntry Pydantic model that stores proxy information with TTL, health status, and access tracking. Credentials are SecretStr in memory, encrypted at rest in L2/L3. **Example:** .. code-block:: python from proxywhirl.cache import CacheEntry, HealthStatus from datetime import datetime, timezone, timedelta from pydantic import SecretStr entry = CacheEntry( key="abc123", proxy_url="http://proxy.example.com:8080", username=SecretStr("user"), password=SecretStr("pass"), source="api", fetch_time=datetime.now(timezone.utc), last_accessed=datetime.now(timezone.utc), ttl_seconds=3600, expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600), health_status=HealthStatus.HEALTHY ) # Check expiration if entry.is_expired: print("Entry has expired") # Check health if entry.is_healthy: print("Proxy is healthy") ``` #### Fields **Identity:** - `key` (str): Unique cache key (proxy URL hash) - `proxy_url` (str): Full proxy URL (scheme://host:port) **Credentials (encrypted at rest in L2/L3):** - `username` (SecretStr | None): Proxy username - `password` (SecretStr | None): Proxy password **Metadata:** - `source` (str): Proxy source identifier - `fetch_time` (datetime): When proxy was fetched - `last_accessed` (datetime): Last cache access time - `access_count` (int): Number of cache hits (default: 0) **TTL & Health:** - `ttl_seconds` (int): Time-to-live in seconds (≥0) - `expires_at` (datetime): Absolute expiration time - `health_status` (HealthStatus): Current health status (default: UNKNOWN) - `failure_count` (int): Consecutive failures (≥0, default: 0) - `evicted_from_l1` (bool): Whether entry was evicted from L1 cache (default: False) **Health Monitoring (Feature 006):** - `last_health_check` (datetime | None): Last health check timestamp - `consecutive_health_failures` (int): Consecutive health check failures (≥0, default: 0) - `consecutive_health_successes` (int): Consecutive successful health checks (≥0, default: 0) - `recovery_attempt` (int): Current recovery attempt count (≥0, default: 0) - `next_check_time` (datetime | None): Scheduled next health check - `last_health_error` (str | None): Last health check error message - `total_health_checks` (int): Total health checks performed (≥0, default: 0) - `total_health_check_failures` (int): Total health check failures (≥0, default: 0) #### Properties ```{eval-rst} .. py:property:: is_expired :type: bool Check if entry has expired based on TTL. :returns: True if current time ≥ expires_at, False otherwise ``` ```{eval-rst} .. py:property:: is_healthy :type: bool Check if proxy is healthy enough to use. :returns: True if health_status == HEALTHY, False otherwise ``` --- ### CacheConfig Configuration for cache behavior and tier settings. ```{eval-rst} .. py:class:: CacheConfig Pydantic model that aggregates configuration for all three tiers plus global settings like TTL, cleanup intervals, and storage paths. **Example:** .. code-block:: python from proxywhirl.cache import CacheConfig, CacheTierConfig, L2BackendType from pydantic import SecretStr # Default JSONL backend (file-based, portable) config = CacheConfig( # Tier configurations l1_config=CacheTierConfig( enabled=True, max_entries=1000, eviction_policy="lru" ), l2_config=CacheTierConfig( enabled=True, max_entries=5000, eviction_policy="lru" ), l2_backend=L2BackendType.JSONL, # or L2BackendType.SQLITE for large caches l3_config=CacheTierConfig( enabled=True, max_entries=None, # Unlimited eviction_policy="lru" ), # TTL Configuration default_ttl_seconds=3600, ttl_cleanup_interval=60, enable_background_cleanup=True, cleanup_interval_seconds=60, per_source_ttl={ "api": 7200, # API sources: 2 hours "scraper": 1800 # Scrapers: 30 minutes }, # Storage Paths l2_cache_dir=".cache/proxies", l3_database_path=".cache/db/proxywhirl.db", # Encryption encryption_key=SecretStr("your-32-byte-url-safe-base64-key"), # Health Integration health_check_invalidation=True, failure_threshold=3, # Performance Tuning enable_statistics=True, statistics_interval=5 ) # SQLite backend for large caches (>10K entries) large_cache_config = CacheConfig( l2_backend=L2BackendType.SQLITE, l2_config=CacheTierConfig(max_entries=50000) ) ``` #### Fields **Tier Configuration:** - `l1_config` (CacheTierConfig): L1 (Memory) configuration (default: max_entries=1000) - `l2_config` (CacheTierConfig): L2 (Disk) configuration (default: max_entries=5000) - `l2_backend` (L2BackendType): L2 storage backend - "jsonl" or "sqlite" (default: JSONL) - `l3_config` (CacheTierConfig): L3 (SQLite) configuration (default: max_entries=None) **TTL Configuration:** - `default_ttl_seconds` (int): Default TTL for cached proxies (≥60, default: 3600) - `ttl_cleanup_interval` (int): Background cleanup interval (≥10, default: 60) - `enable_background_cleanup` (bool): Enable background TTL cleanup thread (default: False) - `cleanup_interval_seconds` (int): Interval between cleanup runs (≥5, default: 60) - `per_source_ttl` (dict[str, int]): Per-source TTL overrides (default: empty dict) **Storage Paths:** - `l2_cache_dir` (str): Directory for L2 cache (JSONL shards or SQLite database) (default: ".cache/proxies") - `l3_database_path` (str): SQLite database path for L3 (default: ".cache/db/proxywhirl.db") **Encryption:** - `encryption_key` (SecretStr | None): Fernet encryption key (from env: PROXYWHIRL_CACHE_ENCRYPTION_KEY) **Health Integration:** - `health_check_invalidation` (bool): Auto-invalidate on health check failure (default: True) - `failure_threshold` (int): Failures before health invalidation (≥1, default: 3) **Performance Tuning:** - `enable_statistics` (bool): Track cache statistics (default: True) - `statistics_interval` (int): Stats aggregation interval (≥1, default: 5) --- ### CacheTierConfig Configuration for a single cache tier. ```{eval-rst} .. py:class:: CacheTierConfig Pydantic model that defines capacity, eviction policy, and enable/disable state for one tier (L1, L2, or L3). **Example:** .. code-block:: python from proxywhirl.cache import CacheTierConfig config = CacheTierConfig( enabled=True, max_entries=1000, eviction_policy="lru" # "lru", "lfu", or "fifo" ) ``` #### Fields - `enabled` (bool): Enable this tier (default: True) - `max_entries` (int | None): Max entries (None=unlimited, default: None) - `eviction_policy` (str): Eviction policy: "lru", "lfu", or "fifo" (default: "lru") #### Validators ```{eval-rst} .. py:method:: validate_policy(v: str) -> str :classmethod: Validate eviction policy is supported. :param v: Policy name to validate :raises ValueError: If policy is not one of ["lru", "lfu", "fifo"] :returns: Validated policy name ``` --- ### CacheStatistics Aggregate cache statistics across all tiers. ```{eval-rst} .. py:class:: CacheStatistics Pydantic model that combines tier-level statistics and tracks cross-tier operations like promotions and demotions. **Example:** .. code-block:: python from proxywhirl.cache import CacheStatistics stats = CacheStatistics() stats.l1_stats.hits = 100 stats.l1_stats.misses = 20 print(f"L1 hit rate: {stats.l1_stats.hit_rate:.2%}") print(f"Overall hit rate: {stats.overall_hit_rate:.2%}") print(f"Total size: {stats.total_size}") # Export to monitoring metrics = stats.to_metrics_dict() ``` #### Fields **Per-Tier Statistics:** - `l1_stats` (TierStatistics): L1 statistics (default: empty TierStatistics) - `l2_stats` (TierStatistics): L2 statistics (default: empty TierStatistics) - `l3_stats` (TierStatistics): L3 statistics (default: empty TierStatistics) **Cross-Tier Operations:** - `promotions` (int): L3→L2→L1 promotions (≥0, default: 0) - `demotions` (int): L1→L2→L3 demotions (≥0, default: 0) **Degradation Tracking:** - `l1_degraded` (bool): L1 tier unavailable (default: False) - `l2_degraded` (bool): L2 tier unavailable (default: False) - `l3_degraded` (bool): L3 tier unavailable (default: False) #### Computed Properties ```{eval-rst} .. py:property:: overall_hit_rate :type: float Overall hit rate across all tiers (0.0 to 1.0). Uses max of per-tier misses to avoid triple-counting misses that cascade through L1→L2→L3 lookups. ``` ```{eval-rst} .. py:property:: total_size :type: int Total cached entries across all tiers. ``` #### Methods ```{eval-rst} .. py:method:: to_metrics_dict() -> dict[str, float] Convert to flat metrics dict for monitoring systems. :returns: Dictionary with metric names and float values **Example:** .. code-block:: python metrics = stats.to_metrics_dict() # { # "cache.l1.hit_rate": 0.85, # "cache.l2.hit_rate": 0.60, # "cache.l3.hit_rate": 0.40, # "cache.overall.hit_rate": 0.75, # "cache.total_size": 1500.0, # "cache.promotions": 250.0, # "cache.demotions": 150.0, # "cache.l1.size": 1000.0, # "cache.l2.size": 450.0, # "cache.l3.size": 50.0 # } ``` --- ### TierStatistics Statistics for a single cache tier. ```{eval-rst} .. py:class:: TierStatistics Pydantic model that tracks hits, misses, evictions by reason, and computes hit rate. **Example:** .. code-block:: python from proxywhirl.cache import TierStatistics stats = TierStatistics(hits=100, misses=20) print(f"Hit rate: {stats.hit_rate:.2%}") # 83.33% print(f"Total evictions: {stats.total_evictions}") ``` #### Fields - `hits` (int): Cache hits (≥0, default: 0) - `misses` (int): Cache misses (≥0, default: 0) - `current_size` (int): Current number of entries (≥0, default: 0) - `evictions_lru` (int): LRU evictions (≥0, default: 0) - `evictions_ttl` (int): TTL-based evictions (≥0, default: 0) - `evictions_health` (int): Health-based evictions (≥0, default: 0) - `evictions_corruption` (int): Corruption-based evictions (≥0, default: 0) #### Computed Properties ```{eval-rst} .. py:property:: hit_rate :type: float Cache hit rate (0.0 to 1.0). :formula: hits / (hits + misses) if total > 0, else 0.0 ``` ```{eval-rst} .. py:property:: total_evictions :type: int Total evictions across all reasons. :formula: evictions_lru + evictions_ttl + evictions_health + evictions_corruption ``` --- ### HealthStatus (Enum) Proxy health status for cache entries (imported from `proxywhirl.models`). ```{eval-rst} .. py:class:: HealthStatus String enum representing proxy health status with 5 states. **Values:** - ``UNKNOWN = "unknown"`` - Not yet tested (default) - ``HEALTHY = "healthy"`` - Working normally - ``DEGRADED = "degraded"`` - Partial functionality (some failures) - ``UNHEALTHY = "unhealthy"`` - Experiencing issues (many failures) - ``DEAD = "dead"`` - Not responding (completely unusable) **Example:** .. code-block:: python from proxywhirl.cache import HealthStatus status = HealthStatus.HEALTHY print(status.value) # "healthy" # All 5 states are available for state in HealthStatus: print(f"{state.name}: {state.value}") ``` --- ### CacheTierType (Enum) Type of cache tier. ```{eval-rst} .. py:class:: CacheTierType String enum representing cache tier types. **Values:** - ``L1 = "l1"`` - Memory tier - ``L2 = "l2"`` - Disk tier - ``L3 = "l3"`` - SQLite tier **Example:** .. code-block:: python from proxywhirl.cache import CacheTierType tier = CacheTierType.L1 print(tier.value) # "l1" ``` --- ### L2BackendType (Enum) L2 cache backend type selection. ```{eval-rst} .. py:class:: L2BackendType String enum for selecting the L2 disk cache storage backend. **Values:** - ``JSONL = "jsonl"`` - File-based JSONL with sharding (default, best for <10K entries) - ``SQLITE = "sqlite"`` - SQLite database (faster for >10K entries) **Example:** .. code-block:: python from proxywhirl.cache import CacheConfig, L2BackendType # Default JSONL backend config = CacheConfig() assert config.l2_backend == L2BackendType.JSONL # SQLite backend for large caches config = CacheConfig(l2_backend=L2BackendType.SQLITE) ``` **When to use each backend:** | Backend | Best For | Performance | Features | |---------|----------|-------------|----------| | JSONL | <10K entries | O(n) lookups | Human-readable, portable, simple debugging | | SQLite | >10K entries | O(log n) lookups | Indexed queries, faster batch operations | --- ## Tier Implementations ### CacheTier (Abstract Base Class) Abstract base class for cache tier implementations. ```{eval-rst} .. py:class:: CacheTier Defines the interface that all cache tiers (L1, L2, L3) must implement, including graceful degradation on repeated failures. **Attributes:** - ``config`` (CacheTierConfig) - Configuration for this tier - ``tier_type`` (TierType) - Type of tier (L1/L2/L3) - ``enabled`` (bool) - Whether tier is operational - ``failure_count`` (int) - Consecutive failures for degradation tracking - ``failure_threshold`` (int) - Failures before auto-disabling tier (default: 3) ``` #### Constructor ```{eval-rst} .. py:method:: __init__(config: CacheTierConfig, tier_type: TierType) -> None Initialize cache tier with configuration. :param config: Configuration for this tier :param tier_type: Type of tier (L1/L2/L3) ``` #### Abstract Methods ```{eval-rst} .. py:method:: get(key: str) -> Optional[CacheEntry] :abstractmethod: Retrieve entry by key, None if not found or expired. :param key: Cache key to lookup :returns: CacheEntry if found and valid, None otherwise ``` ```{eval-rst} .. py:method:: put(key: str, entry: CacheEntry) -> bool :abstractmethod: Store entry, return True if successful. :param key: Cache key for entry :param entry: CacheEntry to store :returns: True if stored successfully, False otherwise ``` ```{eval-rst} .. py:method:: delete(key: str) -> bool :abstractmethod: Remove entry by key, return True if existed. :param key: Cache key to delete :returns: True if entry existed and was deleted, False if not found ``` ```{eval-rst} .. py:method:: clear() -> int :abstractmethod: Clear all entries, return count of removed entries. :returns: Number of entries removed ``` ```{eval-rst} .. py:method:: size() -> int :abstractmethod: Return current number of entries. :returns: Number of entries in tier ``` ```{eval-rst} .. py:method:: keys() -> list[str] :abstractmethod: Return list of all keys. :returns: List of cache keys ``` ```{eval-rst} .. py:method:: cleanup_expired() -> int :abstractmethod: Remove all expired entries in bulk. :returns: Number of entries removed ``` #### Concrete Methods ```{eval-rst} .. py:method:: handle_failure(error: Exception) -> None Handle tier operation failure for graceful degradation. Increments failure count and disables tier if threshold exceeded. Called by implementations when operations fail. :param error: Exception that occurred ``` ```{eval-rst} .. py:method:: reset_failures() -> None Reset failure count on successful operation. Re-enables tier if previously disabled and resets failure counter. Implementations should call this after successful operations. ``` --- ### MemoryCacheTier L1 in-memory cache using OrderedDict for LRU tracking. ```{eval-rst} .. py:class:: MemoryCacheTier(CacheTier) Provides O(1) lookups with automatic LRU eviction when max_entries exceeded. **Example:** .. code-block:: python from proxywhirl.cache.tiers import MemoryCacheTier, TierType from proxywhirl.cache import CacheTierConfig config = CacheTierConfig(max_entries=1000, eviction_policy="lru") tier = MemoryCacheTier(config, TierType.L1_MEMORY) # Store entry tier.put(key, entry) # Retrieve entry (moves to end for LRU) cached = tier.get(key) # Delete entry deleted = tier.delete(key) # Get all keys keys = tier.keys() # Get size size = tier.size() # Clear all cleared = tier.clear() # Cleanup expired removed = tier.cleanup_expired() ``` #### Constructor ```{eval-rst} .. py:method:: __init__(config: CacheTierConfig, tier_type: TierType, on_evict: Optional[Callable[[str, CacheEntry], None]] = None) -> None :noindex: Initialize memory cache with LRU tracking. :param config: Tier configuration :param tier_type: Type of tier (L1/L2/L3) :param on_evict: Optional callback when entry is evicted (key, entry) ``` #### Features - O(1) lookups - Automatic LRU eviction when max_entries exceeded - Thread-safe with failure tracking - No persistence - Callbacks on eviction for demotion to L2 --- ### JsonlCacheTier L2 file-based cache using sharded JSONL files with encryption. ```{eval-rst} .. py:class:: JsonlCacheTier(CacheTier) File-based cache tier using JSON Lines format with consistent-hash sharding. Best for <10K entries. Human-readable, portable, and git-friendly. Uses sharded JSONL files with: - Consistent hash sharding (default 16 shards) - In-memory index for O(1) key→shard lookups - File locking (portalocker) for concurrent access safety - Fernet encryption for credentials at rest - Human-readable JSON Lines format **Example:** .. code-block:: python from proxywhirl.cache.tiers import JsonlCacheTier, TierType from proxywhirl.cache import CacheTierConfig, CredentialEncryptor from pathlib import Path config = CacheTierConfig(max_entries=5000, eviction_policy="lru") encryptor = CredentialEncryptor() cache_dir = Path(".cache/proxies") tier = JsonlCacheTier( config=config, tier_type=TierType.L2_FILE, cache_dir=cache_dir, encryptor=encryptor, num_shards=16 # Default ) # Store entry (writes to appropriate shard file) tier.put(key, entry) # Retrieve entry (uses in-memory index for O(1) shard lookup) cached = tier.get(key) # Delete entry deleted = tier.delete(key) # Get all keys (from index) keys = tier.keys() # Get size size = tier.size() # Clear all (removes all shard files) cleared = tier.clear() # Cleanup expired entries removed = tier.cleanup_expired() ``` #### Constructor ```{eval-rst} .. py:method:: __init__(config: CacheTierConfig, tier_type: TierType, cache_dir: Path, encryptor: Optional[CredentialEncryptor] = None, num_shards: int = 16) -> None :noindex: Initialize JSONL file cache with sharding and encryption. :param config: Tier configuration :param tier_type: Type of tier (L1/L2/L3) :param cache_dir: Directory for shard files :param encryptor: Optional encryptor for credentials :param num_shards: Number of shard files (default: 16) ``` #### File Structure ```text .cache/proxies/ ├── shard_00.jsonl ├── shard_01.jsonl ├── ... └── shard_15.jsonl ``` Each shard file contains JSON Lines entries: ```text {"key": "abc123", "proxy_url": "http://proxy:8080", "source": "free-proxy-list", "ttl_seconds": 3600, ...} {"key": "def456", "proxy_url": "socks5://proxy:1080", "source": "geonode", "ttl_seconds": 7200, ...} ``` #### Features - Human-readable JSON Lines format - Portable (can copy/move files) - Git-friendly for version control - Consistent-hash sharding for distribution - In-memory index for fast lookups - File locking for concurrent access - Encrypted credentials at rest - Best for <10K entries #### When to Use JSONL vs SQLite | Factor | JSONL (JsonlCacheTier) | SQLite (DiskCacheTier) | |--------|------------------------|------------------------| | Entry count | <10K entries | >10K entries | | Lookup speed | O(n) per shard | O(log n) indexed | | Portability | Copy files anywhere | Single .db file | | Git-friendly | Yes | Not recommended | | Human-readable | Yes | No (binary) | | Concurrent writes | File locking | WAL mode | --- ### DiskCacheTier L2 SQLite-based cache with encryption and indexed lookups. ```{eval-rst} .. py:class:: DiskCacheTier(CacheTier) Optimized for >10K entries using SQLite with B-tree indexes instead of JSONL. Provides O(log n) lookups vs O(n) for JSONL, achieving <10ms reads for 10K+ entries. Uses a lightweight SQLite database with: - Primary key index on cache key for fast lookups - Encrypted credentials stored as BLOB - Efficient bulk operations (cleanup, size, keys) - File-based persistence without complex sharding **Example:** .. code-block:: python from proxywhirl.cache.tiers import DiskCacheTier, TierType from proxywhirl.cache import CacheTierConfig, CredentialEncryptor from pathlib import Path config = CacheTierConfig(max_entries=5000, eviction_policy="lru") encryptor = CredentialEncryptor() cache_dir = Path(".cache/proxies") tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor) # Same interface as MemoryCacheTier tier.put(key, entry) cached = tier.get(key) ``` #### Constructor ```{eval-rst} .. py:method:: __init__(config: CacheTierConfig, tier_type: TierType, cache_dir: Path, encryptor: Optional[CredentialEncryptor] = None) -> None :noindex: Initialize SQLite-based L2 cache. :param config: Tier configuration :param tier_type: Type of tier (should be L2_FILE) :param cache_dir: Directory for cache database :param encryptor: Credential encryptor for username/password ``` #### Methods ```{eval-rst} .. py:method:: migrate_from_jsonl(jsonl_dir: Optional[Path] = None) -> int Migrate existing JSONL shard files to SQLite L2 cache. This method provides a migration path from the old JSONL-based L2 cache to the new SQLite-based implementation. It reads all shard_*.jsonl files from the specified directory and imports them into the SQLite database. :param jsonl_dir: Directory containing shard_*.jsonl files (defaults to self.cache_dir) :returns: Number of entries successfully migrated **Example:** .. code-block:: python tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir) migrated = tier.migrate_from_jsonl() print(f"Migrated {migrated} entries from JSONL to SQLite") ``` ```{eval-rst} .. py:method:: close() -> None Close the persistent SQLite connection and release database resources. Should be called when the cache tier is no longer needed to properly release database resources and file locks. Safe to call multiple times. Thread-safe via internal lock. **Example:** .. code-block:: python tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor) try: tier.put(key, entry) cached = tier.get(key) finally: tier.close() ``` #### Features - O(log n) indexed lookups using SQLite B-tree - Encrypted credential storage (BLOB fields) - Atomic operations with SQLite transactions - Efficient bulk cleanup using SQL DELETE - Simple file-based persistence (single .db file) - Automatic schema initialization #### Database Schema ```sql CREATE TABLE l2_cache ( key TEXT PRIMARY KEY, proxy_url TEXT NOT NULL, username_encrypted BLOB, password_encrypted BLOB, source TEXT NOT NULL, fetch_time REAL NOT NULL, last_accessed REAL NOT NULL, access_count INTEGER DEFAULT 0, ttl_seconds INTEGER NOT NULL, expires_at REAL NOT NULL, health_status TEXT DEFAULT 'unknown', failure_count INTEGER DEFAULT 0, evicted_from_l1 INTEGER DEFAULT 0 ); CREATE INDEX idx_l2_expires_at ON l2_cache(expires_at); CREATE INDEX idx_l2_source ON l2_cache(source); ``` --- ### SQLiteCacheTier L3 SQLite database cache with encrypted credentials and health history. ```{eval-rst} .. py:class:: SQLiteCacheTier(CacheTier) Provides durable persistence with SQL indexing for fast lookups and comprehensive health history tracking. **Example:** .. code-block:: python from proxywhirl.cache.tiers import SQLiteCacheTier, TierType from proxywhirl.cache import CacheTierConfig, CredentialEncryptor from pathlib import Path config = CacheTierConfig(max_entries=None, eviction_policy="lru") # Unlimited encryptor = CredentialEncryptor() db_path = Path(".cache/db/proxywhirl.db") tier = SQLiteCacheTier(config, TierType.L3_SQLITE, db_path, encryptor) # Same interface as other tiers tier.put(key, entry) cached = tier.get(key) # Optimized bulk cleanup with SQL DELETE removed = tier.cleanup_expired() # O(1) instead of O(n) ``` #### Constructor ```{eval-rst} .. py:method:: __init__(config: CacheTierConfig, tier_type: TierType, db_path: Path, encryptor: Optional[CredentialEncryptor] = None) -> None :noindex: Initialize SQLite cache. :param config: Tier configuration :param tier_type: Type of tier (should be L3_SQLITE) :param db_path: Path to SQLite database file :param encryptor: Credential encryptor for username/password ``` #### Features - Full persistence - SQL indexing for fast lookups - Health history tracking with separate table - Automatic schema migration - Optimized bulk cleanup (O(1) using SQL DELETE) - Credential encryption with BLOB storage - Foreign key constraints #### Database Schema ```sql CREATE TABLE cache_entries ( key TEXT PRIMARY KEY, proxy_url TEXT NOT NULL, username_encrypted BLOB, password_encrypted BLOB, source TEXT NOT NULL, fetch_time REAL NOT NULL, last_accessed REAL NOT NULL, access_count INTEGER DEFAULT 0, ttl_seconds INTEGER NOT NULL, expires_at REAL NOT NULL, health_status TEXT DEFAULT 'unknown', failure_count INTEGER DEFAULT 0, created_at REAL NOT NULL, updated_at REAL NOT NULL, -- Health monitoring fields last_health_check REAL, consecutive_health_failures INTEGER DEFAULT 0, consecutive_health_successes INTEGER DEFAULT 0, recovery_attempt INTEGER DEFAULT 0, next_check_time REAL, last_health_error TEXT, total_health_checks INTEGER DEFAULT 0, total_health_check_failures INTEGER DEFAULT 0, evicted_from_l1 INTEGER DEFAULT 0 ); CREATE TABLE health_history ( id INTEGER PRIMARY KEY AUTOINCREMENT, proxy_key TEXT NOT NULL, check_time REAL NOT NULL, status TEXT NOT NULL, response_time_ms REAL, error_message TEXT, check_url TEXT NOT NULL, FOREIGN KEY (proxy_key) REFERENCES cache_entries(key) ON DELETE CASCADE ); -- Indexes CREATE INDEX idx_expires_at ON cache_entries(expires_at); CREATE INDEX idx_source ON cache_entries(source); CREATE INDEX idx_health_status ON cache_entries(health_status); CREATE INDEX idx_last_accessed ON cache_entries(last_accessed); CREATE INDEX idx_health_history_proxy ON health_history(proxy_key); CREATE INDEX idx_health_history_time ON health_history(check_time); ``` --- ## Utilities ### CredentialEncryptor :::{warning} If no encryption key is provided and the `PROXYWHIRL_CACHE_ENCRYPTION_KEY` environment variable is not set, a new key is generated automatically. This means cached data encrypted with a previous key will be unreadable. Always persist your encryption key for production use. ::: Handles encryption/decryption of proxy credentials using Fernet symmetric encryption (AES-128-CBC + HMAC). Supports **key rotation** via `MultiFernet`: set `PROXYWHIRL_CACHE_KEY_PREVIOUS` to the old key when rotating, allowing decryption of data encrypted with either key while new encryptions use the current key. ```{eval-rst} .. py:class:: CredentialEncryptor Provides Fernet symmetric encryption for proxy credentials at rest (L2/L3 tiers). Uses environment variable PROXYWHIRL_CACHE_ENCRYPTION_KEY for key management. **Example:** .. code-block:: python from proxywhirl.cache import CredentialEncryptor from pydantic import SecretStr import os # Option 1: Use environment variable os.environ["PROXYWHIRL_CACHE_ENCRYPTION_KEY"] = "your-32-byte-url-safe-base64-key" encryptor = CredentialEncryptor() # Option 2: Provide key directly from cryptography.fernet import Fernet key = Fernet.generate_key() encryptor = CredentialEncryptor(key=key) # Encrypt credentials plaintext = SecretStr("mypassword") encrypted = encryptor.encrypt(plaintext) # bytes # Decrypt credentials decrypted = encryptor.decrypt(encrypted) # SecretStr print(decrypted.get_secret_value()) # "mypassword" ``` #### Constructor ```{eval-rst} .. py:method:: __init__(key: Optional[bytes] = None) -> None :noindex: Initialize encryptor with Fernet key. :param key: Optional Fernet key (32 url-safe base64-encoded bytes). If None, reads from PROXYWHIRL_CACHE_ENCRYPTION_KEY env var. If env var not set, generates a new key (WARNING: regenerated keys cannot decrypt existing cached data). :raises ValueError: If provided key is invalid for Fernet **Attributes:** - ``key`` (bytes) - Fernet encryption key - ``_cipher`` (Fernet) - Fernet cipher instance ``` #### Methods ```{eval-rst} .. py:method:: encrypt(secret: SecretStr) -> bytes Encrypt a SecretStr to bytes. :param secret: SecretStr containing plaintext to encrypt :returns: Encrypted bytes suitable for storage in BLOB fields :raises ValueError: If encryption fails **Example:** .. code-block:: python encrypted = encryptor.encrypt(SecretStr("password123")) # b'gAAAAA...' ``` ```{eval-rst} .. py:method:: decrypt(encrypted: bytes) -> SecretStr Decrypt encrypted bytes back to SecretStr. :param encrypted: Encrypted bytes from storage :returns: SecretStr containing decrypted plaintext (never logs value) :raises ValueError: If decryption fails (wrong key, corrupted data) **Example:** .. code-block:: python decrypted = encryptor.decrypt(encrypted_bytes) print(decrypted.get_secret_value()) # "password123" ``` --- ### CacheManager Main orchestrator for multi-tier proxy caching with automatic promotion/demotion, TTL management, and health-based invalidation. ```{eval-rst} .. py:class:: CacheManager Manages caching across three tiers: - **L1 (Memory)**: Fast in-memory cache using OrderedDict (LRU) - **L2 (Disk)**: Persistent cache with configurable backend (JSONL or SQLite) - **L3 (SQLite)**: Database cache for cold storage with full queryability Supports TTL-based expiration, health-based invalidation, and graceful degradation when tiers fail. Thread-safe via ``threading.RLock``. **Example:** .. code-block:: python from proxywhirl.cache import CacheManager, CacheConfig, CacheEntry, HealthStatus from datetime import datetime, timezone, timedelta config = CacheConfig() manager = CacheManager(config) # Store an entry entry = CacheEntry( key="abc123", proxy_url="http://proxy.example.com:8080", source="api", fetch_time=datetime.now(timezone.utc), last_accessed=datetime.now(timezone.utc), ttl_seconds=3600, expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600), health_status=HealthStatus.HEALTHY ) manager.put(entry.key, entry) # Retrieve (promotes to higher tiers on hit) retrieved = manager.get(entry.key) # Delete from all tiers manager.delete(entry.key) # Statistics stats = manager.get_statistics() print(f"Overall hit rate: {stats.overall_hit_rate:.2%}") # Export/import manager.export_to_file("proxies.jsonl") manager.warm_from_file("proxies.jsonl", ttl_override=3600) ``` #### Constructor ```{eval-rst} .. py:method:: __init__(config: CacheConfig) -> None :noindex: Initialize cache manager with configuration. :param config: Cache configuration with tier settings (required) Initializes L1 (memory), L2 (disk), and L3 (SQLite) tiers based on config. Starts background TTL cleanup if ``enable_background_cleanup`` is True. ``` #### Methods ```{eval-rst} .. py:method:: get(key: str) -> CacheEntry | None :no-index: Retrieve entry from cache with tier promotion. Checks L1 → L2 → L3 in order. Promotes entries to higher tiers on hit. Updates ``access_count`` and ``last_accessed`` on successful retrieval. Expired entries are automatically deleted from all tiers. :param key: Cache key to retrieve :returns: CacheEntry if found and not expired, None otherwise .. py:method:: put(key: str, entry: CacheEntry) -> bool :no-index: Store entry in all enabled tiers. Writes to all tiers for redundancy. Credentials are automatically redacted in logs. :param key: Cache key :param entry: CacheEntry to store :returns: True if stored in at least one tier, False otherwise .. py:method:: delete(key: str) -> bool :no-index: Delete entry from all tiers. :param key: Cache key to delete :returns: True if deleted from at least one tier, False if not found .. py:method:: clear() -> int :no-index: Clear all entries from all tiers. :returns: Total number of entries cleared .. py:method:: invalidate_by_health(key: str) -> None Mark proxy as unhealthy and evict if failure threshold reached. Increments the ``failure_count`` and sets ``health_status`` to UNHEALTHY. If ``failure_count`` reaches the configured ``failure_threshold``, the proxy is removed from all cache tiers. :param key: Cache key to invalidate .. py:method:: get_statistics() -> CacheStatistics Get current cache statistics. :returns: CacheStatistics with hit rates, sizes, and tier degradation status .. py:method:: export_to_file(filepath: str) -> dict[str, int] Export all cache entries to a JSONL file. :param filepath: Path to export file :returns: Dict with ``exported`` and ``failed`` counts .. py:method:: warm_from_file(file_path: str, ttl_override: int | None = None) -> dict[str, int] Load proxies from a file to pre-populate the cache. Supports JSON (array), JSONL (newline-delimited), and CSV formats. Invalid entries are skipped with warnings logged. :param file_path: Path to file containing proxy data :param ttl_override: Optional TTL in seconds (overrides ``default_ttl_seconds``) :returns: Dict with ``loaded``, ``skipped``, and ``failed`` counts .. py:method:: generate_cache_key(proxy_url: str) -> str :staticmethod: Generate cache key from proxy URL using SHA256 hash. :param proxy_url: Proxy URL to hash :returns: Hex-encoded SHA256 hash (first 16 chars) ``` --- ### Crypto Utilities The `proxywhirl.cache.crypto` module provides helper functions for encryption key management and rotation. ```python from proxywhirl.cache.crypto import get_encryption_keys, create_multi_fernet, rotate_key ``` #### `get_encryption_keys() -> list[bytes]` Get all valid encryption keys for MultiFernet. Returns keys in priority order: current key first, then previous key. Reads from `PROXYWHIRL_CACHE_ENCRYPTION_KEY` and `PROXYWHIRL_CACHE_KEY_PREVIOUS` environment variables. Generates a new key if no env vars are set. #### `create_multi_fernet() -> MultiFernet` Create a `MultiFernet` instance with all valid encryption keys. MultiFernet tries keys in order for decryption (newest first). All new encryptions use the first (current) key. #### `rotate_key(new_key: str) -> None` Rotate encryption keys by setting a new current key. Moves the current `PROXYWHIRL_CACHE_ENCRYPTION_KEY` to `PROXYWHIRL_CACHE_KEY_PREVIOUS` and sets the new key as current. This allows gradual migration: new data uses the new key, old data can still be decrypted with the previous key. ```python from cryptography.fernet import Fernet from proxywhirl.cache.crypto import rotate_key # Generate new key and rotate new_key = Fernet.generate_key().decode() rotate_key(new_key) # Old data remains readable via PROXYWHIRL_CACHE_KEY_PREVIOUS ``` --- ### TTLManager Manages TTL-based expiration with hybrid lazy + background cleanup. Used internally by `CacheManager` when `enable_background_cleanup=True`. ```{eval-rst} .. py:class:: TTLManager Combines two cleanup strategies: - **Lazy expiration**: Check TTL on every ``get()`` operation - **Background cleanup**: Periodic scan of all tiers to remove expired entries **Example:** .. code-block:: python from proxywhirl.cache.manager import TTLManager, CacheManager from proxywhirl.cache import CacheConfig config = CacheConfig(enable_background_cleanup=False) manager = CacheManager(config) # Manually create and start TTL manager ttl_mgr = TTLManager(manager, cleanup_interval=60) ttl_mgr.start() # ... later ... ttl_mgr.stop() ``` #### Constructor ```{eval-rst} .. py:method:: __init__(cache_manager: CacheManager, cleanup_interval: int = 60) -> None :noindex: :param cache_manager: Parent CacheManager instance :param cleanup_interval: Seconds between cleanup runs (default: 60) ``` #### Methods ```{eval-rst} .. py:method:: start() -> None Start background cleanup thread. Idempotent. .. py:method:: stop() -> None Stop background cleanup thread. Safe to call if not running. ``` #### Attributes - `enabled` (bool): Whether background cleanup is running - `cleanup_interval` (int): Seconds between cleanup runs --- ## Usage Examples ### Working with Cache Tiers Directly ```python from proxywhirl.cache.tiers import MemoryCacheTier, DiskCacheTier, SQLiteCacheTier, TierType from proxywhirl.cache import CacheTierConfig, CacheEntry, CredentialEncryptor, HealthStatus from datetime import datetime, timezone, timedelta from pathlib import Path from pydantic import SecretStr # Initialize tiers config = CacheTierConfig(max_entries=1000, eviction_policy="lru") encryptor = CredentialEncryptor() l1 = MemoryCacheTier(config, TierType.L1_MEMORY) l2 = DiskCacheTier(config, TierType.L2_FILE, Path(".cache/l2"), encryptor) l3 = SQLiteCacheTier(config, TierType.L3_SQLITE, Path(".cache/l3.db"), encryptor) # Create entry entry = CacheEntry( key="proxy1", proxy_url="http://proxy.example.com:8080", username=SecretStr("user"), password=SecretStr("pass"), source="api", fetch_time=datetime.now(timezone.utc), last_accessed=datetime.now(timezone.utc), ttl_seconds=3600, expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600), health_status=HealthStatus.HEALTHY ) # Store in L1 l1.put(entry.key, entry) # Retrieve from L1 (O(1) lookup) cached = l1.get(entry.key) if cached: print(f"L1 hit: {cached.proxy_url}") # Store in L2 (persisted to disk) l2.put(entry.key, entry) # Retrieve from L2 (O(log n) SQLite lookup) cached = l2.get(entry.key) if cached: print(f"L2 hit: {cached.proxy_url}") # Store in L3 (full database persistence) l3.put(entry.key, entry) # Cleanup expired entries removed_l1 = l1.cleanup_expired() removed_l2 = l2.cleanup_expired() removed_l3 = l3.cleanup_expired() print(f"Removed: L1={removed_l1}, L2={removed_l2}, L3={removed_l3}") ``` --- ### Encryption and Security ```python from proxywhirl.cache import CredentialEncryptor from cryptography.fernet import Fernet from pydantic import SecretStr import os # Generate and save encryption key key = Fernet.generate_key() os.environ["PROXYWHIRL_CACHE_ENCRYPTION_KEY"] = key.decode() # Initialize encryptor encryptor = CredentialEncryptor() # Encrypt credentials username = SecretStr("admin") password = SecretStr("secret123") encrypted_user = encryptor.encrypt(username) encrypted_pass = encryptor.encrypt(password) print(f"Encrypted username: {encrypted_user.hex()}") print(f"Encrypted password: {encrypted_pass.hex()}") # Decrypt credentials decrypted_user = encryptor.decrypt(encrypted_user) decrypted_pass = encryptor.decrypt(encrypted_pass) print(f"Decrypted: {decrypted_user.get_secret_value()}") # "admin" # Password value never logged by SecretStr ``` --- :::{tip} If you have more than 10,000 cache entries, migrating from JSONL to SQLite L2 backend can significantly improve lookup performance (O(log n) vs O(n)). ::: ### Migration from JSONL to SQLite L2 ```python from proxywhirl.cache.tiers import DiskCacheTier, TierType from proxywhirl.cache import CacheTierConfig, CredentialEncryptor from pathlib import Path # Initialize new SQLite-based L2 tier config = CacheTierConfig(max_entries=5000) encryptor = CredentialEncryptor() cache_dir = Path(".cache/proxies") tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor) # Migrate from old JSONL shards migrated = tier.migrate_from_jsonl() print(f"Successfully migrated {migrated} entries from JSONL to SQLite") # Old JSONL files can now be safely removed # for shard in cache_dir.glob("shard_*.jsonl"): # shard.unlink() ``` --- ## Performance Considerations ### Tier Selection **L1 (Memory):** - Fastest (O(1) lookup) - Limited capacity (default: 1000 entries) - Use for hot proxies **L2 (Disk/SQLite):** - Medium speed (O(log n) indexed lookup) - Moderate capacity (default: 5000 entries) - Persistent across restarts - Use for warm proxies **L3 (SQLite):** - Slower (database overhead, but indexed) - Unlimited capacity - Full health history tracking - Use for cold storage and analytics ### Optimization Tips 1. **Tune tier sizes** based on workload 2. **Enable background cleanup** to avoid lazy cleanup overhead 3. **Use encryption** for sensitive credentials in L2/L3 4. **Monitor failure rates** for graceful degradation 5. **Leverage indexes** in L2/L3 for fast queries --- ## Thread Safety All tier implementations use internal locking for thread-safe operations. The `CacheTier` base class provides `handle_failure()` and `reset_failures()` methods for graceful degradation tracking. --- ## Error Handling Tiers implement graceful degradation: - After 3 consecutive failures, tier auto-disables (`enabled = False`) - Successful operations reset failure counter - Operations on disabled tiers return failure without attempting - Parent cache manager can detect degraded tiers via tier.enabled --- ## See Also - [Python API](python-api.md) -- Main ProxyWhirl API (CacheManager, CacheConfig usage) - [Configuration](configuration.md) -- TOML cache configuration options - [Exceptions](exceptions.md) -- Cache-specific exceptions (CacheCorruptionError, CacheStorageError, CacheValidationError) - [Rate Limiting API](rate-limiting-api.md) -- Rate limiting integration - [Caching](../guides/caching.md) -- Cache configuration patterns and optimization - [Deployment Security](../guides/deployment-security.md) -- Production cache security - [Getting Started](../getting-started/index.md) -- Getting started guide