Cache API Reference

Complete reference for ProxyWhirl’s multi-tier caching system with L1 (memory), L2 (disk), and L3 (SQLite) support.

See also

For cache configuration patterns and optimization tips, see Caching. For TOML cache configuration options, see Configuration.

from proxywhirl.cache import (
    CacheManager,
    CacheConfig,
    CacheEntry,
    HealthStatus,
    CacheTierType,
    MemoryCacheTier,
    DiskCacheTier,
    SQLiteCacheTier,
    CredentialEncryptor,
)

# L2BackendType and JsonlCacheTier are available via submodule imports:
from proxywhirl.cache.models import L2BackendType
from proxywhirl.cache.tiers import JsonlCacheTier

Overview

The cache subsystem provides three-tier storage for proxies with automatic promotion, credential encryption, TTL management, and health-based invalidation. It supports graceful degradation when tiers fail and provides comprehensive statistics for monitoring.

Architecture:

  • L1 (Memory): Fast in-memory cache using OrderedDict with LRU eviction

  • L2 (Disk): Configurable persistent cache with two backend options:

    • JSONL (default): File-based using sharded JSON Lines files, human-readable, portable, best for <10K entries

    • SQLite: Database-based with indexed lookups, faster for >10K entries with O(log n) performance

  • L3 (SQLite): Full database cache with SQL indexing, health history tracking, and complete queryability
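
The three-tier lookup-with-promotion flow above can be sketched in a few lines of plain Python. This is an illustration only: plain dicts stand in for the real tiers, and tiered_get is a hypothetical helper, not part of the API.

```python
# Illustrative sketch of the L1 -> L2 -> L3 lookup flow with promotion.
# Plain dicts stand in for the real tier objects.
def tiered_get(key, l1, l2, l3):
    """Check tiers in order; promote a hit into the faster tiers."""
    if key in l1:
        return l1[key]
    if key in l2:
        l1[key] = l2[key]            # promote L2 hit into L1
        return l1[key]
    if key in l3:
        l2[key] = l1[key] = l3[key]  # promote L3 hit into L2 and L1
        return l1[key]
    return None

l1, l2, l3 = {}, {}, {"p1": "http://proxy.example.com:8080"}
assert tiered_get("p1", l1, l2, l3) == "http://proxy.example.com:8080"
assert "p1" in l1 and "p1" in l2   # the L3 hit was promoted upward
```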

Data Models

CacheEntry

Container for a single cached proxy with metadata, TTL, and health tracking.

class CacheEntry

Pydantic model that stores proxy information with TTL, health status, and access tracking. Credentials are SecretStr in memory, encrypted at rest in L2/L3.

Example:

from proxywhirl.cache import CacheEntry, HealthStatus
from datetime import datetime, timezone, timedelta
from pydantic import SecretStr

entry = CacheEntry(
    key="abc123",
    proxy_url="http://proxy.example.com:8080",
    username=SecretStr("user"),
    password=SecretStr("pass"),
    source="api",
    fetch_time=datetime.now(timezone.utc),
    last_accessed=datetime.now(timezone.utc),
    ttl_seconds=3600,
    expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600),
    health_status=HealthStatus.HEALTHY
)

# Check expiration
if entry.is_expired:
    print("Entry has expired")

# Check health
if entry.is_healthy:
    print("Proxy is healthy")

Fields

Identity:

  • key (str): Unique cache key (proxy URL hash)

  • proxy_url (str): Full proxy URL (scheme://host:port)

Credentials (encrypted at rest in L2/L3):

  • username (SecretStr | None): Proxy username

  • password (SecretStr | None): Proxy password

Metadata:

  • source (str): Proxy source identifier

  • fetch_time (datetime): When proxy was fetched

  • last_accessed (datetime): Last cache access time

  • access_count (int): Number of cache hits (default: 0)

TTL & Health:

  • ttl_seconds (int): Time-to-live in seconds (≥0)

  • expires_at (datetime): Absolute expiration time

  • health_status (HealthStatus): Current health status (default: UNKNOWN)

  • failure_count (int): Consecutive failures (≥0, default: 0)

  • evicted_from_l1 (bool): Whether entry was evicted from L1 cache (default: False)

Health Monitoring (Feature 006):

  • last_health_check (datetime | None): Last health check timestamp

  • consecutive_health_failures (int): Consecutive health check failures (≥0, default: 0)

  • consecutive_health_successes (int): Consecutive successful health checks (≥0, default: 0)

  • recovery_attempt (int): Current recovery attempt count (≥0, default: 0)

  • next_check_time (datetime | None): Scheduled next health check

  • last_health_error (str | None): Last health check error message

  • total_health_checks (int): Total health checks performed (≥0, default: 0)

  • total_health_check_failures (int): Total health check failures (≥0, default: 0)

Properties

property is_expired: bool

Check if entry has expired based on TTL.

Returns:

True if current time ≥ expires_at, False otherwise

property is_healthy: bool

Check if proxy is healthy enough to use.

Returns:

True if health_status == HEALTHY, False otherwise


CacheConfig

Configuration for cache behavior and tier settings.

class CacheConfig

Pydantic model that aggregates configuration for all three tiers plus global settings like TTL, cleanup intervals, and storage paths.

Example:

from proxywhirl.cache import CacheConfig, CacheTierConfig, L2BackendType
from pydantic import SecretStr

# Default JSONL backend (file-based, portable)
config = CacheConfig(
    # Tier configurations
    l1_config=CacheTierConfig(
        enabled=True,
        max_entries=1000,
        eviction_policy="lru"
    ),
    l2_config=CacheTierConfig(
        enabled=True,
        max_entries=5000,
        eviction_policy="lru"
    ),
    l2_backend=L2BackendType.JSONL,  # or L2BackendType.SQLITE for large caches
    l3_config=CacheTierConfig(
        enabled=True,
        max_entries=None,  # Unlimited
        eviction_policy="lru"
    ),

    # TTL Configuration
    default_ttl_seconds=3600,
    ttl_cleanup_interval=60,
    enable_background_cleanup=True,
    cleanup_interval_seconds=60,
    per_source_ttl={
        "api": 7200,      # API sources: 2 hours
        "scraper": 1800   # Scrapers: 30 minutes
    },

    # Storage Paths
    l2_cache_dir=".cache/proxies",
    l3_database_path=".cache/db/proxywhirl.db",

    # Encryption
    encryption_key=SecretStr("your-32-byte-url-safe-base64-key"),

    # Health Integration
    health_check_invalidation=True,
    failure_threshold=3,

    # Performance Tuning
    enable_statistics=True,
    statistics_interval=5
)

# SQLite backend for large caches (>10K entries)
large_cache_config = CacheConfig(
    l2_backend=L2BackendType.SQLITE,
    l2_config=CacheTierConfig(max_entries=50000)
)

Fields

Tier Configuration:

  • l1_config (CacheTierConfig): L1 (Memory) configuration (default: max_entries=1000)

  • l2_config (CacheTierConfig): L2 (Disk) configuration (default: max_entries=5000)

  • l2_backend (L2BackendType): L2 storage backend - “jsonl” or “sqlite” (default: JSONL)

  • l3_config (CacheTierConfig): L3 (SQLite) configuration (default: max_entries=None)

TTL Configuration:

  • default_ttl_seconds (int): Default TTL for cached proxies (≥60, default: 3600)

  • ttl_cleanup_interval (int): Background cleanup interval (≥10, default: 60)

  • enable_background_cleanup (bool): Enable background TTL cleanup thread (default: False)

  • cleanup_interval_seconds (int): Interval between cleanup runs (≥5, default: 60)

  • per_source_ttl (dict[str, int]): Per-source TTL overrides (default: empty dict)

Storage Paths:

  • l2_cache_dir (str): Directory for L2 cache (JSONL shards or SQLite database) (default: “.cache/proxies”)

  • l3_database_path (str): SQLite database path for L3 (default: “.cache/db/proxywhirl.db”)

Encryption:

  • encryption_key (SecretStr | None): Fernet encryption key (from env: PROXYWHIRL_CACHE_ENCRYPTION_KEY)

Health Integration:

  • health_check_invalidation (bool): Auto-invalidate on health check failure (default: True)

  • failure_threshold (int): Failures before health invalidation (≥1, default: 3)

Performance Tuning:

  • enable_statistics (bool): Track cache statistics (default: True)

  • statistics_interval (int): Stats aggregation interval (≥1, default: 5)


CacheTierConfig

Configuration for a single cache tier.

class CacheTierConfig

Pydantic model that defines capacity, eviction policy, and enable/disable state for one tier (L1, L2, or L3).

Example:

from proxywhirl.cache import CacheTierConfig

config = CacheTierConfig(
    enabled=True,
    max_entries=1000,
    eviction_policy="lru"  # "lru", "lfu", or "fifo"
)

Fields

  • enabled (bool): Enable this tier (default: True)

  • max_entries (int | None): Max entries (None=unlimited, default: None)

  • eviction_policy (str): Eviction policy: “lru”, “lfu”, or “fifo” (default: “lru”)

Validators

classmethod validate_policy(v: str) str

Validate eviction policy is supported.

Parameters:

v – Policy name to validate

Raises:

ValueError – If policy is not one of [“lru”, “lfu”, “fifo”]

Returns:

Validated policy name
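
The validation behavior can be reproduced standalone; the real validator is a Pydantic field validator, but the core check reduces to a set membership test:

```python
# Standalone sketch of the eviction-policy check described above.
# The real implementation is a Pydantic classmethod validator.
VALID_POLICIES = {"lru", "lfu", "fifo"}

def validate_policy(v: str) -> str:
    if v not in VALID_POLICIES:
        raise ValueError(
            f"eviction_policy must be one of {sorted(VALID_POLICIES)}, got {v!r}"
        )
    return v

assert validate_policy("lru") == "lru"
```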


CacheStatistics

Aggregate cache statistics across all tiers.

class CacheStatistics

Pydantic model that combines tier-level statistics and tracks cross-tier operations like promotions and demotions.

Example:

from proxywhirl.cache import CacheStatistics

stats = CacheStatistics()
stats.l1_stats.hits = 100
stats.l1_stats.misses = 20

print(f"L1 hit rate: {stats.l1_stats.hit_rate:.2%}")
print(f"Overall hit rate: {stats.overall_hit_rate:.2%}")
print(f"Total size: {stats.total_size}")

# Export to monitoring
metrics = stats.to_metrics_dict()

Fields

Per-Tier Statistics:

  • l1_stats (TierStatistics): L1 statistics (default: empty TierStatistics)

  • l2_stats (TierStatistics): L2 statistics (default: empty TierStatistics)

  • l3_stats (TierStatistics): L3 statistics (default: empty TierStatistics)

Cross-Tier Operations:

  • promotions (int): L3→L2→L1 promotions (≥0, default: 0)

  • demotions (int): L1→L2→L3 demotions (≥0, default: 0)

Degradation Tracking:

  • l1_degraded (bool): L1 tier unavailable (default: False)

  • l2_degraded (bool): L2 tier unavailable (default: False)

  • l3_degraded (bool): L3 tier unavailable (default: False)

Computed Properties

property overall_hit_rate: float

Overall hit rate across all tiers (0.0 to 1.0).

Uses max of per-tier misses to avoid triple-counting misses that cascade through L1→L2→L3 lookups.
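
A numeric sketch of that de-duplication, assuming hits are summed across tiers while misses are aggregated with max(): one lookup that misses every tier appears as a miss in L1, L2, and L3, so summing misses would count it three times. The exact aggregation in the implementation may differ in detail.

```python
# Assumed aggregation: sum hits, take max of misses to de-duplicate
# a single lookup that cascaded through (and missed) all three tiers.
l1_hits, l1_misses = 100, 20
l2_hits, l2_misses = 0, 20   # the same 20 lookups cascaded to L2
l3_hits, l3_misses = 0, 20   # ...and then to L3

total_hits = l1_hits + l2_hits + l3_hits
deduped_misses = max(l1_misses, l2_misses, l3_misses)

overall_hit_rate = total_hits / (total_hits + deduped_misses)  # 100 / 120
```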

property total_size: int

Total cached entries across all tiers.

Methods

to_metrics_dict() dict[str, float]

Convert to flat metrics dict for monitoring systems.

Returns:

Dictionary with metric names and float values

Example:

metrics = stats.to_metrics_dict()
# {
#     "cache.l1.hit_rate": 0.85,
#     "cache.l2.hit_rate": 0.60,
#     "cache.l3.hit_rate": 0.40,
#     "cache.overall.hit_rate": 0.75,
#     "cache.total_size": 1500.0,
#     "cache.promotions": 250.0,
#     "cache.demotions": 150.0,
#     "cache.l1.size": 1000.0,
#     "cache.l2.size": 450.0,
#     "cache.l3.size": 50.0
# }

TierStatistics

Statistics for a single cache tier.

class TierStatistics

Pydantic model that tracks hits, misses, evictions by reason, and computes hit rate.

Example:

from proxywhirl.cache import TierStatistics

stats = TierStatistics(hits=100, misses=20)
print(f"Hit rate: {stats.hit_rate:.2%}")  # 83.33%
print(f"Total evictions: {stats.total_evictions}")

Fields

  • hits (int): Cache hits (≥0, default: 0)

  • misses (int): Cache misses (≥0, default: 0)

  • current_size (int): Current number of entries (≥0, default: 0)

  • evictions_lru (int): LRU evictions (≥0, default: 0)

  • evictions_ttl (int): TTL-based evictions (≥0, default: 0)

  • evictions_health (int): Health-based evictions (≥0, default: 0)

  • evictions_corruption (int): Corruption-based evictions (≥0, default: 0)

Computed Properties

property hit_rate: float

Cache hit rate (0.0 to 1.0).

Formula:

hits / (hits + misses) if total > 0, else 0.0

property total_evictions: int

Total evictions across all reasons.

Formula:

evictions_lru + evictions_ttl + evictions_health + evictions_corruption


HealthStatus (Enum)

Proxy health status for cache entries (imported from proxywhirl.models).

class HealthStatus

String enum representing proxy health status with 5 states.

Values:

  • UNKNOWN = "unknown" - Not yet tested (default)

  • HEALTHY = "healthy" - Working normally

  • DEGRADED = "degraded" - Partial functionality (some failures)

  • UNHEALTHY = "unhealthy" - Experiencing issues (many failures)

  • DEAD = "dead" - Not responding (completely unusable)

Example:

from proxywhirl.cache import HealthStatus

status = HealthStatus.HEALTHY
print(status.value)  # "healthy"

# All 5 states are available
for state in HealthStatus:
    print(f"{state.name}: {state.value}")

CacheTierType (Enum)

Type of cache tier.

class CacheTierType

String enum representing cache tier types.

Values:

  • L1 = "l1" - Memory tier

  • L2 = "l2" - Disk tier

  • L3 = "l3" - SQLite tier

Example:

from proxywhirl.cache import CacheTierType

tier = CacheTierType.L1
print(tier.value)  # "l1"

L2BackendType (Enum)

L2 cache backend type selection.

class L2BackendType

String enum for selecting the L2 disk cache storage backend.

Values:

  • JSONL = "jsonl" - File-based JSONL with sharding (default, best for <10K entries)

  • SQLITE = "sqlite" - SQLite database (faster for >10K entries)

Example:

from proxywhirl.cache import CacheConfig, L2BackendType

# Default JSONL backend
config = CacheConfig()
assert config.l2_backend == L2BackendType.JSONL

# SQLite backend for large caches
config = CacheConfig(l2_backend=L2BackendType.SQLITE)

When to use each backend:

Backend   Best For       Performance        Features
-------   ------------   ----------------   ------------------------------------------
JSONL     <10K entries   O(n) lookups       Human-readable, portable, simple debugging
SQLite    >10K entries   O(log n) lookups   Indexed queries, faster batch operations


Tier Implementations

CacheTier (Abstract Base Class)

Abstract base class for cache tier implementations.

class CacheTier

Defines the interface that all cache tiers (L1, L2, L3) must implement, including graceful degradation on repeated failures.

Attributes:

  • config (CacheTierConfig) - Configuration for this tier

  • tier_type (TierType) - Type of tier (L1/L2/L3)

  • enabled (bool) - Whether tier is operational

  • failure_count (int) - Consecutive failures for degradation tracking

  • failure_threshold (int) - Failures before auto-disabling tier (default: 3)

Constructor

__init__(config: CacheTierConfig, tier_type: TierType) None

Initialize cache tier with configuration.

Parameters:
  • config – Configuration for this tier

  • tier_type – Type of tier (L1/L2/L3)

Abstract Methods

abstractmethod get(key: str) CacheEntry | None

Retrieve entry by key, None if not found or expired.

Parameters:

key – Cache key to lookup

Returns:

CacheEntry if found and valid, None otherwise

abstractmethod put(key: str, entry: CacheEntry) bool

Store entry, return True if successful.

Parameters:
  • key – Cache key for entry

  • entry – CacheEntry to store

Returns:

True if stored successfully, False otherwise

abstractmethod delete(key: str) bool

Remove entry by key, return True if existed.

Parameters:

key – Cache key to delete

Returns:

True if entry existed and was deleted, False if not found

abstractmethod clear() int

Clear all entries, return count of removed entries.

Returns:

Number of entries removed

abstractmethod size() int

Return current number of entries.

Returns:

Number of entries in tier

abstractmethod keys() list[str]

Return list of all keys.

Returns:

List of cache keys

abstractmethod cleanup_expired() int

Remove all expired entries in bulk.

Returns:

Number of entries removed

Concrete Methods

handle_failure(error: Exception) None

Handle tier operation failure for graceful degradation.

Increments failure count and disables tier if threshold exceeded. Called by implementations when operations fail.

Parameters:

error – Exception that occurred

reset_failures() None

Reset failure count on successful operation.

Re-enables tier if previously disabled and resets failure counter. Implementations should call this after successful operations.
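
The degradation pattern these two methods implement can be shown in isolation: failures increment a counter, the tier disables itself once the threshold is crossed, and a success resets and re-enables it. This sketch mirrors the described behavior without the real base class:

```python
# Standalone sketch of the graceful-degradation pattern described above.
class DegradingTier:
    def __init__(self, failure_threshold: int = 3):
        self.enabled = True
        self.failure_count = 0
        self.failure_threshold = failure_threshold

    def handle_failure(self, error: Exception) -> None:
        self.failure_count += 1
        if self.failure_count >= self.failure_threshold:
            self.enabled = False   # degrade: skip this tier until recovery

    def reset_failures(self) -> None:
        self.failure_count = 0
        self.enabled = True        # re-enable after a successful operation

tier = DegradingTier()
for _ in range(3):
    tier.handle_failure(IOError("disk unavailable"))
assert tier.enabled is False
tier.reset_failures()
assert tier.enabled is True and tier.failure_count == 0
```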


MemoryCacheTier

L1 in-memory cache using OrderedDict for LRU tracking.

class MemoryCacheTier(CacheTier)

Provides O(1) lookups with automatic LRU eviction when max_entries exceeded.

Example:

from proxywhirl.cache.tiers import MemoryCacheTier, TierType
from proxywhirl.cache import CacheTierConfig

config = CacheTierConfig(max_entries=1000, eviction_policy="lru")
tier = MemoryCacheTier(config, TierType.L1_MEMORY)

# Store entry
tier.put(key, entry)

# Retrieve entry (moves to end for LRU)
cached = tier.get(key)

# Delete entry
deleted = tier.delete(key)

# Get all keys
keys = tier.keys()

# Get size
size = tier.size()

# Clear all
cleared = tier.clear()

# Cleanup expired
removed = tier.cleanup_expired()

Constructor

__init__(config: CacheTierConfig, tier_type: TierType, on_evict: Callable[[str, CacheEntry], None] | None = None) None

Initialize memory cache with LRU tracking.

Parameters:
  • config – Tier configuration

  • tier_type – Type of tier (L1/L2/L3)

  • on_evict – Optional callback when entry is evicted (key, entry)

Features

  • O(1) lookups

  • Automatic LRU eviction when max_entries exceeded

  • Thread-safe with failure tracking

  • No persistence

  • Callbacks on eviction for demotion to L2
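
The OrderedDict-based LRU mechanics with an eviction callback can be sketched standalone. LruSketch is a hypothetical illustration of the behavior described above, not the real class:

```python
from collections import OrderedDict

# Sketch of OrderedDict-based LRU with an eviction callback,
# as used to demote evicted L1 entries into L2.
class LruSketch:
    def __init__(self, max_entries, on_evict=None):
        self._data = OrderedDict()
        self.max_entries = max_entries
        self.on_evict = on_evict

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            old_key, old_val = self._data.popitem(last=False)  # evict LRU
            if self.on_evict:
                self.on_evict(old_key, old_val)  # e.g. demote to L2

demoted = []
l1 = LruSketch(max_entries=2, on_evict=lambda k, v: demoted.append(k))
l1.put("a", 1)
l1.put("b", 2)
l1.get("a")        # "a" becomes most recently used
l1.put("c", 3)     # evicts "b", the least recently used
assert demoted == ["b"]
```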


JsonlCacheTier

L2 file-based cache using sharded JSONL files with encryption.

class JsonlCacheTier(CacheTier)

File-based cache tier using JSON Lines format with consistent-hash sharding. Best for <10K entries. Human-readable, portable, and git-friendly.

Uses sharded JSONL files with:

  • Consistent hash sharding (default 16 shards)

  • In-memory index for O(1) key→shard lookups

  • File locking (portalocker) for concurrent access safety

  • Fernet encryption for credentials at rest

  • Human-readable JSON Lines format

Example:

from proxywhirl.cache.tiers import JsonlCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CredentialEncryptor
from pathlib import Path

config = CacheTierConfig(max_entries=5000, eviction_policy="lru")
encryptor = CredentialEncryptor()
cache_dir = Path(".cache/proxies")

tier = JsonlCacheTier(
    config=config,
    tier_type=TierType.L2_FILE,
    cache_dir=cache_dir,
    encryptor=encryptor,
    num_shards=16  # Default
)

# Store entry (writes to appropriate shard file)
tier.put(key, entry)

# Retrieve entry (uses in-memory index for O(1) shard lookup)
cached = tier.get(key)

# Delete entry
deleted = tier.delete(key)

# Get all keys (from index)
keys = tier.keys()

# Get size
size = tier.size()

# Clear all (removes all shard files)
cleared = tier.clear()

# Cleanup expired entries
removed = tier.cleanup_expired()

Constructor

__init__(config: CacheTierConfig, tier_type: TierType, cache_dir: Path, encryptor: CredentialEncryptor | None = None, num_shards: int = 16) None

Initialize JSONL file cache with sharding and encryption.

Parameters:
  • config – Tier configuration

  • tier_type – Type of tier (L1/L2/L3)

  • cache_dir – Directory for shard files

  • encryptor – Optional encryptor for credentials

  • num_shards – Number of shard files (default: 16)

File Structure

.cache/proxies/
├── shard_00.jsonl
├── shard_01.jsonl
├── ...
└── shard_15.jsonl

Each shard file contains JSON Lines entries:

{"key": "abc123", "proxy_url": "http://proxy:8080", "source": "free-proxy-list", "ttl_seconds": 3600, ...}
{"key": "def456", "proxy_url": "socks5://proxy:1080", "source": "geonode", "ttl_seconds": 7200, ...}
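
Key-to-shard assignment can be sketched as a hash modulo the shard count; the exact hash function the implementation uses is an assumption here, but any stable hash yields the consistent mapping described above:

```python
import hashlib

# Hedged sketch of key -> shard-file mapping (the real implementation's
# hash choice may differ; SHA-256 is assumed for illustration).
def shard_for(key: str, num_shards: int = 16) -> str:
    shard = int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_shards
    return f"shard_{shard:02d}.jsonl"

name = shard_for("abc123")
assert name.startswith("shard_") and name.endswith(".jsonl")
```

The same key always maps to the same shard, so lookups only ever touch one file.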

Features

  • Human-readable JSON Lines format

  • Portable (can copy/move files)

  • Git-friendly for version control

  • Consistent-hash sharding for distribution

  • In-memory index for fast lookups

  • File locking for concurrent access

  • Encrypted credentials at rest

  • Best for <10K entries

When to Use JSONL vs SQLite

Factor             JSONL (JsonlCacheTier)   SQLite (DiskCacheTier)
-----------------  ----------------------   ----------------------
Entry count        <10K entries             >10K entries
Lookup speed       O(n) per shard           O(log n) indexed
Portability        Copy files anywhere      Single .db file
Git-friendly       Yes                      Not recommended
Human-readable     Yes                      No (binary)
Concurrent writes  File locking             WAL mode


DiskCacheTier

L2 SQLite-based cache with encryption and indexed lookups.

class DiskCacheTier(CacheTier)

Optimized for >10K entries using SQLite with B-tree indexes instead of JSONL. Provides O(log n) lookups vs O(n) for JSONL, achieving <10ms reads for 10K+ entries.

Uses a lightweight SQLite database with:

  • Primary key index on cache key for fast lookups

  • Encrypted credentials stored as BLOB

  • Efficient bulk operations (cleanup, size, keys)

  • File-based persistence without complex sharding

Example:

from proxywhirl.cache.tiers import DiskCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CredentialEncryptor
from pathlib import Path

config = CacheTierConfig(max_entries=5000, eviction_policy="lru")
encryptor = CredentialEncryptor()
cache_dir = Path(".cache/proxies")

tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor)

# Same interface as MemoryCacheTier
tier.put(key, entry)
cached = tier.get(key)

Constructor

__init__(config: CacheTierConfig, tier_type: TierType, cache_dir: Path, encryptor: CredentialEncryptor | None = None) None

Initialize SQLite-based L2 cache.

Parameters:
  • config – Tier configuration

  • tier_type – Type of tier (should be L2_FILE)

  • cache_dir – Directory for cache database

  • encryptor – Credential encryptor for username/password

Methods

migrate_from_jsonl(jsonl_dir: Path | None = None) int

Migrate existing JSONL shard files to SQLite L2 cache.

This method provides a migration path from the old JSONL-based L2 cache to the new SQLite-based implementation. It reads all shard_*.jsonl files from the specified directory and imports them into the SQLite database.

Parameters:

jsonl_dir – Directory containing shard_*.jsonl files (defaults to self.cache_dir)

Returns:

Number of entries successfully migrated

Example:

tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir)
migrated = tier.migrate_from_jsonl()
print(f"Migrated {migrated} entries from JSONL to SQLite")

close() None

Close the persistent SQLite connection and release database resources.

Should be called when the cache tier is no longer needed to properly release database resources and file locks. Safe to call multiple times. Thread-safe via internal lock.

Example:

tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor)
try:
    tier.put(key, entry)
    cached = tier.get(key)
finally:
    tier.close()

Features

  • O(log n) indexed lookups using SQLite B-tree

  • Encrypted credential storage (BLOB fields)

  • Atomic operations with SQLite transactions

  • Efficient bulk cleanup using SQL DELETE

  • Simple file-based persistence (single .db file)

  • Automatic schema initialization

Database Schema

CREATE TABLE l2_cache (
    key TEXT PRIMARY KEY,
    proxy_url TEXT NOT NULL,
    username_encrypted BLOB,
    password_encrypted BLOB,
    source TEXT NOT NULL,
    fetch_time REAL NOT NULL,
    last_accessed REAL NOT NULL,
    access_count INTEGER DEFAULT 0,
    ttl_seconds INTEGER NOT NULL,
    expires_at REAL NOT NULL,
    health_status TEXT DEFAULT 'unknown',
    failure_count INTEGER DEFAULT 0,
    evicted_from_l1 INTEGER DEFAULT 0
);

CREATE INDEX idx_l2_expires_at ON l2_cache(expires_at);
CREATE INDEX idx_l2_source ON l2_cache(source);

SQLiteCacheTier

L3 SQLite database cache with encrypted credentials and health history.

class SQLiteCacheTier(CacheTier)

Provides durable persistence with SQL indexing for fast lookups and comprehensive health history tracking.

Example:

from proxywhirl.cache.tiers import SQLiteCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CredentialEncryptor
from pathlib import Path

config = CacheTierConfig(max_entries=None, eviction_policy="lru")  # Unlimited
encryptor = CredentialEncryptor()
db_path = Path(".cache/db/proxywhirl.db")

tier = SQLiteCacheTier(config, TierType.L3_SQLITE, db_path, encryptor)

# Same interface as other tiers
tier.put(key, entry)
cached = tier.get(key)

# Optimized bulk cleanup with SQL DELETE
removed = tier.cleanup_expired()  # single SQL DELETE instead of a per-entry scan

Constructor

__init__(config: CacheTierConfig, tier_type: TierType, db_path: Path, encryptor: CredentialEncryptor | None = None) None

Initialize SQLite cache.

Parameters:
  • config – Tier configuration

  • tier_type – Type of tier (should be L3_SQLITE)

  • db_path – Path to SQLite database file

  • encryptor – Credential encryptor for username/password

Features

  • Full persistence

  • SQL indexing for fast lookups

  • Health history tracking with separate table

  • Automatic schema migration

  • Optimized bulk cleanup (single SQL DELETE instead of per-entry scans)

  • Credential encryption with BLOB storage

  • Foreign key constraints

Database Schema

CREATE TABLE cache_entries (
    key TEXT PRIMARY KEY,
    proxy_url TEXT NOT NULL,
    username_encrypted BLOB,
    password_encrypted BLOB,
    source TEXT NOT NULL,
    fetch_time REAL NOT NULL,
    last_accessed REAL NOT NULL,
    access_count INTEGER DEFAULT 0,
    ttl_seconds INTEGER NOT NULL,
    expires_at REAL NOT NULL,
    health_status TEXT DEFAULT 'unknown',
    failure_count INTEGER DEFAULT 0,
    created_at REAL NOT NULL,
    updated_at REAL NOT NULL,
    -- Health monitoring fields
    last_health_check REAL,
    consecutive_health_failures INTEGER DEFAULT 0,
    consecutive_health_successes INTEGER DEFAULT 0,
    recovery_attempt INTEGER DEFAULT 0,
    next_check_time REAL,
    last_health_error TEXT,
    total_health_checks INTEGER DEFAULT 0,
    total_health_check_failures INTEGER DEFAULT 0,
    evicted_from_l1 INTEGER DEFAULT 0
);

CREATE TABLE health_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    proxy_key TEXT NOT NULL,
    check_time REAL NOT NULL,
    status TEXT NOT NULL,
    response_time_ms REAL,
    error_message TEXT,
    check_url TEXT NOT NULL,
    FOREIGN KEY (proxy_key) REFERENCES cache_entries(key) ON DELETE CASCADE
);

-- Indexes
CREATE INDEX idx_expires_at ON cache_entries(expires_at);
CREATE INDEX idx_source ON cache_entries(source);
CREATE INDEX idx_health_status ON cache_entries(health_status);
CREATE INDEX idx_last_accessed ON cache_entries(last_accessed);
CREATE INDEX idx_health_history_proxy ON health_history(proxy_key);
CREATE INDEX idx_health_history_time ON health_history(check_time);

Utilities

CredentialEncryptor

Warning

If no encryption key is provided and the PROXYWHIRL_CACHE_ENCRYPTION_KEY environment variable is not set, a new key is generated automatically. This means cached data encrypted with a previous key will be unreadable. Always persist your encryption key for production use.

Handles encryption/decryption of proxy credentials using Fernet symmetric encryption (AES-128-CBC + HMAC). Supports key rotation via MultiFernet: set PROXYWHIRL_CACHE_KEY_PREVIOUS to the old key when rotating, allowing decryption of data encrypted with either key while new encryptions use the current key.

class CredentialEncryptor

Provides Fernet symmetric encryption for proxy credentials at rest (L2/L3 tiers). Uses environment variable PROXYWHIRL_CACHE_ENCRYPTION_KEY for key management.

Example:

from proxywhirl.cache import CredentialEncryptor
from pydantic import SecretStr
import os

# Option 1: Use environment variable
os.environ["PROXYWHIRL_CACHE_ENCRYPTION_KEY"] = "your-32-byte-url-safe-base64-key"
encryptor = CredentialEncryptor()

# Option 2: Provide key directly
from cryptography.fernet import Fernet
key = Fernet.generate_key()
encryptor = CredentialEncryptor(key=key)

# Encrypt credentials
plaintext = SecretStr("mypassword")
encrypted = encryptor.encrypt(plaintext)  # bytes

# Decrypt credentials
decrypted = encryptor.decrypt(encrypted)  # SecretStr
print(decrypted.get_secret_value())  # "mypassword"

Constructor

__init__(key: bytes | None = None) None

Initialize encryptor with Fernet key.

Parameters:

key – Optional Fernet key (32 url-safe base64-encoded bytes). If None, reads from PROXYWHIRL_CACHE_ENCRYPTION_KEY env var. If env var not set, generates a new key (WARNING: regenerated keys cannot decrypt existing cached data).

Raises:

ValueError – If provided key is invalid for Fernet

Attributes:

  • key (bytes) - Fernet encryption key

  • _cipher (Fernet) - Fernet cipher instance

Methods

encrypt(secret: SecretStr) bytes

Encrypt a SecretStr to bytes.

Parameters:

secret – SecretStr containing plaintext to encrypt

Returns:

Encrypted bytes suitable for storage in BLOB fields

Raises:

ValueError – If encryption fails

Example:

encrypted = encryptor.encrypt(SecretStr("password123"))
# b'gAAAAA...'

decrypt(encrypted: bytes) SecretStr

Decrypt encrypted bytes back to SecretStr.

Parameters:

encrypted – Encrypted bytes from storage

Returns:

SecretStr containing decrypted plaintext (never logs value)

Raises:

ValueError – If decryption fails (wrong key, corrupted data)

Example:

decrypted = encryptor.decrypt(encrypted_bytes)
print(decrypted.get_secret_value())  # "password123"

CacheManager

Main orchestrator for multi-tier proxy caching with automatic promotion/demotion, TTL management, and health-based invalidation.

class CacheManager

Manages caching across three tiers:

  • L1 (Memory): Fast in-memory cache using OrderedDict (LRU)

  • L2 (Disk): Persistent cache with configurable backend (JSONL or SQLite)

  • L3 (SQLite): Database cache for cold storage with full queryability

Supports TTL-based expiration, health-based invalidation, and graceful degradation when tiers fail. Thread-safe via threading.RLock.

Example:

from proxywhirl.cache import CacheManager, CacheConfig, CacheEntry, HealthStatus
from datetime import datetime, timezone, timedelta

config = CacheConfig()
manager = CacheManager(config)

# Store an entry
entry = CacheEntry(
    key="abc123",
    proxy_url="http://proxy.example.com:8080",
    source="api",
    fetch_time=datetime.now(timezone.utc),
    last_accessed=datetime.now(timezone.utc),
    ttl_seconds=3600,
    expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600),
    health_status=HealthStatus.HEALTHY
)
manager.put(entry.key, entry)

# Retrieve (promotes to higher tiers on hit)
retrieved = manager.get(entry.key)

# Delete from all tiers
manager.delete(entry.key)

# Statistics
stats = manager.get_statistics()
print(f"Overall hit rate: {stats.overall_hit_rate:.2%}")

# Export/import
manager.export_to_file("proxies.jsonl")
manager.warm_from_file("proxies.jsonl", ttl_override=3600)

Constructor

__init__(config: CacheConfig) None

Initialize cache manager with configuration.

Parameters:

config – Cache configuration with tier settings (required)

Initializes L1 (memory), L2 (disk), and L3 (SQLite) tiers based on config. Starts background TTL cleanup if enable_background_cleanup is True.

Methods

get(key: str) CacheEntry | None

Retrieve entry from cache with tier promotion.

Checks L1 → L2 → L3 in order. Promotes entries to higher tiers on hit. Updates access_count and last_accessed on successful retrieval. Expired entries are automatically deleted from all tiers.

Parameters:

key – Cache key to retrieve

Returns:

CacheEntry if found and not expired, None otherwise

put(key: str, entry: CacheEntry) bool

Store entry in all enabled tiers.

Writes to all tiers for redundancy. Credentials are automatically redacted in logs.

Parameters:
  • key – Cache key

  • entry – CacheEntry to store

Returns:

True if stored in at least one tier, False otherwise

delete(key: str) bool

Delete entry from all tiers.

Parameters:

key – Cache key to delete

Returns:

True if deleted from at least one tier, False if not found

clear() int

Clear all entries from all tiers.

Returns:

Total number of entries cleared

invalidate_by_health(key: str) None

Mark proxy as unhealthy and evict if failure threshold reached.

Increments the failure_count and sets health_status to UNHEALTHY. If failure_count reaches the configured failure_threshold, the proxy is removed from all cache tiers.

Parameters:

key – Cache key to invalidate

get_statistics() CacheStatistics

Get current cache statistics.

Returns:

CacheStatistics with hit rates, sizes, and tier degradation status

export_to_file(filepath: str) dict[str, int]

Export all cache entries to a JSONL file.

Parameters:

filepath – Path to export file

Returns:

Dict with exported and failed counts

warm_from_file(file_path: str, ttl_override: int | None = None) dict[str, int]

Load proxies from a file to pre-populate the cache.

Supports JSON (array), JSONL (newline-delimited), and CSV formats. Invalid entries are skipped with warnings logged.

Parameters:
  • file_path – Path to file containing proxy data

  • ttl_override – Optional TTL in seconds (overrides default_ttl_seconds)

Returns:

Dict with loaded, skipped, and failed counts
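The skip-invalid-entries behavior can be sketched for the JSONL case (hypothetical standalone code; the field name proxy_url and the exact validation rules are assumptions, and the real method also handles JSON arrays and CSV):

```python
import json

def warm_counts_jsonl(path):
    """Sketch: count loadable vs skipped lines in a JSONL warm file."""
    counts = {"loaded": 0, "skipped": 0}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # blank lines are ignored outright
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                counts["skipped"] += 1  # malformed line: skip, log a warning
                continue
            if isinstance(record, dict) and "proxy_url" in record:
                counts["loaded"] += 1
            else:
                counts["skipped"] += 1  # missing required field
    return counts
```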

static generate_cache_key(proxy_url: str) → str

Generate cache key from proxy URL using SHA256 hash.

Parameters:

proxy_url – Proxy URL to hash

Returns:

Hex-encoded SHA256 hash (first 16 chars)
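The described hashing is straightforward to reproduce with hashlib. A minimal sketch follows; whether it matches the library's keys byte-for-byte depends on encoding details not stated here, so treat it as illustrative:

```python
import hashlib

def cache_key_sketch(proxy_url: str) -> str:
    """Sketch: SHA-256 hex digest truncated to the first 16 characters."""
    return hashlib.sha256(proxy_url.encode("utf-8")).hexdigest()[:16]

key = cache_key_sketch("http://proxy.example.com:8080")
# Deterministic: the same URL always yields the same 16-char key.
```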


Crypto Utilities

The proxywhirl.cache.crypto module provides helper functions for encryption key management and rotation.

from proxywhirl.cache.crypto import get_encryption_keys, create_multi_fernet, rotate_key

get_encryption_keys() -> list[bytes]

Get all valid encryption keys for MultiFernet. Returns keys in priority order: current key first, then previous key. Reads from PROXYWHIRL_CACHE_ENCRYPTION_KEY and PROXYWHIRL_CACHE_KEY_PREVIOUS environment variables. Generates a new key if no env vars are set.

create_multi_fernet() -> MultiFernet

Create a MultiFernet instance with all valid encryption keys. MultiFernet tries keys in order for decryption (newest first). All new encryptions use the first (current) key.

rotate_key(new_key: str) -> None

Rotate encryption keys by setting a new current key. Moves the current PROXYWHIRL_CACHE_ENCRYPTION_KEY to PROXYWHIRL_CACHE_KEY_PREVIOUS and sets the new key as current. This allows gradual migration: new data uses the new key, old data can still be decrypted with the previous key.

from cryptography.fernet import Fernet
from proxywhirl.cache.crypto import rotate_key

# Generate new key and rotate
new_key = Fernet.generate_key().decode()
rotate_key(new_key)
# Old data remains readable via PROXYWHIRL_CACHE_KEY_PREVIOUS

TTLManager

Manages TTL-based expiration with hybrid lazy + background cleanup. Used internally by CacheManager when enable_background_cleanup=True.

class TTLManager

Combines two cleanup strategies:

  • Lazy expiration: Check TTL on every get() operation

  • Background cleanup: Periodic scan of all tiers to remove expired entries

Example:

from proxywhirl.cache.manager import TTLManager, CacheManager
from proxywhirl.cache import CacheConfig

config = CacheConfig(enable_background_cleanup=False)
manager = CacheManager(config)

# Manually create and start TTL manager
ttl_mgr = TTLManager(manager, cleanup_interval=60)
ttl_mgr.start()

# ... later ...
ttl_mgr.stop()

Constructor

__init__(cache_manager: CacheManager, cleanup_interval: int = 60) → None
Parameters:
  • cache_manager – Parent CacheManager instance

  • cleanup_interval – Seconds between cleanup runs (default: 60)

Methods

start() None

Start background cleanup thread. Idempotent.

stop() None

Stop background cleanup thread. Safe to call if not running.

Attributes

  • enabled (bool): Whether background cleanup is running

  • cleanup_interval (int): Seconds between cleanup runs
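The hybrid strategy can be illustrated with a minimal standalone sketch (hypothetical code, not the library's TTLManager): the get() path drops expired entries lazily, while a daemon thread sweeps the whole store on a fixed interval, and start() is idempotent as described above.

```python
import threading
import time

class TTLSketch:
    """Sketch of hybrid lazy + background TTL expiration."""

    def __init__(self, cleanup_interval=60):
        self._store = {}  # key -> (value, expires_at as monotonic time)
        self.cleanup_interval = cleanup_interval
        self._stop = threading.Event()
        self._thread = None

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:  # lazy expiration on read
            self._store.pop(key, None)
            return None
        return value

    def _sweep(self):
        while not self._stop.wait(self.cleanup_interval):
            now = time.monotonic()
            expired = [k for k, (_, exp) in list(self._store.items()) if now >= exp]
            for k in expired:  # background cleanup pass
                self._store.pop(k, None)

    def start(self):
        if self._thread is None or not self._thread.is_alive():  # idempotent
            self._stop.clear()
            self._thread = threading.Thread(target=self._sweep, daemon=True)
            self._thread.start()

    def stop(self):
        self._stop.set()  # safe to call even if never started
```

Lazy expiration alone keeps reads correct; the background sweep exists so that entries nobody reads still get reclaimed.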


Usage Examples

Working with Cache Tiers Directly

from proxywhirl.cache.tiers import MemoryCacheTier, DiskCacheTier, SQLiteCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CacheEntry, CredentialEncryptor, HealthStatus
from datetime import datetime, timezone, timedelta
from pathlib import Path
from pydantic import SecretStr

# Initialize tiers
config = CacheTierConfig(max_entries=1000, eviction_policy="lru")
encryptor = CredentialEncryptor()

l1 = MemoryCacheTier(config, TierType.L1_MEMORY)
l2 = DiskCacheTier(config, TierType.L2_FILE, Path(".cache/l2"), encryptor)
l3 = SQLiteCacheTier(config, TierType.L3_SQLITE, Path(".cache/l3.db"), encryptor)

# Create entry
entry = CacheEntry(
    key="proxy1",
    proxy_url="http://proxy.example.com:8080",
    username=SecretStr("user"),
    password=SecretStr("pass"),
    source="api",
    fetch_time=datetime.now(timezone.utc),
    last_accessed=datetime.now(timezone.utc),
    ttl_seconds=3600,
    expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600),
    health_status=HealthStatus.HEALTHY
)

# Store in L1
l1.put(entry.key, entry)

# Retrieve from L1 (O(1) lookup)
cached = l1.get(entry.key)
if cached:
    print(f"L1 hit: {cached.proxy_url}")

# Store in L2 (persisted to disk)
l2.put(entry.key, entry)

# Retrieve from L2 (O(log n) SQLite lookup)
cached = l2.get(entry.key)
if cached:
    print(f"L2 hit: {cached.proxy_url}")

# Store in L3 (full database persistence)
l3.put(entry.key, entry)

# Cleanup expired entries
removed_l1 = l1.cleanup_expired()
removed_l2 = l2.cleanup_expired()
removed_l3 = l3.cleanup_expired()
print(f"Removed: L1={removed_l1}, L2={removed_l2}, L3={removed_l3}")

Encryption and Security

from proxywhirl.cache import CredentialEncryptor
from cryptography.fernet import Fernet
from pydantic import SecretStr
import os

# Generate and save encryption key
key = Fernet.generate_key()
os.environ["PROXYWHIRL_CACHE_ENCRYPTION_KEY"] = key.decode()

# Initialize encryptor
encryptor = CredentialEncryptor()

# Encrypt credentials
username = SecretStr("admin")
password = SecretStr("secret123")

encrypted_user = encryptor.encrypt(username)
encrypted_pass = encryptor.encrypt(password)

print(f"Encrypted username: {encrypted_user.hex()}")
print(f"Encrypted password: {encrypted_pass.hex()}")

# Decrypt credentials
decrypted_user = encryptor.decrypt(encrypted_user)
decrypted_pass = encryptor.decrypt(encrypted_pass)

print(f"Decrypted: {decrypted_user.get_secret_value()}")  # "admin"
# Password value never logged by SecretStr

Tip

If you have more than 10,000 cache entries, migrating from JSONL to SQLite L2 backend can significantly improve lookup performance (O(log n) vs O(n)).

Migration from JSONL to SQLite L2

from proxywhirl.cache.tiers import DiskCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CredentialEncryptor
from pathlib import Path

# Initialize new SQLite-based L2 tier
config = CacheTierConfig(max_entries=5000)
encryptor = CredentialEncryptor()
cache_dir = Path(".cache/proxies")

tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor)

# Migrate from old JSONL shards
migrated = tier.migrate_from_jsonl()
print(f"Successfully migrated {migrated} entries from JSONL to SQLite")

# Old JSONL files can now be safely removed
# for shard in cache_dir.glob("shard_*.jsonl"):
#     shard.unlink()

Performance Considerations

Tier Selection

L1 (Memory):

  • Fastest (O(1) lookup)

  • Limited capacity (default: 1000 entries)

  • Use for hot proxies

L2 (Disk/SQLite):

  • Medium speed (O(log n) indexed lookup)

  • Moderate capacity (default: 5000 entries)

  • Persistent across restarts

  • Use for warm proxies

L3 (SQLite):

  • Slower (database overhead, but indexed)

  • Unlimited capacity

  • Full health history tracking

  • Use for cold storage and analytics

Optimization Tips

  1. Tune tier sizes based on workload

  2. Enable background cleanup to avoid lazy cleanup overhead

  3. Use encryption for sensitive credentials in L2/L3

  4. Monitor failure rates for graceful degradation

  5. Leverage indexes in L2/L3 for fast queries


Thread Safety

All tier implementations use internal locking for thread-safe operations. The CacheTier base class provides handle_failure() and reset_failures() methods for graceful degradation tracking.


Error Handling

Tiers implement graceful degradation:

  • After 3 consecutive failures, the tier auto-disables (enabled = False)

  • A successful operation resets the failure counter

  • Operations on a disabled tier return failure immediately, without touching the backend

  • The parent cache manager can detect degraded tiers via tier.enabled
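This contract can be sketched in a few lines (hypothetical standalone code; the real base class exposes handle_failure() and reset_failures() as noted under Thread Safety, but the threshold constant and attribute layout here are assumptions):

```python
class DegradableTier:
    """Sketch of the graceful-degradation contract."""
    MAX_FAILURES = 3

    def __init__(self):
        self.enabled = True
        self.failure_count = 0

    def handle_failure(self):
        self.failure_count += 1
        if self.failure_count >= self.MAX_FAILURES:
            self.enabled = False  # auto-disable after 3 consecutive failures

    def reset_failures(self):
        self.failure_count = 0  # one success clears the streak
```

Because a success resets the counter, only genuinely consecutive failures disable a tier; intermittent errors never accumulate to the threshold.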


See Also