Cache API Reference

Complete reference for ProxyWhirl’s multi-tier caching system with L1 (memory), L2 (disk), and L3 (SQLite) support.

See also

For cache configuration patterns and optimization tips, see Caching. For TOML cache configuration options, see Configuration.

from proxywhirl.cache import (
    CacheManager,
    CacheConfig,
    CacheEntry,
    HealthStatus,
    CacheTierType,
    MemoryCacheTier,
    DiskCacheTier,
    SQLiteCacheTier,
    CredentialEncryptor,
)

# L2BackendType and JsonlCacheTier are available via submodule imports:
from proxywhirl.cache.models import L2BackendType
from proxywhirl.cache.tiers import JsonlCacheTier

Overview

The cache subsystem provides three-tier storage for proxies with automatic promotion, credential encryption, TTL management, and health-based invalidation. It supports graceful degradation when tiers fail and provides comprehensive statistics for monitoring.

Architecture:

  • L1 (Memory): Fast in-memory cache using OrderedDict with LRU eviction

  • L2 (Disk): Configurable persistent cache with two backend options:

    • JSONL (default): File-based using sharded JSON Lines files, human-readable, portable, best for <10K entries

    • SQLite: Database-based with indexed lookups, faster for >10K entries with O(log n) performance

  • L3 (SQLite): Full database cache with SQL indexing, health history tracking, and complete queryability
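
The three-tier lookup-with-promotion flow above can be sketched in a few lines of plain Python. This is an illustration only: plain dicts stand in for the real tiers, and tiered_get is a hypothetical helper, not part of the API.

```python
# Illustrative sketch of the L1 -> L2 -> L3 lookup flow with promotion.
# Plain dicts stand in for the real tier objects.
def tiered_get(key, l1, l2, l3):
    """Check tiers in order; promote a hit into the faster tiers."""
    if key in l1:
        return l1[key]
    if key in l2:
        l1[key] = l2[key]            # promote L2 hit into L1
        return l1[key]
    if key in l3:
        l2[key] = l1[key] = l3[key]  # promote L3 hit into L2 and L1
        return l1[key]
    return None

l1, l2, l3 = {}, {}, {"p1": "http://proxy.example.com:8080"}
assert tiered_get("p1", l1, l2, l3) == "http://proxy.example.com:8080"
assert "p1" in l1 and "p1" in l2   # the L3 hit was promoted upward
```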

Data Models

CacheEntry

Container for a single cached proxy with metadata, TTL, and health tracking.

class CacheEntry

Pydantic model that stores proxy information with TTL, health status, and access tracking. Credentials are SecretStr in memory, encrypted at rest in L2/L3.

Example:

from proxywhirl.cache import CacheEntry, HealthStatus
from datetime import datetime, timezone, timedelta
from pydantic import SecretStr

entry = CacheEntry(
    key="abc123",
    proxy_url="http://proxy.example.com:8080",
    username=SecretStr("user"),
    password=SecretStr("pass"),
    source="api",
    fetch_time=datetime.now(timezone.utc),
    last_accessed=datetime.now(timezone.utc),
    ttl_seconds=3600,
    expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600),
    health_status=HealthStatus.HEALTHY
)

# Check expiration
if entry.is_expired:
    print("Entry has expired")

# Check health
if entry.is_healthy:
    print("Proxy is healthy")

Fields

Identity:

  • key (str): Unique cache key (proxy URL hash)

  • proxy_url (str): Full proxy URL (scheme://host:port)

Credentials (encrypted at rest in L2/L3):

  • username (SecretStr | None): Proxy username

  • password (SecretStr | None): Proxy password

Metadata:

  • source (str): Proxy source identifier

  • fetch_time (datetime): When proxy was fetched

  • last_accessed (datetime): Last cache access time

  • access_count (int): Number of cache hits (default: 0)

TTL & Health:

  • ttl_seconds (int): Time-to-live in seconds (≥0)

  • expires_at (datetime): Absolute expiration time

  • health_status (HealthStatus): Current health status (default: UNKNOWN)

  • failure_count (int): Consecutive failures (≥0, default: 0)

  • evicted_from_l1 (bool): Whether entry was evicted from L1 cache (default: False)

Health Monitoring (Feature 006):

  • last_health_check (datetime | None): Last health check timestamp

  • consecutive_health_failures (int): Consecutive health check failures (≥0, default: 0)

  • consecutive_health_successes (int): Consecutive successful health checks (≥0, default: 0)

  • recovery_attempt (int): Current recovery attempt count (≥0, default: 0)

  • next_check_time (datetime | None): Scheduled next health check

  • last_health_error (str | None): Last health check error message

  • total_health_checks (int): Total health checks performed (≥0, default: 0)

  • total_health_check_failures (int): Total health check failures (≥0, default: 0)

Properties

property is_expired: bool

Check if entry has expired based on TTL.

Returns:

True if current time ≥ expires_at, False otherwise

property is_healthy: bool

Check if proxy is healthy enough to use.

Returns:

True if health_status == HEALTHY, False otherwise


CacheConfig

Configuration for cache behavior and tier settings.

class CacheConfig

Pydantic model that aggregates configuration for all three tiers plus global settings like TTL, cleanup intervals, and storage paths.

Example:

from proxywhirl.cache import CacheConfig, CacheTierConfig, L2BackendType
from pydantic import SecretStr

# Default JSONL backend (file-based, portable)
config = CacheConfig(
    # Tier configurations
    l1_config=CacheTierConfig(
        enabled=True,
        max_entries=1000,
        eviction_policy="lru"
    ),
    l2_config=CacheTierConfig(
        enabled=True,
        max_entries=5000,
        eviction_policy="lru"
    ),
    l2_backend=L2BackendType.JSONL,  # or L2BackendType.SQLITE for large caches
    l3_config=CacheTierConfig(
        enabled=True,
        max_entries=None,  # Unlimited
        eviction_policy="lru"
    ),

    # TTL Configuration
    default_ttl_seconds=3600,
    ttl_cleanup_interval=60,
    enable_background_cleanup=True,
    cleanup_interval_seconds=60,
    per_source_ttl={
        "api": 7200,      # API sources: 2 hours
        "scraper": 1800   # Scrapers: 30 minutes
    },

    # Storage Paths
    l2_cache_dir=".cache/proxies",
    l3_database_path=".cache/db/proxywhirl.db",

    # Encryption
    encryption_key=SecretStr("your-32-byte-url-safe-base64-key"),

    # Health Integration
    health_check_invalidation=True,
    failure_threshold=3,

    # Performance Tuning
    enable_statistics=True,
    statistics_interval=5
)

# SQLite backend for large caches (>10K entries)
large_cache_config = CacheConfig(
    l2_backend=L2BackendType.SQLITE,
    l2_config=CacheTierConfig(max_entries=50000)
)

Fields

Tier Configuration:

  • l1_config (CacheTierConfig): L1 (Memory) configuration (default: max_entries=1000)

  • l2_config (CacheTierConfig): L2 (Disk) configuration (default: max_entries=5000)

  • l2_backend (L2BackendType): L2 storage backend - “jsonl” or “sqlite” (default: JSONL)

  • l3_config (CacheTierConfig): L3 (SQLite) configuration (default: max_entries=None)

TTL Configuration:

  • default_ttl_seconds (int): Default TTL for cached proxies (≥60, default: 3600)

  • ttl_cleanup_interval (int): Background cleanup interval (≥10, default: 60)

  • enable_background_cleanup (bool): Enable background TTL cleanup thread (default: False)

  • cleanup_interval_seconds (int): Interval between cleanup runs (≥5, default: 60)

  • per_source_ttl (dict[str, int]): Per-source TTL overrides (default: empty dict)

Storage Paths:

  • l2_cache_dir (str): Directory for L2 cache (JSONL shards or SQLite database) (default: “.cache/proxies”)

  • l3_database_path (str): SQLite database path for L3 (default: “.cache/db/proxywhirl.db”)

Encryption:

  • encryption_key (SecretStr | None): Fernet encryption key (from env: PROXYWHIRL_CACHE_ENCRYPTION_KEY)

Health Integration:

  • health_check_invalidation (bool): Auto-invalidate on health check failure (default: True)

  • failure_threshold (int): Failures before health invalidation (≥1, default: 3)

Performance Tuning:

  • enable_statistics (bool): Track cache statistics (default: True)

  • statistics_interval (int): Stats aggregation interval (≥1, default: 5)


CacheTierConfig

Configuration for a single cache tier.

class CacheTierConfig

Pydantic model that defines capacity, eviction policy, and enable/disable state for one tier (L1, L2, or L3).

Example:

from proxywhirl.cache import CacheTierConfig

config = CacheTierConfig(
    enabled=True,
    max_entries=1000,
    eviction_policy="lru"  # "lru", "lfu", or "fifo"
)

Fields

  • enabled (bool): Enable this tier (default: True)

  • max_entries (int | None): Max entries (None=unlimited, default: None)

  • eviction_policy (str): Eviction policy: “lru”, “lfu”, or “fifo” (default: “lru”)

Validators

classmethod validate_policy(v: str) str

Validate eviction policy is supported.

Parameters:

v – Policy name to validate

Raises:

ValueError – If policy is not one of [“lru”, “lfu”, “fifo”]

Returns:

Validated policy name
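
The validation behavior can be reproduced standalone; the real validator is a Pydantic field validator, but the core check reduces to a set membership test:

```python
# Standalone sketch of the eviction-policy check described above.
# The real implementation is a Pydantic classmethod validator.
VALID_POLICIES = {"lru", "lfu", "fifo"}

def validate_policy(v: str) -> str:
    if v not in VALID_POLICIES:
        raise ValueError(
            f"eviction_policy must be one of {sorted(VALID_POLICIES)}, got {v!r}"
        )
    return v

assert validate_policy("lru") == "lru"
```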


CacheStatistics

Aggregate cache statistics across all tiers.

class CacheStatistics

Pydantic model that combines tier-level statistics and tracks cross-tier operations like promotions and demotions.

Example:

from proxywhirl.cache import CacheStatistics

stats = CacheStatistics()
stats.l1_stats.hits = 100
stats.l1_stats.misses = 20

print(f"L1 hit rate: {stats.l1_stats.hit_rate:.2%}")
print(f"Overall hit rate: {stats.overall_hit_rate:.2%}")
print(f"Total size: {stats.total_size}")

# Export to monitoring
metrics = stats.to_metrics_dict()

Fields

Per-Tier Statistics:

  • l1_stats (TierStatistics): L1 statistics (default: empty TierStatistics)

  • l2_stats (TierStatistics): L2 statistics (default: empty TierStatistics)

  • l3_stats (TierStatistics): L3 statistics (default: empty TierStatistics)

Cross-Tier Operations:

  • promotions (int): L3→L2→L1 promotions (≥0, default: 0)

  • demotions (int): L1→L2→L3 demotions (≥0, default: 0)

Degradation Tracking:

  • l1_degraded (bool): L1 tier unavailable (default: False)

  • l2_degraded (bool): L2 tier unavailable (default: False)

  • l3_degraded (bool): L3 tier unavailable (default: False)

Computed Properties

property overall_hit_rate: float

Overall hit rate across all tiers (0.0 to 1.0).

Uses max of per-tier misses to avoid triple-counting misses that cascade through L1→L2→L3 lookups.
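
A numeric sketch of that de-duplication, assuming hits are summed across tiers while misses are aggregated with max(): one lookup that misses every tier appears as a miss in L1, L2, and L3, so summing misses would count it three times. The exact aggregation in the implementation may differ in detail.

```python
# Assumed aggregation: sum hits, take max of misses to de-duplicate
# a single lookup that cascaded through (and missed) all three tiers.
l1_hits, l1_misses = 100, 20
l2_hits, l2_misses = 0, 20   # the same 20 lookups cascaded to L2
l3_hits, l3_misses = 0, 20   # ...and then to L3

total_hits = l1_hits + l2_hits + l3_hits
deduped_misses = max(l1_misses, l2_misses, l3_misses)

overall_hit_rate = total_hits / (total_hits + deduped_misses)  # 100 / 120
```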

property total_size: int

Total cached entries across all tiers.

Methods

to_metrics_dict() dict[str, float]

Convert to flat metrics dict for monitoring systems.

Returns:

Dictionary with metric names and float values

Example:

metrics = stats.to_metrics_dict()
# {
#     "cache.l1.hit_rate": 0.85,
#     "cache.l2.hit_rate": 0.60,
#     "cache.l3.hit_rate": 0.40,
#     "cache.overall.hit_rate": 0.75,
#     "cache.total_size": 1500.0,
#     "cache.promotions": 250.0,
#     "cache.demotions": 150.0,
#     "cache.l1.size": 1000.0,
#     "cache.l2.size": 450.0,
#     "cache.l3.size": 50.0
# }

TierStatistics

Statistics for a single cache tier.

class TierStatistics

Pydantic model that tracks hits, misses, evictions by reason, and computes hit rate.

Example:

from proxywhirl.cache import TierStatistics

stats = TierStatistics(hits=100, misses=20)
print(f"Hit rate: {stats.hit_rate:.2%}")  # 83.33%
print(f"Total evictions: {stats.total_evictions}")

Fields

  • hits (int): Cache hits (≥0, default: 0)

  • misses (int): Cache misses (≥0, default: 0)

  • current_size (int): Current number of entries (≥0, default: 0)

  • evictions_lru (int): LRU evictions (≥0, default: 0)

  • evictions_ttl (int): TTL-based evictions (≥0, default: 0)

  • evictions_health (int): Health-based evictions (≥0, default: 0)

  • evictions_corruption (int): Corruption-based evictions (≥0, default: 0)

Computed Properties

property hit_rate: float

Cache hit rate (0.0 to 1.0).

Formula:

hits / (hits + misses) if total > 0, else 0.0

property total_evictions: int

Total evictions across all reasons.

Formula:

evictions_lru + evictions_ttl + evictions_health + evictions_corruption


HealthStatus (Enum)

Proxy health status for cache entries (imported from proxywhirl.models).

class HealthStatus

String enum representing proxy health status with 5 states.

Values:

  • UNKNOWN = "unknown" - Not yet tested (default)

  • HEALTHY = "healthy" - Working normally

  • DEGRADED = "degraded" - Partial functionality (some failures)

  • UNHEALTHY = "unhealthy" - Experiencing issues (many failures)

  • DEAD = "dead" - Not responding (completely unusable)

Example:

from proxywhirl.cache import HealthStatus

status = HealthStatus.HEALTHY
print(status.value)  # "healthy"

# All 5 states are available
for state in HealthStatus:
    print(f"{state.name}: {state.value}")

CacheTierType (Enum)

Type of cache tier.

class CacheTierType

String enum representing cache tier types.

Values:

  • L1 = "l1" - Memory tier

  • L2 = "l2" - Disk tier

  • L3 = "l3" - SQLite tier

Example:

from proxywhirl.cache import CacheTierType

tier = CacheTierType.L1
print(tier.value)  # "l1"

L2BackendType (Enum)

L2 cache backend type selection.

class L2BackendType

String enum for selecting the L2 disk cache storage backend.

Values:

  • JSONL = "jsonl" - File-based JSONL with sharding (default, best for <10K entries)

  • SQLITE = "sqlite" - SQLite database (faster for >10K entries)

Example:

from proxywhirl.cache import CacheConfig, L2BackendType

# Default JSONL backend
config = CacheConfig()
assert config.l2_backend == L2BackendType.JSONL

# SQLite backend for large caches
config = CacheConfig(l2_backend=L2BackendType.SQLITE)

When to use each backend:

Backend   Best For       Performance        Features
-------   ------------   ----------------   ------------------------------------------
JSONL     <10K entries   O(n) lookups       Human-readable, portable, simple debugging
SQLite    >10K entries   O(log n) lookups   Indexed queries, faster batch operations


Tier Implementations

CacheTier (Abstract Base Class)

Abstract base class for cache tier implementations.

class CacheTier

Defines the interface that all cache tiers (L1, L2, L3) must implement, including graceful degradation on repeated failures.

Attributes:

  • config (CacheTierConfig) - Configuration for this tier

  • tier_type (TierType) - Type of tier (L1/L2/L3)

  • enabled (bool) - Whether tier is operational

  • failure_count (int) - Consecutive failures for degradation tracking

  • failure_threshold (int) - Failures before auto-disabling tier (default: 3)

Constructor

__init__(config: CacheTierConfig, tier_type: TierType) None

Initialize cache tier with configuration.

Parameters:
  • config – Configuration for this tier

  • tier_type – Type of tier (L1/L2/L3)

Abstract Methods

abstractmethod get(key: str) CacheEntry | None

Retrieve entry by key, None if not found or expired.

Parameters:

key – Cache key to lookup

Returns:

CacheEntry if found and valid, None otherwise

abstractmethod put(key: str, entry: CacheEntry) bool

Store entry, return True if successful.

Parameters:
  • key – Cache key for entry

  • entry – CacheEntry to store

Returns:

True if stored successfully, False otherwise

abstractmethod delete(key: str) bool

Remove entry by key, return True if existed.

Parameters:

key – Cache key to delete

Returns:

True if entry existed and was deleted, False if not found

abstractmethod clear() int

Clear all entries, return count of removed entries.

Returns:

Number of entries removed

abstractmethod size() int

Return current number of entries.

Returns:

Number of entries in tier

abstractmethod keys() list[str]

Return list of all keys.

Returns:

List of cache keys

abstractmethod cleanup_expired() int

Remove all expired entries in bulk.

Returns:

Number of entries removed

Concrete Methods

handle_failure(error: Exception) None

Handle tier operation failure for graceful degradation.

Increments failure count and disables tier if threshold exceeded. Called by implementations when operations fail.

Parameters:

error – Exception that occurred

reset_failures() None

Reset failure count on successful operation.

Re-enables tier if previously disabled and resets failure counter. Implementations should call this after successful operations.
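
The degradation pattern these two methods implement can be shown in isolation: failures increment a counter, the tier disables itself once the threshold is crossed, and a success resets and re-enables it. This sketch mirrors the described behavior without the real base class:

```python
# Standalone sketch of the graceful-degradation pattern described above.
class DegradingTier:
    def __init__(self, failure_threshold: int = 3):
        self.enabled = True
        self.failure_count = 0
        self.failure_threshold = failure_threshold

    def handle_failure(self, error: Exception) -> None:
        self.failure_count += 1
        if self.failure_count >= self.failure_threshold:
            self.enabled = False   # degrade: skip this tier until recovery

    def reset_failures(self) -> None:
        self.failure_count = 0
        self.enabled = True        # re-enable after a successful operation

tier = DegradingTier()
for _ in range(3):
    tier.handle_failure(IOError("disk unavailable"))
assert tier.enabled is False
tier.reset_failures()
assert tier.enabled is True and tier.failure_count == 0
```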


MemoryCacheTier

L1 in-memory cache using OrderedDict for LRU tracking.

class MemoryCacheTier(CacheTier)

Provides O(1) lookups with automatic LRU eviction when max_entries exceeded.

Example:

from proxywhirl.cache.tiers import MemoryCacheTier, TierType
from proxywhirl.cache import CacheTierConfig

config = CacheTierConfig(max_entries=1000, eviction_policy="lru")
tier = MemoryCacheTier(config, TierType.L1_MEMORY)

# Store entry
tier.put(key, entry)

# Retrieve entry (moves to end for LRU)
cached = tier.get(key)

# Delete entry
deleted = tier.delete(key)

# Get all keys
keys = tier.keys()

# Get size
size = tier.size()

# Clear all
cleared = tier.clear()

# Cleanup expired
removed = tier.cleanup_expired()

Constructor

__init__(config: CacheTierConfig, tier_type: TierType, on_evict: Callable[[str, CacheEntry], None] | None = None) None

Initialize memory cache with LRU tracking.

Parameters:
  • config – Tier configuration

  • tier_type – Type of tier (L1/L2/L3)

  • on_evict – Optional callback when entry is evicted (key, entry)

Features

  • O(1) lookups

  • Automatic LRU eviction when max_entries exceeded

  • Thread-safe with failure tracking

  • No persistence

  • Callbacks on eviction for demotion to L2
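
The OrderedDict-based LRU mechanics with an eviction callback can be sketched standalone. LruSketch is a hypothetical illustration of the behavior described above, not the real class:

```python
from collections import OrderedDict

# Sketch of OrderedDict-based LRU with an eviction callback,
# as used to demote evicted L1 entries into L2.
class LruSketch:
    def __init__(self, max_entries, on_evict=None):
        self._data = OrderedDict()
        self.max_entries = max_entries
        self.on_evict = on_evict

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            old_key, old_val = self._data.popitem(last=False)  # evict LRU
            if self.on_evict:
                self.on_evict(old_key, old_val)  # e.g. demote to L2

demoted = []
l1 = LruSketch(max_entries=2, on_evict=lambda k, v: demoted.append(k))
l1.put("a", 1)
l1.put("b", 2)
l1.get("a")        # "a" becomes most recently used
l1.put("c", 3)     # evicts "b", the least recently used
assert demoted == ["b"]
```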


JsonlCacheTier

L2 file-based cache using sharded JSONL files with encryption.

class JsonlCacheTier(CacheTier)

File-based cache tier using JSON Lines format with consistent-hash sharding. Best for <10K entries. Human-readable, portable, and git-friendly.

Uses sharded JSONL files with:

  • Consistent hash sharding (default 16 shards)

  • In-memory index for O(1) key→shard lookups

  • File locking (portalocker) for concurrent access safety

  • Fernet encryption for credentials at rest

  • Human-readable JSON Lines format

Example:

from proxywhirl.cache.tiers import JsonlCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CredentialEncryptor
from pathlib import Path

config = CacheTierConfig(max_entries=5000, eviction_policy="lru")
encryptor = CredentialEncryptor()
cache_dir = Path(".cache/proxies")

tier = JsonlCacheTier(
    config=config,
    tier_type=TierType.L2_FILE,
    cache_dir=cache_dir,
    encryptor=encryptor,
    num_shards=16  # Default
)

# Store entry (writes to appropriate shard file)
tier.put(key, entry)

# Retrieve entry (uses in-memory index for O(1) shard lookup)
cached = tier.get(key)

# Delete entry
deleted = tier.delete(key)

# Get all keys (from index)
keys = tier.keys()

# Get size
size = tier.size()

# Clear all (removes all shard files)
cleared = tier.clear()

# Cleanup expired entries
removed = tier.cleanup_expired()

Constructor

__init__(config: CacheTierConfig, tier_type: TierType, cache_dir: Path, encryptor: CredentialEncryptor | None = None, num_shards: int = 16) None

Initialize JSONL file cache with sharding and encryption.

Parameters:
  • config – Tier configuration

  • tier_type – Type of tier (L1/L2/L3)

  • cache_dir – Directory for shard files

  • encryptor – Optional encryptor for credentials

  • num_shards – Number of shard files (default: 16)

File Structure

.cache/proxies/
├── shard_00.jsonl
├── shard_01.jsonl
├── ...
└── shard_15.jsonl

Each shard file contains JSON Lines entries:

{"key": "abc123", "proxy_url": "http://proxy:8080", "source": "free-proxy-list", "ttl_seconds": 3600, ...}
{"key": "def456", "proxy_url": "socks5://proxy:1080", "source": "geonode", "ttl_seconds": 7200, ...}
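
Key-to-shard assignment can be sketched as a hash modulo the shard count; the exact hash function the implementation uses is an assumption here, but any stable hash yields the consistent mapping described above:

```python
import hashlib

# Hedged sketch of key -> shard-file mapping (the real implementation's
# hash choice may differ; SHA-256 is assumed for illustration).
def shard_for(key: str, num_shards: int = 16) -> str:
    shard = int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_shards
    return f"shard_{shard:02d}.jsonl"

name = shard_for("abc123")
assert name.startswith("shard_") and name.endswith(".jsonl")
```

The same key always maps to the same shard, so lookups only ever touch one file.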

Features

  • Human-readable JSON Lines format

  • Portable (can copy/move files)

  • Git-friendly for version control

  • Consistent-hash sharding for distribution

  • In-memory index for fast lookups

  • File locking for concurrent access

  • Encrypted credentials at rest

  • Best for <10K entries

When to Use JSONL vs SQLite

Factor             JSONL (JsonlCacheTier)   SQLite (DiskCacheTier)
-----------------  ----------------------   ----------------------
Entry count        <10K entries             >10K entries
Lookup speed       O(n) per shard           O(log n) indexed
Portability        Copy files anywhere      Single .db file
Git-friendly       Yes                      Not recommended
Human-readable     Yes                      No (binary)
Concurrent writes  File locking             WAL mode


DiskCacheTier

L2 SQLite-based cache with encryption and indexed lookups.

class DiskCacheTier(CacheTier)

Optimized for >10K entries using SQLite with B-tree indexes instead of JSONL. Provides O(log n) lookups vs O(n) for JSONL, achieving <10ms reads for 10K+ entries.

Uses a lightweight SQLite database with:

  • Primary key index on cache key for fast lookups

  • Encrypted credentials stored as BLOB

  • Efficient bulk operations (cleanup, size, keys)

  • File-based persistence without complex sharding

Example:

from proxywhirl.cache.tiers import DiskCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CredentialEncryptor
from pathlib import Path

config = CacheTierConfig(max_entries=5000, eviction_policy="lru")
encryptor = CredentialEncryptor()
cache_dir = Path(".cache/proxies")

tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor)

# Same interface as MemoryCacheTier
tier.put(key, entry)
cached = tier.get(key)

Constructor

__init__(config: CacheTierConfig, tier_type: TierType, cache_dir: Path, encryptor: CredentialEncryptor | None = None) None

Initialize SQLite-based L2 cache.

Parameters:
  • config – Tier configuration

  • tier_type – Type of tier (should be L2_FILE)

  • cache_dir – Directory for cache database

  • encryptor – Credential encryptor for username/password

Methods

migrate_from_jsonl(jsonl_dir: Path | None = None) int

Migrate existing JSONL shard files to SQLite L2 cache.

This method provides a migration path from the old JSONL-based L2 cache to the new SQLite-based implementation. It reads all shard_*.jsonl files from the specified directory and imports them into the SQLite database.

Parameters:

jsonl_dir – Directory containing shard_*.jsonl files (defaults to self.cache_dir)

Returns:

Number of entries successfully migrated

Example:

tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir)
migrated = tier.migrate_from_jsonl()
print(f"Migrated {migrated} entries from JSONL to SQLite")

close() None

Close the persistent SQLite connection and release database resources.

Should be called when the cache tier is no longer needed to properly release database resources and file locks. Safe to call multiple times. Thread-safe via internal lock.

Example:

tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor)
try:
    tier.put(key, entry)
    cached = tier.get(key)
finally:
    tier.close()

Features

  • O(log n) indexed lookups using SQLite B-tree

  • Encrypted credential storage (BLOB fields)

  • Atomic operations with SQLite transactions

  • Efficient bulk cleanup using SQL DELETE

  • Simple file-based persistence (single .db file)

  • Automatic schema initialization

Database Schema

CREATE TABLE l2_cache (
    key TEXT PRIMARY KEY,
    proxy_url TEXT NOT NULL,
    username_encrypted BLOB,
    password_encrypted BLOB,
    source TEXT NOT NULL,
    fetch_time REAL NOT NULL,
    last_accessed REAL NOT NULL,
    access_count INTEGER DEFAULT 0,
    ttl_seconds INTEGER NOT NULL,
    expires_at REAL NOT NULL,
    health_status TEXT DEFAULT 'unknown',
    failure_count INTEGER DEFAULT 0,
    evicted_from_l1 INTEGER DEFAULT 0
);

CREATE INDEX idx_l2_expires_at ON l2_cache(expires_at);
CREATE INDEX idx_l2_source ON l2_cache(source);

SQLiteCacheTier

L3 SQLite database cache with encrypted credentials and health history.

class SQLiteCacheTier(CacheTier)

Provides durable persistence with SQL indexing for fast lookups and comprehensive health history tracking.

Example:

from proxywhirl.cache.tiers import SQLiteCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CredentialEncryptor
from pathlib import Path

config = CacheTierConfig(max_entries=None, eviction_policy="lru")  # Unlimited
encryptor = CredentialEncryptor()
db_path = Path(".cache/db/proxywhirl.db")

tier = SQLiteCacheTier(config, TierType.L3_SQLITE, db_path, encryptor)

# Same interface as other tiers
tier.put(key, entry)
cached = tier.get(key)

# Optimized bulk cleanup with SQL DELETE
removed = tier.cleanup_expired()  # single SQL DELETE instead of a per-entry scan

Constructor

__init__(config: CacheTierConfig, tier_type: TierType, db_path: Path, encryptor: CredentialEncryptor | None = None) None

Initialize SQLite cache.

Parameters:
  • config – Tier configuration

  • tier_type – Type of tier (should be L3_SQLITE)

  • db_path – Path to SQLite database file

  • encryptor – Credential encryptor for username/password

Features

  • Full persistence

  • SQL indexing for fast lookups

  • Health history tracking with separate table

  • Automatic schema migration

  • Optimized bulk cleanup (single SQL DELETE instead of per-entry scans)

  • Credential encryption with BLOB storage

  • Foreign key constraints

Database Schema

CREATE TABLE cache_entries (
    key TEXT PRIMARY KEY,
    proxy_url TEXT NOT NULL,
    username_encrypted BLOB,
    password_encrypted BLOB,
    source TEXT NOT NULL,
    fetch_time REAL NOT NULL,
    last_accessed REAL NOT NULL,
    access_count INTEGER DEFAULT 0,
    ttl_seconds INTEGER NOT NULL,
    expires_at REAL NOT NULL,
    health_status TEXT DEFAULT 'unknown',
    failure_count INTEGER DEFAULT 0,
    created_at REAL NOT NULL,
    updated_at REAL NOT NULL,
    -- Health monitoring fields
    last_health_check REAL,
    consecutive_health_failures INTEGER DEFAULT 0,
    consecutive_health_successes INTEGER DEFAULT 0,
    recovery_attempt INTEGER DEFAULT 0,
    next_check_time REAL,
    last_health_error TEXT,
    total_health_checks INTEGER DEFAULT 0,
    total_health_check_failures INTEGER DEFAULT 0,
    evicted_from_l1 INTEGER DEFAULT 0
);

CREATE TABLE health_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    proxy_key TEXT NOT NULL,
    check_time REAL NOT NULL,
    status TEXT NOT NULL,
    response_time_ms REAL,
    error_message TEXT,
    check_url TEXT NOT NULL,
    FOREIGN KEY (proxy_key) REFERENCES cache_entries(key) ON DELETE CASCADE
);

-- Indexes
CREATE INDEX idx_expires_at ON cache_entries(expires_at);
CREATE INDEX idx_source ON cache_entries(source);
CREATE INDEX idx_health_status ON cache_entries(health_status);
CREATE INDEX idx_last_accessed ON cache_entries(last_accessed);
CREATE INDEX idx_health_history_proxy ON health_history(proxy_key);
CREATE INDEX idx_health_history_time ON health_history(check_time);

Utilities

CredentialEncryptor

Warning

If no encryption key is provided and the PROXYWHIRL_CACHE_ENCRYPTION_KEY environment variable is not set, a new key is generated automatically. This means cached data encrypted with a previous key will be unreadable. Always persist your encryption key for production use.

Handles encryption/decryption of proxy credentials using Fernet symmetric encryption (AES-128-CBC + HMAC). Supports key rotation via MultiFernet: set PROXYWHIRL_CACHE_KEY_PREVIOUS to the old key when rotating, allowing decryption of data encrypted with either key while new encryptions use the current key.

class CredentialEncryptor

Provides Fernet symmetric encryption for proxy credentials at rest (L2/L3 tiers). Uses environment variable PROXYWHIRL_CACHE_ENCRYPTION_KEY for key management.

Example:

from proxywhirl.cache import CredentialEncryptor
from pydantic import SecretStr
import os

# Option 1: Use environment variable
os.environ["PROXYWHIRL_CACHE_ENCRYPTION_KEY"] = "your-32-byte-url-safe-base64-key"
encryptor = CredentialEncryptor()

# Option 2: Provide key directly
from cryptography.fernet import Fernet
key = Fernet.generate_key()
encryptor = CredentialEncryptor(key=key)

# Encrypt credentials
plaintext = SecretStr("mypassword")
encrypted = encryptor.encrypt(plaintext)  # bytes

# Decrypt credentials
decrypted = encryptor.decrypt(encrypted)  # SecretStr
print(decrypted.get_secret_value())  # "mypassword"

Constructor

__init__(key: bytes | None = None) None

Initialize encryptor with Fernet key.

Parameters:

key – Optional Fernet key (32 url-safe base64-encoded bytes). If None, reads from PROXYWHIRL_CACHE_ENCRYPTION_KEY env var. If env var not set, generates a new key (WARNING: regenerated keys cannot decrypt existing cached data).

Raises:

ValueError – If provided key is invalid for Fernet

Attributes:

  • key (bytes) - Fernet encryption key

  • _cipher (Fernet) - Fernet cipher instance

Methods

encrypt(secret: SecretStr) bytes

Encrypt a SecretStr to bytes.

Parameters:

secret – SecretStr containing plaintext to encrypt

Returns:

Encrypted bytes suitable for storage in BLOB fields

Raises:

ValueError – If encryption fails

Example:

encrypted = encryptor.encrypt(SecretStr("password123"))
# b'gAAAAA...'

decrypt(encrypted: bytes) SecretStr

Decrypt encrypted bytes back to SecretStr.

Parameters:

encrypted – Encrypted bytes from storage

Returns:

SecretStr containing decrypted plaintext (never logs value)

Raises:

ValueError – If decryption fails (wrong key, corrupted data)

Example:

decrypted = encryptor.decrypt(encrypted_bytes)
print(decrypted.get_secret_value())  # "password123"

CacheManager

Main orchestrator for multi-tier proxy caching with automatic promotion/demotion, TTL management, and health-based invalidation.

class CacheManager

Manages caching across three tiers:

  • L1 (Memory): Fast in-memory cache using OrderedDict (LRU)

  • L2 (Disk): Persistent cache with configurable backend (JSONL or SQLite)

  • L3 (SQLite): Database cache for cold storage with full queryability

Supports TTL-based expiration, health-based invalidation, and graceful degradation when tiers fail. Thread-safe via threading.RLock.

Example:

from proxywhirl.cache import CacheManager, CacheConfig, CacheEntry, HealthStatus
from datetime import datetime, timezone, timedelta

config = CacheConfig()
manager = CacheManager(config)

# Store an entry
entry = CacheEntry(
    key="abc123",
    proxy_url="http://proxy.example.com:8080",
    source="api",
    fetch_time=datetime.now(timezone.utc),
    last_accessed=datetime.now(timezone.utc),
    ttl_seconds=3600,
    expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600),
    health_status=HealthStatus.HEALTHY
)
manager.put(entry.key, entry)

# Retrieve (promotes to higher tiers on hit)
retrieved = manager.get(entry.key)

# Delete from all tiers
manager.delete(entry.key)

# Statistics
stats = manager.get_statistics()
print(f"Overall hit rate: {stats.overall_hit_rate:.2%}")

# Export/import
manager.export_to_file("proxies.jsonl")
manager.warm_from_file("proxies.jsonl", ttl_override=3600)

Constructor

__init__(config: CacheConfig) None

Initialize cache manager with configuration.

Parameters:

config – Cache configuration with tier settings (required)

Initializes L1 (memory), L2 (disk), and L3 (SQLite) tiers based on config. Starts background TTL cleanup if enable_background_cleanup is True.

Methods

get(key: str) CacheEntry | None

Retrieve entry from cache with tier promotion.

Checks L1 → L2 → L3 in order. Promotes entries to higher tiers on hit. Updates access_count and last_accessed on successful retrieval. Expired entries are automatically deleted from all tiers.

Parameters:

key – Cache key to retrieve

Returns:

CacheEntry if found and not expired, None otherwise

put(key: str, entry: CacheEntry) bool

Store entry in all enabled tiers.

Writes to all tiers for redundancy. Credentials are automatically redacted in logs.

Parameters:
  • key – Cache key

  • entry – CacheEntry to store

Returns:

True if stored in at least one tier, False otherwise

delete(key: str) bool

Delete entry from all tiers.

Parameters:

key – Cache key to delete

Returns:

True if deleted from at least one tier, False if not found

clear() int

Clear all entries from all tiers.

Returns:

Total number of entries cleared

invalidate_by_health(key: str) None

Mark proxy as unhealthy and evict if failure threshold reached.

Increments the failure_count and sets health_status to UNHEALTHY. If failure_count reaches the configured failure_threshold, the proxy is removed from all cache tiers.

Parameters:

key – Cache key to invalidate

get_statistics() CacheStatistics

Get current cache statistics.

Returns:

CacheStatistics with hit rates, sizes, and tier degradation status

export_to_file(filepath: str) dict[str, int]

Export all cache entries to a JSONL file.

Parameters:

filepath – Path to export file

Returns:

Dict with exported and failed counts

warm_from_file(file_path: str, ttl_override: int | None = None) dict[str, int]

Load proxies from a file to pre-populate the cache.

Supports JSON (array), JSONL (newline-delimited), and CSV formats. Invalid entries are skipped with warnings logged.

Parameters:
  • file_path – Path to file containing proxy data

  • ttl_override – Optional TTL in seconds (overrides default_ttl_seconds)

Returns:

Dict with loaded, skipped, and failed counts
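The skip-invalid-entries behavior can be sketched for the JSONL case (hypothetical standalone code; the field name proxy_url and the exact validation rules are assumptions, and the real method also handles JSON arrays and CSV):

```python
import json

def warm_counts_jsonl(path):
    """Sketch: count loadable vs skipped lines in a JSONL warm file."""
    counts = {"loaded": 0, "skipped": 0}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # blank lines are ignored outright
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                counts["skipped"] += 1  # malformed line: skip, log a warning
                continue
            if isinstance(record, dict) and "proxy_url" in record:
                counts["loaded"] += 1
            else:
                counts["skipped"] += 1  # missing required field
    return counts
```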

static generate_cache_key(proxy_url: str) → str

Generate cache key from proxy URL using SHA256 hash.

Parameters:

proxy_url – Proxy URL to hash

Returns:

Hex-encoded SHA256 hash (first 16 chars)
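The described hashing is straightforward to reproduce with hashlib. A minimal sketch follows; whether it matches the library's keys byte-for-byte depends on encoding details not stated here, so treat it as illustrative:

```python
import hashlib

def cache_key_sketch(proxy_url: str) -> str:
    """Sketch: SHA-256 hex digest truncated to the first 16 characters."""
    return hashlib.sha256(proxy_url.encode("utf-8")).hexdigest()[:16]

key = cache_key_sketch("http://proxy.example.com:8080")
# Deterministic: the same URL always yields the same 16-char key.
```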


Crypto Utilities

The proxywhirl.cache.crypto module provides helper functions for encryption key management and rotation.

from proxywhirl.cache.crypto import get_encryption_keys, create_multi_fernet, rotate_key

get_encryption_keys() -> list[bytes]

Get all valid encryption keys for MultiFernet. Returns keys in priority order: current key first, then previous key. Reads from PROXYWHIRL_CACHE_ENCRYPTION_KEY and PROXYWHIRL_CACHE_KEY_PREVIOUS environment variables. Generates a new key if no env vars are set.

create_multi_fernet() -> MultiFernet

Create a MultiFernet instance with all valid encryption keys. MultiFernet tries keys in order for decryption (newest first). All new encryptions use the first (current) key.

rotate_key(new_key: str) -> None

Rotate encryption keys by setting a new current key. Moves the current PROXYWHIRL_CACHE_ENCRYPTION_KEY to PROXYWHIRL_CACHE_KEY_PREVIOUS and sets the new key as current. This allows gradual migration: new data uses the new key, old data can still be decrypted with the previous key.

from cryptography.fernet import Fernet
from proxywhirl.cache.crypto import rotate_key

# Generate new key and rotate
new_key = Fernet.generate_key().decode()
rotate_key(new_key)
# Old data remains readable via PROXYWHIRL_CACHE_KEY_PREVIOUS

TTLManager

Manages TTL-based expiration with hybrid lazy + background cleanup. Used internally by CacheManager when enable_background_cleanup=True.

class TTLManager

Combines two cleanup strategies:

  • Lazy expiration: Check TTL on every get() operation

  • Background cleanup: Periodic scan of all tiers to remove expired entries

Example:

from proxywhirl.cache.manager import TTLManager, CacheManager
from proxywhirl.cache import CacheConfig

config = CacheConfig(enable_background_cleanup=False)
manager = CacheManager(config)

# Manually create and start TTL manager
ttl_mgr = TTLManager(manager, cleanup_interval=60)
ttl_mgr.start()

# ... later ...
ttl_mgr.stop()

Constructor

__init__(cache_manager: CacheManager, cleanup_interval: int = 60) → None
Parameters:
  • cache_manager – Parent CacheManager instance

  • cleanup_interval – Seconds between cleanup runs (default: 60)

Methods

start() None

Start background cleanup thread. Idempotent.

stop() None

Stop background cleanup thread. Safe to call if not running.

Attributes

  • enabled (bool): Whether background cleanup is running

  • cleanup_interval (int): Seconds between cleanup runs
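The hybrid strategy can be illustrated with a minimal standalone sketch (hypothetical code, not the library's TTLManager): the get() path drops expired entries lazily, while a daemon thread sweeps the whole store on a fixed interval, and start() is idempotent as described above.

```python
import threading
import time

class TTLSketch:
    """Sketch of hybrid lazy + background TTL expiration."""

    def __init__(self, cleanup_interval=60):
        self._store = {}  # key -> (value, expires_at as monotonic time)
        self.cleanup_interval = cleanup_interval
        self._stop = threading.Event()
        self._thread = None

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:  # lazy expiration on read
            self._store.pop(key, None)
            return None
        return value

    def _sweep(self):
        while not self._stop.wait(self.cleanup_interval):
            now = time.monotonic()
            expired = [k for k, (_, exp) in list(self._store.items()) if now >= exp]
            for k in expired:  # background cleanup pass
                self._store.pop(k, None)

    def start(self):
        if self._thread is None or not self._thread.is_alive():  # idempotent
            self._stop.clear()
            self._thread = threading.Thread(target=self._sweep, daemon=True)
            self._thread.start()

    def stop(self):
        self._stop.set()  # safe to call even if never started
```

Lazy expiration alone keeps reads correct; the background sweep exists so that entries nobody reads still get reclaimed.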


Usage Examples

Working with Cache Tiers Directly

from proxywhirl.cache.tiers import MemoryCacheTier, DiskCacheTier, SQLiteCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CacheEntry, CredentialEncryptor, HealthStatus
from datetime import datetime, timezone, timedelta
from pathlib import Path
from pydantic import SecretStr

# Initialize tiers
config = CacheTierConfig(max_entries=1000, eviction_policy="lru")
encryptor = CredentialEncryptor()

l1 = MemoryCacheTier(config, TierType.L1_MEMORY)
l2 = DiskCacheTier(config, TierType.L2_FILE, Path(".cache/l2"), encryptor)
l3 = SQLiteCacheTier(config, TierType.L3_SQLITE, Path(".cache/l3.db"), encryptor)

# Create entry
entry = CacheEntry(
    key="proxy1",
    proxy_url="http://proxy.example.com:8080",
    username=SecretStr("user"),
    password=SecretStr("pass"),
    source="api",
    fetch_time=datetime.now(timezone.utc),
    last_accessed=datetime.now(timezone.utc),
    ttl_seconds=3600,
    expires_at=datetime.now(timezone.utc) + timedelta(seconds=3600),
    health_status=HealthStatus.HEALTHY
)

# Store in L1
l1.put(entry.key, entry)

# Retrieve from L1 (O(1) lookup)
cached = l1.get(entry.key)
if cached:
    print(f"L1 hit: {cached.proxy_url}")

# Store in L2 (persisted to disk)
l2.put(entry.key, entry)

# Retrieve from L2 (O(log n) SQLite lookup)
cached = l2.get(entry.key)
if cached:
    print(f"L2 hit: {cached.proxy_url}")

# Store in L3 (full database persistence)
l3.put(entry.key, entry)

# Cleanup expired entries
removed_l1 = l1.cleanup_expired()
removed_l2 = l2.cleanup_expired()
removed_l3 = l3.cleanup_expired()
print(f"Removed: L1={removed_l1}, L2={removed_l2}, L3={removed_l3}")

Encryption and Security

from proxywhirl.cache import CredentialEncryptor
from cryptography.fernet import Fernet
from pydantic import SecretStr
import os

# Generate and save encryption key
key = Fernet.generate_key()
os.environ["PROXYWHIRL_CACHE_ENCRYPTION_KEY"] = key.decode()

# Initialize encryptor
encryptor = CredentialEncryptor()

# Encrypt credentials
username = SecretStr("admin")
password = SecretStr("secret123")

encrypted_user = encryptor.encrypt(username)
encrypted_pass = encryptor.encrypt(password)

print(f"Encrypted username: {encrypted_user.hex()}")
print(f"Encrypted password: {encrypted_pass.hex()}")

# Decrypt credentials
decrypted_user = encryptor.decrypt(encrypted_user)
decrypted_pass = encryptor.decrypt(encrypted_pass)

print(f"Decrypted: {decrypted_user.get_secret_value()}")  # "admin"
# Password value never logged by SecretStr

Tip

If you have more than 10,000 cache entries, migrating from JSONL to SQLite L2 backend can significantly improve lookup performance (O(log n) vs O(n)).

Migration from JSONL to SQLite L2

from proxywhirl.cache.tiers import DiskCacheTier, TierType
from proxywhirl.cache import CacheTierConfig, CredentialEncryptor
from pathlib import Path

# Initialize new SQLite-based L2 tier
config = CacheTierConfig(max_entries=5000)
encryptor = CredentialEncryptor()
cache_dir = Path(".cache/proxies")

tier = DiskCacheTier(config, TierType.L2_FILE, cache_dir, encryptor)

# Migrate from old JSONL shards
migrated = tier.migrate_from_jsonl()
print(f"Successfully migrated {migrated} entries from JSONL to SQLite")

# Old JSONL files can now be safely removed
# for shard in cache_dir.glob("shard_*.jsonl"):
#     shard.unlink()

Performance Considerations

Tier Selection

L1 (Memory):

  • Fastest (O(1) lookup)

  • Limited capacity (default: 1000 entries)

  • Use for hot proxies

L2 (Disk/SQLite):

  • Medium speed (O(log n) indexed lookup)

  • Moderate capacity (default: 5000 entries)

  • Persistent across restarts

  • Use for warm proxies

L3 (SQLite):

  • Slower (database overhead, but indexed)

  • Unlimited capacity

  • Full health history tracking

  • Use for cold storage and analytics

Optimization Tips

  1. Tune tier sizes based on workload

  2. Enable background cleanup to avoid lazy cleanup overhead

  3. Use encryption for sensitive credentials in L2/L3

  4. Monitor failure rates for graceful degradation

  5. Leverage indexes in L2/L3 for fast queries


Thread Safety

All tier implementations use internal locking for thread-safe operations. The CacheTier base class provides handle_failure() and reset_failures() methods for graceful degradation tracking.


Error Handling

Tiers implement graceful degradation:

  • After 3 consecutive failures, the tier auto-disables (enabled = False)

  • A successful operation resets the failure counter

  • Operations on a disabled tier return failure immediately, without touching the backend

  • The parent cache manager can detect degraded tiers via tier.enabled
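This contract can be sketched in a few lines (hypothetical standalone code; the real base class exposes handle_failure() and reset_failures() as noted under Thread Safety, but the threshold constant and attribute layout here are assumptions):

```python
class DegradableTier:
    """Sketch of the graceful-degradation contract."""
    MAX_FAILURES = 3

    def __init__(self):
        self.enabled = True
        self.failure_count = 0

    def handle_failure(self):
        self.failure_count += 1
        if self.failure_count >= self.MAX_FAILURES:
            self.enabled = False  # auto-disable after 3 consecutive failures

    def reset_failures(self):
        self.failure_count = 0  # one success clears the streak
```

Because a success resets the counter, only genuinely consecutive failures disable a tier; intermittent errors never accumulate to the threshold.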


See Also