Prefix Cache Hit Rate Calculator

LLM Inference & Memory Systems DS practice problem on Onlearn.

Difficulty: medium.

Topics: Prefix Cache Hit Rate Calculator, Prefix Hashing, Cache Eviction Policies, Prompt Deduplication, Memory Fragmentation, Throughput Benchmarking, Distributed Systems, LLM Inference Optimization, Memory Management, Performance Engineering, Computational Complexity, KV Cache Management, Request Scheduling, Latency Modeling, Memory Hierarchy, Tokenization Strategies.

In large language model (LLM) serving systems, a prefix cache stores previously computed key value (KV) pairs for token sequences that have already been processed. When a new prompt shares a common prefix with a cached entry, the system can skip recomputing those tokens, significantly reducing latency and compute cost. Implement a function that calculates the prefix cache hit rate given a list of incoming tokenized prompts and a list of cached token prefixes. For each prompt, determine the longest cached prefix that exactly matches the beginning of that prompt. A cached prefix matches a prompt if the prompt starts with that exact sequence of token IDs. Sum up all matched (cached) tokens across all prompts and compute the hit rate as the ratio of cached tokens to total tokens. Input: prompts: A list of lists of integers, where each inner list represents a tokenized prompt. cache: A list of lists of integers, where each inner list represents a cached token prefix. Output: A dictionary with three keys: 'hit rate': The ratio of cached tokens to total tokens, rounded to 4 decimal places. 'cached tokens': The total number of tokens served from the cache (integer). 'total tokens': The total number of tokens across all prompts (integer). If there are no tokens (empty prompts list or all prompts are empty), hit rate should be 0.0.