TTFT, ITL, and TPS from a Token Timestamp Stream

LLM Inference & Memory Systems DS practice problem on Onlearn.

Difficulty: easy.

Topics: TTFT ITL and TPS from a Token Timestamp Stream, Time to First Token (TTFT), Inter-Token Latency (ITL), Tokens Per Second (TPS), KV Cache Management, Request Batching, LLM Inference Systems, Performance Engineering, Distributed Systems, Time Series Analysis, Software Telemetry, Latency Metrics, Throughput Optimization, Token Generation Pipelines, Resource Scheduling, System Observability.

When serving Large Language Models (LLMs), monitoring inference latency is crucial for maintaining quality of service. Three key metrics used to evaluate real-time token generation performance are:

TTFT (Time To First Token): The elapsed time from when a request is submitted until the first output token is produced. This reflects the model's prefill (prompt processing) latency.

ITL (Inter-Token Latency): The average time gap between consecutive generated tokens (excluding the initial prefill). This measures how smoothly and quickly the model decodes subsequent tokens.

TPS (Tokens Per Second): The overall output throughput, measuring how many tokens are produced per unit time from the user's perspective (from request submission to the last token).

Given a list of timestamps where the first element is the request start time and all subsequent elements are the times at which each output token was generated, implement a function that computes these three inference metrics. Your function should:

1. Accept a list of floats representing timestamps (minimum length 2)
2. Return a dictionary with keys 'ttft', 'tps', and 'itl' containing the corresponding metric values as floats
3. Handle the edge case where only a single token is generated (set ITL to 0.0 in that case)
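One possible way to turn the definitions above into code is sketched below. It is a minimal illustration, not a reference solution. The function name `compute_inference_metrics` is hypothetical, and the sketch assumes that timestamps are in seconds and sorted in non-decreasing order. Raising `ValueError` for lists shorter than two elements and returning a TPS of 0.0 when the total elapsed time is zero are also assumptions the problem statement does not specify.

```python
from typing import Dict, List


def compute_inference_metrics(timestamps: List[float]) -> Dict[str, float]:
    """Compute TTFT, TPS, and ITL from [request_start, token_1, token_2, ...].

    Hypothetical helper: assumes timestamps are in seconds and non-decreasing.
    """
    if len(timestamps) < 2:
        # Assumption: reject inputs below the stated minimum length of 2.
        raise ValueError("need the request start time plus at least one token time")

    start = timestamps[0]
    first_token = timestamps[1]
    last_token = timestamps[-1]
    num_tokens = len(timestamps) - 1  # every element after the first is a token

    # TTFT: request submission until the first output token.
    ttft = first_token - start

    # TPS: tokens produced over the whole request, from submission to last token.
    total_time = last_token - start
    # Assumption: report 0.0 instead of dividing by zero when no time elapsed.
    tps = num_tokens / total_time if total_time > 0 else 0.0

    # ITL: mean gap between consecutive tokens. The gaps telescope, so their
    # sum equals last_token - first_token, and there are num_tokens - 1 gaps.
    if num_tokens > 1:
        itl = (last_token - first_token) / (num_tokens - 1)
    else:
        itl = 0.0  # single-token edge case from the problem statement

    return {"ttft": float(ttft), "tps": float(tps), "itl": float(itl)}


# Request starts at t=0.0; four tokens arrive at 0.5, 0.75, 1.0, and 1.25 s.
print(compute_inference_metrics([0.0, 0.5, 0.75, 1.0, 1.25]))
# prints {'ttft': 0.5, 'tps': 3.2, 'itl': 0.25}
```

In this example TTFT is 0.5 s, the three decode gaps of 0.25 s each give an ITL of 0.25 s, and four tokens over 1.25 s give a TPS of 3.2. Note that TPS includes the prefill time while ITL excludes it, so the two are not simple reciprocals of each other.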