ML Pipeline DAG Scheduler with Critical Path Analysis

Data Pipelines, Monitoring & Reliability DS practice problem on Onlearn.

Difficulty: hard.

Topics: ML Pipeline DAG Scheduler with Critical Path Analysis, Critical Path Method, Topological Sorting, Backpressure Mechanisms, Task Dependency Resolution, Idempotency Constraints, Distributed Systems, Graph Theory, Operations Research, Software Engineering, Cloud Infrastructure, Task Scheduling Algorithms, Directed Acyclic Graph Topology, Resource Allocation Strategies, Pipeline Orchestration Patterns, Latency and Throughput Analysis.

In production MLOps systems, an ML pipeline consists of multiple interdependent tasks (data loading, feature engineering, training, evaluation, etc.) organized as a directed acyclic graph (DAG). Understanding task dependencies, execution order, and the critical path is essential for pipeline optimization and resource allocation.

Given a list of pipeline tasks, where each task has an ID, a duration, and a list of dependencies (IDs of tasks that must complete before it can start), implement a function analyze_ml_pipeline(tasks) that performs a complete pipeline analysis. The function should compute:

1. Execution order: a valid topological ordering of the tasks. When multiple tasks are available at the same time, take them in alphabetical order.
2. Earliest start/finish times: for each task, the earliest possible start and finish times, assuming every task starts as soon as all of its dependencies complete.
3. Latest start/finish times: for each task, the latest times it can start and finish without delaying the overall pipeline.
4. Slack time: for each task, the amount of time it can be delayed without increasing the makespan (latest start minus earliest start).
5. Critical path: the sequence of tasks with zero slack, which determines the minimum pipeline duration.
6. Makespan: the total time required to complete all tasks.

If the input is empty, return a result with empty collections and a makespan of 0. You may assume the input DAG contains no cycles.
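The analysis above maps onto the classic critical path method: a topological sort, a forward pass for earliest times, and a backward pass for latest times. Below is a minimal sketch, not a reference solution. It assumes each task is a dict with hypothetical keys "id", "duration", and "dependencies", that every dependency ID refers to a task in the list, and that the result is a dict with hypothetical keys such as "execution_order" and "slack". The problem statement fixes none of these names. Kahn's algorithm with a min-heap supplies the alphabetical tie-breaking. The critical path is taken literally as "all zero-slack tasks in execution order", so when there are parallel critical chains, all of them are listed.

```python
import heapq
from typing import Any


def analyze_ml_pipeline(tasks: list[dict[str, Any]]) -> dict[str, Any]:
    # Assumed input shape: {"id": str, "duration": number, "dependencies": [str, ...]}
    if not tasks:
        # Empty input: empty collections and a makespan of 0.
        return {"execution_order": [], "earliest_start": {}, "earliest_finish": {},
                "latest_start": {}, "latest_finish": {}, "slack": {},
                "critical_path": [], "makespan": 0}

    duration = {t["id"]: t["duration"] for t in tasks}
    deps = {t["id"]: list(t["dependencies"]) for t in tasks}

    # Build successor lists and in-degree counts from the dependency lists.
    successors: dict[str, list[str]] = {tid: [] for tid in duration}
    in_degree = {tid: len(d) for tid, d in deps.items()}
    for tid, d in deps.items():
        for parent in d:
            successors[parent].append(tid)

    # Kahn's algorithm; the min-heap always releases the alphabetically smallest ready task.
    ready = [tid for tid, deg in in_degree.items() if deg == 0]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        tid = heapq.heappop(ready)
        order.append(tid)
        for child in successors[tid]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                heapq.heappush(ready, child)

    # Forward pass: a task starts when its last dependency finishes.
    es: dict[str, float] = {}
    ef: dict[str, float] = {}
    for tid in order:
        es[tid] = max((ef[p] for p in deps[tid]), default=0)
        ef[tid] = es[tid] + duration[tid]
    makespan = max(ef.values())

    # Backward pass: a task must finish before its earliest-needed successor must start.
    ls: dict[str, float] = {}
    lf: dict[str, float] = {}
    for tid in reversed(order):
        lf[tid] = min((ls[c] for c in successors[tid]), default=makespan)
        ls[tid] = lf[tid] - duration[tid]

    slack = {tid: ls[tid] - es[tid] for tid in order}
    # Small tolerance so float durations do not hide zero slack.
    critical = [tid for tid in order if abs(slack[tid]) < 1e-9]

    return {"execution_order": order, "earliest_start": es, "earliest_finish": ef,
            "latest_start": ls, "latest_finish": lf, "slack": slack,
            "critical_path": critical, "makespan": makespan}


pipeline = [
    {"id": "load", "duration": 2, "dependencies": []},
    {"id": "features", "duration": 3, "dependencies": ["load"]},
    {"id": "stats", "duration": 1, "dependencies": ["load"]},
    {"id": "train", "duration": 5, "dependencies": ["features"]},
    {"id": "evaluate", "duration": 2, "dependencies": ["train", "stats"]},
]
result = analyze_ml_pipeline(pipeline)
print(result["critical_path"], result["makespan"])  # ['load', 'features', 'train', 'evaluate'] 12
```

In this example, "stats" has 7 units of slack. It can start as late as time 9 and still finish by the time "evaluate" must begin at time 10, so it does not appear on the critical path.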