Build a Simple ETL Pipeline (MLOps)
Data Pipelines, Monitoring & Reliability DS practice problem on Onlearn.
Difficulty: medium.
Topics: Data Extraction, Feature Transformation, Aggregation Logic, Input Validation, Schema Enforcement, MLOps, Data Engineering, Software Engineering, Distributed Systems, Database Management, ETL Pipelines, Data Preprocessing, Functional Programming, Data Serialization, Error Handling.
Problem: Implement a simple ETL (Extract, Transform, Load) pipeline for model-ready data preparation. Given a CSV-like string of user events with the columns user_id,event_type,value (header included), write a function run_etl(csv_text) that:
1. Extracts rows from the raw CSV text.
2. Transforms the data by:
   - keeping only rows where event_type == "purchase";
   - converting value to float and dropping rows where the conversion fails;
   - aggregating the total purchase value per user_id.
3. Loads the transformed results by returning a list of (user_id, total_value) tuples sorted by user_id in ascending order.
Assume small inputs (no external libraries), handle extra whitespace around fields, and ignore blank lines.
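One possible way to approach the three steps is sketched below in plain Python. It is a minimal sketch under several assumptions that the problem statement does not settle: the identifiers are spelled with underscores (run_etl, csv_text, user_id, event_type), the first non-blank line is always the header and is skipped without being checked, rows that do not have exactly three fields count as invalid, a value is invalid only when float() rejects it, and user_id is compared as a string when sorting.

```python
def run_etl(csv_text):
    """Sum purchase values per user from CSV-like text (illustrative sketch)."""
    # Extract: split into lines, trim whitespace, and drop blank lines.
    rows = [line.strip() for line in csv_text.splitlines() if line.strip()]

    totals = {}
    # Assumption: the first non-blank line is the header, so it is skipped.
    for line in rows[1:]:
        fields = [field.strip() for field in line.split(",")]
        # Assumption: a row without exactly three fields is invalid.
        if len(fields) != 3:
            continue
        user_id, event_type, raw_value = fields

        # Transform: keep purchase events only.
        if event_type != "purchase":
            continue

        # Transform: drop rows whose value cannot be parsed as a float.
        # Note that float() also accepts strings such as "nan" and "inf".
        try:
            value = float(raw_value)
        except ValueError:
            continue

        # Transform: accumulate the running total for this user.
        totals[user_id] = totals.get(user_id, 0.0) + value

    # Load: (user_id, total) tuples sorted by user_id (string comparison).
    return sorted(totals.items())


sample = """user_id,event_type,value
u2, purchase, 10.5
u1,purchase,5

u1,view,3
u2,purchase,abc
u1, purchase , 2.5
"""
print(run_etl(sample))  # [('u1', 7.5), ('u2', 10.5)]
```

Because the ids are sorted as strings, "u10" would come before "u2"; if the ids are meant to be numeric, a sort key that converts them would be needed. Splitting on commas is enough only because the input is assumed to contain no quoted fields.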