Data Quality Scoring for ML Pipelines
Data Preparation & Feature Engineering DS practice problem on Onlearn.
Difficulty: medium.
Topics: Understanding Data Quality Scoring for ML Pipelines, Missing Value Imputation Strategy, Z-Score Normalization, Duplicate Record Filtering, Distribution Shift Detection, Threshold-based Alerting, Data Engineering, Machine Learning Lifecycle, Statistical Analysis, Software Testing, Data Governance, Data Validation, Feature Health Monitoring, Anomaly Detection, Pipeline Observability, Data Integrity.
Implement a function 'calculate data quality score' that evaluates a numerical dataset and returns a float between 0 and 1, where 1 means perfect quality. The score should penalize three kinds of defects: 1) missing values (nulls), by 20%; 2) duplicate rows, by 30%; and 3) outliers (values more than 3 standard deviations from the mean), by 50%. If the dataset is empty, return 0.0.
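The statement leaves the exact formula open, so the following is only one possible reading, sketched in plain Python. It assumes the three percentages are weights applied to the fraction of affected data: score = 1 - (0.2 * missing fraction + 0.3 * duplicate fraction + 0.5 * outlier fraction). The spelling calculate_data_quality_score, the list-of-rows input format, the treatment of NaN as missing, and the per-column outlier check using the population standard deviation are all assumptions made for illustration, not part of the original problem.

```python
import math
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def _is_missing(value: Optional[float]) -> bool:
    """Assumption: both None and NaN count as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def calculate_data_quality_score(rows: Sequence[Row]) -> float:
    """Return a score in [0, 1] under the weighted-fraction reading (assumed)."""
    total_cells = sum(len(row) for row in rows)
    if total_cells == 0:
        return 0.0  # empty dataset, as the problem requires

    # 1) Missing values: fraction of all cells that are null.
    missing_cells = sum(1 for row in rows for v in row if _is_missing(v))
    missing_ratio = missing_cells / total_cells

    # 2) Duplicates: fraction of rows that repeat an earlier row.
    seen = set()
    duplicate_rows = 0
    for row in rows:
        key = tuple(None if _is_missing(v) else v for v in row)
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)
    duplicate_ratio = duplicate_rows / len(rows)

    # 3) Outliers: per column, values more than 3 population standard
    #    deviations from that column's mean (assumed interpretation).
    n_cols = max(len(row) for row in rows)
    observed = 0
    outliers = 0
    for c in range(n_cols):
        col = [row[c] for row in rows if c < len(row) and not _is_missing(row[c])]
        observed += len(col)
        if len(col) < 2:
            continue
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        if std == 0:
            continue  # constant column: no outliers by definition
        outliers += sum(1 for v in col if abs(v - mean) > 3 * std)
    outlier_ratio = outliers / observed if observed else 0.0

    penalty = 0.2 * missing_ratio + 0.3 * duplicate_ratio + 0.5 * outlier_ratio
    return max(0.0, min(1.0, 1.0 - penalty))
```

Under these assumptions, a dataset of two identical rows such as [[1.0], [1.0]] has a duplicate fraction of 0.5, so the penalty is 0.3 * 0.5 = 0.15 and the score is about 0.85. A grader that instead subtracts a flat 0.2, 0.3, or 0.5 whenever a defect is present at all would need a different formula.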