Data Quality Scoring for ML Pipelines

A data science (DS) practice problem in Data Preparation & Feature Engineering on Onlearn.

Difficulty: medium.

Topics: Understanding Data Quality Scoring for ML Pipelines, Missing Value Imputation Strategy, Z-Score Normalization, Duplicate Record Filtering, Distribution Shift Detection, Threshold-based Alerting, Data Engineering, Machine Learning Lifecycle, Statistical Analysis, Software Testing, Data Governance, Data Validation, Feature Health Monitoring, Anomaly Detection, Pipeline Observability, Data Integrity.

Implement a function 'calculate_data_quality_score' that evaluates a numerical dataset. The score is a float between 0 and 1, where 1 means perfect quality. The score should penalize: 1) missing values (nulls) by 20%, 2) duplicate rows by 30%, and 3) outliers (values more than 3 standard deviations from the mean) by 50%. If the dataset is empty, return 0.0.
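One possible reading of the statement can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference solution: the problem does not say how each percentage is applied, so here each weight is multiplied by the fraction of affected cells or rows and the total is subtracted from 1. The snake_case function name, the list-of-rows input format, the helper '_is_missing', the weight constants, the per-column outlier check, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

# Penalty weights taken from the problem statement.
MISSING_WEIGHT = 0.2
DUPLICATE_WEIGHT = 0.3
OUTLIER_WEIGHT = 0.5


def _is_missing(value):
    """Hypothetical helper: treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def calculate_data_quality_score(rows):
    """Score a dataset given as a list of rows of numbers (assumed format)."""
    if not rows:
        return 0.0

    # Normalize every missing marker to None so that rows compare reliably.
    cleaned = [tuple(None if _is_missing(v) else v for v in row) for row in rows]
    total_cells = sum(len(row) for row in cleaned)
    if total_cells == 0:
        return 0.0

    # Assumption: the missing penalty scales with the fraction of missing cells.
    missing_ratio = sum(v is None for row in cleaned for v in row) / total_cells

    # Assumption: every repeat of an earlier row counts as one duplicate.
    duplicate_ratio = (len(cleaned) - len(set(cleaned))) / len(cleaned)

    # Assumption: outliers are detected per column on the non-missing values.
    outliers = 0
    present = 0
    n_cols = max(len(row) for row in cleaned)
    for col in range(n_cols):
        values = [row[col] for row in cleaned
                  if col < len(row) and row[col] is not None]
        present += len(values)
        if len(values) < 2:
            continue
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation (assumption)
        if sigma == 0:
            continue
        outliers += sum(abs(v - mu) > 3 * sigma for v in values)
    outlier_ratio = outliers / present if present else 0.0

    penalty = (MISSING_WEIGHT * missing_ratio
               + DUPLICATE_WEIGHT * duplicate_ratio
               + OUTLIER_WEIGHT * outlier_ratio)
    # Clamp so the result always stays within [0, 1].
    return max(0.0, min(1.0, 1.0 - penalty))
```

Under these assumptions, a dataset with no missing values, duplicates, or outliers scores 1.0, and a dataset in which one of four rows repeats an earlier row loses 0.3 * 0.25 = 0.075. Note that with the population standard deviation a column needs at least 11 values before any single value can lie more than 3 standard deviations from the mean, so small datasets never trigger the outlier penalty in this sketch.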
