Feature Drift Detection System
End-to-end ML monitoring pipeline that detects distribution drift between reference and production data streams using Kolmogorov–Smirnov tests.
Overview
Production ML models degrade silently when the data they see drifts away from the data they were trained on. This project is a complete monitoring pipeline that compares reference datasets against live production streams and raises alerts when feature distributions shift.
Drift is measured feature by feature using the two-sample Kolmogorov–Smirnov (KS) test, and each feature can have its own alert threshold. The pipeline is optimized for large datasets through vectorized NumPy/Pandas operations.
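The per-feature check can be sketched in NumPy as follows. This is a minimal sketch, not the project's implementation. It assumes thresholds are applied to the KS statistic D, the largest absolute gap between the reference and production empirical CDFs, rather than to a p-value; the source does not say which is used. The names `ks_statistic`, `detect_drift`, and `default_threshold` are hypothetical. `scipy.stats.ks_2samp` computes the same statistic along with a p-value if SciPy is available.

```python
import numpy as np


def ks_statistic(reference, production) -> float:
    """Two-sample KS statistic: max |F_ref(x) - F_prod(x)| over all x."""
    ref = np.sort(np.asarray(reference, dtype=float))
    prod = np.sort(np.asarray(production, dtype=float))
    # The supremum of the ECDF gap is attained at an observed value,
    # so evaluating both ECDFs on the pooled sample is sufficient.
    grid = np.concatenate([ref, prod])
    # side="right" counts values <= x, giving the right-continuous ECDF.
    cdf_ref = np.searchsorted(ref, grid, side="right") / ref.size
    cdf_prod = np.searchsorted(prod, grid, side="right") / prod.size
    return float(np.max(np.abs(cdf_ref - cdf_prod)))


def detect_drift(reference, production, thresholds, default_threshold=0.1):
    """Compare each feature and flag it when D exceeds its threshold.

    `reference` and `production` may be dicts of arrays or pandas
    DataFrames: only `.items()` and `[column]` access are used.
    Hypothetical policy: features without an explicit threshold
    fall back to `default_threshold`.
    """
    report = {}
    for feature, ref_values in reference.items():
        d = ks_statistic(ref_values, production[feature])
        limit = thresholds.get(feature, default_threshold)
        report[feature] = {"ks": d, "threshold": limit, "drift": d > limit}
    return report
```

For example, a reference sample `[1, 2, 3, 4]` against a production sample `[3, 4, 5, 6]` gives D = 0.5, which is flagged under a per-feature threshold of 0.3. Thresholding D rather than a p-value keeps alerts comparable across features regardless of sample size, at the cost of ignoring statistical significance on small samples.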
It also includes modular preprocessing (imputation, scaling, encoding), a visualization dashboard with histograms, density plots, and drift scores for interpreting each feature, and automated retraining triggers that fire when drift exceeds the tolerance.
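One way to turn per-feature drift flags into a retraining decision is sketched below. The policy is an assumption, since the source does not define "tolerance": here it is taken to be the maximum fraction of features allowed to drift before retraining fires. The function name `should_retrain` and its `tolerance` parameter are hypothetical.

```python
from typing import Mapping


def should_retrain(drift_flags: Mapping[str, bool], tolerance: float = 0.0) -> bool:
    """Return True when the share of drifted features exceeds `tolerance`.

    `drift_flags` maps each feature name to whether it crossed its
    alert threshold. With the default tolerance of 0.0, any single
    drifted feature triggers retraining.
    """
    if not drift_flags:
        # No features were monitored, so there is nothing to act on.
        return False
    drifted = sum(1 for flagged in drift_flags.values() if flagged)
    return drifted / len(drift_flags) > tolerance
```

With `{"age": True, "income": False}`, half the features have drifted, so `tolerance=0.25` triggers retraining while `tolerance=0.5` does not. A fraction-based rule is less noisy than firing on any single feature when many features are monitored.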