A reusable, end-to-end MLOps framework automating the full model lifecycle from data versioning to production deployment.
This project implements a complete, reusable MLOps framework designed to eliminate manual steps in the ML model lifecycle. From raw data ingestion to production deployment, every step is automated, versioned, and reproducible.
DVC (Data Version Control) manages datasets and model artifacts, so that any experiment can be reproduced exactly. MLflow handles experiment tracking, model comparison, and the model registry. GitHub Actions drives the CI/CD pipeline: it runs tests, retrains models on a schedule, and deploys new versions automatically.
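Reproducibility through data versioning rests on content hashing: a tool such as DVC records a hash for each tracked file, so a changed dataset is detected even when its file name stays the same. The following is a minimal sketch of that idea using only the Python standard library. It is not DVC's implementation; the function names `file_digest` and `has_changed` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path: Path, recorded_digest: str) -> bool:
    """Return True when the file's content no longer matches the recorded digest."""
    return file_digest(path) != recorded_digest
```

Storing the recorded digest in Git next to the code, rather than the data itself, is what lets a commit pin the exact dataset an experiment used.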
The framework is container-first: all services run in Docker, making the pipeline portable across local machines, CI runners, and cloud instances. For enterprise environments, Jenkins provides more advanced pipeline orchestration, such as scheduled retraining and approval gates.
Setting up the repository structure, DVC remote configuration, and MLflow tracking server.
Building DVC stages for data download, validation with Great Expectations, and feature engineering.
Defining automated training, evaluation, and model comparison stages, all managed by DVC.
Logging hyperparameters, metrics, confusion matrices, and feature importance plots automatically.
Writing workflows that trigger on push: run the tests, execute the DVC pipeline, and register the model if it has improved.
Building production Docker images and configuring Jenkins for scheduled retraining and approval gates.
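The "register model if improved" step in the CI/CD phase comes down to a comparison between the newly trained model and the one currently in production. One way this gate could look is sketched below. The `EvalResult` structure, the `should_register` function, the use of accuracy as the deciding metric, and the `min_gain` threshold are all hypothetical choices for illustration, not taken from the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalResult:
    """Evaluation summary for one model version (hypothetical structure)."""
    version: str
    accuracy: float


def should_register(candidate: EvalResult,
                    production: Optional[EvalResult],
                    min_gain: float = 0.0) -> bool:
    """Decide whether the candidate model should be registered.

    With no production model yet, the candidate is always registered.
    Otherwise its accuracy must exceed the production accuracy by more
    than min_gain.
    """
    if production is None:
        return True
    return candidate.accuracy - production.accuracy > min_gain
```

In a workflow, a small script could call `should_register` after the evaluation stage and skip the registration step when it returns `False`. A non-zero `min_gain` guards against promoting a model on a difference that is only metric noise.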