ML Model Serving Platform
Serving machine learning models in production with consistently low latency, high availability, and the ability to safely roll out new model versions is a core MLOps challenge that most tutorials gloss over. In this challenge, you will design a model serving platform on AWS SageMaker that handles real-time inference, batch predictions, and near-real-time streaming inference for multiple ML models across different teams.

The real-time inference tier uses SageMaker endpoints with auto-scaling driven by invocations per instance and P99 model latency. You will design a multi-model endpoint architecture where a single endpoint hosts multiple models (reducing cost), with intelligent routing based on the request payload.

For safe model deployment, the architecture implements SageMaker's deployment guardrails: canary deployments shift 10% of traffic to the new model version, monitor key metrics (latency, error rate, data drift) for 30 minutes, then automatically promote the new version or roll it back. A/B testing uses SageMaker production variants to split traffic between model versions with statistical-significance tracking.

The batch prediction tier uses SageMaker Batch Transform for overnight scoring jobs, with S3 input/output and Step Functions orchestrating the ETL-predict-load workflow.

The platform includes a model registry in SageMaker Model Registry with approval workflows: data scientists register model artifacts, automated quality gates check accuracy thresholds, and engineering leads approve production deployments. Feature consistency is ensured by sharing a SageMaker Feature Store between the training and serving pipelines, eliminating training-serving skew.

Observability covers model performance monitoring with SageMaker Model Monitor for data-drift detection, custom CloudWatch metrics for business KPIs, and automated retraining triggers that fire when model accuracy degrades below a threshold.
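As a taste of what the canary rollout above looks like in practice, here is a minimal sketch of the `DeploymentConfig` structure that SageMaker's `update_endpoint` API accepts for deployment guardrails: a 10% canary, a 30-minute bake window, and automatic rollback on CloudWatch alarms. The endpoint, config, and alarm names are hypothetical placeholders, not part of the challenge.

```python
def canary_deployment_config(alarm_names, canary_percent=10, bake_minutes=30):
    """Build a DeploymentConfig dict for sagemaker.update_endpoint()."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                # Send this share of capacity to the new variant first.
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # Bake period before shifting the remaining traffic.
                "WaitIntervalInSeconds": bake_minutes * 60,
            },
            # Keep the old fleet around briefly so rollback is instant.
            "TerminationWaitInSeconds": 300,
        },
        # Roll back automatically if any of these CloudWatch alarms fire.
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": name} for name in alarm_names]
        },
    }

config = canary_deployment_config(["p99-latency-high", "error-rate-high"])

# With boto3 this would be applied roughly as (not executed here;
# names are illustrative):
# import boto3
# boto3.client("sagemaker").update_endpoint(
#     EndpointName="fraud-model-prod",
#     EndpointConfigName="fraud-model-v2-config",
#     DeploymentConfig=config,
# )
```

Wiring the rollback alarms to the same P99-latency and error-rate metrics that drive auto-scaling is what turns the canary into a self-healing deployment rather than a manual checkpoint.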
This challenge teaches model serving architecture, safe deployment strategies for ML, and the operational patterns required for reliable ML in production.
AWS Services You'll Use
Challenge Details
- Path: AI/ML Infrastructure
- Difficulty: Advanced
- Duration: 70 min
- Plan: Pro
Architecture Patterns You'll Learn
Why This Challenge?
Unlike whiteboard exercises or multiple-choice quizzes, this challenge requires you to design a real architecture with actual AWS services, evaluate trade-offs, and defend your decisions. Our automated validators check your design against production-grade criteria. Complete it, and it appears in your verified portfolio with your architecture diagram and design rationale.
More from AI/ML Infrastructure
RAG Pipeline Architecture
Design a Retrieval-Augmented Generation pipeline that grounds LLM responses in enterprise knowledge bases.
Advanced · 70 min

Multi-Agent Orchestration
Design a multi-agent system where specialized AI agents collaborate to solve complex tasks.
Advanced · 75 min

AI Gateway Security Layer
Design a security gateway that enforces responsible AI policies, rate limits, and content filtering for LLM APIs.
Advanced · 65 min

Ready to design this for real?
Get the full scenario, design your architecture using real AWS services, and validate against production-grade criteria. Your completed challenge shows up in your verified portfolio.
Start Challenge