ML Feature Store
Training-serving skew, where the features used during model training differ from those available at inference time, is the most insidious source of ML model degradation in production. Feature stores solve this by providing a single source of truth for feature computation, storage, and serving. In this challenge, you will design a feature store architecture using Amazon SageMaker Feature Store that supports both batch feature ingestion for training and low-latency online feature retrieval for real-time inference.

The offline store uses S3 with Parquet format and Glue Data Catalog integration, enabling Athena queries for training dataset construction with point-in-time correctness, which is crucial for preventing data leakage in time-series features. The online store uses SageMaker Feature Store's built-in low-latency storage backed by DynamoDB, providing single-digit-millisecond feature retrieval during inference.

Feature ingestion runs on two paths: batch pipelines using Glue jobs that compute features from raw data in S3 and ingest them on a schedule, and streaming pipelines using Kinesis Data Streams with Lambda consumers that compute real-time features (such as rolling averages and session counts) and ingest them immediately.

You will design the feature group schema strategy, organizing features by entity (user features, product features, interaction features) with a consistent naming convention and version tracking. Feature transformations use SageMaker Processing jobs for batch and Lambda for streaming, with shared transformation code packaged as a Lambda layer to ensure consistency.

The architecture includes a feature freshness monitoring system using CloudWatch metrics that track ingestion lag per feature group and alert when features become stale. Data quality validation runs on every batch ingestion using Great Expectations-style checks implemented in Lambda, blocking ingestion of features that fail schema or statistical distribution checks.
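To make the "shared transformation code" idea concrete, here is a minimal sketch of a rolling-average transform that could live in one package imported by both paths (installed into the SageMaker Processing container for batch, shipped as a Lambda layer for streaming). The function name and signature are hypothetical, not part of any AWS API:

```python
from collections import deque

def rolling_average(values, window):
    """Trailing rolling average over a fixed window.

    Keeping this in a single shared package (batch + streaming import
    the same function) is what prevents the two ingestion paths from
    drifting apart and reintroducing training-serving skew.
    """
    buf = deque(maxlen=window)  # oldest value drops out automatically
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out
```

The streaming Lambda would call this per event with a small state buffer, while the batch Glue/Processing job applies it over historical partitions; either way, the arithmetic is identical.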
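Point-in-time correctness is easiest to see in code. The sketch below is an in-memory stand-in for the Athena query the offline store would run: for each training label, it joins only the most recent feature value written at or before the label's timestamp, never after. All names here are illustrative:

```python
def point_in_time_join(labels, features):
    """Join each (entity_id, label_time) to the latest feature value
    with event_time <= label_time. Using any later value would leak
    future information into the training set.

    labels:   list of (entity_id, label_time)
    features: list of (entity_id, event_time, value)
    """
    by_entity = {}
    for eid, ts, val in features:
        by_entity.setdefault(eid, []).append((ts, val))
    for rows in by_entity.values():
        rows.sort()  # ascending by event_time

    joined = []
    for eid, label_ts in labels:
        # keep only feature rows that existed as of the label time
        past = [(ts, v) for ts, v in by_entity.get(eid, []) if ts <= label_ts]
        joined.append((eid, label_ts, past[-1][1] if past else None))
    return joined
```

In production this logic is expressed as a windowed SQL join in Athena over the Parquet offline store; the Python version just makes the leakage rule explicit.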
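On the streaming path, the Lambda consumer writes each computed feature row to the online store. The helper below builds the record payload in the shape the `sagemaker-featurestore-runtime` `put_record` API expects (every value serialized as a string); the feature group name and feature names in the commented call are placeholders:

```python
def to_feature_record(features):
    """Convert a plain dict into the Record list expected by
    sagemaker-featurestore-runtime put_record / returned by get_record:
    [{"FeatureName": ..., "ValueAsString": ...}, ...].
    """
    return [{"FeatureName": k, "ValueAsString": str(v)} for k, v in features.items()]

# A Kinesis-triggered Lambda consumer would then ingest roughly like this
# (untested sketch; "user-features-v1" is a hypothetical group name):
#
# import boto3
# runtime = boto3.client("sagemaker-featurestore-runtime")
# runtime.put_record(
#     FeatureGroupName="user-features-v1",
#     Record=to_feature_record({
#         "user_id": "u1",
#         "session_count_1h": 7,
#         "event_time": "2024-01-01T00:00:00Z",
#     }),
# )
```

At inference time the same runtime client's `get_record` call retrieves the latest row for a record identifier with single-digit-millisecond latency.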
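The ingestion-gating validator can be sketched as a pure function the Lambda runs before writing a batch. This is a simplified, hypothetical version of the Great Expectations-style checks described above (a real validator would also cover null rates, cardinality, and distribution drift):

```python
def validate_batch(rows, schema, bounds):
    """Schema + statistical gate run before batch ingestion.

    schema: {column: required_type} for every row
    bounds: {column: (lo, hi)} acceptable range for the column mean
    Returns (ok, problems); ingestion is blocked when ok is False.
    """
    problems = []
    for i, row in enumerate(rows):
        for col, typ in schema.items():
            if col not in row or not isinstance(row[col], typ):
                problems.append(f"row {i}: {col} missing or not {typ.__name__}")
    for col, (lo, hi) in bounds.items():
        vals = [r[col] for r in rows if col in r]
        if vals:
            mean = sum(vals) / len(vals)
            if not lo <= mean <= hi:
                problems.append(f"{col}: mean {mean} outside [{lo}, {hi}]")
    return (not problems, problems)
```

Failing batches are rejected (and typically routed to a quarantine prefix in S3 for inspection) rather than silently ingested.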
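Freshness monitoring reduces to one number per feature group: how long ago the newest ingested record's event time was. A minimal sketch, assuming that lag is published as a custom CloudWatch metric (e.g. via `put_metric_data`) and alarmed against a per-group staleness budget:

```python
import time

def freshness_lag_seconds(last_event_time, now=None):
    """Seconds between the newest ingested record's event time and now.

    This is the value a scheduled monitor would publish to CloudWatch
    per feature group; the staleness budget below is a hypothetical
    per-group threshold, not an AWS setting.
    """
    now = time.time() if now is None else now
    return max(0.0, now - last_event_time)

def is_stale(lag_seconds, budget_seconds):
    """True when a feature group has exceeded its staleness budget."""
    return lag_seconds > budget_seconds
```

Streaming groups might carry a budget of minutes while daily batch groups tolerate 24+ hours, so budgets belong in per-group configuration rather than a global constant.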
Access control uses IAM policies scoped to feature groups, so ML teams can only read features relevant to their models. This challenge teaches feature store architecture, training-serving consistency, and the data engineering patterns that make ML systems reliable.
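Scoping a team to its feature groups comes down to listing feature-group ARNs as the policy resource. The generator below sketches such a read-only policy; the account ID, region, and group names are placeholders, and the action names (`sagemaker:GetRecord`, `sagemaker:DescribeFeatureGroup`) follow the SageMaker IAM action list:

```python
def feature_group_read_policy(account_id, region, feature_groups):
    """Build a read-only IAM policy document limited to the given
    feature groups: GetRecord against the online store plus
    DescribeFeatureGroup for discovery, nothing write-side.
    """
    arns = [
        f"arn:aws:sagemaker:{region}:{account_id}:feature-group/{name}"
        for name in feature_groups
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["sagemaker:GetRecord", "sagemaker:DescribeFeatureGroup"],
            "Resource": arns,
        }],
    }
```

A consistent naming convention (e.g. `user-features-v1`) pays off here too: it lets resource ARNs use predictable prefixes per team or entity.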
AWS Services You'll Use
Challenge Details
- Path
- AI/ML Infrastructure
- Difficulty
- Advanced
- Duration
- 65 min
- Plan
- Pro
Architecture Patterns You'll Learn
Why This Challenge?
Unlike whiteboard exercises or multiple-choice quizzes, this challenge requires you to design a real architecture with actual AWS services, evaluate trade-offs, and defend your decisions. Our automated validators check your design against production-grade criteria. Complete it, and it appears in your verified portfolio with your architecture diagram and design rationale.
More from AI/ML Infrastructure
RAG Pipeline Architecture
Design a Retrieval-Augmented Generation pipeline that grounds LLM responses in enterprise knowledge bases.
Advanced · 70 min
Multi-Agent Orchestration
Design a multi-agent system where specialized AI agents collaborate to solve complex tasks.
Advanced · 75 min
ML Model Serving Platform
Design a model serving platform that delivers low-latency predictions with A/B testing and canary deployment.
Advanced · 70 min
Ready to design this for real?
Get the full scenario, design your architecture using real AWS services, and validate against production-grade criteria. Your completed challenge shows up in your verified portfolio.
Start Challenge