Advanced · 75 min

IoT Data Ingestion Pipeline

Internet of Things deployments generate massive volumes of telemetry from devices with intermittent connectivity, limited compute power, and strict latency requirements for alerting. In this challenge, you will design an IoT data ingestion pipeline on AWS that handles sensor data from 100,000+ devices reporting metrics every 10 seconds (at least 10,000 messages per second at steady state), with real-time anomaly detection and long-term trend analysis.

The device connectivity layer uses AWS IoT Core with the MQTT protocol. Persistent sessions support devices with intermittent connectivity: QoS 1 messages published to topics a device has subscribed to are queued while it is offline and delivered when it reconnects. Persistent sessions cover cloud-to-device traffic; telemetry produced while a device is offline must be buffered on the device itself. Device provisioning uses IoT Core fleet provisioning, which issues a unique X.509 certificate to each device and registers it with AWS IoT.

The IoT Core rules engine routes incoming telemetry to multiple destinations simultaneously: Lambda for immediate anomaly detection, Kinesis Data Streams for stream aggregation, and S3, through the rules engine's S3 action (which writes one object per message), for raw data archival. The anomaly detection Lambda functions compare each incoming reading against a device-specific baseline stored in DynamoDB and publish an SNS alert when the reading exceeds a statistical threshold (an absolute z-score greater than 3). For more sophisticated pattern detection, Kinesis Data Analytics (now Amazon Managed Service for Apache Flink) applies tumbling window aggregations to detect trends, such as a gradual temperature increase, that individual readings would miss.

The storage architecture is tiered: DynamoDB holds the latest state of each device (the device shadow pattern), Timestream serves time-series queries over the last 90 days, and S3 holds Parquet files partitioned by device type and date for historical analysis with Athena. The pipeline handles device clock drift by timestamping each message server-side at IoT Core. It manages back-pressure when a burst of devices reconnects simultaneously through Kinesis shard-level throttling: each shard accepts up to 1,000 records or 1 MB of writes per second, so the stream must be sized for the reconnect peak rather than the steady-state rate.
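
A minimal sketch of the fan-out rule, assuming devices publish to a hypothetical `devices/<device_id>/telemetry` topic. It builds the `topicRulePayload` dictionary that boto3's `iot` client accepts in `create_topic_rule`, with one action per destination. The IoT SQL `timestamp()` function stamps each message with the rules engine's clock, which is how server-side timestamping sidesteps device clock drift. All ARNs, the stream name, and the bucket name are placeholders, and the code only builds and prints the payload; it does not call AWS.

```python
import json

# Hypothetical account, region, and resource names: replace with your own.
ACCOUNT = "123456789012"
REGION = "us-east-1"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT}:role/iot-telemetry-rule-role"
LAMBDA_ARN = f"arn:aws:lambda:{REGION}:{ACCOUNT}:function:anomaly-detector"

# topic(2) extracts <device_id> from 'devices/<device_id>/telemetry';
# timestamp() adds the rules engine's clock (epoch milliseconds), used in
# place of the device's own clock to avoid drift.
RULE_SQL = (
    "SELECT *, topic(2) AS device_id, timestamp() AS server_ts "
    "FROM 'devices/+/telemetry'"
)


def build_topic_rule_payload() -> dict:
    """Build a topicRulePayload that fans each message out to three actions."""
    return {
        "sql": RULE_SQL,
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            # 1. Real-time anomaly detection.
            {"lambda": {"functionArn": LAMBDA_ARN}},
            # 2. Stream aggregation, partitioned by device ID.
            {
                "kinesis": {
                    "roleArn": ROLE_ARN,
                    "streamName": "telemetry-stream",
                    "partitionKey": "${topic(2)}",
                }
            },
            # 3. Raw archive: the S3 action writes one object per message.
            {
                "s3": {
                    "roleArn": ROLE_ARN,
                    "bucketName": "iot-raw-telemetry",
                    "key": "raw/${topic(2)}/${timestamp()}.json",
                }
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_topic_rule_payload(), indent=2))
    # To deploy (requires AWS credentials and boto3):
    # boto3.client("iot").create_topic_rule(
    #     ruleName="telemetry_fanout", topicRulePayload=build_topic_rule_payload())
```

Partitioning the Kinesis records by device ID keeps each device's readings in order within a shard. A production rule would likely also set an `errorAction`, so that messages whose action fails, for example because Kinesis throttles the write, are routed to a fallback destination.
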
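
The z-score test at the heart of the anomaly-detection Lambda can be sketched in a few lines. This assumes a hypothetical baseline schema holding a per-device `mean` and `stddev` (loaded from DynamoDB in the real function, passed in directly here). It compares the absolute z-score, so unusually low readings alert as well as high ones.

```python
import math
from dataclasses import dataclass

Z_THRESHOLD = 3.0  # alert when |z| > 3


@dataclass
class Baseline:
    """Per-device statistics (hypothetical schema of a DynamoDB baseline item)."""
    mean: float
    stddev: float


def z_score(value: float, baseline: Baseline) -> float:
    """Number of standard deviations the reading lies from the device's mean."""
    if baseline.stddev <= 0 or math.isnan(baseline.stddev):
        # A flat or missing baseline cannot support a z-score; report no deviation.
        return 0.0
    return (value - baseline.mean) / baseline.stddev


def is_anomalous(value: float, baseline: Baseline, threshold: float = Z_THRESHOLD) -> bool:
    # Absolute value: readings far below the mean are anomalies too.
    return abs(z_score(value, baseline)) > threshold


if __name__ == "__main__":
    b = Baseline(mean=21.0, stddev=0.5)
    for reading in (21.4, 22.6, 19.2):
        print(reading, round(z_score(reading, b), 2), is_anomalous(reading, b))
```

With a baseline of mean 21.0 and standard deviation 0.5, a reading of 22.6 scores z = 3.2 and alerts. A reading of 19.2 scores z = -3.6 and also alerts, which a one-sided `z > 3` test would miss. How to treat a flat baseline (zero deviation) is a design decision; this sketch simply suppresses alerts for it.
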
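
Tumbling windows are fixed-size, non-overlapping intervals, so every reading falls into exactly one window. The plain-Python sketch below illustrates the windowing semantics rather than the Flink or Kinesis Data Analytics API. It averages readings per window, then flags a gradual rise when the last few window means increase strictly and by a minimum total. The window size and the hypothetical `min_windows` and `min_rise` thresholds are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_means(
    readings: Iterable[tuple[int, float]], window_ms: int
) -> list[tuple[int, float]]:
    """Average (timestamp_ms, value) readings into fixed, non-overlapping windows.

    Returns (window_start_ms, mean) pairs sorted by window start.
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for ts_ms, value in readings:
        start = ts_ms - ts_ms % window_ms  # each timestamp maps to exactly one window
        sums[start] += value
        counts[start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


def rising_trend(
    window_means: list[tuple[int, float]], min_windows: int, min_rise: float
) -> bool:
    """True if the last `min_windows` means rise strictly, by at least `min_rise` in total."""
    if len(window_means) < min_windows:
        return False
    tail = [mean for _, mean in window_means[-min_windows:]]
    strictly_rising = all(later > earlier for earlier, later in zip(tail, tail[1:]))
    return strictly_rising and tail[-1] - tail[0] >= min_rise
```

For example, a sensor that starts at 20.0 and climbs 0.01 degrees per 10-second reading for 30 minutes never produces a z-score outlier, yet its six 5-minute window means rise by 1.5 degrees in total, so `rising_trend(tumbling_window_means(readings, 300_000), 6, 1.0)` returns `True`.
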
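
Keeping only the latest state per device (the device shadow pattern) needs a guard against out-of-order writes, because concurrent Lambda invocations can process two messages from the same device in either order. The following is a minimal sketch, assuming a hypothetical table keyed by `device_id` with a `server_ts` attribute. The first function builds arguments for boto3's DynamoDB `Table.put_item` with a condition expression (converting floats to `Decimal`, which the boto3 resource layer requires). The in-memory class mirrors the same newer-only rule so the logic can be exercised without AWS.

```python
from decimal import Decimal
from typing import Any, Optional

# Accept a write only if no item exists yet or the stored reading is older
# (hypothetical attribute names).
NEWER_ONLY = "attribute_not_exists(device_id) OR server_ts < :ts"


def _to_dynamo(value: Any) -> Any:
    # boto3's resource layer rejects Python floats; DynamoDB numbers need Decimal.
    # Only top-level values are converted in this sketch.
    return Decimal(str(value)) if isinstance(value, float) else value


def latest_state_put_kwargs(device_id: str, server_ts: int, state: dict[str, Any]) -> dict[str, Any]:
    """Keyword arguments for a boto3 DynamoDB Table.put_item call."""
    item = {"device_id": device_id, "server_ts": server_ts}
    item.update({k: _to_dynamo(v) for k, v in state.items()})
    return {
        "Item": item,
        "ConditionExpression": NEWER_ONLY,
        "ExpressionAttributeValues": {":ts": server_ts},
    }


class LatestStateStore:
    """In-memory stand-in that applies the same newer-only rule as NEWER_ONLY."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}

    def put_if_newer(self, device_id: str, server_ts: int, state: dict[str, Any]) -> bool:
        current = self._items.get(device_id)
        if current is not None and current["server_ts"] >= server_ts:
            return False  # DynamoDB would raise ConditionalCheckFailedException here
        self._items[device_id] = {"device_id": device_id, "server_ts": server_ts, **state}
        return True

    def get(self, device_id: str) -> Optional[dict[str, Any]]:
        return self._items.get(device_id)
```

When the condition fails, boto3 raises a `ConditionalCheckFailedException`, which the Lambda can treat as "newer state is already stored" rather than as an error.
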
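
For Athena to prune partitions, the Parquet objects need a predictable key layout. The following is a minimal sketch, assuming Hive-style `key=value` prefixes and a hypothetical `curated/telemetry` base path. It derives the partition prefix from a device type and a millisecond timestamp, computed in UTC so the date boundary does not depend on any device's local clock.

```python
from datetime import datetime, timezone


def parquet_partition_prefix(
    device_type: str, server_ts_ms: int, base: str = "curated/telemetry"
) -> str:
    """Hive-style S3 prefix whose key=value segments map to Athena partition columns."""
    day = datetime.fromtimestamp(server_ts_ms // 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{base}/device_type={device_type}/dt={day}/"


if __name__ == "__main__":
    print(parquet_partition_prefix("thermostat", 1714521600000))
```

Once the partitions are registered in the table definition, a query that filters on `device_type` and `dt` reads only the matching prefixes instead of scanning the whole archive.
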
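
Sizing the stream for reconnect bursts is simple arithmetic against the per-shard write limits (1,000 records or 1 MB per second). The sketch below takes the larger of the record-based and byte-based shard counts. The 500-byte average record size and the 3x burst factor used in the example are hypothetical assumptions, not measured values.

```python
import math

# Per-shard write limits for Kinesis Data Streams in provisioned mode.
SHARD_MAX_RECORDS_PER_S = 1_000
SHARD_MAX_BYTES_PER_S = 1_000_000  # 1 MB/s, taken as decimal MB to stay conservative


def shards_needed(
    devices: int, interval_s: float, avg_record_bytes: int, burst_factor: float = 1.0
) -> int:
    """Minimum shard count that keeps peak writes under both per-shard limits."""
    records_per_s = devices / interval_s * burst_factor
    bytes_per_s = records_per_s * avg_record_bytes
    by_records = math.ceil(records_per_s / SHARD_MAX_RECORDS_PER_S)
    by_bytes = math.ceil(bytes_per_s / SHARD_MAX_BYTES_PER_S)
    return max(1, by_records, by_bytes)


if __name__ == "__main__":
    print("steady state:", shards_needed(100_000, 10, 500))
    print("reconnect burst:", shards_needed(100_000, 10, 500, burst_factor=3.0))
```

With 100,000 devices reporting every 10 seconds and 500-byte records, the steady state needs 10 shards. The record limit dominates here, since 5 MB/s would need only 5. A reconnect burst at three times that rate needs 30.
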

Device management includes over-the-air (OTA) updates via AWS IoT Jobs, thing groups for fleet-wide configuration, and event streams for tracking each device's lifecycle: lifecycle events report connections and disconnections, while registry events report things being created (provisioned) and deleted (decommissioned). This challenge teaches IoT architecture patterns, time-series data management, and the unique challenges of building reliable systems for unreliable edge devices.
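
Connection tracking from lifecycle events has to cope with events arriving out of order. AWS IoT publishes these events on reserved topics such as `$aws/events/presence/connected/<clientId>`. The sketch below assumes payloads carrying `clientId`, `eventType`, `timestamp`, and `versionNumber` fields, and orders events by `(versionNumber, timestamp)` so a stale disconnect cannot overwrite a newer connect. Check the field names and ordering guarantees against the current AWS IoT Core documentation before relying on them. The sketch covers connections only; registry events would drive provisioning and decommissioning state.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionTracker:
    """Latest connection state per client ID, built from lifecycle event payloads."""

    # client_id -> ((versionNumber, timestamp), eventType)
    _latest: dict[str, tuple[tuple[int, int], str]] = field(default_factory=dict)

    def apply(self, event: dict) -> bool:
        """Apply a connected/disconnected event; return False if ignored or stale."""
        if event.get("eventType") not in ("connected", "disconnected"):
            return False  # subscribe/unsubscribe events do not change connection state
        client_id = event["clientId"]
        # Assumed ordering key: newer connections have higher version numbers,
        # and within one connection the disconnect carries the later timestamp.
        order = (int(event["versionNumber"]), int(event["timestamp"]))
        current = self._latest.get(client_id)
        if current is not None and current[0] >= order:
            return False  # stale or duplicate event
        self._latest[client_id] = (order, event["eventType"])
        return True

    def is_connected(self, client_id: str) -> bool:
        entry = self._latest.get(client_id)
        return entry is not None and entry[1] == "connected"
```

A late-arriving disconnect from an earlier connection is discarded, so a device that has already reconnected is not wrongly marked offline.
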

AWS Services You'll Use

IoT Core, Lambda, Kinesis Data Streams, DynamoDB, Timestream, S3, Athena, SNS, Kinesis Data Analytics

Challenge Details

Path: Data-Intensive Systems
Difficulty: Advanced
Duration: 75 min
Plan: Pro

Architecture Patterns You'll Learn

device shadow, MQTT persistent sessions, anomaly detection, time-series tiering, fleet provisioning

Why This Challenge?

Unlike whiteboard exercises or multiple-choice quizzes, this challenge requires you to design a real architecture with actual AWS services, evaluate trade-offs, and defend your decisions. Our automated validators check your design against production-grade criteria. Complete it and it shows up in your verified portfolio with your architecture diagram and design rationale.
