Advanced · 75 min

Real-Time Search Engine

Search is one of the most complex distributed systems problems: it combines text processing, relevance ranking, low-latency retrieval, and index management in a system that users expect to respond instantly. In this challenge, you will design a real-time search engine on AWS that indexes millions of documents with sub-second index freshness and sub-100ms query latency.

The indexing pipeline ingests documents from S3 and API sources. Lambda functions process each document: they extract text, normalize content, and generate both sparse (BM25) and dense (vector) representations.

Amazon OpenSearch Service provides the search backend, with a cluster architecture designed for the workload: dedicated master nodes for cluster stability, hot data nodes with gp3 EBS volumes for active indices, and UltraWarm nodes for time-series data older than 30 days. Index design uses time-based indices with aliases for zero-downtime reindexing, plus a custom analyzer chain for multi-language support that covers tokenization, stemming, synonym expansion, and stop-word removal.

The query processing pipeline runs in a Lambda function that parses the user query, expands it with synonyms, runs a hybrid search (combining BM25 relevance with vector similarity for semantic understanding), applies business rules for boosting and filtering, and returns results with highlighted snippets. Auto-complete uses OpenSearch's completion suggester against a separate lightweight index that is updated in real time via Kinesis Data Streams. Search analytics capture every query and click-through event in Kinesis Data Firehose, feeding a relevance feedback loop that improves ranking over time.

Operationally, the architecture includes index lifecycle management policies (implemented in OpenSearch as Index State Management) that automatically migrate old indices to warm storage, and a circuit breaker that gracefully degrades search quality under extreme load rather than failing entirely.
This challenge teaches search engine architecture, index design, relevance tuning, and the operational patterns for running OpenSearch at scale.
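Time-based indices with aliases: a minimal sketch, assuming a hypothetical daily naming scheme (`docs-YYYY.MM.DD`), a hypothetical `-v2` suffix for the reindexed copy, and a hypothetical read alias `docs-read`. It only builds the request body for the OpenSearch `_aliases` API. Because the actions in one `_aliases` request are applied atomically, clients querying the alias switch from the old index to the new one without downtime. Sending the signed request from Lambda is left out.

```python
from datetime import date


def daily_index_name(prefix: str, day: date) -> str:
    """Hypothetical time-based naming scheme, e.g. 'docs-2024.05.01'."""
    return f"{prefix}-{day:%Y.%m.%d}"


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Request body for the `_aliases` API: remove + add in a single atomic call,
    so the alias never points at zero or two indices mid-swap."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Usage: after reindexing into a fresh copy, flip the read alias in one request.
old = daily_index_name("docs", date(2024, 5, 1))
new = old + "-v2"  # hypothetical versioned name for the reindexed copy
body = alias_swap_actions("docs-read", old, new)
```

Queries and Lambda code always target `docs-read`, never a concrete index name, which is what lets the backing index change underneath them.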
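To illustrate what the analyzer chain does to text, here is a toy version in the order lowercase tokenization, stop-word removal, stemming, then optional synonym expansion. The stop-word list, the synonym map, and the plural-folding `stem` function are hypothetical English-only placeholders; in OpenSearch you would declare the equivalent chain in index settings using built-in tokenizers and token filters per language. Synonyms here are keyed on stemmed forms so that "laptops" still matches the "laptop" entry.

```python
import re

# Hypothetical English-only resources; a real deployment would load these per language.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}
SYNONYMS = {"laptop": ["notebook"], "tv": ["television"]}  # keyed on stemmed forms


def tokenize(text: str) -> list[str]:
    # Lowercase, then split on runs of word characters (Unicode-aware in Python 3).
    return re.findall(r"\w+", text.lower())


def stem(token: str) -> str:
    # Toy plural folding ("laptops" -> "laptop"), standing in for a real stemmer.
    # Leaves short words and "-ss" endings such as "glass" untouched.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text: str, expand_synonyms: bool = False) -> list[str]:
    """Lowercase/tokenize -> drop stop words -> stem -> optionally add synonyms."""
    out: list[str] = []
    for token in tokenize(text):
        if token in STOP_WORDS:
            continue
        stemmed = stem(token)
        out.append(stemmed)
        if expand_synonyms:
            out.extend(SYNONYMS.get(stemmed, []))
    return out
```

For example, `analyze("The best Laptops for TV", expand_synonyms=True)` yields `["best", "laptop", "notebook", "tv", "television"]`.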
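Hybrid search has to merge two incompatible score scales: unbounded BM25 scores and bounded vector similarities. One common approach, sketched here in plain Python, is min-max normalization of each result list followed by a weighted sum. The 0.5 default weight is an arbitrary assumption that relevance tuning (for example, driven by the click-through feedback loop) would adjust. Recent OpenSearch versions can perform this kind of fusion server-side with a hybrid query and a normalization search pipeline; reciprocal rank fusion is another frequently used alternative.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to [0, 1] so lexical and semantic scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    vector: dict[str, float],
    weight: float = 0.5,
    k: int = 10,
) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; `weight` is the share given to the vector side.
    A document missing from one result list contributes 0 for that side."""
    lex, sem = min_max(bm25), min_max(vector)
    combined = {
        doc: (1 - weight) * lex.get(doc, 0.0) + weight * sem.get(doc, 0.0)
        for doc in lex.keys() | sem.keys()
    }
    # Highest score first; ties broken by document id for a deterministic order.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Usage with made-up scores: "b" ranks well on both sides and wins overall.
bm25_hits = {"a": 12.0, "b": 8.0, "c": 2.0}
vector_hits = {"b": 0.91, "c": 0.88, "d": 0.60}
ranked = hybrid_rank(bm25_hits, vector_hits)
```

Setting `weight=0.0` reduces this to pure BM25 ranking, which is also a natural fallback when the vector leg is unavailable.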
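The completion suggester answers weighted prefix lookups from an in-memory structure (an FST in Lucene). The sketch below is a simplified stand-in that uses a sorted list and `bisect`; its `upsert` method plays the role of the real-time updates that would arrive from Kinesis Data Streams. The class and method names are hypothetical.

```python
import bisect


class PrefixSuggester:
    """Sorted-list stand-in for a completion suggester: weighted prefix lookup."""

    def __init__(self) -> None:
        self.terms: list[str] = []        # kept sorted for binary search
        self.weights: dict[str, int] = {}

    def upsert(self, term: str, weight: int) -> None:
        # Real-time update path, e.g. one call per record consumed from a stream.
        term = term.lower()
        if term not in self.weights:
            bisect.insort(self.terms, term)
        self.weights[term] = weight

    def suggest(self, prefix: str, size: int = 5) -> list[str]:
        prefix = prefix.lower()
        # Jump to the first term >= prefix; matches are contiguous from there.
        start = bisect.bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        # Highest weight first, alphabetical among equal weights.
        return sorted(matches, key=lambda t: (-self.weights[t], t))[:size]
```

For instance, after upserting "OpenSearch" (weight 50), "open source" (20), and "operator" (5), `suggest("ope")` returns `["opensearch", "open source", "operator"]`.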
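Graceful degradation through a circuit breaker can be sketched as follows, under stated assumptions. After `failure_threshold` consecutive failures, the breaker opens and routes queries to a cheaper `degraded` callable for `reset_after_s` seconds. That fallback might be BM25-only search with no vector leg, or cached results. The breaker then lets a trial request through. The thresholds, names, and the meaning of "degraded" are hypothetical choices. This is an application-level breaker in the query Lambda, distinct from OpenSearch's own memory circuit breakers.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class DegradingCircuitBreaker:
    """Falls back to a degraded query path after repeated failures instead of erroring."""

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock                 # injectable for testing
        self.failures = 0                  # consecutive failures
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after_s:
            return "half-open"             # cooldown over: allow a trial call
        return "open"

    def call(self, full: Callable[[], T], degraded: Callable[[], T]) -> T:
        if self.state == "open":
            return degraded()              # skip the expensive path entirely
        try:
            result = full()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the breaker
            return degraded()
        self.failures = 0
        self.opened_at = None              # a success closes the breaker
        return result
```

Users see lower-quality results while the breaker is open, rather than errors. Closing on the first successful trial keeps the sketch simple; a production breaker might require several successes.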

AWS Services You'll Use

OpenSearch Service, Lambda, Kinesis Data Streams, Kinesis Data Firehose, S3, CloudWatch

Challenge Details

Path: Data-Intensive Systems
Difficulty: Advanced
Duration: 75 min
Plan: Pro

Architecture Patterns You'll Learn

inverted index, hybrid search, index lifecycle management, circuit breaker, time-based indexing
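The inverted index is the structure behind the sparse (BM25) side of search. It maps each term to the documents that contain it, plus per-document statistics used for scoring. A toy in-memory version with Okapi BM25 scoring might look like this, using k1 = 1.2 and b = 0.75 (the usual Lucene defaults) and a Lucene-style IDF. It assumes tokens are already analyzed and that each document id is added once. Segments, compression, and deletes, which a real engine handles, are ignored.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy inverted index: term -> {doc_id: term frequency}, scored with BM25."""

    def __init__(self, k1: float = 1.2, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_len: dict[str, int] = {}  # document length in tokens

    def add(self, doc_id: str, tokens: list[str]) -> None:
        # Assumes each doc_id is added only once (no update/delete handling).
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query_tokens: list[str]) -> list[tuple[str, float]]:
        n = len(self.doc_len)
        if n == 0:
            return []
        avgdl = sum(self.doc_len.values()) / n
        scores: dict[str, float] = defaultdict(float)
        for term in query_tokens:
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Lucene-style IDF, always positive.
            idf = math.log(1 + (n - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                # Length normalization: longer-than-average docs are penalized via b.
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / avgdl)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Only documents that appear in the posting lists of the query terms are ever scored, which is what makes an inverted index fast compared with scanning every document.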

Why This Challenge?

Unlike whiteboard exercises or multiple-choice quizzes, this challenge requires you to design a real architecture with actual AWS services, evaluate trade-offs, and defend your decisions. Our automated validators check your design against production-grade criteria. Once you complete the challenge, it appears in your verified portfolio along with your architecture diagram and design rationale.

Ready to design this for real?

Get the full scenario, design your architecture using real AWS services, and validate against production-grade criteria. Your completed challenge shows up in your verified portfolio.
