Real-Time Search Engine
Search is one of the most complex distributed systems problems: it combines text processing, relevance ranking, low-latency retrieval, and index management in a system that users expect to respond instantly. In this challenge, you will design a real-time search engine on AWS that indexes millions of documents with sub-second index freshness and sub-100ms query latency.
The indexing pipeline starts with document ingestion from S3 and API sources. Lambda functions extract text, normalize content, and produce both the text fields used for sparse (BM25) scoring and dense vector embeddings. Amazon OpenSearch Service provides the search backend, with a cluster architecture designed for the workload: dedicated master nodes for cluster stability, hot data nodes with gp3 EBS for active indices, and UltraWarm nodes for time-series data older than 30 days. Index design uses time-based indices with aliases for zero-downtime reindexing, plus a custom analyzer chain for multi-language support that covers tokenization, stemming, synonym expansion, and stop-word removal.
The query processing pipeline is a Lambda function that parses the user query, expands it with synonyms, runs hybrid search (combining BM25 relevance with vector similarity for semantic understanding), applies business rules for boosting and filtering, and returns results with highlighted snippets. Auto-complete uses OpenSearch's completion suggester with a separate lightweight index updated in real time via Kinesis Data Streams. Search analytics capture every query and click-through event in Kinesis Data Firehose, feeding a relevance feedback loop that improves ranking over time.
The architecture also includes index lifecycle policies (Index State Management in OpenSearch) that automatically migrate old indices to warm storage, and a circuit breaker that gracefully degrades search quality under extreme load rather than failing entirely.
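To make the hybrid search step concrete, here is a minimal Python sketch of one way to fuse BM25 and vector scores into a single ranking. The description does not say how the two scores are combined, so the min-max normalization, the weighted sum, and the names `min_max_normalize`, `hybrid_rank`, and `bm25_weight` are assumptions made for illustration, not the challenge's prescribed method.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1]; if every score is equal, map each to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    bm25_weight: float = 0.5,  # hypothetical tuning knob, not from the source
) -> List[Tuple[str, float]]:
    """Combine normalized BM25 and vector scores with a weighted sum.

    A document missing from one result set contributes 0.0 for that side.
    Results are sorted by descending score, then by document id for stability.
    """
    bm25_n = min_max_normalize(bm25)
    vector_n = min_max_normalize(vector)
    combined = {
        doc_id: bm25_weight * bm25_n.get(doc_id, 0.0)
        + (1.0 - bm25_weight) * vector_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vector_n)
    }
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, calling `hybrid_rank({"a": 10.0, "b": 5.0, "c": 0.0}, {"b": 0.9, "c": 0.1, "d": 0.5})` with the default weight ranks `b` first, because it is the only document that scores well on both the lexical and the semantic side. Normalizing first matters because raw BM25 scores are unbounded while similarity scores usually fall in a fixed range.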
This challenge teaches search engine architecture, index design, relevance tuning, and the operational patterns for running OpenSearch at scale.
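One of the operational patterns mentioned above, degrading search quality under load instead of failing, can be sketched in a few lines of Python. This is a simplified breaker-style switch rather than a full circuit breaker with open and half-open states. The class name `DegradingBreaker`, the window size, the trip ratio, and the choice of `"bm25_only"` as the degraded mode are assumptions for illustration; only the 100 ms latency budget comes from the challenge description.

```python
from collections import deque


class DegradingBreaker:
    """Picks a cheaper search mode when too many recent queries are slow."""

    def __init__(
        self,
        latency_budget_ms: float = 100.0,  # from the sub-100ms query target
        window: int = 20,                  # hypothetical: number of recent queries tracked
        trip_ratio: float = 0.5,           # hypothetical: share of slow queries that trips
    ) -> None:
        self.latency_budget_ms = latency_budget_ms
        self.trip_ratio = trip_ratio
        # Each entry is True when that query exceeded the latency budget.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms: float) -> None:
        """Record the observed latency of one completed query."""
        self.samples.append(latency_ms > self.latency_budget_ms)

    def mode(self) -> str:
        """Return "hybrid" normally, or "bm25_only" while the window is too slow."""
        if not self.samples:
            return "hybrid"
        slow_ratio = sum(self.samples) / len(self.samples)
        return "bm25_only" if slow_ratio >= self.trip_ratio else "hybrid"
```

A query handler would call `mode()` before each search and skip the vector leg while the breaker reports `"bm25_only"`, then call `record()` with the measured latency. Because the window slides, the switch returns to `"hybrid"` on its own once recent queries are fast again.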
Challenge Details
- Path: Data-Intensive Systems
- Difficulty: Advanced
- Duration: 75 min
- Plan: Pro
Why This Challenge?
Unlike whiteboard exercises or multiple-choice quizzes, this challenge requires you to design a real architecture with actual AWS services, evaluate trade-offs, and defend your decisions. Our automated validators check your design against production-grade criteria. Once you complete it, the challenge appears in your verified portfolio with your architecture diagram and design rationale.
More from Data-Intensive Systems
Real-Time Analytics Dashboard
Design an analytics platform that processes billions of events and renders dashboards with sub-second freshness.
Advanced · 80 min
IoT Data Ingestion Pipeline
Design a pipeline that ingests, processes, and analyzes sensor data from thousands of IoT devices.
Advanced · 75 min
Collaborative Document Editor
Design a real-time collaborative editor where multiple users edit the same document simultaneously.
Advanced · 80 min
Ready to design this for real?
Get the full scenario, design your architecture using real AWS services, and validate against production-grade criteria. Your completed challenge shows up in your verified portfolio.