Candidate Generation is a critical pattern in large-scale recommendation systems that efficiently identifies a subset of potentially relevant items from a massive corpus (often billions of items) for further processing. This pattern serves as the first stage in multi-stage recommendation systems, focusing on high recall rather than precision to ensure potentially relevant items aren't missed before more computationally intensive ranking stages.

The key insight is that generating a manageable set of candidates (typically hundreds to thousands) makes it feasible to apply more sophisticated, computationally expensive ranking models downstream. Without effective candidate generation, real-time recommendation at scale would be impossible due to the prohibitive cost of evaluating all possible items.

## Pattern Structure

A well-designed candidate generation system typically consists of four main components:

### 1. Multiple Retrieval Sources

- **Purpose**: Generate initial candidate pools using varied methods
- **Common sources**:
    - Embedding-based retrieval (similarity search)
    - Graph-based retrieval (connections, co-engagement)
    - Query-based retrieval (search, semantic matching)
    - Popularity-based retrieval (trending, top items)
    - Personalization-based retrieval (user history, preferences)
    - Exploration-focused retrieval (diversity, novelty)

### 2. Candidate Scoring & Filtering

- **Purpose**: Efficiently score and filter candidates from each source
- **Components**:
    - Lightweight scoring models
    - Business rule filters
    - Diversity enforcers
    - Deduplication logic
    - Negative feedback filters

### 3. Candidate Blending & Merging

- **Purpose**: Combine candidates from multiple sources
- **Approaches**:
    - Round-robin selection
    - Score-based mixing
    - Adaptive blending
    - Quota allocation by source
    - Learning-to-blend models
### 4. Candidate Set Output

- **Purpose**: Create final candidate set for downstream ranking
- **Considerations**:
    - Size calibration
    - Diversity guarantees
    - Feature attachment
    - Scoring signals forwarding
    - Performance metrics collection

## Implementation Details

### Retrieval Source Comparison

|Retrieval Source|Strengths|Weaknesses|When to Use|
|---|---|---|---|
|Embedding-based|- Captures semantic similarity<br>- Works for cold items<br>- Scalable with ANN|- Quality depends on embeddings<br>- May require frequent updates<br>- Computationally intensive|- Content-based recommendation<br>- Items with rich features<br>- Large, diverse catalogs|
|Graph-based|- Captures social/interaction patterns<br>- Strong for social products<br>- Often highly relevant|- Requires social graph/interactions<br>- May reinforce filter bubbles<br>- Cold start issues|- Social networks<br>- Products with co-engagement<br>- Community-based platforms|
|Query/Search-based|- High precision for explicit intents<br>- Leverages user's current context<br>- Good for task completion|- Requires query/intent signal<br>- May have limited coverage<br>- Can be over-specific|- Search interfaces<br>- Shopping experiences<br>- Knowledge-seeking contexts|
|Popularity-based|- Simple to implement<br>- Always has candidates<br>- Good for cold users|- Not personalized<br>- Popularity bias<br>- May surface obvious items|- New users<br>- Trending sections<br>- Fallback mechanism|
|History-based|- Highly personalized<br>- Strong engagement signals<br>- Contextually relevant|- Cold start for new users<br>- Can create echo chambers<br>- Requires interaction history|- Return users<br>- Content with recurring interest<br>- Engagement-focused systems|

### Candidate Set Size Considerations

|Stage|Typical Size|Considerations|
|---|---|---|
|Individual Source|100-1,000 per source|- Enough diversity within source<br>- Quality vs. quantity tradeoff<br>- Source-specific tuning|
|After Merging|500-5,000|- Coverage of different item types<br>- Computational budget for next stage<br>- Quality distribution across sources|
|Final Output|500-2,000|- Downstream ranking capacity<br>- Response time requirements<br>- Coverage guarantees|

### Scoring Model Approaches

|Approach|Description|Advantages|Disadvantages|
|---|---|---|---|
|Lightweight ML|Simple models (linear, shallow trees)|- Fast inference<br>- Easy to maintain<br>- Low resource requirements|- Limited expressiveness<br>- May miss complex patterns|
|Heuristic Scoring|Rules-based scoring functions|- No training required<br>- Easily interpretable<br>- Predictable behavior|- Requires domain expertise<br>- Difficult to optimize<br>- Less personalized|
|Embedding Distance|Similarity metrics between embeddings|- Efficient with ANN<br>- Works with limited signals<br>- Naturally handles content-based similarity|- Quality depends on embeddings<br>- May require calibration<br>- Harder to incorporate business logic|
|Pre-computed Scores|Scores calculated offline and stored|- Extremely fast at serving time<br>- Can use complex models offline<br>- Consistent scoring|- Stale scores<br>- Storage overhead<br>- Limited real-time signals|

## Real-World Scenarios

### Multiple Retrieval Sources

- **Social Graph-based**: Content from connections, groups, pages followed
- **Embedding-based**: Similar content to previously engaged items using existing embedding infrastructure
- **Interest-based**: Content matching user's inferred interests and affinities
- **Exploration**: Novel content to discover new user interests
- **Trending**: Popular content currently gaining engagement

### Scoring and Filtering

- Lightweight ML models score each candidate's likelihood of engagement
- Integrity filters remove policy-violating content
- Personalized filters remove content types users have shown negative signals for
- Diversity mechanisms ensure variety in content types
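The scoring-and-filtering steps described above can be sketched as a small pipeline. This is a minimal illustration, not a reference implementation: the `Candidate` fields, the toy linear scoring function, and the filter inputs are all assumptions standing in for a real lightweight model and real business rules.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    item_id: str
    source: str                              # which retrieval source produced it
    features: dict = field(default_factory=dict)
    score: float = 0.0

def lightweight_score(c: Candidate) -> float:
    # Stand-in for a small model (e.g. logistic regression):
    # a toy linear combination of two assumed features.
    return 0.7 * c.features.get("affinity", 0.0) + 0.3 * c.features.get("freshness", 0.0)

def score_and_filter(candidates, blocked_items, negative_topics, k):
    seen, kept = set(), []
    for c in candidates:
        if c.item_id in seen:                # deduplication across sources
            continue
        seen.add(c.item_id)
        if c.item_id in blocked_items:       # business-rule / integrity filter
            continue
        if c.features.get("topic") in negative_topics:  # negative-feedback filter
            continue
        c.score = lightweight_score(c)
        kept.append(c)
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:k]                          # top-k survivors go on to blending
```

Deduplication runs before the filters so that a duplicate from a second source never costs a second scoring pass; in a real system the scoring call is the expensive step, so filters are ordered cheapest-first.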
### Candidate Blending

- Adaptive blending based on user's historical source engagement
- Guaranteed quotas for important sources (e.g., close connections)
- Time-dependent mixing (e.g., more exploration on weekends)
- Different blending strategies for different surfaces (Feed vs. Explore)

### Output and Metrics

- Final set of 1,000-2,000 candidates passed to ranking
- Rich features attached for downstream ranking
- Tracking of source diversity, recall metrics, and selection rate
- A/B testing infrastructure to evaluate different generation strategies

## Key Tradeoffs and Decisions

### Recall vs. Efficiency

|Approach|Description|When to Use|
|---|---|---|
|High Recall Focus|Generate larger candidate sets, more sources|- Quality-critical applications<br>- Where missing good items is costly<br>- When ranking capacity is high|
|Efficiency Focus|Smaller candidate sets, fewer sources|- Latency-sensitive applications<br>- Mobile/resource-constrained environments<br>- Very high QPS services|
|Adaptive Approach|Dynamically adjust recall based on context|- Mixed-use scenarios<br>- User-dependent quality requirements<br>- Systems with varying load patterns|

### Source Diversity vs. Quality

|Emphasis|Implementation Approach|Considerations|
|---|---|---|
|Source Diversity|- Equal allocation across sources<br>- Guaranteed minimums per source<br>- Diversity-aware selection|- Better exploration<br>- Avoids source dominance<br>- May include lower-quality items|
|Source Quality|- Quality-weighted allocation<br>- Dynamic source evaluation<br>- Elimination of underperforming sources|- Maximizes short-term engagement<br>- More predictable quality<br>- Risk of filter bubbles|
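The quota-plus-round-robin blending mentioned earlier can be sketched as follows. The function and its data shapes are illustrative assumptions: each source contributes an ordered list of item ids, guaranteed quotas are honored first, and remaining slots are filled round-robin so no single source dominates.

```python
def blend(source_pools, quotas, total):
    """Quota-first blending: each quota'd source gets its guaranteed slots,
    then remaining slots are filled round-robin across all sources.
    `source_pools` maps source name -> list of item ids, best first."""
    final, seen = [], set()

    def take(source, n):
        count = 0
        for item in source_pools.get(source, []):
            if count >= n or len(final) >= total:
                break
            if item not in seen:             # dedupe across sources
                seen.add(item)
                final.append(item)
                count += 1

    # Pass 1: honor per-source quotas (e.g. guarantee close connections).
    for source, quota in quotas.items():
        take(source, quota)

    # Pass 2: round-robin over all sources until the set is full.
    pools = {s: iter(items) for s, items in source_pools.items()}
    while len(final) < total and pools:
        for s in list(pools):
            try:
                item = next(pools[s])
            except StopIteration:
                del pools[s]                 # exhausted source drops out
                continue
            if item not in seen:
                seen.add(item)
                final.append(item)
            if len(final) >= total:
                break
    return final
```

Adaptive blending would replace the static `quotas` dict with weights learned from each source's historical selection rate; the two-pass structure stays the same.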
### Freshness vs. Stability

|Approach|Description|Tradeoffs|
|---|---|---|
|Frequent Updates|Continuous or very frequent source updates|- Better freshness<br>- Captures trends quickly<br>- Higher computational cost|
|Scheduled Updates|Regular but less frequent updates|- Predictable performance<br>- Lower resource usage<br>- Some staleness|
|Hybrid Approach|Frequent updates for critical sources, scheduled for others|- Balances freshness and cost<br>- More complex implementation<br>- Good compromise for most systems|

## Case Studies from Research Papers

### 1. YouTube's Candidate Generation for Deep Neural Networks

YouTube's seminal paper on their recommendation system describes a two-stage approach with dedicated candidate generation:

- **Key innovations**:
    - User embedding based on watch history
    - Candidate generation uses collaborative filtering augmented with deep neural networks
    - Explicit handling of freshness and diversity
    - Multiple candidate sources combined with different weights
- **Impact**:
    - Demonstrated the effectiveness of separating candidate generation from ranking
    - Showed how to effectively incorporate deep learning into the retrieval phase
    - Established patterns for large-scale video recommendation systems

### 2. Pinterest's Related Pin Recommendation

Pinterest developed a multi-stage recommendation system where candidate generation plays a critical role:

- **Key innovations**:
    - Utilizing both graph-based and embedding-based retrieval
    - Multi-stage candidate generation (broad retrieval → filtering → blending)
    - PinSage graph neural network for embedding generation
    - Importance of visual similarity in candidate selection
- **Impact**:
    - Showed how to effectively combine multiple retrieval approaches
    - Demonstrated the value of specialized embeddings for specific content types
    - Highlighted the importance of visual understanding in content recommendation

## Common Pitfalls and Challenges
### 1. Source Dominance

**Problem**: One source consistently outperforms others and begins to dominate the candidate pool, reducing diversity.

**Solutions**:

- Implement per-source quotas or guarantees
- Use adaptive blending that considers historical performance
- Apply diversity constraints at the final selection stage
- Periodically evaluate and tune source performance

### 2. Cold Start Problems

**Problem**: New users or items lack the signals needed for effective candidate generation.

**Solutions**:

- Develop specialized sources for cold start scenarios
- Use content-based retrieval for new items
- Implement exploration strategies for new users
- Fall back to popularity-based retrieval when personalization signals are weak

### 3. Feedback Loops

**Problem**: Sources that perform well get more exposure, which generates more training data and further inflates their measured performance, creating a self-reinforcing loop that starves other sources of exposure.

**Solutions**:

- Implement exploration policies for underexposed sources
- Use counterfactual learning to estimate performance of less-selected sources
- Apply regularization to prevent overfitting to historical patterns
- Periodically reset source performance metrics

### 4. Efficiency and Scaling

**Problem**: As corpus size grows, efficient candidate generation becomes increasingly challenging.

**Solutions**:

- Implement sharding and partitioning strategies
- Use approximate retrieval methods with configurable precision-recall tradeoffs
- Develop caching mechanisms for common queries
- Employ hierarchical retrieval for large-scale sources

## Implementation Best Practices

### 1. Source Design and Implementation

- **Diversity in methods**: Implement sources using fundamentally different approaches
- **Specialized sources**: Create dedicated sources for different user segments and contexts
- **Modularity**: Design sources to be independently developed, deployed, and evaluated
- **Fallback mechanisms**: Ensure robustness when specific sources fail or underperform
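The modularity and fallback practices above can be sketched as a small source interface. The class names, the history/popularity sources, and the fallback policy here are hypothetical; the point is only that each source hides its retrieval method behind a common `retrieve` call, so sources can be developed, deployed, and swapped independently.

```python
from abc import ABC, abstractmethod

class RetrievalSource(ABC):
    """Common interface: each source is independently developed and deployed."""
    name: str

    @abstractmethod
    def retrieve(self, user_id: str, limit: int) -> list:
        ...

class HistorySource(RetrievalSource):
    name = "history"
    def __init__(self, user_history):
        self.user_history = user_history     # user_id -> recent item ids
    def retrieve(self, user_id, limit):
        items = self.user_history.get(user_id)
        if not items:                        # cold-start user: no signal
            raise LookupError(f"no history for {user_id}")
        return items[:limit]

class PopularitySource(RetrievalSource):
    name = "popularity"
    def __init__(self, top_items):
        self.top_items = top_items
    def retrieve(self, user_id, limit):
        return self.top_items[:limit]        # not personalized, always available

def retrieve_with_fallback(sources, fallback, user_id, limit):
    """Query each source; if a source fails or returns nothing,
    substitute results from a robust fallback (here, popularity)."""
    pools = {}
    for source in sources:
        try:
            items = source.retrieve(user_id, limit)
        except Exception:                    # a failing source must not fail the request
            items = []
        pools[source.name] = items or fallback.retrieve(user_id, limit)
    return pools
```

Falling back to popularity when a personalized source has no signal mirrors the cold-start solution listed above: new users still get a full candidate pool, just a less personalized one.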
### 2. Scoring and Filtering

- **Lightweight models**: Use simple, efficient models specifically designed for candidate scoring
- **Feature selection**: Carefully select features that provide the most signal with minimal computation
- **Early termination**: Implement mechanisms to skip detailed evaluation when possible
- **Batched processing**: Score candidates in batches for computational efficiency

### 3. Candidate Blending

- **Adaptive allocation**: Adjust source weightings based on historical performance
- **Contextual blending**: Consider user context when determining source mix
- **Quota-based approach**: Guarantee minimum representation for important sources
- **Learning-based blending**: Train models to optimize the blending strategy

### 4. Monitoring and Evaluation

- **Per-source metrics**: Track performance of individual sources
- **Diversity metrics**: Measure diversity in the final candidate set
- **Coverage analysis**: Ensure adequate coverage across item types and user segments
- **A/B testing**: Regularly test improvements to candidate generation strategies

## Variants and Extensions

### 1. Query-Aware Candidate Generation

Adapts candidate generation based on explicit query or contextual information:

- **Query understanding**: Extract intent and entities from explicit queries
- **Context injection**: Incorporate session context into retrieval
- **Intent-specific sources**: Activate different sources based on inferred intent
- **Hybrid retrieval**: Combine semantic and lexical matching for queries

### 2. Sequential Candidate Generation

Incorporates sequential user behavior into the retrieval process:

- **Session-based retrieval**: Generate candidates based on current session activity
- **Sequential patterns**: Identify common patterns in user behavior sequences
- **Next-best-action**: Predict and retrieve candidates for likely next actions
- **Journey-aware retrieval**: Consider user's position in a longer journey or funnel
### 3. Multi-Objective Candidate Generation

Generates candidates optimized for multiple, potentially competing objectives:

- **Objective-specific sources**: Dedicated sources for different objectives
- **Pareto-efficient selection**: Select candidates optimal across multiple dimensions
- **Constrained generation**: Apply constraints based on secondary objectives
- **Balanced pooling**: Ensure representation of candidates serving different objectives

### 4. Federated Candidate Generation

Distributes the candidate generation process across multiple systems:

- **Cross-silo retrieval**: Retrieve candidates from separate data stores or services
- **Specialized engines**: Use purpose-built retrieval engines for different content types
- **API-based integration**: Standardized interfaces for diverse retrieval systems
- **Heterogeneous ranking**: Handle differently structured candidates from various sources

## Evaluation and Metrics

### Source-Level Metrics

|Metric|Description|Importance|
|---|---|---|
|Recall|Percentage of relevant items retrieved by the source|Critical for ensuring good items aren't missed|
|Precision|Percentage of retrieved items that are relevant|Important for efficient downstream processing|
|Coverage|Percentage of item space that the source can retrieve from|Ensures comprehensive retrieval capabilities|
|Diversity|Variety of items retrieved by the source|Prevents narrowing of recommendations|
|Latency|Time taken to generate candidates|Critical for real-time applications|

### System-Level Metrics

|Metric|Description|Target|
|---|---|---|
|Final Recall|Relevant items in final candidate set vs. all relevant items|Typically 80%+|
|Source Contribution|Percentage of final recommendations from each source|Should be balanced unless intentionally skewed|
|Throughput|Candidates generated per second|System-dependent; often thousands of requests per second, each yielding hundreds to thousands of candidates|
|Resource Usage|CPU/memory/network resources consumed|Should be optimized for cost-efficiency|
|End-to-End Latency|Total time from request to candidate set generation|Application-dependent, often <50ms|

### Quality Metrics

|Metric|Description|Application|
|---|---|---|
|Selection Rate|Percentage of candidates selected by ranking stage|Indicates candidate quality|
|Online Performance Lift|A/B test improvement from better candidates|Ultimate measure of system quality|
|Novelty|Percentage of previously unseen items in candidate set|Important for discovery and exploration|
|Personalization|Variation in candidate sets across different users|Indicates personalization effectiveness|
|Cold Start Performance|Quality metrics for new users/items|Measures system adaptability|

## When to Use Candidate Generation

This pattern is best suited for:

- Large-scale recommendation systems with millions to billions of items
- Multi-stage recommendation architectures
- Systems with strict latency requirements
- Applications where personalization is important
- Use cases requiring a balance of relevance, diversity, and exploration

It may be less appropriate for:

- Small item catalogs where exhaustive evaluation is feasible
- Simple recommendation scenarios with limited personalization
- Batch recommendation systems without real-time constraints
- Systems where extremely high precision is more important than recall

## Potential Interview Questions

1. **Conceptual Understanding**
    - Why is candidate generation necessary in large-scale recommendation systems?
    - How would you balance diversity and quality in a candidate generation system?
    - What are the key differences between retrieval-focused and ranking-focused models?
    - How would you handle the cold start problem in candidate generation?
    - Explain how feedback loops can affect candidate generation and how to mitigate them.
2. **System Design**
    - Design a candidate generation system for a social media feed with billions of potential items.
    - How would you implement efficient real-time candidate generation for a video recommendation platform?
    - What architecture would you use to combine candidates from multiple heterogeneous sources?
    - How would you design a candidate generation system that adapts to changing user preferences?
    - Describe how you would build a candidate generation system that optimizes for multiple objectives.
3. **Implementation Details**
    - What metrics would you use to evaluate and compare different candidate sources?
    - How would you implement an efficient blending strategy for multiple candidate sources?
    - What techniques can mitigate popularity bias in candidate generation?
    - How would you implement exploration in a candidate generation system?
    - What approaches can make candidate generation more computationally efficient?
4. **Specialized Scenarios**
    - How would candidate generation differ for short-form vs. long-form video content?
    - What special considerations would you have for candidate generation in a marketplace with buyers and sellers?
    - How would you adapt candidate generation for a system that needs to optimize for both engagement and revenue?
    - What approaches would you use for multilingual or cross-cultural candidate generation?
    - How would your candidate generation strategy change for mobile vs. desktop environments?