LLM-Powered Recommendation Platform
I built a near-production-grade AI shopping assistant fully deployed on Google Cloud Platform, combining Retrieval-Augmented Generation (RAG), BigQuery vector search (ScaNN), and Gemini 2.0 Flash to deliver semantic product recommendations and sentiment-aware review summaries from millions of Amazon reviews.
The system addresses critical limitations of traditional approaches: keyword-based search fails on natural language queries, and collaborative filtering collapses on the cold-start problem. By grounding every response in dense semantic retrieval and LLM-synthesized insights, the system handles queries like "gentle face wash for sensitive skin" even when products are described as "mild cleanser for delicate complexions."
System Architecture
Data Pipeline
The pipeline starts from the Amazon Reviews 2023 dataset (McAuley Lab, UCSD). The full corpus totals 571.54M reviews from 54.51M users; this project works with three of its categories (All_Beauty, Software, Baby_Products).
Backend — Request Flow
The FastAPI backend applies auth middleware (API-key validation), CORS, and rate limiting. The LLM returns structured JSON with the fields product_title, features (Markdown bullets), sentiment_summary, and recommendation_rationale; the prompt is engineered so that the output renders consistently in the frontend.
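A structured reply is only useful if the backend rejects malformed ones before they reach the UI. The following is a minimal sketch of that validation step, not the project's actual code: the `Recommendation` class and `parse_recommendation` helper are hypothetical names, and treating all four fields as strings (with features as one Markdown string) is an assumption.

```python
import json
from dataclasses import dataclass

# Field names taken from the description above; the string-only typing is an assumption.
REQUIRED_FIELDS = (
    "product_title",
    "features",
    "sentiment_summary",
    "recommendation_rationale",
)


@dataclass(frozen=True)
class Recommendation:
    product_title: str
    features: str  # Markdown bullet list, rendered by the frontend
    sentiment_summary: str
    recommendation_rationale: str


def parse_recommendation(raw: str) -> Recommendation:
    """Parse one LLM reply and raise ValueError if it is malformed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("reply must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    non_string = [n for n in REQUIRED_FIELDS if not isinstance(payload[n], str)]
    if non_string:
        raise ValueError(f"fields must be strings: {non_string}")
    # Extra keys are ignored so a chattier model does not break rendering.
    return Recommendation(**{name: payload[name] for name in REQUIRED_FIELDS})
```

A caller might catch the ValueError and retry the generation or fall back to a plain-text answer; which of those fits best depends on the latency budget.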
GCP Infrastructure (Terraform IaC)
Technical Deep Dive: ScaNN Vector Search
BigQuery's vector search backend uses ScaNN (Scalable Nearest Neighbors) — Google's production-grade ANN library — combining two innovations:
Anisotropic Vector Quantization (AVQ)
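Traditional Product Quantization (PQ) minimizes reconstruction error uniformly in every direction. AVQ instead uses a direction-dependent loss: with residual r = v − ṽ between a vector v and its quantized form ṽ, it splits r into a component r∥ parallel to v and a component r⊥ orthogonal to v, and minimizes h∥·‖r∥‖² + h⊥·‖r⊥‖² with h∥ ≥ h⊥. Error along v distorts inner-product scores most for the queries that match v best, so penalizing it more heavily reduces false-negative retrievals. In the published ScaNN benchmarks, AVQ achieves higher recall than standard PQ at equivalent compression ratios.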
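The sketch below illustrates a direction-weighted quantization loss in the form used by the published ScaNN work, as far as that formulation is assumed here: the quantization error is split into a part parallel to the original vector and a part orthogonal to it, and the parallel part is weighted more heavily. The weights 4.0 and 1.0 and all function names are illustrative assumptions, not values from BigQuery or ScaNN.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def split_residual(x, x_quantized):
    """Split r = x - x_quantized into parts parallel and orthogonal to x."""
    norm_sq = dot(x, x)
    if norm_sq == 0:
        raise ValueError("x must be non-zero")
    residual = [xi - qi for xi, qi in zip(x, x_quantized)]
    scale = dot(residual, x) / norm_sq
    parallel = [scale * xi for xi in x]
    orthogonal = [ri - pi for ri, pi in zip(residual, parallel)]
    return parallel, orthogonal


def reconstruction_loss(x, x_quantized):
    """Uniform squared error, the objective of standard PQ."""
    residual = [xi - qi for xi, qi in zip(x, x_quantized)]
    return dot(residual, residual)


def anisotropic_loss(x, x_quantized, h_parallel=4.0, h_orthogonal=1.0):
    """Squared error with the parallel component weighted more heavily."""
    parallel, orthogonal = split_residual(x, x_quantized)
    return h_parallel * dot(parallel, parallel) + h_orthogonal * dot(orthogonal, orthogonal)


x = [1.0, 0.0]
shrunk = [0.8, 0.0]   # error lies along x
shifted = [1.0, 0.2]  # error is orthogonal to x

# Both candidates have the same reconstruction error, but the
# anisotropic loss prefers the one whose error is orthogonal to x.
print(reconstruction_loss(x, shrunk), reconstruction_loss(x, shifted))
print(anisotropic_loss(x, shrunk), anisotropic_loss(x, shifted))
```

For a query pointing in the direction of x, the shrunk candidate changes the inner product while the shifted one leaves it intact, which is why the two errors are not treated as equally costly.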
Tree-AH (Asymmetric Hashing with Clustering)
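Vectors are partitioned into a hierarchical k-means tree, and each vector is stored as a quantized residual relative to its partition centroid. At query time, only the partitions whose centroids score best against the query are explored, and each candidate in them is scored as ⟨q, v⟩ ≈ ⟨q, Cᵢ⟩ + ⟨q, r̃⟩, where Cᵢ is the centroid of the partition containing v and r̃ is the quantized residual v − Cᵢ. The hashing is asymmetric because the query stays unquantized while the stored vectors are compressed. Result: index construction is roughly O(N log N), and query cost grows sublinearly in N because only a small fraction of partitions is scanned.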
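The partition-pruning idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: the centroids are fixed by hand instead of being trained with k-means, the tree has a single level, and candidates are scored with exact squared distances instead of quantized ones. The function names are hypothetical and do not correspond to any BigQuery or ScaNN API.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_partitions(vectors, centroids):
    """Map each centroid index to the ids of the vectors closest to it."""
    partitions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        nearest = min(
            range(len(centroids)),
            key=lambda i: squared_distance(vec, centroids[i]),
        )
        partitions[nearest].append(vec_id)
    return partitions


def search(query, vectors, centroids, partitions, top_k=3, partitions_to_probe=1):
    """Rank partitions by centroid distance, then scan only the closest ones."""
    ranked = sorted(
        range(len(centroids)),
        key=lambda i: squared_distance(query, centroids[i]),
    )
    candidates = [
        vec_id for i in ranked[:partitions_to_probe] for vec_id in partitions[i]
    ]
    candidates.sort(key=lambda vec_id: squared_distance(query, vectors[vec_id]))
    return candidates[:top_k]


vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [9.9, 0.1]]
centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
partitions = assign_partitions(vectors, centroids)

# Only the partition around [5.0, 5.0] is scanned: two vectors instead of five.
print(search([5.0, 5.2], vectors, centroids, partitions, top_k=2))  # prints [2, 3]
```

Raising partitions_to_probe trades latency for recall; probing every partition reduces to a brute-force scan.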
Evaluation Results
Ground-truth evaluation on 500 test queries spanning specific product, comparison, gift recommendation, and open-ended discovery types:
| Metric | Our System | Baseline (vector-only) | Industry Benchmark |
|---|---|---|---|
| MRR | 0.88 | 0.76 | ~0.80 |
| nDCG@3 | 0.85 | 0.70 | ~0.78 |
| Category Coverage | 92% | 84% | ~88% |
| Avg. Search Latency | ~6s | ~1s | ~2s |
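For reference, the two ranking metrics in the table can be computed as in the sketch below. It uses the standard definitions with linear gains for DCG; the exact variant behind the reported numbers is not stated here, so the gain function and the helper names are assumptions.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids, gain_by_id, k=3):
    """DCG of the returned ranking divided by the DCG of the ideal ranking."""
    gains = [gain_by_id.get(item, 0.0) for item in ranked_ids]
    ideal = sorted(gain_by_id.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


runs = [(["a", "b"], {"a"}), (["a", "b"], {"b"})]
print(mean_reciprocal_rank(runs))  # prints 0.75
print(ndcg_at_k(["b", "a", "c"], {"a": 3.0, "b": 2.0, "c": 1.0}))
```

MRR only rewards the position of the first relevant hit, while nDCG@3 also credits the ordering of the remaining top results, which is why the two can diverge on comparison and discovery queries.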
Performance by query type:
| Query Type | MRR | nDCG@3 | Feature Relevance |
|---|---|---|---|
| Specific product queries | 0.93 | 0.91 | 4.5/5 |
| Comparison queries | 0.89 | 0.87 | 4.4/5 |
| Gift recommendations | 0.83 | 0.79 | 4.1/5 |
| Open-ended discovery | 0.78 | 0.74 | 3.9/5 |
RAG quality on 200 sampled recommendations: Extraction Precision 87% · Extraction Recall 82% · Feature Relevance to Query 4.3/5
Sentiment analysis on 500 labeled reviews: Classification Accuracy 88% · Review Selection Relevance 4.2/5 · Review Insight Quality 4.1/5
Frontend — React + Next.js
- Chat interface with minimized/maximized states; maximized view shows search history and settings navigation.
- Queries dispatched via Axios POST to /search; responses render as dynamic ProductCard components.
- ProductCard.tsx: ReactMarkdown for bullet-formatted feature extraction, useMemo for client-side rating calculations, useEffect for async image fetching via an ASIN-based image scraper API route.
Future Work
- Long-tail categories (~15% lower retrieval accuracy) — category-specific embedding fine-tuning.
- Multimodal retrieval — integrate Vertex AI Vision image embeddings for aesthetic/design queries.
- Mixed-sentiment handling (~73% accuracy on praise+criticism in same passage) — aspect-based sentiment analysis (ABSA) with span-level labels.
- Latency — async Gemini calls + semantic caching to reduce P95 from ~6s to <2s.
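Semantic caching, as proposed in the latency item, could look roughly like the sketch below. It is an illustration of the idea only and not part of the current system: the `SemanticCache` class, the 0.95 similarity threshold, and the linear scan over cached entries are all assumptions, and the query embeddings are assumed to come from the same model used for retrieval.

```python
import math


def cosine_similarity(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


class SemanticCache:
    """Return a cached response when a new query embedding is close enough."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self._entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self._entries.append((list(embedding), response))

    def get(self, embedding):
        best_score, best_response = -1.0, None
        for cached_embedding, response in self._entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        # A miss returns None so the caller falls through to retrieval and the LLM.
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "cached answer for a face-wash query")
print(cache.get([0.99, 0.05]))  # near-duplicate query: cache hit
print(cache.get([0.0, 1.0]))    # unrelated query: None
```

The threshold controls the trade-off: a lower value saves more LLM calls but risks answering a different question than the one asked.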
