LLM-Powered Recommendation Platform

I built a near-production-grade AI shopping assistant fully deployed on Google Cloud Platform, combining Retrieval-Augmented Generation (RAG), BigQuery vector search (ScaNN), and Gemini 2.0 Flash to deliver semantic product recommendations and sentiment-aware review summaries from millions of Amazon reviews.

The system addresses critical limitations of traditional approaches: keyword-based search fails on natural language queries, and collaborative filtering collapses on the cold-start problem. By grounding every response in dense semantic retrieval and LLM-synthesized insights, the system handles queries like "gentle face wash for sensitive skin" even when products are described as "mild cleanser for delicate complexions."

MRR 0.88 · nDCG@3 0.85 · Sentiment Accuracy 88% · nDCG@3 vs Baseline +21% · Amazon Reviews 571M

PySpark · BigQuery · ScaNN (AVQ + TreeAH) · Vertex AI text-embedding-005 · Gemini 2.0 Flash · LangChain · FastAPI · React / Next.js · Terraform · Cloud Run · GCS

System Architecture

Data Pipeline

Starting from the Amazon Reviews 2023 dataset (McAuley Lab, UCSD) — 571.54M review rows, 54.51M product metadata records across three categories (All_Beauty, Software, Baby_Products):

  1. PySpark ETL (typed schemas, null imputation, string normalization)
  2. Partitioned Parquet → GCS (by main_category)
  3. BigQuery warehouse
  4. Vertex AI text-embedding-005 (768-dim, normalized)
  5. BigQuery Vector Index (Product + Review)
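
The cleaning rules in the ETL step can be illustrated on a single record. The sketch below is plain Python rather than PySpark, and the field names (`title`, `price`, `average_rating`, `main_category`) and the imputation defaults are assumptions made for illustration, not the project's actual schema.

```python
import re
import unicodedata
from typing import Any, Optional

# Hypothetical defaults used to impute missing values.
DEFAULTS = {"title": "unknown", "main_category": "unknown", "average_rating": 0.0}


def normalize_text(value: str) -> str:
    """Unicode-normalize, collapse whitespace, and lowercase a string."""
    value = unicodedata.normalize("NFKC", value)
    return re.sub(r"\s+", " ", value).strip().lower()


def to_float(value: Any) -> Optional[float]:
    """Coerce a raw value such as "$12.99" to float; return None if impossible."""
    if value is None:
        return None
    try:
        return float(str(value).replace("$", "").replace(",", "").strip())
    except ValueError:
        return None


def clean_record(raw: dict) -> dict:
    """Apply typing, null imputation, and string normalization to one record."""
    rating = to_float(raw.get("average_rating"))
    return {
        "title": normalize_text(raw.get("title") or DEFAULTS["title"]),
        "main_category": raw.get("main_category") or DEFAULTS["main_category"],
        "average_rating": rating if rating is not None else DEFAULTS["average_rating"],
        "price": to_float(raw.get("price")),  # left as None when unknown
    }
```

In a Spark job the same rules would more likely be expressed as column expressions over a typed schema, so that they run in parallel across partitions.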

Backend — Request Flow

  1. User query
  2. Generate query embedding (Vertex AI)
  3. BigQuery ANN search (ScaNN)
  4. LangChain context assembly
  5. Gemini 2.0 Flash → JSON response
  6. React ProductCard

FastAPI backend with auth middleware (API-key validation), CORS, and rate limiting. The LLM generates structured JSON with product_title, features (Markdown bullets), sentiment_summary, and recommendation_rationale — prompt-engineered for consistent rendering.
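
Because the frontend renders the model's JSON directly, it helps to validate the payload before returning it. The following is a minimal sketch of such a check using only the standard library. It assumes every field (including `features`, as a Markdown string) is a non-empty string; the function name `parse_llm_response` is hypothetical and not taken from the project.

```python
import json

REQUIRED_FIELDS = (
    "product_title",
    "features",
    "sentiment_summary",
    "recommendation_rationale",
)


def parse_llm_response(raw_text: str) -> dict:
    """Parse model output and check that each expected field is a non-empty string."""
    fence = "`" * 3
    text = raw_text.strip()
    # Models sometimes wrap JSON in a Markdown code fence; strip it if present.
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    payload = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    missing = [
        name
        for name in REQUIRED_FIELDS
        if not isinstance(payload.get(name), str) or not payload[name].strip()
    ]
    if missing:
        raise ValueError(f"missing or empty fields: {missing}")
    return payload
```

A caller could catch `ValueError` and retry the generation or fall back to a plain-text answer, depending on how strict the rendering needs to be.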

GCP Infrastructure (Terraform IaC)

  • Cloud Run: FastAPI backend, auto-scaling and containerized
  • Cloud Functions: on-demand embedding generation
  • BigQuery: structured storage + ScaNN vector indexes
  • Cloud Storage: Parquet data lake (partitioned by category)
  • Vertex AI: text-embedding-005 + Gemini 2.0 Flash serving
  • Cloud Monitoring: health dashboards and alerting

Technical Deep Dive: ScaNN Vector Search

BigQuery's vector search backend uses ScaNN (Scalable Nearest Neighbors) — Google's production-grade ANN library — combining two innovations:

Anisotropic Vector Quantization (AVQ)

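Traditional Product Quantization minimizes reconstruction error uniformly in every direction. AVQ instead weights the error anisotropically: the quantization residual r = v − ṽ is split into a component parallel to the datapoint v and a component orthogonal to it, and the parallel component is penalized more heavily, because it is the part that most distorts inner-product scores (and, for normalized vectors, cosine scores) of the best-matching pairs. This reduces false-negative retrievals, and the ScaNN authors report higher recall than standard PQ at equivalent compression ratios.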
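
To make the anisotropic idea concrete, here is a small sketch of the weighted loss as described in the ScaNN paper (Guo et al., 2020): the residual is split into parts parallel and orthogonal to the datapoint, and the parallel part is up-weighted. The weight `eta` and its default value are illustrative assumptions; in the paper the weights are derived from a score threshold rather than set by hand.

```python
from typing import Sequence


def anisotropic_loss(
    x: Sequence[float], x_quantized: Sequence[float], eta: float = 4.0
) -> float:
    """Weighted quantization loss: eta * ||r_parallel||^2 + ||r_orthogonal||^2.

    r = x - x_quantized is split into the component along x and the remainder.
    eta >= 1 is a hypothetical weight; eta == 1 recovers plain squared error.
    """
    residual = [a - b for a, b in zip(x, x_quantized)]
    norm_sq = sum(a * a for a in x)
    if norm_sq == 0.0:
        # No direction to project onto; fall back to squared error.
        return sum(r * r for r in residual)
    scale = sum(r * a for r, a in zip(residual, x)) / norm_sq
    parallel = [scale * a for a in x]
    orthogonal = [r - p for r, p in zip(residual, parallel)]
    return eta * sum(p * p for p in parallel) + sum(o * o for o in orthogonal)


# Two candidate codewords with the same squared error relative to x = (1, 0):
x = [1.0, 0.0]
along = anisotropic_loss(x, [0.8, 0.0])   # error lies along x
across = anisotropic_loss(x, [1.0, 0.2])  # error lies orthogonal to x
```

Plain squared error cannot tell the two candidates apart, while the anisotropic loss prefers the second one, whose error leaves the inner product of `x` with any query parallel to `x` unchanged.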

Tree-AH (Asymmetric Hashing with Clustering)

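Vectors are partitioned into a hierarchical k-means tree, and each vector is typically stored as a quantized residual relative to its partition centroid. At query time, only the most relevant partitions are explored, and candidates are scored with asymmetric hashing (the query stays unquantized while database vectors are quantized): ⟨q, v⟩ ≈ ⟨q, Cᵢ⟩ + ⟨q, r̃⟩, where Cᵢ is the centroid of the partition containing v and r̃ is the quantized residual v − Cᵢ. Result: roughly O(N log N) index construction and a query cost that grows sublinearly in N (an O(log N) tree descent plus a scan of the probed partitions only).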
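
The partition-then-score idea can be sketched in a few lines of plain Python. This toy version uses a single level of hand-picked centroids and keeps residuals unquantized, so it illustrates only the pruning and the centroid-plus-residual decomposition, not ScaNN's actual index. The function names and the `n_probe` parameter are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Partitions = Dict[int, List[Tuple[str, List[float]]]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_partitions(vectors: Dict[str, Vector], centroids: List[Vector]) -> Partitions:
    """Assign each vector to its nearest centroid and store the residual v - C_i."""
    partitions: Partitions = {i: [] for i in range(len(centroids))}
    for key, v in vectors.items():
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])),
        )
        residual = [a - b for a, b in zip(v, centroids[nearest])]
        partitions[nearest].append((key, residual))
    return partitions


def search(
    query: Vector,
    centroids: List[Vector],
    partitions: Partitions,
    n_probe: int = 1,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Probe the n_probe partitions whose centroids score highest, then rank members."""
    centroid_scores = [dot(query, c) for c in centroids]
    probe = sorted(range(len(centroids)), key=lambda i: centroid_scores[i], reverse=True)
    scored = []
    for i in probe[:n_probe]:
        for key, residual in partitions[i]:
            # <q, v> = <q, C_i> + <q, v - C_i>; a real index would quantize the residual.
            scored.append((centroid_scores[i] + dot(query, residual), key))
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:top_k]]
```

Raising `n_probe` trades latency for recall: more partitions are scanned, so fewer true neighbors are missed.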

The hybrid retrieval approach merges semantic vector similarity with metadata filtering (category, price range, average rating), so relevance ranking is both semantically grounded and structured-data-aware. Together with the review-integrated reranking evaluated below, this accounts for the +21% nDCG@3 improvement over the vector-only baseline.
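
A minimal in-memory sketch of this hybrid step is shown here: candidates are first filtered on structured metadata and the survivors are ranked by cosine similarity. The `Product` fields, the `hybrid_search` name, and the choice to filter before ranking are assumptions for illustration; the deployed system performs the equivalent work inside BigQuery.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Product:
    asin: str
    category: str
    price: float
    average_rating: float
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def hybrid_search(
    query_embedding: Sequence[float],
    products: List[Product],
    category: Optional[str] = None,
    max_price: Optional[float] = None,
    min_rating: Optional[float] = None,
    top_k: int = 3,
) -> List[Product]:
    """Filter on structured metadata, then rank the survivors by cosine similarity."""
    candidates = [
        p
        for p in products
        if (category is None or p.category == category)
        and (max_price is None or p.price <= max_price)
        and (min_rating is None or p.average_rating >= min_rating)
    ]
    ranked = sorted(
        candidates, key=lambda p: cosine(query_embedding, p.embedding), reverse=True
    )
    return ranked[:top_k]
```

Filtering first keeps irrelevant categories out of the top-k entirely; filtering after the ANN search is the usual alternative when the filter is very selective and the index cannot apply it directly.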

Evaluation Results

Ground-truth evaluation on 500 test queries spanning specific product, comparison, gift recommendation, and open-ended discovery types:

| Metric | Our System | Baseline (vector-only) | Industry Benchmark |
|---|---|---|---|
| MRR | 0.88 | 0.76 | ~0.80 |
| nDCG@3 | 0.85 | 0.70 | ~0.78 |
| Category Coverage | 92% | 84% | ~88% |
| Avg. Search Latency | ~6s | ~1s | ~2s |
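
For reference, the two ranking metrics reported above can be computed per query and then averaged. The sketch below uses the standard definitions; the input format (one list of graded relevance labels per query, in ranked order) is an assumption about how the ground truth is stored.

```python
import math
from typing import List, Sequence


def reciprocal_rank(relevances: Sequence[float]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering.

    The ideal ordering here is built from the retrieved list only; a stricter
    evaluation would build it from every judged item for the query.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


# Two toy queries: the first relevant hit is at rank 2 and rank 1 respectively.
per_query = [[0, 1, 0], [1, 0, 0]]
mrr = mean([reciprocal_rank(r) for r in per_query])
ndcg3 = mean([ndcg_at_k(r, 3) for r in per_query])
```

The headline +21% is the relative gain in nDCG@3: (0.85 − 0.70) / 0.70 ≈ 0.214.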

Performance by query type:

| Query Type | MRR | nDCG@3 | Feature Relevance |
|---|---|---|---|
| Specific product queries | 0.93 | 0.91 | 4.5/5 |
| Comparison queries | 0.89 | 0.87 | 4.4/5 |
| Gift recommendations | 0.83 | 0.79 | 4.1/5 |
| Open-ended discovery | 0.78 | 0.74 | 3.9/5 |

RAG quality on 200 sampled recommendations: Extraction Precision 87% · Extraction Recall 82% · Feature Relevance to Query 4.3/5

Sentiment analysis on 500 labeled reviews: Classification Accuracy 88% · Review Selection Relevance 4.2/5 · Review Insight Quality 4.1/5

The +21% relative nDCG@3 improvement (0.85 vs. the 0.70 baseline) is attributable to review-integrated relevance reranking. The latency overhead (~5s, from the Gemini API call) is an acceptable trade-off for the quality gain in this setting. Future mitigation: async processing and semantic response caching.

Frontend — React + Next.js

  • Chat interface with minimized/maximized states; maximized view shows search history and settings navigation.
  • Queries dispatched via Axios POST to /search; responses render as dynamic ProductCard components.
  • ProductCard.tsx: ReactMarkdown for bullet-formatted feature extraction, useMemo for client-side rating calculations, useEffect for async image fetching via ASIN-based image scraper API route.

Future Work

  1. Long-tail categories (~15% lower retrieval accuracy) — category-specific embedding fine-tuning.
  2. Multimodal retrieval — integrate Vertex AI Vision image embeddings for aesthetic/design queries.
  3. Mixed-sentiment handling (~73% accuracy on praise+criticism in same passage) — aspect-based sentiment analysis (ABSA) with span-level labels.
  4. Latency — async Gemini calls + semantic caching to reduce P95 from ~6s to <2s.
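
As a rough illustration of the semantic caching idea in item 4, the sketch below returns a stored response when a new query embedding is close enough to a previously seen one. The class name, the 0.95 similarity threshold, and the oldest-first eviction policy are all hypothetical choices, not part of the current system.

```python
import math
from typing import List, Optional, Sequence, Tuple


class SemanticCache:
    """Return a cached response when a new query embedding is close to a stored one."""

    def __init__(self, threshold: float = 0.95, max_entries: int = 1000) -> None:
        self.threshold = threshold  # hypothetical cosine-similarity cutoff
        self.max_entries = max_entries
        self._entries: List[Tuple[List[float], str]] = []

    @staticmethod
    def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Linear scan for the most similar stored query; None on a cache miss."""
        best_score, best_response = 0.0, None
        for stored, response in self._entries:
            score = self._cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, embedding: Sequence[float], response: str) -> None:
        if len(self._entries) >= self.max_entries:
            self._entries.pop(0)  # evict the oldest entry
        self._entries.append((list(embedding), response))
```

On a hit the Gemini call is skipped entirely, which is where most of the latency sits; the threshold controls how aggressively paraphrased queries share an answer.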