LLM-Powered Recommendation Platform

I built a near-production-grade AI shopping assistant fully deployed on Google Cloud Platform, combining Retrieval-Augmented Generation (RAG), BigQuery vector search (ScaNN), and Gemini 2.0 Flash to deliver semantic product recommendations and sentiment-aware review summaries from millions of Amazon reviews.

The system addresses critical limitations of traditional approaches: keyword-based search fails on natural language queries, and collaborative filtering collapses on the cold-start problem. By grounding every response in dense semantic retrieval and LLM-synthesized insights, the system handles queries like "gentle face wash for sensitive skin" even when products are described as "mild cleanser for delicate complexions."

0.88

MRR

0.85

nDCG@3

88%

Sentiment Accuracy

+21%

nDCG@3 vs Baseline

571M

Amazon Reviews

PySparkBigQueryScaNN (AVQ + TreeAH) Vertex AI text-embedding-005Gemini 2.0 Flash LangChainFastAPIReact / Next.js TerraformCloud RunGCS

System Architecture

Data Pipeline

Starting from the Amazon Reviews 2023 dataset (McAuley Lab, UCSD) — 571.54M review rows, 54.51M product metadata records across three categories (All_Beauty, Software, Baby_Products):

PySpark ETL (typed schemas, null imputation, string normalization)

→

Partitioned Parquet → GCS (by main_category)

→

BigQuery warehouse

→

Vertex AI text-embedding-005 (768-dim, normalized)

→

BigQuery Vector Index (Product + Review)

Backend — Request Flow

User query

→

Generate query embedding (Vertex AI)

→

BigQuery ANN search (ScaNN)

→

LangChain context assembly

→

Gemini 2.0 Flash → JSON response

→

React ProductCard

FastAPI backend with auth middleware (API-key validation), CORS, and rate limiting. The LLM generates structured JSON with product_title, features (Markdown bullets), sentiment_summary, and recommendation_rationale — prompt-engineered for consistent rendering.

GCP Infrastructure (Terraform IaC)

Cloud Run

FastAPI backend — auto-scaling, containerized

Cloud Functions

On-demand embedding generation

BigQuery

Structured storage + ScaNN vector indexes

Cloud Storage

Parquet data lake (partitioned by category)

Vertex AI

text-embedding-005 + Gemini 2.0 Flash serving

Cloud Monitoring

Health dashboards and alerting

Technical Deep Dive: ScaNN Vector Search

BigQuery's vector search backend uses ScaNN (Scalable Nearest Neighbors) — Google's production-grade ANN library — combining two innovations:

Anisotropic Vector Quantization (AVQ)

Traditional Product Quantization minimizes reconstruction error uniformly. AVQ applies a learned diagonal scaling matrix S to transform vectors before quantization: v' = Sv. This rescaling prioritizes dimensions that matter most for cosine similarity, reducing false-negative retrievals. AVQ consistently outperforms standard PQ in recall at equivalent compression ratios.

Tree-AH (Asymmetric Hashing with Clustering)

Vectors are partitioned into a hierarchical k-means tree. At query time, only the most relevant partitions are explored: d'(q,v) = d(q, Cᵢ) + d_quantized(q,v) where Cᵢ is the cluster centroid. Result: index construction O(N log N), query search O(log N).

The hybrid retrieval approach merges semantic vector similarity with metadata filtering (category, price range, average rating) — relevance ranking that is both semantically grounded and structured-data-aware. This is what drove the +21% nDCG@3 improvement over pure vector search.

Evaluation Results

Ground-truth evaluation on 500 test queries spanning specific product, comparison, gift recommendation, and open-ended discovery types:

Metric	Our System	Baseline (vector-only)	Industry Benchmark
MRR	0.88	0.76	~0.80
nDCG@3	0.85	0.70	~0.78
Category Coverage	92%	84%	~88%
Avg. Search Latency	~6s	~1s	~2s

Performance by query type:

Query Type	MRR	nDCG@3	Feature Relevance
Specific product queries	0.93	0.91	4.5/5
Comparison queries	0.89	0.87	4.4/5
Gift recommendations	0.83	0.79	4.1/5
Open-ended discovery	0.78	0.74	3.9/5

RAG quality on 200 sampled recommendations: Extraction Precision 87% · Extraction Recall 82% · Feature Relevance to Query 4.3/5

Sentiment analysis on 500 labeled reviews: Classification Accuracy 88% · Review Selection Relevance 4.2/5 · Review Insight Quality 4.1/5

+21% nDCG@3 improvement (0.85 vs. 0.70 baseline) attributable to review-integrated relevance reranking. Latency overhead (~5s from Gemini API call) is acceptable given the substantial quality gain. Future mitigation: async processing and semantic response caching.

Frontend — React + Next.js

Chat interface with minimized/maximized states; maximized view shows search history and settings navigation.
Queries dispatched via Axios POST to /search; responses render as dynamic ProductCard components.
ProductCard.tsx: ReactMarkdown for bullet-formatted feature extraction, useMemo for client-side rating calculations, useEffect for async image fetching via ASIN-based image scraper API route.

Future Work

Long-tail categories (~15% lower retrieval accuracy) — category-specific embedding fine-tuning.
Multimodal retrieval — integrate Vertex AI Vision image embeddings for aesthetic/design queries.
Mixed-sentiment handling (~73% accuracy on praise+criticism in same passage) — aspect-based sentiment analysis (ABSA) with span-level labels.
Latency — async Gemini calls + semantic caching to reduce P95 from ~6s to <2s.

Share on

Twitter Facebook LinkedIn

Yupeng Tang