AI-Powered Product Recommendation System
Overview
Architected an AI-powered recommendation system that analyzes millions of Amazon product reviews to help users quickly discover the most relevant and high-quality items through semantic search and LLM-based understanding.
Key Technologies
- Data Processing: PySpark, BigQuery
- Machine Learning: Text-embedding-005, ScaNN, Google Gemini, LangChain
- Infrastructure: GCP (Cloud Run, BigQuery, Cloud Storage), Terraform, FastAPI
Achievements
- Developed a PySpark ETL pipeline to clean, tokenize, and embed reviews (768-dim via text-embedding-005)
- Designed hybrid retrieval engine improving nDCG@3 by +21% (0.85 vs 0.70) with MRR = 0.88
- Integrated RAG-based sentiment analysis achieving 88% accuracy and 4.3/5 relevance score
- Sustained ~6s query latency with 92% product-category coverage across 500 test queries
Technical Details
- Hybrid retrieval using ScaNN with approximate nearest neighbors (TreeAH + AVQ)
- Metadata filtering and reranking via FastAPI microservice
- Scalable infrastructure provisioned with Terraform on GCP
- Explainable recommendations through LLM-based feature summarization
