AI Virtual Try-On System
As a Machine Learning Engineer intern at GMI Cloud, I developed an automated AI virtual try-on pipeline that enables photorealistic garment swaps on e-commerce model images using a diffusion-based image generation API. The system takes a model photograph and a garment image and produces a result in which the model is wearing the target garment, with natural fabric draping, body-consistent proportions, and preserved facial identity.
This is not a single API call wrapped in a script. It's a batch orchestration system built to handle real production workloads: 2 model images × 9 garment items = 18 combinations per run, with asynchronous task tracking, empirically tuned polling intervals, multi-format image handling, retry logic, and fault isolation so that one API timeout doesn't kill the remaining 17 jobs.
How Diffusion-Based Try-On Works
The underlying model is a diffusion-based virtual dressing system (v2) that operates in three stages. First, it automatically segments the input image to identify body regions (head, torso, arms, hands, legs, feet). Second, it masks the target region and fills it with the garment, conditioned on the reference garment image; the diffusion model has learned to respect fabric texture, color, draping physics, and body pose. Third, the unmasked regions (face, background) are blended back into the result so the seam is not visible.
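The blending stage can be illustrated with a generic mask composite. This is only a sketch of the general inpainting idea, not the hosted model's implementation (the API does not expose it); `blend_masked` and the list-of-lists image type are hypothetical stand-ins for real image tensors.

```python
from typing import List

# Grayscale pixels in [0, 1]; a stand-in for real image tensors (assumption).
Image = List[List[float]]


def blend_masked(original: Image, generated: Image, mask: Image) -> Image:
    """Composite generated pixels into the original image.

    mask == 1 takes the generated pixel, mask == 0 keeps the original,
    and values in between feather the seam between the two.
    """
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]
```

With a binary mask, face and background pixels pass through untouched, which is why identity is preserved outside the garment region.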
Generation Parameters I Tuned
Garment type is configurable per-job: "upper", "lower", or "full". I built separate test workflows for each mode before running full-outfit batch sweeps.
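A per-job configuration along these lines keeps the garment mode explicit and rejects typos before a request is ever sent. The class and field names (`TryOnJob`, `model_path`, `garment_path`) are hypothetical; only the three mode strings come from the description above.

```python
from dataclasses import dataclass

# The three modes named above.
VALID_GARMENT_TYPES = ("upper", "lower", "full")


@dataclass(frozen=True)
class TryOnJob:
    """One model/garment pairing plus the garment mode to request."""

    model_path: str
    garment_path: str
    garment_type: str

    def __post_init__(self) -> None:
        # Fail fast locally instead of burning a 20-30 second API round trip.
        if self.garment_type not in VALID_GARMENT_TYPES:
            raise ValueError(
                f"garment_type must be one of {VALID_GARMENT_TYPES}, "
                f"got {self.garment_type!r}"
            )
```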
Async Task Architecture
Because the diffusion model takes 20–30 seconds per image, the API uses an asynchronous task model (submit → task_id → poll → result) rather than synchronous request-response.
Task Submission
Model and garment images are base64-encoded in memory and sent as a single JSON payload. The server queues the diffusion job and immediately returns a task_id. No image data touches disk during this step — everything stays in-process until the result comes back.
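A minimal sketch of the in-memory encoding step, assuming a JSON body with one field per image. The field names (`model_image`, `garment_image`, `garment_type`) are placeholders, not the real API schema.

```python
import base64
import json


def encode_image(raw: bytes) -> str:
    """Base64-encode image bytes as an ASCII string suitable for JSON."""
    return base64.b64encode(raw).decode("ascii")


def build_payload(model_image: bytes, garment_image: bytes, garment_type: str) -> str:
    """Build the submission body entirely in memory (no temp files).

    Field names are hypothetical; substitute the API's actual schema.
    """
    body = {
        "model_image": encode_image(model_image),
        "garment_image": encode_image(garment_image),
        "garment_type": garment_type,
    }
    return json.dumps(body)
```

The returned string is what would be posted to the submission endpoint, whose response carries the task_id.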
Smart Polling Strategy
My first version polled at a uniform 5-second interval, which wasted roughly 7 polls per task. The calibrated schedule cuts that to about 4 polls on average before completion:
| Poll | Wait Before | Cumulative | Notes |
|---|---|---|---|
| 1 | 5s | ~5s | Early check — unlikely done |
| 2 | 8s | ~13s | Still too early |
| 3 | 10s | ~23s | Approaching typical completion |
| 4 | 8s | ~31s | ★ Usually catches completion here (~27s typical) |
| 5+ | 10s each | — | Fallback cadence until 300s timeout |
Each poll returns one of five states, handled in three ways: "in_queue" / "generating" (continue polling) · "done" (extract result URLs) · "not_found" / "expired" (log and skip).
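The schedule and state handling above can be sketched as a small loop. The `get_status` callable and the `"status"` response key are assumptions standing in for the real client; `sleep` is injectable so the loop can be tested without waiting. Elapsed time here is the sum of the waits, not wall-clock time.

```python
import time
from typing import Callable, Iterator, Optional

INITIAL_WAITS = (5, 8, 10, 8)  # seconds before polls 1-4, from the table
FALLBACK_WAIT = 10             # seconds before poll 5 and every later poll
TIMEOUT_SECONDS = 300


def wait_schedule() -> Iterator[int]:
    """Yield the calibrated waits, then the fallback cadence forever."""
    yield from INITIAL_WAITS
    while True:
        yield FALLBACK_WAIT


def poll_until_done(
    get_status: Callable[[str], dict],
    task_id: str,
    sleep: Callable[[float], None] = time.sleep,
    timeout: int = TIMEOUT_SECONDS,
) -> Optional[dict]:
    """Return the final response when the task is done, else None."""
    elapsed = 0
    for wait in wait_schedule():
        if elapsed + wait > timeout:
            return None  # give up; the caller logs and moves on
        sleep(wait)
        elapsed += wait
        response = get_status(task_id)
        status = response["status"]  # hypothetical response key
        if status == "done":
            return response
        if status in ("not_found", "expired"):
            return None  # terminal failure: log and skip
        # "in_queue" or "generating": keep polling
```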
Result Download with Retry
- Chunked streaming (8 KB chunks) — avoids loading large images entirely into memory
- 30-second timeout per download — handles slow CDN responses
- 3 retries with delay — transient network errors
- Descriptive filenames — encode model ID, garment index, result number for full QA traceability
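The download behavior in this list can be sketched with the standard library alone. The retry delay, the filename pattern, and the reading of "3 retries" as three attempts after the first failure are assumptions; `opener` and `sleep` are injectable purely so the function can be exercised without a network.

```python
import time
import urllib.request
from pathlib import Path
from typing import Callable

CHUNK_SIZE = 8 * 1024   # 8 KB chunks
DOWNLOAD_TIMEOUT = 30   # seconds per attempt
MAX_RETRIES = 3         # retries after the first attempt (assumed reading)
RETRY_DELAY = 2         # seconds between attempts (hypothetical value)


def result_filename(model_id: str, garment_index: int, result_number: int, ext: str = "png") -> str:
    """Encode model, garment, and result number for QA traceability."""
    return f"{model_id}_garment{garment_index:02d}_result{result_number}.{ext}"


def download(
    url: str,
    dest: Path,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> Path:
    """Stream url to dest in fixed-size chunks, retrying transient errors."""
    last_error = None
    for attempt in range(1 + MAX_RETRIES):
        try:
            with opener(url, timeout=DOWNLOAD_TIMEOUT) as response, open(dest, "wb") as out:
                while True:
                    chunk = response.read(CHUNK_SIZE)
                    if not chunk:
                        break
                    out.write(chunk)
            return dest
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            last_error = exc
            if attempt < MAX_RETRIES:
                sleep(RETRY_DELAY)
    raise RuntimeError(f"download failed after {1 + MAX_RETRIES} attempts: {url}") from last_error
```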
Batch Orchestration
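The batch processor iterates the full cross-product of model images × garment images (2 × 9 = 18 jobs per run). Each job runs in isolation, so a timeout or error in one combination is logged without stopping the remaining 17.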
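A minimal sketch of the cross-product loop with per-job fault isolation. `run_job` is a hypothetical callable that would wrap submit, poll, and download for one pairing; the return shape is likewise an assumption.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def run_batch(
    models: Sequence[str],
    garments: Sequence[str],
    run_job: Callable[[str, str], None],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Run every model x garment pairing; one failure never stops the rest."""
    succeeded: List[Tuple[str, str]] = []
    failed: List[Tuple[str, str, str]] = []
    for model, garment in itertools.product(models, garments):
        try:
            run_job(model, garment)
            succeeded.append((model, garment))
        except Exception as exc:  # deliberately broad: isolate faults per job
            failed.append((model, garment, repr(exc)))
    return succeeded, failed
```

Returning the failed pairings with their errors makes it straightforward to rerun only the jobs that did not complete.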
Authentication & Security
I implemented two authentication approaches:
- Signature-based (SigV4): a low-level client in which each request is cryptographically signed with HMAC-SHA256 over the request method, host, path, query parameters, headers, and body. This is the same signing scheme AWS uses. Writing it by hand proved valuable when I debugged a signing bug that only appeared with certain garment image sizes: the content hash computation was sensitive to base64 padding.
- SDK-based: the production batch orchestrator uses the platform's official SDK, which handles signing internally and provides a higher-level interface (submit_task/get_result). I implemented both: low-level first to understand the auth mechanism, SDK second for production reliability.
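For reference, the signing flow can be sketched following the publicly documented AWS Signature Version 4 layout. The platform's own constants (algorithm label, key prefix, scope terminator) may differ, so treat those strings as assumptions; `sign_request` is a hypothetical helper that expects an already-canonicalized query string and a headers dict that includes `host`.

```python
import hashlib
import hmac
from typing import Dict


def _hmac(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Chain HMACs over date, region, and service (AWS SigV4 layout)."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_request(
    method: str, path: str, query: str, headers: Dict[str, str], body: bytes,
    secret_key: str, timestamp: str, region: str, service: str,
) -> str:
    """Return the hex signature for one request.

    timestamp is in YYYYMMDD'T'HHMMSS'Z' form. The body hash must be taken
    over the exact bytes sent, including any base64 padding characters.
    """
    lowered = sorted((name.lower(), value.strip()) for name, value in headers.items())
    canonical_headers = "".join(f"{name}:{value}\n" for name, value in lowered)
    signed_headers = ";".join(name for name, _ in lowered)
    payload_hash = hashlib.sha256(body).hexdigest()
    canonical_request = "\n".join(
        [method, path, query, canonical_headers, signed_headers, payload_hash]
    )
    date = timestamp[:8]
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        timestamp,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = derive_signing_key(secret_key, date, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the payload hash feeds into the signature, any mismatch between the bytes hashed and the bytes transmitted invalidates the request, which is the class of bug described above.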
Results & Impact
The system successfully processed all 18 model × garment combinations in a single automated run, producing photorealistic try-on images with consistent facial identity and natural garment appearance. The orchestration eliminated what would otherwise be a tedious 2-hour manual process of submitting images one by one through a web interface.
The pipeline was later adapted for internal A/B testing of garment presentation quality. Being able to quickly generate try-on images across model-garment pairings let the team evaluate visual consistency at scale rather than through one-off spot checks.
What I Learned
The most interesting engineering challenge wasn't the ML model — it was designing robust orchestration around an asynchronous API I didn't control. Unlike the inference server where I owned the entire stack, here the model was behind someone else's API with its own queue, its own rate limits, and its own failure modes. Learning to build reliable orchestration on top of a service with variable latency and no SLA guarantees was a skill I hadn't developed in any academic project.
