Urban Microclimate Forecasting System

This is a research project I led as part of the Georgia Tech VIP-SMUR (Vertically Integrated Projects — Sustainable Microclimate Urban Research) group. The goal: predict temperature and relative humidity at hyper-local resolution — not city-scale or even block-scale, but at roughly 6-meter spacing across the entire Georgia Tech campus (~3.5 km²), using nothing more than 16 weather stations and publicly available geospatial data.

Temp RMSE: 0.43°C
Temp R²: 0.9928
RH RMSE: 1.28%
Grid Points: 100,283
Model Parameters: 337,742

PyTorch, Multi-Scale LSTM, Multi-Head Attention (4 heads), Gated Residual Networks, Regression Kriging, PyKrige, GeoPandas, OSMnx, Random Forest, SLURM / A100 GPU

Model Architecture

The architecture combines four components chosen specifically for the microclimate forecasting problem: adaptive feature gating, learned input weighting, parallel temporal processing at two scales, and multi-head attention for long-range dependencies.

Core Components

Gated Residual Networks (GRN)

Two FC layers → sigmoid gate → learned interpolation between transformed and original input. Stabilizes training, maintains gradient flow in deep networks. Layer normalization applied after gated sum.
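A minimal PyTorch sketch of a block matching this description. The ELU activation, the dropout rate, computing the gate from the transformed features, and all class and argument names are illustrative assumptions rather than the project's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedResidualNetwork(nn.Module):
    """Two FC layers, a sigmoid gate, a gated sum with the input, then LayerNorm."""

    def __init__(self, dim: int, hidden_dim: int, dropout: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)
        self.gate = nn.Linear(dim, dim)      # assumed: gate is driven by the transformed features
        self.dropout = nn.Dropout(dropout)   # assumed regularization, not stated in the text
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.dropout(self.fc2(F.elu(self.fc1(x))))  # transformed input
        g = torch.sigmoid(self.gate(h))                 # gate values in (0, 1)
        # Learned interpolation between transformed and original input,
        # followed by layer normalization of the gated sum.
        return self.norm(g * h + (1.0 - g) * x)
```

Because the gate can close toward zero, the block can fall back to passing its input through nearly unchanged, which is what keeps gradients flowing when several such blocks are stacked.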

Variable Selection Network

Time-averaged input → GRN → softmax feature weights → applied element-wise to every timestep. The model learns solar altitude matters far more than minute-of-hour — adaptively, per sample.

Multi-Head Self-Attention (4 heads)

After LSTM processing, models long-range temporal dependencies across the output sequence. Attends to specific historical events (e.g., temperature spike 6 hours ago) beyond what LSTM hidden state captures.

Learnable Position Encoding

Temporal embeddings providing the attention layer with sequence position context independent of the LSTM's hidden state.

Key Design Decisions (with Rationale)

1. Multi-scale parallel LSTM branches. Rather than a single sequential LSTM, we use two parallel branches: a short-term branch (1 LSTM layer, hidden_dim/2) and a long-term branch (2 LSTM layers, hidden_dim/2), concatenated back to the full dimension. Rationale: microclimate evolution has two overlapping timescales — rapid fluctuations (a cloud passing, a building shadow) on minutes, and diurnal cycles (sunrise heating, nighttime cooling) on hours. A single LSTM collapses both; parallel branches preserve them separately.

2. Single-step point prediction. Given the past 24 hours (144 timesteps × 10 min), predict one timestep ahead (+10 min). For longer forecasts, iteratively roll — feed prediction back as input. This avoids distribution mismatch beyond the training horizon and produces cleaner inputs for downstream Regression Kriging (no multi-horizon uncertainty propagation).

3. No station identifiers. No station IDs anywhere in the input. This forces the model to learn location-independent patterns (physics-based features), enabling generalization to unseen stations — validated by our held-out station protocol.
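The parallel-branch idea from decision 1 can be sketched in PyTorch as follows. This is a sketch under assumptions: both branches receive the same input sequence, `hidden_dim` is even, and tensors are batch-first. The class name is hypothetical.

```python
import torch
import torch.nn as nn


class MultiScaleLSTM(nn.Module):
    """Short-term (1-layer) and long-term (2-layer) LSTM branches run in parallel."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        half = hidden_dim // 2  # each branch gets half the width
        self.short_branch = nn.LSTM(input_dim, half, num_layers=1, batch_first=True)
        self.long_branch = nn.LSTM(input_dim, half, num_layers=2, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        short_out, _ = self.short_branch(x)  # (batch, time, hidden_dim / 2)
        long_out, _ = self.long_branch(x)    # (batch, time, hidden_dim / 2)
        # Concatenate along the feature axis to restore the full hidden dimension.
        return torch.cat([short_out, long_out], dim=-1)
```

With 12 input features and a 24-hour window, an input of shape `(batch, 144, 12)` yields an output of shape `(batch, 144, hidden_dim)`, ready for the attention layer.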

Input Features (12 dimensions, physics-aware)

Category	Features	Encoding
Weather observations	Temperature, Relative Humidity	Raw (autoregressive signal)
Solar geometry	Altitude angle, Azimuth angle	Sin/cos trigonometric
Hour of day	24-hour diurnal cycle	Sin/cos (period = 24h)
Day of year	365-day seasonal cycle	Sin/cos (period = 365d)
Minute	Sub-hourly resolution	Sin/cos (period = 60 min)

All temporal features encoded as sin/cos pairs to avoid discontinuities (hour 23 → hour 0). Solar angles encoded trigonometrically so the model receives physically meaningful sun-position signals rather than raw degree values.
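A small standard-library sketch of this encoding. It assumes the solar angles are already available in degrees and that the hour feature uses the integer hour; the function names are hypothetical.

```python
import math
from datetime import datetime


def cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_features(ts: datetime, altitude_deg: float, azimuth_deg: float) -> list[float]:
    """Return the 10 non-autoregressive inputs for one timestep.

    Together with raw temperature and relative humidity this gives 12 dimensions.
    """
    day_of_year = ts.timetuple().tm_yday
    return [
        *cyclical(altitude_deg, 360.0),  # solar altitude
        *cyclical(azimuth_deg, 360.0),   # solar azimuth
        *cyclical(ts.hour, 24.0),        # diurnal cycle
        *cyclical(day_of_year, 365.0),   # seasonal cycle
        *cyclical(ts.minute, 60.0),      # sub-hourly position
    ]
```

On the unit circle, hour 23 is exactly as close to hour 0 as hour 1 is, which removes the artificial jump a raw hour value would have at midnight.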

Data Engineering

~947,000 warm-season observations (April–September 2015–2019) from 16 stations at 10-minute intervals. The preprocessing pipeline:

1. Warm-season filter
2. Physical constraint cleaning
3. Per-station segmentation (60-min gap threshold)
4. Resample + interpolate gaps ≤60 min
5. Sliding window extraction (L=144, no boundary crossing)
6. Chronological split (90/10 per station) + 1 station held out
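Steps 3 and 5 can be illustrated with a short standard-library sketch. It assumes one station's timestamps are sorted and already on the 10-minute grid within each segment; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def split_segments(times, max_gap=timedelta(minutes=60)):
    """Return (start, end) index ranges, cutting wherever the gap exceeds max_gap."""
    segments = []
    start = 0
    for i in range(1, len(times)):
        if times[i] - times[i - 1] > max_gap:
            segments.append((start, i))
            start = i
    if times:
        segments.append((start, len(times)))
    return segments


def window_indices(segments, length=144):
    """Return (input_start, target_index) pairs whose windows stay inside one segment."""
    pairs = []
    for seg_start, seg_end in segments:
        # Inputs cover [s, s + length); the target is the next step, s + length.
        for s in range(seg_start, seg_end - length):
            pairs.append((s, s + length))
    return pairs
```

A segment shorter than 145 samples produces no windows at all, which is how short fragments between outages drop out of the training set.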

Scaling: Features use RobustScaler (median-centered, IQR-normalized — resistant to sporadic outliers). Targets use MinMaxScaler (range [0,1]). Both fit on training data only.
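To show what the two scalers compute, here is a standard-library stand-in (not scikit-learn itself). The helper names are hypothetical, and the quartiles use linear interpolation.

```python
import statistics


def fit_robust(train_values):
    """Return (median, IQR) computed from training data only."""
    q1, median, q3 = statistics.quantiles(train_values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # guard against constant features
    return median, iqr


def fit_minmax(train_values):
    """Return (min, span) computed from training data only."""
    lo, hi = min(train_values), max(train_values)
    span = hi - lo if hi > lo else 1.0
    return lo, span


def robust_transform(x, median, iqr):
    return (x - median) / iqr


def minmax_transform(y, lo, span):
    return (y - lo) / span
```

Because the statistics come from the training split alone, a test-set target above the training maximum maps to a value slightly above 1. That is expected and avoids leaking test information into the scaler.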

Training: AdamW (β₁=0.9, β₂=0.999) + weight decay 1e-3, initial LR 1e-3, ReduceLROnPlateau (factor=0.5, patience=7), gradient clipping max_norm=1.0, early stopping patience=20. Xavier uniform init for linear layers; orthogonal init for LSTM (critical for preventing vanishing gradients in the 2-layer long-term branch). Trained in ~23 minutes on A100 GPU (Phoenix cluster).
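These hyperparameters map onto PyTorch roughly as sketched below. The placeholder model, the MSE loss, and the zero bias initialization are assumptions added for illustration; only the optimizer, scheduler, clipping, and weight-initialization settings come from the description above.

```python
import torch
import torch.nn as nn


def init_weights(module: nn.Module) -> None:
    """Xavier uniform for linear layers, orthogonal for LSTM weight matrices."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)  # assumed
    elif isinstance(module, nn.LSTM):
        for name, param in module.named_parameters():
            if "weight" in name:
                nn.init.orthogonal_(param)
            elif "bias" in name:
                nn.init.zeros_(param)    # assumed


# Placeholder network standing in for the real model (12 inputs, 2 targets).
model = nn.Sequential(nn.Linear(12, 32), nn.ReLU(), nn.Linear(32, 2))
model.apply(init_weights)

optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-3
)
# Call scheduler.step(val_loss) once per epoch; early stopping (patience=20)
# would wrap the epoch loop and is omitted here.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=7
)


def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    """Run one optimization step with gradient clipping and return the loss."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)  # assumed loss function
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```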

Results

Held-Out Station Evaluation

Variable	RMSE	MAE	MAPE	R²	Max Error
Temperature	0.43°C	0.29°C	1.15%	0.9928	6.98°C
Relative Humidity	1.28%	0.90%	1.51%	0.9952	22.23%

Temperature RMSE is below 0.5°C and R² exceeds 0.99 on a station the model never saw during training. That level of generalization is consistent with the model learning physical relationships (solar-driven diurnal cycles, humidity-temperature coupling) rather than memorizing station-specific patterns.

Overfitting Analysis

Variable	Train RMSE	Test RMSE	Test/Train Ratio
Temperature	0.41°C	0.43°C	1.057 ✓
Relative Humidity	1.19%	1.28%	1.074 ✓

Both ratios fall below 1.10, which I treat as excellent generalization (1.10–1.20 = good; above 1.20 = potential overfitting).
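The threshold rule can be written as a small helper. The function name is hypothetical. Recomputing from the rounded RMSEs in the table gives about 1.049 and 1.076, slightly different from the listed 1.057 and 1.074, presumably because the table's ratios were computed from unrounded values.

```python
def generalization_label(train_rmse: float, test_rmse: float) -> tuple[float, str]:
    """Return the test/train RMSE ratio and the label used in this analysis."""
    ratio = test_rmse / train_rmse
    if ratio < 1.10:
        label = "excellent"
    elif ratio <= 1.20:
        label = "good"
    else:
        label = "potential overfitting"
    return ratio, label


temp_ratio, temp_label = generalization_label(0.41, 0.43)
rh_ratio, rh_label = generalization_label(1.19, 1.28)
```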

Feature Importance (Permutation-Based)

Rank	Feature	RMSE Increase When Permuted
1	Relative Humidity	+2789%
2	Temperature	+372%
3	Hour (sin)	+7.0%
4	Solar altitude (sin)	+6.5%
5	Hour (cos)	+2.4%
6	Solar azimuth (sin)	+2.0%

Permuting the solar-angle features alone degrades RMSE by 2.0–6.5%, which suggests the model uses sun position to predict temperature evolution and does not rely only on correlations with time-of-day.
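For reference, permutation importance shuffles one feature column across samples and measures how much the error grows. The sketch below uses a toy model and toy data (both invented for illustration) and plain Python lists rather than the project's tensors.

```python
import math
import random


def rmse(model, X, y):
    """Root-mean-square error of model(row) against targets."""
    errors = [(model(row) - target) ** 2 for row, target in zip(X, y)]
    return math.sqrt(sum(errors) / len(errors))


def permutation_importance(model, X, y, feature, seed=0):
    """Percent RMSE increase after shuffling one feature column."""
    baseline = rmse(model, X, y)
    column = [row[feature] for row in X]
    random.Random(seed).shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return 100.0 * (rmse(model, X_perm, y) - baseline) / baseline


# Toy data: the target depends on feature 0 only, plus a small alternating offset.
X = [[float(i), float(i % 5)] for i in range(50)]
y = [2.0 * row[0] + (0.1 if i % 2 else -0.1) for i, row in enumerate(X)]


def toy_model(row):
    return 2.0 * row[0]  # ignores feature 1 entirely
```

Shuffling feature 1 leaves the toy model's error unchanged, while shuffling feature 0 inflates it sharply. The same contrast explains why the autoregressive inputs dominate the ranking above.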

Geospatial Inference: Stations → Campus Maps

The model outputs predictions at 16 discrete station locations. To predict at every campus point, I use Regression Kriging (PyKrige) — combining Random Forest regression on spatial covariates with kriging of the residuals.

9 real GIS covariates from OpenStreetMap + elevation data:

Distance features: Distance to nearest building, park, library, parking lot, footway
Elevation features: Ground elevation, building height, total elevation
Shadow features: Monthly shadow ratio (selected by target timestamp's month — computed from building geometry and solar angles)

1. Model predicts at 16 stations
2. Random Forest regression on 9 GIS covariates
3. Kriging of residuals (spherical variogram)
4. 100,283 grid point predictions
5. OSMnx campus boundary overlay
6. 300 DPI publication maps

After kriging, Random Forest feature importances are extracted to analyze which spatial characteristics most influence microclimate variation — connecting predictions back to actionable urban planning insights.
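Two pieces of the regression-kriging step are easy to state in code: the spherical variogram that models how residual dissimilarity grows with distance, and the final sum of trend and kriged residual. This is a plain-Python illustration, not PyKrige's API. The parameter names (`nugget`, `partial_sill`, `range_m`) are assumptions, and a fitted variogram would supply their values.

```python
def spherical_variogram(h: float, nugget: float, partial_sill: float, range_m: float) -> float:
    """Semivariance at lag distance h for a spherical model.

    Rises from the nugget and flattens at nugget + partial_sill once h reaches range_m.
    (At h = 0 this returns the nugget, i.e. the limit as h approaches 0 from above.)
    """
    if h >= range_m:
        return nugget + partial_sill
    r = h / range_m
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def regression_kriging_estimate(trend_value: float, kriged_residual: float) -> float:
    """Final estimate at a grid point: Random Forest trend plus kriged residual."""
    return trend_value + kriged_residual
```

Beyond the range, station residuals carry no further spatial information, so grid points far from every station fall back to the Random Forest trend alone.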

Extreme Scenario Mapping

An automated scenario generator scans the full warm-season dataset, identifies extreme-condition timestamps (hot/cold/dry/humid, filtered for ≥10 stations reporting), and runs the full inference → kriging → visualization pipeline for each:

🔥 Hottest: 37.30°C avg, 34.6% RH (June 25, 2016 16:10)
❄️ Coldest: 8.45°C avg, 89.6% RH (May 6, 2017 06:30)
🏜️ Driest: 18.13°C avg, 18.0% RH (April 4, 2015 18:30)
💧 Most Humid: 16.73°C avg, 99.5% RH (April 18, 2015 04:40)

Each comparison map shows side-by-side temperature and humidity predictions with station observations overlaid as square markers. All output maps are 300 DPI (~1.7–2.1 MB PNG) at 4800×3600 px resolution.
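The scenario search itself reduces to a grouped average with a coverage filter. The sketch below assumes records arrive as `(timestamp, station_id, temp_c, rh_pct)` tuples with at most one reading per station per timestamp; the function name and record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def find_extremes(records, min_stations=10):
    """Return the timestamps of the hottest, coldest, driest, and most humid campus averages."""
    by_time = defaultdict(list)
    for ts, _station, temp, rh in records:
        by_time[ts].append((temp, rh))

    # Keep only timestamps with enough stations reporting, then average across stations.
    averages = {
        ts: (mean(t for t, _ in obs), mean(r for _, r in obs))
        for ts, obs in by_time.items()
        if len(obs) >= min_stations
    }
    if not averages:
        return {}
    return {
        "hottest": max(averages, key=lambda ts: averages[ts][0]),
        "coldest": min(averages, key=lambda ts: averages[ts][0]),
        "driest": min(averages, key=lambda ts: averages[ts][1]),
        "most_humid": max(averages, key=lambda ts: averages[ts][1]),
    }
```

The coverage filter matters because a timestamp with only a few stations online could otherwise win on the strength of one unrepresentative sensor.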

HPC Integration

Stage	Time	Resources
Training (16 stations)	~26 min	A100 GPU, 34 GB RAM
Inference (100,283 points)	~2 min 10 sec	A100 GPU
Visualization (3 maps, 300 DPI)	~1 min 31 sec	CPU
All 4 extreme scenarios	~3 min 4 sec	A100 GPU

The project includes full SLURM job scripts for each stage, plus a comprehensive HPC tutorial I wrote for onboarding other VIP-SMUR team members (VPN setup, SSH, job submission, file transfer).

What I Learned

The most valuable lesson was about model design as scientific hypothesis testing. Every architectural decision encodes an assumption: parallel LSTM branches assume multiple temporal scales matter; removing station IDs assumes the model can learn location-independent physics; using solar angles assumes that sun position drives temperature evolution. The performance metrics then support or undercut each assumption. When permuting solar altitude degraded RMSE by 6.5%, that supported the solar-angle assumption. This feedback loop, where the model is simultaneously a prediction tool and a scientific instrument, fundamentally changed how I think about model engineering.

Yupeng Tang