Kaggle Competition: American Express Default Prediction

Overview

Led a predictive analytics project that placed 20th out of 4,874 teams (top 0.4%) and earned a Silver Medal in the American Express Default Prediction Kaggle competition.

Key Technologies

  • Machine Learning: LightGBM (DART), XGBoost (GPU-accelerated), CatBoost
  • Ensemble Methods: Linear weighted averages, correlation-based weighting
  • Feature Engineering: Time-series features, lag features, rolling statistics

Achievement

🏆 Ranked 20th out of 4,874 teams (Top 0.4%)
🥈 Silver Medal

Technical Approach

  • Developed weighted ensemble of LightGBM (DART) and GPU-accelerated XGBoost models
  • Processed 16 GB of tabular time-series data covering transactions, balances, delinquencies, and repayments
  • Tuned hyperparameters via grid search, scoring each configuration with stratified 5-fold cross-validation
  • Designed diverse feature sets including:
    • Lag features
    • Rolling statistics
    • Trend indicators
    • Transaction patterns
    • Temporal aggregations
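
As a rough illustration of the feature families listed above, the sketch below derives a lag feature, rolling statistics, a trend indicator, and a full-history aggregate from one customer's chronologically ordered values of a single variable. The function name temporal_features, the window size of 3, and the sample balances are illustrative assumptions, not the competition code; plain Python is used only to keep the example dependency-free.

```python
from statistics import mean


def temporal_features(values, window=3):
    """Summarise one customer's chronologically ordered values of one variable.

    Hypothetical helper for illustration; names and window size are assumptions.
    """
    if not values:
        raise ValueError("values must contain at least one observation")

    last = values[-1]
    # Lag feature: the value one statement before the latest (None if missing).
    lag_1 = values[-2] if len(values) >= 2 else None
    # Rolling statistics: computed over the most recent `window` statements.
    recent = values[-window:]

    return {
        "last": last,
        "lag_1": lag_1,
        "diff_last_lag_1": None if lag_1 is None else last - lag_1,
        "rolling_mean": mean(recent),
        "rolling_max": max(recent),
        # Trend indicator: change from the first to the latest observation.
        "trend": last - values[0],
        # Temporal aggregation over the customer's full history.
        "mean": mean(values),
    }


# Made-up monthly balances for a single customer, oldest first.
balances = [100.0, 120.0, 90.0, 150.0]
features = temporal_features(balances, window=3)
```

In a real pipeline the same logic would typically run per customer and per variable, producing one wide feature row per customer for the gradient-boosted models.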

Model Strategy

  • Ensemble Architecture: Combined multiple model types with correlation-based weights
  • Stability Enhancement: Trained models with multiple random seeds to reduce prediction variance
  • Feature Diversity: Extracted temporal patterns from time-series credit data
  • Efficient Design: Delivered compact, high-performing solution
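
One possible reading of correlation-based weighting is sketched below: each model's weight grows as its predictions become less correlated with the other models' predictions, so more diverse models contribute more to the linear blend. This scheme, the helper names (pearson, correlation_weights, blend), and the sample predictions are assumptions made for illustration; the summary does not specify the exact weighting rule that was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_weights(preds):
    """Weight each model by 1 minus its mean correlation with the other models.

    Assumed scheme for illustration; requires at least two models.
    """
    names = list(preds)
    raw = {}
    for name in names:
        others = [pearson(preds[name], preds[o]) for o in names if o != name]
        raw[name] = 1.0 - sum(others) / len(others)
    total = sum(raw.values())
    if total <= 0:
        # All models perfectly correlated: fall back to equal weights.
        return {name: 1.0 / len(names) for name in names}
    return {name: value / total for name, value in raw.items()}


def blend(preds, weights):
    """Linear weighted average of per-row predictions across models."""
    n_rows = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n_rows)]


# Made-up out-of-fold default probabilities from three models.
preds = {
    "lightgbm_dart": [0.10, 0.80, 0.35, 0.60],
    "xgboost": [0.12, 0.78, 0.40, 0.55],
    "catboost": [0.30, 0.60, 0.20, 0.70],
}
weights = correlation_weights(preds)
blended = blend(preds, weights)
```

Seed averaging fits the same pattern: predictions from several seeds of one model can be averaged first, and the averaged vector then enters the blend as a single model.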

Impact

  • Demonstrated ability to work with large-scale financial data
  • Showcased expertise in ensemble learning and feature engineering
  • Achieved top-tier performance against 4,874 teams in a global competition
  • Led model tuning and ensemble strategy for the team

Competition Details

American Express – Default Prediction
Platform: Kaggle
Timeframe: May 2022 – August 2022
Dataset: 16 GB tabular time-series credit card data