| Rank | Feature | Importance | Value |
|---|---|---|---|

Built by Jim Pleuss and Dusty Turner, competing as "Beat Navy" on Kaggle since 2021.
2024 model (Random Forest, 28 raw A/B features, grid=5): predicted Purdue as champion (actual winner: UConn).

glm_quality_diff is 5x more important than any other feature.

Visual bracket comparisons show how the 2026 XGBoost model differs from the 2024 Random Forest.
| Season | RF Brier | XGB Brier | Winner |
|---|---|---|---|

We retrained the exact models from 2021 and 2024 on 2026 team data and scored them against actual tournament results. Same algorithms, same features, same hyperparameters, just pointed at this year's bracket.
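The 2021 and 2024 models use 28 raw A/B features, while the 2026 model uses 25 diff features. As a rough illustration of the two layouts, here is a minimal Python sketch. It assumes that "A/B" means each team's stats placed side by side and "diff" means team A's stat minus team B's; the stat names (adj_off, adj_def, glm_quality) and values are hypothetical, and this is not the authors' feature pipeline.

```python
# Hypothetical per-team stats for one matchup (illustrative names and values only)
team_a = {"adj_off": 118.2, "adj_def": 94.1, "glm_quality": 22.5}
team_b = {"adj_off": 111.7, "adj_def": 97.8, "glm_quality": 14.0}


def ab_features(a, b):
    """Raw A/B layout (assumed): team A's stats, then team B's, so 2 columns per stat."""
    keys = sorted(a)
    return [a[k] for k in keys] + [b[k] for k in keys]


def diff_features(a, b):
    """Diff layout (assumed): one column per stat, team A minus team B."""
    keys = sorted(a)
    return [a[k] - b[k] for k in keys]


print(ab_features(team_a, team_b))    # 6 columns
print(diff_features(team_a, team_b))  # 3 columns
```

Swapping A and B only flips the sign of each diff feature, so the diff layout halves the column count and encodes the matchup symmetry directly in the inputs; with the A/B layout, any such symmetry has to be learned from data.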
- 2021 model: Random Forest, 28 raw A/B features, grid=3, tuned on accuracy.
- 2024 model: Random Forest, 28 raw A/B features, grid=5, tuned on Brier.
- 2026 model: XGBoost PRUNED-25, 25 diff features, grid=50, tuned on Brier.

| Metric | 2021 RF | 2024 RF | 2026 XGBoost |
|---|---|---|---|

| Round | Games | 2021 RF | 2024 RF | 2026 XGBoost |
|---|---|---|---|---|

Games where at least one model was right and at least one was wrong. Y = correct, N = wrong.
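The filter described here can be sketched in a few lines of Python. The sketch assumes each model outputs the probability that team A wins and that a pick counts as correct when the model gave more than 0.50 to the team that actually won; the game labels, probabilities, and function names are hypothetical.

```python
MODELS = ("2021", "2024", "2026")

# Hypothetical predictions: probability that team A wins, plus the result (1 = team A won)
games = [
    {"game": "Game 1", "2021": 0.62, "2024": 0.55, "2026": 0.71, "a_won": 1},
    {"game": "Game 2", "2021": 0.40, "2024": 0.58, "2026": 0.66, "a_won": 1},
    {"game": "Game 3", "2021": 0.45, "2024": 0.48, "2026": 0.30, "a_won": 0},
    {"game": "Game 4", "2021": 0.52, "2024": 0.49, "2026": 0.61, "a_won": 0},
]


def is_correct(p_a, a_won):
    """True if the model favored the eventual winner (assumed 0.50 cutoff; exactly 0.50 counts as picking B)."""
    return (p_a > 0.5) == (a_won == 1)


def split_games(games):
    """Keep games where at least one model was right and at least one was wrong."""
    rows = []
    for g in games:
        marks = {m: "Y" if is_correct(g[m], g["a_won"]) else "N" for m in MODELS}
        if len(set(marks.values())) > 1:  # both Y and N are present
            rows.append((g["game"], marks))
    return rows


for name, marks in split_games(games):
    print(name, marks)
```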
| Game | 2021 | 2024 | 2026 | Result |
|---|---|---|---|---|

The 10 games with the largest gap between the 2024 RF and 2026 XGBoost predictions.
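A minimal sketch of how such a ranking could be produced, assuming both models' outputs are the probability of the same team winning (if not, one side must be flipped to 1 − p before comparing). The game keys, probabilities, and the largest_gaps helper are hypothetical.

```python
import heapq

# Hypothetical probabilities that the first-listed team wins, keyed by game
preds_2024 = {"G1": 0.55, "G2": 0.80, "G3": 0.35, "G4": 0.62}
preds_2026 = {"G1": 0.60, "G2": 0.95, "G3": 0.70, "G4": 0.50}


def largest_gaps(p_old, p_new, k=10):
    """Return the k games with the largest |p_old - p_new|, biggest gap first."""
    gaps = ((game, abs(p_old[game] - p_new[game])) for game in p_old if game in p_new)
    return heapq.nlargest(k, gaps, key=lambda item: item[1])


for game, gap in largest_gaps(preds_2024, preds_2026, k=2):
    print(game, round(gap, 2))
```

heapq.nlargest avoids sorting the full slate, though for a 67-game bracket a plain sort would work just as well.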
| Game | 2024 RF | 2026 XGB | Actual | Winner |
|---|---|---|---|---|

Estimated Kaggle scores, using our women's model score (held constant) combined with each men's model. The leaderboard has 3,114 teams.
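One way to produce estimates like these is sketched below. It assumes the leaderboard score is a single Brier score pooled over every scored men's and women's game, so holding the women's score fixed and swapping in each men's model amounts to a game-count-weighted average. It also assumes lower is better and defines percentile as the share of the 3,114 teams ranked below. The function names, game counts, and stand-in leaderboard are hypothetical.

```python
def pooled_brier(brier_men, n_men, brier_women, n_women):
    """Brier is a mean of squared errors, so pooling two game sets is a game-count-weighted mean."""
    return (brier_men * n_men + brier_women * n_women) / (n_men + n_women)


def estimated_rank(score, leaderboard_scores):
    """Lower Brier is better: rank = 1 + number of leaderboard entries with a strictly lower score."""
    return 1 + sum(1 for s in leaderboard_scores if s < score)


def percentile(rank, n_teams=3114):
    """Share of the leaderboard ranked below this position, as a percentage."""
    return 100.0 * (n_teams - rank) / n_teams


# Hypothetical inputs: equal game counts and a tiny stand-in leaderboard
combined = pooled_brier(0.20, 63, 0.14, 63)
rank = estimated_rank(combined, [0.15, 0.16, 0.18, 0.19, 0.21])
print(round(combined, 4), rank)
```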
| Model | Est. Kaggle Score | Est. Rank | Percentile |
|---|---|---|---|

Through the first two rounds, the 2024 RF's conservative predictions (closer to 0.50) are working in its favor: when upsets happen, it is penalized less. The 2026 XGBoost makes sharper, more confident predictions that pay off on chalk games but cost more on upsets. On the historical backtest (2021–2025), the 2026 model is clearly superior (0.1871 vs 0.1941 Brier). As later rounds play out with tighter matchups, expect the XGBoost to pull ahead.
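The arithmetic behind this is the per-game Brier score: the squared error between the predicted win probability and the 0/1 result. In the sketch below, 0.65 and 0.90 are hypothetical stand-ins for a conservative and a sharp forecast. The second function gives the expected Brier when the favorite truly wins with probability q; it is lowest when the forecast equals q, so sharpness only helps when it is calibrated, and over a small sample of games the realized ranking can differ from the expected one.

```python
def brier(p, outcome):
    """Squared error between the predicted probability p and the 0/1 outcome."""
    return (p - outcome) ** 2


def expected_brier(p, q):
    """Expected Brier of forecast p when the event truly happens with probability q."""
    return q * (1 - p) ** 2 + (1 - q) * p ** 2


# Hypothetical forecasts that the favorite wins one game
conservative, sharp = 0.65, 0.90

for label, outcome in (("favorite wins", 1), ("upset", 0)):
    print(f"{label}: conservative={brier(conservative, outcome):.4f}, sharp={brier(sharp, outcome):.4f}")

# If the favorite truly wins 70% of the time, the calibrated forecast (0.70) has the lowest expected Brier
for p in (0.50, 0.70, 0.90):
    print(f"p={p:.2f}: expected Brier={expected_brier(p, 0.70):.4f}")
```

In this example the sharp forecast gains 0.1125 over the conservative one on a chalk result but loses 0.3875 on an upset, which is why a few early upsets can swing the comparison.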
- The 2026 XGBoost beats the 2024 RF on the 2021–2025 backtest (0.1871 vs 0.1941 Brier).
- … (0.1871 vs 0.1944). Aggressive feature selection removes noise.
- glm_quality_diff is the single most important feature by a wide margin, with 2x the importance of the next feature. It captures team quality via a generalized linear model fit on game outcomes.
- … (0.1658) underperforms separate gender-specific models (0.1607 combined).
- … (0.1882) is worse than the single best seed (0.1871).
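The glm_quality_diff feature is described here only as capturing team quality via a generalized linear model on game outcomes; the exact specification is not given. A minimal sketch of one common version, assuming a binomial (logistic) GLM with one rating per team and no home-court or margin terms: each game contributes P(winner beats loser) = sigmoid(q[winner] − q[loser]), the ratings are fit by gradient descent with a small L2 penalty, and a quality-diff feature is q[A] − q[B]. The function names, teams, and results are hypothetical.

```python
import math


def fit_glm_quality(games, teams, l2=0.1, lr=0.5, steps=1000):
    """Fit one logistic-regression rating per team from (winner, loser) pairs.

    Model (assumed): P(winner beats loser) = sigmoid(q[winner] - q[loser]).
    Minimizes mean log loss + (l2 / 2) * sum(q^2) by plain gradient descent;
    the L2 term keeps ratings finite and centered at zero.
    """
    q = {t: 0.0 for t in teams}
    n = len(games)
    for _ in range(steps):
        grad = {t: l2 * q[t] for t in teams}
        for winner, loser in games:
            p = 1.0 / (1.0 + math.exp(-(q[winner] - q[loser])))
            err = p - 1.0  # d(log loss)/d(linear predictor) when the label is 1
            grad[winner] += err / n
            grad[loser] -= err / n
        for t in teams:
            q[t] -= lr * grad[t]
    return q


def quality_diff(q, team_a, team_b):
    """Hypothetical stand-in for a glm_quality_diff-style feature: A's rating minus B's."""
    return q[team_a] - q[team_b]


# Hypothetical season: A goes 3-1, B goes 2-2, C goes 1-3
games = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C"), ("C", "A")]
ratings = fit_glm_quality(games, teams=["A", "B", "C"])
print({t: round(r, 3) for t, r in ratings.items()})
print(round(quality_diff(ratings, "A", "C"), 3))
```

Without the penalty, adding the same constant to every rating leaves the likelihood unchanged, and an unbeaten team's rating would grow without bound; the L2 term resolves both.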