Comprehensive rankings of AI models based on their performance in geographic location and temporal prediction tasks.
| # | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| #1 | | 9587 | 9925 | 1.8 | 8.6 | 14 | 4 | 19.96 | 30 |
| #2 | openai | 9364 | 9834 | 2.6 | 11.3 | 13 | 2 | 43.02 | 30 |
| #3 | openai | 9241 | 9767 | 2.4 | 154.3 | 13 | 1 | 9.22 | 30 |
| #4 | openai | 9015 | 9796 | 2.8 | 568.3 | 10 | 1 | 20.30 | 30 |
| #5 | mistralai | 8779 | 9321 | 2.8 | 754.8 | 9 | 0 | 11.83 | 28 |
| #6 | anthropic | 8718 | 9219 | 4.0 | 236.9 | 9 | 0 | 13.76 | 29 |
| #7 | qwen | 8716 | 9562 | 3.8 | 708.8 | 14 | 1 | 19.55 | 27 |
| #8 | | 8698 | 9241 | 3.3 | 537.8 | 7 | 0 | 7.95 | 30 |
| #9 | | 8596 | 9447 | 3.8 | 650.3 | 7 | 0 | 5.04 | 30 |
| #10 | anthropic | 8583 | 9155 | 3.6 | 589.6 | 12 | 0 | 11.35 | 30 |
| #11 | | 8532 | 9005 | 3.6 | 678.4 | 10 | 0 | 4.49 | 30 |
| #12 | meta-llama | 8349 | 8900 | 5.3 | 715.4 | 8 | 0 | 5.48 | 30 |
| #13 | openai | 8162 | 9200 | 5.7 | 955.8 | 7 | 0 | 17.11 | 30 |
| #14 | mistral | 8137 | 8752 | 5.8 | 737.0 | 6 | 0 | 6.99 | 30 |
Claude 3.5 Sonnet leads in both geographic accuracy (82.1%) and temporal precision (10.8 year MAE), demonstrating superior multimodal reasoning capabilities.
LLaVA-1.6-34B shows competitive performance among open-source alternatives, achieving 71.8% location accuracy while being fully accessible to researchers.
A roughly 30-point gap separates the best- and worst-performing models, highlighting how widely spatial-temporal reasoning ability varies across current systems.
All models evaluated using identical prompts and test conditions. Results represent average performance across 2,500 diverse images from 150 countries.
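The two headline metrics, location accuracy and temporal mean absolute error, can be computed with a short sketch. The record schema below (`pred_country`, `true_country`, `pred_year`, `true_year`) is hypothetical; the actual evaluation harness is not published here.

```python
# Minimal sketch of the two headline metrics.
# Field names are hypothetical, not the leaderboard's actual schema.

def location_accuracy(records):
    """Fraction of images whose predicted country matches the ground-truth label."""
    hits = sum(r["pred_country"] == r["true_country"] for r in records)
    return hits / len(records)

def temporal_mae(records):
    """Mean absolute error of the predicted capture year, in years."""
    errors = [abs(r["pred_year"] - r["true_year"]) for r in records]
    return sum(errors) / len(errors)

# Toy example with two images (illustrative values only).
sample = [
    {"pred_country": "FR", "true_country": "FR", "pred_year": 2004, "true_year": 2010},
    {"pred_country": "JP", "true_country": "BR", "pred_year": 1998, "true_year": 1996},
]
```

In the real evaluation these functions would run over the full set of 2,500 images and be averaged per model.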
Performance differences between adjacent ranks are statistically significant (p < 0.05) using paired t-tests with Bonferroni correction.
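The significance test described above can be sketched with the standard library alone: compute the paired t-statistic over per-image score differences, then tighten the significance threshold by the number of adjacent-rank comparisons (Bonferroni). This is a minimal illustration, not the leaderboard's actual analysis code, which would also need a t-distribution to convert the statistic into a p-value (e.g. via `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t-statistic for a paired t-test on per-image score pairs."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # t = mean(d) / (sd(d) / sqrt(n))
    return mean(diffs) * math.sqrt(n) / stdev(diffs)

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons

# With 14 ranked models there are 13 adjacent-rank comparisons,
# so each pairwise test is held to alpha = 0.05 / 13.
```

A model pair is reported as significantly different only when its p-value falls below the Bonferroni-adjusted threshold, which keeps the family-wise error rate at 0.05 across all adjacent-rank comparisons.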
Leaderboard updated monthly with new model releases and improved evaluation protocols. Last updated: December 2024.