Academic Research Benchmark

TimeGuessr AI Benchmark

A comprehensive benchmark tool for evaluating vision-capable LLMs on historical image analysis, testing their ability to infer geographic location and temporal context from TimeGuessr dataset imagery.

AI Models Tested: 14
Test Images: 30
Countries Covered: 30
Year Range: 1934-2024

Benchmark Overview

Models are tasked with predicting both the year and geographic location where historical photographs were captured, using the official TimeGuessr scoring algorithm with a maximum of 10,000 points per image.

Geographic Localization
Models predict geographic coordinates (latitude, longitude) where historical images were captured
  • Distance-based scoring using the Haversine formula
  • Visual cues: architectural styles, vegetation, license plates
  • Analysis of cultural and social indicators
  • Maximum 5,000 points for perfect location accuracy
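
The distance step behind the location score can be sketched as follows. This is a minimal illustration of the standard Haversine great-circle distance, not the benchmark's own code. The function name haversine_km and the Earth radius constant are assumptions introduced here, and the mapping from distance to points (up to 5,000) is defined by the TimeGuessr scoring algorithm and is not reproduced.

```python
import math

# Mean Earth radius in kilometers (assumed value; the benchmark may use another).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    # Hypothetical prediction (Paris) against a hypothetical ground truth (London).
    print(round(haversine_km(48.8566, 2.3522, 51.5074, -0.1278), 1))
```

For the two hypothetical points above the result is roughly 344 km. A scorer would then convert that distance into points according to the official algorithm.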
Temporal Estimation
Models predict the year when historical images were captured (range: 1800-2024)
  • Year-based scoring with absolute difference penalties
  • Temporal cues: technology, fashion, vehicles, image quality
  • Maximum 5,000 points for exact year prediction
  • Mean absolute error analysis for year predictions
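
The year-error quantities listed above can be sketched as follows. This is a minimal illustration of the per-image absolute year difference and the mean absolute error (MAE) reported on the leaderboard. The function names and the sample years are hypothetical, and the penalty curve that turns a year difference into points (up to 5,000) belongs to the TimeGuessr scoring algorithm and is not reproduced.

```python
from typing import List, Sequence


def year_errors(predicted: Sequence[int], actual: Sequence[int]) -> List[int]:
    """Absolute difference in years for each (prediction, ground truth) pair."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [abs(p - a) for p, a in zip(predicted, actual)]


def mean_absolute_error(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Mean of the absolute year differences across all images."""
    errors = year_errors(predicted, actual)
    if not errors:
        raise ValueError("at least one prediction is required")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical predictions and ground-truth years for four images.
    predicted_years = [1950, 1987, 2003, 2010]
    actual_years = [1948, 1990, 2003, 2011]
    print(year_errors(predicted_years, actual_years))          # [2, 3, 0, 1]
    print(mean_absolute_error(predicted_years, actual_years))  # 1.5
```

A lower MAE means the model's year guesses land closer to the true capture years on average.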

Top Performing Models

Current leaders on the TimeGuessr benchmark leaderboard

#1 gemini-2.5-pro (google, Vision)
  • Average Points (out of 10,000): 9587
  • Location Accuracy: 13.3%
  • Year MAE: 1.8

Evaluated on 30 images using TimeGuessr scoring

#2 gpt-5 (openai, Vision)
  • Average Points (out of 10,000): 9364
  • Location Accuracy: 6.7%
  • Year MAE: 2.6

Evaluated on 30 images using TimeGuessr scoring

#3 gpt-4o (openai, Vision)
  • Average Points (out of 10,000): 9241
  • Location Accuracy: 3.3%
  • Year MAE: 2.4

Evaluated on 30 images using TimeGuessr scoring

Explore the Full Results

Dive deep into model comparisons, interactive visualizations, and detailed analysis.