Academic Research Benchmark

TimeGuessr AI Benchmark

A benchmark for evaluating vision-capable LLMs on historical image analysis: each model must infer where and when a photograph from the TimeGuessr dataset was taken.

AI Models Tested

Test Images

Countries Covered

Year Range: 1934-2024

Benchmark Overview

Models are tasked with predicting both the year and the geographic location in which each historical photograph was captured. Predictions are scored with the official TimeGuessr scoring algorithm, with a maximum of 10,000 points per image.
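As a rough illustration of how a per-image total and a leaderboard average could be assembled, the sketch below sums a location component and a year component, each capped at the 5,000-point maximum listed in the two sections that follow. The names `ImageResult` and `average_score` are hypothetical, and the official mapping from distance or year error to points is not described here, so the component points are taken as given inputs.

```python
from dataclasses import dataclass

MAX_COMPONENT_POINTS = 5_000  # per-component cap stated in the benchmark description


@dataclass
class ImageResult:
    """Points awarded for one image (hypothetical container, not the site's schema)."""
    location_points: float
    year_points: float

    @property
    def total(self) -> float:
        # Clamp each component to [0, 5000] so the total never exceeds 10,000.
        loc = min(max(self.location_points, 0), MAX_COMPONENT_POINTS)
        year = min(max(self.year_points, 0), MAX_COMPONENT_POINTS)
        return loc + year


def average_score(results: list[ImageResult]) -> float:
    """Mean total score over all evaluated images."""
    if not results:
        raise ValueError("at least one result is required")
    return sum(r.total for r in results) / len(results)
```

With two images scored 5,000 + 4,000 and 3,000 + 5,000, `average_score` returns 8,500, which is the kind of figure reported as "Avg Score" on the leaderboard.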

Geographic Localization

Models predict the geographic coordinates (latitude, longitude) at which each historical image was captured.

• Distance-based scoring using Haversine formula
• Visual cues: architectural styles, vegetation, license plates
• Analysis of cultural and social indicators
• Maximum 5,000 points for perfect location accuracy
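The distance underlying the location score can be computed with the Haversine formula, which gives the great-circle distance between the predicted and true coordinates. The following is a minimal sketch; the function name `haversine_km` and the choice of mean Earth radius are assumptions, and the conversion from distance to points is omitted because it is not specified here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Guard against floating-point values slightly above 1 before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Example: distance between a predicted point (Paris) and a true point (London).
distance = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
```

A smaller distance would earn more of the 5,000 available location points under any distance-based scheme.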

Temporal Estimation

Models predict the year in which each historical image was captured (range: 1800-2024).

• Year-based scoring with absolute difference penalties
• Temporal cues: technology, fashion, vehicles, image quality
• Maximum 5,000 points for exact year prediction
• Mean absolute error analysis for year predictions
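The mean absolute error reported for year predictions is the average of the absolute differences between predicted and true years. A minimal sketch follows; the function name `year_mae` and the sample years are illustrative, not benchmark data.

```python
def year_mae(predicted: list[int], actual: list[int]) -> float:
    """Mean absolute error, in years, between predicted and true capture years."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the per-image absolute year differences.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Illustrative values only: absolute errors of 2, 0 and 3 years.
mae = year_mae([1950, 1962, 2001], [1948, 1962, 2004])
```

A lower MAE means the model's year guesses sit closer to the true capture years on average.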

Top Performing Models

Current leaders on the leaderboard, ranked by average TimeGuessr score

gemini-2.5-pro (Vision)

google • Avg Score: 9587 pts (out of 10,000)

• Location Accuracy: 13.3%
• Year MAE: 1.8 years

Evaluated on 30 images using TimeGuessr scoring

gpt-5 (Vision)

openai • Avg Score: 9364 pts (out of 10,000)

• Location Accuracy: 6.7%
• Year MAE: 2.6 years

Evaluated on 30 images using TimeGuessr scoring

gpt-4o (Vision)

openai • Avg Score: 9241 pts (out of 10,000)

• Location Accuracy: 3.3%
• Year MAE: 2.4 years

Evaluated on 30 images using TimeGuessr scoring

Explore the Full Results

Dive deep into model comparisons, interactive visualizations, and detailed analysis.