A comprehensive benchmark tool for evaluating vision-capable LLMs on historical image analysis, testing their ability to infer geographic location and temporal context from TimeGuessr dataset imagery.
The TimeGuessr AI Benchmark evaluates vision-capable large language models on historical image analysis using the TimeGuessr dataset. Models are tasked with predicting both the year and geographic location where historical photographs were captured, requiring integration of visual, cultural, and historical knowledge.
The benchmark uses the official TimeGuessr scoring algorithm with a maximum of 10,000 points per image (5,000 for location accuracy, 5,000 for temporal accuracy). This dual-task evaluation provides insights into models' spatial reasoning, cultural understanding, and temporal inference capabilities.
Built with the Vercel AI SDK, the tool evaluates multiple models in parallel and tracks progress in real time through a live CLI interface, so several LLMs can be compared in a single run.
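A minimal sketch of the fan-out pattern behind parallel evaluation. It assumes a hypothetical `evaluate(modelId, imageId)` call that resolves to a score for one image; it does not call the AI SDK itself, and the names `evaluateModels`, `EvaluateFn`, and `onProgress` are illustrative, not the tool's API.

```typescript
// HYPOTHETICAL signature: one model call on one image, resolving to a score.
type EvaluateFn = (modelId: string, imageId: string) => Promise<number>;

interface ModelResult {
  modelId: string;
  scores: number[];
  errors: number;
}

// Run all models concurrently; each model works through the images in order.
// onProgress fires after every image so a CLI could redraw its status line.
async function evaluateModels(
  modelIds: string[],
  imageIds: string[],
  evaluate: EvaluateFn,
  onProgress: (modelId: string, done: number, total: number) => void = () => {},
): Promise<ModelResult[]> {
  return Promise.all(
    modelIds.map(async (modelId) => {
      const result: ModelResult = { modelId, scores: [], errors: 0 };
      let done = 0;
      for (const imageId of imageIds) {
        try {
          result.scores.push(await evaluate(modelId, imageId));
        } catch {
          result.errors += 1; // a failed call is counted, not scored
        }
        done += 1;
        onProgress(modelId, done, imageIds.length);
      }
      return result;
    }),
  );
}
```

Processing one model's images sequentially while running models concurrently keeps per-provider request rates modest; a real tool might also cap concurrency per provider.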
Given a historical image I, predict the geographic coordinates (latitude, longitude) where the image was captured. Models receive no contextual information beyond the visual content.
Given a historical image I, predict the year Y when the image was captured (range: 1800-2024). Models analyze visual temporal indicators without additional metadata.
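The two task definitions imply simple range checks on every prediction. A minimal sketch, assuming a flat `Prediction` shape (a hypothetical name) with the year and coordinate ranges stated in this README:

```typescript
// Illustrative shape of one prediction covering both tasks.
interface Prediction {
  year: number;
  lat: number;
  lng: number;
}

const MIN_YEAR = 1800;
const MAX_YEAR = 2024;

// Return human-readable problems; an empty array means the prediction
// falls inside the ranges stated for the year and location tasks.
function validatePrediction(p: Prediction): string[] {
  const problems: string[] = [];
  if (!Number.isFinite(p.year) || p.year < MIN_YEAR || p.year > MAX_YEAR) {
    problems.push(`year ${p.year} outside ${MIN_YEAR}-${MAX_YEAR}`);
  }
  if (!Number.isFinite(p.lat) || p.lat < -90 || p.lat > 90) {
    problems.push(`latitude ${p.lat} outside -90 to 90`);
  }
  if (!Number.isFinite(p.lng) || p.lng < -180 || p.lng > 180) {
    problems.push(`longitude ${p.lng} outside -180 to 180`);
  }
  return problems;
}
```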
The benchmark uses the exact scoring algorithm from the original TimeGuessr game to ensure compatibility and meaningful comparison with human performance. This scoring system rewards both temporal and spatial accuracy with a maximum of 10,000 points per image.
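Location accuracy is scored by computing the great-circle distance between the predicted and actual coordinates with the Haversine formula, then mapping that distance onto distance-based thresholds: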
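The distance that feeds these thresholds can be computed with the standard Haversine formula. This sketch assumes a mean Earth radius of 6,371 km; the radius and unit the benchmark actually uses are not stated in this section, and `haversineKm` is an illustrative name.

```typescript
// ASSUMPTION: mean Earth radius in kilometres; the benchmark's constant may differ.
const EARTH_RADIUS_KM = 6371;

const toRadians = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance between two lat/lng points via the Haversine formula.
function haversineKm(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const phi1 = toRadians(lat1);
  const phi2 = toRadians(lat2);
  const dPhi = toRadians(lat2 - lat1);
  const dLambda = toRadians(lng2 - lng1);

  const a =
    Math.sin(dPhi / 2) ** 2 +
    Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) ** 2;
  // Clamp to [0, 1] so rounding error near antipodal points cannot yield NaN.
  const h = Math.min(1, Math.max(0, a));
  const c = 2 * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h));
  return EARTH_RADIUS_KM * c;
}
```

For example, two points on the equator 180° apart are half the Earth's circumference apart, about 20,015 km with this radius.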
Year predictions are scored based on the absolute difference from the actual year:
Total Score = Location Score + Year Score (Maximum: 10,000 points)
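The official threshold tables are not reproduced in this section, so the sketch below takes them as a parameter rather than guessing values. It uses a simple step lookup (the first bracket whose limit the error does not exceed), which may differ from the official algorithm; `Bracket`, `scoreByThresholds`, and `totalScore` are hypothetical names. The 5,000-point cap per component and the 10,000-point total come from the description above.

```typescript
// HYPOTHETICAL bracket shape: any error <= maxError earns `points`.
interface Bracket {
  maxError: number;
  points: number;
}

const MAX_COMPONENT_SCORE = 5000; // per-component cap stated in this README

// Absolute difference between the predicted and actual year.
function yearError(predictedYear: number, actualYear: number): number {
  return Math.abs(predictedYear - actualYear);
}

// Step lookup: points of the tightest bracket containing the error, else 0.
function scoreByThresholds(error: number, brackets: Bracket[]): number {
  const sorted = [...brackets].sort((a, b) => a.maxError - b.maxError);
  for (const b of sorted) {
    if (error <= b.maxError) {
      return Math.min(b.points, MAX_COMPONENT_SCORE);
    }
  }
  return 0;
}

// Total = location score + year score, each clamped to 0..5,000 (max 10,000).
function totalScore(locationScore: number, yearScore: number): number {
  const clamp = (s: number) => Math.max(0, Math.min(s, MAX_COMPONENT_SCORE));
  return clamp(locationScore) + clamp(yearScore);
}
```

The same lookup serves both components: pass the Haversine distance with the location brackets, or `yearError(...)` with the year brackets.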
System prompt:
"You are an expert historian and geographer analyzing historical photographs. Your task is to predict the year and location where each photograph was taken based on visual clues including clothing styles, architecture, vehicles, technology, street furniture, signage, and cultural indicators."
User prompt:
"Analyze this historical photograph and predict: 1) The year it was taken (as precisely as possible), 2) The geographic location where it was taken (latitude and longitude). Look carefully at all visual clues including clothing, architecture, vehicles, technology, signs, and other temporal or geographical indicators."
Expected response format:
{
"year": number (1800-2024),
"location": {
"lat": number (-90 to 90),
"lng": number (-180 to 180)
},
"confidence": number (0-1),
"reasoning": "Detailed explanation of visual clues"
}
Comprehensive performance metrics are generated for each model:
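The exact metric list is not shown in this section. As an illustration only, the sketch below assumes per-image results are reduced to averages per model, plus a median distance because distance errors tend to be skewed by a few far-off guesses. `ImageResult`, `ModelSummary`, and `summarize` are hypothetical names.

```typescript
// One scored image for one model (illustrative field names).
interface ImageResult {
  locationScore: number; // 0-5,000
  yearScore: number; // 0-5,000
  distanceKm: number;
  yearError: number;
}

interface ModelSummary {
  images: number;
  meanTotal: number;
  meanLocation: number;
  meanYear: number;
  medianDistanceKm: number;
  meanYearError: number;
}

const mean = (xs: number[]): number =>
  xs.length === 0 ? 0 : xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Median of a sorted copy, so the caller's array is left untouched.
function median(xs: number[]): number {
  if (xs.length === 0) return 0;
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function summarize(results: ImageResult[]): ModelSummary {
  return {
    images: results.length,
    meanTotal: mean(results.map((r) => r.locationScore + r.yearScore)),
    meanLocation: mean(results.map((r) => r.locationScore)),
    meanYear: mean(results.map((r) => r.yearScore)),
    medianDistanceKm: median(results.map((r) => r.distanceKm)),
    meanYearError: mean(results.map((r) => r.yearError)),
  };
}
```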
Direct integration with the scraped TimeGuessr dataset: