About HardGeoBench
HardGeoBench is a benchmark that tests how well frontier AI models can identify where a photo was taken. The photos have never been posted online, and generally do not include any text, road markers, or other obvious "GeoGuessr"-type hints about location. Some are incredibly challenging to identify. The underlying photos have no metadata of any kind, and the models are run without web search or other tools (it is a "raw LLM" benchmark).
All results are pass@1, using API default settings and the same simple prompt, without sampling. I ran the 3 best models twice each on May 8, 2025. Across those six runs, at least one model gets the location exactly right (scored as "within 10 miles", though the answer is almost always much closer and exact in the description) 57.5% of the time. Some of the photos are very difficult: I suspect no model, no matter how good, can do better than 75%.
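To make the scoring rule concrete, here is a minimal Python sketch of how a "within 10 miles" check and an "at least one run is right" rate could be computed. It assumes each guess and each true location is available as a (latitude, longitude) pair, which the benchmark description does not state; the actual grading may well be done by reading the model's description. The function names `haversine_miles`, `is_hit`, and `any_run_hit_rate` are hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_hit(guess, truth, threshold_miles=10.0):
    """True if the guess is within the threshold of the truth.

    A guess of None (assumed here to mean the model gave no usable
    location) counts as a miss.
    """
    if guess is None:
        return False
    return haversine_miles(*guess, *truth) <= threshold_miles


def any_run_hit_rate(truths, runs, threshold_miles=10.0):
    """Fraction of photos for which at least one run produced a hit.

    truths: list of (lat, lon), one per photo.
    runs:   list of runs; each run is a list of guesses aligned with truths.
    """
    hits = sum(
        1
        for i, truth in enumerate(truths)
        if any(is_hit(run[i], truth, threshold_miles) for run in runs)
    )
    return hits / len(truths)
```

With six runs (3 models, twice each), `runs` would hold six aligned lists of guesses, and `any_run_hit_rate` would return the share of photos that at least one of them located within the threshold.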
The benchmark was created by Kevin Bryan, All Day TA and U Toronto, on May 8, 2025.