City Scale Image Geolocalization via Dense Scene Alignment

Predicting where a photo was taken is quite important and yet a challenging task for computer vision algorithms. Our motivation is to solve this difficult problem in a city scale setting by employing a data-driven approach. In order to pursue this goal, we developed a fast and robust scene matching method that follows a coarse-to-fine strategy.

In particular, we combine scene retrieval via global features and dense scene alignment and use a large set of geo-tagged images of downtown San Francisco in our evaluation. The experimental results show that the proposed approach, despite its simplicity, is surprisingly effective and achieves comparable results with the state-of-the-art.