A GPU-accelerated two-stage visual matching pipeline for image and video retrieval

We propose a two-stage visual matching pipeline: the first stage filters candidates using VLAD signatures, and the second stage reranks the top results by raw matching of SIFT descriptors. This makes it possible to adjust the tradeoff between the high computational cost of matching local descriptors and the insufficient accuracy of compact signatures in many application scenarios.

We describe GPU-accelerated extraction and matching algorithms for SIFT, which yield a speedup factor of at least 4. The VLAD filtering step reduces the number of images or frames for which local descriptors need to be matched, speeding up retrieval by an additional factor of 9 to 10 without sacrificing mean average precision relative to full raw descriptor matching.
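
The filter-then-rerank structure described above can be illustrated with a minimal CPU sketch in NumPy. It is not the GPU implementation and makes several assumptions that are not stated in the text: signatures and local descriptors are taken as precomputed arrays, the first stage ranks by cosine similarity, the second stage scores candidates by counting nearest-neighbour matches that pass a ratio test with threshold 0.8, and all function and parameter names (`vlad_filter`, `count_matches`, `two_stage_search`, `top_k`, `ratio`) are hypothetical.

```python
import numpy as np


def vlad_filter(query_sig, db_sigs, top_k):
    """Stage 1 (assumed form): shortlist items by cosine similarity of signatures."""
    q = query_sig / (np.linalg.norm(query_sig) + 1e-12)
    d = db_sigs / (np.linalg.norm(db_sigs, axis=1, keepdims=True) + 1e-12)
    scores = d @ q
    # Indices of the top_k most similar signatures, best first.
    return np.argsort(-scores)[:top_k]


def count_matches(query_desc, cand_desc, ratio=0.8):
    """Stage 2 (assumed form): count query descriptors passing a ratio test."""
    if len(cand_desc) < 2:
        return 0
    # Pairwise squared Euclidean distances, shape (n_query, n_candidate).
    d2 = (
        np.sum(query_desc ** 2, axis=1)[:, None]
        + np.sum(cand_desc ** 2, axis=1)[None, :]
        - 2.0 * query_desc @ cand_desc.T
    )
    d2 = np.maximum(d2, 0.0)
    # Smallest and second-smallest squared distance for each query descriptor.
    two_smallest = np.partition(d2, 1, axis=1)[:, :2]
    best, second = two_smallest[:, 0], two_smallest[:, 1]
    # The ratio is squared because the distances are squared.
    return int(np.sum(best < (ratio ** 2) * second))


def two_stage_search(query_sig, query_desc, db_sigs, db_descs, top_k=10):
    """Filter with compact signatures, then rerank the shortlist by match count."""
    shortlist = vlad_filter(query_sig, db_sigs, top_k)
    scored = [(int(i), count_matches(query_desc, db_descs[i])) for i in shortlist]
    scored.sort(key=lambda item: -item[1])
    return scored
```

In this sketch `top_k` is the knob that controls the tradeoff: a larger shortlist means more descriptor matching, with behaviour approaching full raw matching, while a smaller one relies more heavily on the compact signatures.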