To track an object in video, most tracking algorithms make assumptions about the nature of the movement. For example, it is common to assume that the movement between two consecutive frames is small and/or that the object is always visible.
This work asks the question: What happens if this is not the case? It would be very unfortunate if an entire tracking algorithm failed because someone happens to walk in front of the camera, temporarily occluding everything else in view.
Tracking-by-recognition is a way of tracking an object using recognition techniques only. Each frame is processed independently of every other frame, so temporary occlusion has no impact in the long run. However, throwing away all temporal information is rarely a good idea. This work tries to incorporate temporal information into tracking-by-recognition when such information is available.
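The idea can be illustrated with a small sketch. This is not the method developed in the thesis, only a toy example under stated assumptions: each frame is assumed to have already been reduced to a list of candidate detections (x, y, score), and the names recognise, track_frames and max_jump are hypothetical. Recognition alone picks the best-scoring candidate in each frame; the tracker prefers candidates near the previous position when one is known and falls back to plain recognition otherwise.

```python
from math import hypot
from typing import List, Optional, Sequence, Tuple

# Hypothetical candidate format: (x, y, recognition_score).
Candidate = Tuple[float, float, float]
Position = Tuple[float, float]


def recognise(candidates: Sequence[Candidate]) -> Optional[Position]:
    """Pure recognition: return the highest-scoring candidate, if any."""
    if not candidates:
        return None  # object occluded or not found in this frame
    x, y, _ = max(candidates, key=lambda c: c[2])
    return (x, y)


def track_frames(
    frames: Sequence[Sequence[Candidate]], max_jump: float = 20.0
) -> List[Optional[Position]]:
    """Recognition per frame, using the previous position when available.

    max_jump is an assumed limit on movement between consecutive frames.
    """
    positions: List[Optional[Position]] = []
    previous: Optional[Position] = None
    for candidates in frames:
        position = None
        if previous is not None:
            # Temporal information is available: keep only nearby candidates.
            near = [
                c for c in candidates
                if hypot(c[0] - previous[0], c[1] - previous[1]) <= max_jump
            ]
            position = recognise(near)
        if position is None:
            # No usable temporal information: fall back to recognition only.
            position = recognise(candidates)
        positions.append(position)
        previous = position
    return positions


frames = [
    [(10, 10, 0.9)],
    [(12, 11, 0.6), (80, 80, 0.7)],  # a distractor with a higher score
    [],                              # temporary occlusion
    [(50, 50, 0.8)],                 # object reappears elsewhere
]
independent = [recognise(f) for f in frames]
tracked = track_frames(frames)
```

In this toy data, recognition alone jumps to the distractor in the second frame, while the tracker stays on the nearby candidate. Both recover after the occluded third frame, because an empty frame leaves no previous position to rely on.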
Petter Strandmark, Feature Tracking with Multiple Models, Master's thesis, Lund University, 2008.
Petter Strandmark and Irene Gu, Joint Random Sample Consensus and Multiple Motion Models for Robust Video Tracking, Scandinavian Conference on Image Analysis (SCIA), 2009 (to appear).
Daniel Persson and Björn Samvik use some of these ideas in their Master's thesis about real-time hand gesture detection.