Robust camera tracking plays a key role in augmented video. This paper proposes an efficient and robust approach to structure and motion recovery for long video sequences with varying and unknown focal length. In this approach, a long sequence is abstracted as a sequence of key frames separated by long baselines, which ensures the accuracy of the solution. The key frames are resolved incrementally to recover the structure of the 3D points, from which the camera motion for every frame of the sequence is then retrieved. The algorithm begins with three key frames suitable for initializing the sequential structure and motion computation, and the projective structure is upgraded to a metric one in a timely manner through self-calibration. Experiments on real examples demonstrate very precise structure and motion recovery and confirm the efficiency and robustness of the proposed method.
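
The key-frame abstraction described above can be illustrated with a minimal sketch. It is not the selection rule used in this paper: as an assumption, the median image displacement of features shared with the last key frame is taken as a rough proxy for baseline length, and the function name `select_key_frames` and the thresholds `min_disparity` and `min_shared` are hypothetical.

```python
from math import hypot
from statistics import median


def select_key_frames(tracks, min_disparity=30.0, min_shared=20):
    """Pick key-frame indices from per-frame feature tracks.

    tracks: list with one dict per frame, mapping feature id -> (x, y) in pixels.
    Assumption (not from the paper): median feature displacement relative to
    the last key frame stands in for the baseline between the two views.
    """
    if not tracks:
        return []

    key_frames = [0]  # the first frame always starts the key-frame sequence
    for i in range(1, len(tracks)):
        last = key_frames[-1]
        ref, cur = tracks[last], tracks[i]
        shared = ref.keys() & cur.keys()

        if len(shared) < min_shared:
            # Overlap is about to be lost: fall back to the previous frame,
            # or to the current one if the previous frame is already a key frame.
            key_frames.append(i - 1 if i - 1 > last else i)
            continue

        disparity = median(
            hypot(cur[k][0] - ref[k][0], cur[k][1] - ref[k][1]) for k in shared
        )
        if disparity >= min_disparity:
            # Enough apparent motion: treat this frame as the next key frame.
            key_frames.append(i)

    return key_frames
```

With such a rule, the first three selected indices would serve as the initialization triplet, and the frames between key frames would be resolved afterwards against the recovered 3D points. A real system would likely also test for degenerate motion (for example pure rotation), which this sketch does not attempt.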