This work addresses the problem of tracking humans with skeleton-based shape models where video footage is acquired by multiple cameras. Since the shape deformations are parameterized by the skeleton, the position, orientation, and configuration of the human skeleton are estimated such that the deformed shape model is best explained by the image data. To solve this problem, several algorithms have been proposed over the last years. The approaches usually rely on filtering, local optimization, or global optimization. The global optimization algorithms can be further divided into single hypothesis (SHO) and multiple hypothesis optimization (MHO). We briefly compare the underlying mathematical models and evaluate the performance of one representative algorithm for each class. Furthermore, we compare several likelihoods and parameter settings with respect to accuracy and computation cost. A thorough evaluation is performed on two sequences with uncontrolled lighting conditions and non-static background. In addition, we demonstrate the impact of the likelihood on the HumanEva benchmark. Our results provide a guidance on algorithm design for different applications related to human motion capture.