For image-based body pose estimation, the relationship between body pose and image appearance has often been captured by means of a geometric body model. In contrast to such handcrafted models, a learned model has many advantages: it is estimated directly from training data, and the computations can often be performed parametrically. Furthermore, learned probabilistic models are able to generalise over irrelevant variation, such as the differences in appearance between distinct subjects. The mapping from an image descriptor, computed from the bounding box of the tracked person, to that person's pose can be learned with regressors and can therefore be computed analytically. However, when we consider 2D bounding-box tracking as an integral part of the body tracking and pose estimation problem, the learned regressors do not provide the necessary information: a regressor maps a descriptor to a pose, but it does not tell us how plausible the descriptor of a candidate bounding box is. Nor is it possible to perform all the computations parametrically. To overcome these problems, we propose to learn a model of the joint probability density function (pdf) of pose and image descriptors rather than the conditional pdf of pose given the descriptors, and for inference we combine analytical and sample-based computation in a Rao-Blackwellised particle filter.
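Why a joint pdf helps can be sketched in a few lines. Everything below is an illustrative assumption rather than the proposed model. The joint pdf of pose x and descriptor y is taken to be a single Gaussian. Under that assumption the conditional p(x | y) has the closed form mean mu_x + S_xy S_yy^{-1} (y - mu_y) and covariance S_xx - S_xy S_yy^{-1} S_yx, which is the quantity a regressor would supply. The same model also gives the marginal p(y), and this marginal can weight bounding-box hypotheses. The function names (`fit_joint_gaussian`, `log_marginal_descriptor`, `condition_pose`, `rbpf_step`, `descriptor_fn`) are hypothetical. So are the 2D box-centre state, the random-walk motion model, and the omission of pose dynamics. The box is handled by sampling and the pose analytically, in the spirit of Rao-Blackwellisation. A full Rao-Blackwellised filter would additionally carry per-particle pose statistics over time.

```python
import numpy as np


def fit_joint_gaussian(X, Y):
    """Fit one Gaussian to stacked [pose, descriptor] training pairs.

    X: (N, dx) poses, Y: (N, dy) descriptors. Returns (mean, covariance, dx).
    """
    Z = np.hstack([X, Y])
    mu = Z.mean(axis=0)
    # small diagonal jitter keeps the covariance invertible
    cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(Z.shape[1])
    return mu, cov, X.shape[1]


def log_marginal_descriptor(y, mu, cov, dx):
    """log p(y): Gaussian log-density of the descriptor block alone."""
    my, Syy = mu[dx:], cov[dx:, dx:]
    d = y - my
    _, logdet = np.linalg.slogdet(Syy)
    maha = d @ np.linalg.solve(Syy, d)
    return -0.5 * (maha + logdet + len(y) * np.log(2.0 * np.pi))


def condition_pose(y, mu, cov, dx):
    """p(x | y) in closed form: the quantity a regressor would provide."""
    mx, my = mu[:dx], mu[dx:]
    Sxx, Sxy, Syy = cov[:dx, :dx], cov[:dx, dx:], cov[dx:, dx:]
    K = np.linalg.solve(Syy, Sxy.T).T  # K = Sxy @ inv(Syy)
    return mx + K @ (y - my), Sxx - K @ Sxy.T


def rbpf_step(boxes, descriptor_fn, model, rng, motion_std=2.0):
    """One filter step: sample box positions, integrate pose analytically.

    boxes: (P, 2) hypothetical box centres; descriptor_fn maps a box to a
    descriptor vector (hypothetical stand-in for image feature extraction).
    """
    mu, cov, dx = model
    # random-walk motion model for the box (an assumption for this sketch)
    boxes = boxes + rng.normal(0.0, motion_std, size=boxes.shape)
    descs = [descriptor_fn(b) for b in boxes]
    # weight each box by the marginal likelihood p(y) of its descriptor
    logw = np.array([log_marginal_descriptor(y, mu, cov, dx) for y in descs])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # per-particle Gaussian pose posterior; report the weighted mixture mean
    pose_means = np.array([condition_pose(y, mu, cov, dx)[0] for y in descs])
    pose_mean = w @ pose_means
    # multinomial resampling of the box hypotheses
    idx = rng.choice(len(boxes), size=len(boxes), p=w)
    return boxes[idx], pose_mean
```

In this sketch only the box hypotheses are sampled and resampled. Given each box, the pose posterior is computed exactly, so no particles are spent on the (typically high-dimensional) pose.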