We present a feature-based framework that combines spatial feature clustering, guided sampling for pose generation, and model updating for 3D object recognition and pose estimation. Existing methods fail in the presence of repeated patterns or multiple instances of the same object, because they rely only on feature discriminability for matching and on the estimator's capabilities for outlier rejection. We propose to spatially separate the features before matching, creating smaller clusters that contain the object. Hypothesis generation is then guided by cues collected off- and on-line, such as feature repeatability, 3D geometric constraints, and feature occurrence frequency. Finally, while previous methods overload the model with synthetic features for wide-baseline matching, we claim that continuously updating the model representation is a lighter yet reliable strategy. The evaluation of our algorithm on challenging video sequences shows the improvement provided by these contributions.
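As an illustration of the guided-sampling idea only, the following minimal sketch draws a minimal set of matches with probability proportional to a per-match score, instead of uniformly as in plain RANSAC. It is not the implementation evaluated in this work: the names `match_score` and `guided_sample`, the product used to combine repeatability and occurrence frequency, and the sample size of three are all assumptions made for the example.

```python
import random


def match_score(repeatability, frequency):
    # Hypothetical cue combination (assumption): a simple product, so a match
    # scores high only if its feature is both repeatable and frequently seen.
    return repeatability * frequency


def guided_sample(scores, k, rng):
    """Draw k distinct match indices, each draw proportional to its score."""
    if k > sum(1 for s in scores if s > 0):
        raise ValueError("not enough matches with a positive score")
    remaining = list(range(len(scores)))
    weights = [float(s) for s in scores]
    chosen = []
    for _ in range(k):
        # Weighted draw among the matches that have not been picked yet.
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen


if __name__ == "__main__":
    # Illustrative (made-up) cue values for six candidate matches.
    repeatability = [0.9, 0.0, 0.5, 0.8, 0.7, 0.2]
    frequency = [1.0, 0.6, 0.5, 0.0, 0.9, 0.4]
    scores = [match_score(r, f) for r, f in zip(repeatability, frequency)]
    # Sample size 3 is an assumption for this example.
    print(guided_sample(scores, 3, random.Random(0)))
```

Matches whose score is zero are never drawn, so a cue such as a failed geometric check could be folded in by zeroing the score; that choice is likewise an assumption of this sketch.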