Global image features, although successful for a surprisingly large number of queries, are nevertheless restricted in their usefulness. They are less effective when, for example, searching for other images that contain the same object, or the same scene seen from a very different viewpoint. The relative amounts of colour and texture typically change dramatically in such cases. Hence, more local features are needed as well: features that capture local image structure in a way that is robust against changes in viewpoint and illumination. We propose affinely invariant regions for this purpose. These are extracted from single images in isolation. Their shapes in the image depend on the viewpoint, in such a way that they systematically cover the same physical part of a surface. In principle, these regions work perfectly for planar shapes under pseudo-orthographic viewing conditions. Keeping the regions small helps to fulfil these conditions. Several ways of extracting these regions are discussed. Next, features are extracted from the colour patterns within the regions. These features are invariant under the geometric and photometric changes that are considered. As a consequence, they can be compared directly between images. This is the basis for our simple retrieval algorithm.
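
To illustrate why invariant features can be compared directly between images, the following Python sketch may help. It is not the feature set or the retrieval algorithm of this work. It assumes that each region has already been resampled into a common, affinely normalised frame, so that samples correspond between views, and that the photometric change is an independent positive gain and offset per colour band. Both are assumptions made for illustration only, and the names `normalise`, `describe`, `relight`, `distance` and `best_match` are hypothetical.

```python
import math

BANDS = ("r", "g", "b")


def normalise(values):
    """Zero-mean, unit-variance normalisation of one colour band's samples.

    Cancels an assumed photometric change v -> gain * v + offset (gain > 0).
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n  # a flat band carries no pattern information
    return [(v - mean) / std for v in values]


def describe(region):
    """Concatenate the normalised bands of a region into one descriptor."""
    descriptor = []
    for band in BANDS:
        descriptor.extend(normalise(region[band]))
    return descriptor


def relight(region, gains, offsets):
    """Simulate the assumed illumination change, band by band."""
    return {
        band: [gains[band] * v + offsets[band] for v in region[band]]
        for band in BANDS
    }


def distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(query, database):
    """Index of the database descriptor closest to the query."""
    return min(range(len(database)), key=lambda i: distance(query, database[i]))


# Two toy regions, each given as samples per colour band (illustrative values).
region_a = {"r": [10, 20, 30, 40], "g": [5, 5, 9, 1], "b": [0, 50, 25, 75]}
region_b = {"r": [40, 10, 30, 20], "g": [1, 9, 5, 5], "b": [75, 0, 50, 25]}

# The same surface patch as region_a, seen under different illumination.
relit = relight(
    region_a,
    gains={"r": 1.5, "g": 0.8, "b": 2.0},
    offsets={"r": 10.0, "g": 3.0, "b": 5.0},
)

database = [describe(region_b), describe(region_a)]
match_index = best_match(describe(relit), database)
```

Because the normalisation removes the assumed gain and offset, the descriptor of the relit region coincides, up to rounding, with that of `region_a`, and a plain nearest-neighbour search suffices to retrieve it. Without invariance, each comparison would require an explicit search over the unknown geometric and photometric transformations.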