This paper addresses the estimation of the geometric transformation parameters between two diverse visual modalities (e.g. an image and a map, or an uncalibrated SfM result and a Euclidean 3D model), relying only on semantic cues. These cues include semantically segmented regions and objects' bounding boxes. The proposed approach differs from traditional feature-to-feature correspondence reasoning. Starting from semantic regions on one side, we search for the regions on the other side to which they may be assigned, thereby constraining the sought geometric transformation. This entails a simultaneous search for the transformation and the region-to-region correspondences. The optimization requires the control points that define a source semantic region to map consistently into the target semantic region. The paper is the first to derive the conditions that must be satisfied for a convex region defined by control points to be transformed into the interior of an ellipsoid. These conditions are expressed as Linear Matrix Inequalities (LMIs) and used within a Branch-and-Prune search paradigm to obtain the globally optimal transformation parameters. Under mild initial bound conditions, we report experimental results on two challenging example problems: (i) registration between a semantically segmented image and a map via a 2D projective homography; (ii) registration of a 3D reconstruction from uncalibrated cameras to a Euclidean 3D scene, using detection cues. Our experiments yield good results, which we also compare against those of other approaches.
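To make the region-inside-ellipsoid idea concrete, the following minimal sketch checks whether a convex polygon, given by its control points, lands inside an ellipse under a fixed and known 2D affine map. It is an illustration under stated assumptions, not the paper's derivation: the paper treats the transformation as unknown and searches over it, whereas here the map is given. The sketch relies on two standard facts: an affine map sends a convex polygon to the convex hull of the mapped vertices, so testing the vertices suffices; and for a positive definite shape matrix P, a point y lies in the ellipse {y : (y - c)^T P^{-1} (y - c) <= 1} exactly when the block matrix [[P, y - c], [(y - c)^T, 1]] is positive semidefinite (Schur complement). All function and variable names are hypothetical.

```python
# Sketch: containment of an affinely mapped convex polygon in an ellipse,
# tested vertex by vertex through a Schur-complement matrix inequality.
# Assumptions (not from the paper): 2D points, a known affine map y = A p + t,
# and a symmetric positive definite shape matrix P for the ellipse.

def affine_map(A, t, p):
    """Apply y = A p + t to a 2D point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + t[0],
            A[1][0] * p[0] + A[1][1] * p[1] + t[1])

def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def point_in_ellipse_lmi(y, c, P, tol=1e-12):
    """Test [[P, d], [d^T, 1]] >= 0 with d = y - c.

    Because P is assumed positive definite, the block matrix is positive
    semidefinite iff its determinant, det(P) * (1 - d^T P^{-1} d), is >= 0.
    """
    d = (y[0] - c[0], y[1] - c[1])
    M = [[P[0][0], P[0][1], d[0]],
         [P[1][0], P[1][1], d[1]],
         [d[0],    d[1],    1.0]]
    return det3(M) >= -tol

def region_in_ellipse(control_points, A, t, c, P):
    """True if every mapped control point satisfies the matrix inequality."""
    return all(point_in_ellipse_lmi(affine_map(A, t, p), c, P)
               for p in control_points)

# Unit square as the source region; ellipse with semi-axes 1 and 0.5.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center = (0.25, 0.25)
shape = [[1.0, 0.0], [0.0, 0.25]]

half_scale = [[0.5, 0.0], [0.0, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]
no_shift = (0.0, 0.0)

fits_when_shrunk = region_in_ellipse(square, half_scale, no_shift, center, shape)
fits_unscaled = region_in_ellipse(square, identity, no_shift, center, shape)
print(fits_when_shrunk, fits_unscaled)
```

In a branch-and-prune setting, a test of this kind would be applied to a whole box of candidate transformation parameters rather than a single map, discarding boxes in which no parameter choice can satisfy the inequalities. For a projective homography the vertex argument holds only while the region does not cross the line mapped to infinity.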