In this paper, we address the problem of effectively visualizing and interacting with multiple, multi-dimensional data sets to support communication between project stakeholders in an information cave. More precisely, our goal is to enable multiple users to interact with multiple screens from any location in the cave. We present our latest advances in developing a novel human-computer interaction system specifically targeted at room setups with physically distributed sets of screens. The system consists of a set of video cameras overseeing the room, whose signals are processed in real time to detect and track the participants, their poses, and their hand gestures; interaction is thus driven entirely by camera-based gesture recognition. Early experiments have been conducted in the Value Lab (see figure 1), recently introduced at ETH Zurich, and focus on enabling interaction with large urban 3D models developed for the design and simulation of future cities. For the moment, the experiments consider only a single user interacting with multiple layers (points of view) of a large city model displayed across several screens. The results demonstrate the considerable potential of the system and of the principle of vision-based interaction for such environments. Work continues on extending the system to multiple simultaneous users.
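To make the processing pipeline concrete, the following is a minimal sketch of the per-frame logic implied above: tracked hand motion from a camera is classified into a coarse gesture and dispatched as a command to the screen the participant is facing. All class and function names, thresholds, and gesture labels are illustrative assumptions, not the paper's actual implementation, which operates on real camera streams.

```python
from dataclasses import dataclass

@dataclass
class HandObservation:
    """One tracked hand, as produced by the (hypothetical) vision front end."""
    camera_id: int
    participant_id: int
    dx: float  # normalized horizontal hand displacement between frames
    dy: float  # normalized vertical hand displacement

def classify_gesture(obs: HandObservation) -> str:
    """Map a hand displacement to a coarse gesture label (illustrative rules)."""
    if abs(obs.dx) < 0.01 and abs(obs.dy) < 0.01:
        return "idle"
    if abs(obs.dx) >= abs(obs.dy):
        return "pan_right" if obs.dx > 0 else "pan_left"
    return "zoom_in" if obs.dy < 0 else "zoom_out"

def dispatch(obs: HandObservation, screen_of: dict) -> tuple:
    """Route a participant's gesture to the screen they currently face."""
    return screen_of[obs.participant_id], classify_gesture(obs)

# Single-user case, as in the early experiments: one participant,
# several screens showing different layers of the city model.
screen_of = {0: 2}  # participant 0 currently faces screen 2
obs = HandObservation(camera_id=1, participant_id=0, dx=0.12, dy=0.03)
print(dispatch(obs, screen_of))  # -> (2, 'pan_right')
```

The `screen_of` mapping is the point where a multi-user extension would plug in: with several tracked participants, each entry routes that participant's gestures to their own screen independently.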