We present a crowd simulation that captures some of the semantics of a specific scene by partially reproducing its motion behaviors, both at the lower level of steering and at the higher level of goal selection. To this end, we use and generalize a steering model based on linear velocity prediction, termed LTA. From the goal-selection perspective, we reproduce many of the motion behaviors of the scene without specifying them explicitly. Behaviors such as ``wait at the tram stop'' or ``stroll around'' are not modeled explicitly, but learned from real examples: we process real data to extract the information used in our simulation. As a consequence, we can easily integrate real and virtual agents in a mixed-reality simulation. We propose two strategies to achieve this and validate the results with a user study.