BACKGROUND: The aims of this study were to determine the construct validity of the HystSim virtual-reality (VR) training simulator for hysteroscopy using a new multimetric scoring system (MMSS) and to explore the learning curves of both novices and experienced surgeons.
METHODS: Fifteen relevant metrics for diagnostic hysteroscopy were identified by hierarchical task decomposition. They were grouped into four modules (visualization, ergonomics, safety, and fluid handling) and individually weighted to build the MMSS for this study. First, 24 novice medical students and 12 experienced gynecologists completed a self-paced tutorial in which they received clearly stated goals and instructions on how to perform the hysteroscopic procedures correctly. All subjects then performed five repeated trials of two different HystSim exercises (exploration and diagnosis). After each trial, the results were presented to the participants as an automated objective feedback report (AOFR). Construct validity of the MMSS was investigated by comparing performance between novices and experienced surgeons, and learning curves were investigated by comparing performance across the repeated trials. To study the effect of repeated practice, 23 of the novices returned 2 weeks later for a second training session.
RESULTS: In the comparison of novices with experienced surgeons, construct validity was shown for the ergonomics and fluid handling modules but not for the visualization module; for the safety module, the experienced group even scored significantly lower than the novices in both exercises. The overall score showed construct validity only when the safety module was excluded. Regarding learning curves, all subjects improved significantly during training on HystSim, and the results clearly indicated that the second training session was beneficial for the novices.
CONCLUSIONS: Construct validity of HystSim has been established for several modules of VR metrics within a new MMSS developed for diagnostic hysteroscopy. Careful refinement and further testing of the metrics and scores are required before they can be used as tools for assessing operative skills.