IMPROVING THE EDUCATIONAL EFFICACY OF SIMULATORS
DONALD A. RISUCCI, PhD, KEVIN C. WOLFE, MA
Validation is best viewed as an ongoing, iterative process that uses scientific evidence to continually improve the educational efficacy of simulators. Kirkpatrick’s [1] 4-level evaluation model is applicable to validation of simulator-based (SB) training and requires consideration of trainee reactions, learning, behavior, and results (Table 1). Validation of SB performance assessment requires consideration of the psychometric reliability and validity of assessment instruments.
VALIDATION OF SIMULATOR-BASED TRAINING
Reactions can be assessed by using surveys, ratings, logs, journals, or interviews. Though subjective, they are very important because negative reactions may severely limit simulator use despite scientific evidence of efficacy. Learning can be assessed by using the simulator to measure improvements in the speed, accuracy, or efficiency of trainees’ performance of simulated tasks, or reductions in their errors or variability. It can also be assessed by randomly assigning trainees to a control or an experimental group and comparing the groups on simulator-based tests administered after the training period. A pre-post test design may also be used to measure learning [2,3]. Measurement of variables that may influence individual learning, such as prior experience and visual spatial perceptual skills [4-6], can shed light on the optimal use of simulation for individuals with different aptitudes, learning styles, or backgrounds.
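As a rough illustration of the randomized and pre-post designs described above, the following Python sketch computes each trainee's gain score and a standardized difference in gains between a trained group and a control group. The function names, the use of Cohen's d as the summary statistic, and all scores are assumptions made for illustration; they are not taken from any of the cited studies.

```python
from statistics import mean, stdev


def gain_scores(pre, post):
    """Per-trainee improvement: post-test score minus pre-test score."""
    if len(pre) != len(post):
        raise ValueError("pre and post must hold one score per trainee")
    return [after - before for before, after in zip(pre, post)]


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical accuracy scores (0-100) on a simulated task.
trained_pre, trained_post = [50, 55, 60, 52], [70, 72, 80, 74]
control_pre, control_post = [51, 54, 61, 50], [55, 56, 63, 54]

trained_gain = gain_scores(trained_pre, trained_post)
control_gain = gain_scores(control_pre, control_post)

# A positive value means the trained group improved more than the controls.
effect = cohens_d(trained_gain, control_gain)
print(trained_gain, control_gain, round(effect, 2))
```

With samples this small the effect size is only descriptive; a real study would pair it with an appropriate significance test and a sample size chosen in advance.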
Evaluation of behavior involves generalizability and transfer. Generalizability is concerned with (1) the benefits of SB training of component skills used in the performance of an actual complex task or (2) the benefits of simulator-based training of a complex task on performance of other complex tasks. For example, Seymour and colleagues [7] used the Minimally Invasive Surgical Trainer-Virtual Reality (MIST-VR) to train surgical residents in a component skill (ie, manipulative diathermy) related to laparoscopic cholecystectomy and demonstrated that trainees were superior to controls in later performance of an actual laparoscopic cholecystectomy.
Analysis of transfer assesses the extent to which an increase in performance of a simulator-based procedure improves performance of the corresponding actual procedure. Transfer can be assessed either by looking for an increase in performance of an actual procedure resulting from training on a simulator or by comparing performance by trainees and controls of an actual procedure after the training period [8].
Analysis of outcomes (results in Kirkpatrick’s terminology) considers whether training has an impact on clinical or cost-effectiveness endpoints, or both, such as surgery time, operating room costs, complications, pain, recovery time, and other long-term outcomes or medical errors. Large samples and multivariate analyses are needed to isolate the specific effects of training on outcomes that are often determined by a multitude of individual and systemic factors. It is also very difficult to measure individual performance precisely in a manner that will generalize across patients, operating teams, evaluators, and similar sources of variation, and to attribute changes in outcomes to variations in training and performance parameters.
VALIDATION OF SIMULATOR-BASED PERFORMANCE ASSESSMENT
One of the greatest potential benefits of simulation is the opportunity to objectively measure numerous aspects of performance and manipulate numerous parameters that can affect learning and performance. The reliability of any application of an assessment instrument must be estimated before consideration of validity.
Reliability refers to the degree to which assessment is free from errors of measurement. The actual score an individual obtains on a test can be conceptualized as a combination of the individual's true score (ie, his or her “true” level of proficiency) and some amount of systematic error, random error, or both. Systematic error is due to specific extraneous factors that influence performance. For example, if performance improved as the brightness of the lighting in the room where testing was conducted increased, lighting would constitute a systematic source of error that would have to be standardized and controlled. Random error is caused by idiosyncratic factors (eg, luck or an individual not feeling well on a particular day) that can cause scores to increase or decrease in unpredictable ways.
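The distinction between the two kinds of error can be made concrete with a small simulation. The sketch below assumes the usual classical test theory formulation, in which an observed score is the true score plus error and reliability is the ratio of true-score variance to observed-score variance; that formulation, the score distributions, and the size of the lighting effect are all assumptions introduced here for illustration.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)  # fixed seed so the sketch is repeatable

n = 5000
true_scores = [rng.gauss(70, 10) for _ in range(n)]   # "true" proficiency
random_error = [rng.gauss(0, 5) for _ in range(n)]    # unpredictable noise
lighting_bias = 3.0  # assumed systematic gain under brighter lighting

# Observed score = true score + random error.
observed = [t + e for t, e in zip(true_scores, random_error)]
# The same sessions, had they all been run under brighter lighting.
observed_bright = [o + lighting_bias for o in observed]

# Random error inflates observed variance, which lowers reliability.
reliability = pvariance(true_scores) / pvariance(observed)

# Systematic error shifts every score and leaves the spread unchanged.
mean_shift = mean(observed_bright) - mean(observed)

print(round(reliability, 2), round(mean_shift, 2))
```

In this setup the random error lowers reliability, whereas the lighting effect biases the scores without making them less consistent, which is why it has to be standardized rather than averaged away.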
One approach to estimating reliability is to examine the homogeneity of the test content, or internal consistency, which is commonly assessed by computing coefficient alpha. The test-retest method examines the similarity in the pattern of test performance when the same cohort of individuals takes the test on separate occasions over a short time period. The concordance/accuracy method is concerned with absolute agreement between test items and provides a more stringent test of reliability (Table 2).
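The three approaches can be sketched in a few lines of Python. Coefficient alpha is computed with its standard formula; the test-retest estimate is represented here by a Pearson correlation and the concordance estimate by a simple proportion of exact matches, both of which are simplifying assumptions rather than the only accepted statistics. All scores are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Coefficient alpha; items is a list of per-item score lists,
    each holding one score per trainee in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def pearson_r(x, y):
    """Pearson correlation: similarity in the pattern of two score sets."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def exact_agreement(x, y):
    """Proportion of paired scores that are identical."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical data: three test items scored for four trainees.
items = [[2, 4, 3, 5], [3, 5, 3, 4], [2, 5, 4, 5]]
alpha = cronbach_alpha(items)

# Hypothetical total scores for the same trainees on two occasions.
session_1 = [10, 12, 14, 16]
session_2 = [12, 14, 16, 18]
retest_r = pearson_r(session_1, session_2)
agreement = exact_agreement(session_1, session_2)

print(round(alpha, 2), round(retest_r, 2), agreement)
```

The two sessions preserve the trainees' rank order exactly, so the correlation is perfect, yet no score is reproduced exactly, so exact agreement is zero. This is the sense in which an agreement-based estimate is the more stringent one.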
Numerous studies have specifically examined the reliability of VR simulation for skill assessment [9,10]. As a general guideline, reliability coefficients should be ≥0.70 for an assessment to be considered sufficiently reliable for research purposes, ≥0.80 for low- or medium-stakes testing, and approximately 0.90 for high-stakes testing.
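The guideline above can be written as a simple lookup. In this sketch the function name is hypothetical, and "approximately 0.90" is treated as a hard cutoff of 0.90, which is an assumption made only to keep the example concrete.

```python
def reliability_tier(coefficient):
    """Map a reliability coefficient to the most demanding use that the
    general guideline would support (0.90 cutoff is an assumption)."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("expected a coefficient between 0 and 1")
    if coefficient >= 0.90:
        return "high-stakes testing"
    if coefficient >= 0.80:
        return "low- or medium-stakes testing"
    if coefficient >= 0.70:
        return "research purposes"
    return "not sufficiently reliable"


for value in (0.65, 0.74, 0.85, 0.93):
    print(value, reliability_tier(value))
```

Each tier also satisfies the less demanding ones below it, so the function reports only the highest tier reached.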
Once a measurement instrument has been shown to be sufficiently reliable, it is appropriate to investigate its validity. The question shifts from whether the test consistently measures individual performance to whether it actually measures what it purports to measure. Five types of validity criteria are generally applied to applications of an assessment instrument: face validity, content validity, construct validity, discriminant/convergent validity, and predictive validity (Table 3).
To date, most studies of validity have been concerned with construct validation, demonstrating the ability of VR-based assessments to differentiate among individuals with pre-existing differences in relevant experience [3,4,11-14].
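A known-groups comparison of this kind can be summarized in a few lines of Python. The sketch below estimates the probability that a randomly chosen experienced surgeon outscores a randomly chosen novice, counting ties as half; the choice of this statistic, the function name, and the scores are assumptions for illustration and are not drawn from the cited studies.

```python
def probability_of_superiority(experienced, novice):
    """Share of (experienced, novice) pairs in which the experienced
    surgeon scores higher, with ties counted as half a win."""
    wins = sum(
        (e > n) + 0.5 * (e == n) for e in experienced for n in novice
    )
    return wins / (len(experienced) * len(novice))


# Hypothetical simulator scores (higher is better).
experienced_scores = [88, 92, 79, 85]
novice_scores = [60, 72, 81, 65]

separation = probability_of_superiority(experienced_scores, novice_scores)
print(separation)
```

A value near 0.5 would mean the assessment does not separate the groups, whereas a value near 1.0 is consistent with construct validity. It does not show on its own that the score reflects surgical skill and not some other correlate of experience.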
SUMMARY
The description of validation criteria presented in this paper is intended to give readers a general understanding of the characteristics that SB training and assessment systems must demonstrate to their intended users if the systems are to be accepted, adopted, used effectively and efficiently, and improved over time. Developers of SB technology should consider and study these criteria routinely, beginning in the early stages of system design.
Address reprint requests to: Donald A. Risucci, PhD, NY Medical
College, Dept of Surgery, Munger Pavilion, Valhalla, NY 10595, USA.
Tel: 914 594 3246, Fax: 914 594 4359, E-mail: Donald_risucci@nymc.edu
Don Risucci, PhD, is Associate Professor of Surgery and Director of
Surgical Education and Research at New York Medical College and is
Co-Director, Minimally Invasive Surgical Skills Laboratory at
Westchester Medical Center. An applied psychologist, Dr Risucci has
been involved in surgical and educational research for over 15 years
and is currently the Vice President/President-elect of the Association
for Surgical Education. His research has focused primarily on
assessment of surgical competence, surgical technical skill, visual
spatial perception, and injury prevention.
Kevin Wolfe, MA, is a Research Assistant in the Department of Surgery
at New York Medical College. He received a Master's degree in
Industrial / Organizational Psychology in May 2004 and is currently
pursuing a PhD in Applied Organizational Psychology at Hofstra
University. His interests are in training and development and
performance assessment, with particular interests in improving the
accuracy of performance ratings as well as multisource feedback.
References
1. Kirkpatrick DL. Evaluating training programs: Evidence vs. proof. Train Dev J. 1977;31(11):9-12.
2. Hamilton EC, Scott DJ, Fleming JB, et al. Comparison of video trainer and virtual reality training systems on acquisition of laparoscopic skills. Surg Endosc. 2002;16:406-411.
3. Wilhelm DM, Ogan K, Roehrborn CG, Cadeddu JA, Pearle MS. Assessment of basic endoscopic performance using a virtual reality simulator. J Am Coll Surg. 2002;195(5):675-681.
4. Risucci DA, Geiss A, Gellman L, Pinard B, Rosser JC. Surgeon-specific factors in the acquisition of laparoscopic surgical skills. Am J Surg. 2001;181(4):289-293.
5. Schijven M, Jakimowicz J. Construct validity: experts and novices performing on the Xitact LS500 laparoscopy simulator. Surg Endosc. 2003;17:803-810.
6. Grantcharov TP, Bardram L, Funch-Jensen P, Rosenberg J. Learning curves and impact of previous operative experience on performance on a virtual reality simulator to test laparoscopic surgical skills. Am J Surg. 2003;185:146-149.
7. Seymour NE, Gallagher AG, Roman SA, et al. Virtual reality training improves operating room performance: results of a randomized, double-blinded study. Ann Surg. 2002;236(4):458-463.
8. Hyltander A, Liljegren E, Rhodin PH, Lonroth H. The transfer of basic skills learned in a laparoscopic simulator to the operating room. Surg Endosc. 2002;16:1324-1328.
9. Sung WH, Fung CP, Chen AC, Yuan CC, Ng HT, Doong JL. The assessment of stability and reliability of a virtual reality-based laparoscopic gynecology simulation system. Eur J Gynaecol Oncol. 2003;24(2):143-146.
10. Gallagher AG, Satava RM. Virtual reality as a metric for the assessment of laparoscopic psychomotor skills. Learning and reliability measures. Surg Endosc. 2002;16(12):1746-1752.
11. O'Toole RV, Playter RR, Krummel TM, et al. Measuring and developing suturing technique with a virtual reality surgical simulator. J Am Coll Surg. 1999;189(1):114-127.
12. McNatt SS, Smith CD. A computer-based laparoscopic skills assessment device differentiates experienced from novice laparoscopic surgeons. Surg Endosc. 2001;15:1085-1089.
13. Haluck RS, Webster RW, Snyder AJ, et al. A virtual reality surgical trainer for navigation in laparoscopic surgery. Stud Health Technol Inform. 2001;81:171-176.
14. Gallagher AG, Richie K, McClure N, McGuigan J. Objective psychomotor skills assessment of experienced, junior, and novice laparoscopists with virtual reality. World J Surg. 2001;25:1478-1483.
www.Laparoscopy.org The Laparoscopic Surgery Information Source