From: Tools for measuring technical skills during gynaecologic surgery: a scoping review
Assessment tool | Scoring | Generalisation | Extrapolation |
---|---|---|---|
Objective Structured Assessment of Technical Skills (OSATS) [17]. | Comparison of OSATS scores over time. | Not reported | Construct validity was demonstrated as a significant rise in score with increasing caseload, of 1.10 OSATS points per assessed procedure (p = 0.008, 95% CI 0.44–1.77). |
Vaginal Surgical Skills Index (VSSI) [18]. | Comparison of GRS and VSSI scores; a visual analogue scale was added for overall performance. | Internal consistency for the VSSI and GRS: Cronbach’s alpha = 0.95–0.97. Interrater reliability = 0.53; intrarater reliability = 0.82. | Construct validity was evaluated as convergent validity using Pearson’s correlation coefficient (VSSI: r = 0.64, p = 0.01, 95% CI 0.53–0.73; GRS: r = 0.51, p = 0.001, 95% CI 0.40–0.61), and VSSI scores discriminated between training levels. |
Hopkins Assessment of Surgical Competency (HASC) [19]. | Surgeons rated on general and case-specific surgical skills. No comparison. | Internal consistency of the items: Cronbach’s alpha = 0.80 (p < 0.001). | Discriminative validity for inexperienced vs intermediate surgeons (p < 0.001). |
Objective Structured Assessment of Laparoscopic Salpingectomy (OSA-LS) [20]. | Surgeons rated by OSA-LS. No comparison. | Interrater reliability = 0.831. Intrarater reliability not reported. | Discriminative validity for inexperienced vs intermediate vs experienced surgeons (p < 0.03). |
Robotic Hysterectomy Assessment Score (RHAS) [21]. | Surgeons rated by expert viewers using RHAS. No comparison. | Interrater reliability for total domain score = 0.600 (p < 0.001). Intrarater reliability not reported. | Discriminative validity for experts, advanced beginners and novices in all domains except vaginal cuff closure (p = 0.006). |
Competence Assessment Tool for Laparoscopic Supracervical Hysterectomy (CAT-LSH) [22]. | Comparison of GOALS and CAT-LSH scores. | Interrater reliability = 0.75. Intrarater reliability not reported. | Discriminative validity for inexperienced vs intermediate (p < 0.001) and intermediate vs expert surgeons (p < 0.001), as assessed by the assistant surgeon; for blinded reviewers, inexperienced vs intermediate (p < 0.006) and intermediate vs experts (p < 0.011). |
Feasible rating scale for formative and summative feedback [23]. | Surgeons rated by expert viewers using a 12-item procedure-specific checklist. | Interrater reliability = 0.996 for one rater and 0.998 for two raters. Intrarater reliability not reported. | Discriminative validity for beginners vs experienced surgeons (p < 0.001). |
Generic Error Rating Tool (GERT) [24]. | Comparison of OSATS and GERT scores. | Interrater reliability > 0.95. Intrarater reliability > 0.95. | Significant negative correlation between OSATS and GERT scores (rater 1: Spearman’s ρ = −0.76, p < 0.001; rater 2: ρ = −0.88, p < 0.001). |
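The GERT row above rests on Spearman’s rank correlation, which captures the inverse relationship "more skill, fewer errors" without assuming linearity. As a minimal illustration (not an analysis from the review), the sketch below computes Spearman’s ρ in pure Python as the Pearson correlation of rank-transformed scores; the OSATS/GERT values are hypothetical.

```python
# Minimal sketch: Spearman's rank correlation, the statistic used in [24]
# to relate OSATS (skill) scores and GERT (error) counts.
# All scores below are hypothetical illustration data, not study data.

def rank(values):
    """Return 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: higher OSATS (skill) goes with lower GERT (errors),
# giving a strongly negative rho, as in the pattern reported in [24].
osats = [25, 18, 30, 12, 22]
gert = [4, 9, 2, 12, 6]
rho = spearman(osats, gert)
```

In practice this would be computed with `scipy.stats.spearmanr`, which also returns the p-value; the hand-rolled version above just makes the rank-then-correlate mechanics explicit.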