Table 4 Studies evaluated by Kane’s validity argument

From: Tools for measuring technical skills during gynaecologic surgery: a scoping review

| Assessment tool | Scoring | Generalisation | Extrapolation |
|---|---|---|---|
| Objective Structured Assessment of Technical Skills (OSATS) [17] | Comparison of OSATS scores over time | Not reported | Construct validity was demonstrated as a significant rise in score with increasing caseload, 1.10 OSATS points per assessed procedure (p = 0.008, 95% CI 0.44–1.77) |
| Vaginal Surgical Skills Index (VSSI) [18] | Comparison of GRS and VSSI; a visual analogue scale was added for overall performance | Internal consistency for the VSSI and GRS: Cronbach's alpha 0.95–0.97; interrater reliability = 0.53; intrarater reliability = 0.82 | Construct validity was evaluated as convergent validity using the Pearson correlation coefficient (VSSI: r = 0.64, p = 0.01, 95% CI 0.53–0.73; GRS: r = 0.51, p = 0.001, 95% CI 0.40–0.61); VSSI scores discriminated between training levels |
| Hopkins Assessment of Surgical Competency (HASC) [19] | Surgeons rated on general surgical skills and case-specific surgical skills; no comparison | Internal consistency of the items: Cronbach's alpha = 0.80 (p < 0.001) | Discriminative validity for inexperienced vs intermediate surgeons (p < 0.001) |
| Objective Structured Assessment of Laparoscopic Salpingectomy (OSA-LS) [20] | Surgeons rated by OSA-LS; no comparison | Interrater reliability = 0.831; intrarater reliability not reported | Discriminative validity for inexperienced vs intermediate vs experienced surgeons (p < 0.03) |
| Robotic Hysterectomy Assessment Score (RHAS) [21] | Surgeons rated by expert viewers using RHAS; no comparison | Interrater reliability for total domain score = 0.600 (p < 0.001); intrarater reliability not reported | Discriminative validity for experts, advanced beginners and novices in all domains except vaginal cuff closure (p = 0.006) |
| Competence Assessment for Laparoscopic Supracervical Hysterectomy (CAT-LSH) [22] | Comparison of GOALS and CAT-LSH | Interrater reliability = 0.75; intrarater reliability not reported | Discriminative validity for inexperienced vs intermediate (p < 0.001) and intermediate vs expert surgeons (p < 0.001) as rated by the assistant surgeon; for blinded reviewers, inexperienced vs intermediate (p < 0.006) and intermediate vs experts (p < 0.011) |
| Feasible rating scale for formative and summative feedback [23] | Surgeons rated by expert viewers using a 12-item procedure-specific checklist | Interrater reliability = 0.996 for one rater and 0.998 for two raters; intrarater reliability not reported | Discriminative validity for beginners vs experienced surgeons (p < 0.001) |
| Generic Error Rating Tool (GERT) [24] | Comparison of OSATS and GERT | Interrater reliability > 0.95; intrarater reliability > 0.95 | Significant negative correlation between OSATS and GERT scores (rater 1: Spearman = −0.76, p < 0.001; rater 2: Spearman = −0.88, p < 0.001) |
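Several of the generalisation entries above report internal consistency as Cronbach's alpha. For readers unfamiliar with the statistic, the following minimal sketch shows how alpha is computed from the per-item score variances and the variance of the summed totals. The scores are synthetic, purely for illustration, and do not come from any of the cited studies.

```python
# Illustrative sketch of Cronbach's alpha; synthetic data only.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per checklist item, each inner list
    holding one score per rated subject (all lists equal length)."""
    k = len(items)  # number of items
    # Total score per subject, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the individual item variances.
    item_var = sum(pvariance(scores) for scores in items)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Three hypothetical checklist items scored for five subjects.
items = [
    [3, 4, 5, 2, 4],
    [3, 5, 5, 2, 3],
    [4, 4, 5, 3, 4],
]
print(round(cronbach_alpha(items), 2))  # → 0.91
```

Values near 1 (such as the 0.95–0.97 range reported for the VSSI and GRS) indicate that the checklist items vary together, i.e. they appear to measure a common underlying construct.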