From: Large language models for generating medical examinations: systematic review
Author | Medically Irrelevant Questions | Invalid for Medical Exam | Inaccurate/Wrong Question | Inaccurate/Wrong Answer or Alternative Answers | Low Difficulty Level |
---|---|---|---|---|---|
Sevgi et al. | N/A | N/A | N/A | 1 (33.3%) | N/A |
Biswas | N/A | N/A | N/A | N/A | N/A |
Agarwal et al. | N/A | Highly valid | N/A | N/A | Somewhat difficult |
Ayub et al. | 9 (23%) | 24 (60%) | 5 (13%) | 5 (13%) | 10 (25%) |
Cheung et al. | 32 (64%) | 28 (56%) | 32 (64%) | 29 (58%) | N/A |
Totlis et al. | N/A | 8 (44.4%) | N/A | N/A | 8 (44.4%) |
Han et al. | N/A | N/A | N/A | N/A | 3 (100%) |
Klang et al. | 2 (0.95%) | 1 (0.5%) | 12 (5.7%) | 14 (6.6%) | 2 (0.95%) |