Table 3. Faulty questions generated by the AI

From: Large language models for generating medical examinations: systematic review

| Author | Medically Irrelevant Questions | Invalid for Medical Exam | Inaccurate/Wrong Question | Inaccurate/Wrong Answer or Alternative Answers | Low Difficulty Level |
| --- | --- | --- | --- | --- | --- |
| Sevgi et al. | N/A | N/A | N/A | 1 (33.3%) | N/A |
| Biswas | N/A | N/A | N/A | N/A | N/A |
| Agarwal et al. | N/A | Highly valid | N/A | V/A | Somewhat difficult |
| Ayub et al. | 9 (23%) | 24 (60%) | 5 (13%) | 5 (13%) | 10 (25%) |
| Cheung et al. | 32 (64%) | 28 (56%) | 32 (64%) | 29 (58%) | N/A |
| Totlis et al. | N/A | 8 (44.4%) | N/A | N/A | 8 (44.4%) |
| Han et al. | N/A | N/A | N/A | N/A | 3 (100%) |
| Klang et al. | 2 (0.95%) | 1 (0.5%) | 12 (5.7%) | 14 (6.6%) | 2 (0.95%) |

  1. Summary of faulty questions generated by the AI, November 2023
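
The cells in Table 3 mix free-text ratings ("Highly valid", "N/A") with entries of the form count (percent). For readers who want to work with these values programmatically, the following is a minimal sketch in Python, not part of the review itself. The helper names `parse_cell` and `implied_total` are hypothetical. The table does not state how many questions each study generated, so any total backed out of a count and a rounded percentage is only an approximation and should be checked against the original study.

```python
import re

# Matches cells such as "9 (23%)" or "8 (44.4%)"; anything else is treated as non-numeric.
CELL_PATTERN = re.compile(r"^\s*(\d+)\s*\(\s*(\d+(?:\.\d+)?)\s*%\s*\)\s*$")


def parse_cell(cell):
    """Return (count, percent) for a 'count (percent%)' cell, or None otherwise."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def implied_total(count, percent):
    """Estimate the number of questions implied by a count and its percentage.

    Assumption: percent = 100 * count / total. Because the published
    percentages are rounded, the result is approximate.
    """
    if percent == 0:
        return None
    return round(count * 100 / percent)


if __name__ == "__main__":
    # A few cells copied from Table 3 (author, column, cell text).
    sample_cells = [
        ("Ayub et al.", "Invalid for Medical Exam", "24 (60%)"),
        ("Cheung et al.", "Medically Irrelevant Questions", "32 (64%)"),
        ("Agarwal et al.", "Invalid for Medical Exam", "Highly valid"),
        ("Biswas", "Low Difficulty Level", "N/A"),
    ]
    for author, column, cell in sample_cells:
        parsed = parse_cell(cell)
        if parsed is None:
            print(f"{author} / {column}: not a count ({cell!r})")
        else:
            count, percent = parsed
            print(f"{author} / {column}: {count} faulty, "
                  f"about {implied_total(count, percent)} questions implied")
```

Free-text and N/A cells are returned as `None` rather than coerced to zero, since the table does not say whether N/A means "not assessed" or "none found".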