Teaching of Psychology, Ahead of Print.
Background: Multiple-choice item (MCI) assessments are burdensome for instructors to develop. Artificial intelligence (AI; e.g., ChatGPT) can streamline the process without sacrificing quality, and prior work suggests that AI-generated MCIs are comparable in quality to those written by human experts. However, whether AI-generated MCIs remain equally high in quality across various domain- and task-specific prompts remains to be determined. We therefore ask whether AI can generate high-quality MCIs to assess learning outcomes from reading a psychology textbook chapter.

Objective: In an exploratory study, we use Item Response Theory (IRT) analysis and expert review to assess MCIs generated by ChatGPT-4 from a psychology textbook chapter.

Method: We submitted a prompt and a textbook chapter to ChatGPT-4, requesting 20 MCIs. One hundred ninety undergraduate participants read the chapter before responding to the MCIs. Expert reviewers assessed the MCIs for alignment with learning outcomes and for item quality.

Results: The ChatGPT-4-generated MCIs were low in difficulty and high in discrimination. Expert reviewers found that nearly all items were logically sound, aligned with the learning objectives, and met prevailing standards of MCI quality.

Conclusion: When carefully prompted, ChatGPT-4 can rapidly generate high-quality MCIs to test comprehension of a psychology textbook chapter. However, because the items were uniformly low in difficulty, we recommend enlisting ChatGPT-4 to write MCIs for formative, but not summative, assessments.
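Note on the generation step: the abstract describes submitting a prompt and chapter to ChatGPT-4 but does not reproduce the prompt or interface used. The following is a minimal, hypothetical sketch of how such a request could be scripted with the OpenAI Python SDK; the model name, prompt wording, and file name are illustrative assumptions, not the authors' materials.

# Hypothetical sketch of the item-generation request (not the study's actual prompt).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

chapter_text = open("chapter.txt").read()  # the textbook chapter to be assessed

response = client.chat.completions.create(
    model="gpt-4",  # assumed model identifier
    messages=[
        {"role": "system",
         "content": "You write exam questions for an undergraduate psychology course."},
        {"role": "user",
         "content": ("Based on the chapter below, write 20 multiple-choice items, "
                     "each with four options and one correct answer, keyed to the "
                     "chapter's learning objectives.\n\n" + chapter_text)},
    ],
)

print(response.choices[0].message.content)  # the 20 candidate MCIs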
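Note on the IRT parameters: the abstract reports item difficulty and discrimination but does not state which IRT model was fit. One standard formulation that yields both parameters is the two-parameter logistic (2PL) model, shown here for reference:

P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp[-a_i(\theta_j - b_i)]}

where \theta_j is examinee j's ability, b_i is item i's difficulty (a low b_i means most prepared readers answer correctly), and a_i is item i's discrimination (a high a_i means the item sharply separates higher- from lower-ability examinees). In these terms, the reported pattern corresponds to items with low b_i and high a_i.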