
information for practice

news, new scholarship & more from around the world



Data extraction by generative artificial intelligence: Assessing determinants of accuracy using human-extracted data from systematic review databases.

Psychological Bulletin, Vol 151(10), Oct 2025, 1280-1306; doi:10.1037/bul0000501

Psychological science requires reliable measures. Within systematic literature reviews, reliability hinges on high interrater agreement during data extraction. Yet, the extraction process has been time-consuming. Efforts to accelerate the process using technology have shown limited success until generative artificial intelligence (genAI), particularly large language models (LLMs), accurately extracted variables from medical studies. Nonetheless, for psychological researchers, it remains unclear how to utilize genAI for data extraction, given the range of tested variables, the medical context, and the variability in accuracy. We systematically assessed extraction accuracy and error patterns across domains in psychology by comparing genAI-extracted and human-extracted data from 22 systematic review databases published in the Psychological Bulletin. Eight LLMs extracted 312,329 data points from 2,179 studies on 186 variables. LLM extractions achieved unacceptable accuracy on all metrics for 20% of variables. For 46% of variables, accuracy was acceptable for some metrics and unacceptable for others. LLMs reached acceptable but not high accuracy on all metrics in 15%, high but not excellent in 8%, and excellent accuracy in 12% of variables. Accuracy varied most between variables, less between systematic reviews, and least between LLMs. Moderator analyses using a hierarchical logistic regression, hierarchical linear model, and meta-analysis revealed that accuracy was higher for variables describing studies’ context and moderator variables compared to variables for effect size calculation. Also, accuracy was higher in systematic reviews with more detailed variable descriptions and positively correlated with model sizes. We discuss directions for investigating ways to use genAI to accelerate data extractions while ensuring meaningful human control. (PsycInfo Database Record (c) 2025 APA, all rights reserved)
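The study's core measurement, comparing each LLM-extracted value against the human-extracted value from the review database and scoring per-variable accuracy, can be illustrated with a minimal sketch. The function, its name, and the sample records below are hypothetical and not taken from the article; the paper's actual metrics and matching rules may differ.

```python
# Illustrative sketch: proportion of exact LLM-vs-human matches per variable.
# All names and data here are invented for illustration.
from collections import defaultdict

def per_variable_accuracy(records):
    """records: iterable of (variable, llm_value, human_value) triples.
    Returns {variable: proportion of exact matches}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for variable, llm_value, human_value in records:
        totals[variable] += 1
        if llm_value == human_value:
            hits[variable] += 1
    return {v: hits[v] / totals[v] for v in totals}

sample = [
    ("sample_size", "120", "120"),
    ("sample_size", "85", "86"),
    ("country", "Germany", "Germany"),
    ("effect_size_d", "0.42", "0.44"),
]
print(per_variable_accuracy(sample))
# → {'sample_size': 0.5, 'country': 1.0, 'effect_size_d': 0.0}
```

In practice the paper aggregates such per-variable scores across 22 review databases and eight LLMs, then models accuracy with hierarchical regressions; exact string matching is only the simplest possible agreement criterion.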

Read the full article ›

Posted in: Journal Article Abstracts on 12/16/2025


© 1993-2026 Dr. Gary Holden. All rights reserved.

gary.holden@nyu.edu
@Info4Practice