
information for practice

news, new scholarship & more from around the world


  • gary.holden@nyu.edu
  • @ Info4Practice

Comparing the accuracy of artificial intelligence models to detect alcohol in video images

Abstract

Background and aims

Thanks to smart devices, social media and streaming platforms, watching videos, such as movies or short social media clips, has become extremely popular. Alcohol portrayals are frequent in videos, yet their prevalence is difficult to quantify using traditional methods such as manual coding. Artificial intelligence (AI) offers a scalable way to analyse large volumes of video images. This study aimed to compare the accuracy of three AI models in detecting the presence of alcohol in video images.

Method

Experimental evaluation of three models: one supervised deep learning model (ABIDLA2) and two zero-shot learning models (ZSL-CLIP and ZSL-LLaVA). The models were tested on datasets of images and video frames that researchers had annotated for the presence or absence of alcohol. Three datasets of increasing complexity were used: (1) a Google/Bing image set of clearly visible alcohol and non-alcohol images; (2) a set of movie frames manually annotated as containing or not containing alcohol; and (3) a contextually challenging set of movie frames from alcohol-related settings (e.g. bars, parties) that may or may not include visible alcohol. Model performance was assessed using accuracy, unweighted average recall (UAR) and F1 score (the harmonic mean of precision and recall). Execution time per frame was also measured to evaluate computational efficiency.
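The three performance measures can be made concrete with a short sketch. This is a minimal illustration, not the study's evaluation code. It assumes binary labels (1 = alcohol, 0 = no alcohol), computes UAR as the unweighted mean of the per-class recalls, and computes F1 for the alcohol class only; the function name `binary_metrics` and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, UAR and F1 for hypothetical binary labels (1 = alcohol, 0 = no alcohol)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    # Confusion-matrix counts, treating "alcohol" (1) as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)

    # Per-class recall; UAR averages them without weighting by class size,
    # so a rare class counts as much as a common one.
    recall_pos = tp / (tp + fn) if tp + fn else 0.0
    recall_neg = tn / (tn + fp) if tn + fp else 0.0
    uar = (recall_pos + recall_neg) / 2

    # F1 is the harmonic mean of precision and recall for the positive class.
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall_pos / (precision + recall_pos)
          if precision + recall_pos else 0.0)

    return {"accuracy": accuracy, "uar": uar, "f1": f1}


# Toy annotations (hypothetical): 4 alcohol frames, 6 non-alcohol frames.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

On these toy labels accuracy is 0.70, UAR is about 0.71 and F1 is about 0.67. Reporting all three guards against a model that looks accurate only because one class dominates a dataset.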

Results

On the Google/Bing images, ABIDLA2, ZSL-CLIP and ZSL-LLaVA achieved accuracies of 90%, 91% and 92%, respectively; on the diverse movie-scene dataset, 70%, 65% and 95%; and on the most complex alcohol-related dataset, 67%, 63% and 94%. In terms of execution time, ABIDLA2 processed a single frame the fastest (0.21 seconds), followed by ZSL-LLaVA (0.45 seconds), while ZSL-CLIP was the slowest (0.58 seconds).
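The per-frame execution times translate into throughput with simple arithmetic. The sketch below uses the reported times; the 90-minute film and the sampling rate of one frame per second of video are hypothetical choices for illustration, since the abstract does not state how densely frames were sampled.

```python
# Per-frame execution times reported in the abstract (seconds per frame).
SECONDS_PER_FRAME = {"ABIDLA2": 0.21, "ZSL-LLaVA": 0.45, "ZSL-CLIP": 0.58}


def frames_per_hour(seconds_per_frame: float) -> float:
    """Number of frames a model can classify in one hour of compute time."""
    return 3600 / seconds_per_frame


def processing_minutes(video_minutes: float, sampled_fps: float,
                       seconds_per_frame: float) -> float:
    """Compute time (minutes) to classify a video sampled at `sampled_fps`.

    `sampled_fps` is a hypothetical parameter: frames taken per second of video.
    """
    n_frames = video_minutes * 60 * sampled_fps
    return n_frames * seconds_per_frame / 60


for model, spf in SECONDS_PER_FRAME.items():
    print(f"{model}: {frames_per_hour(spf):,.0f} frames/hour, "
          f"{processing_minutes(90, 1.0, spf):.1f} min for a 90-min film at 1 frame/s")
```

Under these assumptions ZSL-LLaVA classifies about 8,000 frames per hour and needs roughly 40 minutes for a 90-minute film. Any per-frame time under one second keeps pace with playback when one frame is sampled per second of video.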

Conclusion

Automated artificial intelligence (AI) models appear to be able to detect alcohol imagery in videos at large scale with high accuracy and in near real time. Of the three AI models tested, ZSL-LLaVA achieved the best balance between accuracy and speed. Offering a cost- and time-efficient alternative to labour-intensive manual coding, ZSL-LLaVA could be used to monitor alcohol-related visual content in videos across diverse media platforms.


Posted in: Journal Article Abstracts on 03/14/2026


© 1993-2026 Dr. Gary Holden. All rights reserved.
