Close Menu
Healthtost
  • News
  • Mental Health
  • Men’s Health
  • Women’s Health
  • Skin Care
  • Sexual Health
  • Pregnancy
  • Nutrition
  • Fitness
  • Recommended Essentials
What's Hot

Pediatric neurology and therapeutic carbohydrate restriction

April 9, 2026

5 pull-up alternatives to build upper body strength and correct weaknesses

April 9, 2026

Tulane Study Shows Team Approach Improves Hypertension Treatment Success

April 9, 2026
Facebook X (Twitter) Instagram
  • About Us
  • Contact Us
  • Privacy Policy
  • Terms and Conditions
  • Disclaimer
Facebook X (Twitter) Instagram
Healthtost
SUBSCRIBE
  • News

    Tulane Study Shows Team Approach Improves Hypertension Treatment Success

    April 9, 2026

    Virica Biotech and FUJIFILM Biosciences Collaborate on Canada-Japan Co-Innovation Program to Advance AAV Production Enhancers

    April 9, 2026

    Long-term overweight is a stronger predictor of cardiovascular risk

    April 8, 2026

    Sugar intake can reduce the effectiveness of relaxation exercises

    April 8, 2026

    AI tool predicts Barrett’s esophagus recurrence with high accuracy

    April 7, 2026
  • Mental Health

    the surprisingly common condition with a scary name

    April 6, 2026

    How yoga helps heal emotional wounds

    April 4, 2026

    Will medicinal cannabis help my mental health? Here are the facts and the risks

    April 1, 2026

    Does World Bipolar Day have an impact?

    March 29, 2026

    Worried about your preschooler’s anxiety? See how you can help

    March 28, 2026
  • Men’s Health

    Traveling by plane with BPH

    April 9, 2026

    30 Minute Kettlebell Full Body Workout for Over 50

    April 9, 2026

    The study shows that male depression is not just a pattern of men’s mental health

    April 7, 2026

    Dr. Jason Snibbe: Men’s health from a doctor who does it the right way

    April 6, 2026

    Coping with sexual health and erectile dysfunction as a couple

    April 3, 2026
  • Women’s Health

    Midlife Weight Gain Isn’t Just Willpower: Understanding Your Second Adolescence With WONDERBIOTICS

    April 8, 2026

    8 Things to Do When Attraction Dies in Your Marriage

    April 8, 2026

    I was finally diagnosed with Addison’s disease

    April 7, 2026

    I lost 60 pounds and got my life back

    April 7, 2026

    4.3 Friday Faves – The Fitnessista

    April 6, 2026
  • Skin Care

    What happens when you stop using hyaluronic acid – UMERE

    April 7, 2026

    The truth about "Pure Beauty" — What it means, what it doesn’t and what sensitive skin really needs

    April 6, 2026

    Backed by Science. Built for results. – Lifeline Skin Care

    April 4, 2026

    Best Facials | What to book for real results

    April 4, 2026

    Don’t Sabotage Your Laser Treatment Aftercare: 7 Mistakes

    April 3, 2026
  • Sexual Health

    Endometriosis procedures are reimbursed at lower rates, doctors say

    April 8, 2026

    Reflections two years later in a global context < SRHM

    April 8, 2026

    Can exercise improve HIV symptoms?

    April 7, 2026

    An Introduction to the Kink Literature Database — Sexual Health Alliance

    April 6, 2026

    No, abortion pills do not poison your drinking water

    April 1, 2026
  • Pregnancy

    How your partner can support a happier pregnancy

    April 9, 2026

    Exposure to plastic during pregnancy may be linked to more premature births than expected

    April 4, 2026

    How to relieve numbness and tingling in the legs in the third trimester?

    April 3, 2026

    The best stroller accessories for every type of stroller

    March 29, 2026

    A new study says pre-pregnancy health is a conversation between two parents

    March 29, 2026
  • Nutrition

    Pediatric neurology and therapeutic carbohydrate restriction

    April 9, 2026

    The Weekly Reset That Saves My Sanity (Lily’s Guacamole Recipe)

    April 7, 2026

    Double Chocolate Veggie Muffins (Kids and Lunchtime)

    April 7, 2026

    Nut Nutrition Comparison: Understanding Nutrient Content

    April 4, 2026

    Is Berberine ‘Nature’s Metformin’? | HUM Nutrition Blog

    April 3, 2026
  • Fitness

    5 pull-up alternatives to build upper body strength and correct weaknesses

    April 9, 2026

    Best Health & Fitness Certifications (My Favorites After 17+ Years in the Industry)

    April 6, 2026

    Dose 1 – Tony Gentilcore

    April 6, 2026

    How to take care of your internal organs

    April 5, 2026

    Doctors say these 5 daily habits can improve heart health naturally

    April 5, 2026
  • Recommended Essentials
Healthtost
Home»News»GPT-4 demonstrates high accuracy in parsing multilingual medical notes
News

GPT-4 demonstrates high accuracy in parsing multilingual medical notes

healthtostBy healthtostJanuary 6, 2025No Comments6 Mins Read
Facebook Twitter Pinterest LinkedIn Tumblr Reddit WhatsApp Email
Gpt 4 Demonstrates High Accuracy In Parsing Multilingual Medical Notes
Share
Facebook Twitter LinkedIn Pinterest WhatsApp Email

The study evaluates the GPT-4’s ability to process medical notes in English, Spanish and Italian, achieving physician agreement 79% of the time.

Study: The ability of Generative Pre-trained Transformer 4 (GPT-4) to analyze medical notes in three different languages: a retrospective model evaluation study. Image credit: SuPatMaN/Shutterstock.com

In a recent study published in Lancet Digital Healtha group of researchers evaluated the ability of Generative Pre-trained Transformer 4 (GPT-4) to answer predefined questions based on medical notes written in three languages ​​(English, Spanish and Italian).

Background

Medical notes contain valuable clinical knowledge, yet their unstructured narrative form poses challenges for automated analysis.

Large language models (LLMs) such as GPT-4 show promise in extracting explicit details such as medications, but often struggle with implicit understanding of contexts, crucial for nuanced medical decision making. Variability in documentation styles between providers adds to the complexity.

Existing research demonstrates the potential of LLMs for free-text medical data processing, including decoding abbreviations and extracting social determinants of health, however these studies mainly focus on English language notes.

Further research is vital to enhance the ability of LLMs to handle complex tasks, improve contextual reasoning, and assess performance in multiple languages ​​and settings.

About the study

The present retrospective model evaluation study involved eight university hospitals from four countries: the United States of America (USA), Colombia, Singapore, and Italy.

The participating institutions were part of the 4CE Consortium. They included Boston Children’s Hospital, the University of Michigan, the University of Wisconsin, the National University of Singapore, the University of Kansas Medical Center, the University of Pittsburgh Medical Center, the Universidad de Antioquia, and the Istituti Clinici Scientifici Maugeri.

The Department of Biomedical Informatics at Harvard University served as the coordinating center. Each site contributed seven de-identified medical notes written between February 1, 2020 and June 1, 2023, resulting in a total of 56 medical notes, with six sites submitting notes in English, one in Spanish, and one in Italian.

Participating sites selected notes based on proposed criteria, including patients aged 18–65 years with a diagnosis of obesity and coronavirus disease 2019 (COVID-19) at admission. Compliance with these criteria was optional.

Notes submitted included admission, progress and consultation notes, but not discharge summaries. The notes were removed in accordance with the guidelines of the US Health Insurance Portability and Accountability Act, regardless of country of origin.

The study used the GPT-4 API in Python to analyze medical notes through a predefined question-answer framework. Parameters such as temperature, top-p and frequency penalty were adjusted to optimize performance.

Physicians rated the free-text responses and indicated whether they agreed with the GPT-4 responses. They were masked in each other’s ratings but not in the GPT-4 responses.

Statistical analyzes were performed to assess agreement between the GPT-4 and physicians, exploring instances of disagreement and categorizing errors as issues of derivation, inference, or hallucinations.

Subgroup analyzes and sensitivity analyzes addressed variations in accuracy, such as differences in language and specific inclusion criteria.

The study highlighted the ability of GPT-4 to process medical notes in multiple languages, but noted challenges in inference based on context and variability in documentation styles. Data analyzes were performed in RStudio and no external funding supported the study.

Study results

A total of 56 medical records were collected from eight sites in four countries: USA, Colombia, Singapore and Italy. Of these, 42 (75%) notes were in English, seven (13%) in Italian and seven (13%) in Spanish. For each note, the GPT-4 generated responses to 14 predefined questions, resulting in 784 responses.

Among them, both physicians agreed with the GPT-4 in 622 (79%) responses, one physician agreed in 82 (11%) responses, and neither physician agreed in 80 (10%) responses. When the National University of Singapore data were excluded, agreement rates remained similar: 534 (78%) responses had double agreement, 82 (12%) had partial agreement, and 70 (10%) had no agreement.

Physicians were more likely to agree with the GPT-4 for Spanish (86/98, 88%) and Italian (82/98, 84%) notes than for English notes (454/588, 77%).

Note type or length did not affect agreement rates. In cases where only one physician agreed with the GPT-4 (82 responses), 59 (72%) disagreements arose from issues of inference, such as different interpretations of implicit information.

In one case, a physician concluded that a patient did not have COVID-19 based on a “recent infection with COVID-19” note, while the GPT-4 left the condition as undetermined. Extraction problems accounted for 8 (10%) of these disagreements, such as a physician overlooking a documented medical history that identified GPT-4. Differences in level of agreement accounted for the remaining 15 (18%) cases.

In responses where both clinicians disagreed with the GPT-4 (80 responses), inference issues were most frequent (47/80, 59%), followed by inference errors (23/80, 29%) and hallucinations (10/ 80, 13% ).

For example, GPT-4 sometimes failed to link complications such as multisystem inflammatory syndrome as related to COVID-19, a connection made by both doctors. Delusion issues included information about the construction of the GPT-4 that was not in the notes, such as falsely claiming that a patient had COVID-19 when it was not reported.

When evaluating the ability of the GPT-4 to select patients for hypothetical study enrollment based on four inclusion criteria (age, obesity, COVID-19 status, and admission note type), its sensitivity varied. GPT-4 showed high sensitivity for obesity (97%), COVID-19 (96%) and age (94%), but lower specificity for admission notes (22%).

When the acceptance note criterion was excluded, the GPT-4 accurately identified all three remaining criteria 90% of the time.

conclusions

In summary, the study showed that the GPT-4 accurately analyzed medical notes in English, Italian, and Spanish, even without a direct technique.

Surprisingly, it performed better on Italian and Spanish notes than on English, possibly due to the greater complexity of US medical notes, although note length did not affect performance. GPT-4 efficiently extracted explicit information, but its main limitation was extracting implicit details.

This aligns with previous findings that models optimized for medical tasks can overcome such challenges. While the GPT-4 excelled at identifying explicit study inclusion criteria such as age and obesity, it struggled to classify admission notes, likely due to reliance on indirect constructs.

accuracy demonstrates GPT4 high medical multilingual Notes parsing
bhanuprakash.cg
healthtost
  • Website

Related Posts

Tulane Study Shows Team Approach Improves Hypertension Treatment Success

April 9, 2026

Virica Biotech and FUJIFILM Biosciences Collaborate on Canada-Japan Co-Innovation Program to Advance AAV Production Enhancers

April 9, 2026

Long-term overweight is a stronger predictor of cardiovascular risk

April 8, 2026

Leave A Reply Cancel Reply

Don't Miss
Nutrition

Pediatric neurology and therapeutic carbohydrate restriction

By healthtostApril 9, 20260

Sarah Rice BSc. (Hons), MCOptom (UK), MHP, NNP Ketogenic diets for neurological conditions have been…

5 pull-up alternatives to build upper body strength and correct weaknesses

April 9, 2026

Tulane Study Shows Team Approach Improves Hypertension Treatment Success

April 9, 2026

Traveling by plane with BPH

April 9, 2026
Stay In Touch
  • Facebook
  • Twitter
  • Pinterest
  • Instagram
  • YouTube
  • Vimeo
TAGS
Baby benefits body brain cancer care Day Diet disease exercise finds Fitness food Guide health healthy heart Improve Life Loss Men mental Natural Nutrition Patients People Pregnancy research reveals risk routine sex sexual Skin Skincare study Therapy Tips Top Training Treatment ways weight women Workout
About Us
About Us

Welcome to HealthTost, your trusted source for breaking health news, expert insights, and wellness inspiration. At HealthTost, we are committed to delivering accurate, timely, and empowering information to help you make informed decisions about your health and well-being.

Latest Articles

Pediatric neurology and therapeutic carbohydrate restriction

April 9, 2026

5 pull-up alternatives to build upper body strength and correct weaknesses

April 9, 2026

Tulane Study Shows Team Approach Improves Hypertension Treatment Success

April 9, 2026
New Comments
    Facebook X (Twitter) Instagram Pinterest
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms and Conditions
    • Disclaimer
    © 2026 HealthTost. All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.