ASEA-UNINET – Clinical language processing using Artificial Intelligence

Project Description

Abstract:

This project explored the application of current artificial intelligence approaches, in particular generative AI for semantically analysing clinical documents from a Thai hospital. It included several aspects such as the creation of a text corpus with semantic annotations using codes from the SNOMED CT terminology, and the use of this benchmark for assessing the outcome of a prompt-based information extraction scenario using ChatGPT. During the stay, intensive interaction occurred with the annotator teams in Austria and Thailand, resulting in iterative refinements of an annotation guideline, maintained and developed by the host institution. The work was continued after the visit, and two scientific manuscripts are currently under review.

Implementation Period:

01/2024 – 03/2024

Project:

Project Objectives and Impact

Across the globe, most of the information in electronic health records (EHRs) exists as free text, capturing patients’ medical histories, diagnoses, treatments, and prescriptions. Extracting structured data from this narrative content is crucial for enhancing healthcare data accessibility, interoperability, and reusability. Providing clinicians with well-tailored, structured information can significantly improve decision support and, ultimately, patient care.

Semantic standards such as SNOMED CT and FHIR offer detailed descriptors for all aspects of clinical documentation. However, their effective and widespread adoption requires bridging the gap between unstructured text and structured data—without disrupting clinical workflows. Natural Language Processing (NLP) plays a pivotal role in this effort and is increasingly empowered by the rapid advancements in artificial intelligence. NLP applied to clinical narratives is complex due to the idiosyncratic nature of clinical language with its jargon, abbreviations, and highly variable phrasing. In Thai hospitals, clinical text typically mixes Thai and English terminology, further complicating automatic extraction. In particular, medication prescriptions present difficulties due to inconsistent formatting, local brand names, and shorthand notation. The impact of deep learning, transformer, large-language models (LLMs) and generative AI for solving the abovementioned problems are highly promising. Accurate recognition of meaningful text spans and normalization of the very heterogeneous formats of prescriptions might contribute to patient safety by reducing medication errors and improving the efficiency of clinical workflows.

During the cooperation a focus was laid on structuring of medication-related information from Thai EHR extracts. This work leveraged NLP techniques using named entity recognition and large language models (LLMs), particularly ChatGPT. Considerable work was also done for creating a ground truth for training and benchmarking. During the stay, intensive interaction occurred with the annotator teams in Austria and Thailand, resulting in iterative refinements of an annotation guideline, maintained and developed by the host institution.

Results and Contributions

The study developed an annotated dataset of 90 Thai discharge summaries annotated with SNOMED CT and FHIR. Several deep learning models were tested, including BioClinicalBERT, ClinicalBERT, and Microsoft BiomedNLP. Among these, ClinicalBERT achieved the highest F1-score in medication-related entity recognition, particularly for drug substances and dosage information. Additionally, the project explored the use of ChatGPT3.5 for automatic structuring and expansion of medication statements, demonstrating strong performance in NER (F1-score: 0.94) and text expansion (F1-score: 0.87) tasks. Few-shot prompting strategies proved particularly effective in minimizing hallucinations, a crucial factor for safety-relevant medication data processing. Comparative evaluations with models such as ChatGPT4o, Gemini 2.0 Flash, MedLM-1.5-Large, and DeepSeekV3 (during and after the stay) revealed that most models outperformed ChatGPT3.5 in both tasks. These results highlight the potential of transformer-based models for improving medication information extraction and enhancing the structured representation of clinical narratives. Two scientific manuscripts are currently under review.

Project Team:

Univ.-Prof. Dr.med. Stefan Schulz

Medical University of Graz, Austria
Institute for Medical Informatics, Statistics and Documentation

stefan.schulz@medunigraz.at

Full Professor of Medical Informatics at Medical University of Graz. His research focuses on biomedical informatics, applied ontology, natural language processing, and electronic health records. He also serves as Head of Medical Research Projects at Averbis GmbH in Freiburg, Germany. He has authored numerous publications and is a recognized expert in his field. https://user.medunigraz.at/stefan.schulz/

Assoc. Prof. Priv.-Doz. Dr.med. Markus Eduard Kreuzthaler

Medical University of Graz, Austria
Institute for Medical Informatics, Statistics and Documentation

markus.kreuzthaler@medunigraz.at

Markus Kreuzthaler’s research focuses on medical informatics, clinical natural language processing, and machine learning.
https://online.medunigraz.at/mug_online/visitenkarte.show_vcard?pPersonenId=FDC7BA2AB1BD0F02&pPersonenGruppe=3

Assist. Prof. Priv.-Doz. Dr.med. Sirikul Wachiranun

Chiang Mai University
Department of Community Medicine

wachiranun.sir@cmu.ac.th

Wachiranun Sirikul’s research focuses on health informatics, clinical information extraction, clinical decision support system, digital health, and preventive medicine.

Dr.med Natthanaphop Isaradech

Chiang Mai University

Natthanaphop Isaradech is a Ph.D. candidate in Digital Health at Chiang Mai University, specializing in clinical epidemiology, health informatics, and data science. Published several research articles on ICD-10 mapping, frailty screening, and patient treatment support systems.

Project Details

Date November 13, 2025
Tags Applied Research, Artificial Intelligence, Medicine