Project

Automated Clinical Coding Using AI Techniques

Clinical coding is essential for utilizing hospital data for funding allocation, benchmarking performance, monitoring outcomes, and research. However, it is a manual, labour-intensive process facing challenges such as a shortage of trained coders, increasing EMR volume and complexity, and coding quality issues. In addition, many factors that are important to patient outcomes are not routinely coded, such as patients' smoking status, functional and cognitive status, living arrangements, or social support.

This project aims to address these challenges using AI, combining deep learning-based large language models with rule-based symbolic AI.

The goals include automating the assignment of diagnosis and health intervention codes and extracting uncoded clinical and lifestyle factors from EMRs. This innovative approach will enhance the efficiency and accuracy of clinical coding in Australia, using local patient data and coding standards to avoid the biases and inaccuracies of foreign commercial solutions.

Aims

This project aims to develop a novel approach to automated clinical coding that combines the latest advances in large language models with symbolic AI. The approach will optimise clinical coding in line with Australian coding guidelines and standards, and will add new value by extending the content of coded data beyond diagnoses and procedures.

Specifically, the project will develop:

  1. AI algorithms to automate the coding of diagnoses and health interventions from diverse hospital EMRs;
  2. AI algorithms to extract from EMRs detailed clinical, health and lifestyle factors that are currently not routinely coded, such as patients' smoking status, BMI, functional and cognitive status, living arrangements, social support and languages spoken.

Design

We will utilize open-source large language models (LLMs) in a secure environment, employing a combination of the following methodologies:

  1. Fine-Tuning of LLMs: We will train LLMs on EMR data using parameter-efficient methods such as LoRA, enabling them to perform clinical coding accurately.
  2. Prompt Engineering: We will develop a prompt engineering methodology to improve the accuracy of the LLMs in generating clinical codes as well as clinical, health, and lifestyle factors.
  3. Symbolic AI - Post-Processing: We will apply symbolic AI for post-processing to:

     a. Detect rare or less frequent clinical codes that the deep learning model might miss.
     b. Ensure consistency with the Australian Coding Standards.
     c. Assist LLMs in the extraction of health and lifestyle factors.
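
As a rough illustration of the symbolic post-processing step described above, the sketch below filters a model's predicted codes against a code table and then adds codes triggered by simple text rules. Everything in it is an assumption made for illustration: the function name `post_process`, the three ICD-10-style code strings, and the single keyword rule are placeholders, not the project's actual rules or the content of the Australian Coding Standards.

```python
import re
from typing import List

# Hypothetical code table. A real system would load the official classification;
# these ICD-10-style strings are placeholders for illustration only.
VALID_CODES = {"I21.9", "E11.9", "Z72.0"}

# Hypothetical keyword rules: a text pattern mapped to a code the model may miss.
KEYWORD_RULES = [
    (re.compile(r"\bcurrent smoker\b", re.IGNORECASE), "Z72.0"),
]


def post_process(note: str, predicted: List[str]) -> List[str]:
    """Filter model output against the code table, then add rule-triggered codes."""
    # Remove duplicates (keeping first-seen order) and drop codes not in the table.
    kept = [code for code in dict.fromkeys(predicted) if code in VALID_CODES]
    # Add any code whose rule fires on the note text and is not already present.
    for pattern, code in KEYWORD_RULES:
        if pattern.search(note) and code not in kept:
            kept.append(code)
    return kept
```

In this sketch the rules run after the language model, so a rare code can still be recovered when the model omits it, and an invalid or duplicated code is removed before the output is used.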

Data will be drawn from a range of medical record sources, including publicly available datasets such as MIMIC and Australian datasets such as the Cardiac Analytics and Innovation (CardiacAI) Data Repository. The performance of the resulting tools will be assessed through rigorous testing and validation processes.
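
The project description does not name specific evaluation metrics. As one common option for multi-label code assignment, the sketch below computes micro-averaged precision, recall and F1 by comparing predicted code sets with reference code sets for each record. The function name `micro_prf` and the choice of metric are assumptions for illustration, not the project's stated evaluation protocol.

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-record code sets."""
    # Pool true positives, false positives and false negatives across all records.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Micro-averaging pools counts across all codes, so frequent codes dominate the score. A macro-averaged variant, computed per code and then averaged, would give rare codes more weight.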

Centre

Centre for Big Data Research in Health

Primary supervisor

Dr Oscar Perez Concha

Joint supervisor

Dr Sanja Lujic

PhD Top-Up Scholarships

The Centre for Big Data Research in Health (CBDRH) is excited to launch Top-Up Scholarships for high-achieving domestic and international candidates seeking to start a PhD in 2025.

Our research home

The Centre for Big Data Research in Health (CBDRH) actively fosters a broad community of researchers who are adept in advanced analytic methods, agile in adopting new techniques and who embody best practices in data security and privacy protection.