Discoveries | UHN News

From Cryptic to Clear

Machine-learning tool developed that can identify ambiguous terms in clinical notes.

Artificial intelligence systems are built on a foundation of clear, unambiguous data. A team from UHN has built a machine-learning tool to help automatically lay those foundations.

November 22, 2021

Health care can seem like a sea of specialized terms. In hospitals, medical, scientific and regulatory terms are brought together and often shortened in clinical notes, which are used by the care team to monitor and treat patients.

Hard-to-interpret acronyms can make it more challenging for researchers to extract usable medical data, hampering the progress of research that aims to improve care.

"Expanding the abbreviations in clinical notes can be a difficult problem without expert knowledge. For example, 'RA' could mean right atrium, rheumatoid arthritis or room air, depending on the context," Dr. Michael Brudno explains. Determining what an abbreviation means is usually simple for a human expert, but is a challenging task for automated systems.

To address this issue, Dr. Brudno led a team of researchers to build a machine learning approach to automatically identify the proper meaning of abbreviations in medical notes. Machine learning is an approach through which a computer algorithm can be 'taught' to solve complex problems, such as spotting patterns in large sets of data. However, in order to 'teach' the algorithm, large amounts of high-quality data are needed.

To overcome this issue, and the potential costs of creating this dataset (e.g., paying experts to go back over clinical notes to expand any abbreviations), the research team customized their machine learning system so that it could resolve ambiguity in the clinical notes.

One of the keys to interpreting shortened medical terms is context. "Context is everything," says Marta Skreta, the first author of the study. "For this reason, we taught our system to scan the entire clinical note to establish a global context. For example, if the clinical note was about a heart condition, the system would be able to correctly identify 'RA' as 'right atrium'. Our system also uses related concepts from sentences close to the unknown abbreviation to further help build the context."
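The idea of combining a note-wide context with the words nearest the abbreviation can be illustrated with a small sketch. The Python below is not the team's deep learning system: it is a toy keyword-overlap scorer, and the cue-word lists, the `disambiguate` function and its `window` and `local_weight` parameters are all assumptions invented for illustration.

```python
import re
from collections import Counter

# Hypothetical sense inventory: hand-picked cue words for each expansion of "RA".
# A real system would learn these associations from data rather than list them.
SENSES = {
    "RA": {
        "right atrium": {"heart", "cardiac", "atrial", "ventricle", "echocardiogram"},
        "rheumatoid arthritis": {"joint", "joints", "swelling", "stiffness", "methotrexate"},
        "room air": {"oxygen", "saturation", "breathing", "spo2"},
    }
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def disambiguate(note, abbreviation, window=8, local_weight=2.0):
    """Pick the expansion whose cue words best match the note.

    Global context: cue-word counts over the entire note.
    Local context: cue-word counts within `window` tokens of each
    occurrence of the abbreviation, weighted by `local_weight`.
    Returns None when no cue word is found at all.
    """
    tokens = tokenize(note)
    target = abbreviation.lower()

    global_counts = Counter(tokens)
    local_counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            local_counts.update(tokens[max(0, i - window): i + window + 1])

    scores = {
        expansion: sum(global_counts[c] + local_weight * local_counts[c] for c in cues)
        for expansion, cues in SENSES[abbreviation].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Given a note such as "Echocardiogram shows a dilated RA; history of heart failure.", the cue words "echocardiogram" and "heart" appear only in the 'right atrium' list, so that expansion scores highest. A note with no cue words returns `None` rather than a guess.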

The team also incorporated ontologies (structured sets of medical language terms) to help identify related terms. Specifically, the system can pull information from the NIH's Unified Medical Language System to identify related terms, their synonyms and common abbreviations of these terms.
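To make the role of an ontology concrete, here is a minimal sketch of how structured term entries could be used to list the candidate meanings of an abbreviation and to gather related vocabulary for each one. The `TOY_ONTOLOGY` dictionary, its field names and both helper functions are hypothetical; the entries are hand-written for illustration and are not taken from the Unified Medical Language System, and the sketch does not reflect how the team's system queries it.

```python
from collections import defaultdict

# Hypothetical miniature ontology, written by hand for this example.
TOY_ONTOLOGY = {
    "right atrium": {
        "synonyms": ["atrium dextrum"],
        "abbreviations": ["RA"],
        "related": ["left atrium", "right ventricle"],
    },
    "left atrium": {
        "synonyms": ["atrium sinistrum"],
        "abbreviations": ["LA"],
        "related": ["right atrium", "left ventricle"],
    },
    "rheumatoid arthritis": {
        "synonyms": ["atrophic arthritis"],
        "abbreviations": ["RA"],
        "related": ["joint inflammation"],
    },
    "room air": {
        "synonyms": ["ambient air"],
        "abbreviations": ["RA"],
        "related": ["oxygen saturation"],
    },
}


def build_abbreviation_index(ontology):
    """Map each abbreviation to the sorted list of terms it may stand for."""
    index = defaultdict(list)
    for term, entry in ontology.items():
        for abbreviation in entry["abbreviations"]:
            index[abbreviation].append(term)
    return {abbreviation: sorted(terms) for abbreviation, terms in index.items()}


def context_terms(ontology, term):
    """Collect a term together with its synonyms and related terms."""
    entry = ontology[term]
    return {term, *entry["synonyms"], *entry["related"]}
```

With this toy data, `build_abbreviation_index(TOY_ONTOLOGY)["RA"]` lists three candidate terms, while "LA" maps to a single term and needs no disambiguation. The set returned by `context_terms` is the kind of related vocabulary a disambiguation model could look for in the surrounding note.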

Once completed, the machine learning system was able to automatically scan medical notes and identify, with high accuracy, the terms that abbreviations referred to.

The main application of the system will be to identify any abbreviations and create unambiguous data sets that can be better used by researchers and are more suitable for training other machine learning systems.

This work was supported by The Princess Margaret Cancer Foundation.

Skreta M, Arbabi A, Wang J, Drysdale E, Kelly J, Singh D, Brudno M. Automatically disambiguating medical acronyms with ontology-aware deep learning. Nat Commun. 2021 Sep 7. doi: 10.1038/s41467-021-25578-4
