Health care can seem like a sea of specialized terms. In hospitals, medical, scientific and regulatory terms come together in clinical notes, where they are often abbreviated; care teams use these notes to monitor and treat patients.
Hard-to-interpret acronyms can make it more challenging for researchers to extract usable medical data, hampering the progress of research that aims to improve care.
“Expanding the abbreviations in clinical notes can be a difficult problem without expert knowledge. For example, ‘RA’ could mean right atrium, rheumatoid arthritis or room air, depending on the context,” Dr. Michael Brudno explains. Determining what an abbreviation means is usually simple for a human expert, but is a challenging task for automated systems.
To address this issue, Dr. Brudno led a team of researchers in building a machine learning approach that automatically identifies the intended meaning of abbreviations in medical notes. Machine learning is an approach through which a computer algorithm can be ‘taught’ to solve complex problems, such as spotting patterns in large sets of data. However, teaching the algorithm requires large amounts of high-quality data.
To get around this requirement, and the potential cost of creating such a dataset (e.g., paying experts to go back over clinical notes and expand every abbreviation), the research team customized their machine learning system so that it could resolve ambiguity in the clinical notes.
“One of the keys to interpreting shortened medical terms is context. Context is everything,” says Marta Skreta, the first author of the study. “For this reason, we taught our system to scan the entire clinical note to establish a global context. For example, if the clinical note was about a heart condition, the system would be able to correctly identify ‘RA’ as ‘right atrium’. Our system also uses related concepts from sentences close to the unknown abbreviation to further help build the context.”
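To make the idea of global and local context concrete, here is a toy sketch in Python. It is not the team's method: their system is an ontology-aware deep learning model, whereas this example simply counts keyword matches. The sense inventory (`SENSES`), its keywords, the `window` size and the `local_weight` parameter are all hypothetical. Each candidate expansion is scored by how often its keywords appear in the whole note (global context) plus, with extra weight, in the sentences around the abbreviation (local context).

```python
import re
from collections import Counter

# Hypothetical sense inventory: each candidate expansion of "RA" is paired with
# a few hand-picked context words. These keywords are illustrative only.
SENSES = {
    "RA": {
        "right atrium": {"heart", "cardiac", "atrial", "ventricle", "echo", "valve"},
        "rheumatoid arthritis": {"joint", "joints", "swelling", "methotrexate", "arthritis"},
        "room air": {"oxygen", "saturation", "sats", "o2", "breathing"},
    }
}


def tokenize(text):
    """Lower-case alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Very rough sentence splitter on '.', '!' and '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def disambiguate(note, abbrev, window=1, local_weight=2.0):
    """Pick the expansion of `abbrev` whose keywords best match the note.

    Global context: word counts over the whole note.
    Local context: word counts over each sentence containing `abbrev`
    plus `window` neighbouring sentences on each side, weighted by `local_weight`.
    """
    sentences = split_sentences(note)
    global_counts = Counter(tokenize(note))
    local_counts = Counter()
    for i, sent in enumerate(sentences):
        if abbrev.lower() in tokenize(sent):
            for s in sentences[max(0, i - window): i + window + 1]:
                local_counts.update(tokenize(s))

    scores = {}
    for expansion, keywords in SENSES[abbrev].items():
        global_score = sum(global_counts[w] for w in keywords)
        local_score = sum(local_counts[w] for w in keywords)
        scores[expansion] = global_score + local_weight * local_score
    # Highest-scoring expansion wins; the full score table is returned too.
    return max(scores, key=scores.get), scores


best, _ = disambiguate("Echo shows dilated RA. Cardiac function otherwise normal.", "RA")
print(best)  # → right atrium
```

A learned model replaces the hand-written keyword lists with representations trained on data, but the structure of the decision (combine note-wide evidence with evidence near the abbreviation) is the same idea the quote describes.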
The team also incorporated “ontologies” (structured sets of medical language terms) to help identify related terms. Specifically, the system can pull information from the NIH’s Unified Medical Language System (UMLS) to find related terms, their synonyms and their common abbreviations.
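The ontology lookup can likewise be illustrated with a small, hypothetical Python sketch. The `Concept` class and the `ONTOLOGY` dictionary below are invented stand-ins and do not reflect how UMLS data is actually structured or accessed; they only show how an ontology that records abbreviations, synonyms and related concepts can supply the candidate meanings of an abbreviation, along with a set of terms worth looking for in the surrounding text.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A toy stand-in for an ontology concept (not the real UMLS data model)."""
    name: str
    synonyms: set = field(default_factory=set)
    abbreviations: set = field(default_factory=set)
    related: set = field(default_factory=set)  # names of related concepts


# Hypothetical miniature ontology; entries are illustrative, not taken from UMLS.
ONTOLOGY = {
    c.name: c
    for c in [
        Concept("right atrium", {"right atrial chamber"}, {"RA"}, {"heart", "tricuspid valve"}),
        Concept("rheumatoid arthritis", {"rheumatoid disease"}, {"RA"}, {"joint", "methotrexate"}),
        Concept("room air", set(), {"RA"}, {"oxygen saturation"}),
        Concept("heart", {"cardiac"}, set(), {"right atrium"}),
    ]
}


def candidates(abbrev):
    """Names of all concepts that list `abbrev` as one of their abbreviations."""
    return sorted(c.name for c in ONTOLOGY.values() if abbrev in c.abbreviations)


def context_terms(concept_name):
    """The concept's name and synonyms, plus related terms and their synonyms."""
    concept = ONTOLOGY[concept_name]
    terms = {concept.name} | concept.synonyms
    for rel in concept.related:
        terms.add(rel)
        if rel in ONTOLOGY:  # related term is itself a concept: pull its synonyms too
            terms |= ONTOLOGY[rel].synonyms
    return terms


print(candidates("RA"))  # → ['rheumatoid arthritis', 'right atrium', 'room air']
```

Following one level of "related" links is enough here to connect "right atrium" to "cardiac" via "heart"; a real ontology is far larger, so how many links to follow becomes a trade-off between recall and noise.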
The completed machine learning system was able to automatically scan medical notes and identify, with high accuracy, the terms that abbreviations referred to.
The main application of the system will be to identify and expand abbreviations, creating unambiguous datasets that researchers can use more readily and that are better suited to training other machine learning systems.
This work was supported by The Princess Margaret Cancer Foundation.
Skreta M, Arbabi A, Wang J, Drysdale E, Kelly J, Singh D, Brudno M. Automatically disambiguating medical acronyms with ontology-aware deep learning. Nat Commun. 2021 Sep 7. doi: 10.1038/s41467-021-25578-4
Senior author Dr. Michael Brudno (L) and lead author Marta Skreta (R).