NLP/IE Projects


Sponsored Projects

An Informatics Framework for Discovery and Ascertainment of Drug-Supplement Interactions

Funding Agency: NIH/NCCIH R01 AT009457
Project Dates: 04/01/2017–03/31/2021
The primary goal is to use informatics approaches to enhance clinical research on drug-supplement interactions (DSIs) and to translate research findings into clinical practice through clinical decision support systems.

Innovative Methods for Real-time Risk Modeling of Postoperative Complications

Funding Agency: NIH/NIGMS R01 GM120079
Project Dates: 04/01/2017–03/31/2021
This research will address the critical challenge of predicting postoperative complications using real-time intraoperative data and novel modeling methods. At a broader level, this work will improve the clinical research infrastructure at the University of Minnesota for other uses of real-time clinical data, and will provide the infrastructure for contemporaneous feedback needed to realize a learning healthcare system and real-time improvement of patient care.

Discovery and Visualization of New Information from Clinical Reports

Funding Agency: AHRQ R01 HS022085 (Melton-Meaux)
Project Dates: 09/30/2013–09/29/2017
This grant develops and evaluates visualization methods that “highlight” important information in clinical texts, improves user interface design for clinical texts, and conducts a prospective clinical trial of an EHR tool that highlights new, non-redundant information in clinical documents.

Discovery and Visualization of New Information from Clinical Reports Website

Natural Language Processing for Clinical and Translational Research

Funding Agency: NIH/NIGMS R01 GM102282 (Liu/Pakhomov/Xu)
Project Dates: 04/01/2013–03/31/2018
The overall goal of this project is to develop a novel framework to enable the use of clinical information embedded in clinical narratives for clinical and translational research.

Leveraging the EHR to Collect and Analyze Social, Behavioral & Familial Factors

Funding Agency: NIH/NLM R01 LM011364 (Chen/Melton)
Project Dates: 09/01/2012–08/31/2017
The overall goal of this project is to develop and evaluate computational methods for generating knowledge regarding the relationships between diseases and social, behavioral, and familial factors.

Social and Family History - Extraction, Representation, and Evaluation (SFHERE) Website

University of Minnesota Clinical and Translational Science Institute (CTSI)

The major goals of this infrastructure award are to support clinical and translational research at the University of Minnesota to transform research processes within the institution and community. The NLP-IE group is developing an NLP platform for use by clinical researchers as a resource.

Research Areas of Interest

Automated sentiment and topic analysis of medical training evaluation text

Medical post-graduate residency training and other aspects of medical training increasingly use electronic systems to evaluate trainee performance against defined training competencies with quantitative and qualitative data, the latter of which typically consists of text comments. This work uses text-mining techniques to help medical educators analyze residency evaluations by identifying statement topics and performing sentiment analysis on statements. In addition to validating these techniques, this work aims to correlate automated findings with objective trainee outcomes.

Discovery of drug-drug interactions from biomedical literature

Drug-drug interactions (DDIs) are a serious concern in clinical practice as physicians strive to provide the highest quality patient care. While DDI lists are commonly used in clinical practice to alert clinicians during prescribing, many DDIs resulting from various pathways are not widely known. Such interactions may be indirectly derived from the scientific literature through informatics methods. The objective of this study is to use Semantic MEDLINE to uncover potential DDIs in clinical data.
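
As a rough illustration of how interactions might be "indirectly derived" from literature, the sketch below pairs drugs that are linked to the same intermediate entity in a set of subject-predicate-object triples. It is a minimal sketch under stated assumptions, not the project's actual method: the triples, the predicate labels, and the function name `candidate_interactions` are all hypothetical and hand-written for this example.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples. These are hand-written
# placeholders for illustration, not data from any real resource.
PREDICATIONS = [
    ("drug_a", "INHIBITS", "enzyme_x"),
    ("drug_b", "INTERACTS_WITH", "enzyme_x"),
    ("drug_c", "INHIBITS", "enzyme_y"),
    ("drug_d", "STIMULATES", "enzyme_z"),
]


def candidate_interactions(predications):
    """Return sorted (drug, drug, shared_entity) tuples.

    Two subjects form a candidate pair when both appear in a triple
    with the same object (for example, the same enzyme).
    """
    subjects_by_object = defaultdict(set)
    for subject, _predicate, obj in predications:
        subjects_by_object[obj].add(subject)

    candidates = []
    for obj, subjects in subjects_by_object.items():
        ordered = sorted(subjects)
        # Emit each unordered pair of distinct subjects once.
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                candidates.append((first, second, obj))
    return sorted(candidates)


if __name__ == "__main__":
    for first, second, shared in candidate_interactions(PREDICATIONS):
        print(f"{first} and {second} share {shared}")
```

Any pair produced this way is only a hypothesis; it would still need to be filtered by predicate type and checked against clinical data.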

Identification of new versus redundant information from clinical notes

In EHR systems, a clinician can create new notes by copying and pasting text from previous notes. Additionally, some EHR systems “pull” known information such as the medication list, past medical history, and other parts of the record directly into clinical notes. This results in significant amounts of redundant information in clinical texts, which makes these notes harder to read and sift through for the practicing clinicians who must use them. Redundant information also lengthens clinical notes and de-emphasizes important new information, placing an additional cognitive load on clinicians, who must read and synthesize these notes in a time-constrained clinical environment with frequent interruptions. The goal of this research is to develop computational methods customized for clinical texts to identify new (non-redundant) information in the clinical notes in the EHR.
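
One simple way to picture the task is to compare each sentence of a new note against the word n-grams already present in earlier notes. The following is a minimal sketch under that assumption only; the function names, the bigram size, and the 0.5 threshold are illustrative choices and are not taken from the project.

```python
def word_ngrams(text, n=2):
    """Return the set of lowercase word n-grams in a string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_new_sentences(previous_notes, current_sentences, threshold=0.5, n=2):
    """Label each sentence as new (True) or redundant (False).

    A sentence counts as redundant when at least `threshold` of its
    n-grams already occur somewhere in the previous notes.
    """
    seen = set()
    for note in previous_notes:
        seen |= word_ngrams(note, n)

    results = []
    for sentence in current_sentences:
        grams = word_ngrams(sentence, n)
        if not grams:
            # Too short to compare; treat as new so nothing is hidden.
            results.append((sentence, True))
            continue
        overlap = len(grams & seen) / len(grams)
        results.append((sentence, overlap < threshold))
    return results


if __name__ == "__main__":
    earlier = ["patient has a history of hypertension and diabetes"]
    today = [
        "patient has a history of hypertension and diabetes",
        "new onset chest pain reported today",
    ]
    for sentence, is_new in flag_new_sentences(earlier, today):
        print("NEW      " if is_new else "REDUNDANT", sentence)
```

Exact n-gram matching misses lightly edited copies, so a practical method would likely need alignment or fuzzier matching tuned to clinical text.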

Information extraction from operative reports

As an important branch of medicine, surgery is concerned with the treatment of injuries or disorders of the body through operative procedures. Various factors, such as the technique used, incision length, or supplies used (e.g., mesh type, prosthetic), can affect surgical patient outcomes. Surgeons need to determine the best way to perform procedures based on the best accessible evidence. The goal of this research is to build methods that efficiently extract information on the techniques, instruments, materials, and other factors surrounding operative procedures from operative reports, and present it in a succinct and easily comprehensible form for secondary uses such as summarization or high-throughput clinical research.

Semantic similarity and relatedness

Identifying semantically similar and related terms in the biomedical and clinical domains has proven useful in various Natural Language Processing (NLP) tasks such as Question-Answering and Information Extraction. This research seeks to develop new methods, specific to clinical text, for computing semantic similarity and relatedness by leveraging clinical corpora and the domain knowledge contained in biomedical thesauri such as the Unified Medical Language System (UMLS), and to incorporate semantic similarity and relatedness into NLP tasks.
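
To make the idea of thesaurus-based similarity concrete, here is a minimal sketch of a path-based measure over a tiny is-a hierarchy, where similarity is the reciprocal of the number of nodes on the shortest path between two concepts. The hierarchy, concept names, and function names are invented for illustration; they are not drawn from the UMLS and do not represent the group's package.

```python
# Toy is-a hierarchy (child -> parent). Hand-made for illustration;
# not extracted from the UMLS or any other thesaurus.
PARENT = {
    "myocardial_infarction": "heart_disease",
    "heart_failure": "heart_disease",
    "heart_disease": "cardiovascular_disease",
    "stroke": "cardiovascular_disease",
    "cardiovascular_disease": "disease",
}


def ancestors(concept):
    """Return the chain from a concept up to the root, concept first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """Return 1 / (number of nodes on the shortest is-a path from a to b).

    Returns 0.0 when the two concepts share no ancestor.
    """
    steps_from_b = {concept: i for i, concept in enumerate(ancestors(b))}
    for steps_from_a, concept in enumerate(ancestors(a)):
        if concept in steps_from_b:
            # First shared ancestor walking up from `a` is the lowest one.
            edges = steps_from_a + steps_from_b[concept]
            return 1.0 / (edges + 1)
    return 0.0


if __name__ == "__main__":
    print(path_similarity("myocardial_infarction", "heart_failure"))
    print(path_similarity("myocardial_infarction", "stroke"))
```

In this toy hierarchy the two heart conditions score higher with each other than either does with stroke, which is the behavior a similarity measure is expected to show.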

We have developed the semantic similarity and relatedness package:

Word sense disambiguation of acronyms, abbreviations, and symbols

The goal of this research is to develop effective techniques for word sense disambiguation (WSD) of acronyms, abbreviations, and symbols in clinical documents, an essential and unsolved problem for effective medical NLP systems. A key step toward this goal is building a comprehensive clinical sense inventory that integrates available biomedical resources with senses drawn from a large corpus of clinical notes. Our research explores issues in optimizing automated machine-learning techniques, including minimizing sample sizes, the contributions and values of different feature types, the use of semi-supervised and unsupervised techniques, techniques for rare sense detection, and variation in the size and orientation of the window used to extract features for machine-learning algorithms.
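
The window size and orientation mentioned above can be pictured with a small feature extractor that collects the words to the left, to the right, or on both sides of an ambiguous abbreviation. This is a minimal sketch with hypothetical names (`window_features`, the `L:` and `R:` prefixes) and an invented example sentence; it is not the feature set used in this research.

```python
def window_features(tokens, target_index, window=3, orientation="both"):
    """Collect context-word features around the token at target_index.

    window      -- maximum number of tokens taken from each side
    orientation -- "left", "right", or "both"
    Features are prefixed with L: or R: to record which side they came from.
    """
    if orientation not in ("left", "right", "both"):
        raise ValueError("orientation must be 'left', 'right', or 'both'")

    features = []
    if orientation in ("left", "both"):
        start = max(0, target_index - window)
        features += [f"L:{t.lower()}" for t in tokens[start:target_index]]
    if orientation in ("right", "both"):
        end = target_index + 1 + window
        features += [f"R:{t.lower()}" for t in tokens[target_index + 1:end]]
    return features


if __name__ == "__main__":
    # Invented sentence; "RA" is the ambiguous abbreviation at index 2.
    sentence = "Patient with RA started on methotrexate".split()
    print(window_features(sentence, 2, window=2, orientation="both"))
    print(window_features(sentence, 2, window=2, orientation="right"))
```

Feature lists like these could then be fed to any classifier, which makes it straightforward to compare window sizes and orientations on the same labeled samples.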

Our work includes de-identified resources for symbols and acronyms/abbreviations available for research purposes: