Sponsored Projects

SCH: A New Computational Framework for Learning from Imbalanced Biomedical Data

Funding Agency: NIH/NCI R01CA287413-01
Project Dates: 08/01/2023-07/31/2027
The primary goal of this project is to develop an innovative AI-driven framework that addresses the persistent issue of data imbalance (including multimodal data types) in biomedical data science, with a specific focus on cardiotoxicity prediction for cancer survivors.  

A Translational Informatics Framework to Mine Efficacy and Safety of Dietary Supplements

Funding Agency: NIH 2R01AT009457-05A1
Project Dates: 01/20/2023-11/30/2027
This grant will expand prior work to create an enriched dietary supplement knowledge base (eDISK) and to develop a translational informatics framework (iDISK-Mine) with innovative informatics approaches to facilitate dietary supplement research using real-world, multi-site EHR data.

Detecting synergistic effects of pharmacological and non-pharmacological interventions for AD/ADRD

Funding Agency: NIH/NIA 1R01AG078154-01
Project Dates: 07/2022-06/2027
This grant is to develop translational informatics approaches to aggregate, standardize and discover the synergistic effects of pharmacological and non-pharmacological intervention candidates on AD/ADRD using multi-modal data resources (i.e., literature, EHR, clinical trials) followed by animal model validation.

Open Health Natural Language Processing Collaboratory

Funding Agency: NIH/NCATS U01 TR002062
Project Dates: 09/01/2017-08/31/2023
We aim to address the challenges posed by restrictions on the NLP workforce and to expand the usability, portability, and generalizability of NLP systems. To do so, we are extending our existing collaboration among multiple CTSA hubs (Mayo Clinic, UTHealth, and University of Minnesota) on open health natural language processing (OHNLP) to share distributional information of NLP artifacts acquired from real EHRs across multiple institutions.

An Informatics Framework for Discovery and Ascertainment of Drug-Supplement Interactions

Funding Agency: NIH/NCCIH R01 AT009457
Project Dates: 04/01/2017-03/31/2021
The overarching goal is to use informatics approaches to enhance clinical research on drug-supplement interactions (DSIs) and the translation of research findings to clinical practice via clinical decision support systems.

Innovative Methods for Real-time Risk Modeling of Postoperative Complications

Funding Agency: NIH/NIGMS R01 GM120079
Project Dates: 04/01/2017-03/31/2021
This research will address the critical challenge of predicting postoperative complications using real-time intraoperative data and novel modeling methods. At a broader level, this work will improve the clinical research infrastructure at the University of Minnesota for other use cases of real-time clinical data, and will provide the infrastructure for contemporaneous feedback needed to realize a learning healthcare system and real-time patient care improvement.

Leveraging Advanced Informatics To Automate Data Collection Of Healthcare Associated Infections (HAI) And Other Surgical Performance Measures

Funding Agency: AHRQ R01 HS024532 
Project Dates: 12/01/2016-11/30/2020
This proposal focuses on building the foundation required for broad dissemination of electronic performance measures ("emeasures") to US hospitals. 

University of Minnesota Clinical and Translational Science Institute (CTSI)

The major goals of this infrastructure award are to support clinical and translational research at the University of Minnesota to transform research processes within the institution and community. The NLP-IE group is developing an NLP platform for use by clinical researchers as a resource.

Research Areas of Interest

Automated sentiment and topic analysis of medical training evaluation text

Medical post-graduate residency training and other aspects of medical training increasingly utilize electronic systems to evaluate trainee performance based on defined training competencies with quantitative and qualitative data, the latter of which typically consists of text comments. This work utilizes text-mining techniques to assist medical educators in the analysis of residency evaluations by identifying statement topics and performing sentiment analysis on statements. In addition to validating these techniques, this work aims to correlate automated findings with objective trainee outcomes.

Discovery of drug-drug interactions from biomedical literature

Drug-drug interactions (DDIs) are a serious concern in clinical practice as physicians strive to provide the highest quality patient care. While DDI lists are commonly used in clinical practice to alert clinicians during prescribing, many DDIs resulting from various pathways are not widely known. Such interactions may be indirectly derived from the scientific literature through informatics methods. The objective of this study is to use Semantic MEDLINE to uncover potential DDIs in clinical data.
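
As an illustration only, the idea of indirectly deriving interactions from the literature can be sketched with subject-predicate-object triples: two drugs that are each linked to the same intermediate concept become a candidate pair for review. The triples, drug names, and the function `candidate_ddis` below are hypothetical placeholders, not actual Semantic MEDLINE content or the method used in this study.

```python
from collections import defaultdict
from itertools import combinations

# Toy (subject, predicate, object) triples in the style of literature-derived
# predications. All names are placeholders, not real findings.
PREDICATIONS = [
    ("drug_a", "INHIBITS", "enzyme_x"),
    ("drug_b", "INTERACTS_WITH", "enzyme_x"),
    ("drug_c", "STIMULATES", "enzyme_y"),
    ("drug_d", "INTERACTS_WITH", "enzyme_z"),
]
DRUGS = {"drug_a", "drug_b", "drug_c", "drug_d"}


def candidate_ddis(predications, drugs):
    """Map each drug pair to the intermediate concepts both drugs are linked to."""
    drugs_by_target = defaultdict(set)
    for subj, _pred, obj in predications:
        # Keep only drug -> non-drug links (e.g. drug -> enzyme).
        if subj in drugs and obj not in drugs:
            drugs_by_target[obj].add(subj)

    candidates = {}
    for target, linked in drugs_by_target.items():
        # Every pair of drugs sharing this target is a candidate interaction.
        for pair in combinations(sorted(linked), 2):
            candidates.setdefault(pair, set()).add(target)
    return candidates


print(candidate_ddis(PREDICATIONS, DRUGS))
# {('drug_a', 'drug_b'): {'enzyme_x'}}
```

A shared intermediate concept is only a hint. In practice the predicate types and the direction of each effect would need to be considered before a pair is treated as a plausible interaction.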

Identification of new versus redundant information from clinical notes

In EHR systems, a clinician can create new notes by “copying and pasting” text from previous notes. Additionally, some EHR systems “pull” known information such as the medication list, past medical history, and other parts of the record directly into clinical notes. This results in significant amounts of redundant information in clinical texts, which makes these notes harder to read and sift through for practicing clinicians. Redundant information also increases the length of clinical notes and de-emphasizes important new information, placing an additional cognitive load on clinicians who must read and synthesize these notes, often in a time-constrained clinical environment with frequent interruptions. The goal of this research is to develop computational methods customized for clinical texts to identify new (non-redundant) information from the clinical notes in the EHR.
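
A minimal sketch of the underlying idea, assuming a simple sentence-level comparison: a sentence in the current note is treated as redundant when its token overlap (Jaccard similarity) with any sentence in earlier notes meets a threshold. The function names, the 0.8 threshold, and the example notes are illustrative assumptions, not the methods developed in this research.

```python
import re


def sentences(text):
    """Split text into sentences on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercased set of alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def new_information(prior_notes, current_note, threshold=0.8):
    """Return sentences of current_note not closely matched in prior_notes."""
    seen = [tokens(s) for note in prior_notes for s in sentences(note)]
    fresh = []
    for sentence in sentences(current_note):
        current = tokens(sentence)
        if not any(jaccard(current, prior) >= threshold for prior in seen):
            fresh.append(sentence)
    return fresh


prior = ["Patient has hypertension. Takes lisinopril 10 mg daily."]
current = (
    "Patient has hypertension. Takes lisinopril 10 mg daily. "
    "Reports new onset dizziness."
)
print(new_information(prior, current))
# ['Reports new onset dizziness.']
```

Token overlap ignores word order and paraphrase, so it is only a rough proxy for redundancy. The threshold controls how aggressively lightly edited copies are flagged.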

Information extraction from operative reports

As an important branch of medicine, surgery is concerned with the treatment of injuries or disorders of the body through operative procedures. Various factors such as the technique used, incision length, or supplies used (e.g., mesh type, prosthetic) can affect surgical patient outcomes. Surgeons, who have specialized training in operative procedures, need to determine the best way to perform procedures based on accessible sources of the best available evidence. The goal of this research is to extract information on the techniques, instruments, materials, and other factors surrounding operative procedures from operative reports, and to build methods that present this information in a succinct and easily comprehensible fashion for secondary uses such as summarization and high-throughput clinical research.

Semantic similarity and relatedness

Identifying semantically similar and related terms in the biomedical and clinical domains has proven useful in various Natural Language Processing (NLP) tasks such as Question-Answering and Information Extraction. This research leverages domain knowledge contained within biomedical thesauri, such as the Unified Medical Language System (UMLS), and clinical corpora to develop new methods, specific to clinical text, for computing semantic similarity and relatedness, and incorporates these measures into NLP tasks.
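
To make the notion of thesaurus-based similarity concrete, the sketch below computes a simple path-based score over a toy is-a hierarchy: the score is 1 / (1 + number of edges) along the path through the lowest common ancestor. The hierarchy, concept names, and the function `path_similarity` are made up for illustration and are not taken from the UMLS or from this group's methods.

```python
# Toy child -> parent (is-a) hierarchy; a hypothetical stand-in for a thesaurus.
PARENT = {
    "myocardial_infarction": "heart_disease",
    "heart_failure": "heart_disease",
    "heart_disease": "cardiovascular_disease",
    "stroke": "cardiovascular_disease",
    "cardiovascular_disease": "disease",
}


def path_to_root(concept, parent):
    """List of concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def path_similarity(a, b, parent):
    """1 / (1 + edges on the path between a and b); 0.0 if unconnected."""
    # Distance in edges from `a` to each of its ancestors (and itself).
    up_a = {c: dist for dist, c in enumerate(path_to_root(a, parent))}
    # Walking up from `b`, the first concept also above `a` is the
    # lowest common ancestor in a tree-shaped hierarchy.
    for dist_b, c in enumerate(path_to_root(b, parent)):
        if c in up_a:
            return 1.0 / (1 + up_a[c] + dist_b)
    return 0.0


print(path_similarity("myocardial_infarction", "heart_failure", PARENT))
print(path_similarity("myocardial_infarction", "stroke", PARENT))  # 0.25
```

Path-based scores capture similarity (closeness in an is-a hierarchy) but not broader relatedness, such as the link between a drug and the condition it treats, which is one reason corpus-derived information is typically used as well.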

We have developed the semantic similarity and relatedness package:

Word sense disambiguation of acronyms, abbreviations, and symbols

The goal of this research is to develop effective techniques for word sense disambiguation (WSD) of acronyms, abbreviations, and symbols in clinical documents, an essential and unsolved issue for effective medical NLP systems. A key step toward this goal is building a comprehensive clinical sense inventory based upon the integration of available biomedical resources and upon senses from a large corpus of clinical notes. Our research explores issues related to the optimization of automated machine-learning techniques, including minimization of sample sizes, the contributions and value of different feature types, use of semi-supervised and unsupervised techniques, techniques for rare sense detection, and variation in the window size and orientation used for feature extraction with machine-learning algorithms.

Our work includes de-identified resources for symbols and acronyms/abbreviations available for research purposes: