Projects

Sponsored Projects

Mining minority enriched AllofUs data for innovative ethnic specific risk prediction modeling

Funding Agency: NIH/NIMHD 5R21MD019134-02
Project Dates: 09/25/2023-05/31/2025
This project will develop innovative risk-modeling methods for All of Us data, tailored to minority populations, and validate them on external healthcare data. We will showcase the proposed methods in two use cases: 1) rheumatoid arthritis (RA) and 2) cancer cardiotoxicity prediction.

University of Minnesota Clinical and Translational Science Institute (CTSI)

Funding Agency: NIH 1UM1TR004405-01A1
Project Dates: 09/18/2023-07/31/2030
The major goals of this infrastructure award are to support clinical and translational research at the University of Minnesota to transform research processes within the institution and community. The NLP-IE group is developing an NLP platform for use by clinical researchers as a resource.

SCH: A New Computational Framework for Learning from Imbalanced Biomedical Data

Funding Agency: NIH/NCI R01CA287413-01
Project Dates: 08/01/2023-07/31/2027
The primary goal of this project is to develop an innovative AI-driven framework that addresses the persistent issue of data imbalance (including multimodal data types) in biomedical data science, with a specific focus on cardiotoxicity prediction for cancer survivors.  

Improving Completion, Accuracy, and Dissemination of Surgical Advanced Care Planning (I CAN DO ACP) Trial

Funding Agency: NIH/NIA UG3 AG081663
Project Dates: 08/01/2023-07/31/2024
The major goals of this project are to apply and rigorously evaluate a patient-centered, expanded ACP paradigm for older adults undergoing elective surgery. By leveraging a range of digital approaches and using our expanded ACP paradigm, we will enable broader adoption of ACP in surgical patients. This strategy is intended to help older adults who are planning to undergo major surgery prepare for and engage in conversations with their surgical care teams, re-assess their overarching treatment goals, and promote goal-concordant surgical care, as appropriate.

Detecting synergistic effects of pharmacological and non-pharmacological interventions for AD/ADRD

Funding Agency: NIH/NIA 5R01AG078156-03
Project Dates: 09/01/2022-05/31/2027
This grant is to develop translational informatics approaches to aggregate, standardize and discover the synergistic effects of pharmacological and non-pharmacological intervention candidates on AD/ADRD using multi-modal data resources (i.e., literature, EHR, clinical trials) followed by animal model validation.

Open Health Natural Language Processing Collaboratory

Funding Agency: NIH/NCATS U01 TR002062-05
Project Dates: 09/01/2017-08/31/2023
We aim to address the challenges of the restrictions on the NLP workforce to expand the usability, portability, and generalizability of NLP systems by extending our existing collaboration among multiple CTSA hubs (Mayo Clinic, UTHealth, and University of Minnesota) on open health natural language processing (OHNLP) to share distributional information of NLP artifacts acquired from real EHRs across multiple institutions.

A Translational Informatics Framework to Mine Efficacy and Safety of Dietary Supplements

Funding Agency: NIH/NCCIH 5R01AT009457-06
Project Dates: 04/01/2017-11/30/2027
This grant will expand prior work to create an enriched dietary supplement knowledge base (eDISK) and to develop a translational informatics framework (iDISK-Mine) with innovative informatics approaches to facilitate dietary supplement research using real-world, multi-site EHR data.

An Informatics Framework for Discovery and Ascertainment of Drug-Supplement Interactions

Funding Agency: NIH/NCCIH 5R01 AT009457-04
Project Dates: 04/01/2017-03/31/2022
The overarching goal is to use informatics approaches to enhance clinical research on drug-supplement interactions (DSIs) and the translation of research findings to clinical practice via clinical decision support systems.

Innovative Methods for Real-time Risk Modeling of Postoperative Complications

Funding Agency: NIH/NIGMS 5R01 GM120079-04
Project Dates: 04/01/2017-03/31/2022
This research will address the critical challenge of postoperative complication prediction using real-time intraoperative data and novel modeling methods. At a broader level, this work will improve the clinical research infrastructure at the University of Minnesota for other use cases of real-time clinical data, and will provide the infrastructure for contemporaneous feedback needed to realize a learning healthcare system and real-time patient care improvement.

Leveraging Advanced Informatics To Automate Data Collection Of Healthcare Associated Infections (HAI) And Other Surgical Performance Measures

Funding Agency: AHRQ 5R01 HS024532-05 
Project Dates: 09/30/2016-09/29/2022
This proposal focuses on building the foundation required for broad dissemination of electronic performance measures ("emeasures") to US hospitals. 

Leveraging the EHR to Collect and Analyze Social, Behavioral & Familial Factors

Funding Agency: NIH/NLM 07R01LM011364-04 
Project Dates: 09/01/2012-08/31/2017
This project sought to use advanced computational methods to transform social, behavioral, and familial factors from the EHR into a rich longitudinal resource for generating knowledge regarding various determinants of health, including their temporal progression, severity, and relationship to health conditions.

Research Areas of Interest

Automated sentiment and topic analysis of medical training evaluation text

Medical post-graduate residency training and other aspects of medical training increasingly utilize electronic systems to evaluate trainee performance based on defined training competencies with quantitative and qualitative data, the latter of which typically consists of text comments. This work utilizes text-mining techniques to assist medical educators in the analysis of residency evaluations by identifying statement topics and performing sentiment analysis on statements. In addition to validating these techniques, this work aims to correlate automated findings with objective trainee outcomes.

Discovery of drug-drug interactions from biomedical literature

Drug-drug interactions (DDIs) are a serious concern in clinical practice as physicians strive to provide the highest quality patient care. While DDI lists are commonly used in clinical practice to alert clinicians during prescribing, many DDIs resulting from various pathways are not widely known. Such interactions may be indirectly derived from the scientific literature through informatics methods. The objective of this study is to use Semantic MEDLINE to uncover potential DDIs in clinical data.
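As a rough illustration of how interactions might be derived indirectly from literature, the sketch below chains subject-predicate-object triples of the kind Semantic MEDLINE stores. The triples, the entity names (`drug_a`, `enzyme_x`, and so on), and the choice of predicates are all hypothetical placeholders, and the two-hop rule shown is an assumed simplification, not the method used in this study.

```python
from collections import defaultdict

# Toy subject-predicate-object triples, styled after literature predications.
# Every row is a made-up placeholder, not a real extracted finding.
PREDICATIONS = [
    ("drug_a", "INHIBITS", "enzyme_x"),
    ("enzyme_x", "INTERACTS_WITH", "drug_b"),
    ("drug_c", "STIMULATES", "enzyme_y"),
    ("enzyme_y", "INTERACTS_WITH", "drug_d"),
    ("drug_a", "INHIBITS", "enzyme_z"),
]


def candidate_interactions(
    predications,
    drug_predicates=("INHIBITS", "STIMULATES"),
    link_predicate="INTERACTS_WITH",
):
    """Return (drug, intermediate, other_drug) chains found in two hops.

    A chain is reported when `drug` acts on an intermediate entity and that
    intermediate is linked to a different entity via `link_predicate`.
    """
    # Index outgoing edges by subject so the second hop is a dictionary lookup.
    by_subject = defaultdict(list)
    for subj, pred, obj in predications:
        by_subject[subj].append((pred, obj))

    candidates = []
    for subj, pred, obj in predications:
        if pred not in drug_predicates:
            continue
        for second_pred, second_obj in by_subject.get(obj, []):
            if second_pred == link_predicate and second_obj != subj:
                candidates.append((subj, obj, second_obj))
    return sorted(candidates)


if __name__ == "__main__":
    for drug, intermediate, other in candidate_interactions(PREDICATIONS):
        print(f"{drug} -> {intermediate} -> {other}")
```

Chains found this way are only hypotheses; they would still need to be checked against clinical data before being treated as potential DDIs.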

Identification of new versus redundant information from clinical notes

In EHR systems, a clinician can create new notes by “copying and pasting” text from previous notes. Additionally, some EHR systems “pull” known information such as the medication list, past medical history, and other parts of the record directly into clinical notes. This results in significant amounts of redundant information in clinical texts, which makes these notes harder to read and to sift through mentally. Redundant information also increases the length of clinical notes and de-emphasizes important new information, placing an additional cognitive load on the clinicians who must read and synthesize these notes, often in a time-constrained clinical environment with frequent interruptions. The goal of this research is to develop computational methods customized for clinical texts to identify new (non-redundant) information from the clinical notes in the EHR.
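To make the idea of separating new from redundant text concrete, here is a minimal baseline sketch that flags a sentence as new when it has low token overlap with every sentence in an earlier note. The Jaccard measure, the 0.8 threshold, the naive sentence splitter, and the example notes are all assumptions chosen for illustration; this is not the method developed in this research.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    """Jaccard overlap of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def new_sentences(previous_note, current_note, threshold=0.8):
    """Return sentences of current_note not closely matched in previous_note."""
    previous_sets = [tokenize(s) for s in split_sentences(previous_note)]
    result = []
    for sentence in split_sentences(current_note):
        tokens = tokenize(sentence)
        best = max((jaccard(tokens, p) for p in previous_sets), default=0.0)
        if best < threshold:
            result.append(sentence)
    return result


if __name__ == "__main__":
    earlier = "Patient has hypertension. Takes lisinopril daily."
    later = (
        "Patient has hypertension. Takes lisinopril daily. "
        "Reports new chest pain today."
    )
    print(new_sentences(earlier, later))
```

Real clinical notes would likely need a more robust sentence splitter and handling of lightly edited copies, which simple token overlap only partly captures.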

Information extraction from operative reports

As an important branch of medicine, surgery is concerned with the treatment of injuries or disorders of the body through operative procedures. Various factors such as the technique used, incision length, or supplies used (e.g., mesh type, prosthetic) can affect surgical patient outcomes. Surgeons need to determine the best way to perform procedures based on accessible sources of the best available evidence. The goal of this research is to develop methods that efficiently extract information on the techniques, instruments, materials, and other factors surrounding operative procedures from operative reports, and that present it in a succinct and easily comprehensible form for secondary uses such as summarization and high-throughput clinical research.

Semantic similarity and relatedness

Identifying semantically similar and related terms in the biomedical and clinical domains has proven useful in various Natural Language Processing (NLP) tasks such as Question-Answering and Information Extraction. This research seeks to leverage domain knowledge contained within biomedical thesauri such as the Unified Medical Language System (UMLS) and within clinical corpora to develop new methods, specific to clinical text, for computing semantic similarity and relatedness, and to incorporate these measures into NLP tasks.

We have developed the semantic similarity and relatedness package:
http://search.cpan.org/dist/UMLS-Similarity
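
For readers unfamiliar with thesaurus-based similarity, the sketch below shows one common style of measure: the reciprocal of the number of nodes on the shortest path between two concepts in an is-a hierarchy. The tiny hierarchy and concept names are toy placeholders, and the code is an independent Python illustration under those assumptions; it does not reproduce the package's code or interface.

```python
from collections import deque

# Toy is-a hierarchy (child -> list of parents). Illustrative only.
PARENTS = {
    "myocardial_infarction": ["heart_disease"],
    "heart_failure": ["heart_disease"],
    "heart_disease": ["cardiovascular_disease"],
    "stroke": ["cardiovascular_disease"],
    "cardiovascular_disease": ["disease"],
    "disease": [],
}


def build_graph(parents):
    """Turn child -> parents links into an undirected adjacency map."""
    graph = {node: set() for node in parents}
    for child, parent_list in parents.items():
        for parent in parent_list:
            graph.setdefault(parent, set()).add(child)
            graph[child].add(parent)
    return graph


def shortest_path_nodes(graph, start, goal):
    """Breadth-first search; returns the node count of the shortest path."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 1)])
    seen = {start}
    while queue:
        node, count = queue.popleft()
        if node == goal:
            return count
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, count + 1))
    return None


def path_similarity(graph, a, b):
    """Reciprocal of the shortest path length in nodes (0.0 if unconnected)."""
    count = shortest_path_nodes(graph, a, b)
    return 0.0 if count is None else 1.0 / count


if __name__ == "__main__":
    graph = build_graph(PARENTS)
    print(path_similarity(graph, "myocardial_infarction", "heart_failure"))
    print(path_similarity(graph, "myocardial_infarction", "stroke"))
```

Siblings under the same parent score higher than concepts that only share a more distant ancestor, which is the intuition path-based measures are meant to capture.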

Word sense disambiguation of acronyms, abbreviations, and symbols

The goal of this research is to develop effective techniques for word sense disambiguation (WSD) of acronyms, abbreviations, and symbols in clinical documents, an essential and unsolved issue for effective medical NLP systems. A key step towards this is work to build a comprehensive clinical sense inventory based upon the integration of available biomedical resources and upon senses from a large corpus of clinical notes. Our research explores issues related to optimization of automated machine-learning techniques including minimization of sample sizes, the contributions and values of different feature types, use of semi- and unsupervised techniques, techniques to deal with rare sense detection, and variation in window size and orientation used for extraction of features with machine-learning algorithms.
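
As a small illustration of what varying window size and orientation means for feature extraction, the sketch below collects the context words around an ambiguous token. The function name, parameters, and example sentence are hypothetical and chosen only for demonstration; they do not describe the group's actual feature pipeline.

```python
def context_window(tokens, target_index, window_size=3, orientation="both"):
    """Return lowercase context tokens around tokens[target_index].

    orientation selects which side of the target contributes features:
    "left", "right", or "both".
    """
    if orientation not in {"left", "right", "both"}:
        raise ValueError("orientation must be 'left', 'right', or 'both'")
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token list")

    left, right = [], []
    if orientation in ("left", "both"):
        left = tokens[max(0, target_index - window_size):target_index]
    if orientation in ("right", "both"):
        right = tokens[target_index + 1:target_index + 1 + window_size]
    return [token.lower() for token in left + right]


if __name__ == "__main__":
    sentence = "Patient with RA on methotrexate since 2019".split()
    target = sentence.index("RA")
    for side in ("left", "right", "both"):
        print(side, context_window(sentence, target, 2, side))
```

The resulting token lists could then be turned into bag-of-words features for a classifier that picks among the candidate senses of the abbreviation.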

Our work includes de-identified resources for symbols and acronyms/abbreviations available for research purposes: