SeDIE: A Semantic-Driven Engine for Integration of Healthcare Data

The wider adoption of Electronic Health Records (EHR) has produced clinical data on a massive scale, stored in distributed and heterogeneous repositories in both structured and unstructured formats. These data are critically important for analyzing patient health. For an effective analysis, however, data integration is a sine qua non: it provides a unified view that enables healthcare professionals to extract meaningful information from data collected from various sources. Several standards have been developed to improve data integration and system interoperability. RDF, a Semantic Web standard with a flexible, schema-less data model, can facilitate linking data from heterogeneous sources. HL7 messaging is the most successful standard for medical data exchange and facilitates interoperability between health systems. Converting HL7 messages to RDF therefore provides an efficient means of performing graph analysis on medical data, and enables data scientists to develop graph-based analytics models. However, unstructured data such as physicians' notes and pathology reports very often hinder the integration process, and using them in analysis alongside structured data poses a considerable challenge. Unfortunately, state-of-the-art solutions are not able to perform such integration efficiently.
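To make the HL7-to-RDF idea concrete, the following minimal sketch flattens an HL7 v2 message into RDF-style triples, one per non-empty field. It is an illustration only, not the conversion used by SeDIE: the `http://example.org/hl7/` namespace, the function name `hl7_to_triples`, and the sample message are all assumptions, and components separated by `^` are deliberately left unsplit.

```python
# Hypothetical namespace chosen for illustration; not taken from the paper.
BASE = "http://example.org/hl7/"

# Invented sample message with a patient (PID) and an observation (OBX) segment.
SAMPLE = (
    "PID|1||12345^^^Hosp^MR||Doe^John||19800101|M\r"
    "OBX|1|NM|8867-4^Heart rate^LN||72|/min"
)


def hl7_to_triples(message, message_id):
    """Return (subject, predicate, object) triples as N-Triples terms."""
    triples = []
    # HL7 v2 separates segments with carriage returns; accept newlines too.
    segments = [s for s in message.replace("\r", "\n").split("\n") if s.strip()]
    for seg_index, segment in enumerate(segments):
        fields = segment.split("|")
        seg_name = fields[0]
        subject = f"<{BASE}{message_id}/{seg_name}/{seg_index}>"
        # In MSH the field separator itself counts as MSH-1, so the first
        # value after the segment name is MSH-2; elsewhere it is field 1.
        first = 2 if seg_name == "MSH" else 1
        for position, value in enumerate(fields[1:], start=first):
            if not value:
                continue  # skip empty fields
            predicate = f"<{BASE}{seg_name}.{position}>"
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, predicate, f'"{escaped}"'))
    return triples


if __name__ == "__main__":
    for s, p, o in hl7_to_triples(SAMPLE, "msg1"):
        print(f"{s} {p} {o} .")
```

Each printed line is one statement in N-Triples syntax, so the output can be loaded into any RDF store and queried as a graph.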

In this work, we propose SeDIE, a semantic-driven engine for integrating healthcare data. It is built on a novel approach that combines a statistical method with a Multiple-Criteria Decision-Making (MCDM) model to overcome the barriers to integrating unstructured and structured data.
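The abstract does not say which MCDM model SeDIE uses. Purely as a generic illustration of the idea, the sketch below applies the weighted-sum model, one of the simplest MCDM techniques, to rank candidate target segments for an extracted entity. The criteria names (`type_match`, `frequency`), the weights, and the scores are invented for this example.

```python
def weighted_sum_rank(alternatives, weights):
    """Rank alternatives by the weighted sum of their criterion scores.

    alternatives: {name: {criterion: score in [0, 1]}}
    weights:      {criterion: non-negative weight}
    Returns a list of (name, score) pairs, best first.
    """
    total_weight = sum(weights.values())
    scored = {
        name: sum(weights[c] * scores[c] for c in weights) / total_weight
        for name, scores in alternatives.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidates: which HL7 segment should an entity be linked to?
CANDIDATES = {
    "DG1": {"type_match": 1.0, "frequency": 0.6},
    "OBX": {"type_match": 0.5, "frequency": 0.9},
    "PID": {"type_match": 0.0, "frequency": 0.1},
}
WEIGHTS = {"type_match": 0.7, "frequency": 0.3}

if __name__ == "__main__":
    for name, score in weighted_sum_rank(CANDIDATES, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

Other MCDM models, such as TOPSIS or AHP, would replace only the scoring step; the inputs (alternatives, criteria, weights) keep the same shape.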

In our approach, HL7 message segments are annotated with predefined medical semantic types, and the medical entities extracted from free text are mapped to the corresponding semantic types of those segments. Moreover, we use the Unified Medical Language System (UMLS) Metathesaurus to provide a link between different terminologies. Our experimental results show that the use of semantic techniques and healthcare interoperability tools enhances data integration, enabling researchers to discover data regardless of provenance and format by using a Scored-SPARQL query, thereby promoting improved outcomes in the healthcare industry.
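The mapping step described here can be sketched as a lookup from semantic types to annotated segments. This is a simplified illustration under stated assumptions: the segment annotations in `SEGMENT_TYPES` are hypothetical, the entities and their semantic type labels are assigned by hand rather than produced by an extraction tool or a UMLS lookup, and the function name `map_entities` is invented.

```python
# Hypothetical annotation of HL7 segments with UMLS-style semantic types.
SEGMENT_TYPES = {
    "DG1": {"Disease or Syndrome"},
    "RXA": {"Pharmacologic Substance"},
    "OBX": {"Laboratory Procedure", "Sign or Symptom"},
}

# Entities as (text, semantic type) pairs, labeled by hand for illustration.
ENTITIES = [
    ("pneumonia", "Disease or Syndrome"),
    ("aspirin", "Pharmacologic Substance"),
    ("fever", "Sign or Symptom"),
    ("Tuesday", "Temporal Concept"),
]


def map_entities(entities, segment_types):
    """Map each entity to the segments annotated with its semantic type."""
    mapping = {}
    for text, semantic_type in entities:
        mapping[text] = sorted(
            segment
            for segment, types in segment_types.items()
            if semantic_type in types
        )
    return mapping


if __name__ == "__main__":
    for text, segments in map_entities(ENTITIES, SEGMENT_TYPES).items():
        print(text, "->", segments or "no matching segment")
```

An entity whose semantic type matches no annotated segment maps to an empty list, which leaves room for a later scoring step to decide how to handle it.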
