Improving a full-text search engine: the importance of negation detection and family history context to identify cases in a biomedical data warehouse
Garcelon Nicolas, Neuraz Antoine, Benoit Vincent, Salomon Rémi, Burgun Anita
Objective: Repurposing electronic health records (EHRs) can improve clinical and genetic research on rare diseases. However, much of the significant information in rare disease EHRs is embedded in narrative reports, which contain many negated clinical signs and mentions of family medical history. This paper presents a method to detect family history and negation in narrative reports and evaluates its impact on selecting populations from a clinical data warehouse (CDW).
Materials and Methods: We developed a pipeline to process 1.6 million reports from multiple sources. This pipeline is part of the load process of the Necker Hospital CDW.
Results: We identified patients with “Lupus and diarrhea,” “Crohn’s and diabetes,” and “NPHP1” from the CDW. The overall precision, recall, specificity, and F-measure were 0.85, 0.98, 0.93, and 0.91, respectively.
Conclusion: The proposed method enables highly accurate identification of cases from a CDW of rare disease EHRs.
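The reported metrics can be related to one another with a short calculation. The following is a minimal Python sketch, assuming the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function names are illustrative and are not part of the authors' pipeline.

```python
# Illustrative helpers only; names and structure are assumptions,
# not taken from the Necker Hospital CDW pipeline.

def precision(tp: int, fp: int) -> float:
    """Fraction of retrieved patients that are true cases."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of true cases that were retrieved."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-cases that were correctly left out."""
    return tn / (tn + fp)


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    reported_precision = 0.85
    reported_recall = 0.98
    print(round(f_measure(reported_precision, reported_recall), 2))
```

Under this assumption, the reported precision (0.85) and recall (0.98) give an F-measure that rounds to the reported 0.91.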