“Semantic Harmonization of Alzheimer’s Disease Datasets Using AD-Mapper”
Authors: Philipp Wegner, Helena Balabin, Mehmet Can Ay, Sarah Bauermeister, Lewis Killin, John Gallacher, Martin Hofmann-Apitius, Yasamin Salimi, for the Alzheimer’s Disease Neuroimaging Initiative, the Japanese Alzheimer’s Disease Neuroimaging Initiative, the Aging Brain: Vasculature, Ischemia, and Behavior Study, the Alzheimer’s Disease Repository Without Borders Investigators, and the European Prevention of Alzheimer’s Disease (EPAD) Consortium
Abstract:
Background: Despite numerous past endeavors for the semantic harmonization of Alzheimer’s disease (AD) cohort studies, an automatic tool has yet to be developed.
Objective: As cohort studies form the basis of data-driven analysis, harmonizing them is crucial for cross-cohort analysis. We aimed to accelerate this task by constructing an automatic harmonization tool.
Methods: We created a common data model (CDM) through cross-mapping data from 20 cohorts, three CDMs, and ontology terms, which was then used to fine-tune a BioBERT model. Finally, we evaluated the model using three previously unseen cohorts and compared its performance to a string-matching baseline model.
Results: Here, we present our AD-Mapper interface for automatic harmonization of AD cohort studies, which outperformed a string-matching baseline on previously unseen cohort studies. We showcase our CDM comprising 1218 unique variables.
Conclusion: AD-Mapper leverages semantic similarities in naming conventions across cohorts to improve mapping performance.
DOI: 10.3233/JAD-240116
Published online on 11 June 2024 in the Journal of Alzheimer’s Disease
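
Illustrative sketch: The abstract compares AD-Mapper to a string-matching baseline but does not describe how that baseline works. The snippet below is a minimal sketch of what such a baseline could look like, assuming a character-level similarity from Python's standard difflib module. The function names, the 0.6 threshold, and the example CDM terms are all hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical CDM terms, used for illustration only (not from the paper's CDM).
CDM_TERMS = ["Mini-Mental State Examination", "Age", "Education Years", "APOE Genotype"]


def normalize(name: str) -> str:
    """Lowercase a variable name and treat underscores and hyphens as spaces."""
    return " ".join(name.lower().replace("_", " ").replace("-", " ").split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def map_variable(source: str, cdm_terms, threshold: float = 0.6):
    """Return (best CDM term, score), or (None, score) if the score is below threshold.

    The threshold is an assumed value chosen for this sketch.
    """
    best_term, best_score = None, 0.0
    for term in cdm_terms:
        score = similarity(source, term)
        if score > best_score:
            best_term, best_score = term, score
    if best_score < threshold:
        return None, best_score
    return best_term, best_score


if __name__ == "__main__":
    for cohort_variable in ["mini_mental_state_exam", "AGE", "MMSE"]:
        print(cohort_variable, "->", map_variable(cohort_variable, CDM_TERMS))
```

In this sketch, a cohort variable that is spelled close to its CDM term, such as "mini_mental_state_exam", is mapped, while an abbreviation such as "MMSE" shares too few characters with any term and is left unmapped. Gaps of this kind are one plausible reason a model that captures semantic similarity could outperform plain string matching.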