2025 Research Days
Binghamton Research Days Student Presentations

Imputation and k-NN based Haplotype Refinement of Simulated Ancient Mitochondrial Genomes

Authors: Nathaniel Plummer, Yaun Fang, Michel Shamoon-Pour, Ari Cozzarelli, Suhail Ghafoor, Laure Spake, Matthew Emery

Field of Study: Science, Technology, Engineering, and/or Math

Faculty Mentors: Matthew Emery

Easel: 40

Timeslot: Afternoon

Abstract: Ancient DNA (aDNA) is fragmented and degraded by post-mortem processes, which leads to low coverage and incomplete sequences. Mitochondrial DNA (mtDNA), despite its high copy number, undergoes the same degradation, so reconstructing whole mitochondrial genomes requires statistical imputation. While nuclear genome imputation is well established, ancient mtDNA imputation remains underexplored. This study benchmarks two mtDNA imputation pipelines: MitoIMP, and a novel pipeline that integrates Minimac4's Hidden Markov Model (HMM) with a k-Nearest Neighbors (k-NN) algorithm. Both pipelines were run against the largest mtDNA imputation reference panel to date (n = 46,000) and tested for reconstruction accuracy on 100 simulated ancient mtDNAs (≥0.25X coverage) that were generated with Gargammel and processed with EAGER (Efficient Ancient Genome Reconstruction). Imputation performance was evaluated using HaploGrep3 haplogroup classifications. The results show that combining an HMM with k-NN refinement significantly improves imputation accuracy, particularly at ultra-low coverage, and increases the likelihood of assigning accurate haplotypes to highly degraded aDNA and forensic DNA samples.
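To illustrate the general idea of k-NN haplotype refinement, here is a minimal sketch. It is not the authors' pipeline and does not reproduce Minimac4's HMM. It assumes a hypothetical setup in which an imputed mtDNA haplotype is a list of alleles with unresolved sites marked as `None`. The k reference haplotypes with the lowest mismatch rate over the query's observed sites are selected, and each unresolved site is filled by majority vote among those neighbors. The function names, the Hamming-style distance, and the default `k` are all illustrative assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple

Allele = Optional[str]  # hypothetical encoding: None marks an unresolved site


def mismatch_rate(query: List[Allele], ref: List[Allele]) -> Optional[float]:
    """Fraction of mismatches over sites observed in both haplotypes (assumed metric)."""
    compared = mismatches = 0
    for q, r in zip(query, ref):
        if q is None or r is None:
            continue  # only sites observed in both haplotypes count
        compared += 1
        if q != r:
            mismatches += 1
    return mismatches / compared if compared else None


def knn_refine(query: List[Allele], panel: List[List[Allele]], k: int = 3) -> List[Allele]:
    """Fill unresolved sites in `query` by majority vote of its k nearest panel haplotypes."""
    scored: List[Tuple[float, List[Allele]]] = []
    for ref in panel:
        d = mismatch_rate(query, ref)
        if d is not None:
            scored.append((d, ref))
    # Stable sort: ties keep their original panel order.
    scored.sort(key=lambda pair: pair[0])
    neighbors = [ref for _, ref in scored[:k]]

    refined = list(query)
    for i, allele in enumerate(query):
        if allele is not None:
            continue  # keep alleles that are already observed or imputed
        votes = Counter(ref[i] for ref in neighbors if ref[i] is not None)
        if votes:
            refined[i] = votes.most_common(1)[0][0]
    return refined


if __name__ == "__main__":
    # Toy five-site "panel" (hypothetical data, not a real mtDNA panel).
    panel = [list("ACGTA"), list("ACGTT"), list("TCGAA"), list("ACGAA")]
    query = ["A", None, "G", "T", None]
    print(knn_refine(query, panel, k=3))
```

In this toy case, the three nearest references are `ACGTA`, `ACGTT`, and `ACGAA`, so the sketch fills site 1 with `C` and site 4 with `A` (two votes to one). A real refinement step would likely weight neighbors, use phylogenetically informed distances, and work at the scale of a 16.5 kb mitogenome.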