• Genome Data Science

    We develop methods and tools to work with tens of thousands of genomes and analyze and integrate the corresponding data.

392107 Data Structures in Pangenomics

392107 Schönhuth / Ghaffaari Summer 2026 Thu 12-14 in S1-147 (S)

Structure

  • Individual presentations (S): 22 minutes, followed by 5 minutes of discussion
  • Report (S): 10-12 pages, excluding references but including the title, table of contents, figures, and tables
  • Blog (S): approximately 30 minutes of reading time
  • Coding project (Ü): implement and reproduce the results of the paper you selected for your presentation, using the datasets provided in the paper where possible.

Papers

The papers are available on Google Drive. The link is password-protected.
If you have trouble accessing a paper, email Jasper Matzat.

Timetable of seminar sessions

Date | Topic | Presenter
16.04.2026 | Introduction to seminar, Course organisation, How to present |
23.04.2026 | (no session) |
30.04.2026 | (no session) |
07.05.2026 | (no session) |
14.05.2026 | (no session) |
21.05.2026 | (no session) |
28.05.2026 | INTRODUCTORY SESSION, Report Writing Guidelines |
04.06.2026 | (holiday) |
11.06.2026 | Block presentations (3) |
18.06.2026 | Elastic-Degenerate String Matching via Fast Matrix Multiplication | Matthis
18.06.2026 | Nucleotide Transformer: building and evaluating robust foundation models for human genomics | Eylül
18.06.2026 | Phyloformer: Fast, Accurate, and Versatile Phylogenetic Reconstruction with Deep Neural Networks | Lena
25.06.2026 | ODGI: understanding pangenome graphs | Lukas
25.06.2026 | Wheeler graphs: A framework for BWT-based data structures | Malte
25.06.2026 | Efficient and accurate search in petabase-scale sequence repositories | Breia
02.07.2026 | Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA | Lea
02.07.2026 | Movi: A fast and cache-efficient full-text pangenome index | Özay
02.07.2026 | Geometric deep learning framework for de novo genome assembly | Tim
09.07.2026 | Efficient dynamic variation graphs | Nityakrushna
09.07.2026 | Block presentations (2) |
16.07.2026 | CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS | Timo
16.07.2026 | Mem-based pangenome indexing for k-mer queries | Nils
16.07.2026 | Pangenome Graph Indexing via the Multidollar-BWT | Alexander
23.07.2026 | Block presentations (3) |