Research
My research interests lie at the intersection of data management and machine learning, focusing on how to make machine learning scalable, decentralized, and private – ideally all at once. I am particularly passionate about building data infrastructures that benefit the common good.
In my PhD research, I am developing systems that address data management problems in end-to-end machine learning pipelines. My work aims to improve the efficiency of existing pipelines and extend machine learning to new applications. Specifically, I focus on dataset search for machine learning use cases and on optimizing data loading pipelines during model training, drawing on techniques from systems engineering, relational databases, data integration, data discovery, data compression, and machine learning.
Beyond my core research, I collaborate on a range of data management projects and advise student theses. I am always eager to work with dedicated students and researchers. If you are interested in writing a BSc/MSc thesis with me or working as a research assistant at BIFOLD, feel free to email me with your CV and research interests.
Publications
Teaching Large-Scale Data Management to Large Cohorts of Undergraduate Students
Lennart Behme, Gereon Dusella, Rudi Poepsel Lemaitre, Alexander Borusan, and Volker Markl.
4th International Workshop on Data Systems Education @ SIGMOD (2025).
Finding What You're Looking For: A Distribution-Aware Dataset Search Engine in Action
Lennart Behme, Leonard Geißler, Pratham Agrawal, Emil Badura, Benjamin Ueber, Kaustubh Beedkar, and Volker Markl.
ACM International Conference on Management of Data (2025).
CompoDB: A Demonstration of Modular Data Systems in Practice
Haralampos Gavriilidis, Lennart Behme, Christian Munz, Varun Pandey, and Volker Markl.
Proceedings of the 28th International Conference on Extending Database Technology (2025).
Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search
Lennart Behme, Sainyam Galhotra, Kaustubh Beedkar, and Volker Markl.
Proceedings of the VLDB Endowment, Volume 17, Issue 10, Pages 3269–3282 (2024).
The Art of Losing to Win: Using Lossy Image Compression to Improve Data Loading in Deep Learning Pipelines
Lennart Behme, Saravanan Thirumuruganathan, Alireza Rezaei Mahdiraji, Jorge-Arnulfo Quiané-Ruiz, and Volker Markl.
39th IEEE International Conference on Data Engineering (2023).
Towards a Modular Data Management System Framework
Haralampos Gavriilidis, Lennart Behme, Sokratis Papadopoulos, Stefano Bortoli, Jorge-Arnulfo Quiané-Ruiz, and Volker Markl.
1st International Workshop on Composable Data Management Systems @ VLDB (2022).
You can also find me on dblp or Google Scholar.
Awards
- Software Campus grant for the project “Federated-Data-as-a-Service”
- Honored by the Electrical Engineering and Computer Science Faculty at TU Berlin for earning one of the three best Information Systems Management degrees of 2021/22
- Five-time recipient of the Deutschlandstipendium scholarship (awarded to 1.5% of the student body)
Teaching
Information Systems and Data Analysis (ISDA)
- Summer ’25, Summer ’24, Summer ’23, Summer ’22
Big Data Systems Project (BDSPRO)
- Winter ’24, Winter ’25
Database Lab (DBPRA)
- Winter ’23, Winter ’22
Seminar on Advanced Topics in Database and Information Systems (DBSEM)
- Summer ’22
Database Project (DBPRO)
- Summer ’22
Research Oriented Course on Data Science and Engineering Systems and Technologies (ROC)
- Winter ’21
Seminar on Hot Topics in Information Management (IMSEM)
- Winter ’21
Advising
Hardware-Conscious Performance Engineering of Fainder for Distribution-Aware Dataset Search
Tarik Abu Mukh, M.Sc.
Investigating the Platonic Representation Hypothesis on Table Embeddings
Emil Badura, B.Sc.
Towards an End-to-End Dataset Search Engine for Distribution-Aware Dataset Search
Leonard Geißler, B.Sc.
Investigating Joinable Table Search Methods for Dataset Search over Federated Data Repositories
Till Kurek, B.Sc.
Investigating the Potential of Wavelet Transforms as a Dataset Summary for Answering Data Discovery Queries
Johanna Schicktanz, B.Sc.
Query Relaxation for Numerical Query Predicates
Seungmi Lee, B.Sc.
Investigating the Effectiveness of Cross-Silo Datasets for Training Federated Learning Models
Florian Haberkorn, B.Sc.
Community Service
External reviewer SIGMOD 2023
Member of the Availability Committee SIGMOD 2023
External reviewer ICDE 2023
External reviewer CIDR 2023