Large-Scale Data Management & Analysis

KIT is actively pursuing solutions and support for the data life cycle in science. SCC carries out R&D for data-intensive science and operates large storage and analysis facilities.

The amount of data collected in science is rising fast, and in many disciplines the growing volume is reaching the limits of established data handling and processing. Data is the basis of modern research, and the key to new knowledge and competitive development lies in efficient and effective data management and subsequent analysis. The projects LSDF and LSDMA address the requirements of modern science for special-purpose facilities, infrastructure and support to handle the data life cycle for large amounts of data. The two projects are closely related: LSDMA builds on the services and infrastructure that the LSDF provides.

With funding from the state of Baden-Württemberg, the bwLSDF project evaluates technologies that give research institutes and universities across the state flexible access to central and federated storage and archives. On completion, bwLSDF will provide modern, secure, shared storage services for the scientific community in Baden-Württemberg, accessible from PDAs, desktops or HPC clusters.


All these tasks need the best people on our team. We are actively hiring seasoned administrators and developers, supervisors and senior team leaders, students and scientific experts. For details, see http://wiki.scc.kit.edu/lsdf/index.php/Positions



The Large Scale Data Facility (LSDF) aims to address the data management and processing requirements of several forthcoming data-intensive experiments at KIT. In particular, the High Throughput Microscopy (HTM) experiments used for biological studies of zebrafish and the tomography beamline of the synchrotron radiation source ANKA plan to collect several petabytes of data in the coming years. Providing not only the storage capacity but also the data management and analysis services was identified as a critical task, for which a unified campus-wide approach benefits from many synergies and relieves the communities of administering their own IT infrastructure. The LSDF at BioQuant of the University of Heidelberg is a peer project with similar resources and is connected through a dedicated 10-gigabit network link, which will be upgraded to 100 gigabit in 2013. The LSDF at BioQuant and the LSDF at KIT share computing and archival storage.

Currently the LSDF provides storage through common protocols as well as through an iRODS data grid. Data from experiments benefits from data management functionality if it is ingested with the tools developed within the project at the Institute for Data Processing and Electronics (IPE) of KIT. In addition, data analysis can be carried out using the project's cloud or Hadoop services.


The LSDF hosts around 5 PB of data and provides Hadoop and cloud resources to over 100 scientists at 27 KIT institutes. The LSDF will be part of the federated IT infrastructure of the Helmholtz Federation.

The LSDF was formally inaugurated during the LSDF-Kolloquium in February 2011.

A. García, S. Bourov, A. Hammad, J. van Wezel, B. Neumair, A. Streit, V. Hartmann, T. Jejkal, P. Neuberger, R. Stotzka. The Large Scale Data Facility: Data Intensive Computing for scientific experiments. Proceedings of The 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-11/IPDPS-11), IEEE Computer Society, 1467-1474 (2011), http://dx.doi.org/10.1109/IPDPS.2011.286

M. Sutter, V. Hartmann, M. Götter, J. van Wezel, A. Trunov, T. Jejkal, R. Stotzka. File Systems and Access Technologies for the Large Scale Data Facility. In: Remote Instrumentation for eScience and Related Aspects, Springer, 239-256 (2012), ISBN 978-1-4614-0508-5, http://dx.doi.org/10.1007/978-1-4614-0508-5_16

 Contact: Jos van Wezel, email: jos.vanwezel∂kit.edu




Modern science and scientific computing are about data. On the way from collection to publication, data is moved, aggregated, selected, visualized and analyzed. Given the steadily increasing amounts of data, this process must be organized and structured. Data management is the organization and structuring of the data life cycle, which allows faster results and dependable long-term references.


Reaching the fundamental goal of sustainably improving data analysis chains and data life cycles also depends on the availability and continued development of data management components. Standardized and generic tools have to be researched, developed and established promptly in a joint R&D program, run by data specialists and driven by user communities.

These two activities are reflected by the LSDMA project structure: several Data Life Cycle Labs, closely connected to five of the six Helmholtz Association research fields, are complemented by a Data Services Integration Team. Research focuses on:

  • Data-Intensive Computing and Application
  • Migration, Preservation and Curation
  • Universal Data Access
  • Storage System Design

The LSDMA project started on January 1, 2012. Its initial phase ends on December 31, 2016, but the project will be integrated into the sustainable programme-oriented funding framework of the Helmholtz Association. The project partners for the initial phase are four Helmholtz Association research centers (DESY, FZJ, GSI and KIT), six universities (HTW Berlin, Technical University of Dresden, University of Frankfurt, University of Hamburg, University of Heidelberg and University of Ulm), and the German Climate Computing Center (DKRZ).

Various LSDMA events take place to present and discuss current topics and challenges in Big Data. The annual International LSDMA Symposium is hosted by KIT.

Contact: Joerg Meyer, email: joerg.meyer2∂kit.edu