The amount of data collected by science is rising fast, and in many sciences the increasing amount of data is reaching the limits of established data handling and processing. Data is the basis of modern research, and the key to new knowledge and competitive development lies in efficient and effective data management and subsequent analysis. The projects LSDF and LSDMA address the requirements of modern science for special-purpose facilities, infrastructures and support to handle the data life cycle for large amounts of data. LSDMA and LSDF are closely related: LSDMA builds on the services and infrastructure that the LSDF provides.
With funding from the state of Baden-Württemberg, the project bwLSDF evaluates technologies that give research institutes and universities statewide flexible access to central and federated storage and archives. On completion, bwLSDF will provide modern, secure shared storage services for the scientific community in Baden-Württemberg, accessible from PDAs, desktops or HPC clusters.
The Large Scale Data Facility (LSDF) aims to address the data management and processing requirements of several forthcoming data-intensive experiments at KIT. In particular, the High Throughput Microscopy (HTM) experiments used for biological studies of zebrafish and the tomography beamline of the synchrotron radiation source ANKA are planning to collect several petabytes of data in the coming years. Providing not only the storage capacity but also the data management and analysis services was identified as a critical aspect, where a unified campus-wide approach would profit from many synergies and relieve the communities from administering an IT infrastructure. The LSDF at BioQuant of the University of Heidelberg is a peer project with similar resources and is connected via a dedicated 10 gigabit network link. The link will be upgraded to 100 gigabit in 2013. The LSDF at BioQuant and the LSDF at KIT share computing and archival storage.
Currently, the LSDF provides storage through common protocols as well as through an iRODS data grid. Data from experiments benefits from data management functionality if it is ingested with the tools developed within the project at the Institute for Data Processing and Electronics (IPE) of KIT. Additionally, data analysis can be carried out using the project's cloud or Hadoop services.
Currently, the LSDF hosts around 1 PB of data and provides Hadoop and cloud resources to over 100 scientists at 17 KIT institutes. The LSDF will be part of the federated IT infrastructure of the Helmholtz Federation.
The LSDF was inaugurated with a ceremony during the LSDF-Kolloquium in February 2011.
A. García, S. Bourov, A. Hammad, J. van Wezel, B. Neumair, A. Streit, V. Hartmann, T. Jejkal, P. Neuberger, R. Stotzka. The Large Scale Data Facility: Data Intensive Computing for scientific experiments. Proceedings of The 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-11/IPDPS-11), IEEE Computer Society, 1467-1474 (2011), http://dx.doi.org/10.1109/IPDPS.2011.286
M. Sutter, V. Hartmann, M. Götter, J. van Wezel, A. Trunov, T. Jejkal, R. Stotzka. File Systems and Access Technologies for the Large Scale Data Facility. In: Remote Instrumentation for eScience and Related Aspects, Springer, 2012, ISBN 978-1-4614-0508-5, pages 239-256, http://dx.doi.org/10.1007/978-1-4614-0508-5_16
Contact: Jos van Wezel, email: jos.vanwezel∂kit.edu
Big Data Spin Off
The startup company da-cons specializes in applied analysis, visualisation and archival of big image data sets, primarily in the areas of biology and medicine. Its novel software helps scientists extract information from images and is based on developments and expertise from various departments of KIT. Together with SCC researchers, da-cons brings science to public enterprise. da-cons received funding from BMBF through the EXIST program.