Data management is an important factor for future exascale HPC systems. Data locality and efficient data access during a simulation are of great importance and contribute substantially to application performance.
At the processor level, efficient methods for caching data already exist; however, access to global parallel file systems is still a limiting factor. Some HPC systems already have local storage in their compute nodes, but its limited capacity makes it unsuitable in many cases, so data must still be fetched from the global HPC file system.
HPC file systems are usually used as a "shared" medium, i.e. their bandwidth and performance are divided among all active jobs and processes. As a result, an application can hardly predict the future I/O load of the HPC file system. Since performance is also limited by the interface between the global parallel file system and the compute nodes, moving the data closer to the compute nodes could increase performance considerably.
An ad-hoc overlay file system can improve the I/O performance of highly parallel applications. To this end, the project evaluates how temporary file systems can be provided efficiently in HPC environments. Each such file system is created exclusively for one specific job and exists only for the duration of the simulation on the allocated compute nodes. The required data should already reside on this private file system before the simulation starts, which is to be achieved by integrating the file system into the scheduling system of the HPC system. After the job finishes, the data has to be migrated back to the global file system.
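The job lifecycle described above (stage-in, run, stage-out, teardown) can be sketched as follows. This is a minimal single-node illustration under stated assumptions, not the project's implementation: a node-local temporary directory stands in for the ad-hoc file system, plain file copies stand in for the data transfers, and the function `run_with_adhoc_fs` and its parameters are hypothetical. In a real deployment the file system would span all allocated nodes, and the scheduler would trigger the stage-in before the job starts.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_adhoc_fs(
    global_input: Path,
    global_output: Path,
    job: Callable[[Path], None],
) -> None:
    """Hypothetical lifecycle of a job-private, temporary file system.

    A temporary directory stands in for the ad-hoc file system; a real one
    would be spread across all compute nodes allocated to the job.
    """
    # Create the private file system exclusively for this job.
    adhoc = Path(tempfile.mkdtemp(prefix="adhoc-fs-"))
    try:
        # Stage-in: copy the input data before the simulation starts.
        shutil.copytree(global_input, adhoc / "input")
        (adhoc / "output").mkdir()
        # The simulation only touches the private file system.
        job(adhoc)
        # Stage-out: migrate the results back to the global file system.
        shutil.copytree(adhoc / "output", global_output, dirs_exist_ok=True)
    finally:
        # The ad-hoc file system exists only for the duration of the job.
        shutil.rmtree(adhoc)
```

Putting the teardown in a `finally` clause means the temporary file system is removed even if the simulation fails; note that in that case no stage-out takes place.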
The research approach covers both the design of the file system itself and the choice of a suitable scheduling strategy for planning the necessary I/O transfers.
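To illustrate what planning the I/O transfers might involve, the following sketch computes an as-late-as-possible schedule for stage-in transfers that share a single link. Everything here is an assumption for illustration: a fixed link bandwidth, predicted job start times supplied by the batch scheduler, and the hypothetical names `StageIn` and `plan_stage_ins`. It is a standard backward-scheduling heuristic, not the strategy developed in the project.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class StageIn:
    job_id: str
    data_bytes: float  # volume to copy onto the job's ad-hoc file system
    job_start: float   # predicted job start time (seconds), assumed known


def plan_stage_ins(
    requests: Iterable[StageIn], bandwidth: float, now: float = 0.0
) -> Dict[str, Tuple[float, float]]:
    """Backward (as-late-as-possible) plan for transfers sharing one link.

    Returns {job_id: (transfer_start, transfer_end)}. Raises ValueError if
    a transfer would have to start before `now` to finish in time.
    """
    plan: Dict[str, Tuple[float, float]] = {}
    # Time from which the link is occupied by an already planned transfer.
    link_busy_from = float("inf")
    # Plan the latest-starting job first, then work backwards in time.
    for req in sorted(requests, key=lambda r: r.job_start, reverse=True):
        end = min(req.job_start, link_busy_from)
        start = end - req.data_bytes / bandwidth
        if start < now:
            raise ValueError(f"stage-in for {req.job_id} cannot finish in time")
        plan[req.job_id] = (start, end)
        link_busy_from = start
    return plan
```

For example, with a bandwidth of 10 bytes/s, a job starting at t = 120 s that needs 400 bytes is staged in during [80, 120], and a job starting at t = 100 s that needs 200 bytes is moved to [60, 80] because the link is already busy from t = 80. Starting each transfer as late as possible keeps node-local storage free for as long as possible.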
Partners and funding
- Center for Information Services and High Performance Computing (ZIH) at the Technical University Dresden
- Centre for Data Processing (ZDV) at the University of Mainz
- Steinbuch Centre for Computing (SCC) at Karlsruhe Institute of Technology
The ongoing project (02/2016 to 01/2019) is funded by the German Research Foundation (DFG) under the priority program "Software for Exascale Computing" (SPPEXA).