Enabling HPC for “Big Data” Physics

Many areas of fundamental science are facing a drastic increase in data volumes and hence a corresponding increase in computing requirements for data reconstruction, simulation and analysis. Traditional infrastructure, such as specialised compute centres dedicated to individual science areas, cannot handle these new requirements efficiently on its own. In recent years, a number of successful projects have demonstrated that many of these compute tasks can also be executed efficiently on HPC systems. This offers the opportunity to dynamically and transparently complement the existing computing centres of the large scientific communities with large-scale HPC resources such as the new HoreKa supercomputer at KIT. However, practical application at scale faces three challenges: provisioning existing scientific software stacks, efficient multi-level scheduling that respects existing global workflow management systems, and transparent, performant use of very large remote data repositories. In this proposal, we address the most relevant issues to ensure the stable and sustainable operation of HoreKa for applications typical of particle physics (High Energy Physics, “HEP”) and similar fields such as Hadron and Nuclear Physics (“HaN”) and Astroparticle Physics (“ATP”). Interesting further steps at a later stage are the inclusion of workflows relying on GPUs and the implementation of caching methods to enable fast, repeated access to data sets for the final analysis by individual scientists.