On 17 March 2020, the SCC will commission the new parallel computer system "bwUniCluster 2.0+GFB-HPC" (also known as bwUniCluster 2.0) as a state service within the framework of the Baden-Württemberg Implementation Concept for High Performance Computing (bwHPC). bwUniCluster 2.0 replaces its predecessor "bwUniCluster" and also incorporates the extension of that system procured in November 2016.
The system consists of more than 840 SMP nodes with 64-bit Intel Xeon processors. It provides the universities of Baden-Württemberg with basic computing capacity and can be used free of charge by staff of all universities in the state.
Each state university manages access authorization to this system for its own employees. Users who currently have access to bwUniCluster 1 automatically also have access to bwUniCluster 2.0+GFB-HPC; there is no need to apply for new entitlements or to re-register. Further details on registration and access to this state service are available at wiki.bwhpc.de/e/bwUniCluster_2.0.
Configuration of the system
The bwUniCluster 2.0 comprises:
- 4 login nodes ("Cascade Lake" architecture), each with 40 cores and 384 GB main memory,
- 2 login nodes ("Broadwell" architecture), each with 20 cores and 128 GB main memory,
- 100 HTC compute nodes (Cascade Lake), each with 40 cores and 96 GB main memory,
- 360 HPC compute nodes (Cascade Lake), each with 40 cores and 96 GB main memory,
- 8 "fat" compute nodes (Cascade Lake), each with 80 cores and 3 TB main memory,
- 14 GPU compute nodes (Cascade Lake), each with 40 cores, 384 GB main memory and 4x NVIDIA Tesla V100 (32 GB),
- 10 GPU compute nodes (Cascade Lake), each with 40 cores, 768 GB main memory and 8x NVIDIA Tesla V100 (32 GB),
- 352 compute nodes (Broadwell), each with 28 cores and 128 GB main memory (extension of the old bwUniCluster).
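As a rough cross-check of the list, the following sketch tallies compute nodes, cores, GPUs and main memory. The `NodeType` class and partition labels are illustrative, login nodes are excluded, and all memory sizes are treated as GiB (with 3 TB = 3072 GiB), which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    count: int      # number of nodes of this type
    cores: int      # cores per node
    mem_gib: int    # main memory per node in GiB (assumed unit)
    gpus: int = 0   # GPUs per node


# Compute partitions as listed above; login nodes are deliberately excluded.
COMPUTE_NODES = [
    NodeType("HTC (Cascade Lake)", 100, 40, 96),
    NodeType("HPC (Cascade Lake)", 360, 40, 96),
    NodeType("fat (Cascade Lake)", 8, 80, 3072),
    NodeType("GPU, 4x V100", 14, 40, 384, gpus=4),
    NodeType("GPU, 8x V100", 10, 40, 768, gpus=8),
    NodeType("HPC (Broadwell, extension)", 352, 28, 128),
]


def totals(nodes):
    """Sum node, core, GPU and memory counts over all node types."""
    return {
        "nodes": sum(n.count for n in nodes),
        "cores": sum(n.count * n.cores for n in nodes),
        "gpus": sum(n.count * n.gpus for n in nodes),
        "mem_gib": sum(n.count * n.mem_gib for n in nodes),
    }


if __name__ == "__main__":
    for key, value in totals(COMPUTE_NODES).items():
        print(f"{key:>8}: {value}")
```

With these inputs the tally comes to 844 compute nodes, 29,856 cores, 136 GPUs and 126,848 GiB of main memory. The system-wide totals quoted elsewhere in this text may be counted differently (for instance by including login nodes or using other units), so the sketch is not meant to reproduce them exactly.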
InfiniBand serves as the interconnect, in several generations and expansion stages:
The "high throughput" and "fat" nodes are connected via InfiniBand HDR100 at 100 Gbit/s. Since these nodes are not intended for jobs spanning more than one node, the network topology in this area has a high blocking factor.
The "HPC" and "GPU" nodes are also connected via InfiniBand HDR100 at 100 Gbit/s. Since these nodes are intended for multi-node parallel jobs, the network topology in this area is non-blocking.
The "HPC" nodes of the extension partition are connected via InfiniBand FDR at 56 Gbit/s, likewise non-blocking.
The file systems are connected via InfiniBand EDR at 100 Gbit/s.
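The blocking factor of a network region is commonly expressed as the ratio between the bandwidth a leaf switch offers its attached nodes and the bandwidth of its uplinks into the rest of the fabric; a ratio of 1 means non-blocking. A minimal sketch, with hypothetical port counts that do not describe the actual bwUniCluster 2.0 cabling:

```python
from fractions import Fraction


def blocking_factor(node_links: int, node_gbit: int, uplinks: int, uplink_gbit: int) -> Fraction:
    """Ratio of node-facing bandwidth to uplink bandwidth on one leaf switch.

    1 means non-blocking; values above 1 mean the nodes on this switch can
    oversubscribe the uplinks when they talk to nodes on other switches.
    """
    if uplinks <= 0 or uplink_gbit <= 0:
        raise ValueError("a leaf switch needs at least one uplink with positive bandwidth")
    return Fraction(node_links * node_gbit, uplinks * uplink_gbit)


# Hypothetical port counts, NOT the real bwUniCluster 2.0 topology.
print(blocking_factor(node_links=40, node_gbit=100, uplinks=20, uplink_gbit=200))  # 1
print(blocking_factor(node_links=60, node_gbit=100, uplinks=10, uplink_gbit=200))  # 3
```

With a factor of 3, in the worst case (all nodes on the switch communicating with nodes on other switches at once) each node gets only a third of its link bandwidth. That is acceptable for single-node HTC jobs, but not for tightly coupled parallel jobs.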
bwUniCluster 2.0 is a massively parallel computer with a total of 848 nodes. Across the entire system, this yields a theoretical peak performance of approx. 1.4 PetaFLOPS and a total main memory of approx. 119 TB.
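A theoretical peak figure of this kind is usually obtained by summing, over all node types $i$, the product of node count $N_i$, cores per node $c_i$, clock frequency $f_i$ and double-precision floating-point operations per core per cycle $F_i$, with GPU peaks added separately. The worked line for a single thin node assumes $F = 32$ (two AVX-512 FMA units per Cascade Lake core) at the 2.1 GHz base clock; both values are assumptions, not figures from this text.

```latex
\begin{align*}
R_\mathrm{peak} &= \sum_i N_i \, c_i \, f_i \, F_i \;+\; R_\mathrm{GPU} \\
R_\mathrm{thin\ node} &= 40 \times 2.1\,\mathrm{GHz} \times 32\,\tfrac{\mathrm{FLOP}}{\mathrm{cycle}}
  = 2688\,\mathrm{GFLOP/s}
\end{align*}
```

Under sustained AVX-512 load these processors run below their base clock, and the system figure stated above may rest on different assumptions, so this is an illustration of the method rather than a reconstruction of that number.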
The base operating system on each node is Red Hat Enterprise Linux (RHEL) 7.x. The cluster management software is KITE, a software environment developed at SCC for operating heterogeneous compute clusters.
The global file system is the scalable parallel file system Lustre, connected via a separate InfiniBand network. Using multiple Lustre Object Storage Target (OST) servers and Metadata Servers (MDS) provides both high scalability and redundancy if individual servers fail. The new parallel file system, procured together with bwUniCluster 2.0+GFB-HPC, has a total capacity of about 5 PB and an aggregate throughput of about 72 GB/s.
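Much of Lustre's scalability comes from striping: with the default RAID-0 style layout, a file is cut into chunks of `stripe_size` bytes that are distributed round-robin over `stripe_count` OSTs, so large reads and writes hit several servers in parallel. The sketch below maps a byte offset to the stripe index (the position in the file's own OST list, not a global OST number) and to the offset inside that OST's object. The function name and the 1 MiB / 4 OST layout are hypothetical; the actual layout of a file can be inspected with `lfs getstripe`.

```python
def stripe_location(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a file byte offset to (stripe index, offset inside that OST object)
    for a plain round-robin (RAID-0 style) Lustre layout."""
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe size and count must be > 0")
    chunk = offset // stripe_size           # which stripe_size chunk of the file
    index = chunk % stripe_count            # round-robin over the file's OSTs
    full_rounds = chunk // stripe_count     # complete passes over all OSTs before this chunk
    obj_offset = full_rounds * stripe_size + offset % stripe_size
    return index, obj_offset


MiB = 1024 * 1024
# Hypothetical layout: 1 MiB stripes over 4 OSTs.
print(stripe_location(0, MiB, 4))             # (0, 0)
print(stripe_location(5 * MiB + 10, MiB, 4))  # (1, 1048586)
```

The sixth MiB of the file lands on the second OST of the layout, one full stripe into that OST's object, which is why sequential I/O on a widely striped file spreads evenly across servers.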
Short technical description of the nodes:
- Thin nodes (HTC+HPC): 2x Intel Xeon Gold 6230 (20 cores, 2.1 GHz, 125 W TDP), 96 GiB RAM (DDR4), 960 GB SATA SSD, 1x InfiniBand HDR100
- Fat nodes (HTC): 4x Intel Xeon Gold 6230 (20 cores, 2.1 GHz, 125 W TDP), 3 TiB RAM (DDR4), 4.8 TB NVMe, 2x InfiniBand HDR100
- GPU nodes with 4 GPUs: 2x Intel Xeon Gold 6230 (20 cores, 2.1 GHz, 125 W TDP), 384 GiB RAM, 3.2 TB NVMe, 2x InfiniBand HDR100 HCA, 4x Tesla V100 32 GB NVLink
- GPU nodes with 8 GPUs: 2x Intel Xeon Gold 6248 (20 cores, 2.5 GHz, 125 W TDP), 768 GiB RAM, 6.4 TB NVMe, 4x InfiniBand HDR100 HCA, 8x Tesla V100 32 GB NVLink
- InfiniBand network: mixed HDR100/HDR200 (two nodes share one switch port via a splitter cable), EDR for the file system
- Broadwell nodes (extension): 352 nodes, each with 2x 14-core Intel Xeon E5-2660 v4 (Broadwell, 2.0 GHz base clock), 128 GB main memory and 480 GB local SSD
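When sizing jobs, the node data above determine how many processes fit on one node: the limit is the smaller of the core count and the installed memory divided by the memory each process needs. A minimal sketch, assuming the listed memory sizes are in GiB (3 TiB = 3072 GiB) and ignoring memory reserved by the operating system, so real limits are somewhat lower; `max_procs_per_node` and the node labels are hypothetical:

```python
# name: (cores per node, main memory per node in GiB), taken from the list above
NODE_SPECS = {
    "thin (HTC/HPC)": (40, 96),
    "fat": (80, 3072),
    "4-GPU": (40, 384),
    "8-GPU": (40, 768),
    "Broadwell": (28, 128),
}


def max_procs_per_node(node: str, mem_per_proc_gib: float) -> int:
    """Largest number of processes that fit on one node by cores and by memory.

    Ignores memory used by the OS and file system caches, so real limits are lower.
    """
    if mem_per_proc_gib <= 0:
        raise ValueError("memory per process must be positive")
    cores, mem_gib = NODE_SPECS[node]
    return min(cores, int(mem_gib // mem_per_proc_gib))


for name in NODE_SPECS:
    print(f"{name:>15}: {max_procs_per_node(name, 4.0)} processes at 4 GiB each")
```

At 4 GiB per process, a thin node is memory-bound at 24 processes, while the fat, GPU and Broadwell nodes are limited by their core counts (80, 40, 40 and 28).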