On Saturday, 16th November 2019, a team of university researchers from UCSD set out to build a supercomputer in the cloud.

The challenge was to explore the feasibility of running a large burst of GPU-based computing in the cloud. Since the researchers were aiming to rent the largest possible number of GPUs, they prepared the infrastructure to request VMs from all three major cloud providers: AWS, Microsoft Azure, and Google Cloud. Because they had no special reservations with any of them, they picked a Saturday in the hope that it would be a time of low overall demand.
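As a rough illustration of what such provisioning looks like, here is a minimal Python sketch that requests GPU-equipped spot instances from AWS via boto3. The AMI ID, region, instance type, and counts are hypothetical placeholders; Azure and Google Cloud offer analogous APIs, and the researchers' actual provisioning tooling is not described here.

```python
import boto3

# Minimal sketch: request GPU-equipped spot VMs from AWS.
# The AMI ID, region, and counts below are hypothetical placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical image with GPU drivers + HTCondor
    InstanceType="p3.2xlarge",         # one NVIDIA V100 per instance
    MinCount=1,                        # accept partial fulfillment
    MaxCount=100,                      # ask for up to 100 VMs in this call
    InstanceMarketOptions={            # spot pricing keeps a large burst affordable
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
)

print(f"Launched {len(response['Instances'])} instances")
```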

The team managed to get over 50k GPUs from the three largest cloud providers, integrated them into a single HTCondor pool, and ran production IceCube simulations on them.
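To give a flavor of how jobs flow through such a pool, here is a minimal sketch using HTCondor's Python bindings. The executable name and resource requests are hypothetical, but the pattern is standard HTCondor usage: one job description queued many times, with each job matched to any idle GPU slot in the pool regardless of which cloud provider the slot came from.

```python
import htcondor

# Describe a single GPU job; the same description is queued many times.
job = htcondor.Submit({
    "executable": "run_icecube_sim.sh",   # hypothetical wrapper around the OpenCL simulation
    "arguments": "$(ProcId)",
    "request_gpus": "1",                  # one GPU per job
    "request_cpus": "1",
    "request_memory": "4GB",
    "output": "sim-$(ProcId).out",
    "error": "sim-$(ProcId).err",
    "log": "sim.log",
})

schedd = htcondor.Schedd()
result = schedd.submit(job, count=10_000)  # queue 10,000 independent simulation jobs
print(f"Submitted cluster {result.cluster()}")
```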

The net result was a peak of about 51k GPUs of various kinds, with an aggregate peak of about 350 PFLOP32s (using NVIDIA-provided specs). They used fp32 FLOPS figures because the IceCube simulation is a purely 32-bit GPU compute application (OpenCL-based).
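The aggregate figure is simply the sum, over GPU types, of the number of GPUs of that type times its published fp32 peak rating. As a toy example (the counts and type mix below are hypothetical, chosen only so the totals land near the quoted figures; they are not the run's actual breakdown):

```python
# Hypothetical GPU mix -- illustrative numbers only, not the run's actual breakdown.
# Per-type ratings are NVIDIA's published peak fp32 TFLOPS figures.
gpu_fleet = {
    # type: (count, peak TFLOP32s per GPU)
    "V100": (4_000, 14.5),
    "P100": (5_000, 9.3),
    "T4":   (19_000, 8.1),
    "K80":  (23_000, 4.1),
}

aggregate_tflop32 = sum(count * tflops for count, tflops in gpu_fleet.values())
total_gpus = sum(count for count, _ in gpu_fleet.values())
print(f"{total_gpus:,} GPUs, {aggregate_tflop32 / 1_000:.0f} PFLOP32s aggregate")
# -> 51,000 GPUs, 353 PFLOP32s aggregate
```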

For comparison, Summit, the #1 system on the TOP500 list, has a nominal performance of about 400 PFLOP32s (27.6k V100s * 14.5 TFLOP32s each). So, at peak, the cloud-based cluster provided almost 90% of the performance of Summit, at least for the purpose of IceCube simulations.
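A quick sanity check of that comparison, using only the figures quoted above:

```python
# Summit's nominal fp32 throughput, from the figures quoted above.
summit_pflop32 = 27_600 * 14.5 / 1_000    # 27.6k V100s * 14.5 TFLOP32s -> ~400.2 PFLOP32s

cloud_peak_pflop32 = 350                  # the cloud run's aggregate peak
print(f"{cloud_peak_pflop32 / summit_pflop32:.1%}")  # ~87.5%, i.e. almost 90%
```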

To find out more about this project, check out this article from one of the researchers.