Supercomputers versus Data Centers

Abacus Semiconductor often receives questions about supercomputers and data centers and whether the two are the same.

Here is a very short summary of what a data center is, and how a supercomputer differs.

In most cases, data centers contain thousands of industry-standard x86-64 or ARM servers connected via top-of-rack switches, which in turn connect to one or more levels of a switch hierarchy for data-center-wide connectivity. The interconnects are usually Gigabit or 10 Gigabit Ethernet, in rare cases 100 GbE.

Supercomputers superficially look the same, except that a good percentage of the servers are equipped with accelerators such as General-Purpose Graphics Processing Units (GPGPUs). They will most likely also be equipped with larger main memory than their counterparts in data centers, and they are much more likely to use higher-performance interconnects such as 100 GbE or InfiniBand. Data centers tend to deploy the most cost-effective servers, whereas supercomputers use servers whose processors deliver higher absolute performance.

The largest difference, however, is that in a data center tens or hundreds of thousands of users run their (small) applications, whereas in a supercomputer only one application runs at any given point in time. Servers in data centers are usually virtualized so that workloads can be moved across physical servers if needed. The 10 to 15% performance loss that virtualization incurs is not acceptable in a supercomputer, and HPC applications sometimes run on what is called bare metal, i.e. directly on the hardware without a hypervisor or virtualization layer.
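To put that overhead in concrete terms, here is a toy calculation of how a 10 to 15% virtualization penalty stretches the wall-clock time of a long-running job. The job length and overhead figures are illustrative assumptions, not measurements of any particular system:

```python
# Toy illustration: how a 10-15% virtualization overhead inflates the
# wall-clock time of a long-running HPC job. All numbers are assumptions.

bare_metal_hours = 100.0          # hypothetical job time on bare metal

for overhead in (0.10, 0.15):     # 10% and 15% performance loss
    # A fractional slowdown of `overhead` means the virtualized system
    # delivers only (1 - overhead) of the bare-metal throughput.
    virtualized_hours = bare_metal_hours / (1.0 - overhead)
    print(f"{overhead:.0%} overhead: {bare_metal_hours:.0f} h -> "
          f"{virtualized_hours:.1f} h")
```

On this assumed 100-hour job, the virtualized run takes roughly 111 to 118 hours, which is why supercomputers avoid the virtualization layer.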

Data centers usually run applications on top of a Linux-Apache-MySQL-PHP (or LAMP) stack. The application logic in such a stack is interpreted and therefore independent of the processors' Instruction Set Architecture (ISA). Typical use cases are web services, email hosting, business applications on top of LAMP, and a host of scripts that provide end users with backups, storage, and shared document processing.
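As a minimal sketch of that ISA independence (using Python in place of PHP here; the principle is identical for any interpreted language), the same script runs unchanged on an x86-64 or an ARM server:

```python
# Minimal sketch of why interpreted code is ISA-independent: only the
# interpreter itself is compiled for the underlying architecture, so the
# same script runs unchanged on x86-64 and ARM servers.
import platform

print(f"Interpreter running on: {platform.machine()} / {platform.system()}")
# On an x86-64 server this prints e.g. 'x86_64'; on an ARM server 'aarch64'.
# The script itself needs no change or recompilation either way.
```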

Supercomputers typically separate compute nodes from mass storage and general I/O nodes, oftentimes even physically. The compute nodes can be CPU-only nodes, accelerated nodes (CPU plus GPGPU or CPU plus FPGA) or a mix thereof.
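The sketch below illustrates that split in miniature. It is a toy example assuming mpi4py and an MPI runtime are installed; the role assignment is our assumption, not the layout of any specific machine. One rank plays the I/O node that alone touches storage, while the remaining ranks act as compute nodes:

```python
# Toy compute/I-O split with MPI (assumes mpi4py and an MPI runtime).
# Run with e.g.: mpiexec -n 4 python io_split.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Every rank produces a partial result; in a real code, the compute
# ranks would do the heavy numerical work here.
partial = rank * rank

# gather() funnels all partial results to rank 0, the "I/O node",
# which is the only rank that would touch mass storage.
results = comm.gather(partial, root=0)
if rank == 0:
    print(f"I/O node collected: {results}")  # would be written to disk
```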

Supercomputers are used to solve large-scale problems such as:

  • training of large language models such as GPT-3 and GPT-4 (the models behind ChatGPT) with hundreds of billions of parameters,
  • modeling of quantum computers,
  • optimization of battery materials, particularly in the anodes,
  • improving batteries with new chemistries to extend their useful life,
  • climate modeling,
  • weather forecasting,
  • complex multiphysics such as that in aircraft turbines,
  • car crash test simulations,
  • complex computational fluid dynamics at the transition between laminar and turbulent flow,
  • drug discovery,
  • finding which molecules dock to the spike protein of SARS-CoV-2 (the virus that causes COVID-19),
  • oil and gas (and fresh-water aquifer) exploration,
  • improving the energy efficiency of internal combustion engines,

and a host of other computational problems that help us solve real-world challenges. Existing supercomputers often take a week to complete such a complex problem even when a one-hour response time is needed.
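Many of these workloads (climate modeling, weather forecasting, and computational fluid dynamics in particular) boil down to stencil computations over large grids. The following toy 1-D heat-equation solver sketches that pattern on a single node; all sizes and constants are illustrative assumptions, and production codes distribute such grids across thousands of nodes over the interconnect:

```python
# Toy 1-D heat-equation solver (explicit finite differences with NumPy),
# sketching the stencil-style computation behind climate, weather and CFD
# codes. All sizes and constants are illustrative assumptions.
import numpy as np

nx, steps = 100, 500
alpha, dx, dt = 0.01, 1.0, 1.0   # chosen so alpha*dt/dx**2 <= 0.5 (stable)

u = np.zeros(nx)
u[nx // 2] = 100.0               # a hot spot in the middle of the rod

for _ in range(steps):
    # Explicit update: each point depends only on its neighbors, which is
    # why such codes parallelize well across a fast interconnect.
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])

print(f"Peak temperature after {steps} steps: {u.max():.2f}")
```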