Abacus Semiconductor Corporation is a fabless semiconductor company that designs and engineers processors, accelerators and smart multi-homed memories for use in supercomputers and in the backend for Artificial Intelligence and Machine Learning (including Large Language Models such as ChatGPT and GPT-4) as well as traditional High Performance Compute (HPC) applications.

Many attempts have been made to escape the bottlenecks imposed by the von Neumann architecture, and while today's CPU cores deploy a modified Harvard architecture to separate instruction and data caches, no effort has been made to resolve the scale-out problem. Abacus Semiconductor Corporation has developed a beyond-von-Neumann, beyond-Harvard CPU architecture.

With that architecture we address the more-than-order-of-magnitude gap between the theoretical peak performance of supercomputers and the real-life performance that users of today's supercomputers experience.

We have re-engineered the processors and accelerators as well as the memory that make up the foundation of HPC and of the infrastructure that is used to create Large Language Models for AI such as GPT. We think that the underlying limitations lie neither in the Instruction Set Architecture (ISA) nor in the accelerators themselves; they clearly lie in the underlying system architecture.

Server-on-a-Chip

Abacus Semiconductor has developed a Server-on-a-Chip that makes building a server cheaper, allows for higher integration, and follows the same principles that have been successful before, namely in smartphones and tablets (and in mainframes of the past). We believe that offload engines in hardware, with proper firmware support, deliver better compute energy efficiency than a homogeneous cluster of host CPUs. This processor can be used for web services, file services, and high-transaction applications as well as in traditional (i.e. disk-based) and in-memory database applications. We have added virtualization hardware to the CPU cores and the IOMMU so that they can be used in fully virtualized environments. It can be used in any LAMP environment without recompilation of code, and as a file server or the core of a storage appliance. The Server-on-a-Chip supports both DDR5 DIMMs and our HRAM.

HRAM

Today's processors and GPGPUs rely on the same outdated concept that we have seen for the past 30 years. In essence, the memory subsystem has no built-in smarts. As a result, the performance, security, and shareability of the memory subsystem are very limited. The CPU's or GPGPU's DRAM controller has to manage all memory transactions, and as such, these memory controllers are very specific to the type of memory they were developed for. We believe that this is a mistake. Memory is just a resource, and it should be managed by the memory subsystem, not by the CPU or GPGPU. We call our solution Heterogeneous Random Access Memory, or HRAM. It consists of multiple types and hierarchies of memory, and the HRAM, as an intelligent multi-homed memory subsystem, contains all controllers needed. There is no need for any kind of memory controller to be present in the processor or accelerator.

Application Processor

This processor family is targeted towards general-purpose processing that requires cache-coherent Terabyte-size main memory. Its main target is the orchestration of tasks that are distributed either to other processors or to specific accelerators in large-scale applications. It is intended to serve as the main processor that controls application program flow and distributes tasks to other processors or accelerators in HPC and Big Data as well as in Artificial Intelligence (AI) and Machine Learning (ML) workloads. For the time being, its hardware will be identical to the Database Processor, but it will ship with different firmware.

Database Processor

This processor is targeted towards ultra-high-frequency transaction processing, large-scale database applications, web services, and all other integer-only applications that require cache-coherent Terabyte-size main memory. Large-scale in-memory databases such as ScyllaDB benefit from the internal and external bandwidth of this processor, the multi-homing of the memory, and the scalability of the solution. It is a processor and an accelerator in one, and for the time being, its hardware will be identical to the Application Processor's. It will use different firmware than the Application Processor.

Math Processor

We believe that the current math coprocessor concept does not work as effectively and efficiently as it could. While GPGPUs have helped tremendously, there are inherent limitations in GPGPU-accelerated compute. Among other limitations, GPGPUs are effectively arrays of SIMD engines, and while a single die contains a few thousand of them, they are not optimized for many of the mathematical operations that reflect the physical problems HPC users need to solve. Most GPGPU compute today is predicated on CUDA, which is a captive and proprietary solution. We prefer open source programming frameworks such as OpenCL and OpenACC. In our experience, open source frameworks lead to better quality of code, greater width and depth of solutions, and ultimately better adoption. We have made life easier for programmers and users with our built-in matrix and tensor math functions, as well as support for all transforms, such as Fourier transforms.