Deployment Guide
Abacus Semiconductor has developed processors, accelerators, and smart multi-homed memory subsystems that are intended to reduce complexity and cost while maintaining a performance lead over traditional solutions. The following guide clarifies which product can be used in a customer's application. Should none of our partners have an applicable solution, we will design a system based on your requirements.
The areas of deployment are described below.
Web Services (both for internal and external use) can be run on the Server-on-a-Chip at lower cost and with the same level of performance as traditional solutions. The same is true for other backend applications, such as email. Inventory control systems, e-commerce platforms, and point-of-sale applications, as well as messaging, project management, and version control systems, fall into the same category and can be run on top of a LAMP/FAMP stack.
High-Frequency OLTP applications can be run on the Server-on-a-Chip at lower cost and with the same level of performance as traditional solutions. If a quantum leap is necessary, then one or more of our Application or Database Processors in conjunction with a number of HRAMs, with one or more Server-on-a-Chip as the I/O frontend, is advisable.
Depending on the performance level required, multiple solutions exist for traditional (i.e., disk-based) database applications. At the lower end of the spectrum, we have the Server-on-a-Chip with the performance levels of an established solution but a lower price point and integrated RAID/ZFS functionality; at the top end, with vastly superior performance and scalability, we have the Application or Database Processors with the Server-on-a-Chip as a frontend and HRAMs for large memory size and bandwidth.
Depending on the performance level and the total main memory size required, multiple solutions exist for in-memory database applications as well. There is the Server-on-a-Chip with the performance levels of an established solution but larger main memory and integrated RAID/ZFS functionality. If vastly superior performance and scalability are required, we recommend one or more Application or Database Processors with the Server-on-a-Chip as a frontend and HRAMs for large memory size and bandwidth.
Today's Storage Appliances rely on hard disks, SSDs, and DDR4 or DDR5 DRAM, as well as PCIe- or CXL-attached Flash memory or M.2 Flash. Hard disks and SSDs are subject to SAS and SATA "speed limits". M.2 and PCIe are faster than SAS or SATA, but still subject to severe limitations in both bandwidth and latency. Flash on the DDR4 bus has not really taken off, and even if it had, its write performance and aging behavior pose problems that are not easily overcome. Our Server-on-a-Chip and the Application or Database Processor plus our HRAM solve the above problems, with up to 64 TB of main memory per processor in a one-hop single-deep memory configuration. The Server-on-a-Chip also makes an excellent controller for tape backup on LTO if that is required within a storage appliance; the tape data rate of 300 MB/s is easily supported by each of the SAS/SATA channels of the Server-on-a-Chip. The Server-on-a-Chip can encrypt and decrypt up to four SAS/SATA channels simultaneously at line rate using AES, and as a result it can be used to encrypt cold storage and (tape) backups.
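As a rough plausibility check on the encryption claim, the sketch below works out the aggregate AES throughput implied by four channels at line rate. The 600 MB/s per-channel payload rate assumes SATA-3 (6 Gbit/s with 8b/10b encoding) and is an illustrative assumption, not a product specification.

```python
# Back-of-the-envelope estimate of the aggregate AES throughput needed to
# encrypt four SAS/SATA channels at line rate. The per-channel figure
# assumes SATA-3 (6 Gbit/s, 8b/10b encoded, ~600 MB/s payload); actual
# channel counts and line rates are configuration-dependent assumptions.

SATA3_PAYLOAD_MB_S = 600   # usable payload rate per SATA-3 channel (assumed)
CHANNELS = 4               # channels encrypted simultaneously
LTO_RATE_MB_S = 300        # tape data rate quoted above

aggregate = SATA3_PAYLOAD_MB_S * CHANNELS
print(f"Aggregate AES throughput required: {aggregate} MB/s "
      f"({aggregate / 1000:.1f} GB/s)")
print(f"Per-channel headroom over the LTO tape rate: "
      f"{SATA3_PAYLOAD_MB_S / LTO_RATE_MB_S:.0f}x")
```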
Traditional HPC relies on a mix of I/O, integer compute for general administration, and floating-point as well as UNUM/POSIT math. Linpack, BLAS, and SGEMM/DGEMM benchmarks will show a significant performance gain. As a result, we suggest pairing a Server-on-a-Chip with one Application or Database Processor, one Math Processor, and at least four HRAMs. If that is not enough performance, simply add more of these combinations until your performance requirements are met; if memory utilization is an issue, add more HRAMs. These processors and memories scale out nearly linearly. FEA and FEM applications such as crash tests and weather and climate research fall into this category.
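For a concrete sense of the DGEMM workload referenced above, the following generic sketch times a double-precision matrix multiply via NumPy, which dispatches to the platform BLAS. The matrix size is an arbitrary assumption, and this is a plain portability sketch rather than an Abacus-specific benchmark.

```python
# Minimal DGEMM throughput probe: time one double-precision n x n matrix
# multiply and report GFLOP/s. NumPy's @ operator dispatches to the
# platform BLAS, so this measures whatever BLAS is installed.
import time
import numpy as np

n = 2048
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b                    # DGEMM: C = A * B in double precision
elapsed = time.perf_counter() - start

flops = 2 * n**3             # multiply-add count for an n x n x n GEMM
print(f"DGEMM {n}x{n}: {flops / elapsed / 1e9:.1f} GFLOP/s")
```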
GPT-3 and ChatGPT were trained on very large data sets, allegedly with 175 billion parameters. It is assumed that GPT-4 needs 1 trillion parameters. As a result, the training data set must be accessible in a shared memory across many processors and many more cores. Today's processors do not support physical address spaces of that size, and their cache coherency mechanisms do not span enough memory modules to support coherency, whether or not directory-based coherency is used. Our processors and smart multi-homed memories were built from day one to support 64-bit, 96-bit, and 128-bit physical address spaces, and the coherency domains can span any subset of those.
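A quick sizing exercise illustrates the scale involved. The sketch below estimates the raw parameter footprint of the models cited above, assuming 2 bytes per parameter (FP16 storage); that byte width is an assumption, and training additionally needs optimizer state, activations, and the data set itself, which multiply the figure.

```python
# Rough shared-memory sizing for large language models, assuming 2 bytes
# per parameter (FP16). Parameter counts are those cited above; the byte
# width is an assumption, and training state would add a large multiple.

def footprint_tib(params, bytes_per_param=2):
    return params * bytes_per_param / 2**40   # tebibytes

for name, params in (("GPT-3", 175e9), ("GPT-4 (assumed)", 1e12)):
    print(f"{name}: {footprint_tib(params):,.2f} TiB of parameters")

# For comparison, a flat 64-bit physical address space spans 2**64 bytes:
print(f"64-bit physical address space: {2**64 / 2**40:,.0f} TiB (16 EiB)")
```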
"Big Data" is very similar to in-memory database applications, and as a result, the same recommendations apply.
Graph Search is very similar to in-memory database applications, and as a result, the same recommendations apply.
For TensorFlow and similar training frameworks, we recommend massive math capability with enough memory bandwidth to avoid bottlenecks. Sets of Server-on-a-Chip processors and Math Processors, with HRAMs as needed, will remove the TensorFlow bottlenecks for ML training.
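To see why memory bandwidth, not raw math, is usually the limiter, the sketch below computes the arithmetic intensity (FLOPs per byte of memory traffic) of a dense matrix multiply, the core kernel in ML training; when that intensity falls below a machine's FLOPs-to-bandwidth ratio, the training step is bandwidth-bound. The sizes and the FP32 element width are illustrative assumptions.

```python
# Arithmetic intensity (FLOPs per byte moved) of a dense m x k by k x n
# matmul. Workloads whose intensity is below the hardware's
# FLOPs-per-byte ratio are memory-bandwidth bound (roofline model).
# Matrix sizes and 4-byte (FP32) elements are illustrative assumptions.

def gemm_intensity(m, n, k, bytes_per_elem=4):
    flops = 2 * m * n * k                                # multiply-adds
    traffic = (m * k + k * n + m * n) * bytes_per_elem   # read A, B; write C
    return flops / traffic

for size in (128, 1024, 8192):
    print(f"{size}^3 GEMM: {gemm_intensity(size, size, size):.0f} FLOP/byte")
```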
Large-scale matrix and tensor math applications can be served by one or more Server-on-a-Chip devices as a frontend to as many Math Processors and HRAMs as needed. The same applies to large-scale (discrete) Fourier transforms.
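As a stand-in for the kind of kernel that would be offloaded to the Math Processors, the following sketch runs a large real-to-complex discrete Fourier transform with NumPy; the transform size is an arbitrary assumption.

```python
# Large discrete Fourier transform as a representative Math Processor
# workload; NumPy's FFT is used here purely as a generic reference
# implementation, and the 16M-point size is an arbitrary assumption.
import numpy as np

n = 2**24                         # 16M-point transform
signal = np.random.rand(n)
spectrum = np.fft.rfft(signal)    # real-to-complex FFT, O(n log n)
print(f"{n}-point FFT -> {spectrum.size} complex bins")
```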
Large-scale cryptanalysis is fairly similar in its application profile to large-scale matrix math, but in addition to floating-point performance, integer processing might be required; it can be provided by adding Application or Database Processors.
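As a toy illustration of that integer-heavy profile, the sketch below brute-forces a discrete logarithm in a deliberately tiny group using nothing but modular integer arithmetic. The parameters are toy-sized assumptions; real cryptanalysis works at vastly larger scales and with far smarter algorithms.

```python
# Toy discrete-logarithm search: find x with g**x == target (mod p) by
# exhaustive integer arithmetic. Purely illustrative of the integer
# workload in cryptanalysis; p, g, and the secret are toy assumptions.

p = 1_000_003                  # small prime modulus (toy assumption)
g = 5                          # group element
secret = 123_456               # exponent to recover
target = pow(g, secret, p)

value = 1
for x in range(p):             # exhaustive search, integer-only
    if value == target:
        print(f"recovered exponent: {x}")
        break
    value = value * g % p
```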
There are certainly more and broader areas of deployment. This list is only intended to give an overview of where our processors are superior to existing technology. There will also be plenty of cases in which our processors cannot displace existing solutions; one example is the need to natively execute x86-64 (or ARM or MIPS) instructions.