A.I. Architecture

This revision is from 2024/06/18 03:53.

Challenges in producing computers built specifically for A.I. systems, whose tasks are far more computationally intensive than conventional workloads.

  • Speed-of-light limits and memory speed. No signal inside a computer can relay information faster than the speed of light, so the distance between processor and memory sets a floor on latency.
  • Parallel processing - many cores, as in GPUs and TPUs (Tensor Processing Units).
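
The speed-of-light limit can be made concrete with a little arithmetic. The sketch below computes how far light travels in one clock cycle. The 3 GHz clock and the function name are illustrative assumptions, and real electrical signals in a circuit board travel slower than light in a vacuum, so this is only an upper bound.

```python
# Upper bound on the distance a signal can cover during one clock cycle.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum

def distance_per_cycle_cm(clock_hz: float) -> float:
    """Distance light covers in one clock period, in centimetres."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100

# 3 GHz is an assumed example clock, not a figure from the notes above.
print(f"{distance_per_cycle_cm(3e9):.1f} cm per cycle at 3 GHz")
```

At 3 GHz the bound is roughly 10 cm per cycle, so a round trip to a memory module 10 cm away cannot complete in fewer than two cycles, whatever the memory technology.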

  • CPU: a high core count with good single-core performance, such as AMD Ryzen and Intel Core i9 CPUs. Compare CPUs on price per core; in 2024, core counts range from 8 to 96.
  • RAM: utilizing RAM disks means loading everything into RAM. 128GB of RAM is a good start; consider 2048GB+ if you plan to train larger models.
  • Storage: drive speed is not very important. A SATA SSD will be sufficient for most tasks, with an NVMe SSD for faster data access. Capacity is the larger concern: 1 petabyte of training data required minimum.
  • GPU: extremely important. Even an RTX 3060 or RX 6600 can significantly improve training speed. At the high end, the Nvidia GeForce RTX 4090 has 16,384 CUDA cores and 24GB of GDDR6X VRAM.

Note: Intel Core i9, plenty of RAM and GPUs from NVIDIA

  • Power: utilize solar or wind, and choose the location that maximizes generation.
  • Distributed computing software and clustering: MPI (Message Passing Interface); software such as TensorFlow Distributed or Spark; cluster management software like Slurm or Torque; Petals; and Horovod, a distributed training framework for libraries like TensorFlow, Keras, PyTorch, and Apache MXNet.
  • RAID 10 or RAID 6, mdadm
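
The choice between RAID 10 and RAID 6 is partly a question of usable capacity. The sketch below compares the two, assuming equal-sized drives and the conventional layouts (mirrored pairs for RAID 10, two drives' worth of parity for RAID 6). The function name and the drive counts are illustrative.

```python
def raid_usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for a RAID 10 or RAID 6 array of equal drives."""
    if drives < 4:
        raise ValueError("this sketch assumes at least 4 drives")
    if level == "raid10":
        if drives % 2:
            raise ValueError("mirrored pairs need an even number of drives")
        return drives / 2 * drive_tb      # half the drives hold mirror copies
    if level == "raid6":
        return (drives - 2) * drive_tb    # two drives' worth of parity
    raise ValueError(f"unsupported level: {level}")

# Hypothetical array: 8 drives of 4 TB each.
for level in ("raid10", "raid6"):
    print(level, raid_usable_tb(level, 8, 4), "TB usable")
```

With the same eight drives, RAID 6 yields more usable space, while RAID 10 gives up capacity in exchange for simpler rebuilds and no parity calculation.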

Example Builds

  1. Motherboard: Supermicro X9DRI-LN4F+ ~ Dual Socket R (LGA 2011)
  2. CPU: Intel Xeon E5-2603: 4 cores ~ Intel Xeon E5-2697 v2: 12 cores (Socket R takes E5-2600 and E5-2600 v2 parts; the 18-core E5-2699 v3 needs LGA 2011-3)
  3. RAM: 24x 240-pin DDR3 DIMM sockets, up to 768 GB DDR3 ECC Registered memory (RDIMM) in 24 DIMM sockets
  4. GPU: 4 (x16) PCI-E 3.0 slots ~ Nvidia RTX 3080
  5. Storage: 1TB Samsung 980 Pro PCIe NVMe SSD + 4TB Seagate Barracuda HDD
  6. PSU: 850W Corsair RM850x Gold
  7. EATX Case

  1. Motherboard: Supermicro X10DAX ~ https://www.supermicro.com/en/products/motherboard/X10DAX
  2. CPU: Intel Xeon E5-2603: 4 cores ~ Intel Xeon E5-2699: 18 cores
  3. RAM: Up to 1TB 3DS ECC RDIMM, DDR4-2400MHz; Up to 2TB 3DS ECC LRDIMM, in 16 DIMM slots
  4. GPU: Nvidia RTX 3070 or AMD RX 6700 XT
  5. Storage: 500GB Samsung 970 Evo Plus PCIe NVMe SSD + 2TB Seagate Barracuda HDD
  6. PSU: 650W Seasonic Focus GX-650 Gold

More...

  1. Asus Z9PE-D16/2L - 512GB / 16 slots - 4 @ 16x PCI-E - Dual E5-2600 / E5-2600 v2
  2. CPU: 2 x Xeon E5-4669 v4: 22 cores (44 threads) each, 44 cores (88 threads) in total
  3. CPU: AMD EPYC 7551P CPU 32 Cores + Supermicro H11SSL-i Motherboard +8x 8GB 2133P RAM
  4. Nvidia GeForce RTX 3090 (24GB) ~ or 4 x Nvidia graphics cards with 8GB+ each
  5. 32GB+ DDR RAM
  6. 16x 128GB 3DS LRDIMM modules, total of 2TB RAM. Modules operate at 2400MHz
  7. PSU: Corsair AX1600i (1600W, sufficient for multiple high-power GPUs)
  8. Graphics Cards: 3 x NVIDIA RTX 3090 (24GB VRAM each)
  9. ASUS ROG Strix TRX40-E Gaming - 8 x DIMM, Max. 256GB, DDR4, 3x 16x PCI-E - Ryzen Threadripper, up to 64 cores / 128 threads
  10. Cooling: Custom liquid cooling loop to maintain optimal temperatures
  11. Gigabyte MZ73-LM0 - DDR5 x 16, PCIe v4 $2000
  12. Asus Z13PE-D16

Hook the machines up in a conventional network, then install a chosen distributed computing framework on each computer and configure it to recognize the other machines as part of the cluster. Allocate one machine as a NAS: the motherboard with the most onboard SATA ports and the most room for PCIe SATA expansion cards. The other computers are chosen for CPU cores, GPU cores and maximum RAM.
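
The role allocation described here can be sketched in a few lines. This is a minimal illustration, not a cluster manager: the `Machine` fields, the `assign_roles` helper and the example inventory are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    sata_ports: int  # onboard SATA plus ports on PCIe SATA expansion cards
    cpu_cores: int
    ram_gb: int
    gpus: int

def assign_roles(machines):
    """Return (nas, compute_nodes); the machine with the most SATA ports is the NAS."""
    if not machines:
        raise ValueError("no machines to assign")
    nas = max(machines, key=lambda m: m.sata_ports)
    compute = [m for m in machines if m is not nas]
    return nas, compute

# Hypothetical inventory, for illustration only.
inventory = [
    Machine("node-a", sata_ports=6, cpu_cores=16, ram_gb=128, gpus=2),
    Machine("node-b", sata_ports=14, cpu_cores=8, ram_gb=32, gpus=0),
    Machine("node-c", sata_ports=8, cpu_cores=36, ram_gb=768, gpus=4),
]
nas, compute = assign_roles(inventory)
print("NAS:", nas.name, "| compute:", [m.name for m in compute])
```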

Software

Software has become secondary to hardware, and software for A.I. would probably require grid computing in exchange for unrestricted model access. Each node would have to satisfy minimum requirements to be accepted into the grid. While the models are accessible to the grid, the secret sauce stays with the author. The grid acts as a workshop, holding the petabytes of training data, and as an A.I. training supercomputer. The result is placed in the distributed leaderboard folder, where all the trained models go; all the models are restricted to the OS, and all of them are graded. A general user would go to the leaderboard folder and run the latest models. The incentive is to beat the best model. In the modern day, it is all about creating the white paper and presenting it to key people for support and funding. In the past, anyone could release and gain public support organically.

O.I. Architecture - organoid on chip support

  • Not big enough
  • Environments and systems where movement, synapses and vascularization occur
  • Not an ideal home environment, housing
  • Questions over lifespan

  1. Module version
  2. Interconnects
  3. I/O Card, hardware interface
  4. Software interface

nb: Organoids are real lifeforms.

Automating organoid maintenance

A pump, a reservoir and an input/output attachment on the container housing the organoid. The pump (heart) moves media from a reservoir into the input of the organoid housing, and from the output the media goes back to the pump, so the media is circulating. Spent media gets moved to a storage container where it is measured, filtered and conditioned and then re-introduced. So 4 main objects are required: a slow pump; pipes and fittings from the pump to the organoid housing; a reservoir holding new media; and a container holding old media. The old media container carries additions such as media grading detection, filtration and media conditioning to recycle media. Like a filter in a fish tank, this slow-moving pump keeps the media clean and oxygenated, removing impurities.

  1. A peristaltic pump, also commonly known as a roller pump, is a type of positive displacement pump used for pumping a variety of fluids. The fluid is contained in a flexible tube fitted inside a circular pump casing. Most peristaltic pumps work through rotary motion, though linear peristaltic pumps have also been made.
  2. Culture media filtration: the media is sterilized as it passes through the filter (immune system), its viability is measured, and it is supplemented and cleaned of impurities.
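
The circulation loop described above can be sketched as a simple volume balance. This is a toy model under stated assumptions: the housing stays full, so every millilitre pumped in displaces a millilitre of spent media, and reconditioning returns media to the reservoir at a fixed rate. All volumes, rates and names are hypothetical.

```python
def circulate(reservoir_ml, spent_ml, flow_ml_per_min, recycle_ml_per_min, minutes):
    """Simulate media moving reservoir -> housing -> spent container -> reservoir."""
    for _ in range(minutes):
        moved = min(flow_ml_per_min, reservoir_ml)
        reservoir_ml -= moved        # pump draws fresh media into the housing
        spent_ml += moved            # the same volume leaves the housing as spent media
        recycled = min(recycle_ml_per_min, spent_ml)
        spent_ml -= recycled         # measured, filtered and conditioned
        reservoir_ml += recycled     # re-introduced as usable media
    return reservoir_ml, spent_ml

# Hypothetical run: 500 mL reservoir, 2 mL/min pump, 1 mL/min reconditioning, one hour.
fresh, spent = circulate(500, 0, 2, 1, 60)
print(f"reservoir: {fresh} mL, spent container: {spent} mL")
```

In this run the pump outpaces reconditioning, so spent media accumulates; matching the two rates keeps the reservoir level steady.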

Ideally, we want to award these functions their organ names, and use and develop human-compatible artificial organs and machines that can offshoot into the medical device industry. For example, the dialysis machine could be supplied to hospitals and work in that setting. Every time we do something, we must think of its application in general medicine and move in that direction, even if it poses extra challenges, such as anastomosis methods and materials and how closely they match natural behavior. Bioreactors are commonly used for cell culture applications.

What an A.I. Operating System (OS) might look like

  1. Grid by default. The amount of data and processing required to train models and tinker with A.I. calls for grid computing. Nodes must meet minimum requirements to join the grid, and trained models are the reward. The grid would hold the petabytes of training data and supply the CPU cycles for distributed training. A node would probably need 100TB of storage, 32GB of RAM and 16 cores as a minimum to join the grid. The models are tied to the OS and cannot be moved out. The grid maintainers would keep the models at or above current capability, and the use of these to generate video, images and so on would be unrestricted.
  2. Various applications/software to leverage A.I. and O.I.
  3. Simulation environments for A.I. training.

  1. To store training data - distributed file systems, to grid and store the petabytes of training data.
  2. To train the A.I. - utilize the many grid computing projects already in existence and add a system-level one as well.
  3. Other essential software for education, labs and research.
  4. Custom Linux from scratch
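
The admission rule for the grid can be sketched as a simple check. This reads the "100TB, 32GB, 16 core" minimum as storage, RAM and CPU cores, which is an interpretation; the `NodeSpec` type and `shortfalls` helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeSpec:
    storage_tb: float
    ram_gb: float
    cpu_cores: int

# Assumed reading of the minimum: 100 TB storage, 32 GB RAM, 16 cores.
GRID_MINIMUM = NodeSpec(storage_tb=100, ram_gb=32, cpu_cores=16)

def shortfalls(node: NodeSpec, minimum: NodeSpec = GRID_MINIMUM) -> list:
    """Return the requirements a node fails; an empty list means it may join."""
    problems = []
    if node.storage_tb < minimum.storage_tb:
        problems.append("storage")
    if node.ram_gb < minimum.ram_gb:
        problems.append("ram")
    if node.cpu_cores < minimum.cpu_cores:
        problems.append("cores")
    return problems

# Hypothetical candidate: enough RAM and cores, too little storage.
print(shortfalls(NodeSpec(storage_tb=40, ram_gb=128, cpu_cores=32)))
```

Returning the list of failed requirements, rather than a bare yes or no, tells a rejected node what to upgrade.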

When selecting graphics cards for running large language models (LLMs) locally, especially with the intention of using multiple GPUs, there are several important features and specifications to consider:

Key Features to Look For:

  1. High VRAM: Aim for graphics cards with as much VRAM as possible. Since you're looking to run LLMs, more VRAM allows you to handle larger models and batch sizes.
  2. CUDA Cores / Tensor Cores: More CUDA cores generally mean better parallel processing capabilities. Tensor cores (found in NVIDIA’s RTX and Tesla series) are specifically designed for deep learning tasks and can significantly speed up model training and inference.
  3. NVLink Support: NVLink allows for high-bandwidth communication between GPUs, enabling efficient multi-GPU setups. This is crucial for model parallelism and reducing inter-GPU communication overhead.
  4. Multi-GPU Scalability: Ensure the graphics card and your system support multi-GPU configurations (e.g., via NVLink or enough PCIe slots and lanes; SLI targets graphics rendering and is not needed for compute).
  5. FP16 / Mixed Precision Support: Cards that support FP16 or mixed precision calculations can provide significant performance boosts for deep learning tasks by using less memory and speeding up computations.
  6. Cooling System: Efficient cooling is essential to maintain performance and prevent thermal throttling, especially in multi-GPU setups.
  7. Driver and Software Support: Ensure the card is compatible with the deep learning frameworks you plan to use (e.g., PyTorch, TensorFlow) and that it has robust driver support.
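
The VRAM and precision points above reduce to simple arithmetic. The sketch below estimates the memory needed for model weights alone and how many cards that implies. The 20% overhead factor for activations and caches is an assumption, as are the function names and the example model size.

```python
import math

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weights_gb(params_billion: float, precision: str = "fp16") -> float:
    """GB needed to hold the weights alone (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]

def min_gpus(params_billion, vram_gb, precision="fp16", overhead=1.2):
    """Cards needed, padding the weights by an assumed overhead factor."""
    return math.ceil(weights_gb(params_billion, precision) * overhead / vram_gb)

# Hypothetical 13-billion-parameter model on 24 GB cards.
print(weights_gb(13, "fp16"), "GB of weights ->", min_gpus(13, 24), "cards")
```

A 13-billion-parameter model needs 26 GB for FP16 weights, so it does not fit on a single 24GB card, which is why VRAM and multi-GPU support head the list.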

Recommended GPU Models

  1. NVIDIA RTX 30 Series (e.g., RTX 3090, RTX 3080):
    1. High VRAM (e.g., 24GB on RTX 3090)
    2. Tensor cores for deep learning
    3. NVLink support (for RTX 3090)
  2. NVIDIA A100:
    1. Up to 80GB VRAM (40GB and 80GB versions exist)
    2. Advanced tensor cores
    3. NVLink support
    4. Designed specifically for AI workloads
  3. NVIDIA Tesla V100:
    1. Up to 32GB VRAM
    2. Tensor cores
    3. NVLink support
  4. NVIDIA Quadro RTX 8000:
    1. 48GB VRAM
    2. Tensor cores
    3. NVLink support

Example of a Multi-GPU Setup

For a high-end multi-GPU setup, consider the following:

  1. Motherboard: A motherboard with multiple PCIe slots, preferably supporting PCIe 4.0 for higher bandwidth.
  2. Power Supply Unit (PSU): A robust PSU with enough power and connectors for multiple GPUs.
  3. Cooling Solutions: Adequate cooling (both air and liquid cooling options) to manage the heat output of multiple GPUs.
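
Sizing the PSU for such a setup is a matter of adding up component draw and leaving headroom. The sketch below is a rough estimate only: the wattages, the 100 W allowance for drives and fans, and the 25% headroom are all assumptions, and actual figures should come from the component specifications.

```python
def recommended_psu_watts(gpu_watts, gpu_count, cpu_watts,
                          other_watts=100, headroom=0.25):
    """Sum the component draw and add a safety margin for load spikes."""
    load = gpu_watts * gpu_count + cpu_watts + other_watts
    return load * (1 + headroom)

# Hypothetical build: two 350 W GPUs and a 150 W CPU.
print(recommended_psu_watts(350, 2, 150), "W recommended")
```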

Configuration Tips

  1. BIOS Settings: Ensure the BIOS is configured to support multi-GPU setups.
  2. Driver Installation: Install the latest NVIDIA drivers that support multi-GPU configurations.
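  3. Framework Configuration: In your deep learning framework, configure the settings to utilize multiple GPUs (e.g., in PyTorch, torch.nn.DataParallel for a quick single-machine setup, or torch.distributed with DistributedDataParallel, which the PyTorch documentation recommends).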

Summary

By focusing on high VRAM, CUDA/Tensor cores, NVLink support, and efficient cooling, you can build a powerful multi-GPU setup capable of running large language models locally. Using high-end GPUs like the NVIDIA RTX 3090 or the A100 will provide the performance needed for demanding AI tasks.

  
