HyperBus: Using System RAM to Eliminate VRAM
Shared graphics memory (https://en.wikipedia.org/wiki/Shared_graphics_memory) is the starting point: if a GPU can treat system RAM as its VRAM, onboard VRAM becomes redundant and 1 TB of effective VRAM for running LLMs becomes possible.

Integrated GPUs (iGPUs) already work this way. They have no dedicated VRAM of their own; instead, a portion of system RAM is allocated to them, typically fixed by the system BIOS or operating system, which reduces the RAM available for other tasks. This shared memory is slower than dedicated VRAM such as GDDR5 or GDDR6, because it is ordinary DDR3/DDR4 system RAM. The iGPU accesses it efficiently through a Direct Memory Access (DMA) controller, minimizing CPU involvement.

The NucBox K9 is a current example of this architecture. Its integrated Intel Arc GPU shares the system's 32 GB of dual-channel DDR5-5600 RAM with the CPU, which benefits tasks that need significant memory capacity, and the RAM is expandable to 96 GB using newly released 48 GB modules, giving both CPU and GPU more memory to draw on. Two limits remain, however: the Intel Arc iGPU is not as performant as Nvidia's GPUs, and DDR system memory needs a new high-speed standard to compete with GDDR bandwidth.

People are also clustering Mac minis into AI hubs. The 2024 Mac mini with the M4 Pro chip, for instance, pairs a 12-core CPU and a 16-core GPU with 24 GB of unified memory, 2 TB of SSD storage, and Gigabit Ethernet.
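To make the bandwidth gap between shared DDR system memory and dedicated GDDR concrete, a quick back-of-the-envelope calculation using figures from the text (the helper function itself is illustrative, not any vendor's API):

```python
def bandwidth_gb_s(bus_bits: int, transfer_rate_mts: float, channels: int = 1) -> float:
    """Peak bandwidth in GB/s: channels x bus width in bytes x transfer rate."""
    return channels * (bus_bits / 8) * transfer_rate_mts / 1000

# NucBox K9: dual-channel DDR5-5600, 64-bit per channel
ddr5 = bandwidth_gb_s(64, 5600, channels=2)   # 89.6 GB/s

# RTX 4090: 384-bit bus at 21 Gbps per pin
gddr = bandwidth_gb_s(384, 21000)             # 1008.0 GB/s

print(f"DDR5-5600 dual channel: {ddr5:.1f} GB/s")
print(f"RTX 4090 VRAM:          {gddr:.1f} GB/s ({gddr / ddr5:.0f}x faster)")
```

Roughly an 11x gap, which is why a faster path between GPU and system RAM is the crux of the whole proposal.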
The key point is that these machines can be clustered over Thunderbolt 4 or Ethernet. With three Thunderbolt ports per unit, up to four machines can be linked directly, after which a hub is required at a performance cost.

For comparison, the latest Nvidia GPUs, particularly the RTX 40 series:

* '''GDDR6X memory:''' the primary VRAM in consumer-grade GPUs, with per-pin speeds up to 21 Gbps. The RTX 4090, for example, uses GDDR6X on a 384-bit bus, achieving a bandwidth of 1008 GB/s.
* '''HBM2e memory:''' employed in high-end datacenter cards such as the A100, this memory type provides higher bandwidth through a much wider bus, at up to 3.2 Gbps per pin.
* '''Bus width matters:''' total bandwidth depends on both per-pin speed and bus width. A 384-bit bus at 21 Gbps offers far more bandwidth than a narrower bus at the same speed.

In short, the latest consumer-grade Nvidia GPUs, as of 2023, reach per-pin speeds of 21 Gbps with GDDR6X, while professional models leverage HBM2e's wider buses for even higher aggregate bandwidth.

!!HyperPath Bus: The Bandwidth Engine

'''Physical Layer:'''
* A 384-lane bus embedded in the PCB, using low-cost PAM-3 signaling (about 1.5 bits per symbol in practical encodings) at 24 GT/s.
* Bandwidth: 384 lanes × 24 GT/s × 1.5 bits ÷ 8 ≈ 1.73 TB/s, well above the RTX 4090's 1 TB/s of GDDR6X bandwidth.
* Integrates into PCIe 5.0 x16 slots via 200 auxiliary pins, or uses a ZIF socket and bypasses PCIe entirely.

'''Protocol:'''
* '''Direct RAM mapping:''' the GPU sees system RAM as a contiguous VRAM space.
* '''GPU-direct access:''' bypasses the CPU and PCIe protocol overhead. The GPU's memory controller communicates with DDR5 directly over HyperPath, treating system RAM as its own memory pool.
* '''Burst mode:''' aggregates small GPU requests into large 512 B packets to maximize bus efficiency.
* '''Priority-based arbitration:''' critical GPU requests (e.g., texture fetches) override CPU tasks to minimize latency.
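The physical-layer arithmetic above can be sketched as a one-line formula. Note that PAM-3 nominally carries log2(3) ≈ 1.58 bits per symbol, with practical encodings packing 3 bits into 2 symbols (1.5 bits/symbol); the function and both worked figures below are illustrative, not a specification:

```python
def bus_bandwidth_tb_s(lanes: int, rate_gt_s: float, bits_per_transfer: float) -> float:
    """Aggregate bus bandwidth in TB/s: lanes x transfer rate x bits, in bytes."""
    return lanes * rate_gt_s * bits_per_transfer / 8 / 1000

# Practical PAM-3 encoding: 1.5 bits per symbol
print(bus_bandwidth_tb_s(384, 24, 1.5))  # 1.728 TB/s

# The text's optimistic 3-bits-per-cycle figure, for comparison
print(bus_bandwidth_tb_s(384, 24, 3.0))  # 3.456 TB/s
```

Even at the conservative 1.5 bits per symbol, the proposed bus would exceed the RTX 4090's ~1 TB/s aggregate GDDR6X bandwidth.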
!!Advanced Buffering for Low Latency

* Caching
* Lossless memory compression
* A distributed training and inference cache
* Layer-by-layer inference in the style of AirLLM becomes practical: https://github.com/lyogavin/airllm

!!GPU Card Redesign: "VRAM-Less" Architecture

The GPU card becomes a pure processor:

* It could still utilize an on-die L1 cache.
* '''No GDDR chips:''' all VRAM is removed, reducing PCB complexity, cost, and power draw.
* '''HyperPath PHY layer:''' a dedicated chip on the GPU converts memory requests into HyperPath signals.
* '''Unified Memory Controller (UMC):''' maps system RAM addresses directly into GPU memory space.

'''Could we see a desktop PC with 1.5 TB of RAM running the 671B-parameter DeepSeek R1 natively?'''
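As a rough check on that closing question, the weight storage for a 671B-parameter model at common precisions (a sketch only; a real deployment also needs headroom for the KV cache and activations):

```python
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate model weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

for label, bpp in [("FP16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
    gb = weights_gb(671, bpp)
    print(f"DeepSeek R1 671B @ {label}: {gb:7.1f} GB -> fits in 1.5 TB: {gb < 1500}")
```

FP16 weights alone (~1.34 TB) just squeeze into 1.5 TB with little room to spare; FP8 or 4-bit quantization leaves comfortable headroom, so the answer is plausibly yes.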