RAM Disk to Speed Up Inference
Using a RAM disk to speed up inference: this does not work. A RAM disk only accelerates disk reads, i.e. model load time; once the weights are resident in memory, generation speed is bound by CPU/GPU compute and memory bandwidth, so token throughput is unchanged. Measured roughly 1 token per second, and longer generations take proportionally longer. A single inference comes back in about one to two minutes: say "Hello", and the LLM returns the finished response a minute later. Too slow for interactive use. Tested with a 14B model at Q8 quantization. Aim for 15-30 tokens per second.
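To verify whether a change such as moving the model to a RAM disk actually helps, it is worth measuring token throughput directly rather than judging by feel. Below is a minimal Python sketch; the `generate(prompt, max_tokens)` callable is a hypothetical placeholder, not any particular runtime's API, so wrap whatever backend is in use (llama.cpp, Ollama, etc.) to match its signature.

```python
import time

def tokens_per_second(generate, prompt, max_tokens=128):
    """Time one generation and report throughput in tokens/second.

    `generate` is a hypothetical callable: it takes (prompt, max_tokens)
    and returns (text, n_tokens_generated). Adapt your runtime's API
    to this shape before measuring.
    """
    start = time.perf_counter()
    _text, n_tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    tps = n_tokens / elapsed
    print(f"{n_tokens} tokens in {elapsed:.1f} s -> {tps:.1f} tok/s")
    return tps
```

A run at about 1 tok/s, as measured above, is far below the 15-30 tok/s target, and nothing disk-related (RAM disk included) can close that gap once the model is already loaded into memory.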