RAM Disk to speed up inference
Experiment: use a RAM disk to speed up LLM inference.
Does not work.
Inference is bound by CPU/GPU compute and memory bandwidth, not disk I/O; a RAM disk only speeds up the initial load of the model weights into memory, after which disk speed is irrelevant.
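A minimal sketch of the experiment, assuming a Linux host (tmpfs mounted at /dev/shm) and llama-cpp-python as the runtime; the note names neither, and the model paths are hypothetical:

```python
# Sketch: stage the model on a RAM disk, then load and generate.
# Assumptions (not in the original note): Linux tmpfs at /dev/shm,
# llama-cpp-python as the runtime, hypothetical GGUF paths.
import shutil
import time

from llama_cpp import Llama

SRC = "/models/model-14b-q8_0.gguf"   # weights on normal disk (hypothetical path)
DST = "/dev/shm/model-14b-q8_0.gguf"  # /dev/shm is RAM-backed tmpfs on Linux

shutil.copy(SRC, DST)  # stage the weights in RAM

t0 = time.perf_counter()
llm = Llama(model_path=DST, n_ctx=2048, verbose=False)
print(f"load: {time.perf_counter() - t0:.1f}s")  # the only step a RAM disk can improve

# Token generation is compute/bandwidth bound, so throughput is
# unchanged versus loading the same file from disk.
out = llm("Hello", max_tokens=64)
print(out["choices"][0]["text"])
```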
Throughput was about 1 token per second, and longer generations are even slower. A single inference comes back in about 1-2 minutes; for example, I send "Hello" and the LLM returns the finished response about a minute later. Too slow.
Tested with a 14B model at Q8 quantization.
Target: 15-30 tokens per second.
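A quick way to check throughput against that target, reusing the llama-cpp-python setup assumed in the sketch above (the model path is still hypothetical):

```python
# Measure generation throughput in tokens per second.
# Assumes the llama-cpp-python setup from the sketch above.
import time

from llama_cpp import Llama

llm = Llama(model_path="/dev/shm/model-14b-q8_0.gguf", n_ctx=2048, verbose=False)

t0 = time.perf_counter()
out = llm("Hello", max_tokens=256)
elapsed = time.perf_counter() - t0

tokens = out["usage"]["completion_tokens"]  # generated tokens, excluding the prompt
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")
```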