Run LLM from Linux Command Line
The .gguf format is used by the llama.cpp project and its Python bindings, not by the Hugging Face Transformers library. To load a .gguf file from Python, use a library such as llama-cpp-python.
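If you have built llama.cpp itself, you can also run a .gguf model straight from the shell with its llama-cli binary; a minimal sketch, assuming a recent standard build (older builds name the binary main):
./llama-cli -m /path/to/Lexi-Llama-3-8B-Uncensored_Q8_0.gguf -p "What is the meaning of life?" -n 100
For scripted use from Python, install the dependencies: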
python3 -m pip install --upgrade pip
pip install transformers
pip install torch
pip install llama-cpp-python
Python script:
from llama_cpp import Llama

model_path = "/path/to/Lexi-Llama-3-8B-Uncensored_Q8_0.gguf"
prompt = "What is the meaning of life?"

# Load the model
llm = Llama(model_path=model_path)

# Generate a response (the default max_tokens is small, so set it explicitly)
response = llm(prompt, max_tokens=100)

# Print the response
print(response["choices"][0]["text"])
Execute:
python llm_script.py
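llama-cpp-python also exposes an OpenAI-style chat API that applies a chat template for you, which suits instruction-tuned models better than raw completion; a minimal sketch (the sampling values are illustrative):
from llama_cpp import Llama

llm = Llama(model_path="/path/to/Lexi-Llama-3-8B-Uncensored_Q8_0.gguf")

# Chat-style call; a chat template is applied to the messages automatically
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    max_tokens=100,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])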
To load a Hugging Face Transformers model directly from the Hub instead:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "internlm/internlm2-chat-7b" # This is the Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
prompt = "What is the meaning of life?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
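By default generate() decodes greedily, so repeated runs give the same answer. To get varied responses, enable sampling; the parameter values below are illustrative:
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.7,   # lower values make output more deterministic
    top_p=0.9,         # nucleus sampling cutoff
)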