Run LLM from Linux Command Line


The .gguf format is typically used with the llama.cpp project and its Python bindings, not with the Hugging Face Transformers library. To use a .gguf file, you need to use a different library, such as llama-cpp-python.
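(Optional sanity check: every GGUF file begins with the four-byte magic "GGUF", so you can verify a download before trying to load it.)

with open("/path/to/Lexi-Llama-3-8B-Uncensored_Q8_0.gguf", "rb") as f:
    assert f.read(4) == b"GGUF", "not a GGUF file"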

Install the required packages (llama-cpp-python for the GGUF example, transformers and torch for the Hugging Face example further down):

python3 -m pip install --upgrade pip

pip install transformers

pip install torch

pip install llama-cpp-python
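To confirm the packages import cleanly before writing any code:

python3 -c "import llama_cpp, transformers, torch; print(llama_cpp.__version__, transformers.__version__, torch.__version__)"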

Python script (llm_script.py, using llama-cpp-python):

from llama_cpp import Llama

model_path = "/path/to/Lexi-Llama-3-8B-Uncensored_Q8_0.gguf"
prompt = "What is the meaning of life?"

# Load the model
llm = Llama(model_path=model_path)

# Generate a response
response = llm(prompt, max_tokens=256)

# Print the generated text
print(response['choices'][0]['text'])
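
The constructor and the call accept tuning parameters as well; a sketch with illustrative values (n_gpu_layers only takes effect in a GPU-enabled build of llama-cpp-python):

llm = Llama(
    model_path=model_path,
    n_ctx=4096,       # context window size in tokens
    n_gpu_layers=-1,  # offload all layers to the GPU
)
response = llm(prompt, max_tokens=256, temperature=0.7)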

Execute:

python3 llm_script.py

To load a Hugging Face Transformers model directly:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "internlm/internlm2-chat-7b"  # Hugging Face model ID

# internlm2 ships custom modeling code, so trust_remote_code=True is needed
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

prompt = "What is the meaning of life?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
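
internlm2-chat-7b is a chat-tuned model, so wrapping the prompt in the tokenizer's chat template usually gives better answers than a raw string; a minimal sketch, assuming the repository defines a chat template (chat models on the Hub typically do):

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=100)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))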
