Run LLM from Linux Command Line
The .gguf format is typically used with the llama.cpp project and its Python bindings, not with the Hugging Face Transformers library. To use a .gguf file, you need a different library, such as llama-cpp-python.

Install the dependencies:

{pre}
python3 -m pip install --upgrade pip
pip install transformers
pip install torch
pip install llama-cpp-python
{/pre}

Python script (save it as llm_script.py):

{pre}
from llama_cpp import Llama

model_path = "/home/x/Downloads/Lexi-Llama-3-8B-Uncensored_Q8_0.gguf"

# Load the model
llm = Llama(model_path=model_path)

# Initialize conversation history
conversation = []

print("Welcome! Type 'exit' to end the conversation.")

while True:
    # Get user input
    user_input = input("You: ").strip()

    # Check if the user wants to exit
    if user_input.lower() == 'exit':
        print("Goodbye!")
        break

    # Add user input to the conversation history
    conversation.append(f"Human: {user_input}")

    # Construct the prompt from the conversation history
    prompt = "\n".join(conversation) + "\nAI:"

    # Generate a response
    response = llm(prompt, max_tokens=200, stop=["Human:", "\n"])

    # Extract and print the response
    ai_response = response['choices'][0]['text'].strip()
    print("AI:", ai_response)

    # Add the AI response to the conversation history
    conversation.append(f"AI: {ai_response}")
{/pre}

Run it:

{pre}
python llm_script.py
{/pre}

To load a Hugging Face Transformers model directly instead:

{pre}
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "internlm/internlm2-chat-7b"  # Hugging Face model ID

# InternLM2 ships custom modelling code, so trust_remote_code is required
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Move the model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Initialize conversation history
conversation = []

print("Welcome! Type 'exit' to end the conversation.")

while True:
    # Get user input
    user_input = input("You: ").strip()

    # Check if the user wants to exit
    if user_input.lower() == 'exit':
        print("Goodbye!")
        break

    # Add user input to the conversation history
    conversation.append(f"Human: {user_input}")

    # Construct the prompt from the conversation history
    prompt = "\n".join(conversation) + "\nAI:"

    # Tokenize the input
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Generate a response
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=200,
            num_return_sequences=1,
            do_sample=True,
            temperature=0.7,
            top_p=0.95,
            no_repeat_ngram_size=3,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens (more reliable than slicing
    # the decoded string by the prompt length)
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    ai_response = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
    print("AI:", ai_response)

    # Add the AI response to the conversation history
    conversation.append(f"AI: {ai_response}")
{/pre}

Possible improvements:
* Memory management: as the conversation grows, implement a sliding window or summarization technique to keep the context within the model's token limit (see the sketch after this list).
* Error handling: add try-except blocks to handle potential errors, especially for long-running sessions.
* Saving conversations: save the conversation to a file for later review.
* Model parameters: experiment with different values for temperature, top_p, and max_new_tokens to find the best balance of coherence and creativity.
* Prompt engineering: refine the prompt structure to potentially improve the model's responses. For example, include a system message at the beginning of each prompt to set the AI's behavior.
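A minimal sketch of the memory-management and conversation-saving ideas above. The helper names (trim_conversation, save_conversation), the MAX_TURNS value, and the log file path are illustrative assumptions, not part of either library; adjust them to your model's context size.

{pre}
# Illustrative sketch only: keep the prompt inside the model's context
# window by sending just the most recent turns, and log the full history.
MAX_TURNS = 20  # assumed limit; one turn = one "Human:" or "AI:" line

def trim_conversation(conversation, max_turns=MAX_TURNS):
    """Return only the last max_turns entries of the history."""
    return conversation[-max_turns:]

def save_conversation(conversation, path="conversation_log.txt"):
    """Append the current history to a plain-text log for later review."""
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n".join(conversation) + "\n---\n")

# Inside either chat loop, build the prompt from the trimmed history:
#     prompt = "\n".join(trim_conversation(conversation)) + "\nAI:"
# and call save_conversation(conversation) after each exchange.
{/pre}

Trimming by whole turns is crude; counting tokens with the model's tokenizer, or summarizing older turns, keeps more useful context within the same limit.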