HF to GGUF
Convert model to HF format

{pre}
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the trained model and tokenizer
model = GPT2LMHeadModel.from_pretrained("./trained_model")
tokenizer = GPT2Tokenizer.from_pretrained("./trained_model")

# Define the directory to save the model and tokenizer
save_directory = "./trained_model/hf_model"

# Save the model and tokenizer
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)

# Load the model and tokenizer from the saved directory
loaded_model = GPT2LMHeadModel.from_pretrained(save_directory)
loaded_tokenizer = GPT2Tokenizer.from_pretrained(save_directory)

# Test the loaded model
input_ids = loaded_tokenizer.encode("Hello, how are you?", return_tensors="pt")
outputs = loaded_model.generate(input_ids)
print(loaded_tokenizer.decode(outputs[0], skip_special_tokens=True))
{/pre}

Convert model to GGUF

Source: https://www.substratus.ai/blog/converting-hf-model-gguf-model/

Downloading a HuggingFace model

There are various ways to download models, but in my experience the huggingface_hub library has been the most reliable. The git clone method occasionally results in OOM errors for large models.

Install the huggingface_hub library:
{pre}
pip install huggingface_hub
{/pre}

Create a Python script named download.py with the following content:
{pre}
from huggingface_hub import snapshot_download

model_id = "lmsys/vicuna-13b-v1.5"
snapshot_download(
    repo_id=model_id,
    local_dir="vicuna-hf",
    local_dir_use_symlinks=False,
    revision="main",
)
{/pre}

Run the Python script:
{pre}
python download.py
{/pre}

You should now have the model downloaded to a directory called vicuna-hf. Verify by running:
{pre}
ls -lash vicuna-hf
{/pre}

Converting the model

Now convert the downloaded HuggingFace model to a GGUF model. llama.cpp comes with a converter script to do this. Get the script by cloning the llama.cpp repo:
{pre}
git clone https://github.com/ggerganov/llama.cpp.git
{/pre}

Install the required Python libraries:
{pre}
pip install -r llama.cpp/requirements.txt
{/pre}

Verify the script is there and review the available options:
{pre}
python llama.cpp/convert.py -h
{/pre}

Convert the HF model to a GGUF model:
{pre}
python llama.cpp/convert.py vicuna-hf \
  --outfile vicuna-13b-v1.5.gguf \
  --outtype q8_0
{/pre}

In this case we are also quantizing the model to 8 bit by setting --outtype q8_0. Quantizing helps improve inference speed, but it can negatively impact quality. You can use --outtype f16 (16 bit) or --outtype f32 (32 bit) to preserve the original quality.

Verify the GGUF model was created:
{pre}
ls -lash vicuna-13b-v1.5.gguf
{/pre}

Pushing the GGUF model to HuggingFace

You can optionally push the GGUF model back to HuggingFace. Create a Python script named upload.py with the following content:
{pre}
from huggingface_hub import HfApi

api = HfApi()
model_id = "substratusai/vicuna-13b-v1.5-gguf"
api.create_repo(model_id, exist_ok=True, repo_type="model")
api.upload_file(
    path_or_fileobj="vicuna-13b-v1.5.gguf",
    path_in_repo="vicuna-13b-v1.5.gguf",
    repo_id=model_id,
)
{/pre}

Get a HuggingFace token that has write permission from: https://huggingface.co/settings/tokens

Set your HuggingFace token:
{pre}
export HUGGING_FACE_HUB_TOKEN=<paste-your-own-token>
{/pre}

Run the upload.py script:
{pre}
python upload.py
{/pre}
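As an optional sanity check (not part of the original guide), you can try running the converted GGUF file with llama.cpp itself. The sketch below assumes the llama.cpp checkout from the conversion step; the build command and binary name vary between llama.cpp versions (older checkouts build with plain make and produce ./main, newer ones use CMake and produce llama-cli), so adjust the paths to match your checkout.
{pre}
# Build llama.cpp (older checkouts: plain make; newer checkouts: CMake)
cd llama.cpp && make && cd ..

# Run a short generation against the converted model.
# On newer llama.cpp versions the binary may be ./llama.cpp/build/bin/llama-cli instead of ./llama.cpp/main.
./llama.cpp/main -m vicuna-13b-v1.5.gguf -p "Hello, how are you?" -n 64
{/pre}
If the model loads and produces coherent text, the conversion worked and the GGUF file is ready to upload.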