bartowski/Samantha-Qwen-2-7B-GGUF-torrent

Last updated on Jun 18, 2024

Llamacpp imatrix Quantizations of Samantha-Qwen-2-7B

This repository provides quantized versions of the Samantha-Qwen-2-7B model, originally from macadeliccc, produced with llama.cpp release b3152. The quantizations trade quality against file size to suit different hardware and user preferences.

All quantizations were made using the imatrix option, with the dataset linked in the original description. The model uses the following prompt format:

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
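
As an illustration of how the template above is filled in, here is a minimal shell sketch. The function name `build_prompt` and the example prompts are hypothetical and are not part of llama.cpp or this repository; the sketch only substitutes `{system_prompt}` and `{prompt}` into the format shown.

```shell
# Build a prompt in the format shown above.
# build_prompt is a hypothetical helper name used only for this example.
build_prompt() {
  system_prompt=$1
  user_prompt=$2
  # The output ends right after the assistant header, so the model
  # continues from that point with its reply.
  printf '<|im_start|>system\n%s<|im_end|>\n' "$system_prompt"
  printf '<|im_start|>user\n%s<|im_end|>\n' "$user_prompt"
  printf '<|im_start|>assistant\n'
}

# Example call with made-up prompts.
build_prompt "You are a helpful assistant." "What is a GGUF file?"
```

The resulting text could then be passed to whichever inference tool you use, provided that tool does not already apply the chat template itself.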

To download a specific quantized model file, install huggingface-cli and then run the download command for the file you want:

pip install -U "huggingface_hub[cli]"
huggingface-cli download bartowski/Samantha-Qwen-2-7B-GGUF --include "Samantha-Qwen-2-7B-Q4_K_M.gguf" --local-dir ./

Models larger than 50GB are split into multiple files. To download all of the parts to a local folder, run:

huggingface-cli download bartowski/Samantha-Qwen-2-7B-GGUF --include "Samantha-Qwen-2-7B-Q8_0.gguf/*" --local-dir Samantha-Qwen-2-7B-Q8_0

To select the appropriate model file, consider your hardware's RAM and VRAM capacity, as well as the quality and speed you want. For maximum quality, add your system RAM and your GPU's VRAM together and choose a file 1-2GB smaller than that total. For faster performance, aim to fit the entire model in your GPU's VRAM by choosing a file 1-2GB smaller than the GPU's total VRAM.
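
The sizing rule above comes down to simple arithmetic. The sketch below is one way to express it, assuming whole-gigabyte inputs and a fixed 2GB margin (the conservative end of the 1-2GB range). The function name `max_file_size_gb` and the mode labels `quality` and `speed` are hypothetical, and the RAM and VRAM figures in the example calls are made up.

```shell
# max_file_size_gb MODE RAM_GB VRAM_GB
# MODE is "quality" (RAM + VRAM) or "speed" (VRAM only).
# Prints the largest file size, in whole GB, that leaves a 2GB margin.
# The helper name and mode labels are assumptions made for this example.
max_file_size_gb() {
  mode=$1
  ram_gb=$2
  vram_gb=$3
  headroom_gb=2
  case "$mode" in
    quality) total=$((ram_gb + vram_gb)) ;;
    speed)   total=$vram_gb ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo $((total - headroom_gb))
}

# Example: 16GB of system RAM and an 8GB GPU.
max_file_size_gb quality 16 8   # prints 22
max_file_size_gb speed 16 8     # prints 6
```

Compare the printed budget with the file sizes listed for each quantization and pick the largest file that fits.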

For quantizations below Q4, consider using I-quants (IQX_X) with Nvidia's cuBLAS, AMD's rocBLAS, or Apple Metal. While these offer better performance for their size, they may be slower on CPU or with Vulkan. K-quants (QX_K_X) are a safer choice for compatibility across different hardware and inference engines.
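
The guidance above can be written as a small rule of thumb. This is only a sketch: the function name `suggest_quant_family` and the backend labels (`cublas`, `rocblas`, `metal`, `cpu`, `vulkan`) are hypothetical strings chosen for the example, not options of any real tool. It suggests I-quants only when going below Q4 on one of the three backends named above, and K-quants otherwise.

```shell
# suggest_quant_family BACKEND BELOW_Q4
# BACKEND:  cublas | rocblas | metal | cpu | vulkan (hypothetical labels)
# BELOW_Q4: yes | no
# Prints "I-quant" or "K-quant" following the rule of thumb in the text.
suggest_quant_family() {
  backend=$1
  below_q4=$2
  if [ "$below_q4" = "yes" ]; then
    case "$backend" in
      cublas|rocblas|metal) echo "I-quant"; return 0 ;;
    esac
  fi
  # Everything else falls back to the more broadly compatible choice.
  echo "K-quant"
}

suggest_quant_family cublas yes   # prints I-quant
suggest_quant_family vulkan yes   # prints K-quant
```

I-quants may still run on CPU or Vulkan; the fallback only reflects that K-quants are described as the safer choice there.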

The I-quant feature matrix provided by Artefact2 can help users decide which quantization to choose based on their requirements.

Credit: This description was based on the original description provided by the repository owner, Bartowski.