bartowski/MadWizard-SFT-v2-Mistral-7b-v0.3-GGUF-torrent


This is a collection of quantized versions of the MadWizard-SFT-v2-Mistral-7b-v0.3 language model, optimized for different hardware configurations and performance requirements. The models are based on the Lumpen1/MadWizard-SFT-v2-Mistral-7b-v0.3 model and have been quantized using the imatrix option in llama.cpp release b3152.

The models are available in a range of sizes and quantization levels, from Q8_1 at 7.95GB down to IQ2_XS at 2.20GB. Quantization levels are named QX_K_X or IQX_X, where the leading number roughly indicates the bits per weight, K marks a K-quant (the older format, compatible with all hardware), and I marks an I-quant (a newer format that is generally better for its size, but not compatible with Vulkan).

To download a specific model, first make sure huggingface-cli is installed (pip install -U "huggingface_hub[cli]"), then run:

huggingface-cli download bartowski/MadWizard-SFT-v2-Mistral-7b-v0.3-GGUF --include "MadWizard-SFT-v2-Mistral-7b-v0.3-Q4_K_M.gguf" --local-dir ./

Replace "MadWizard-SFT-v2-Mistral-7b-v0.3-Q4_K_M.gguf" with the desired model file name.
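If you prefer to script the download from Python, the huggingface_hub library provides hf_hub_download. Here is a minimal sketch; the repo and file names match the CLI example above:

from huggingface_hub import hf_hub_download

# Download one quant file from the repo into the current directory.
path = hf_hub_download(
    repo_id="bartowski/MadWizard-SFT-v2-Mistral-7b-v0.3-GGUF",
    filename="MadWizard-SFT-v2-Mistral-7b-v0.3-Q4_K_M.gguf",
    local_dir="./",
)
print(path)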

To choose the appropriate model for your hardware and performance needs, consider the following:

  1. Determine your available memory: add your system RAM and GPU VRAM together to get the total available memory.
  2. Choose a model size: aim for a file size 1-2GB smaller than that total; for the fastest inference, budget against your VRAM alone so the whole model fits on the GPU (see the sketch after this list).
  3. Select a quantization level: for the best quality, pick a larger quant (e.g., Q8_1 or Q8_0); to save memory, pick a smaller quant (e.g., Q3_K_L or IQ3_M).
  4. Decide between K-quant and I-quant: if you're running cuBLAS (Nvidia) or rocBLAS (AMD), consider an I-quant for models below Q4; for CPU, Apple Metal, or Vulkan, use a K-quant.
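As a rough illustration of steps 1 and 2, here is a small Python sketch of the memory arithmetic. The 1-2GB headroom figure comes from the guidance above; the example RAM/VRAM values are placeholders to substitute with your own:

# Rough memory budget for picking a quant file size (steps 1-2 above).
ram_gb = 16.0   # example system RAM; substitute your own
vram_gb = 8.0   # example GPU VRAM; substitute your own

total_gb = ram_gb + vram_gb      # total available memory (step 1)
max_file_gb = total_gb - 2.0     # leave ~1-2GB of headroom (step 2)

print(f"Largest recommended quant file: ~{max_file_gb:.1f}GB")
# For the fastest inference, budget against VRAM alone instead:
print(f"Largest quant that fits fully on GPU: ~{vram_gb - 2.0:.1f}GB")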

For more detailed information on the quantization levels and performance comparisons, refer to the write-up by Artefact2 and the llama.cpp feature matrix.
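Once a file is downloaded, one way to try it locally is through the llama-cpp-python bindings to llama.cpp. This is a minimal sketch, assuming you have installed the bindings (pip install llama-cpp-python) and downloaded the Q4_K_M file from the example above; the prompt and token budget are arbitrary:

from llama_cpp import Llama

# Load the downloaded GGUF; n_gpu_layers=-1 offloads all layers to the GPU
# (set it to 0 for CPU-only inference).
llm = Llama(
    model_path="./MadWizard-SFT-v2-Mistral-7b-v0.3-Q4_K_M.gguf",
    n_gpu_layers=-1,
)

# Simple completion call.
out = llm("Q: What is quantization in one sentence? A:", max_tokens=64)
print(out["choices"][0]["text"])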