aitorrent/Meta-Llama-3-70B-Instruct-GGUF-torrent
Llama.cpp Imatrix Quantizations of Meta-Llama-3-70B-Instruct
This repository provides GGUF quantizations of the Meta-Llama-3-70B-Instruct model, produced with llama.cpp release b2777 using the imatrix (importance matrix) option and a calibration dataset provided by Kalomaze.
Original Model:
https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
Quantization Options:
We offer a range of quantization options to suit different use cases and hardware configurations. The file sizes and descriptions are as follows:
- Q8_0: Extremely high quality, generally unneeded but max available quant (74.97GB)
- Q6_K: Very high quality, near perfect, recommended (57.88GB)
- Q5_K_M: High quality, recommended (49.94GB)
- Q5_K_S: High quality, recommended (48.65GB)
- Q4_K_M: Good quality, uses about 4.83 bits per weight, recommended (42.52GB)
- Q4_K_S: Slightly lower quality with more space savings, recommended (40.34GB)
- IQ4_NL: Decent quality, slightly smaller than Q4_K_S with similar performance, recommended (40.05GB)
- IQ4_XS: Decent quality, smaller than Q4_K_S with similar performance, recommended (37.90GB)
- Q3_K_L: Lower quality but usable, good for low RAM availability (37.14GB)
- Q3_K_M: Even lower quality (34.26GB)
- IQ3_M: Medium-low quality, new method with decent performance comparable to Q3_K_M (31.93GB)
- IQ3_S: Lower quality, new method with decent performance; recommended over Q3_K_S: same size, better performance (30.91GB)
- Q3_K_S: Low quality, not recommended (30.91GB)
- IQ3_XS: Lower quality, new method with decent performance, slightly better than Q3_K_S (29.30GB)
- IQ3_XXS: Lower quality, new method with decent performance, comparable to Q3 quants (27.46GB)
- Q2_K: Very low quality but surprisingly usable (26.37GB)
- IQ2_M: Very low quality, uses SOTA techniques to also be surprisingly usable (24.11GB)
- IQ2_S: Very low quality, uses SOTA techniques to be usable (22.24GB)
- IQ2_XS: Very low quality, uses SOTA techniques to be usable (21.14GB)
- IQ2_XXS: Very low quality, uses SOTA techniques to be usable (19.09GB)
- IQ1_M: Extremely low quality, not recommended (16.75GB)
- IQ1_S: Extremely low quality, not recommended (15.34GB)
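The quality/size trade-off in the list above tracks the effective bits per weight, and the ~4.83 bpw quoted for Q4_K_M can be roughly recovered from its file size. A minimal sketch, assuming an approximate parameter count of 70.6B for Llama-3-70B and decimal gigabytes (both assumptions, not figures stated in this repo):

```python
# Rough bits-per-weight estimate for a GGUF quant: file size in bits
# divided by parameter count. Real files mix quant types across tensors
# (some layers stay at higher precision), so this is an approximation.
PARAMS = 70.6e9  # approximate Llama-3-70B parameter count (assumption)

def bits_per_weight(file_size_gb: float) -> float:
    return file_size_gb * 1e9 * 8 / PARAMS

print(f"Q4_K_M: ~{bits_per_weight(42.52):.2f} bpw")  # close to the ~4.83 quoted above
print(f"Q8_0:   ~{bits_per_weight(74.97):.2f} bpw")
```

The same arithmetic applied to any row of the list gives a quick sense of how aggressively a given quant compresses the weights.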
Downloading using Hugging Face CLI:
To download a specific file, first make sure the CLI is installed (pip install -U "huggingface_hub[cli]"), then run:
huggingface-cli download bartowski/Meta-Llama-3-70B-Instruct-GGUF --include "Meta-Llama-3-70B-Instruct-Q4_K_M.gguf" --local-dir ./ --local-dir-use-symlinks False
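The same file can also be fetched from Python with the huggingface_hub library. A sketch mirroring the CLI command above; the small gguf_filename helper is a hypothetical convenience for picking any quant from the list, not part of the repo:

```python
REPO_ID = "bartowski/Meta-Llama-3-70B-Instruct-GGUF"

def gguf_filename(quant: str) -> str:
    # Build the file name for one of the quants listed above, e.g. "Q4_K_M".
    return f"Meta-Llama-3-70B-Instruct-{quant}.gguf"

if __name__ == "__main__":
    # Imported lazily so the helper works without the dependency installed.
    from huggingface_hub import hf_hub_download

    # Downloads ~42.5GB into ./ -- requires network access and disk space.
    path = hf_hub_download(repo_id=REPO_ID,
                           filename=gguf_filename("Q4_K_M"),
                           local_dir="./")
    print(path)
```

Swap the quant string to grab a different file; the naming pattern is uniform across the list.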
Choosing the Right Quantization:
To determine which quantization to use, consider the following factors:
- Model size: Choose a quantization that fits within your available RAM and/or VRAM.
- Speed vs. quality: If you want the absolute maximum quality, choose a larger quantization. If you want faster performance, choose a smaller quantization.
- Hardware compatibility: K-quants (e.g. Q4_K_M) run well on all backends. I-quants (e.g. IQ3_M) give better quality per gigabyte but are slower on CPU and Apple Metal and are not compatible with the Vulkan backend; they are best suited to full GPU offload on Nvidia (cuBLAS) or AMD (rocBLAS).
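The first factor above can be sketched as a quick filter over the size table (file sizes copied from the list in this README; the default 2GB of headroom for KV cache and runtime overhead is a rough assumption, not a stated figure):

```python
# Quant sizes in GB, as listed in this README.
QUANT_SIZES_GB = {
    "Q8_0": 74.97, "Q6_K": 57.88, "Q5_K_M": 49.94, "Q5_K_S": 48.65,
    "Q4_K_M": 42.52, "Q4_K_S": 40.34, "IQ4_NL": 40.05, "IQ4_XS": 37.90,
    "Q3_K_L": 37.14, "Q3_K_M": 34.26, "IQ3_M": 31.93, "IQ3_S": 30.91,
    "Q3_K_S": 30.91, "IQ3_XS": 29.30, "IQ3_XXS": 27.46, "Q2_K": 26.37,
    "IQ2_M": 24.11, "IQ2_S": 22.24, "IQ2_XS": 21.14, "IQ2_XXS": 19.09,
    "IQ1_M": 16.75, "IQ1_S": 15.34,
}

def quants_that_fit(memory_gb: float, headroom_gb: float = 2.0) -> list[str]:
    # Keep quants whose file fits in the available RAM/VRAM budget,
    # leaving headroom for context and runtime overhead (assumption).
    budget = memory_gb - headroom_gb
    return [q for q, size in QUANT_SIZES_GB.items() if size <= budget]

print(quants_that_fit(48.0))  # e.g. a 2x24GB GPU setup
```

Since the table is ordered largest-to-smallest, the first entry returned is the highest-quality quant that fits.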
For more information on choosing the right quantization, refer to the original description.