aitorrent/dolphin-2.9.2-Phi-3-Medium-abliterated-GGUF-torrent
About
This repository contains static and weighted/imatrix quantized versions of the Dolphin-2.9.2-Phi-3-Medium-abliterated model in GGUF format. Quantization compresses the model, reducing file size and memory requirements at a modest cost in quality, which is what makes models of this scale practical to run on resource-constrained hardware.
The original model is available at https://huggingface.co/cognitivecomputations/dolphin-2.9.2-Phi-3-Medium-abliterated.
Weighted/imatrix quantized models are available at https://huggingface.co/mradermacher/dolphin-2.9.2-Phi-3-Medium-abliterated-i1-GGUF.
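If you would rather fetch a single quant programmatically than via torrent, the huggingface_hub client can download one file from the weighted repository above. A minimal sketch, assuming huggingface_hub is installed; the filename follows that repository's usual naming convention and is an assumption here, so verify it against the repository's file listing first:

```python
from huggingface_hub import hf_hub_download

# Filename is assumed from the repository's naming convention; check the
# actual file list on Hugging Face before downloading.
path = hf_hub_download(
    repo_id="mradermacher/dolphin-2.9.2-Phi-3-Medium-abliterated-i1-GGUF",
    filename="dolphin-2.9.2-Phi-3-Medium-abliterated.i1-Q4_K_M.gguf",
)
print(path)  # local cache path of the downloaded model file
```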
Usage
If you are new to using GGUF files, refer to TheBloke's READMEs for detailed instructions on how to use them, including how to concatenate multi-part files.
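For the multi-part case specifically, the parts only need to be joined byte-for-byte in order. A minimal Python sketch, assuming a part naming scheme like *.gguf.part1of2 (an assumption; verify against the actual filenames you downloaded):

```python
from pathlib import Path
import shutil

# Assumed part naming scheme; adjust the pattern to match your files.
parts = sorted(Path(".").glob(
    "dolphin-2.9.2-Phi-3-Medium-abliterated.Q8_0.gguf.part*"))

with open("dolphin-2.9.2-Phi-3-Medium-abliterated.Q8_0.gguf", "wb") as out:
    for part in parts:
        with part.open("rb") as src:
            shutil.copyfileobj(src, out)  # append each part in order
```

On Unix-like systems this is equivalent to `cat model.gguf.part* > model.gguf`.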
Provided Quants
The provided quants are sorted by size, which does not necessarily reflect quality. IQ-quants (llama.cpp's i-quant types, such as IQ3_S) are often preferable to similar-sized non-IQ quants, as they tend to deliver better quality at the same size; a sketch for picking a quant that fits your memory budget follows the table.
| Link | Type | Size/GB | Notes |
|:-----|:-------|--------:|:------------------|
| GGUF | Q2_K   | 5.3  | |
| GGUF | IQ3_XS | 5.9  | |
| GGUF | Q3_K_S | 6.2  | |
| GGUF | IQ3_S  | 6.2  | beats Q3_K* |
| GGUF | IQ3_M  | 6.4  | |
| GGUF | Q3_K_M | 6.9  | lower quality |
| GGUF | Q3_K_L | 7.4  | |
| GGUF | IQ4_XS | 7.7  | |
| GGUF | Q4_K_S | 8.1  | fast, recommended |
| GGUF | Q4_K_M | 8.5  | fast, recommended |
| GGUF | Q5_K_S | 9.7  | |
| GGUF | Q5_K_M | 10.0 | |
| GGUF | Q6_K   | 11.6 | very good quality |
| GGUF | Q8_0   | 14.9 | fast, best quality |
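A rough rule of thumb is that a GGUF model needs about its file size in RAM or VRAM, plus some headroom for the context. The sketch below picks the largest quant from the table that fits a given budget and loads it with llama-cpp-python; the budget, the 1.5 GB overhead figure, and the local filename are assumptions, not measurements:

```python
from llama_cpp import Llama

# Sizes in GB, taken from the table above (sorted ascending by size).
quants = [("Q2_K", 5.3), ("IQ3_XS", 5.9), ("Q3_K_S", 6.2), ("IQ3_S", 6.2),
          ("IQ3_M", 6.4), ("Q3_K_M", 6.9), ("Q3_K_L", 7.4), ("IQ4_XS", 7.7),
          ("Q4_K_S", 8.1), ("Q4_K_M", 8.5), ("Q5_K_S", 9.7), ("Q5_K_M", 10.0),
          ("Q6_K", 11.6), ("Q8_0", 14.9)]

budget_gb = 12.0    # adjust to your available RAM/VRAM
overhead_gb = 1.5   # assumed headroom for context; not a measured figure

fitting = [name for name, size in quants if size + overhead_gb <= budget_gb]
if not fitting:
    raise SystemExit("No quant fits the given memory budget")
choice = fitting[-1]  # list is sorted by size, so the last fit is the largest
print(f"Largest quant within budget: {choice}")

# Hypothetical local filename; adjust to the file you actually downloaded.
llm = Llama(
    model_path=f"dolphin-2.9.2-Phi-3-Medium-abliterated.{choice}.gguf",
    n_ctx=4096,
)
out = llm("Why is the sky blue?", max_tokens=128)
print(out["choices"][0]["text"])
```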
ikawrakow has published a graph comparing lower-quality quant types, where lower values indicate better quality.
Artefact2's thoughts on model quantization can be found at https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9.