aitorrent/Phi-3-mini-128k-instruct-abliterated-v3-GGUF-torrent

Phi-3-mini-128k-instruct-abliterated-v3 (GGUF & Quants) by Cognitive Computations

Cognitive Computations' "cookbook" to replicate the methodology can be found here, and a refined library is coming soon.

This model is microsoft/Phi-3-mini-128k-instruct with orthogonalized bfloat16 safetensor weights, generated with a refined methodology based on the preview paper/blog post 'Refusal in LLMs is mediated by a single direction', which we encourage you to read for more background.

This model has had certain weights manipulated to "inhibit" its ability to express refusal. That is not a guarantee: it may still refuse you, misunderstand your request, or lecture you about ethics/safety. In all other respects it is tuned the same as the original instruct model, just with the strongest refusal directions orthogonalized out.

In short, it's uncensored in the purest form we can manage -- no new or changed behavior in any other respect from the original model.

The term "abliterated" is a play on words on "ablation", the term the original paper uses for removing features: ablate + obliterated = abliterated. Both refer to the same thing; the technique by which the refusal feature was "ablated" from this model was orthogonalization.
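For intuition, the orthogonalization itself is a simple projection: a weight matrix that writes into the residual stream is rewritten so that its output can no longer have a component along the refusal direction. A minimal PyTorch sketch, assuming a refusal direction has already been extracted (the function name and shapes are illustrative, not taken from the cookbook):

```python
import torch

def ablate_direction(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Orthogonalize a weight matrix against a single direction.

    W writes into the residual stream (shape: d_model x d_in);
    refusal_dir lives in the residual stream (shape: d_model).
    Returns W' = (I - r r^T) W, whose outputs are orthogonal to r.
    """
    r = refusal_dir / refusal_dir.norm()   # ensure r is unit-norm
    return W - torch.outer(r, r @ W)       # subtract the projection onto r
```

In the paper's formulation, this projection is applied to each matrix that writes into the residual stream (e.g. the embedding, attention output, and MLP output matrices), and that is the entire weight edit.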

Ablation and augmentation are techniques to remove or induce very specific features that you would otherwise spend far too many tokens discouraging or encouraging in your system prompt. Instead, you run the ablation script with your system prompt against a blank system prompt on the same dataset, and orthogonalize for the desired behavior in the final model weights.
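As a rough sketch of that direction-finding step: collect residual-stream activations for the same dataset under both prompt conditions, then take the normalized difference of means as the feature direction. The layer/position choice, names, and shapes below are illustrative assumptions, not the cookbook's actual code:

```python
import torch

def difference_of_means(acts_prompted: torch.Tensor,
                        acts_blank: torch.Tensor) -> torch.Tensor:
    """Candidate feature direction from two activation sets.

    acts_prompted / acts_blank: residual-stream activations at a chosen
    layer and token position, gathered by running the same dataset with
    your system prompt vs. a blank one; each of shape (n_samples, d_model).
    """
    direction = acts_prompted.mean(dim=0) - acts_blank.mean(dim=0)
    return direction / direction.norm()   # unit direction to ablate (or add)
```

That unit direction is what gets orthogonalized out of the weights as sketched above; for augmentation, the direction would instead be added back in rather than removed.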

Why ablation over fine-tuning? Ablation is far more surgical and can be executed effectively with much less data than fine-tuning. It also keeps as much of the original model's knowledge and training intact as possible, while removing its tendency to behave in one very specific undesirable manner.

Fine-tuning is still exceptionally useful and the go-to for broad behavior changes; however, you may be able to get close to your desired behavior with very few samples using ablation/augmentation. It may also be a useful step in model refinement: orthogonalize -> fine-tune, or vice versa.

We haven't really gotten around to exploring this model stacked with fine-tuning, and we encourage others to give it a shot if they've got the capacity.

This model is called V3 because we previously released a V2 abliterated model for Meta-Llama-3-8B under Cognitive Computations. The V2 methodology did not prove worth applying to larger models, so we refined it before spending compute cycles on what might not even be a better model. We are, however, quite pleased with this latest methodology: it seems to have induced fewer hallucinations.

This model may come with interesting quirks, as the methodology is so new. We encourage you to play with it and post any quirks you notice in the community tab; that will help us better understand the side effects of this orthogonalization.

If you manage to develop further improvements, please share! This is really the most basic way to use ablation, but there are other possibilities that we believe are as yet unexplored. Additionally, feel free to reach out about this in any way: we're on the Cognitive Computations Discord, and we're watching the Community tab. We'd love to see this methodology used in other ways, and will gladly support whoever we can, whenever we can.