aitorrent/Meta-Llama-3-70B-Instruct-abliterated-v3.5-GGUF-torrent


This model card introduces Llama-3-70B-Instruct-abliterated-v3.5, an uncensored language model based on Meta-Llama-3-70B-Instruct. The model has been modified using a refined methodology inspired by the paper/blog post 'Refusal in LLMs is mediated by a single direction.' Its refusal feature has been ablated through orthogonalization, yielding an uncensored model with minimal changes to its original behavior.

Key Features:

  1. Uncensored: The model has had its refusal feature orthogonalized out, making it less likely to refuse user requests or lecture about ethics/safety.
  2. Single-layer modification: Only one layer has been modified, reducing moralizing disclaimers and improving the model's performance.
  3. Improved tokenizer: The tokenizer has been fixed, enhancing the model's overall functionality.
  4. Surgical approach: Orthogonalization is a more surgical method for inducing or removing specific features, requiring less data than fine-tuning.
  5. Knowledge preservation: Ablation keeps the original model's knowledge and training intact, while removing undesirable behaviors.

Methodology:

The Llama-3-70B-Instruct-abliterated-v3.5 model was created using orthogonalization, a technique that inhibits the model's ability to express refusal. The methodology combines ablation, the removal of a specific feature from the model, with orthogonalization, which projects the weights so that they are perpendicular to the direction representing that feature.
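As a concrete illustration of what "orthogonalizing" means here, the minimal sketch below (not taken from the released cookbook) removes a chosen direction from a vector by subtracting its projection onto that direction; what remains is perpendicular to the unwanted feature.

```python
import torch

def remove_direction(v: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Return the component of `v` orthogonal to `direction`."""
    d = direction / direction.norm()   # unit vector along the unwanted feature
    return v - (v @ d) * d             # subtract v's projection onto d

v = torch.tensor([3.0, 4.0])
d = torch.tensor([1.0, 0.0])
v_ablated = remove_direction(v, d)
print(v_ablated)                # tensor([0., 4.])
print(torch.dot(v_ablated, d))  # tensor(0.) -- nothing left along the ablated direction
```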

The model's refusal feature was ablated by contrasting the model's behavior under a chosen system prompt with its behavior under a blank system prompt on the same dataset, and then orthogonalizing the final model weights toward the desired behavior. This surgical approach allows more targeted behavior changes with far fewer samples than fine-tuning requires.
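Putting the two steps together, the sketch below shows how such a procedure could look in code: the refusal direction is estimated as the mean activation difference between the two prompting conditions, and a single layer's weight matrix is then orthogonalized against it. The function names, tensor shapes, and the choice of which weight matrix to modify are illustrative assumptions, not the model author's actual implementation (see the cookbook linked below for that).

```python
import torch

def estimate_refusal_direction(acts_with_prompt: torch.Tensor,
                               acts_blank_prompt: torch.Tensor) -> torch.Tensor:
    """Mean difference of residual-stream activations between the two
    prompting conditions, normalized to a unit vector (hypothetical helper)."""
    direction = acts_with_prompt.mean(dim=0) - acts_blank_prompt.mean(dim=0)
    return direction / direction.norm()

def orthogonalize_weight(W: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `W`'s output that writes into `direction`,
    so the layer can no longer express that feature.

    Assumes `W` maps into the residual stream along its first dimension
    (d_model x d_in), as in typical decoder-only transformer weights.
    """
    d = direction / direction.norm()
    # Subtract the outer-product projection: W <- W - d (d^T W)
    return W - torch.outer(d, d @ W)

# Illustrative usage on random tensors (stand-ins for real activations/weights).
d_model, d_in, n_samples = 16, 8, 32
acts_a = torch.randn(n_samples, d_model)   # activations with the system prompt
acts_b = torch.randn(n_samples, d_model)   # activations with a blank system prompt
refusal_dir = estimate_refusal_direction(acts_a, acts_b)

W = torch.randn(d_model, d_in)             # weight matrix of the single targeted layer
W_ablated = orthogonalize_weight(W, refusal_dir)
# The ablated weights have (numerically) zero component along the direction.
print(torch.allclose(refusal_dir @ W_ablated, torch.zeros(d_in), atol=1e-5))
```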

Quirkiness Awareness:

The model may exhibit interesting quirks due to the new methodology. Users are encouraged to share their experiences and observations in the community tab to help understand the side effects of orthogonalization. Researchers and developers are welcome to explore and improve the model, as well as share their findings and advancements in the field.

Original Work Credits:

The Jupyter "cookbook" to replicate the methodology can be found here.

The personal library of code used (WIP) can be found here.

The model is based on Meta-Llama-3-70B-Instruct with orthogonalized bfloat16 safetensor weights.

The original paper/blog post 'Refusal in LLMs is mediated by a single direction' can be found here.

For further information and discussions, join the Cognitive Computations Discord community or post in the Community tab.