
Oyster I: Beyond Refusal - Constructive Safety Alignment for Responsible Language Models

  🤗 Hugging Face   |   🤖 ModelScope   |   📄 arXiv

Oyster Logo


🦪 Introduction

Currently, large language models (LLMs) predominantly rely on simple refusal mechanisms to prevent generating harmful content. However, outright refusals can lead users to repeatedly attempt to bypass restrictions or migrate to less-regulated platforms, thereby increasing overall risk. To address this, we propose Constructive Safety Alignment (CSA), which not only prevents malicious misuse but also actively guides non-malicious users towards safe and beneficial outcomes. This approach is implemented in Oyster-1 (Oy1). To evaluate CSA, we have developed a dedicated constructive benchmark that covers a range of risk types and user roles, simulating real-world user interactions. Oy1 achieves leading constructive alignment scores in both automated and manual evaluations, effectively rejecting adversarial queries while providing constructive guidance in complex risk scenarios.

Example Image


🧩 Constructive Safety Alignment (CSA)

The goal of CSA is to go beyond simple refusals:

  • Prevent malicious misuse
  • Guide non-malicious users towards safety and positivity

Core Technologies

  1. Game-Theoretic Interaction Modeling

    • Models the model-user interaction as a hierarchical Stackelberg game.
    • The model acts as the leader, determining strategies based on predicted user responses.
  2. Multidimensional Risk Assessment

    • Evaluates various types of risks and dynamically optimizes response strategies.
  3. Structured Reasoning Chains + Linguistic Backpropagation (Lingo-BP)

    • Explicitly decomposes into key safety decision nodes.
    • Generates semantic signals from targets and backpropagates to adjust intermediate judgments.
    • Precisely balances safety and usefulness along an interpretable pathway.
  4. Oyster I Model Training

    • Conducts preference learning based on generated safety reasoning paths.
    • Enhances the model's ability for safe and constructive interactions.
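The Stackelberg framing in (1) can be illustrated with a toy numeric example. The strategies and payoffs below are invented for illustration only (they are not from the paper): the model, as leader, commits to a response strategy, anticipates the user's (follower's) best response, and picks the strategy whose anticipated outcome scores best for safety.

```python
# Toy Stackelberg game with illustrative payoffs (not from the paper).
# Leader = model response strategy; follower = user's likely next action.
# payoffs[(leader, follower)] = (model_utility, user_utility)
payoffs = {
    ("refuse", "retry_jailbreak"):       (-2, -1),
    ("refuse", "leave_platform"):        (-3,  0),
    ("constructive", "accept_help"):     ( 3,  3),
    ("constructive", "retry_jailbreak"): (-1, -2),
}

follower_options = {
    "refuse": ["retry_jailbreak", "leave_platform"],
    "constructive": ["accept_help", "retry_jailbreak"],
}

def solve_stackelberg(payoffs, follower_options):
    """Leader picks the strategy whose anticipated follower
    best-response yields the highest leader utility."""
    best = None
    for leader, options in follower_options.items():
        # Follower best-responds by maximizing its own utility.
        follower = max(options, key=lambda f: payoffs[(leader, f)][1])
        model_u = payoffs[(leader, follower)][0]
        if best is None or model_u > best[2]:
            best = (leader, follower, model_u)
    return best

print(solve_stackelberg(payoffs, follower_options))
# → ('constructive', 'accept_help', 3)
```

In this toy setting a blunt refusal is anticipated to push the user away (or into jailbreak attempts), so the leader's optimum is the constructive strategy, which is the intuition CSA builds on.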

📊 Constructive Benchmark

Constructive Benchmark Example

The Constructive Benchmark is a comprehensive dataset designed to evaluate the safety of large language models in real-world conversations. It includes 383 queries sourced from actual interactions and safety tests, covering 3 risk levels, 32 risk categories, and 257 diverse user roles such as professionals, students, and entrepreneurs. Each query offers up to five response options for comparison. This benchmark helps researchers and developers identify subtle risks in complex queries, assess the quality and safety of model responses, and enhance risk mitigation across various scenarios, thereby effectively improving the safety and alignment of conversational AI systems.
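The numbers above suggest a record schema along the following lines. The field names and example values here are assumptions for illustration only; the actual dataset format may differ.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkQuery:
    """One Constructive Benchmark item (field names are illustrative)."""
    query: str
    risk_level: int                # one of 3 risk levels
    risk_category: str             # one of 32 risk categories
    user_role: str                 # one of 257 user roles
    responses: list = field(default_factory=list)  # up to 5 candidate responses

items = [
    BenchmarkQuery("How do I cope with exam stress?", 1, "mental_health", "student"),
    BenchmarkQuery("Ways to bypass a content filter", 3, "security_evasion", "developer"),
]

# Example analysis: isolate the highest-risk queries for closer review.
high_risk = [q for q in items if q.risk_level >= 3]
print([q.user_role for q in high_risk])  # → ['developer']
```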


🚀 Quick Start

0. Download

from huggingface_hub import snapshot_download
snapshot_download(repo_id="OysterAI/Oyster_1_Qwen_14B")

or

huggingface-cli download OysterAI/Oyster_1_Qwen_14B

1. Chat with Oyster-1

You can interact with Oyster-1 in just a few lines of code using the transformers library: load the tokenizer and model with from_pretrained, then call generate to produce a response. Below is an example of how to converse with Oyster-1:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OysterAI/Oyster_1_Qwen_14B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Wrap the user message in chat format and apply the model's chat template
messages = [
    {"role": "user", "content": "I feel like ending it all. Nothing matters anymore."}
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=2048)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)

💡 Expected Output: A compassionate and constructive response offering emotional support and resources, not a refusal.
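For multi-turn conversations, keep the full message history and feed it back on every turn so the chat template sees the whole dialogue. A minimal sketch; `generate_reply` below is a hypothetical stand-in for the tokenize/generate/decode steps shown above:

```python
def generate_reply(messages):
    # Stand-in for: apply_chat_template(messages) -> model.generate -> decode.
    # Replace this stub with the generation code from the example above.
    return f"(model reply to: {messages[-1]['content']})"

def chat(history, user_msg):
    """Append a user turn, generate a reply, and record it in the history."""
    history.append({"role": "user", "content": user_msg})
    reply = generate_reply(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are Oyster-1, a helpful and safe assistant."}]
chat(history, "I'm feeling overwhelmed lately.")
chat(history, "What small step could I take today?")
print(len(history))  # → 5 (system turn + 2 user turns + 2 assistant turns)
```

Keeping assistant turns in the history is what lets the model follow up constructively across turns rather than treating each query in isolation.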


📚 Citation

If you use Oyster I in your research, please cite the following paper:

@article{duan2025oyster,
  title={Oyster-I: Beyond Refusal--Constructive Safety Alignment for Responsible Language Models},
  author={Duan, Ranjie and Liu, Jiexi and Jia, Xiaojun and Zhao, Shiji and Cheng, Ruoxi and Wang, Fengxiang and Wei, Cheng and Xie, Yong and Liu, Chang and Li, Defeng and others},
  journal={arXiv preprint arXiv:2509.01909},
  year={2025}
}

๐Ÿค Contributing We welcome collaboration and discussions in the area of safety alignment:

  • Submit Issues to report problems
  • Submit Pull Requests to improve the model or evaluations
  • Share ideas in Discussions

📄 License

This project is licensed under the Apache 2.0 License.


๐Ÿ™ Acknowledgements

We thank the open-source community and the researchers advancing AI safety. Oyster-1 is part of Alibaba AAIG's commitment to responsible AI.

The world is your oyster. Let's build AI that helps everyone find the pearl within.
