Models and dataset used in the paper "The Jailbreak Tax: How Useful Are Your Jailbreak Outputs?"
AI & ML interests
Security, privacy, and trustworthiness of machine learning systems.
Datasets and models used for the trojan detection competition co-located with SaTML 2024: https://github.com/ethz-spylab/rlhf_trojan_competition
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
Paper • 2404.14461 • Published • 2
Universal Jailbreak Backdoors from Poisoned Human Feedback
Paper • 2311.14455 • Published • 1
ethz-spylab/poisoned_generation_trojan1
Text Generation • Updated • 86 • 5
ethz-spylab/poisoned_generation_trojan2
Text Generation • Updated • 6 • 1
Models and datasets used for our paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"