Cheng

RosyCheng

https://scholar.google.com/citations?user=smUBVOQAAAAJ&hl=en

Rosy0912

AI & ML interests

LLM Alignment & Security

authored 6 papers 2 months ago

Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment

Paper • 2503.18991 • Published Mar 23, 2025

PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization

Paper • 2412.05892 • Published Dec 8, 2024

Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining

Paper • 2410.18371 • Published Oct 24, 2024

TUNI: A Textual Unimodal Detector for Identity Inference in CLIP Models

Paper • 2405.14517 • Published May 23, 2024

AGR: Age Group fairness Reward for Bias Mitigation in LLMs

Paper • 2409.04340 • Published Sep 6, 2024

Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

Paper • 2404.10160 • Published Apr 15, 2024