Cheng

RosyCheng
·
https://scholar.google.com/citations?user=smUBVOQAAAAJ&hl=en
  • Rosy0912

AI & ML interests

LLM Alignment & Security

Organizations

Southeast University, Oyster

authored 6 papers 2 months ago

Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment

Paper • 2503.18991 • Published Mar 23, 2025

PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization

Paper • 2412.05892 • Published Dec 8, 2024

Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining

Paper • 2410.18371 • Published Oct 24, 2024

TUNI: A Textual Unimodal Detector for Identity Inference in CLIP Models

Paper • 2405.14517 • Published May 23, 2024

AGR: Age Group fairness Reward for Bias Mitigation in LLMs

Paper • 2409.04340 • Published Sep 6, 2024

Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

Paper • 2404.10160 • Published Apr 15, 2024