publications

Please refer to my Google Scholar for a complete publication list.

2024

  1. GRATH: Gradual Self-Truthifying for Large Language Models
    Weixin Chen, Dawn Song, and Bo Li
    ICML 2024

2023

  1. TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
    Weixin Chen, Dawn Song, and Bo Li
    CVPR 2023
  2. DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
    Boxin Wang, Weixin Chen, Hengzhi Pei, and 8 more authors
    NeurIPS 2023, Oral & Outstanding Paper Award

2022

  1. Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples
    Weixin Chen, Baoyuan Wu, and Haoqian Wang
    NeurIPS 2022, Spotlight