Please refer to my Google Scholar for a complete publication list.


  1. GRATH: Gradual Self-Truthifying for Large Language Models
    Weixin Chen, Dawn Song, and Bo Li
    ICML 2024


  1. TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
    Weixin Chen, Dawn Song, and Bo Li
    CVPR 2023
  2. DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
    Boxin Wang, Weixin Chen, Hengzhi Pei, and 8 more authors
    NeurIPS 2023 (Oral, Outstanding Paper Award)


  1. Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples
    Weixin Chen, Baoyuan Wu, and Haoqian Wang
    NeurIPS 2022 (Spotlight)