Research Focus
As we deploy more machine learning systems in the real world, the reliability and safety of these systems become crucial. To cope with the risks these systems may bring, we investigate trustworthy machine learning. At the current stage, we mainly concentrate on statistical hypothesis testing and trustworthy machine learning algorithms. The former provides fundamental tools for constructing trustworthy machine learning systems, and the latter addresses concrete risks of existing machine learning algorithms.
[ICML 2020] [ICML 2021] [NeurIPS 2021] [ICML 2022].
Defending against Adversarial Attacks
Deep neural networks are susceptible to adversarial examples, which are generated by adding malicious perturbations to natural inputs. To human eyes, those examples are nearly indistinguishable from the natural inputs, yet they can fool deep models into making wrong predictions with high confidence. Thus, to make deep neural networks more reliable, we focus on the following two topics [ICML 2021] [NeurIPS 2021] [ICML 2022] [ICML 2022].
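To make the notion of an adversarial example concrete, the following is a minimal sketch of a one-step sign-gradient perturbation (in the style of the fast gradient sign method) applied to a toy logistic-regression model. It is a generic textbook illustration, not one of the methods in the papers cited above; the weights `w`, bias `b`, input `x`, and budget `eps` are all hypothetical values chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sign_gradient_perturb(x, y, w, b, eps):
    """Perturb x by eps in the direction that increases the cross-entropy loss.

    x: input vector, y: true label in {0, 1},
    (w, b): parameters of a hypothetical logistic-regression model,
    eps: maximum change allowed per coordinate (L-infinity budget).
    """
    p = sigmoid(w @ x + b)       # predicted probability of class 1
    grad_x = (p - y) * w         # gradient of the loss with respect to x
    return x + eps * np.sign(grad_x)

# Toy model and input (assumed values, for illustration only).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, 0.2, 0.1])
y = 1

x_adv = sign_gradient_perturb(x, y, w, b, eps=0.2)

clean_pred = int(sigmoid(w @ x + b) > 0.5)      # prediction on the natural input
adv_pred = int(sigmoid(w @ x_adv + b) > 0.5)    # prediction on the perturbed input
```

In this toy setting, no coordinate moves by more than `eps`, yet the predicted class changes. Real attacks on deep networks follow the same idea but obtain the gradient by backpropagation.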
Being Aware of Out-of-distribution/Open-set Data
The success of supervised learning rests on an implicit assumption that training and test data share the same distribution (in particular, the same label set), i.e., the in-distribution (ID) assumption. However, the test data distribution in many real-world scenarios may violate this assumption and instead contain out-of-distribution (OOD) data whose label set differs from that of the ID data. If a well-trained ID classifier assigns OOD data to ID classes, deploying it in real-world scenarios may lead to serious accidents. To mitigate the risk of OOD data, we focus on the following topics [ICML 2021].
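As an illustration of what being aware of OOD data can mean in practice, the sketch below uses a common baseline: flag an input as possibly OOD when the classifier's maximum softmax probability falls below a threshold. This is a generic baseline, not necessarily the approach of the cited work; the logits and the threshold of 0.7 are hypothetical values chosen for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def flag_ood(logits, threshold=0.7):
    """Return a boolean mask that is True where the top class probability
    is below the threshold, i.e. where the classifier is not confident."""
    return softmax(logits).max(axis=-1) < threshold

# Hypothetical classifier outputs over three ID classes.
logits = np.array([
    [6.0, 0.5, 0.2],   # one class clearly dominates
    [1.0, 0.9, 1.1],   # nearly uniform scores
])
mask = flag_ood(logits)
```

Here the first input is kept as ID and the second is flagged. The baseline has a known weakness that motivates further research: a classifier can be confidently wrong on OOD inputs, in which case the maximum probability stays high and the input is not flagged.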
Learning/Inference under Distribution Shift (a.k.a., Transfer Learning)
Test data are often imperfect (e.g., only a few data points are available) and may follow a different distribution from the training data. To complete tasks on such imperfect test data (i.e., the target domain), we want to leverage knowledge from domains with abundant labels (i.e., source domains) or from pre-trained models (i.e., source models) to perform classification or clustering in the unlabeled target domain, where the source and target domains are different but related. Specifically, we focus on the following topics [NeurIPS 2019] [IJCAI 2020] [AAAI 2021] [NeurIPS 2021 (Spotlight)] [ICLR 2022 (Spotlight)].
Protecting Data Privacy
With the development of machine learning (ML) algorithms, deep neural networks (DNNs) are increasingly adopted in privacy-sensitive applications such as facial recognition, medical diagnosis, and intelligent virtual assistants. Since training DNNs in such applications can involve processing sensitive and proprietary datasets, there are great concerns about privacy leakage. To protect the privacy of individuals whose personal information is used during training, enterprises typically release only well-trained DNNs through ML-as-a-service platforms, where users can either download pre-trained models (e.g., PyTorch Hub) or query the model via a programming or user interface (e.g., Amazon Rekognition). These two modes are referred to as white-box access and black-box access, respectively. However, a pre-trained model can still be used to reconstruct the original training data. To prevent this data-leakage issue of pre-trained models, we focus on the following topics [KDD 2022].