Zheng Zhan

I am a final-year PhD student in Machine Learning at Northeastern University, advised by Prof. Yanzhi Wang. I also work closely with Prof. Stratis Ioannidis, Prof. Pu Zhao, and Prof. Jennifer G. Dy. My research focuses on Efficient Generative AI, Model Compression, and Continual Learning.

I will be on the job market starting Fall 2024 (expected graduation in Winter 2024 or 2025), and I am actively looking for postdoctoral scholar positions as well as full-time research or MLE positions. Please feel free to contact me (zhan.zhe@northeastern.edu) if our research interests match.

Email  /  Google Scholar  /  LinkedIn  /  Resume

Research

Recent News
  • 2024/09: Two first-authored papers are accepted at NeurIPS 2024 (three papers accepted in total). Streamlined Inference code is available here; Mamba token pruning code is available here
  • 2024/09: One first-authored paper is accepted at EMNLP 2024 Main; code is available here
  • 2024/09: I will extend my internship with the Microsoft GenAI (Phi-3.5) team as a part-time student researcher, focusing on LLM-MoE pretraining.
  • 2024/07: I will start a research internship with the Microsoft GenAI (Phi-3.5) team this July, focusing on LLM-MoE pretraining.
  • 2024/07: Two papers are accepted at ECCV 2024; denoised-weights code is available here, DiffClass code is available here
  • 2024/05: One first-authored paper E2GAN is accepted at ICML 2024; code is available here
  • 2024/02: One first-authored paper LOTUS is accepted at DAC 2024; code is available here
  • 2023/08: Our paper MOC is accepted at ICCAD 2023
  • 2023/04: One first-authored paper DualHSIC is accepted at ICML 2023; code is available here
  • 2023/02: One first-authored paper Condense is accepted at DAC 2023
  • 2022/09: One first-authored paper on efficient continual learning is accepted at NeurIPS 2022; code is available here
  • 2022/07: One first-authored paper All-in-One is accepted at ICCAD 2022
  • 2022/07: Our paper on compiler-aware NAS for real-time super-resolution on mobile is accepted at ECCV 2022; code is available here
  • 2021/11: Our paper on automatic pruning scheme mapping is accepted at TODAES
  • 2021/09: Our paper on memory-economic sparse training is accepted at NeurIPS 2021; code is available here
  • 2021/07: One first-authored paper on NAS and pruning search for real-time super-resolution on mobile is accepted at ICCV 2021
  • Earlier news
Selected Publications
See Google Scholar for all publications. * denotes equal contribution.
E2GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
Yifan Gong*, Zheng Zhan*, Qing Jin, Yanyu Li, Yerlan Idelbayev, Xian Liu,
Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren
International Conference on Machine Learning (ICML), 2024.
[paper] [demo] [code] [project]

E2GAN constructs a lightweight GAN model and employs Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model. We also investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time.
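
For intuition, here is a minimal PyTorch-style sketch of the two ideas summarized above: a frozen base layer augmented with a trainable low-rank (LoRA) update, plus a naive search over candidate ranks. All names (LoRALinear, pick_rank, fit_and_eval) are hypothetical and this is not the paper's implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # only the low-rank factors are trained
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale

def pick_rank(build_model, candidate_ranks, fit_and_eval):
    """Naive rank search: train an adapter at each candidate rank and
    keep the rank with the best validation score (illustrative only)."""
    best_rank, best_score = None, float("-inf")
    for r in candidate_ranks:
        score = fit_and_eval(build_model(r))   # user-supplied train/eval loop
        if score > best_score:
            best_rank, best_score = r, score
    return best_rank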

DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning
Zheng Zhan*, Zifeng Wang*, Yifan Gong, Yucai Shao, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
International Conference on Machine Learning (ICML), 2023.
[paper] [code]

DualHSIC presents a method to improve rehearsal-based approaches for continual learning. The basic idea is to leverage inter-task relationships using two concepts related to the Hilbert-Schmidt independence criterion (HSIC): HSIC-Bottleneck for Rehearsal (HBR), which reduces interference between tasks, and HSIC Alignment (HA), which shares task-invariant knowledge.
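
For reference (standard background, not the paper's exact objective), the biased empirical HSIC estimator underlying both components measures the dependence between two sets of paired representations:

\[
\widehat{\mathrm{HSIC}}(X, Y) = \frac{1}{(n-1)^2}\,\operatorname{tr}(KHLH),
\qquad K_{ij} = k(x_i, x_j),\quad L_{ij} = l(y_i, y_j),\quad
H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top,
\]

where k and l are kernel functions. Roughly, HBR minimizes HSIC-based dependence terms to suppress task-specific interference, while HA maximizes an HSIC-based alignment term; see the paper for the exact losses.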

Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge
Yifan Gong*, Pu Zhao*, Zheng Zhan*, Yushu Wu, Chao Wu, Zhenglun Kong, Minghai Qin, Caiwen Ding, Yanzhi Wang
Design Automation Conference (DAC), 2023.
[paper]

Condense proposes a two-level algorithm, with theoretical proof, for obtaining subnets of arbitrary ratios within a single model. It develops a framework that leverages DVFS and compression techniques to derive multiple subnetworks from one neural network, lowering the variance of inference runtime across different hardware frequency levels. Compared with prior approaches, the framework is much more automatic.

SparCL: Sparse Continual Learning on the Edge
Zheng Zhan*, Zifeng Wang*, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
Neural Information Processing Systems (NeurIPS), 2022.
[paper] [code]

SparCL explores sparsity for efficient continual learning and achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.
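
As a rough illustration of weight and gradient sparsity only (a hypothetical sketch, not SparCL's actual dynamic-mask algorithm or its data-efficiency component):

import torch

def topk_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude entries; zero out a `sparsity` fraction."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    threshold = weight.abs().flatten().topk(k).values[-1]
    return (weight.abs() >= threshold).float()

def sparse_step(param: torch.Tensor, lr: float = 0.1, sparsity: float = 0.9):
    """One illustrative SGD update that masks both gradients and weights."""
    mask = topk_mask(param.data, sparsity)
    param.grad.mul_(mask)                   # gradient sparsity: skip pruned coordinates
    param.data.add_(param.grad, alpha=-lr)  # plain SGD step
    param.data.mul_(mask)                   # weight sparsity: keep the network sparse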

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management
Zheng Zhan*, Yifan Gong*, Pu Zhao, Yushu Wu, Chao Wu, Caiwen Ding, Weiwen Jiang, Minghai Qin, Yanzhi Wang
International Conference on Computer-Aided Design (ICCAD), 2022.
[paper]

All-in-One is a highly representative pruning framework designed to work with dynamic power management using DVFS. The framework uses only one set of model weights and soft masks to represent multiple models with various pruning ratios. By re-configuring the model to the pruning ratio corresponding to a specific execution frequency (and voltage), it keeps the difference in speed across execution frequencies as small as possible.
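
Conceptually (an illustrative sketch with hypothetical names, not the paper's method), a single weight tensor can be paired with several precomputed masks and reconfigured at run time according to the current DVFS frequency level:

import torch
import torch.nn.functional as F

class SwitchablePrunedLinear(torch.nn.Module):
    """One shared weight tensor plus one magnitude-based mask per pruning ratio."""
    def __init__(self, in_features: int, out_features: int, ratios=(0.0, 0.5, 0.75)):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        masks = []
        for r in ratios:  # in practice the masks would be learned, not fixed at init
            k = max(1, int(self.weight.numel() * (1.0 - r)))
            thresh = self.weight.abs().flatten().topk(k).values[-1]
            masks.append((self.weight.abs() >= thresh).float())
        self.register_buffer("masks", torch.stack(masks))
        self.active = 0  # mask index selected from the current frequency/voltage level

    def set_level(self, level: int):
        self.active = level  # e.g., lower frequency -> higher pruning ratio

    def forward(self, x):
        return F.linear(x, self.weight * self.masks[self.active])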

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution
Yushu Wu*, Yifan Gong*, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang
European Conference on Computer Vision (ECCV), 2022.
[paper] [code]

We propose a compiler-aware SR neural architecture search (NAS) framework that conducts depth search and per-layer width search with adaptive SR blocks. A speed model incorporating compiler optimizations is leveraged to predict the inference latency of the SR block under various width configurations, enabling faster convergence.
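
As a toy illustration of latency-predictor-guided search (hypothetical names and a simplified search; the actual framework's speed model and search space are more involved):

import torch
import torch.nn as nn

class LatencyPredictor(nn.Module):
    """Maps a block's per-layer width configuration to a predicted latency (ms)."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_layers, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, widths: torch.Tensor) -> torch.Tensor:
        return self.net(widths)

def pick_widths(predictor, candidates, accuracy_proxy, budget_ms: float):
    """Keep the best-scoring candidate whose predicted latency fits the budget."""
    best = None
    for widths in candidates:                  # each: float tensor of per-layer widths
        if predictor(widths).item() > budget_ms:
            continue                           # predicted too slow; skip device runs
        score = accuracy_proxy(widths)         # user-supplied proxy metric
        if best is None or score > best[1]:
            best = (widths, score)
    return best[0] if best else None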

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search
Zheng Zhan*, Yifan Gong*, Pu Zhao*, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang
International Conference on Computer Vision (ICCV), 2021.
[paper]

We propose to use neural architecture search and network pruning to find a highly efficient network for image super-resolution. We are the first to achieve real-time SR inference (with only tens of milliseconds per frame) at 720p resolution with competitive perceptual performance on mobile platforms.

Universal Approximation Property and Equivalence of Stochastic Computing-Based Neural Networks and Binary Neural Networks
Yanzhi Wang, Zheng Zhan, Liang Zhao, Jian Tang, Siyue Wang, Jiayu Li, Bo Yuan, Wujie Wen, Xue Lin
AAAI Conference on Artificial Intelligence (AAAI), 2019.
[paper]

We prove that "ideal" SCNNs and BNNs satisfy the universal approximation property with probability 1 (due to their stochastic behavior). We further prove that SCNNs and BNNs exhibit the same energy complexity.

Template Credit: Jon Barron