Haisheng Chen

About Me

I am Haisheng Chen, a master’s student in ECE at UC San Diego and a research intern in Z Lab, advised by Dr. Zhijian Liu. My research centers on efficient LLMs, spanning everything from model quantization and sparse attention to building inference systems.

Before coming to UC San Diego, I spent a gap year at AMD’s Shanghai office as a full-time GPU software intern. There, I gained a deep understanding of GPU architectures and honed my skills in developing high-performance CUDA kernels. I also helped enable AMD support in the vLLM inference framework, designing Docker-based deployment pipelines and troubleshooting a variety of compatibility issues along the way.

I’m passionate about contributing to open-source LLM inference projects such as vLLM and SGLang. Like countless other community-driven initiatives, these systems thrive on collaboration from developers around the world. Beyond the thrill of low-level optimization, I find it endlessly fascinating that billions of matmuls and other computations can add up to something that feels like intelligence.

Career Goal

I will graduate from UC San Diego in December 2025 and am actively seeking roles in the LLM systems industry. I am open to opportunities based in the United States or China, and eager to contribute to efficient large-scale LLM deployment.

About This Site

This site is designed with the following goals in mind:

  1. Discuss LLM System Technical Details
    Dive into the architectures, optimizations, and implementation tricks that power large-language-model inference systems.

  2. Showcase Research Projects
