Haisheng Chen
About Me
I am Haisheng Chen, a master’s student in ECE at UC San Diego and a research intern at Z Lab, advised by Dr. Zhijian Liu. My research centers on efficient LLMs, spanning everything from model quantization and sparse attention to building inference systems.
Before coming to UC San Diego, I spent a gap year at AMD’s Shanghai office as a full-time GPU software intern. There, I gained a deep understanding of GPU architectures and honed my skills in developing high-performance CUDA kernels. I also helped enable AMD support in the vLLM inference framework, designing Docker-based deployment pipelines and troubleshooting a variety of compatibility issues along the way.
I’m passionate about contributing to open-source LLM inference projects such as vLLM and SGLang. Like countless other community-driven initiatives, these systems thrive on collaboration from developers around the world. Beyond the thrill of low-level optimization, I find it endlessly fascinating to watch billions of matmuls and other computations combine into something that feels like intelligence.
Career Goal
I will graduate from UC San Diego in December 2025. I am now actively seeking roles in the LLM systems industry. I am open to opportunities based in the United States or China and eager to contribute to efficient large-scale LLM deployment.
About this site
This site is designed with the following goals in mind:
Discuss LLM System Technical Details
Dive into the architectures, optimizations, and implementation tricks that power large-language-model inference systems.
Showcase Research Projects