These updates mark key milestones along my journey.
I build high-performance computation systems by applying HPC principles to model inference and infrastructure. My goal is to make large-scale AI and HPC applications faster, cheaper, and smarter.
Keywords
Skills
Flatiron Institute, Simons Foundation
New York University
NYU High-Performance Computing Team
NYU High-Performance Computing Team
I research heterogeneous computing for scientific applications and wafer-scale GPU architectures. My work includes collaborating with the Flatiron Institute on parallel optimization of periodograms and with the Scalable Architecture Lab on profiling and memory systems for large accelerators.
R01
Flatiron Institute, Simons Foundation
Sep 2025
Poster on optimizing and extending Lomb-Scargle periodogram computations for astronomy using OpenMP and CUDA
Mentor: Dr. Lehman Garrison
Publication: Optimizing and Extending Periodogram Computations for Astronomy
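For readers unfamiliar with the method, here is a minimal pure-Python sketch of the classical (unnormalized) Lomb-Scargle periodogram for unevenly sampled data. It is an illustration only, not the implementation from the poster: the function name `lomb_scargle`, the synthetic signal, and the frequency grid are all assumptions made for this example. Each frequency is evaluated independently of the others, which is the property that makes this computation a natural fit for OpenMP threads or CUDA kernels.

```python
import math
import random


def lomb_scargle(t, y, freqs):
    """Classical (unnormalized) Lomb-Scargle power.

    t     : sample times (may be unevenly spaced)
    y     : observed values at those times
    freqs : positive frequencies in cycles per unit time
    """
    mean_y = sum(y) / len(y)
    yc = [v - mean_y for v in y]  # mean-subtracted signal
    power = []
    for f in freqs:  # every frequency is independent of the others
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * ti) for ti in t)
        c2 = sum(math.cos(2.0 * w * ti) for ti in t)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (ti - tau)) for ti in t]
        sin_terms = [math.sin(w * (ti - tau)) for ti in t]
        y_cos = sum(v * c for v, c in zip(yc, cos_terms))
        y_sin = sum(v * s for v, s in zip(yc, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power.append(0.5 * (y_cos ** 2 / cc + y_sin ** 2 / ss))
    return power


# Hypothetical test signal: a sinusoid sampled at 200 random times.
rng = random.Random(0)
t = sorted(rng.uniform(0.0, 20.0) for _ in range(200))
true_freq = 1.5
y = [math.sin(2.0 * math.pi * true_freq * ti) for ti in t]

freqs = [0.05 * k for k in range(1, 101)]  # grid from 0.05 to 5.0
power = lomb_scargle(t, y, freqs)
best_freq = freqs[power.index(max(power))]
print("strongest frequency on the grid:", best_freq)
```

This naive version costs O(N) work per frequency, so the total is O(N × number of frequencies). A parallel version would typically split the frequency loop across threads or GPU blocks.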
R02
Scalable Architecture Lab
Mar 2025
Hierarchical distributed page address translation for wafer-scale GPUs to improve memory-system scalability.
Mentor: Prof. Yifan Sun
Publication: HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs
R03
Scalable Architecture Lab
Sep 2024
Dynamic binary instrumentation for AMD GPUs to study the performance of large-scale accelerators.
Mentor: Prof. Yifan Sun
Publication: Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs
Master of Computer Science
Highlights
B.S. in CS, Minor in Math
Highlights
Honors
Helped plan and implement the 'Buy Now' feature. Raised throughput from 1,000 to 50,000 QPS by integrating Redis and RocketMQ into a MySQL primary-replica architecture.
Developed a Python E20 assembler and C++ CPU and cache simulators to model instruction execution and memory behavior.
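To illustrate what a cache simulator models, here is a small Python sketch of a direct-mapped cache that counts hits and misses. It is not the project's code: the original simulators were written in C++, and the class name `DirectMappedCache`, the direct-mapped organization, and the chosen sizes are assumptions made for this example.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only, not data."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines      # number of cache lines
        self.block_size = block_size    # addressable units per block
        self.tags = [None] * num_lines  # None marks an empty line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Simulate one memory access; return True on a hit."""
        block = address // self.block_size
        index = block % self.num_lines  # which line the block maps to
        tag = block // self.num_lines   # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: load the block, evicting whatever occupied the line.
        self.tags[index] = tag
        self.misses += 1
        return False


# Hypothetical access trace: addresses 0-3 share one block,
# while addresses 16 and 0 map to the same line and evict each other.
cache = DirectMappedCache(num_lines=4, block_size=4)
results = [cache.access(addr) for addr in [0, 1, 2, 3, 16, 0]]
print(results, cache.hits, cache.misses)
```

A set-associative design would replace the single tag per index with a small list of tags plus a replacement policy such as LRU.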