Hello! 👋

I'm Yuwei (Peter) Sun

MCS @ UIUC · High-Performance Computing Engineer

你好!👋

I'm 孙钰伟(Peter)

纽约大学(NYU) -> 伊利诺伊大学香槟分校(UIUC) · 高性能计算 & AI Infra

News

These updates mark key milestones along my journey.

Nov 2025

Paper accepted to HPCA 2026

A paper I contributed to, HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs, has been accepted. Many thanks to Daoxuan and Prof. Yifan.

Sep 2025

Poster accepted to SC25 Poster Session

A poster summarizing my summer internship work, Optimizing and Extending Periodogram Computations for Astronomy, has been accepted. Many thanks to my mentor, Dr. Lehman Garrison.

Aug 2025

Started MCS program at UIUC

Excited to begin the Master of Computer Science program at the University of Illinois Urbana-Champaign (UIUC) and to grow academically and professionally.

May 2025

Joined Flatiron Institute as HPC Intern

High-performance computing internship at the Flatiron Institute (Simons Foundation), focusing on parallel optimization for HPC applications.

Mar 2025

Paper accepted to ISPASS 2025

Our work, Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs, has been accepted to ISPASS 2025. Many thanks to Daoxuan and Prof. Yifan.

Aug 2024

Joined the Scalable Architecture Lab as Research Intern

Worked on GPU performance profiling tools and the Wafer-Scale GPU project under the guidance of Prof. Yifan.

May 2024

Contributed to Reform Hipify

Working at NYU HPC over the summer, contributing to the ReformWeb and Reform projects, and planning to continue the AMD HPC application adaptation work for the remainder of my time at NYU.

Nov 2023

Participated in SC23 Student Cluster Competition

Competed in the SC23 Student Cluster Competition, placing 6th globally. Go team NYU!

Jan 2023

Received UCP Fellowship

Honored to be selected for the Uber Career Prep (UCP) 2023 Software Engineering Fellowship Program.

Apr 2022

Participated in Google HPS 2022

Participated in the Google Hardware Product Sprint 2022, developing a PCB-based clock as part of the EE track.

Dec 2021

Accepted to NYU Tandon

Accepted to NYU Tandon to study Mathematics and Computer Science and excited to begin my study abroad experience in the U.S.

最新动态

这些更新记录了我旅程中的重要节点。

2025 年 11 月

论文被 HPCA 2026 录用

我参与的论文 HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs 被录用。感谢 Daoxuan学长 和 Yifan 教授的指导。

2025 年 9 月

海报入选 SC25 海报展示

暑期实习工作的总结海报 Optimizing and Extending Periodogram Computations for Astronomy 被录用。感谢导师 Dr. Lehman Garrison。

2025 年 8 月

开启 UIUC MCS 学习

很高兴加入伊利诺伊大学香槟分校 (UIUC) 计算机科学硕士项目,期待学术与职业上的成长。

2025 年 5 月

加入 Flatiron Institute 担任 HPC 实习生

在 Flatiron Institute (Simons Foundation) 从事高性能计算实习,专注于 HPC 应用的并行优化。

2025 年 3 月

论文被 ISPASS 2025 录用

我们的工作 Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs 被 ISPASS 2025 录用。感谢 Daoxuan 学长 和 Yifan 教授。

2024 年 8 月

加入可扩展体系结构实验室

在 Yifan 教授指导下,参与 GPU 性能分析工具与晶圆级 GPU 项目的内存系统研究。

2024 年 5 月

参与 Reform Hipify

将在暑期继续在 NYU HPC 工作,贡献 reformWeb 和 Reform,并持续推进 AMD HPC 应用适配。

2023 年 11 月

参加 SC23 学生超算竞赛

参与 SC23 学生超算竞赛,取得 全球第六名,NYU 队加油!

2023 年 1 月

获得 UCP Fellowship

很荣幸入选 Uber Career Prep (UCP) 2023 软件工程奖学金项目。

2022 年 4 月

参加 Google HPS 2022

参加 Google Hardware Product Sprint 2022 电子电路工程方向,设计并制作一块基于 PCB 的时钟。

2021 年 12 月

被 NYU Tandon 录取

获得 NYU Tandon 数学与计算机科学录取,期待开始在美国的留学生活。

About Me

I build high-performance computing systems by applying HPC principles to model inference and infrastructure. My goal is to make large-scale AI and HPC applications faster, cheaper, and smarter.

Keywords

High-Performance Computing (HPC) · AI Infrastructure · Cloud Computing · Kernel Engineering

Skills

Python · C/C++ · CUDA · Flask · Redis · MySQL · Docker · SLURM · Singularity · CMake · Git/GitHub · Linux · OpenMP · MPI · GCP · CI/CD · Raspberry Pi · cProfile

关于我

我通过将 HPC 原理应用到模型推理和基础设施,构建高性能计算系统,让大规模 AI 与 HPC 应用更快、更高效、更智能。

关键词

高性能计算(HPC) · AI 基础设施 · 云计算 · 内核工程

技能

Python · C/C++ · CUDA · Flask · Redis · MySQL · Docker · SLURM · Singularity · CMake · Git/GitHub · Linux · OpenMP · MPI · GCP · CI/CD · Raspberry Pi · cProfile

Work Experience

High Performance Computing Intern

Flatiron Institute, Simons Foundation

May 2025 - Aug 2025
  • Extended nifty-ls to support multiple sinusoidal basis terms for flexible Lomb-Scargle periodograms.
  • Achieved a 100× speedup by moving compute-heavy routines from Python to C++, bound with nanobind and parallelized with OpenMP.
  • Further accelerated batched computation using CuPy and optimized CUDA kernels, improving scalability by 50×.
  • Reduced FastChi2 numerical error from 1e-3 to 1e-6 by rewriting trigonometric summations and adopting FINUFFT.

Part-time Research Engineer

New York University

Jan 2024 - May 2025
  • Compiled and benchmarked Gaussian and Amber on NVIDIA Grace Hopper ARM CPUs, evaluating energy efficiency and compatibility.
  • Led migration from a physical HPC cluster to an OpenShift-based on-prem cloud using Podman, Singularity, and Kubernetes.

HPC Assistant

NYU High-Performance Computing Team

May 2024 - Sep 2024
  • Enhanced Reform with sequential processing, compressed file handling, and parallel execution for genomics workloads.
  • Updated ReformWeb to run Reform on HPC servers via a web UI with Flask, SQLite, Redis, Werkzeug, and Jinja2.
  • Built CI/CD with GitHub Actions and Python unittest plus a Bash-based verification pipeline with TTL-based log retention.

Part-time Research Engineer

NYU High-Performance Computing Team

Jan 2024 - May 2024
  • Translated CUDA codebases to HIP with HIPify, enabling AMD GPU support for AlphaFold and DualPhysics on ROCm.
  • Contributed fixes to HIPify for Math_constant.h issues to improve CUDA-to-HIP translation reliability.
  • Evaluated the suitability of cloud-based ARM CPUs (AWS Graviton and GCP Axion) by compiling HPL, Amber24, and Gaussian and benchmarking their price-performance.

工作经历

高性能计算实习生

Flatiron Institute, Simons Foundation

2025 年 5 月 - 2025 年 8 月
  • 扩展 nifty-ls,使其支持多组正弦基函数,灵活计算 Lomb-Scargle 周期图。
  • 用 nanobind 将计算密集函数从 Python 绑定到 C++ 并通过 OpenMP 并行化,单核状态下速度提升 100倍。
  • 借助 CuPy 和经核级优化的 CUDA kernel 加速批处理计算,可扩展性再提升 50倍。
  • 引入 FINUFFT重构三角求和计算核心,将 FastChi2 数值误差从 1e-3 降到 1e-6。

研究工程师(兼职)

New York University

2024 年 1 月 - 2025 年 5 月
  • 在 NVIDIA Grace Hopper ARM CPU 上编译并基准测试 Gaussian 与 Amber,评估ARM CPU能效与ARM 生态对于HPC的兼容性。
  • 使用 Podman、Singularity、Kubernetes 将物理 HPC 集群迁移计算环境和数据到基于 OpenShift 的本地云。

HPC 助理

NYU High-Performance Computing Team

2024 年 5 月 - 2024 年 9 月
  • 为 Reform 增强顺序处理、压缩文件处理和并行执行,适配基因组工作负载。
  • 更新 ReformWeb,使 Reform 可通过 Flask、SQLite、Redis、Werkzeug、Jinja2 在 HPC 服务器上运行。
  • 构建 GitHub Actions + Python unittest 的 CI/CD,并用 Bash 验证流水线与 TTL 日志保留。

兼职研究工程师

NYU High-Performance Computing Team

2024 年 1 月 - 2024 年 5 月
  • 用 HIPify 将 CUDA 代码翻译为 HIP,使 AlphaFold 和 DualPhysics 支持 AMD GPU 的 ROCm 平台。
  • 为 HIPify 修复 Math_constant.h 相关问题,提高 CUDA 到 HIP 的转换可靠性。
  • 在 AWS Graviton 与 GCP Axion 上编译 HPL、Amber24、Gaussian 并基准测试,评估基于 ARM CPU 云计算的性价比。

Research

I research heterogeneous computing for scientific applications and wafer-scale GPU architectures. My work includes collaborating with the Flatiron Institute on parallel optimization of periodograms and with the Scalable Architecture Lab on profiling and memory systems for large accelerators.

R01

HPC Scientific Software

Flatiron Institute, Simons Foundation

Sep 2025

Poster on optimizing and extending Lomb-Scargle periodogram computations for astronomy using OpenMP and CUDA.

Mentor: Dr. Lehman Garrison

Publication: Optimizing and Extending Periodogram Computations for Astronomy

R02

Wafer-scale GPU Architecture

Scalable Architecture Lab

Mar 2025

Hierarchical distributed page address translation for wafer-scale GPUs to improve memory-system scalability.

Mentor: Prof. Yifan Sun

Publication: HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs

R03

GPU Performance Tooling

Scalable Architecture Lab

Sep 2024

Dynamic binary instrumentation for AMD GPUs to study performance of large-scale accelerators.

Mentor: Prof. Yifan Sun

Publication: Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs

研究

我研究面向科学应用的异构计算与晶圆级 GPU 架构。工作包括与 Flatiron Institute 合作,优化周期图的并行计算,以及在可扩展体系结构实验室进行大规模加速器的性能分析与内存系统研究。

R01

HPC 科学软件

Flatiron Institute, Simons Foundation

2025 年 9 月

海报展示:通过 OpenMP 和 CUDA 优化并扩展 Lomb-Scargle 周期图计算。

导师: Dr. Lehman Garrison

论文: Optimizing and Extending Periodogram Computations for Astronomy

R02

晶圆级 GPU 架构

Scalable Architecture Lab

2025 年 3 月

提出晶圆级 GPU 的分层分布式页地址转换方案,提升内存系统可扩展性。

导师: Prof. Yifan Sun

论文: HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs

R03

GPU 性能工具链

Scalable Architecture Lab

2024 年 9 月

为 AMD GPU 设计动态二进制插桩工具,研究GPU加速器的性能特征。

导师: Prof. Yifan Sun

论文: Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs

Education

University of Illinois Urbana-Champaign

Master of Computer Science

Aug 2025 - May 2027

Highlights

  • Focused on high-performance computing and GPU optimization.

New York University

B.S. in CS, Minor in Math

Sep 2021 - May 2025

Highlights

  • HPC Assistant on the NYU High-Performance Computing Team.

Honors

  • University Honors
  • Tandon Dean's List
  • Uber Career Prep Fellowship

教育背景

伊利诺伊大学香槟分校(UIUC)

计算机科学硕士(MCS)

2025 年 8 月 - 2027 年 5 月

亮点

  • 专注高性能计算与 GPU 优化。

纽约大学(NYU)

计算机科学学士,数学辅修

2021 年 9 月 - 2025 年 5 月

亮点

  • NYU 高性能计算团队助理。

荣誉

  • University Honors 荣誉
  • Tandon 院长学生名单
  • 优步(Uber)软件开发职业奖学金