I was a research scientist at Intel Labs. Before that, I was a Ph.D. student at Georgia Tech, where I worked on topics in computer architecture with Prof. Hyesoon Kim and other researchers. My research interests lie in the interactions between microarchitecture, compilers, and operating systems to efficiently enable emerging architectures and technologies. I worked as a system programmer at a startup in Korea (2002-2005, 2007-2008) and as an intern at AMD Research (2012) and Intel Labs (2013). I received my Ph.D. from Georgia Tech (2015) and my B.S. degree from Seoul National University (2007).

Students

I am fortunate to work with talented and hardworking students. :D
MS/PhD
●  Junseo Lee (Spring 2022 -); Past: B.S. in ECE, Seoul National University
●  Kwanseok Choi (Spring 2022 -); Past: B.S. in ECE, Seoul National University
●  Wonbeom Lee (Spring 2023 -); Past: B.S. in ECE, Seoul National University
●  Jungi Lee (Spring 2023 -); Past: B.S. in ECE, Seoul National University
●  Seokwon Lee (Spring 2023 -); Past: B.S. in ECE, Seoul National University
●  Jaehoon Cho (Spring 2023 -); Past: B.S. in ECE, Seoul National University
●  Soohyun Cha (Spring 2024 -); Past: B.S. in ECE, Seoul National University
●  Junyong Park (Spring 2024 -); Past: B.S. in ECE, Seoul National University
●  Sangyun Jeon (Spring 2024 -); Past: B.S. in ECE, Seoul National University
●  Jaisung Kim (Fall 2024 -); Past: B.S. in Physics/ECE, Seoul National University
Alumni
●  Joonho Whangbo (2021-2022; Undergraduate Researcher); Now: PhD at UC Berkeley

I will serve on the program committee for HPCA 2025. Please submit your best work!

Publications

OSDI'24 InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Wonbeom Lee†, Jungi Lee†, Junghwan Seo, Jaewoong Sim
Proc. of the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Santa Clara, CA, July 2024
[code] [talk]

ISCA-51 Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
Jungi Lee†, Wonbeom Lee†, Jaewoong Sim
Proc. of the 51st International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, June 2024
[code]

DAC-61 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models
Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyuk-Jae Lee, Jaewoong Sim
Proc. of the 61st Design Automation Conference (DAC), San Francisco, CA, June 2024

TODAES CuPBoP: Making CUDA a Portable Language
Ruobing Han, Jun Chen, Bhanu Garg, Xule Zhou, John Lu, Jeffrey Young, Jaewoong Sim, Hyesoon Kim
ACM Transactions on Design Automation of Electronic Systems (TODAES), June 2024

ASPLOS'24 GSCore: Efficient Radiance Field Rendering via Architectural Support for 3D Gaussian Splatting
Junseo Lee, Seokwon Lee, Jungi Lee, Junyong Park, Jaewoong Sim
Proc. of the 2024 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), San Diego, CA, April 2024
[Lightning Talk]

PACT-32 SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link
Hyokeun Lee, Kwanseok Choi, Hyuk-Jae Lee, Jaewoong Sim
Proc. of the 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Vienna, Austria, Oct 2023
[Slides]

ISCA-50 NeuRex: A Case for Neural Rendering Acceleration
Junseo Lee, Kwanseok Choi, Jungi Lee, Seokwon Lee, Joonho Whangbo, Jaewoong Sim
Proc. of the 50th International Symposium on Computer Architecture (ISCA), Orlando, FL, June 2023
[Lightning Talk]

PPoPP'23 CuPBoP: A Framework to Make CUDA Portable
Ruobing Han, Jun Chen, Bhanu Garg, Jeffrey Young, Jaewoong Sim, Hyesoon Kim
Proc. of the 28th International Symposium on Principles and Practice of Parallel Programming (PPoPP) - Poster, Montreal, Canada, Feb 2023

TACO COX: Exposing CUDA Warp-Level Functions to CPUs
Ruobing Han, Jaewon Lee, Jaewoong Sim, Hyesoon Kim
ACM Transactions on Architecture and Code Optimization (TACO), September 2022

TRETS Specializing FGPU for Persistent Deep Learning
Rui Ma, Jia-Ching Hsu, Tian Tan, Eriko Nurvitadhi, David Sheffield, Rob Pelt, Martin Langhammer, Jaewoong Sim, Aravind Dasu and Derek Chiou
ACM Transactions on Reconfigurable Technology and Systems (TRETS), June 2021

ASPLOS'20 Batch-Aware Unified Memory Management in GPUs for Irregular Workloads
Hyojong Kim, Jaewoong Sim, Prasun Gera, Ramyad Hadidi, Hyesoon Kim
Proc. of the 2020 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Lausanne, Switzerland, March 2020

JPDC Thermal-Aware Processing-in-Memory Instruction Offloading
Lifeng Nai, Ramyad Hadidi, He Xiao, Hyojong Kim, Jaewoong Sim, Hyesoon Kim
Journal of Parallel and Distributed Computing (JPDC), 2019

FCCM'19 Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs
Eriko Nurvitadhi, Dongup Kwon, Ali Jafari, Andrew Boutros, Jaewoong Sim, Phillip Tomson, Huseyin Sumbul, Gregory Chen, Phil Knag, Raghavan Kumar, Ram Krishnamurthy and Debbie Marr
Proc. of the 27th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, California, Apr 2019

IPDPS'18 CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading
Lifeng Nai, Ramyad Hadidi, He Xiao, Hyojong Kim, Jaewoong Sim, Hyesoon Kim
Proc. of the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, May 2018

FPGA'18 A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform: A Deep Learning Case Study
Duncan Moss, Srivatsan Krishnan, Eriko Nurvitadhi, Piotr Ratuszniak, Chris Johnson, Jaewoong Sim, Asit Mishra, Debbie Marr, Suchit Subhaschandra, Philip H.W. Leong
Proc. of the 26th ACM International Symposium on Field-Programmable Gate Arrays (FPGA), Monterey, California, Feb 2018

FPL-27 High Performance Binary Neural Networks on the Xeon+FPGA Platform
Duncan Moss, Eriko Nurvitadhi, Jaewoong Sim, Asit Mishra, Debbie Marr, Suchit Subhaschandra, Philip H.W. Leong
Proc. of the 27th International Conference on Field-Programmable Logic and Applications (FPL), Ghent, Belgium, Sep 2017

FPGA'17 Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?
Eriko Nurvitadhi, Ganesh Venkatesh, Jaewoong Sim, Debbie Marr, Randy Huang, Jason Gee Hock Ong, Yeong Tat Liew, Srivatsan Krishnan, Duncan Moss, Suchit Subhaschandra, Guy Boudoukh
Proc. of the 25th ACM International Symposium on Field-Programmable Gate Arrays (FPGA), Monterey, California, Feb 2017 (Covered in the news by the Next Platform)
[PDF]

HPCA-23 GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks
Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, Hyesoon Kim
Proc. of the 23rd International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, Feb 2017
[PDF]

FPT'16 Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC
Eriko Nurvitadhi, David Sheffield, Jaewoong Sim, Asit Mishra, Ganesh Venkatesh, Debbie Marr
Proc. of the 2016 International Conference on Field-Programmable Technology (FPT), Xi'an, China, Dec 2016
[PDF]

FPL-26 Accelerating Recurrent Neural Networks in Analytics Servers: Comparison of FPGA, CPU, GPU, and ASIC
Eriko Nurvitadhi, Jaewoong Sim, David Sheffield, Asit Mishra, Srivatsan Krishnan, Debbie Marr
Proc. of the 26th International Conference on Field-Programmable Logic and Applications (FPL), Lausanne, Switzerland, Aug 2016
[PDF]

PACT-24 BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models
Joo Hwan Lee, Jaewoong Sim, Hyesoon Kim
Proc. of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), San Francisco, CA, Oct 2015 (Best Paper Award)
[PDF] [BibTex]

MICRO-47 Transparent Hardware Management of Stacked DRAM as Part of Memory
Jaewoong Sim, Alaa R. Alameldeen, Zeshan Chishti, Chris Wilkerson, Hyesoon Kim
Proc. of the 47th International Symposium on Microarchitecture (MICRO), Cambridge, UK, Dec 2014
[PDF] [Slides] [Poster] [BibTex]

TOPPICKS'14 A Configurable and Strong RAS Solution for Die-Stacked DRAM Caches
Jaewoong Sim, Gabriel H. Loh, Vilas Sridharan, Mike O'Connor
IEEE Micro, Special Issue: Micro's Top Picks from 2013 Computer Architecture Conferences (MICRO TOP PICKS), May/June 2014
[Paper] [BibTex]

ISCA-40 Resilient Die-stacked DRAM Caches
Jaewoong Sim, Gabriel H. Loh, Vilas Sridharan, Mike O'Connor
Proc. of the 40th International Symposium on Computer Architecture (ISCA), Tel-Aviv, Israel, June 2013
One of the 12 computer architecture papers of 2013 selected as Top Picks by IEEE MICRO
[PDF] [Slides] [BibTex]

MICRO-45 A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch
Jaewoong Sim, Gabriel H. Loh, Hyesoon Kim, Mike O'Connor, Mithuna Thottethodi
Proc. of the 45th International Symposium on Microarchitecture (MICRO), Vancouver, BC, Canada, Dec 2012
[PDF] [Slides] [Poster] [BibTex]

ISCA-39 FLEXclusion: Balancing Cache Capacity and On-Chip Bandwidth via Flexible Exclusion
Jaewoong Sim, Jaekyu Lee, Moinuddin K. Qureshi, Hyesoon Kim
Proc. of the 39th International Symposium on Computer Architecture (ISCA), Portland, OR, June 2012
[PDF] [Slides] [BibTex]

PPoPP'12 A Performance Analysis Framework for Identifying Potential Benefits in GPGPU Applications
Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, Richard Vuduc
Proc. of the 17th International Symposium on Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, Feb 2012
[PDF] [Slides] [BibTex]

Patents

Method and System for Efficient Floating-Point Compression US11416248
Apparatus and Method for Gang Invariant Operation Optimizations US11093250
Multi-Level System Memory Having Near Memory Space Capable of Behaving as Near Memory Cache or Fast Addressable System Memory Depending on System State US20180088853
Method and Apparatus for Implementing a Heterogeneous Memory Subsystem US20150278091
Predicting Outcomes for Memory Requests in a Cache Memory US20140143502
Dynamically Configuring Regions of a Main Memory in a Write-Back Mode or a Write-Through Mode US20140143505
Memory Scheduling for RAM Caches Based on Tag Caching US20140181384
Dirty Cacheline Duplication US20140173379
Bypassing a Cache when Handling Memory Requests US20140143493
Partitioning Caches for Sub-Entities in Computing Devices US20140173211
Bypassing Memory Requests to a Main Memory US20140164713

Professional Services

  • Program Committee: HPCA 2025
  • Program Committee: MICRO 2024
  • Program Committee: ISCA 2024
  • Program Committee: ICCD 2024
  • Program Committee: ISPASS 2024
  • Program Committee: MICRO 2023
  • Program Committee: ISCA 2023
  • Program Committee: MICRO 2022
  • Program Committee: ICCD 2022
  • External Review Committee: ISCA 2022
  • Program Committee: ISPASS 2022
  • Registration Chair: HPCA 2022
  • Program Committee: IISWC 2021
  • Program Committee: ICCD 2021
  • External Review Committee: MICRO 2021
  • Registration Chair: PACT 2021
  • Program Committee: IPDPS 2021
  • Program Committee: PACT 2020
  • Registration Chair: PACT 2020
  • External Review Committee: MICRO 2020
  • Program Committee: PACT 2019
  • Program Committee: IPDPS 2019
  • External Review Committee: ICS 2019
  • External Reviewer: ISCA 2019
  • Program Committee: MICRO 2018
  • External Review Committee: HPCA 2018
  • Program Committee: MICRO 2016
  • External Review Committee: ISCA 2016
  • External Reviewer: ICS 2015
  • Publicity Chair: IISWC 2015
  • Reviewer of ACM Transactions on Architecture and Code Optimization
  • Reviewer of IEEE MICRO Special Issue on Machine Learning Acceleration
  • Reviewer of IEEE Transactions on Neural Networks and Learning Systems
  • Reviewer of IEEE Transactions on Very Large Scale Integration Systems
  • Reviewer of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • Reviewer of IEEE Transactions on Multi-Scale Computing Systems
  • Reviewer of IEEE Computer Architecture Letters