
Reza Jahadi

Software Engineer and AI researcher

About Me

Hey, I'm Reza, a software engineer with a deep passion for building efficient systems, from hardware to high-level models. My work bridges high-level model optimization (pruning, quantization, transformer compression) and low-level optimized programming.

I operate at the intersection of software and hardware, where performance, scalability, and resource constraints drive innovation. Whether optimizing deep learning models for deployment or developing firmware and drivers on embedded platforms, I bring a systems-level mindset and hands-on experience across the entire ML stack.

But it doesn't stop at ML... Any workload running inefficiently on a CPU or GPU is fair game: simulation software, scientific kernels, game engines, you name it.

Selected Projects

Parallel matrix-vector multiplication on GPU

Performing matrix-vector multiplication using row-wise decomposition on GPU and measuring the speedup.

CUDA C Bash Scripting
Iris dataset analysis using SVM and Neural Networks

The primary focus of this project is on data visualization, training Support Vector Machine (SVM) models, and building a neural network for classification tasks.

Python Machine Learning Neural Network
CNN acceleration on edge devices

Deploying a CNN model on a Kria KV260 edge device and measuring inference performance.

Python CNN Bash Linux
Distributed CNN Inference on MNIST

Implemented MPI-based CNN inference with pipelining on CPU clusters for speedup.

CNN C Distributed Systems
End-to-End AlexNet Training and Quantization on CIFAR-10

Complete pipeline for training, quantizing, and testing an AlexNet model on the CIFAR-10 dataset using PyTorch and Vitis AI.

Python PyTorch CNN Quantization
High-Precision Arithmetic Engine in C++

Designed and developed a C++ engine for performing arithmetic operations on extremely large integers (up to 200 digits), bypassing the limitations of built-in data types.

C++ Arithmetic Algorithms
Face Detection & Verification App

A desktop application that detects and verifies passenger identities using face recognition and passport data, matching live webcam images with stored passport photos.

Python Deep Learning Computer Vision SQLite
View More Projects on GitHub

Publications

Sparse Attention: A Co-Design Approach for Efficient Transformer Execution on Tensor Cores
2025 IEEE 38th International System-on-Chip Conference (SOCC)

Authors: Reza Jahadi, Phil Munz & Ehsan Atoofian

Transformer attention layers are computationally expensive because they process every part of the input exhaustively. This paper prunes attention matrices into structured sparse patterns and extends GPU Tensor Core hardware to execute them efficiently, achieving 54.3% energy savings with negligible accuracy loss.

Sparse Attention GPU Architecture Transformer Inference
Paper
Fused Tensor Core: A Hardware–Software Co-Design for Efficient Execution of Attentions on GPUs
2025 IEEE Embedded Systems Letters

Authors: Reza Jahadi, Ehsan Atoofian

Developed a GPU hardware–software co-design to accelerate attention layers by reducing memory footprint and offloading non-MMA operations to tensor cores. Achieved 13.4% performance improvement and 18.3% energy-delay reduction over software-only optimizations.

LLM GPU optimization Computer Architecture
Paper
Low-Power Register File for Tensor Cores
2024 IEEE 15th International Green and Sustainable Computing Conference (IGSC)

Authors: Reza Jahadi, Ehsan Atoofian

We propose a value-aware register file design for Tensor Cores that leverages the bit-level sparsity in CNNs to reduce leakage power. By introducing LPS, LPS+, and PLPS+ SRAM cells, we achieve up to 77.3% power savings with negligible accuracy loss.

CNN Optimization GPU Tensor Cores Computer Architecture
Paper
PCTC: Hardware and Software Co-design for Pruned Capsule Networks
Euro-Par 2024: Parallel Processing

Authors: Mohammad Hafezan, Reza Jahadi & Ehsan Atoofian

We present PCTC, a co-designed hardware-software approach that enables efficient execution of Capsule Networks on NVIDIA Tensor Cores. By rearchitecting matrix-vector operations and introducing structured pruning tailored to capsule layers, PCTC achieves up to 31% energy savings.

Neural Networks Tensor Core Optimization GPU
Paper

Hobbies

Movies

I'm a big fan of movies, especially thriller and mystery genres. I enjoy stories that keep me guessing, with clever plot twists, psychological depth, and suspenseful narratives.

Travel

I love traveling and exploring new places. Being in nature, whether I'm hiking through forests, visiting lakes and waterfalls, or simply enjoying a scenic view, gives me a deep sense of peace and energy. Sometimes you just need to get away from all the noise and chill out in nature to clear your head.

Contact