Check out the Prof. Hwu and El Hajj YouTube channel!
Instructors
Wen-mei W. Hwu received the PhD degree in computer science from the University of California, Berkeley, in 1987. He is the Walter J. (“Jerry”) Sanders III-Advanced Micro Devices endowed chair of electrical and computer engineering at the University of Illinois at Urbana-Champaign. His research interests include architecture, implementation, and software for high-performance computer systems, as well as parallel processing. He is a principal investigator (PI) for the petascale Blue Waters system, a codirector of the Intel- and Microsoft-funded Universal Parallel Computing Research Center (UPCRC), and PI for the world’s first NVIDIA CUDA Center of Excellence. He is the chief scientist of the Illinois Parallel Computing Institute and the director of the IMPACT lab.
For his contributions to the areas of compiler optimization and computer architecture, he received the 1993 Eta Kappa Nu Outstanding Young Electrical Engineer Award, the 1994 University Scholar Award of the University of Illinois, the 1997 Eta Kappa Nu Holmes MacDonald Outstanding Teaching Award, the 1998 ACM SigArch Maurice Wilkes Award, the 1999 ACM Grace Murray Hopper Award, the 2001 Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, the 2006 most influential ISCA paper award, and the University of California, Berkeley distinguished alumni in computer science award. From 1997 to 1999, he was the chairman of the Computer Engineering Program at the University of Illinois. In 2007, he introduced a new engineering course in massively parallel processing with David Kirk of NVIDIA. He is a fellow of IEEE and of the ACM.

Juan Gómez Luna is a senior researcher and lecturer at SAFARI Research Group @ ETH Zürich. He received the BS and MS degrees in Telecommunication Engineering from the University of Sevilla, Spain, in 2001, and the PhD degree in Computer Science from the University of Córdoba, Spain, in 2012. Between 2005 and 2017, he was a faculty member of the University of Córdoba. His research interests focus on GPU and heterogeneous computing, processing-in-memory, memory systems, and hardware and software acceleration of medical imaging and bioinformatics. He is the lead author of PrIM (https://github.com/CMU-SAFARI/prim-benchmarks), the first publicly-available benchmark suite for a real-world processing-in-memory architecture, and Chai (https://github.com/chai-benchmarks/chai), a benchmark suite for heterogeneous systems with CPU/GPU/FPGA.

Izzat El Hajj is an Assistant Professor in the Department of Computer Science at the American University of Beirut. His research interests are in application acceleration and programming support for parallel processors and memory technologies, with a particular interest in GPUs and processing-in-memory. He is a co-author of the textbook Programming Massively Parallel Processors: A Hands-on Approach, 4th edition. He received his M.S. and Ph.D. in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign, where he worked with the IMPACT Research Group led by Prof. Wen-mei Hwu and received the Dan Vivoli Endowed Fellowship. Prior to that, he received his B.E. in Electrical and Computer Engineering at the American University of Beirut, where he graduated with high distinction and received the Distinguished Graduate Award.

Guray Ozen is a compiler engineer in the Machine Learning Compiler team at NVIDIA. He is currently working on the Cutlass Python DSL, cuTile, and TileIr. His work focuses on optimizing compilers and programming languages for GPU utilization in machine learning (ML) and high-performance computing (HPC). He has made key contributions to several production-grade compilers, including Clang, Flang, MLIR, IREE, and NVIDIA HPC (formerly PGI). Previously, he was actively involved in language design for parallel programming models, such as OpenMP and OpenACC. He served as a voting member of the OpenMP Language Committee for NVIDIA and contributed extensively to the OpenACC language specification.

Antonio J. Peña holds a BS + MS degree in Computer Engineering (2006), and MS and PhD degrees in Advanced Computer Systems (2010, 2013), from Jaume I University of Castellón, Spain. He is currently a Leading Researcher at Barcelona Supercomputing Center (BSC), Computer Sciences Department, where he leads the “Accelerators and Communications for HPC” Group. Antonio is a Ramón y Cajal Fellow, former Marie Sklodowska-Curie Fellow, and former Juan de la Cierva Fellow, and a recipient of the 2017 IEEE TCHPC Award for Excellence for Early Career Researchers in High Performance Computing. He is also an ERC Consolidator Laureate and Sr. IEEE/ACM member. Antonio is also Teaching and Research Staff at Universitat Politècnica de Catalunya (UPC). His research interests in the area of runtime systems and programming models for high performance computing include resource heterogeneity and communications.

Leonidas Kosmidis is a Leading Researcher at the Barcelona Supercomputing Center (BSC) and the Universitat Politècnica de Catalunya (UPC). He holds a PhD and an MSc degree in Computer Architecture from UPC and a BSc in Computer Science from the University of Crete, Greece. He leads the research on embedded GPUs for safety-critical systems, at both the hardware and system software levels, within the CAOS (Computer Architecture/Operating Systems) group. He is the PI of several projects funded by the European Space Agency (ESA), such as GPU4S (GPU for Space), and of the Horizon Europe METASAT project, as well as projects funded by industry partners such as Airbus Defence and Space, all focused on the adoption of GPUs in space and avionics systems. He also participates in several standardisation efforts regarding GPU programming in safety-critical systems. Dr. Kosmidis is the recipient of the RISC-V Educator of the Year Award in 2019 from the RISC-V Foundation and an Honourable Mention for the EuroSys Roger Needham PhD Award in 2018, which is awarded to the best PhD thesis in Europe.

Xavier Martorell received the M.S. and Ph.D. degrees in Computer Science from the Universitat Politècnica de Catalunya (UPC) in 1991 and 1999, respectively. Since 1992 he has lectured on operating systems, parallel runtime systems, OS administration, and systems for data science. He has been an associate professor in the Computer Architecture Department at UPC since 2001. His research interests cover the areas of operating systems, runtime systems, compilers and applications for high-performance multiprocessor systems. In 2003 he joined the IBM TJ Watson Research Center as a Visiting Scientist, and participated in the development of system software for the IBM BlueGene/L Supercomputer, which ranked first in the Top500 list in 2004 and 2005. Since 2005 he has been the Manager of the Parallel Programming Models team at the Barcelona Supercomputing Center. He has participated in several EU projects related to the use of FPGAs for HPC (AXIOM, EuroEXA, and LEGaTO) and the use of FPGAs for RISC-V emulation (Textarossa and MEEP). He is now participating in the Zettascale project, leading the porting of the OS and drivers to the Lagarto RISC-V developed by BSC. He has coauthored more than 80 publications in international journals and conferences. He has co-advised eight Ph.D. theses and is currently co-advising three Ph.D. students.

Xavier Teruel received the Technical Engineering and the Engineering degrees in Computer Science from the Universitat Politècnica de Catalunya (UPC) in 2003 and 2006, respectively. Since 2006, he has been working as a researcher in the Parallel Programming Models group of the Computer Sciences department at the Barcelona Supercomputing Center (BSC). His research interests include the areas of operating systems, programming languages, compilers, runtime systems and applications for high-performance architectures and multiprocessor systems, mostly focused on shared memory environments.

Marc Jordà received his M.S. in Computer Architecture, Networks and Systems in 2012 from the Universitat Politècnica de Catalunya, Barcelona. Since then, he has been a research engineer at the Barcelona Supercomputing Center – Centro Nacional de Supercomputación, working on several topics in the field of high-performance computing, including application acceleration with GPUs, GPU hardware simulation, and performance analysis.

Marc Clascà is a master's student in Computer Engineering at Universitat Politècnica de Catalunya, and has been a research engineer at the Barcelona Supercomputing Center since 2020. His research interests include programming models, performance tools, performance analysis, and specialized hardware and accelerators. He works on HPC parallel performance analysis in the Best Practices for Performance and Programmability (BePPP) group, aiming to provide the scientific community with best practices for programming portable and performant codes. His current research focuses on analyzing the performance of scientific applications and AI use cases that use accelerators, with the aim of deriving new efficiency metrics and analysis methodologies. This includes exploring the potential of GPU-specific tracing and visualization tools, and understanding new programming models and communication patterns used in LLM training and inference.
