[Archimedes Talks & Tutorial Series] Tensor Decompositions in Large-Scale Deep Learning
Talk (12:00-13:00, Mihalis Nicolaou)
Despite the initial success of deep learning in discriminative tasks, large-scale, often generative models (i.e., foundation models) have recently emerged as a dominant paradigm. By pre-training on broad data at scale, such models have proven more robust and generalizable than alternatives, albeit considered more opaque due to their sheer scale. In this talk, we will discuss recent works that leverage tensor methods to make large-scale deep networks more interpretable, controllable, fair, and efficient – for example, by enabling unsupervised local editing in pre-trained networks, making fine-tuning of large models to new tasks efficient, scaling expert sub-computations to achieve specialization, and grounding visual variability to concepts in vision-language models.
Tutorial (13:15-15:15, James Oldfield)
Modern deep learning architectures, such as transformers and convolutional neural networks (CNNs), leverage multi-dimensional representations (tensors) to process input data effectively. Consequently, many standard operations in deep neural networks can be understood as repeated multiplications and summations over various intermediate tensors and weights. In this tutorial, we explore how to unify these common operations in PyTorch through multilinear operations, and how this paradigm provides a flexible framework for designing and implementing novel deep learning architectures and techniques. Concrete examples from our recent work will be presented, including the use of factorized computation with einsum to efficiently scale the expert count in mixture-of-experts models (the μMoE layer).
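To make the einsum paradigm concrete, below is a minimal, self-contained PyTorch sketch (with hypothetical dimensions and randomly initialized weights; an illustration in the spirit of the tutorial, not its actual implementation). It first writes a dense mixture-of-experts computation as a single multilinear contraction, then shows a CP-style factorized variant that avoids materializing one weight matrix per expert:

import torch

# Hypothetical sizes for illustration only.
batch, d_in, d_out, n_experts = 8, 64, 32, 16

x = torch.randn(batch, d_in)                                  # input features
W = torch.randn(n_experts, d_in, d_out)                       # one weight matrix per expert
gate = torch.softmax(torch.randn(batch, n_experts), dim=-1)   # per-token expert coefficients

# Dense MoE as one multilinear contraction:
#   y[b, o] = sum_{n, i} gate[b, n] * x[b, i] * W[n, i, o]
y_dense = torch.einsum('bn,bi,nio->bo', gate, x, W)

# Factorized variant: replace the full (n_experts, d_in, d_out) weight
# tensor with rank-r CP factors, W[n, i, o] ≈ sum_r A[n, r] B[i, r] C[o, r],
# and fold the reconstruction into the same contraction.
rank = 8
A = torch.randn(n_experts, rank)
B = torch.randn(d_in, rank)
C = torch.randn(d_out, rank)
y_fact = torch.einsum('bn,nr,bi,ir,or->bo', gate, A, x, B, C)

Under this factorization, the parameter count grows as rank * (n_experts + d_in + d_out) rather than n_experts * d_in * d_out, which is one way the expert count can be scaled efficiently.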
Bio: Mihalis A. Nicolaou is an Associate Professor at the Computation-based Science and Technology Research Center at The Cyprus Institute. Previously, he held positions at Imperial College London and the University of London. He received the B.Sc. degree from the University of Athens, Greece, and the M.Sc. and Ph.D. degrees from the Department of Computing, Imperial College London, U.K.
Bio: James Oldfield is a PhD student at Queen Mary University of London. Previously, he was a research intern at The Cyprus Institute and Huawei Noah's Ark Lab. His recent research focuses on interpretable and controllable deep learning models.