Computer Science Department, MS Thesis Presentation: Harsh Patel, "Rethinking RNN Training: A Dynamical Systems Approach to Training, Optimization, and Distillation of LLMs"
Harsh Patel
MS Student
WPI – Computer Science Department
Date: Friday, April 18, 2025
Time: 11:00 a.m. – 12:00 p.m.
Location: Rubin Campus Center, Morgan Conference Room
Advisor: Prof. Randy Paffenroth
Reader: Prof. Yanhua Li
Abstract:
There is a growing consensus in the computer science community about the limitations of recurrent neural networks (RNNs). These architectures are often seen as difficult to apply to real-world problems: vanishing and exploding gradients make it hard for them to converge under standard training methods.
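The vanishing/exploding gradient issue mentioned above can be illustrated with a minimal sketch (not taken from the thesis): backpropagating through T steps of a linear RNN multiplies the gradient by the recurrent Jacobian at each step, so a recurrent weight matrix with spectral radius below 1 shrinks gradients exponentially, while one above 1 blows them up. The matrix construction and dimensions here are hypothetical choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_norm_after(T, scale):
    """Norm of a unit gradient pushed back through T steps of W = scale * Q,
    where Q is orthogonal, so the spectral radius of W is exactly `scale`."""
    Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # random orthogonal base
    W = scale * Q
    g = np.ones(16) / np.sqrt(16)  # unit-norm gradient at the final step
    for _ in range(T):
        g = W.T @ g  # one step of backpropagation through time
    return np.linalg.norm(g)

# Each backward step scales the gradient norm by exactly `scale`,
# so after T steps the norm is scale**T.
print(gradient_norm_after(50, 0.9))  # vanishing: 0.9**50 ≈ 5e-3
print(gradient_norm_after(50, 1.1))  # exploding: 1.1**50 ≈ 117
```

Because Q is orthogonal, multiplying by W.T changes the gradient norm by exactly the factor `scale`, which is why the decay or growth is purely exponential in T.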
The goal of this research is to study methods from dynamical systems theory and apply them to create a generalized dynamical system architecture that can close the gap between RNNs and other neural networks, as well as provide alternative trade-offs not seen in other neural network models. In particular, we propose comparing our techniques to Transformer-based LLMs such as Falcon, using distillation to replicate the performance seen in modern Transformers.