Computer Science Department, MS Thesis Presentation: Harsh Patel, "Rethinking RNN Training: A Dynamical Systems Approach to Training, Optimization, and Distillation of LLMs"

Friday, April 18, 2025
11:00 am to 12:00 pm


Harsh Patel 

MS Student

WPI – Computer Science Department 

Time: 11:00 a.m. – 12:00 p.m.

Location: Rubin Campus Center, Morgan Conference Room 

Advisor: Prof. Randy Paffenroth 

Reader: Prof. Yanhua Li 

Abstract:

There is a growing consensus in the computer science community about the shortcomings of recurrent neural networks (RNNs). These architectures are often seen as difficult to apply to real-world problems, since issues such as vanishing and exploding gradients make it hard for them to converge under standard training methods.
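As a minimal illustration of the vanishing-gradient issue mentioned above (this sketch is not from the thesis itself; the weight scaling and sequence length are arbitrary choices): backpropagation through time multiplies the step-to-step Jacobians dh_t/dh_{t-1}, and with tanh activations and a recurrent weight matrix of small spectral norm, that product shrinks geometrically, so early inputs contribute almost nothing to the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
# Recurrent weight matrix scaled so its spectral norm is well below 1
W = 0.25 * rng.standard_normal((n, n)) / np.sqrt(n)

h = np.zeros(n)
J = np.eye(n)   # accumulated Jacobian product dh_t/dh_0
norms = []
for t in range(50):
    x = rng.standard_normal(n)   # random input at step t
    h = np.tanh(W @ h + x)
    D = np.diag(1.0 - h**2)      # tanh'(.) evaluated at the new hidden state
    J = D @ W @ J                # chain rule: one more Jacobian factor
    norms.append(np.linalg.norm(J, 2))

# The spectral norm of dh_t/dh_0 collapses toward zero as t grows,
# which is exactly the vanishing-gradient failure mode.
print(norms[0], norms[-1])
```

With a recurrent matrix whose spectral norm exceeds 1, the same loop exhibits the opposite failure, exploding gradients.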

The goal of this research is to study methods from the field of Dynamical Systems Theory and apply them to create a generalized dynamical system architecture that can close the gap between RNNs and other neural networks, as well as provide alternative trade-offs not seen in other neural network models. In particular, we propose comparing our techniques to Transformer-based LLMs such as Falcon, using distillation to replicate the performance seen in modern Transformers.
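The distillation step referenced above would presumably build on the standard knowledge-distillation setup, in which a student model is trained to match a teacher's temperature-softened output distribution. A hedged sketch of that loss (this is the generic formulation, not necessarily the exact objective used in the thesis):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(teacher, teacher))           # 0.0 when the student matches the teacher
print(distillation_loss(np.zeros((1, 3)), teacher))  # positive otherwise
```

In practice this term is usually combined with a hard-label cross-entropy loss, and the temperature T controls how much of the teacher's inter-class structure the student sees.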


Department(s): Computer Science