Computer Science Department, MS Thesis Presentation: Harsh Patel, "Rethinking RNN Training: A Dynamical Systems Approach to Training, Optimization, and Distillation of LLMs"
Harsh Patel
MS Student
WPI – Computer Science Department
Date: Friday, April 18, 2025
Time: 11:00 a.m. – 12:00 p.m.
Location: Rubin Campus Center, Morgan Conference Room
Advisor: Prof. Randy Paffenroth
Reader: Prof. Yanhua Li
Abstract:
There is a growing consensus in the computer science community about the limitations of recurrent neural networks (RNNs). These architectures are often seen as difficult to apply to real-world problems: vanishing and exploding gradients make it hard for them to converge under standard training methods.
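The vanishing/exploding gradient issue mentioned above can be illustrated with a minimal sketch (not taken from the thesis): backpropagating through T steps of a linear RNN multiplies the gradient by the recurrent Jacobian at each step, so a recurrent weight matrix with spectral radius below 1 shrinks gradients exponentially, while one above 1 blows them up. The matrix construction and dimensions here are hypothetical choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_norm_after(T, scale):
    """Norm of a unit gradient pushed back through T steps of W = scale * Q,
    where Q is orthogonal, so the spectral radius of W is exactly `scale`."""
    Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # random orthogonal base
    W = scale * Q
    g = np.ones(16) / np.sqrt(16)  # unit-norm gradient at the final step
    for _ in range(T):
        g = W.T @ g  # one step of backpropagation through time
    return np.linalg.norm(g)

# Each backward step scales the gradient norm by exactly `scale`,
# so after T steps the norm is scale**T.
print(gradient_norm_after(50, 0.9))  # vanishing: 0.9**50 ≈ 5e-3
print(gradient_norm_after(50, 1.1))  # exploding: 1.1**50 ≈ 117
```

Because Q is orthogonal, multiplying by W.T changes the gradient norm by exactly the factor `scale`, which is why the decay or growth is purely exponential in T.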
The goal of this research is to study methods from dynamical systems theory and apply them to create a generalized dynamical system architecture that can close the gap between RNNs and other neural networks, as well as provide alternative trade-offs not seen in other neural network models. In particular, we propose comparing our techniques to Transformer-based LLMs such as Falcon, using distillation to replicate the performance seen in modern Transformers.