DS Ph.D. Dissertation Proposal | Harsh Pathak | Monday, Dec. 4th, 12:00PM EST


DATA SCIENCE 

Ph.D. Dissertation Proposal

Harsh Pathak, Ph.D. Candidate

Monday, December 4th, 2023 | 12:00PM - 1:00PM EST

Zoom Link: https://wpi.zoom.us/my/rcpaffenroth

 

Dissertation Committee:

Professor Randy Paffenroth, Advisor, WPI

Professor Jacob Whitehill, WPI

Professor Oren Mangoubi, WPI

Dr. Wei Lee Woon, External Committee Member, Expedia Group

 

Title: Continuation Methods for Deep Neural Networks: Theory and Practice

 

Abstract:

This proposal explores the landscape of training methods and non-convex optimization
in deep neural networks through the lens of continuation methods. While deep learning
has achieved remarkable success across domains, optimization remains a pivotal step in
shaping network performance. We focus on the interplay among network architecture,
training techniques, solvers, and hyper-parameters that gives rise to complex
optimization landscapes.


Applying the central idea of continuation methods, gradually moving from a simple
function to a more complex one, suggests novel ways to devise training routines for
neural networks. The proposed training methods can be combined with popular solvers
such as Adam and RMSProp, and we demonstrate accelerated convergence and improved
generalization across tasks and network types.
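
To make the idea concrete, here is a minimal sketch of a continuation-style training loop. It is an illustration under our own assumptions, not the proposal's actual algorithm: a homotopy parameter `lam` is annealed from 0 to 1, blending an easy quadratic surrogate into the true non-convex cross-entropy loss, with Adam as the solver. The toy model and data are hypothetical placeholders.

```python
# Hedged sketch of a continuation-style training loop (illustrative only;
# the model, data, schedule, and surrogate loss are our assumptions, not
# the proposal's method).
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)                 # toy inputs
y = torch.randint(0, 2, (256,))          # toy binary labels

model = nn.Sequential(nn.Linear(20, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # RMSProp works too

mse, xent = nn.MSELoss(), nn.CrossEntropyLoss()
y_onehot = nn.functional.one_hot(y, num_classes=2).float()

epochs = 100
for epoch in range(epochs):
    lam = epoch / (epochs - 1)           # continuation parameter: 0 -> 1
    logits = model(X)
    # Gradually deform an "easy" quadratic objective into the true
    # non-convex one as lam moves from 0 to 1.
    loss = (1 - lam) * mse(logits, y_onehot) + lam * xent(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```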


Continuation methods also apply broadly to iterative dynamical systems. Through the
lens of iterative maps, we study and reformulate neural network architectures such as
feedforward and recurrent networks. As a result, we introduce Sequential2D, a
generalized iterative map for modeling architectures that admits more pathways of
information flow through single or hybrid models. In this proposal, we use Sequential2D
to systematically add skip connections to GPT-2 with only 1% more parameters;
experiments show improved fine-tuning, highlighting Sequential2D's potential, as
sketched below.
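
The sketch below captures one reading of this iterative-map view; the class name `Sequential2DSketch`, block sizes, and skip placement are illustrative assumptions, not the proposal's implementation. The network state is a list of blocks, one step applies a 2D grid of modules, and extra off-diagonal entries act like skip connections.

```python
# Illustrative block iterative map (our reading of the abstract, with
# assumed names and shapes; not the proposal's code).
import torch
import torch.nn as nn

class Sequential2DSketch(nn.Module):
    """One step of a block map: out[i] = sum_j F[i][j](x[j])."""
    def __init__(self, grid):
        super().__init__()
        self.grid = grid  # 2D list of nn.Module or None (structural zero)
        # Register non-None entries so their parameters are trained.
        self._mods = nn.ModuleList(m for row in grid for m in row if m is not None)

    def forward(self, xs):
        outs = []
        for row in self.grid:
            acc = None
            for m, x in zip(row, xs):
                if m is None:
                    continue
                acc = m(x) if acc is None else acc + m(x)
            outs.append(acc)
        return outs

# A plain two-layer feedforward net lives on the sub-diagonal; the extra
# `skip` entry is an added pathway from the input straight to the last
# block, loosely analogous to adding skip connections to GPT-2.
f1, f2 = nn.Linear(16, 32), nn.Linear(32, 8)
skip = nn.Linear(16, 8)                 # small parameter overhead
grid = [
    [nn.Identity(), None, None],        # block 0 carries the input forward
    [f1,            None, None],        # standard feedforward edge
    [skip,          f2,   None],        # f2 plus the extra skip pathway
]
net = Sequential2DSketch(grid)

x = torch.randn(4, 16)
state = [x, torch.zeros(4, 32), torch.zeros(4, 8)]
for _ in range(2):                      # depth emerges by iterating the map
    state = net(state)
# state[2] now equals skip(x) + f2(f1(x)).
```
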
Overall, this research advances both the theory and practice of training deep models.
The proposed techniques offer pathways to faster, more efficient network training with
strong generalization performance.

Department(s): Data Science

Contact Person: Kelsey Briggs
