Spring 2025 RBE Master's Capstone Presentations
5:30 pm to 9:00 pm
Screen A
Presenter: Atreya Bhat - Calli (5:30 - 5:55)
Title: DeepRL-Based Next-View Planning for Feature-Driven 3D Reconstruction in Robotic Metal Scrap Recycling
Abstract: We present a novel reinforcement learning (RL) approach for active vision in metal scrap cutting applications, where an agent must discover the drawing or cutting path on complex geometries while minimizing the total number of viewpoints and the exploration time. Existing methods often rely on sampling-based probabilistic planning, which can be computationally expensive and short-sighted. In contrast, our Soft Actor-Critic (SAC)–based framework is designed to address long-horizon considerations without per-step sampling and ranking. We demonstrate how two different action spaces—a Normal (Cartesian) Action Space (RLNAS) and a Spherical Action Space (RLSAS)—perform in simulated scenarios featuring diverse scrap objects and random initial camera viewpoints. Empirical results suggest that our RL-based solution can yield much faster exploration, fewer viewpoints, complete coverage, and reduced motion displacement compared to conventional next-best-view (NBV) approaches.
Presenter: Dev Soni and Muhammad Sultan - Calli (6:00 - 6:25)
Title: Dexterous Object Picking for Complex Multi-Object Scenes
Abstract: This work tackles the challenge of robotic picking in complex multi-object environments, where objects may be hard to grasp due to their size, shape, or being blocked by others. To address this, we introduce five picking skills inspired by human hand movements and use a deep learning-based approach to detect which skill to apply and when. The system considers object shapes, positions, and the surrounding environment to make these decisions. A flexible robotic hand carries out the selected skills. We test the approach in real-world scenarios to evaluate its effectiveness, especially in cluttered settings. This research highlights the potential of deep learning in improving robotic grasping in everyday situations.
Presenter: Benjamin Antupit, Alex Brattstrom, and Nicholas Moy - Calli (6:30 - 6:55)
Title: Recycling Robotics Systems Integration
Abstract: This project develops and improves a robotic system for sorting recyclables from a single waste stream, focusing on systems integration and iterative improvement of existing systems and research.
Presenter: Filippo Marcantoni - Li, J. (7:00 - 7:25)
Title: Multi-Camera Control with Voice Commands and Autonomous View Selection for Robotic Nursing Assistance
Abstract: Teleoperation in robotics has significantly advanced with the integration of multi-camera systems, which enhance operator performance and reduce cognitive load compared to traditional single-camera interfaces. This project aims to implement an intuitive multi-camera control system for the Intelligent Robotic Nursing Assistant (IONA) to support complex teleoperation tasks. The system features three fixed workspace cameras within the nursing setup, static cameras mounted on the robot’s chest and arms, and an active camera that mirrors the operator’s head movements to provide an immersive experience. To simplify operator control, the system incorporates voice commands for switching camera views and employs an object detection model running on the ZED Mini workspace camera to estimate the robot’s pose and automatically select the most relevant camera view. The effectiveness of this system will be evaluated through a comparative user study across three interaction modalities: (i) Manual Mode, where camera views are switched via GUI buttons; (ii) Voice Mode, where views are changed using voice commands; and (iii) Automatic Mode, where camera views are selected autonomously based on the robot’s pose and teleoperation phase. This study aims to assess the impact of voice control and automated camera selection on teleoperation efficiency and user experience when operating the nursing assistant robot.
Screen C
Presenter: Butchi Venkatesh Adari - Chamzas (5:30 - 5:55)
Title: Monocular Grasping
Abstract: This work predicts grasp poses from a single RGB image by leveraging monocular depth estimation and semantic segmentation. Our pipeline is structured into four major components: monocular depth estimation, semantic segmentation, heatmap generation, and grasp pose prediction. First, a depth estimation module infers per-pixel depth from an RGB input using a high-resolution transformer-based model. This depth map is combined with the original RGB image to guide both the heatmap generator and grasp pose decoder. The heatmap module employs an Attention U-Net architecture that highlights regions with high grasp potential through spatial attention mechanisms. In parallel, a Grasp Transformer network fuses visual and depth features to regress grasp pose parameters in image space, including 3D position, orientation (as quaternions), and grasp confidence. A Vision Transformer (ViT) encoder integrates RGB features, while depth features are extracted through a GQ-CNN and refined via spatial attention for pose decoding. Additionally, a semantic segmentation branch based on a SWIN Transformer backbone provides object-wise instance masks to improve grasp localization accuracy and enforce semantic consistency during training. The model is trained with a composite loss function incorporating pixel-wise regression, heatmap peak emphasis, quaternion orientation alignment, and semantic consistency penalties. Extensive experiments on a multi-object grasp dataset show significant improvements in grasp success rate, position/orientation accuracy, and depth consistency compared to conventional modular approaches. Our system sets the foundation for scalable and robust vision-based grasping in real-world robotic systems.
Presenter: Vishwas Hegde and Dhrumil Sandeep Kotadia - Chamzas (6:00 - 6:25)
Title: Learning Hyperparameters for Planning Time Optimization
Abstract: Sampling-based motion planners have parameters (e.g., range, goal bias) that have a direct impact on planning time, and whose best values can change with the environment and the type of robot. We propose a deep learning-based method to predict these parameters from the environment, ultimately leading to an improvement in planning time.
Presenter: Samruddhi Jadhav - Pittiglio (6:30 - 6:55)
Title: Robotic 4‑DOF Endovascular Steering Platform Enabling Precision Translation & Rotation Control of Guidewires and Catheters for Neurovascular Navigation
Abstract: We introduce a robotic 4‑DOF endovascular steering platform that delivers independent translation and rotation of both guidewire and catheter for neurovascular navigation. High‑precision stepper motors coupled to lead and ball screws achieve linear resolutions of 200 steps/mm (guidewire) and 40 steps/mm (catheter), while dual rotational axes provide 0.1° incremental control. A unified serial command interface enables real‑time adjustment of speed, direction, and synchronized maneuvers. Bench validation demonstrates repeatable linear accuracy within ±0.05 mm and angular accuracy better than 0.2°. This system streamlines four‑degree‑of‑freedom steering, offering a robust platform for enhanced safety and efficiency in neurointerventional procedures.
Presenter: Sooumik Saswat Patnaik - Pittiglio (7:00 - 7:25)
Title: Self-Supervised 2D-3D Image Registration for Autonomous Navigation in Vascular Interventions
Abstract: Endovascular procedures rely heavily on precise navigation within complex vascular structures using limited intra-operative imaging. This project presents a self-supervised 2D–3D image registration framework leveraging differentiable rendering and test-time optimization to align 2D fluoroscopic X-rays with pre-operative 3D CT scans. This approach, inspired by DiffDRR, DiffPose, and XVR, eliminates the need for manual annotations by training a convolutional pose regressor on synthetic digitally reconstructed radiographs (DRRs). These DRRs are generated using sampled camera poses and a differentiable renderer. During inference, we refine the estimated pose using gradient-based optimization that minimizes image dissimilarity via multiscale normalized cross-correlation (mNCC), while enforcing geodesic consistency on SE(3) using Lie algebra. Evaluated on the Ljubljana dataset, this method achieves sub-millimeter accuracy with patient-specific fine-tuning in under 5 minutes. We further explore classical ITK-based optimization for multimodal 3D–3D image registration and real-world distance estimation from 2D fluoroscopy using a fiducial designed for this purpose. This work demonstrates the feasibility of real-time, annotation-free 2D–3D alignment in image-guided surgeries, enabling accurate autonomous navigation with clinical relevance.
Screen D
Presenter: Nicholas Johannessen - Li, G. (5:30 - 5:55)
Title: Design of an Agile Quadcopter Platform
Abstract: New research on applying high-performing sensing and control strategies to aerial robots has the potential to realize new capabilities of drones. One ongoing effort here at WPI is to detect and evade projectile obstacles launched at a quadcopter platform. This is a task with multiple challenges in sensing, control, and hardware. This project focuses on the development of a high-performance quadcopter hardware system capable of meeting various physical design requirements established in the context of the greater mission of the project. The author presents the design derivation, modeling, and testing results of the physical prototype. The primary result is a functional hardware platform that will enable larger research efforts in UAV projectile avoidance.
Presenter: Chase Beausoleil - Fichera (6:00 - 6:25)
Title: Closed-Loop Control of Robotic Laser Surgery via Ultrasound Imaging
Abstract: This project aims to develop a closed-loop control system to ensure accurate incision depth during robotic laser surgery via ultrasound imaging. Robotic surgery is a rapidly growing method of enhancing manual surgical procedures, offering improved precision and visualization. One notable modality is robotic laser surgery, where a laser is mounted as the end effector to a robotic arm. Lasers are widely used in various clinical applications as they provide unique surgical capabilities, such as high precision incisions. However, accurately monitoring and controlling incision depth remains a challenge due to non-uniform laser-tissue interactions, especially in soft tissue. Inaccurate ablation can result in excessive cutting, potentially damaging critical structures like blood vessels and nerves. To address this issue, the proposed system uses ultrasound imaging to monitor incision depth during laser ablation. Depth measurements are extracted autonomously through an image processing pipeline that combines a Convolutional Neural Network – which detects the laser incision – and fitting algorithms that estimate incision depth. The resulting depth error is passed into a proportional-derivative (PD) controller, which adjusts laser parameters between consecutive passes to control subsequent ablation. This closed-loop approach enables submillimeter accuracy when ablating to a predefined target depth on ex vivo tissue samples.
Presenter: Cooper Ducharme - Onal (6:30 - 6:55)
Title: Iterative Development of Radially Symmetric Fin Ray Fingers
Abstract: The fin ray effect, sometimes referred to as the negative bending response, describes the deformation of fish fins and similar mechanisms in which the material surrounding an externally applied compressive load displaces toward the direction opposing the load force. Fingers exploiting this effect have become widely used in the design of adaptive grippers for robotic manipulation tasks due to their ability to passively conform to the geometry of grasped objects. However, most conventional fin ray fingers are not designed to handle loads that are not coplanar with the plane normal to the fingertip’s edge, resulting in nonconformational, unexpected, and otherwise undesirable deformation. Attempts to design adaptive grippers with radially symmetric versions of fin ray fingers have encountered hurdles such as vulnerability to torsion, demanding fabrication processes, and mechanical characteristics that confound attempts at modeling strain. This project aimed to enable advancements in robotic dexterity by developing novel varieties of fin ray fingers that overcome the aforementioned challenges through cycles of design, testing, and critique, as well as the application of emerging metastructures, manufacturing methods, sensors, and machine learning algorithms.
Presenter: Ayesha Akhtar - Onal (7:00 - 7:25)
Title: DEPTH-GUIDED GRASP: Robust Multi-Object Grasping Through Depth-Informed Topological Analysis for Robotic Manipulation
Abstract: This project presents a novel approach to robotic grasping that leverages depth image topology for reliable object manipulation in cluttered environments. Our method, DEPTH-GUIDED GRASP, identifies optimal three-point grasp configurations by analyzing local depth minima and gradient maps to locate "valleys" in object topography that are ideal for stable contact. Unlike traditional methods that primarily rely on RGB data or simplistic depth thresholds, our system performs topological analysis to determine grasp poses that adapt to complex object geometries. We introduce a multi-objective optimization framework that simultaneously evaluates grasp stability, inter-object collision avoidance, and clearance constraints to generate multiple viable grasp configurations. The algorithm dynamically adjusts to object proximity in cluttered scenes by penalizing potential collisions with neighboring objects during grasp execution. Our ROS-integrated implementation demonstrates robust performance across various object geometries and arrangements, achieving a 92% success rate in experimental trials where traditional methods typically fail due to object interference or unstable contact points. This approach bridges the gap between analytical and data-driven grasping by incorporating explicit geometric reasoning while maintaining computational efficiency required for real-time robotic applications.
Screen E
Presenter: Hrishikesh Dhairyasheel Pawar - Sanket (5:30 - 5:55)
Title: Passive Depth Perception Using Event Cameras
Abstract: This project introduces a passive, defocus-based approach for visual navigation using a monocular event camera with a wide-aperture lens. Depth cues are extracted by leveraging optical blur, enabling foreground-background segmentation without explicit depth estimation or high computational load. Blurred regions indicate free space, while sharper areas denote obstacles. The method achieves an 80% success rate in simulation with a 62× reduction in runtime compared to conventional depth-based techniques, operating at 30 FPS on a CPU. Initial real-world results demonstrate promising applicability for low-power aerial navigation in cluttered environments.
Presenter: Albert Lewis - Leahy (6:00 - 6:25)
Title: Development of an Experimental Testbed for Safety in Robot Control
Abstract: This directed research project focuses on demonstrating advanced Control Barrier Functions (CBFs) for ensuring robot safety in unknown environments while utilizing noisy sensors. The project involves designing and implementing nontrivial CBFs on hardware platforms, such as mobile robots, to validate theoretical models in real-world scenarios. Multiple CBFs will be implemented to identify trade-offs between the different approaches.
Presenter: Yiyu Wu - Yuan (6:30 - 6:55)
Title: A Progressive Multimodal Robot System for Emotional Learning in Autistic Children
Abstract: This research introduces an innovative social robotic system for emotion education therapy in children with autism spectrum disorder, easing caregiver demands. Integrating a NAO robot with a user-friendly interface for facilitator oversight, our approach features a staged five-session structure. Each session targets distinct emotions through increasingly complex multimodal exchanges, from basic dialogues to comprehensive social scenarios involving verbal, facial, bodily, and contextual cues. The system harnesses advanced technologies like ChatGPT/Whisper for dynamic conversation, DeepFace for affective state recognition, and MediaPipe for postural analysis. This holistic design incrementally increases interactional complexity across five structured activities: introductory conversation, facial emotion deciphering/expression, bodily emotion conveyance, narrative exploration with physical enactments, and a concluding musical interlude. This work substantiates the potential of robotic interventions to provide structured emotional literacy development under essential human guidance via an accessible control mechanism.
Presenter: Ashley Espeland - Xiao (7:00 - 7:25)
Title: Robotic Fine Manipulation with Limited Perception
Abstract: This project addresses the problem of how to achieve precise manipulation in the presence of sensing and robot motion uncertainties. It will consider several manipulation tasks in which the pose of the target object is only roughly known, and the robot manipulator must acquire the accurate pose of the object for assembly (to mate another part with it) and for grasping and picking up the object. While there is prior work on assembly, little work addresses how to move a gripper that, due to uncertainty, contacts the target object in an undesired way to the desired grasping configuration within a very confined space that requires a minimum operation footprint to avoid collision with other objects. The thesis will develop strategies that combine perception from force/torque sensing and possibly vision sensing to adjust the robot gripper’s motion compliantly from an initial contact configuration to the goal grasping or alignment configuration, i.e., without losing contact with the target object as much as possible, to minimize the chance of colliding with other objects along the way. The effectiveness of the strategies will be demonstrated in real-world tests.
Screen G
Presenter: Peter Abosede, Mark Caleca, Isaac Lau, and Samuel Markwick - Aloi (5:30 - 5:55)
Title: Board Game Robot
Abstract: This project presents the design and implementation of a teleoperated board game system that enables remote interaction with physical game pieces through a robotic manipulator. The robot has potential applications in enhancing accessibility, maintaining social connection, and supporting emotional well-being in long-distance or isolated settings. The system enables a remote user to control gameplay with a web-based application that streams a live camera feed of the physical board, enabling real-time selection and manipulation of pieces. A semi-autonomous robotic arm executes the user-desired pick-and-place actions using stereo vision to localize both source and destination positions for the game piece in the real world. Communication to the web interface, manipulation, and perception are coordinated via a state machine and Robot Operating System 2 packages. The system integrates vision-guided manipulation, path planning with obstacle avoidance, and an intuitive human-robot interface. A dual-module gripper enables the robot to handle a variety of game pieces and cards, with automatic selection of the appropriate gripper based on detected object characteristics. The final prototype demonstrates the feasibility of remote physical gameplay.
Presenter: Dhiraj Kumar Rouniyar - Aloi (6:00 - 6:25)
Title: Robot Navigation in New Environments via Natural Human Commands
Abstract: The project aims to synergistically exploit the capabilities of a YOLO object detector and a multimodal vision-language model (VLM) to enable humans to interact naturally with autonomous robots through high-level commands.
Presenter: Harshal Suresh Bhat, Riley Blair, Jatin Kohli, Chirag Ashok Patel, Dharmik Jayeshkumar Raval, Jessica M Rhodes, Anthony Virone - Aloi (6:30 - 6:55)
Title: STACK-A-BOT: Smart Tech And Compact Knowledge A Better Optimization Tool
Abstract: In this project, we intend to reduce the inefficiencies of warehouse pallet stacking with multi-SKU boxes using a robotic arm and advanced computer vision. This innovation addresses a critical challenge in warehouse automation, potentially reducing labor costs while increasing throughput and safety in logistics operations. A six-degree-of-freedom (DOF) robot with a suction-cup end-effector and eye-in-hand camera was developed to autonomously detect boxes and stack them intelligently onto a pallet. The algorithms employed generate coarse and fine point clouds using a foundational model, followed by a layered stacking approach based on coarse point cloud analysis. We utilize inverse kinematics, trajectory planning, and velocity control to navigate the environment, apply deep learning-based approaches to object detection and localization, and use a novel algorithm to calculate a weighted stability score and predict the optimal grid position for each box on the pallet.
Presenter: Luis Fernando Recalde - Aloi (7:00 - 7:25)
Title: ES-HPC-MPC: Exponentially Stable Hybrid Perception Constrained MPC for Quadrotor with Suspended Payloads
Abstract: Aerial transportation using quadrotors with cable-suspended payloads holds great potential for applications in disaster response, logistics, and infrastructure maintenance. However, their hybrid and underactuated dynamics pose significant control and perception challenges.
Traditional approaches often assume a taut cable condition, limiting their effectiveness in real-world applications where slack-to-taut transitions occur due to disturbances. We introduce ES-HPC-MPC, a model predictive control framework that enforces exponential stability and perception-constrained control under hybrid dynamics. Our method leverages Exponentially Stabilizing Control Lyapunov Functions (ES-CLFs) to enforce stability during the tasks and Control Barrier Functions (CBFs) to maintain the payload within the onboard camera’s field of view (FoV). We validate our method through both simulation and real-world experiments, demonstrating stable trajectory tracking and reliable payload perception. We further verify that our method maintains stability and satisfies perception constraints while tracking dynamically infeasible trajectories and when the system is subjected to hybrid mode transitions caused by unexpected disturbances.
Screen H
Presenter: Joseph Cardarelli - Aloi (5:30 - 5:55)
Title: Immersive Robotic Control using an Apple Vision Pro
Abstract: Augmented Reality (AR) presents new opportunities for intuitive and immersive robot control interfaces. This project develops an AR-based robot control application utilizing the Apple Vision Pro to enable real-time spatial interaction with robotic systems. By leveraging VisionOS’s 3D interface capabilities and hand/eye tracking features, we designed a control framework that allows users to interact with and command a robot directly within its physical environment. The system was tested using HIRO Lab's IONA robots. This work highlights the potential of spatial computing in human-robot interaction and explores the possibility of more accessible and immersive robot control systems.
Presenter: Smit Shah - Aloi (6:00 - 6:25)
Title: Enhancing Visual Odometry Robustness Using Reinforcement Learning with Dynamic Motion Segmentation
Abstract: In dynamic environments, traditional Visual Odometry (VO) algorithms often struggle due to unpredictable motions of moving objects, leading to inaccuracies in pose estimation. This project addresses these challenges by combining motion-aware optical flow and segmentation techniques (DytanVO) with a reinforcement learning (RL)-based VO framework. By explicitly integrating dynamic scene information into the RL decision-making pipeline, the system learns to adaptively select keyframes and refine its tracking strategy in real-time. The result is a more robust, accurate, and generalizable VO system suitable for complex, real-world scenarios such as autonomous driving, drone navigation, and augmented reality applications.
Presenter: Gaurang Manoj Salvi - Aloi (6:30 - 6:55)
Title: Terrain-Adaptive Morphing Wheel for Robotic Navigation on Unstructured Terrains
Abstract: Conventional wheeled robots face significant challenges in navigating unstructured terrains such as sand, gravel, stairs, and uneven surfaces, which are common in search and rescue scenarios. While legged systems offer superior adaptability, they often compromise on speed and mechanical simplicity. This work presents the design and development of a passively morphing deformable wheel that transitions between rolling and leg-like behavior by opening and closing structural gaps. The wheel was designed using CAD tools and fabricated via 3D printing. Although integration of silicone-based air pockets remains planned for future iterations, the current prototype demonstrates improved adaptability over standard rubber wheels on varied surfaces. The design offers a low-complexity solution for enhancing robotic mobility in unpredictable environments, with potential applications in search and rescue, agriculture, and planetary exploration.