Projects

Graduate Projects


Self Untangling Robotic Snake Arm with Dynamic Obstacle Avoidance

This project aims to develop effective methods for a robotic snake arm to perform complex tasks such as obstacle avoidance, targeting and touching a random object (e.g., a colored cube) with precise gripper orientation, and untangling itself from knots. We considered a simulated environment in ROS2, where obstacles like rocks fall vertically, presenting dynamic challenges for the robotic arm. Through various experiments and extensive analysis across different robots and experimental scenarios, we sought to optimize the performance and versatility of the robotic snake arm in real-world applications.

RRT-Based Motion Planner for Non-Holonomic Mobile Robots

The project focuses on developing motion planning methods for non-holonomic mobile robots, specifically wheeled systems like cars, using Rapidly-exploring Random Tree (RRT) based algorithms. The objective was to create a planner capable of efficiently navigating complex environments, avoiding obstacles in challenging scenarios such as narrow garages, parallel parking, and tight streets. Through a series of experiments, we tested the planner's performance and derived inferences from our analyses, supported by graphical results and visual stills.
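The core loop of such a planner can be sketched as a plain holonomic RRT; the car's kinematic steering constraints are omitted here for brevity, and the workspace bounds, step size, and obstacle test are illustrative assumptions rather than the project's actual configuration:

```python
import math
import random

def rrt(start, goal, is_free, step=0.5, goal_tol=0.5, max_iters=2000, seed=0):
    """Minimal holonomic RRT in a 2-D plane.

    start/goal: (x, y) tuples; is_free(p): True if point p is collision-free.
    Returns a list of waypoints from start to goal, or None on failure.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random point (goal-biased 10% of the time).
        q = goal if rng.random() < 0.1 else (rng.uniform(-10, 10), rng.uniform(-10, 10))
        # Find the nearest existing tree node.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], q))
        near = nodes[i]
        d = math.dist(near, q)
        if d == 0:
            continue
        # Steer one fixed-length step from the nearest node toward the sample.
        new = (near[0] + step * (q[0] - near[0]) / d,
               near[1] + step * (q[1] - near[1]) / d)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:
            # Walk parents back to the root to recover the path.
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```

A non-holonomic variant replaces the straight-line steering step with a forward simulation of the car's motion model under sampled steering inputs.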

Non-Holonomic RRT with Dynamic Replanning and Obstacle Mapping

This project aims to develop a real mobile robot (Raspberry Pi based) with advanced capabilities, including initial global localization using Monte Carlo Localization, smooth car-like driving, and efficient path planning through narrow and complex areas. The robot can detect and avoid new obstacles, dynamically update an "Obstacle Map," and replan its path. Utilizing RRT-based motion planners for non-holonomic robots, the project focuses on wheeled systems such as cars and car-trailer combinations. These planners generate raw paths that are refined through post-processing for smooth, collision-free navigation. Experiments and analyses demonstrate the non-holonomic RRT algorithm's effectiveness, which can enhance autonomous navigation in challenging environments for applications such as smart cars and planetary rovers.
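The obstacle-map update and replanning trigger can be caricatured with a simple occupancy-grid check; the set-based grid and cell size below are assumptions for illustration, not the robot's actual map representation:

```python
def update_obstacle_map(grid, detections, path, cell=0.25):
    """Mark newly detected obstacles in an occupancy grid and decide
    whether the current path must be replanned.

    grid: set of occupied (ix, iy) cells; detections: world (x, y) points
    from the robot's sensors; path: list of world (x, y) waypoints.
    Returns True if any waypoint now falls in an occupied cell.
    """
    for x, y in detections:
        grid.add((int(x // cell), int(y // cell)))
    # Replan only when the existing path actually crosses an occupied cell.
    return any((int(x // cell), int(y // cell)) in grid for x, y in path)
```

When this returns True, the planner would be re-invoked from the robot's current pose against the updated map.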

Atmospheric Parameter Forecasting for Optical Channel Characterization

This project aimed to enhance optical communication by forecasting atmospheric parameters. The focus was on two main tasks: Weather Forecasting and Predicting Atmospheric Coherence Length (r0). For Weather Forecasting, a sequence-to-sequence approach was used to predict temperature, air pressure, relative humidity, and wind speed at JPL weather stations. A GRU model achieved a 25% reduction in prediction errors for temperature and pressure, while more complex architectures were used for wind speed and humidity. Predicting r0 was challenging due to its high variability, but a hybrid approach combining analytically predicted r0 with a simple GRU model improved nowcasting accuracy. The best models were deployed for live weather forecasting at JPL stations, demonstrating their practical applicability.
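For reference, a single GRU step reduces to three gated updates. The scalar toy below (PyTorch gate convention; all weight names are assumptions) shows the recurrence the forecasting models build on, whereas the real models operate on vector states over weather time series:

```python
import math

def gru_step(x, h, W):
    """One GRU step with scalar input and hidden state (illustrative sizes).

    x: input value; h: previous hidden state; W: dict of weights for the
    update gate (z), reset gate (r), and candidate state (n).
    """
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    z = sig(W["wz"] * x + W["uz"] * h + W["bz"])              # update gate
    r = sig(W["wr"] * x + W["ur"] * h + W["br"])              # reset gate
    n = math.tanh(W["wn"] * x + W["un"] * (r * h) + W["bn"])  # candidate state
    # Blend candidate and previous state (PyTorch convention).
    return (1 - z) * n + z * h
```

A sequence-to-sequence forecaster runs this recurrence over the input window, then decodes future steps from the final hidden state.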

RGB-X Object Detection via Scene-Specific Fusion Modules

Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, requiring large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that can leverage and fuse pretrained single-modal models via scene-specific fusion modules, thereby enabling joint input-adaptive network architectures to be created using small, coregistered multimodal datasets. Our experiments demonstrate the superiority of our method compared to existing works on RGB-thermal and RGB-gated datasets, performing fusion using only a small amount of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion.
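The scene-specific fusion idea can be loosely illustrated as a gated convex combination of features from the two pretrained branches. This one-liner is a drastic simplification of the paper's actual modules, included only to show the shape of input-adaptive fusion:

```python
def fuse(rgb_feat, x_feat, gate):
    """Blend per-channel features from two pretrained single-modal branches
    with a scalar gate in [0, 1] (0 = trust RGB only, 1 = trust X only).

    In the real network the gating is learned and scene-specific; here it
    is just a parameter, and features are plain lists of floats.
    """
    return [(1 - gate) * a + gate * b for a, b in zip(rgb_feat, x_feat)]
```

A scene classifier choosing `gate` per input (e.g., leaning on thermal features at night) is what makes the architecture input-adaptive.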

Undergraduate Thesis Project


Autonomous Robotic Grasping

Autonomous Robotic Grasping (ARG) is key to attaining the promise of intelligent robotics. The primary premise underlying vision-based ARG is the capacity of a robot to "perceive" its surroundings using vision sensors in order to interact constructively with various objects and accomplish a given task of interest. However, the real world is highly variable: it is neither tractable nor feasible for a robot to represent its surroundings, the objects in it, and the complex interactions among them exactly. Learning is therefore crucial for such intelligent autonomous systems to acquire skilled manipulation abilities. Effective ARG systems have applications in many domains; they can be deployed in industries, spacecraft, restaurants, and homes to perform, or assist human experts in performing, versatile and repetitive manipulation tasks. This project considers two challenging and nearly ubiquitous ARG tasks that an intelligent robotic arm can automate: Task I, grasping various objects in diverse environments, and Task II, dynamic grasping of moving objects. In addition, the basic steps and tasks necessary for performing complex ARG tasks on a "real" robotic setup are treated as part of the problem statement of this work.

Undergraduate Projects


Efficient Self-Supervised Neural Architecture Search


Pose Estimation for Autonomous Robotic Grasping

The ability of a robot to sense and "perceive" its surroundings using vision-based sensors, in order to interact with and grasp various objects of interest, is the main principle behind vision-based Autonomous Robotic Grasping. To realise this task of autonomous object grasping, one of the critical sub-tasks is 6D pose estimation of a known object of interest from sensory data in a given environment. The sensory data can include RGB images and depth measurements, but determining the object's pose from only a single RGB image is cost-effective and highly desirable in many applications. In this work, we develop a series of convolutional neural network-based pose estimation models without post-refinement stages, designed to achieve high accuracy on relevant metrics while efficiently estimating the 6D pose of an object from a single RGB image. The designed models are incorporated into an end-to-end pose estimation pipeline based on Unity and ROS Noetic, where a UR3 robotic arm is deployed in a simulated pick-and-place task. The pose estimation performance of the different models is compared and analysed in both same-environment and cross-environment cases, utilising synthetic RGB data collected from cluttered and simple simulation scenes constructed in Unity. In addition, the developed models achieved high Average Distance (ADD) metric scores, greater than 93% for most of the real-life objects tested from the LINEMOD dataset, and can be integrated seamlessly with any robotic arm for estimating 6D pose from RGB data alone, making our method effective, efficient, and generic.
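The ADD metric used above has a compact definition: average the distances between the object's model points transformed by the ground-truth pose and by the predicted pose, and accept the pose when that average falls below a fraction (commonly 10%) of the object diameter. A sketch with list-based poses, assuming illustrative inputs:

```python
import math

def add_score(model_pts, R_gt, t_gt, R_pred, t_pred, diameter, tau=0.1):
    """ADD metric for 6D pose evaluation.

    model_pts: 3-D points on the object model; R_*: 3x3 rotation matrices
    as nested lists; t_*: translation triples. Returns (mean distance,
    whether the pose counts as correct at threshold tau * diameter).
    """
    def apply(R, t, p):
        # Rigid transform: R @ p + t.
        return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i]
                     for i in range(3))
    d = sum(math.dist(apply(R_gt, t_gt, p), apply(R_pred, t_pred, p))
            for p in model_pts) / len(model_pts)
    return d, d < tau * diameter
```

The percentage scores quoted above are the fraction of test images whose predicted pose passes this check.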

Scene Text Recognition

Text is one of the most effective forms of communication among human beings. Recognizing text automatically & efficiently in everyday scenes is an invaluable tool in many applications. In this project, we delve into the fascinating domain of Scene Text Recognition (STR) utilizing advanced deep learning methodologies. This project addresses the crucial task of automatic text recognition in natural scenes, a challenge due to the intertwined visual and semantic information and the variability in text appearance caused by environmental factors. We explore and experiment with various architectures, including Convolutional Neural Networks (CNNs) for visual feature extraction and Recurrent Neural Networks (RNNs) for semantic understanding, focusing on both regular and irregular text recognition. Notably, our work incorporates Spatial Transformation Networks (STNs) to rectify distorted text, enhancing the accuracy of text recognition. Through rigorous experimentation and analysis on diverse datasets, we provide insights into the models' performance, contributing valuable knowledge for applications in document analysis, autonomous vehicles, and augmented reality. Our findings indicate the potential for significant advancements in STR, paving the way for future research and practical implementations.
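CNN-plus-RNN recognizers of this kind typically emit per-frame character probabilities that are then decoded with CTC. A greedy CTC decode (take the argmax per frame, collapse repeats, drop blanks) is sketched below; the charset layout is an assumption for illustration:

```python
def ctc_greedy_decode(frame_probs, charset, blank=0):
    """Greedy CTC decoding: argmax character per frame, collapse
    consecutive repeats, then drop blank symbols.

    frame_probs: list of per-frame probability lists over the charset;
    charset: string indexed by class id, with the blank at index `blank`.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], blank
    for k in best:
        if k != prev and k != blank:
            out.append(charset[k])
        prev = k
    return "".join(out)
```

In the pipeline described above, the STN would rectify the cropped word image before the CNN/RNN stages produce `frame_probs`.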

Swadeshi Microprocessor Challenge

Technological advancements in Signal Processing and millimeter-wave (mm-wave) semiconductor technology have significantly impacted the automobile industry, particularly in enhancing Autonomous Vehicles and Advanced Driving Assistance Systems (ADAS). Automotive radar, particularly Frequency Modulated Continuous Wave (FMCW) radars, has become a popular choice for robust and cost-effective performance in challenging environmental conditions. The proposed Ikshana FMCW Radar Module, utilizing the Vajra (C64-A100, Shakti C-class) SoC from IIT Madras, aims to provide a low-cost, indigenous solution for applications such as autonomous vehicles and night-vision goggles. Implemented on a Xilinx Arty A7 FPGA with planned soft-core extensions, this module represents an innovative approach to leveraging indigenous microprocessors, as highlighted in the Swadeshi Microprocessor Challenge by the Ministry of Electronics and Information Technology, Government of India.
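The FMCW ranging principle behind such a module is a one-line relation: the beat frequency between the transmitted and received chirps is proportional to target range, R = c * f_beat * T_sweep / (2 * B). The chirp parameters below are illustrative, not the Ikshana module's actual specification:

```python
def fmcw_range(beat_hz, bandwidth_hz, sweep_s, c=3.0e8):
    """Target range from an FMCW beat frequency.

    beat_hz: measured beat frequency; bandwidth_hz: chirp bandwidth B;
    sweep_s: chirp sweep duration T; c: speed of light in m/s.
    """
    return c * beat_hz * sweep_s / (2.0 * bandwidth_hz)
```

Example: with a 150 MHz bandwidth swept in 1 ms, a 100 kHz beat corresponds to a target 100 m away.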

Automatic Speaker Recognition System

The "Automatic Speaker Recognition" (ASR) project focuses on developing a robust system capable of recognizing speakers based on unique characteristics in their speech. This system utilizes key techniques such as Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction and Vector Quantization (VQ) using the KMeans clustering algorithm for pattern recognition. During the training phase, speech samples from various speakers are processed to build individual speaker-specific codebooks. In the testing phase, unknown speech samples are matched against these codebooks to identify the speaker. The project includes validation using real-life speech data from multiple speakers, ensuring the system's accuracy and reliability under varied conditions. This project demonstrates practical applications in areas such as telephone banking, remote computer access security, and identity verification, highlighting the effectiveness of ASR systems in enhancing security and user authentication processes.
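The train/test flow described above can be sketched end to end: cluster each speaker's training features into a codebook (the VQ step, via k-means), then identify a test sample by the codebook with the lowest mean quantization distortion. The 1-D features and tiny codebooks below are toy stand-ins for multi-dimensional MFCC vectors:

```python
def train_codebook(features, k, iters=20):
    """Toy k-means over 1-D feature values; real systems cluster
    multi-dimensional MFCC frames per speaker."""
    centroids = features[:k]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in features:
            # Assign each feature to its nearest centroid.
            buckets[min(range(k), key=lambda i: abs(f - centroids[i]))].append(f)
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids

def identify(sample, codebooks):
    """Pick the speaker whose codebook gives the lowest mean quantization
    distortion on the test sample's features."""
    def distortion(feats, cb):
        return sum(min(abs(f - c) for c in cb) for f in feats) / len(feats)
    return min(codebooks, key=lambda name: distortion(sample, codebooks[name]))
```

In the full system, MFCC extraction precedes both stages; only the codebooks (not raw speech) are stored per speaker.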

Anomaly Detection in Satellite Telemetry

The project involved addressing various research problems within the "Integrated System Health Management for Power Systems (ISHM)" initiative. The primary focus was on developing anomaly detection techniques under Phase-II - Fault Detection. An Anomaly Detection System was developed, incorporating an LSTM-based Nominal Behaviour Modelling block and a Non-parametric Dynamic Error Thresholding block. This system was designed to detect anomalies in satellite telemetry data, thereby enhancing the reliability and safety of space subsystems through advanced fault detection methods.
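The non-parametric dynamic thresholding idea can be reduced to a moving statistic over the model's prediction errors: flag a point when its error exceeds the trailing mean by several trailing standard deviations, so the threshold adapts to local error levels. The window and z values below are assumptions, not the project's tuned settings:

```python
def dynamic_threshold(errors, window=50, z=3.0):
    """Flag indices whose prediction error exceeds mean + z * std over a
    trailing window of recent errors. Returns a list of booleans."""
    flags = []
    for i, e in enumerate(errors):
        w = errors[max(0, i - window):i] or [e]  # trailing window (or self)
        m = sum(w) / len(w)
        sd = (sum((v - m) ** 2 for v in w) / len(w)) ** 0.5
        flags.append(e > m + z * sd)
    return flags
```

Here `errors` would be the absolute residuals between the LSTM's predicted and observed telemetry values.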

ECG Beat Classification

The electrocardiogram (ECG/EKG) is one of the most crucial and informative tools available to a cardiologist for examining the condition of a patient's cardiovascular system. The shape of the ECG tracing is critical for assessing an individual's health: whether a patient is prone to or diagnosed with cardiovascular diseases (CVDs) can be gathered from examination of the ECG signal. Among various methods, one of the most helpful for identifying cardiac abnormalities is beat-wise categorization of a patient's ECG record. In this work, a highly efficient deep representation learning approach for ECG beat classification is proposed, which can significantly reduce the burden on and time spent by a cardiologist during ECG analysis. The system consists of two sub-systems: a denoising block and a beat classification block. The denoising block acquires the ECG signal from the patient and denoises it; the beat classification block then processes the denoised signal to identify the different classes of beats through an efficient algorithm. Deep learning-based methods are employed in both stages. The proposed approach has been tested on PhysioNet's MIT-BIH Arrhythmia Database for beat-wise classification into ten important types of heartbeats, and the results show it makes meaningful predictions and achieves superior scores on relevant metrics.

IoT-Controlled Smart Home

This project exemplifies the transformative potential of the Internet of Things (IoT) by enabling voice-controlled automation of household lighting through Google Assistant. It utilizes an Arduino UNO, ESP8266 WiFi module, and a dual-channel relay to create an auxiliary circuit that can control a standard light bulb with voice commands. By integrating the system with ThingSpeak for data management and IFTTT for seamless interaction between Google Assistant and the IoT devices, users can effortlessly turn the light on or off by simply saying "OK Google, Lights ON" or "OK Google, Lights OFF." The project demonstrates the practical application of IoT in enhancing home automation, emphasizing ease of use, convenience, and the power of modern technology to simplify daily tasks. This innovative solution not only highlights the capabilities of IoT but also provides a glimpse into the future of smart homes, where everyday actions can be managed through intuitive voice commands.

Voice Controlled Robot

This project leverages voice recognition technology to enable precise and intuitive control over a robot's movements. Developed using an Arduino UNO, HC05 Bluetooth module, and L298N Motor Driver, the robot receives voice commands via a mobile phone application, translating spoken instructions into actions such as moving forward, backward, left, or right. The robot's design includes essential components like wheels, a chassis, DC motors, and an LM393 Speed Sensor, all integrated to ensure accurate movement and stability. The voice commands are processed and transmitted from the smartphone to the robot through the Bluetooth module, enabling seamless communication and control. This project showcases the potential of combining natural language processing with robotics, providing a user-friendly interface for operating robotic systems and demonstrating the practical applications of voice-controlled technology in enhancing human-robot interaction. The project's implementation not only highlights technical proficiency in robotics and programming but also emphasizes the innovative use of voice control to simplify and enhance the user experience.
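The command path from phone to motors amounts to a small dispatch table on the Arduino side: a recognized phrase arrives over Bluetooth and is mapped to left/right motor directions. The phrase set and motor encoding below are hypothetical, not the exact firmware mapping:

```python
def drive_command(phrase):
    """Map a recognized voice phrase to (left, right) motor directions:
    +1 forward, -1 reverse, 0 stop. Unknown phrases stop the robot."""
    table = {
        "forward": (1, 1),
        "backward": (-1, -1),
        "left": (-1, 1),   # spin: left wheel back, right wheel forward
        "right": (1, -1),
        "stop": (0, 0),
    }
    return table.get(phrase.strip().lower(), (0, 0))
```

Defaulting unknown input to a stop is a deliberate safety choice for voice interfaces, where recognition errors are common.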

Fiscal Responsibility Index

The Fiscal Responsibility Index (FRI) project is a comprehensive analysis aimed at quantifying the fiscal responsibility of a government by creating an index that evaluates how well a government manages its monetary resources. Developed as part of a B.Tech Economics project, the FRI considers various parameters such as fiscal deficit, revenue deficit, primary deficit, and their impact on the overall economy. The project involves extensive data collection from sources like the RBI and World Bank, and uses MATLAB for modeling these parameters to derive a mathematical formula that represents fiscal responsibility. The index helps in assessing the government’s efficiency in balancing developmental expenditures with debt management, ensuring sustainable growth. By analyzing historical data and incorporating factors like GDP growth rate and total outstanding liabilities, the FRI provides a nuanced report card of a government's fiscal discipline, offering valuable insights for policy-making and economic planning.
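The index construction can be illustrated as a normalized weighted aggregate of fiscal indicators; the indicator set, bounds, and weights below are placeholders, since the actual MATLAB-derived formula is not reproduced in this summary:

```python
def fiscal_responsibility_index(indicators, weights):
    """Illustrative composite index in [0, 1].

    indicators: {name: (value, lo, hi, higher_is_better)}. Each indicator
    is min-max normalized so that higher always means more responsible,
    then a weighted average is taken.
    """
    score, total = 0.0, sum(weights.values())
    for name, (v, lo, hi, good_high) in indicators.items():
        n = (v - lo) / (hi - lo)       # min-max normalize to [0, 1]
        if not good_high:
            n = 1.0 - n                # flip indicators where lower is better
        score += weights[name] * max(0.0, min(1.0, n))
    return score / total
```

Deficit-type indicators (fiscal, revenue, primary) would enter with `higher_is_better=False`, growth-type indicators with `True`.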

Fire Alarm

The Fire Alarm with Intensity Meter project is an innovative analog electronics application designed to detect and indicate the presence and intensity of fire using temperature changes. The system employs an NTC (Negative Temperature Coefficient) thermistor to sense temperature variations, where its resistance decreases as temperature increases. When the temperature exceeds 50°C, the output from an IC741 comparator triggers a 555 Timer configured as an astable multivibrator, activating an alarm. Simultaneously, a series of LED indicators, driven by additional IC741 comparators, display the fire's intensity by lighting up progressively with each 10°C increase. The system is powered by multiple DC supplies and includes components like resistors, capacitors, and a speaker for alarm output. This project effectively demonstrates the integration of temperature sensing, signal processing, and user notification in a practical fire alarm system, providing a comprehensive solution for early fire detection and intensity measurement.
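The sensing math behind the circuit is compact: the beta model converts NTC thermistor resistance to temperature, and the comparator ladder maps temperature to a number of lit LEDs. The thermistor constants below are typical 10 kΩ-part values assumed for illustration, and the ladder mirrors the one-LED-per-10°C behavior described above:

```python
import math

def ntc_temperature_c(r_ohms, r0=10_000.0, t0_c=25.0, beta=3950.0):
    """NTC beta model: 1/T = 1/T0 + ln(R/R0)/beta, with T in kelvin.
    r0 is the resistance at t0_c; beta is the thermistor's B constant."""
    t0 = t0_c + 273.15
    inv_t = 1.0 / t0 + math.log(r_ohms / r0) / beta
    return 1.0 / inv_t - 273.15

def leds_lit(temp_c, alarm_at=50.0, step=10.0, n_leds=4):
    """Intensity ladder: no LEDs below the alarm point, then one more LED
    per `step` degrees, saturating at n_leds (like the comparator chain)."""
    if temp_c < alarm_at:
        return 0
    return min(n_leds, 1 + int((temp_c - alarm_at) // step))
```

In the analog circuit, each comparator's reference voltage plays the role of one `alarm_at + k * step` threshold.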