AiAI Research Projects

KatzBot

A chatbot developed from scratch that answers questions about academics, faculty, and related information in real time.

GitHub Repository | Project Report

Image Generation for Medical Applications

This study uses a diffusion model to augment the original dataset, with the aim of improving the performance of the Norberg Angle prediction model.
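For readers unfamiliar with the prediction target, the sketch below shows how a Norberg angle can be computed from three landmark points, using the common radiographic definition: the angle at one femoral head center between the line to the opposite femoral head center and the line to the acetabular rim. The function name, the coordinate convention, and the sample points are illustrative assumptions. The project description does not say how its landmarks are labeled or how its model outputs the angle.

```python
import math

def norberg_angle(head_center, other_head_center, acetabular_rim):
    """Angle in degrees at head_center between two rays:
    one toward the opposite femoral head center and one toward
    the acetabular rim landmark. Points are (x, y) pixel coordinates
    (hypothetical convention, not taken from the project)."""
    ax = other_head_center[0] - head_center[0]
    ay = other_head_center[1] - head_center[1]
    bx = acetabular_rim[0] - head_center[0]
    by = acetabular_rim[1] - head_center[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    # atan2 on (|cross|, dot) gives the unsigned angle in [0, 180] degrees.
    return math.degrees(math.atan2(abs(cross), dot))

# Made-up landmark coordinates, for illustration only.
angle = norberg_angle((0.0, 0.0), (10.0, 0.0), (-1.0, 1.0))
print(round(angle, 1))
```

An augmented image would only be useful for training if landmarks like these remain anatomically plausible, which is one reason generated samples are usually checked against the label distribution of the original data.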

GitHub Repository

Course Video Understanding

Visual Question Answering (VQA) research aims to create AI systems that can answer natural language questions about images. However, traditional VQA methods often provide simplistic responses. This work introduces Visual Question Explanation (VQE), which extends VQA with detailed explanations and supports more complex interaction with visual content. We developed the MLVQE dataset from a machine learning course; it comprises slide images, transcripts, and question-answer pairs. We propose SparrowVQE, a small multimodal model trained in three stages: multimodal pre-training, instruction tuning, and domain fine-tuning. SparrowVQE, which builds on SigLIP and Phi-MLP models, outperforms state-of-the-art methods on benchmark VQA datasets and demonstrates a detailed understanding of visual content.

GitHub Repository

Veterinarian GPT

VetMedGPT is a specialized tool that assists with initial diagnosis and first aid for animals. It aims to close a gap in artificial intelligence (AI) applications by providing support tailored to veterinary medicine.

GitHub Repository

Machine Learning Chat Robot for Students

This research introduces MLGPT, a novel generative pre-trained transformer-based model that uses a specialized machine learning question-and-answer dataset to improve depth and precision on domain-specific queries. We also developed the MLGPT-C chatbot, which supports interactive, audio-based conversations with real-time interruption and significantly outperforms existing methods in machine learning query resolution.

GitHub Repository | Project Report

High-Quality Voice Clone

This work proposes a novel neural architecture, Sabda2Baachan, for text-to-speech synthesis that produces high-quality speech with natural prosody and speaker characteristics. The model employs a multi-stream approach in which distinct components predict low-level prosodic features, including energy, pitch, and duration. The proposed model outperformed several state-of-the-art models in the naturalness, intelligibility, and speaker similarity of the synthesized speech.
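To make the role of the per-stream predictions concrete, here is a minimal sketch of one common way such outputs are consumed: each phoneme's predicted duration is used to repeat its pitch and energy values across acoustic frames. This expansion step is an assumption borrowed from duration-based TTS systems in general. The description above does not say whether Sabda2Baachan works this way, and `PhonemeProsody` and `expand_to_frames` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class PhonemeProsody:
    phoneme: str
    duration: int   # predicted number of acoustic frames (assumed unit)
    pitch: float    # predicted pitch in Hz
    energy: float   # predicted relative energy

def expand_to_frames(predictions):
    """Repeat each phoneme's (phoneme, pitch, energy) tuple once per
    predicted frame, producing a frame-level prosody sequence."""
    frames = []
    for p in predictions:
        frames.extend([(p.phoneme, p.pitch, p.energy)] * p.duration)
    return frames

# Made-up predictions standing in for the outputs of three predictors.
predictions = [
    PhonemeProsody("h", 2, 110.0, 0.4),
    PhonemeProsody("i", 3, 140.0, 0.9),
]
frames = expand_to_frames(predictions)
print(len(frames))
```

Keeping duration, pitch, and energy as separate fields mirrors the idea of distinct prediction streams: each value could be produced, inspected, or adjusted independently before synthesis.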

GitHub Repository

3D Human Motion Generation

Explore the forefront of animation technology with our 3D Human Motion project. Utilizing advanced AI, our platform translates textual descriptions into realistic 3D human animations, revolutionizing the way digital content is created. Harness the power of our Text Residual Motion Encoder (TRME) to bring dynamic, precise human movements to life with just a few words.

GitHub Repository

Breast Cancer Detection Mobile App Design

GitHub Repository

Voice Cloning

The proposed Text-to-Speech (TTS) architecture is a sequence of components that synthesizes natural, expressive speech from input text. It has three major components: the Text Encoder, the Mel Spectrogram Encoder, and the Voice Cloning Model. The Text Encoder comes first and converts input text into an encoded representation using character embeddings, bidirectional GRU processing, and attention mechanisms. The Mel Spectrogram Encoder then produces an encoded representation of the mel spectrogram that captures fine acoustic detail. Finally, the Voice Cloning Model combines these encoded representations and uses a decoder with RNN layers and attention mechanisms to synthesize speech.
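The attention step mentioned in this description can be illustrated with a small, dependency-free sketch. It uses scaled dot-product scoring, which is one common choice; the description does not state which scoring function the project uses, so treat the scoring, the function names, and the toy vectors as assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Return (weights, context) for one decoder query.
    weights: one attention weight per encoder state, summing to 1.
    context: the weighted sum of the encoder states."""
    scale = math.sqrt(len(query))
    scores = [sum(q * h for q, h in zip(query, state)) / scale
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy encoder outputs (e.g. one vector per input character) and a query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, encoder_states)
print([round(w, 3) for w in weights])
```

In this toy case the first and third states align equally well with the query, so they receive equal weight and the second state receives less. A decoder would compute a new context vector like this at every output step.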

GitHub Repository

Course Attendance Robot

This project introduces a comprehensive system for managing attendance, harnessing facial detection and recognition technologies to identify individual students and register their attendance.
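The recognition-to-attendance step can be sketched as a nearest-match lookup over face embeddings. The sketch assumes that a face detector and an embedding model (neither specified in the description) have already turned each detected face into a non-zero numeric vector. The names `mark_attendance` and `enrolled`, the cosine-similarity matching, and the 0.8 threshold are all illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mark_attendance(detected, enrolled, threshold=0.8):
    """Map each enrolled student name to True (present) or False (absent).
    detected: list of embeddings for faces found in the classroom image.
    enrolled: dict of student name -> reference embedding."""
    present = set()
    for embedding in detected:
        best_name, best_score = None, threshold
        for name, reference in enrolled.items():
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        # Faces below the threshold for every student are ignored.
        if best_name is not None:
            present.add(best_name)
    return {name: name in present for name in enrolled}

# Made-up 3-dimensional embeddings; real ones are much longer.
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
detected = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
attendance = mark_attendance(detected, enrolled)
print(attendance)
```

The threshold trades false matches against missed students, so in practice it would need tuning on images from the actual classroom.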

GitHub Repository

Video Generation

GitHub Repository