This article was originally published in the April 2024 edition of YUToday.
MANISH KUMAR THOTA ’24, Katz School of Science and Health, M.S. Candidate in Artificial Intelligence
“Imagine conversing with your most knowledgeable professor, the one who has a wealth of information and is eager to share it. That is the essence of my Visual Question Answering (VQA) model. The twist here is that the professor is not a person but a computer equipped to provide insights and answers with the same depth and accessibility.”
That is how Manish Kumar Thota, who is pursuing a master’s degree in Artificial Intelligence (AI) at the Katz School of Science and Health, describes his research on developing a machine-learning chatbot that can assist students academically. “The challenge is to teach my VQA model, which I refer to as a digital brain, to understand both images and questions and respond the way your most knowledgeable professor would. It’s all about making computers as smart as the sharpest minds out there, one pixel at a time.”
Originally from Hyderabad, a major center for technology in southern India, Thota became interested in AI and data science during his undergraduate years when he worked on a project focused on “deep learning”—a method in AI that teaches computers to process data in a way that is inspired by the human brain. Deep-learning models can recognize complex patterns in pictures, text, sounds and other data to produce accurate insights and predictions with the precision one might expect from a seasoned pro. “That project fascinated me,” said Thota. “Together with my background in computer science, it led me to join Stryker R&D, working on cutting-edge projects in the Asia-Pacific region.”
His passion for AI research eventually drew him to the Katz School. “I chose Katz for my master’s degree because of its excellent course structure and alignment with my research interests,” said Thota. “The opportunity to work closely with faculty members whose work resonated with my interests was a major factor in my decision. Additionally, I had the privilege of interning as a machine-learning engineer at S&P Global during the summer and was fortunate to receive the Sacks Impact Graduate Fellowship in Ethics and Entrepreneurship from YU, covering my tuition expenses.”
At the Katz School, Thota is engaged in innovative projects, including the development of a VQA chatbot that integrates computer vision and natural-language processing technologies. It leverages a cutting-edge Generative Pre-trained Transformer (GPT) model to deliver more personalized and interactive learning experiences. By combining machine-learning algorithms, advanced natural-language processing and a comprehensive range of educational content, the chatbot enables students to learn at their preferred pace. They can revisit the material as often as necessary, engage with it more interactively and receive instant feedback on their comprehension.
“The goal of the project is to train a large language model with vision capabilities that can recognize and teach course content, answer questions and provide interactions with audiences,” said Dr. Youshan Zhang, an assistant professor of AI and computer science in the Katz School and Thota’s adviser. “This model can then be used to improve the learning experience for students in online courses by providing them with a more interactive and engaging learning environment.”
When asked if people should be nervous or excited about what lies ahead, Thota smiled. “The future of chatbots holds great promise, revolutionizing how we interact with technology and business,” he said. “While some may feel apprehensive about their chatbots, there’s more reason to be excited. These AI-driven assistants will streamline tasks, improve customer service, enhance accessibility and make daily life easier and more efficient.”