Project Synopsis


Final Year Computer Engineering Project
on
AI-Powered Sensory Augmentation and Visual Data Processing

Submitted by
B5520 : Shubhankar Madhukar Patil
B5521 : Vedant Manohar Patil
B5452 : Sarthiki Hegade
B5524 : Vishal Nityanand Pawar
Guided by
Prof. S. S. Mane

B.E. (Comp) - 2024-25


DEPARTMENT OF COMPUTER ENGINEERING
STES’S NBN SINHGAD SCHOOL OF ENGINEERING
PUNE 411041
UNIVERSITY OF PUNE
2024-25
Abstract:

This project explores the development of intelligent systems that leverage
Artificial Intelligence (AI) to enhance human capabilities across various domains. By
integrating AI-driven technologies, the project aims to emulate and augment human
sensory and cognitive functions, creating tools that assist individuals in interacting
with their environment more effectively. These systems are designed to process
complex data in real-time, providing users with insightful feedback that can aid
decision-making, improve accessibility, and offer enhanced situational awareness.
The core focus is to employ machine learning and computer vision techniques
to address key challenges related to sensory augmentation, information processing,
and human assistance. The project also explores the potential for integrating AI with
real-time input systems, such as visual or auditory sensors, to deliver impactful
solutions for diverse applications. Through these innovations, the project
demonstrates the practical utility of AI in creating intelligent systems that bridge the
gap between human perception and digital intelligence.
By employing cutting-edge AI models, the developed solutions will not only
push the boundaries of human-machine interaction but also serve to assist individuals
with varying needs, offering an accessible and inclusive approach to enhancing day-
to-day life. The project, through its flexible and adaptive design, showcases the
transformative potential of AI in real-world applications.

Keywords: Computer Vision, AI, ML, Image Processing, Classification,
Visualization, Cross-Domain Data Integration, Speech-to-Text, Metadata
Literature Survey:

1. "A Review on Machine Learning Styles in Computer Vision—Techniques and
Future Directions": This review surveys how different machine learning styles are
used in computer vision, analyses their applications, and predicts future trends.

2. "Machine Learning in Computer Vision: A Review": This paper highlights the
importance of ML in the image processing (IP) domain, describes how digital
images are processed, and discusses how difficult it is to feed visual data to a
computer system.

3. "Research on Splicing Image Detection Algorithms Based on Natural Image
Statistical Characteristics": This study developed a new splicing image detection
algorithm combining the discrete cosine transform (DCT), the discrete wavelet
transform (DWT), and the robust capabilities of a support vector machine (SVM)
classifier.

4. "CIFAKE: Image Classification and Explainable Identification of AI-Generated
Synthetic Images": This study proposes a method to improve our waning ability to
recognise AI-generated images through the use of computer vision, and to provide
insight into predictions with visual cues.

5. "Research on Image Classification and Semantic Segmentation Model Based on
Convolutional Neural Network": This paper introduces EDNET, an innovative
segmentation strategy focused on multi-scale feature consolidation.

6. "Simple Image-Level Classification Improves Open-Vocabulary Object
Detection": This paper proposes SIC-CADS, a novel approach for open-vocabulary
object detection that leverages the global scene understanding capabilities of
VLMs for generalized base and novel category detection.

7. "Methods of improving the quality of speech-to-text conversion": This paper
proposes a step-by-step algorithm for building a speech-to-text system, along with
possible ways to improve the mathematical, linguistic, and engineering models to
reduce conversion errors.

8. "How to create and use a national cross-domain ontology and data infrastructure
on the Semantic Web": The experiences reported in this paper indicate that
creating and using a national semantic web infrastructure is useful from both the
data producers' and the data users' points of view.

9. "Amalur: Data Integration Meets Machine Learning": This work explores
bringing data integration and ML together, proposing Amalur, a data
integration-aware ML system that supports machine learning training and
inference over data silos.

10. "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced
Training": This work introduces MobileCLIP, a family of aligned image-text
backbones designed for on-device CLIP inference with low latency and small
model size.
Introduction:

Artificial Intelligence (AI) has become a transformative technology,
influencing various aspects of human life, including business processes and
healthcare. Its ability to emulate cognitive functions like learning, reasoning, and
perception has led to groundbreaking innovations across diverse fields. One of the
most exciting applications of AI is its potential to augment and enhance human
sensory perception, allowing individuals to interact with their environment in new
and intuitive ways. This project focuses on developing AI-driven systems that bridge
the gap between human perception and machine intelligence, leveraging computer
vision, machine learning, and real-time data processing to provide users with
enhanced interactions with the world around them.

The rapid advancements in hardware and software, along with the increasing
accessibility of powerful AI models, have opened up new opportunities for
integrating intelligent systems into everyday tools and devices. This project aims to
contribute to this growing field by focusing on solutions that assist individuals,
particularly those who may benefit from enhanced sensory and cognitive aids.

The project aims to create practical AI-driven systems that demonstrate the
potential of technology and address real-world challenges. By utilizing advanced
algorithms and data processing techniques, the systems developed will highlight the
ability of AI to assist in complex tasks that require the interpretation and
understanding of vast amounts of sensory data. The primary motivation behind this
work is to design systems that can enhance everyday life, either by increasing
accessibility for those with impairments or offering innovative solutions to problems
that require advanced data interpretation.
Methodology:

[Figure: Flowchart depicting classification and recognition of images]

The development of this project revolves around the integration of advanced
AI models that enhance human interaction with sensory data, such as visual inputs,
and deliver meaningful outputs. The project employs a combination of computer
vision and natural language processing techniques to implement two primary
components: image categorization and visual-to-speech accessibility. Each system
utilizes state-of-the-art AI models to achieve efficient processing and interpretation of
data.
The project incorporates Contrastive Language-Image Pretraining (CLIP), a
model designed to understand the relationship between visual and textual data. CLIP
leverages large-scale training on both images and their associated text descriptions,
allowing it to generate accurate and context-aware classifications of images. This
system enables the automatic tagging and categorization of images based on their
content without the need for extensive manual labeling. By using CLIP, the project
enhances the user’s ability to efficiently sort, organize, and retrieve visual data,
providing a streamlined solution for image management. CLIP’s adaptability across
diverse image domains ensures that the system can handle a wide range of visual
inputs, making it a versatile tool for practical applications.
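As an illustration of this categorization step, the sketch below shows zero-shot
image tagging with CLIP through the Hugging Face transformers library. The
checkpoint name, the candidate labels, and the input file are assumptions made for
demonstration, not the project's final configuration.

```python
# Minimal sketch of zero-shot image categorization with CLIP.
# Checkpoint and label set are illustrative assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical category labels, supplied as free-text prompts.
labels = ["a photo of a document", "a photo of a street scene",
          "a photo of food", "a photo of a person"]

image = Image.open("example.jpg")  # placeholder input image
inputs = processor(text=labels, images=image,
                   return_tensors="pt", padding=True)

outputs = model(**inputs)
# logits_per_image holds image-text similarity scores;
# softmax turns them into per-label probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
best = probs.argmax(dim=1).item()
confidence = probs[0, best].item()
print(f"Predicted tag: {labels[best]} ({confidence:.2f})")
```

Because the labels are plain text prompts evaluated at inference time, the
category set can be changed without retraining, which is what makes CLIP suitable
for automatic tagging across diverse image domains.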
This project integrates Bootstrapping Language-Image Pretraining (BLIP), a
model specifically designed to generate detailed descriptions of images. BLIP is used
to convert visual data from a live camera feed into natural language descriptions in
real-time, which are then converted into speech output. This system serves as an
accessibility tool for visually impaired individuals, enabling them to receive spoken
descriptions of their surroundings. BLIP’s ability to generate accurate and context-
rich captions from visual data ensures that users are provided with a clear and
relevant understanding of their environment. The real-time nature of this system
allows it to be used interactively, providing users with immediate feedback as they
move through different environments.
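A minimal sketch of this visual-to-speech loop is given below, assuming the
Hugging Face BLIP captioning checkpoint, an OpenCV webcam capture, and the
pyttsx3 offline speech engine; these specific choices are not fixed by the synopsis
and stand in for whatever the final system uses.

```python
# Minimal sketch: caption one camera frame with BLIP, then speak it.
# Checkpoint name, camera index, and speech engine are assumptions.
import cv2
import pyttsx3
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained(
    "Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")
tts = pyttsx3.init()

camera = cv2.VideoCapture(0)  # default webcam
ok, frame = camera.read()
camera.release()
if ok:
    # OpenCV returns BGR arrays; BLIP expects an RGB image.
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    inputs = processor(images=image, return_tensors="pt")
    caption_ids = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(caption_ids[0], skip_special_tokens=True)
    # Read the generated description aloud.
    tts.say(caption)
    tts.runAndWait()
```

In an interactive deployment this single-frame example would run as a continuous
loop, captioning frames at a fixed interval and speaking only when the description
changes, so the user receives immediate but non-repetitive feedback.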

Both CLIP and BLIP are employed within the broader framework of this
project to demonstrate how AI models can be used to process sensory inputs, such as
images, and generate outputs that are accessible and useful to a wide range of users.
The integration of these models is done with scalability in mind, allowing the systems
to handle diverse use cases, ranging from personal data management to accessibility
solutions for individuals with disabilities. The systems developed under this
methodology are designed to be modular, ensuring that they can be adapted for
different contexts without extensive modification. By focusing on the generalized
application of AI for sensory augmentation, this project aims to showcase the
potential of these models to create intuitive, responsive systems that enhance human
experiences with minimal intervention. Through the careful use of AI, the project
ensures that each solution is both efficient and resource-conscious, maintaining
optimal performance across different hardware environments.
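To make the modularity claim concrete, the hypothetical interface below shows one
way such components could share a common contract, so that a CLIP-based tagger
and a BLIP-based captioner become interchangeable stages of the same pipeline;
the names and structure are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch of the modular design described above: each AI
# component implements one small interface, so pipelines can swap
# models without changes elsewhere. All names here are hypothetical.
from typing import Protocol

from PIL import Image


class VisionModule(Protocol):
    def process(self, image: Image.Image) -> str:
        """Turn one image into a text result (a tag or a caption)."""
        ...


def run_pipeline(image: Image.Image,
                 modules: list[VisionModule]) -> list[str]:
    # Apply each configured module to the same sensory input.
    return [module.process(image) for module in modules]
```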
Conclusion:

In conclusion, this project demonstrates the potential of Artificial Intelligence
to significantly enhance human interaction with sensory data, making it more
accessible, efficient, and intuitive. By leveraging cutting-edge AI models like CLIP
and BLIP, the systems developed provide innovative solutions to real-world
challenges, whether in the realm of image categorization or assisting individuals with
visual impairments. Through the seamless integration of AI-driven tools into
everyday applications, this project showcases the versatility and practical benefits of
advanced technologies in addressing both personal and societal needs.

The systems developed are designed with adaptability and scalability in mind,
ensuring their usability across various contexts and environments. By focusing on
real-time processing and user-centric design, the project ensures that these AI
solutions are not only technically robust but also offer meaningful improvements in
quality of life and workflow efficiency. Moreover, this work highlights the broader
potential of AI in creating intelligent systems that can augment human capabilities
and extend the possibilities for interaction with the world around us.

As technology continues to evolve, projects like this pave the way for future
innovations that will further close the gap between human and machine intelligence.
The results of this project serve as a testament to the growing role AI plays in shaping
a more inclusive and efficient future, with applications that go beyond traditional
boundaries to redefine how we perceive and engage with data.
References:

1. S. V. Mahadevkar et al., "A Review on Machine Learning Styles in Computer
Vision—Techniques and Future Directions," IEEE Access, vol. 10, pp.
107293-107329, 2022, doi: 10.1109/ACCESS.2022.3209825.

2. A. Ayub Khan, A. A. Laghari, and S. Ahmed Awan, "Machine Learning in
Computer Vision: A Review," EAI Endorsed Transactions on Scalable Information
Systems, Apr. 2021.

3. "Research on Splicing Image Detection Algorithms Based on Natural Image
Statistical Characteristics," arXiv:2404.16296. https://arxiv.org/abs/2404.16296

4. J. J. Bird and A. Lotfi, "CIFAKE: Image Classification and Explainable
Identification of AI-Generated Synthetic Images," IEEE Access, vol. 12, pp.
15642-15650, 2024.

5. M. Li, Z. Zhu, R. Xu, Y. Feng, and L. Xiao, "Research on Image Classification
and Semantic Segmentation Model Based on Convolutional Neural Network,"
Journal of Computing and Electronic Information Management, vol. 12, no. 3, pp.
94-100, 2024.

Date: Time:

Project Guide Project Coordinator Head of the Department

(Prof. S. S. Mane) (Prof. M. B. Yelpale / Prof. P. S. Sajjanshetti) (Dr. S. P. Bendale)
