Final Report R
PROJECT SYNOPSIS
1) INTRODUCTION
2) OBJECTIVE
4) METHODOLOGY
6) SYSTEM DESIGN
7) TECHNOLOGY USED
8) MODULE USED
11) REFERENCES
PROJECT REPORT
1) ABSTRACT
2) SYSTEM ANALYSIS
2.1 IDENTIFICATION OF NEED
2.2 PRELIMINARY INVESTIGATION
2.3 FEASIBILITY STUDY
2.4 PROJECT PLANNING
2.5 PROJECT SCHEDULING
2.6 SOFTWARE REQUIREMENT SPECIFICATION (SRS)
2.7 SOFTWARE ENGINEERING PARADIGM USED
2.8 DIAGRAMS
3) SYSTEM DESIGN
3.1 PROGRAM STRUCTURE
3.2 DATABASE DESIGN
3.3 MODULARIZATION DETAILS
3.4 DATA FLOW DIAGRAM
3.5 E-R DIAGRAM
3.6 CLASS DIAGRAM
3.7 SEQUENCE DIAGRAM
3.8 USER INTERFACE DESIGN
4) IMPLEMENTATION
4.1 CODING
4.2 CODE EFFICIENCY
4.3 PARAMETER PASSING / CALLING
4.4 VALIDATION CHECKS
5) TESTING
5.1 TESTING TECHNIQUES & STRATEGIES
5.2 CODE IMPROVEMENT
8) SCREENSHOTS
9.2 LIMITATIONS
This app offers a number of useful features: it can send emails, launch the command prompt, your preferred IDE, or Notepad, play music, and run Wikipedia searches for you. Basic conversation is also possible. There has been research on the similarities and differences between various voice assistant devices and services.
In the twenty-first century, whether it's your house or your automobile, everything is moving toward automation, and technology has seen incredible growth over the last several years. In today's environment you can communicate with your computer. As a human, how do you engage with a computer? Obviously you must provide some input, but what if you typed nothing at all and instead used your own voice? With more than 70 percent of all intelligent voice-assistant-enabled devices running the Alexa platform, it is the dominant market leader (Griswold, 2018). Is it possible to have the computer communicate with you the way a personal assistant would, not only giving you the best results but also suggesting better alternatives? Using voice instructions to control a machine is a revolutionary method of human-system interaction. To interpret the input we need a voice-to-text API, and companies like Google and Amazon are attempting to make this universally available. Isn't it remarkable that you can create reminders simply by saying "Remind me to...", or set a timer or an alarm to wake you up? Recognising the relevance of this idea, a system has been developed that can be installed anywhere in the neighbourhood and asked to accomplish tasks for you simply by speaking to the device. In the future, two of these gadgets could be linked over Wi-Fi to enable communication between them. Used daily, such a gadget can help you work more effectively by continually reminding you of tasks and providing updates and notifications. What is the point of it all? Your voice, rather than an enter key, is becoming the ideal input method.
Because the voice assistant is powered by Artificial Intelligence, the results it provides are very accurate and efficient. Using an assistant reduces the human work and time required to accomplish a job; it eliminates the need for typing entirely and acts as an additional person to whom we may talk and delegate tasks. Science and the educational sector are also examining whether these new gadgets can aid education, as happens with every groundbreaking technology; personal computers and tablet computers raised comparable questions in the past (Algoufi, 2016; Gikas and Grant, 2013; Herrington and Herrington, 2007).
We will be using Visual Studio Code to construct this project, and all of the .py files were produced in VSCode. The following modules and libraries were also utilised in the project: PyAudio, pyttsx3, Wikipedia, smtplib, OS, Webbrowser, and so on.
Virtual assistants are genuinely helpful in today's world: they make everyday life easier by letting a user operate a computer or laptop with voice commands alone. A virtual assistant saves time and frees the user to devote attention to other projects.
A virtual assistant is typically a cloud-based application that works with internet-connected devices. As a means of developing a virtual assistant, Python will take over our PC. Task-oriented virtual assistants are the most common kind; a remote assistant's value lies in its ability to understand and follow instructions. In a three-week study, Beirl et al. (2019) examined how Alexa was used in the household; the goal of the research was to study how families use Alexa's new talents in music, storytelling, and gaming. A virtual assistant is a computer program that can recognise and respond to user requests, following clients' instructions given verbally or in writing. Put simply, virtual assistants understand human speech and react to it through the use of artificial voice synthesis. A variety of assistants are on the market, such as Apple's Siri, Google Assistant on Pixel phones, Alexa-powered smart speakers (some built on a Raspberry Pi), and Cortana on Microsoft Windows 10. Our own virtual assistant was produced along the same lines for Windows. Artificial intelligence techniques benefit this project greatly, and Python works well as the language, since it has a large number of well-known libraries. A microphone is required to run this programme.
Savago et al. (2019) look at the usage of voice assistants by seniors (age 65 and above). The authors stress the need for further study in order to better understand the usage of digital technology by older persons. Additionally, Kowalski et al. (2019) studied older individuals' usage of voice-activated devices in research that included seven elderly persons.
Voice assistants are voice-activated input and output devices. Several diverse technologies, such as speech recognition, voice analysis, and language processing, are used in this procedure. Natural language processing is used by virtual assistants to translate text and voice input from users into actionable instructions. Audio signals are translated to digital signals when a user instructs their personal virtual assistant to do a job.
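The pipeline just described, audio in, text out, instruction executed, can be sketched in a few lines of Python. The recogniser below is a stub (a real build would call a service such as Google's speech recognition), and the function and intent names are illustrative assumptions, not part of any particular product:

```python
def recognise(audio_bytes):
    """Stub for a speech-to-text engine; a real assistant would send the
    audio to a recognition service here."""
    # Pretend the audio decodes to this fixed phrase for the sketch.
    return "what is the weather today"

def to_instruction(text):
    """Map recognised text to an actionable instruction name."""
    text = text.lower()
    if "weather" in text:
        return "FETCH_WEATHER"
    if "remind me" in text:
        return "SET_REMINDER"
    return "SMALL_TALK"

def handle(audio_bytes):
    """Full pipeline: audio signal -> digital text -> executable instruction."""
    return to_instruction(recognise(audio_bytes))

print(handle(b""))  # FETCH_WEATHER
```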
OBJECTIVES
1. We need to create a voice assistant that helps infants and children interact easily with the desktop. Most voice assistants face difficulties understanding commands; they misinterpret them, and the commands then get executed wrongly.
2. We will create a desktop voice assistant that works even when the speaker makes grammatical errors, so that these are not a barrier to using the assistant.
3. We will create a voice assistant that takes commands from the user, understands the instructions being given, and performs them accordingly. The tasks may include answering queries, connecting to a device, or opening Netflix.
4. Integrate the voice assistant with the system and its applications in such a way that it performs tasks seamlessly and efficiently.
5. The voice assistant will also provide information, such as updates on stocks, weather, and sports.
6. The voice assistant will ensure the security of user data. We will try to implement encryption methods and careful data handling.
PROPOSED PLAN OF WORK
Analysing the user's microphone instructions was the first step in the process. Everything from retrieving data to managing a computer's internal files falls under this category. Reading and testing the cases from the literature cited above, this is an empirical qualitative investigation. Programming is done according to books and internet resources, with the intention of discovering best practices and a deeper grasp of voice assistants.
A limited subset of frequently asked questions (e.g., "what's the weather like today?") should be prioritised by an ASR system while performing voice assistant activities. This is often done by raising the weight of the language model (LM), or reducing the acoustic model (AM) by proxy, so that the probability of recognising a frequent command or phrase is high. If the LM weight is too high, the model will only produce sentences it has already seen. Dysfluencies such as sound repetitions or prolongations tend to generate mistakes in the AM component of the system, so we may exploit this trade-off: a higher LM weight tends to produce fewer repetitions in the output than a lower one. While the default ASR decoder recognised a dysfluent speech sample as "what is my brother's add add add address", increasing the LM weight improved the output to "what is my brother's address": the repeated "EY d" sounds, which had produced the half-word "add", were eliminated.
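The trade-off above amounts to picking the hypothesis that maximises a weighted log-linear combination of acoustic-model and language-model scores. A minimal sketch follows; the scores are invented purely to illustrate the effect, with the dysfluent string fitting the audio slightly better (AM) while the fluent string is far more probable language (LM):

```python
def best_hypothesis(hypotheses, lm_weight):
    """Each hypothesis is (text, am_logprob, lm_logprob); return the text
    maximising am + lm_weight * lm."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])[0]

# Invented n-best scores for the dysfluent-speech example in the text.
nbest = [
    ("what is my brother's add add add address", -10.0, -40.0),
    ("what is my brother's address",             -12.0, -15.0),
]
print(best_hypothesis(nbest, lm_weight=0.05))  # acoustics dominate: dysfluent text wins
print(best_hypothesis(nbest, lm_weight=1.0))   # language model dominates: clean text wins
```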
3.1 AI VOICE ASSISTANCE
In its role as a personal assistant, the AI helps the end-user with everyday tasks such as general human conversation, searching queries in various search engines like Google and Bing, retrieving videos, live weather conditions, word meanings, medicine details, health recommendations based on symptoms, and reminders of scheduled events and tasks. Machine learning is used to determine the best course of action based on the user's comments and requests.
Presently, Jarvis is being developed as an automation tool and virtual assistant. Among the various roles played by Jarvis are:
1. Search Engine with voice interactions
2. Medical diagnosis with Medicine aid.
3. Reminder and To-Do application.
4. Vocabulary App to show meanings and correct spelling errors.
5. Weather Forecasting Application.
Everything remains the same, even for a developer working on Linux who relies on running queries. By allowing online searches, we've fulfilled a critical need for internet users. Node JS and the Selenium framework have been used here to extract and show results from the web: Jarvis scrapes the entered searches and shows results from a variety of search engines, including Google, Bing, and Yahoo.
As a primary source of entertainment, videos have remained a top priority for virtual assistants. These videos have a dual purpose, entertainment and education, since much educational and scientific material now lives on YouTube. This facilitates a more hands-on, outside-the-classroom learning experience.
The core Golang service manages a subprocess module that Jarvis uses to implement this functionality: a Node JS subprocess drives the Selenium WebDriver to scrape the YouTube search query.
It is easier to send emails from Jarvis than it would be to open the email account in question. Jarvis eliminates the need to switch to another tab for a common daily job. Emails may be sent to the recipient of the user's choice: once the user selects Send mail, a form appears; fill out the form and click the Send Email button.
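Under the hood, sending mail reduces to composing a message and handing it to an SMTP server. A sketch using Python's standard library follows; the server host, port, and the addresses shown are placeholder assumptions, and real credentials would come from the user's provider:

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Compose the email that the Send Email form collects."""
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = sender, recipient, subject
    msg.set_content(body)
    return msg

def send_mail(msg, host="smtp.example.com", port=587):
    """Deliver the message; host, port, and login are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        # server.login(user, password)  # credentials supplied by the user
        server.send_message(msg)

msg = build_message("me@example.com", "friend@example.com",
                    "Hello", "Sent by the assistant.")
print(msg["Subject"])  # Hello
```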
METHODOLOGY
User input may be matched with executable instructions using Natural Language Processing (NLP). When a user asks a query, the audio signal is translated into executable commands or digital data that software can use to perform a specific action. The virtual assistant is used to operate machines based on your own instructions, and this data is compared with the software's data to obtain an appropriate solution. We utilise Python package installers to set up the virtual assistant.
Horn (2018) proposes a classroom environment: each classroom should have enough microphones to detect each student's voice and offer individualised replies to each student's headphones via voice assistants. Each classroom might have a smart speaker where students can ask questions. Alternatively, teachers should have access to voice assistant data in real time so they may step in as necessary. Teachers are not replaced by the gadgets; rather, their job is amplified by the use of them.
Neiffer (2018) investigates the impact of intentional education using the intelligent voice assistant Siri on student participation in science classes in upper elementary and middle school grades. Student involvement is connected with student graduation rates, and high involvement leads to greater teacher satisfaction. Research shows that the relationship between technology and education is too complex to draw firm conclusions, and there is no clear correlation between the use of Siri in 5th-grade and middle school science classrooms and an increase in students' interest in learning science. A unique Alexa Skill about Scotland was made by Davie and Hilber (2018), who used it with students prior to a trip to the country. Students used the Amazon Echo device and found the skill interesting.
The proposed multi-domain ASR framework consists of three main modules: a basic ASR module to conduct first-pass decoding and generate the top N hypotheses of a speech query, a text classification module to determine which domain the speech query belongs to, and a reranking module to rescore the n-best lists of the first-pass decoding output using domain-specific language models. Figure 1 shows the diagram of the proposed multi-domain ASR framework.
Speech recognition:
To translate spoken input into text, the system makes use of Google's online voice recognition technology. A corpus of voice data is saved on a network server at the information centre and then delivered to Google Cloud for speech recognition, allowing users to talk and get text as the result of their voice input. The recognised text is then passed on to the voice assistant application.
Backend in Python:
Python is used as the backend for the whole software. With the use of a speech recognition module, the Python backend distinguishes between context extraction, API calls, and system calls. The output is then provided back to the user.
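One plausible way to realise this three-way split is a small keyword classifier over the recognised text; the keyword lists below are illustrative assumptions, not the project's actual routing rules:

```python
def classify(command):
    """Route a recognised command to the backend that should handle it."""
    command = command.lower()
    # Commands that touch the operating system (launch apps, play media).
    if any(k in command for k in ("open", "launch", "shutdown", "play")):
        return "system_call"
    # Commands that need data from the web (weather, news, searches).
    if any(k in command for k in ("weather", "news", "search", "wikipedia")):
        return "api_call"
    # Everything else: extract context and keep the conversation going.
    return "context_extraction"

print(classify("open notepad"))  # system_call
```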
The API's job is to act as a bridge between two programmes so that they may
communicate with one another. This means that APIs act as a messenger between the service
provider and the user, delivering their requests and subsequently returning their responses.
Content Extraction:
System Calls
Accessing the hard disc drive, creating new processes, and communicating with the process scheduler are all examples of system calls. A key part of the OS-process interaction is provided by this component.
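From Python, such a system call, creating a new process for the application the user named, can be issued with the standard subprocess module. The program table here is a hypothetical mapping and its entries would differ per platform:

```python
import subprocess
import sys

# Hypothetical mapping from spoken names to commands; platform-specific.
PROGRAMS = {
    "notepad": ["notepad.exe"],                      # Windows only
    "python": [sys.executable, "-c", "print('hi')"]  # works everywhere
}

def open_program(name):
    """Spawn the named program as a child process and return its exit code."""
    proc = subprocess.run(PROGRAMS[name], capture_output=True, text=True)
    return proc.returncode

print(open_program("python"))  # 0
```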
Google-Text-to-Speech
Text-To-Speech is used to turn user-provided text into speech. A TTS engine translates the phonemic representation of the text into a waveform from which sound is generated. Third-party publishers have contributed a variety of languages to the TTS's growing feature set.
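In code, this step is a thin wrapper around a TTS engine such as pyttsx3. The sketch below assumes pyttsx3 may be missing or the machine headless, and degrades to printing so the rest of the assistant keeps working:

```python
try:
    import pyttsx3
    _engine = pyttsx3.init()   # may fail without an audio device
except Exception:
    _engine = None

def speak(text):
    """Voice `text` when an engine is available; always return what was said."""
    if _engine is not None:
        _engine.say(text)
        _engine.runAndWait()
    else:
        print(text)            # plain-text fallback for headless machines
    return text
```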
1. PRESENT SYSTEM
Many current voice assistants, such as Alexa, Siri, Google Assistant, and Cortana, utilise the language processing and speech recognition concepts that we are all acquainted with. They pay attention to the user's instructions and carry out the requested task quickly and effectively.
Using Artificial Intelligence, these voice assistants are able to provide results that are very accurate and efficient. With them, we may do more with less human effort and time, since they require no typing at all and act as if they were an actual person to whom we were conversing and giving instructions. These helpers are no substitute for a person, yet they are effective and efficient at many duties, and the method used to create them minimises the time required.
2. PROPOSED SYSTEM
Creating my own personal helper was a fascinating challenge. With a single voice command, you can now send emails, search the internet, play music, and launch your favourite IDE without ever having to open a browser. While most standard voice assistants rely on an internet connection to get instructions, Jarvis is unique in that it is desktop-specific and does not need a user account.
VSCode is the IDE used in this project. Using VSCode, I was able to construct the Python files and install all of the essential dependencies. The following modules and libraries were needed for this project, including pyttsx3, SpeechRecognition, and Datetime. I have also constructed a live GUI for JARVIS that allows interaction in a more visually appealing way.
Tutor has grown to the point where it can complete many tasks as effectively as we can, or even better. Through the creation of this project I discovered that applying AI in any sector reduces human work and saves time. Among the features of this project are the ability to send emails and read PDF files; the ability to launch the command prompt, your preferred IDE, Notepad, and other applications; the ability to play music; the ability to make Wikipedia searches; and the ability to set up desktop reminders of your choosing. Basic conversation is also possible.
The following functionalities will be included in the system as proposed:
1) It always retains a list of its names so that it can respond to a call with the specified functionality.
2) It retains the sequence of inquiries asked of it in relation to its setting, for future use. As a result, every time an identical situation comes up, it is in a position to raise pertinent points of discussion.
3) Performing arithmetic computations from voice instructions and returning the results by voice.
4) Searching the Internet based on the user's voice input and returning a voice response with more interactive questions.
5) Keeping the data on its cloud server auto-synchronised and up to date.
6) Updating the data in the cloud with the help of a Firebase server.
7) Letting the user connect smart devices and conduct actions such as turning lights on and off with the assistance of the IoT architecture.
8) Push notifications, such as email or text messages, may be used to alert the owner of a smartphone.
9) Further options include playing music, setting an alarm, monitoring local weather conditions, reminders, spell-checks, etc.
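Feature 3, arithmetic over voice, reduces to mapping number words and operator words in the recognised text and evaluating the result. The vocabulary below is a deliberately tiny, illustrative subset:

```python
WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
OPS = {"plus": lambda a, b: a + b,
       "minus": lambda a, b: a - b,
       "times": lambda a, b: a * b}

def calculate(command):
    """Evaluate a spoken phrase like 'what is three plus five'."""
    # Keep only the tokens that are numbers or operators.
    tokens = [t for t in command.lower().split() if t in WORDS or t in OPS]
    a, op, b = tokens  # expect exactly: number, operator, number
    return OPS[op](WORDS[a], WORDS[b])

print(calculate("what is three plus five"))  # 8
```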
REQUIREMENTS
Software requirements
Hardware requirements
Intel Core i3
4 GB RAM
30 GB hard drive space
SYSTEM DESIGN
Speech-to-Text Interface
The goal of voice recognition is to offer a way to convert spoken words into written ones. This objective may be achieved in a variety of ways; the simplest method is to build a model for each word that has to be identified. A speech signal mainly transmits the words or message being said, and the underlying meaning of the utterance is the focus of speech recognition. Extracting and modelling the speech-dependent properties that can successfully differentiate one word from another is the key to success in speech recognition. The system consists of a set of components.
Because all such systems are based on machine learning and are trained on vast quantities of data acquired from different sources, the source of this data plays a vital part in their production. The kind of assistance that emerges depends on the quantity of data gathered from various sources. Despite the wide variety of learning methodologies, algorithms, and techniques, the basic building blocks of these systems remain essentially the same across the industry. A virtual assistant is often a cloud-based application that works with devices connected to the internet; one advantage of such assistive technology is that users can rely on just the services they need. As a means of developing a virtual assistant, Python will take over your PC. Task-oriented virtual assistants are the most common kind; their value lies in their understanding of, and capacity to follow, instructions.
The system is built on the idea of Artificial Intelligence and the relevant Python packages; for example, using libraries such as pyttsx3 together with a PDF package, it can read PDFs aloud. Chapter 3 of this study goes into depth about these packages.
Everything in this project is based on human input, so the assistant will do whatever the user commands. Anything a user wishes to be done can be entered as a task in plain human language: English.
DFD 0
DFD 1
DFD 2
USE CASE DIAGRAM
SEQUENCE DIAGRAM
TECHNOLOGY USED
Speech Recognition: Python's Speech Recognition library allows you to easily convert speech
to text, enabling your voice assistant to understand user commands and queries.
Natural Language Processing (NLP): Libraries such as nltk and spaCy can be used for NLP
tasks, such as tokenization, part-of-speech tagging, and named entity recognition, to better
understand the context of user inputs.
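nltk and spaCy provide these steps out of the box. To make the idea concrete without their model downloads, here is a minimal stand-in: a regex tokeniser plus a toy entity lookup, where the entity table is invented for illustration and a real system would use a trained NER model:

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens, roughly as
    nltk.word_tokenize would."""
    return re.findall(r"\w+|[^\w\s]", text)

# Toy named-entity table standing in for a trained NER model.
KNOWN_ENTITIES = {"google": "ORG", "london": "GPE"}

def tag_entities(tokens):
    """Label each token with an entity type, or 'O' for none."""
    return [(t, KNOWN_ENTITIES.get(t.lower(), "O")) for t in tokens]

print(tag_entities(tokenize("Open Google, please.")))
```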
Text-to-Speech (TTS): Python provides libraries like pyttsx3 and gTTS for converting text to
speech, allowing your voice assistant to respond to user queries in a natural and human-like
voice.
User Interface: Python offers various GUI libraries such as tkinter and PyQt for creating
interactive user interfaces for your voice assistant, making it easy for users to interact with the
assistant.
Integration with APIs and Services: Python's requests library allows you to easily make HTTP
requests to APIs and services, enabling your voice assistant to fetch information from the
internet or interact with external services.
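In practice this is a requests.get against a weather or news endpoint followed by JSON parsing. To keep the sketch offline, the HTTP call is stubbed with a canned payload; the payload shape and field names are hypothetical, not a real API:

```python
import json

def fetch_weather(city):
    """Stand-in for an HTTP call such as
    requests.get('https://api.example.com/weather', params={'q': city}).json();
    returns a canned payload so the sketch runs offline."""
    canned = '{"city": "%s", "temp_c": 21, "condition": "clear"}' % city
    return json.loads(canned)

def describe_weather(city):
    """Turn the parsed payload into a sentence the assistant can speak."""
    data = fetch_weather(city)
    return f"It is {data['temp_c']} degrees and {data['condition']} in {data['city']}."

print(describe_weather("Delhi"))  # It is 21 degrees and clear in Delhi.
```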
Platform Independence: Python is platform-independent, meaning your voice assistant can run
on different operating systems, such as Windows, macOS, and Linux, without modification.
Ease of Development: Python's simple and readable syntax makes it easy to develop and
maintain code, speeding up the development process for your voice assistant.
Community Support: Python has a large and active community of developers, providing access
to a wealth of libraries, tutorials, and resources to help you build your voice assistant.
By leveraging Python's features and libraries, you can create a simple desktop voice assistant that can
understand user commands, retrieve information, and provide helpful responses, enhancing the user's
desktop experience.
Modules that will be used for desktop voice assistant
You can utilize multiple modules and libraries to handle different areas of the assistant's functionality
when creating a desktop voice assistant in Python. The following are some essential Python modules
and libraries that are frequently used to create voice assistants:
In order to automate many routine desktop operations, such as playing music or launching your preferred IDE, TUTOR, a desktop assistant, utilises a voice interface. While most standard voice assistants rely on an internet connection to get instructions, Jarvis is unique in that it is desktop-specific and does not need a user account.
Installing all of the required packages and libraries is a good place to start: each can be installed with "pip install" and then imported. The set includes the following components.
FUNCTIONS
1.) takeCommand() takes a command from the user's microphone and returns it as a string.
2.) wishMe() greets the user with Good Morning, Good Afternoon, or Good Evening, depending on the current time.
3.) taskExecution() defines SendEmail(), pdf reader(), news(), and the numerous if-conditions such as "open google", "open notepad", "search on Wikipedia", "play music", and so forth.
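Of these, wishMe() is easy to make testable by letting the hour be injected; takeCommand() is not reproduced here because it needs a live microphone. A sketch:

```python
import datetime

def wishMe(hour=None):
    """Greet the user by time of day; `hour` defaults to the current hour."""
    if hour is None:
        hour = datetime.datetime.now().hour
    if hour < 12:
        return "Good Morning!"
    if hour < 18:
        return "Good Afternoon!"
    return "Good Evening!"

print(wishMe(9))  # Good Morning!
```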
Without a doubt, the effectiveness and efficiency of Tutor as a voice assistant make it a valuable tool for busy users. The limitations and opportunities for improvement discovered while working on this project are outlined in the following sections.
Artificial Intelligence and Natural Language Processing will be used to create a voice-activated personal assistant that can operate IoT devices and even search the web for answers
to specific questions. There are various subsystems that may be automated to reduce the
amount of time and effort required to communicate with the main system. The system's goal
is to make human existence as pleasant as possible. In further detail, this system is meant to
communicate intelligently with other subsystems and operate these devices, including
Internet of Things (IoT) devices or receiving news from the Internet, delivering other
information, obtaining customised data previously kept on the system, and so on. The
Android app should allow the user to add data, such as calendar entries, alarms, or reminders,
to the app. All of these platforms will be made more accessible with the help of the software,
which will go through the following stages: voice data collecting, analysis, text conversion,
data storage, and speech generation from text output processed via these stages. The data
collected at each stage may be utilised to identify trends and provide recommendations to the
user. Artificial intelligence devices that can learn and comprehend their users may utilise this
as a significant foundation. It has been determined that the suggested system not only makes it easier for us to interface with other systems and modules but also helps us stay organised. With a little help from the device, we can build a new generation of voice-controlled devices and bring about a long-term change in the automation industry. This paper offers a prototype for a wide range of future applications.
As a result, voice recognition systems have made their way into a wide range of industries.
The use of speech signals as input to a system is one of the many advantages of IVR
(Interactive Voice Response) systems. This is why we proposed the creation of an Interactive Voice Response (IVR) system that includes automatic speech recognition (ASR). The primary goal of the project was to design a system that could recognise speech signals in the Nepali language.
SCOPE FOR FUTURE WORK
● For further protection, voice instructions may be encrypted.
● The system can assist the severely disabled or those who have suffered minor repetitive stress injuries, i.e., those who may otherwise need the assistance of others to manage their surroundings.
The use of IVR systems is increasing on a daily basis. Such technologies make it easier for the user to communicate with the computer system, which in turn facilitates the completion of a variety of activities; the IVR system acts as an intermediary between humans and computers. Due to time and research constraints, the existing IVR system is only suitable for desktop computers and will not be implemented in real phone devices. This is a disadvantage, since an IVR system with Automatic Voice Recognition (AVR) may be used in a wide range of applications. Although the project is still in its infancy, there is plenty of room for improvement in the years to come. The following are some of the areas that may be relevant:
1. Organisational inquiry desk: the system may be utilised in different organisations for simple access to information about the organisation using voice commands.
2. The suggested system only detects isolated words, but it might be expanded to full audio-to-text conversion with further improvements in the algorithms.
3. Voice recognition of Nepali phrases could be employed in newly developed apps, making them more user-friendly.
4. In embedded systems, voice commands may be used to handle multiple activities using speech recognition technology. This promotes automation of labour and can be very advantageous in industrial process automation.
5. Application for people with disabilities: people with disabilities may also benefit from voice recognition software; it is particularly beneficial for those who are unable to use their hands.
REFERENCES
[1] Diksha Goutam, "A Review: Desktop Voice Assistant", IJRASET, Volume 5, Issue 1, January 2023, e-ISSN: 2582-5208.
[2] Asodariya, H., Vachhani, K., Ghori, E., Babariya, B., & Patel, T., "Desktop Voice Assistant".
[3] Gaurav Agrawal, Harsh Gupta, Divyanshu Jain, Chinmay Jain, Prof. Ronak Jain, "Desktop Voice Assistant", International Research Journal of Modernization in Engineering Technology and Science, Volume 2, Issue 5, May 2020.
[4] Bandari, Bhosale, Pawar, Shelar, Nikam, Salunkhe, "Intelligent Desktop Assistant", JETIR, Volume 10, Issue 6, June 2023, ISSN: 2349-5162.
[5] Vishal Kumar Dhanraj, Lokesh Kriplani, Semal Mahajan, IJRES, Volume 10, Issue 2, 2022, pp. 15-20, ISSN (Online): 2320-9364, ISSN (Print): 2320-9356.
[6] Ujjwal Gupta, Utkarsh Jindal, Apurv Goel, Vaishali Malik, "Desktop Voice Assistant", IJRASET, May 2022.
[7] Vishal Kumar Dhanraj, Lokesh Kriplani, Semal Mahajan, "Research Paper on Desktop Voice Assistant", IJRES, Volume 10, Issue 2, 2022, pp. 15-20.
[8] V. Geetha, C. K. Gomathy, Kottamasu Manasa Sri Vardhan, Nukala Pavan Kumar, "The Voice Enabled Personal Assistant for PC using Python", International Journal of Engineering and Advanced Technology (IJEAT), April 2021.
[9] Chen, X., Liu, C., & Guo, W. (2020), "A Survey on Voice Assistant Systems", IEEE Access, 8, 27056-27070.
PROJECT
ABSTRACT
Voice assistants have improved accessibility and convenience across a range of devices in recent years,
becoming indispensable components of everyday life. The goal of this project is to create a Desktop
Voice Assistant system that will enable smooth voice-activated desktop computer interaction. To
comprehend customer inquiries and complete tasks quickly, the Desktop Voice Assistant system
combines speech recognition, natural language processing (NLP), and text-to-speech (TTS) algorithms.
In order to help users with tasks like online browsing, scheduling, reminders, and information retrieval,
the project's main goals are to build an intuitive user interface, implement strong speech recognition
capabilities, and integrate a wide variety of functionality. The Desktop Voice Assistant system uses
cutting-edge natural language processing (NLP) models to accurately understand user commands and
provide timely, pertinent information or actions in response. By using Visual Studio Code for
development with modules like Wikipedia, PyAudio, and pyttsx3, the project shows how Python can be
used to create complex voice assistant systems that are both feasible and adaptable. Voice assistants are
positioned to become commonplace companions as technology develops, streamlining activities and
improving human-computer interaction. The potential for voice assistants to completely transform daily
life is growing thanks to continuous developments in artificial intelligence and speech recognition
technologies, which present fresh chances for creativity and effectiveness.
Problem Statement
Design and develop a desktop voice assistant application to enhance user productivity and accessibility
within a computer environment. The voice assistant should provide seamless interaction through voice
commands, catering to a diverse range of user needs and tasks.
SYSTEM ANALYSIS
Accessibility: Voice assistants improve accessibility for users with disabilities, allowing
them to use computers more easily and efficiently.
Increased Productivity: Voice assistants can help users complete tasks more quickly
and efficiently, reducing the time spent on manual input and navigation.
Natural Interaction: Voice interaction provides a more natural and intuitive way to
interact with computers, making technology more accessible to a wider range of users.
Personalization: Voice assistants can be personalized to understand and respond to
individual user preferences, providing a customized user experience.
Multitasking: Voice assistants allow users to multitask more effectively by enabling them
to perform tasks while keeping their hands and eyes focused on other activities.
Efficient Information Retrieval: Voice assistants can quickly retrieve information
from the internet or other sources, saving users time and effort.
Improved User Experience: Voice assistants can enhance the overall user experience
by providing a more interactive and engaging interface.
Integration with Other Applications: Voice assistants can be integrated with other
desktop applications, such as calendars, email clients, and task managers, to provide a
seamless user experience.
Future Technology Trends: Voice technology is a growing trend in computing, and
developing a desktop voice assistant can help users adapt to and benefit from future
technological advancements.
By conducting a thorough investigation across these areas, you can gain valuable insights into the
strengths and weaknesses of desktop voice assistants and make informed decisions about their
suitability for specific use cases.
Target Market
Based on your research, define your target market for the voice assistant. Consider factors
such as demographics, interests, and needs of the target market. This will help you tailor your
marketing efforts and product features to attract and retain users.
Cost Estimation
Estimate the costs associated with developing and maintaining the voice assistant. This
includes costs for hardware, software, personnel, and any other resources needed for
development. Consider both one-time costs for initial development and ongoing costs for
maintenance and updates.
Revenue Generation and Cost-Benefit Analysis
Identify potential revenue streams for the voice assistant. This may include selling the
application, offering premium features through a subscription model, or integrating
advertisements. Estimate the potential revenue from each stream based on market research
and competitive analysis and also conduct a cost-benefit analysis to assess the financial
viability of the project. Compare the estimated costs with the potential revenue to determine
if the project is financially feasible.
The following additional points should be considered when assessing the project's feasibility:
Integration
Assess how easily the voice assistant can be integrated into existing desktop environments.
Consider factors such as compatibility with different operating systems and software
applications. Determine if any modifications or additional resources will be needed to ensure
smooth integration.
Resource Availability
Determine if there are enough resources, such as time, manpower, and expertise, available to
develop and maintain the voice assistant. Consider if additional resources may be needed and
if they can be obtained within the project's constraints. Ensure that there is adequate support
for the voice assistant's operation and maintenance after deployment.
2.3.6. Risk Analysis
Risk analysis identifies potential risks and uncertainties that could affect the success of the
project. It involves assessing the likelihood and impact of these risks and developing
strategies to mitigate them.
Identifying Risks
Identify potential risks that could impact the development and implementation of the voice
assistant. This may include technical challenges, such as difficulties with speech recognition
or natural language processing, as well as external factors like changes in user preferences or
market conditions.
- Facilitating communication,
- Monitoring/measuring the project progress, and
- Providing overall documentation of assumptions/planning decisions.
The Project Planning Phases can be broadly classified as follows:
- Development of the Project Plan
- Execution of the Project Plan
- Change Control and Corrective Actions
Project Planning spans the various aspects of the project. Generally, Project Planning is considered a
process of estimating, scheduling, and assigning the project's resources in order to deliver an end
product of suitable quality. However, it is much more than that: it can assume a strategic role that
determines the very success of the project. Developing a Project Plan is one of the crucial steps in
Project Planning.
Typically Project Planning can include the following types of project Planning:
1) Project Scope Definition and Scope Planning
2) Project Activity Definition and Activity Sequencing
3) Time, Effort and Resource Estimation
4) Risk Factors Identification
5) Cost Estimation and Budgeting
6) Organizational and Resource Planning
7) Schedule Development
8) Quality Planning
9) Risk Management Planning
10) Project Plan Development and Execution
11) Performance Reporting
12) Planning Change Management
13) Project Rollout Planning
2) Quality Planning:
The relevant quality standards are determined for the project. This is an important aspect of Project
Planning. Based on the inputs captured in the previous steps such as the Project Scope,
Requirements, deliverables, etc. various factors influencing the quality of the final product are
determined. The processes required to deliver the Product as promised and as per the standards are
defined.
Project Scheduling is one of the most important and also the most difficult tasks of Project Planning.
In very large projects, several teams may work on developing the project. They may work in parallel,
yet their work may be interdependent.
Popular tools such as Gantt charts can be used for creating and reporting schedules.
Program evaluation and review technique (PERT) and critical path method (CPM) are two project
scheduling methods that can be applied to software development.
The PERT chart for this application software is illustrated below; the critical path runs through
Design, Code Generation, and Integration & Testing.
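As a hedged illustration of the CPM idea, the critical path through a small task graph can be found with a single longest-path pass. The durations (in weeks) and dependencies below are assumptions for demonstration only, not the project's actual estimates:

```python
# Sketch: critical-path computation over a small task graph.
# Durations (weeks) and dependencies are illustrative assumptions.
tasks = {
    "Requirements": {"duration": 3, "depends_on": []},
    "Design": {"duration": 4, "depends_on": ["Requirements"]},
    "Coding": {"duration": 6, "depends_on": ["Design"]},
    "Integration & Testing": {"duration": 5, "depends_on": ["Coding"]},
    "Documentation": {"duration": 2, "depends_on": ["Design"]},
}

def critical_path(tasks):
    finish = {}  # earliest finish time per task
    pred = {}    # predecessor on the longest path

    def earliest_finish(name):
        if name in finish:
            return finish[name]
        deps = tasks[name]["depends_on"]
        start = max((earliest_finish(d) for d in deps), default=0)
        pred[name] = max(deps, key=lambda d: finish[d]) if deps else None
        finish[name] = start + tasks[name]["duration"]
        return finish[name]

    for name in tasks:
        earliest_finish(name)
    end = max(finish, key=finish.get)      # task that finishes last
    chain = [end]
    while pred.get(chain[-1]):
        chain.append(pred[chain[-1]])
    return list(reversed(chain)), finish[end]

chain, total = critical_path(tasks)
print(chain)  # ['Requirements', 'Design', 'Coding', 'Integration & Testing']
print(total)  # 18
```

With these assumed durations the longest chain matches the critical path named above: Requirements, Design, Coding, then Integration & Testing.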
[Figure: PERT chart for the application. Phases Planning, Requirements, Design, Estimation,
Development, Testing, and Implementation are plotted against the months June through February;
Coding and Integration & Testing span 5th June to 10th Aug 2009, ending at FINISH.]
PERT CHART
2.6 Software requirement specifications (SRS)
Requirements analysis is done in order to understand the problem the software system is to solve.
Once the problem is analyzed and the essentials understood, the requirements must be specified in the
requirement specification document. For requirement specification in the form of a document, some
specification language has to be selected (for example: English, regular expressions, tables, or a
combination of these). The requirements document must specify all functional and performance
requirements; the formats of inputs and outputs; any required standards; and all design constraints that
exist due to political, economic, environmental, and security reasons. The phase ends with validation of
the requirements specified in the document. The basic purpose of validation is to make sure that the
requirements specified in the document actually reflect the client's real requirements or needs, and that
all requirements are specified. Validation is often done through a requirements review, in which a group
of people, including representatives of the client, critically review the requirements specification.
A requirement is a condition or capability that must be met or possessed by a system to satisfy a
contract, standard, specification, or other formally imposed document.
2.7 SOFTWARE ENGINEERING PARADIGM USED
The development of a desktop voice assistant typically involves the application of various software
engineering paradigms to ensure efficient design, implementation, and maintenance. Here are some
key paradigms commonly used in creating desktop voice assistants:
2.8 DIAGRAMS
USE CASE DIAGRAM
SYSTEM DESIGN
User:     Key (Text), Value (Text), Lock (Boolean), Password (Text)
Question: Qid (Integer, PRIMARY KEY), Query (Text), Answer (Text)
Task:     Tid (Integer, PRIMARY KEY), Priority (Integer)
Reminder: Rid (Integer, PRIMARY KEY), Tid (Integer, FOREIGN KEY), What (Text), When (Time), On (Date)
Note:     Nid (Integer, PRIMARY KEY), Data (Text), Priority (Integer)
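The tables above can be sketched as a SQLite schema using Python's built-in sqlite3 module (a minimal illustration; "When" and "On" are quoted because they collide with SQL keywords):

```python
import sqlite3

# Create the schema from the database design above (in-memory for demonstration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    Key      TEXT,
    Value    TEXT,
    Lock     BOOLEAN,
    Password TEXT
);
CREATE TABLE Question (
    Qid    INTEGER PRIMARY KEY,
    Query  TEXT,
    Answer TEXT
);
CREATE TABLE Task (
    Tid      INTEGER PRIMARY KEY,
    Priority INTEGER
);
CREATE TABLE Reminder (
    Rid    INTEGER PRIMARY KEY,
    Tid    INTEGER REFERENCES Task(Tid),
    What   TEXT,
    "When" TIME,
    "On"   DATE
);
CREATE TABLE Note (
    Nid      INTEGER PRIMARY KEY,
    Data     TEXT,
    Priority INTEGER
);
""")
conn.execute("INSERT INTO Note (Data, Priority) VALUES (?, ?)", ("buy milk", 1))
print(conn.execute("SELECT Data FROM Note").fetchone()[0])  # buy milk
```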
DFD 1
DFD 2
1. Minimalistic Interface:
Keep the interface clean and uncluttered to avoid overwhelming the user.
Prioritize essential features and information, and avoid unnecessary visual elements.
3. Feedback Mechanisms:
Provide feedback to users to confirm that their commands have been recognized and
understood.
Use visual and auditory cues, such as animations or voice responses, to acknowledge
user input.
5. Contextual Information:
Provide contextually relevant information based on the user's current interaction or
task.
Display relevant data, such as weather updates, upcoming events, or recent
notifications, when appropriate.
6. Customizable Preferences:
Allow users to customize their preferences and settings, such as language, voice, or
preferred applications.
Provide options for adjusting the assistant's behavior and personalizing the user
experience.
7. Accessibility Features:
Ensure accessibility for users with disabilities by incorporating features such as voice
commands, keyboard shortcuts, or screen reader compatibility.
Design the interface to be inclusive and accessible to all users, regardless of their
abilities.
8. Error Handling:
Design error messages and recovery mechanisms to help users understand and resolve
any issues that may arise.
Provide clear instructions on how to correct errors or retry commands, and offer
assistance when needed.
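A minimal sketch of such an error-handling path, assuming a takeCommand-style recognizer that returns the string "None" on failure (as in the implementation later in this report). The handle_query function and its retry policy are illustrative assumptions, not part of the project code:

```python
def handle_query(query, ask_again, retries=2):
    """Respond to a query; on recognition failure, prompt the user to retry.

    ask_again is any callable that re-captures a command (e.g. takeCommand);
    passing it in lets the retry logic be tested without a microphone.
    """
    for attempt in range(retries + 1):
        if query and query != "None":
            return f"Executing: {query}"
        if attempt < retries:
            print("Sorry, I didn't catch that. Please repeat.")
            query = ask_again()
    return "I still couldn't understand. Try rephrasing, or say 'help'."

# Example: the first capture fails, the retry succeeds.
attempts = iter(["None", "open youtube"])
print(handle_query(next(attempts), lambda: next(attempts)))
# Executing: open youtube
```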
IMPLEMENTATION
4.1 CODING
import subprocess
import wolframalpha
import pyttsx3
import tkinter
import json
import random
import operator
import speech_recognition as sr
import datetime
import wikipedia
import webbrowser
import os
import winshell
import pyjokes
import feedparser
import smtplib
import ctypes
import time
import requests
import shutil
engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)

def speak(audio):
    engine.say(audio)
    engine.runAndWait()
def wishMe():
    hour = int(datetime.datetime.now().hour)
    if 0 <= hour < 12:
        speak("Good Morning")
    elif 12 <= hour < 18:
        speak("Good Afternoon")
    else:
        speak("Good Evening")
    speak(assname)
def username():
    uname = takeCommand()
    speak("Welcome Mister")
    speak(uname)
    columns = shutil.get_terminal_size().columns
    print("#####################".center(columns))
    print("#####################".center(columns))
def takeCommand():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')
        print("User said:", query)
    except Exception as e:
        print(e)
        return "None"
    return query
def sendEmail(to, content):
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    # Placeholder credentials; replace with your own account details
    server.login('your-email@gmail.com', 'your-password')
    server.sendmail('your-email@gmail.com', to, content)
    server.close()
if __name__ == '__main__':
    clear = lambda: os.system('cls')
    clear()
    wishMe()
    username()

    while True:
        query = takeCommand().lower()

        if 'wikipedia' in query:
            speak('Searching Wikipedia...')
            query = query.replace("wikipedia", "")
            results = wikipedia.summary(query, sentences=3)
            speak("According to Wikipedia")
            print(results)
            speak(results)

        elif 'open youtube' in query:
            webbrowser.open("youtube.com")

        elif 'open google' in query:
            webbrowser.open("google.com")

        elif 'open stackoverflow' in query:
            webbrowser.open("stackoverflow.com")
        elif 'play music' in query:
            # music_dir = "G:\\Song"
            music_dir = "C:\\Users\\GAURAV\\Music"
            songs = os.listdir(music_dir)
            print(songs)
            os.startfile(os.path.join(music_dir, songs[0]))

        elif 'open opera' in query:
            codePath = r"C:\Users\GAURAV\AppData\Local\Programs\Opera\launcher.exe"
            os.startfile(codePath)
try:
speak("What should I say?")
content = takeCommand()
sendEmail(to, content)
except Exception as e:
print(e)
try:
content = takeCommand()
to = input()
sendEmail(to, content)
except Exception as e:
print(e)
assname = query
assname = takeCommand()
speak(assname)
exit()
speak(pyjokes.get_joke())
client = wolframalpha.Client(app_id)
indx = query.lower().split().index('calculate')
answer = next(res.results).text
webbrowser.open(query)
os.startfile(power)
ctypes.windll.user32.SystemParametersInfoW(20,
0,
"Location of wallpaper",
0)
appli = r"C:\\ProgramData\\BlueStacks\\Client\\Bluestacks.exe"
os.startfile(appli)
try:
data = json.load(jsonObj)
i=1
print(item['description'] + '\n')
i += 1
except Exception as e:
print(str(e))
ctypes.windll.user32.LockWorkStation()
subprocess.call(["shutdown", "/p", "/f"])
speak("for how much time you want to stop jarvis from listening commands")
a = int(takeCommand())
time.sleep(a)
print(a)
location = query
speak(location)
subprocess.call(["shutdown", "/r"])
speak("Hibernating")
subprocess.call(["shutdown", "/h"])
time.sleep(5)
subprocess.call(["shutdown", "/l"])
note = takeCommand()
snfm = takeCommand()
file.write(strTime)
file.write(" :- ")
file.write(note)
else:
file.write(note)
elif "show note" in query:
speak("Showing Notes")
print(file.read())
speak(file.read(6))
speak("After downloading file please replace this file with the downloaded one")
total_length = int(r.headers.get('content-length'))
if ch:
Pypdf.write(ch)
wishMe()
speak(assname)
city_name = takeCommand()
response = requests.get(complete_url)
x = response.json()
if x["cod"] != "404":
    y = x["main"]
    current_temperature = y["temp"]
    current_pressure = y["pressure"]
    current_humidity = y["humidity"]
    z = x["weather"]
    weather_description = z[0]["description"]
    print("Temperature (in kelvin unit) = " + str(current_temperature)
          + "\n atmospheric pressure (in hPa unit) = " + str(current_pressure)
          + "\n humidity (in percentage) = " + str(current_humidity)
          + "\n description = " + str(weather_description))
else:
message = client.messages.create(
    body=takeCommand(),
    from_='Sender No',
    to='Receiver No'
)
print(message.sid)
webbrowser.open("wikipedia.com")
speak(assname)
speak("I'm not sure about, may be you should give me some time")
client = wolframalpha.Client("API_ID")
res = client.query(query)
try:
    print(next(res.results).text)
    speak(next(res.results).text)
except StopIteration:
    speak("No results found")
1. Algorithm Selection:
Choose algorithms and data structures that are well-suited for the tasks performed by
the voice assistant.
Opt for efficient algorithms with lower time complexity for tasks such as speech
recognition, natural language processing, and task execution.
3. Resource Management:
Manage system resources, such as memory and CPU usage, efficiently to prevent
bottlenecks and improve performance.
Implement techniques like lazy loading or caching to reduce resource consumption
and improve responsiveness.
By employing these strategies, you can enhance the code efficiency of a desktop voice assistant,
resulting in better performance, reduced resource consumption, and an overall improved user
experience.
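The caching strategy mentioned above can be as simple as memoizing expensive lookups with the standard library's functools.lru_cache. The fetch_summary function below is a hypothetical stand-in for a slow operation such as a web or Wikipedia query:

```python
from functools import lru_cache

call_count = 0  # tracks how many real lookups were performed

@lru_cache(maxsize=128)
def fetch_summary(topic):
    """Hypothetical expensive lookup (e.g. a Wikipedia query)."""
    global call_count
    call_count += 1
    return f"summary of {topic}"

fetch_summary("python")  # computed
fetch_summary("python")  # served from the cache, no second lookup
print(call_count)  # 1
```

Repeated queries for the same topic then cost a dictionary lookup instead of a network round trip.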
4.3 PARAMETER PASSING / CALLING
In the implementation of a desktop voice assistant, parameter passing and calling are fundamental
concepts for passing information between different components of the system. Here's how parameter
passing and calling might be utilized:
# Command Processing Module
def process_input(transcribed_text):
    # Parse the transcribed text into a command and its parameters
    # (nlp_module, like the other module names here, is illustrative)
    command, parameters = nlp_module.parse(transcribed_text)
    task_execution_module.execute_command(command, parameters)

# Speech Recognition Module
def get_voice_input():
    audio = speech_recognition_module.capture_audio()
    return audio

def capture_audio():
    # Record raw audio from the microphone and return it
    ...

# Logging Module
def log_error(error_message):
    # Log the error message to a file or console
    ...
Parameter passing and calling facilitate the flow of information between different modules or
components of the desktop voice assistant, enabling seamless communication and collaboration
within the system. By carefully designing and implementing these interactions, you can create a
robust and efficient voice assistant application.
4.4 VALIDATION CHECKS
1. Functionality Testing:
Verify that the voice assistant accurately understands and responds to user
commands.
Test a variety of commands and queries to ensure comprehensive coverage.
Check for proper handling of errors and fallback mechanisms when the assistant
doesn't understand a command.
4. Performance Testing:
Measure the response time of the voice assistant to user inputs.
Assess the system's performance under different loads and usage scenarios.
Check for any latency issues or delays in processing user requests.
5. Compatibility Testing:
Validate that the voice assistant works seamlessly on different desktop platforms
(e.g., Windows, macOS, Linux).
Ensure compatibility with various hardware configurations and system settings.
Test the assistant across different web browsers if it has a web-based component.
TESTING
2. Integration Testing:
Test the interaction and integration between different components of the voice
assistant.
Validate the flow of data and control between modules, such as speech recognition,
natural language understanding, and task execution.
Use integration tests to verify the behavior of the system as a whole, including the
handling of various user inputs and scenarios.
3. End-to-End Testing:
Conduct end-to-end tests to validate the entire user journey and interaction flow of
the voice assistant application.
Test common user scenarios, from initiating voice commands to receiving and
verifying the assistant's responses.
Use real-world user inputs or scripted scenarios to simulate typical usage patterns and
evaluate the system's behavior under different conditions.
4. Usability Testing:
Evaluate the usability and user experience of the voice assistant through usability
testing.
Gather feedback from real users to assess the intuitiveness, effectiveness, and
satisfaction of the voice interaction interface.
Identify usability issues, navigation difficulties, and areas for improvement based on
user feedback and observations.
5. Performance Testing:
Measure and evaluate the performance characteristics of the voice assistant
application.
Conduct performance tests to assess factors such as response time, latency,
throughput, and resource utilization under varying loads and conditions.
Identify performance bottlenecks, scalability limitations, and areas for optimization to
ensure the voice assistant meets performance requirements.
6. Security Testing:
Perform security testing to identify and mitigate potential vulnerabilities and threats
in the voice assistant application.
Assess the application's security posture by conducting penetration testing, code
reviews, and vulnerability assessments.
Verify the implementation of security best practices, such as data encryption, access
controls, and secure communication protocols, to protect sensitive information and
ensure user privacy.
7. Regression Testing:
Conduct regression testing to validate that recent changes or updates to the voice
assistant application do not introduce new defects or regressions in functionality.
Maintain a suite of automated regression tests to systematically verify the behavior of
the system across different use cases and scenarios.
8. Accessibility Testing:
Evaluate the accessibility of the voice assistant application to ensure it is usable by
individuals with disabilities.
Test for compliance with accessibility standards and guidelines, such as the Web
Content Accessibility Guidelines (WCAG), to support users with diverse needs and
abilities.
By employing a comprehensive testing approach that incorporates these techniques and strategies,
you can ensure the quality, reliability, and effectiveness of the desktop voice assistant application.
4. Parameterized Tests:
Use parameterized tests to test a function or method with different sets of input
parameters and expected outputs.
Parameterized tests help increase test coverage and reduce code duplication by testing
multiple scenarios with a single test case.
7. Mutation Testing:
Implement mutation testing to assess the quality of the test suite by introducing small
changes (mutations) to the code and checking if the tests detect these mutations.
Use mutation testing tools to measure the effectiveness of the test suite in detecting
faults and identifying areas for improvement.
By incorporating these code improvement strategies into the testing process, you can enhance the
reliability, maintainability, and quality of the desktop voice assistant project.
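The parameterized-test strategy described above can be sketched with the standard library's unittest.subTest. The parse_command function is a hypothetical example of the kind of unit under test, not part of the project code:

```python
import unittest

def parse_command(query):
    """Hypothetical unit under test: map a raw query to a command keyword."""
    query = query.lower().strip()
    for keyword in ("wikipedia", "open youtube", "play music"):
        if keyword in query:
            return keyword
    return "unknown"

class TestParseCommand(unittest.TestCase):
    def test_known_and_unknown_commands(self):
        cases = [
            ("Search Wikipedia for Python", "wikipedia"),
            ("please OPEN YOUTUBE", "open youtube"),
            ("play music now", "play music"),
            ("what time is it", "unknown"),
        ]
        # One test method covers every case; a failure reports which case broke.
        for query, expected in cases:
            with self.subTest(query=query):
                self.assertEqual(parse_command(query), expected)

# Run the case directly (normally you would invoke `python -m unittest`)
suite = unittest.TestLoader().loadTestsFromTestCase(TestParseCommand)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```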
SYSTEM SECURITY
Ensuring the security of a desktop voice assistant project is essential to protect user privacy, data
integrity, and system confidentiality. Here are some key aspects to consider for system security:
1. Data Encryption:
Encrypt sensitive data, such as user voice recordings, command history, and personal
information, to prevent unauthorized access or interception.
Use strong encryption algorithms and key management practices to secure data both
at rest and in transit.
2. Access Control:
Implement access control mechanisms to restrict access to sensitive functionality and
resources within the voice assistant application.
Authenticate users and enforce proper authorization levels to prevent unauthorized
users from accessing privileged features or data.
3. Secure Communication:
Use secure communication protocols, such as HTTPS, SSL/TLS, or SSH, to encrypt
data exchanged between the voice assistant application and external services or APIs.
Verify the authenticity of remote endpoints and validate server certificates to prevent
man-in-the-middle attacks.
4. User Authentication:
Implement strong user authentication mechanisms, such as multi-factor authentication
(MFA) or biometric authentication, to verify the identity of users accessing the voice
assistant application.
Enforce password policies, session management, and account lockout mechanisms to
protect against unauthorized access and brute force attacks.
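Password handling for the authentication described above can be sketched with the standard library's hashlib.pbkdf2_hmac. This is a minimal illustration; a production system would use a vetted password-hashing library and its own parameters:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=200_000):
    """Derive a salted hash; store the salt and hash, never the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, stored_digest, iterations=200_000):
    """Recompute the hash and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```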
5. Secure Storage:
Store sensitive data securely using encrypted storage mechanisms and access controls
to prevent unauthorized access or data leakage.
Follow best practices for secure configuration and management of databases, file
systems, and other storage repositories.
6. Input Validation:
Validate and sanitize user input to prevent injection attacks, such as SQL injection or
cross-site scripting (XSS), which could compromise the security of the voice assistant
application.
Use input validation libraries or frameworks to enforce data integrity and mitigate the
risk of common security vulnerabilities.
8. Vulnerability Management:
Regularly scan the voice assistant application for security vulnerabilities using
automated scanning tools, vulnerability databases, and security assessments.
Maintain an up-to-date inventory of software dependencies and third-party libraries,
and promptly apply security patches and updates to address known vulnerabilities.
By incorporating these security measures into the design, development, and operation of the desktop
voice assistant project, you can mitigate security risks and safeguard the integrity, confidentiality,
and availability of the system and its data.
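The SQL-injection risk noted under Input Validation comes down to never interpolating user input into query strings. A minimal sketch with Python's built-in sqlite3, reusing the Note table from the database design earlier in this report:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Note (Nid INTEGER PRIMARY KEY, Data TEXT)")
conn.execute("INSERT INTO Note (Data) VALUES ('private entry')")

user_input = "x' OR '1'='1"  # a classic injection attempt

# UNSAFE (do not do this): string interpolation lets the input rewrite the query:
#   conn.execute(f"SELECT * FROM Note WHERE Data = '{user_input}'")

# Safe: the ? placeholder treats the input strictly as data, never as SQL.
rows = conn.execute("SELECT * FROM Note WHERE Data = ?", (user_input,)).fetchall()
print(rows)  # [] -- the injection attempt matches nothing
```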
SCREENSHOTS
CONCLUSION & FUTURE SCOPE
We've covered Python-based personal virtual assistants for Windows in this report. Virtual assistants
make humans' lives simpler, and using one gives you the freedom to contract for just the services you
need. Python is used to create virtual assistants for all Windows versions, much like Alexa, Cortana,
Siri, and Google Assistant. Artificial intelligence is used in this project, and virtual personal assistants
are an excellent method to keep track of your calendar. Because of their portability, loyalty, and
availability at any moment, virtual personal assistants are more dependable than human personal
assistants. Our virtual assistant will get to know you better and be able to offer suggestions and follow
orders. This technology will most likely be with us for the rest of our lives. It is also possible to enhance
education by using immersive technology.
Voice assistants may help students study in new and innovative ways, and this report draws on studies
of AI voice assistants in education. There hasn't been much study of voice assistants yet, but that is
about to change, and new discoveries could follow from the results so far. The next few years will be
all about voice-activated devices like smart speakers and virtual assistants, though exactly how they
will be most successful in the classroom remains an open question. Not all voice assistants are
multilingual, which can be problematic, and they lack sufficient security safeguards and protection
filters for classroom use by students. The use of these devices in the classroom can only be successful
if instructors are given the proper training and incentives. Although most students and teachers have
reported positive results, the data are sparse, fragmentary, and unstructured; more research is required
to better understand the use of these devices in the classroom.
9.1 LIMITATIONS
● The lack of voice command encryption raises concerns about the project's overall security.
● Unlike Google Assistant, which can be invoked at any time by saying "Ok Google!", TUTOR cannot
be activated externally.
● For further protection, voice instructions may be encrypted.
Voice-driven systems are especially valuable for the severely disabled or those who have suffered
repetitive stress injuries, i.e., those who may otherwise need the assistance of others to manage their
surroundings. Such technologies make it easier for the user to communicate with the computer system,
which in turn facilitates the completion of a variety of activities. The IVR system acts as an
intermediary between humans and computers.
Due to the time and research constraints, the existing IVR system is only suitable to desktop computers and
will not be implemented in real phone devices. This is a disadvantage since the IVR system with Automatic
Voice Recognition (AVR) may be used in a wide range of applications. Although the project is still in its
infancy, there is plenty of room for improvement in the years to come. The following are some of the places
that may be relevant:
1. Organizational inquiry desk: The system may be utilised in different organisations for simple access
to information about the organisation using voice commands.
2. The suggested system only detects isolated words, but it could be expanded to include audio-to-text
conversion with further improvements in the algorithms.
3. Voice recognition of Nepali phrases could be employed to accomplish the work of newly created
apps, making applications more user-friendly.
4. In embedded systems, voice commands may be used to handle multiple activities using speech
recognition technology. This promotes automation of labour and can thus be very advantageous in
industrial process automation.
5. Application for people with disabilities: People with disabilities may also benefit from voice
recognition software. It is particularly beneficial for those who are unable to use their hands.
REFERENCES
[1] Diksha Goutam, "A Review: Desktop Voice Assistant", IJRASET, Volume 05, Issue 01, January
2023, e-ISSN: 2582-5208.
[2] Asodariya, H., Vachhani, K., Ghori, E., Babariya, B., & Patel, T. Desktop Voice Assistant.
[3] Gaurav Agrawal, Harsh Gupta, Divyanshu Jain, Chinmay Jain, Prof. Ronak Jain, "Desktop Voice
Assistant", International Research Journal of Modernization in Engineering Technology and Science,
Volume 02, Issue 05, May 2020.
[4] Bandari, Bhosale, Pawar, Shelar, Nikam, Salunkhe (2023). Intelligent Desktop Assistant. JETIR,
June 2023, Volume 10, Issue 6, www.jetir.org (ISSN: 2349-5162).
[5] Vishal Kumar Dhanraj, Lokeshkriplani, Semal Mahajan, IJRES, ISSN (Online): 2320-9364, ISSN
(Print): 2320-9356, www.ijres.org, Volume 10, Issue 2, 2022, pp. 15-20.
[6] Ujjwal Gupta, Utkarsh Jindal, Apurv Goel, Vaishali Malik, "Desktop Voice Assistant", IJRASET,
8 May 2022.
[7] Vishal Kumar Dhanraj, Lokeshkriplani, Semal Mahajan, "Research Paper on Desktop Voice
Assistant", IJRES, ISSN (Online): 2320-9364, ISSN (Print): 2320-9356, www.ijres.org, Volume 10,
Issue 2, 2022, pp. 15-20.
[8] V. Geetha, C. K. Gomathy, Kottamasu Manasa Sri Vardhan, Nukala Pavan Kumar, "The Voice
Enabled Personal Assistant for PC using Python", International Journal of Engineering and Advanced
Technology (IJEAT), April 2021.
[9] Chen, X., Liu, C., & Guo, W. (2020). A Survey on Voice Assistant Systems. IEEE Access, 8,
27056-27070