Dissertation of Multimedia
Name: BISENBAY GULDARIYA
Date: 2020.11.21
Title: How will AI technology be applied in future multimedia applications?
Abstract:
Multimedia is the field concerned with the computer-controlled integration of
graphics, drawings, still and moving images (video), text, animation, audio, and
other media, in which every type of information can be represented, stored, illustrated,
communicated, and handled digitally. Multimedia can be played or displayed,
recorded, interacted with, or accessed by content-processing devices, such
as high-tech and automated devices, but it can also be part of a live presentation.
Multimedia devices are electronic devices used to store and play
multimedia content. Multimedia is distinguished from mixed media in fine art; for example, by
including audio it has a broader scope. Due to the many multimedia
applications and services that have emerged in the past decade, a huge amount of multimedia data has
been created, advancing research in multimedia. Furthermore, multimedia
research has brought great improvements in image/video content analysis,
multimedia content delivery, multimedia search and recommendation, and more.
At the same time, Artificial Intelligence (AI), officially established as an
academic discipline in the 1950s, has gone through a completely "new" period of
development. AI technologies have improved, with deep learning in particular
achieving extraordinary success.
Multimedia applications are a special domain that combines different content forms.
In the past 10 years, we have been lucky to experience revolutionary changes in both
multimedia software and hardware. Owing to these great changes, everyday
gadgets such as telephones, computer systems, and digital cameras are equipped
with sensors that can "hear" and "see" the objects around them. Using these
"abilities", they can "say" and "display" data to people on websites and elsewhere.
The rapid change of society and of the quality of life, and the enhancement of
technology and science in a short period of time, make it clear that our "future"
will be completely different. All of the accomplishments mankind has reached, and
the changes that technology has brought, clearly show that our future will be
bound up with artificial intelligence technologies and multimedia.
–Video/Audio Production
There are many options for machine learning applications in live video scenarios of all
sizes, including large multi-camera events, smaller single-camera livestreams, and even
lectures at educational institutions. Many different kinds of software programs can
adopt machine learning, such as video production apps, video animation tools, and
encoding software within live production systems like Pearl-2 and Pearl Mini. Below
are 11 machine learning applications for video production, with a discussion of the
possibilities of this exciting new technology and how it may soon apply to
professional live video gear.
1. Simplified virtual studios
The first one in the list of machine learning applications for video production is virtual
studios. A virtual studio combines objects and real people with digital,
computer-generated environments to imitate a production studio. The advantage of a
virtual studio is that it lets you create spectacular, futuristic studio productions
at a far lower cost than physically building the set. However, configuring a virtual
studio requires a high level of technical skill and knowledge, as well as a
significant investment of time. Machine learning can streamline the virtual set
process by automatically adding digital elements (or removing physical elements)
based on detected visuals, such as shapes, depth of field, and static or dynamic images.
Machine learning possibilities:
Sense and learn visual information on the physical set;
Automatically remove or include virtual elements based on learned visuals;
Streamline the virtual set creation process;
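The core of this automation, removing the physical backdrop and compositing virtual elements, can be sketched with a simple color-distance mask. This is a minimal hypothetical example (plain chroma keying, with made-up function names), not the learned scene segmentation a real product would use:

```python
import numpy as np

def composite_virtual_set(frame, background, key_color=(0, 255, 0), threshold=100):
    """Replace pixels near the key color with the virtual background.

    frame, background: HxWx3 uint8 RGB arrays of identical shape.
    A learned model would predict the mask instead of using a fixed color.
    """
    diff = frame.astype(np.int32) - np.array(key_color, dtype=np.int32)
    distance = np.sqrt((diff ** 2).sum(axis=-1))  # per-pixel distance to key color
    mask = distance < threshold                   # True where the backdrop shows
    out = frame.copy()
    out[mask] = background[mask]                  # swap in the virtual set
    return out
```

Real virtual-studio pipelines would add depth estimation and camera tracking on top of this basic masking idea.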
2. Comment integration and aggregation
Live events and video broadcasts can be watched on various platforms, such as
Instagram, TikTok, YouTube, and other CDNs. Viewers will watch on the social media
platform of their liking, with many viewers preferring one platform over another.
However, this creates a problem: viewer comments and discussion become divided
between platforms. Keeping all relevant comments in one place dramatically boosts
engagement levels while making it more convenient for moderators to respond to
viewer comments.
Machine learning can be applied to gather social media comments into the stream,
allowing hosts to respond to comments in real time, and to automatically route
replies to the appropriate social platform. Machine learning can also simplify the
addition of dynamic content to a livestream, such as a Twitter hashtag conversation or
news feed discussion. The code can detect and learn relevant keywords across specific
digital media channels and dynamically include the content into the livestream.
Machine learning possibilities:
Aggregate viewer comments into the livestream across all video platforms;
Learn keywords relevant to the live event (e.g. hashtags (#), mentions (@), or
the name of the event);
Monitor online discussions across specified channels (e.g. Twitter, Tiktok,
Facebook, YouTube, news and media sites) and dynamically include content
into the livestream;
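The aggregation step above can be sketched as a simple merge-and-filter over per-platform feeds. This is a hypothetical example (the feed format and function name are assumptions); a keyword filter stands in for the learned relevance model:

```python
def aggregate_comments(feeds, keywords):
    """Merge comments from several platforms into one timeline.

    feeds: {platform_name: [{"timestamp": float, "text": str}, ...]}
    Only comments mentioning one of the keywords (e.g. the event
    hashtag) are kept, simulating a learned relevance filter.
    """
    merged = []
    for platform, comments in feeds.items():
        for comment in comments:
            text = comment["text"].lower()
            if any(kw.lower() in text for kw in keywords):
                merged.append({"platform": platform, **comment})
    merged.sort(key=lambda c: c["timestamp"])  # one chronological stream
    return merged
```

A production system would replace the keyword test with a trained relevance classifier and keep per-comment platform IDs so replies can be routed back.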
3. Indexing using transcription, visual cues, and OCR
Indexing allows the viewer to quickly locate a desired spot in a presentation or
lecture without the need to dig around manually, which is inconvenient for longer
presentations and lectures where the viewer may want to revisit key moments or
essential learning topics in the video. Machine learning can index a live video
using a few different methods:
Visual/audio cues: A recorded lecture or live event can be
indexed based on visual or audio cues, such as audience applause, a slide
change, or a new speaker on the stage.
Audio transcription: Audio can be manually transcribed into text data, but
this process requires a significant investment of time and human effort.
Optical Character Recognition: Optical Character Recognition, or OCR, lets
you convert a variety of documents, such as scanned paper documents, PDF
files, or digital images, into searchable text data. This data can then be indexed,
allowing readers to easily locate specific information within a document or
media file.
Machine learning can help automate each of these video indexing methods, saving
tremendous costs by reducing the need for manual transcription. Human operators
can instead use their time to verify the transcribed/converted text, thereby
helping the software learn new words and correct any grammar issues.
Machine learning possibilities:
Convert audio into text and index key points in the VOD based on transcribed
text;
Convert overlays, lower thirds, and other on-screen text into searchable data
with OCR and automatically index key points in the video;
Learn specific visual and audio cues (e.g. applause, detection of a presenter’s
face) and automatically create index entry when cues are detected in the video;
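Once a speech-to-text or OCR stage has produced timestamped text, the indexing itself is straightforward. The sketch below is a hypothetical illustration (the segment format and function name are assumptions) of building a keyword index over transcribed segments:

```python
def build_keyword_index(segments, keywords):
    """Build a keyword -> [timestamps] index from transcribed segments.

    segments: list of (start_seconds, text) pairs, as produced by a
    (hypothetical) speech-to-text or OCR stage.
    """
    index = {}
    for start, text in segments:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                # record every time this topic is mentioned
                index.setdefault(keyword, []).append(start)
    return index
```

A viewer-facing player could then render this index as clickable chapter markers.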
4. Intelligent live switching
To create a truly engaging live production or lecture, one must switch between
multiple video sources or custom layouts. Switching helps emphasize important
parts of the presentation while holding the viewer's attention. However, this
process is typically done manually. Frequent switching may also prove tricky for
smaller livestreams with minimal staff on hand. These smaller teams might miss
out on a valuable opportunity to create a dynamic live video experience for viewers.
Machine learning can be applied to current encoding technology to help automate
the process based on visual or verbal cues, such as presenter movement, gestures, or
audience applause. Is a speaker telling a personal anecdote? Switch to the camera
view. Is the speaker explaining a concept in the presentation slides? Switch to the
slide view. Machine learning can create an engaging switched live production with
minimal effort required by the presenter or AV techs.
Machine learning possibilities:
Learn visual and audio cues for each video source or layout;
Switch to each video source or layout based on learned cues;
Help create a fully switched live production or lecture with minimal overhead
cost;
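The decision logic behind cue-driven switching can be sketched as a small rule table over detected cues. This is a hypothetical example (the cue names, layout names, and function name are assumptions); in practice the cues would come from trained gesture, speech, and slide-change detectors:

```python
# Hypothetical mapping from learned cues to production layouts.
CUE_TO_LAYOUT = {
    "presenter_gesture": "camera",
    "slide_change": "slides",
    "applause": "wide_shot",
}

def plan_switches(cues, default_layout="camera"):
    """Turn a timeline of detected cues into a list of layout switches.

    cues: list of (time_seconds, cue_name). Unknown cues keep the
    current layout. Returns [(time, new_layout), ...] only when the
    layout actually changes.
    """
    current = default_layout
    switches = []
    for time, cue in cues:
        layout = CUE_TO_LAYOUT.get(cue, current)
        if layout != current:
            switches.append((time, layout))
            current = layout
    return switches
```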
5. Dynamic image calibration
Live streams and recordings require optimally calibrated picture settings (such as
white balance and exposure) to achieve a clearly visible presentation for viewers.
Picture calibration can be a complicated process, particularly when environmental
factors (such as lighting) are subject to change, or when users lack the expertise to
make the necessary adjustments. Machine learning can streamline the calibration
process by detecting current picture settings and making modifications to improve
picture quality.
Machine learning possibilities:
Detect current picture settings;
Learn optimal picture settings to achieve the best possible shot;
Make suggestions to improve current picture (or even configure settings
automatically!);
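One classic automatic correction of this kind is gray-world white balance, which a learned system could refine by predicting gains from the scene. The sketch below is a minimal illustration (the function name is an assumption), not a full calibration pipeline:

```python
import numpy as np

def gray_world_balance(frame):
    """Gray-world white balance: scale each color channel so its mean
    matches the overall mean brightness, neutralizing color casts.

    frame: HxWx3 uint8 RGB image.
    """
    pixels = frame.reshape(-1, 3).astype(np.float64)
    channel_means = pixels.mean(axis=0)
    gains = channel_means.mean() / channel_means   # per-channel correction
    balanced = np.clip(frame * gains, 0, 255)
    return balanced.astype(np.uint8)
```

Exposure could be handled the same way, by nudging gain until the image histogram reaches a target brightness.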
6. Automated audio optimization
Halfway down our list of machine learning applications for video production
is automated audio optimization. High-quality audio is essential when live
streaming and recording a live presentation or lecture. Without clear audio, viewers
are unable to fully experience or understand the presentation. For the average
non-technical presenter or lecturer, however, audio problems such as inaudible or
distorted volume or a malfunctioning microphone can be difficult to resolve
quickly. These issues often require the assistance of AV technicians to diagnose
and resolve before the presentation can proceed, which is not always
convenient. Machine learning can be used to keep a watchful eye on audio and
automatically make adjustments to ensure maximum audio quality. Technicians could
be notified only if critical audio issues are detected.
Machine learning possibilities:
Streamline the audio diagnostic process;
Indicate to technicians when there is an audio issue to address;
Ensure high-quality audio is available at all times;
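The "watchful eye" above can be approximated with simple automatic gain control: measure loudness per buffer and scale toward a target. This is a hypothetical sketch (function name and thresholds are assumptions); a monitoring loop could alert a technician only when the required gain hits the cap:

```python
import numpy as np

def auto_gain(samples, target_rms=0.1, max_gain=10.0):
    """Scale audio toward a target loudness (RMS), capping the gain.

    samples: 1-D float array in [-1, 1]. Returns (adjusted, gain).
    Hitting max_gain suggests a real fault (e.g. a dead microphone)
    that should be escalated to a technician.
    """
    rms = float(np.sqrt(np.mean(samples ** 2)))
    if rms == 0.0:
        return samples, 1.0            # silence: nothing to normalize
    gain = min(target_rms / rms, max_gain)
    return np.clip(samples * gain, -1.0, 1.0), gain
```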
8. Abridged videos
During a lecture or livestream there are sometimes dead moments, such as changes
of speaker, delays in setting up presentation material, trivial technical errors, etc. A
recorded presentation gives technicians the opportunity to remove any such
downtime and create a polished, professional final product for viewers. Machine
learning can help automate this process by identifying and removing gaps in the
recorded content in post-production.
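Given per-segment activity scores from a (hypothetical) speech/motion detector, the cutting step itself is a simple filter. The sketch below is an illustrative example, with the segment format, function name, and thresholds all assumed:

```python
def remove_downtime(segments, min_activity=0.2, min_gap=5.0):
    """Drop low-activity spans from a recording timeline.

    segments: list of (start, end, activity) with activity in [0, 1].
    Low-activity spans shorter than min_gap seconds are kept, since
    cutting tiny pauses makes the edit feel jumpy.
    """
    kept = []
    for start, end, activity in segments:
        is_downtime = activity < min_activity and (end - start) >= min_gap
        if not is_downtime:
            kept.append((start, end))
    return kept
```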
References:
"A survey on Chromecast digital device", Journal of Emerging
Wikipedia: https://en.wikipedia.org/wiki/Multimedia
IBM: https://www.ibm.com/cloud/learn/machine-learning
Epiphan: https://www.epiphan.com/blog/machine-learning-applications/
Wikipedia: https://en.wikipedia.org/wiki/Artificial_intelligence
AWS: https://aws.amazon.com/media/tech/machine-learning-for-media-applications/?nc1=h_ls