Dissertation of Multimedia
Name: BISENBAY GULDARIYA
Date: 2020.11.21
Title: How will AI technology be applied in future multimedia applications?
Abstract:
Multimedia is the field concerned with the computer-controlled integration of
graphics, drawings, still and moving images (video), text, animation, audio, and
other media, in which every type of information can be represented, stored, illustrated,
communicated, and handled digitally. Multimedia can be played or displayed,
recorded, interacted with, or accessed by content-processing devices, such
as high-tech and automated devices, but it can also be part of a live presentation.
Multimedia devices are electronic devices used to store and play
multimedia content. Multimedia is distinguished from mixed media in fine art; for example, by
including audio it has a broader scope. Due to the many multimedia
applications and services that have emerged in the past decade, a huge amount of multimedia data has
been created, advancing research in multimedia. Furthermore, multimedia
research has brought great improvements in image/video content analysis,
multimedia content delivery, multimedia search and recommendation, and more.
At the same time, Artificial Intelligence (AI), officially established as an
academic discipline in the 1950s, has gone through a completely "new" period of
development. AI technologies have improved, with deep learning in particular
achieving extraordinary success.
Multimedia applications are a special domain that combines different content forms.
In the past 10 years, we have been lucky to experience revolutionary changes in both
multimedia software and hardware. Owing to these great changes, everyday
gadgets such as telephones, computer systems, and digital cameras are equipped
with sensors that can "hear" and "see" the objects around them. Using these
"abilities", they can "say" and "display" data to people on websites and elsewhere.
The rapid change of society and of the quality of life, and the enhancement of
technology and science in a short period of time, make it clear that our "future"
will be completely different. All of the accomplishments mankind has reached, and
the changes that technology has brought, clearly show that our future will be
bound up with artificial intelligence technologies and multimedia.
–Video/Audio Production
There are many options for machine learning applications in live video scenarios of all
sizes, including large multi-camera events, smaller single-camera livestreams, and even
lectures at educational institutions. Many different kinds of software programs can
adopt machine learning, such as video production apps, video animation tools, and
encoding software within live production systems like Pearl-2 and Pearl Mini. Below
are 11 machine learning applications for video production, with a discussion of the
possibilities of this exciting new technology and how it may soon apply to
professional live video gear.
1. Simplified virtual studios
The first one in the list of machine learning applications for video production is virtual
studios. A virtual studio combines objects and real people with digital,
computer-generated environments to imitate a production studio. The advantage of a
virtual studio is that it lets you create spectacular, futuristic studio productions
at a far lower cost than physically building the set. However, configuring a virtual
studio requires a high level of technical skill and knowledge, as well as a
significant investment of time. Machine learning can streamline the virtual set
process by automatically adding digital elements (or removing physical elements)
based on detected visuals, such as shapes, depth of field, and static or dynamic images.
Machine learning possibilities:
Sense and learn visual information on the physical set;
Automatically remove or include virtual elements based on learned visuals;
Streamline the virtual set creation process;
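The core of this automation, removing the physical backdrop and compositing virtual elements, can be sketched with a simple color-distance mask. This is a minimal hypothetical example (plain chroma keying, with made-up function names), not the learned scene segmentation a real product would use:

```python
import numpy as np

def composite_virtual_set(frame, background, key_color=(0, 255, 0), threshold=100):
    """Replace pixels near the key color with the virtual background.

    frame, background: HxWx3 uint8 RGB arrays of identical shape.
    A learned model would predict the mask instead of using a fixed color.
    """
    diff = frame.astype(np.int32) - np.array(key_color, dtype=np.int32)
    distance = np.sqrt((diff ** 2).sum(axis=-1))  # per-pixel distance to key color
    mask = distance < threshold                   # True where the backdrop shows
    out = frame.copy()
    out[mask] = background[mask]                  # swap in the virtual set
    return out
```

Real virtual-studio pipelines would add depth estimation and camera tracking on top of this basic masking idea.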
2. Comment integration and aggregation
Live events and video broadcasts can be watched on various platforms, such as
Instagram, TikTok, YouTube, and other CDNs. Viewers will watch on the social media
platform of their liking, with many viewers preferring one platform over another.
However, this creates a problem: viewer comments and discussion become divided
between platforms. Keeping all relevant comments in one place dramatically boosts
engagement levels while making it more convenient for moderators to respond to
viewer comments.
Machine learning can be applied to gather social media comments into the stream,
allowing hosts to respond to comments in real time, and to automatically route
replies to the appropriate social platform. Machine learning can also simplify the
addition of dynamic content to a livestream, such as a Twitter hashtag conversation or
news feed discussion. The code can detect and learn relevant keywords across specific
digital media channels and dynamically include the content into the livestream.
Machine learning possibilities:
Aggregate viewer comments into the livestream across all video platforms;
Learn keywords relevant to the live event (e.g. hashtags (#), mentions (@), or
the name of the event);
Monitor online discussions across specified channels (e.g. Twitter, Tiktok,
Facebook, YouTube, news and media sites) and dynamically include content
into the livestream;
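The aggregation step above can be sketched as a simple merge-and-filter over per-platform feeds. This is a hypothetical example (the feed format and function name are assumptions); a keyword filter stands in for the learned relevance model:

```python
def aggregate_comments(feeds, keywords):
    """Merge comments from several platforms into one timeline.

    feeds: {platform_name: [{"timestamp": float, "text": str}, ...]}
    Only comments mentioning one of the keywords (e.g. the event
    hashtag) are kept, simulating a learned relevance filter.
    """
    merged = []
    for platform, comments in feeds.items():
        for comment in comments:
            text = comment["text"].lower()
            if any(kw.lower() in text for kw in keywords):
                merged.append({"platform": platform, **comment})
    merged.sort(key=lambda c: c["timestamp"])  # one chronological stream
    return merged
```

A production system would replace the keyword test with a trained relevance classifier and keep per-comment platform IDs so replies can be routed back.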
3. Indexing using transcription, visual cues, and OCR
Indexing allows the viewer to quickly locate a desired spot in a presentation or
lecture without the need to dig around manually, which is inconvenient for longer
presentations and lectures where the viewer may want to revisit key moments or
essential learning topics in the video. Machine learning can index a live video
using a few different methods:
Visual/audio cues: A recorded lecture or live event can be
indexed based on visual or audio cues, such as audience applause, a slide
change, or a new speaker on the stage.
Audio transcription: Audio can be manually transcribed into text data, but
this process requires a significant investment of time and human effort.
Optical Character Recognition: Optical Character Recognition, or OCR, lets
you convert a variety of documents, such as scanned paper documents, PDF
files, or digital images, into searchable text data. This data can then be indexed,
allowing readers to easily locate specific information within a document or
media file.
Machine learning can help automate each of these video indexing methods, saving
tremendous costs by reducing the need for manual transcription. Human operators
can instead use their time to verify the transcribed/converted text, thereby
helping the software learn new words and correct any grammar issues.
Machine learning possibilities:
Convert audio into text and index key points in the VOD based on transcribed
text;
Convert overlays, lower thirds, and other on-screen text into searchable data
with OCR and automatically index key points in the video;
Learn specific visual and audio cues (e.g. applause, detection of a presenter’s
face) and automatically create index entry when cues are detected in the video;
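Once a speech-to-text or OCR stage has produced timestamped text, the indexing itself is straightforward. The sketch below is a hypothetical illustration (the segment format and function name are assumptions) of building a keyword index over transcribed segments:

```python
def build_keyword_index(segments, keywords):
    """Build a keyword -> [timestamps] index from transcribed segments.

    segments: list of (start_seconds, text) pairs, as produced by a
    (hypothetical) speech-to-text or OCR stage.
    """
    index = {}
    for start, text in segments:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                # record every time this topic is mentioned
                index.setdefault(keyword, []).append(start)
    return index
```

A viewer-facing player could then render this index as clickable chapter markers.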
4. Intelligent live switching
To create a truly engaging live production or lecture, one must switch between
multiple video sources or custom layouts. Switching helps emphasize important
parts of the presentation while holding the viewer's attention. However, this
process is typically done manually. Frequent switching may also prove tricky for
smaller livestreams with minimal staff on hand. These smaller teams might miss
out on a valuable opportunity to create a dynamic live video experience for viewers.
Machine learning can be applied to current encoding technology to help automate
the process based on visual or verbal cues, such as presenter movement, gestures, or
audience applause. Is a speaker telling a personal anecdote? Switch to the camera
view. Is the speaker explaining a concept in the presentation slides? Switch to the
slide view. Machine learning can create an engaging switched live production with
minimal effort required by the presenter or AV techs.
Machine learning possibilities:
Learn visual and audio cues for each video source or layout;
Switch to each video source or layout based on learned cues;
Help create a fully switched live production or lecture with minimal overhead
cost;
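The decision logic behind cue-driven switching can be sketched as a small rule table over detected cues. This is a hypothetical example (the cue names, layout names, and function name are assumptions); in practice the cues would come from trained gesture, speech, and slide-change detectors:

```python
# Hypothetical mapping from learned cues to production layouts.
CUE_TO_LAYOUT = {
    "presenter_gesture": "camera",
    "slide_change": "slides",
    "applause": "wide_shot",
}

def plan_switches(cues, default_layout="camera"):
    """Turn a timeline of detected cues into a list of layout switches.

    cues: list of (time_seconds, cue_name). Unknown cues keep the
    current layout. Returns [(time, new_layout), ...] only when the
    layout actually changes.
    """
    current = default_layout
    switches = []
    for time, cue in cues:
        layout = CUE_TO_LAYOUT.get(cue, current)
        if layout != current:
            switches.append((time, layout))
            current = layout
    return switches
```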
5. Dynamic image calibration
Live streams and recordings require optimally calibrated picture settings (such as
white balance and exposure) to achieve a clearly visible presentation for viewers.
Picture calibration can be a complicated process, particularly when environmental
factors (such as lighting) are subject to change, or when users lack the expertise to
make the necessary adjustments. Machine learning can streamline the calibration
process by detecting current picture settings and making modifications to improve
picture quality.
Machine learning possibilities:
Detect current picture settings;
Learn optimal picture settings to achieve the best possible shot;
Make suggestions to improve current picture (or even configure settings
automatically!);
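One classic automatic correction of this kind is gray-world white balance, which a learned system could refine by predicting gains from the scene. The sketch below is a minimal illustration (the function name is an assumption), not a full calibration pipeline:

```python
import numpy as np

def gray_world_balance(frame):
    """Gray-world white balance: scale each color channel so its mean
    matches the overall mean brightness, neutralizing color casts.

    frame: HxWx3 uint8 RGB image.
    """
    pixels = frame.reshape(-1, 3).astype(np.float64)
    channel_means = pixels.mean(axis=0)
    gains = channel_means.mean() / channel_means   # per-channel correction
    balanced = np.clip(frame * gains, 0, 255)
    return balanced.astype(np.uint8)
```

Exposure could be handled the same way, by nudging gain until the image histogram reaches a target brightness.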
6. Automated audio optimization
Halfway down our list of machine learning applications for video production
is automated audio optimization. High-quality audio is essential when live
streaming and recording a live presentation or lecture. Without clear audio, viewers
are unable to fully experience or understand the presentation. For the average
non-technical presenter or lecturer, however, audio problems such as inaudible or
distorted volume or a malfunctioning microphone can be difficult to resolve
quickly. These issues often require the assistance of AV technicians to diagnose
and resolve before the presentation can proceed, which is not always
convenient. Machine learning can be used to keep a watchful eye on audio and
automatically make adjustments to ensure maximum audio quality. Technicians could
be notified only if critical audio issues are detected.
Machine learning possibilities:
Streamline the audio diagnostic process;
Indicate to technicians when there is an audio issue to address;
Ensure high-quality audio is available at all times;
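The "watchful eye" above can be approximated with simple automatic gain control: measure loudness per buffer and scale toward a target. This is a hypothetical sketch (function name and thresholds are assumptions); a monitoring loop could alert a technician only when the required gain hits the cap:

```python
import numpy as np

def auto_gain(samples, target_rms=0.1, max_gain=10.0):
    """Scale audio toward a target loudness (RMS), capping the gain.

    samples: 1-D float array in [-1, 1]. Returns (adjusted, gain).
    Hitting max_gain suggests a real fault (e.g. a dead microphone)
    that should be escalated to a technician.
    """
    rms = float(np.sqrt(np.mean(samples ** 2)))
    if rms == 0.0:
        return samples, 1.0            # silence: nothing to normalize
    gain = min(target_rms / rms, max_gain)
    return np.clip(samples * gain, -1.0, 1.0), gain
```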
8. Abridged videos
During a lecture or livestream there are sometimes dead moments, such as changes
of speaker, delays in setting up presentation material, trivial technical errors, etc. A
recorded presentation gives technicians the opportunity to remove any such
downtime and create a polished, professional final product for viewers. Machine
learning can help automate this process by identifying and removing gaps in the
recorded content in post-production.
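Given per-segment activity scores from a (hypothetical) speech/motion detector, the cutting step itself is a simple filter. The sketch below is an illustrative example, with the segment format, function name, and thresholds all assumed:

```python
def remove_downtime(segments, min_activity=0.2, min_gap=5.0):
    """Drop low-activity spans from a recording timeline.

    segments: list of (start, end, activity) with activity in [0, 1].
    Low-activity spans shorter than min_gap seconds are kept, since
    cutting tiny pauses makes the edit feel jumpy.
    """
    kept = []
    for start, end, activity in segments:
        is_downtime = activity < min_activity and (end - start) >= min_gap
        if not is_downtime:
            kept.append((start, end))
    return kept
```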
References:
"A survey on Chromecast digital device", Journal of Emerging
Wikipedia: https://en.wikipedia.org/wiki/Multimedia
IBM: https://www.ibm.com/cloud/learn/machine-learning
Epiphan: https://www.epiphan.com/blog/machine-learning-applications/
Wikipedia: https://en.wikipedia.org/wiki/Artificial_intelligence
AWS: https://aws.amazon.com/media/tech/machine-learning-for-media-applications/?nc1=h_ls