ATARC AIDA Guidebook - FINAL 3v
Artificial Intelligence and Data Analytics (AIDA) Guidebook
Pipeline steps: [figure not reproduced]
code used to create the model. The machine-learning model is the combination of the data and the code, which is refined through continuous integration, continuous deployment, and continuous training of the model.

the next word or words in a text based on the preceding words; it is part of the technology that predicts the next word you want to type on your mobile phone, allowing you to complete the message faster.
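The next-word prediction mentioned here can be sketched as a toy bigram model, in which the "model" is literally the combination of data (word-pair counts) and code (the lookup). The corpus and function names below are invented for illustration:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows which: the model's 'data' half."""
    words = corpus.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """The model's 'code' half: return the most frequent successor, if any."""
    counts = model.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

model = train_bigram("see you soon see you later see you soon")
print(predict_next(model, "you"))  # soon
```

Retraining the model on a larger corpus changes the counts, and therefore the predictions, without touching the code — the same dynamic the guidebook describes at production scale.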
While DevOps code may be relatively stable once developed, machine learning's challenge is
keeping the code up to date with the data as both change in parallel. Model accuracy and
resulting decisions can degrade with time due to data drift and organizational overconfidence
in the model. Machine learning is not a one-and-done process that produces an algorithm
infallible for all time, but an ongoing and indeed constant evolution, in which the AI algorithms
repeatedly encounter new data and are updated to account for it. To counter this degradation,
organizations use continuous integration (i.e., frequently merging code changes into a central
repository, validated by automated testing), continuous deployment (i.e., automatically
releasing changes that pass those tests), continuous training (i.e., retraining and revalidating
the model on new data), and a human element in the development loop.
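The continuous-training loop can be sketched as follows. The threshold, function names, and toy data are illustrative assumptions, not from the guidebook: monitor accuracy on recent human-labeled examples and trigger retraining when the model degrades past an agreed floor.

```python
ACCURACY_THRESHOLD = 0.90  # assumed organizational floor, set per use case

def evaluate(model, labeled_batch):
    """Fraction of recent, human-labeled examples the model got right."""
    correct = sum(model(x) == y for x, y in labeled_batch)
    return correct / len(labeled_batch)

def maybe_retrain(model, labeled_batch, retrain):
    """Return (model, retrained?): swap in a fresh model if accuracy drifted."""
    accuracy = evaluate(model, labeled_batch)
    if accuracy < ACCURACY_THRESHOLD:
        return retrain(labeled_batch), True   # continuous training kicks in
    return model, False                       # model still healthy

# Toy usage: a "model" that always predicts 1, against data now mostly labeled 0.
batch = [(x, 0) for x in range(8)] + [(x, 1) for x in range(2)]
model, retrained = maybe_retrain(lambda x: 1, batch, lambda b: (lambda x: 0))
print(retrained)  # True: accuracy 0.2 fell below the threshold
```

In a real pipeline the human element enters at two points this sketch glosses over: labeling the monitoring batch and approving the retrained model before deployment.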
Conducting these additional steps is what differentiates MLOps from DevOps, democratizing
and streamlining the analytics process. On the technical side, MLOps removes bottlenecks in
the deployment process, i.e., between machine-learning design and its implementation or
deployment framework. Strategically, MLOps makes machine learning accessible to those with
less data and coding expertise. Additionally, an organization may benefit by applying
quantitative rigor to qualitative subject matter, and by combining strategy and tactics to work
together. This is important since only 13% of machine learning projects10 make it into
production, due in part to a lack of organizational engagement.
There are risks to MLOps in addition to the benefits stated above. MLOps may oversimplify the
development process, hiding intermediate steps, which can pose a challenge to those with less
data and coding expertise and may lead to downstream impacts if the code and data fall out
of alignment. Developers often weigh the risks and rewards of MLOps, asking questions such as:
Each organization will then identify its acceptable risk level when determining how to proceed.
Additionally, organizations must often consider how tightly to link operational SMEs with data
and modeling SMEs; whether model accuracy monitoring should include ethical metrics (e.g., by
race or gender); and how to maintain an organizational culture of respect for all contributors'
expertise.
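Where monitoring does include such metrics, the basic idea can be sketched as breaking accuracy down by a demographic attribute, so that degradation in one group is not hidden by the overall average. The field names and records below are invented for illustration:

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples.

    Returns per-group accuracy, so each group's drift is visible on its own.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {g: hits[g] / totals[g] for g in totals}

sample = [("A", 1, 1), ("A", 0, 1), ("B", 1, 1), ("B", 1, 1)]
print(accuracy_by_group(sample))  # {'A': 0.5, 'B': 1.0}
```

Here the aggregate accuracy is 75%, which looks acceptable, while group A's 50% would warrant investigation — exactly the gap that per-group monitoring is meant to expose.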
Infrastructure SMEs manage the technical side of CI/CD, working closely with other partners on
CT. Data SMEs understand operational needs, manage the data side of CI/CD, and work closely
with other partners on CT. Operational SMEs coordinate with data SMEs and are responsible for
proactively engaging data and infrastructure SMEs on CT.
10 https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
Data encryption is one of the simplest technologies to implement to secure data in transit and
at rest. Encryption, the process of converting information or data into a coded form, is a key
element in data security. Businesses generally encrypt sensitive information before
transporting it so that it is protected during transmission. There are several methods for
doing this.
Connection-level encryption schemes can be enforced; the most widely used are Hypertext
Transfer Protocol Secure (HTTPS), Transport Layer Security (TLS), and File Transfer Protocol
Secure (FTPS). HTTPS is encrypted in
order to increase security of data transfer. This is particularly important when users transmit
sensitive data, such as by logging into a bank account, email service, or health insurance
provider. Any website, especially those that require login credentials, often uses HTTPS. A
primary use case of TLS is encrypting the communication between web applications and
servers, such as web browsers loading a website. TLS can also be used to encrypt other
communications such as email, messaging, and voice over IP (VoIP). FTPS is a secure file
transfer protocol that allows businesses to connect securely with their trading partners, users,
and customers. Files exchanged over FTPS are authenticated through FTPS-supported
mechanisms such as client certificates and server identities.
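As a minimal sketch of enforcing connection-level encryption, Python's standard-library `ssl` module can be configured to verify server certificates and refuse legacy protocol versions; the hostname parameter below is illustrative, and this is one way to set a TLS floor, not a complete client:

```python
import socket
import ssl

def make_tls_context():
    """TLS settings worth insisting on for data in transit."""
    context = ssl.create_default_context()            # verifies server certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols
    return context

def negotiated_tls_version(host, port=443):
    """Connect over TLS and return the protocol version the server agreed to."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with make_tls_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

The same context object can back an HTTPS request, since HTTPS is HTTP carried over a TLS connection like the one negotiated here.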
Compared to data in transit, data at rest is generally harder to access, which is why private
information, such as health records, is often stored this way. This also makes the data more
valuable to hackers and its theft more consequential for victims of cyber-attacks. Despite the
greater security, there is still a risk of this data being accessed by hackers through
cyber-attacks, potentially exposing private information such as addresses and financial
records and putting individuals' safety at risk. Protecting all sensitive data, whether in
motion or at rest, is imperative for modern enterprises as attackers find increasingly
innovative ways to compromise systems and steal data.
If the data must be protected for many years, one should make sure that the encryption
scheme used is quantum-safe. Current publicly available quantum computers are not
powerful enough to threaten current encryption methods. However, as quantum processors
advance, this could change. Most current public-key encryption methods (where different keys
are used for encryption and decryption) could be broken with a powerful enough quantum
computer. On the other hand, most current symmetric cryptographic algorithms (where the
encryption and decryption keys are the same) are not susceptible to quantum attacks,
assuming the keys are sufficiently long.11
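The usual back-of-the-envelope argument can be made concrete. The halving rule below is a rough approximation attributed to Grover's quadratic search speedup, not a precise security model, while public-key schemes such as RSA and elliptic-curve cryptography are broken outright by Shor's algorithm:

```python
def effective_symmetric_bits(key_bits):
    """Approximate post-quantum strength of a symmetric key.

    Grover's algorithm searches a keyspace in roughly the square root of the
    classical time, which corresponds to halving the security level in bits.
    """
    return key_bits // 2

for bits in (128, 256):
    print(bits, "->", effective_symmetric_bits(bits))
# 128-bit keys drop to ~64 bits of work, considered too weak for long-lived
# data; 256-bit keys retain ~128 bits, which is why sufficiently long
# symmetric keys are regarded as quantum-resistant.
```

This is the arithmetic behind the guidance above: data that must stay confidential for many years should already be protected with long symmetric keys or with post-quantum public-key schemes.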
For applications where confidentiality of the data in use is of utmost importance, additional
technologies could be used. When one wants to keep the data private even while it is being
processed, there are a number of technologies that can be employed independently or, in some
cases, even together. These include homomorphic encryption, differential privacy, federated
computing, and synthetic data. Homomorphic encryption is a technique that allows operations
to be performed on encrypted data without decrypting it.12 This permits the confidential
processing of data on a system that is untrusted. The results of the computation can be
decrypted only with the original key. The biggest barrier to widespread use of homomorphic
encryption has been its poor performance. It is significantly slower than performing the
11 http://www.pqcrypto.org/www.springer.com/cda/content/document/cda_downloaddocument/9783540887010-c1.pdf
12 See https://eprint.iacr.org/2015/1192 for an overview of homomorphic encryption and related technologies