AI Class XII Study Materials
CLASS XII
______________________________________________
Unit 1: Capstone Project
A capstone project is a project in which students research a topic independently to gain a deep understanding of the subject matter. It gives students an opportunity to integrate all their knowledge and demonstrate it through a comprehensive project.
So, without further ado, let's jump straight into some capstone project ideas that will strengthen your base.
The list of possibilities is huge, but these are some simple projects that you can consider picking up to develop.
1. Understanding The Problem
Artificial Intelligence is perhaps the most transformative technology available today. At a high level, every AI project follows six steps:
1) Problem understanding
2) Data gathering
3) Feature definition
4) AI model construction
5) Model evaluation
6) Deployment
In this section, I will share the best practices for the first step: “understanding the problem”.
Begin formulating your problem by asking yourself this simple question: is there a pattern? The premise that underlies all Machine Learning disciplines is that there needs to be a pattern. If there is no pattern, then the problem cannot be solved with AI technology. It is fundamental that this question is asked before deciding to embark on an AI development journey.
If it is believed that there is a pattern in the data, then AI development techniques may be employed.
Applied uses of these techniques are typically geared towards answering five types of questions, all of
which may be categorized as being within the umbrella of predictive analysis:
It is important to determine which of these questions you’re asking, and how answering it helps you
solve your problem.
Project 1:
Form a team of 4-5 students and submit a detailed report on the most critical problems in the following areas and how AI can assist in addressing them. The report should include a description of each problem and the proposed way in which AI can solve it.
1. Agriculture in India
3. Healthcare in India
The five stages of Design Thinking are as follows: Empathize, Define, Ideate, Prototype, and Test.
2. Decomposing the Problem
Real computational tasks are complicated. To accomplish them you need to break the problem down into smaller units before coding.
1. Understand the problem and then restate the problem in your own words
Know what the desired inputs and outputs are
Ask questions for clarification (in class these questions might be directed to your instructor, but most of the time you will be asking either yourself or your collaborators)
2. Break the problem down into a few large pieces. Write these down, either on paper or as
comments in a file.
3. Break complicated pieces down into smaller pieces. Keep doing this until all of the pieces are
small.
4. Code one small piece at a time.
1. Think about how to implement it
2. Write the code/query
3. Test it… on its own.
4. Fix problems, if any
Data
length  width  height
1       2       3
2       4       3
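As a purely hypothetical illustration of step 4 above (coding one small piece at a time), the table could be used to compute a box volume for each row. The volume calculation below is an assumption made for illustration, not part of the original example:

# hypothetical example: one small, independently testable piece per step
boxes = [
    {"length": 1, "width": 2, "height": 3},
    {"length": 2, "width": 4, "height": 3},
]

def volume(box):
    # small piece 1: compute the volume of a single box
    return box["length"] * box["width"] * box["height"]

# small piece 2: apply it to every row and check the results
for box in boxes:
    print(volume(box))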
Example 2: Imagine that you want to create your first app. This is a complex problem. How would you
decompose the task of creating an app?
To decompose this task, you would need to know the answer to a series of smaller problems:
This list has broken down the complex problem of creating an app into much simpler problems that can
now be worked out. You may also be able to get other people to help you with different individual parts of the app. For example, you may have a friend who can create the graphics, while another can test the app.
Example 3: (For Advanced learners)
Time series decomposition involves thinking of a series as a combination of level, trend, seasonality, and
noise components. Decomposition provides a useful abstract model for thinking about time series
generally and for better understanding problems during time series analysis and forecasting.
The Airline Passengers dataset describes the total number of airline passengers over a period of time.
The units are a count of the number of airline passengers in thousands. There are 144 monthly
observations from 1949 to 1960.
Download the dataset to your current working directory with the filename "airline-passengers.csv". The snippet below loads the series and plots it:

from pandas import read_csv
from matplotlib import pyplot

series = read_csv('airline-passengers.csv', header=0, index_col=0)
series.plot()
pyplot.show()
Reviewing the line plot suggests that there may be a linear trend, but it is hard to be sure from eyeballing alone. There is also seasonality, but the amplitude (height) of the cycles appears to be increasing, suggesting that it is multiplicative.
The example below decomposes the airline passenger’s dataset as a multiplicative model.
from pandas import read_csv
from matplotlib import pyplot
from statsmodels.tsa.seasonal import seasonal_decompose

series = read_csv('airline-passengers.csv', header=0, index_col=0)
# monthly data, so the seasonal period is 12
result = seasonal_decompose(series, model='multiplicative', period=12)
result.plot()
pyplot.show()
Running the example plots the observed, trend, seasonal, and residual time series.
We can see that the trend and seasonality information extracted from the series does seem
reasonable. The residuals are also interesting, showing periods of high variability in the early and later
years of the series.
3. Analytic Approach
Those who work in the domain of AI and Machine Learning solve problems and answer questions
through data every day. They build models to predict outcomes or discover underlying patterns, all to
gain insights leading to actions that will improve future outcomes.
Every project, regardless of its size, starts with business understanding, which lays the foundation for
successful resolution of the business problem. The business sponsors needing the analytic solution play
the critical role in this stage by defining the problem, project objectives and solution requirements from
a business perspective. And, believe it or not—even with nine stages still to go—this first stage is the
hardest.
After clearly stating a business problem, the data scientist can define the analytic approach to solving
it. Doing so involves expressing the problem in the context of statistical and machine learning
techniques so that the data scientist can identify techniques suitable for achieving the desired outcome.
Selecting the right analytic approach depends on the question being asked. Once the problem to be
addressed is defined, the appropriate analytic approach for the problem is selected in the context of
the business requirements. This is the second stage of the data science methodology.
If the question is to determine the probability of an action, then a predictive model might be used. Statistical analysis applies to problems that require counts. If the question requires a yes/no answer, then a classification approach to predicting a response would be suitable.
4. Data Requirements
If the problem that needs to be resolved is "a recipe", so to speak, and data is "an ingredient", then the
data scientist needs to identify:
Prior to undertaking the data collection and data preparation stages of the methodology, it's vital to
define the data requirements for decision-tree classification. This includes identifying the necessary
data content, formats and sources for initial data collection.
In this phase, the data requirements are revised and decisions are made as to whether the collection requires more or less data. Once the data ingredients are collected, the data scientist will have a good understanding of what they will be working with.
Techniques such as descriptive statistics and visualization can be applied to the data set to assess its content, quality, and initial insights. Gaps in the data will be identified, and plans to either fill them or make substitutions will have to be made.
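As a quick illustration of this assessment step, a minimal sketch is shown below (the file name is a placeholder, not from the source):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('collected_data.csv')   # placeholder file name for the collected data
print(df.describe())                     # descriptive statistics for each numeric column
print(df.isnull().sum())                 # gaps: count of missing values per column
df.hist()                                # simple visualization of each column's distribution
plt.show()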
5. Modeling Approach
Data Modeling focuses on developing models that are either descriptive or predictive.
An example of a descriptive model might examine things like: if a person did this, then they're
likely to prefer that.
A predictive model tries to yield yes/no, or stop/go type outcomes. These models are based on
the analytic approach that was taken, either statistically driven or machine learning driven.
The data scientist will use a training set for predictive modelling. A training set is a set of historical data
in which the outcomes are already known. The training set acts like a gauge to determine if the model
needs to be calibrated. In this stage, the data scientist will play around with different algorithms to
ensure that the variables in play are actually required.
The success of data compilation, preparation and modelling depends on an understanding of the problem at hand and on the appropriate analytic approach being taken. The data supports the answering of the question and, like the quality of the ingredients in cooking, sets the stage for the outcome.
Constant refinement, adjustments and tweaking are necessary within each step to ensure the outcome
is one that is solid. The framework is geared to do 3 things:
The end goal is to move the data scientist to a point where a data model can be built to answer the
question.
The train-test split procedure can be used for classification or regression problems and with any supervised learning algorithm.
The procedure involves taking a dataset and dividing it into two subsets. The first subset is used to fit
the model and is referred to as the training dataset. The second subset is not used to train the model;
instead, the input element of the dataset is provided to the model, then predictions are made and
compared to the expected values. This second dataset is referred to as the test dataset.
The objective is to estimate the performance of the machine learning model on new data: data not used
to train the model.
This is how we expect to use the model in practice. Namely, to fit it on available data with known inputs
and outputs, then make predictions on new examples in the future where we do not have the expected
output or target values.
The train-test procedure is appropriate when there is a sufficiently large dataset available.
You must choose a split percentage that meets your project’s objectives with considerations that
include:
Now that we are familiar with the train-test split model evaluation procedure, let’s look at how we can
use this procedure in Python.
When we work with datasets, a machine learning model works in two stages: training and testing. We usually split the data roughly 80%/20% between the training and testing stages. Under supervised learning, we split a dataset into training data and test data in Python.
Pandas
Sklearn
We use pandas to import the dataset and sklearn to perform the splitting. You can import these packages
as:
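>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split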
The following is the process of creating train and test sets in Python. First, let's load a dataset.

>>> data = pd.read_csv('forestfires.csv')
>>> data.head()
b. Splitting
Let's split this data into labels and features. What does that mean? Features are the data we use to make predictions, and labels are the values we want to predict.
>>> y = data.temp
>>> x = data.drop('temp', axis=1)
Here, temp is the label (the temperature we want to predict), so we store it in y; the drop() function keeps all the other columns as features in x. Then we split the data.
>>> x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
>>> x_train.head()
>>> x_train.shape
(413, 12)
>>> x_test.head()
>>> x_test.shape
(104, 12)
The argument test_size=0.2 specifies that the test data should be 20% of the dataset and the rest should be training data. From the output of the shape attribute, you can see that we have 104 rows in the test data and 413 in the training data.
We will demonstrate how to use the train-test split to evaluate a random forest algorithm on the housing
dataset.
The housing dataset is a standard machine learning dataset composed of 506 rows of data with 13
numerical input variables and a numerical target variable.
The dataset involves predicting the house price given details of houses in the suburbs of the American city of Boston.
You will not need to download the dataset; we will download it automatically as part of our worked
examples. The example below downloads and loads the dataset as a Pandas DataFrame and summarizes
the shape of the dataset.
# load and summarize the housing dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)
Running the example confirms the 506 rows of data, with 13 input variables and a single numeric target variable (14 columns in total).

(506, 14)
First, the loaded dataset must be split into input and output components.
...
data = dataframe.values
# split into input (X) and output (y) columns
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Next, we can split the dataset so that 67 percent is used to train the model and 33 percent is used to
evaluate it. This split was chosen arbitrarily.
...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
We can then define and fit the model on the training dataset.
...
model = RandomForestRegressor(random_state=1)
model.fit(X_train, y_train)
Then use the fit model to make predictions and evaluate the predictions using the mean absolute error
(MAE) performance metric.
...
# make predictions
yhat = model.predict(X_test)
# evaluate predictions
mae = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % mae)
Tying this together, the complete example is listed below.

# train-test split evaluation of a random forest on the housing dataset
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
# split into input and output columns
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# define and fit the model
model = RandomForestRegressor(random_state=1)
model.fit(X_train, y_train)
# make predictions
yhat = model.predict(X_test)
# evaluate predictions
mae = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % mae)
Running the example first loads the dataset and confirms the number of rows in the input and output
elements.
The dataset is split into train and test sets and we can see that there are 339 rows for training and 167
rows for the test set.
Finally, the model is evaluated on the test set, and its performance when making predictions on new data is a mean absolute error of about 2.2 (thousands of dollars). Your exact value may vary slightly given the stochastic nature of the algorithm.

MAE: 2.157
You will face choices about predictive variables to use, what types of models to use, what arguments to
supply those models, etc. We make these choices in a data-driven way by measuring model quality of
various alternatives.
You've already learned to use train_test_split to split the data, so you can measure model quality on the
test data. Cross-validation extends this approach to model scoring (or "model validation.") Compared to
train_test_split, cross-validation gives you a more reliable measure of your model's quality, though it
takes longer to run.
Imagine you have a dataset with 5000 rows. The train_test_split function has an argument for test_size
that you can use to decide how many rows go to the training set and how many go to the test set. The
larger the test set, the more reliable your measures of model quality will be. At an extreme, you could
imagine having only 1 row of data in the test set. If you compare alternative models, which one makes
the best predictions on a single data point will be mostly a matter of luck.
You will typically keep about 20% as a test dataset. But even with 1000 rows in the test set, there's some
random chance in determining model scores. A model might do well on one set of 1000 rows, even if it
would be inaccurate on a different 1000 rows. The larger the test set, the less randomness (aka "noise")
there is in our measure of model quality.
In cross-validation we divide the data into several pieces, or folds (for example, five folds of 20% each). We run an experiment called experiment 1, which uses the first fold as a holdout set and everything else as training data. This gives us a measure of model quality based on a 20% holdout set, much as we got from using the simple train-test split.
We then run a second experiment, where we hold out data from the second fold (using everything
except the 2nd fold for training the model.) This gives us a second estimate of model quality. We repeat
this process, using every fold once as the holdout. Putting this together, 100% of the data is used as a
holdout at some point.
Returning to our example above from train-test split, if we have 5000 rows of data, we end up with a measure of model quality based on 5000 rows of holdout (even if we don't use all 5000 rows simultaneously).
Cross-validation gives a more accurate measure of model quality, which is especially important if you
are making a lot of modeling decisions. However, it can take more time to run, because it estimates
models once for each fold. So it is doing more total work.
Given these tradeoffs, when should you use each approach? On small datasets, the extra computational
burden of running cross-validation isn't a big deal. These are also the problems where model quality
scores would be least reliable with train-test split. So, if your dataset is smaller, you should run cross-
validation.
For the same reasons, a simple train-test split is sufficient for larger datasets. It will run faster, and you
may have enough data that there's little need to re-use some of it for holdout.
There's no simple threshold for what constitutes a large vs small dataset. If your model takes a couple
minute or less to run, it's probably worth switching to cross-validation. If your model takes much longer
to run, cross-validation may slow down your workflow more than it's worth.
Alternatively, you can run cross-validation and see if the scores for each experiment seem close. If each
experiment gives the same results, train-test split is probably sufficient.
Example
First we read the data
import pandas as pd
data = pd.read_csv('../input/melb_data.csv')
cols_to_use = ['Rooms', 'Distance', 'Landsize', 'BuildingArea', 'YearBuilt']
X = data[cols_to_use]
y = data.Price
Then specify a pipeline of our modeling steps (it can be very difficult to do cross-validation properly if you aren't using pipelines):
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer  # recent scikit-learn; older versions used sklearn.preprocessing.Imputer
my_pipeline = make_pipeline(SimpleImputer(), RandomForestRegressor())
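The scoring call itself would look something like the following minimal sketch using scikit-learn's cross_val_score (the number of folds is left at its default):

from sklearn.model_selection import cross_val_score

# one score per experiment/fold; negative MAE by scikit-learn convention (explained below)
scores = cross_val_score(my_pipeline, X, y, scoring='neg_mean_absolute_error')
print(scores)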
You may notice that we specified an argument for scoring. This specifies what measure of model quality
to report. The docs for scikit-learn show a list of options.
It is a little surprising that we specify negative mean absolute error in this case. Scikit-learn has a
convention where all metrics are defined so a high number is better. Using negatives here allows them to
be consistent with that convention, though negative MAE is almost unheard of elsewhere.
You typically want a single measure of model quality to compare between models. So we take the
average across experiments.
print('Mean Absolute Error %.2f' % (-1 * scores.mean()))
Conclusion
Using cross-validation gave us much better measures of model quality, with the added benefit of cleaning up our code (we no longer need to keep track of separate train and test sets). So it's a good win.
Activity 1: Convert the code for your on-going project over from train-test split to cross-validation. Make
sure to remove all code that divides your dataset into training and testing datasets. Leaving code you don't
need any more would be sloppy.
Activity 2: Add or remove a predictor from your models. See the cross-validation score using both sets of
predictors, and see how you can compare the scores.
Knowing how good a set of predictions is allows you to estimate how good a given machine learning model of your problem is.
You must estimate the quality of a set of predictions when training a machine learning model.
Performance metrics like classification accuracy and root mean squared error can give you a clear objective
idea of how good a set of predictions is, and in turn how good the model is that generated them.
This is important as it allows you to tell the difference and select among:
Different transforms of the data used to train the same machine learning model.
Different machine learning models trained on the same data.
Different configurations for a machine learning model trained on the same data.
As such, performance metrics are a required building block in implementing machine learning
algorithms from scratch.
All the algorithms in machine learning rely on minimizing or maximizing a function, which we call the "objective function". The group of functions that are minimized are called "loss functions". A loss function is a measure of how well a prediction model is able to predict the expected outcome. The most commonly used method of finding the minimum point of a function is "gradient descent". Think of the loss function as an undulating mountain, and gradient descent as sliding down the mountain to reach its lowest point.
Loss functions can be broadly categorized into two types: Classification and Regression losses.
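To make the "sliding down the mountain" picture concrete, here is a minimal sketch (not from the source) that minimizes a squared-error loss for a one-parameter model y = w * x by repeatedly stepping against the gradient:

# minimal gradient descent on a mean squared error loss for y = w * x (illustrative only)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]        # roughly y = 2x, so w should end up near 2

w = 0.0                          # initial guess
learning_rate = 0.01

for step in range(1000):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad    # slide a little way down the slope

print(round(w, 2))               # close to 2.0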
Graphically:
As you can see in this scatter graph, the red dots are the actual values and the blue line is the set of predicted values drawn by our model. Here X represents the distance between an actual value and the predicted line; this distance represents the error. Similarly, we can draw straight lines from each red dot to the blue line. Squaring all of those distances, taking their mean, and finally taking the square root gives us the RMSE of our model.
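In symbols, for n test points with actual values y_i and predictions y_i', this is RMSE = sqrt( (1/n) * sum over i of (y_i - y_i')^2 ), i.e., the square root of the mean squared error.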
Example 1 (RMSE)
Let us write Python code to find the RMSE value of our model. We will predict the brain weight of users from their head size, using linear regression to train the model; the dataset used in this code is headbrain6.csv.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
"""
here the directory of the code and the headbrain6.csv file is the same; make sure both files are stored in
the same folder or directory
"""
data = pd.read_csv('headbrain6.csv')
data.head()
x = data.iloc[:, 2:3].values   # head size (feature)
y = data.iloc[:, 3:4].values   # brain weight (target)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/4, random_state=0)
regressor = LinearRegression()
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test)
# visualize the fit on the training data
plt.scatter(x_train, y_train, c='red')
plt.show()
# plot the predicted line against the test data
plt.plot(x_test, y_pred)
plt.scatter(x_test, y_test, c='red')
plt.xlabel('headsize')
plt.ylabel('brain weight')
plt.show()
# compute the error metrics on the test set
rss = ((y_test - y_pred) ** 2).sum()     # residual sum of squares
mse = np.mean((y_test - y_pred) ** 2)    # mean squared error
rmse = np.sqrt(mse)                      # root mean squared error
print(rmse)
Output
The RMSE value of our model comes out to be approximately 73, which is not bad; for this problem, a good model should have an RMSE value of less than 180. If you get a higher RMSE value, you probably need to change your features or tweak your hyperparameters.
Below is a plot of an MSE function where the true target value is 100 and the predicted values range between -10,000 and 10,000. The MSE loss (Y-axis) reaches its minimum value at prediction (X-axis) = 100. Its range is 0 to ∞.
MSE is sensitive to outliers, and given several examples with the same input feature values, the optimal prediction will be their mean target value. This should be compared with Mean Absolute Error, where the optimal prediction is the median. MSE is thus good to use if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it is important to penalize outliers heavily.
Use MSE when doing regression if you believe that your target, conditioned on the input, is normally distributed and you want large errors to be penalized significantly (quadratically) more than small ones.
Example-1: You want to predict future house prices. The price is a continuous value, and therefore we
want to do regression. MSE can here be used as the loss function.
Example-2: Consider the given data points: (1,1), (2,1), (3,2), (4,2), (5,4).
You can use an online calculator to find the regression equation/line; the predicted values Yi are shown below.
x    y    Yi
1    1    0.6
2    1    1.29
3    2    1.99
4    2    2.69
5    4    3.4
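As a quick hand check of the value the code below produces: the squared errors are (1 - 0.6)^2 + (1 - 1.29)^2 + (2 - 1.99)^2 + (2 - 2.69)^2 + (4 - 3.4)^2 = 0.16 + 0.0841 + 0.0001 + 0.4761 + 0.36 = 1.0803, and dividing by the 5 points gives an MSE of about 0.21606.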
# Given values
Y_true = [1, 1, 2, 2, 4]                 # Y_true = Y (original values)
# calculated values
Y_pred = [0.6, 1.29, 1.99, 2.69, 3.4]    # Y_pred = Y' (predicted values)
# calculate the Mean Squared Error
from sklearn.metrics import mean_squared_error
print(mean_squared_error(Y_true, Y_pred))

Output: 0.21606
Unit 2: Model Life Cycle
Title: Model Life Cycle
Approach: Hands-on, Team Discussion, Web search, Case studies
Summary: The machine learning life cycle is the cyclical process that AI or machine learning projects
follow. It defines each step that an engineer or developer should follow. Generally, every AI project
lifecycle encompasses three main stages: project scoping, design or build phase, and deployment
in production. In this unit we will go over each of them and the key steps and factors to consider
when implementing them.
Students are expected to focus mainly on the hands-on aspects of AI projects.
Objectives:
1. Students should develop their capstone project using AI project cycle methodologies
2. Students should be comfortable breaking down their projects into the different phases of the AI project cycle
3. Students should be in a position to choose and apply the right AI model to solve the problem
Learning Outcomes:
1. Students will demonstrate the skill of breaking down a problem into smaller sub-units according to AI project life cycle methodologies
2. Students will demonstrate proficiency in choosing and applying the correct AI or ML model
Generally, every AI project lifecycle encompasses three main stages: project scoping, design or build
phase, and deployment in production. Let's go over each of them and the key steps and factors to consider
when implementing them.
(Source : https://blog.dataiku.com/ai-projects-lifecycle-key-steps-and-considerations)
Step 1: Project Scoping
The first fundamental step when starting an AI initiative is scoping and selecting the relevant use case(s) that the AI model will be built to address. This is arguably the most important part of your AI project. Why?
There's a couple of reasons for it. First, this stage involves the planning and motivational aspects of your
project. It is important to start strong if you want your artificial intelligence project to be successful. There's
a great phrase that characterizes this project stage: garbage in, garbage out. This means if the data you
collect is no good, you won't be able to build an effective AI algorithm, and your whole project will collapse.
In this phase, it's crucial to precisely define the strategic business objectives and desired outcomes of the
project, select align all the different stakeholders' expectations, anticipate the key resources and steps,
and define the success metrics. Selecting the AI or machine learning use cases and being able to evaluate
the return on investment (ROI) is critical to the success of any data project.
Step 2: Design or Build
Once the relevant projects have been selected and properly scoped, the next step of the machine learning lifecycle is the Design or Build phase, which can take from a few days to multiple months, depending on the nature of the project. The Design phase is essentially an iterative process comprising all the steps
relevant to building the AI or machine learning model: data acquisition, exploration, preparation, cleaning,
feature engineering, testing and running a set of models to try to predict behaviours or discover insights
in the data.
Enabling all the different people involved in the AI project to have the appropriate access to data, tools,
and processes in order to collaborate across different stages of the model building is critical to its success.
Another key success factor to consider is model validation: how will you determine, measure, and evaluate
the performance of each iteration with regards to the defined ROI objective?
During this phase, you need to evaluate the various AI development platforms, e.g.:
Open languages — Python is the most popular, with R and Scala also in the mix.
Approaches and techniques — Classic ML techniques from regression all the way to state-of-the-
art GANs and RL
Development tools — DataRobot, H2O, Watson Studio, Azure ML Studio, Sagemaker, Anaconda,
etc.
Different AI development platforms offer extensive documentation to help the development teams.
Depending on your choice of the AI platform, you need to visit the appropriate webpages for this
documentation, which are as follows:
BigML;
Step 3: Testing
While the fundamental testing concepts are fully applicable in AI development projects, there are
additional considerations too. These are as follows:
Human biases in selecting test data can adversely impact the testing phase; therefore, data validation is important.
Your testing team should test the AI and ML algorithms keeping model validation, successful learnability, and algorithm effectiveness in mind.
Regulatory compliance testing and security testing are important since the system might deal with sensitive data; moreover, the large volume of data makes performance testing crucial.
You are implementing an AI solution that will need to use data from your other systems; therefore, systems integration testing assumes importance.
Test data should include all relevant subsets of training data, i.e., the data you will use for training
the AI system.
Your team must create test suites that help you validate your ML models.
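As an illustration of what one such automated check might look like, here is a minimal sketch (not from the source; the synthetic dataset, model, and quality threshold are placeholder assumptions):

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

def test_model_meets_quality_threshold():
    # placeholder data standing in for the project's real training and test data
    X, y = make_regression(n_samples=200, n_features=5, n_informative=5, noise=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    # fail the suite if the model is not clearly better than a naive baseline
    assert mae < y_test.std(), f"MAE too high: {mae:.2f}"

test_model_meets_quality_threshold()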
Unit 3: Storytelling
Refreshing what we learnt in Level 1, we will be re-visiting some concepts of storytelling.
Why is storytelling so powerful and cross-cultural, and what does this mean for data storytelling?
Stories create engaging experiences that transport the audience to another space and time. They
establish a sense of community belongingness and identity. For these reasons, storytelling is
considered a powerful element that enhances global networking by increasing the awareness
about the cultural differences and enhancing cross-cultural understanding. Storytelling is an
integral part of indigenous cultures.
Some of the factors that make storytelling powerful are its ability to make information more compelling, to present a window through which to take a peek at the past, and finally to draw lessons and reimagine the future by effecting necessary changes. Storytelling also shapes, empowers and connects people by doing away with judgement or criticism, and facilitates openness to embracing differences.
A well-told story is an inspirational narrative that is crafted to engage the audience across boundaries and cultures, as it has an impact that isn't possible with data alone. Data can be persuasive, but stories are much more: they change the way that we interact with data, transforming it from a dry collection of "facts" into something that can be entertaining, engaging, thought-provoking, and capable of inspiring change.
Each data point holds some information which may be unclear and contextually deficient on its own. Visualizations of such data are therefore subject to interpretation (and misinterpretation). However, stories are more likely to drive action than statistics and numbers are. Therefore, when data is told in the form of a narrative, it reduces ambiguity, connects data with context, and describes a specific interpretation, communicating the important messages in the most effective way. The steps involved in telling an effective data story are given below:
Understanding the audience
Choosing the right data and visualisations
Drawing attention to key information
Developing a narrative
Engaging your audience
Activity
A new teacher joined the ABC Higher Secondary School, Ambapalli to teach Science to the
students of Class XI. In his first class itself, he could make out that not everyone understood what
was being taught in class. So, he decided to take a poll to assess the level of students. The following
graph shows the level of interest of the students in the class.
(Pie chart of the first poll results: 40%, 25%, 19%, 11% and 5% of the class at different levels of interest; the category labels appear only in the original chart.)
Depending on the result obtained, he changed his method of teaching. After a month, he repeated
the same poll once again to ascertain if there was any change. The results of poll are shown in the
chart below.
(Pie chart of the repeated poll: 38%, 30%, 14%, 12% and 6% of the class at different levels of interest; the category labels appear only in the original chart.)
With the help of the information provided, create a good data story setting a strong narrative around the data, making it easier to understand the pre and post data, the existing problem, the action taken by the teacher, and the resolution of the problem. Distribute A4 sheets and pens to the students for this activity.
Purpose: To provide insight into data storytelling and how it can bring a story to life.
Say: "Now that you have understood what storytelling is and why it is needed, let us learn about storytelling of a different kind: the art of data storytelling, presenting data in the form of a narrative or story."
Session Preparation
Logistics: For a class of ____ students. [Group Activity]
Materials Required
ITEM QUANTITY
A4 sheets Xx
Pens Xx
Data storytelling is a structured approach for communicating insights drawn from data, and invariably involves a combination of three key elements: data, visuals, and narrative. When the narrative accompanies the data, it helps explain to the audience what is happening in the data and why a particular insight has been generated. When visuals are applied to data, they can enlighten the audience to insights that they wouldn't perceive without the charts or graphs.
Finally, when narrative and visuals are merged together, they can engage or even entertain an
audience. When you combine the right visuals and narrative with the right data, you have a data
story that can influence and drive change.
Presenting the data as a series of disjointed charts and graphs could result in the audience struggling to understand it, or worse, coming to the wrong conclusions entirely. Thus, the importance of a narrative comes from the fact that it explains what is going on within the data set. It offers context and meaning, relevance and clarity. A narrative shows the audience where to look and what not to miss, and also keeps the audience engaged.
Good stories don't just emerge from data itself; they need to be unravelled from data relationships. Closer scrutiny helps uncover how each data point relates to the others. Some easy steps that can assist in finding compelling stories in data sets are as follows:
Step 1: Get the data and organise it.
Step 2: Visualize the data.
Step 3: Examine data relationships.
Step 4: Create a simple narrative embedded with conflict.
Activity: Try creating a data story with the information given below and use your imagination to
reason as to why some cases have spiked while others have seen a fall.
Narrative is an effective tool to transmit human experience; it is the way we simplify and make sense of a complex world. It supplies context, insight and interpretation: all the things that make data meaningful, more relevant and interesting.
No matter how impressive an analysis, or how high-quality the data, it is not going to
compel change unless the people involved understand what is explained through a story.
Stories that incorporate data and analytics are more convincing than those based entirely
on anecdotes or personal experience.
It helps to standardize communications and spread results.
It makes information memorable and easier to retain in the long run.
Data Story elements challenge –
Identify the elements that make a compelling data story and name them
_____________________
______________________
_____________________
APPENDIX
Additional Resource for Advanced learners
The objective of this additional AI programming resource for Class 12 is to increase student
knowledge and exposure to programming and help them create AI projects.
The Resources are divided into two categories:
Links:
Beginner - https://bit.ly/33spBZq
Advanced - https://bit.ly/3b9US7V
Note: Please use the Google Colab links for easy reference