ITAE002


Question 1

A multiple regression model uses a ______ surface, such as a ______, to approximate the relationship
between a continuous response (target) variable and a set of predictor variables.

O Linear, plane or hyperplane

O Non-linear, parabola or hyperbola

O Both (a) and (b)

O None of these

Question 2

In which phase of CRISP-DM is the report generated?

1. Data understanding phase
2. Modelling phase
3. Evaluation phase
4. Deployment phase

Question 3

Which of the following methods is least sensitive to the presence of outliers?

O Standard deviation

O Mean absolute deviation

O Z-score

O Interquartile range
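
As an illustration of how some of the measures listed above react to an extreme value, here is a minimal sketch using only the Python standard library. The data values and the helper names `mean_absolute_deviation` and `interquartile_range` are hypothetical, chosen for this example only.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = statistics.fmean(values)
    return statistics.fmean(abs(v - m) for v in values)


def interquartile_range(values):
    """Q3 - Q1, using the default method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical data: the second list replaces the largest value with an outlier.
clean = [10, 12, 13, 14, 15, 16, 18]
with_outlier = clean[:-1] + [180]

for name, data in (("clean", clean), ("with outlier", with_outlier)):
    print(
        name,
        "stdev:", round(statistics.stdev(data), 2),
        "MAD:", round(mean_absolute_deviation(data), 2),
        "IQR:", interquartile_range(data),
    )
```

In this sketch the interquartile range depends only on the middle half of the sorted data, so changing the largest value leaves it unchanged, while the other two measures grow.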

Question 4

Sensitivity measures the ability of the model to classify a record ___________, while specificity
measures the ability to classify a record __________.

O Negatively, Positively

O Positively, Negatively

O Positively, Positively

O Negatively, Negatively
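
For reference, the two measures can be computed from the counts of a confusion matrix. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the counts are hypothetical.

```python
def sensitivity(tp, fn):
    """Proportion of actual positives that the model classifies as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of actual negatives that the model classifies as negative."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 40, 10, 45, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print("sensitivity:", sens)
print("specificity:", spec)
```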

Question 5

For a continuous variable, two-sample t-tests are generally used to test for

O Difference in means

O Difference in proportions

O Homogeneity of proportions

O None of these

Question 6

It should be stressed that model evaluation techniques should be performed on the _______ data set,
rather than on the training set or on the data set as a whole.

O Test

O Verification

O Training

O None of these

Question 7

_______ is also a good estimate of the overall variance, but only on the condition that the null
hypothesis is true.

1. MSE
2. MSR
3. RMSE
4. None of these

Question 8

Suppose there are 4 variables, each of which can take 2 values, and the data set contains 18 entries.
How many duplicate records must be present in the data set?

O 3

O 4

O 2
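
The counting argument behind this question can be sketched as follows. This assumes that a "duplicate" means a record repeating a combination of values that already appears in the data set, and that the question asks for the minimum number guaranteed by the pigeonhole principle; the variable names are chosen for this example.

```python
n_variables = 4
values_per_variable = 2
n_records = 18

# Each record is one of values_per_variable ** n_variables possible combinations.
distinct_combinations = values_per_variable ** n_variables

# Any records beyond the number of distinct combinations must repeat one of them.
min_duplicates = max(0, n_records - distinct_combinations)

print("distinct combinations:", distinct_combinations)
print("minimum duplicate records:", min_duplicates)
```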

Question 9

In the _____ task, analysts try to find ways to describe patterns and trends lying within the data.

1. Estimation
2. Prediction
3. Description
4. Classification

Question 10

When x and y are ______, as the value of x increases, the value of y tends to decrease.

O positively correlated

O negatively correlated

O uncorrelated

O All of these
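
To make the direction of a relationship concrete, the following sketch computes the Pearson correlation coefficient from its textbook definition. The function name `pearson_r` and both data sets are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y_down falls as x rises, y_up rises with x.
x = [1, 2, 3, 4, 5]
y_down = [10, 8, 7, 4, 2]
y_up = [1, 3, 4, 7, 9]

r_down = pearson_r(x, y_down)
r_up = pearson_r(x, y_up)
print("r when y falls as x rises:", round(r_down, 3))
print("r when y rises with x:", round(r_up, 3))
```

The sign of r indicates the direction of the linear relationship, and its magnitude (between 0 and 1) indicates the strength.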

Question 11

Consider the following statements and find the correct sequence of execution.
1. Build a data mining model using the training set data.
2. Partition the available data into a training set and a test set. Validate the partition.
3. Evaluate the data mining model using the test set data.

O 1,2,3

O 2,1,3

O 1,3,2

O 2,3,1

Question 12

Generally, the F-test is used to assess the significance of the regression model. The F-test considers the
______ relationship between the target variable y and the set of predictors taken as a whole, rather
than any individual predictor.

O Linear

O Non-linear

O Both (a) and (b)

O None of these

Question 13

For a multinomial variable, the chi-square test is generally used to test for

O Difference in means

O Difference in proportions

O Homogeneity of proportions

O None of these

Question 14

What are the values of the coefficient of determination and SSE, respectively, for the perfect-fit case (SSR = SST)?

O 0, 1

O 1, 1

O 1, 0

O 1, -1
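
The quantities in this question are linked by the identities SST = SSR + SSE and r² = SSR / SST, which hold for a least-squares regression with an intercept. The sketch below evaluates them for hypothetical data whose fitted values equal the observed values; the function name `regression_sums` is an assumption for this example.

```python
def regression_sums(y, y_hat):
    """Return (SST, SSR, SSE), taking SSR = SST - SSE.

    That identity is assumed here; it holds for least-squares fits
    that include an intercept.
    """
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sst, sst - sse, sse


# Hypothetical perfect fit: every fitted value equals the observed value.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = list(y)

sst, ssr, sse = regression_sums(y, y_hat)
r_squared = ssr / sst
print("SST:", sst, "SSR:", ssr, "SSE:", sse)
print("coefficient of determination:", r_squared)
```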

Question 15

Most real-world data are right-skewed. For such data, the skewness is ______.

O Positive

O Negative

O Zero

O None of these

Question 16

Data mining is the process of

O Gathering data

O Plotting data

O Finding useful patterns and trends in large data sets

O Filtering data

Question 17

Extrapolation refers to estimates and predictions of the target variable made using the regression
equation with values of the predictor variable outside the range of the values of ______ in the data
set.

O x

O y

O Both (a) and (b)

O None of these

Question 18

Generally, as the complexity of a model increases, it performs well on the training set but may result
in ___________ on the test data.

O Overfitting

O Underfitting

O Perfectly well

O None of these

Question 19

ANOVA for a continuous variable is an extension of the two-sample t-test. If we have a three-fold
partition of the data set, ANOVA tests whether the _________ value of the continuous variable is the
same across the subsets of data.

O Mean

O Variance

O error

O None of these

Question 20

A 95% confidence interval for the mean number of customer service calls for all customers indicates:

O We are 95% confident that the population mean number of customer service calls for all customers
falls between some range.

O We are 95% confident that the sample mean number of customer service calls for all customers falls
between some range.

O We are 5% confident that the population mean number of customer service calls for all customers falls
between some range.

O None of these

Question 21

_______ will treat all errors equally, whether outliers or not, and thereby avoid the problem of undue
influence of outliers

O MAE (mean absolute error)

O MSE (mean square error)

O RMSE

O None of these
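
To see how the error measures listed above respond to a single large error, the following sketch compares the mean absolute error and the mean squared error on two hypothetical sets of prediction errors. The helper names `mae` and `mse` are chosen for this example.

```python
def mae(errors):
    """Mean absolute error: every error counts in proportion to its size."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error: squaring gives large errors extra weight."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical prediction errors; the second list contains one outlier.
errors_typical = [1, -1, 2, -2]
errors_outlier = [1, -1, 2, -20]

mae_ratio = mae(errors_outlier) / mae(errors_typical)
mse_ratio = mse(errors_outlier) / mse(errors_typical)
print("MAE grows by a factor of", mae_ratio)
print("MSE grows by a factor of", mse_ratio)
```

In this example the single outlier inflates the squared-error measure far more than the absolute-error measure.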

Question 22

If the false positive cost increases, then specificity should

O decrease

O increase

O not be changed

O none of these

Question 23

Predictive analytics is the process of

O Just cleaning data


O Just compressing data

O Guessing about present output without any data

O Information retrieval to make useful predictions about future outcomes

Question 24

According to CRISP-DM, how many phases are there in the life cycle of a data mining project?

O Five

O Four

O Seven

O Six

Question 25

In a regression model, changing the order in which the variables are entered into the model changes
nothing except the

O Sequential sum of squares

O Sum of squares

O Difference of squares

O None of these

Question 26

The factor solutions provided by factor analysis are not invariant to

O Transformations

O Scaling

O None of these

Question 27

In general, a user-defined composite is simply a ______ combination of the variables, which combines
several variables together into a single composite measure.

O Homogenous

O Superposition

O Linear

O Non-linear

Question 28

For a flag variable, two-sample Z-tests are generally used to test for

O Difference in means

O Difference in proportions

O Homogeneity of proportions

O None of these

Question 29

In which phase is the performance of the selected models tested?

O Deployment

O Evaluation

O Modelling

O Data preparation

Question 30

In general, ______ the null hypothesis if the p-value is less than the level of significance α (a small
preset value, say 0.05).

O Accept

O Reject

O More tests required

O None of these
