ITAE002

Question 1
A multiple regression model uses a ______ surface, such as a ______, to approximate the relationship
between a continuous response (target) variable and a set of predictor variables.
O Linear, plane or hyperplane
O Non-linear, parabola or hyperbola
O Both (a) and (b)
O None of these
Question 2
In which phase of CRISP-DM is the report generated?
1. Data understanding phase
2. Modelling phase
3. Evaluation phase
4. Deployment phase
Question 3
Which of the following methods is least sensitive to the presence of outliers?
O Standard deviation
O Mean absolute deviation
O Z-score
O Interquartile range
Question 4
Sensitivity measures the ability of the model to classify a record ___________, while specificity
measures the ability to classify a record __________.
O Negatively, Positively
O Positively, Negatively
O Positively, Positively
O Negatively, Negatively
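As a reminder of the standard definitions, the sketch below computes sensitivity and specificity from confusion-matrix counts. The function name and the counts are made up for illustration; they do not come from the question paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity: true positives over all actual positives.
    sensitivity = tp / (tp + fn)
    # Specificity: true negatives over all actual negatives.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, chosen only for illustration.
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(sens, spec)
```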
Question 5
For a continuous variable, two-sample t-tests are generally used for
O Difference in means
O Difference in proportions
O Homogeneity of proportions
O None of these
Question 6
It is to be stressed that model evaluation techniques should be performed on the _______ test data set,
rather than on the training set, or on the data set as a whole.
O Test
O Verification
O Training
O None of these
Question 7
_______ is also a good estimate of the overall variance, but only on the condition that the null
hypothesis is true.
1. MSE
2. MSR
3. RMSE
4. None of these
Question 8
Consider a data set with 4 variables, each of which can take 2 values, and 18 entries. How many
duplicate records may be present in the data set?
O 03
O 4
O 2
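The counting argument behind this kind of question can be sketched as follows: enumerate the distinct value combinations, then see how many of the records cannot be unique. The variable names are illustrative, and the result is the minimum number of records that must repeat an earlier one (more repeats are possible if fewer distinct combinations actually occur).

```python
from itertools import product

n_variables = 4
values_per_variable = 2
n_records = 18

# Every distinct record is one combination of variable values.
distinct = len(list(product(range(values_per_variable), repeat=n_variables)))

# Pigeonhole principle: records beyond the number of distinct
# combinations must repeat an earlier record.
min_duplicates = max(0, n_records - distinct)
print(distinct, min_duplicates)
```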
Question 9
In the _____ task, analysts try to find ways to describe patterns and trends lying within the data.
1. Estimation
2. Prediction
3. Description
4. Classification
Question 10
When x and y are ______, as the value of x increases, the value of y tends to decrease.
O positively correlated
O negatively correlated
O uncorrelated
O All of these
Question 11
Consider the following statements and find the correct sequence of execution. 1. Build a data mining
model using the training set data. 2. Partition the available data into a training set and a test set.
Validate the partition. 3. Evaluate the data mining model using the test set data.
O 1,2,3
O 2,1,3
O 1,3,2
O 2,3,1
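The three activities named in the statements can be sketched in plain Python. This is a minimal illustration on synthetic data: the 80/20 split, the simple one-predictor least-squares model, and all variable names are assumptions, and the partition-validation step is omitted for brevity.

```python
import random

random.seed(0)

# Synthetic data: y is roughly 2x + 1 with some noise (illustrative only).
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.5)) for x in range(50)]

# Partition the available data into a training set and a test set.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Build the model (simple least-squares line) using the training set only.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

# Evaluate the model on the held-out test set.
test_mse = sum((y - (intercept + slope * x)) ** 2 for x, y in test) / len(test)
print(len(train), len(test), round(slope, 2), round(test_mse, 3))
```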
Question 12
Generally, the F-test is used to assess the significance of the regression model. It considers the
______ relationship between the target variable y and the set of predictors taken as a whole, not
each predictor individually.
O Linear
O Non-linear
O Both (a) and (b)
O None of these
Question 13
For a multinomial variable, generally the test is used for
O None of these
Question 14
What are the values of coefficient of determination and SSE respectively for perfect fit case (SSR=SST)?
O 0, 1
O 1, 1
O 1, 0
O 1, -1
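The quantities in this question are related by SST = SSR + SSE and r² = SSR / SST. The sketch below evaluates them for a small made-up data set in which the fitted values equal the observed values exactly; the numbers are illustrative only.

```python
# Hypothetical observations and fitted values for a perfect fit.
y = [1.0, 3.0, 5.0, 7.0]
y_hat = [1.0, 3.0, 5.0, 7.0]

mean_y = sum(y) / len(y)

sst = sum((yi - mean_y) ** 2 for yi in y)             # total sum of squares
ssr = sum((fi - mean_y) ** 2 for fi in y_hat)         # regression sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) # error sum of squares

r_squared = ssr / sst
print(sst, ssr, sse, r_squared)
```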
Question 15
For most real-world data, the distribution is right-skewed, so the skewness is ______.
O Positive
O Negative
O Zero
O None of these
Question 16
Data mining is the process of
O Gathering data
O Plotting data
O Finding useful patterns and trends in large data sets
O Filtering data
Question 17
Extrapolation refers to estimates and predictions of the target variable made using the regression
equation with values of the predictor variable outside the range of the values of ______ in the data
set.
O x
O y
O Both (a) and (b)
O None of these
Question 18
Generally, as the complexity of a model increases, it performs well on the training set but may result
in ___________ on the test data.
O Overfitting
O Underfitting
O Perfectly well
O None of these
Question 19
In ANOVA for a continuous variable, an extension of the two-sample t-test, given a three-fold partition
of the data set, the test assesses whether the _________ value of the continuous variable is the same
across the subsets of data.
O Mean
O Variance
O Error
O None of these
Question 20
A 95% confidence interval for the mean number of customer service calls for all customers indicates:
O We are 95% confident that the population mean number of customer service calls for all customers
falls between some range.
O We are 95% confident that the sample mean number of customer service calls for all customers falls
between some range.
O We are 5% confident that the population mean number of customer service calls for all customers
falls between some range.
O None of these
Question 21
_______ will treat all errors equally, whether outliers or not, and thereby avoid the problem of undue
influence of outliers.
O MAE (mean absolute error)
O MSE (mean square error)
O RMSE
O None of these
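To see how differently the listed error measures weight a single large error, the sketch below computes each one on a made-up list of residuals containing one outlier. The residual values are assumptions for illustration only.

```python
import math

# Hypothetical residuals (actual minus predicted); the last one is an outlier.
errors = [1.0, -1.0, 2.0, -2.0, 10.0]
n = len(errors)

mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e ** 2 for e in errors) / n   # mean square error
rmse = math.sqrt(mse)                   # root mean square error

# Share of each total that comes from the single outlier.
outlier_share_abs = abs(errors[-1]) / sum(abs(e) for e in errors)
outlier_share_sq = errors[-1] ** 2 / sum(e ** 2 for e in errors)

print(mae, mse, rmse)
print(outlier_share_abs, outlier_share_sq)
```

Squaring magnifies the outlier's contribution to the total, while the absolute value weights every error in proportion to its size.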
Question 22
If the false positive cost increases, then specificity should
O decrease
O increase
O not be changed
O none of these
Question 23
Predictive analytics is the process of
O Just cleaning data
O Just compressing data
O Guessing about present output without any data
O Information retrieval to make useful predictions about future outcomes
Question 24
According to CRISP-DM, how many phases are there in data mining project life cycle?
O Five
O Four
O Seven
O Six
Question 25
In the regression model, changing the order in which the variables are entered into the model changes
nothing except the
O Sequential sum of squares
O Sum of squares
O Difference of squares
O None of these
Question 26
The factor solutions provided by factor analysis are not invariant to
O Transformations
O Scaling
O None of these
Question 27
In general, a user-defined composite is simply a ______ combination of the variables, which combines
several variables together into a single composite measure.
O Homogenous
O Superposition
O Linear
O Non-linear
Question 28
For a flag variable, two-sample Z-tests are generally used for
O None of these
Question 29
In which phase is the performance of the selected models tested?
O Deployment
O Evaluation
O Modelling
O Data preparation
Question 30
In general, ______ the null hypothesis if the p-value is less than the level of significance α (a small
preset value, say 0.05).
O Accept
O Reject
O More tests required
O None of these
