Advanced Analytics PGDBDA Feb20


USM’s Shriram Mantri Vidyanidhi Info Tech Academy

PG DBDA Feb 20 Advanced Analytics Question Bank


1. Which of the following is the equation for calculating the coefficient of variation (CV)?
A. CV = standard deviation/mean
B. CV = standard deviation - z-score/mean (total variation)
C. CV = value of observation's distance from mean/standard deviation
D. CV = mean/(standard deviation)²
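
As an illustration of the ratio this question asks about, here is a minimal Python sketch. It assumes the sample standard deviation is intended, and the two data series are hypothetical values chosen so that the spread is identical while the means differ.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Hypothetical data: same spread, different means.
low_mean = [8, 10, 12]
high_mean = [98, 100, 102]

cv_low = coefficient_of_variation(low_mean)
cv_high = coefficient_of_variation(high_mean)
print(f"{cv_low:.3f} {cv_high:.3f}")
```

Because the CV is relative to the mean, the series with the larger mean shows the smaller relative variation even though both have the same standard deviation.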

3. For two variables, a positive correlation coefficient indicates ________.


A. a linear relationship exists for which one variable increases as the other also increases
B. a linear relationship exists for one variable that increases while the other decreases
C. that the two variables have no linear relationship with each other
D. a nonlinear relationship with no linear correlation between the two variables

4. Which of the following types of sampling involves using random procedures to select a sample?
A. judgment sampling
B. probabilistic sampling
C. subjective sampling
D. convenience sampling

5. Which of the following sampling methods bases its selection of samples on the ease of data collection?
A. probabilistic sampling
B. judgment sampling
C. simple random sampling
D. convenience sampling

6. Which of the following describes periodic sampling?


A. It is a sampling method based solely on expert opinion.
B. It is a sampling method based on selecting a time and then sampling the products after that time.
C. It is a sampling method based on selecting every nth item from a population.
D. It is a sampling method exclusively used for population that is divided into subsets.

7. ________ sampling applies to populations that are divided into natural subsets and allocates the
appropriate proportion of samples to each subset.
A. Systematic
B. Stratified
C. Cluster
D. Continuous process
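
The sampling schemes named in questions 6 and 7 can be sketched in a few lines of Python. This is only an illustration: the function names, the population of 100 items, and the two strata are all hypothetical, and the proportional allocation uses simple rounding, so the total drawn can differ from the requested size by one in less tidy cases.

```python
import random

def systematic_sample(population, step, start=0):
    """Select every `step`-th item, beginning at index `start`."""
    return population[start::step]

def stratified_sample(strata, total, rng):
    """Draw from each stratum in proportion to its share of the population."""
    population_size = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        k = round(total * len(items) / population_size)
        sample[name] = rng.sample(items, k)
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
items = list(range(100))
every_tenth = systematic_sample(items, step=10, start=3)

# Hypothetical strata holding 60% and 40% of the population.
strata = {"north": list(range(60)), "south": list(range(60, 100))}
by_stratum = stratified_sample(strata, total=10, rng=rng)
print(every_tenth)
print({name: len(chosen) for name, chosen in by_stratum.items()})
```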
8. The Ransin Sports Company has noted that the size of individual customer orders is normally distributed
with a mean of $112 and a standard deviation of $9. Which of the following is the answer for the
probability that the next individual who buys a product will make a purchase of more than $116?
A. 71%
B. 48%
C. 33%
D. 42%
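
One way to check a tail probability like this is to standardize the value and read the upper-tail area of the normal curve. The sketch below uses Python's standard-library `statistics.NormalDist` with the mean and standard deviation given in the question; the variable names are illustrative only.

```python
from statistics import NormalDist

orders = NormalDist(mu=112, sigma=9)   # order size in dollars
z = (116 - 112) / 9                    # standard score of $116
p_more_than_116 = 1 - orders.cdf(116)  # area to the right of $116
print(f"z = {z:.2f}, P(X > 116) = {p_more_than_116:.3f}")
```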

9. Which of the following is a difference between the t-distribution and the standard normal distribution?
A. The t-distribution cannot be calculated without a known standard deviation, while the standard normal
distribution can be.
B. The standard normal distribution's confidence levels are wider than those of the t-distribution.
C. The t-distribution has a larger variance than the standard normal distribution.
D. The standard normal distribution is dependent on parameters like degrees of freedom, while the
t-distribution is not.

10. Troista Mobile Accessories sells mobile apps on their Web site. If a customer spends on average, $12 per
visit and visits the Web site 20 times each year, what is the average nondiscounted gross profit during a
customer's lifetime? Given that Troista makes a margin of 60 percent on the average bill, with 25 percent
of customers not returning each year.
A. $30
B. $75
C. $360
D. $576
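
A hedged sketch of the arithmetic, assuming the common simplified customer-lifetime-value model in which the expected customer lifetime is 1 / (defection rate) and no discounting is applied, so that value = revenue per visit × visits per year × margin / defection rate. The variable names are illustrative.

```python
revenue_per_visit = 12    # dollars spent per visit
visits_per_year = 20
margin = 0.60             # gross profit margin on the average bill
defection_rate = 0.25     # share of customers not returning each year

# Assumption: expected customer lifetime in years is 1 / defection_rate.
expected_lifetime_years = 1 / defection_rate
annual_gross_profit = revenue_per_visit * visits_per_year * margin
lifetime_value = annual_gross_profit * expected_lifetime_years
print(f"annual gross profit = {annual_gross_profit:.0f}, lifetime value = {lifetime_value:.0f}")
```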

11. Use the table below to answer the following question(s).

Below is the profit model spreadsheet for the Lazarus Shoe Company producing their latest model of shoes
for the month of January.

Profit Model for Lazarus Shoe Company for January
(All costs in $)

Unit Price                   47
Unit Cost                    22
Fixed Cost for Production    350,000
Demand                       40,000

Model
Unit Price                   47
Quantity Sold                38,000
Revenue
Unit Cost                    22
Quantity Produced            38,000
Variable Cost
Fixed Cost                   350,000
Profit

19. Calculate the revenue for units sold.


A. $836,000
B. $1,136,000
C. $600,000
D. $1,786,000

20. Calculate the variable cost of production.


A. $1,786,000
B. $836,000
C. $600,000
D. $1,436,000

21. Calculate the total profit.


A. $600,000
B. $1,436,000
C. $836,000
D. $1,786,000
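
The three calculations in questions 19 to 21 follow the usual profit-model chain: revenue = unit price × quantity sold, variable cost = unit cost × quantity produced, and profit = revenue − variable cost − fixed cost. The sketch below uses the figures from the Lazarus Shoe table, taking the fixed cost from the "Fixed Cost for Production" row. Treating quantity sold as the smaller of demand and quantity produced is an assumption of this sketch; it agrees with the 38,000 shown in the model.

```python
unit_price = 47
unit_cost = 22
fixed_cost = 350_000          # "Fixed Cost for Production" row of the table
demand = 40_000
quantity_produced = 38_000

# Assumption: you cannot sell more than you produce or more than is demanded.
quantity_sold = min(demand, quantity_produced)

revenue = unit_price * quantity_sold
variable_cost = unit_cost * quantity_produced
profit = revenue - variable_cost - fixed_cost
print(f"revenue={revenue:,} variable_cost={variable_cost:,} profit={profit:,}")
```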

22. When a model has a unique optimal solution, it means that ________.
A. the objective is maximized or minimized by more than one combination of decision variables
B. there is no solution that simultaneously satisfies all the constraints
C. the Allowable Increase or Allowable Decrease values for changing cells are zero
D. there is exactly one solution that will result in the maximum or minimum objective

Sampling
1. Estimation is possible only in case of a:
(a) Parameter (b) Sample (c) Random sample (d) Population

2. Estimation is of two types:


(a) One sided and two sided (b) Type I and type II
(c) Point estimation and interval estimation (d) Biased and unbiased

3. A formula or rule used for estimating the parameter is called:


(a) Estimation (b) Estimate (c) Estimator (d) Interval estimate

4. A value of an estimator is called:


(a) Estimation (b) Estimate (c) Variable (d) Constant

5. Estimate and estimator are:


(a) Same (b) Different (c) Maximum (d) Minimum

6. The types of estimates are:


(a) Point estimate (b) Interval estimates
(c) Estimation of confidence region (d) All of the above

7. Estimate is the observed value of an:


(a) Unbiased estimator (b) Estimator
(c) Estimation (d) Interval estimation

8. The process of using sample data to estimate the values of unknown population parameter is called:
(a) Estimate (b) Estimator (c) Estimation (d) Interval estimation

9. The process of making estimates about the population parameter from a sample is called:
(a) Statistical independence (b) Statistical inference
(c) Statistical hypothesis (d) Statistical decision

10. Statistical inference has two branches namely:


A. Level of confidence and degrees of freedom
B. Biased estimator and unbiased estimator
C. Point estimator and unbiased estimator
D. Estimation of parameter and testing of hypothesis

11. A specific value calculated from sample is called:

(a) Estimator (b) Estimate (c) Estimation (d) Bias

12. An estimator is a random variable because it varies from:


(a) Population to sample (b) Population to population
(c) Sample to sample (d) Sample to population

13. Statistic is an estimator and its calculated value is called:


(a) Biased estimate (b) Estimation (c) Estimator (d) Interval estimate

14. The numerical value which we determine from the sample for population parameter is called:
(a) Estimation (b) Estimate (c) Estimator (d) Confidence coefficient

15 A single value used to estimate a population value is called:


(a) Interval estimate (b) Point estimate
(c) Level of confidence (d) Degrees of freedom
16 An interval calculated from the sample data that is likely to contain the value of the parameter with some
probability is called:
(a) Interval estimate (b) Point estimate
(c) Confidence interval (d) Level of confidence

17 A range of values within which the population parameter is expected to occur is called:
(a) Confidence coefficient (b) Confidence interval
(c) Confidence limits (d) Level of significance

18 Interval estimate is determined in terms of:


(a) Sampling error (b) Error of estimation
(c) Confidence coefficient (d) Degrees of freedom

19 The level of confidence is denoted by:


(a) α (b) β (c) 1 - α (d) 1 - β

20 The end points of a confidence interval are called:


(a) Confidence coefficient (b) Confidence limits
(c) Error of estimation (d) Parameters

21 The probability associated with confidence interval is called:


(a) Level of confidence (b) Confidence coefficient
(c) Both (a) and (b) (d) Confidence limits

22 If the mean of the estimator is not equal to the population parameter, the estimator is said to be:
(a) Unbiased (b) Biased (c) Positively biased (d) Negatively biased

23 The difference between the expected value of an estimator and the value of the corresponding parameter
is called:
(a) Bias (b) Sampling error (c) Error of estimation (d) Standard error

24 Bias of an estimator can be:


(a) Negative (b) Positive (c) Zero (d) Both (a) or (b)

25 The confidence interval estimate for the difference of two population means in case of paired

26 Estimates given in the form of confidence intervals are called:


(a) Point estimates (b) Interval estimates
(c) Confidence limits (d) Degree of freedom
27 Interval estimate is associated with:
(a) Probability (b) Non-probability
(c) Range of values (d) Number of parameters

28 The point estimator of population mean µ is:


(a) Sample mean (b) Sample variance
(c) Sample standard deviation (d) Sample size

29 (1 – α) is called:
(a) Critical value (b) Level of significance (c) Level of confidence (d) Interval estimate

30 If (1 – α) is increased, the width of a confidence interval is:


(a) Decreased (b) Increased (c) Constant (d) Same

31 By decreasing the sample size, the confidence interval becomes:


(a) Narrower (b) Wider (c) Fixed (d) All of the above

32 A confidence interval becomes narrower by increasing the:


(a) Sample size (b) Population size (c) Level of confidence (d) Degrees of freedom

33 By increasing the sample size, the precision of confidence interval is:


(a) Increased (b) Decreased (c) Same (d) Unchanged
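
Questions 30 to 33 all turn on how the interval width responds to the confidence level and the sample size. A minimal sketch, assuming the z-based interval for a mean with known σ, whose width is 2 · z · σ / √n. The value σ = 10 and the function name `ci_width` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Width of the z-based interval for the mean: 2 * z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return 2 * z * sigma / sqrt(n)

sigma = 10  # hypothetical known population standard deviation
for n, confidence in [(25, 0.90), (25, 0.95), (100, 0.95)]:
    print(n, confidence, round(ci_width(sigma, n, confidence), 2))
```

Raising the confidence level at a fixed n widens the interval, while raising n at a fixed confidence level narrows it.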

34 A function for estimating a parameter is called:


(a) Estimator (b) Estimate (c) Estimation (d) Level of confidence

35 A sample constant representing a population parameter is known as:


(a) Estimation (b) Estimator (c) Estimate (d) Bias

36 The distance between an estimate and the estimated parameter is called:


(a) Sampling error (b) Error of estimation
(c) Bias (d) Standard error

37 Standard error is the standard deviation of the sampling distribution of an:


(a) Estimate (b) Estimation (c) Estimator (d) Error of estimation

38 ∑Xi / n for i=1,2,3,….,n is called:


(a) Estimation (b) Estimate (c) Estimator (d) Interval estimate

39 A statistic is an unbiased estimator of a parameter if:


(a) E(statistic)=parameter (b) E(mean)=variance
(c) E(variance)=mean (d) E(sample mean)=proportion

40 The following statistics are unbiased estimators:


(a) The sample mean (b) The sample variance
(c) The sample proportion (d) All the above
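
Unbiasedness, E(statistic) = parameter, can be checked exactly on a tiny example by enumerating every possible sample. The sketch below assumes sampling with replacement from a hypothetical three-value population and compares the variance formula with divisor n − 1 against the one with divisor n.

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3]   # hypothetical tiny population
n = 2                    # sample size
samples = list(product(population, repeat=n))  # all samples, with replacement

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

avg_with_n_minus_1 = mean(variance(s, n - 1) for s in samples)
avg_with_n = mean(variance(s, n) for s in samples)
true_variance = pvariance(population)
avg_sample_mean = mean(mean(s) for s in samples)

print(true_variance, avg_with_n_minus_1, avg_with_n, avg_sample_mean)
```

Averaged over all samples, the n − 1 version matches the population variance and the sample mean matches the population mean, while the divisor-n version underestimates the variance.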

41 Which of the following is a biased estimator?

42 The number of values that are free to vary after we have placed certain restrictions upon the data is
called:
(a) Degrees of freedom (b) Confidence coefficient
(c) Number of parameters (d) Number of samples

43 If the observations are paired and the number of pairs is n, then degree of freedom is equal to:
(a) n (b) n – 1 (c) n1 + n2 – 2 (d) n/2

44 In t-distribution for two independent samples n1 = n2 = n, then the degrees of freedom is equal to:
(a) 2n – 1 (b) 2n – 2 (c) 2n + 1 (d) n – 1

45 If the population standard deviation σ is unknown, and the sample size is small, i.e., n ≤ 30, the confidence
interval for the population mean µ is based on:
(a) The t-distribution (b) The normal distribution
(c) The binomial distribution (d) The hypergeometric distribution

46 The shape of the t-distribution depends upon the:


(a) Sample size (b) Population size (c) Parameters (d) Degrees of freedom

47 If the population standard deviation σ is known, the confidence interval for the population mean µ is
based on:
(a) The Poisson distribution (b) The t-distribution
(c) The χ²-distribution (d) The normal distribution

48 A statistician calculates a 95% confidence interval for µ when σ is known. The confidence interval is Rs.
18000 to Rs. 22000, the amount of the sample mean is:
(a) Rs. 18000 (b) Rs. 20000 (c) Rs. 22000 (d) Rs. 40000

49 A student calculates a 90% confidence interval for population mean when population standard deviation
σ is unknown and n = 9. The confidence interval is -24.3 cents to 64.3 cents, the sample mean is:
(a) 40 (b) -24.3 (c) 64.3 (d) 20

50 A 95% confidence interval for population proportion p is 32.4% to 47.6%, the value of sample proportion
is:
(a) 40% (b) 32.4% (c) 47.6% (d) 80%
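
Questions 48 to 50 rely on the same fact: for a symmetric interval of the form estimate ± margin, the point estimate is the midpoint of the two limits. A short sketch under that assumption (the helper name is illustrative):

```python
def centre_and_margin(lower, upper):
    """Return the point estimate (midpoint) and the margin of error (half-width)."""
    return (lower + upper) / 2, (upper - lower) / 2

intervals = {
    "Q48 (Rs.)": (18000, 22000),
    "Q49 (cents)": (-24.3, 64.3),
    "Q50 (%)": (32.4, 47.6),
}
for label, (lower, upper) in intervals.items():
    estimate, margin = centre_and_margin(lower, upper)
    print(f"{label}: estimate = {estimate:.1f}, margin = {margin:.1f}")
```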

51 A confidence interval will be widened if:


A. The confidence level is increased and the sample size is reduced
B. The confidence level is increased and the sample size is increased
C. The confidence level is decreased and the sample size is increased
D. The confidence level is decreased and the sample size is decreased

52 A 95% confidence interval for the mean of a population is such that:


A. It contains 95% of the values in the population
B. There is a 95% chance that it contains all the values in the population.
C. There is a 95% chance that it contains the mean of the population
D. There is a 95% chance that it contains the standard deviation of the population

53 If the population standard deviation σ is doubled, the width of the confidence interval for the population
mean µ (i.e., the upper limit of the confidence interval – lower limit of the confidence interval) will be:
(a) Divided by 2 (b) Multiplied by (c) Doubled (d) Decrease

54 If α = 0.10 and n = 15; equals:


(a) 1.761 (b) 1.753 (c) 1.771 (d) 2.145

55 If n1 = 16, n2 = 9 and α = 0.01; equals:


(a) 2.787 (b) 2.807 (c) 2.797 (d) 3.767

56 If 1 – α = 0.90, then value of is:


(a) 1.96 (c) 1.645 (d) 2.326

57 If the population standard deviation σ is known, then whether the sample size n is at most 30 or more than
30, the confidence interval for the population mean µ is:

58 If the population standard deviation σ is unknown and the sample size n is greater than 30, the
confidence interval for the population mean µ is:

59 If the population standard deviation σ is unknown and the sample size n is less than or equal to 30, the
confidence interval for the population mean is:

60 If we have normal populations with known population standard deviations σ1 and σ2, the confidence
interval estimate for the difference between two population means is:

61 If the population standard deviations σ1 and σ2 are unknown and sample sizes n1, n2 ≥ 30, the 100(1 –
α)% confidence interval for µ1 – µ2 is:

62 If the sample size is large, the confidence interval estimate of a population proportion p is:

63 If n1, n2 ≤ 30, the confidence interval estimate for the difference of two population means when
population standard deviation σ1, σ2 are unknown but equal in case of pooled variates is:

Inferential Stats
1) In a normal distribution, the standard score of the tenth percentile (the value that separates the
bottom 10 percent from the top 90 percent) is:
a. -2.33
b. -0.90
c. 0.10
d. -0.10
e. 0.54
f. -0.82
g. -1.28
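
A percentile's standard score is the inverse of the standard normal cumulative distribution function evaluated at that percentile. A minimal check using Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist()          # mean 0, standard deviation 1
z_10 = standard_normal.inv_cdf(0.10)    # 10% of the area lies below this z
z_90 = standard_normal.inv_cdf(0.90)    # mirror image, by symmetry
print(f"{z_10:.2f} {z_90:.2f}")
```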

2) To obtain a statistically significant result means:


a. To reject H0 when it is true
b. To get an outcome in the rejection region
c. To arrive at a correct conclusion
d. To obtain a scientifically meaningful result
e. To obtain a result that will recur in a replication

3) If we conduct a statistical test of a hypothesis, using a random sample, and the result is not significant,
what conclusion can we draw?
a. We did not manage to reject H0
b. H1 should be rejected
c. H0 should be rejected
d. H0 is true
e. H1 is true
f. We did not manage to reject H1

4) The distribution of heights of adult American men is approximately Normal with mean 69 inches
and standard deviation 2.5 inches. Between what heights do the middle 95% of men fall?
a. 66.5-71.5
b. 64-74
c. 61.5-76.5
d. 65-73

5) The mean life of a tire is 30,000 km. The standard deviation is 2000 km. Then, 68% of all tires will have
a life between __ km and __ km.
a. 28,000 km and 32,000 km.
b. 24,000 km and 34,000 km.
c. 26,000 km and 34,000 km.
d. 27,000 km and 31,000 km.

6) The normal curve is symmetrical about the mean.


a. True
b. False

7) The shelf life of a particular dairy product is normally distributed with a mean of 12 days and a
standard deviation of 3 days. About what percent of the products last between 12 and 15 days?
a. 68%
b. 34%
c. 16%
d. 2.5%
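
Questions 4, 5 and 7 apply the 68-95 rule of thumb for normal distributions. The sketch below computes the exact normal areas for the three scenarios, assuming each quantity is normally distributed with the stated mean and standard deviation; the helper name `share_between` is illustrative.

```python
from statistics import NormalDist

def share_between(mu, sigma, low, high):
    """Fraction of a Normal(mu, sigma) population lying between low and high."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

within_1sd = share_between(30_000, 2_000, 28_000, 32_000)  # tire life, mean ± 1 sd
within_2sd = share_between(69, 2.5, 64, 74)                # heights, mean ± 2 sd
mean_to_1sd = share_between(12, 3, 12, 15)                 # shelf life, mean to +1 sd

print(f"{within_1sd:.3f} {within_2sd:.3f} {mean_to_1sd:.3f}")
```

The area from the mean to one standard deviation above it is half of the area within one standard deviation, which follows from the symmetry noted in question 6.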

8) A survey will be given to 100 students randomly selected from the freshmen class at Lincoln High School.
What is the population?
a. The 100 selected students
b. All freshmen at Lincoln High School
c. All students at Lincoln High School

9) What is meant by a Type 1 error?


a. Rejecting a null hypothesis that is true
b. Retaining a null hypothesis that is false
c. Inputting your data inaccurately in a statistical test

10) You are interested in how stress affects heart rate in humans. Your dependent variable would be the
_____.
a. stress
b. heart rate
c. number of humans
d. interest

11) Which statement is true for "outliers"?


a. they should be deleted from analysis.
b. they can distort summary statistics.
c. they are mistakes made in analysis.
d. they are of little importance.

12) Chi-Square is
a. used when comparing two sets of rankings
b. used when comparing several sets of scores
c. used when comparing two sets of scores
d. used with categorical data

13) If a researcher rejects a null hypothesis, the researcher either


a. is incorrect, or made a Type I error.
b. is correct, or made a Type I error.
c. is incorrect, or made a Type II error.
d. is correct, or made a Type II error.

Feature Engineering
1. True-False: We use confidence intervals for deciding whether the population supports a specific
idea/model/hypothesis.
A. True
B. False
We use Hypothesis Testing for deciding whether the population supports a specific idea/model/hypothesis

2. Type 1 Error : False Positives :: Type 2 Error : ?


A. False Positives
B. True Positives
C. False Negatives
D. True Negatives

3. True-False: With the Chi-squared test, we can test whether a sample mean differs from an expected
(population) mean for numeric variables.
A. True
B. False
Chi-squared test is for categorical variables
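
To make the point about categorical data concrete, here is a hedged sketch of the Pearson chi-square statistic for a table of counts, Σ (observed − expected)² / expected, where each expected count is row total × column total / grand total. The 2×2 counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are two groups, columns are two categories.
observed = [[30, 20],
            [20, 30]]
stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(stat, dof)
```

The statistic is compared with a chi-square critical value for the given degrees of freedom (3.841 at the 5% level for 1 degree of freedom).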

4. In Wrapper Methods which of the following methods are used?


A. Forward Selection
B. Backward Elimination
C. Recursive Feature elimination
D. All of the above

5. Which of the following methods are used for feature selection?


A. Filter Methods
B. Wrapper Methods
C. Embedded Methods
D. All of the above

6. Which of the following are filter methods used in feature selection?


A. Pearson’s Correlation
B. LDA(Linear discriminant analysis)
C. ANOVA
D. Chi-Square
E. All of the above
F. None of the above
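
A filter method scores each feature on its own against the target and keeps the top-ranked ones. The sketch below illustrates this with Pearson's correlation in plain Python; the feature names and values are hypothetical, and ranking by absolute correlation is one common choice rather than the only one.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: two candidate features and a numeric target.
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [7, 9, 6, 9, 7],
}
target = [52, 58, 61, 70, 74]

# Score each feature independently of the others, then rank.
scores = {name: abs(pearson(col, target)) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Because every feature is scored separately, a filter like this cannot see interactions between features.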

7. True-False: Feature selection is different from dimensionality reduction


A. False
B. True

Both methods seek to reduce the number of attributes in the dataset, but a dimensionality reduction
method does so by creating new combinations of attributes, whereas feature selection methods include
and exclude attributes present in the data without changing them.

8. True-False: LASSO is one more method for feature Selection


A. True
B. False

9. Why do we use feature selection?


A. It enables the machine learning algorithm to train faster.
B. It reduces the complexity of a model and makes it easier to interpret.
C. It improves the accuracy of a model if the right subset is chosen.
D. It reduces overfitting.
E. All of the above

10. True-False: In Filter Methods, each feature is considered separately, thereby ignoring feature
dependencies, which may lead to worse classification performance when compared to other types of
feature selection techniques.
A. True
B. False
