Advanced Analytics PGDBDA Feb20
2. Which of the following is the equation for calculating the coefficient of variation (CV)?
A. CV = standard deviation/mean
B. CV = standard deviation - z-score/mean (total variation)
C. CV = value of observation's distance from mean/standard deviation
D. CV = mean/(standard deviation)²
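As a quick illustration of the ratio in question 2, here is a minimal Python sketch. The data values are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the question does not say whether a sample or a population is meant.

```python
from statistics import mean, stdev

# Hypothetical observations, used only to illustrate the ratio.
sample = [10, 12, 14, 16, 18]

# Coefficient of variation: standard deviation relative to the mean.
# stdev() is the sample standard deviation (n - 1 denominator).
cv = stdev(sample) / mean(sample)

print(f"CV = {cv:.4f}")
```

Because the CV is a ratio, it has no units, which is what makes it useful for comparing the spread of data sets measured on different scales.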
4. Which of the following types of sampling involves using random procedures to select a sample?
A. judgment sampling
B. probabilistic sampling
C. subjective sampling
D. convenience sampling
5. Which of the following sampling methods bases its selection of samples on the ease of data collection?
A. probabilistic sampling
B. judgment sampling
C. simple random sampling
D. convenience sampling
7. ________ sampling applies to populations that are divided into natural subsets and allocates the
appropriate proportion of samples to each subset.
A. Systematic
B. Stratified
C. Cluster
D. Continuous process
USM’s Shriram Mantri Vidyanidhi Info Tech Academy
PG DBDA Feb 20 Advanced Analytics Question Bank
8. The Ransin Sports Company has noted that the size of individual customer orders is normally distributed
with a mean of $112 and a standard deviation of $9. Which of the following is the answer for the
probability that the next individual who buys a product will make a purchase of more than $116?
A. 71%
B. 48%
C. 33%
D. 42%
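The probability asked for in question 8 can be checked numerically. The sketch below uses `NormalDist` from Python's standard library; the variable names are illustrative and not part of the question bank.

```python
from statistics import NormalDist

# Order size modelled as Normal(mean=112, sd=9), as stated in question 8.
order_size = NormalDist(mu=112, sigma=9)

# P(X > 116) is the upper tail: 1 minus the cumulative probability at 116.
p_more_than_116 = 1 - order_size.cdf(116)

print(f"P(order > $116) = {p_more_than_116:.4f}")
```

The same result follows by hand: z = (116 - 112)/9 ≈ 0.44, and the area to the right of that z-score is roughly 0.33.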
9. Which of the following is a difference between the t-distribution and the standard normal distribution?
A. The t-distribution cannot be calculated without a known standard deviation, while the standard normal
distribution can be.
B. The standard normal distribution's confidence levels are wider than those of the t-distribution.
C. The t-distribution has a larger variance than the standard normal distribution.
D. The standard normal distribution is dependent on parameters like degrees of freedom, while the
t-distribution is not.
10. Troista Mobile Accessories sells mobile apps on their Web site. If a customer spends on average, $12 per
visit and visits the Web site 20 times each year, what is the average nondiscounted gross profit during a
customer's lifetime? Given that Troista makes a margin of 60 percent on the average bill, with 25 percent
of customers not returning each year.
A. $30
B. $75
C. $360
D. $576
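Question 10 can be worked through with a short sketch. It assumes the common textbook customer-lifetime formula, gross profit per year divided by the yearly defection rate (V = R × F × M / D); the question itself does not name a formula, and the function and parameter names below are hypothetical.

```python
def lifetime_gross_profit(revenue_per_visit, visits_per_year, margin, defection_rate):
    """Nondiscounted gross profit over a customer's expected lifetime.

    Assumes expected lifetime in years = 1 / defection_rate.
    """
    yearly_gross_profit = revenue_per_visit * visits_per_year * margin
    expected_lifetime_years = 1 / defection_rate
    return yearly_gross_profit * expected_lifetime_years


# Figures from question 10: $12 per visit, 20 visits a year,
# 60 percent margin, 25 percent of customers not returning each year.
value = lifetime_gross_profit(12, 20, 0.60, 0.25)
print(f"Lifetime gross profit = ${value:.0f}")
```

Under that assumption the yearly gross profit is $144 and the expected lifetime is 4 years, giving $576.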
22. When a model has a unique optimal solution, it means that ________.
A. the objective is maximized or minimized by more than one combination of decision variables
B. there is no solution that simultaneously satisfies all the constraints
C. the Allowable Increase or Allowable Decrease values for changing cells are zero
D. there is exactly one solution that will result in the maximum or minimum objective
Sampling
1. Estimation is possible only in case of a:
(a) Parameter (b) Sample (c) Random sample (d) Population
8. The process of using sample data to estimate the values of unknown population parameter is called:
(a) Estimate (b) Estimator (c) Estimation (d) Interval estimation
9. The process of making estimates about the population parameter from a sample is called:
(a) Statistical independence (b) Statistical inference
(c) Statistical hypothesis (d) Statistical decision
14. The numerical value which we determine from the sample for population parameter is called:
(a) Estimation (b) Estimate (c) Estimator (d) Confidence coefficient
17 A range of values within which the population parameter is expected to occur is called:
(a) Confidence coefficient (b) Confidence interval
(c) Confidence limits (d) Level of significance
22 If the mean of the estimator is not equal to the population parameter, the estimator is said to be:
(a) Unbiased (b) Biased (c) Positively biased (d) Negatively biased
23 The difference between the expected value of an estimator and the value of the corresponding parameter
is called:
(a) Bias (b) Sampling error (c) Error of estimation (d) Standard error
25 The confidence interval estimate for the difference of two population means in case of paired
29 (1 – α) is called:
(a) Critical value (b) Level of significance (c) Level of confidence (d) Interval estimate
42 The number of values that are free to vary after we have placed certain restrictions upon the data is
called:
(a) Degrees of freedom (b) Confidence coefficient
(c) Number of parameters (d) Number of samples
43 If the observations are paired and the number of pairs is n, then degree of freedom is equal to:
(a) n (b) n – 1 (c) n1 + n2 – 2 (d) n/2
44 In t-distribution for two independent samples n1 = n2 = n, then the degrees of freedom is equal to:
(a) 2n – 1 (b) 2n – 2 (c) 2n + 1 (d) n – 1
45 If the population standard deviation σ is unknown and the sample size is small, i.e., n ≤ 30, the confidence
interval for the population mean µ is based on
(a) The t-distribution (b) The normal distribution
(c) The binomial distribution (d) The hypergeometric distribution
47 If the population standard deviation σ is known, the confidence interval for the population mean µ is
based on:
(a) The Poisson distribution (b) The t-distribution
(c) The χ²-distribution (d) The normal distribution
48 A statistician calculates a 95% confidence interval for µ when σ is known. The confidence interval is Rs.
18000 to Rs. 22000, the amount of the sample mean is:
(a) Rs. 18000 (b) Rs. 20000 (c) Rs. 22000 (d) Rs. 40000
49 A student calculates a 90% confidence interval for population mean when population standard deviation
σ is unknown and n = 9. The confidence interval is -24.3 cents to 64.3 cents, the sample mean is:
(a) 40 (b) -24.3 (c) 64.3 (d) 20
50 A 95% confidence interval for population proportion p is 32.4% to 47.6%, the value of sample proportion
is:
(a) 40% (b) 32.4% (c) 47.6% (d) 80%
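Questions 48 to 50 share one idea: a confidence interval of the form "estimate ± margin of error" is symmetric, so the point estimate sits at its midpoint. A minimal sketch, assuming the intervals are symmetric (as the usual z- and t-intervals for a mean and the large-sample interval for a proportion are); the function name is illustrative.

```python
def interval_centre(lower, upper):
    """Midpoint of a symmetric confidence interval, i.e. the point estimate."""
    return (lower + upper) / 2


mean_q48 = interval_centre(18000, 22000)   # Rs., question 48
mean_q49 = interval_centre(-24.3, 64.3)    # cents, question 49
prop_q50 = interval_centre(32.4, 47.6)     # percent, question 50

print(round(mean_q48, 1), round(mean_q49, 1), round(prop_q50, 1))
```

Half the interval width is the margin of error, for example (22000 - 18000)/2 = Rs. 2000 in question 48.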
53 If the population standard deviation σ is doubled, the width of the confidence interval for the population
mean µ (i.e., the upper limit of the confidence interval – lower limit of the confidence interval) will be:
(a) Divided by 2 (b) Multiplied by (c) Doubled (d) Decrease
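The effect asked about in question 53 can be checked numerically. The sketch below assumes the known-σ interval x̄ ± z·σ/√n, whose width is 2·z·σ/√n; the values of σ and n are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, confidence=0.95):
    """Width (upper limit minus lower limit) of the z-interval for a mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


# Hypothetical values: same n and confidence level, sigma doubled.
width_before = ci_width(sigma=10, n=25)
width_after = ci_width(sigma=20, n=25)
ratio = width_after / width_before

print(f"width ratio = {ratio:.2f}")
```

Since σ enters the width as a plain multiplicative factor, the ratio does not depend on the particular n or confidence level chosen.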
57 If the population standard deviation σ is known and the sample size n is less than or equal to or more than
30, the confidence interval for the population mean µ is:
58 If the population standard deviation σ is unknown and the sample size n is greater than 30, the
confidence interval for the population mean µ is:
59 If the population standard deviation σ is unknown and the sample size n is less than or equal to 30, the
confidence interval for the population mean is:
60 If we have normal populations with known population standard deviations σ1 and σ2, the confidence
interval estimate for the difference between two population means is:
61 If the population standard deviations σ1 and σ2 are unknown and sample sizes n1, n2 ≥ 30, the 100(1 –
α)% confidence interval for µ1 – µ2 is:
62 If the sample size is large, the confidence interval estimate of a population proportion p is:
63 If n1, n2 ≤ 30, the confidence interval estimate for the difference of two population means when
population standard deviation σ1, σ2 are unknown but equal in case of pooled variates is:
Inferential Stats
1) In a normal distribution, the standard score of the tenth percentile (the value that separates the
bottom 10 percent from the top 90 percent) is:
a. -2.33
b. -0.90
c. 0.10
d. -0.10
e. 0.54
f. -0.82
g. -1.28
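The standard score of a percentile is the inverse of the standard normal cumulative distribution function. A minimal sketch using Python's standard library (the variable name is illustrative):

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist()

# z-score below which 10 percent of the distribution lies.
z_tenth_percentile = standard_normal.inv_cdf(0.10)

print(f"z = {z_tenth_percentile:.2f}")
```

The value is negative because the tenth percentile lies below the mean, and it matches the -1.28 found in standard normal tables.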
3) If we conduct a statistical test of a hypothesis, using a random sample, and the result is not significant,
what conclusion can we draw?
a. We did not manage to reject H0
b. H1 should be rejected
c. H0 should be rejected
d. H0 is true
e. H1 is true
f. We did not manage to reject H1
4) The distribution of heights of adult American men is approximately Normal with mean 69 inches
and standard deviation 2.5 inches. Between what heights do the middle 95% of men fall?
a. 66.5-71.5
b. 64-74
c. 61.5-76.5
d. 65-73
5) The mean life of a tire is 30,000 km. The standard deviation is 2000 km. Then, 68% of all tires will have
a life between __ km and __ km.
a. 28,000 km and 32,000 km.
b. 24,000 km and 34,000 km.
c. 26,000 km and 34,000 km.
d. 27,000 km and 31,000 km.
7) The shelf life of a particular dairy product is normally distributed with a mean of 12 days and a
standard deviation of 3 days. About what percent of the products last between 12 and 15 days?
a. 68%
b. 34%
c. 16%
d. 2.5%
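Questions 4, 5 and 7 all rest on the empirical rule for normal data: roughly 68% of values lie within one standard deviation of the mean and roughly 95% within two. The sketch below applies it to the three questions; the helper name `central_range` is hypothetical, and question 5 is assumed to describe a normal distribution although it does not say so.

```python
from statistics import NormalDist


def central_range(mean, sd, k):
    """Interval of k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd


# Question 4: middle 95% of heights, about 2 standard deviations.
heights_95 = central_range(69, 2.5, 2)

# Question 5: middle 68% of tire lives, about 1 standard deviation.
tire_life_68 = central_range(30_000, 2_000, 1)

# Question 7: share of products lasting between the mean (12 days)
# and one standard deviation above it (15 days).
shelf_life = NormalDist(mu=12, sigma=3)
p_12_to_15 = shelf_life.cdf(15) - shelf_life.cdf(12)

print(heights_95, tire_life_68, f"{p_12_to_15:.2%}")
```

Question 7 is half of the 68% band, since the normal curve is symmetric about its mean.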
8) A survey will be given to 100 students randomly selected from the freshmen class at Lincoln High School.
What is the population?
a. The 100 selected students
b. All freshmen at Lincoln High School
c. All students at Lincoln High School
10) You are interested in how stress affects heart rate in humans. Your dependent variable would be the
_____.
a. stress
b. heart rate
c. number of humans
d. interest
12) Chi-Square is
a. used when comparing two sets of rankings
b. used when comparing several sets of scores
c. used when comparing two sets of scores
d. used with categorical data
Feature Engineering
1. True-False: We use confidence intervals for deciding whether the population supports a specific
idea/model/hypothesis.
A. True
B. False
We use hypothesis testing for deciding whether the population supports a specific idea/model/hypothesis.
3. True-False: With the Chi-squared test, we can test whether a sample mean differs from an expected
(population) mean for numeric variables.
A. True
B. False
The Chi-squared test is for categorical variables.
Both methods seek to reduce the number of attributes in the dataset, but a dimensionality reduction
method does so by creating new combinations of attributes, whereas feature selection methods include
and exclude attributes present in the data without changing them.
10. True-False: In Filter Methods, each feature is considered separately, thereby ignoring feature
dependencies, which may lead to worse classification performance when compared to other types of
feature selection techniques.
A. True
B. False