

VIVEKANANDHA COLLEGE OF ARTS AND SCIENCES FOR WOMEN
(Autonomous)
PG & Research Department of Computer Science and Applications
Value Added Course
DATA SCIENCE USING PYTHON (VA2K2364)
(2023-24)
UNIT I to UNIT V
1. What is the primary goal of data science?
a) Predicting the future
b) Analyzing historical data
c) Making decisions based on data
d) All of the above

2. Which of the following is NOT a key component of data science?
a) Data collection
b) Data storage
c) Data visualization
d) Data transformation

3. Which of the following best describes data in the context of data science?
a) Information that is already processed and analyzed
b) Raw facts and figures that need interpretation
c) Predictive models
d) None of the above

4. What is structured data?
a) Data that is stored in a database
b) Data with a well-defined format and schema
c) Data that is unorganized and messy
d) Data that is unstructured and text-based

5. In a relational database, what is a primary key?
a) A key used to unlock the database
b) A unique identifier for a record in a table
c) A table that stores primary data
d) A table with no relationships to other tables

6. What is a foreign key in a database table?
a) A key used for encryption
b) A key used to access external data sources
c) A field that establishes a link between two tables
d) A primary key of a table
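For Questions 5 and 6, a minimal sketch using Python's built-in sqlite3 module and an in-memory database may help. The table and column names (departments, students, dept_id) are hypothetical examples. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, so the sketch enables it first; a repeated primary key and a link to a non-existent department are then both rejected.

```python
import sqlite3

# In-memory database; table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# dept_id is the primary key: a unique identifier for each department row.
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# students.dept_id is a foreign key that links each student to a department.
conn.execute(
    "CREATE TABLE students ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER REFERENCES departments(dept_id))"
)

conn.execute("INSERT INTO departments VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO students VALUES (101, 'Asha', 1)")  # valid link to department 1


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_pk_ok = try_insert("INSERT INTO departments VALUES (1, 'Physics')")  # repeated primary key
dangling_fk_ok = try_insert("INSERT INTO students VALUES (102, 'Divya', 99)")  # no department 99

# Join the two tables through the foreign-key link.
rows = conn.execute(
    "SELECT s.name, d.name FROM students s JOIN departments d ON s.dept_id = d.dept_id"
).fetchall()
print(duplicate_pk_ok, dangling_fk_ok, rows)
```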

7. Which Python library is commonly used for data manipulation and analysis in data science?
a) Pygame
b) Matplotlib
c) Pandas
d) Numpy
8. Which library is used for creating data visualizations in Python?
a) Pandas
b) Matplotlib
c) Numpy
d) Scikit-learn

9. In the context of data science, what is a DataFrame?
a) A structured format for storing data in Python
b) A mathematical function
c) A data visualization tool
d) A type of database table

10. What is the term for a function that calls itself within its own code?
a) Recursive function
b) Reusable function
c) Looped function
d) Iterative function
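A short illustration for Question 10, assuming the classic factorial example (the function name factorial is only an illustration): the function calls itself on a smaller input until it reaches a base case that stops the recursion.

```python
def factorial(n: int) -> int:
    """Return n! recursively: the function calls itself within its own code."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:  # base case stops the recursion
        return 1
    return n * factorial(n - 1)  # recursive call on a smaller input


print(factorial(5))  # 120
```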
11. What is the main purpose of linear functions in data science?
a) To model non-linear relationships
b) To represent data using a straight line
c) To handle categorical data
d) To perform statistical tests
12. In a linear function, what does the coefficient of the independent variable represent?
a) Intercept
b) Slope
c) Correlation coefficient
d) Error term
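For Questions 11 and 12, a minimal sketch of a linear function y = m*x + b, with arbitrary illustrative values m = 2 and b = 3 (the helper name linear is hypothetical). It checks that the coefficient m is the slope, the change in y for a one-unit change in x, and that b is the value of y at x = 0.

```python
def linear(x: float, m: float, b: float) -> float:
    """Linear function y = m*x + b (m = slope, b = intercept)."""
    return m * x + b


m, b = 2.0, 3.0  # arbitrary illustrative values

# The intercept is the output when x = 0.
intercept_value = linear(0, m, b)

# The slope is the change in y for a one-unit increase in x, at any x.
changes = [linear(x + 1, m, b) - linear(x, m, b) for x in range(5)]
print(intercept_value, changes)  # 3.0 [2.0, 2.0, 2.0, 2.0, 2.0]
```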
13. Which of the following is NOT a common application of linear functions in data science?
a) Linear regression
b) Time series analysis
c) Principal component analysis
d) Forecasting
14. Which type of plot is most suitable for visualizing the distribution of a single continuous variable in data science?
a) Bar plot
b) Scatter plot
c) Histogram
d) Pie chart
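A histogram (Question 14, and again Question 20) groups a continuous variable into bins and shows how many values fall in each. In practice one would typically call matplotlib.pyplot.hist; the sketch below counts the bins with the standard library instead, using hypothetical scores and an assumed bin width of 10.

```python
from collections import Counter

# Hypothetical exam scores: one continuous variable.
scores = [42, 55, 58, 61, 63, 67, 68, 71, 74, 75, 78, 82, 85, 91]

bin_width = 10  # assumed bin width
# Map each value to the left edge of its bin, e.g. 58 -> 50 (bin 50-59).
counts = Counter((s // bin_width) * bin_width for s in scores)

# Text histogram: one row per bin, bar length = frequency.
for left in sorted(counts):
    print(f"{left}-{left + bin_width - 1}: {'#' * counts[left]}")
```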
15. In data science, what type of plot is used to show the relationship between two continuous variables?
a) Box plot
b) Line plot
c) Scatter plot
d) Stacked bar plot
16. What is the primary purpose of a box plot in data science?
a) To show the distribution of a single variable
b) To compare multiple categories
c) To display correlations
d) To show time series data
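A box plot summarizes one variable through its quartiles. The sketch below, on hypothetical data, computes the five-number summary with statistics.quantiles and flags points outside the commonly used 1.5 * IQR fences, which a box plot typically draws as individual outliers. Quartile conventions differ between tools, so other software may report slightly different Q1 and Q3.

```python
import statistics

# Hypothetical sample with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]

# Q1, median and Q3; "inclusive" interpolates between the sample minimum and maximum.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Common convention: values beyond 1.5 * IQR from the box are plotted as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", min(data), q1, median, q3, max(data))
print("outliers:", outliers)
```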
17. In the context of linear regression, what does the slope represent?
a) The point where the regression line intersects the y-axis
b) The change in the dependent variable for a one-unit change in the independent
variable
c) The correlation between variables
d) The standard error of the regression
18. Which term represents the point where the regression line intersects the y-axis in a linear regression equation?
a) Slope
b) Intercept
c) Correlation coefficient
d) Residual
19. In a simple linear regression model, the equation is y = mx + b. What does 'b' represent in this equation?
a) The slope of the regression line
b) The variance of the dependent variable
c) The intercept of the regression line
d) The residual error
20. Which type of plot is best suited to visualize the distribution of a single numeric variable?
a) Scatter plot
b) Histogram
c) Box plot
d) Line plot
21. What is statistics?
a. The study of static objects
b. The science of data collection, analysis, and interpretation
c. The study of mathematical equations
d. The science of probability theory
22. Why is statistics important?
a. It helps in making predictions with 100% accuracy
b. It provides tools for dealing with uncertainty and variability in data
c. It is used to create artistic visualizations
d. It is only relevant in the field of mathematics
23. What does the 25th percentile represent in a dataset?
a. The value below which 75% of the data falls
b. The value below which 25% of the data falls
c. The average of the highest and lowest data points
d. The value above which 25% of the data falls
24. If the median and the 50th percentile are the same, what can you say about the data?
a. The data is normally distributed
b. The data is negatively skewed
c. The data is positively skewed
d. Nothing specific can be concluded
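For Questions 23 and 24, a small sketch on hypothetical data: statistics.quantiles with n=100 returns the 99 percentile cut points, so index 24 is the 25th percentile and index 49 is the 50th. In a small sample only roughly 25% of the values fall below the 25th percentile, while the 50th percentile coincides with the median.

```python
import statistics

# Hypothetical sample of 11 values.
data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 21, 25]

# n=100 gives 99 cut points: index 24 -> 25th percentile, index 49 -> 50th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p25 = percentiles[24]
p50 = percentiles[49]

# Roughly a quarter of the values lie below the 25th percentile.
share_below_p25 = sum(x < p25 for x in data) / len(data)

print(p25, share_below_p25)
print(p50, statistics.median(data))  # the 50th percentile is the median
```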
25. What does the standard deviation measure?
a. The central tendency of a dataset
b. The spread or dispersion of data points in a dataset
c. The probability of an event occurring
d. The percentage of data below a certain value
26. What does a low standard deviation indicate?
a. Data points are close to the mean
b. Data points are widely spread from the mean
c. A perfect normal distribution
d. An error in data collection
27. How is variance calculated?
a. It is the square root of the mean
b. It is the average of squared differences from the mean
c. It is the range of data values
d. It is the difference between the maximum and minimum values
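For Questions 25 to 27, a sketch on a hypothetical data set that computes the population variance as the average of squared differences from the mean and takes its square root to obtain the standard deviation, then compares the results with statistics.pvariance and statistics.pstdev. (statistics.variance and statistics.stdev divide by n - 1 instead, for a sample.)

```python
import math
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# Variance: the average of squared differences from the mean (population form).
variance = sum((x - mean) ** 2 for x in data) / len(data)
# Standard deviation: the square root of the variance, in the data's own units.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
# The standard library's population functions agree.
print(statistics.pvariance(data), statistics.pstdev(data))
```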
28. If two datasets have the same mean but different variances, what can you infer?
a. The datasets have the same degree of variation
b. The datasets have different degrees of variation
c. The datasets have the same distribution
d. The datasets are identical
29. In a normally distributed dataset, where does the median (50th percentile) lie?
a. At the minimum value
b. At the maximum value
c. At the mean (average) value
d. Exactly in the middle of the data range
30. If the variance of a dataset is zero, what can you conclude about the data?
a. All data points are equal
b. The data is normally distributed
c. The data is highly variable
d. The data is a large dataset
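Questions 28 and 30 can be checked directly on hypothetical data: two lists below share a mean of 50 but differ in variance, and a list of identical values has a variance of exactly zero.

```python
import statistics

# Hypothetical data sets: same mean, different spread, plus a constant list.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]
constant = [7, 7, 7, 7]

for name, values in [("tight", tight), ("wide", wide), ("constant", constant)]:
    # Population variance: average squared distance from the mean.
    print(name, statistics.mean(values), statistics.pvariance(values))
```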
31. What does the term "correlation" refer to in the context of data science?
a) A measure of the strength and direction of a linear relationship between two variables.
b) The process of gathering and organizing data for analysis.
c) The prediction of future data trends.
d) The removal of outliers from a dataset.
32. In correlation analysis, what does a correlation coefficient of -0.9 indicate?
a) A strong positive relationship between two variables.
b) A strong negative relationship between two variables.
c) No relationship between two variables.
d) Perfect causality between two variables.
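For Questions 31 and 32, a minimal sketch of the Pearson correlation coefficient, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)), on hypothetical data. The helper name pearson is illustrative; statistics.correlation (Python 3.10 and later) computes the same coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]    # hypothetical predictor
errors = [10, 8, 6, 4, 2]  # falls exactly linearly as hours rise
noisy = [9, 9, 5, 5, 1]    # falls, but not perfectly

print(pearson(hours, errors))
print(round(pearson(hours, noisy), 2))
```

A perfectly linear decreasing pattern gives r = -1, and the noisier decreasing list gives roughly -0.94, a strong negative relationship.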

33. In statistics, what does a correlation coefficient of 0.7 represent?
a) A weak positive correlation.
b) A strong positive correlation.
c) A weak negative correlation.
d) A perfect negative correlation.

34. Which of the following correlation coefficients indicates the strongest relationship between two variables?
a) -0.5
b) 0.2
c) 0.9
d) -0.7

35. What does a correlation matrix in statistics help you to determine?
a) The individual values of each data point in a dataset.
b) The mean and median of a dataset.
c) The relationships between multiple pairs of variables in a dataset.
d) The distribution of data in a histogram.
36. In a correlation matrix, what does a diagonal entry of 1 indicate?
a) A perfect negative correlation.
b) A perfect positive correlation.
c) No correlation.
d) A causal relationship.
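For Questions 35, 36 and 39, a sketch that builds a correlation matrix for three hypothetical variables by computing Pearson's r for every pair (pearson is a hypothetical helper, defined here so the block stands alone). Each diagonal entry compares a variable with itself and comes out as 1, and the matrix is symmetric. pandas DataFrame.corr produces the same kind of table.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical variables measured on the same five students.
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 60, 61, 70, 78],
    "absences": [9, 7, 8, 4, 2],
}

names = list(data)
# Correlation for every ordered pair of variables, keyed by (row, column).
matrix = {(a, b): pearson(data[a], data[b]) for a in names for b in names}

print("".ljust(9) + "".join(n.rjust(10) for n in names))
for a in names:
    print(a.ljust(9) + "".join(f"{matrix[a, b]:10.2f}" for b in names))
```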
37. What is the primary difference between correlation and causality in statistics?
a) Correlation measures the strength of a relationship between variables, while causality determines the cause-and-effect relationship between them.
b) Correlation focuses on predicting future data trends, while causality deals with past data analysis.
c) Correlation and causality are two terms for the same statistical concept.
d) Correlation can only be applied to discrete data, while causality is used for continuous data.
38. Which of the following statements is true regarding causality in statistical analysis?
a) Causality can be established solely based on a high correlation between two variables.
b) Correlation always implies causality.
c) Causality is a more complex concept and often requires experimentation or well-designed
observational studies to establish.
d) Causality is a synonym for correlation.
39. In a correlation matrix, each element on the diagonal represents the correlation of a
variable with itself. What is the range of values for the correlation coefficient for this
diagonal element?
a) -1 to 1
b) 0 to 1
c) -∞ to ∞
d) 1 to 100
40. In data science, correlation measures the degree of relationship between two variables.
Which of the following correlation coefficients indicates a perfect positive linear relationship
between two variables?
a) 0
b) 1
c) -1
d) 0.5
41. What is the primary goal of linear regression in data science?
a) Classification
b) Prediction
c) Clustering
d) Dimensionality Reduction
42. Which of the following is a common method to measure the goodness of fit in linear
regression?
a) Mean Absolute Error (MAE)
b) F1 Score
c) Confusion Matrix
d) ROC AUC
43. In the context of linear regression, what does a regression table typically display?
a) Coffee consumption data
b) Coefficients, p-values, and standard errors for predictor variables
c) Descriptive statistics of the target variable
d) Probability distributions of the residuals
44. What type of information is commonly included in regression output summaries?
a) The price of the dataset
b) The number of data points in the dataset
c) Coefficients, R-squared value, and p-values
d) The data visualization used in the analysis
45. In linear regression, what do the regression coefficients represent?
a) The probability distribution of the target variable
b) The relationship between the target variable and predictor variables
c) The correlation between two predictor variables
d) The number of data points in the dataset
46. If a regression coefficient for a particular predictor variable is close to zero, what does it
indicate?
a) Strong positive correlation with the target variable
b) Strong negative correlation with the target variable
c) Weak or no correlation with the target variable
d) High multicollinearity with other predictor variables
47. What is the significance of a regression coefficient with a p-value less than the chosen
significance level (e.g., 0.05)?
a) The coefficient is not statistically significant
b) The coefficient is statistically significant
c) The coefficient is related to multicollinearity
d) The coefficient represents an interaction effect
48. In regression analysis, what does the term "residuals" refer to?
a) Predicted values
b) Errors or the differences between observed and predicted values
c) Independent variables
d) Coefficients of the regression model
49. What does R-squared (R²) measure in the context of regression analysis?
a) The number of independent variables
b) The variance explained by the model
c) The p-value of the intercept
d) The correlation coefficient between variables
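For Questions 41 to 49, a minimal sketch of simple linear regression by ordinary least squares on hypothetical data, followed by the residuals (observed minus predicted values) and R-squared = 1 - SS_res / SS_tot. The function name fit_simple_linear_regression is illustrative; libraries such as statsmodels report these quantities, together with standard errors and p-values, in a regression table.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = m*x + b; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mean_x, mean_y)
    return slope, intercept


x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [3, 5, 6, 9, 11]  # hypothetical observed responses

m, b = fit_simple_linear_regression(x, y)
predicted = [m * xi + b for xi in x]
# Residuals: observed minus predicted values.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# R^2 = 1 - SS_res / SS_tot: the share of the variance in y explained by the model.
mean_y = sum(y) / len(y)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"slope={m:.2f} intercept={b:.2f} R^2={r_squared:.3f}")
print("residuals:", [round(r, 2) for r in residuals])
```

For this data the fitted line is approximately y = 2x + 0.8, with an R-squared of about 0.98.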
50. In a regression table, what does the "Coefficients" column display?
a) p-values
b) Variable names
c) Standard errors
d) Regression coefficients
