Misheck Mlambo N02217292F Data Analytics Test 2


NATIONAL UNIVERSITY OF SCIENCE

AND TECHNOLOGY
FACULTY OF APPLIED SCIENCE
DEPARTMENT OF INFORMATICS AND ANALYTICS
DATA ANALYTICS SCI2206

Mlambo Misheck N02217292F


SCI2206 TEST 1

1. Which soft skills are needed to be a good data analyst?

Solution
Data analysts act as the crucial link between unprocessed data and actionable insights.
Their responsibilities include collecting, processing, and examining data to assist
organizations in making well-informed decisions. Data analysts frequently collaborate
with various departments, ranging from marketing to finance, to comprehend business
objectives and translate them into data-driven solutions.

Vital Data Analyst Expertise


Technical aptitude forms the core of a data analyst's skill set. These competencies
empower individuals to manipulate data, carry out intricate analyses, and produce insights
that can steer business decisions. Let's delve into the technical skills required for data
analysts:
1. Programming Proficiency (Python, R, SQL)
Within the field of data analysis, programming languages like Python, R, and SQL are
indispensable. These languages enable individuals to manipulate data, conduct statistical
analyses, and design data visualizations.
Python: Widely utilized for data manipulation and analysis, Python features a vast
ecosystem of libraries such as Pandas and NumPy.
R: Specialized for statistical analysis, R serves as another potent tool commonly utilized
in academic research and data visualization.
SQL: The primary language for database management, SQL enables individuals to query,
update, and manipulate structured data.
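As a brief illustration, the sketch below performs the same aggregation first with pandas and then as a SQL query against an in-memory SQLite database; the table and column names are hypothetical.

import sqlite3
import numpy as np
import pandas as pd

# Build a small, made-up sales dataset with pandas.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "revenue": [1200.0, 950.0, np.nan, 1430.0],
})

# Pandas: fill the missing value, then aggregate revenue by region.
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())
print(sales.groupby("region")["revenue"].sum())

# SQL: the same aggregation expressed as a query.
conn = sqlite3.connect(":memory:")
sales.to_sql("sales", conn, index=False)
print(conn.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region").fetchall())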
2. Data Visualization Tools (Tableau, Power BI)
Data visualization transcends mere chart creation; it involves narrating a story through
data. Tools like Tableau and Power BI are extensively employed for this purpose,
facilitating the transformation of complex data into easily digestible visual formats.
Tableau: Renowned for its user-friendly interface, Tableau allows data analysts to create
intricate visualizations without any coding. It proves particularly beneficial for crafting
interactive dashboards that can be effortlessly shared across an organization.
Power BI: Developed by Microsoft, Power BI stands as another robust tool for generating
interactive reports and dashboards. It seamlessly integrates with various Microsoft
products and supports real-time data tracking, making it prevalent in corporate
environments.
3. Statistical Analysis
Statistical analysis serves as the foundation of data analytics, furnishing the
methodologies for drawing inferences from data. Proficiency in statistical methods equips
data analysts to utilize the following:
Descriptive statistics: Summarize and interpret data to offer a comprehensive overview of
the data's implications.
Inferential statistics: Form predictions and inferences about a population based on a
sample.
Hypothesis testing: Assess theories or hypotheses with the aim of resolving practical
issues.
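To make these methods concrete, here is a minimal sketch on made-up sample data: descriptive statistics with NumPy, then a two-sample t-test from SciPy for hypothesis testing.

import numpy as np
from scipy import stats

group_a = np.array([23.1, 25.4, 22.8, 26.0, 24.3])
group_b = np.array([27.9, 26.5, 28.2, 27.1, 29.0])

# Descriptive statistics: summarize each sample.
print("mean A:", group_a.mean(), "sample std A:", group_a.std(ddof=1))

# Inferential statistics / hypothesis testing: two-sample t-test.
# Null hypothesis: both groups share the same population mean.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # a small p-value argues against the null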

4. Data Wrangling and Cleansing


Prior to initiating any data analysis, the data must undergo cleaning and transformation
into a usable format, a process known as data wrangling. This process encompasses:
Data cleaning: Identifying and rectifying errors, inconsistencies, and inaccuracies in
datasets.
Data transformation: Converting data into a format conducive to easy analysis, which
may entail aggregating, reshaping, or enriching the data.
Data integration: Merging data from diverse sources and presenting a unified perspective.
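A hedged pandas sketch of all three steps, on invented data:

import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 2, 3],
    "amount": [100.0, None, 250.0, 250.0, 80.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["retail", "wholesale", "retail"],
})

# Data cleaning: remove the duplicate row and fill the missing amount.
orders = orders.drop_duplicates()
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Data transformation: aggregate spend per customer.
spend = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Data integration: merge in customer attributes for a unified view.
print(spend.merge(customers, on="customer_id"))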
5. Communication Skills
In the realm of data analytics, communication extends beyond presenting findings; it
involves translating intricate data into actionable insights comprehensible to non-technical stakeholders. Effective communication entails:
Data storytelling: Crafting a compelling narrative using data to influence business
decisions.
Presentation skills: Proficiency in presentation tools and the capacity to visually and
verbally present data are imperative.
Interpersonal skills: Establishing relationships with team members and stakeholders is
pivotal for collaborative endeavors.

6. Problem-Solving Proficiency
Data analysts frequently encounter ambiguous problems necessitating analytical thinking.
Problem-solving in data analytics entails:
Critical thinking: Objectively analyzing and evaluating issues to form informed
judgments.
Analytical reasoning: Employing logical reasoning to address problems and make
decisions grounded in data.
Innovation: At times, conventional methods prove inadequate, and innovative thinking is
indispensable.
7. Attention to detail
According to a Gartner report, poor data quality can cost organizations an average
of $12.9 million every year, highlighting the importance of attention to detail.

In data analytics, even a small error can lead to incorrect conclusions. Attention to
detail is critical for:

I. Data cleaning. Ensuring the data you work with is accurate and free from errors.
II. Quality assurance. Double-checking your analyses and visualizations for
accuracy.
III. Documentation. Keeping thorough records of your data sources, methodologies,
and code.
8. Machine learning
Machine learning is an advanced form of data analysis that enables computers to
glean insights from data. As per a Gartner report, by 2022, 20% of analytic
applications had integrated machine learning, a figure expected to rise further given
the escalating significance of machine learning and artificial intelligence. Mastering
the fundamentals can greatly enhance your proficiency as a data analyst. These
fundamentals encompass:
a) Supervised learning: Techniques used to construct models capable of making predictions based on labeled data.
b) Unsupervised learning: Approaches employed to identify patterns within unlabeled data.
c) Natural Language Processing (NLP): A specialized area that centers on the interaction between computers and human language.
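To contrast supervised and unsupervised learning, two key data analyst skills, here is a minimal scikit-learn sketch on synthetic data; the dataset and model choices are purely illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic labels for the demo

# Supervised learning: fit a classifier on labeled data, then predict.
clf = LogisticRegression().fit(X, y)
print("predicted label:", clf.predict([[1.0, 1.0]])[0])

# Unsupervised learning: find structure in the same data without labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))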

9. Advanced data technologies


As the volume and intricacy of data continue to expand, advanced data technologies such as Hadoop and Spark are gaining significance. These technologies enable users to engage in:

i. Data storage. Managing extensive datasets that surpass the capabilities of conventional databases.
ii. Data processing. Executing intricate calculations and analyses on large-scale data.
iii. Real-time analytics. Assessing data instantaneously to facilitate prompt business decisions.
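As an illustration, here is a hedged PySpark sketch; it assumes a local Spark installation, and the events.csv file and event_type column are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()

# Data processing at scale: read a (potentially huge) CSV and aggregate it in parallel.
df = spark.read.csv("events.csv", header=True, inferSchema=True)
df.groupBy("event_type").agg(F.count("*").alias("n")).show()

spark.stop()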
2. Which problems might a data analyst encounter when running an analysis?

Solution
Data Quality Challenges: Substandard data quality, characterized by missing data,
inaccuracies, and inconsistencies, can result in unreliable outcomes and necessitate
thorough data cleansing.
Data Quantity: Managing substantial amounts of data can pose difficulties in terms of
storage, processing, and analysis.
Data Confidentiality and Protection: Guaranteeing data privacy and adhering to
regulations (e.g., GDPR) when handling sensitive information is crucial.
Data Accessibility: Restricted access to essential data sources or data stored in diverse
formats can impede analysis.
Data Preparation: Data must undergo transformation and preparation for analysis, a
process that can be time-consuming, particularly when intricate transformations are
involved.
Selection of Appropriate Tools: Choosing the right software and tools for analysis can
be overwhelming given the multitude of options available.
Selection of Suitable Analytical Methods: Determining the appropriate statistical or
machine learning techniques and parameters for a specific issue can be challenging.
Interpretation of Findings: It is imperative to comprehend and interpret results
accurately; misinterpretation can lead to erroneous conclusions.
Data Visualization: Crafting effective data visualizations that communicate insights
clearly can be complex.
Time Constraints: Adhering to deadlines and delivering results within a specified timeframe can be stressful.
Stakeholder Engagement: Effectively conveying complex findings to non-technical
stakeholders can be demanding.
Model Over-fitting: In the realm of machine learning, over-fitting models to the
training data can result in poor generalization on new data.
Partiality and Equity: Ensuring equity and addressing bias in data and models is of
growing importance.
Resource Constraints: Limited computational resources may confine the extent and
intricacy of analysis.
Evolving Requirements: Requirements and objectives may change during the analysis
process, necessitating adaptability and flexibility.
3. What is prescriptive analytics?

Solution
Prescriptive analytics involves the utilization of sophisticated processes and tools to
scrutinize data and content in order to propose the most advantageous course of action
or strategy for the future. In essence, it aims to address the query, "What actions
should we take?"

Benefits
Ultimately, prescriptive analytics aids data analysts in making superior decisions
regarding their subsequent course of action. This may encompass any facet of their
business, such as enhancing revenue, diminishing customer attrition, thwarting fraud,
and boosting operational efficiency.

Emphasize data-driven, rather than instinctual, decisions. Through advanced algorithms and machine learning, a specific course of action is suggested based on a wide array of factors, including historical and current performance, available resources, and probability-weighted projections and scenarios. This diminishes the probability of human bias or error.

Simplify intricate decisions. Prescriptive analysis models various scenarios and furnishes the likelihood of diverse outcomes, spanning from immediate to long-term. This not only facilitates comprehension of the particular recommendation from the tool but also enables an understanding of the likelihood of the worst-case scenario, allowing for its integration into your plans.

Concentrate on implementation as opposed to decision-making. Your organization is likely inundated with data from a myriad of sources. The swift pace of contemporary business necessitates quick action. Premier prescriptive analytics tools initially disassemble data silos to evaluate an amalgamated data set and subsequently provide instantaneous, precise recommendations for your organization's best course of action. This allows you to focus your effort on executing the plan.
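As a toy illustration of this logic (every probability and payoff below is invented), candidate actions can be scored against probability-weighted scenarios and the one with the highest expected value recommended:

# Scenario probabilities and per-action payoffs are hypothetical.
scenarios = {"downturn": 0.2, "flat": 0.5, "growth": 0.3}
payoffs = {
    "expand":   {"downturn": -50, "flat": 20, "growth": 120},
    "hold":     {"downturn": -5,  "flat": 10, "growth": 30},
    "downsize": {"downturn": 15,  "flat": 0,  "growth": -10},
}

# Expected value of each action across the weighted scenarios.
expected = {
    action: sum(prob * outcome[s] for s, prob in scenarios.items())
    for action, outcome in payoffs.items()
}
print(expected, "-> recommended action:", max(expected, key=expected.get))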

Prescriptive Analytics Illustrations


The subsequent examples illustrate various types of prescriptive insights generated by
sophisticated data analytics tools.

Financial Services
Mitigate risk through the automated analysis of credit risk and loan default
probabilities.

Healthcare
Enhance patient care through forecasting patient admissions and readmissions.

Energy Utilities
Ensure consistent service delivery by forecasting peak demand cycles.

Retail Consumer
Optimize pricing strategies and marketing communication to boost customer retention
rates.

Life Sciences
Identify the most efficient and effective territory alignments.

Public Sector
Maximize investments in transportation infrastructure based on population density.

4. Explain the main steps involved in data validation.

Solution

Data validation is the process of ensuring the accuracy and quality of data. It plays an
essential role in tasks such as analytics, data science, machine learning, and data
migration initiatives. By integrating validation rules into the user's workflow, data
becomes more consistent, functional, and valuable to users.

Data validation is a critical step in data analysis, guaranteeing the accuracy and
reliability of the data analyst's findings. Here is an overview of the primary steps
involved:

Establish Data Quality Standards


Set quality criteria for the available data. This involves specifying the format, range, and
completeness requirements for each data point. For example, an age field should only
accept positive integers within a reasonable range.

Data Cleansing and Filtering


Identify and resolve missing values, outliers (extreme values), and discrepancies.
Techniques such as data imputation (filling missing data) or data capping (limiting
outliers) may be employed. Filtering might be necessary to eliminate irrelevant data
points.

Data Type and Format Validation


Ensure that data conforms to the specified formats. This includes checking for
typographical errors in text data, validating correct date formats, and ensuring
numerical data falls within the expected range.

Cross-referencing and Uniqueness Validation


Compare data points within the dataset and against external sources (if applicable) to
detect inconsistencies. Search for duplicate entries and verify the proper functioning
of unique identifiers.

Business Rule Verification


Apply business-specific logic to validate data. For instance, in sales data, a customer
ID should correspond to a valid customer record.

Documentation and Reporting


Document the validation process and any identified errors. This aids in maintaining
transparency and allows for future reference or enhancement of validation methods.

Throughout these processes, data visualization tools can be beneficial in identifying patterns and anomalies in the data. Data validation is an iterative process.
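To make the steps concrete, here is a hedged pandas sketch of rule-based checks; the schema, ranges, and data are hypothetical.

import pandas as pd

records = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],
    "age": [34, -3, 28, 130],
    "signup_date": ["2023-01-05", "2023-02-30", "2023-03-12", "2023-04-01"],
})

# Format validation: parse dates; invalid entries become NaT.
records["signup_date"] = pd.to_datetime(records["signup_date"], errors="coerce")

# Range validation: an age field should fall within a reasonable range.
bad_age = ~records["age"].between(0, 120)

# Uniqueness validation: customer IDs should not repeat.
dup_id = records["customer_id"].duplicated(keep=False)

# Reporting: collect the failing rows for documentation.
print(records[bad_age | dup_id | records["signup_date"].isna()])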

5. Explain the concept of outlier detection and how you would identify outliers in a dataset.

Outlier Detection: Identifying Anomalies in Your Data

Outliers are data points that fall significantly outside the typical pattern exhibited by
the majority of the data. They can manifest as values that are exceptionally high or
low, and while they may offer valuable insights, they have the potential to distort
one's analysis if not managed with care. The following outlines how a data analyst can identify outliers in a dataset.

Comprehending Outliers

Although there is no singular definition of an outlier, these data points generally exhibit a noticeable deviation from the bulk of the data. This deviation can be quantified through various means.

Deviation from the mean: If a data point deviates significantly from the average
(mean) compared to the majority of other points, it may be considered an outlier.

Standard deviation: This metric provides insight into the spread of the data. Data
points located several standard deviations away from the mean are often indicative of
outliers. (As a general guideline, values exceeding 3 standard deviations are
commonly classified as outliers).
Interquartile range (IQR): This encompasses the central 50% of your data. Data
points lying beyond the range defined by 1.5 times the IQR below the first quartile
(Q1) and 1.5 times the IQR above the third quartile (Q3) are deemed potential outliers.

Detection of Outliers in Acquired Data

Visualization of Data:

Histograms: These visual representations illustrate the frequency distribution of your data. Data points situated far from the central peak of the distribution may signify outliers.

Box plots: These diagrams present the IQR and identify outliers as data points lying
beyond the whiskers (the lines extending from the box).

Scatter plots: These graphs can uncover outliers as data points that are distantly
removed from the general trend of the data.

Statistical Approaches
Z-scores: These scores indicate how many standard deviations a data point is from the
mean. Elevated absolute z-scores are indicative of outliers.

IQR method: This approach utilizes the IQR, as previously described, to pinpoint
outliers.
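A minimal sketch of both rules, applied to synthetic data with one injected anomaly:

import numpy as np

rng = np.random.default_rng(1)
data = np.append(rng.normal(50, 5, size=200), 120.0)  # 120 is the injected outlier

# Z-score rule: flag points more than 3 standard deviations from the mean.
z = (data - data.mean()) / data.std(ddof=1)
print("z-score outliers:", data[np.abs(z) > 3])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
mask = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
print("IQR outliers:", data[mask])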

6. What are the ethical considerations of data analysis?

Ethical Considerations in Data Analysis


Data analysis is a potent tool that must be handled with ethical considerations. Here
are some crucial ethical considerations for data analysts:

Privacy: Upholding user privacy is of utmost importance. Ensure that you obtain
proper consent for data collection and anonymize data when necessary.

Bias: Data can exhibit bias, which can lead to discriminatory outcomes. Be vigilant of
potential biases in your data collection, analysis methods, and interpretations.

Transparency: Be transparent about the data collection, analysis, and utilization processes. This fosters trust and enables users to comprehend the limitations of the findings.

Fairness: Guarantee that your analysis is fair and does not disadvantage specific
groups. Challenge assumptions and be attentive to potential fairness issues.

Accountability: Data analysts are responsible for the outcomes of their work. Ensure
thorough documentation and be prepared to elucidate your methods and conclusions.
7. Analyse the practical steps that are involved in a Data Cleansing Workflow.

Data Cleansing Workflow


Data cleansing readies raw data for analysis by identifying and rectifying errors,
inconsistencies, and missing values. Here is a breakdown of the practical steps:

Data Collection Assessment


Comprehend the data source, format, and potential quality issues.

Data Profiling
Examine the data to identify data types, value ranges, missing values, and outliers.

Data Cleaning Techniques


Missing Values: Determine imputation techniques (such as mean/median imputation
or more complex methods) for filling in missing values.
Inconsistent Formatting: Rectify typos, standardize date formats, and ensure
uniformity across data points.
Outliers: Detect and address outliers as necessary (through removal, capping, or
flagging).

Data Transformation
Modify data as needed for analysis (e.g., converting units, scaling).

Data Validation
Assess the effectiveness of cleaning techniques and ensure that data quality aligns
with your standards.

Data Documentation
Record the cleaning process, detailing identified issues and applied corrections.
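A hedged pandas sketch of the core techniques above (median imputation, outlier capping, and date-format standardization); the dataset is invented, and format="mixed" needs pandas 2.0 or later.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, np.nan, 12.5, 900.0],  # one missing value, one outlier
    "date": ["2024-01-02", "01/03/2024", "2024-01-04", "2024-01-05"],
})

# Missing values: median imputation.
df["price"] = df["price"].fillna(df["price"].median())

# Outliers: cap at the 95th percentile instead of dropping the row.
df["price"] = df["price"].clip(upper=df["price"].quantile(0.95))

# Inconsistent formatting: coerce mixed date strings to datetimes (pandas >= 2.0).
df["date"] = pd.to_datetime(df["date"], format="mixed")
print(df)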

8. Define data governance, highlighting its goals, benefits and challenges.

Data Governance
Data governance encompasses a structured framework that ensures the effective
management of data within an organization.

Objectives:
Data Quality: Uphold the integrity of data to facilitate accurate analysis and decision-
making.
Data Security: Protect data from unauthorized access, alteration, or loss.
Data Compliance: Ensure adherence to data privacy regulations.
Data Accessibility: Facilitate easy access to data for authorized users for analytical
purposes.

Advantages:
Enhanced decision-making: Reliable data fosters well-informed decision-making.
Increased efficiency: Decreased time spent resolving data quality issues.
Risk mitigation: Acts as a shield against data breaches and penalties for non-
compliance.

Challenges:
Implementation: The establishment and enforcement of data governance policies can
be complex.
Cultural Transformation: Organizations must foster a culture centered around data.
Technological Adaptation: Data governance must evolve to align with changing
technologies and regulations.

9. Identify all the steps that a Data Analyst can implement to develop a simulation model.

Developing a Simulation Model


Simulation models are computer programs that replicate real-world systems. Here is
how a data analyst can create one:

Define the Problem


Identify the scenario or process you intend to simulate.

Data Collection
Collect relevant historical data and pinpoint key variables.

Model Design
Select a simulation modeling technique (e.g., Monte Carlo simulation) and structure
the model.

Model Calibration
Parameterize the model using historical data and ensure it accurately mirrors the real
system.

Model Validation
Test the model with unseen data to validate its accuracy and predictive capabilities.

Scenario Analysis
Utilize the model to simulate various scenarios and forecast potential outcomes.

Model Refinement
Continuously monitor and enhance the model based on new data and feedback.
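As an illustration, here is a minimal Monte Carlo sketch that follows these steps end to end; the demand parameters are assumed rather than calibrated from real data.

import numpy as np

rng = np.random.default_rng(42)

# Model calibration: daily demand parameters, assumed here for illustration.
mean_demand, sd_demand = 100.0, 15.0
unit_profit, n_days, n_runs = 2.5, 30, 10_000

# Scenario analysis: simulate 10,000 possible 30-day demand paths.
demand = rng.normal(mean_demand, sd_demand, size=(n_runs, n_days)).clip(min=0)
monthly_profit = demand.sum(axis=1) * unit_profit

# Summarize the outcome distribution to support the decision.
print("expected monthly profit:", monthly_profit.mean())
print("5th percentile (downside case):", np.percentile(monthly_profit, 5))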

10. Illustrate the critical features of Power BI.

Power BI Features
Power BI is a business intelligence tool employed for data visualization and analysis.

Below are some of its crucial features:

Data Connectivity: Links to diverse data sources (e.g., databases, cloud storage).
Interactive Dashboards: Craft visually appealing and interactive dashboards for data
exploration.

Data Modeling: Prepare and transform data for analysis.

Data Visualization: Provides an array of charts and graphs to depict data insights.

Self-Service Analytics: Empowers users to independently explore and analyze data.

Collaboration: Enables sharing and collaboration on dashboards and reports.

11. Can you name some data category types used in Power BI?

Power BI Data Category Types

Continuous: Numerical values within a range (e.g., temperature, sales figures).

Discrete: Numerical values representing distinct categories (e.g., customer ID, product category).

Date/Time: Data indicating specific dates and times.


Hierarchical: Data organized in a parent-child relationship (e.g., geographic regions,
product categories with subcategories).

Text: Alphanumeric data (e.g., customer names, product descriptions).

Logical: Data with two possible values (e.g., True/False, Active/Inactive).


Assigning the correct data category type is crucial for precise analysis and
visualization in Power BI.
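Although Power BI assigns these categories through its own modeling interface, the same distinctions map roughly onto pandas dtypes; the sketch below (with made-up column names) is only an analogy, not Power BI itself.

import pandas as pd

df = pd.DataFrame({
    "temperature": [21.5, 19.0, 23.2],                         # continuous
    "product_category": ["A", "B", "A"],                       # discrete
    "order_date": ["2024-05-01", "2024-05-02", "2024-05-03"],  # date/time
    "customer_name": ["Ada", "Grace", "Alan"],                 # text
    "is_active": [True, False, True],                          # logical
})
df["product_category"] = df["product_category"].astype("category")
df["order_date"] = pd.to_datetime(df["order_date"])
print(df.dtypes)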

12. Which stages will you work through when using Power BI?

Stages of Working with Power BI


Here are the stages involved in utilizing Power BI:

Data Acquisition

Connect Power BI to your data sources and import the data.

Data Modeling
Clean, transform, and structure your data for analysis. This may include establishing
relationships between tables, defining data types, and creating calculated columns.

Data Visualization
Select suitable charts and graphs to represent your data insights. Power BI offers a
wide range of visualizations that can be customized for clarity and interactivity.
Dashboard Creation

Design dashboards that amalgamate key visualizations and metrics to narrate a data story. Layouts, filters, and slicers can enhance the user experience.

Sharing and Collaboration

Share your dashboards and reports with colleagues for further analysis and discussion. Power BI allows for collaborative editing and commenting.

Monitoring and Refreshing

Regularly monitor your data and update reports as necessary. Power BI permits scheduled data refreshes to ensure that insights remain current.
