Misheck Mlambo n02217292f Data Analytics Test 2
NATIONAL UNIVERSITY OF SCIENCE AND TECHNOLOGY
FACULTY OF APPLIED SCIENCE
DEPARTMENT OF INFORMATICS AND ANALYSIS
DATA ANALYTICS SCI2206
Solution
Data analysts act as the crucial link between unprocessed data and actionable insights.
Their responsibilities include collecting, processing, and examining data to assist
organizations in making well-informed decisions. Data analysts frequently collaborate
with various departments, ranging from marketing to finance, to comprehend business
objectives and translate them into data-driven solutions.
6. Problem-Solving Proficiency
Data analysts frequently encounter ambiguous problems necessitating analytical thinking.
Problem-solving in data analytics entails:
Critical thinking: Objectively analyzing and evaluating issues to form informed
judgments.
Analytical reasoning: Employing logical reasoning to address problems and make
decisions grounded in data.
Innovation: At times, conventional methods prove inadequate, and innovative thinking is
indispensable.
7. Attention to detail
According to a Gartner report, poor data quality can cost organizations an average
of $12.9 million every year, highlighting the importance of attention to detail.
In data analytics, even a small error can lead to incorrect conclusions. Attention to
detail is critical for:
I. Data cleaning. Ensuring the data you work with is accurate and free from errors.
II. Quality assurance. Double-checking your analyses and visualizations for
accuracy.
III. Documentation. Keeping thorough records of your data sources, methodologies,
and code.
8. Machine learning
Machine learning is an advanced form of data analysis that enables computers to
glean insights from data. According to a Gartner report, 20% of analytic
applications had integrated machine learning by 2022, a share expected to grow
further as machine learning and artificial intelligence gain importance. Mastering
the fundamentals can greatly enhance your proficiency as a data analyst. These
fundamentals encompass:
a) Supervised learning: Techniques used to construct models capable of making
predictions based on labeled data.
b) Unsupervised learning: Approaches employed to identify patterns within
unlabeled data.
c) Natural Language Processing (NLP): A specialized area that centers on the
interaction between computers and human language.
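As a rough sketch of the supervised/unsupervised distinction, the example below fits a labeled classifier and an unlabeled clustering to the same data; scikit-learn, the iris dataset, and the chosen models are illustrative assumptions, not part of the original answer.

```python
# Contrast supervised and unsupervised learning on the same dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Supervised: learn from labeled examples, then predict labels for new data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("supervised accuracy:", clf.score(X_test, y_test))

# Unsupervised: find structure (clusters) without using the labels at all.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_[:10])
```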
Comparing supervised and unsupervised learning - key data analyst skills
Solution
Data Quality Challenges: Substandard data quality, characterized by missing data,
inaccuracies, and inconsistencies, can result in unreliable outcomes and necessitate
thorough data cleansing.
Data Quantity: Managing substantial amounts of data can pose difficulties in terms of
storage, processing, and analysis.
Data Confidentiality and Protection: Guaranteeing data privacy and adhering to
regulations (e.g., GDPR) when handling sensitive information is crucial.
Data Accessibility: Restricted access to essential data sources or data stored in diverse
formats can impede analysis.
Data Preparation: Data must undergo transformation and preparation for analysis, a
process that can be time-consuming, particularly when intricate transformations are
involved.
Selection of Appropriate Tools: Choosing the right software and tools for analysis can
be overwhelming given the multitude of options available.
Selection of Suitable Analytical Methods: Determining the appropriate statistical or
machine learning techniques and parameters for a specific issue can be challenging.
Interpretation of Findings: It is imperative to comprehend and interpret results
accurately; misinterpretation can lead to erroneous conclusions.
Data Visualization: Crafting effective data visualizations that communicate insights
clearly can be complex.
Time Constraints: Adhering to deadlines and delivering results within a specified
time-frame can be stressful.
Stakeholder Engagement: Effectively conveying complex findings to non-technical
stakeholders can be demanding.
Model Over-fitting: In machine learning, over-fitting a model to the training data
results in poor generalization to new data (see the sketch after this list).
Partiality and Equity: Ensuring equity and addressing bias in data and models is of
growing importance.
Resource Constraints: Limited computational resources may confine the extent and
intricacy of analysis.
Evolving Requirements: Requirements and objectives may change during the analysis
process, necessitating adaptability and flexibility.
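To make the over-fitting item above concrete, here is a hedged sketch of the usual train/test comparison; the dataset and the decision-tree models are assumptions chosen only for illustration.

```python
# Detect over-fitting by comparing training accuracy with held-out test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training set.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train:", deep.score(X_train, y_train), "test:", deep.score(X_test, y_test))

# Limiting depth trades training fit for better generalization.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
print("train:", shallow.score(X_train, y_train), "test:", shallow.score(X_test, y_test))
```

A large gap between training and test scores is the practical symptom of over-fitting that this check exposes.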
3. What is prescriptive analytics?
Solution
Prescriptive analytics involves the utilization of sophisticated processes and tools to
scrutinize data and content in order to propose the most advantageous course of action
or strategy for the future. In essence, it aims to address the query, "What actions
should we take?"
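As a minimal sketch of that "what actions should we take?" step, the example below uses linear programming (scipy's linprog) to recommend a production mix; the products, profit figures, and capacity numbers are invented for the illustration.

```python
# Prescriptive step: recommend the action (production mix) that maximizes profit.
from scipy.optimize import linprog

# Hypothetical numbers: profit per unit of products A and B.
profit = [-40, -30]  # negated because linprog minimizes

# Constraints: machine hours and labor hours available.
A_ub = [[2, 1],   # machine hours used per unit of A and B
        [1, 2]]   # labor hours used per unit of A and B
b_ub = [100, 80]  # hours available

res = linprog(c=profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("recommended units of A and B:", res.x)   # approx. [40, 20]
print("expected profit:", -res.fun)             # approx. 2200
```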
Benefits
Ultimately, prescriptive analytics aids data analysts in making superior decisions
regarding their subsequent course of action. This may encompass any facet of their
business, such as enhancing revenue, diminishing customer attrition, thwarting fraud,
and boosting operational efficiency.
Financial Services
Mitigate risk through the automated analysis of credit risk and loan default
probabilities.
Health-care
Enhance patient care through forecasting patient admissions and readmissions.
Energy Utilities
Ensure consistent service delivery by forecasting peak demand cycles.
Retail Consumer
Optimize pricing strategies and marketing communication to boost customer retention
rates.
Life Sciences
Identify the most efficient and effective territory alignments.
Public Sector
Maximize investments in transportation infrastructure based on population density.
Solution
Data validation is the process of ensuring the accuracy and quality of data. It plays an
essential role in tasks such as analytics, data science, machine learning, and data
migration initiatives. By integrating validation rules into the user's work-flow, data
becomes more consistent, functional, and valuable to users.
Data validation is a critical step in data analysis, guaranteeing the accuracy and
reliability of the data analyst's findings.
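A minimal sketch of such rule-based validation in pandas follows; the column names and the specific rules are hypothetical assumptions.

```python
# Rule-based validation: flag rows that violate simple quality rules.
import pandas as pd

df = pd.DataFrame({
    "age": [34, -2, 51, None],
    "email": ["a@x.com", "b@x.com", "not-an-email", "c@x.com"],
})

# Each rule is a boolean Series; True means the row passes the check.
rules = {
    "age_in_range": df["age"].between(0, 120),
    "email_has_at": df["email"].str.contains("@", na=False),
    "age_present": df["age"].notna(),
}

for name, passed in rules.items():
    print(name, "violations:", (~passed).sum())
```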
Outliers are data points that fall significantly outside the typical pattern exhibited by
the majority of the data. They can manifest as values that are exceptionally high or
low, and while they may offer valuable insights, they have the potential to distort
an analysis if not handled with care. A data analyst must therefore be able to
identify outliers within a dataset reliably.
Comprehending Outliers
Deviation from the mean: If a data point deviates significantly from the average
(mean) compared to the majority of other points, it may be considered an outlier.
Standard deviation: This metric provides insight into the spread of the data. Data
points located several standard deviations away from the mean are often indicative of
outliers. (As a general guideline, values exceeding 3 standard deviations are
commonly classified as outliers).
Inter-quartile range (IQR): This encompasses the central 50% of your data. Data
points lying beyond the range defined by 1.5 times the IQR below the first quartile
(Q1) and 1.5 times the IQR above the third quartile (Q3) are deemed potential outliers.
Visualization of Data:
Box-plots: These diagrams present the IQR and identify outliers as data points lying
beyond the whiskers (the lines extending from the box).
Scatter-plots: These graphs can uncover outliers as data points that are distantly
removed from the general trend of the data.
Statistical Approaches
Z-scores: These scores indicate how many standard deviations a data point is from the
mean. Elevated absolute z-scores are indicative of outliers.
IQR method: This approach utilizes the IQR, as previously described, to pinpoint
outliers.
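The z-score and IQR rules above take only a few lines to apply; this sketch assumes a single numeric column, the common 3-standard-deviation cut-off, and the 1.5 × IQR fences.

```python
# Flag outliers with the z-score and IQR rules described above.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(10, 1, 200), [95]])  # 95 is planted as an outlier

# Z-score rule: more than 3 standard deviations from the mean.
z = (data - data.mean()) / data.std()
print("z-score outliers:", data[np.abs(z) > 3])

# IQR rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print("IQR outliers:", outliers)
```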
Privacy: Upholding user privacy is of utmost importance. Ensure that you obtain
proper consent for data collection and anonymize data when necessary.
Bias: Data can exhibit bias, which can lead to discriminatory outcomes. Be vigilant of
potential biases in your data collection, analysis methods, and interpretations.
Fairness: Guarantee that your analysis is fair and does not disadvantage specific
groups. Challenge assumptions and be attentive to potential fairness issues.
Accountability: Data analysts are responsible for the outcomes of their work. Ensure
thorough documentation and be prepared to elucidate your methods and conclusions.
7. Analyse the practical steps that are involved in a Data Cleansing
Work-flow.
Solution
Data Profiling
Examine the data to identify data types, value ranges, missing values, and outliers.
Data Transformation
Modify data as needed for analysis (e.g., converting units, scaling).
Data Validation
Assess the effectiveness of cleaning techniques and ensure that data quality aligns
with your standards.
Data Documentation
Record the cleaning process, detailing identified issues and applied corrections.
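A compact sketch of the profiling, transformation, validation, and documentation steps in pandas; the dataset and the specific fixes are hypothetical.

```python
# A minimal cleansing pass: profile, transform, validate, and log what changed.
import pandas as pd

df = pd.DataFrame({
    "price": ["10.5", "12.0", None, "11.2"],
    "country": ["US", "us", "US ", "UK"],
})

# Profiling: inspect types, ranges, and missing values.
print(df.dtypes)
print(df.isna().sum())

# Transformation: coerce types, normalize text, fill gaps.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["price"] = df["price"].fillna(df["price"].median())
df["country"] = df["country"].str.strip().str.upper()

# Validation: assert the quality standards we set.
assert df["price"].notna().all()
assert df["country"].isin(["US", "UK"]).all()

# Documentation: record the applied corrections.
print("cleaning log: coerced price to float, filled 1 missing price, "
      "normalized country codes")
```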
Data Governance
Data governance encompasses a structured framework that ensures the effective
management of data within an organization.
Objectives:
Data Quality: Uphold the integrity of data to facilitate accurate analysis and decision-
making.
Data Security: Protect data from unauthorized access, alteration, or loss.
Data Compliance: Ensure adherence to data privacy regulations.
Data Accessibility: Facilitate easy access to data for authorized users for analytical
purposes.
Advantages:
Enhanced decision-making: Reliable data fosters well-informed decision-making.
Increased efficiency: Decreased time spent resolving data quality issues.
Risk mitigation: Acts as a shield against data breaches and penalties for non-
compliance.
Challenges:
Implementation: The establishment and enforcement of data governance policies can
be complex.
Cultural Transformation: Organizations must foster a culture centered around data.
Technological Adaptation: Data governance must evolve to align with changing
technologies and regulations.
Data Collection
Collect relevant historical data and pinpoint key variables.
Model Design
Select a simulation modeling technique (e.g., Monte Carlo simulation) and structure
the model.
Model Calibration
Parameterize the model using historical data and ensure it accurately mirrors the real
system.
Model Validation
Test the model with unseen data to validate its accuracy and predictive capabilities.
Scenario Analysis
Utilize the model to simulate various scenarios and forecast potential outcomes.
Model Refinement
Continuously monitor and enhance the model based on new data and feedback.
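A hedged Monte Carlo sketch covering the model design, scenario-analysis, and summary steps above; the demand distribution and unit margin are invented calibration values, not figures from the source.

```python
# Monte Carlo scenario analysis: simulate many futures, summarize the outcomes.
import numpy as np

rng = np.random.default_rng(42)
n_runs = 10_000

# Hypothetical calibration: daily demand ~ Normal(100, 15), unit margin = 2.5.
demand = rng.normal(loc=100, scale=15, size=n_runs)
profit = np.clip(demand, 0, None) * 2.5

print("expected profit:", profit.mean())
print("5th-95th percentile:", np.percentile(profit, [5, 95]))
print("P(profit < 200):", (profit < 200).mean())
```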
Power BI Features
Power BI is a business intelligence tool employed for data visualization and analysis.
Data Connectivity: Links to diverse data sources (e.g., databases, cloud storage).
Interactive Dashboards: Craft visually appealing and interactive dashboards for data
exploration.
Data Visualization: Provides an array of charts and graphs to depict data insights.
11. Can you name some data category types used in Power BI?
Solution
Power BI lets you assign data categories to columns so that visuals interpret them
correctly. These include geographic categories such as Address, City, Country,
Postal Code, Latitude, and Longitude, as well as Web URL, Image URL, and Barcode.
12. Which stages will you work through when using Power BI?
Solution
Data Acquisition
Connect to your data sources and import the data you need for analysis.
Data Modeling
Clean, transform, and structure your data for analysis. This may include establishing
relationships between tables, defining data types, and creating calculated columns.
Data Visualization
Select suitable charts and graphs to represent your data insights. Power BI offers a
wide range of visualizations that can be customized for clarity and interactivity.
Dashboard Creation
Design dashboards that amalgamate key visualizations and metrics to narrate a data
story. Layouts, filters, and slicers can enhance the user experience.
Sharing and Collaboration
Share your dashboards and reports with colleagues for further analysis and discussion.
Power BI allows for collaborative editing and commenting.
Monitoring and Refreshing
Regularly monitor your data and update reports as necessary. Power BI permits
scheduled data refreshes to ensure that insights remain current.