Capstone Project
First, find out whether there is a pattern in the data, since this is the premise that
underlies all ML disciplines. If there is no pattern, then the problem cannot be solved
with AI technology.
These techniques are used to answer five types of questions, all falling under the
umbrella of predictive analysis:
1) Which category? (Classification)
2) How much or how many? (Regression)
3) Which group? (Clustering)
4) Is this unusual? (Anomaly Detection)
5) Which option should be taken? (Recommendation)
Determine which of these questions you’re asking, and how answering it helps you
solve your problem.
To accomplish real computational tasks, you need to break down the problem into
smaller units before coding.
Imagine that you want to create your first app. This is a complex problem. How would
you decompose the task of creating an app?
To decompose this task, you would need to know the answer to a series of smaller
problems:
What kind of app do you want to create?
What will your app look like?
Who is the target audience for your app?
What will the graphics look like?
What audio will you include?
What software will you use to build your app?
How will the user navigate your app?
How will you test your app?
This list has broken down the complex problem of creating an app into much simpler
problems that can now be worked out.
3. Analytic Approach
Models are built to predict outcomes or discover underlying patterns, all to
gain insights leading to actions that will improve future outcomes. This process
follows the ‘Foundational Methodology of Data Science’, which has 10 stages.
Every project starts with business understanding, which lays the foundation for
successful resolution of the business problem. This stage defines the problem, the
project objectives and the solution requirements from a business perspective.
Then an analytic approach is defined to solve the problem. It involves expressing the
problem in the context of statistical and machine learning techniques so that suitable
techniques can be identified for achieving the desired outcome.
Selecting the right analytic approach depends on the question being asked.
If the question is to determine probabilities of an action, then a predictive model
might be used.
If the question is to show relationships, a descriptive approach may be required.
Statistical analysis applies to problems that require counts.
If the question requires a yes/no answer, then a classification approach to predicting a
response would be suitable.
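The rules of thumb above can be sketched as a simple lookup. This is only an illustration: the names `QuestionKind`, `APPROACH_FOR` and `choose_approach` are hypothetical and not part of any library, and real projects often need more judgment than a table can capture.

```python
from enum import Enum


class QuestionKind(Enum):
    """Hypothetical labels for the kinds of question listed in this section."""
    PROBABILITY_OF_ACTION = "probability of an action"
    RELATIONSHIPS = "show relationships"
    COUNTS = "requires counts"
    YES_NO = "yes/no answer"


# Lookup table that restates the rules of thumb from the text.
APPROACH_FOR = {
    QuestionKind.PROBABILITY_OF_ACTION: "predictive model",
    QuestionKind.RELATIONSHIPS: "descriptive approach",
    QuestionKind.COUNTS: "statistical analysis",
    QuestionKind.YES_NO: "classification",
}


def choose_approach(kind: QuestionKind) -> str:
    """Return the analytic approach suggested for a given kind of question."""
    return APPROACH_FOR[kind]


if __name__ == "__main__":
    for kind in QuestionKind:
        print(f"{kind.value}: {choose_approach(kind)}")
```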
4. Data Requirement
Define the data requirements for decision-tree classification.
Identify the necessary data content, formats and sources for initial data collection.
Data requirements are then revised, and it is decided whether less or more data is needed.
Data scientists will have a good understanding of what they will be working with.
Techniques such as descriptive statistics and visualization can be applied to the data set
to assess its content and quality and to gain initial insights about the data.
Gaps in the data are identified, and the missing values are either filled in or replaced.
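As a minimal sketch of the data-understanding step, the example below computes a few descriptive statistics and fills gaps using only the Python standard library. The `ages` column is made-up sample data, the function names are hypothetical, and filling with the median is just one possible strategy, assumed here for illustration.

```python
import statistics


def summarize(values):
    """Descriptive statistics over the non-missing entries (None marks a gap)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }


def fill_gaps(values, fill_value):
    """Replace every missing entry with fill_value, leaving other entries as-is."""
    return [fill_value if v is None else v for v in values]


# Made-up sample column with two gaps.
ages = [34, None, 29, 41, None, 38]
summary = summarize(ages)
filled = fill_gaps(ages, summary["median"])

if __name__ == "__main__":
    print(summary)
    print(filled)
```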
5. Modeling Approach
Data Modeling focuses on developing models that are either descriptive or predictive.
A descriptive model might examine things
A predictive model tries to yield yes/no, or stop/go type outcomes.
A training set is used for predictive modelling; it acts as a gauge to determine whether
the model needs to be calibrated.
A training set is a set of historical data in which the outcomes are already known.
The success of data compilation, preparation and modelling depends on
understanding the problem and choosing an appropriate analytic approach.
Constant refinement, adjustments and tweaking are necessary to ensure a solid
outcome.
The framework does 3 things:
Understand the question.
Select an analytic approach or method to solve the problem.
Obtain, understand, prepare, and model the data.
The end goal is to build a model to answer the question.
The train/test split percentage is chosen with respect to your project’s objectives, with
considerations that include:
Computational cost in training the model.
Computational cost in evaluating the model.
Training set representativeness.
Test set representativeness.
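A train/test split can be sketched in a few lines. The function below is a hypothetical helper written for illustration, not a library API; the 30% test fraction and the seed are arbitrary choices. Shuffling before splitting is one simple way to make both sets more representative.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up data: ten row identifiers.
rows = list(range(10))
train, test = train_test_split(rows, test_fraction=0.3, seed=42)

if __name__ == "__main__":
    print("train:", train)
    print("test:", test)
```

A larger test fraction raises the cost of evaluating the model and leaves less data for training, while a smaller one makes the test score noisier.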
Conclusion
Using cross-validation gave us much better measures of model quality, with the added
benefit of cleaning up our code.
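For readers who have not seen it, k-fold cross-validation can be sketched as follows. This is an illustrative standard-library version, not the code used in the project: the helper names, the mean-predicting baseline model and the sample data are all assumptions made for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first n_samples % k folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


def fit_mean(train_x, train_y):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and targets."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(ys)


def cross_val_scores(xs, ys, fit, score, k):
    """Fit on k-1 folds, score on the held-out fold, and repeat for every fold."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return scores


# Made-up data for illustration.
xs = list(range(6))
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
scores = cross_val_scores(xs, ys, fit_mean, mean_absolute_error, k=3)

if __name__ == "__main__":
    print("per-fold error:", scores)
    print("average error:", sum(scores) / len(scores))
```

Averaging the per-fold scores uses every sample for validation exactly once, which is why the estimate is usually more reliable than a single train/test split.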
Loss functions:
A loss function is a function that is minimized while training a model.
It measures how well a prediction model is able to predict the expected outcome.
Loss functions can be categorized into 2 types: Classification Loss and Regression Loss.
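To make the two categories concrete, here is a minimal sketch of one common loss from each: mean squared error for regression and binary cross-entropy for classification. These two are chosen here as examples; the text does not say which losses the project uses. The `eps` clipping value is an assumption to avoid taking the logarithm of zero.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Regression loss: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Classification loss for 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    print(mean_squared_error([0, 0], [1, 3]))
    print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

In both cases a lower value means the predictions are closer to the expected outcomes.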
Mathematically:
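The original formula is not reproduced in these notes. Assuming it is the root mean squared error (RMSE) described later in this section, a standard way to write it is shown below, where the symbols are assumptions: n is the number of observations, y_i an actual value and \hat{y}_i the corresponding predicted value.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```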
Graphically:
The red dots are the actual values and the
blue line is the set of predicted values drawn
by our model.
X represents the distance between the actual
value and the predicted line. It represents
the error.
Squaring each of those distances, taking the
mean of the squares and finally taking the
square root will give us the RMSE of our model.
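A good model should have an RMSE value less than 180. Note that RMSE is expressed in
the same units as the target, so this threshold only makes sense relative to the scale of
the values being predicted.
If you have a higher RMSE value, you either need to change your features or tweak your
hyperparameters.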
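The RMSE calculation described above can be sketched in plain Python. The `actual` and `predicted` lists are made-up numbers for illustration, and the function name `rmse` is a hypothetical helper rather than a library call.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square each error, average, then take the root."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: the errors are 1, 0, -1 and -2.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

if __name__ == "__main__":
    print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a single large error raises the RMSE more than several small ones.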