The probability of getting two sixes from two thrown dice is the product of the probability of getting a six from the first die and the probability of getting a six from the second die, which is p = 1/6 * 1/6 = 0.028. This means that if
you throw the dice one hundred times, you can expect two sixes to come up only
two or three times. Note that you can use simple math in Python to perform these
sorts of calculations (just make sure you use parentheses to ensure order of calcu-
lation as needed): Sixes = (1/6) * (1/6).
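If you want to check the math yourself, the same calculation runs as-is in a Python session (the variable name mirrors the one in the text; the print calls are only there to display the results):

Sixes = (1/6) * (1/6)    # probability of two sixes, about 0.028
print(Sixes)             # 0.02777...
print(Sixes * 100)       # about 2.8 expected occurrences in one hundred throws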
Using these basic rules, you can work out even the most complex situations dealing with events. For instance, you can now compute
the probability of getting at least a six from two thrown dice, which is a summation
of mutually exclusive events:
» The probability of having two sixes: Sixes = (1/6) * (1/6)
» The probability of having a six on the first die and something other than a six on the second one: SixAndOther = (1/6) * (1 - (1/6))
» The probability of having a six on the second die and something other than a six on the first one: OtherAndSix = (1 - (1/6)) * (1/6)
Your probability of getting at least one six from two thrown dice is OneSix = (1/6)*[1/6 + (1 - 1/6) + (1 - 1/6)] = (1/6)*[1/6 + 5/6 + 5/6] = (1/6)*(11/6) = 11/36, or about 0.306.
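The summation is just as easy to verify in Python; this minimal sketch reuses the names from the list above:

Sixes = (1/6) * (1/6)                # a six on both dice
SixAndOther = (1/6) * (1 - (1/6))    # a six on the first die, something else on the second
OtherAndSix = (1 - (1/6)) * (1/6)    # something else on the first die, a six on the second
OneSix = Sixes + SixAndOther + OtherAndSix
print(OneSix)                        # 0.30555..., that is, 11/36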
When you estimate the probability of an event, you may (sometimes wrongly) tend to believe
that you can apply the probability you calculated to each possible situation. The
term to express this belief is a priori probability, meaning the general probability of
an event.
For example, when you toss a coin, if the coin is fair, the a priori probability of a
head is 50 percent. No matter how many times you toss the coin, when faced with
a new toss, the probability of heads is still 50 percent.
In other cases, however, the a priori probability is not valid anymore because something subtle happened and
changed it. In this case, you can express this belief as an a posteriori probability,
which is the a priori probability after something happened to modify the count. For
instance, the a priori probability of a person's being female is roughly 50 percent; however, the figure changes if you restrict the count to older age groups, because females tend to live longer and, at advanced ages, there are more females than males. As another example related to gender, if you
examine the presence of women in certain faculties at a university, you notice that
the proportion differs from the one in the general population, so the probability changes within that context.
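The following rough sketch makes the shift from an a priori to an a posteriori figure concrete; the enrollment numbers are purely hypothetical and serve only as an illustration:

prior_female = 0.5                      # a priori probability of a person's being female
faculty_students = 200                  # hypothetical number of students in one faculty
female_students = 140                   # hypothetical number of female students in that faculty
posterior_female = female_students / faculty_students
print(prior_female, posterior_female)   # 0.5 versus 0.7: the probability changes with the context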
You can view such a case as a conditional probability, and express it as p(y|x), which
is read as the probability of event y happening given that x has happened. Conditional
probabilities are a very powerful tool for machine learning. In fact, if the a priori
probability can change so much because of certain circumstances, knowing the
possible circumstances can boost your chances of correctly predicting an event.
The Naïve Bayes algorithm, for example, boosts the chance of making a correct prediction by knowing the circumstances surrounding the prediction, as explained next.
Naïve Bayes takes its name from the Reverend Bayes and his revolutionary theorem of probabilities. In fact, the theorem is one of the foundations for the development of advanced algorithms based on Bayesian probability;
MIT’s Technology Review magazine mentioned Bayesian machine learning as an
emerging technology that will change our world (http://www2.technologyreview.com/news/402435/10-emerging-technologies-that-will-change-your/). Yet, the foundations of the theorem aren't all that complicated (although they may seem a bit counterintuitive at first).
Reverend Thomas Bayes was a statistician and a philosopher who formulated his theorem in the eighteenth century; it wasn't published while he was alive. Its publication revolutionized the theory of probability by introducing the idea of conditional probability just mentioned.
Bayes' theorem states that P(B|E) = P(E|B) * P(B) / P(E). Reading the formula using the gender example as input (guessing whether a person is a female when the only thing you know is that the person has long hair) can provide a better understanding of an otherwise counterintuitive formula:
» P(B|E): The probability of a belief (B) given a set of evidence (E) (posterior
probability). Read “belief” as an alternative way to express a hypothesis. In this
case, the hypothesis is that a person is a female and the evidence is long hair.
Knowing the probability of such a belief given the evidence can help to predict the person's gender.
» P(E|B): The probability of having long hair when the person is a female. This term refers to the probability of the evidence in the subgroup, which is itself a conditional probability; in this example, it amounts to 60 percent, or a value of 0.6 (likelihood).
» P(B): The general probability of being a female; that is, the a priori probability of the belief. In this case, the probability is 50 percent, or a value of 0.5 (prior).
» P(E): The general probability of having long hair. Here it is another a priori
probability, this time related to the observed evidence. In this formula, it is a
35 percent probability, which is a value of 0.35 (evidence).
If you solve the previous problem of determining gender using the Bayes formula
and the values you have singled out, the result is Female = 0.6 * 0.5 / 0.35 =
0.857, which means that, given such evidence, the person is probably a female.
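Reproducing the same plug-in of values in Python makes the arithmetic explicit; the variable names simply spell out the terms of the formula:

P_E_given_B = 0.6    # probability of long hair given that the person is female (likelihood)
P_B = 0.5            # a priori probability of being female (prior)
P_E = 0.35           # general probability of having long hair (evidence)
Female = P_E_given_B * P_B / P_E
print(Female)        # 0.8571..., so the person is probably a female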
Another common example, which can raise some eyebrows and is routinely found in statistics textbooks, involves a medical test; it is quite interesting for a better understanding of how prior and posterior probabilities interact. Imagine an illness that affects only 1 percent of the population and a test for it that is 99 percent accurate (it returns a wrong answer in just 1 percent of cases). You take the test and the response is positive. The values to plug into Bayes' theorem are as follows:
» 0.99 as P(E|B)
» 0.01 as P(B)
The calculations are then IsIll = 0.99 * 0.01 / ((0.01 * 0.99) + (0.99 *
0.01)) = 0.50, which corresponds to just a 50 percent probability that you’re ill.
In the end, your chances of not being ill are higher than you expected. You may
wonder how this is possible. The fact is that the people who get a positive response from the test fall into two groups:
» Those who are ill and get a correct answer from the test: This group comprises the true positives, and it amounts to 99 percent of the 1 percent of the population who have the illness.
» Those who aren't ill and get a wrong answer from the test: This group is the 1 percent of the 99 percent of the population who receive a positive response even though they aren't ill. Again, this is a multiplication of 99 percent and 1 percent. This group corresponds to the false positives.
If you look at the problem using this perspective, it becomes evident why, when
limiting the context to people who get a positive response to the test, the proba-
bility of being in the group of the true positives is the same as that of being in the
false positives.
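A short Python sketch of the same reasoning shows why the positive responses split evenly between the two groups; the figures are the ones from the example (1 percent prevalence, 99 percent test accuracy):

prevalence = 0.01                                      # P(B): share of the population that is ill
accuracy = 0.99                                        # P(E|B): chance that an ill person tests positive
true_positives = prevalence * accuracy                 # ill and correctly flagged: 0.0099
false_positives = (1 - prevalence) * (1 - accuracy)    # healthy but wrongly flagged: 0.0099
IsIll = true_positives / (true_positives + false_positives)
print(IsIll)                                           # 0.5: a positive test means only a 50 percent chance of being ill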