FIT1006Asst2 vn0b
This assignment is worth 40% of your final mark (subject to the hurdles described in the
FIT1006 handbook entry, FIT1006 Moodle preview [or Unit Guide] and links therein).
Among other things (see below), note the need to hit the ‘Submit’ button (and the possible
requirement of an interview).
Note 2: And a reminder not to post even part of a proposed partial solution to a forum or
other public location. This includes when you are seeking clarification of a question.
If you seek clarification on an Assignment question then – bearing in mind the above – word
your question very carefully and/or (if necessary) send private e-mail. If you are seeking to
understand a concept better, then try to word your question so that it is a long way removed
from the Assignment. You are reminded that this is probably the best path to a faster and
clearer answer (in addition to consultation sessions) without (e.g.) removal of your post. You
are also reminded that Monash University takes academic integrity very seriously.
Note 4: As a general rule, don’t just give a number or an answer like ‘Yes’ or ‘No’ without
a clear and sufficient explanation; otherwise, you risk being awarded 0 marks for the
relevant exercise. Make it easy for the person/people marking your work to follow your
reasoning.
Re-iterating a point above, for each and every question, sub-question and exercise, clearly
state any assumptions, clearly explain your answer and clearly show any working.
Note 5: All of your submitted work should be in machine readable form, and none of your
submitted work should be hand-written.
Note 6: If you wish for your work to be marked and not to accrue (possibly considerable)
late penalties, then make sure to upload the correct files and not to leave your files
as Draft. You then need to determine whether you have all files uploaded and whether you are
ready to hit ‘Submit’. Once you hit ‘Submit’, you give consent for us to begin marking your
work. If you hit ‘Submit’ without all files uploaded then you will probably be deemed not to
have followed the instructions from the Notes above. If you leave your work as Draft and
have not hit ‘Submit’ then we have not received it, and it can accrue late penalties once the
deadline passes. In short, make sure to hit ‘Submit’ at the appropriate time to make sure that
your work is submitted. Late penalties will be as per Monash University Faculty of IT and
Monash University policies (see, e.g.,
https://publicpolicydms.monash.edu/Monash/documents/1935752 and, e.g., sec. 1.11). It is
expected that any work submitted at least 10 calendar days after the deadline will
automatically be given a mark of 0.
Above is an introduction.
----
----
Qu 1 (8 + 8 + 8 + 4 = 28 marks)
Throughout this and all questions, clearly state any assumptions, show all working and
explain all your reasoning.
A company is interested in estimating how many of its employees are fit for work. To
simplify the question, we assume that each test is free.
For people with C1, if they do test T1 then the probability of a positive test result is q.
For people without C1, if they do test T1 then the probability of a positive test result is r.
a) If someone tests positive for C1 from test T1 then what is the probability that they have
C1?
b) If someone tests negative for C1 from test T1 then what is the probability that they have
C1?
We now modify matters so that some people who test positive for C1 from test T1 are
permitted to do test T1 again.
As before, for people with C1, if they do test T1 then the probability of a positive test result is
q.
As before, for people without C1, if they do test T1 then the probability of a positive test
result is r.
c) If someone tested positive in the first test T1 for C1 and now tests positive again for C1
from test T1 then what is the probability that they have C1?
d) If someone tested positive in the first test T1 for C1 but now tests negative for C1 from
test T1 then what is the probability that they have C1?
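Parts (a) to (d) are applications of Bayes' theorem. The following is a minimal sketch only, assuming a prior probability of having C1 (called `prior` here; that name and all numeric values are illustrative and are not given in the question) and assuming that repeated T1 results are conditionally independent given a person's C1 status. Any such assumptions would need to be stated and justified in a written answer.

```python
def posterior(prior, q, r, results):
    """Probability of having C1 after a sequence of T1 results.

    prior   -- assumed P(C1) before any testing (hypothetical; not given above)
    q       -- P(positive | C1)
    r       -- P(positive | no C1)
    results -- iterable of booleans, True for a positive result
    Assumes repeated tests are conditionally independent given C1 status.
    """
    p = prior
    for positive in results:
        like_c1 = q if positive else 1 - q      # likelihood if the person has C1
        like_not = r if positive else 1 - r     # likelihood if they do not
        p = p * like_c1 / (p * like_c1 + (1 - p) * like_not)
    return p


# Illustrative values only
prior, q, r = 0.1, 0.9, 0.2
after_positive = posterior(prior, q, r, [True])          # setting of part (a)
after_negative = posterior(prior, q, r, [False])         # setting of part (b)
after_pos_pos = posterior(prior, q, r, [True, True])     # setting of part (c)
after_pos_neg = posterior(prior, q, r, [True, False])    # setting of part (d)
```

Updating one result at a time gives the same value as applying Bayes' theorem once to the joint likelihood of both results, provided the independence assumption holds.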
----
Qu 2 (6 + 4 = 10 marks)
Throughout this and all questions, clearly state any assumptions, show all working and
explain all your reasoning.
Prior to learning these 100 data points, someone hypothesised that half the reports to the SCU
are online - or, more specifically, that the difference from 0.5 is not statistically significant.
Clearly stating any assumptions and showing all working, what is your opinion of this
hypothesis?
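One common way to examine a hypothesised proportion of 0.5 is a two-sided z-test using the normal approximation to the binomial. The sketch below assumes that the 100 data points are independent and uses a hypothetical count of 60 online reports; the actual count must come from the data, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0=0.5):
    """Two-sided z-test of H0: true proportion equals p0 (normal approximation)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical count: 60 online reports out of the 100 data points
z, p_value = proportion_z_test(60, 100)
```

An exact binomial test is an alternative when the normal approximation is in doubt.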
(b) In this artificial data, 85% of reports to SCU are resolved within 40 working days.
Assume that the rate at which reports are resolved is related to a Poisson distribution (with a
certain rate per day). Putting the first sentence of this question another way, assume that
there is a probability of 0.85 that a report will be resolved within 40 days. Estimate the
Poisson rate at which reports are resolved.
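One possible reading of part (b), offered as an assumption rather than the required method, is that resolutions follow a Poisson process with rate lam per day, so that the probability of no resolution in 40 days is exp(-40 * lam). Setting 1 - exp(-40 * lam) = 0.85 and solving gives the rate. The function name below is illustrative.

```python
import math


def poisson_rate(prob_resolved, days):
    """Rate lam per day such that 1 - exp(-lam * days) == prob_resolved."""
    return -math.log(1 - prob_resolved) / days


lam = poisson_rate(0.85, 40)
check = 1 - math.exp(-lam * 40)  # recovers 0.85 up to rounding
```

Under this reading the rate is about 0.0474 per day; whether days means working days or calendar days is an assumption to state explicitly.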
P.S. to Question 2: While the SCU and Student Academic Success do indeed both exist (and
with above web links), we again emphasise that the above data is artificial.
----
----
Qu 3 (7 marks)
The Reserve Bank of Australia (RBA) keeps all sorts of data, some of which is on exchange
rates https://www.RBA.gov.au/statistics/historical-data.html#exchange-rates. Monash
University has several campuses in Australia and also several campuses outside of Australia,
including Monash Malaysia. There is active interaction, collaboration and travel between
these campuses. In order to make plans, various people have accessed
https://www.RBA.gov.au/statistics/tables/xls-hist/2023-current.xls and column N, concerning
MYR (the Malaysian ringgit). Someone has looked at the data from 17-Apr-2023 (where the
exchange rate is 2.9674) to (252 rows and 1 year later) 17-Apr-2024 (where the exchange rate
is 3.0714). They have then taken the daily differences. For example, the 1st daily difference
(value at 18-Apr-2023 minus value at 17-Apr-2023) is 2.9884 - 2.9674 = 0.0210. And, for
example, the last daily difference is (value at 17-Apr-2024 minus value at 16-Apr-2024)
3.0714 - 3.0806 = -0.0092.
We only consider days which have matching rows in the spreadsheet for column N and MYR
(Malaysian ringgit) vs Australian dollar (AUD).
Based on this data, someone has hypothesised that, during this period, the daily difference
between the Australian dollar (AUD) and the Malaysian ringgit (MYR) has been zero - or,
more specifically, that the difference from 0 is not statistically significant.
Clearly stating any assumptions and showing all working, what is your opinion of this
hypothesis?
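A one-sample test of whether the mean daily difference is zero is one way to approach this question. The sketch below assumes that the daily differences are independent and identically distributed, and it uses a normal approximation to the t distribution, which is reasonable for a sample of about 252 differences. The short series is hypothetical and only shows how the function is called; the real input would be the column N values from the RBA spreadsheet.

```python
import math
from statistics import NormalDist, mean, stdev


def daily_difference_test(rates):
    """Test H0: mean daily difference is 0, from a list of consecutive rates."""
    diffs = [b - a for a, b in zip(rates, rates[1:])]
    n = len(diffs)
    d_bar = mean(diffs)                 # equals (rates[-1] - rates[0]) / n
    se = stdev(diffs) / math.sqrt(n)    # sample standard error of the mean
    t = d_bar / se
    # Normal approximation to the t distribution; poor for very small n
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return d_bar, t, p_value


# The mean over the full period follows from the two endpoints quoted above,
# assuming there are 252 daily differences
mean_from_endpoints = (3.0714 - 2.9674) / 252

# Hypothetical short series, for illustration only (not the RBA data)
sample_rates = [2.9674, 2.9884, 2.9800, 2.9950, 2.9900]
d_bar, t, p_value = daily_difference_test(sample_rates)
```

Because the differences telescope, their mean depends only on the first and last rates; the standard deviation, and hence the test statistic, still requires every daily value.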
----
----
Qu 4 (7 + 7 + 5 + 3 + 3 = 25 marks)
Much money is spent by various companies (at least one of which is high profile) and other
organisations (e.g., NASA, etc.) on space exploration.
The Challenger Space Shuttle Disaster was mentioned in Lecture 4, approximately slides
118-120. Seven astronauts died within approximately 73 seconds of launch. Relevant data
(even if perhaps seemingly different in places from that given in Lecture 4) is given (not at
http://wps.aw.com/wps/media/objects/15/15719/projects/ch5_challenger/index.html but
rather) at https://www.archive.ics.uci.edu/dataset/92/challenger+usa+space+shuttle+o+ring.
Following DOWNLOAD (in the top right-hand corner) at that
https://www.archive.ics.uci.edu/dataset/92/challenger+usa+space+shuttle+o+ring link, we
obtain o-ring-erosion.names and o-ring-erosion-only.data (and we will ignore the other files).
In describing the contents of the file o-ring-erosion-only.data, the file o-ring-erosion.names
mentions
``6. Number of Attributes: 5
1. Number of O-rings at risk on a given flight
2. Number experiencing thermal distress
3. Launch temperature (degrees F)
4. Leak-check pressure (psi)
5. Temporal order of flight’’
Throughout this and all questions, clearly state any assumptions, show all working and
explain all your reasoning.
(a) Split the data from o-ring-erosion-only.data into two roughly equal-sized groups,
based on the launch temperature (degrees F, where we recall that 32 degrees F is the
temperature at which water freezes, equivalent to 0 degrees C).
For these two groups, consider (attribute 2) the number of O-rings experiencing thermal
distress. Clearly stating any assumptions, make a case as to whether or not the two groups
come from the same distribution for attribute 2.
(b) For the two groups from part (a), consider (attribute 4) the leak-check
pressure. Clearly stating any assumptions, make a case as to whether or not the two
groups come from the same distribution for attribute 4.
(c) Split the data from o-ring-erosion-only.data into two roughly equal-sized groups (as
much as is possible), based on (attribute 4) the leak-check pressure.
For these two groups, consider (attribute 2) the number of O-rings experiencing thermal
distress. Clearly stating any assumptions, make a case as to whether or not the two groups
come from the same distribution for attribute 2.
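Parts (a) to (c) each split the flights into two groups and compare an attribute between them. One possible approach, among several that could be defended, is a median split followed by a permutation test on the difference in group means. The sketch below uses hypothetical rows in the attribute order listed above, not the contents of o-ring-erosion-only.data, and the function names are illustrative.

```python
import random
from statistics import mean, median


def median_split(rows, key_index):
    """Split rows into (low, high) groups around the median of one column."""
    cut = median(row[key_index] for row in rows)
    low = [row for row in rows if row[key_index] <= cut]
    high = [row for row in rows if row[key_index] > cut]
    return low, high


def permutation_test(xs, ys, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(xs) - mean(ys))
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(xs)]) - mean(pooled[len(xs):]))
        if diff >= observed - 1e-12:   # small tolerance for float comparison
            hits += 1
    return hits / n_perm


# Hypothetical rows:
# (O-rings at risk, number distressed, temperature F, pressure psi, flight order)
rows = [
    (6, 1, 55, 50, 1), (6, 1, 60, 50, 2), (6, 0, 62, 100, 3),
    (6, 1, 64, 200, 4), (6, 0, 70, 100, 5), (6, 0, 72, 200, 6),
    (6, 0, 75, 200, 7), (6, 0, 80, 200, 8),
]
cold, warm = median_split(rows, key_index=2)      # split on temperature
cold_distress = [row[1] for row in cold]
warm_distress = [row[1] for row in warm]
p_value = permutation_test(cold_distress, warm_distress)
```

Ties at the median can make the two groups unequal in size, which matters for a column such as leak-check pressure that takes only a few distinct values; how ties are handled is an assumption to state.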
(d) Let us return to part (a). On the ill-fated launch day of 28/January/1986, the
temperature was below freezing (see, e.g., Wikipedia). Based on your analysis in part
(a), how many O-rings would you expect to experience (or to have experienced)
thermal distress on 28/January/1986?
(e) Let us return now to part (b). As above, on the ill-fated launch day of
28/January/1986, the temperature was below freezing (see, e.g., Wikipedia). Based
on your analysis in part (b), what would you expect to be (or to have been) the
leak-check pressure on 28/January/1986?
----
----
Qu 5 (5 + 5 + 5 + 5 + 10 = 30 marks)
This question continues from your Assignment 1. You are required to use your data from
Assignment 1. To make it easier for your marker, include the relevant data from (e.g.) your
Assignment 1 Qu 2.
For many world events, we wish to estimate the probability as accurately as possible. This
can often be done by an approach called the wisdom of the crowd, in which we combine the
opinions of various people to try to arrive at a combined weighted average opinion. The
various parts to this question concern the best way to combine models of probabilities (in this
case, coming from two different entrants) to get a more reliable probability.
Throughout this and all questions, clearly state any assumptions, show all working and
explain all your reasoning.
For your Player1 and Player2 from Assignment 1, recalling your Assignment 1 Qu 2-5,
(b) calculate the Pearson correlation coefficient on their `This Round' scores.
For your Player1 and Player2 from Assignment 1, recalling your Assignment 1 Qu 6-8,
consider the allocated team - Melbourne, Geelong or Richmond, and the associated p.
For clarity and the avoidance of doubt, the term p being used here is not the p-value from
classical hypothesis testing. Rather, it is the probability chosen by the competition entrant,
as in Assignment 1 Qu 6-8.
(c) calculate the Q-correlation between Player1 and Player2 on their values of p
(d) calculate the Pearson correlation coefficient between Player1 and Player2 on their
values of p.
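The Pearson correlation coefficient asked for in parts (b) and (d) can be computed directly from its definition. The sketch below uses hypothetical p values for the two players, not real Assignment 1 data, and covers only the Pearson coefficient; the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical p values for Player1 and Player2 (not real Assignment 1 data)
player1_p = [0.60, 0.55, 0.70, 0.50, 0.65]
player2_p = [0.58, 0.50, 0.75, 0.52, 0.60]
r = pearson_r(player1_p, player2_p)
```

The function is undefined when either sequence is constant, since the denominator is then zero.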
For the hypothesis test in part (e), try to choose a non-trivial hypothesis involving Player1
(and possibly also involving Player2) that seems, at least on the surface, partly
plausible.
This could concern (e.g.) some property of Player1 from Assignment 1 Qu 2-3 and/or Qu 6,
or it could concern (e.g.) the difference, from Assignment 1 Qu 2-5 and/or Qu 6-8, between
the values of some property of Player1 and the values of that property of Player2.
Above at points (i), (ii), (iii), (iv) and (v) are some examples of the sort of hypothesis that you
might consider for part (e) of this question.
Again, you are asked to find a suitable non-trivial hypothesis involving an earlier part of this
question, and then suitably analyse and test the hypothesis.
Marks will be given here and throughout for clear explanation based on your documentation.
----
Please recall and carefully re-read all notes and instructions at the start of the Assignment and
throughout the Assignment.