060 Techniques of Data Analysis
Director
Centre for Real Estate Studies
Faculty of Engineering and Geoinformation Science
Universiti Teknologi Malaysia
Skudai, Johor
Objectives
Specific:
* Concepts of data analysis
* Some data analysis techniques
* Some tips for data analysis
[Figure: No. of houses by district, 1991 and 2000]
[Figure: Trends in property loan, shop house demand & supply, 1990-1997]
Year   Loan to property sector (RM million)   Demand for shop houses (units)   Supply of shop houses (units)
1990   32635.8                                71719                            85534
1991   38100.6                                73892                            85821
1992   42468.1                                85843                            90366
1993   47684.7                                95916                            101508
1994   48408.2                                101107                           111952
1995   61433.6                                117857                           125334
1996   77255.7                                134864                           143530
1997   97810.1                                86323                            154179
[Figure: chart by Age Category (Years Old)]
[Figure: chart of Demand (% sales success)]
Examples of “abstraction” of phenomena
[Figure: fitted trend line with prediction, Dep = 7t – 192.6]
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B           Std. Error        Beta        t         Sig.
1  (Constant)    1993.108    239.632                       8.317     .000
   Tanah         -4.472      1.199             -.190       -3.728    .000
   Bangunan      6.938       .619              .705        11.209    .000
   Ansilari      4.393       1.807             .139        2.431     .017
   Umur          -27.893     6.108             -.241       -4.567    .000
   Flo_go        34.895      89.440            .020        .390      .697
a. Dependent Variable: Nilaism
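The t column in a coefficients table of this kind is the coefficient divided by its standard error. The sketch below recomputes that ratio from the B and Std. Error values in the table; the dictionary layout and the helper name `t_ratio` are illustrative assumptions, not part of the original output.

```python
# B and Std. Error values copied from the coefficients table above.
coefficients = {
    "(Constant)": (1993.108, 239.632),
    "Tanah": (-4.472, 1.199),
    "Bangunan": (6.938, 0.619),
    "Ansilari": (4.393, 1.807),
    "Umur": (-27.893, 6.108),
    "Flo_go": (34.895, 89.440),
}


def t_ratio(b, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return b / std_error


t_values = {name: t_ratio(b, se) for name, (b, se) in coefficients.items()}

for name, t in t_values.items():
    print(f"{name:<12} t = {t:8.3f}")
```

Small differences in the third decimal place against the table are expected, because the table's B and Std. Error values are themselves rounded.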
Which one to use?
Nature of research
* Descriptive in nature?
* Attempts to “infer”, “predict”, find “cause-and-effect”,
“influence”, “relationship”?
* Is it both?
Research design (incl. variables involved). E.g.
Outputs/results expected
* research issue
* research questions
* research hypotheses
“Effects” of KLIA on the development of Sepang
* Likert scaling based on interviews
* Descriptive analysis based on ex-ante and ex-post experimental investigation
Note: Likert scaling cannot show “cause-and-effect” phenomena!
Basic concepts
Central tendency
Variability
Probability
Statistical Modelling
Basic Concepts
Population: the whole set of a “universe”
Sample: a sub-set of a population
Parameter: an unknown but “fixed” value of a population characteristic
Statistic: a known/calculable value of a sample characteristic
representing that of the population. E.g.
μ = mean of population, x̄ = mean of sample
Thus, x̄ = 96/12 = 8
Central Tendency–“Mean of Grouped Data”
House rental or prices in the PMR are frequently
tabulated as a range of values. E.g.
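When values are tabulated in ranges, the mean is commonly estimated as Σ(f × midpoint) / Σf, where f is the frequency of each class. The sketch below applies that rule; the rental classes and counts are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical grouped data: (lower bound, upper bound, number of houses).
# These classes and counts are illustrative only, not taken from the slides.
rental_classes = [
    (200, 300, 5),
    (300, 400, 12),
    (400, 500, 8),
]


def grouped_mean(classes):
    """Estimate the mean as sum(f * class midpoint) / sum(f)."""
    total_frequency = sum(f for _, _, f in classes)
    weighted_sum = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return weighted_sum / total_frequency


mean_rental = grouped_mean(rental_classes)
print(f"Estimated mean rental: RM {mean_rental:.2f}")
```

The result is an estimate, since every observation in a class is treated as if it sat at the class midpoint.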
“Probability Distribution”
Defined by its probability density function (pdf).
Many types: Z, t, F, gamma, etc.
“God-given” nature of the real world event.
General form: ∫ f(x) dx = 1 (continuous)
Σ p(x) = 1 (discrete)
E.g.
“Probability Distribution” (contd.)
Sum of two dice (Dice1 across, Dice2 down):
Dice2 \ Dice1   1   2   3   4   5   6
1               2   3   4   5   6   7
2               3   4   5   6   7   8
3               4   5   6   7   8   9
4               5   6   7   8   9   10
5               6   7   8   9   10  11
6               7   8   9   10  11  12
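The table of dice sums can be turned into a discrete probability distribution by counting how often each sum occurs among the 36 equally likely outcomes. A minimal sketch, assuming two fair six-sided dice:

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 (Dice1, Dice2) pairs produce each sum.
sum_counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

# Convert counts to exact probabilities.
distribution = {
    total: Fraction(count, 36) for total, count in sorted(sum_counts.items())
}

for total, probability in distribution.items():
    print(f"P(sum = {total:2d}) = {probability}")
```

The probabilities add up to exactly 1, which is the defining property of a discrete distribution.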
“Probability Distribution” (contd.)
[Figure: histogram of Rental (RM/sq. ft.); Mean = 4.0628, Std. Dev. = 1.70319, N = 32]
“Probability Distribution” (contd.)
* Bell-shaped, symmetrical
* Has a function of
  f(x) = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²))
  where
  μ = mean of variable x
  σ = std. dev. of x
  π = ratio of circumference of a circle to its diameter ≈ 3.14
  e = base of natural log ≈ 2.71828
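The normal density, f(x) = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²)), can be evaluated directly from μ and σ. The following is a minimal sketch; the function name `normal_pdf` and the standard-normal inputs are choices made for this illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Illustrative evaluation with mu = 0 and sigma = 1.
peak = normal_pdf(0.0, 0.0, 1.0)
left = normal_pdf(-1.0, 0.0, 1.0)
right = normal_pdf(1.0, 0.0, 1.0)
print(f"f(0) = {peak:.4f}, f(-1) = {left:.4f}, f(1) = {right:.4f}")
```

The density is highest at x = μ and takes equal values at equal distances on either side of it, which is the bell shape and symmetry noted above.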
“Probability distribution”
Note: Σ p(AGE = age) ≠ 1
How to turn this graph into
a probability distribution
function (p.d.f.)?
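One common way to turn a frequency chart into a probability distribution is to divide each class count by the total count, so that the values sum to 1. The counts below are hypothetical and only show the mechanics.

```python
# Hypothetical number of buyers per age category (illustrative only).
age_counts = {
    "20-24": 4,
    "25-29": 9,
    "30-34": 12,
    "35-39": 10,
    "40-44": 5,
}

total_buyers = sum(age_counts.values())

# Relative frequency of each category: count / total.
age_probabilities = {
    category: count / total_buyers for category, count in age_counts.items()
}

for category, probability in age_probabilities.items():
    print(f"p(AGE = {category}) = {probability:.3f}")
```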
Standardise X using Z = (X − μ)/σ, i.e.
When X = μ, Z = 0
When X = μ + σ, Z = 1
When X = μ + 2σ, Z = 2
When X = μ + 3σ, Z = 3 and so on.
It can be proven that P(X1 < X < Xk) = P(Z1 < Z < Zk)
The SND table shows the probability to the right of any
particular value of Z.
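The conversion to Z and the right-tail probability can be sketched in a few lines. This version uses the complementary error function from Python's standard library instead of a printed table; the values μ = 100 and σ = 15 are hypothetical.

```python
import math


def z_score(x, mu, sigma):
    """Standardise x: subtract the mean and divide by the std. dev."""
    return (x - mu) / sigma


def upper_tail(z):
    """P(Z > z) for the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical mean and standard deviation, for illustration only.
mu, sigma = 100.0, 15.0
for x in (mu, mu + sigma, mu + 2 * sigma, mu + 3 * sigma):
    z = z_score(x, mu, sigma)
    print(f"X = {x:6.1f}  Z = {z:.0f}  P(Z > z) = {upper_tail(z):.4f}")
```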
Example
Normal distribution…Questions
Your sample found that the mean price of “affordable” homes in Johor
Bahru, Y, is RM 155,000 with a variance of RM 3.8 × 10^7. On the basis of a
normality assumption, how sure are you that:
Answer (a):
P(Y ≤ 160,000) = P(Z ≤ (160,000 − 155,000) / √(3.8 × 10^7))
               = P(Z ≤ 0.811)
Using the Z-table, P(Z > 0.811) ≈ 0.2087, so the required probability is:
1 − 0.2087 = 0.7913
Always remember: to convert to SND, subtract the mean and divide by the std. dev.
Normal distribution…Questions
Answer (b):
Z1 = (X1 − μ)/σ = (145,000 − 155,000) / √(3.8 × 10^7) = −1.622
Z2 = (X2 − μ)/σ = (160,000 − 155,000) / √(3.8 × 10^7) = 0.811
P(Z < −1.622) ≈ 0.0524; P(Z > 0.811) ≈ 0.2087
P(145,000 < Y < 160,000)
= 1 − (0.0524 + 0.2087)
= 0.7389
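Table look-ups are easy to misread, so it can help to cross-check the two answers numerically. The sketch below uses the mean and variance given in the question and computes the standard normal CDF with `math.erfc` in place of the Z-table; the variable names are chosen for this illustration.

```python
import math

mean_price = 155_000.0   # RM, from the question
variance = 3.8e7         # from the question
std_dev = math.sqrt(variance)


def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Convert both price limits to Z values.
z_lower = (145_000 - mean_price) / std_dev
z_upper = (160_000 - mean_price) / std_dev

# (a) P(Y <= 160,000); (b) P(145,000 < Y < 160,000).
p_a = standard_normal_cdf(z_upper)
p_b = standard_normal_cdf(z_upper) - standard_normal_cdf(z_lower)

print(f"Z1 = {z_lower:.3f}, Z2 = {z_upper:.3f}")
print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.4f}")
```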
Normal distribution…Questions
You are told by a property consultant that the
average rental for a shop house in Johor Bahru is
RM 3.20 per sq. ft. After searching, you discovered
the following rental data:
Similar to Z-distribution:
* t(0, σ), but σ → 1 as n → ∞
* −∞ < t < +∞
* Flatter with thicker tails
* As n → ∞, t(0, σ) → N(0, 1)
* Has a function of
  f(t) = Γ((v + 1)/2) / [√(vπ) · Γ(v/2)] · (1 + t²/v)^(−(v + 1)/2)
  where Γ = gamma function; v = n − 1 = d.o.f.; π ≈ 3.1416
* Probability calculation requires information on
d.o.f.
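The thicker tails and the convergence to the normal curve can be seen by evaluating the t density, f(t) = Γ((v + 1)/2) / [√(vπ) · Γ(v/2)] · (1 + t²/v)^(−(v + 1)/2), for a few degrees of freedom. A minimal sketch using only the standard library; the chosen values of v are illustrative.

```python
import math


def t_pdf(t, v):
    """Density of Student's t distribution with v degrees of freedom."""
    coefficient = math.gamma((v + 1) / 2) / (
        math.sqrt(v * math.pi) * math.gamma(v / 2)
    )
    return coefficient * (1 + t * t / v) ** (-(v + 1) / 2)


def standard_normal_pdf(z):
    """Density of N(0, 1), for comparison."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


for v in (1, 5, 30, 100):
    print(f"v = {v:3d}: f(0) = {t_pdf(0.0, v):.4f}, f(3) = {t_pdf(3.0, v):.4f}")
print(
    f"N(0,1):  f(0) = {standard_normal_pdf(0.0):.4f}, "
    f"f(3) = {standard_normal_pdf(3.0):.4f}"
)
```

For small v the density at t = 3 is noticeably larger than the normal density there, and the gap narrows as v grows.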
“Student’s t-Distribution”
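* Defined by the cumulative distribution function
  F_r(t) = 1/2 + (1/2) · [1 − I(r/(r + t²); r/2, 1/2)] · sgn(t)
  where r ≡ n − 1 is the number of degrees of freedom, −∞ < t < ∞, Γ(z) is the gamma function,
  B(a, b) is the beta function, and I(z; a, b) is the regularized beta function defined by
  I(z; a, b) = B(z; a, b) / B(a, b)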
Forms of “statistical” relationship
Correlation
Contingency
Cause-and-effect
* Causal
* Feedback
* Multi-directional
* Recursive
The last two categories are normally dealt with
through regression
Correlation
“Co-exist”. E.g.
* left shoe & right shoe, sleep & lying down, food & drink
Indicates “some” co-existence relationship. E.g.
* Linearly associated (-ve or +ve)
* Co-dependent, independent
Formula:
r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]
But it has nothing to do with a cause-and-effect (C-A-E) relationship!
Example: After a field survey, you have the following
data on the distance to work and distance to the city
of residents in J.B. area. Interpret the results?
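A correlation coefficient for data of this kind can be computed with Pearson's formula, r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]. The distances below are hypothetical stand-ins for the survey data, so only the procedure, not the result, carries over.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical distances in km for six residents (illustrative only).
distance_to_work = [2.0, 5.0, 8.0, 12.0, 15.0, 20.0]
distance_to_city = [18.0, 16.0, 11.0, 9.0, 6.0, 3.0]

r = pearson_r(distance_to_work, distance_to_city)
print(f"r = {r:.3f}")
```

A value near −1 would mean that residents who live close to work tend to live far from the city, and a value near +1 the opposite; neither says anything about which one causes the other.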
Contingency
A form of “conditional” co-existence:
* If X, then, NOT Y; if Y, then, NOT X
* If X, then, ALSO Y
* E.g.
+ if they choose to live close to workplace,
then, they will stay away from city
+ if they choose to live close to city, then, they
will stay away from workplace
+ they will stay close to both workplace and city
Correlation and regression – matrix approach
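In matrix form, the least-squares coefficients of a linear regression are b = (X'X)⁻¹X'y, where X holds a column of ones and the explanatory variable. The sketch below applies this to the PRICE and SQ. M OF FLOOR figures listed in Q1 of the exercises; the helper functions are written for this illustration and only handle the two-coefficient case.

```python
# Data from Q1 of the exercises: price (RM '000) and floor area (sq. m).
price = [130, 137, 128, 390, 140, 241, 342, 143]
floor_area = [135, 140, 100, 360, 175, 270, 200, 170]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(row) for row in zip(*matrix)]


def matmul(a, b):
    """Multiply matrix a by matrix b (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(matrix):
    """Invert a 2 x 2 matrix using the determinant formula."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Design matrix X (constant term plus floor area) and response vector y.
X = [[1.0, float(area)] for area in floor_area]
y = [[float(p)] for p in price]

# b = (X'X)^-1 X'y
Xt = transpose(X)
beta = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, y))
intercept, slope = beta[0][0], beta[1][0]

print(f"PRICE = {intercept:.3f} + {slope:.4f} x FLOOR")
```

With more explanatory variables the same formula applies, but X'X becomes larger and a general matrix inverse or linear solver is needed in place of `inverse_2x2`.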
Test yourselves!
Q1: Calculate the mean and variance of the following data:
PRICE - RM ‘000   130  137  128  390  140  241  342  143
SQ. M OF FLOOR    135  140  100  360  175  270  200  170
Q2: Calculate the mean price of the following low-cost houses, in various
localities across the country:
Q5: Find:
P(AGE > “30-34”)
P(AGE ≤ “20-24”)
P(“35-39” ≤ AGE < “50-54”)
Test yourselves!
Q6: You are asked by a property marketing manager to ascertain whether
or not distance to work and distance to the city are “equally” important
factors influencing people’s choice of house location.
You are given the following data for the purpose of testing:
(a) Set your research design and data analysis procedure to address
the research issue
(b) Test your hypothesis that low-income tenants do not perceive
“quality of life” to be important in paying their house rentals.
Thank you