This document provides an overview of key concepts in probability and causality relevant for understanding randomized experiments. It introduces potential outcomes notation to represent causal effects and discusses how random assignment addresses selection bias by making treatment status independent of potential outcomes. Randomized experiments allow for an unbiased estimate of the average causal effect, though their results may not be externally valid in other contexts. Observational data requires methods to approximate experimental conditions and reduce selection bias.
J. Angrist
Preliminaries
Reading: MHE, Chapters 1-2

Ultimately, we're interested in measuring causal relationships. Alas, we have to pay some prob & stats dues before we learn how. But causality is a big and deep concept, so we should start thinking about it now.

We make sense of causal relationships using potential outcomes. These capture "what ifs" about the world. For example,

Y1i = my health if I go to the hospital
Y0i = my health if I stay away

(Here, we're using an explicit notation for potential outcomes. Sometimes we'll keep this in the background.)

My friend Mike, who runs emergency medicine at Hartford Hospital, describes the causal effect of hospitalization like this: "People come to the ER and they want to be admitted. They figure they'll just get admitted to the hospital and we'll take over and make them better. They don't realize that the hospital can be a pretty dangerous place. Unless you're really sick, you're really better off going home."

How does Y1i compare with Y0i? We can never know for sure, so we try to look at expectations or averages, where Di = 1 indicates hospitalization and Di = 0 staying away:

E[Y1i - Y0i | Di = 1] = E[Y1i | Di = 1] - E[Y0i | Di = 1]

(In general, "E[Yi | Xi]" means the population average of a random variable, Yi, holding the random variable Xi fixed.)

The table on page 13 of MHE shows some British data on hospitalization and health. Taken at face value, these data suggest Mike is right. The problem with a causal interpretation of this table is selection bias.
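Before stating this formally, here is a minimal simulation sketch of the selection problem (not from the notes; the health index, the 0.5 hospital effect, and the selection rule are all invented for illustration). Sicker people are more likely to be hospitalized, so the raw comparison of hospitalized and non-hospitalized health can look nothing like the true average causal effect, while random assignment recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical potential outcomes: a health index without and with hospitalization.
y0 = rng.normal(loc=3.0, scale=1.0, size=n)   # Y0i: health if I stay away
y1 = y0 + 0.5                                 # Y1i: hospitalization helps everyone by 0.5

# Selection on health: the sicker you are (low y0), the likelier you go to the hospital.
d_selected = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(y0 - 3.0))).astype(int)

# Random assignment: a coin flip, independent of (Y1i, Y0i).
d_random = rng.integers(0, 2, size=n)

def observed_comparison(d, y1, y0):
    """E[y | D=1] - E[y | D=0], computed from the observed outcome y."""
    y = np.where(d == 1, y1, y0)
    return y[d == 1].mean() - y[d == 0].mean()

print("average causal effect:   ", (y1 - y0).mean())                    # 0.5 by construction
print("observational comparison:", observed_comparison(d_selected, y1, y0))
print("randomized comparison:   ", observed_comparison(d_random, y1, y0))
```

With these made-up numbers the observational comparison comes out negative even though hospitalization helps everyone in the simulation, the same face-value pattern as in the hospitalization table. The decomposition below formalizes why.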
Let yi denote the observed outcome (say, an index of health). Then we have:

E[yi | Di = 1] - E[yi | Di = 0]
  = E[Y1i - Y0i | Di = 1] + { E[Y0i | Di = 1] - E[Y0i | Di = 0] }
  = the average causal effect on the hospitalized + selection bias

Hospitalization may help you or hurt you (on average), but as a rule, it's the sick who seek treatment. Random assignment of Di fixes selection bias because Di and Y0i are then independent. Experiments are therefore said to have "internal validity," though they may not have "external validity," that is, predictive value for a time or context other than the one evaluated.

You can't randomize everything, of course (perhaps not hospitalization for routine medical complaints), but you can try to use the data you have (or collect more) in an effort to come close to the desired experiment. The details of how this is done are what most of econometrics is about.

Lecture Note 1: Probability and Distribution
Reading: Wooldridge, Appendices A and B

A. Probability

"A system for quantifying chance and making predictions about future events."

Concepts

Sample space: S = {a1, a2, a3, ..., aJ}, the basic elements of the experiment.
Example: toss two coins (J = 4). (To make this interesting, we could place bets.)

Random variable: X(a), the data. A function that assigns numerical values to events.
Example: the number of heads in two coin tosses.

Probability: a function defined over events or random variables. When defined over events, probability satisfies the axioms

0 ≤ P(A) ≤ 1
P(S) = 1
P(∪_j A_j) = Σ_j P(A_j) for disjoint events A_j

and has the properties

P(∅) = 0
P(A^c) = 1 - P(A)
A ⊂ B implies P(A) ≤ P(B)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

When we write P(x) for a discrete r.v., this is shorthand for P(the union of all events a_j such that X(a_j) = x). For a continuous r.v., we write P(X ≤ x) to mean P(the union of all events a_j such that X(a_j) ≤ x).

But what is probability, really?
- The relative frequency of an event in many (→ ∞) repeated trials.
- A personal, subjective assessment of the likelihood of an event, where the assessment obeys the axioms of probability.

Conditional probability: P(A | B) ≡ P(A ∩ B)/P(B). Conditional probability has the properties of, and obeys the axioms of, probability.

Bayes' rule: let {C_i; i = 1, ..., I} be a partition of the sample space. Then

P(C_i | A) = P(A | C_i) P(C_i) / Σ_i P(A | C_i) P(C_i)

Proof: use P(C_i | A) = P(A | C_i) P(C_i) / P(A) and the fact that {C_i; i = 1, ..., I} is a partition. Bayes' rule is useful for reversing conditional probability statements (a numerical sketch follows just below).

Independence: A is said to be independent of B iff P(A ∩ B) = P(A) P(B). Sometimes we write A ⊥ B.
Note: A ⊥ B implies P(A | B) = P(A).
Note: r.v.'s are independent if their distribution or density functions factor (more below).
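As a quick illustration of reversing a conditional probability, here is a numerical sketch of Bayes' rule with a two-element partition. The testing scenario and all of its probabilities are made up for illustration.

```python
# Bayes' rule with the partition {C1 = sick, C2 = healthy}; A = "test positive".
p_sick = 0.01                   # P(C1), the base rate
p_healthy = 1.0 - p_sick        # P(C2)
p_pos_given_sick = 0.95         # P(A | C1)
p_pos_given_healthy = 0.05      # P(A | C2)

# Denominator: P(A) = sum_i P(A | Ci) P(Ci), since {C1, C2} is a partition.
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * p_healthy

# Bayes' rule reverses the conditioning: P(C1 | A) = P(A | C1) P(C1) / P(A).
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(f"P(sick | positive) = {p_sick_given_pos:.3f}")   # about 0.16 despite the 95% hit rate
```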
B. Distribution and density functions (how we characterize r.v.'s)

For the rest of the course, our probability statements will apply directly to r.v.'s.

1. Discrete random variables
Empirical distribution functions. Example: years of schooling.
Probability mass function (pmf). Parametric examples: Bernoulli, binomial, multinomial, geometric.
Cumulative distribution function (cdf) for a discrete r.v.: obtain by summation.

2. Continuous random variables
Probability density functions (pdf). Note: P(X = x) = 0 at any single point x.
Parametric examples: uniform, exponential, normal; empirical pdfs of students' grades.
Cumulative distribution function for a continuous r.v.: obtain by integration (a numerical sketch appears at the end of this section).

P(X ≤ c) = F(c) = ∫_{-∞}^{c} f(t) dt
P(a < X ≤ b) = ∫_{a}^{b} f(t) dt = F(b) - F(a)
P(X > c) = 1 - F(c)

Relationship between cdf and pdf: F'(x) = f(x).

3. Functions of random variables
Mantra: a function of a random variable is a random variable and therefore has a distribution.

Discrete r.v.: Y = r(X) with P(X = x_j) = f(x_j); then g(y) = P[r(X) = y] = Σ_{x: r(x) = y} f(x).

Continuous r.v. examples:
(i) Y = ln X, with X ~ F. Then G(y) = P(Y ≤ y) = P(ln X ≤ y) = P(X ≤ e^y) = F(e^y), so g(y) = f(e^y) e^y.
(ii) Y = 1/X, with X > 0. Then G(y) = P(Y ≤ y) = P(1/X ≤ y) = P(X ≥ 1/y) = 1 - F(1/y), so g(y) = -f(1/y)(-y^{-2}) = f(1/y)/y^2.

In general, for Y = r(X) and X = r^{-1}(Y) ≡ s(Y), where r is continuous and invertible:
If r is increasing, G(y) = F[s(y)] and g(y) = f[s(y)] s'(y).
If r is decreasing, G(y) = 1 - F[s(y)] and g(y) = -f[s(y)] s'(y).

Important special case: Y = r(X) = a + bX with b > 0, so X = (Y - a)/b = s(Y). Then G(y) = F[(y - a)/b] and g(y) = f[(y - a)/b](1/b). Standardize the r.v. X by setting a = -E(X)/σ_X and b = 1/σ_X, so that a + bX = (X - E(X))/σ_X.
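As a numerical check of the change-of-variables formula above, the sketch below (my own choice of example, with X ~ Exponential(1) so f(x) = e^{-x}) simulates Y = ln X and compares the simulated density of Y with g(y) = f(e^y) e^y.

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ Exponential(1), so f(x) = exp(-x) for x > 0.  Y = r(X) = ln X is increasing,
# with inverse X = s(Y) = exp(Y), so the formula gives g(y) = f(exp(y)) * exp(y).
x = rng.exponential(scale=1.0, size=1_000_000)
y = np.log(x)

def g(point):
    ey = np.exp(point)
    return np.exp(-ey) * ey                      # f(e^y) * e^y

# Estimate the density of Y from the simulation and compare it with g at a few points.
hist, edges = np.histogram(y, bins=400, range=(-6.0, 3.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for point in (-2.0, -1.0, 0.0, 1.0):
    simulated = hist[np.argmin(np.abs(centers - point))]
    print(f"y = {point:5.1f}: simulated {simulated:.4f}, formula {g(point):.4f}")
```

The same recipe with s(y) = (y - a)/b gives the a + bX special case used for standardization.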
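Looking back at the cdf formulas above, here is the promised sketch contrasting summation (discrete case) and integration (continuous case). The binomial and standard normal choices, and the use of scipy, are my own and are illustrative only.

```python
import numpy as np
from scipy import stats, integrate

# Discrete case: X = number of heads in two fair coin tosses, X ~ Binomial(2, 0.5).
ks = np.arange(3)
pmf = stats.binom.pmf(ks, 2, 0.5)                       # [0.25, 0.5, 0.25]
print("discrete   P(X <= 1), summing the pmf:    ", pmf[ks <= 1].sum())
print("discrete   P(X <= 1), from the cdf:       ", stats.binom.cdf(1, 2, 0.5))

# Continuous case: X ~ N(0, 1); the cdf is the integral of the pdf.
c = 1.0
area, _ = integrate.quad(stats.norm.pdf, -np.inf, c)
print("continuous P(X <= 1), integrating the pdf:", area)
print("continuous P(X <= 1), from the cdf:       ", stats.norm.cdf(c))

# A single point carries no probability for a continuous r.v.: P(X = c) = F(c) - F(c) = 0.
print("continuous P(X in a tiny interval around 1):",
      stats.norm.cdf(c + 1e-8) - stats.norm.cdf(c - 1e-8))
```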
C. Bivariate distribution functions: how r.v.'s move together

For discrete r.v.'s: f(x, y) = P(X = x, Y = y).
For continuous r.v.'s: f(x, y) is the joint density. Probability statements for jointly continuous r.v.'s use the cdf:

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f(s, t) dt ds

Marginal distributions
Marginal for X: f_1(x); obtain by integrating the joint density (or summing the joint pmf) over Y.
Marginal for Y: f_2(y); obtain by integrating the joint density (or summing the joint pmf) over X.

Conditional distributions
Divide the joint density or pmf by the marginal density or pmf:
f_2(y | x) = f(x, y)/f_1(x); f_1(x | y) = f(x, y)/f_2(y)

Example: joint normal. The marginal and conditional distributions are also normal.

f(x, y) = [(2π)^2 (1 - ρ^2)]^{-1/2} exp{ -[(x - μ_x)^2 - 2ρ(x - μ_x)(y - μ_y) + (y - μ_y)^2] / [2(1 - ρ^2)] }

Here X and Y are normally distributed with means μ_x and μ_y, standard deviations equal to 1, and correlation ρ.

Example: roof distribution. f(x, y) = x + y for 0 < x < 1 and 0 < y < 1. Then
f_1(x) = x + 1/2
f_2(y) = y + 1/2
f_2(y | x) = 2(x + y)/(2x + 1)

D. Example: the effect of a wage voucher (Burtless, 1985)
Simple conditional distributions for Bernoulli outcomes in a randomized trial.
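To connect the Burtless example back to the randomized-experiment logic at the start of these notes, here is a minimal sketch with invented numbers (it does not reproduce the actual Burtless (1985) data). With a Bernoulli outcome such as employment and a randomly assigned voucher, the conditional distributions given treatment status are just two employment rates, and their difference estimates the average causal effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical randomized trial: Di = 1 if the job seeker is given a wage voucher.
d = rng.integers(0, 2, size=n)

# Bernoulli (employed / not employed) outcome with invented success probabilities.
p_employed_if_voucher = 0.55
p_employed_if_no_voucher = 0.60
p = np.where(d == 1, p_employed_if_voucher, p_employed_if_no_voucher)
y = (rng.uniform(size=n) < p).astype(int)

# The conditional distribution of a Bernoulli outcome given Di is summarized by a mean;
# with random assignment, the difference in means estimates the average causal effect.
rate_voucher = y[d == 1].mean()
rate_no_voucher = y[d == 0].mean()
print(f"employment rate with voucher:    {rate_voucher:.3f}")
print(f"employment rate without voucher: {rate_no_voucher:.3f}")
print(f"estimated average causal effect: {rate_voucher - rate_no_voucher:.3f}")
```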