The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), Adam's law,[3] the tower rule,[4] and the smoothing theorem,[5] among other names, states that if $X$ is a random variable whose expected value $\operatorname{E}(X)$ is defined, and $Y$ is any random variable on the same probability space, then

$$\operatorname{E}(X)=\operatorname{E}(\operatorname{E}(X\mid Y)),$$

i.e., the expected value of the conditional expected value of $X$ given $Y$ is the same as the expected value of $X$.
The conditional expected value $\operatorname{E}(X\mid Y)$, with $Y$ a random variable, is not a simple number; it is a random variable whose value depends on the value of $Y$. That is, the conditional expected value of $X$ given the event $Y=y$ is a number and it is a function of $y$. If we write $g(y)$ for the value of $\operatorname{E}(X\mid Y=y)$, then the random variable $\operatorname{E}(X\mid Y)$ is $g(Y)$.
One special case states that if $\{A_i\}_i$ is a finite or countable partition of the sample space, then

$$\operatorname{E}(X)=\sum_i \operatorname{E}(X\mid A_i)\operatorname{P}(A_i).$$
Suppose that only two factories supply light bulbs to the market. Factory $X$'s bulbs work for an average of 5000 hours, whereas factory $Y$'s bulbs work for an average of 4000 hours. It is known that factory $X$ supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?
Applying the law of total expectation, we have:

$$\begin{aligned}\operatorname{E}(L)&=\operatorname{E}(L\mid X)\operatorname{P}(X)+\operatorname{E}(L\mid Y)\operatorname{P}(Y)\\&=5000(0.6)+4000(0.4)\\&=4600\end{aligned}$$

where

- $\operatorname{E}(L)$ is the expected life of the bulb;
- $\operatorname{P}(X)=0.6$ is the probability that the purchased bulb was manufactured by factory $X$;
- $\operatorname{P}(Y)=0.4$ is the probability that the purchased bulb was manufactured by factory $Y$;
- $\operatorname{E}(L\mid X)=5000$ is the expected lifetime of a bulb manufactured by $X$;
- $\operatorname{E}(L\mid Y)=4000$ is the expected lifetime of a bulb manufactured by $Y$.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
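The arithmetic above is the partition form of the law applied to the two cells "made by $X$" and "made by $Y$"; a minimal sketch in Python (the variable names are illustrative, not part of the example):

```python
# Law of total expectation over a two-cell partition (factory X vs. factory Y):
# E(L) = E(L | X) P(X) + E(L | Y) P(Y)
p = {"X": 0.6, "Y": 0.4}            # P(factory)
mean_life = {"X": 5000, "Y": 4000}  # E(L | factory), in hours

expected_life = sum(mean_life[f] * p[f] for f in p)
print(expected_life)  # 4600.0
```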
When a joint probability density function is well defined and the expectations are integrable, we write for the general case

$$\operatorname{E}(X)=\iint x\,f_{X,Y}(x,y)\,dx\,dy=\int\left(\int x\,f_{X\mid Y}(x\mid y)\,dx\right)f_Y(y)\,dy=\int\operatorname{E}(X\mid Y=y)\,f_Y(y)\,dy=\operatorname{E}(\operatorname{E}(X\mid Y)),$$

using $f_{X,Y}(x,y)=f_{X\mid Y}(x\mid y)\,f_Y(y)$.
A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable Y be the function of the sample space that assigns a cell's label to each point in that cell.
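The identity can also be checked by simulation: draw $Y$, then draw $X$ conditionally on $Y$, and compare the overall mean of $X$ with the mean of the conditional means. A hedged sketch, assuming for illustration that $Y\sim\mathrm{Uniform}(0,1)$ and $X\mid Y=y\sim\mathcal{N}(y,1)$, so that $\operatorname{E}(X\mid Y)=Y$ and both sides should approximate $\operatorname{E}(Y)=0.5$:

```python
import random

random.seed(0)
N = 100_000

ys = [random.random() for _ in range(N)]   # Y ~ Uniform(0, 1)
xs = [random.gauss(y, 1.0) for y in ys]    # X | Y = y ~ Normal(y, 1)

mean_x = sum(xs) / N      # Monte Carlo estimate of E(X)
mean_cond = sum(ys) / N   # here E(X | Y) = Y, so this estimates E(E(X | Y))

print(round(mean_x, 2), round(mean_cond, 2))  # both near 0.5
```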
Proof in the general case
Let $(\Omega,\mathcal{F},\operatorname{P})$ be a probability space on which two sub σ-algebras $\mathcal{G}_1\subseteq\mathcal{G}_2\subseteq\mathcal{F}$ are defined. For a random variable $X$ on such a space, the smoothing law states that if $\operatorname{E}[X]$ is defined, i.e. $\min(\operatorname{E}[X_+],\operatorname{E}[X_-])<\infty$, then

$$\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]=\operatorname{E}[X\mid\mathcal{G}_1]\quad\text{(a.s.)}.$$
Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

- $\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]$ is $\mathcal{G}_1$-measurable
- $\int_{G_1}\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\,d\operatorname{P}=\int_{G_1}X\,d\operatorname{P}$, for all $G_1\in\mathcal{G}_1$.
The first of these properties holds by definition of the conditional expectation. To prove the second one,
$$\begin{aligned}\min\left(\int_{G_1}X_+\,d\operatorname{P},\int_{G_1}X_-\,d\operatorname{P}\right)&\leq\min\left(\int_\Omega X_+\,d\operatorname{P},\int_\Omega X_-\,d\operatorname{P}\right)\\&=\min(\operatorname{E}[X_+],\operatorname{E}[X_-])<\infty,\end{aligned}$$

so the integral $\int_{G_1}X\,d\operatorname{P}$ is defined (not equal to $\infty-\infty$).

The second property thus holds since $G_1\in\mathcal{G}_1\subseteq\mathcal{G}_2$ implies

$$\int_{G_1}\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\,d\operatorname{P}=\int_{G_1}\operatorname{E}[X\mid\mathcal{G}_2]\,d\operatorname{P}=\int_{G_1}X\,d\operatorname{P}.$$
Corollary. In the special case when $\mathcal{G}_1=\{\emptyset,\Omega\}$ and $\mathcal{G}_2=\sigma(Y)$, the smoothing law reduces to

$$\operatorname{E}[\operatorname{E}[X\mid Y]]=\operatorname{E}[X].$$
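The corollary can also be seen empirically when $Y$ is discrete: estimate $\operatorname{E}[X\mid Y=y]$ by averaging the samples within each group of equal $Y$, then average those conditional means with weights $\operatorname{P}(Y=y)$. A sketch under assumed toy distributions (a die roll $Y$ with $X\mid Y=y\sim\mathrm{Uniform}(0,y)$, so $\operatorname{E}[X\mid Y]=Y/2$ and $\operatorname{E}[X]=1.75$):

```python
import random
from collections import defaultdict

random.seed(1)
N = 200_000

# Y is a die roll; X | Y = y ~ Uniform(0, y), so E[X | Y] = Y / 2.
samples = [(y := random.randint(1, 6), random.uniform(0, y)) for _ in range(N)]

# Empirical E[X | Y = y]: average X within each group of equal Y.
groups = defaultdict(list)
for y, x in samples:
    groups[y].append(x)
cond_mean = {y: sum(v) / len(v) for y, v in groups.items()}

# Tower property: weighting the conditional means by P(Y = y)
# recovers the overall mean of X (exactly, by construction).
lhs = sum(x for _, x in samples) / N
rhs = sum(cond_mean[y] * (len(groups[y]) / N) for y in groups)
print(round(lhs, 3), round(rhs, 3))  # both near E[Y] / 2 = 1.75
```

The two averages agree exactly up to floating point, since grouping and re-weighting merely re-orders the overall sum; the law of total expectation is the population version of this bookkeeping identity.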
Alternative proof for $\operatorname{E}[\operatorname{E}[X\mid Y]]=\operatorname{E}[X]$

This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, $\operatorname{E}[X\mid Y]$ is a $\sigma(Y)$-measurable random variable that satisfies

$$\int_A\operatorname{E}[X\mid Y]\,d\operatorname{P}=\int_A X\,d\operatorname{P},$$

for every measurable set $A\in\sigma(Y)$. Taking $A=\Omega$ proves the claim.