The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), Adam's law,[3] the tower rule,[4] and the smoothing theorem,[5] among other names, states that if $X$ is a random variable whose expected value $\operatorname{E}(X)$ is defined, and $Y$ is any random variable on the same probability space, then

$$\operatorname{E}(X)=\operatorname{E}(\operatorname{E}(X\mid Y)),$$

i.e., the expected value of the conditional expected value of $X$ given $Y$ is the same as the expected value of $X$.
The conditional expected value $\operatorname{E}(X\mid Y)$, with $Y$ a random variable, is not a simple number; it is a random variable whose value depends on the value of $Y$. That is, the conditional expected value of $X$ given the event $Y=y$ is a number and it is a function of $y$. If we write $g(y)$ for the value of $\operatorname{E}(X\mid Y=y)$, then the random variable $\operatorname{E}(X\mid Y)$ is $g(Y)$.
One special case states that if $\{A_i\}_i$ is a finite or countable partition of the sample space, then

$$\operatorname{E}(X)=\sum_i \operatorname{E}(X\mid A_i)\operatorname{P}(A_i).$$
Suppose that only two factories supply light bulbs to the market. Factory $X$'s bulbs work for an average of 5000 hours, whereas factory $Y$'s bulbs work for an average of 4000 hours. It is known that factory $X$ supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?
Applying the law of total expectation, we have:

$$\begin{aligned}\operatorname{E}(L)&=\operatorname{E}(L\mid X)\operatorname{P}(X)+\operatorname{E}(L\mid Y)\operatorname{P}(Y)\\&=5000(0.6)+4000(0.4)\\&=4600\end{aligned}$$

where

- $\operatorname{E}(L)$ is the expected life of the bulb;
- $\operatorname{P}(X)=0.6$ is the probability that the purchased bulb was manufactured by factory $X$;
- $\operatorname{P}(Y)=0.4$ is the probability that the purchased bulb was manufactured by factory $Y$;
- $\operatorname{E}(L\mid X)=5000$ is the expected lifetime of a bulb manufactured by $X$;
- $\operatorname{E}(L\mid Y)=4000$ is the expected lifetime of a bulb manufactured by $Y$.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
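The arithmetic above is the partition form of the law applied to the two cells "made by $X$" and "made by $Y$"; a minimal sketch in Python (the variable names are illustrative, not part of the example):

```python
# Law of total expectation over a two-cell partition (factory X vs. factory Y):
# E(L) = E(L | X) P(X) + E(L | Y) P(Y)
p = {"X": 0.6, "Y": 0.4}            # P(factory)
mean_life = {"X": 5000, "Y": 4000}  # E(L | factory), in hours

expected_life = sum(mean_life[f] * p[f] for f in p)
print(expected_life)  # 4600.0
```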
When a joint probability density function is well defined and the expectations are integrable, we write for the general case

$$\operatorname{E}(X)=\iint x\,f_{X,Y}(x,y)\,dx\,dy=\int\left(\int x\,f_{X\mid Y}(x\mid y)\,dx\right)f_Y(y)\,dy=\int\operatorname{E}(X\mid Y=y)\,f_Y(y)\,dy=\operatorname{E}(\operatorname{E}(X\mid Y)),$$

using $f_{X,Y}(x,y)=f_{X\mid Y}(x\mid y)\,f_Y(y)$.
A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable Y be the function of the sample space that assigns a cell's label to each point in that cell.
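The identity can also be checked by simulation: draw $Y$, then draw $X$ conditionally on $Y$, and compare the overall mean of $X$ with the mean of the conditional means. A hedged sketch, assuming for illustration that $Y\sim\mathrm{Uniform}(0,1)$ and $X\mid Y=y\sim\mathcal{N}(y,1)$, so that $\operatorname{E}(X\mid Y)=Y$ and both sides should approximate $\operatorname{E}(Y)=0.5$:

```python
import random

random.seed(0)
N = 100_000

ys = [random.random() for _ in range(N)]   # Y ~ Uniform(0, 1)
xs = [random.gauss(y, 1.0) for y in ys]    # X | Y = y ~ Normal(y, 1)

mean_x = sum(xs) / N      # Monte Carlo estimate of E(X)
mean_cond = sum(ys) / N   # here E(X | Y) = Y, so this estimates E(E(X | Y))

print(round(mean_x, 2), round(mean_cond, 2))  # both near 0.5
```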
Proof in the general case
Let $(\Omega,\mathcal{F},\operatorname{P})$ be a probability space on which two sub σ-algebras $\mathcal{G}_1\subseteq\mathcal{G}_2\subseteq\mathcal{F}$ are defined. For a random variable $X$ on such a space, the smoothing law states that if $\operatorname{E}[X]$ is defined, i.e. $\min(\operatorname{E}[X_+],\operatorname{E}[X_-])<\infty$, then

$$\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]=\operatorname{E}[X\mid\mathcal{G}_1]\quad\text{(a.s.)}.$$
Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

- $\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]$ is $\mathcal{G}_1$-measurable
- $\int_{G_1}\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\,d\operatorname{P}=\int_{G_1}X\,d\operatorname{P}$, for all $G_1\in\mathcal{G}_1$.
The first of these properties holds by definition of the conditional expectation. To prove the second one,
$$\begin{aligned}\min\left(\int_{G_1}X_+\,d\operatorname{P},\int_{G_1}X_-\,d\operatorname{P}\right)&\leq\min\left(\int_\Omega X_+\,d\operatorname{P},\int_\Omega X_-\,d\operatorname{P}\right)\\&=\min(\operatorname{E}[X_+],\operatorname{E}[X_-])<\infty,\end{aligned}$$

so the integral $\int_{G_1}X\,d\operatorname{P}$ is defined (not equal to $\infty-\infty$).

The second property thus holds since $G_1\in\mathcal{G}_1\subseteq\mathcal{G}_2$ implies

$$\int_{G_1}\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\,d\operatorname{P}=\int_{G_1}\operatorname{E}[X\mid\mathcal{G}_2]\,d\operatorname{P}=\int_{G_1}X\,d\operatorname{P}.$$
Corollary. In the special case when $\mathcal{G}_1=\{\emptyset,\Omega\}$ and $\mathcal{G}_2=\sigma(Y)$, the smoothing law reduces to

$$\operatorname{E}[\operatorname{E}[X\mid Y]]=\operatorname{E}[X].$$
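The corollary can also be seen empirically when $Y$ is discrete: estimate $\operatorname{E}[X\mid Y=y]$ by averaging the samples within each group of equal $Y$, then average those conditional means with weights $\operatorname{P}(Y=y)$. A sketch under assumed toy distributions (a die roll $Y$ with $X\mid Y=y\sim\mathrm{Uniform}(0,y)$, so $\operatorname{E}[X\mid Y]=Y/2$ and $\operatorname{E}[X]=1.75$):

```python
import random
from collections import defaultdict

random.seed(1)
N = 200_000

# Y is a die roll; X | Y = y ~ Uniform(0, y), so E[X | Y] = Y / 2.
samples = [(y := random.randint(1, 6), random.uniform(0, y)) for _ in range(N)]

# Empirical E[X | Y = y]: average X within each group of equal Y.
groups = defaultdict(list)
for y, x in samples:
    groups[y].append(x)
cond_mean = {y: sum(v) / len(v) for y, v in groups.items()}

# Tower property: weighting the conditional means by P(Y = y)
# recovers the overall mean of X (exactly, by construction).
lhs = sum(x for _, x in samples) / N
rhs = sum(cond_mean[y] * (len(groups[y]) / N) for y in groups)
print(round(lhs, 3), round(rhs, 3))  # both near E[Y] / 2 = 1.75
```

The two averages agree exactly up to floating point, since grouping and re-weighting merely re-orders the overall sum; the law of total expectation is the population version of this bookkeeping identity.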
Alternative proof for $\operatorname{E}[\operatorname{E}[X\mid Y]]=\operatorname{E}[X]$

This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, $\operatorname{E}[X\mid Y]$ is a $\sigma(Y)$-measurable random variable that satisfies

$$\int_A\operatorname{E}[X\mid Y]\,d\operatorname{P}=\int_A X\,d\operatorname{P},$$

for every measurable set $A\in\sigma(Y)$. Taking $A=\Omega$ proves the claim.