
STAT0007: Introduction to Applied Probability
(Level 5) or Stochastic Processes (Level 6)
Dr. Elinor M Jones
Contents

Part I: Week 0
1 Overview of STAT0007
  1.1 Content of the course
  1.2 How this course will be run
  1.3 Useful books

Part II: Week 1
2 Preamble
  2.1 Mathematical/Statistical Writing

Part III: Week 1 (Continued)
3 Preliminaries
  3.1 What is a stochastic process?
  3.2 Random variables
  3.3 Conditioning on events
  3.4 Generating functions

Part IV: Week 2
4 Introduction to discrete time Markov processes
  4.1 The Markov property
  4.2 Introduction to discrete time Markov chains
  4.3 The idea of 'first step decomposition'

Part V: Week 3
5 Classifying discrete time Markov processes
  5.1 Irreducible classes of intercommunicating states
  5.2 Recurrence and transience
  5.3 Periodicity
  5.4 Class properties
  5.5 Further properties of discrete time Markov chains
  5.6 Closed classes and absorption

Part VI: Week 4
6 Long-run behaviour of discrete time Markov processes
  6.1 Invariant distributions
  6.2 Equilibrium distributions
  6.3 Invariant vs equilibrium distributions
  6.4 Ergodicity and the equilibrium distribution
Part I: Week 0

Chapter 1: Overview of STAT0007
1.1 Content of the course
This module aims to provide an introduction to the study of systems which
change state stochastically with time and to facilitate the development of skills
in the application of probabilistic ideas. It is primarily intended for second and
third year students registered on the undergraduate degree programmes offered
by the Department of Statistical Science, or jointly with other departments.
On successful completion of the module a student should understand the Markov property in discrete and continuous time; for discrete-time Markov chains, be able to find and classify the irreducible classes of intercommunicating states, calculate absorption or first passage times and probabilities, and assess the equilibrium behaviour; and for simple examples of continuous-time Markov chains, be able to write down the forward equations and find and interpret the equilibrium distribution.
Stochastic processes are vital to applications in finance and insurance, and have
many applications in biology and medicine, and in the social sciences. They
also play a fundamental role in areas such as queueing theory and the study
of system reliability. The material in this module can be applied to simplified
real-world situations, and provides the foundations for further study of more
complex systems.
The course will start by briefly revising conditional probability (however, you are expected to be well-versed in this before starting the course). We'll then move on to look at Markov chains (discrete time and states), and in particular their long-run behaviour. The second half of the course looks at Markov processes in continuous time, but still with a discrete state space, and again our ultimate aim is to study the long-run behaviour of these processes. We'll look at two very special types of continuous time process: Poisson processes and birth-death processes.
1.2 How this course will be run
1.2.1 Course leader
The course will be led by Dr. Elinor Jones, Associate Professor (Teaching) in
the Department of Statistical Science.
1.2.2 Level 5 and 6 versions of the course
The course is offered at two levels, but the chances are you won't be able to choose which level (this generally depends on your year of study, etc.). The content and teaching are identical between the two levels, but in the main exam period Level 6 students will complete a more challenging paper.
1.2.3 How you'll learn
You will work on the material in your own time, with live sessions to complement your learning. A typical weekly schedule will look like the following:

- Batch of work uploaded to Moodle each Monday at 9am.
- Finish the work by the following Monday.
- Each week, work will include:
  - Watch videos;
  - Read course notes;
  - Complete guided exercises;
  - Complete and submit homework each week (not in the first week);
  - Quizzes;
  - Other activities, as required.

You will also attend about three live sessions a week (the exact number depends on which week we're in).
ONLINE large group sessions (all students attend):

- Tuesday 9-10am UK time.
- Friday 9-10am UK time.

These sessions typically include:

- Overview of material;
- Time for questions;
- Going through unseen questions.

FACE-TO-FACE small group sessions:

- Tutorial (attendance is monitored).
- All students must attend their allocated session.
1.2.4 Weekly exercises
On Friday each week I will release an exercise sheet. Note: the exercise sheet is not released at the start of the week.

- Submit by the following Thursday, 10am UK time.
- Solutions will be available at 9am each Friday.
- Those with a SoRA will have until 12:01am on Friday (i.e. an additional 14 hours, according to UCL guidelines) to submit.
- NO LATE SUBMISSIONS. This rule will be strictly adhered to: no exceptions (but see the note above for students with a SoRA).
- Your work will be marked and individual feedback provided for SOME questions.
These weekly exercises will feed into the weekly tutorial (tutorials start in the third week).

- You should have been allocated to a tutorial group.
- You cannot move tutorial group unless you have a timetable clash.
- If you need to change your tutorial group, please email Ms Karen Leport.
- Please don't email me: I've no control over tutorial allocation.

During the tutorial, your tutor will go through the questions that were not marked. You should 'mark' your own work while they do this.

You will then discuss an unseen question which is connected to the work you've been doing. The question will be more involved than a typical homework question. You may be asked to do short tasks connected to the question, e.g.

- Discuss in small groups;
- Respond to live quiz questions.

Please take part: the tutorials are designed to benefit your learning.
1.2.5 Weekly structure in more depth
Just to make sure you are clear on how the weekly workload will be managed:

Week 1

- Material released on Monday 10th Jan.
- Live session Tuesday 11th Jan (welcome session only).
- Live session Friday 14th Jan (Q&A for week 1 material).
- Exercise sheet 1 released Friday 14th Jan.

Week 2

- Material released on Monday 17th Jan.
- Live session Tuesday 18th Jan (Q&A for week 1 material, continued).
- Submit exercise sheet 1 by 10am on Thursday 20th Jan (unless you have a SoRA).
- Live session Friday 21st Jan (Q&A for week 1 and 2 material).
- Exercise sheet 2 released Friday 21st Jan.
Week 3

- Material released on Monday 24th Jan.
- Live session Tuesday 25th Jan (Q&A for week 2 material, continued).
- Attend your allocated tutorial.
- Submit exercise sheet 2 by 10am on Thursday 27th Jan (unless you have a SoRA).
- Live session Friday 28th Jan (Q&A for week 2 and 3 material).
- Exercise sheet 3 released Friday 28th Jan.

And so on, and so forth. There will be some exceptions to this, e.g. when you sit an ICA for the course.
1.2.6 Getting help
You can get help in the following ways:

- Post to the Forum.
- Book an office hour slot.
- If your question is connected to the tutorial material, then ask your tutor during the tutorial. Your tutor will not answer questions outside of your tutorial slot.

Please note that I won't answer questions by email (the exception to this is if your question is more personal, e.g. organising additional time for your ICA).
1.2.7 Assessment
Assessment is split into four components:

- ICA 1: Tuesday 8th February (7.5% weight)
  - Short Moodle quiz.
  - Takes place instead of the Tuesday live session.
  - Mixture of MCQs and calculation questions.
- ICA 2: Tuesday 22nd March (7.5% weight)
  - Short Moodle quiz.
  - Takes place instead of the Tuesday live session.
  - Mixture of MCQs and calculation questions.
- Participation (10% weight)
  - Acceptable quality of weekly exercise submissions.
  - Peer exercise.
- Open book exam (75% weight)
  - Sometime in the main exam period.
  - No choice of questions.
  - Level 6 students will have more challenging questions.
1.3 Useful books
- Introduction to Probability Models by S. Ross;
- Probability and Random Processes by Grimmett and Stirzaker (more mathematical; be warned).
Part II: Week 1

Chapter 2: Preamble
2.1 Mathematical/Statistical Writing
Mathematics is a language, and you must learn its grammar. Whether you are handing in a tutorial sheet or writing an exam, you need to present your work in an appropriate manner. You could start by thinking of the following:

- Think carefully about precisely what you are trying to calculate or solve, and what information you have available in order to do so.
- Have you clearly defined all your notation? Be precise: are your definitions watertight?
- Is your notation appropriate?
- Does your solution flow logically, that is, does each step follow on logically from the previous step? Again, be precise. The examiner is not a mind reader: YOU need to guide the reader through your thought process.
- Have you explained your reasoning, not just written down some formulae?
- It is important to understand what you are doing; don't just repeat a list of steps. Always ask yourself: 'do I know what I'm doing?' and 'how would I explain this to a fellow student?'.

Being able to write clearly will help you to think clearly. It forces you to understand the material: if you can't write it well, you don't understand it well enough. Learning mathematics is not about repetition, as it was at A-Level, and so past papers are NOT A GOOD REVISION TOOL. In particular, I write ICA and exam questions that probe your understanding, and not your ability to regurgitate methods. This reflects the fact that proper mathematics is not about learning to apply a set of methods, but about using logical reasoning to find your way through a problem.
Some tips for writing in STAT0007 (based on previous cohorts of students):

- Random variables form the basis for most, if not every, question you will encounter. Make sure you define each and every random variable you use. All random variables should be assigned CAPITAL letters.
- Probabilities should be written out in full: if you want to calculate $P(X = 1)$ for some random variable $X$, then write it as such. NEVER use notation such as $P(1)$ to denote this. And, as the previous bullet point states, writing $P(x = 1)$, with a lower-case $x$ standing for the random variable, is plain wrong.
- The 'equals sign' has a very specific meaning. Please don't abuse it.
- Do a reality check on your answer. For example, a probability should be between 0 and 1. An expectation of how many time units it takes to reach a particular point should be at least 1 (assuming moving between locations takes 1 unit of time, and that you're not already there!).
- If your answer doesn't contain written explanation, then it is not a complete solution. Think about guiding the marker through your thought process, rather than expecting the marker to second-guess what you meant to say.
- Use tutorial sheets to practise good mathematical writing.
Part III: Week 1 (Continued)

Chapter 3: Preliminaries
3.1 What is a stochastic process?

3.1.1 A course overview
A stochastic process is a process which evolves randomly over time (or space, or both):

- 'stochastic': equivalent to 'random';
- 'process': occurring over time (or space, or both).

We will consider some simple stochastic processes which are used to model real-life situations. Our main focus will be on developing mathematical tools to study their properties and learn about the long-term behaviour of such processes. Though this is a mathematical course, the applications of stochastic processes are wide-ranging, e.g. in physics, engineering, the environment, social science, epidemics, genetics and finance. We'll see examples from some of these areas during the course.
3.1.2 Definitions and basic properties
A stochastic process is a collection of random variables $\{X_t, t \in T\}$ taking values in the state space $S$. The parameter, or index, set $T$ often represents time. We will look at both discrete-time processes and continuous-time processes. We'll generally use the following notation, though you need to be flexible in adapting to different notation from time to time.

- Discrete-time processes: $\{X_n, n \in T\}$, where $T = \{0, 1, 2, \dots\}$. This is often written in terms of the natural numbers (including zero) rather than $T$, i.e. $\{X_n, n \in \mathbb{N}\}$, where $\mathbb{N} = \{0, 1, 2, \dots\}$.
- Continuous-time processes: $\{X(t), t \in T\}$, where $T = \mathbb{R}^+$ (i.e. $t \geq 0$). We'll often write this as $\{X_t, t \in T\}$, which is identical to the discrete-time notation above, but note that $t$ here can take any value in the positive reals, whereas for discrete-time processes the set of values $n$ can take is countable.
- Notice that the state space can be discrete or continuous. In this course we will only consider discrete state spaces.
3.1.3 Examples of stochastic processes
Below are examples of stochastic processes, categorised according to whether
they are continuous or discrete time processes, and whether the state space is
discrete or continuous. We only consider discrete state space processes in this
course, in both discrete and continuous time.
Discrete time, discrete state space

Here we are tossing a coin and counting the number of excess heads (i.e. how many more heads than tails we see). The 'time' on the x-axis represents the coin tosses. For example, we see from this particular realisation that the first toss must have landed tails (because the number of excess heads is then -1), followed by a head (we're then back up to 0 excess heads). The state space is countable (integers, so $S = \mathbb{Z}$) and the time element here is the coin tosses themselves, so we're in discrete 'time' with $T = \mathbb{N} \cup \{0\}$. Notice here that one coin toss is equivalent to one unit of time, so we need to be flexible in how we view 'time'. Also notice that the sequence $X_1, X_2, \dots$ isn't independent: if we know $X_n$, say, then $X_{n+1}$ will either be $X_n + 1$ (if we see a head at the next toss) or $X_n - 1$ otherwise.

Figure 3.1: Tracking the number of excess heads in 20 coin tosses
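As an aside (not part of the original notes), here is a minimal Python sketch that simulates one realisation of this excess-heads process for a fair coin; the seed and the choice of 20 tosses are arbitrary and simply mirror the figure.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n_tosses = 20
# +1 for a head, -1 for a tail, each with probability 1/2
steps = rng.choice([1, -1], size=n_tosses)
# X_n = number of excess heads after n tosses (with X_0 = 0)
excess_heads = np.concatenate(([0], np.cumsum(steps)))

print(excess_heads)  # one realisation of X_0, X_1, ..., X_20
```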
Discrete time, continuous state space

Here we have a gambler who starts off with £10. We're tracking how much money they have after each game. The 'time' on the x-axis represents the games that the gambler bets on, so again time is discrete, with $T = \mathbb{N} \cup \{0\}$. We consider the state space to be continuous here, though technically you could consider it to be discrete. The reason why it's easier to consider this as a continuous state space is that there are so many possible 'values' that the process could take (think: each value corresponds to a particular amount in pounds and pence). Notice again here that one game is equivalent to one unit of time, so we need to be flexible in how we view 'time'. Also notice that the sequence $X_1, X_2, \dots$ isn't independent: if we know $X_n$ then, though we can't provide a short list of possible values for $X_{n+1}$ as we could for the previous example, we could take an educated guess depending on the game being played.

Figure 3.2: Tracking a gambler's wealth over 30 games
Continuous time, discrete state space

Here we are counting occurrences of a specified event. At time zero we've seen no events, but the first event happens at roughly time 2.5 (we can see this as the number of occurrences stepping up to 1), and the second event happens shortly after. By time 5 we have seen 2 events. Notice here that the events can happen at any instant in time, so we think of time as continuous. The state space is discrete, as we're just counting events ($S = \mathbb{N} \cup \{0\}$). Also notice that if we look at the value of the process at times, say, 8.3 and 8.8, i.e. $X(8.3)$ and $X(8.8)$, then these aren't independent: if we know $X(8.3)$ then $X(8.8)$ must be at least as big as $X(8.3)$, as we can't 'unsee' events. Under certain distributional assumptions, which we'll discuss later in the course, these processes are known as 'Poisson processes'.

Figure 3.3: A realisation of a Poisson process
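As an aside, here is a minimal Python sketch of how such a counting process can be simulated, assuming the exponential inter-event times of a rate-lambda Poisson process (formalised later in the course); the rate 0.5 and horizon 10 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

rate, horizon = 0.5, 10.0
# Inter-event gaps are independent Exp(rate); event times are their running sums.
gaps = rng.exponential(scale=1 / rate, size=100)
event_times = np.cumsum(gaps)
event_times = event_times[event_times <= horizon]

def N(t):
    """Number of events observed by time t (a continuous-time, discrete-state process)."""
    return int(np.sum(event_times <= t))

print(event_times)             # times at which the count steps up by 1
print(N(5.0), N(8.3), N(8.8))  # e.g. N(8.8) can never be smaller than N(8.3)
```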
Continuous time, continuous state space

Finally, an example of a stochastic process that we won't be studying further: this is a Brownian motion, which has a continuous state space and continuous time. It has continuous paths, but an extension of Brownian motion is the Lévy process, which allows jumps in value to occur. These types of processes are used in finance but are beyond the scope of this course.

Figure 3.4: A realisation of a Brownian motion

NOTE: Continuous-time processes can change their value/state ('jump') at any instant of time; discrete-time processes can only do this at a discrete set of time points. For discrete-time processes, when we use the word 'time' we mean 'number of transitions/steps'.
3.2 Random variables
Informally speaking, a random variable is a variable (i.e. it can take different values) whose numerical values are randomly chosen from a set of possible values, governed by a probability distribution. For example, the time that a passenger must wait until the next southbound tube train at Goodge Street station is a variable. Now suppose you arrive at Goodge Street station: the amount of time you must wait until the next southbound tube arrives is a realisation of the random variable 'waiting time'.

It is important that you can spot a random variable, and can also define appropriate random variables for a given scenario.
3.2.1 Definitions
Ideally, we should think about random variables as functions. In order to do so, we must first build up some terminology. We'll use a coin-tossing example to illustrate: suppose you toss a coin twice and record whether it lands heads or tails at each toss. Note that the following is a simplification of ideas from Probability Theory and we won't be using the technicalities in this course. However, it does no harm to see these ideas.

- An experiment is any situation where there is a set of possible outcomes.
  - Our coin-tossing example is an experiment.
  - The possible outcomes are HH, HT, TH or TT.
- The set of possible outcomes is called the sample space, $\Omega$.
  - In our case, $\Omega = \{HH, HT, TH, TT\}$.
- An outcome is denoted $\omega$ and is an element of the sample space.
  - Suppose we conduct the experiment, and the outcome is TH. This is our $\omega$.
- An event, $A$, is a subset of the sample space.
  - e.g. define an event as 'at least one head'.
  - This is satisfied by HH, HT and TH.
  - This is a subset of $\Omega$.
Probability model              Real world
Sample space, $\Omega$         'Set of all possible outcomes of the experiment.'
Sample point, $\omega$         'Outcome of the experiment.'
Event $A \subseteq \Omega$     'Property that the outcome of the experiment may possess.'
With a finite sample space $\Omega$, we can assign probabilities to individual sample points $\omega$ via a 'weight function', $w : \Omega \to [0, 1]$. This allocates a value $w(\omega)$ to each possible outcome $\omega$, which we interpret as the probability that $\omega$ occurs. We need
$$\sum_{\omega \in \Omega} w(\omega) = 1,$$
and then we may calculate the probability of an event $A \subseteq \Omega$ via
$$P(A) = \sum_{\omega \in A} w(\omega).$$
Caution: this doesn't always work with infinite sample spaces, but we won't consider this further.
Notice that:

1. $P(A) \geq 0$ for all events $A$;
2. if $A_1, A_2, A_3, \dots$ are events with $A_i \cap A_j = \emptyset$ for all $i \neq j$, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$, i.e. $P$ is countably additive;
3. $P(\Omega) = 1$.
As the outcome of an experiment corresponds to a sample point $\omega \in \Omega$, we can make some numerical measurements whose values depend on the outcome $\omega$. This gives rise to a random variable, let's call it $X$, which is a function $X : \Omega \to \mathbb{R}$. Its value at a sample point, $X(\omega)$, represents the numerical value of the measurement when the outcome of the experiment is $\omega$.

In our set-up, there are different random variables you could define. Let's define $X$ to be the number of heads in the two tosses of a coin. In this case $X$ maps $\Omega$ to 0 (if we see TT), 1 (if we see TH or HT) or 2 (if we see HH):
$$X(TT) = 0, \qquad X(TH) = X(HT) = 1, \qquad X(HH) = 2.$$

All outcomes are equally likely here, and so $w(\omega) = 1/4$ for all outcomes $\omega$. If we define our event $A$ = 'two heads', then since the only sample point in $A$ is HH,
$$\sum_{\omega \in A} w(\omega) = 1/4,$$
and we say that $P(A) = 1/4$.
Similarly, we can also define another event $B$ = 'at least one tail', and this is given by $P(X \leq 1)$, which we can calculate as:
$$P(B) = P(X \leq 1) = P(\{\omega \in \Omega : X(\omega) \leq 1\}) \tag{3.1}$$
Which $\omega$ satisfy (3.1) in this case? Now you should be able to calculate the probability required.

Note: we have defined the distribution function of $X$ above. In general, the distribution function of a random variable $X$ is given, for $x \in \mathbb{R}$, by:
$$F_X(x) = P(X \leq x) = P(\{\omega \in \Omega : X(\omega) \leq x\}).$$
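To make the 'random variable as a function on $\Omega$' idea concrete, here is a small illustrative Python sketch (my own, not from the notes) that enumerates the two-toss sample space, applies the weight function $w(\omega) = 1/4$ and computes $P(X \leq 1)$, i.e. the probability of 'at least one tail'.

```python
# Sample space for two tosses of a fair coin, with equal weights w(omega) = 1/4.
omega_space = ["HH", "HT", "TH", "TT"]
w = {omega: 1 / 4 for omega in omega_space}

def X(omega):
    """Random variable: number of heads in the outcome omega."""
    return omega.count("H")

# P(X <= 1) = sum of w(omega) over outcomes with X(omega) <= 1 (here: HT, TH, TT).
prob_at_least_one_tail = sum(w[omega] for omega in omega_space if X(omega) <= 1)
print(prob_at_least_one_tail)  # 0.75
```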
3.2.2 Discrete random variables
$X$ is a discrete random variable if it takes only finitely many or countably infinitely many values.

The probability mass function of a discrete random variable is
$$p_X(x) = P(X = x) = P(\{\omega : X(\omega) = x\}).$$

The expectation of $X$ is
$$E(X) = \sum_x x \, P(X = x).$$

For a function $g$ we have
$$E(g(X)) = \sum_x g(x) \, P(X = x).$$

The variance of $X$ is
$$\text{Var}(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2.$$
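These formulae translate directly into code. A minimal illustrative sketch (my own) computing $E(X)$, $E(g(X))$ and $\text{Var}(X)$ for a pmf stored as a dictionary:

```python
def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * P(X = x), for a pmf given as {x: P(X = x)}."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return expectation(pmf, g=lambda x: x ** 2) - expectation(pmf) ** 2

# Example: number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
print(expectation(pmf), variance(pmf))  # 1.0, 0.5
```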
3.2.3 Continuous random variables

$X$ is a continuous random variable if it has a probability density function, i.e., if there exists a non-negative function $f_X$ with $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ such that, for all $x$,
$$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du.$$
Then
$$f_X(x) = \frac{dF_X(x)}{dx}.$$
The expectation of $X$ is
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx$$
and the expectation of $g(X)$, for some function $g$, is
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx,$$
provided that these integrals exist. As before, the variance of $X$ is
$$\text{Var}(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2.$$
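The same quantities can be checked numerically for a continuous random variable. A sketch, assuming SciPy is available, that integrates the Exp(2) density to verify $E(X) = 1/\lambda$ and $\text{Var}(X) = 1/\lambda^2$:

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)        # density of Exp(lambda) on [0, infinity)

total, _ = quad(f, 0, np.inf)               # should integrate to 1
mean, _ = quad(lambda x: x * f(x), 0, np.inf)
second_moment, _ = quad(lambda x: x ** 2 * f(x), 0, np.inf)
var = second_moment - mean ** 2

print(total, mean, var)  # approx 1, 1/lambda = 0.5, 1/lambda^2 = 0.25
```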
3.2.4 Important distributions

There are several probability distributions which we will use repeatedly during the course. Make sure you are very familiar with the following.
Discrete probability distributions:

- Binomial, $X \sim \text{Bin}(n, p)$: support $\{0, 1, \dots, n\}$; PMF $P(X = k) = \frac{n!}{k!(n-k)!} p^k (1 - p)^{n - k}$; expectation $np$; variance $np(1 - p)$.
- Poisson, $X \sim \text{Po}(\lambda)$, $\lambda > 0$: support $\{0, 1, 2, 3, \dots\}$; PMF $P(X = k) = \exp(-\lambda) \frac{\lambda^k}{k!}$; expectation $\lambda$; variance $\lambda$.
- Geometric, $X \sim \text{Geom}(p)$: support $\{1, 2, 3, 4, \dots\}$; PMF $P(X = k) = (1 - p)^{k - 1} p$; expectation $1/p$; variance $(1 - p)/p^2$.

Continuous probability distributions:

- Exponential, $X \sim \text{Exp}(\lambda)$, $\lambda > 0$: support $\mathbb{R}^+$; PDF $f(x) = \lambda e^{-\lambda x}$; expectation $1/\lambda$; variance $1/\lambda^2$.
- Uniform, $X \sim \text{Unif}(a, b)$: support $[a, b]$; PDF $f(x) = \frac{1}{b - a}$ for $a < x < b$ and $0$ otherwise; expectation $\frac{1}{2}(a + b)$; variance $\frac{1}{12}(b - a)^2$.
- Gamma, $X \sim \text{Gamma}(\alpha, \beta)$, $\alpha, \beta > 0$: support $\mathbb{R}^+$; PDF $f(x) = \frac{\beta^\alpha x^{\alpha - 1} \exp(-\beta x)}{\Gamma(\alpha)}$; expectation $\alpha/\beta$; variance $\alpha/\beta^2$.
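If you want to sanity-check the list above, the means and variances agree with those reported by scipy.stats (a sketch assuming SciPy is available; note that SciPy parameterises the exponential and gamma distributions by scale = 1/rate):

```python
from scipy import stats

n, p, lam, a, b, alpha, beta = 10, 0.3, 2.0, 1.0, 5.0, 3.0, 2.0

checks = {
    "Bin(n, p)":          (stats.binom(n, p),                     n * p, n * p * (1 - p)),
    "Po(lambda)":         (stats.poisson(lam),                    lam, lam),
    "Geom(p)":            (stats.geom(p),                         1 / p, (1 - p) / p ** 2),
    "Exp(lambda)":        (stats.expon(scale=1 / lam),            1 / lam, 1 / lam ** 2),
    "Unif(a, b)":         (stats.uniform(loc=a, scale=b - a),     (a + b) / 2, (b - a) ** 2 / 12),
    "Gamma(alpha, beta)": (stats.gamma(a=alpha, scale=1 / beta),  alpha / beta, alpha / beta ** 2),
}

for name, (dist, mean_formula, var_formula) in checks.items():
    # Library values and the formulas from the table should coincide.
    print(name, dist.mean(), mean_formula, dist.var(), var_formula)
```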
3.3 Conditioning on events

Calculating probabilities that are conditional on a particular event or occurrence is a natural requirement. Make sure you understand the rationale behind conditional probability; don't just learn the formulae!
Easy exercise

You have a deck of playing cards (52 cards in total, split equally into four suits: hearts, diamonds, spades and clubs, with the first two considered 'red' cards and the others considered 'black' cards).

- I pick a card at random. What is the probability that the chosen card is from the diamond suit?
- I pick a card at random and tell you that it is a red card (i.e. either hearts or diamonds). What is the probability that the chosen card is from the diamond suit?

The second question requires a conditional probability (it is conditional on knowing that the card is red), but you most likely calculated the probability intuitively, without using any formulae. How did you do that? You applied Bayes's theorem intuitively, without even realising it.
3.3.1 Intuition behind conditional probability
In fact, Bayes's theorem for calculating conditional probabilities is perfectly intuitive. Suppose we are calculating $P(A \mid B)$ for some events $A$ and $B$:

- This is asking for the probability of $A$ occurring, given that $B$ occurs.
- Assuming $B$ occurs, the contents of the state space $\Omega$ which relate to $B$ become the 'new' state space, $\Omega'$ (which is just the event $B$ for our purposes).
- What we are asking here, in effect, is to calculate the probability of $A$ in the 'new' sample space $\Omega'$ (i.e. $P(A \cap B)$), scaled to $\Omega'$ (i.e. divided by $P(B)$).
- The final equation is therefore $P(A \mid B) = P(A \cap B)/P(B)$.

But why scale by $P(B)$? Simply because the probabilities of all disjoint events in any sample space must sum to 1. Our sample space is now $\Omega' = B$, so the new (probability) weight function we apply to all events in $\Omega'$ must sum to 1. We scale all probabilities by $P(B)$ to ensure that this is the case.

The following graphic may help you to visualise why Bayes's theorem 'works'.
Figure 3.5: A schematic of conditional probability
3.3.2 Formal version of conditional probability

Let $A$ and $B$ be events with $P(B) > 0$. The conditional probability of $A$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
It is easy to verify that:

1. $P(A \mid B) \geq 0$;
2. if $A_1, A_2, \dots$ are mutually exclusive events, then
$$P(A_1 \cup A_2 \cup \dots \mid B) = \sum_i P(A_i \mid B)$$
(try drawing diagrams like the one above to visualise this);
3. $P(\Omega \mid B) = 1$.
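Returning to the card exercise above, a small enumeration (my own illustration) shows the formal definition in action: with $A$ = 'diamond' and $B$ = 'red card', $P(A \mid B) = P(A \cap B)/P(B) = (13/52)/(26/52) = 1/2$.

```python
from fractions import Fraction

suits = ["hearts", "diamonds", "spades", "clubs"]
deck = [(suit, rank) for suit in suits for rank in range(1, 14)]  # 52 equally likely cards

A = {card for card in deck if card[0] == "diamonds"}                  # event A: diamond
B = {card for card in deck if card[0] in ("hearts", "diamonds")}      # event B: red card

P = lambda event: Fraction(len(event), len(deck))
print(P(A))              # 1/4: unconditional probability of a diamond
print(P(A & B) / P(B))   # 1/2: P(A | B) = P(A n B) / P(B)
```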
Conditional probability for random variables, instead of events, works in the same way.

For discrete RVs $X$ and $Y$:

- Joint probability mass function: $p_{X,Y}(x, y) = P(X = x, Y = y)$.
- $X$ and $Y$ are independent if and only if, for all $x, y$: $p_{X,Y}(x, y) = p_X(x) p_Y(y) = P(X = x) P(Y = y)$.
- Marginal probability mass function (of $X$): $p_X(x) = P(X = x) = \sum_y P(X = x, Y = y)$.
- Conditional PMF of $X$ given $Y = y$: $p_{X \mid Y = y}(x) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$.
- Conditional distribution function of $X$ given $Y = y$: $F_{X \mid Y = y}(x) = P(X \leq x \mid Y = y)$.

For continuous RVs $X$ and $Y$:

- Joint density function: $f_{X,Y}(x, y)$.
- $X$ and $Y$ are independent if and only if, for all $x, y$: $f_{X,Y}(x, y) = f_X(x) f_Y(y)$.
- Marginal density (of $X$): $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$.
- Conditional density of $X$ given $Y = y$ (for $f_Y(y) > 0$): $f_{X \mid Y = y}(x) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$.
- Conditional distribution function of $X$ given $Y = y$: $F_{X \mid Y = y}(x) = P(X \leq x \mid Y = y)$.
3.3.3 Conditional expectation

Just as calculating an expectation of, say, $X$ requires knowing the distribution of $X$, the conditional expectation of $X$, given that we know $Y = y$, requires knowing the distribution of $(X \mid Y = y)$.

The conditional expectation of $X$ given $Y = y$ is
$$E(X \mid Y = y) = \begin{cases} \sum_x x \, P(X = x \mid Y = y) & \text{for discrete RVs} \\ \int_{-\infty}^{\infty} x \, f_{X \mid Y = y}(x)\,dx & \text{for continuous RVs} \end{cases}$$

The conditional expectation of $g(X)$ given $Y = y$, for some function $g$, is
$$E(g(X) \mid Y = y) = \begin{cases} \sum_x g(x) \, P(X = x \mid Y = y) & \text{for discrete RVs} \\ \int_{-\infty}^{\infty} g(x) \, f_{X \mid Y = y}(x)\,dx & \text{for continuous RVs} \end{cases}$$
Example 3.1. Suppose that $\Omega = \{\omega_1, \omega_2, \omega_3\}$ and $P(\omega_i) = 1/3$ for $i = 1, 2, 3$. Suppose also that $X$ and $Y$ are random variables with
$$X(\omega_1) = 2, \quad X(\omega_2) = 3, \quad X(\omega_3) = 1$$
$$Y(\omega_1) = 2, \quad Y(\omega_2) = 2, \quad Y(\omega_3) = 1$$
Find the conditional pmf $p_{X \mid Y = 2}(x)$ and the conditional expectation $E(X \mid Y = 2)$.

Solution

Possible values of $X$ are 1, 2, 3.
$$p_{X \mid Y = 2}(1) = P(X = 1 \mid Y = 2) = 0.$$
$$p_{X \mid Y = 2}(2) = P(X = 2 \mid Y = 2) = \frac{P(X = 2, Y = 2)}{P(Y = 2)} = \frac{P(\omega_1)}{P(\omega_1) + P(\omega_2)} = \frac{1}{2}.$$
$$p_{X \mid Y = 2}(3) = P(X = 3 \mid Y = 2) = \frac{P(X = 3, Y = 2)}{P(Y = 2)} = \frac{P(\omega_2)}{P(\omega_1) + P(\omega_2)} = \frac{1}{2}.$$
Thus the conditional pmf of $X$ given $Y = 2$ equals $\frac{1}{2}$ for $x = 2$ and for $x = 3$, and is 0 otherwise. The conditional expectation is
$$E(X \mid Y = 2) = \sum_x x \, P(X = x \mid Y = 2) = (1 \times 0) + (2 \times \tfrac{1}{2}) + (3 \times \tfrac{1}{2}) = \frac{5}{2}.$$
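A quick check of Example 3.1 by direct enumeration (an illustrative sketch, not part of the original solution):

```python
from fractions import Fraction

P_omega = {"w1": Fraction(1, 3), "w2": Fraction(1, 3), "w3": Fraction(1, 3)}
X = {"w1": 2, "w2": 3, "w3": 1}
Y = {"w1": 2, "w2": 2, "w3": 1}

P_Y2 = sum(p for w, p in P_omega.items() if Y[w] == 2)   # P(Y = 2) = 2/3

# Conditional pmf p_{X | Y = 2}(x) = P(X = x, Y = 2) / P(Y = 2)
cond_pmf = {
    x: sum(p for w, p in P_omega.items() if X[w] == x and Y[w] == 2) / P_Y2
    for x in (1, 2, 3)
}
cond_expectation = sum(x * p for x, p in cond_pmf.items())

print(cond_pmf)          # {1: 0, 2: 1/2, 3: 1/2}
print(cond_expectation)  # 5/2
```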
So far, we have only calculated conditional expectations when conditioning on specific values, e.g. $E[X \mid Y = 2]$. We can extend this idea to conditioning on random variables without equating them to specific values.

The notation $E(X \mid Y)$ is used to denote a random variable that takes the value $E(X \mid Y = y)$ with probability $P(Y = y)$.

Note that $E(X \mid Y)$ is a function of the random variable $Y$, and is itself a random variable.

Example 3.2. In Example 3.1 we found that
$$E[X \mid Y = 2] = \frac{5}{2}.$$
It can similarly be shown that
$$E[X \mid Y = 1] = 1.$$
You can also check that
$$P(Y = 1) = 1/3 \quad \text{and} \quad P(Y = 2) = 2/3.$$
Thus, the random variable $E[X \mid Y]$ has two possible values, 1 and $\frac{5}{2}$, with probabilities $\frac{1}{3}$ and $\frac{2}{3}$, respectively. That is,
$$E[X \mid Y] = \begin{cases} 1 & \text{with probability } 1/3 \\ 5/2 & \text{with probability } 2/3 \end{cases}$$
3.3.4 Useful conditioning formulae

Three important formulae:

Law of total probability. Let $A$ be an event and $Y$ be ANY random variable. Then
$$P(A) = \begin{cases} \sum_y P(A \mid Y = y) P(Y = y) & \text{if } Y \text{ discrete} \\ \int_{-\infty}^{\infty} P(A \mid Y = y) f_Y(y)\,dy & \text{if } Y \text{ continuous} \end{cases}$$

Modification of the law of total probability: if $Z$ is a third discrete random variable,
$$\begin{aligned}
P(X = x \mid Z = z) &= \sum_y P(X = x, Y = y \mid Z = z) \\
&= \sum_y \frac{P(X = x, Y = y, Z = z)}{P(Z = z)} \\
&= \sum_y \frac{P(X = x, Y = y, Z = z)}{P(Y = y, Z = z)} \, \frac{P(Y = y, Z = z)}{P(Z = z)} \\
&= \sum_y P(X = x \mid Y = y, Z = z) \, P(Y = y \mid Z = z),
\end{aligned}$$
assuming the conditional probabilities are all defined.

Law of conditional (iterated) expectations.
$$E[X] = E[E(X \mid Y)] = \begin{cases} \sum_y E(X \mid Y = y) P(Y = y) & \text{if } Y \text{ discrete} \\ \int_{-\infty}^{\infty} E(X \mid Y = y) f_Y(y)\,dy & \text{if } Y \text{ continuous} \end{cases}$$
and, more generally,
$$E[g(X)] = E(E[g(X) \mid Y]).$$

Note in particular:

- The law of total probability applies to ANY event $A$ and ANY random variable $Y$.
- The law of conditional (iterated) expectation, $E[X] = E[E[X \mid Y]]$, applies to ANY random variables $X$ and $Y$, and we will be using this identity heavily during the course.
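As an illustration of the law of total probability (a hypothetical example of my own, not from the notes): suppose $Y \sim \text{Po}(\lambda)$ customers arrive and each buys something independently with probability $q$, and let $A$ be the event 'no sales'. Then $P(A \mid Y = y) = (1 - q)^y$, and summing over $y$ gives $P(A)$; the sketch below compares this sum with a direct simulation and with the closed form $\exp(-\lambda q)$.

```python
import math
import numpy as np

lam, q = 3.0, 0.4
rng = np.random.default_rng(seed=3)

# Law of total probability: P(A) = sum over y of P(A | Y = y) P(Y = y),
# with A = "no sales" and P(A | Y = y) = (1 - q)^y.
p_poisson = lambda y: math.exp(-lam) * lam ** y / math.factorial(y)
p_total = sum((1 - q) ** y * p_poisson(y) for y in range(200))

# Direct simulation of the same probability.
Y = rng.poisson(lam, size=200_000)
sales = rng.binomial(Y, q)            # number of buyers among the Y customers
p_simulated = np.mean(sales == 0)

print(p_total, p_simulated, math.exp(-lam * q))  # all approximately equal
```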
Example 3.3. In Example 3.2 we found that the distribution of the random variable $E(X \mid Y)$ was
$$E[X \mid Y] = \begin{cases} 1 & \text{w.p. } 1/3 \\ 5/2 & \text{w.p. } 2/3 \end{cases}$$
The expectation of this random variable is
$$E[E(X \mid Y)] = 1 \times \frac{1}{3} + \frac{5}{2} \times \frac{2}{3} = 2.$$
Also, the expectation of $X$ is given by
$$E(X) = 1 \times \frac{1}{3} + 2 \times \frac{1}{3} + 3 \times \frac{1}{3} = 2,$$
so we see that the law of conditional (iterated) expectation, i.e. that $E[X] = E[E[X \mid Y]]$, holds.
Example 3.4. Let $X$ and $Y$ have joint density
$$f_{X,Y}(x, y) = \frac{1}{y} e^{-x/y} e^{-y}, \qquad 0 < x, y < \infty.$$
Find $f_Y(y)$, $E(X \mid Y)$ and hence $E(X)$.

Solution

For any $y > 0$,
$$f_Y(y) = \int_0^{\infty} \frac{1}{y} e^{-x/y} e^{-y}\,dx = e^{-y} \left[ -e^{-x/y} \right]_0^{\infty} = e^{-y},$$
so $Y$ has an exponential distribution with parameter 1.

Also, for any fixed $y > 0$ and any $x > 0$,
$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{\frac{1}{y} e^{-x/y} e^{-y}}{e^{-y}} = \frac{1}{y} e^{-x/y},$$
so $X \mid Y = y \sim \text{Exp}(1/y)$.

It follows that $E(X \mid Y = y) = y$ and so $E(X \mid Y) = Y$. Using the Law of Conditional Expectations we then have
$$E[X] = E[E(X \mid Y)] = E(Y) = 1.$$
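A Monte Carlo sanity check of Example 3.4 (an illustrative sketch): simulate $Y \sim \text{Exp}(1)$, draw $X \mid Y = y$ from an exponential with mean $y$, and confirm that the sample mean of $X$ is close to 1.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 200_000

Y = rng.exponential(scale=1.0, size=n)   # Y ~ Exp(1), so E(Y) = 1
X = rng.exponential(scale=Y)             # X | Y = y ~ Exp(1/y), i.e. mean y

print(Y.mean(), X.mean())  # both approximately 1, matching E[X] = E[E(X | Y)] = E(Y) = 1
```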
3.4 Generating functions

Generating functions are USEFUL. Don't think of them as baffling mathematical constructs; instead think of them as marvellous time-saving devices.

Imagine you have a sequence of real numbers, $a_1, a_2, a_3, \dots$. It can be hard to keep track of such numbers and all the information they contain. A generating function provides a way of wrapping up the entire sequence into one expression. When we want to extract information about the sequence, we simply interrogate this one expression in a particular way in order to get what we want.

There are many different types of generating function. These are different in the sense that we wrap up the sequence in different ways, and so extracting particular pieces of information from them requires different techniques. The two types of generating function that you should be familiar with are:

1. the probability generating function (PGF);
2. the moment generating function (MGF).
3.4.1 Probability generating functions
The probability generating function (p.g.f.) of $X$ is defined to be
$$G_X(s) = E(s^X) = \sum_{k=0}^{\infty} s^k P(X = k) \qquad \text{for } |s| \leq 1.$$
Probability generating function fundamentals:

- The random variable: $X$, taking values in $\mathbb{N}_0 = \{0, 1, 2, \dots\}$.
- Sequence to 'wrap up': $P(X = 0), P(X = 1), P(X = 2), P(X = 3), \dots$
- Method: $\sum_{k=0}^{\infty} s^k P(X = k)$ (for any $s \in [-1, 1]$).
- Information stored: the probabilities $P(X = 0), P(X = 1), P(X = 2), \dots$ and some moments (expectations).
The information stored in probability generating functions can be acces
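To see the 'wrapping up' idea in action, here is a short SymPy sketch (my own illustration) using the Poisson PGF $G(s) = \exp(\lambda(s - 1))$: the coefficient of $s^k$ recovers $P(X = k)$, and differentiating at $s = 1$ recovers the mean.

```python
import sympy as sp

s, lam = sp.symbols("s lambda", positive=True)
G = sp.exp(lam * (s - 1))                 # PGF of X ~ Po(lambda)

# The coefficient of s^k in G(s) is P(X = k).
for k in range(4):
    prob = sp.simplify(sp.diff(G, s, k).subs(s, 0) / sp.factorial(k))
    print(k, prob)                        # exp(-lambda) * lambda**k / k!

# G'(1) = E(X).
print(sp.simplify(sp.diff(G, s).subs(s, 1)))  # lambda
```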
