Likelihood Weighting
The purpose of this webpage is to provide a detailed example of likelihood weighting.
Prepared by Duane A. McCully, University of Utah, Spring 2009.
Sample Network
Consider the alarm network (page 494, Figure 14.2).

Generating Likelihood Weights
A simple description of the likelihood-weighting algorithm described
on page 515 (Figure 14.14) is as follows:
At the time that the network is sampled, the state of some
nodes will be known and others will not.
The nodes whose values are known are referred to as evidence variables.
So, given the evidence, we need to query the remaining nodes to determine
the state of the entire network.
When this is complete, a likelihood weight is assigned to the sample
by multiplying together the probabilities of each evidence variable given
its parents.
This result is stored in a map, which we will name W, that associates each
sampled state of the network with its accumulated weight.
In slightly more detail:
-
A temporary variable, w, is set to 1.
This will hold the calculated weight of this sample.
-
A temporary variable, x, is set to empty.
This will hold the state of each node for this sample.
For example, if the state of the network, including both evidence variables
and variables that were queried, were as follows: Burglary=false,
Earthquake=false, Alarm=false, JohnCalls=true,
MaryCalls=false, this could be represented as x
= (~b,~e,~a,j,~m).
-
Each node in the network is examined. If the node is an evidence variable, then we
perform the following calculation:
w = w × p(currentNode | parents of currentNode)
If the current node is not evidence, then it is sampled to determine
its state; it does not contribute to the weight calculation.
Whether the node is evidence or its state is discovered
through sampling, its state is added to x.
After the entire network is examined for this sample, we will be left with
x and w, representing the state of the network and the
likelihood weight associated with that state, respectively. This is
added to W using x as the key and w as the data
value. If x already exists in W, then w is added
to the data value associated with x in W.
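The steps above can be sketched in Python. This is a minimal illustration, not the book's pseudocode: the CPT numbers are the alarm-network probabilities from Figure 14.2, while the dictionary layout, the single-letter variable names, and the `weighted_sample` name are our own choices.

```python
import random

# CPTs for the alarm network (Figure 14.2); each entry is
# P(variable = true | parent values).  Layout and names are our own.
CPT = {
    'B': {(): 0.001},
    'E': {(): 0.002},
    'A': {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    'J': {(True,): 0.90, (False,): 0.05},
    'M': {(True,): 0.70, (False,): 0.01},
}
PARENTS = {'B': (), 'E': (), 'A': ('B', 'E'), 'J': ('A',), 'M': ('A',)}
ORDER = ('B', 'E', 'A', 'J', 'M')  # topological order

def weighted_sample(evidence):
    """Return (x, w): one full network state and its likelihood weight."""
    w = 1.0   # running weight for this sample
    x = {}    # state of each node for this sample
    for var in ORDER:
        p_true = CPT[var][tuple(x[p] for p in PARENTS[var])]
        if var in evidence:
            # Evidence node: fix its value and multiply in its probability.
            x[var] = evidence[var]
            w *= p_true if x[var] else 1.0 - p_true
        else:
            # Non-evidence node: sample it; it does not affect the weight.
            x[var] = random.random() < p_true
    return x, w
```

Calling `weighted_sample({'B': False, 'E': False})` reproduces the weight 0.999 × 0.998 ≈ 0.997, no matter how the non-evidence nodes happen to be sampled.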
We will now generate a set of samples for the above Alarm network:
Sample 1
-
Evidence is Burglary=false and Earthquake=false.
We will now query the remaining nodes in the network to determine their state.
-
We now set the weight w to 1.0 and x to empty.
-
Burglary is an evidence variable with value false.
Therefore, we set
w = w × p(Burglary=false) = (1.0)(0.999) = 0.999
x = (~b).
-
Earthquake is an evidence variable with value false.
Therefore, we set
w = w × p(Earthquake=false) = (0.999)(0.998) ≈ 0.997
x = (~b,~e).
-
We sample from p(Alarm | Burglary=false, Earthquake=false) = <0.001, 0.999>; suppose this returns false.
x = (~b,~e,~a).
-
We sample from p(JohnCalls|Alarm=false) = <0.05, 0.95>; suppose this
returns false.
x = (~b,~e,~a,~j).
-
We sample from p(MaryCalls|Alarm=false) = <0.01, 0.99>; suppose this
returns false.
x = (~b,~e,~a,~j,~m).
For this example, the weighted sample is Burglary=false, Earthquake=false, Alarm=false, JohnCalls=false, MaryCalls=false with a weight of 0.997.
Now the book speaks of "W, a vector of weighted counts over X."
So, pursuant to this terminology, we can think of a map in C++ whose key is the tuple (~b,~e,~a,~j,~m) and whose mapped value is the weight; or, in Python, a dict object.
Any weight that is computed through the above algorithm is added to any existing weight that matches the key in W.
Since this is our first sample, there is no such key in W so the existing weight is effectively zero.
Here is what W looks like so far (the sample column is for our referencing convenience):
| Sample | B | E | A | J | M | Weight |
| 1 | ~b | ~e | ~a | ~j | ~m | 0.997 |
Sample 2
-
Evidence is Alarm=false and JohnCalls=true.
We will now query the remaining nodes in the network to determine their state.
-
We now set the weight w to 1.0 and x to empty.
-
Burglary is not an evidence variable so we sample it; suppose it returns false.
x = (~b).
-
Earthquake is not an evidence variable so we sample it; suppose it returns false.
x = (~b,~e).
-
Alarm is an evidence variable with value false.
Therefore, we set
w = w × p(Alarm=false | Burglary=false, Earthquake=false) = (1.0)(0.999) = 0.999
x = (~b,~e,~a).
-
JohnCalls is an evidence variable with value true.
Therefore, we set
w = w × p(JohnCalls=true | Alarm=false) = (0.999)(0.05) ≈ 0.05
x = (~b,~e,~a,j).
-
MaryCalls is not an evidence variable so we sample it; suppose it returns false.
x = (~b,~e,~a,j,~m).
Based on the above we now add (~b,~e,~a,j,~m) to W with a weight of 0.05:
| Sample | B | E | A | J | M | Weight |
| 1 | ~b | ~e | ~a | ~j | ~m | 0.997 |
| 2 | ~b | ~e | ~a | j | ~m | 0.05 |
Sample 3
-
Evidence is JohnCalls=true and MaryCalls=true.
We will now query the remaining nodes in the network to determine their state.
-
We now set the weight w to 1.0 and x to empty.
-
Burglary is not an evidence variable so we sample it; suppose it returns false.
x = (~b).
-
Earthquake is not an evidence variable so we sample it; suppose it returns false.
x = (~b,~e).
-
Alarm is not an evidence variable so we sample it; suppose it returns true.
x = (~b,~e,a).
-
JohnCalls is an evidence variable with value true.
Therefore, we set
w = w × p(JohnCalls=true | Alarm=true) = (1.0)(0.90) = 0.90
x = (~b,~e,a,j).
-
MaryCalls is an evidence variable with value true.
Therefore, we set
w = w × p(MaryCalls=true | Alarm=true) = (0.90)(0.70) = 0.63
x = (~b,~e,a,j,m).
Based on the above we now add (~b,~e,a,j,m) to W with a weight of 0.63:
| Sample | B | E | A | J | M | Weight |
| 1 | ~b | ~e | ~a | ~j | ~m | 0.997 |
| 2 | ~b | ~e | ~a | j | ~m | 0.05 |
| 3 | ~b | ~e | a | j | m | 0.63 |
Sample 4
-
Evidence is Burglary=false, Earthquake=false, and JohnCalls=true.
We will now query the remaining nodes in the network to determine their state.
-
We now set the weight w to 1.0 and x to empty.
-
Burglary is an evidence variable with value false.
Therefore, we set
w = w × p(Burglary=false) = (1.0)(0.999) = 0.999
x = (~b).
-
Earthquake is an evidence variable with value false.
Therefore, we set
w = w × p(Earthquake=false) = (0.999)(0.998) ≈ 0.997
x = (~b,~e).
-
Alarm is not an evidence variable so we sample it; suppose it returns false.
x = (~b,~e,~a).
-
JohnCalls is an evidence variable with value true.
Therefore, we set
w = w × p(JohnCalls=true | Alarm=false) = (0.997)(0.05) ≈ 0.05
x = (~b,~e,~a,j).
-
MaryCalls is not an evidence variable so we sample it; suppose it returns false.
x = (~b,~e,~a,j,~m).
Based on the above we now add (~b,~e,~a,j,~m) to W with a weight of 0.05.
However, note that (~b,~e,~a,j,~m) matches the key for sample 2.
Therefore, the weight of 0.05 is added to the existing weight of 0.05, giving 0.10:
| Sample | B | E | A | J | M | Weight |
| 1 | ~b | ~e | ~a | ~j | ~m | 0.997 |
| 2, 4 | ~b | ~e | ~a | j | ~m | 0.10 |
| 3 | ~b | ~e | a | j | m | 0.63 |
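The accumulation just performed, with sample 4's weight folding into sample 2's key, can be sketched in Python; the `record` helper and the string-tuple key encoding are our own illustration.

```python
# W maps a sampled network state (encoded here as a tuple of strings)
# to its accumulated weight.
W = {}

def record(W, x, w):
    # Add w to whatever weight is already stored under key x (zero if x is new).
    W[x] = W.get(x, 0.0) + w

record(W, ('~b', '~e', '~a', 'j', '~m'), 0.05)  # sample 2: new key
record(W, ('~b', '~e', '~a', 'j', '~m'), 0.05)  # sample 4: same key, weights add
print(W[('~b', '~e', '~a', 'j', '~m')])         # 0.1
```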
Sample 5
-
Evidence is Burglary=true and Earthquake=false.
We will now query the remaining nodes in the network to determine their state.
-
We now set the weight w to 1.0 and x to empty.
-
Burglary is an evidence variable with value true.
Therefore, we set
w = w × p(Burglary=true) = (1.0)(0.001) = 0.001
x = (b).
-
Earthquake is an evidence variable with value false.
Therefore, we set
w = w × p(Earthquake=false) = (0.001)(0.998) ≈ 0.001
x = (b,~e).
-
Alarm is not an evidence variable so we sample it; suppose it returns false.
x = (b,~e,~a).
-
JohnCalls is not an evidence variable so we sample it; suppose it returns false.
x = (b,~e,~a,~j).
-
MaryCalls is not an evidence variable so we sample it; suppose it returns false.
x = (b,~e,~a,~j,~m).
Based on the above we now add (b,~e,~a,~j,~m) to W with a weight of 0.001.
| Sample | B | E | A | J | M | Weight |
| 1 | ~b | ~e | ~a | ~j | ~m | 0.997 |
| 2, 4 | ~b | ~e | ~a | j | ~m | 0.10 |
| 3 | ~b | ~e | a | j | m | 0.63 |
| 5 | b | ~e | ~a | ~j | ~m | 0.001 |
Using Likelihood Weights
We will now use the sampling data collected above to compute some probabilities.
-
In order to compute the marginal probability of a single event, such as P(Burglary=true), we
sum the weights of every sample where Burglary=true and divide by the sum of all of
the weights. For example, in the above data, the only sample where Burglary=true is
sample 5, with weight 0.001. Therefore, p(Burglary=true)
= (0.001) / (0.997 + 0.10 + 0.63 + 0.001) = 0.001 / 1.728 ≈ 0.00058
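A sketch of this calculation in Python; the tuple encoding of W is our own assumption, while the weights are the values from the table above.

```python
# Weighted counts from the five samples, keyed by (B, E, A, J, M).
W = {
    (False, False, False, False, False): 0.997,   # sample 1
    (False, False, False, True,  False): 0.10,    # samples 2 and 4
    (False, False, True,  True,  True ): 0.63,    # sample 3
    (True,  False, False, False, False): 0.001,   # sample 5
}

total = sum(W.values())                             # 1.728
p_b = sum(w for k, w in W.items() if k[0]) / total  # k[0] is Burglary
print(round(p_b, 5))  # 0.00058
```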
-
In order to compute the probability of an event, X=true, conditioned on another event,
Y=true, we sum the weights of all samples where X=true and
Y=true and divide by the sum of the weights of all samples where Y=true.
For example, if we want to compute p(a | j), we need to sum the weights of all samples where
we have both a and j (meaning Alarm=true and JohnCalls=true). We find that
only the entry for sample 3 meets this criterion, with a weight of 0.63. We now sum the weights of all
entries that have j: the entry for samples 2 and 4 (combined weight 0.10) and the entry for sample 3 (weight 0.63).
Putting this all together, we have p(a | j) = 0.63 / (0.10 + 0.63) = 0.63 / 0.73 ≈ 0.863.
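The same conditional estimate can be sketched in Python; as before, the tuple encoding of W is our own assumption.

```python
# Weighted counts from the five samples, keyed by (B, E, A, J, M).
W = {
    (False, False, False, False, False): 0.997,   # sample 1
    (False, False, False, True,  False): 0.10,    # samples 2 and 4
    (False, False, True,  True,  True ): 0.63,    # sample 3
    (True,  False, False, False, False): 0.001,   # sample 5
}

A, J = 2, 3  # tuple positions of Alarm and JohnCalls
num = sum(w for k, w in W.items() if k[A] and k[J])  # both a and j: 0.63
den = sum(w for k, w in W.items() if k[J])           # j alone: 0.10 + 0.63
print(round(num / den, 3))  # 0.863
```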
-
In the above data, the estimated probability of an event that has never been observed is zero. This is
because we have information about every node in the alarm network in every sample. For example,
if we want to compute p(b | a), we need to sum the weights for all samples where we have both
b and a. There are no such samples, so the numerator is zero and the estimated probability is zero.
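A sketch of this zero estimate in Python; the tuple encoding of W is again our own illustration.

```python
# Weighted counts from the five samples, keyed by (B, E, A, J, M).
W = {
    (False, False, False, False, False): 0.997,   # sample 1
    (False, False, False, True,  False): 0.10,    # samples 2 and 4
    (False, False, True,  True,  True ): 0.63,    # sample 3
    (True,  False, False, False, False): 0.001,   # sample 5
}

B, A = 0, 2  # tuple positions of Burglary and Alarm
num = sum(w for k, w in W.items() if k[B] and k[A])  # no sample has both b and a
den = sum(w for k, w in W.items() if k[A])           # sample 3 only: 0.63
print(num / den)  # 0.0
```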
Book
Artificial Intelligence - A Modern Approach
Second Edition
Stuart J. Russell and Peter Norvig
Prentice Hall, Pearson Education, Inc., NJ
2003
ISBN 0-13-790395-2
References
Page 514 of the book.
Day 17 Slides 32-35