Earlier, we explored Bayesian networks, which belong to a family of graphical models that use directed graphs. We also saw that some distributions have independence assumptions that cannot be perfectly represented by the structure of a Bayesian network.

There is another technique for compactly representing and visualizing a probability distribution, based on the language of undirected graphs. This class of models can compactly represent independence assumptions that directed models cannot. Undirected graphical models are typically called Markov networks or Markov random fields.

We start with the simplest subclass, pairwise Markov networks, and then generalize from there.

Pairwise Markov Networks

Pairwise Markov networks are a subclass of Markov networks that represent distributions where all of the factors are over single variables or pairs of variables.

[Figure: undirected graph over the nodes A, B, C, and D, with edges A-B, B-C, C-D, and D-A]

The above diagram is a model for four students, Alice, Bob, Charles, and Debbie, who get together in pairs to work on their homework for a class. Each random variable indicates whether the corresponding student has a misconception about the homework or not (a binary assignment). The pairs that meet are shown by the edges of the undirected graph. Note that the edges have no direction: the influence between two connected students flows both ways.

We are interested in building a model for all four individuals, which amounts to specifying $P(A,B,C,D)$. However, the key issue with parameterizing a Markov network is that the edges are undirected, i.e., we no longer have the notion of a conditional probability distribution. We're therefore going to use the more general notion of a factor.

[Figure: tables of values for the factors $\phi_1(A,B)$, $\phi_2(B,C)$, $\phi_3(C,D)$, and $\phi_4(D,A)$]

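Notice that the values of these factors are not restricted to the range $[0, 1]$, so they are not probabilities. What, then, do these factors mean? They are compatibility factors that capture the affinities between adjacent variables: the larger the value, the more compatible the joint assignment.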

$\phi_1(A,B)$ asserts that it is more likely that Alice and Bob agree. It also gives more weight to the case where they are both right than to the case where they are both wrong.
$\phi_2(B,C)$ asserts that Bob and Charles are more likely to agree, and it weights the joint assignments where they are both right and where they are both wrong equally.
$\phi_4(D,A)$ makes the same assertion as $\phi_2(B,C)$, but for Debbie and Alice.
$\phi_3(C,D)$ indicates that Charles and Debbie tend to argue, so the most likely joint assignments are those where they disagree.

To define a global model that specifies $P(A,B, C,D)$, we can multiply these factors together:

$$ \tilde{p}(A, B, C, D) = \phi_1(A,B) \phi_2(B, C) \phi_3(C, D) \phi_4(D, A) $$
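As a rough sketch of this product, the snippet below stores each pairwise factor as a lookup table and multiplies the four entries selected by a joint assignment. The numeric values are illustrative assumptions chosen only to match the qualitative descriptions above (they are not read from the figure), and the encoding 0 = "no misconception", 1 = "has a misconception" is likewise an assumption. The names `phi1` to `phi4`, `unnormalized`, and `p_tilde` are hypothetical.

```python
from itertools import product

# Illustrative factor tables (assumed values, not taken from the figure).
# Assumed encoding: 0 = no misconception ("right"), 1 = misconception ("wrong").
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # phi_1(A, B): agree, both right favored
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi_2(B, C): agree
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # phi_3(C, D): disagree
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi_4(D, A): agree


def unnormalized(a, b, c, d):
    """Product of the four pairwise factors for one joint assignment."""
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]


# Evaluate the unnormalized measure on all 2^4 joint assignments (a, b, c, d).
p_tilde = {x: unnormalized(*x) for x in product((0, 1), repeat=4)}

# The entries are nonnegative but do not sum to 1.
total = sum(p_tilde.values())

for assignment, value in p_tilde.items():
    print(assignment, value)
print("sum of all entries:", total)
```

With tables like these, the largest entries of `p_tilde` are the assignments that satisfy most of the pairwise preferences at once, and the sum over all entries is far from 1.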

Note that $\tilde{p}$ denotes an unnormalized joint distribution. Thus, we need to normalize it to define a legal distribution. Specifically, we define