Earlier, we explored Bayesian networks, which belong to a family of graphical models that use directed graphs. We also saw that some distributions have independence assumptions that cannot be perfectly represented by the structure of a Bayesian network.
There is another technique for compactly representing and visualizing a probability distribution, based on the language of undirected graphs. This class of models can compactly represent independence assumptions that directed models cannot. Undirected graphical models are typically called Markov networks or Markov random fields.
We start with the simplest subclass, pairwise Markov networks, and then generalize from there.
Pairwise Markov networks are a subclass of Markov networks that represent distributions where all of the factors are over single variables or pairs of variables.
The above diagram is a model for four students, Alice, Bob, Charles and Debbie, who get together in pairs to work on their homework for a class. Each random variable indicates whether the corresponding student has a misconception about the homework or not, so every variable is binary. The pairs that meet are shown via the edges in the undirected graph. Note that the edges have no direction: each one represents a symmetric interaction between the two students it connects.
We are interested in building a model for all four individuals and that amounts to specifying $P(A,B,C,D)$. However, the key issue with parameterizing a Markov network is that the edges are undirected, i.e. we no longer have the notion of a conditional probability distribution. We’re therefore going to use the general notion of a factor.
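Notice that the values of these factors are not restricted to the range $[0, 1]$. What, then, do these factors mean? They are compatibility factors, which capture the affinities between adjacent variables: the larger the value for a joint assignment, the more compatible that assignment is.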
To define a global model that specifies $P(A,B,C,D)$, we can multiply these factors together:
$$ \tilde{p}(A, B, C, D) = \phi_1(A,B) \phi_2(B, C) \phi_3(C, D) \phi_4(D, A) $$
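As a rough sketch of the product above, the following Python snippet stores each pairwise factor as a lookup table and multiplies the four factors for every joint assignment. The numeric entries are illustrative assumptions rather than values taken from the text, and the names `phi1` to `phi4`, `unnormalized`, `table` and `total` are hypothetical.

```python
from itertools import product

# Hypothetical factor tables (values are assumptions for illustration).
# Keys are assignments to the two variables in a factor's scope;
# 0 = no misconception, 1 = misconception.
phi1 = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}      # phi1(A, B)
phi2 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}    # phi2(B, C)
phi3 = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}    # phi3(C, D)
phi4 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}    # phi4(D, A)


def unnormalized(a, b, c, d):
    """Product of the four pairwise factors for one joint assignment."""
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]


# Evaluate the product for all 2^4 = 16 assignments of (A, B, C, D).
table = {
    assignment: unnormalized(*assignment)
    for assignment in product((0, 1), repeat=4)
}

# Sum over all assignments; this is generally not equal to 1.
total = sum(table.values())

for assignment, value in table.items():
    print(assignment, value)
print("sum over all assignments:", total)
```

With these assumed tables, the entries of `table` do not sum to 1, so they cannot be read directly as probabilities.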
Note that $\tilde{p}$ denotes an unnormalized joint measure: its values need not sum to 1. We therefore need to normalize it to obtain a valid probability distribution. Specifically, we define