Skip to main content

Maximum-Entropy Ensembles of Graphs

Graph Ensembles

State Space

A graph ensemble is a probability distribution P(G)P(G) over a set of graphs, where each graph is a microstate. This statistical-mechanics view of networks is the starting point for maximum-entropy network models [1][2].

If we think of a graph ensemble consisted of 3 nodes, there are three possible undirected edges and therefore 23=82^3=8 graphs in the ensemble.

Graph Observables

Observables are functions of a graph. Examples include edge count L(G)L(G), node degrees ki(G)k_i(G), triangle count, clustering, or any other statistic we may want an ensemble to preserve.

Probability Distributions on Graphs

The probability distributions on the graph must satisfy

GP(G)=1.\sum_G P(G)=1.

If normalization is the only constraint, the maximum-entropy distribution is uniform over all graphs. For N=3N=3, that means P(G)=1/8P(G)=1/8 and the entropy is log28=3\log_2 8=3 bits.

Maximum Entropy

Unconstrained Maximum Entropy

Without constraints other than normalization, the entropy-maximizing distribution is uniform. This is the global maximum over the whole probability simplex.

For the eight graphs on three nodes,

P(G)=18,S=log28=3.P(G)=\frac{1}{8},\qquad S=\log_2 8=3.

Constrained Maximum Entropy

The maximum-entropy principle says to preserve only the facts we choose as constraints and otherwise use the distribution with the largest entropy [5]. For graph ensembles, those facts are often expected observables xa\langle x_a\rangle such as the expected total edge count and the degree distribution:

xa=GP(G)xa(G)=xa.\langle x_a\rangle=\sum_G P(G)\,x_a(G)=x_a^{*}.

Among all graph distributions satisfying those equations, choose the one with largest Shannon entropy:

S=GP(G)lnP(G).S=-\sum_G P(G)\ln P(G).

The result is the least-committed distribution consistent with the constraints.

Lagrange Multipliers

We maximize entropy over the probabilities of all graphs: P(G1),P(G2),P(G_1),P(G_2),\ldots. The constraints are equations that those probabilities must satisfy:

GP(G)=1,GP(G)xa(G)=xafor each observable a.\sum_G P(G)=1,\qquad \sum_G P(G)x_a(G)=x_a^*\quad\text{for each observable }a.

The Lagrangian L\mathcal{L} combines the entropy objective with terms enforcing normalization and the observable constraints:

L=GP(G)lnP(G)entropyα(GP(G)1)normalizationaλa(GP(G)xa(G)xa)observable constraints.\mathcal{L}= \underbrace{-\sum_G P(G)\ln P(G)}_{\text{entropy}} -\underbrace{\alpha\left(\sum_G P(G)-1\right)}_{\text{normalization}} -\underbrace{\sum_a\lambda_a\left(\sum_G P(G)x_a(G)-x_a^*\right)}_{\text{observable constraints}}.

At the maximum, changing one graph's probability while holding the others fixed should not improve the constrained objective. Thus L/P(G)=0\partial\mathcal{L}/\partial P(G)=0 gives

lnP(G)1αaλaxa(G)=0.-\ln P(G)-1-\alpha-\sum_a\lambda_a x_a(G)=0.

Solving this equation for P(G)P(G) gives P(G)eaλaxa(G)P(G)\propto e^{-\sum_a\lambda_a x_a(G)} . The normalization constant is the partition function ZZ.

Lagrange multipliers turn the constrained optimization into the Gibbs-Boltzmann form:

P(G)=eH(G)Z,H(G)=aλaxa(G),Z=GeH(G).\begin{aligned} P(G)&=\frac{e^{-H(G)}}{Z},\\ H(G)&=\sum_a \lambda_a\,x_a(G),\\ Z&=\sum_G e^{-H(G)}. \end{aligned}

The multipliers are fixed by the constraint equations xa=lnZ/λa=xa\langle x_a\rangle=-\partial\ln Z/\partial\lambda_a=x_a^{*}. In this page, the multipliers are best understood as fields conjugate to graph observables.

Canonical Ensembles

Canonical ensembles use soft constraints: observables are fixed in expectation, not exactly graph by graph.

Example: Expected Edge Count

Here, we use expected edge count as the running example of a canonical ensemble. The constrained observable is the number of edges, and the constraint is soft:

L=.\langle L\rangle=\ell.

The Hamiltonian is

H(G)=λL(G).H(G)=\lambda L(G).

An ensemble tuned to L\langle L\rangle still assigns probability to graphs with several possible realized edge counts. Only the average is pinned.

Partition Function

Because L(G)=i<jaijL(G)=\sum_{i<j}a_{ij}, the partition function factorizes over possible edges:

Z(λ)=GeλL(G)=(1+eλ)M,Z(\lambda)=\sum_G e^{-\lambda L(G)}=(1+e^{-\lambda})^M,

where M=(N2)M=\binom{N}{2} is the number of possible undirected edges.

Step-by-step

Write a graph GG by its adjacency matrix. Let aija_{ij} be the (i,j)(i,j) entry:

aij={1if edge (i,j) is present,0otherwise.a_{ij}= \begin{cases} 1 & \text{if edge }(i,j)\text{ is present},\\ 0 & \text{otherwise}. \end{cases}

For a simple undirected graph on NN labeled nodes, there are

M=(N2)M=\binom{N}{2}

possible edges. The edge count is the sum of these indicators:

L(G)=i<jaij.L(G)=\sum_{i<j}a_{ij}.

With the edge-count Hamiltonian H(G)=λL(G)H(G)=\lambda L(G), the partition function is

Z(λ)=GeλL(G)={aij}exp ⁣(λi<jaij).Z(\lambda) =\sum_G e^{-\lambda L(G)} =\sum_{\{a_{ij}\}} \exp\!\left(-\lambda\sum_{i<j}a_{ij}\right).

The exponential of a sum becomes a product:

exp ⁣(λi<jaij)=i<jeλaij.\exp\!\left(-\lambda\sum_{i<j}a_{ij}\right) =\prod_{i<j}e^{-\lambda a_{ij}}.

Therefore the sum over all graphs is a sum over all binary choices for all possible edges:

Z(λ)=a12=01a13=01aN1,N=01i<jeλaij.Z(\lambda) =\sum_{a_{12}=0}^{1}\sum_{a_{13}=0}^{1}\cdots \sum_{a_{N-1,N}=0}^{1} \prod_{i<j}e^{-\lambda a_{ij}}.

Because the product separates by edge, the sums factorize:

Z(λ)=i<jaij=01eλaij.Z(\lambda) =\prod_{i<j}\sum_{a_{ij}=0}^{1}e^{-\lambda a_{ij}}.

Each one-edge sum has only two terms:

aij=01eλaij=eλ0+eλ1=1+eλ.\sum_{a_{ij}=0}^{1}e^{-\lambda a_{ij}} =e^{-\lambda\cdot 0}+e^{-\lambda\cdot 1} =1+e^{-\lambda}.

Since the same factor appears once for each of the MM possible edges,

Z(λ)=GeλL(G)=(1+eλ)M,Z(\lambda)=\sum_G e^{-\lambda L(G)}=(1+e^{-\lambda})^M,

Erdos-Renyi G(N,p)

The factorization means every possible edge is an independent Bernoulli trial:

aijBernoulli(p),p=eλ1+eλ.a_{ij}\sim\mathrm{Bernoulli}(p), \qquad p=\frac{e^{-\lambda}}{1+e^{-\lambda}}.

Equivalently, a graph with L(G)L(G) present edges has probability

P(G)=pL(G)(1p)ML(G).P(G)=p^{L(G)}(1-p)^{M-L(G)}.
Step-by-step

Now substitute the factorized partition function back into the Gibbs distribution:

P(G)=eλL(G)Z(λ).P(G)=\frac{e^{-\lambda L(G)}}{Z(\lambda)}.

Using L(G)=i<jaijL(G)=\sum_{i<j}a_{ij} and Z(λ)=(1+eλ)MZ(\lambda)=(1+e^{-\lambda})^M,

P(G)=i<jeλaiji<j(1+eλ)=i<jeλaij1+eλ.P(G) =\frac{\prod_{i<j}e^{-\lambda a_{ij}}} {\prod_{i<j}(1+e^{-\lambda})} =\prod_{i<j} \frac{e^{-\lambda a_{ij}}}{1+e^{-\lambda}}.

This shows that the probability of a whole graph is a product of one-edge probabilities. For a single edge,

P(aij=0)=11+eλ,P(aij=1)=eλ1+eλ.P(a_{ij}=0)=\frac{1}{1+e^{-\lambda}}, \qquad P(a_{ij}=1)=\frac{e^{-\lambda}}{1+e^{-\lambda}}.

Define

p=eλ1+eλ.p=\frac{e^{-\lambda}}{1+e^{-\lambda}}.

Then

1p=11+eλ,1-p=\frac{1}{1+e^{-\lambda}},

so each edge probability can be written as

P(aij)=paij(1p)1aij.P(a_{ij})=p^{a_{ij}}(1-p)^{1-a_{ij}}.

Thus every possible edge is an independent Bernoulli trial:

aijBernoulli(p),p=eλ1+eλ.a_{ij}\sim\mathrm{Bernoulli}(p), \qquad p=\frac{e^{-\lambda}}{1+e^{-\lambda}}.

For a graph with L(G)L(G) present edges and ML(G)M-L(G) absent edges, the graph probability becomes

P(G)=i<jpaij(1p)1aij=pL(G)(1p)ML(G).P(G) =\prod_{i<j}p^{a_{ij}}(1-p)^{1-a_{ij}} =p^{L(G)}(1-p)^{M-L(G)}.

This is exactly the Gilbert random graph model G(N,p)G(N,p): each of the MM possible edges is included independently with probability pp.

The maximum-entropy calculation determines the exponential form of the distribution, but the constraint value determines the numerical value of λ\lambda. For the edge-count ensemble, L=Mp\langle L\rangle=Mp with M=(N2)M=\binom{N}{2}, so a target \ell implies p=/Mp=\ell/M and therefore

λ=log1pp=logM.\lambda=\log\frac{1-p}{p} =\log\frac{M-\ell}{\ell}.

Thus λ\lambda is the log-odds against an edge: positive values make sparse graphs more likely, negative values make dense graphs more likely, and λ=0\lambda=0 gives the uniform ensemble.

This is Gilbert's G(N,p)G(N,p) model: every possible edge is present independently with the same probability pp [3]. For N=3N=3, there are three trials, so L=3p\langle L\rangle=3p and k=2p\langle k\rangle=2p.

Edge-Count Distribution

A particular graph with LL present edges has probability

P(G)=pL(1p)3L.P(G)=p^L(1-p)^{3-L}.

Collapsing the eight graph probabilities by edge count gives a binomial distribution:

P(L)=(3L)pL(1p)3L.P(L)=\binom{3}{L}p^{L}(1-p)^{3-L}.

Microcanonical Ensembles

Microcanonical ensembles use hard constraints: an observable is fixed exactly on every graph in the support. Again, microcanonical says how the constraint is enforced; the observable says what is fixed.

Exact Edge Count

For edge count, the hard constraint is

L(G)=m.L(G)=m.

This gives the fixed-edge model G(N,m)G(N,m), the original Erdos-Renyi random graph model [4].

Uniform Distribution on a Constraint Set

The maximum-entropy solution is uniform over all graphs satisfying the hard constraint:

P(G)={1/{G:L(G)=m}if L(G)=m,0otherwise.P(G)= \begin{cases} 1/|\{G:L(G)=m\}| & \text{if } L(G)=m,\\ 0 & \text{otherwise}. \end{cases}

For N=3N=3 and m=1m=1, the constraint set has three graphs, so the entropy is log231.59\log_2 3\approx 1.59 bits.

Canonical vs Microcanonical

A chosen constraint value gives one entropy-maximizing distribution. Varying the constraint value gives a family of maximum-entropy distributions.

For one global edge-count constraint, the canonical model G(N,p)G(N,p) and microcanonical model G(N,m)G(N,m) become equivalent as NN\to\infty when mm and pp are matched by edge density. For small NN, they are visibly different: a canonical model tuned to L=1\langle L\rangle=1 still leaves probability on L=0,2,3L=0,2,3, while the microcanonical model puts all mass on L=1L=1.

Degree-Constrained Ensembles

The same maximum-entropy framework can constrain node degrees instead of only the total edge count. That gives the configuration-model family: the soft configuration model fixes expected degrees, while the hard configuration model fixes the degree sequence exactly. See Configuration model for that version.

Ensemble Equivalence

Global Constraints

Equivalence is most natural for a small number of global, self-averaging constraints. Edge count is the basic example. In G(N,p)G(N,p), the realized edge count LL fluctuates around MpMp. Equivalently, the realized edge density p^=L/M\hat p=L/M fluctuates around pp, but those fluctuations shrink as NN grows because M=(N2)M=\binom{N}{2} grows.

Local Constraints

Degree-sequence constraints are local and extensive: there is one constraint per node. The canonical ensemble allows each degree to fluctuate around its target; the microcanonical ensemble forbids those fluctuations exactly.

Canonical-Microcanonical Non-Equivalence

Canonical and microcanonical ensembles can fail to become equivalent when the number of constraints grows with the system. Graph ensembles with a fixed number of edges are equivalent in the large-graph limit, while ensembles with a fixed degree sequence need not be [7].

This is why G(N,p)G(N,p) and G(N,m)G(N,m) are usually interchangeable asymptotically, while soft and hard configuration models can remain measurably different.

Interactive Example for N=3

The smallest nontrivial case is N=3N=3. There are three possible undirected edges, so the full graph space has only 23=82^3=8 graphs. That makes it possible to show the entire probability distribution P(G)P(G) at once.

Use the tabs to compare three ways of assigning probabilities to the same eight graphs. The edge-count mode is the canonical G(N,p)G(N,p) ensemble: one expected edge count L\langle L\rangle fixes one probability pp shared by all edges. The degree mode replaces that global constraint with three local expected-degree constraints. Free play lets you draw an arbitrary distribution, so you can see what is lost when the distribution no longer has the maximum-entropy form.

Interactive maximum-entropy graph ensemble on three nodes. Choose a constraint on total edge count or on individual node degrees, or edit the distribution freely.

1.50
a single constraint shared by all three possible edges

p=L/3=0.50p=\langle L\rangle/3=0.50; λ=log1pp=0.00\lambda=\log\frac{1-p}{p}=0.00

Graph Probability Distribution P(G)
00.250.50.75113%13%13%13%13%13%13%13%0 Edges1 Edge2 Edges3 EdgesProbability
Edge Count Distribution P(L)
L=1.50\langle L\rangle=1.50
00.5113%01 Graph38%13 Graphs38%23 Graphs13%31 Graph
Entropy Versus Expected Edge Count
0123011.5233 Bits = Uniform Over All 8Expected Edges <L>Entropy (Bits)
Expected Edges L\langle L\rangle
1.50
Entropy
3.00 Bits
Max At This L\langle L\rangle
3.00 Bits
Entropy Gap
0.00 Bits
Blue: The Maximum (Ceiling)Coral: The Gap (What Structure Costs)

Probability Distribution of Graphs

The first chart shows the probability of each individual graph. The graphs are grouped by their realized edge count L=0,1,2,3L=0,1,2,3. In edge-count mode, all graphs with the same LL have the same probability because the Hamiltonian only contains L(G)L(G).

Edge Count Distribution

The second chart collapses those eight microstates onto the observable LL. This is where degeneracy appears: the levels L=0,1,2,3L=0,1,2,3 contain 1,3,3,11,3,3,1 graphs. The control pins the expectation L\langle L\rangle, not the realized value of LL, so the observed edge count still fluctuates.

For the edge-count ensemble, those fluctuations follow the binomial law for three Bernoulli trials:

P(L)=(3L)pL(1p)3L.P(L)=\binom{3}{L}p^{L}(1-p)^{3-L}.

Entropy Ceiling

The third chart compares entropy against the maximum possible entropy at the same expected edge count. The blue curve is

Smax(L)=3h ⁣(L/3),S_{\max}(\langle L\rangle)=3\,h\!\left(\langle L\rangle/3\right),

where hh is the binary entropy in bits. This is the constrained maximum-entropy envelope: for each value of L\langle L\rangle, it gives the entropy of the canonical distribution with that expected edge count.

The binary entropy function is

h(p)=plog2p(1p)log2(1p).h(p)=-p\log_2 p-(1-p)\log_2(1-p).

For example, when L=1.5\langle L\rangle=1.5,

p=L3=1.53=0.5,h(0.5)=1,p=\frac{\langle L\rangle}{3}=\frac{1.5}{3}=0.5, \qquad h(0.5)=1,

so

Smax(1.5)=3h(0.5)=3 bits.S_{\max}(1.5)=3h(0.5)=3\ \text{bits}.

This is the peak of the envelope: p=1/2p=1/2, every graph has probability 1/81/8, and the entropy reaches the unconstrained global maximum log28=3\log_2 8=3 bits.

Any extra structure beyond the mean edge count lowers entropy. In the node-degree mode, fixing three expected degrees also fixes L\langle L\rangle, but it adds node-specific information. The coral gap is the entropy spent to buy that extra structure. In free play, the same gap appears whenever the hand-drawn distribution has more structure than the maximum-entropy distribution at the same L\langle L\rangle.

References

  1. Park, J., & Newman, M. E. J. (2004). Statistical mechanics of networks. Physical Review E, 70, 066117.
  2. Squartini, T., & Garlaschelli, D. (2017). Maximum-Entropy Networks: Pattern Detection, Network Reconstruction and Graph Combinatorics. Springer.
  3. Gilbert, E. N. (1959). Random graphs. Annals of Mathematical Statistics, 30(4), 1141-1144.
  4. Erdos, P., & Renyi, A. (1959). On random graphs I. Publicationes Mathematicae, 6, 290-297.
  5. Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106, 620-630.
  6. Newman, M. E. J., Strogatz, S. H., & Watts, D. J. (2001). Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64, 026118.
  7. Squartini, T., de Mol, J., den Hollander, F., & Garlaschelli, D. (2015). Breaking of ensemble equivalence in networks. Physical Review Letters, 115, 268701.
  8. Snijders, T. A. B., Pattison, P. E., Robins, G. L., & Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology, 36(1), 99-153.