Introduction
Let ρ=Pareto(kˉ,γ) for a HSCM(n,ρ). A random graph in such HSCM can be generated using multiple methods, and this note describes the derivation process from the k-generator to the u-generator (inverse CDF method). See chapter 9 of the class notes from Network Science 2 for more details.
Pareto Distribution
PDF
The probability density function (PDF) of the Pareto distribution is given by:
P(k)=αkminαk−(α+1)
where kmin is the minimum value of k, α>0 is the shape parameter.
CDF
The cumulative distribution function (CDF) of the Pareto distribution can be derived from the PDF as follows:
C(k)=∫kminkP(x)dx=∫kminkαkminαx−(α+1)dx=αkminα∫kminkx−(α+1)dx=−α1αkminαx−α=−kminα[x−α]kmink=−kminα(k−α−kmin−α)=−kminαk−α+1=1−(kkmin)α
Note that the CDF takes values in the interval [0,1] for k≥kmin.
Mean
The mean of the Pareto distribution kˉ can be calculated as follows:
kˉ=∫kmin∞xP(x)dx=∫kmin∞x⋅αkminαx−(α+1)dx=∫kmin∞αkminαx−αdx=αkminα[−α+11x−α+1]kmin∞=(αkminα)(−−α+11kmin−α+1)=−−α+1αkmin
Since γ=α+1, i.e., α=γ−1, we can rewrite the above equation for kˉ as follows:
kˉ=−−γ+2γ−1kmin=kmin(γ−2γ−1)
Rearranging the above equation, we can express kmin in terms of kˉ and γ as follows:
kmin=kˉ(γ−1γ−2)(1)
k-generator
If we could directly sample from the Pareto distribution, we could generate a random graph by connecting nodes i and j independently with probability pk(ki,kj):
pk(ki,kj)=kikjkˉn+11(2)
ki is independently sampled from the Pareto distribution as
ki∼Pareto[kˉ,γ](k)
where
ki∈[kmin,∞]
u-generator: Inverse CDF sampling
Overview
Practically, we often use inverse CDF method to sample from the Pareto distribution. The inverse CDF method involves the following steps:
- Generate a uniform random number u∼Uniform(0,1). Note that 1−u∼Uniform(0,1) as well, so we can use u or 1−u interchangeably.
- Use the inverse function of the CDF to find the corresponding value of k from the uniform random number u.
Derivation of pu(ui,uj)
The CDF can be expressed with u as a function of k as follows:
u=1−(kkmin)α
Solving for k, we get the inverse CDF as follows:
(kkmin)αkkmink=1−u=(1−u)α1=kmin(1−u)−α1
Plugging equation (3) into equation (2), the probability of connecting two nodes i and j with degrees ki and kj respectively can be expressed in terms of ui and uj as follows:
pk(ki,kj)⟺pu(ui,uj)=kikjkˉn+11=kmin(1−ui)−α1⋅kmin(1−uj)−α1kˉn+11=kmin2(1−ui)−α1(1−uj)−α1kˉn+11=kmin2((1−ui)(1−uj))−α1kˉn+11=kmin2kˉn(1−ui)(1−uj)α1+11(3)
Now, let ν=kmin2kˉ. Since kmin=kˉγ−1γ−2 from equation (1),
ν=kmin2kˉ=(kˉ⋅γ−1γ−2)2kˉ=kˉ⋅kˉ−2⋅(γ−1γ−2)−2=kˉ1(γ−2γ−1)2
Recalling that u is interchangable with 1−u, we can rewrite the equation for pu(ui,uj) as follows:
pu(ui,uj)=ν((1−ui)(1−uj))α1+11=ν(uiuj)α1+11(4)