Creating Real Network With Expected Degree Distribution: A Statistical Simulation
Creating Real Network With Expected Degree Distribution: A Statistical Simulation
Creating Real Network With Expected Degree Distribution: A Statistical Simulation
Article
Received 16 December 2011; Accepted 12 February 2012; Published online 1 September 2012
IAEES
Abstract
The degree distribution of known networks is one of the focuses in network analysis. However, its inverse
problem, i.e., to create network from known degree distribution has not yet been reported. In present study, a
statistical simulation algorithm was developed to create real network with expected degree distribution. It is an
iteration procedure in which a real network, with the least deviation of actual degree distribution to expected
degree distribution, was created. Random assignment was used in the creation of connections. The Java
program was designed. It may produce adjacency matrix, connection details, and actual degree distribution of
the network created.
1 Introduction
Based on either a probability distribution (e.g., Poisson distribution) or a frequency distribution, we may create
a set of samples that follow the given distribution. The statistical characteristics of these theoretical
distributions can be inferred and represented by these samples. It is obvious that the estimates from samples
are either unbiased or biased. In present study, degree distribution means the frequency distribution of degree
in a network. Degree distribution and connection structure of known networks is one of the focuses in network
analysis (Dunne et al., 2002; Ibrahim et al., 2011; Paris and Bazzoni, 2011; Tacutu et al., 2011; Zhang, 2011;
Zhang and Zhan, 2011). However, its inverse problem, i.e., to create network from known degree distribution
has not yet been reported.
The present study aimed to present a statistical simulation algorithm that creates real network with
expected degree distribution.
2 Algorithm
The statistical simulation procedures for creating real network with expected degree distribution are described
bellow.
(1) Transform the expected degree distribution
IAEES www.iaees.org
Network Biology, 2012, 2(3):110-117 111
[d1, d2): f1
[d2, d3): f2
…
[dm, dm+1): fm
n1, s1
n2, s2
…
np, sp
where [di, di+1): interval of degree; fi: number of nodes which degree falls in the interval [di, di+1); nj: jth node;
sj: number of connections of jth node, and p=Σfi. There are totally Σsi connections in the network.
At this step, each node falling in the same degree interval is randomly attributed an integer number
(number of connections of this node) in this interval
sj=(int)((dk-dk-1)*Math.random()+dk).
(2) Rearrange p pairs of (node, number of connections), from larger to smaller in the number of
connections
ni, si
nj, sj
…
nq, sq
(3) For each node ni with si expected connections, randomly create ni–mi connections to other nodes and
each of ni–mi nodes has a connection, where mi is the number of connections already created by previous
nodes.
Given p constraints (p equalities for expected degree of p nodes), there will be p(p-1)/2 variables for
connections. Therefore in general there are multiple connection plans, i.e. multiple networks may be obtained.
Randomly creating connections to other nodes is a useful method. Moreover, the nodes with more connections
should be handled in this procedure because of some reasons. For example, a reason is that the nodes with
more connections are generally more important in the network.
(4) After all nodes are assigned connections, examine the symmetry of adjacency matrix, A=(aij), where
aij=1 denotes there exists a connection between two nodes i and j, and aij=0 denotes there is not connection
between nodes i and j. If aji≠aij, then let aji=aij=1.
(5) If the deviation tolerance of degree distribution or the number of simulations is reached, terminate
algorithm; or else return to (3). Here percent deviation of actual degree distribution to expected degree
distribution is defined as
IAEES www.iaees.org
112 Network Biology, 2012, 2(3):110-117
where ti: actual number of nodes in degree group i. If the number of simulations is reached, the results with the
minimal percent deviation during simulation will be shown.
The following codes are the main Java algorithm, netConnect, used by Java applet netConnCreate (see
http://www.iaees.org/publications/software/index.asp), to create a real network with expected degree
distribution:
import java.io.*;
public class netConnect {
public int s,inn[],cols[][],netcon[][];
public double err;
public netConnect(int m, int in[], int ff[], double er, int sim) {
int i,j,k,l,u,v,c,cs,f[],p[],w[],innn[],colss[][];
double errr;
s=0;
for(i=1;i<=m;i++) s+=ff[i];
f=new int[s+1];
p=new int[s+1];
w=new int[s+1];
inn=new int[m+1];
innn=new int[m+1];
cols=new int[s+1][s+1];
colss=new int[s+1][s+1];
k=u=0;
for(i=1;i<=m;i++)
for(j=1;j<=ff[i];j++) {
k++;
f[k]=(int)((in[i+1]-in[i])*Math.random()+in[i]);
u+=f[k]; }
netcon=new int[u+1][u+1];
for(i=1;i<=s;i++) p[i]=i;
for(i=1;i<=s-1;i++) {
k=i;
for(j=i;j<=s-1;j++)
if(f[j+1]>f[k]) k=j+1;
l=p[i];
p[i]=p[k];
p[k]=l;
u=f[i];
f[i]=f[k];
f[k]=u; }
k=0;
err=1e+50;
do {
for(i=1;i<=s;i++)
for(j=1;j<=s;j++) colss[i][j]=0;
loop: for(i=1;i<=s;i++) {
v=0;
for(j=1;j<=i-1;j++)
if (colss[j][i]==1) {
colss[i][j]=1;
v++;
if (v==f[i]) continue loop; }
c=0;
for(j=1;j<=s;j++) w[j]=j;
while (f[i]!=0) {
cs=(int)((s-c)*Math.random()+1);
if (w[cs]==i) continue;
if ((colss[w[cs]][i]==0) & (w[cs]<i)) continue;
colss[i][w[cs]]=1;
if (cs<s-c) for(j=cs+1;j<=s-c;j++) w[j-1]=w[j];
c++;
if (c>=(f[i]-v)) break; } }
for(i=1;i<=s;i++)
for(j=1;j<=s;j++)
if (colss[j][i]!=colss[i][j]) colss[j][i]=colss[i][j]=1;
for(i=1;i<=s;i++) {
IAEES www.iaees.org
Network Biology, 2012, 2(3):110-117 113
p[i]=0;
for(j=1;j<=s;j++)
if (colss[i][j]==1) p[i]++; }
for(i=1;i<=m;i++) {
innn[i]=0;
for(j=1;j<=s;j++)
if ((p[j]>=in[i]) & (p[j]<in[i+1])) innn[i]++; }
errr=0;
for(i=1;i<=m;i++) errr+=Math.abs(ff[i]-innn[i]);
errr/=s;
if (errr<err) {
err=errr;
for(i=1;i<=s;i++)
for(j=1;j<=s;j++) cols[i][j]=colss[i][j];
for(i=1;i<=m;i++) inn[i]=innn[i]; }
k++;
if (k>sim) break;
} while(errr>er);
System.out.println("\nAdjacency matrix:");
for(i=1;i<=s;i++) {
for(j=1;j<=s;j++)
System.out.print(String.valueOf(cols[i][j])+" ");
System.out.println(); }
System.out.println("\nNode "+"to Node"+" Connection");
for(i=1;i<=s;i++)
for(j=1;j<=s;j++)
if (cols[i][j]==1) {
System.out.print(String.valueOf(i)+" "+String.valueOf(j)+" 1"+"\n");
netcon[i][j]=1; }
System.out.println("\nExpected degree distribution of network:");
for(i=1;i<=m;i++)
System.out.println("["+in[i]+", "+in[i+1]+"): "+ff[i]);
System.out.println();
System.out.println("\nActual degree distribution of created network:");
for(i=1;i<=m;i++)
System.out.println("["+in[i]+", "+in[i+1]+"): "+inn[i]);
System.out.print("\nPercent deviation of actual degree distribution to expected degree distribution:
"+String.valueOf((int)(err*10000)/100.00)+"%\n");
}}
3 Results
Suppose there is an expected degree distribution for the network, in which there are nine degree groups
[1, 3): 1
[3, 5): 2
[5, 7): 4
[7, 9): 8
[9, 11): 5
[11, 13): 3
[13, 15): 2
[15, 17): 2
[17, 19): 1
In the data file for the algorithm, the expected degree distribution should be stored in the following format
124853221
1 3 5 7 9 11 13 15 17 19
Running the algorithm above, the parameter input interface of Java algorithm is shown as in Fig. 1
IAEES www.iaees.org
114 Network Biology, 2012, 2(3):110-117
0 1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 0 0
1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 0 1
0 1 0 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 1 0 1 0 0 1 1
0 1 0 0 0 1 1 0 0 1 1 0 1 0 1 1 1 1 0 1 0 1 0 0 0 0 0 1
1 1 1 0 0 0 0 1 1 0 1 1 0 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0
1 1 0 1 0 0 0 0 1 1 0 1 1 0 0 1 0 0 1 1 0 1 0 1 0 0 0 0
0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 1 0 1 1 1 0 0
0 0 1 0 1 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0 0 0 1 0 1
0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 1 1 0 0 0 0
1 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1
0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0
1 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1
0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0
1 0 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1
1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1
0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0
0 1 1 1 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0
IAEES www.iaees.org
Network Biology, 2012, 2(3):110-117 115
IAEES www.iaees.org
116 Network Biology, 2012, 2(3):110-117
The connection details of the real network created are indicated in Table 1. It should be noted that
connection details in Table 1 can be easily transformed into adjacency matrix. The connection details are
equivalent to the adjacency matrix. The actual degree distribution of created network is
[1, 3): 0
[3, 5): 3
[5, 7): 4
[7, 9): 8
[9, 11): 5
[11, 13): 4
[13, 15): 2
[15, 17): 1
[17, 19): 1
Percent deviation of actual degree distribution to expected degree distribution is 14.28%. Thus the network
created is a better realization of expected network.
Using the data of connection details above, I draw a network using the software (Zhang, 2012), as
indicated in Fig. 2.
4 Discussion
In the algorithm, if allowed number of simulations reached but the percent deviation of actual degree
distribution to the expected degree distribution is still large, we may increase the number of simulations and
restart the algorithm. The tolerance deviation should not be too small in the algorithm.
IAEES www.iaees.org
Network Biology, 2012, 2(3):110-117 117
In addition to the algorithm above, other methods can also be used to create real network with expected
degree distribution. For example, connections may be obtained by using optimization techniques. But for large
networks the computation will be time-consuming.
References
Dunne JA, Williams RJ, Martinez ND. 2002. Food-web structure and network theory: the role of connectance
and size. Ecology, 99(20): 12917-12922
Ibrahim SS, Eldeeb MAR, Rady MAH. 2011. The role of protein interaction domains in the human cancer
network. Network Biology, 1(1): 59-71
Paris L, Bazzoni G. 2011. The polarity sub-network in the yeast network of protein-protein interactions.
Network Biology, 1(3-4): 149-158
Tacutu R, Budovsky A, Yanai H, et al. 2011.Immunoregulatory network and cancer-associated genes:
molecular links and relevance to aging. Network Biology, 1(2): 112-120
Zhang WJ, Zhan CY. 2011. An algorithm for calculation of degree distribution and detection of network type:
with application in food webs. Network Biology, 1(3-4): 159-170
Zhang WJ. 2011. Constructing ecological interaction networks by correlation analysis: hints from community
sampling. Network Biology, 1(2): 81-98
Zhang WJ. 2012. A Java software for drawing graphs. Network Biology, 2(1): 38-44
IAEES www.iaees.org