Reference 6
For a general tutorial and introduction to UCINET see the online User's Guide which accompanies this
program.
An introduction to the general form of most help files in UCINET is contained in the Introduction Section
(see link below). Also below are links to the UCINET standard datasets together with help on the DL file
format.
Introduction Section
DL
Standard Datasets
DESCRIPTION Imports labels stored in a text file into a UCINET dataset. The labels
should be plain text, separated by carriage returns.
PARAMETERS
Label File
Name of text file containing the labels
Import into:
Choices are:
Row Labels
Column Labels
Matrix Labels
TIMING N/A
COMMENTS None
REFERENCES None
FILE > DELETE
DESCRIPTION Deletes UCINET datasets; both the header and the data file are deleted. Multiple
file names should be separated by spaces.
PARAMETERS
File(s) to be deleted
List of files to be deleted. Data type: any UCINET file.
TIMING N/A
COMMENTS None
REFERENCES None
FILE>RENAME UCINET FILE
PARAMETERS
Original Dataset Name :
Name of file to be re-named
TIMING N/A
COMMENTS None
REFERENCES None
FILE>COPY UCINET DATASET
PARAMETERS
Original Dataset Name :
Name of dataset to be copied. Data type: any UCINET file.
TIMING N/A
COMMENTS None
REFERENCES None
Introduction
This file gives technical information about all the routines contained within UCINET.
The manual assumes that users have a rudimentary knowledge of the Windows operating system and
of network terminology. Elementary information on UCINET is available in the accompanying user's guide.
Each routine is documented in a standard way. Once the documentation for a familiar routine has been
thoroughly digested, this standard layout should help the user understand the less familiar routines.
Command Format
Each routine is documented using the following keywords: MENU, PURPOSE, DESCRIPTION,
PARAMETERS, LOG FILE, TIMING, COMMENTS, and REFERENCES. The details of these are as follows:
MENU This gives the exact position of the routine within the UCINET menu system.
For example NETWORK>SUBGROUPS>K-PLEX can be found by first
selecting NETWORK on the top level of the menu and then from the pull down
submenu selecting SUBGROUPS and then finally from this submenu selecting
K-PLEX. The selection of all the options in the MENU list followed by a
mouse click will begin execution of the routine.
PURPOSE This gives a brief one or two line description of the routine.
DESCRIPTION Gives a fuller account of what the routine does. This description will include a
brief definition of some of the concepts required to understand the technique and
an outline of the algorithms employed. It should contain sufficient information
for a user to fully comprehend the action of the routine. An effort has been made
to make the descriptions succinct. Users should read descriptions carefully if
they are unfamiliar with the action of a particular algorithm.
PARAMETERS This gives a complete list of what information must be supplied by the user in
order to run a routine. It contains a list of all the information requested on the
forms when a routine is executed. This list is indented in such a way as to make
it clear what exactly appears on the forms.
For each entry on the form the manual gives the defaults provided by UCINET.
This can be useful in trying to locate files that have been created by the software,
or when re-running a particular routine with different parameters.
In addition the manual gives additional information (to the help line on the form)
about how to complete each entry on the form.
If the routine requires a dataset (which most usually do) then the manual
specifies precisely which type of data can be analyzed. These are as follows:
Valued graph - an n×n matrix. The entries are usually reals; sometimes the values are
restricted to integers or the matrix is restricted to being symmetric.
Square matrix - an n×n matrix. The entries are usually reals; sometimes the values are
restricted to integers or probabilities. Obviously valued graph and
square matrix are the same data type; it is just convention which dictates usage.
Matrix - an n×m matrix. The entries are usually reals. These can be restricted to
binary or integer values.
Each data type is contained within the next. So, for example, any routine that
accepts valued graphs will run on digraphs or graphs.
Some routines contain options which will run on different data types. In this
case the data type given in the manual is the most general. Certain options
dictated by the parameters may not run with this data type. It should be apparent
from the manual which data types will be applicable for the selected parameters.
Routines which take specific action on multirelational data have this indicated in
the data type specification. For example, the routine specified by
TRANSFORM>SEMIGROUP
has as its data type Digraph.Multirelational. This indicates that this routine acts
on multirelational data in a particular way. If this data type is not included and a
multirelational data set is submitted for analysis then UCINET will perform the
analysis on each relation separately, if possible. In some cases such an action
would not make network sense, and in other cases it is simply not technically
possible to do this. In these cases the routine only acts on the first relation.
LOG FILE The LOG FILE contains output generated by each routine. The contents of the
file are displayed on the screen and the user can browse, edit, save or print it.
For each routine a comprehensive account of the contents of the file is given.
TIMING The timing gives the order of the routine related to the longest dimension of the
data matrix, which is called N. Care should be taken on the interpretation of this
value since it only gives the order of the polynomial (if one exists) which
dictates the time. Hence a time O(N^3) means that for sufficiently large N the
time to execute will increase at the rate of N^3. It is quite possible for the user to
increase N for an O(N^3) routine by a factor of 2, say, and see the execution time
increase 20-fold rather than the expected 8-fold. This would be because N was not
sufficiently large for the highest-order term to dominate. Equally, the order cannot be
used to compare two different routines.
Whilst caution is wise in a strict interpretation, it will be true that for an O(N^3)
routine doubling N will probably cause the execution time to increase
by approximately a factor of 8. Timings which are exponential mean that the
user should be aware that small increases in N may cause very large increases in
execution time.
COMMENTS Additional comments which may be of help to the user are given in this section.
REFERENCES A 'sample' of useful references which should enable the interested user to gain
more information.
STANDARD DATASETS
UCINET comes with a collection of network datasets. Multirelational data are stored,
where possible, in a single multirelational data file. Each relation within a
multirelational set is labelled and information about the form of the data is described
for each individual matrix.
DESCRIPTION All UCINET data files store the data as a matrix. Upon execution
of this routine a spreadsheet style editor is invoked. The
spreadsheet layout is very similar to that found on other
spreadsheets such as Excel, and hence should be familiar to
most users.
Each element of the data occupies a cell in the spreadsheet. The data matrix is
displayed exactly in matrix form. The user can move around the matrix using the
arrow keys (up, down, left and right) to move from one cell to an adjacent cell, and 'Page Up',
'Page Down', 'Home' and 'End' to move up one screen, down one screen, to the
beginning and to the end of the data respectively. When the cursor is located in a
particular cell its position is indicated on the screen by the highlighted row and
column numbers of that cell.
If the rows and/or columns are labeled then the labels are displayed at the top of
the screen. To edit or enter a new value in a particular cell then the cursor must
be placed in the relevant cell. The new value is typed at the keyboard and this
value appears at the top of the screen. Once the value has been correctly typed
then it is confirmed using the ENTER key. After ENTER has been depressed the
value is placed in the relevant cell.
Note that you can only type in the labels once some data has been filled in to the
relevant row or column. If you already know the size of your data then fill in the
last row and column entry first; you can then type in the labels at the beginning. If
your data is symmetric, click the Asymmetric mode button before you enter any
data; this will automatically fill in the other half of your data. You need only enter
the non-zero values in the spreadsheet; once these have been filled in, click on
the button marked Fill and all empty cells will be given a value of zero. If you
accidentally stray outside the size of your required matrix then you need to delete
the extra rows and columns rather than filling them in with blanks. If your data
has more than one relation then add the extra matrices using the + button on the
right side of the toolbar (the - button can be used to delete relations). Individual matrices
within the network can be named using the rename sheet button situated just to
the right of the add and delete worksheet buttons.
The editor allows access to some 2D and 3D graphics facilities. To utilize the
graphics load a UCINET dataset into the editor. Block the data that you wish to
display. Click on edit>copy to move the data onto the clipboard and then click on
edit>paste to deposit the data into the spreadsheet graphics facility. Finally click
on the graph button on the tool bar towards the right hand side just left of the
Symmetric/Asymmetric Mode button. The graphic wizard will take you through
the creation of your picture or chart.
The UCINET spreadsheet is limited to 255 columns, so this method cannot be
used for larger datasets.
PARAMETERS N/A.
LOG FILE None.
TIMING Linear.
REFERENCES None.
DATA > RANDOM > MATRIX
PURPOSE Generate matrices where the cell values are drawn randomly from a variety of
possible distributions.
DESCRIPTION Generate a set of m´n matrices whose elements are random numbers drawn from
any of the following distributions - uniform, normal, binomial, Poisson, gamma
or exponential.
PARAMETERS
# of rows: (Default = 10).
The number of rows in the random matrix to be generated.
Choices are:
Uniform
Each cell value is taken from a [0,1] uniform distribution so that each cell value
is between 0 and 1. The mean is 0.5.
Normal
Each cell value is taken from a normal distribution.
Upon execution of the routine with this option a new window will appear with
the following parameters:
Binomial
Each cell is filled with the number of times an event with probability p occurs in
n trials.
Upon execution of the routine with this option a window will appear with the
following parameters:
Poisson
Each cell is filled with the number of times an event occurred in a unit interval of
time assuming a Poisson process.
Upon execution of the routine a window will appear with the following
parameter:
Gamma
Each cell is filled with the time taken for the kth occurrence of an event to occur
assuming the event follows a Poisson process with an average of one occurrence
per time period.
Upon execution of the routine a window will appear with the following
parameter:
Exponential
Each cell is filled with the time taken for the 1st occurrence of an event to occur
assuming the event follows a Poisson process with an average of one occurrence
per time period. The mean is 1.
Generator Seed:
A seed for random number generator. Use of the same number will create
exactly the same 'random' matrix twice. Any value from 1 to 32000 is
permissible. The default is randomly generated.
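As an illustration of how such matrices can be generated outside UCINET, the following Python sketch (a hypothetical helper, not part of UCINET) draws an m by n matrix from the distributions listed above using NumPy; the seed plays the same role as the Generator Seed parameter.

    import numpy as np

    def random_matrix(m, n, dist="uniform", seed=None, **params):
        """Draw an m x n matrix of random values from a named distribution."""
        rng = np.random.default_rng(seed)          # fixed seed -> reproducible "random" matrix
        if dist == "uniform":                      # values in [0, 1], mean 0.5
            return rng.uniform(0.0, 1.0, (m, n))
        if dist == "normal":                       # mean/std supplied as parameters
            return rng.normal(params.get("mean", 0.0), params.get("std", 1.0), (m, n))
        if dist == "binomial":                     # successes in n_trials with probability p
            return rng.binomial(params.get("n_trials", 10), params.get("p", 0.5), (m, n))
        if dist == "poisson":                      # events occurring in a unit interval
            return rng.poisson(params.get("lam", 1.0), (m, n))
        if dist == "gamma":                        # waiting time for the k-th event, rate 1
            return rng.gamma(params.get("k", 1.0), 1.0, (m, n))
        if dist == "exponential":                  # waiting time for the 1st event, mean 1
            return rng.exponential(1.0, (m, n))
        raise ValueError(f"unknown distribution: {dist}")

    # Example: a 10 x 10 uniform random matrix, reproducible with seed 12345
    X = random_matrix(10, 10, "uniform", seed=12345)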
LOG FILE Generated random matrix. The cells of the random matrix will be of the
following type:
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA >RANDOM > SOCIOMETRIC
PURPOSE A random digraph is created in which edges are generated with the constraint
that each vertex has a user specified out-degree.
PARAMETERS
where filename is the name of the data file. The command ROW or COLUMN
followed by the appropriate number specifies which row or column of the dataset
is to be used.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA > RANDOM > BERNOULLI
DESCRIPTION A random network is created in which edges are generated independently from a
Bernoulli distribution.
PARAMETERS
Number of nodes (Default = 10)
The size of the graph to be constructed.
Once an option has been selected the routine highlights parameters which are
dependent on the option selected.
MATRIX option:
ROW option:
CELL option:
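A minimal sketch of the underlying idea, assuming a single tie probability p applied to every off-diagonal cell (the MATRIX option); the ROW and CELL options would simply supply a probability per row or per cell. The helper name is illustrative only.

    import numpy as np

    def bernoulli_graph(n, p, seed=None):
        """Generate an n x n binary digraph; each off-diagonal tie is present with probability p."""
        rng = np.random.default_rng(seed)
        a = (rng.random((n, n)) < p).astype(int)   # independent Bernoulli draws
        np.fill_diagonal(a, 0)                     # no self-loops
        return a

    A = bernoulli_graph(10, 0.3, seed=1)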
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA > RANDOM > MULTINOMIAL
PURPOSE Generate random valued graphs in which the values are distributed by user
assigned probabilities.
DESCRIPTION The user specifies N, the total number of cases in the simulated "sample". The
algorithm randomly distributes the N cases into the cells of the adjacency
matrix. This distribution can either be uniform, in which case each cell has the
same probability of being assigned one of the cases, or the distribution can be
user specified. In this case the algorithm randomly assigns each case in
proportion to the cell probabilities. The probabilities can be specified by row,
column or individual cells. The result is a value for each directed arc in the
network.
PARAMETERS
Number of nodes (Default = 10)
Number of nodes in each valued adjacency matrix to be created.
Row*Column - two sets of probabilities are prescribed, one for the rows and
one for the columns. The probability for each cell is the product of the
probabilities prescribed for its row and column.
Once an option has been selected the routine highlights parameters which are
dependent on the option selected.
Row option
Row*Column option
Two datasets are provided: row probabilities as in the Row option and column
probabilities as in the Column option.
Cell option
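The following sketch illustrates the cell-probability case with NumPy's multinomial sampler; the Row, Column and Row*Column options would just build the cell probabilities differently (e.g. as the outer product of row and column probabilities) before the same draw. Names here are illustrative only.

    import numpy as np

    def multinomial_graph(n_nodes, n_cases, cell_probs=None, seed=None):
        """Distribute n_cases over the cells of an n x n adjacency matrix."""
        rng = np.random.default_rng(seed)
        if cell_probs is None:                         # uniform: every cell equally likely
            cell_probs = np.full((n_nodes, n_nodes), 1.0 / n_nodes**2)
        p = np.asarray(cell_probs, dtype=float).ravel()
        p = p / p.sum()                                # probabilities must sum to one
        counts = rng.multinomial(n_cases, p)           # randomly assign each case to a cell
        return counts.reshape(n_nodes, n_nodes)

    # Row*Column style probabilities: outer product of row and column weights
    row_p = np.array([0.5, 0.3, 0.2]); col_p = np.array([0.2, 0.2, 0.6])
    X = multinomial_graph(3, 100, np.outer(row_p, col_p), seed=7)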
LOG FILE The log file contains a display of each random matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA > IMPORT>DL
PURPOSE Convert text (ie ASCII) data files in DL format to UCINET format.
DESCRIPTION Imports ASCII files, that is plain text files, which are in DL format into UCINET.
These files can be created externally or using the UCINET text editor; more
information is contained in the user's guide or in the DL help.
PARAMETERS
Input dataset:
Name of DL type file containing data to be imported. Data type: ASCII or text.
Output dataset:
Name of UCINET data file, this will be set to the same name as the text file by
default.
TIMING O(N^2).
COMMENTS None
DATA > IMPORT > PAJEK
PURPOSE Convert Pajek data files into UCINET format.
DESCRIPTION Imports Pajek files for use by UCINET, both the network in the form of an
adjacency matrix and the co-ordinates of the nodes in the plot may be imported.
PARAMETERS
Input dataset:
Name of file containing data to be imported. Data type: ASCII file.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA > IMPORT > KRACKPLOT
PURPOSE Convert Krackplot data files into UCINET format.
DESCRIPTION Imports Krackplot files for use by UCINET both the network in the form of an
adjacency matrix and the co-ordinates of the nodes in the plot may be imported.
PARAMETERS
Input dataset:
Name of file containing data to be imported. Data type: ASCII file.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA > IMPORT>UCINET 3
PURPOSE Convert UCINET 3 data into UCINET for Windows format.
DESCRIPTION Imports UCINET 3 data into UCINET for Windows format; this format is the
same as UCINET IV.
PARAMETERS
Input dataset:
Name of UCINET 3 file to be imported.
Output dataset:
Name of UCINET data file, this will be set to the same name as the input file by
default.
TIMING O(N^2).
COMMENTS None
DATA > IMPORT>RAW
PURPOSE Convert a text file (that is, an ASCII file) containing a matrix into UCINET for
Windows format.
DESCRIPTION Imports a text file (that is, an ASCII file) containing a matrix into UCINET for
Windows format. The data file must be pure text with spaces, commas or carriage
returns between the values.
PARAMETERS
Input dataset:
Name of text file to be imported.
# of columns
The number of columns in the data matrix.
# of rows
The number of rows in the data matrix
Output dataset:
Name of UCINET data file, this will be set to the same name as the input file by
default.
TIMING O(N^2).
DATA > IMPORT>EXCEL
PURPOSE Convert EXCEL files (4.0 or 5.0/95) into UCINET format.
DESCRIPTION Imports simple EXCEL files (4.0 or 5.0/95) into UCINET format. Note that the
spreadsheet must have no extras such as shading or borders.
PARAMETERS
Input dataset:
Name of EXCEL type file containing data to be imported.
Output dataset:
Name of UCINET data file, this will be set to the same name as the input file by
default.
TIMING O(N^2).
COMMENTS This routine is very sensitive and many users find it easier to copy and paste from their
spreadsheet into the UCINET spreadsheet. The easiest way is to copy the data
only (ie not the labels) and paste it into the UCINET spreadsheet, first blocking the
same dimensions as you wish to import. To import the labels, save them separately
and use the label import feature in DESCRIBE.
DATA > IMPORT > NEGOPY
PURPOSE Convert text files formatted for the Negopy program into UCINET datasets.
DESCRIPTION Reads the .dat and .nam Negopy files and creates a UCINET dataset.
PARAMETERS
(2I3,1F5.1,1f3.1)
19 23 156.7 26.2
19 28 162.3 28.9
...
The first line is a Fortran format statement, required by Negopy but ignored by
UCINET. You can just put a blank line if you like. The second line indicates a
tie from person 19 to person 23, of strength 156.7 on the first relation, and of
strength 26.2 on the second relation.
(1I2,1X,1A30)
01 Billy-Bob
02 Johnny
...
TIMING O(N^2).
DATA > EXPORT > DL
DESCRIPTION Converts UCINET data files into DL format; for a full description of the DL
format see the DL help.
PARAMETERS
Input dataset:
Name of file containing data to be exported. Data type: Matrix.
Full matrix
A complete N×N matrix.
Lowerhalf
Gives the lower-triangle and should only be used for symmetric matrices.
Upper half
Gives the upper-triangle and should only be used for symmetric matrices.
Nodelist1
This is used on binary matrices only. Each line of data consists of a row number
(call it i) followed by a list of column numbers (call each one j) such that x(i,j) =
1.
Nodelist1B
This is used on binary matrices only. Each line of data corresponds to a matrix
row (call it i). The first number on the line is the number of non-zero cells in that
row. This is followed by a list of column numbers (call each one j) such that x(i,j)
= 1. Note that rows must appear in numerical order, and none may be skipped
(unlike the Nodelist1 format).
Nodelist2
Each line begins with a row id number followed by a list of column id numbers
that are connected to that row number. For use with 2-mode matrices.
Edgelist1
This format is used on data forming a matrix in which the rows and columns
refer to the same kinds of objects (e.g., an illness-by-illness proximity matrix, or
a person-by-person network). The 1-mode matrix X is built from pairs of indices
(a row and a column indicator). Pairs are typed one to a line, with indices
separated by spaces or commas. The presence of a pair i,j indicates that there is a
link from i to j, which is to say a non-zero value in x(i,j). Optionally, the pair may
be followed by a value representing an attribute of the link, such as its strength or
quality. If no value is present, it is assumed to be 1.0. If a pair is omitted
altogether, it is assigned a value of 0.0.
Edgelist2
This is used on data forming a matrix in which the rows and columns refer to
different kinds of objects (e.g., illnesses and treatments). The 2-mode matrix X is
built from pairs of indices (a row and a column indicator). Pairs are one to a line,
with indices separated by spaces. The presence of a pair i,j indicates that there is
a link from row i to column j, which is to say a non-zero value in x(i,j). If the pair
is followed by a value then this is the strength of the tie. If no value is present, it
is assumed to be 1.0. If a pair is omitted altogether, it is assigned a value of 0.0.
Output dataset:
Name of file to be created, with a .txt file extension.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EXPORT >KRACKPLOT
PURPOSE Convert UCINET data files into Krackplot format.
DESCRIPTION Converts UCINET data files including co-ordinate and attribute files into
Krackplot format.
PARAMETERS
(Input) Network dataset:
Name of file containing data to be exported. Data type: Matrix.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EXPORT >MAGE
PURPOSE Convert UCINET data files into Mage format.
DESCRIPTION Converts UCINET data files including co-ordinate files and attribute files into
Mage format for 3D visualization.
PARAMETERS
(Input) Network dataset:
Name of file containing network data to be exported. Data type: Digraph
Output File
Name of file to be created, normally the file extension should be .kin.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EXPORT > PAJEK > NETWORK
PURPOSE Convert UCINET graph or digraph files into Pajek format together with any
categorical attribute files.
DESCRIPTION Converts UCINET data files into Pajek format. The conversion can take valued
data and dichotomize it during the export, and can also export associated categorical
attribute files together with co-ordinate files. The conversion will also
automatically delete isolated vertices if required.
PARAMETERS
(Input) Network dataset:
Name of file containing network data to be exported. Data type: Valued digraph.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EXPORT > PAJEK > CATEGORICAL ATTRIBUTE
PURPOSE Convert UCINET categorical attribute files into a Pajek file.
DESCRIPTION Converts UCINET categorical attribute files into Pajek format, ie Pajek .clu files.
The conversion can take a matrix of attributes and create a set of Pajek .clu files,
one for each column of the matrix. These files can be used in Pajek to color the
nodes according to a particular attribute.
PARAMETERS
TIMING O(N)
COMMENTS None.
REFERENCES None.
DATA>EXPORT > PAJEK > QUANTITATIVE ATTRIBUTE
DESCRIPTION Converts UCINET quantitative attribute files into Pajek format, ie Pajek .vec files.
The conversion can take a matrix of attributes and create a set of Pajek .vec files,
one for each column of the matrix. These files can be used in Pajek to change the
sizes of the nodes according to a particular attribute.
TIMING O(N)
COMMENTS None.
REFERENCES None.
DATA>EXPORT > METIS
PURPOSE Convert UCINET network files into Metis files.
DESCRIPTION Converts UCINET data files (binary or valued, but symmetric only) into data
files for the Metis partitioning software.
PARAMETERS
Input dataset
Name of UCINET data file containing network. Data Type: Valued symmetric
graph
Type of Data
Choices are Binary or Valued.
Output Dataset
Name of Metis file to be created; note there is no prescribed file extension.
TIMING O(N)
COMMENTS None.
REFERENCES None.
DATA > EXPORT>RAW
PURPOSE Convert UCINET data files into raw format.
DESCRIPTION Converts UCINET data files into raw format; these are the same as the DL formats
but without the headers. For full information on the DL formats see the DL help.
PARAMETERS
Input dataset:
Name of file containing data to be exported. Data type: Matrix.
Full matrix
A complete N×N matrix.
Lowerhalf
Gives the lower-triangle and should only be used for symmetric matrices.
Upper half
Gives the upper-triangle and should only be used for symmetric matrices.
Nodelist1
This is used on binary matrices only. Each line of data consists of a row number
(call it i) followed by a list of column numbers (call each one j) such that x(i,j) =
1.
Nodelist1B
This is used on binary matrices only. Each line of data corresponds to a matrix
row (call it i). The first number on the line is the number of non-zero cells in that
row. This is followed by a list of column numbers (call each one j) such that x(i,j)
= 1. Note that rows must appear in numerical order, and none may be skipped
(unlike the Nodelist1 format).
Nodelist2
Each line begins with a row id number followed by a list of column id numbers
that are connected to that row number. For use with 2-mode matrices.
Edgelist1
This format is used on data forming a matrix in which the rows and columns
refer to the same kinds of objects (e.g., an illness-by-illness proximity matrix, or
a person-by-person network). The 1-mode matrix X is built from pairs of indices
(a row and a column indicator). Pairs are typed one to a line, with indices
separated by spaces or commas. The presence of a pair i,j indicates that there is a
link from i to j, which is to say a non-zero value in x(i,j). Optionally, the pair may
be followed by a value representing an attribute of the link, such as its strength or
quality. If no value is present, it is assumed to be 1.0. If a pair is omitted
altogether, it is assigned a value of 0.0.
Edgelist2
This is used on data forming a matrix in which the rows and columns refer to
different kinds of objects (e.g., illnesses and treatments). The 2-mode matrix X is
built from pairs of indices (a row and a column indicator). Pairs are one to a line,
with indices separated by spaces. The presence of a pair i,j indicates that there is
a link from row i to column j, which is to say a non-zero value in x(i,j). If the pair
is followed by a value then this is the strength of the tie. If no value is present, it
is assumed to be 1.0. If a pair is omitted altogether, it is assigned a value of 0.0.
Output dataset:
Name of file to be created, with a .txt file extension.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA > EXPORT > UCINET 3.0
PURPOSE Convert UCINET data files into Ucinet 3.0 format.
PARAMETERS
Input dataset:
Name of file containing data to be exported. Data type: Matrix.
Output format:
Choices are:
Binary
Non-Binary
Decimal places:
The number of decimal places to include.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EXPORT>EXCEL
PURPOSE Export a UCINET dataset to Excel format.
PARAMETERS
Input dataset:
Name of dataset to be converted. Data type: any UCINET file.
Excel 5 and 7
Excel 4
TIMING N/A
COMMENTS None
REFERENCES None
DATA >ATTRIBUTE
PURPOSE Create a network from attribute data.
DESCRIPTION Convert a vector of valued attributes to a matrix based upon either exact
matches, differences, absolute differences, squared differences, products or sums
of the values.
PARAMETERS
Dataset containing attribute vector:
Name of data file containing the vector of valued attributes. This vector must be a
row or column of a matrix; it can be the only row or column. Data type: Matrix
Exact Matches
Matrix X is formed by X(i,j) = 1 if vector(i) = vector(j) and 0 otherwise.
Difference
Matrix X is formed by X(i,j) = vector(i) - vector(j).
Absolute Difference
Matrix X is formed by X(i,j) = ABS (Vector(i) - vector(j)).
Squared Difference
Matrix X is formed by X(i,j) = (vector(i) - vector(j))^2.
Product
Matrix X is formed by X(i,j) = vector(i) * vector(j).
Sum
Matrix X is formed by X(i,j) = vector(i) + vector(j).
Output dataset:
Name of file which contains constructed matrix.
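A sketch of the constructions listed above, using NumPy broadcasting; the function name is hypothetical and not part of UCINET.

    import numpy as np

    def attribute_to_matrix(v, method="absdiff"):
        """Build an n x n matrix X from an attribute vector v, X[i,j] = f(v[i], v[j])."""
        v = np.asarray(v, dtype=float)
        vi, vj = v[:, None], v[None, :]                # broadcast to all (i, j) pairs
        ops = {
            "exact":   (vi == vj).astype(int),         # 1 if v[i] == v[j], else 0
            "diff":    vi - vj,
            "absdiff": np.abs(vi - vj),
            "sqdiff":  (vi - vj) ** 2,
            "product": vi * vj,
            "sum":     vi + vj,
        }
        return ops[method]

    X = attribute_to_matrix([3, 1, 4, 1], "exact")     # actors with the same value get a 1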
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA >AFFILIATIONS
PURPOSE Create a network from affiliation data.
PARAMETERS
Input dataset:
Name of file containing 2-mode dataset. Data type: Matrix
Row
Represents row by row matrix of overlaps, i.e. forms AA'
Column
Represents column by column matrix of overlaps, i.e. forms A'A.
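In matrix terms the two options are just the products AA' and A'A of the 2-mode incidence matrix A; a minimal NumPy sketch:

    import numpy as np

    A = np.array([[1, 0, 1],        # rows: actors, columns: events
                  [1, 1, 0],
                  [0, 1, 1]])

    row_overlap = A @ A.T           # actor-by-actor: number of events shared by each pair
    col_overlap = A.T @ A           # event-by-event: number of actors shared by each pair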
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA>CSS
PURPOSE Combines a number of different relations or cognitive "slices" of the same
network into a single pooled network. These may either be a number of views of
the whole network or the view of the whole network through all ego centered
networks.
DESCRIPTION The input is a set of k adjacency matrices, each of the form A(i,j) stacked into a
three-dimensional matrix, A(i,j,k). This form is useful for cognitive social
structures, where k refers to the perceiver of a relation from i to j. This routine
compresses this 3-D matrix into a two-dimensional matrix, A'(i,j) using one of
two methods. One is to compute the element-wise sum over the k matrices:
A'(i,j) = SUM over k of A(i,j,k). This matrix can be dichotomized around a
threshold to produce a "consensus" structure.
PARAMETERS
Input dataset:
Name of file containing any set of matrices representing the same network. Data
type: Valued graph. Multirelational.
Slice. Take an individual's view of the network. This simply extracts a single
matrix from the structure.
Row LAS. Construct a matrix which uses each respondent's row as a row in the
data matrix. The result is that each row of the data corresponds to that
respondent's perception of that row.
Median LAS. Construct a matrix with values A(i,j) which are the median of i's
value of the i,j connection and j's value of the connection.
Consensus. The consensus takes the sum of all the respondents' matrices and then
dichotomizes the sum.
If the user chooses either Slice or Consensus then the following parameters will
be highlighted.
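A sketch of the Consensus option under the description above: sum the k perceivers' slices element-wise and dichotomize around a threshold. The threshold and names here are illustrative.

    import numpy as np

    def css_consensus(slices, threshold):
        """slices: k x n x n array of perceived networks A(i,j,k); returns a binary consensus matrix."""
        slices = np.asarray(slices)
        total = slices.sum(axis=0)                 # A'(i,j) = sum over k of A(i,j,k)
        return (total >= threshold).astype(int)    # dichotomize around the threshold

    # e.g. require that at least 2 of 3 perceivers report the tie
    views = np.random.default_rng(0).integers(0, 2, (3, 5, 5))
    consensus = css_consensus(views, threshold=2)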
TIMING O(N^2)
COMMENTS None.
DATA>DISPLAY
PARAMETERS
Data Set Filename
Name of file to be displayed. Data type: Matrix.
TIMING Linear.
COMMENTS 'Width of Field' should be greater than the number of decimal places. If this is not the
case the data is still displayed but with no spaces between cells, causing the labels to
be incorrectly aligned.
REFERENCES None.
DATA>DESCRIBE
PURPOSE Gives a description of a UCINET dataset and allows the user to import, enter or
edit the labels.
DESCRIPTION Displays information contained in the UCINET header file; this includes the data
type, number of dimensions, size of matrix, title and labels. The labels can be
edited, entered or imported. To edit an existing label simply double click on the
label and perform the edit. The edits will only be kept if the file is saved using
the 'save as' button. To type in a new set of labels change the label flag from false
to true and double click in the label box. Proceed as for an edit, remembering to save
the file when you have finished. You can import labels saved in ASCII by
clicking on the import button and then entering the appropriate file name.
PARAMETERS None
TIMING Linear.
COMMENTS None.
REFERENCES None.
DATA>EXTRACT
PURPOSE To extract parts of a dataset from a UCINET dataset.
DESCRIPTION Extracts rows, columns or matrices from UCINET datasets by means of
specified lists.
PARAMETERS
Input dataset:
Name of file from which data is to be extracted. Data type: matrix.
LOG FILE Newly created dataset with labeled rows and columns.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
DATA>EGONET
PURPOSE Construct an ego centered network from the whole network
DESCRIPTION The neighborhood of an actor is the set of actors they are connected to, together
with the actors that are connected to them. An ego centered network is the
subgraph induced by the set of neighbors, that is, the network that consists of all
the neighbors and the connections between them. The idea of an ego network
can be extended to a group of actors, in which case the neighborhood is simply the
union of the neighborhoods of the group members. This procedure returns the
adjacency matrix of the ego network and provides an option to include or exclude
the ego(s) from the network.
PARAMETERS
Focal Nodes
The node or nodes on whom the neighborhood will be built. Nodes are specified
by a list. Each node is listed separated by a comma or space. The keywords TO,
FIRST and LAST are permissible. Hence FIRST 3, 5 TO 7, 10, 12 would give
nodes 1, 2, 3, 5, 6, 7, 10 and 12. Lists kept in a UCINET dataset can also be used.
Enter the filename followed by ROW (or COLUMN) and a number to specify
which row or column of the file to use. The list must be specified using a binary
vector where a 1 in position k indicates that vertex k is a member of the list and a
zero indicates that k is not a member.
TIMING O(N)
COMMENTS None
REFERENCES None
DATA > UNPACK
DESCRIPTION Unpacks some or all matrices from a UCINET multirelational dataset. This
routine is similar to extract for matrices except it places each extracted matrix as
a single UCINET dataset. Hence extracting n matrices results in n different
single datasets.
PARAMETERS
Input dataset:
Name of file from which data is to be unpacked. Data type: matrix
multirelational.
TIMING Linear
COMMENTS None.
REFERENCES None.
DATA>JOIN
PURPOSE Combine UCINET data files to form a single data file. Combines sets of single
matrices into a new matrix by merging all rows or all columns. Also combines
sets of single matrices or multi-relational matrices into one multi-relational
matrix.
DESCRIPTION Combines sets of single matrices, with equal columns, row wise into a larger
matrix. If A1, A2 ... AN are all matrices with R1, R2, ... RN rows respectively
and C columns then these are merged into the R1 + R2 +...+ RN by C matrix
(A1 A2 ... AN) transpose.
Also combines sets of single matrices, with equal rows, column wise into a larger
matrix. If A1, A2, ... AN are all matrices with R rows and C1, C2, ... CN
columns respectively then these are merged into the R by C1 + C2 + ... + CN matrix
(A1 A2 ... AN).
Certain UCINET routines permit the analysis of multiple relations on the same
set of actors. This routine can create a single data file which brings together all
the relevant networks or matrices and makes them suitable for analysis.
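A sketch of the three combining modes in NumPy terms (stacking by rows, by columns, or as additional relations); this is only an illustration of the arithmetic, not of UCINET's file handling.

    import numpy as np

    A1 = np.ones((2, 3)); A2 = np.zeros((4, 3))    # equal number of columns
    rows_joined = np.vstack([A1, A2])              # (R1 + R2) x C matrix

    B1 = np.ones((3, 2)); B2 = np.zeros((3, 5))    # equal number of rows
    cols_joined = np.hstack([B1, B2])              # R x (C1 + C2) matrix

    M1 = np.eye(3); M2 = 1 - np.eye(3)             # same dimensions
    relations_joined = np.stack([M1, M2])          # 2 x 3 x 3 multirelational stack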
PARAMETERS
Files selected:
Names of datasets each containing one or more matrices. The names should be
entered in the order required in the merged data set. To enter a file, highlight one
or more files in the Possible Files and click on the > button and they will be
moved across. Clicking on < moves the files back. All possible files can be
moved across by clicking on >> or <<. To select more than one file press Ctrl and
then click. The files will be placed in the order they are selected.
Rows
Matrices combine row-wise creating extra rows. Each matrix must be a single
relation with an equal number of columns.
Columns
Matrices combine column-wise creating extra columns. Each matrix must be a
single relation with an equal number of rows.
Matrices
Matrices appended as additional matrices or relations. Networks must all have
the same dimensions.
If Columns has been selected then the new columns are labeled in a similar way
to Row labels described above.
TIMING Linear.
COMMENTS None.
REFERENCES None.
DATA > PERMUTE
PURPOSE Re-order rows, columns or matrices in a dataset according to a user specified list.
DESCRIPTION Re-ordering of matrices can be by a list given at the keyboard or from a dataset.
PARAMETERS
Input dataset:
Name of dataset to be permuted. Data type: Matrix
A UCINET data file can be specified which contains the order. This must be of
the form
where file name is the name of the data file. The command ROW or COLUMN
followed by the appropriate number specifies which row or column of the dataset
is to be used. The keyword RANDOM is also allowed.
TIMING O(N^2).
COMMENTS There is a limitation of 255 characters on keyboard entered lists. Lists longer
than 255 characters must be specified in a UCINET dataset.
REFERENCES None.
DATA > SORT
PURPOSE Re-orders nodes in a network so that they correspond to the monotonic ordering
of a prescribed vector.
DESCRIPTION Arranges the nodes of a network so that they are in the same order as an external
vector.
The sort can be either ascending or descending. Hence if the ASCENDING
option is chosen and the external vector is (V1, V2, ... VN), the nodes would be
ordered so that node i would be before node j if and only if Vi ≤ Vj. The external
vector can be selected from the rows or columns of any UCINET data matrix.
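A minimal sketch of the reordering, assuming the criterion vector has one value per node; the same permutation is applied to both rows and columns so that the network itself is unchanged. The helper name is illustrative.

    import numpy as np

    def sort_network(adj, criterion, descending=False):
        """Reorder the nodes of an adjacency matrix by a criterion vector."""
        order = np.argsort(criterion)              # ascending permutation of node indices
        if descending:
            order = order[::-1]
        return adj[np.ix_(order, order)], order    # permute rows and columns together

    A = np.random.default_rng(1).integers(0, 2, (4, 4))
    degree = A.sum(axis=1)                         # e.g. sort by out-degree
    A_sorted, perm = sort_network(A, degree, descending=True)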
PARAMETERS
Input dataset
Name of dataset to be sorted. Data type: Matrix
Dimensions to be arranged:
Choices are:
Ascending
Gives a sort which corresponds to placing the elements of the prescribed vector
in the order from smallest to largest.
Descending
Gives a sort which corresponds to placing the elements of the prescribed vector
in the order from largest to smallest.
where <dataset> is the name of the dataset containing the criterion vector. The
command ROW or COLUMN followed by the appropriate number specifies
which row or column of the dataset is to be used.
Alternatively, a list of values may be entered, one for each row or column being
sorted. Each list entry is separated by a comma or a space. There must be as
many values as rows or columns being sorted.
To sort in ascending or descending order the dataset itself should be used as the
key.
TIMING O(N*LOG(N)).
COMMENTS Sorting to a user-prescribed keyboard list is provided by the routine PERMUTE.
REFERENCES None.
DATA >TRANSPOSE
PURPOSE Take the transpose of a matrix.
DESCRIPTION Interchanges the rows and columns of a matrix. Note that this corresponds to
taking the converse of a directed graph. That is, reversing the direction of every
arc.
PARAMETERS
Output dataset (Default = 'Transpose')
Name of file containing transposed data.
TIMING O(N^2).
COMMENTS More complicated transposes for three-dimensional matrices can be done using
TOOLS>MATRIX>ALGEBRA
REFERENCES None.
DATA >PARTITION TO SETS
PURPOSE Transforms a partition indicator vector into a group by actor incidence matrix and
displays the partition by groups.
DESCRIPTION A partition indicator vector has the form (k1,k2,...,ki...) where ki assigns vertex i
to group ki. So that (1 1 2 1 2) assigns vertices 1, 2 and 4 to block 1; and 3 and
5 to block 2. A group by vertex incidence matrix has vertices as its columns and
the groups as the rows. A 1 in row i column j indicates that actor j is a member
of group i; the values are zero otherwise.
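A sketch of the conversion described above; given (1 1 2 1 2) it produces a 2 by 5 incidence matrix with groups as rows and vertices as columns. The function name is illustrative.

    import numpy as np

    def partition_to_sets(partition):
        """Convert a partition indicator vector into a group-by-vertex incidence matrix."""
        partition = np.asarray(partition)
        groups = np.unique(partition)                        # the distinct group labels
        return (partition[None, :] == groups[:, None]).astype(int)

    M = partition_to_sets([1, 1, 2, 1, 2])
    # rows: groups 1 and 2; columns: vertices 1..5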
PARAMETERS
Input dataset:
Partition indicator vector. This can either be entered at the keyboard by
specifying the elements of the vector, each number separated by a comma or
space or as a UCINET dataset.
For partitions kept in a UCINET data file enter the filename followed by ROW
(or COLUMN) and a number to specify which row or column of the file to use.
Data type: Partition indicator vector.
LOG FILE A list of the groups. Each group is numbered and specified by the vertices it
contains.
TIMING O(N^2)
COMMENTS Partition indicator vectors entered using the keyboard are restricted to 255
characters. Longer vectors should be specified using a UCINET dataset.
REFERENCES None.
DATA >RESHAPE
PURPOSE Reorganize the data into a matrix or matrices of a different size.
DESCRIPTION This routine treats any input data as one long list. The list is formed row by row
and, if applicable, level by level. The new matrix is then filled up row by row
and then level by level from this list.
PARAMETERS
# of rows desired (Default = 0)
Number of rows in reshaped matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
DATA > CREATE NODE SETS
PURPOSE To create a group indicator vector based on comparing two vectors or a vector
and a number.
DESCRIPTION Given a vector of attributes or values for every actor and a threshold number,
this routine selects the actors which have a value less than (or
greater than) the threshold. More generally the threshold can itself be a vector, so
that actors are selected if they have a value less than (or greater than) the value
in the corresponding cell of the threshold vector. An example of using two
vectors would be the selection of actors whose closeness centrality is less than
their degree centrality.
PARAMETERS
Variable 1:
Name of the UCINET data file which contains the value or attribute vector. Enter
the filename followed by ROW (or COL) and a number to specify which row or
column of the file to use.
Relational Operator
Criterion by which to compare the actor values or attributes.
Choices are:
LT -Less than
LE -Less than or equal to
EQ -Equal to
NEQ -Not equal to
GE -Greater than or equal to
GT -Greater than
Variable 2
The threshold value or vector. If a single value is required then this can be typed
in directly. Vectors must be specified using a UCINET data file, enter the
filename followed by ROW (or COLUMN) and a number to specify which row
or column of the file to use.
TIMING Linear
COMMENTS The group indicator vector can be used in routines such as Extract
REFERENCES None.
TRANSFORM > BLOCK
PURPOSE Partition nodes in a data graph into blocks and calculate block densities, sums or
other statistics.
DESCRIPTION The adjacency matrix is partitioned into submatrices. The average, sum,
maximum, minimum, standard deviation, or sum of squares of each submatrix is
then calculated.
This routine is virtually identical to the Networks>Properties>Density routine,
except that it provides more options for aggregating cells within a matrix block.
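A sketch of the block aggregation: given a partition of the nodes, apply a summary statistic (mean, sum, max, ...) to each submatrix. The helper name is illustrative and a NaN marks an undefined block value.

    import numpy as np

    def block_statistics(adj, partition, stat=np.mean):
        """Aggregate each block of the adjacency matrix defined by a partition vector."""
        adj = np.asarray(adj, dtype=float)
        partition = np.asarray(partition)
        groups = np.unique(partition)
        image = np.full((len(groups), len(groups)), np.nan)  # NaN marks an undefined block
        for a, ga in enumerate(groups):
            for b, gb in enumerate(groups):
                block = adj[np.ix_(partition == ga, partition == gb)]
                if block.size:
                    image[a, b] = stat(block)                # e.g. block density when stat=np.mean
        return image

    A = np.random.default_rng(2).integers(0, 2, (6, 6))
    densities = block_statistics(A, [1, 1, 1, 2, 2, 2], stat=np.mean)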
PARAMETERS
Input dataset:
Name of file containing matrices to be blocked. Data type: Matrix.
LOG FILE List of block numbers together with their members. The pre-image matrix, ie the
permuted original data matrix. Blocked matrices. A blank in the matrix indicates
that a matrix value (such as the average) was undefined.
TIMING O(N^2)
COMMENTS Users who wish to produce a binary image matrix from the output of this routine
can obtain one by using Transform>Dichotomize.
REFERENCES None.
TRANSFORM>COLLAPSE
DESCRIPTION Combines rows, columns or both simultaneously to form a new smaller matrix.
The value of the combined cells can either be the average, the sum, the maximum
or the minimum of the set of cells which are to be collapsed.
PARAMETERS
Input dataset
Name of file containing matrix to be collapsed. Data type: Matrix.
Choices are:
Each new line must commence with one of these keywords. Each keyword is
followed by a list of the rows, columns or nodes which are to be collapsed. The
list has elements separated by spaces or commas; the keyword TO is
permissible. For example:
ROWS 1 3 4
COLS 2 TO 4
COLS 1, 6
LOG FILE A list of assignments of rows and columns to blocks. The blocks specify the new
row and column numbers for each of the old row and column numbers.
The collapsed matrix. Each row or column is labeled. Rows or columns that
have been collapsed are labeled by B followed by their block number. Rows or
columns which have not been collapsed retain the label R (for row) or C (for
column) followed by their row or column number.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
TRANSFORM>RECODE
DESCRIPTION The routine allows the user to change values or a range of values in a matrix to a
new value. Up to 5 values or ranges can be recoded.
PARAMETERS
Input dataset
Name of dataset to be recoded. Data type: Matrix.
If the values x, y and z are entered so that the completed line reads
then all values of the matrix in the range from x to y inclusive are changed to the
value z. To change a single value set both x and y to the value. Note that the
value na can be used for missing values.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
TRANSFORM>REVERSE
DESCRIPTION Subtract each value of the matrix from the sum of the maximum and minimum
entries.
PARAMETERS
Input dataset:
Name of file containing matrix to be reversed. Data type: Matrix.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
TRANSFORM>DICHOTOMIZE
DESCRIPTION Given a specified cut-off value, the valued matrix is made binary by
comparing each element with the cut-off value. Comparisons can be strictly
greater, greater than or equal, equal, less than or equal, or strictly less than.
PARAMETERS
Input dataset:
Name of matrix to be dichotomized. Data type: Matrix.
GT - Matrix values replaced by a 1 if they are strictly greater than the cut-off
value and 0 otherwise.
GE - Matrix values replaced by a 1 if they are greater than or equal to the cut-off
value and 0 otherwise.
EQ - Matrix values replaced by a 1 if they are equal to the cut-off value and 0
otherwise.
LE - Matrix values replaced by a 1 if they are less than or equal to the cut-off
value and 0 otherwise.
LT - Matrix values replaced by a 1 if they are strictly less than the cut-off value
and 0 otherwise.
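A one-line NumPy equivalent of each comparison, shown here as an illustrative helper:

    import numpy as np

    def dichotomize(x, cutoff, op="GT"):
        """Return a binary matrix comparing each cell of x with the cut-off value."""
        x = np.asarray(x, dtype=float)
        compare = {"GT": x > cutoff, "GE": x >= cutoff, "EQ": x == cutoff,
                   "LE": x <= cutoff, "LT": x < cutoff}
        return compare[op].astype(int)

    B = dichotomize([[0.2, 1.5], [3.0, 0.9]], cutoff=1.0, op="GE")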
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
TRANSFORM > DIAGONAL
DESCRIPTION Set the diagonal of a matrix to a new value. Save the diagonal of a matrix.
PARAMETERS
Input dataset
Name of file on which to perform the transformations. Data type: Square matrix.
TIMING O(N).
COMMENTS None.
REFERENCES None.
TRANSFORM>SYMMETRIZE
PURPOSE Change an unsymmetric matrix into a symmetric matrix by using one of a variety
of criteria.
DESCRIPTION Produces a symmetric square matrix by one of the following methods. Replace
xij and xji by their maximum, minimum, average, sum, absolute difference,
product or xij/xji (provided xji is non-zero), for i < j. Alternatively make the lower
triangle equal the upper triangle or the upper triangle equal the lower triangle.
The routine also produces a symmetric matrix with binary values in all off-
diagonal cells by replacing xij and xji with 1 if xij > xji, for i ≤ j. The > operation in
xij > xji can be replaced by ≥, =, <, or ≤.
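A sketch of the element-wise methods (maximum, minimum, average, sum) and the triangle-copying options under the description above; the helper name is illustrative.

    import numpy as np

    def symmetrize(x, method="maximum"):
        """Replace x[i,j] and x[j,i] by a symmetric combination of the two values."""
        x = np.asarray(x, dtype=float)
        ops = {
            "maximum": np.maximum(x, x.T),
            "minimum": np.minimum(x, x.T),
            "average": (x + x.T) / 2.0,
            "sum":     x + x.T,
            "lower":   np.tril(x) + np.tril(x, -1).T,   # copy lower triangle into upper
            "upper":   np.triu(x) + np.triu(x, 1).T,    # copy upper triangle into lower
        }
        return ops[method]

    S = symmetrize([[0, 3], [1, 0]], "maximum")          # both off-diagonal cells become 3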
PARAMETERS
Input dataset:
Name of file containing matrix to be symmetrized. Data type: Square matrix.
Handle missing
Specify how to treat missing data in the symmetrization process. Choosing the
non-missing value allows the user to reduce or even eliminate the number of
missing values in the data. Both missing means that if either value is missing
then the result is recorded as missing in the symmetrized data.
TIMING O(N^2).
COMMENTS None.
TRANSFORM>NORMALIZE
PURPOSE Normalize the values in a matrix.
DESCRIPTION Each technique can be applied to either the whole matrix or just the rows or
columns. In addition an iterative facility is provided to normalize both rows and
columns simultaneously. These operate on the matrix as follows:
Z-Score: standardizes the mean to be zero and the standard deviation to be one.
This is achieved by subtracting from every row, column or matrix element the
current mean and then dividing the rows, columns or matrix by the current
standard deviation.
The routine also allows each of these options to be applied to the rows and
columns simultaneously. This involves an iterative procedure in which the
technique is first applied to the rows and then the columns and then the rows etc.
It is terminated when (and if) there is convergence.
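A sketch of the Z-Score option applied by row, by column, or to the whole matrix; the iterative row-and-column variant would simply alternate the row and column versions until the values stop changing. Names are illustrative.

    import numpy as np

    def zscore(x, by="matrix"):
        """Standardize a matrix to mean 0 and standard deviation 1."""
        x = np.asarray(x, dtype=float)
        if by == "matrix":
            return (x - x.mean()) / x.std()
        axis = 1 if by == "row" else 0
        mean = x.mean(axis=axis, keepdims=True)
        std = x.std(axis=axis, keepdims=True)
        std[std == 0] = np.nan                      # zero-variance rows/columns treated as missing
        return (x - mean) / std

    Z = zscore(np.random.default_rng(3).random((4, 5)), by="row")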
PARAMETERS
Input dataset
Name of file containing matrix to be standardized. Data type: Matrix.
Z-Score - Forces the mean of the elements to be zero and the standard deviation
to be 1. By row, column, matrix or row and column. If standard deviation is
initially zero then elements of matrix are treated as missing.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
TRANSFORM > BIPARTITE
DESCRIPTION Any 2-mode incidence matrix can be thought of as a bipartite graph. If the two
modes are actors and events then the bipartite graph consists of the union of the
actors and events as vertices, with edges only connecting actors with events
(ie no connections between actors or between events). This routine takes a 2-
mode incidence matrix and converts it to a 1-mode adjacency matrix of a
bipartite graph. If the incidence matrix has n rows and m columns then the
resultant adjacency matrix will be a square matrix of dimension m+n.
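A sketch of the construction: the (n+m) by (n+m) adjacency matrix has the incidence matrix in its off-diagonal blocks and zeros elsewhere.

    import numpy as np

    def bipartite_adjacency(incidence):
        """Convert an n x m 2-mode incidence matrix into the adjacency matrix of a bipartite graph."""
        A = np.asarray(incidence)
        n, m = A.shape
        top = np.hstack([np.zeros((n, n), dtype=A.dtype), A])      # actors connect only to events
        bottom = np.hstack([A.T, np.zeros((m, m), dtype=A.dtype)]) # events connect only to actors
        return np.vstack([top, bottom])                            # (n+m) x (n+m) square matrix

    B = bipartite_adjacency(np.array([[1, 0], [1, 1], [0, 1]]))    # 3 actors, 2 events -> 5 x 5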
PARAMETERS
Input 2-mode dataset:
Name of file containing incidence matrix.
Output dataset:(Default='bi')
Name of file containing adjacency matrix of bipartite graph.
TIMING Linear.
COMMENTS None.
REFERENCES None.
TRANSFORM > INCIDENCE
PURPOSE Convert an adjacency matrix to an incidence matrix.
DESCRIPTION An incidence matrix is a node by edge matrix. The rows represent the nodes of a
graph and the columns the edges. A one in row i column j indicates that node i is
incident to edge j. This representation is often called the hypergraph
representation.
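A sketch of the conversion for an undirected graph, assuming each edge {i, j} becomes one column with ones in rows i and j; the helper name is illustrative.

    import numpy as np

    def adjacency_to_incidence(adj):
        """Convert a symmetric binary adjacency matrix into a node-by-edge incidence matrix."""
        adj = np.asarray(adj)
        n = adj.shape[0]
        edges = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i, j]]
        inc = np.zeros((n, len(edges)), dtype=int)
        for col, (i, j) in enumerate(edges):
            inc[i, col] = inc[j, col] = 1            # node i and node j are incident to this edge
        return inc

    I = adjacency_to_incidence(np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]]))  # 3 nodes, 2 edges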
PARAMETERS
Input dataset:
Name of file containing adjacency matrix. Data type: Digraph
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
TRANSFORM > LINEGRAPH
PURPOSE Construct the line graph of a graph or network.
DESCRIPTION The line graph of a graph G is the graph obtained by using the edges of G as
vertices, two vertices being adjacent whenever the corresponding edges are. In a
digraph, the arcs are the vertices and two vertices are adjacent if the
corresponding arcs induce a walk.
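A sketch for the undirected case: edges of G become vertices of the line graph, and two of them are adjacent when they share an endpoint. The helper name is illustrative.

    import numpy as np

    def line_graph(adj):
        """Build the line graph of an undirected graph given by a symmetric binary adjacency matrix."""
        adj = np.asarray(adj)
        n = adj.shape[0]
        edges = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i, j]]
        m = len(edges)
        L = np.zeros((m, m), dtype=int)
        for a in range(m):
            for b in range(a + 1, m):
                if set(edges[a]) & set(edges[b]):    # adjacent iff the edges share an endpoint
                    L[a, b] = L[b, a] = 1
        return L, edges                              # the edge list doubles as vertex labels

    L, labels = line_graph(np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))  # triangle -> triangle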
PARAMETERS
Input Dataset:
Name of file containing graph from which to create the line graph. Data type:
Digraph.
LOG FILE Adjacency matrix of the line graph, with vertices labeled by the corresponding
edges from the original graph.
TIMING O(N^2).
COMMENTS Note that multirelational data cannot be converted to line graph format. Users
should do each relation separately.
REFERENCES None.
TRANSFORM > MULTIGRAPH
PURPOSE Convert a valued graph into a set of binary graphs.
DESCRIPTION A single binary graph is created for each different value of a valued graph. All
created graphs are stacked in a single dataset.
PARAMETERS
Input dataset:
Name of file containing valued data. Data type: Valued graph.
where Mijk is the (i,j) entry of the kth adjacency matrix, xij is the (i,j) entry of the
input data, and wk are the ordered values of the weights of the valued data placed
in ascending order.
TIMING O(N^2).
COMMENTS The number of relations constructed will correspond to the number of different
values. Care should be taken not to enter datasets that will create a large number
of binary graphs.
REFERENCES None.
TRANSFORM>MULTIPLEX
PARAMETERS
Input dataset:
Name of file that contains multirelational binary network data. Valued data are
automatically converted to multirelational binary data using a technique
identical to Multigraph. Data type: Digraph. Multirelational.
TIMING Exponential.
COMMENTS In the worst case, the timing for the algorithm is exponential. The timing
depends on the number of possible bundles; up to 2 to the power N bundles can
occur when there are N different relations.
REFERENCES None.
TRANSFORM > SEMIGROUP
PURPOSE Construct the semigroup of a graph, digraph or multirelational graph.
DESCRIPTION This routine finds all members of the semigroup, or members of the semigroup
up to a certain length of product. In addition the semigroup is specified by a
multiplication table.
PARAMETERS
Input dataset:
Name of file containing adjacency matrix or matrices. Data type: Digraph.
Multirelational.
LOG FILE Each row (and column) is labeled with the compound relation number. The rows
also give the word that accounts for the compound. Hence if row 6 is labeled 1 1
2 1 then relation 6 is the matrix obtained by Boolean matrix multiplication of the
original relations numbered 1 1 2 1 in that order. The value in row i column j is
the result of the Boolean matrix multiplication of relation i and relation j.
If the word length is not sufficient to generate all elements of the semigroup then
the right multiplication table of the generated elements is displayed. This table
gives the product of the generated elements with the input matrices.
REFERENCES None.
TOOLS > MDS > METRIC
PARAMETERS
Input dataset
Name of file containing proximity matrix. Data type: Square symmetric matrix.
No of dimensions: (Default = 2)
Number of dimensions to use in representing items in Euclidean space.
LOG FILE The output first gives a 2D scatterplot of the first pair of co-ordinates. The x-axis
is the first co-ordinate set and the y-axis is the second. The scatterplot can be
saved or printed. Simple editing can be achieved using the options button. The
labels can be turned on or off and values can be attached to the points (or
removed). The scales can also be changed. More advanced editing is possible by
double clicking in the plot, which invokes the chart wizard. To find the label
attached to a single point when the labels have been removed, click on a single point
(this will highlight all the points), then click a second time to highlight one vertex.
Now double click on the vertex and the label will be highlighted in the chart
designer. The save button and the save chart data option allow the user to save
all the chart data into a file which can be reviewed using
Tools>Scatterplot>Review. The chart itself can be saved as a windows metafile
which can then be read into a word processing or graphics package. Only one
chart can be open at one time and the chart window will be closed if you click on
any other UCINET window. Behind the chart is a numeric display of coordinates
of each point in space together with information about the stress.
TIMING O(N^4)
COMMENTS MDS solutions are not unique, and they are subject to convergence to local
minima. The first point means that two or more maps can be equally good (same
stress) but place points in radically different locations. The second point means
that it is possible for the algorithm to fail to find the configuration with least
stress. If you suspect this has happened, run the program several times using
random starting configurations. Stress values below 0.1 are excellent and above
0.2 unacceptable.
This routine only works if the regional settings are set to UK or USA. If you do
not have these regional settings and do not get a plot then change them in the
settings control panel on your machine.
REFERENCES Gower
TOOLS > MDS > NON-METRIC
PARAMETERS
Input dataset
Name of file containing proximity matrix. Data type: Square symmetric matrix.
No of dimensions: (Default = 2)
Number of dimensions to use in representing items in Euclidean space.
LOG FILE The output first gives a 2D scatterplot of the first pair of co-ordinates. The x-axis
is the first co-ordinate set and the y-axis is the second. The scatterplot can be
saved or printed. Simple editing can be achieved using the options button. The
labels can be turned on or off and values can be attached to the points (or
removed). The scales can also be changed. More advanced editing is possible by
double clicking in the plot, which invokes the chart wizard. To find the label
attached to a single point when the labels have been removed, click on a single point
(this will highlight all the points), then click a second time to highlight one vertex.
Now double click on the vertex and the label will be highlighted in the chart
designer. The save button and the save chart data option allow the user to save
all the chart data into a file which can be reviewed using
Tools>Scatterplot>Review. The chart itself can be saved as a windows metafile
which can then be read into a word processing or graphics package. Only one
chart can be open at one time and the chart window will be closed if you click on
any other UCINET window. Behind the chart is a numeric display of coordinates
of each point in space together with information about the stress. If the print
diagnostics have been selected then dyads with large differences between the
proximity data and the distances in the co-ordinate date are listed.
TIMING O(N^4)
COMMENTS MDS solutions are not unique, and they are subject to convergence to local
minima. The first point means that two or more maps can be equally good (same
stress) but place points in radically different locations. The second point means
that it is possible for the algorithm to fail to find the configuration with least
stress. If you suspect this has happened, run the program several times using
random starting configurations. Stress values below 0.1 are excellent and above
0.2 unacceptable.
This routine only works if the regional settings are set to UK or USA. If you do
not have these regional settings and do not get a plot then change them in the
settings control panel on your machine.
REFERENCES Kruskal J B and Wish M (1978). Multidimensional Scaling, Newbury Park: Sage
Publications.
PARAMETERS
Input dataset
Name of file containing proximity matrix to be clustered. Data type: Square
symmetric matrix.
SINGLE_LINK
Also known as the "minimum" or "connectedness" method. Distance between
two clusters is defined as smallest dissimilarity (largest similarity) between
members.
COMPLETE_LINK
Also known as the "maximum" or "diameter" method. Distance between two
clusters is defined as largest dissimilarity (smallest similarity) between members.
AVERAGE
Distance between clusters defined as average dissimilarity (or similarity)
between members.
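For readers who want to reproduce the three methods outside UCINET, SciPy's agglomerative clustering offers the same single, complete and average linkage options. This is only an illustration on a small distance matrix, not UCINET's own implementation.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # A small symmetric distance matrix (zero diagonal)
    D = np.array([[0.0, 1.0, 4.0],
                  [1.0, 0.0, 3.0],
                  [4.0, 3.0, 0.0]])

    condensed = squareform(D)                        # SciPy expects the condensed upper triangle
    Z = linkage(condensed, method="single")          # or "complete" / "average"
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two clusters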
LOG FILE Primary output are cluster diagrams. The first diagram (either a tree diagram or a
dendrogram) re-orders the actors so that they are located close to other actors in
similar clusters. The level at which any pair of actors are aggregated is the point
at which both can be reached by tracing from the start to the actors from right to
left. The scale at the top gives the level at which they are clustered. The diagram
can be printed or saved. Parts of the diagram can be viewed by moving the
mouse to the split point in a tree diagram or the beginning of a line in the
dendrogram and clicking. The first click will highlight a portion of the diagram
and the second click will display just the highlighted portion. To return to the
original, right click the mouse. There is also a simple zoom facility: simply
change the values and then press enter. If the labels need to be edited
(particularly the scale labels) then you should take the partition indicator matrix
into the spreadsheet editor, remove or reduce the labels, and then submit the edited
data to Tools>Dendrogram>Draw. The output also produces a standard Log file
that contains a different cluster diagram which looks like this:
A B C D E F G H I J
1
Level 1 2 3 4 5 6 7 8 9 0
1.000 XXXXX XXX XXX XXXXX
1.422 XXXXX XXX XXXXXXXXX
1.578 XXXXXXXXX XXXXXXXXX
3.287 XXXXXXXXXXXXXXXXXXX
In this example, the data were distances among 10 items, labeled A through J.
The results are 4 nested partitions, corresponding to rows in the diagram. Within
a given row, an 'X' between two adjacent columns indicates that the items
associated with those columns were assigned to the same cluster in that partition.
For example, in the first partition (level 1.000), items D and E belong to the same
cluster, but C is a member of a different cluster. In the third partition (level
1.578), items D, E and C all belong to the same cluster.
For similarity data, the meaning of the levels for the single link and complete link
methods is, in a sense, reversed. For the single link method, a level of 1.578
means that every item in a cluster is at least 1.578 units similar to at least one
other item in the cluster. For the complete link method, a level of 1.578 means
that every item in a cluster is at least 1.578 units similar to every other item in the
cluster.
TIMING O(N^3)
COMMENTS None.
PURPOSE Optimizes a cost function which measures the total distance or similarity within
classes for a proximity matrix.
DESCRIPTION Given a partition of a proximity matrix of similarities into clusters, the average
similarity values within each cluster give a measure of the extent to which the
groups form clusters. A slightly different approach is required for distance data:
in this case the cost is measured by summing the values for each pair of actors
belonging to the same block. The routine attempts to optimize these measures to
try to find the best fit for a given number of blocks. The cost function can be
changed to give greater weight to relationships between the clusters. In this case
the cost simultaneously reflects a high degree of association within clusters and
a similarity of association between members of different clusters, using a
correlation criterion. To do this the data are correlated with an ideal structure
matrix A(i,j) in which the (i,j)th entry is one if actors i and j are in the same
partition and zero otherwise. This correlation can either be the Pearson
correlation or a much faster pseudocorrelation measure. The cost is then either
maximized or minimized depending on whether the proximity matrix contains
similarities or distances: the similarity value needs to be maximized and the
distance measure minimized. The routine uses a tabu search minimization
procedure and therefore, to maximize, it multiplies the costs by -1.
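As an illustration of the fit criteria described above (not part of UCINET), the following Python
sketch computes the density criterion and the ideal-structure correlation for a given partition of a
similarity matrix. The names proximity, partition and cluster_fit are hypothetical, and the
treatment of the diagonal is an assumption.

import numpy as np

def cluster_fit(proximity, partition):
    # Density and ideal-structure correlation for a partition (illustrative sketch).
    proximity = np.asarray(proximity, dtype=float)
    partition = np.asarray(partition)
    n = proximity.shape[0]
    # Ideal structure matrix A: A(i,j) = 1 if i and j are in the same cluster, else 0.
    ideal = (partition[:, None] == partition[None, :]).astype(float)
    off_diag = ~np.eye(n, dtype=bool)
    # Density criterion: average within-cluster proximity over off-diagonal cells.
    density = proximity[off_diag & (ideal == 1)].mean()
    # Correlation criterion: Pearson correlation between the data and the ideal structure.
    r = np.corrcoef(proximity[off_diag], ideal[off_diag])[0, 1]
    return density, r

For a similarity matrix the routine seeks a partition that makes these values large; for distance
data the within-block sum would be minimized instead.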
PARAMETERS
Input dataset:
Name of file containing proximity matrix to be clustered. Data type: Square
symmetric matrix.
Fit criterion
Density: the average value within clusters for similarity data and the sum for
distance data.
PseudoCorrelation: a simple, fast correlation measure between the clustered data
and the ideal structure matrix.
Correlation: the Pearson correlation measure between the clustered data and the
ideal structure matrix.
Type of Data:
Similarities causes large values to be clustered together. Distances causes small
values to be clustered together.
To test the robustness of the solution the algorithm should be run a number of
times from different starting configurations. If there is good agreement between
these results then this is a sign that there is a clear split of the data into the
reported blocks.
REFERENCES Glover F (1989). Tabu Search - Part I. ORSA Journal on Computing 1, 190-
206.
Glover F (1990). Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.
TOOLS > 2 MODE > SVD
DESCRIPTION Given an n-by-m matrix X with n ≥ m, SVD finds matrices U, D, and V such that
X = UDV'. The matrix D is an r-by-r diagonal matrix containing r singular
values. The matrix U is an n-by-r matrix containing the r eigenvectors of XX'
and V is an m-by-r matrix containing the r eigenvectors of X'X. The
eigenvectors are sorted in descending order by eigenvalue. With symmetric data,
U and V are identical (except for sign reversals).
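As an illustration only (not the UCINET code), the decomposition described above can be
reproduced with numpy as follows; the exact scaling conventions used for the Coordinates and
Loadings options below may differ in detail from those of the routine.

import numpy as np

X = np.random.rand(8, 3)               # example n-by-m matrix with n >= m
U, s, Vt = np.linalg.svd(X, full_matrices=False)
D = np.diag(s)                         # r-by-r diagonal matrix of singular values
V = Vt.T
assert np.allclose(X, U @ D @ V.T)     # X = UDV'
# One possible rescaling of the row scores: weight the eigenvectors by the singular values.
coordinates = U * s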
PARAMETERS
Input dataset:
File containing matrix X to be decomposed; must have at least as many rows as
columns (otherwise transpose the matrix then resubmit). Data type: Matrix.
Coordinates -
Eigenvectors are weighted by their respective eigenvalues.
Loadings
Eigenvectors are weighted by the square root of the eigenvalues (yields factor
loadings when SVD is applied to correlation matrix).
Axes
No rescaling is performed.
LOG FILE The output first gives a 2D scatterplot of the first two dimensions (eigenvectors).
The scatterplot can be saved or printed. Simple editing can be achieved using the
options button. The labels can be turned on or off and values can be attached to
the points (or removed). The scales can also be changed. More advanced editing
is possible by double clicking in the plot, which invokes the chart wizard. To find
the label attached to a single point when all the labels are moved, click on a single
point; this will highlight all the points. Then click a second time to highlight one
vertex. Now double click on the vertex and the label will be highlighted in the
chart designer. The save button and the save chart data option allow the user to
save all the chart data into a file which can be reviewed using
Tools>Scatterplot>Review. The chart itself can be saved as a windows metafile
which can then be read into a word processing or graphics package. Only one
chart can be open at one time and the chart window will be closed if you click on
any other UCINET window.
TIMING O(N^3).
COMMENTS This routine only gives a plot if the regional settings are set to UK or USA. If
you do not have these regional settings and do not get a plot then change them in
the settings control panel on your machine.
PARAMETERS
Input dataset.
Name of dataset containing 2-mode matrix to be factored. Data type: Matrix.
Principal Components
Perform a principal component analysis in which the matrix is factored into a
product of the most dominant eigenvectors.
Minimum Residuals
Factor the matrix into factors so that the residuals (the sum of squares of the
difference between the original data and the product of the factors) are
minimized.
None
No rotation is performed
LOG FILE The log file gives a full set of descriptive statistics of each actor's profile. These
are followed by the eigenvalues placed in descending order of size and labeled as
factors in ascending order. The value of each is expressed as a percentage of the
sum and a cumulative percentage of all the factors given so far is presented. The
final column gives the ratio of the factor below to the current factor. This is
followed by a matrix of factor loadings, in which entry X(i,j) is the loading of the
jth factor on actor i.
TIMING O(N^3)
COMMENTS None
REFERENCES None
TOOLS > 2 MODE > CORRESPONDENCE
DESCRIPTION Given a non-negative, n-by-m matrix with n ≥ m, this routine represents the n
rows and m columns as vectors in a common multidimensional space. The
algorithm essentially performs a singular value decomposition of an adjusted
data matrix in which rows and columns have been separately normalized to yield
more equal marginals.
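For orientation, a compact Python sketch of the adjustment and decomposition described above
is given below. It follows the standard correspondence-analysis formulas and may differ in detail
from the UCINET routine; the function name correspondence is hypothetical.

import numpy as np

def correspondence(X):
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                            # correspondence matrix
    r = P.sum(axis=1)                          # row masses
    c = P.sum(axis=0)                          # column masses
    # Standardized residuals: (P - rc') scaled by the square roots of the marginals.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    row_scores = U / np.sqrt(r)[:, None] * d   # principal coordinates for the rows
    col_scores = Vt.T / np.sqrt(c)[:, None] * d
    return row_scores, col_scores, d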
PARAMETERS
Input dataset:
Name of file containing matrix to be analyzed, it must have at least as many rows
as columns (otherwise transpose the matrix then resubmit). Data type: Matrix.
Coordinates - Scores for each point on each dimension adjusted both for point
marginals and dimension weights (eigenvalues).
Optimal - Scores for each point are corrected for point marginals, but not
dimension weights.
LOG FILE The output first gives a 2D scatterplot of the first two dimensions (eigenvectors).
The scatterplot can be saved or printed. Simple editing can be achieved using the
options button. The labels can be turned on or off and values can be attached to
the points (or removed). The scales can also be changed. More advanced editing
is possible by double clicking in the plot, which invokes the chart wizard. To find
the label attached to a single point when all the labels are moved, click on a single
point; this will highlight all the points. Then click a second time to highlight one
vertex. Now double click on the vertex and the label will be highlighted in the
chart designer. The save button and the save chart data option allow the user to
save all the chart data into a file which can be reviewed using
Tools>Scatterplot>Review. The chart itself can be saved as a windows metafile
which can then be read into a word processing or graphics package. Only one
chart can be open at one time and the chart window will be closed if you click on
any other UCINET window.
The log file has a numeric display of coordinates (eigenvectors) of each point in
r-space.
TIMING O(N^3).
REFERENCES None.
TOOLS > SIMILARITIES
PURPOSE Compute similarities among rows or columns of a matrix using one of various
measures.
DESCRIPTION Given a matrix with n rows and m columns, the program computes either an n-
by-n matrix of similarities among the rows, or an m-by-m matrix of similarities
among the columns.
PARAMETERS
Input dataset:
Name of file containing matrix to be analyzed. Data type: Matrix.
TIMING O(N^3).
REFERENCES None.
TOOLS > DISSIMILARITIES
PURPOSE Compute dissimilarities among rows or columns of a matrix using one of various
measures.
DESCRIPTION Given a matrix with n rows and m columns, the program computes either an n-
by-n matrix of dissimilarities among the rows, or an m-by-m matrix of
dissimilarities among the columns.
PARAMETERS
Input dataset:
Name of file containing matrix to be analyzed. Data type: Matrix.
Euclidean
Euclidean distance: SQRT(Σ(xi-yi)^2). When missing values are present, the
computed distance is multiplied by n/m, where n is the size of the vectors and m
is the number of non-missing values.
Manhattan
City-block distance: Σ abs(xi-yi). When missing values are present, the computed
distance is multiplied by n/m, where n is the size of the vectors and m is the
number of non-missing values.
Normed SSD
Normed sum of squared differences: Σ(xi-yi)^2 / (Σxi^2 Σyi^2)
Non-Matches
Proportion of cases in which xi does not equal yi, for all i.
Positive Non-Matches
Proportion of cases in which xi does not equal yi, given that either xi > 0 or yi > 0
or both.
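The missing-value correction described above can be sketched as follows (illustrative Python
only, assuming missing cells are coded as NaN; the UCINET routine's missing-value codes may
differ).

import numpy as np

def euclidean(x, y):
    # Euclidean distance with the n/m correction for missing values.
    x, y = np.asarray(x, float), np.asarray(y, float)
    ok = ~(np.isnan(x) | np.isnan(y))          # non-missing positions
    d = np.sqrt(np.sum((x[ok] - y[ok]) ** 2))
    return d * len(x) / ok.sum()               # multiply by n/m

def manhattan(x, y):
    # City-block distance with the same n/m correction.
    x, y = np.asarray(x, float), np.asarray(y, float)
    ok = ~(np.isnan(x) | np.isnan(y))
    return np.sum(np.abs(x[ok] - y[ok])) * len(x) / ok.sum()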
TIMING O(N^3).
REFERENCES None.
TOOLS > STATISTICS > UNIVARIATE
PARAMETERS
Input dataset
Name of file containing matrix to be analyzed. Data type: Matrix.
Rows - Statistics are computed separately for each row in matrix. Result is a
matrix whose rows correspond to the rows of the data matrix and the columns are
statistics.
Columns - Statistics are computed separately for each column in matrix. Result
is a matrix whose columns correspond to the columns of the data matrix and the
rows are statistics.
TIMING O(N^2).
REFERENCES None.
TOOLS > STATISTICS > MATRIX (QAP) > QAP-CORRELATION
PURPOSE Compute correlation between entries of two square matrices, and assess the
frequency of random correlations as large as actually observed.
DESCRIPTION The procedure is principally used to test the association between networks.
Often, one network is an observed network while the other is a model or
expected network.
The algorithm proceeds in two steps. In the first step, it computes Pearson's
correlation coefficient (as well as simple matching coefficient) between
corresponding cells of the two data matrices. In the second step, it randomly
permutes rows and columns (synchronously) of one matrix (the observed matrix,
if the distinction is relevant) and recomputes the correlation.
The second step is carried out hundreds of times in order to compute the
proportion of times that a random correlation is larger than or equal to the
observed correlation calculated in step 1. A low proportion (< 0.05) suggests a
strong relationship between the matrices that is unlikely to have occurred by
chance.
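The permutation logic can be outlined in a few lines of Python (an illustration only; the number
of permutations, the matching coefficient and tie handling are simplified relative to the routine,
and the function name qap_correlation is hypothetical).

import numpy as np

def qap_correlation(A, B, n_perm=1000, seed=0):
    # Correlate off-diagonal cells, then permute rows and columns of A synchronously.
    rng = np.random.default_rng(seed)
    A, B = np.asarray(A, float), np.asarray(B, float)
    n = A.shape[0]
    off = ~np.eye(n, dtype=bool)
    obs = np.corrcoef(A[off], B[off])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        Ap = A[np.ix_(p, p)]                   # same permutation applied to rows and columns
        if np.corrcoef(Ap[off], B[off])[0, 1] >= obs:
            count += 1
    return obs, count / n_perm                 # observed r and the proportion as large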
PARAMETERS
Data Matrix:
Name of dataset containing the first matrix (the observed or dependent matrix, if
such distinctions are meaningful). Data type: Square Matrix.
Structure Matrix:
Name of dataset containing the expected, modelled or independent matrix (if
such distinctions are meaningful). Data type: Square Matrix.
LOG FILE The output includes a table which looks like this:
CORRELATION MATCHES
Observed value: 0.207 0.000
Average: 0.001 0.000
Standard deviation: 0.113 0.000
Proportion as large: 0.036 1.000
Proportion as small: 0.964 1.000
Proportion as extreme: 0.036 1.000
The correlation column indicates that the observed correlation between the two
networks was 0.207. The average random correlation was almost zero with a
standard error of 0.113. The proportion of random correlations that were as large
as 0.207 was 0.036 (3.6%). At a typical 0.05 level, this correlation would be considered
significant since 0.036 < 0.05.
REFERENCES None.
TOOLS > STATISTICS > MATRIX (QAP) > QAP-REGRESSION
PURPOSE Regress a dependent matrix on one or more independent matrices, and assess
significance of the r-square and regression coefficients.
DESCRIPTION The procedure is principally used to model a social relation (matrix) using values
of other relations.
The algorithm proceeds in two steps. In the first step, it performs a standard
multiple regression across corresponding cells of the dependent and independent
matrices.
In the second step, it randomly permutes rows and columns (together) of the
dependent matrix and recomputes the regression, storing resultant values of r-
square and all coefficients. This step is repeated hundreds of times in order to
estimate standard errors for the statistics of interest. For each coefficient, the
program counts the proportion of random permutations that yielded a coefficient
as extreme as the one computed in step 1. The primary requirement for
conducting a multiple regression quadratic assignment procedure is that all the
variables in the regression have to be one-mode, two-way matrices. That is, they
must all be NxN networks. Person-by-object or Person-by-event matrices can be
converted to NxN matrices using Data>Affiliations.
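As a simplified outline of the algorithm described above (illustrative Python only, not the
UCINET implementation; the function name mrqap and the treatment of the diagonal are
assumptions), the sketch below regresses the off-diagonal cells of the dependent matrix on those
of the independent matrices and permutes the dependent matrix to obtain p-values.

import numpy as np

def mrqap(Y, Xs, n_perm=1000, seed=0):
    # Regress off-diagonal cells of Y on those of each matrix in Xs; permute Y for p-values.
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, float)
    n = Y.shape[0]
    off = ~np.eye(n, dtype=bool)
    X = np.column_stack([np.ones(off.sum())] + [np.asarray(M, float)[off] for M in Xs])

    def coefs(y):
        return np.linalg.lstsq(X, y, rcond=None)[0]

    obs = coefs(Y[off])
    extreme = np.zeros_like(obs)
    for _ in range(n_perm):
        p = rng.permutation(n)
        extreme += np.abs(coefs(Y[np.ix_(p, p)][off])) >= np.abs(obs)
    return obs, extreme / n_perm               # coefficients and two-tailed p-values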
PARAMETERS
Dependent variable:
Name of dataset containing the observed or dependent data: the matrix whose
values are to be predicted. Data type: Square Matrix.
Independent variables:
Names of datasets containing the independent or predictor matrices. To include
more than one dataset using the browse button highlight all required files by
pressing Ctrl and clicking with the mouse. If the file names are typed they should
be separated by commas with no spaces. Data type: Square Matrices.
LOG FILE Two tables are output. The first looks like this:
Unstandardized Two-Tailed
Independent Coefficient Probability
Intercept 0.385965 0.178
R1 -0.007519 0.866
R2 -0.150376 0.170
R3 0.000000 0.838
This table gives the Unstandardized regression coefficient for each independent
variable, including the intercept, along with the proportion of random trials
yielding a coefficient with an absolute value as large or larger than the observed.
In this example, all the coefficients have non-significant probabilities, indicating
that the observed values are well within the range of random variation.
TIMING O(N^2).
REFERENCES None.
TOOLS > STATISTICS > AUTOCORRELATION > CATEGORICAL > JOIN
COUNT
PARAMETERS
Input Dataset
Name of file containing matrix to be analyzed. Data type: Graph
Partition Vector:
The name of an UCINET dataset that contains a partition of the actors into two
groups. To partition the data matrix into groups specify a vector by giving the
dataset name, a dimension (either row or column) and an integer value. For
example, to use the second row of a dataset called ATTRIB, enter "ATTRIB
ROW 2". The program will then read the second row of ATTRIB and use that
information to define the groups. All actors with identical values on the criterion
vector (i.e. the second row of attrib) will be placed in the same group.
LOG FILE The actor attributes are recoded to 1 and 2, and these recoded values are reported.
This is followed by a table which gives the observed and expected counts for the
data. The first row gives the counts within group 1, the second the counts
between the groups, and the third the counts within group 2. The expected
column gives the values that would be expected if the ones were randomly
distributed within and between the groups; the observed column gives the counts
in the data; and the difference column subtracts the expected from the observed.
The P>=Diff and P<=Diff columns give the relative frequency with which a
randomly permuted matrix gets a difference as large or larger, and as small or
smaller, than the observed. These columns are used to test the significance of the
observed data.
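For concreteness, the observed counts and the permutation comparison can be sketched as
follows (illustrative Python; counts here are over ordered pairs, and the routine's exact counting
and reporting conventions may differ).

import numpy as np

def join_counts(adj, groups):
    # Ties within group 1, between the groups, and within group 2 (groups coded 1 and 2).
    adj, g = np.asarray(adj), np.asarray(groups)
    w1 = adj[np.ix_(g == 1, g == 1)].sum()
    w2 = adj[np.ix_(g == 2, g == 2)].sum()
    between = adj[np.ix_(g == 1, g == 2)].sum() + adj[np.ix_(g == 2, g == 1)].sum()
    return np.array([w1, between, w2])

def join_count_test(adj, groups, n_perm=1000, seed=0):
    # Relative frequency with which permuted data give counts as large as the observed.
    rng = np.random.default_rng(seed)
    obs = join_counts(adj, groups)
    as_large = np.zeros(3)
    for _ in range(n_perm):
        as_large += join_counts(adj, rng.permutation(groups)) >= obs
    return obs, as_large / n_perm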
TIMING O(N^2)
COMMENTS None
REFERENCES Cliff A D and Ord J K (1973). Spatial Autocorrelation. London: Pion.
TOOLS > STATISTICS > AUTOCORRELATION > CATEGORICAL > RCT
ANALYSIS
PARAMETERS
Input Dataset
Name of file containing matrix to be analyzed. Data type: Graph
Attribute:
The name of an UCINET dataset that contains a partition of the actors into two
groups. To partition the data matrix into groups specify a vector by giving the
dataset name, a dimension (either row or column) and an integer value. For
example, to use the second row of a dataset called ATTRIB, enter "ATTRIB
ROW 2". The program will then read the second row of ATTRIB and use that
information to define the groups. All actors with identical values on the criterion
vector (i.e. the second row of attrib) will be placed in the same group.
LOG FILE The actor attributes are recoded to run from 1 and these are reported.
A table which gives the cross classified frequencies, that is a contingency table
corresponding to the attributes and the input dataset.
A table which gives the expected values of the frequencies assuming that the ties
are independent and randomly distributed throughout the groups.
The observed values in each cell of the first table divided by the corresponding
cell in the second table are then reported. This is followed by the observed chi-
square value, i.e. the sum over all cells of the squared difference between the
observed and expected values divided by the expected value.
The average permutation frequency table gives the mean values of the entries
from all the permutation tests. Each generated entry has its value compared with
the observed value, and the significance is the relative frequency with which the
generated value is larger than the observed.
TIMING O(N^2)
COMMENTS None
PARAMETERS
Network or Proximity Matrix
Name of file containing matrix to be analyzed. Data type: Matrix.
Actor Attribute:
Name of file containing actor attributes, given as a vector of shared attributes so
that (1,2,3,1,2,2) means that actors 1 and 4 share the same attribute, actors 2, 5
and 6 share the same attribute, and actor 3 has a different attribute from all the
others.
Structural Blockmodel. Most general model. Just asks whether the different
classes have significantly different interaction patterns. For example, girls might
prefer girls (inbreeding), while boys also prefer girls (outbreeding).
TIMING O(N^2)
COMMENTS None
REFERENCES None
TOOLS>STATISTICS>AUTOCORRELATION>INTERVAL/RATIO
PARAMETERS
Network or Proximity Matrix
Name of file containing matrix to be analyzed. Data type: Matrix.
Actor Attribute(s)
Name of file containing actor attributes.
LOG FILE The value of the autocorrelation, followed by the autocorrelation averaged over
all the permutations together with the standard error. The proportion of random
values which are as large (for Geary) or as small (for Moran) as the actual
autocorrelation gives the significance of the calculated value, and this is reported.
TIMING O(N^2)
COMMENTS None
REFERENCES Cliff A D and Ord J K (1973). Spatial Autocorrelation. London: Pion.
TOOLS > STATISTICS > VECTOR > REGRESSION
PURPOSE Regress a dependent vector on one or more independent vectors, and assess
significance of the r-square and regression coefficients.
DESCRIPTION The procedure is principally used to model a vector using values of other vectors.
The algorithm proceeds in two steps. In the first step, it performs a standard
multiple regression across corresponding cells of the dependent and independent
vectors.
In the second step, it randomly permutes the elements of the dependent
vector and recomputes the regression, storing resultant values of r-square and all
coefficients. This step is repeated hundreds of times in order to estimate standard
errors for the statistics of interest. For each coefficient, the program counts the
proportion of random permutations that yielded a coefficient as extreme as the
one computed in step 1.
PARAMETERS
Dependent dataset:
Name of dataset containing the observed or dependent data: the vector whose
values are to be predicted. This is given as a column in a matrix. Data type:
Matrix.
Independent dataset:
Name of the dataset containing the independent vectors. All independent vectors
must be contained in a single matrix. Data type: Matrix.
LOG FILE The correlation matrix followed by information on the model fit. This is followed
by a table of regression coefficients. This table gives the Unstandardized and
standardized regression coefficients for each independent variable, including the
intercept, along with the proportion of random trials yielding a coefficient i) as
large or larger, ii) as small or smaller and iii) as extreme as the observed value.
These values give the significance of the coefficients.
TIMING O(N^2).
REFERENCES None.
TOOLS > STATISTICS > VECTOR > ANOVA
DESCRIPTION Undertakes a standard analysis of variance but uses a permutation test to generate
the significance level so that standard assumptions on independence and random
sampling are not required.
PARAMETERS
LOG FILE A standard analysis of variance table together with the significance value derived
from the permutation test.
TIMING N/A
COMMENTS None
REFERENCES None
TOOLS > STATISTICS > VECTOR > T TEST
DESCRIPTION Undertakes a standard t-test to compare the means of two groups but uses a
permutation test to generate the significance level so that standard assumptions
on independence and random sampling are not required.
PARAMETERS
LOG FILE Gives standard statistics on each group followed by significance tests. The
difference in means is reported together with the two one tailed tests assessing
whether one mean is greater than the other and the two tailed test.
TIMING N/A
COMMENTS None
REFERENCES None
TOOLS > STATISTICS > COMPARE DENSITIES > PAIRED
PURPOSE Give a statistical test for the comparison of the densities of two networks in
which the actors are paired.
DESCRIPTION This routine uses a bootstrap technique to compare the densities of two not
necessarily independent networks with the same actors. This method is
analogous to the classical paired sample t-test for estimating the standard error
of the difference. Its main use would be in comparing the same relation on the
same set of actors at two different time points.
2nd Network
Name of UCINET dataset containing the same actors (in the same order) as the
1st dataset. Data type: Valued graph.
Number of Samples
Gives the number of times sampling with replacement is used to construct the
distribution.
LOG FILE The output gives the density of both matrices together with the difference and
the number of samples taken. This is followed by a classical t-test. The
estimated bootstrap standard errors are then reported together with the bootstrap
standard error of the differences, the bootstrap 95% confidence intervals and the
bootstrap t-statistic assuming independent samples. The bootstrap standard error,
confidence interval, t-statistic and average value are then reported for the paired
samples. Finally, the proportions of sampled differences that are as extreme as (in
absolute value), as large as, and as small as the observed value are given.
TIMING
COMMENTS
REFERENCES Snijders T A B and Borgatti S P (1999). Non-Parametric Standard Errors and
Tests for Network Statistics. Connections 22(2), 1-11.
TOOLS > STATISTICS > COMPARE DENSITIES > THEORETICAL PARAMETER
PURPOSE Give a statistical test for the comparison of the density of a network to a
theoretical value.
DESCRIPTION This routine uses a bootstrap technique to compare the density of a network to a
specified value. In essence a distribution is built up by sampling the network
with replacement from the vertices. There is an assumption that vertices are
interchangeable.
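The vertex bootstrap described above can be sketched as follows (illustrative Python; the
treatment of cells created by resampled duplicate vertices follows a simple convention and may
not match the routine exactly, and expected_density in the usage note is a hypothetical value
supplied by the user).

import numpy as np

def bootstrap_densities(adj, n_samples=1000, seed=0):
    # Resample the vertices with replacement and recompute the density each time.
    rng = np.random.default_rng(seed)
    adj = np.asarray(adj, float)
    n = adj.shape[0]
    off = ~np.eye(n, dtype=bool)
    out = np.empty(n_samples)
    for b in range(n_samples):
        idx = rng.integers(0, n, size=n)       # vertices drawn with replacement
        out[b] = adj[np.ix_(idx, idx)][off].mean()
    return out

# Example comparison to a theoretical value:
# density = adj[~np.eye(len(adj), dtype=bool)].mean()
# boot = bootstrap_densities(adj)
# z = (density - expected_density) / boot.std(ddof=1)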
Expected Density
Value of the theoretical parameter to which the observed value will be compared.
Number of Samples
Gives the number of times sampling with replacement is used to construct the
distribution.
LOG FILE The output gives the parameter value and the density of the matrix together with
the difference and the number of samples taken. This is followed by the actual
variance and the classical estimate of the standard error. The number of samples
in the bootstrap are then reported together with the estimated bootstrap standard
error, z-score and average density. Finally, the proportions of sampled differences
that are as extreme as (in absolute value), as large as, and as small as the observed
value are given.
TIMING
COMMENTS
REFERENCES Snijders T A B and Borgatti S P (1999). Non-Parametric Standard Errors and
Tests for Network Statistics. Connections 22(2), 1-11.
TOOLS>STATISTICS>COMPARE AGGREGATE PROXIMITY
MATRICES>PARTITION
Partition Vector
The name of a UCINET dataset. To partition the matrices of the dataset into
groups, specify a blocking vector by giving the dataset name, a
dimension and an integer value. For example, to use the second row of a dataset
called ATTRIB, enter "ATTRIB ROW 2". The program will then read the second
row of ATTRIB and use that information to sort the matrices. All matrices with
identical values on the criterion vector (i.e. the second row of attrib) will be
placed in the same group. There should only be two groups and so the vector
should only contain two different values. The partition can also be typed in
directly so that 1 1 2 1 2 2 2 places matrices 1,2 and 4 in one group and matrices
3,5,6 and 7 in the other group.
TIMING O(N^2)
COMMENTS None
REFERENCES Borgatti, S.P. () A Statistical Method for Comparing Aggregate Data Across a
Priori Groups
TOOLS>STATISTICS>COMPARE AGGREGATE PROXIMITY
MATRICES>OVERLAPPING GROUPS
TIMING O(N^2)
COMMENTS None
REFERENCES Borgatti, S.P. () A Statistical Method for Comparing Aggregate Data Across a
Priori Groups
TOOLS > STATISTICS > P1
PURPOSE Fits the Holland and Leinhardt P1 model for binary networks.
DESCRIPTION All dyads (i,j) in a sociometric choice matrix X can be classified as mutual (xij =
xji = 1), asymmetric (xij not equal to xji), or null (xij = xji = 0). The probabilities of
each type of dyad are modelled as a function of three sets of substantive
parameters: expansiveness of each actor (αi), popularity of each actor (βi), and
reciprocity (ρ), together with an overall density parameter θ and a normalizing
quantity λij. The probabilities of mutual, asymmetric and null dyads, denoted mij,
aij, and nij respectively, are modelled as follows:
mij = λij exp(ρ + 2θ + αi + αj + βi + βj)
aij = λij exp(θ + αi + βj)
nij = λij
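To relate the formulas to data, the following Python sketch (illustrative only; it is not the fitting
procedure) computes the observed dyad census - the counts of mutual, asymmetric and null
dyads - from a binary adjacency matrix.

import numpy as np

def dyad_census(X):
    # Classify every unordered dyad (i, j), i < j, as mutual, asymmetric or null.
    X = np.asarray(X)
    iu = np.triu_indices(X.shape[0], k=1)
    both = X[iu] + X.T[iu]                 # 2 = mutual, 1 = asymmetric, 0 = null
    return int((both == 2).sum()), int((both == 1).sum()), int((both == 0).sum())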
PARAMETERS
Input Dataset:
Name of file that contains network to be analyzed. Data type: Valued graph.
LOG FILE G-squared negative goodness-of-fit value with degrees of freedom. Probabilities
are not printed because the theoretical distribution governing these values has not
yet been established.
Values of θ and ρ.
An nxn matrix containing the P1 expected value between each pair of actors.
An nxn matrix of residuals (observed data minus expected) between each pair of
actors.
TIMING O(N^4).
COMMENTS The model would be more useful if the distribution of G-squared were known: as
it is, we cannot say for certain when the model fits and when it does not.
DESCRIPTION Input and output are UCINET datasets. Capabilities are divided into functions
and procedures, which have different syntax. Further, within functions we can
distinguish three basic types:
Unary Operations. Those that operate on a single dataset and take no arguments
(e.g. ABS, which takes the absolute value of every cell in the matrix);
When you choose Algebra from the menu, then a command window will open
up. You can close the window by clicking on the close button. Commands are
typed in the command window; you can scroll back to previous commands by
using the up and down arrows.
1. Functions
y = inverse(x)
a:tdavis = transpose(c:\ucinet\data\davis)
Most functions will have a single argument consisting of the name of an input
matrix. Others will have two or more arguments, again consisting of the names
of datasets. For instance, the syntax for the ADD command is as follows:
<matrix> = add(<matrix1>,<matrix2>,...)
An example would be:
mpx = add(business,marriage,friend)
junk = identity(5)
2. Procedures
The syntax for procedures differs from functions in that there is no output matrix:
<procedure><arguments>
An example is:
display padgett
svd davis = u d v
This requests a singular value decomposition of the matrix davis into three
matrices (datasets) to be called u, d, and v.
3. Expressions
One useful fact to remember is that whenever the syntax for a function or
procedure calls for the name of a matrix, a function may be substituted instead.
For example, the command
y = inverse(transpose(inf))
requests that the inverse of the transpose of a matrix inf be calculated and saved
as dataset y. There is no limit to the amount of nesting. For example, the
following command is perfectly valid, though neither efficient nor very readable:
b = prod(inv(prod(transp(x),x)),prod(transp(x),y))
The same result can be obtained, more readably, in separate steps:
xt = transp(x)
xtx = prod(xt,x)
xty = prod(xt,y)
b = prod(inv(xtx),xty)
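For comparison only, the same least-squares calculation written with numpy (this simply shows
what the nested ALGEBRA expression computes; it is not part of UCINET):

import numpy as np

x = np.random.rand(20, 3)                  # example design matrix
y = np.random.rand(20, 1)                  # example response
b = np.linalg.inv(x.T @ x) @ (x.T @ y)     # prod(inv(prod(transp(x),x)),prod(transp(x),y))
# In practice np.linalg.lstsq(x, y) is the numerically preferred way to obtain b.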
FURTHER INFORMATION
Unary Functions
Binary Functions
Inner Products
Procedures
TOOLS > SCATTERPLOT> DRAW
PURPOSE Plots one matrix column against another in the (x,y) plane.
DESCRIPTION Plots two specified columns of a matrix against each other. The x co-ordinates
(horizontal axis) are the elements of the first column and the y co-ordinates
(vertical axis) are the corresponding elements of the second column. Points can
be labeled using ASCII characters.
PARAMETERS
Input dataset:
Name of file containing matrix with data to be plotted. Data type: Matrix.
LOG FILE A scatter plot with the tick marks on the axes. Each point on the scatter plot is
marked by its row number in the column vectors or by a label from the label file.
If two points have the same coordinates then the label corresponding to the
highest row number is used. The scatterplot can be saved or printed. Simple
editing can be achieved using the options button. The labels can be turned on or
off and values can be attached to the points (or removed). The scales can also be
changed. More advanced editing is possible by double clicking in the plot, which
invokes the chart wizard. To find the label attached to a single point when all the
labels are moved, click on a single point; this will highlight all the points. Then
click a second time to highlight one vertex. Now double click on the vertex and
the label will be
highlighted in the chart designer. The save button and the save chart data option
allow the user to save all the chart data into a file which can be reviewed using
Tools>Scatterplot>Review. The chart itself can be saved as a windows metafile
which can then be read into a word processing or graphics package. Only one
chart can be open at one time and the chart window will be closed if you click on
any other UCINET window.
TIMING Linear
COMMENTS This routine only works if the regional settings are set to UK or USA. If you do
not have these regional settings and do not get a plot then change them in the
settings control panel on your machine.
REFERENCES None.
TOOLS > SCATTERPLOT > REVIEW
DESCRIPTION Scatter plots can be saved as files and reviewed directly using this routine. They
are saved with the extension sdf.
PARAMETERS
Input dataset:
Name of scatterplot file to be displayed.
TIMING N/A
COMMENTS None
REFERENCES None
TOOLS > DENDROGRAM/TREE DIAGRAM > DRAW
PURPOSE Generates a dendrogram or tree diagram from hierarchically nested partition data.
DESCRIPTION This routine allows for the creation of the hierarchical cluster diagrams from a
UCINET generated partition matrix. It is also possible to generate the diagrams
from user defined partition matrices.
PARAMETERS
Input dataset
Name of file containing a partition indicator matrix. A partition indicator matrix
has rows which correspond to different partitions and columns which represent
members of the groups. A value of k in row i and column j means that actor j is
in group k for the partition corresponding to row i. All other actors in the same
group should be assigned the same value in row i. Each successive row must
specify an increasingly finer (or coarser) partition. The row labels (if specified)
correspond to the levels of the partition.
LOG FILE A hierarchical clustering diagram either a tree diagram or a dendrogram. The plot
re-orders the actors so that they are located close to other actors in similar
clusters. The level at which any pair of actors are aggregated is the point at which
both can be reached by tracing from the start to the actors from right to left. The
scale at the top gives the level at which they are clustered. The diagram can be
printed or saved. Parts of the diagram can be viewed by moving the mouse to the
split point in a tree diagram or the beginning of a line in the dendrogram and
clicking. The first click will highlight a portion of the diagram and the second
click will display just the highlighted portion. To return to the original, right click
on the mouse. There is also a simple zoom facility: simply change the values and
then press enter. If the labels need to be edited (particularly the scale labels) then
you should take the partition indicator matrix into the spreadsheet editor, remove
or reduce the labels, and then submit the edited data.
TIMING Linear
COMMENTS None
REFERENCES None.
TOOLS > DENDROGRAM/TREE DIAGRAM > REVIEW
DESCRIPTION Dendrograms and tree diagrams can be saved as bitmap files and reviewed
directly using this routine. They are saved with the extension bmp.
PARAMETERS
Input bitmap filename:
Name of file to be displayed.
TIMING N/A
COMMENTS None
REFERENCES None
UNARY OPERATIONS
ABSOLUTE - Syntax: abs(<mat>).
Takes the absolute value of every value in <mat>. May be abbreviated to "ABS". Example:
junk = abs(a:\atlanta\corrmat)
ARCTAN - Syntax: arc(<mat>). Takes the arctangent of each value in <mat>. Example:
junk = arc(a:\atlanta\corrmat)
COMMON LOG - Syntax: log10(<mat>). Takes the base 10 logarithm of each value of the
argument. Example:
junk = log10(a:\atlanta\corrmat)
COSINE - Syntax: cos(<mat>). Takes the cosine of each value in <mat>. Example:
junk = cos(a:\atlanta\corrmat)
EXPONENT - Syntax: exp(<mat>). Raises e (the base of natural logarithms) to the power
given by each cell of the argument. Example:
junk = exp(a:\atlanta\corrmat)
FILL - Syntax: fill(<mat>,<nr>,<nc>). Expands the matrix in <mat> to the dimensions given
by <nr> and <nc> by duplicating values. For example, given matrix X, the command
1 2 3
X= 4 5 6
7 8 9
y = fill(x,5,6)
yields:
1 2 3 1 2 3
4 5 6 4 5 6
Y= 7 8 9 7 8 9
1 2 3 1 2 3
4 5 6 4 5 6
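For readers more familiar with numpy, the FILL operation corresponds to tiling the matrix and
trimming it to the requested shape (an illustrative equivalent only, not the UCINET code):

import numpy as np

def fill(mat, nr, nc):
    # Expand mat to nr-by-nc by repeating its values, as in the FILL example above.
    mat = np.asarray(mat)
    reps = (-(-nr // mat.shape[0]), -(-nc // mat.shape[1]))   # ceiling division
    return np.tile(mat, reps)[:nr, :nc]

# fill([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 5, 6) reproduces the 5-by-6 matrix Y shown above.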
GENERALIZED INVERSE - Syntax: ginv(<mat>). Computes the generalized inverse of <mat>.
Example:
junk = ginv(a:\atlanta\corrmat)
IDENTITY - Syntax: id(<n>). Generates an identity matrix with <n> rows and columns.
Example:
i = id(100)
INVERSE - Syntax: inv(<mat>). Computes the inverse of the square matrix <mat>. Example:
junk = inv(a:\atlanta\corrmat)
LINEAR - Syntax: lin(<mat>,<real1>,<real2>). Given a dataset containing a matrix, the function
performs a linear transformation on every cell value. If a cell value was x then the function forms
real1*x + real2. If real2 is omitted then it is assumed to be zero. Example:
junk = lin(a:\atlanta\corrmat,3.2,4)
creates a new matrix junk which has each cell transformed by multiplying by 3.2 and adding 4.
MATRIX - Syntax: mat(<value>,<nr>,<nc>). Creates an <nr>-by-<nc> matrix in which every cell
contains <value>. Example:
junk = add(freqs,mat(0.01,8,10))
adds the constant 0.01 to every cell of the 8-by-10 matrix contained in freqs.
NATURAL LOG - Syntax: log(<mat>) or ln(<mat>). Takes the natural logarithm of each value
of the argument. Examples:
junk = log(a:\atlanta\corrmat)
junk = ln(a:\atlanta\corrmat)
NEGATIVE - Syntax: neg(<mat>). Multiplies each value of the argument by -1. Example:
revcorr = neg(a:\atlanta\corrmat)
RECIPROCAL - Syntax: rec(<mat>). Takes the reciprocal of each value in <mat>. Example:
junk = rec(a:\atlanta\corrmat)
ROUND - Syntax: round(<mat>) or rnd(<mat>). Rounds each value of <mat> to the nearest
integer. Example:
junk = rnd(a:\atlanta\corrmat)
SINE - Syntax: sin(<mat>). Takes the sine of each value in <mat>. Example:
junk = sin(a:\atlanta\corrmat)
SQUARE - Syntax: sqr(<mat>). Squares each value in <mat>. Example:
junk = sqr(a:\atlanta\corrmat)
SQUARE ROOT - Syntax: sqrt(<mat>). Computes square root of each value in <mat>.
Example:
junk = sqrt(a:\atlanta\corrmat)
TRUNCATE - Syntax: trunc(<mat>) or trnc(<mat>). Rounds each value of <mat> down to the
largest whole number contained by the value. Example:
junk = trunc(a:\atlanta\corrmat)
FURTHER INFORMATION
Binary Operations
Unary Operations
Procedures
Matrix Algebra
BINARY OPERATIONS
AVERAGE - Syntax: avg(<mat1>,<mat2>,...). Takes the average value of corresponding cells
across two or more matrices. Example:
c = avg(a,b)
junk = bprod(business,marriage)
DIVIDE - Syntax: div(<mat1>,<mat2>). Divides each value of <mat1> by the corresponding
value of <mat2>. Example:
junk = div(c:\atlanta\corrmat,mcorr)
EQUAL - Syntax: eq(<mat1>,<mat2>,...). Compares two or more matrices and puts a value of
1 where all matrices have the same value and a 0 where any are different. For example, typing
junk = eq(a,b)
gives a new binary matrix called junk which has 1s in those cells where a and b have the same
value, and has 0s elsewhere.
GREATER THAN - Syntax: gt(<mat1>,<mat2>,...). Compares two or more matrices, creating a
new matrix which is 1 for all cells where the first matrix is strictly greater than all subsequent
matrices, and 0 elsewhere.
c = gt(a,b)
In the example, the matrix c will have 1s only in those cells where a dominates b.
GREATER THAN OR EQUAL - Syntax: ge(<mat1>,<mat2>,...). Compares two or more matrices,
creating a new matrix which is 1 for all cells where the first matrix is greater than or equal to all
subsequent matrices, and 0 elsewhere.
c = ge(a,b)
In the example, the matrix c will have 1s only in those cells where a is not dominated by b.
LESS THAN - Syntax: lt(<mat1>,<mat2>,...). Compares two or more matrices, creating a new
matrix which is 1 for all cells where the first matrix is strictly less than all subsequent matrices,
and 0 elsewhere.
c = lt(a,b)
In the example, the matrix c will have 1s only in those cells where a is dominated by b.
LESS THAN OR EQUAL - Syntax: le(<mat1>,<mat2>,...). Compares two or more matrices,
creating a new matrix which is 1 for all cells where the first matrix is less than or equal to all
subsequent matrices, and 0 elsewhere.
c = le(a,b)
In the example, the matrix c will have 1s only in those cells where a is smaller than or equal to
the value of b.
MAXIMUM - Syntax: max(<mat1>,<mat2>,...). Takes the maximum value of corresponding cells
across two or more matrices. Example:
c = max(a,b)
MINIMUM - Syntax: min(<mat1>,<mat2>,...). Takes the minimum value of corresponding cells
across two or more matrices. Example:
c = min(a,b)
c = mul(a,b)
buskin = prod(business,marriage)
SQUARED DIFFERENCE - Syntax: sqrdif(<mat1>,<mat2>). Computes the squared difference
between corresponding cells of the two matrices. Example:
c = sqrdif(a,b)
One application of this function is to compare a data matrix with a predicted matrix, based on a
least squares criterion.
SUBTRACT - Syntax: sub(<mat1>,<mat2>,...). Subtracts corresponding cells of the second (and
subsequent) matrices from the first. Example:
c = sub(a,b)
FURTHER INFORMATION
Unary Operations
Inner Products
Procedures
Matrix Algebra
INNER PRODUCTS
The last example totals all matrices contained in the newcomb dataset to get a single matrix. In
other words, it takes a 3-dimensional table (rows, columns and matrices) and aggregates across
matrices to obtain a table with just rows and columns.
tdavis = transp(davis)
cent2 = transp(cent cols levs)
FURTHER INFORMATION
Unary Operations
Binary operations
Procedures
Matrix Algebra
PROCEDURES
In this section we document each ALGEBRA procedure individually, giving the syntax and a
brief description for each one. The syntax gives the minimum abbreviation and any alternate
spellings. The procedures are arranged in alphabetical order by concept.
CD - Syntax: cd <path>. Changes the current default folder to <path>. Examples:
cd\ucinet\data
cd a:
DISPLAY - Syntax: disp <mat> or dsp <mat>. Displays all cells of <mat> to the screen.
dsp c:\ucinet\data\padgett
dsp ginv(transp(davis))
LET - Syntax: let <function call>. Technically, the LET command is always implicit before any
function statement. For example, the following two commands are identical:
xtx = prod(transp(x),x)
let xtx = prod(transp(x),x)
The only reason to use LET is if your output dataset has the same name as an ALGEBRA
procedure, which would confuse the interpreter. For example, the following command would
NOT create a dataset called "DSP":
dsp = inverse(xtx)
Instead, the interpreter would assume that you wanted to display a matrix called "= inverse(xtx)".
However, the following would work:
let dsp = inverse(xtx)
QUIT - Syntax: quit or exit. Leave ALGEBRA and close the matrix algebra windows. Usage:
exit
quit
SINGULAR VALUE DECOMPOSITION - Syntax: svd <mat> = <umat> <dmat> <vtmat>. Computes
the singular value decomposition of <mat>, placing the results in the three named output datasets.
Example:
svd davis = u d vt
The <umat> and <vtmat> matrices are often referred to as "row scores" and "column scores"
respectively. The <dmat> matrix contains singular values down the main diagonal and zeros
elsewhere.
The singular value decomposition of a square, symmetric matrix gives row and column scores
equal to the eigenvectors of the matrix, and the singular values are their eigenvalues. The SVD
of any matrix X gives row scores equal to the eigenvectors of XX' and column scores equal to
the eigenvectors of X'X. The singular values of X are the square roots of the eigenvalues of both
XX' and X'X.
FURTHER INFORMATION
Unary Operations
Binary Operations
Inner Products
Matrix Algebra
NETWORK > COHESION > DISTANCE
PURPOSE Constructs a distance or generalized distance matrix between all nodes of a
graph. Allows for transformation of this matrix from distance to nearness.
DESCRIPTION The length of a path is the number of edges it contains. The distance between
two nodes is the length of the shortest path. The generalized distance is the
length of an optimum path.
The strength of a path is the strength of its weakest link. The optimum is the
strongest path.
The probability of a path is the product of the probabilities of its edges. The
optimum is the most probable path.
If there is more than one optimum path then the algorithm uses the shortest
optimum path. For a binary adjacency matrix distance and generalized distance
will be equivalent.
PARAMETERS
Input dataset
Name of file containing dataset to be analyzed. Data type: Valued graph.
Multiplicative - distances between nodes are divided into the largest possible
distance. New values are given by Yij = (N-1)/Dij.
Additive - distances between nodes are subtracted from the total number of
nodes. New values are given by Yij = N - Dij.
Linear - distances between nodes are transformed linearly into [0,1]. New
values are given by Yij = 1 - (Dij - 1)/(N-1).
Freq Decay - Uses Burt's 1976 frequency decay function. The nearness of i and
j is one minus the proportion of actors that are as close to i as j is.
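The first three nearness transformations listed above are simple element-wise functions of the
distance matrix, as the following illustrative Python sketch shows (the Freq Decay option and the
treatment of unreachable pairs and of the diagonal, where Dij = 0, are omitted).

import numpy as np

def nearness(D, method="linear"):
    # Convert an n-by-n geodesic distance matrix D into a nearness matrix.
    D = np.asarray(D, float)
    n = D.shape[0]
    if method == "multiplicative":
        return (n - 1) / D                     # Yij = (N-1)/Dij
    if method == "additive":
        return n - D                           # Yij = N - Dij
    if method == "linear":
        return 1 - (D - 1) / (n - 1)           # Yij = 1 - (Dij-1)/(N-1)
    raise ValueError("unknown method")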
TIMING O(N^3)
COMMENTS Note the distances correspond to the number of links and not the optimum
values. Optimum values are calculated by Network>Cohesion>Reachability.
DESCRIPTION A geodesic is a shortest path. There may be more than one shortest path
connecting any two vertices. This procedure gives the number of shortest paths
connecting all pairs of vertices.
PARAMETERS
Input dataset:
Name of file containing network data. Data type: Digraph.
LOG FILE An nxn matrix in which row i column j gives the number of geodesics connecting
i to j.
TIMING O(N^4).
COMMENTS None.
REFERENCES None.
NETWORK > COHESION > REACHABILITY
PURPOSE Constructs a matrix of reachability values for every pair of nodes.
DESCRIPTION The reachability for a pair of nodes is the value of an optimum path.
The probability of the most 'probable' path, where the probability of a path is the
product of the probabilities of its edges.
PARAMETERS
Input dataset
Name of file containing dataset to be analyzed. Data type: Valued graph.
TIMING O(NLOGN)
DESCRIPTION In a valued or binary network the value of each edge (1 or 0 for binary networks)
can represent a capacity. Let c(x) denote the capacity of each edge of a network
N. A flow in N between two nodes s and t is a function f such that 0 ≤ f(x) ≤
c(x) for every edge x, and for every node z ≠ s, t, Σf(yz) = Σf(zw); that is, for
each node except s and t, the total amount of flow into the node equals the total
flow leaving the node.
The total flow leaving s is the same as that going into t; this value is called the
value of the flow. The maximum flow is simply the maximum value possible
between two vertices.
This procedure uses the algorithm due to Gomory and Hu to compute the
maximum flow between all pairs of vertices of a symmetric graph.
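For orientation, the same all-pairs computation can be reproduced with the networkx library,
which provides a Gomory-Hu tree routine (a sketch assuming edge values are stored in a
'capacity' attribute; it is not the UCINET implementation).

import itertools
import networkx as nx

def all_pairs_max_flow(G):
    # Pairwise maximum flow values of an undirected capacitated graph via its Gomory-Hu tree.
    T = nx.gomory_hu_tree(G, capacity="capacity")
    flows = {}
    for s, t in itertools.combinations(G.nodes, 2):
        path = nx.shortest_path(T, s, t, weight="weight")
        # The maximum s-t flow equals the smallest tree-edge weight on the s-t path in T.
        flows[(s, t)] = min(T[u][v]["weight"] for u, v in zip(path, path[1:]))
    return flows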
PARAMETERS
Input dataset
Name of file containing network to be analyzed. Data type: Valued graph -
symmetric matrix only with integer values.
LOG FILE The Input dataset followed by an nxn matrix in which row i column j gives the
value of the maximum flow from vertex i to vertex j (i ≠ j).
TIMING O(N^4).
COMMENTS The maximum flow in a network is equal to the minimum cut. A cut between
two vertices s and t is a collection of edges which contains an edge from every s-
t path. The value of a cut is the sum of the value of the edges. A minimum cut is
the minimum value of all possible cuts between two vertices. For a binary
network this value is called the local edge connectivity.
DESCRIPTION The local (point) connectivity of two non-adjacent vertices is the number of
vertices that need to be deleted so that no path connects them; this is equal to the
maximum number of vertex disjoint paths connecting them.
PARAMETERS
Input dataset
Name of file containing network to be analyzed. Data type: Digraph
LOG FILE An nxn matrix in which row i column j gives the local point connectivity from
vertex i to vertex j (i ≠ j). This value is precisely the maximum number of
vertex independent paths from i to j.
TIMING O(N^4).
COMMENTS None
REFERENCES None
NETWORK > REGIONS > COMPONENTS>SIMPLE GRAPHS
PURPOSE Identify the components of an undirected graph, and the weak or strong
components of a directed graph.
DESCRIPTION In an undirected graph two vertices are members of the same component if there
is a path connecting them. In a directed graph two vertices are in the same weak
component if there is a semi-path connecting them. Two vertices x and y are in
the same strong component if there is a path connecting x to y and a path
connecting y to x.
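For reference, the three notions of component can be computed with the networkx library as
follows (an illustrative sketch; reading the network from a UCINET dataset is not shown).

import networkx as nx

def components(edges, directed=False):
    # Components of an undirected graph, or weak and strong components of a directed graph.
    if not directed:
        return list(nx.connected_components(nx.Graph(edges)))
    D = nx.DiGraph(edges)
    weak = list(nx.weakly_connected_components(D))      # joined by semi-paths
    strong = list(nx.strongly_connected_components(D))  # mutually reachable sets
    return weak, strong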
PARAMETERS
Input dataset:
Name of file containing network data to be analyzed. Data type: Directed graph.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
NETWORK > REGIONS > COMPONENTS > VALUED GRAPHS
PURPOSE Identify the weak components corresponding to each cut-off value of a weighted
graph.
DESCRIPTION In a valued graph, the set of dichotomized graphs corresponding to each possible
weight form a nested sequence of graphs. The weak components of each of these
would also be nested and can be combined to form an hierarchical clustering of
weak components. Once two nodes have been placed in the same weak
component of a dichotomized graph for a particular cut-off value they remain in
the same weak component for all smaller cut-off values. This procedure
produces a hierarchical clustering based on these facts.
LOG FILE Hierarchical clustering diagram of the components. The columns are rearranged
and labeled. A '·' in row label i column label j means that vertex j was not in a
weak component with any other vertex (i.e. it was an isolate) using a cut-off
value of i. An 'X' indicates that vertex j was in a non-trivial weak component
with all vertices on the same row as j which can be found by tracing across that
row without encountering a space.
COMMENTS None
REFERENCES None
NETWORK > REGIONS > BICOMPONENTS
PARAMETERS
Input dataset:
Name of file containing graph to be analyzed. Data type: Graph.
TIMING O(N^2).
COMMENTS None.
REFERENCES None.
NETWORK > REGIONS > K-CORES
PARAMETERS
Input dataset:
Name of file containing data to be analyzed. Data type Graph.
LOG FILE A single link hierarchical clustering dendrogram in which the actors are re-ordered so that
they are located close to other actors in similar k-cores. The level at which any
pair of actors are aggregated is the point at which both can be reached by tracing
from the start to the actors from right to left. The scale at the top gives the level
at which they are clustered. The diagram can be printed or saved. Parts of the
diagram can be viewed by moving the mouse to the beginning of a line in the
dendrogram and clicking. The first click will highlight a portion of the diagram
and the second click will display just the highlighted portion. To return to the
original, right click on the mouse. There is also a simple zoom facility: simply
change the values and then press enter. If the labels need to be edited
(particularly the scale labels) then you should take the partition indicator matrix
into the spreadsheet editor, remove or reduce the labels, and then submit the
edited data to Tools>Dendrogram>Draw. In the clustering diagram each level
corresponds to a different value of 'k' in the k-core. Behind the dendrogram is a
clustering diagram representing the same thing. Each row is labeled by the
possible values of k. The columns are rearranged and labeled. A '·' in row i
column label j indicates that vertex j is not in any i-core. An 'X' indicates that
vertex j is in an i-core, all other members of j's i-core are found by tracing along
row i in both directions from column j until a space is encountered in each
direction. The column labels corresponding to an 'X' which are connected to j's
'X' are all members of j's i-core.
TIMING O(N^3)
COMMENTS K-Cores are not necessarily cohesive subsets but they do identify areas of the
graph which contain clique like structures.
REFERENCES Seidman S (1983). 'Network structure and minimum degree'. Social Networks,
5, 269-287.
NETWORK > SUBGROUPS > CLIQUES
PURPOSE Find all cliques in a network.
PARAMETERS
Input dataset
Name of file containing data to be analyzed. Data type: Graph.
The tree diagram (or a dendrogram) re-orders the actors so that they are located
close to other actors in similar clusters. The level at which any pair of actors are
aggregated is the point at which both can be reached by tracing from the start to
the actors from right to left. The scale at the top gives the level at which they are
clustered and corresponds to the number of overlaps. The diagram can be printed
or saved. Parts of the diagram can be viewed by moving the mouse to the split
point in a tree diagram or the beginning of a line in the dendrogram and clicking.
The first click will highlight a portion of the diagram and the second click will
display just the highlighted portion. To return to the original, right click on the
mouse. There is also a simple zoom facility: simply change the values and then
press enter. If the labels need to be edited (particularly the scale labels) then you
should take the partition indicator matrix into the spreadsheet editor, remove or
reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the diagram is a window containing the number of cliques and a list as
specified above. This is followed by a clustering diagram representing the same
clustering as the tree diagram (or dendrogram). The columns are rearranged and
labeled. A '·' in row label i column label j means that vertex j was not in i cliques
with any other vertex. An 'X' indicates that vertex j was in i cliques with all
vertices on the same row as j which can be found by tracing across that row
without encountering a space.
COMMENTS None.
REFERENCES Luce R and Perry A (1949). A method of matrix analysis of group structure.
Psychometrika 14, 95-116.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Graph.
Value of N: (Default = 2)
All members of an n-clique are connected by a path of length n or less. A value
of 1 would give all Luce and Perry cliques; the maximum value of N-1 would
give the components of the graph.
The tree diagram (or a dendrogram) re-orders the actors so that they are located
close to other actors in similar clusters. The level at which any pair of actors are
aggregated is the point at which both can be reached by tracing from the start to
the actors from right to left. The scale at the top gives the level at which they are
clustered and corresponds to the number of overlaps. The diagram can be printed
or saved. Parts of the diagram can be viewed by moving the mouse to the split
point in a tree diagram or the beginning of a line in the dendrogram and clicking.
The first click will highlight a portion of the diagram and the second click will
display just the highlighted portion. To return to the original, right click on the
mouse. There is also a simple zoom facility: simply change the values and then
press enter. If the labels need to be edited (particularly the scale labels) then you
should take the partition indicator matrix into the spreadsheet editor, remove or
reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the diagram is a window containing the number of n-cliques and a list as
specified above. This is followed by a clustering diagram representing the same
clustering as the tree diagram (or dendrogram). The columns are rearranged and
labeled. A '·' in row label i column label j means that vertex j was not in i n-
cliques with any other vertex. An 'X' indicates that vertex j was in i n-cliques
with all vertices on the same row as j which can be found by tracing across that
row without encountering a space.
DESCRIPTION An n-clan is an n-clique which has diameter less than or equal to n as an induced
subgraph. These are found by using the n-clique routine and checking the
diameter condition.
The routine will also provide an analysis of the overlapping structure of the n-
clans. This analysis gives information on the number of times each pair of actors
are in the same n-clan and gives an hierarchical clustering based upon this
information.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Graph.
Value of N: (Default = 2)
All members of an n-clan are in an n-clique and have the additional property that
they are connected by a path of length n or less in which each vertex is also a
member of the n-clique. A value of 1 would give all Luce and Perry cliques; the
maximum value of N-1 would give the components of the graph.
The following output is also produced if YES was inserted on the form in reply
to the question 'Analyze pattern of overlaps?' The first part of the output will be
the tree diagram or dendrogram corresponding to the single link clustering of the
n-clan overlap matrix. In the n-clan overlap matrix a value of k in row i column j
means that vertices i and j occurred in the same n-clan k times. The ith diagonal
entry gives the number of n-clans which contain i.
The tree diagram (or a dendrogram) re-orders the actors so that they are located
close to other actors in similar clusters. The level at which any pair of actors are
aggregated is the point at which both can be reached by tracing from the start to
the actors from right to left. The scale at the top gives the level at which they are
clustered and corresponds to the number of overlaps. The diagram can be printed
or saved. Parts of the diagram can be viewed by moving the mouse to the split
point in a tree diagram or the beginning of a line in the dendrogram and clicking.
The first click will highlight a portion of the diagram and the second click will
display just the highlighted portion. To return to the original, right click on the
mouse. There is also a simple zoom facility: simply change the values and then
press enter. If the labels need to be edited (particularly the scale labels) then you
should take the partition indicator matrix into the spreadsheet editor, remove or
reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the diagram is a window containing the number of n-clans and a list as
specified above. This is followed by a clustering diagram representing the same
clustering as the tree diagram (or dendrogram). The columns are rearranged and
labeled. A '·' in row label i column label j means that vertex j was not in i n-
clans with any other vertex. An 'X' indicates that vertex j was in i n-clans with all
vertices on the same row as j which can be found by tracing across that row
without encountering a space.
REFERENCES Mokken R (1979). Cliques, clubs and clans. Quality and Quantity 13, 161-173.
NETWORK > SUBGROUPS > K-PLEX
PURPOSE Find all k-plexes in a network.
DESCRIPTION A k-plex is a maximal subgraph with the following property: each vertex of the
induced subgraph is connected to at least n-k other vertices, where n is the
number of vertices in the induced subgraph. The basic algorithm is a depth first
search.
The routine will also provide an analysis of the overlapping structure of the k-
plexes. This analysis gives information on the number of times each pair of
actors are in the same k-plex and gives an hierarchical clustering based upon this
information.
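The defining condition can be checked directly. The following Python sketch (illustrative only; it
tests the degree condition for a given vertex set and does not check maximality or enumerate
k-plexes) may help fix the definition.

import numpy as np

def is_kplex(adj, members, k):
    # True if every member has degree >= n - k within the subgraph induced by members.
    adj = np.asarray(adj)
    members = list(members)
    sub = adj[np.ix_(members, members)]
    n = len(members)
    return bool(np.all(sub.sum(axis=1) >= n - k))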
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Graph.
Value of K: (Default = 2)
The value of k specifies the relative minimum size of the degree of each vertex
compared with the size of the k-plex. A value of 1 corresponds to a Luce and
Perry clique. Every vertex in a k-plex of size n has degree at least n-k in the
subgraph induced by the k-plex. The range of k is 1 to N. (A value of N would
give the whole graph as the only k-plex).
Analyze pattern of overlaps?
Yes means that an analysis of k-plex overlap will be performed. This includes the construction of a k-plex co-membership matrix and a hierarchical clustering which is saved in a partition indicator matrix as described below.
The following output is also produced if YES was inserted on the form in reply
to the question 'Analyze pattern of overlaps?' The first part of the output will be
the tree diagram or dendrogram corresponding to the single link clustering of the
k-plex overlap matrix. In the k-plex overlap matrix a value of m in row i column
j means that vertices i and j occurred in the same k-plex m times. The ith
diagonal entry gives the number of k-plexes which contain i.
The tree diagram (or a dendrogram) re-orders the actors so that they are located
close to other actors in similar clusters. The level at which any pair of actors are
aggregated is the point at which both can be reached by tracing from the start to
the actors from right to left. The scale at the top gives the level at which they are
clustered and corresponds to the number of overlaps. The diagram can be printed
or saved. Parts of the diagram can be viewed by moving the mouse to the split
point in a tree diagram or the beginning of a line in the dendrogram and clicking.
The first click will highlight a portion of the diagram and the second click will
display just the highlighted portion. To return to the original, right click on the mouse. There is also a simple zoom facility: simply change the values and then press enter. If the labels need to be edited (particularly the scale labels) then you should take the partition indicator matrix into the spreadsheet editor, remove or reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the diagram is a window containing the number of k-plexes and a list as
specified above. This is followed by a clustering diagram representing the same
clustering as the tree diagram (or dendrogram). The columns are rearranged and
labeled. A '·' in row label i column label j means that vertex j was not in i k-
plexes with any other vertex. An 'X' indicates that vertex j was in i k-plexes with
all vertices on the same row as j which can be found by tracing across that row
without encountering a space.
COMMENTS It is advisable to initially select k and the minimum size n so that k < (n+2)/2 - in
this case the diameter of the k-plex is 2 (or less). If a k-plex is connected and k ≥
(n+2)/2 then the diameter is always less than or equal to 2k-n+1, however it
should not be assumed that the k-plex is connected and this would need to be
examined.
REFERENCES Seidman S and Foster B (1978). A graph theoretic generalization of the clique
concept. Journal of Mathematical Sociology, 6, 139-154.
Seidman S and Foster B (1978). A note on the potential for genuine cross-
fertilization between anthropology and mathematics. Social Networks 1, 65-72.
NETWORK > SUBGROUPS > LAMBDA SETS
PURPOSE List all lambda sets of a graph.
DESCRIPTION The edge connectivity of a pair of vertices is the minimum number of edges
which must be deleted so that there is no path connecting them.
A lambda set is a maximal subset of vertices with the property that the edge
connectivity of any pair of vertices within the subset is strictly greater than the
edge connectivity of any pair of vertices, one of which is in the subset and one of
which is outside.
The algorithm employed first computes the maximum flow (i.e. the connectivity)
between all pairs of vertices (see NETWORKS>COHESION>MAX FLOW)
and uses this information to construct the lambda sets.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Graph.
LOG FILE The single link hierarchical diagram is followed by a maximum flow matrix. The maximum flow between i and j is given by the value in row i column j. The diagonal is set equal to the number of vertices; theoretically this value should be infinite.
TIMING O(N^4).
COMMENTS Note this algorithm works on integer valued graphs by the natural extension of
connectivity to minimum weight cutsets.
REFERENCES Borgatti S P, Everett M G and Shirey P R (1990). 'LS Sets, Lambda Sets and
other cohesive subsets'. Social Networks 12, 337-357.
NETWORK >SUBGROUPS >FACTIONS
PURPOSE Optimizes a cost function which measures the degree to which a partition
consists of clique like structures using a tabu search method.
DESCRIPTION Given a partition of a binary network of adjacencies into n groups, then a count
of the number of missing ties within each group summed with the ties between
the groups gives a measure of the extent to which the groups form separate
clique like structures. The routine uses a tabu search minimization procedure to
optimize this measure to find the best fit.
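The cost function itself is easy to state. The Python sketch below is illustrative only (it is not the tabu search routine): it counts missing ties within groups plus observed ties between groups for a given partition.

def faction_cost(A, partition):
    """A: square 0/1 matrix (list of lists); partition: list giving each node's group."""
    n = len(A)
    cost = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                                   # ignore the diagonal
            if partition[i] == partition[j] and A[i][j] == 0:
                cost += 1                                  # missing tie within a faction
            elif partition[i] != partition[j] and A[i][j] == 1:
                cost += 1                                  # tie between factions
    return cost

# Two perfect cliques {0,1} and {2,3} with no ties between them: cost 0.
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
print(faction_cost(A, [0, 0, 1, 1]))   # 0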
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Digraph.
TIMING Each iteration of the tabu search algorithm is O(N^2). Random tests with default
parameters as specified indicate O(N^2.5).
COMMENTS The algorithm seeks to find the minimum of the cost function. Even if successful
this result may still be a high value in which case the factions may not represent
cohesive subgroups.
To test the robustness of the solution the algorithm should be run a number of
times from different starting configurations. If there is good agreement between
these results then this is a sign that there is a clear split of the data into the
reported factions.
REFERENCES Glover F (1990). Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.
NETWORK > EGO NETWORKS > DENSITY
PURPOSE Compute standard ego network measures for every actor in a network.
DESCRIPTION This routine systematically constructs the ego network for every actor within the
network and computes a collection of ego network measures. For directed data
both in and out networks can be considered separately or together.
PARAMETERS
Input network:
Name of file which contains network to be analyzed. Data type: Digraph.
LOG FILE A table of ego network measures. All measures exclude ties involving ego itself.
The measures include the following:
Size. The number of actors (alters) that ego is directly connected to.
Ties. The total number of ties in the ego network (not counting ties involving
ego).
Pairs. The total number of pairs of alters in the ego network -- i.e., potential ties.
Density. The number of ties divided by the number of pairs, times 100.
Diameter. The longest geodesic distance within the ego network (unless
infinite).
ReachEffic. 2-step reach as a percentage of the number of alters plus the sum of their network sizes.
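As an informal illustration of the basic measures above (not UCINET's own code), the Python sketch below computes Size, Ties, Pairs and Density for each ego of an undirected binary network; ties and pairs are counted as ordered pairs here, which leaves the density unchanged.

def ego_measures(adj):
    """adj: dict vertex -> set of neighbours (undirected, no self-loops)."""
    results = {}
    for ego, alters in adj.items():
        size = len(alters)
        pairs = size * (size - 1)                 # ordered pairs of alters
        ties = sum(1 for a in alters for b in alters
                   if a != b and b in adj[a])     # ties among alters, ego excluded
        density = 100.0 * ties / pairs if pairs else 0.0
        results[ego] = {"Size": size, "Ties": ties, "Pairs": pairs, "Density": density}
    return results

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(ego_measures(adj)["c"])   # Size 3, Ties 2, Pairs 6, Density 33.3...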
TIMING O(N^3)
COMMENTS None
REFERENCES None
NETWORKS > EGO NETWORKS > STRUCTURAL HOLES
DESCRIPTION Compute several measures of structural holes, including all of the measures
developed by Ron Burt. The measures are computed for all nodes in the network,
treating each one in turn as ego.
PARAMETERS
Input dataset:
Name of file containing network to analyze. Data type: Directed Graph.
LOG FILE
Three tables are output. First is the set of monadic (nodal) structural hole
measures based on redundancy and constraint. The following measures are
displayed:
effsize. Burt's measure of the effective size of ego's network (essentially, the
number of alters minus the average degree of alters within the ego network, not
counting ties to ego).
efficiency. The effective size divided by the number of alters in ego's network.
The second table is the dyadic redundancy matrix. For each ego (rows) it gives
the extent to which each of its alters are tied to all of ego's other alters (i.e., the
extent to which the alter is redundant).
The third table is the dyadic constraint matrix. For each ego (rows) it gives the
extent to which it is constrained by each of its alters. Ego is constrained by alter j
if (a) j represents a large proportion of ego's relational investment, and (b) if ego
is heavily invested in other people who are in turn heavily invested in j. In short,
j constrains Ego if ego is heavily invested in j directly and indirectly.
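For binary symmetric data the effective size reduces to the simple form described above (alters minus their average within-ego-network degree). The Python sketch below illustrates that special case only; Burt's general formulas for valued or directed data, and the constraint measures, are not reproduced here.

def effective_size(adj, ego):
    """adj: dict vertex -> set of neighbours (undirected binary graph)."""
    alters = adj[ego]
    n = len(alters)
    if n == 0:
        return 0.0
    # degree of each alter within the ego network, ties to ego excluded
    degrees = [len(adj[a] & alters) for a in alters]
    return n - sum(degrees) / n

def efficiency(adj, ego):
    alters = adj[ego]
    return effective_size(adj, ego) / len(alters) if alters else 0.0

# Ego "e" has three alters; two of them are tied to each other.
adj = {"e": {"a", "b", "c"}, "a": {"e", "b"}, "b": {"e", "a"}, "c": {"e"}}
print(effective_size(adj, "e"))   # 3 - (1+1+0)/3 = 2.33...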
TIMING O(N^3)
REFERENCES Burt, R.S. 1992. Structural Holes: The social structure of competition.
Cambridge: Harvard University Press.
NETWORK > CENTRALITY > DEGREE
PURPOSE Calculates the degree and normalized degree centrality of each vertex and gives
the overall network degree centralization.
DESCRIPTION The number of vertices adjacent to a given vertex in a symmetric graph is the
degree of that vertex. For non-symmetric data the in-degree of a vertex u is the
number of ties received by u and the out-degree is the number of ties initiated by
u. In addition if the data is valued then the degrees (in and out) will consist of
the sums of the values of the ties. The normalized degree centrality is the degree
divided by the maximum possible degree expressed as a percentage. The
normalized values should only be used for binary data.
For a given binary network with vertices v1,...,vn and maximum degree centrality cmax, the network degree centralization measure is Σ(cmax - c(vi)) divided by the maximum value possible, where c(vi) is the degree centrality of vertex vi.
The routine calculates these measures and some descriptive statistics based on
these measures. Directed graphs may be symmetrized and the analysis is
performed as above, or an analysis of the in and out degrees can be performed.
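For symmetric binary data the calculation is straightforward; the Python sketch below (illustrative only) returns the raw degrees, the normalized degrees and Freeman's degree centralization, taking (n-1)(n-2) as the maximum possible value of the sum of differences.

def degree_centrality(A):
    """A: square 0/1 symmetric matrix as a list of lists."""
    n = len(A)
    deg = [sum(row) for row in A]
    ndeg = [100.0 * d / (n - 1) for d in deg]          # normalized degree, per cent
    cmax = max(deg)
    centralization = 100.0 * sum(cmax - d for d in deg) / ((n - 1) * (n - 2))
    return deg, ndeg, centralization

# Star graph on 4 nodes: the centre has degree 3 and centralization is 100%.
A = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
print(degree_centrality(A))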
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Valued Graph
Output dataset:
Name of file which will contain the degree and normalized degree centrality of each vertex.
LOG FILE A table which contains a list of the degree and normalized degree (n Degree)
centralities expressed as a percentage for each vertex.
Descriptive statistics which give the mean, standard deviation, variance,
minimum value and maximum value for each list generated. This is followed by
the degree network centralization index expressed as a percentage.
For directed data the tables are the same as for undirected except that separate
values are calculated for in and out degrees.
TIMING O(N).
COMMENTS Degree centrality measures network activity. For valued data the non-normalized
values should be used and the degree centralization should be ignored.
DESCRIPTION The farness of a vertex is the sum of the lengths of the geodesics to every other
vertex. The reciprocal of farness is closeness centrality. The normalized
closeness centrality of a vertex is the reciprocal of farness divided by the
minimum possible farness expressed as a percentage. As an alternative to taking
the reciprocal after the summation, the reciprocals can be taken before. In this
case the closeness is the sum of the reciprocated distances so that infinite
distances contribute a value of zero. This can also be normalized by dividing by
the maximum value. In addition the routine also allows the user to measure distance by the sums of the lengths of all the paths or all the trails. If the data is directed the routine calculates separate measures for in-closeness and out-closeness.
For a given network with vertices v1,...,vn and maximum closeness centrality cmax, the network closeness centralization measure is Σ(cmax - c(vi)) divided by the maximum value possible, where c(vi) is the closeness centrality of vertex vi.
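The Python sketch below (illustrative only) computes farness, normalized Freeman closeness and the sum-of-reciprocal-distances variant for a connected undirected binary graph, using breadth first search for the geodesics.

from collections import deque

def closeness(adj):
    """adj: dict vertex -> set of neighbours (connected, undirected, binary)."""
    n = len(adj)
    out = {}
    for v in adj:
        dist = {v: 0}
        q = deque([v])
        while q:                                   # BFS shortest-path lengths from v
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        farness = sum(dist[u] for u in adj if u != v)
        ncloseness = 100.0 * (n - 1) / farness                  # normalized Freeman closeness
        recip = sum(1.0 / dist[u] for u in adj if u != v)       # reciprocal-distance variant
        out[v] = (farness, ncloseness, recip)
    return out

adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(closeness(adj)["b"])   # farness 4, nCloseness 75.0, reciprocal sum 2.5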
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Digraph
Type:
Choices are:
Freeman (geodesic paths) - distances are lengths of geodesic paths, the standard Freeman measure.
Reciprocal Distances - distances are the reciprocals of the lengths of the geodesic paths.
All paths - distances between actors are the sums of the distances on all paths connecting them.
All trails - distances between the actors are the sums of the distances on all trails connecting them.
LOG FILE A table which contains a list of the farness (or closeness) and normalized
closeness centrality expressed as a percentage, for each vertex. Descriptive
statistics which give the mean, standard deviation, variance, minimum value and
maximum value for both lists. This is followed by the closeness network
centralization index expressed as a percentage. If the data is directed then
separate in and out values are calculated.
TIMING O(N^3) for Freeman and reciprocal distances, the other two can be exponential.
COMMENTS Closeness centrality can be thought of as an index of the expected time-until-arrival
for things flowing through the network via optimal paths.
DESCRIPTION Let bjk be the proportion of all geodesics linking vertex j and vertex k which pass
through vertex i. The betweenness of vertex i is the sum of all bjk where i, j and
k are distinct. Betweenness is therefore a measure of the number of times a
vertex occurs on a geodesic. The normalized betweenness centrality is the
betweenness divided by the maximum possible betweenness expressed as a
percentage.
For a given network with vertices v1,...,vn and maximum betweenness centrality cmax, the network betweenness centralization measure is Σ(cmax - c(vi)) divided by the maximum value possible, where c(vi) is the betweenness centrality of vertex vi.
The routine calculates these measures, and some descriptive statistics based on
these measures, for symmetric and unsymmetric graphs.
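For unweighted undirected data the betweenness scores can be computed with Brandes' algorithm; the Python sketch below is offered purely as an illustration of the measure and is not necessarily the algorithm UCINET uses internally.

from collections import deque

def betweenness(adj):
    """adj: dict vertex -> set of neighbours (undirected, binary). Raw scores."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1      # number of geodesics from s
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}                  # accumulate dependencies
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}         # each undirected pair counted twice

adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(betweenness(adj))   # {'a': 0.0, 'b': 2.0, 'c': 2.0, 'd': 0.0}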
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Digraph.
LOG FILE A table which contains a list of the betweenness and normalized betweenness
centrality expressed as a percentage for each vertex.
Descriptive statistics which give the mean, standard deviation, variance,
minimum value and maximum value for both lists. This is followed by the
betweenness network centralization index expressed as a percentage.
TIMING O(N^3).
PURPOSE Counts the number of nodes each node can reach in k or fewer steps. For k = 1, this
is equivalent to degree centrality. For directed networks, both in-reach and out-
reach are calculated.
DESCRIPTION The input is a binary network. The output is a node by distance matrix X in
which xij indicates the proportion of nodes that node i can reach in j or fewer
steps. In a connected network, each row will eventually reach 1 (100%). The routine also calculates the eccentricity of each node, that is, the distance from the node in question to the node that is furthest away.
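The Python sketch below (illustrative only) builds the node-by-distance reach profile and the eccentricity for an undirected binary graph; the proportion is taken over the other n-1 nodes here, so in a connected graph each row ends at 1. Whether ego itself is included in the denominator is a detail to check against UCINET's output.

from collections import deque

def reach_profile(adj):
    """adj: dict vertex -> set of neighbours (undirected, binary)."""
    n = len(adj)
    out = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:                                    # BFS distances from s
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        ecc = max(dist.values())
        row = [sum(1 for d in dist.values() if 0 < d <= k) / (n - 1)
               for k in range(1, n)]                # proportion reached in k or fewer steps
        out[s] = (row, ecc)
    return out

adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(reach_profile(adj)["a"])   # ([0.33..., 0.67..., 1.0], eccentricity 3)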
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Digraph
LOG FILE A table that gives the proportion of nodes reached by each node at each level of
distance. The proportion is expressed as a value from zero to one. A value of x in
row i column j means that 100x% of nodes are reachable from i in a path of
length j or less. For directed data values for those that can be reached from the
node and those that can reach the target node are reported. Descriptive statistics
which give the mean, standard deviation, variance minimum value and
maximum value for the proportion are given.
Finally the eccentricity of each node is given, for directed data both in and out
eccentricity are calculated.
TIMING O(N^2).
COMMENTS When searching for key individuals who are well positioned to reach many people in a small number of steps, this measure provides a natural metric for assessing each node.
REFERENCES
NETWORK > CENTRALITY > BETWEENNESS > LINES
DESCRIPTION Let bjk be the proportion of all geodesics linking vertex j and vertex k which pass
through edge i. The betweenness of edge i is the sum of all bjk where j and k are
distinct. Betweenness is therefore a measure of the number of times an edge
occurs on a geodesic.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Digraph.
LOG FILE A matrix in which the i,j th entry gives the edge betweenness of the edge (i,j).
TIMING O(N^3).
DESCRIPTION The betweenness of each vertex is calculated and those with a score of zero are deleted; the procedure is then repeated on the reduced graph until all vertices
have been deleted. Initially all vertices are placed in the hierarchy and then at
each level the deleted vertices are removed.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Digraph.
LOG FILE The partition vector described above. A cluster diagram in which the columns
have been re-arranged so that actors in the same cluster at each level are
consecutive. A value of 1 in a row labeled x and column labelled j means that
actor j was in the cluster at level x.
TIMING O(N^3).
COMMENTS
PURPOSE Calculates the brokerage measures proposed by Gould & Fernandez (1989).
DESCRIPTION Given (a) a graph, and (b) a partition of nodes, this procedure calculates
measures of five kinds of brokerage. Brokerage occurs when, in a triad of nodes
A, B and C, A has a tie to B, and B has a tie to C, but A has no tie to C. That is, A
needs B to reach C, and B is therefore a broker. When A, B, and C may belong to
different groups, 5 kinds of brokerage are possible. The five kinds are named
using terminology from social roles. In the description below, the notation G(x)
is used to indicate the group that node x belongs to. Important: It is assumed that
a-->b-->c. For example, a (the source node) gives information to b (the broker),
who gives information to c (the destination node).
Coordinator. Counts the number of times b is a broker and G(a) = G(b) = G(c),
that is, all three nodes belong to the same group.
Consultant. Counts the number of times b is a broker and G(a) = G(c), but G(b) ≠ G(a); that is, the broker belongs to one group, and the other two belong to a different group.
Gatekeeper. Counts the number of times b is a broker and G(a) ≠ G(b) and G(b) = G(c), that is, the source node belongs to a different group.
Representative. Counts the number of times b is a broker and G(a) = G(b) and G(c) ≠ G(b). That is, the destination node belongs to a different group.
Liaison. Counts the number of times b is a broker and G(a) ≠ G(b) ≠ G(c). That is, each node belongs to a different group.
When b is not the only intermediary between a and c, it is possible to give b only
partial credit. That is, if there are two paths of length two between a and c, one of
which involves b, we can choose to give b only 1/2 point instead of a full point.
This is an option in the program.
The routine calculates these measures for each node in the network, and also the
total of the five.
The program also computes the expected values of each brokerage measure given
the number of groups and the size of each group. That is, the expected values
under the assumption that brokerage is independent of the group status of nodes.
A final output divides the observed brokerage values by these expected scores.
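As an illustration of the unweighted counts described above (the weighted option and the expected values are omitted), the Python sketch below classifies every brokered two-path a -> b -> c by the group memberships of a, b and c.

def brokerage(A, group):
    """A: square 0/1 directed adjacency matrix; group: list of group labels."""
    n = len(A)
    roles = ("coordinator", "gatekeeper", "representative", "consultant", "liaison")
    counts = {i: dict.fromkeys(roles, 0) for i in range(n)}
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if len({a, b, c}) < 3 or not (A[a][b] and A[b][c]) or A[a][c]:
                    continue                        # not a brokered two-path
                ga, gb, gc = group[a], group[b], group[c]
                if ga == gb == gc:
                    role = "coordinator"
                elif ga == gc != gb:
                    role = "consultant"
                elif gb == gc != ga:
                    role = "gatekeeper"
                elif ga == gb != gc:
                    role = "representative"
                else:
                    role = "liaison"                # all three groups differ
                counts[b][role] += 1
    return counts

A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
print(brokerage(A, [1, 2, 1])[1])   # node 1 acts as a consultant once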
PARAMETERS
Partition vector:
The name of a UCINET dataset that contains a partition of the actors. To partition the data matrix into groups specify a vector by giving the dataset name, a dimension (either row or column) and an integer value. For example, to use the second row of a dataset called ATTRIB, enter "ATTRIB ROW 2". The program will then read the second row of ATTRIB and use that information to define the groups. All actors with identical values on the criterion vector (i.e. the second row of ATTRIB) will be placed in the same group.
Method: (default = 'unweighted')
Choices are 'unweighted' and 'weighted'. Unweighted directs the program to
simply count up the number of times that a given node b is in a brokering
position, regardless of how many other nodes are serving the same function with
the same pair of endpoints a and c. Weighted directs the program to give partial
scores in inverse proportion to the number of alternatives.
LOG FILE 1) A table giving the brokerage scores for each node.
2) A table giving the brokerage scores divided by the expected values.
3) A table giving the expected values.
TIMING O(N^3).
COMMENTS None
DESCRIPTION Let mjk be the amount of flow between vertex j and vertex k
which must pass through i for any maximum flow. The flow
betweenness of vertex i is the sum of all mjk where i, j and k are
distinct and j < k. The flow betweenness is therefore a measure
of the contribution of a vertex to all possible maximum flows.
For a given binary network with vertices v1,...,vn and maximum flow betweenness centrality cmax, the network flow betweenness centralization measure is Σ(cmax - c(vi)) divided by the maximum value possible, where c(vi) is the flow betweenness centrality of vertex vi.
The routine calculates these measures, and some descriptive statistics based on
these measures for symmetric, unsymmetric and valued graphs.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Valued symmetric
graph - integer values only.
LOG FILE The maximum flow matrix. This gives the maximum flow between all pairs of
vertices - the diagonals give the network size.
TIMING O(N^4).
COMMENTS The measure is based upon the concept of information flow. In valued data the
values should in some way correspond to the capacity for flow; hence valued data
should represent similarity.
DESCRIPTION Given an adjacency matrix A, the centrality of vertex i (denoted ci) is given by ci = α Σj Aij cj where α is a parameter. The centrality of each vertex is therefore determined by the centrality of the vertices it is connected to. The parameter α is required to give the equations a non-trivial solution and is therefore the reciprocal of an eigenvalue. It follows that the centralities will be the elements of the corresponding eigenvector. The normalized eigenvector centrality is the scaled eigenvector centrality divided by the maximum difference possible expressed as a percentage.
For a given binary network with vertices v1,...,vn and maximum eigenvector centrality cmax, the network eigenvector centralization measure is Σ(cmax - c(vi)) divided by the maximum value possible, where c(vi) is the eigenvector centrality of vertex vi.
This routine calculates these measures and some descriptive statistics based on
these measures. This routine only handles symmetric data and in these
circumstances the eigenvalues provide a measure of the accuracy of the centrality
measure. To help interpretation the routine calculates all positive eigenvalues but
only gives the eigenvector corresponding to the largest eigenvalue.
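The principal eigenvector can be obtained by simple power iteration; the Python sketch below (illustrative only, symmetric binary data) iterates with A + I, which has the same principal eigenvector as A but avoids oscillation on bipartite graphs.

def eigenvector_centrality(A, iterations=200):
    """A: square 0/1 symmetric matrix as a list of lists."""
    n = len(A)
    c = [1.0] * n
    for _ in range(iterations):
        # one step of power iteration with (A + I)
        new = [c[i] + sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        norm = max(new) or 1.0
        c = [x / norm for x in new]
    return c

# Star graph: the centre receives the largest score.
A = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
print(eigenvector_centrality(A))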
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Valued Graph
(Symmetric data only).
LOG FILE A table of positive eigenvalues. The eigenvalues are placed in descending order
under the heading VALUE. The table gives information on 'how dominant' the
largest eigenvalue is. The table gives the percentage and cumulative percentage
of the total eigenvalue sum for each eigenvalue. The ratio of each eigenvalue to
the next largest is also presented.
TIMING O(N^3).
COMMENTS The ratio of the largest eigenvalue to the next largest should be at least 1.5 and
preferably 2.0 or more for the centrality measure to be robust. If this is not the
case then a full factor analysis should be undertaken.
REFERENCES Bonacich P (1972). Factoring and Weighting Approaches to status scores and
clique identification. Journal of Mathematical Sociology 2, 113-120.
NETWORK > CENTRALITY > POWER
PURPOSE Compute Bonacich's power based centrality measure for every vertex and give an
overall network centralization index for this centrality measure.
DESCRIPTION Given an adjacency matrix A, the centrality of vertex i (denoted ci) is given by ci = Σj Aij(a + bcj) where a and b are parameters. The centrality of each vertex is therefore determined by the centrality of the vertices it is connected to.
For a given binary network with vertices v1,...,vn and maximum degree centrality cmax, the network degree centralization measure is Σ(cmax - c(vi)) divided by the maximum value possible, where c(vi) is the degree centrality of vertex vi.
The routine calculates power centrality and some descriptive statistics of the
measure.
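Rearranging the defining equation gives c = a(I - bA)^-1 A1, where 1 is a vector of ones. The numpy sketch below is illustrative only (UCINET's scaling of the result may differ) and solves this directly.

import numpy as np

def power_centrality(A, a=1.0, b=0.1):
    """Bonacich power centrality c = a * (I - b*A)^-1 * A * 1 (sketch)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # |b| should be smaller than the reciprocal of the largest eigenvalue of A.
    return a * np.linalg.solve(np.eye(n) - b * A, A @ np.ones(n))

A = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
print(power_centrality(A, a=1.0, b=0.2))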
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Valued Graph
(Symmetric data only).
LOG FILE A table which contains the power centrality of each actor.
Descriptive statistics which give the mean, standard deviation, variance,
minimum value and maximum value for the measure.
TIMING O(N^3).
COMMENTS It is advisable to select b so that its absolute value is less than the absolute value
of the reciprocal of the largest eigenvalue of the adjacency matrix. An upper-
bound on the eigenvalues can be obtained by the largest row or (column) sums of
the matrix.
REFERENCES Bonacich P (1987). Power and Centrality: A Family of Measures. American
Journal of Sociology 92, 1170-1182.
NETWORK>CONNECTIONS>HUBBELL/KATZ (INFLUENCE)
PURPOSE Calculate the influence measure between every pair of vertices using the models
of Hubbell, Katz or Taylor.
DESCRIPTION For Hubbell the influence matrix is I + Σ(bA)^i, which equals the inverse of (I - bA) under certain conditions. It follows that for Katz the influence matrix is the inverse of (I - bA) minus I under the same condition. Taylor's measure is a normalized version of the Katz measure. For each power in the series subtract the column marginals from the row marginals and normalize by the total number of walks of that length.
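The matrix algebra is a one-liner in numpy, provided |b| is small enough for the series to converge; the sketch below is illustrative only and does not reproduce Taylor's normalization.

import numpy as np

def hubbell_katz(A, b=0.1):
    """Hubbell and Katz influence matrices for attenuation factor b (sketch)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    hubbell = np.linalg.inv(np.eye(n) - b * A)   # I + bA + (bA)^2 + ...
    katz = hubbell - np.eye(n)                   # the same series without the identity
    return hubbell, katz

A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H, K = hubbell_katz(A, b=0.25)
print(K.round(3))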
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Valued graph.
Computational Method:
Choices are:
Taylor - takes the Katz influence matrix, subtracts the column marginals from the row marginals and normalizes.
TIMING O(N^3).
COMMENTS None.
REFERENCES Hubbell C H (1965). 'An input-output approach to clique identification'.
Sociometry, 28, 377-399.
Katz L (1953). 'A new status index derived from sociometric data analysis'.
Psychometrika, 18, 34-43.
DESCRIPTION The weighted function of the set of all paths connecting vertex i to vertex j is any
weighted linear combination of the paths such that the sum of the weights is
unity. Assuming that each link in a path is independent, and the variance of a
single link is unity, it can be concluded that the variance of a path is simply its
length.
The information measure between two vertices i and j is the inverse of the
variance of the weighted function. The information centrality of a vertex i is the
harmonic mean of all the information measures between i and all other vertices in
the network.
The routine calculates these measures and some descriptive statistics based on
these measures for symmetric graphs.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Graph.
LOG FILE A table which contains a list of the information centrality of each vertex, together with descriptive statistics which give the mean, standard deviation, variance, minimum value and maximum value.
TIMING O(N^3).
COMMENTS None
REFERENCES Stephenson K and Zelen M (1991). 'Rethinking Centrality'. Social Networks 13.
NETWORKS>CENTRALITY>MULTIPLE MEASURES
DESCRIPTION Only normalized versions of the measures for undirected data are given. There
are no descriptive statistics nor are there any centralization measures.
TIMING O(N^2).
COMMENTS
DESCRIPTION The group degree centrality of a group of actors is the size of the set of actors
who are directly connected to group members. This routine uses a simple greedy
algorithm to optimize this measure for a fixed size group. Local minima are
avoided by taking a number of different random starting configurations.
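The greedy idea can be sketched as follows (Python, illustrative only; this is not UCINET's exact procedure, and whether group members themselves are counted among the reached actors is a detail to check against the output).

import random

def group_degree(adj, group):
    """Number of actors adjacent to at least one group member."""
    reached = set()
    for g in group:
        reached |= adj[g]
    return len(reached)

def greedy_group(adj, size, restarts=20):
    nodes = list(adj)
    best, best_score = None, -1
    for _ in range(restarts):                       # random restarts to escape local optima
        group = set(random.sample(nodes, size))
        improved = True
        while improved:
            improved = False
            for out in list(group):
                if improved:
                    break
                for cand in nodes:
                    if cand in group:
                        continue
                    trial = (group - {out}) | {cand}
                    if group_degree(adj, trial) > group_degree(adj, group):
                        group, improved = trial, True   # accept the improving swap
                        break
        score = group_degree(adj, group)
        if score > best_score:
            best, best_score = group, score
    return best, best_score

adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(greedy_group(adj, 2))   # a best pair and its score, e.g. ({'a', 'c'}, 4)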
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Graph.
LOG FILE The fit is the percentage of actors (both within and outside) adjacent to group
members. The starting fit, final fit and the number of actors together with the
final number of actors connected to the group are reported. This is followed by a
list of the members of the group with the highest group degree centrality.
TIMING O(N^2).
COMMENTS Note that this routine just finds one group. There could be many others.
REFERENCES Everett, M.G. and Borgatti, S.P. (1999). The Centrality of Groups and Classes. Journal of Mathematical Sociology 23, 181-202.
GROUP > CENTRALITY > DEGREE > TEST
PURPOSE Performs a permutation test to assess whether a specified group has a high degree
group centrality score.
DESCRIPTION The group degree centrality of a group of actors is the size of the set of actors
who are directly connected to group members. This routine uses a simple
sampling procedure to test whether a specified group has a higher group degree
centrality measure than those produced at random.
PARAMETERS
Central Group
Name of UCINET file containing a column vector which specifies the actors in
the specified group. A 1 in row j indicates that actor j is in the group and a 0
indicates that the actor is not a member.
LOG FILE The group degree centrality for the specified data set; this is labelled as the observed # reached. The mean and standard deviation of the group centrality for the random samples. Finally the number of times, expressed as a p-value, that a random sample achieved a group centrality score as high as or higher than that of the specified group.
TIMING O(N^2).
COMMENTS None
REFERENCES Everett, M.G. and Borgatti, S.P. (1999). The Centrality of Groups and Classes. Journal of Mathematical Sociology 23, 181-202.
NETWORK > CORE/PERIPHERY > CONTINUOUS
DESCRIPTION Simultaneously fits a core/periphery model to the data network and estimates the
degree of coreness or closeness to the core of each actor. This is done by finding
a vector C such that the product of C and C transpose is as close as possible to
the original data matrix. In addition a number of measures which try to assess
the degree to which the network falls into a core/periphery structure for different
sizes of core are calculated. Each measure starts with the actor with the highest
coreness score and places them in the core and all other actors are placed in the
periphery. The core is then successively increased by moving the actor with the
highest coreness score from the periphery into the core. This is continued until
the periphery consists of a single actor. nDiff is a generalization of centralization: it sums the differences between the actor in the core with the lowest coreness score and all those in the periphery, and adds to this the sum of the differences between the actor with the highest score in the periphery and all the actors in the core. This value is then normalized. Diff is similar but places a weighting on the size of the core; this weighting is equal to the square root of the core size and so the measure gives greater value to smaller cores. The correlation measure correlates the given coreness scores with the ideal scores of a one for every core member and a zero for actors in the periphery. Finally, Ident is the same as the correlation measure but uses Euclidean distance in place of correlation.
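One simple way to obtain such a vector C is the principal eigenvector of the (symmetric) data matrix, since the scaled eigenvector gives the least squares rank one approximation to A. The numpy sketch below uses this shortcut and is only an approximation to UCINET's MINRES/gradient fitting, which can also ignore the diagonal and prevent negative values.

import numpy as np

def coreness(A):
    """Approximate coreness scores, normalized so the sum of squares is one (sketch)."""
    A = np.asarray(A, dtype=float)
    vals, vecs = np.linalg.eigh(A)        # symmetric eigendecomposition
    v = vecs[:, np.argmax(vals)]          # principal eigenvector, unit length
    return v if v.sum() >= 0 else -v      # fix the arbitrary sign

# A small core (nodes 0 and 1) with two more peripheral nodes.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
print(coreness(A).round(3))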
PARAMETERS
Prevent Negatives:
It is possible for the best C to contain negative values; choosing Yes prevents this from happening.
LOG FILE The correlation or Euclidean distance between the model and the data at the start
and end of the optimization procedure together with the number of iterations
required. Minres option just gives the final correlation.
The coreness of each actor, normalized so that the sum of squares is one. This is followed by some descriptive statistics including Gini coefficients and a heterogeneity measure. The Gini coefficient measures how the scores are distributed over the population and measures the amount of inequality in the data. If everyone had the same score it gives a value of zero; if a single actor had a value of 1 and everyone else had a score of zero it gives a value of 1. The
composite score is an adjusted measure which takes account of the fact that we
are looking for core-periphery structures. The heterogeneity measure is based on
a simple summing of proportions which measures the extent to which the scores
are evenly distributed.
This is followed by a table of the four concentration measures which assess the
extent to which the data fits a core periphery structure. Each column gives a
different measure, the value in row i places the i actors with the highest coreness
in the core and the remainder in the periphery.
This is followed by a recommended core size based on the correlation measure.
See the comments below.
Finally the expected values are given, this is C times C transpose and then
normalized so that it has the same mean and standard deviation as the data.
TIMING O(N^3)
COMMENTS The concentration measures can need careful interpretation. If nDiff has a clear maximum which is not at 1 or n-1 then this indicates a solid core periphery structure. Often nDiff has a number of maxima, indicating that there is a group of actors situated between the core and the periphery. If the user still wishes to specify a core then the other measures can be used. Diff is a biased measure and gives more weight to smaller cores and again if this has a clear maximum this can indicate a core. If this does not yield any conclusive results or there is no
requirement to favor smaller cores then it is recommended that the correlation is
used together with nDiff or Diff. The correlation measure can indicate an area in
which to focus and the other measures can be used to fine tune the measure to
identify a core size. Ident should be used in the same way as correlation but it
places more weight on the absolute scores.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type:
Valued Digraph.
CORR
The fit function is the correlation between the permuted data
matrix and an ideal structure matrix consisting of ones in the
core block interactions and zeros in the peripheral block
interactions. This value is maximized.
DENSITY
The fit function is the density of the core block interactions. This
value is maximized.
SXY
The fit function is the element wise product of the permuted
data matrix and an ideal structure matrix consisting of ones in
the core block interactions and zeros in the peripheral block
interactions. This value is maximized.
EMPTYPER
The fit function is the number of entries in the peripheral block
interactions. This value is minimized.
LOG FILE The starting and the final correlation of the ideal structure and
the permuted adjacency matrix (regardless of which option was
chosen). A listing of the members of the core and the periphery.
A blocked adjacency matrix dividing the actors into the core and
periphery.
The algorithm seeks to find the minimum (maximum) of the cost function. Even if
successful this result may still be a high (low) value in which case the partition
may not represent a core/periphery model.
To test the robustness of the solution the algorithm should be run a number of
times from different starting configurations. If there is good agreement between
these results then this is a sign that there is a clear split of the data into a
core/periphery structure.
DESCRIPTION The profile of an actor is the row vector corresponding to the actor in the
adjacency matrix. Multiple relations are permissible and the profile vector is the
concatenation of each individual relation profile vector. This matrix can be real
or binary.
Structurally equivalent actors have the same profile except for the diagonal
entries of the adjacency matrix. This routine compares the profile vectors of all
pairs of actors and hence computes a measure of profile similarity. Measures of
similarity can be made using Euclidean distance, Pearson correlation, exact
matches or matches of positive entries only. Euclidean distance produces a
distance matrix and all the other options produce a similarity matrix. This matrix
is then analyzed by single link hierarchical clustering.
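The Python sketch below (illustrative only, single relation) computes the Euclidean profile distances with the 'Ignore' diagonal option described below, i.e. the comparisons involving the diagonal entries are dropped.

def profile_distances(A):
    """Euclidean distances between row profiles, diagonal comparisons ignored."""
    n = len(A)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                if k in (i, j):
                    continue              # drop the x_ii/x_ji and x_ij/x_jj comparisons
                total += (A[i][k] - A[j][k]) ** 2
            D[i][j] = total ** 0.5
    return D

A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(profile_distances(A)[0][1])   # 0.0: actors 0 and 1 are structurally equivalent here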
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Multirelational.
Ignore - Diagonals are treated as missing values so that the comparisons of xii
with xji and xij with xjj are dropped.
Retain - Profile vectors are compared directly element by element, including the
xii and xjj elements.
LOG FILE Single link hierarchical clustering dendrogram (or tree diagram) of the structural
equivalence matrix. The level at which any pair of actors are aggregated is the
point at which both can be reached by tracing from the start to the actors from
right to left. The diagram can be printed or saved. Parts of the diagram can be
viewed by moving the mouse to the split point in a tree diagram or the beginning
of a line in the dendrogram and clicking. The first click will highlight a portion of
the diagram and the second click will display just the highlighted portion. To return to the original, right click on the mouse. There is also a simple zoom facility: simply change the values and then press enter. If the labels need to be edited (particularly the scale labels) then you should take the partition indicator matrix into the spreadsheet editor, remove or reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the plot is the actor by actor structural equivalence matrix. This is
followed by an alternative clustering diagram representing the same information
as above. The columns are rearranged and labeled. A '·' in column label j at level
x means that actor j is not in any cluster at level x. An x indicates that actor j is
in a cluster at this level together with those actors which can be traced across that
row without encountering a space.
TIMING O(N^2).
COMMENTS None.
DESCRIPTION Given an adjacency matrix, or a set of adjacency matrices for different relations,
a correlation matrix can be formed by the following procedure. Form a profile
vector for a vertex i by concatenating the ith row in every adjacency matrix; the
i,jth element of the correlation matrix is the Pearson correlation coefficient of the
profile vectors of i and j. This (square, symmetric) matrix is called the first
correlation matrix.
CONCOR uses the above technique to split the initial data into two blocks. Successive splits are then applied to the separate blocks. At each iteration all blocks are submitted for analysis; however, blocks containing two vertices are not split. Consequently n partitions of the binary tree can produce up to 2^n blocks.
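A single CONCOR split can be sketched in a few lines of numpy (illustrative only: one symmetric relation, row profiles only, and no special handling of degenerate rows).

import numpy as np

def concor_split(A, iterations=50):
    """One CONCOR split: returns the indices of the two blocks (sketch)."""
    M = np.corrcoef(np.asarray(A, dtype=float))   # first correlation matrix
    for _ in range(iterations):
        M = np.corrcoef(M)                        # correlate the correlations
        if np.allclose(np.abs(M), 1.0):
            break                                 # converged to a +1/-1 pattern
    block = M[0] > 0                              # nodes positively correlated with node 0
    return np.where(block)[0], np.where(~block)[0]

A = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]
print(concor_split(A))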
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Multirelational.
Ignore - Diagonals are treated as missing values so that the comparisons of xii
with xji and xij with xjj are dropped.
Retain - Profile vectors are compared directly element by element, including the
xii and xjj elements.
LOG FILE The correlation matrix constructed during the first iteration.
Blocks represented in terms of a clustering dendrogram. The blocks are given for
each level specified in 'Max # of partitions'. The level at which any pair of actors
are aggregated is the point at which both can be reached by tracing from the start
to the actors from right to left. Hence to find all members of vertex i's block at
level k simply locate the value of k on the line connected to i then all actors that
can be reached from this point by tracing to the left are in i's block. The diagram
can be printed or saved. Parts of the diagram can be viewed by moving the
mouse to the split point in a tree diagram or the beginning of a line in the
dendrogram and clicking. The first click will highlight a portion of the diagram
and the second click will display just the highlighted portion. To return to the original, right click on the mouse. There is also a simple zoom facility: simply change the values and then press enter. If the labels need to be edited (particularly the scale labels) then you should take the partition indicator matrix into the spreadsheet editor, remove or reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the dendrogram is the correlation matrix constructed during the first
iteration. Followed by an alternative cluster diagram. Members of the same block
are connected by row of X's. Hence to find all members of vertex i's block at
level k simply locate the X in column label i at level k and trace along in both
directions until a space is encountered. All column labels corresponding to the
Xs found are members of i's block. A '·' indicates a singleton block.
A blocked adjacency matrix. The rows and columns of the original adjacency
matrix are permuted into blocks. The adjacency matrix is displayed in terms of
the matrix blocks it contains.
The correlation coefficient R-squared of the partitioned data matrix and an ideal
structure matrix. The structure matrix has the same dimension as the data matrix
but each cell in a block is set to the average value of the corresponding block in
the data matrix.
COMMENTS The algorithm splits every non-trivial block at every level. The user may wish to
reject a split at some level - since the history of all splits is given it is a simple
matter to recombine clusters if the user so wishes.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type:
Graph.
Additional
Are diagonal values valid? (Default = NO)
Whether diagonals are to be included in cost function.
LOG FILE The number of errors and the R-squared value for the initial partition. The R-
squared value is the correlation coefficient of the partitioned data
matrix and an ideal structure matrix. The structure matrix has
the same dimension as the data matrix but each block is set to a
one or zero corresponding to the nearest block in the data matrix.
The final number of errors, the R-squared value and the errors in each block after
the optimization.
PURPOSE Optimizes a cost function which measures the degree to which a partition forms
structurally equivalent blocks using a tabu search method.
DESCRIPTION A partition of a network divides the adjacency matrix into matrix blocks. The
variance of the elements of a matrix block gives a measure of the extent to which
the elements within the matrix block conform to structural equivalence. The sum
of the variances of all the matrix blocks gives a measure or cost function of the
degree of structural equivalence for a given partition. The routine attempts to
optimize this cost function to try and find the best partition of the vertices into a
specified number of blocks.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Valued graph.
Additional
Are diagonal values valid? (Default = NO)
Whether diagonals are to be included in cost function.
LOG FILE The correlation coefficient R-squared of the partitioned data matrix and an ideal
structure matrix. The structure matrix has the same dimension as the data matrix
but each cell in a block is set to the average value of the corresponding block in
the data matrix.
List of blocks. Each block is labeled and is specified by the vertices it contains.
The blocked adjacency matrix. The rows and columns of the original adjacency
matrix are permuted into blocks. The adjacency matrix is displayed in terms of
the matrix blocks it contains.
COMMENTS The algorithm seeks to find the minimum of the cost function. Even if successful
this result may still have a high value in which case the blocking may not
conform very closely to structural equivalence.
To test the robustness of the solution the algorithm should be run a number of
times from different starting configurations. If there is good agreement between
these results then this is a sign that there is a clear split of the data into the
reported blocks.
REFERENCES Glover F (1990). Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.
NETWORK > ROLES > EXACT > OPTIMIZATION
PURPOSE Optimizes a cost function that gives an approximate measure of the degree to
which a partition corresponds to automorphically equivalent sets using a tabu
search.
DESCRIPTION Two vertices u and v of a labelled graph G are automorphically equivalent if all
the vertices can be relabelled to form an isomorphic graph with the labels of u
and v interchanged. Given a partition of the network then the partition divides the
adjacency matrix into blocks. For an automorphic partition the cell values for a
row or column within a block will have the same distribution of values. An
approximate measure of the extent to which these blocks conform to automorphic
equivalence is given by the following procedure. For each block calculate the
variance of the sum of squares of each row and the variance of the sum of
squares of each column. The approximate automorphic cost is the sum of all
these variances for every block. The routine attempts to optimize this cost
function to try and find the best partition of the vertices into a specified number
of blocks.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Valued graph.
LOG FILE List of blocks. Each block is labelled and is specified by the vertices it contains.
The blocked adjacency matrix. The rows and columns of the original adjacency
matrix are permuted into blocks. The adjacency matrix is displayed in terms of
the matrix blocks it contains.
COMMENTS The algorithm seeks to find the minimum of the cost function. Even if successful
this result may still have a high value in which case the blocking may not
conform very closely to automorphic equivalence. In addition there may be a
number of alternative partitions that also produce the minimum value; the
algorithm does not search for additional solutions. Finally it is possible that the
routine terminates at a local minimum and does not locate the desired global minimum.
To test the robustness of the solution the algorithm should be run a number of
times from different starting configurations. If there is good agreement between
these results then this is a sign that there is a clear split of the data into the
reported blocks.
REFERENCES Glover F (1989). Tabu Search - Part I. ORSA Journal on Computing 1, 190-206.
Glover F (1990). Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.
NETWORKS>ROLES&POSITIONS>AUTOMORPHIC>ALL PERMUTATIONS
DESCRIPTION Two vertices u and v of a labelled graph G are automorphically equivalent if all
the vertices can be relabelled to form an isomorphic graph with the labels of u
and v interchanged. Automorphic equivalence is an equivalence relation and
therefore partitions the vertices into equivalence classes called orbits. This
routine finds the orbits by examining all possible relabelings of the graph. For a
graph of n vertices there are n! possible permutations of the labels.
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Digraph.
LOG FILE The percentage of all permutations that produced an isomorphic graph (the hit
rate).
TIMING Exponential.
COMMENTS This routine is very slow. It is inadvisable to try it on graphs with more than 10 vertices, and impossible on graphs with more than 15.
REFERENCES
NETWORK > ROLES > EXACT > EXCATREGE
PURPOSE Computes a single link hierarchical clustering and a measure of regular
equivalence for binary or nominal data using exact categorical REGE.
DESCRIPTION Two actors are exactly regularly equivalent if they are exactly equally related to
equivalent others. Nominal data is any integer valued adjacency matrix in which
the value represents a coding of the relationship in terms of a category.
For example, we could use 1 to represent close friend, 2 to represent friend and 3
to represent works with. The values 1, 2 and 3 DO NOT measure the strength of
the relationship, they simply refer to the categories.
Two actors are regularly equivalent for nominal data if in addition to the normal
regularity condition they relate to equivalent others in the same category.
For nominal data the initial categories are included at the first iteration. The
process is easily extended to multiple relations.
From this procedure a similarity matrix can be formed with entries which give
the value of the iteration at which vertices were separated into different
categories.
Initially the procedure places all vertices in the same category; or into user
specified categories. Subsequent iterations split the groups into hierarchical
clusters.
PARAMETERS
Input dataset
Name of file containing dataset to be analyzed. Data type: Valued graph - integer
values. Multirelational.
LOG FILE Single link hierarchical clustering dendrogram (or tree diagram) of the regular
similarity measure. The level at which any pair of actors are aggregated is the
point at which both can be reached by tracing from the start to the actors from
right to left. Each level corresponds to an iteration; level 1 represents the initial clustering specified in PARAMETERS. The top level gives strict regular equivalence clusters. The higher the level the greater the degree of regular equivalence. The diagram can be printed or saved. Parts of the diagram can be viewed by moving the mouse to the split point in a tree diagram or the beginning of a line in the dendrogram and clicking. The first click will highlight a portion of the diagram and the second click will display just the highlighted portion. To return to the original, right click on the mouse. There is also a simple zoom facility: simply change the values and then press enter. If the labels need to be edited (particularly the scale labels) then you should take the partition indicator matrix into the spreadsheet editor, remove or reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the dendrogram is an alternative cluster diagram. The columns have been
rearranged and labeled. A '·' in row labeled i column label j indicates that vertex
j is in a singleton cluster at level i. An 'X' indicates that vertex j is in a non-trivial
cluster at level i, all other members of j's cluster are found by tracing along the
row labeled i in both directions from column j until a space is encountered in
each direction. The column labels corresponding to an 'X' which are connected
to j's X are all members of j's cluster at level i.
An actor by actor exact similarity matrix. A k in row i column j means that actor
i and j were separated at level k, provided k is less than the value on the diagonal.
If k is equal to the value on the diagonal then i and j are exactly regularly
equivalent.
TIMING O(N^3).
COMMENTS None.
REFERENCES Everett M G and Borgatti S P (1996). Exact colorations of graphs and digraphs. Social Networks 18, 319-331.
NETWORK > ROLES & POSITIONS > EXACT > MAXSIM
PURPOSE Calculate a measure of approximate exact equivalence for valued data.
DESCRIPTION The sorted profile of vertex i of a valued network is the row vector of i with the
elements placed in ascending order. The maxsim distance is the Euclidean
distance between the sorted profile of a pair of vertices. For directed data the
column profiles are automatically concatenated on to the row profiles.
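The measure is simple to compute; the Python sketch below (illustrative only, undirected valued data, so only the row profiles are used) returns the matrix of maxsim distances.

def maxsim_distances(A):
    """Euclidean distances between sorted row profiles (sketch)."""
    n = len(A)
    profiles = [sorted(row) for row in A]         # sorted profile of each vertex
    return [[sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])) ** 0.5
             for j in range(n)] for i in range(n)]

A = [[0, 3, 1],
     [3, 0, 2],
     [1, 2, 0]]
print(maxsim_distances(A)[0][1])   # sorted profiles (0,1,3) and (0,2,3): distance 1.0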
PARAMETERS
Input dataset:
Name of file containing network to be analyzed. Data type: Valued graph. Binary
data is automatically converted to a reciprocal distance matrix.
LOG FILE Single link hierarchical clustering dendrogram (or tree diagram) of the maxsim
distance matrix. The level at which any pair of actors are aggregated is the point
at which both can be reached by tracing from the start to the actors from right to
left. The diagram can be printed or saved. Parts of the diagram can be viewed by
moving the mouse to the split point in a tree diagram or the beginning of a line in
the dendrogram and clicking. The first click will highlight a portion of the
diagram and the second click will display just the highlighted portion. To return to the original, right click on the mouse. There is also a simple zoom facility: simply change the values and then press enter. If the labels need to be edited (particularly the scale labels) then you should take the partition indicator matrix into the spreadsheet editor, remove or reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the plot is the actor by actor maxsim matrix. This is followed by an
alternative clustering diagram representing the same information as above. The
columns are rearranged and labeled. A '·' in column label j at level x means that
actor j is not in any cluster at level x. An x indicates that actor j is in a cluster at
this level together with those actors which can be traced across that row without
encountering a space.
TIMING O(N^3).
COMMENTS This algorithm is not suitable for data in which the values have low variance or
are sparse.
REFERENCES Everett M G (1985). 'Role similarity and complexity in social networks'. Social
Networks 7, 353-359.
DESCRIPTION Two actors are regularly equivalent if they are equally related to equivalent
others. REGE is an iterative algorithm; within each iteration a search is implemented to optimize a matching function.
The matching function between vertices i and j is based upon the following. For
each k in i's neighborhood search for an m in j's neighborhood of similar value.
A measure of similar values is based upon the absolute difference of magnitudes
of ties. This measure is then weighted by the degree of equivalence between k
and m at the previous iteration. It is this match that is optimized. This is
summed for all members of i's neighborhood over all relations and normalized to
provide the current iteration's measure of equivalence between i and j. The
procedure is repeated for all pairs of vertices for a fixed number of iterations.
PARAMETERS
Input dataset:
Name of file containing data to be analyzed Data type: Valued graph.
Multirelational.
Undirected data will give a trivial result with all non-isolate vertices being
equivalent.
LOG FILE Single link hierarchical clustering dendrogram (or tree diagram) of the regular
similarity measure. The level at which any pair of actors are aggregated is the
point at which both can be reached by tracing from the start to the actors from
right to left. The diagram can be printed or saved. Parts of the diagram can be
viewed by moving the mouse to the split point in a tree diagram or the beginning
of a line in the dendrogram and clicking. The first click will highlight a portion of
the diagram and the second click will display just the highlighted portion. To return to the original, right click on the mouse. There is also a simple zoom facility: simply change the values and then press enter. If the labels need to be edited (particularly the scale labels) then you should take the partition indicator matrix into the spreadsheet editor, remove or reduce the labels, and then submit the edited data to Tools>Dendrogram>Draw.
Behind the dendrogram is an alternative cluster diagram. The columns have been
rearranged and labeled. A '·' in row labeled i column label j indicates that vertex
j is in a singleton cluster at level i. An 'X' indicates that vertex j is in a non-trivial
cluster at level i, all other members of j's cluster are found by tracing along the
row labeled i in both directions from column j until a space is encountered in
each direction. The column labels corresponding to an 'X' which are connected
to j's X are all members of j's cluster at level i.
An actor by actor REGE similarity matrix. Values vary between 0 and 100. A
value of 100 indicates strict regular equivalence.
TIMING O(N^5).
COMMENTS The values obtained for non-equivalent vertices are not robust measures of equivalence. The number of iterations affects these values: there is little correlation between the values from one iteration to the next, even at the rank order level. This situation is improved if the number of iterations is increased. For these reasons users with binary or nominal data are advised to use CATEGORICAL REGE.
REFERENCES White D R (1984). REGE: A regular graph equivalence algorithm for computing
role distances prior to block modelling. Unpublished manuscript. University of
California, Irvine.
DESCRIPTION Two actors are regularly equivalent if they are equally related to equivalent
others. Nominal data is any integer valued adjacency matrix in which the value
represents a coding of the relationship in terms of a category.
For example, we could use 1 to represent close friend, 2 to represent friend and 3
to represent works with. The values 1, 2 and 3 DO NOT measure the strength of
the relationship, they simply refer to the categories.
Two actors are regularly equivalent for nominal data if in addition to the normal
regularity condition they relate to equivalent others in the same category.
For nominal data the initial categories are included at the first iteration. The
process is easily extended to multiple relations.
From this procedure a similarity matrix can be formed with entries which give
the value of the iteration at which vertices were separated into different
categories.
Initially the procedure places all vertices in the same category; or into user
specified categories. Subsequent iterations split the groups into hierarchical
clusters.
PARAMETERS
Input dataset
Name of file containing dataset to be analyzed. Data type: Valued graph - integer
values. Multirelational.
LOG FILE Single link hierarchical clustering dendrogram (or tree diagram) of the regular
similarity measure. The level at which any pair of actors are aggregated is the
point at which both can be reached by tracing from the start to the actors from
right to left. Each level corresponds to an iteration, level 1 represents the initial
clustering specified in PARAMETERS. The top level gives strict regular
equivalence clusters. The higher the level, the greater the degree of regular
equivalence. The diagram can be printed or saved. Parts of the diagram can be
viewed by moving the mouse to the split point in a tree diagram or the beginning
of a line in the dendrogram and clicking. The first click will highlight a portion of
the diagram and the second click will display just the highlighted portion. To
return to the original view, right-click the mouse. There is also a simple zoom
facility: simply change the values and then press enter. If the labels need to be
edited (particularly the scale labels), take the partition indicator matrix into the
spreadsheet editor, remove or reduce the labels, and then submit the edited data
to Tools>Dendrogram>Draw.
Behind the dendrogram is an alternative cluster diagram. The columns have been
rearranged and labeled. A '·' in the row labeled i, column labeled j, indicates that
vertex j is in a singleton cluster at level i. An 'X' indicates that vertex j is in a
non-trivial cluster at level i; all other members of j's cluster are found by tracing along the
row labeled i in both directions from column j until a space is encountered in
each direction. The column labels corresponding to an 'X' which are connected
to j's X are all members of j's cluster at level i.
An actor by actor exact similarity matrix. A k in row i column j means that actor
i and j were separated at level k, provided k is less than the value on the diagonal.
If k is equal to the value on the diagonal then i and j are regularly equivalent.
TIMING O(N^3).
COMMENTS None.
REFERENCES Borgatti S P and Everett M G (1989). The class of all regular equivalences:
algebraic structure and computation. Social Networks 11, 65-88.
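The refinement process described above can be sketched as follows. This is an illustrative
reconstruction rather than UCINET's own implementation: the function name and arguments are
hypothetical, a single integer-coded relation is assumed (held as a list-of-lists adjacency
matrix), and multirelational data would simply contribute further (category, class) pairs to
each profile.

# CATREGE-style refinement sketch: two vertices stay together at the next
# iteration only if they send and receive the same set of (tie category,
# partner class) pairs. The returned matrix records the iteration at which
# each pair was first separated, mirroring the exact similarity matrix
# described in the LOG FILE section above.
def categorical_rege(adj, init_classes=None, max_iter=None):
    n = len(adj)
    classes = list(init_classes) if init_classes else [0] * n
    sep = [[None] * n for _ in range(n)]            # iteration at which i and j split
    max_iter = max_iter or n
    for level in range(1, max_iter + 1):
        profiles = []
        for i in range(n):
            out_p = frozenset((adj[i][j], classes[j]) for j in range(n) if adj[i][j])
            in_p = frozenset((adj[j][i], classes[j]) for j in range(n) if adj[j][i])
            profiles.append((classes[i], out_p, in_p))
        new_classes = {}                            # identical profiles share a class
        relabel = [new_classes.setdefault(p, len(new_classes)) for p in profiles]
        for i in range(n):
            for j in range(n):
                if sep[i][j] is None and relabel[i] != relabel[j]:
                    sep[i][j] = level               # i and j first separated here
        if relabel == classes:                      # partition is stable
            break
        classes = relabel
    for i in range(n):
        for j in range(n):
            if sep[i][j] is None:                   # never separated: regularly equivalent
                sep[i][j] = level
    return classes, sep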
DESCRIPTION Two actors are regularly equivalent if they are equally related to equivalent
others. Given a partition of a network then the partition divides the adjacency
matrix into matrix blocks. In a binary matrix the partition is regular if each
block either contains all zeros (a zero-block) or at least one 1 in every row and
every column (a one-block). A measure of the extent to which a partition is
regular is therefore given by the minimum number of changes required to the
elements of the adjacency matrix to satisfy this criterion.
This cost function assumes that any block above a certain specified density will
be changed to a one-block and below this density to a zero-block. The routine
attempts to optimize this cost function to try to find the best partition of the
vertices into a specified number of blocks.
PARAMETERS
Input dataset
Name of file containing dataset to be analyzed. Data type: Digraph.
This search direction is a mildest ascent direction and from there new search
directions are explored. This exploration only continues for a fixed number of
iterations in a series. If no improvement is made after the fixed number of
iterations the algorithm terminates with the current minimum. Increasing the
parameter gives a more exhaustive and therefore slower search.
The larger the value, the more difficult it will be to come back to a previously
explored local minimum; however, it will also be more difficult to explore the
vicinity of that minimum.
LOG FILE The value of the cost function or fit. A value of zero represents exact regular
equivalence.
List of blocks. Each block is labeled and is specified by the vertices it contains.
The blocked adjacency matrix. The rows and columns of the original adjacency
matrix are permuted into blocks. The adjacency matrix is displayed in terms of
the matrix blocks it contains.
The algorithm seeks to find the minimum of the cost function. Even if successful,
this result may still have a high value, in which case the blocking may not
conform very closely to regular equivalence.
To test the robustness of the solution the algorithm should be run a number of
times from different starting configurations. If there is good agreement between
these results then this is a sign that there is a clear split of the data into the
reported blocks.
REFERENCES Glover F (1989). Tabu Search - Part I. ORSA Journal on Computing 1, 190-
206.
Glover F (1990). Tabu Search - Part II. ORSA Journal on Computing 2, 4-32.
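The cost function described above can be sketched as follows for a given partition. This is an
illustrative reading rather than UCINET's code: the threshold parameter, the function name and
the exact counting rule for one-blocks (the smallest number of added 1s that leaves no empty
row or column in the block) are assumptions, and the tabu search that moves vertices between
groups to minimize this count is not reproduced.

# Cost of a partition under the regular blockmodel criterion described above:
# a block whose density is at least `threshold` is treated as a one-block
# (cost = additions needed so every row and column contains a 1), otherwise
# as a zero-block (cost = number of 1s present). A cost of zero means the
# partition is exactly regular.
def regular_block_cost(adj, partition, threshold=0.5):
    groups = sorted(set(partition))
    members = {g: [i for i, p in enumerate(partition) if p == g] for g in groups}
    total = 0
    for g_r in groups:
        for g_c in groups:
            rows, cols = members[g_r], members[g_c]
            block = [[adj[i][j] for j in cols] for i in rows]
            ones = sum(sum(row) for row in block)
            cells = len(rows) * len(cols)
            if cells == 0:
                continue
            if ones / cells >= threshold:           # treat as a one-block
                empty_rows = sum(1 for row in block if not any(row))
                empty_cols = sum(1 for j in range(len(cols))
                                 if not any(row[j] for row in block))
                total += max(empty_rows, empty_cols)
            else:                                   # treat as a zero-block
                total += ones
    return total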
DESCRIPTION Three vertices u,v,w taken from a directed graph are transitive if whenever vertex
u is connected to vertex v and vertex v is connected to vertex w then vertex u is
connected to vertex w. The density of transitive triples is the number of triples
which are transitive divided by the number of paths of length 2, i.e. the number
of triples which have the potential to be transitive.
This definition can be extended to valued data. Strong transitivity occurs only if
the final edge is stronger than the two in the original path. This can be relaxed so
that the user can define the minimum value of the final edge (weak transitivity).
For distances transitivity can be defined in terms of the number of triples
satisfying the triangle inequality, and for probabilities in terms of the product of
probabilities of the edges.
PARAMETERS
Input dataset
Name of file containing dataset to be analyzed. Data type: Valued graph.
Adjacency - A triple xik,xij,xjk is transitive if xik is 1 whenever xij and xjk are both
1.
LOG FILE Number of non-vacuous transitive triples, number of triples, number of triples in
which i-j-k is a path, then the number of non-vacuous transitive triples expressed
as a percentage of the number of triples and of the number of triples in which
i-j-k is a path.
TIMING O(N^3).
REFERENCES None.
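For binary adjacency data, the density of transitive triples described above can be sketched as
follows; this is an illustration rather than UCINET's routine, and the valued (strong and weak)
variants are not covered.

# Count ordered triples i-j-k of distinct vertices that form a path of length
# two, and how many of those are closed by a direct i->k tie; the ratio is the
# density of transitive triples defined in the DESCRIPTION above.
def transitivity(adj):
    n = len(adj)
    paths = transitive = 0
    for i in range(n):
        for j in range(n):
            if i == j or not adj[i][j]:
                continue
            for k in range(n):
                if k in (i, j):
                    continue
                if adj[j][k]:                       # i-j-k is a path of length 2
                    paths += 1
                    if adj[i][k]:                   # closed by an i->k tie
                        transitive += 1
    return transitive, paths, (100.0 * transitive / paths if paths else 0.0)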
NETWORK > PROPERTIES > DENSITY
PURPOSE Calculate the density of a network or matrix.
DESCRIPTION The density of a binary network is the total number of ties divided by the total
number of possible ties. For a valued network it is the total of all values divided
by the number of possible ties. In this case the density gives the average value.
The routine will perform the analysis for non-square matrices.
PARAMETERS
Input dataset
Name of file containing dataset to be analyzed. Data type: Valued graph.
TIMING O(N^2)
COMMENTS None.
REFERENCES None.
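The calculation described above can be sketched as follows for a square matrix with the
diagonal ignored; this is an illustration rather than UCINET's routine, which also handles
non-square matrices.

# Density: total of all values divided by the number of possible ties, which
# for binary data is the proportion of ties present and for valued data the
# average tie value.
def density(adj):
    n = len(adj)
    total = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    possible = n * (n - 1)
    return total / possible if possible else 0.0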
NETWORK>PROPERTIES>E-I INDEX
PURPOSE Calculate the E-I index of a partition of a network and perform a permutation
test to evaluate its significance.
DESCRIPTION Given a partition of a network into a number of mutually exclusive groups then
the E-I index is the number of ties external to the groups minus the number of
ties that are internal to the group divided by the total number of ties. This value
can range from 1 to -1, but for a given network density and group sizes its range
may be restricted and so it can be rescaled. The index is also calculated for each
group and for each individual actor. A permutation test is performed to see
whether the network E-I index is significantly higher or lower than expected.
PARAMETERS
Input Dataset
Name of UCINET dataset to be analyzed. Data type: Valued Graph.
Attribute
The name of an UCINET dataset that contains a partition of the actors. To
partition the data matrix into groups specify a vector by giving the dataset name,
a dimension (either row or column) and an integer value. For example, to use the
second row of a dataset called ATTRIB, enter "ATTRIB ROW 2". The program
will then read the second row of ATTRIB and use that information to define the
groups. All actors with identical values on the criterion vector (i.e. the second
row of ATTRIB) will be placed in the same group.
LOG FILE Recoding of the attribute vector used to partition the dataset followed by a
blocked density matrix corresponding to the groups.
A table which gives the whole-network results. The first column gives the
frequencies in the observed data, followed by a column that gives these
frequencies as a percentage of the total number of ties in the data; the third
column gives the maximum possible given the group sizes; and the final column,
headed density, gives the observed divided by the maximum possible for the
internal and external ties. The final entry in the E-I column gives the value the
E-I index would take if all the observed ties had been evenly spread within and
between the groups, i.e. the expected value. The important values from the table
are then reproduced together with the rescaled E-I index.
The results of the permutation test are presented in a table. The observed values
are repeated in column 1, and the next four columns give the minimum, mean,
maximum and standard deviation derived from the permutation test. This is
followed by the number of times the permutation test obtains a value greater
than or equal to the observed and less than or equal to the observed. These are
expressed as probabilities and can be used as p-values.
TIMING O(N)
COMMENTS None
REFERENCES Krackhardt, David and Robert N. Stern (1988). Informal networks and
organizational crises: an experimental simulation. Social Psychology Quarterly
51(2), 123-140.
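The basic whole-network index described above can be sketched as follows for binary data; this
is an illustration rather than UCINET's routine, and the group-level, actor-level, rescaled and
permutation-test output is not reproduced.

# E-I index: (external ties - internal ties) / total ties, given a vector of
# group labels, one per actor.
def ei_index(adj, groups):
    external = internal = 0
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                if groups[i] == groups[j]:
                    internal += 1
                else:
                    external += 1
    total = external + internal
    return (external - internal) / total if total else 0.0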
NETWORK>PROPERTIES>CLUSTERING COEFFICIENT
PURPOSE Calculate the clustering coefficient of every actor and the clustering and
weighted clustering coefficient of the whole network.
DESCRIPTION The clustering coefficient of an actor is the density of its open neighborhood. The
overall clustering coefficient is the mean of the clustering coefficient of all the
actors. The weighted overall clustering coefficient is the weighted mean of the
clustering coefficient of all the actors each one weighted by its degree. This last
figure is exactly the same as the transitivity index, that is the number of transitive
triples expressed as a percentage of the number of triples in which i-j-k is a path.
See NETWORK>PROPERTIES>TRANSITIVITY.
PARAMETERS
Input network dataset:
Name of file containing dataset to be analyzed. Data type: Digraph.
LOG FILE The overall clustering coefficient and the weighted overall clustering coefficient.
A table with the actor level clustering coefficient together with their degree.
TIMING O(N^2)
COMMENTS None.
REFERENCES Watts D J (1999) Small worlds. Princeton University Press, Princeton, New
Jersey.
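The coefficients described above can be sketched as follows, treating the data as undirected
binary ties; this is an illustration rather than UCINET's routine, and weighting each actor by
its number of pairs of neighbours is one reading of the degree weighting mentioned above.

# Clustering coefficient of each actor (density of the subgraph induced by its
# neighbours), the overall mean, and the weighted overall coefficient.
def clustering_coefficients(adj):
    n = len(adj)
    per_actor, num, den = [], 0.0, 0.0
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and (adj[i][j] or adj[j][i])]
        pairs = len(nbrs) * (len(nbrs) - 1) / 2
        ties = sum(1 for a in range(len(nbrs)) for b in range(a + 1, len(nbrs))
                   if adj[nbrs[a]][nbrs[b]] or adj[nbrs[b]][nbrs[a]])
        cc = ties / pairs if pairs else 0.0
        per_actor.append(cc)
        num += cc * pairs                           # weighted numerator
        den += pairs
    overall = sum(per_actor) / n if n else 0.0
    weighted = num / den if den else 0.0
    return per_actor, overall, weighted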
2-MODE>CATEGORICAL CORE/PERIPHERY
PURPOSE Uses a genetic algorithm to fit a core/periphery model to two mode data.
DESCRIPTION Simultaneously fits a core/periphery model to the data network, and identifies
which actors belong in the core and which belong in the periphery and which
events belong in the core and which events belong in the periphery. The rows and
columns are partitioned independently. The fit is simply the correlation between
the data matrix and an idealized structure matrix in which there is a one in the
core block interactions and a zero in the peripheral block interactions.
PARAMETERS
Input dataset:
Name of file containing two-mode network to be analyzed. Data type: Matrix.
LOG FILE The starting and the final correlation of the ideal structure and the permuted
incidence matrix. A blocked incidence matrix dividing the actors and events
independently into the core and periphery.
The algorithm seeks to find the maximum of the cost function. Even if successful,
this result may still be a low value, in which case the partition may not represent a
core/periphery model.
To test the robustness of the solution the algorithm should be run a number of
times from different starting configurations. If there is good agreement between
these results then this is a sign that there is a clear split of the data into a
core/periphery structure.
PARAMETERS
Input dataset:
Name of file containing two-mode network to be analyzed. Data type: Matrix.
LOG FILE The starting and the final correlation of the ideal structure and the permuted
incidence matrix. A blocked incidence matrix dividing the rows and columns
independently into two clusters each.
The algorithm seeks to find the maximum of the cost function. Even if successful,
this result may still be a low value, in which case the partition may not have
found cohesive clusters.
To test the robustness of the solution the algorithm should be run a number of
times from different starting configurations. If there is good agreement between
these results then this is a sign that there is a clear split of the data into
subgroups.
See Factions .
REFERENCES Borgatti SP and Everett M G (1997) Network analysis of 2-mode data. Social
Networks 19 243-269.
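The fit described above can be sketched as follows for a given split of the rows and columns;
this is an illustration rather than UCINET's routine. Treating the core/periphery and
periphery/core blocks as zeros in the ideal matrix is an assumption here, and the genetic
search over partitions is not reproduced.

# Pearson correlation between the observed two-mode matrix and an ideal
# structure with 1 where both the row and the column are in the core and 0
# elsewhere. row_core and col_core are lists of booleans marking core members.
def core_periphery_fit(data, row_core, col_core):
    xs, ys = [], []
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            xs.append(value)
            ys.append(1.0 if row_core[i] and col_core[j] else 0.0)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0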
DL LANGUAGE
The DL protocol is specified below: each command is listed, followed by a
description of its usage. Examples of importing are given in the UCINET
user's guide.
DL
DESCRIPTION Identifies the file as a Data Language file. This is a required command.
SYNTAX DL
N
SYNTAX N = <integer>
COMMENTS Should be placed before any phrases that can only be interpreted if the number
of rows or columns is already known. For example, it should be placed before
any command regarding labels.
NR
SYNTAX NR = <integer>
COMMENTS Should be placed before any commands that depend on the number of rows,
such as the ROW LABELS command.
NM
SYNTAX NM = <integer>
COMMENTS Should be placed before any commands that depend on the number of matrices,
such as the MATRIX LABELS command.
ROW LABELS:
DESCRIPTION Indicates the start of a series of row labels. The labels may be up to 18
characters in length (if longer they are truncated). They must be separated by
spaces, carriage returns, equal signs or commas. Labels with embedded spaces
are not advisable, but can be entered by surrounding the label in quotes (e.g.,
"Humpty Dumpty"). Labels are automatically converted to uppercase.
COLUMN LABELS:
DESCRIPTION Indicates the start of a series of column labels. The labels may be up to 18
characters in length (if longer they are truncated). They must be separated by
spaces, carriage returns, equal signs or commas. Labels with embedded spaces
are not advisable, but can be entered by surrounding the label in quotes (e.g.,
"Humpty Dumpty"). Labels are automatically converted to uppercase.
LABELS:
DESCRIPTION Indicates the start of a series of labels applicable to both the rows and the
columns. Warning: The matrix must be square! The labels may be up to 18
characters in length (if longer they are truncated). They must be separated by
spaces, carriage returns, equal signs or commas. Labels with embedded spaces
are not advisable, but can be entered by surrounding the label in quotes (e.g.,
"Humpty Dumpty"). Labels are automatically converted to uppercase.
SYNTAX LABELS:
MATRIX LABELS:
DESCRIPTION Signals the start of a series of matrix labels. The labels may be up to 19
characters in length (if longer they are truncated). They must be separated by
spaces, carriage returns, equal signs or commas. Labels with embedded spaces
are NOT advisable, but can be entered by surrounding the label in quotes (e.g.,
"Humpty Dumpty"). Labels are automatically converted to uppercase.
EMBEDDED
DESCRIPTION If present, this keyword always follows the word LABELS, as in ROW
LABELS EMBEDDED or LABELS EMBEDDED. It indicates that dimension
labels are found embedded in the data itself. For example in the case of ROW
LABELS EMBEDDED, it means the first item (up to a blank or comma) in
every line of the data is a row label. In the case of COL LABELS
EMBEDDED, it indicates that the first line of data should be treated as column
labels.
COMMENTS None.
FORMAT
DESCRIPTION Identifies the layout of the data. The following formats are available:
FULLMATRIX. Indicates the data are in the form of a matrix. This is the default
format. Example (with DIAGONAL = PRESENT):
2110
1201
0020
1002
UPPERHALF. The data consist of the values xij where j > i or j ≥ i. Only the
values in the upper right triangle of a square matrix are included. The diagonal
may or may not be included, depending on the value of the DIAGONAL
parameter. Example (with DIAGONAL = PRESENT):
2110
201
20
2
LOWERHALF. The data consist of the values xij where j < i or j ≤ i. Only the
values in the lower left triangle of a square matrix are included. The diagonal
may or may not be included, depending on the value of the DIAGONAL
parameter. Example (with DIAGONAL = PRESENT):
2
12
002
1002
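Expanding one of these half-matrix layouts into a full matrix can be sketched as follows; this
is an illustration of the layout only (assuming the diagonal is present and the data are
symmetric), not UCINET's importer.

# Rebuild a full square matrix from LOWERHALF rows: row i holds the values
# x(i,0) .. x(i,i), and each value is mirrored into the upper triangle.
def expand_lowerhalf(rows):
    n = len(rows)
    full = [[0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value                      # mirror across the diagonal
    return full

# Example: the four LOWERHALF lines shown above give a symmetric 4-by-4 matrix.
matrix = expand_lowerhalf([[2], [1, 2], [0, 0, 2], [1, 0, 0, 2]])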
NODELIST1. This is used to read 1/0 matrices only. Each line of data consists of
a row number (call it i) followed by a list of column numbers (call each one j)
such that xij = 1. For example, the following matrix
1110
1101
0000
1001
could be represented in NODELIST1 format as:
1 3 2 1
4 1 4
2 2 4 1
NODELIST1B. This is used to read 1/0 matrices only. Each line of data
corresponds to a matrix row (call it i). The first number on the line is the number
of non-zero cells in that row. This is followed by a list of column numbers (call
each one j) such that xij = 1. For example, the following matrix
1110
1101
0000
1001
could be represented in NODELIST1B format as:
3 1 2 3
3 1 2 4
0
2 1 4
Note that rows must appear in numerical order, and none may be skipped (unlike
the NODELIST1 format).
EDGELIST1. This format is used to read in data forming a matrix in which the
rows and columns refer to the same kinds of objects (e.g., an illness-by-illness
proximity matrix, or a person-by-person network). The 1-mode matrix X is built
from pairs of indices (a row and a column indicator). Pairs are typed one to a
line, with indices separated by spaces or commas. The presence of a pair i,j
indicates that there is a link from i to j, which is to say a non-zero value in xij.
Optionally, the pair may be followed by a value representing an attribute of the
link, such as its strength or quality. If no value is present, it is assumed to be 1.0.
If a pair is omitted altogether, it is assigned a value of 0.0. For example, the
following matrix,
0053
0000
0000
0100
could be represented in EDGELIST1 format as:
1 3 5.0
4 2
1 4 3.0
Labels may be used instead of index numbers, as follows:
Amy Cathy 5
Denise Bonnie
Amy Denise 3
If the datafile includes a LABELS statement with the labels (Amy, Cathy,
Bonnie, Denise), in that order, the matrix will look like the matrix shown above.
However, if a LABELS statement is not present, then the program will assign
labels to rows/columns in the order in which they are encountered {Amy, Cathy,
Denise, Bonnie}. So the matrix will look like this:
0035
0000
0100
0000
If you do include labels as part of a LABELS statement, they must match the
labels in the data exactly. Otherwise, the labels in the data will be considered
additional nodes. Also, since the EDGELIST1 format automatically accepts
labels as part of the data, the LABELS=EMBEDDED statement is not necessary
(but doesn't hurt).
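Reading the EDGELIST1 layout can be sketched as follows; this illustrates the layout only and
is not UCINET's importer: it handles just the data lines (a pair with an optional value), not
the DL header keywords, quoting rules or matching against a LABELS statement.

# Build a sparse matrix from EDGELIST1-style lines; omitted pairs stay 0 and a
# missing value defaults to 1.0, as described above. Node labels are recorded
# in the order in which they are encountered.
def read_edgelist1(lines):
    matrix, nodes = {}, []
    for line in lines:
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue
        i, j = parts[0], parts[1]
        value = float(parts[2]) if len(parts) > 2 else 1.0
        for node in (i, j):
            if node not in nodes:
                nodes.append(node)
        matrix.setdefault(i, {})[j] = value
    return nodes, matrix

# Example: the Amy/Cathy/Denise/Bonnie fragment above yields the node order
# {Amy, Cathy, Denise, Bonnie} and the three listed values.
nodes, matrix = read_edgelist1(["Amy Cathy 5", "Denise Bonnie", "Amy Denise 3"])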
EDGELIST2. This is used to read in data forming a matrix in which the rows
and columns refer to different kinds of objects (e.g., illnesses and treatments).
The 2-mode matrix X is built from pairs of indices (a row and a column
indicator). Pairs are typed one to a line, with indices separated by spaces or
commas. The presence of a pair i,j indicates that there is a link from row i to
column j, which is to say a non-zero value in xij. Optionally, the pair may be
followed by a value representing an attribute of the link, such as its strength or
quality. If no value is present, it is assumed to be 1.0. If a pair is omitted
altogether, it is assigned a value of 0.0. For example, the following matrix,
6 4
3 5
7 9
could be represented in EDGELIST2 format as:
1 1 6
2 1 3
3 2 9
3 1 7
1 2 4
The row index is always given first, followed by the column index. Index labels
may be used instead of index numbers, as follows:
afghan size 6
beagle size 3
chow ferocity 9
chow size 7
afghan ferocity 4
For further details concerning labels, see the description of the EDGELIST1
format.
BLOCKMATRIX. The data are given as a series of blocks; each block specifies a
set of rows, a set of columns and a value which is assigned to every cell in the
block. For example, the following matrix,
21111000
12111000
11211000
11121000
11112000
00000211
00000121
00000112
could be represented as:
rows 1 to 8
cols 1 to 8
value = 0
rows 1 to 5
cols 1 to 5
value = 1
rows 5 6 7 8
cols 5 to 8
value = 1
diagonal 0
value = 2
The first three lines of data assign a value of 0 to all cells in the matrix. The next
three lines, isolate the top left quadrant of the matrix and assign all cells a value
of 1. The next three lines do the same for the bottom right quadrant. The last two
lines give a value of 2 to every cell along the main or 0th diagonal.
PARTITION. The data consist of lists of items belonging to the same class or pile,
with the pound sign (#) separating one partition from the next. For example, the data
1 2
3 4
#
1 2 3
4
produce the following two matrices:
1100
1100
0011
0011
and
1110
1110
1110
0001
The first line of data ("1 2") indicates that items 1 and 2 belong in the same class
or pile. The second line indicates that 3 and 4 belong together. The pound sign
(#) separates one partition from another.
FULLMATRIX|FM
UPPERHALF|UH
LOWERHALF|LH
NODELIST1|NL1
NODELIST2|NL2
NODELIST1B|NL1B
EDGELIST1|EL1
EDGELIST2|EL2
BLOCKMATRIX|BM
PARTITION|PT|PS|PR
COMMENTS None.
DIAGONAL
DESCRIPTION For square matrices, indicates whether the main diagonal is present or absent.
The default is present. If absent, the program expects that diagonal values will
have been omitted from the file. Example of a 4-by-4 matrix with no diagonal:
  2 3 4
5   7 8
9 1   3
4 5 6
COMMENTS None.
BERNARD & KILLWORTH FRATERNITY
DATASET BFRAT
BACKGROUND Bernard & Killworth, later with the help of Sailer, collected five sets of data on
human interactions in bounded groups and on the actors' ability to recall those
interactions. In each study they obtained measures of social interaction among
all actors, and ranking data based on the subjects' memory of those interactions.
The names of all cognitive (recall) matrices end in C, those of the behavioral
measures in B.
DATASET BKHAM
BACKGROUND Bernard & Killworth, later with the help of Sailer, collected five sets of data on
human interactions in bounded groups and on the actors' ability to recall those
interactions. In each study they obtained measures of social interaction among
all actors, and ranking data based on the subjects' memory of those interactions.
The names of all cognitive (recall) matrices end in C, those of the behavioral
measures in B.
BKHAMB records amateur HAM radio calls made over a one-month period, as
monitored by a voice-activated recording device. BKHAMC contains rankings
by the operators of how frequently they talked to other operators, judged
retrospectively at the end of the one-month sampling period. Values range from
0, meaning no interaction, up to a maximum of 9.
DATASET BKOFF
BACKGROUND Bernard & Killworth, later with the help of Sailer, collected five sets of data on
human interactions in bounded groups and on the actors' ability to recall those
interactions. In each study they obtained measures of social interaction among
all actors, and ranking data based on the subjects' memory of those interactions.
The names of all cognitive (recall) matrices end in C, those of the behavioral
measures in B.
DATASET BKTEC
BACKGROUND Bernard & Killworth, later with the help of Sailer, collected five sets of data on
human interactions in bounded groups and on the actors' ability to recall those
interactions. In each study they obtained measures of social interaction among all
actors, and ranking data based on the subjects' memory of those interactions. The
names of all cognitive (recall) matrices end in C, those of the behavioral
measures in B.
DATASET DAVIS
BACKGROUND These data were collected by Davis et al in the 1930s. They represent observed
attendance at 14 social events by 18 Southern women. The result is a person-by-
event matrix: cell (i,j) is 1 if person i attended social event j, and 0 otherwise.
REFERENCES Breiger R. (1974). The duality of persons and groups. Social Forces, 53, 181-
190.
PURPOSE Toggle the output log between over-write and append mode; or change the name of
the log file.
DESCRIPTION The output from running any routine is placed in an ASCII file called
OUTPUT.LOG. It is this file that is used in all of the commands under the menu
heading OUTPUT. This file is usually over-written each time a new routine is
run; however, UCINET allows the user to append each run to this file and
therefore keep a complete log of all output.
PARAMETERS
LOG FILE OVERWRITES or APPENDS: (Default = OVERWRITE).
OVERWRITE causes the current contents of the log file to be deleted each time
a new option from the menu is run.
APPEND causes the output from each procedure to be added to the log file.
TIMING Constant.
COMMENTS None.
REFERENCES None.
GAGNON & MACRAE PRISON
DATASET PRISON
BACKGROUND In the 1950s John Gagnon collected sociometric choice data from 67 prison
inmates. All were asked, "What fellows on the tier are you closest friends with?"
Each was free to choose as few or as many "friends" as he desired. The data were
analyzed by MacRae and characterized by him as "less clear cut" in their internal
structure than similar data from schools or residential populations.
REFERENCE MacRae J. (1960). Direct factor analysis of sociometric data. Sociometry, 23,
360-371.
KAPFERER MINE
DATASET KAPMINE
BACKGROUND Bruce Kapferer (1969) collected data on men working on the surface in a mining
operation in Zambia (then Northern Rhodesia). He wanted to account for the
development and resolution of a conflict among the workers. The conflict
centered on two men, Abraham and Donald; most workers ended up supporting
Abraham.
Kapferer observed and recorded several types of interactions among the workers,
including conversation, joking, job assistance, cash assistance and personal
assistance. Unfortunately, he did not publish these data. Instead, the matrices
indicate the workers joined only by uniplex ties (based on one relationship only,
KAPFMU) or those joined by multiple-relation or multiplex ties (KAPFMM).
DATASET KAPTAIL
BACKGROUND Bruce Kapferer (1972) observed interactions in a tailor shop in Zambia (then
Northern Rhodesia) over a period of ten months. His focus was the changing
patterns of alliance among workers during extended negotiations for higher
wages.
The data are particularly interesting since an abortive strike occurred after the
first set of observations, and a successful strike took place after the second.
DATASET KNOKBUR
BACKGROUND In 1978, Knoke & Wood collected data from workers at 95 organizations in
Indianapolis. Respondents indicated with which other organizations their own
organization had any of 13 different types of relationships.
REFERENCES Knoke D. and Wood J. (1981). Organized for action: Commitment in voluntary
associations. New Brunswick, NJ: Rutgers University Press.
Knoke D. and Kuklinski J. (1982). Network analysis, Beverly Hills, CA: Sage.
KRACKHARDT OFFICE CSS
DESCRIPTION Each file contains twenty-one 21x21 matrices. Matrix n gives actor n's
perception of the whole network.
BACKGROUND David Krackhardt collected cognitive social structure data from 21 management
personnel in a high-tech, machine manufacturing firm to assess the effects of a
recent management intervention program. The relation queried was "Who does
X go to for advice and help with work?" (KRACKAD) and "Who is a friend of
X?" (KRACKFR). Each person indicated not only his or her own advice and
friendship relationships, but also the relations he or she perceived among all
other managers, generating a full 21 by 21 matrix of adjacency ratings from each
person in the group.
DATASET NEWFRAT
BACKGROUND These 15 matrices record weekly sociometric preference rankings from 17 men
attending the University of Michigan in the fall of 1956; data from week 9 are
missing. A "1" indicates first preference, and no ties were allowed.
The men were recruited to live in off-campus (fraternity) housing, rented for
them as part of the Michigan Group Study Project supervised by Theodore
Newcomb from 1953 to 1956. All were incoming transfer students with no prior
acquaintance of one another.
REFERENCES Newcomb T. (1961). The acquaintance process. New York: Holt, Rinehart &
Winston.
White H., Boorman S. and Breiger R. (1977). Social structure from multiple
networks, I. Blockmodels of roles and positions. American Journal of Sociology,
81, 730-780.
PADGETT FLORENTINE FAMILIES
DESCRIPTION PADGETT
PADGW
BACKGROUND Breiger & Pattison (1986), in their discussion of local role analysis, use a subset
of data on the social relations among Renaissance Florentine families (person
aggregates) collected by John Padgett from historical documents. The two
relations are business ties (PADGB - specifically, recorded financial ties such as
loans, credits and joint partnerships) and marriage alliances (PADGM).
As Breiger & Pattison point out, the original data are symmetrically coded. This
is acceptable perhaps for marital ties, but is unfortunate for the financial ties
(which are almost certainly directed). To remedy this, the financial ties can be
recoded as directed relations using some external measure of power - for
instance, a measure of wealth. PADGW provides information on (1) each
family's net wealth in 1427 (in thousands of lira); (2) the number of priorates
(seats on the civic council) held between 1282-1344; and (3) the total number of
business or marriage ties in the total dataset of 116 families (see Breiger &
Pattison (1986), p 239).
Substantively, the data include families who were locked in a struggle for
political control of the city of Florence in around 1430. Two factions were
dominant in this struggle: one revolved around the infamous Medicis (9), the
other around the powerful Strozzis (15).
REFERENCES Breiger R. and Pattison P. (1986). Cumulated social roles: The duality of persons
and their algebras. Social Networks, 8, 215-256.
Kent D. (1978). The rise of the Medici: Faction in Florence, 1426-1434. Oxford:
Oxford University Press.
READ HIGHLAND TRIBES
DATASET GAMA
BACKGROUND Hage & Harary (1983) use the Gahuku-Gama system of the Eastern Central
Highlands of New Guinea, described by Read (1954), to illustrate a clusterable
signed graph. Read's ethnography portrayed an alliance structure among three
tribal groups containing balance as a special case; among Gahuku-Gama the
enemy of an enemy can be either a friend or an enemy.
The signed graph has been split into two matrices: GAMAPOS for alliance
("rova") relations, GAMANEG for antagonistic ("hina") relations. To reconstruct
the signed graph, multiply GAMANEG by -1, and add the two matrices.
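That reconstruction can be sketched as follows, assuming the two matrices are available as
numeric arrays (the file names in the usage comment are hypothetical).

# Combine the two relations into a single signed adjacency matrix:
# +1 for an alliance ("rova") tie, -1 for an antagonistic ("hina") tie, 0 otherwise.
import numpy as np

def reconstruct_signed_graph(gamapos, gamaneg):
    return np.asarray(gamapos) - np.asarray(gamaneg)

# e.g. signed = reconstruct_signed_graph(np.loadtxt("gamapos.txt"), np.loadtxt("gamaneg.txt"))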
DATASET WIRING
BACKGROUND These are the observational data on 14 Western Electric (Hawthorne Plant)
employees from the bank wiring room first presented in Roethlisberger &
Dickson (1939). The data are better known through a scrutiny made of the
interactions in Homans (1950), and the CONCOR analyses presented in Breiger
et al (1975).
The employees worked in a single room and include two inspectors (I1 and I3),
three solderers (S1, S2 and S3), and nine wiremen or assemblers (W1 to W9).
The interaction categories include: RDGAM, participation in horseplay;
RDCON, participation in arguments about open windows; RDPOS, friendship;
RDNEG, antagonistic (negative) behavior; RDHLP, helping others with work;
and RDJOB, the number of times workers traded job assignments.
REFERENCES Breiger R., Boorman S. and Arabie P. (1975). An algorithm for clustering
relational data with applications to social network analysis and comparison with
multidimensional scaling. Journal of Mathematical Psychology, 12, 328-383.
DATASET SAMPSON
BACKGROUND Sampson recorded the social interactions among a group of monks while
he was in residence as an experimenter on vision, and collected numerous sociometric
rankings. The labels on the data have the abbreviated names followed by the
codings used by Breiger and Boorman in all their work. During his stay, a
political "crisis in the cloister" resulted in the expulsion of four monks (Nos. 2,
3, 17, and 18) and the voluntary departure of several others - most immediately,
Nos. 1, 7, 14, 15, and 16. (In the end, only 5, 6, 9, and 11 remained). All the
numbers used refer to the Boorman and Breiger numbering and are not row or
column labels. Hence in the end Bonaventure, Berthold, Ambrose and Louis all
remained.
Most of the present data are retrospective, collected after the breakup occurred.
They concern a period during which a new cohort entered the monastery near
the end of the study but before the major conflict began. The exceptions are
"liking" data gathered at three times: SAMPLK1 to SAMPLK3 - that reflect
changes in group sentiment over time (SAMPLK3 was collected in the same
wave as the data described below). Information about the senior monks was not
included.
Four relations are coded, with separate matrices for positive and negative ties on
the relation. Each member ranked only his top three choices on that tie. The
relations are esteem (SAMPES) and disesteem (SAMPDES), liking (SAMPLK)
and disliking (SAMPDLK), positive influence (SAMPIN) and negative
influence (SAMPNIN), praise (SAMPPR) and blame (SAMPNPR). In all
rankings 3 indicates the highest or first choice and 1 the last choice. (Some
subjects offered tied ranks for their top four choices).
REFERENCES Breiger R., Boorman S. and Arabie P. (1975). An algorithm for clustering
relational data with applications to social network analysis and comparison with
multidimensional scaling. Journal of Mathematical Psychology, 12, 328-383.
DATASET TARO
BACKGROUND These data represent the relation of gift-giving (taro exchange) among 22
households in a Papuan village. Hage & Harary (1983) used them to illustrate a
graph Hamiltonian cycle. Schwimmer points out how these ties function to
define the appropriate persons to mediate the act of asking for or receiving
assistance among group members.
BACKGROUND These data come from a six-year research project, concluded in 1976, on
corporate power in nine European countries and the United States. Each matrix
represents corporate interlocks among the major business entities of two
countries - the Netherlands (SZCID) and West Germany (SZCIG).
The volume describing this study, referenced below, includes six chapters on
network theoretical and analytical issues related to data of this type.
REFERENCES Ziegler R., Bender R. and Biehler H. (1985). Industry and banking in the
German corporate network. In F. Stokman, R. Ziegler & J. Scott (eds), Networks
of corporate power. Cambridge: Polity Press, 1985.
Stokman F., Wasseur F. and Elsas D. (1985). The Dutch network: Types of
interlocks and network structure. In F. Stokman, R. Ziegler & J. Scott (eds),
Networks of corporate power. Cambridge: Polity Press, 1985.
THURMAN OFFICE
DATASET THUROFF
BACKGROUND Thurman spent 16 months observing the interactions among employees in the
overseas office of a large international corporation. During this time, two major
disputes erupted in a subgroup of fifteen people. Thurman analyzed the outcome
of these disputes in terms of the network of formal and informal associations
among those involved.
THURA shows the formal organizational chart of the employees and THURM
the actors linked by multiplex ties.
REFERENCE Thurman B. (1979). In the office: Networks and coalitions. Social Networks, 2,
47-63.
WOLFE PRIMATES
WOLFK indicates the putative kin relationships among the animals (e.g., 18 may be
the granddaughter of 19). WOLFI contains four columns of information about the
individual animals: (1) ID number of the animal; (2) age in years; (3) sex; (4)
rank in the troop.
ZACHARY KARATE CLUB
DATASET ZACHARY
BACKGROUND These are data collected from the members of a university karate club by Wayne
Zachary. The ZACHE matrix represents the presence or absence of ties among
the members of the club; the ZACHC matrix indicates the relative strength of
the associations (number of situations in and outside the club in which
interactions occurred).
Zachary (1977) used these data and an information flow model of network
conflict resolution to explain the split-up of this group following disputes among
the members.
REFERENCE Zachary W. (1977). An information flow model for conflict and fission in small
groups. Journal of Anthropological Research, 33, 452-473.
KRACKHARDT HIGH-TECH MANAGERS
DATASET Krack-High-Tec, High-Tec-Attributes
BACKGROUND These are data collected from the managers of a high-tech company. The company
manufactured high-tech equipment on the west coast of the United States and had
just over 100 employees with 21 managers. Each manager was asked "To whom do
you go for advice?" and "Who is your friend?"; the "To whom do you report?"
relation was taken from company documents. In addition attribute information was
collected. This consisted of the manager's age (in years), length of service or
tenure (in years), level in the corporate hierarchy (coded 1, 2 and 3; 1 = CEO,
2 = Vice President, 3 = manager) and department (coded 1, 2, 3, 4, with the CEO
in department 0, i.e. not in a department). These data are used by Wasserman and
Faust in their network analysis book.
BACKGROUND These data arose from an early experiment on computer mediated communication.
Fifty academics interested in interdisciplinary research were allowed to contact
each other via an Electronic Information Exchange System (EIES). The data
collected consisted of all messages sent plus acquaintance relationships at two
time periods (collected via a questionnaire). The data include the 32 actors who
completed the study. In addition, attribute data on primary discipline and number
of citations were recorded. TIME_1 and TIME_2 give the acquaintance
information at the beginning and end of the study. This is coded as follows: 4 =
close personal friend, 3 = friend, 2 = person I've met, 1 = person I've heard of but
not met, and 0 = person unknown to me (or no reply). NUMBER_OF_MESSAGES
is the total number of messages person i sent to j over the entire period of the
study. The attribute data give the number of citations of the actors' work in the
social science citation index at the beginning of the study, together with a
discipline code: 1 = Sociology, 2 = Anthropology, 3 = Mathematics/Statistics,
4 = other. These data are used by Wasserman and Faust in their network analysis
book.
REFERENCES Freeman, S C and L C Freeman (1979). The networkers network: A study of the
impact of a new communications medium on sociometric structure. Social
Science Research Reports No 46. Irvine CA, University of California.
BACKGROUND This data has been selected by Wasserman and Faust (1994) from a list of 63
countries given by Smith and White (1988). The selection was intended to be a
representative sample of countries which spanned the globe physically,
economically and politically and was used by them in their network analysis
book. The data record the interaction of the countries with respect to trade in four
goods, namely: manufactured goods, food and live animals, crude materials (not
food), and minerals and fuels. The final matrix records the exchange of diplomats
between the countries. All trade (including the diplomats) is from the row to the
column. The Trade_Attribute data lists average population growth between 1970
and 1981, average GNP growth (per capita) over the same period, secondary
school enrollment ratio in 1981, and energy consumption in 1981 (in kilo coal
equivalents per capita).
REFERENCES Smith D and D White (1988). Structure and dynamics of the global economy:
Network analysis of international trade 1965-1980. Unpublished Manuscript.
DATASET CAMP92
BACKGROUND These data were collected by Steve Borgatti, Russ Bernard, Bert Pelto and Gery
Ryan at the 1992 NSF Summer Institute on Research Methods in Cultural
Anthropology. This was a 3 week course given to 14 carefully selected
participants. Network data were collected at the end of each week. These data
were collected at the end of the second week. The data were collected by placing
each person's name on a card and asking each respondent to sort the cards in
order of how much interaction they had with that person since the beginning of
the course (known informally as "camp"). This results in rank order data in which
a "1" indicates the most interaction while a "17" indicates the least interaction.
REFERENCES None
GALASKIEWICZ'S CEOS AND CLUBS
DATASET Galask
BACKGROUND These data give the affiliation network linking 26 CEOs of major corporations
and banks in the Minneapolis area, and their spouses, to 15 clubs, corporate
boards and cultural boards. Membership was during the period 1978-1981. These
data are used by Wasserman and Faust.