The package RMThreshold attempts to determine an objective threshold that separates signal from noise in large, real-valued, symmetric matrices. Such matrices can, for instance, describe correlation or mutual information between data of various origin, or represent the set of edges in an undirected network. RMThreshold takes advantage of the predictions of Random Matrix Theory (RMT) for the distribution of the spacings between the eigenvalues of such matrices. That distribution is usually called the Nearest Neighbor Spacing Distribution (NNSD). The predictions of RMT are valid in the limit of large matrix dimensions. RMT was initiated by Eugene Wigner in the context of nuclear physics in 1955 (Wigner E. P., Annals of Mathematics, 1955).

RMT predicts two extreme scenarios for the NNSD of eigenvalues: 1.) If the matrix elements are completely random, the NNSD is characterized by Gaussian Orthogonal Ensemble (GOE) statistics, and its shape resembles the Wigner-Dyson distribution ("Wigner surmise"): $P_{\mathrm{GOE}}(s) = \frac{\pi}{2}\, s\, e^{-\pi s^2/4}$, where $s$ is the eigenvalue spacing and $P(s)$ its distribution. This distribution approaches zero for $s = 0$, which can be pictured as a kind of "repulsion" between the eigenvalues. 2.) If the matrix has a non-random, modular structure (associated with a block-like composition), the NNSD comes close to an Exponential distribution: $P_{\exp}(s) = e^{-s}$. The two functions differ most at $s = 0$, where $P_{\mathrm{GOE}}(0) = 0$ and $P_{\exp}(0) = 1$. No such "repulsion" occurs in the modular case, and zero spacings between the eigenvalues occur frequently.

The modular case might apply to the adjacency matrix of a large undirected network consisting of relatively independent clusters with weak connections between them. These weak connections might be nothing but noise. By identifying an appropriate threshold for such matrices, it should be possible to reveal the underlying modular structure of the network, i.e. to identify the clusters.
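The two limiting spacing distributions discussed above can be compared numerically. The following is a minimal sketch in Python, not part of RMThreshold; it assumes the standard forms of the Wigner surmise, P(s) = (pi/2) s exp(-pi s^2 / 4), and of the unit-mean Exponential distribution, P(s) = exp(-s). The function names `p_goe`, `p_exp` and `integrate` are hypothetical helpers introduced only for this illustration.

```python
import math

def p_goe(s):
    """Wigner surmise for the GOE: P(s) = (pi/2) * s * exp(-pi * s^2 / 4)."""
    return (math.pi / 2.0) * s * math.exp(-math.pi * s * s / 4.0)

def p_exp(s):
    """Exponential spacing distribution with unit mean: P(s) = exp(-s)."""
    return math.exp(-s)

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Tabulate both densities; the difference is largest at s = 0.
for s in (0.0, 0.5, 1.0, 2.0):
    print(f"s={s:3.1f}  P_GOE={p_goe(s):.4f}  P_exp={p_exp(s):.4f}")

# Both densities are normalized and have mean spacing 1.
print("norm GOE:", integrate(p_goe, 0.0, 20.0))
print("mean GOE:", integrate(lambda s: s * p_goe(s), 0.0, 20.0))
print("norm exp:", integrate(p_exp, 0.0, 40.0))
```

Because both densities are normalized and have mean spacing 1, empirical spacings are usually rescaled to unit mean before they are compared with either curve.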
Now, if we assume that a matrix or a network actually has a modular structure which is hidden by noise, it should be possible to identify a signal-noise separating threshold by finding the threshold at which the NNSD changes from the Wigner-Dyson case to the Exponential case. Consequently, the main function of the package (rm.get.threshold) increases a candidate threshold step by step and records the eigenvalue spacing distribution of the thresholded matrix at each step. A typical procedure to infer a signal-noise separating threshold with RMThreshold may consist of the following steps: 1.) checking the conformity of the input matrix using the function rm.matrix.validation, 2.) running the main function rm.get.threshold in order to find a candidate threshold, 3.) optionally re-running rm.get.threshold on a smaller interval of thresholds, and 4.) applying the identified threshold to the matrix. The thresholded matrix created by the last step should then represent the real signal. Some important steps of this procedure are described in more detail in the following text.
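The idea of scanning a threshold and watching the spacing statistics can be illustrated with a short Python sketch using NumPy. This is not the implementation of rm.get.threshold, and it does not call the package. All names (`apply_threshold`, `normalized_spacings`, `small_spacing_fraction`, `sweep`), the toy block matrix, the cutoff of 0.1, and the division by the mean spacing (a crude stand-in for a proper unfolding of the spectrum) are assumptions made for illustration only.

```python
import numpy as np

def apply_threshold(mat, threshold):
    """Set entries with |value| < threshold to zero, keeping the diagonal."""
    out = np.where(np.abs(mat) >= threshold, mat, 0.0)
    np.fill_diagonal(out, np.diag(mat).copy())
    return out

def normalized_spacings(mat):
    """Nearest-neighbor eigenvalue spacings, rescaled to unit mean.

    Assumption: dividing by the mean replaces a proper unfolding step.
    """
    eig = np.linalg.eigvalsh(mat)  # eigenvalues of a symmetric matrix, ascending
    spacings = np.diff(eig)
    return spacings / spacings.mean()

def small_spacing_fraction(mat, cutoff=0.1):
    """Fraction of spacings below `cutoff` (hypothetical indicator).

    Near-zero spacings are suppressed in the Wigner-Dyson case and
    frequent in the Exponential case.
    """
    return float(np.mean(normalized_spacings(mat) < cutoff))

def sweep(mat, thresholds):
    """Record the indicator for each candidate threshold."""
    return [(float(t), small_spacing_fraction(apply_threshold(mat, t)))
            for t in thresholds]

# Toy input (assumption): four blocks of strong entries plus symmetric noise.
rng = np.random.default_rng(0)
n_blocks, block_size = 4, 25
n = n_blocks * block_size
noise = rng.normal(scale=0.1, size=(n, n))
mat = (noise + noise.T) / 2.0
for b in range(n_blocks):
    lo, hi = b * block_size, (b + 1) * block_size
    mat[lo:hi, lo:hi] += 1.0

thresholds = np.linspace(0.0, 1.0, 11)
results = sweep(mat, thresholds)
for t, frac in results:
    print(f"threshold={t:.1f}  fraction of small spacings={frac:.3f}")
```

In a real analysis the recorded spacing distributions would be compared against both limiting curves over a range of thresholds, and a second scan on a narrower interval would refine the candidate, as in steps 2.) and 3.) above.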