Initial data

The analysis takes a semicolon-delimited CSV file in species-by-samples format. The first line contains the sample names (column names), and the first column contains the species names (row names). The file may contain quantitative or qualitative data, but the analysis uses only the qualitative information: the presence or absence of each species in each sample.
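As an illustration of the expected layout, the following minimal Python sketch reads such a table and reduces it to presence/absence. It is not the application's own code. The species and sample names are invented, and the rule "any value greater than zero means present" is an assumption, as are the empty corner cell in the header and the tolerance for decimal commas.

```python
import csv
import io

# Hypothetical file contents; species and sample names are invented for illustration.
EXAMPLE_CSV = """;S1;S2;S3
Daphnia;12;0;3
Bosmina;0;0;7
Chydorus;1;4;0
"""

def read_presence_absence(handle):
    """Return (sample_names, {species: [0 or 1 per sample]}) from a semicolon-delimited table."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    samples = header[1:]  # assumption: the first header cell is the corner above the species column
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        species, values = row[0], row[1:]
        # Assumption: any value greater than zero counts as presence;
        # empty cells count as absence and decimal commas are accepted.
        table[species] = [
            1 if float((v or "0").replace(",", ".")) > 0 else 0 for v in values
        ]
    return samples, table

samples, presence = read_presence_absence(io.StringIO(EXAMPLE_CSV))
```

Here `presence["Daphnia"]` is `[1, 0, 1]`: the abundances 12 and 3 both collapse to 1, which mirrors the statement that only presence or absence is analysed.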

Analysis

The analysis is carried out using two-sided confidence intervals or Bayesian credible intervals of hypergeometric or binomial distributions.

In the results table, SP1 and SP2 are the names of the species in a pair, N1 and N2 are the occurrences of the first and the second species, and N1N2 is their observed joint occurrence. p.lt is the probability that the species are found together in fewer samples than observed, and p.gt is the probability that they are found together in the observed number of samples or more. Both are values derived from the cumulative distribution function.
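For the hypergeometric case, the two probabilities can be sketched as follows. This is only an illustration of the definitions of p.lt and p.gt, not the application's implementation, which may use the binomial distribution or interval-based estimates instead. The function names and the argument `n_samples` (the total number of samples, taken from the input table) are hypothetical.

```python
from math import comb

def hypergeom_pmf(j, n_samples, n1, n2):
    """P(joint occurrence == j) when species with n1 and n2 occurrences are placed independently."""
    return comb(n1, j) * comb(n_samples - n1, n2 - j) / comb(n_samples, n2)

def cooccurrence_tails(n_samples, n1, n2, n1n2):
    """Return (p_lt, p_gt): P(X < n1n2) and P(X >= n1n2)."""
    lowest = max(0, n1 + n2 - n_samples)  # smallest joint occurrence that is possible
    highest = min(n1, n2)                 # largest joint occurrence that is possible
    p_lt = sum(hypergeom_pmf(j, n_samples, n1, n2) for j in range(lowest, n1n2))
    p_gt = sum(hypergeom_pmf(j, n_samples, n1, n2) for j in range(max(lowest, n1n2), highest + 1))
    return p_lt, p_gt

# Example: 10 samples, each species found in 5 of them, 4 samples shared.
p_lt, p_gt = cooccurrence_tails(10, 5, 5, 4)
```

The two values always sum to 1, so a small p.gt points to a joint occurrence higher than expected by chance and a small p.lt to a lower one.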

Draw graph

The graph is built from significantly associated pairs of species (positive, negative or both). Node size is proportional to the logarithm of the species' frequency of occurrence. Edge thickness is inversely proportional to the probability of obtaining the observed or a greater joint occurrence (positive association) or a lesser joint occurrence (negative association).
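The scaling rules can be sketched in Python as below. The scale factors, the use of `log1p` instead of a plain logarithm, and the floor on the probability are all assumptions made for this illustration. The actual constants used by the application are not stated here.

```python
import math

def node_size(occurrence, scale=5.0):
    """Node size grows with the logarithm of the species' occurrence."""
    # Assumption: log1p keeps a species found in a single sample visible,
    # since log(1) would give a size of zero.
    return scale * math.log1p(occurrence)

def edge_width(p_value, scale=0.05, floor=1e-6):
    """Edge width is inversely proportional to the association probability."""
    # Assumption: the floor avoids division by zero for vanishingly small probabilities.
    return scale / max(p_value, floor)

# A rare and a common species, a strong and a borderline association.
sizes = (node_size(1), node_size(10))
widths = (edge_width(0.01), edge_width(0.05))
```

With these hypothetical constants, an association with probability 0.01 is drawn five times thicker than one with probability 0.05.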

Optimal community structure clustering, which relies on the GLPK package, yields the maximum graph modularity but may take some time.

Affinity propagation clustering is the only method that takes negative associations between species into account. All other options treat the weight of negative associations as zero.

Graph analysis

In the graph analysis results, Connectivity is the number of internal connections divided by the number of all possible connections. Mean connection strength is the mean strength (1-Px/p.value) of all internal connections. Total connectivity strength is the mean strength of all internal connections, with every insignificant connection counted as zero.
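The three measures can be illustrated with a short Python sketch. It assumes that "all possible connections" means every pair of species within a group, that is n(n-1)/2 pairs, and it takes the connection strengths as given rather than computing them. The function name and the example values are hypothetical.

```python
def cluster_metrics(n_species, strengths):
    """Return (connectivity, mean strength, total connectivity strength) for one group.

    n_species: number of species in the group
    strengths: strengths of the significant internal connections
    """
    possible = n_species * (n_species - 1) // 2  # assumption: all species pairs in the group
    if possible == 0:
        return 0.0, 0.0, 0.0  # a single species has no possible connections
    observed = len(strengths)
    connectivity = observed / possible
    mean_strength = sum(strengths) / observed if observed else 0.0
    # Insignificant connections enter the mean as zeros, so the sum is divided by all pairs.
    total_strength = sum(strengths) / possible
    return connectivity, mean_strength, total_strength

# Four species with three significant internal connections out of six possible.
connectivity, mean_strength, total_strength = cluster_metrics(4, [0.99, 0.97, 0.95])
```

Under these assumptions, total connectivity strength equals connectivity multiplied by mean connection strength. In the example, half of the possible connections are present and the mean strength is about 0.97, so the total is about 0.485.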

About

Method presentation (in Russian; use a VPN to mask a Russian IP)

Method description by V. K. Shitikov (in Russian, with R code examples)

Contact the author: Dmitriy Seleznev

Application hosted by Papanin Institute for Biology of Inland Waters RAS