Data from: Discovering biogeographic and ecological clusters with a graph theoretic spin on factor analysis
dataset
posted on 2022-06-11, 04:11; authored by John Alroy
Factor analysis (FA) has the advantage of highlighting each semi-distinct cluster of samples in a data set with one axis at a time, as opposed to simply arranging samples across axes to represent gradients. However, in the case of presence-absence data it is confounded by absences when gradients are long. No statistical model can cope with this problem because the raw data simply do not present underlying information about the length of such gradients. Here I propose a simple way to tease out this information: an emendation of FA called stepping down, which involves giving an absence a negative value when the missing species nowhere co-occurs with the species found in the relevant sample. Specifically, a binary co-occurrence graph is created, and the magnitude of negative values is made a function of how far the graph must be traversed in order to link the missing species with each species that is present. Simulations show that standard FA yields inferior results to FA based on stepped-down matrices in terms of mapping clusters into axes one-by-one. Standard FA is also uninformative when applied to a global bat inventory data set, whereas step-down FA (SDFA) easily flags the main biogeographic groupings. Methods like correspondence analysis, non-metric multidimensional scaling, and Bayesian latent variable modelling are not commensurate with SDFA because they do not seek to find a one-to-one mapping of axes and clusters. Stepping down seems promising as a means of illustrating clusters of samples, especially when there are subtle or complex discontinuities in gradients.
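The stepping-down idea described above can be illustrated with a minimal Python sketch. This is not the author's implementation: the abstract says only that the negative value is "a function of" graph distance, so the specific choices here are assumptions made for illustration. In particular, the function names (`cooccurrence_graph`, `bfs_distances`, `step_down`), the use of the mean of (distance minus 1) over the species present in a sample, and the cap applied to species in disconnected parts of the graph are all hypothetical.

```python
from collections import deque


def cooccurrence_graph(matrix):
    """Link two species if they are both present in at least one sample."""
    n_species = len(matrix[0])
    adj = [set() for _ in range(n_species)]
    for row in matrix:
        present = [j for j, v in enumerate(row) if v]
        for a in present:
            for b in present:
                if a != b:
                    adj[a].add(b)
    return adj


def bfs_distances(adj, start):
    """Shortest path length (in edges) from start to every reachable species."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def step_down(matrix):
    """Return a copy of a 0/1 sample-by-species matrix with stepped-down absences.

    ASSUMPTION: an absence is scored as minus the mean of (distance - 1)
    over the species present in the sample, so a missing species that
    co-occurs directly with every present species keeps the value 0.
    """
    adj = cooccurrence_graph(matrix)
    n_species = len(adj)
    dist = [bfs_distances(adj, s) for s in range(n_species)]
    cap = n_species  # ASSUMPTION: distance used for unreachable species
    out = []
    for row in matrix:
        present = [j for j, v in enumerate(row) if v]
        new_row = []
        for j, v in enumerate(row):
            if v or not present:
                new_row.append(float(v))
                continue
            extra = [dist[j].get(p, cap) - 1 for p in present]
            mean_extra = sum(extra) / len(extra)
            new_row.append(-mean_extra if mean_extra else 0.0)
        out.append(new_row)
    return out


# Three samples along a gradient of four species (columns).
samples = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
stepped = step_down(samples)
```

In this toy gradient, the species at the far end of the gradient from a sample receives a more negative value than the species one step away, which is the kind of gradient-length information that plain zeros do not carry. The stepped-down matrix would then be passed to a factor analysis routine in place of the raw presence-absence matrix.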
Usage Notes
bat references
A list of references to publications yielding site-specific inventory data for bats from around the world. Raw data are also reposited in the Ecological Register.
bat_references.txt
bat register
Site-specific inventory data for bats from around the world. Each line includes a count of the individuals belonging to a species found at a site. Raw data are also reposited in the Ecological Register.
bat_register.txt
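Because the register holds per-site counts while the analysis works on presence-absence data, the counts need to be collapsed to a 0/1 matrix first. The Python sketch below shows one way to do that. The column layout of bat_register.txt is not described here, so the example works on in-memory (site, species, count) tuples rather than parsing the file; the function name `presence_absence` and the site and species labels are hypothetical placeholders.

```python
def presence_absence(records):
    """Collapse (site, species, count) records into a 0/1 site-by-species matrix."""
    sites = sorted({site for site, _, _ in records})
    species = sorted({sp for _, sp, _ in records})
    row_of = {site: i for i, site in enumerate(sites)}
    col_of = {sp: j for j, sp in enumerate(species)}
    matrix = [[0] * len(species) for _ in sites]
    for site, sp, count in records:
        if count > 0:  # any positive count marks the species as present
            matrix[row_of[site]][col_of[sp]] = 1
    return sites, species, matrix


# Placeholder records; real site and species names would come from the register.
records = [
    ("site_1", "species_a", 12),
    ("site_1", "species_b", 3),
    ("site_2", "species_b", 7),
    ("site_2", "species_c", 1),
]
sites, species, matrix = presence_absence(records)
```

Rows follow the sorted site labels and columns the sorted species labels, so the returned lists can be used to label the matrix.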