Spatial segmentation methods
Spatial clustering is a core component of spatial data analysis and of modelling spatial distributions. Spatial data are often heterogeneous, meaning that no single simple statistical model may describe the whole data set. If the data are first partitioned into homogeneous clusters, or domains, an appropriate statistical model can then be fitted to each domain separately. The problem of finding these homogeneous domains is known as segmentation, partitioning or clustering. It arises in many fields, including disease surveillance, spatial epidemiology, population genetics, landscape ecology and crime analysis. In spatial epidemiology, for example, it is important to distinguish areas of high risk from areas of low risk. In general, spatial clustering involves two problems: identifying the number of homogeneous clusters and locating their boundaries.
There is an extensive literature on clustering problems and algorithms. In this thesis, we are particularly interested in developing new numerical algorithms for the spatial segmentation problem. The data we consider are binary, indicating the presence or absence of a particular plant species, and are observed over a two-dimensional lattice. To solve this segmentation problem, we use the change-point methodology, which is commonly used in time series analysis and many other research areas to detect abrupt changes and their locations. We propose three new algorithms for the spatial segmentation problem, based on Sequential Importance Sampling (SIS), Markov Chain Monte Carlo (MCMC) and Cross-Entropy (CE) methods. The proposed algorithms are applied to both artificially generated and real data sets to illustrate their usefulness. Our results show that the proposed methods effectively identify homogeneous domains and their boundaries in spatial binary data.