Finds clusters in a SpatialPointsDataFrame of occurrence records using one of two clustering methods

clusters(occ_pr, cluster_method = "hierarchical", split_distance,
         n_k_means, set_seed = 1, verbose = TRUE)

Arguments

occ_pr	SpatialPointsDataFrame of occurrence records. Its projection must allow reliable distance calculations (e.g., azimuthal equidistant).
cluster_method	(character) name of the method to be used for clustering the occurrences. Options are "hierarchical" and "k-means"; default = "hierarchical". See details for more information on the two available methods.
split_distance	(numeric) distance in meters used to separate groups of points that are far from each other, which limits connectivity among hull polygons created from those groups. This parameter is used only when `cluster_method` = "hierarchical".
n_k_means	(numeric) number of clusters into which the species occurrences will be grouped. This parameter is used only when `cluster_method` = "k-means".
set_seed	(numeric) integer value to specify a seed. Default = 1.
verbose	(logical) whether or not to print messages about the process. Default = TRUE.

Value

A SpatialPointsDataFrame with an extra column in data defining clusters.

Details

cluster_method should be chosen based on the spatial configuration of the species occurrences. The two methods make different assumptions, so one may perform better than the other depending on the spatial pattern of the data; each may work well on some data sets and fail on others.

The k-means method, for example, performs better when clusters are spatially compact (roughly "spherical") and of similar size. Hierarchical clustering, owing to the nature of the algorithm, may take more time than k-means.
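The difference between the two approaches can be sketched with base R alone. This is a minimal illustration under assumptions, not the internal code of `clusters`: the coordinates are made up, the complete linkage is an arbitrary choice, and `split_distance` here is simply the height at which the tree is cut.

```r
# Made-up projected coordinates in meters (not rangemap data):
# four points near the origin and three points about 700 km away
xy <- rbind(
  cbind(x = c(0, 1000, 2000, 1500), y = c(0, 500, 1000, 200)),
  cbind(x = c(500000, 501000, 502000), y = c(500000, 500500, 499000))
)

# Hierarchical: groups follow from a distance threshold
# (assumption: the tree is cut at split_distance)
split_distance <- 100000
tree <- hclust(dist(xy), method = "complete")
hier_groups <- cutree(tree, h = split_distance)

# k-means: the number of groups is fixed in advance
set.seed(1)
km_groups <- kmeans(xy, centers = 2, nstart = 10)$cluster

# Number of points per group under each method
table(hier_groups)
table(km_groups)
```

With a distance threshold the number of groups is a result of the analysis, whereas with k-means it is an input.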

Another important factor to consider is that the k-means method starts from a random choice of cluster centers, so it may produce different results on different runs unless the seed is fixed (see set_seed). This can be problematic when trying to replicate an analysis. Hierarchical clustering has no random step, so repeating the process on the same data yields the same clusters.
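The effect of fixing the seed can be checked with base R. This is a sketch under assumptions: the coordinates are simulated, and `run_kmeans` is a hypothetical helper written for this illustration, not a rangemap function.

```r
# Simulated projected coordinates in meters (not rangemap data)
set.seed(42)
xy <- cbind(x = runif(60, 0, 1e6), y = runif(60, 0, 1e6))

# Hypothetical helper: fix the seed, then run k-means
run_kmeans <- function(xy, k, seed) {
  set.seed(seed)
  kmeans(xy, centers = k)$cluster
}

# Same seed, same starting centers, same assignment
same_km <- identical(run_kmeans(xy, 3, seed = 1), run_kmeans(xy, 3, seed = 1))

# Hierarchical clustering involves no random step
h1 <- cutree(hclust(dist(xy)), k = 3)
h2 <- cutree(hclust(dist(xy)), k = 3)
same_hc <- identical(h1, h2)

same_km
same_hc
```

Both comparisons return TRUE. Runs with different seeds are not guaranteed to agree, which is why reporting the seed matters when k-means is used.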

For more information on these clustering methods see Aggarwal and Reddy (2014) https://goo.gl/RQ2ebd.

Examples

# data
data("occ_p", package = "rangemap")

# preparing spatial points
occ <- as.data.frame(unique(occ_p))
WGS84 <- sp::CRS("+init=epsg:4326")
occ_sp <- sp::SpatialPointsDataFrame(coords = occ[, 2:3], data = occ,
                                     proj4string = WGS84)

# reprojecting for measuring distances
LAEA <- LAEA_projection(spatial_object = occ_sp)
occ_pr <- sp::spTransform(occ_sp, LAEA)

# clustering
occ_clus <- clusters(occ_pr, cluster_method = "k-means", n_k_means = 2)
#> Clustering method: k-means