Finds clusters of occurrences in a SpatialPointsDataFrame using one of two methods
clusters(occ_pr, cluster_method = "hierarchical", split_distance, n_k_means, set_seed = 1, verbose = TRUE)
Argument | Description |
---|---|
occ_pr | SpatialPointsDataFrame of occurrence records. The projection must allow safe calculation of distances (e.g., azimuthal equidistant). |
cluster_method | (character) name of the method used to cluster the occurrences. Options are "hierarchical" and "k-means"; default = "hierarchical". See details for more information on the two available methods. |
split_distance | (numeric) distance in meters that limits connectivity among hull polygons created with chunks of points separated by long distances. This parameter is used when cluster_method = "hierarchical". |
n_k_means | (numeric) number of clusters in which the species occurrences will be grouped when using the "k-means" method. |
set_seed | (numeric) integer value to specify a seed. Default = 1. |
verbose | (logical) whether or not to print messages about the process. Default = TRUE. |
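To make the role of `split_distance` more concrete, here is a minimal base-R sketch using `hclust()` and `cutree()`. It assumes, without confirming it from the rangemap source, that the hierarchical option builds a complete-linkage tree from Euclidean distances among projected coordinates and cuts it at a height equal to `split_distance`. The helper `cluster_hierarchical()` and the toy coordinates are hypothetical.

```r
# Hypothetical toy data: projected coordinates in meters, two groups ~250 km apart
coords <- data.frame(
  x = c(0, 500, 1000, 200000, 200600, 201200),
  y = c(0, 300, 100, 150000, 150400, 149800)
)

# Hypothetical helper. Assumption: complete linkage, with the tree cut at
# split_distance; this is not necessarily what clusters() does internally.
cluster_hierarchical <- function(coords, split_distance) {
  # Euclidean distances are meaningful here because coordinates are projected
  tree <- hclust(dist(coords), method = "complete")
  # Points whose linkage height stays below split_distance share a cluster
  cutree(tree, h = split_distance)
}

groups <- cluster_hierarchical(coords, split_distance = 50000)
print(groups)
```

With `split_distance = 50000`, points a few hundred meters apart stay together, while the two groups separated by about 250 km end up in different clusters.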
A SpatialPointsDataFrame with an extra column in its data slot identifying the cluster to which each record belongs.
`cluster_method` must be chosen based on the spatial configuration of the species occurrences. The two methods make distinct assumptions, and one of them may perform better than the other depending on the spatial pattern of the data.
The k-means method, for example, performs better when two assumptions are met: clusters are spatially grouped (or "spherical"), and clusters are of similar size. Hierarchical clustering, on the other hand, works from the distances between every pair of records, so it may take more time than the k-means method as the number of records grows. Neither method is best in general: each may work well on some data sets and fail on others.
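A minimal sketch of the k-means option with base R's `kmeans()`, on hypothetical toy data that meets both assumptions above (two compact groups of equal size). The helper `cluster_kmeans()` is hypothetical, and treating `set_seed` as the seed for the random initial centers is an assumption about how `clusters()` uses that argument.

```r
# Hypothetical toy data: two compact, equally sized groups of projected points (meters)
coords <- data.frame(
  x = c(0, 500, 1000, 200000, 200600, 201200),
  y = c(0, 300, 100, 150000, 150400, 149800)
)

# Hypothetical helper mirroring the n_k_means and set_seed arguments
cluster_kmeans <- function(coords, n_k_means, set_seed = 1) {
  set.seed(set_seed)  # fixes the random choice of initial cluster centers
  fit <- kmeans(as.matrix(coords), centers = n_k_means, nstart = 1)
  fit$cluster         # integer cluster label for each point
}

groups <- cluster_kmeans(coords, n_k_means = 2)
print(table(groups))
```

Because these groups are well separated, compact, and of equal size, k-means should recover them; elongated or very unequal groups are where its assumptions start to break down.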
Another important factor to consider is that the k-means method starts from a random choice of cluster centers, so it may produce different results on different runs, which can be a problem when trying to replicate your methods. Hierarchical clustering has no random step, so the same clusters are obtained if the process is repeated on the same data.
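The reproducibility point can be checked directly in base R: two `kmeans()` runs give identical labels when the random number generator is reset to the same seed before each run, while `hclust()` has no random step at all. This sketch uses a random point cloud as hypothetical data and does not call rangemap.

```r
# Hypothetical data: 100 random points in a 100 km x 100 km projected square
set.seed(42)
coords <- matrix(runif(200, min = 0, max = 100000), ncol = 2,
                 dimnames = list(NULL, c("x", "y")))

# k-means: resetting the same seed before each run reproduces the initial centers
set.seed(1)
run_a <- kmeans(coords, centers = 4)$cluster
set.seed(1)
run_b <- kmeans(coords, centers = 4)$cluster
print(identical(run_a, run_b))  # TRUE

# Hierarchical clustering: no random step, so repeated runs agree
tree_1 <- cutree(hclust(dist(coords)), k = 4)
tree_2 <- cutree(hclust(dist(coords)), k = 4)
print(identical(tree_1, tree_2))  # TRUE
```

Without the `set.seed(1)` calls, `run_a` and `run_b` may differ, at least in how the clusters are labeled.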
For more information on these clustering methods see Aggarwal and Reddy (2014) https://goo.gl/RQ2ebd.
```r
library(rangemap)

# data
data("occ_p", package = "rangemap")

# preparing spatial points
occ <- as.data.frame(unique(occ_p))
WGS84 <- sp::CRS("+proj=longlat +datum=WGS84 +no_defs")
occ_sp <- sp::SpatialPointsDataFrame(coords = occ[, 2:3], data = occ,
                                     proj4string = WGS84)

# reprojecting for measuring distances
LAEA <- LAEA_projection(spatial_object = occ_sp)
occ_pr <- sp::spTransform(occ_sp, LAEA)

# clustering
occ_clus <- clusters(occ_pr, cluster_method = "k-means", n_k_means = 2)
```