Analysis of extrapolation risks in partitions using the MOP metric: explore_partition_extrapolation

This function calculates environmental dissimilarities and identifies non-analogous conditions by comparing the training data against the test data for each partition, using the MOP (Mobility-Oriented Parity) metric.

Usage

explore_partition_extrapolation(data, include_train_background = TRUE,
                                       include_test_background = FALSE,
                                       variables = NULL,
                                       mop_type = "detailed",
                                       calculate_distance = TRUE,
                                       where_distance = "all",
                                       return_raster_result = TRUE,
                                       raster_variables = NULL,
                                       progress_bar = FALSE, ...)

Arguments

data: an object of class prepared_data returned by the prepare_data() function.
include_train_background: (logical) whether to include the background points used in training to define the environmental range of the training data. If set to FALSE, only the environmental conditions of the training presence records will be considered. Default is TRUE, meaning both presence and background points are used.
include_test_background: (logical) whether to compute MOP for both the test presence records and the background points not used during training. Default is FALSE, meaning MOP will be calculated only for the test presences.
variables: (character) names of the variables to be used in the MOP calculation. Default is NULL, meaning all variables in data will be used.
mop_type: (character) type of MOP analysis to be performed. Options available are "basic", "simple" and "detailed". Default is "detailed". See projection_mop() for more details.
calculate_distance: (logical) whether to calculate distances (dissimilarities) between train and test data. Default is TRUE.
where_distance: (character) specifies for which test conditions distances are calculated, based on how they compare with the range of the train data. Options are: "in_range" (only conditions within the train range), "out_range" (only conditions outside the train range), and "all" (all conditions). Default is "all".
return_raster_result: (logical) whether to return a SpatRaster showing the spatial distribution of test data that falls within and outside the range of the training data. Default is TRUE.
raster_variables: a SpatRaster object representing the predictor variables used to calibrate the models. Preferably the same object used in prepare_data(). Only used if return_raster_result = TRUE.
progress_bar: (logical) whether to display a progress bar during processing. Default is FALSE.
...: additional arguments passed to mop().
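
To make the roles of calculate_distance and where_distance more concrete, the following base R sketch computes a simplified dissimilarity for toy data and then keeps distances only for the requested subset of test records. It is an illustration under stated assumptions, not the implementation used by mop(): the data, the helper simple_distance(), and the 10% nearest-neighbor fraction are all hypothetical.

```r
# Toy training and test conditions (hypothetical variables and values)
train <- data.frame(temp = c(10, 12, 14, 16, 18),
                    prec = c(100, 120, 140, 160, 180))
test <- data.frame(temp = c(11, 15, 25),
                   prec = c(110, 150, 90))

# TRUE when every variable of a test record lies within the training range
lo <- sapply(train, min)
hi <- sapply(train, max)
inside <- apply(as.matrix(test), 1, function(p) all(p >= lo & p <= hi))

# Simplified dissimilarity (assumption): mean Euclidean distance, in
# standardized space, from each test record to its nearest training records
simple_distance <- function(train, test, fraction = 0.1) {
  mu <- sapply(train, mean)
  s <- sapply(train, sd)
  tr <- scale(as.matrix(train), center = mu, scale = s)
  te <- scale(as.matrix(test), center = mu, scale = s)
  k <- max(1, ceiling(fraction * nrow(tr)))
  apply(te, 1, function(p) {
    d <- sqrt(rowSums(sweep(tr, 2, p)^2))
    mean(sort(d)[seq_len(k)])
  })
}

dist_all <- simple_distance(train, test)

# Mimic where_distance: keep distances only for the selected test records
where_distance <- "out_range"
keep <- switch(where_distance,
               all = rep(TRUE, nrow(test)),
               in_range = inside,
               out_range = !inside)
dist_kept <- ifelse(keep, dist_all, NA)
```

With where_distance set to "out_range", only the third test record (which is outside the training range for both variables) retains a distance; the other records are set to NA.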

Value

A data.frame containing:

MOP distances (if calculate_distance = TRUE);
an indicator of whether environmental conditions at each test record fall within the training range;
the number of variables outside the training range;
the names of variables with values lower or higher than the training range;
if the prepared_data object includes categorical variables, it will also contain columns indicating which values in the testing data were not present in the training data.

If return_raster_result = TRUE, the output also includes a SpatRaster (accessed as res$Spatial_results in the Examples) showing the spatial distribution of test data that falls within and outside the range of the training data.
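
As a rough illustration of the kind of per-record summary listed above, the base R sketch below builds a small table from toy data. The column names (inside_range, n_var_out, towards_low, towards_high, soil_not_in_train) and the data are hypothetical and chosen only for this example; the actual column names returned by the function may differ.

```r
# Toy training and test data with one categorical variable (hypothetical)
train <- data.frame(temp = c(10, 12, 14, 16, 18),
                    prec = c(100, 120, 140, 160, 180),
                    soil = c("a", "a", "b", "b", "c"))
test <- data.frame(temp = c(11, 5, 25),
                   prec = c(110, 150, 200),
                   soil = c("a", "d", "c"))

num_vars <- c("temp", "prec")
lo <- sapply(train[num_vars], min)
hi <- sapply(train[num_vars], max)
tm <- as.matrix(test[num_vars])

# Logical matrices: which test values fall below or above the training range
below <- sweep(tm, 2, lo, "<")
above <- sweep(tm, 2, hi, ">")

# Collapse the names of flagged variables into one string per record
collapse_names <- function(flags) {
  apply(flags, 1, function(f) {
    if (any(f)) paste(colnames(flags)[f], collapse = ", ") else NA_character_
  })
}

# The range indicator here considers numeric variables only (assumption)
summary_df <- data.frame(
  inside_range = rowSums(below | above) == 0,
  n_var_out = rowSums(below | above),
  towards_low = collapse_names(below),
  towards_high = collapse_names(above),
  # Categorical values in the test data that were absent from training
  soil_not_in_train = ifelse(test$soil %in% train$soil, NA, test$soil)
)
print(summary_df)
```

In this toy case the first record is fully within range, the second has one variable below the training range and an unseen soil class, and the third has two variables above the training range.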

Examples

# Prepare data
# Import occurrences
data(occ_data, package = "kuenm2")

# Import raster layers
var <- terra::rast(system.file("extdata", "Current_variables.tif",
                               package = "kuenm2"))

# Prepare data for maxnet model
sp_swd <- prepare_data(algorithm = "maxnet", occ = occ_data,
                       x = "x", y = "y",
                       raster_variables = var,
                       species = occ_data[1, 1],
                       n_background = 100,
                       categorical_variables = "SoilType",
                       features = c("l", "lq"),
                       r_multiplier = 1,
                       partition_method = "kfolds")
#> Warning: 3 rows were excluded from database because NAs were found.

# Analysis of extrapolation risks in partitions
res <- explore_partition_extrapolation(data = sp_swd,
                                       raster_variables = var,
                                       include_test_background = TRUE)
# Plot the spatial distribution of test data
terra::plot(res$Spatial_results)