Two-Sample Permutation Test — two_sample_test

This function carries out a hypothesis test in which the null hypothesis is that the two samples are governed by the same underlying generative probability distribution, against the alternative hypothesis that they are governed by two different generative probability distributions.
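The idea behind such a test can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions, not the flipr implementation: the helper name perm_test_sketch and the names of the returned components are hypothetical, the statistic is a plain absolute difference in means rather than stat_t, and the p-value uses the conservative (b + 1) / (B + 1) formula.

```r
# Illustrative base-R sketch of a two-sample permutation test.
# `perm_test_sketch` is a hypothetical helper, not part of flipr.
perm_test_sketch <- function(x, y, B = 1000L) {
  pooled <- c(x, y)
  n1 <- length(x)
  n <- length(pooled)

  # Statistic computed from the indices assigned to the first sample
  stat <- function(idx) abs(mean(pooled[idx]) - mean(pooled[-idx]))

  # Observed value: the first n1 pooled observations form the first sample
  observed <- stat(seq_len(n1))

  # Randomly reassign n1 observations to the first sample, B times
  permuted <- replicate(B, stat(sample.int(n, n1)))

  # Conservative p-value: (b + 1) / (B + 1), with b the number of
  # permuted statistics at least as large as the observed one
  pvalue <- (sum(permuted >= observed) + 1) / (B + 1)

  list(observed = observed, pvalue = pvalue, permuted = permuted)
}

set.seed(1)
res <- perm_test_sketch(rnorm(10), rnorm(10, mean = 10), B = 200L)
res$pvalue
```

With a mean shift this large, almost no random reassignment yields a statistic as extreme as the observed one, so the p-value is expected to be small.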

Usage

two_sample_test(
  x,
  y,
  stats = list(stat_t),
  B = 1000L,
  M = NULL,
  alternative = "two_tail",
  combine_with = "tippett",
  type = "exact",
  seed = NULL,
  ...
)

Arguments

x: A numeric vector, a numeric matrix or a list representing the first sample. Alternatively, it can be a distance matrix stored as an object of class dist, in which case test statistics based on inter-point distances (marked with the _ip suffix) should be used.
y: A numeric vector if x is a numeric vector, a numeric matrix if x is a numeric matrix, or a list if x is a list, representing the second sample. Alternatively, if x is an object of class dist, y should be a numeric scalar specifying the size of the first sample.
stats: A list of functions produced by as_function specifying the chosen test statistic(s). A number of test statistic functions are implemented in the package and can be used as such. Alternatively, users can provide their own implementation of any test statistic they deem relevant for the problem at hand. See the section User-supplied statistic function for more information on how these user-supplied functions should be structured for compatibility with the flipr framework. Default is list(stat_t).
B: The number of sampled permutations. Default is 1000L.
M: The total number of possible permutations. Defaults to NULL, which means that it is automatically computed from the given sample size(s).
alternative: A single string or a character vector specifying whether the p-value is right-tailed, left-tailed or two-tailed. Choices are "right_tail", "left_tail" and "two_tail". Default is "two_tail". If a single string is provided, it is applied to all test statistics provided by the user. Alternatively, the length of alternative should match the length of the stats parameter, in which case a one-to-one correspondence is assumed.
combine_with: A string specifying the combining function to be used to compute the single test statistic value from the set of p-value estimates obtained during the non-parametric combination testing procedure. For now, choices are either "tippett" or "fisher". Default is "tippett", which picks Tippett's function.
type: A string specifying which formula should be used to compute the p-value. Choices are "exact" (default), "upper_bound" and "estimate". See Phipson & Smith (2010) for details.
seed: An integer specifying the seed of the random number generator, useful for result reproducibility or method comparisons. Default is NULL.
...: Extra parameters specific to some statistics.
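
The quantities behind M, type and combine_with can be illustrated with a short base-R sketch. All function names below are hypothetical and do not belong to flipr. The sketch assumes that M is the number of ways of choosing which n1 of the n1 + n2 pooled observations form the first sample, shows only the "estimate" and "upper_bound" formulas as commonly stated (the "exact" correction of Phipson & Smith is omitted), and uses the textbook forms of the Tippett and Fisher combining functions; the internals of the package may differ.

```r
# Hypothetical helpers illustrating the arguments above (not flipr code).

# Number of distinct assignments of n1 pooled observations to sample 1
count_splits <- function(n1, n2) choose(n1 + n2, n1)

# Right-tailed p-value "estimate": proportion of permuted statistics
# at least as large as the observed one, i.e. b / B
pvalue_estimate <- function(observed, permuted) {
  mean(permuted >= observed)
}

# Right-tailed p-value "upper bound": (b + 1) / (B + 1)
pvalue_upper_bound <- function(observed, permuted) {
  (sum(permuted >= observed) + 1) / (length(permuted) + 1)
}

# Textbook combining functions for a vector p of p-values
combine_tippett <- function(p) 1 - min(p)
combine_fisher <- function(p) -2 * sum(log(p))

count_splits(10, 10)
#> [1] 184756
pvalue_estimate(3, c(1, 2, 3, 4))
#> [1] 0.5
pvalue_upper_bound(3, c(1, 2, 3, 4))
#> [1] 0.6
```

The upper bound can never be zero, which is one reason it is often preferred over the raw proportion when only B sampled permutations are available.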

Value

A list with three components: the value of the statistic for the original two samples, the p-value of the resulting permutation test and a numeric vector storing the values of the permuted statistics.

User-supplied statistic function

A user-specified function should have at least two arguments:

the first argument is data, which should be a list of the n1 + n2 concatenated observations, with the original n1 observations from the first sample on top and the original n2 observations from the second sample below;
the second argument is perm_data, which should be an integer vector giving the indices in data that are considered to belong to the first sample.

It is possible to use the use_stat function with nsamples = 2 to have flipr automatically generate a template file for writing your own test statistics in a way that makes them compatible with the flipr framework.

See the stat_t function for an example.
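
As an illustration of the structure described above, here is a minimal sketch of a user-supplied statistic. The function name stat_median_diff is hypothetical, and the sketch assumes scalar observations stored in a list; it only follows the two-argument convention (data, perm_data) stated in this section.

```r
# Hypothetical user-supplied statistic: difference of sample medians.
# `data` is a list of the n1 + n2 pooled observations and `perm_data`
# holds the indices currently assigned to the first sample.
stat_median_diff <- function(data, perm_data, ...) {
  n <- length(data)
  indices1 <- perm_data
  indices2 <- setdiff(seq_len(n), indices1)

  # Assumption: each observation is a single number
  x <- unlist(data[indices1])
  y <- unlist(data[indices2])

  stats::median(x) - stats::median(y)
}

# Direct call on toy data: first sample is 1, 2, 3; second is 10, 11, 12
toy <- as.list(c(1, 2, 3, 10, 11, 12))
stat_median_diff(toy, 1:3)
#> [1] -9
```

A function of this shape would presumably be passed through the stats argument, subject to whatever conversion as_function requires; the generated template from use_stat remains the authoritative starting point.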

Examples

n <- 10L
mx <- 0
sigma <- 1

# Two different models for the two populations
x <- rnorm(n = n, mean = mx, sd = sigma)
delta <- 10
my <- mx + delta
y <- rnorm(n = n, mean = my, sd = sigma)
t1 <- two_sample_test(x, y)
t1$pvalue
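#> Output varies from run to run because no seed is set; with a shift of
#> 10 standard deviations, a very small p-value is expected here.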

# Same model for the two populations
x <- rnorm(n = n, mean = mx, sd = sigma)
delta <- 0
my <- mx + delta
y <- rnorm(n = n, mean = my, sd = sigma)
t2 <- two_sample_test(x, y)
t2$pvalue
#> [1] 0.9750223
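
The p-values above change from one run to the next because neither the data nor the permutations are seeded. The following sketch, which uses only arguments listed in the Usage section, shows how the seed argument might be used to make two calls comparable. It assumes that flipr is installed and that fixing seed fixes the sampled permutations, as the argument description suggests.

```r
library(flipr)

# Fix the data so that both calls see the same samples
set.seed(1234)
x <- rnorm(n = 10L, mean = 0, sd = 1)
y <- rnorm(n = 10L, mean = 1, sd = 1)

# Same data and same seed: the sampled permutations should coincide,
# so the two p-values are expected to be identical
t3 <- two_sample_test(x, y, B = 500L, seed = 42)
t4 <- two_sample_test(x, y, B = 500L, seed = 42)
identical(t3$pvalue, t4$pvalue)
```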