Determine each sample's distance from the center of the data using Mahalanobis distance.

calculate_sample_mahalanobis_distances(
  tomic,
  value_var = NULL,
  max_pcs = 10,
  scale = FALSE
)

Arguments

tomic

Either a tidy_omic or triple_omic object

value_var

the measurement variable to use for calculating distances

max_pcs

the maximum number of principal components to use for representing the covariance matrix.

scale

if TRUE then the data will be scaled before calculating distances

Value

The samples tibble with a new column `pc_distance`, which contains the Mahalanobis distance of each sample from the PC ellipsoid.

Details

Since `romic` is built around tall data, where there are more features than samples, the feature covariance matrix is singular (its rank is at most the number of samples minus one). It therefore cannot be inverted, so the Mahalanobis distance cannot be calculated from it directly. Instead, we use SVD to create a low-dimensional representation of the covariance matrix and calculate distances from the center of the data in this space. This essentially involves weighting the principal components by their loadings.
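The general idea can be illustrated with a minimal base-R sketch. This is not romic's implementation: it centers (and optionally scales) a simulated samples × features matrix, takes a thin SVD, keeps the top `max_pcs` components, and computes the textbook Mahalanobis distance of each sample's PC scores using the diagonal covariance of those scores. The helper `pc_mahalanobis()` and the simulated matrix are hypothetical, and romic may normalize distances differently, so values from this sketch are not expected to match `pc_distance` in the example output.

```r
# Simulated data: 12 samples (rows) x 200 features (columns),
# i.e. more features than samples, so the feature covariance is singular.
set.seed(1)
n_samples <- 12
n_features <- 200
X <- matrix(rnorm(n_samples * n_features), nrow = n_samples)

# Hypothetical helper (not part of romic): Mahalanobis distance of each
# sample from the data center, computed in the space of the top PCs.
pc_mahalanobis <- function(X, max_pcs = 10, scale = FALSE) {
  # Center each feature, and optionally scale it to unit variance.
  Xc <- scale(X, center = TRUE, scale = scale)
  # Thin SVD: Xc = U D V^T, so the PC scores are U D.
  s <- svd(Xc)
  # Centering removes one degree of freedom, so at most n - 1 components
  # carry variance.
  k <- min(max_pcs, nrow(X) - 1)
  idx <- seq_len(k)
  scores <- s$u[, idx, drop = FALSE] %*% diag(s$d[idx], nrow = k)
  # PC scores are uncorrelated, so their covariance is diagonal with
  # per-component variances d^2 / (n - 1).
  pc_var <- s$d[idx]^2 / (nrow(X) - 1)
  # stats::mahalanobis() returns squared distances; take the square root.
  sqrt(stats::mahalanobis(scores, center = rep(0, k), cov = diag(pc_var, nrow = k)))
}

# Distances using at most 10 components.
d <- pc_mahalanobis(X, max_pcs = 10)

# Distances using every component that carries variance (n - 1 = 11 here).
d_full <- pc_mahalanobis(X, max_pcs = n_samples)
```

In this sketch, keeping all n - 1 components that carry variance puts every sample at the same distance, (n - 1) / sqrt(n), from the center; capping the number of components is what lets the distances differ between samples.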

Examples

calculate_sample_mahalanobis_distances(brauer_2008_tidy)
#> # A tibble: 36 × 4
#>    sample nutrient    DR pc_distance
#>    <chr>  <chr>    <dbl>       <dbl>
#>  1 G0.05  G         0.05       188. 
#>  2 G0.1   G         0.1        125. 
#>  3 G0.15  G         0.15       128. 
#>  4 G0.2   G         0.2        125. 
#>  5 G0.25  G         0.25        83.4
#>  6 G0.3   G         0.3        101. 
#>  7 N0.05  N         0.05       371. 
#>  8 N0.1   N         0.1        226. 
#>  9 N0.15  N         0.15       123. 
#> 10 N0.2   N         0.2        100. 
#> # ℹ 26 more rows
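A common follow-up is to rank samples by `pc_distance` to spot the most outlying ones. The sketch below assumes that romic and dplyr are installed and that `brauer_2008_tidy` is the bundled dataset used in the example above; the choice of keeping the top five samples is arbitrary.

```r
library(romic)
library(dplyr)

# Compute distances, then sort samples from most to least distant
# from the center of the data and keep the five most extreme.
top_outliers <- calculate_sample_mahalanobis_distances(brauer_2008_tidy) %>%
  arrange(desc(pc_distance)) %>%
  slice_head(n = 5)

top_outliers
```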