Methods for non-specific filtering of variables
Source: R/metabolomics_computation.R
met.FilterVariable.Rd
met.FilterVariable filters non-informative variables (i.e., features with very small values, near-constant values, or low repeatability) from the dataset, depending on the user-specified filtering method. The function applies a filtering method, ranks the variables within the dataset, and removes variables based on their rank. The final dataset should contain no more than 5000 variables for effective computing. If more features are present, the IQR filter is applied to keep only 5000 variables, even if filter = "none". Data filtering is performed as part of the data preparation workflow met.read_data.
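The rank-and-remove idea described above can be illustrated with a short base R sketch. This is not the package's implementation: `rank_filter_sketch`, its arguments, and the example matrix are hypothetical names introduced here, and the sketch assumes a numeric matrix with samples in rows and variables in columns. The `nrsd` statistic is computed as MAD divided by median, which is an assumption about its definition.

```r
# Hypothetical helper (not part of the package): rank variables (columns) by a
# statistic and keep the highest-ranked ones, capped at `max_vars`.
rank_filter_sketch <- function(x, method = "iqr", remain_num = ncol(x),
                               max_vars = 5000) {
  # Pick the ranking statistic; the last, unnamed argument of switch() is the
  # fallback and is only evaluated when `method` matches nothing.
  stat_fun <- switch(method,
    rsd    = function(v) sd(v, na.rm = TRUE) / abs(mean(v, na.rm = TRUE)),
    nrsd   = function(v) mad(v, na.rm = TRUE) / abs(median(v, na.rm = TRUE)),
    mean   = function(v) mean(v, na.rm = TRUE),
    median = function(v) median(v, na.rm = TRUE),
    sd     = function(v) sd(v, na.rm = TRUE),
    mad    = function(v) mad(v, na.rm = TRUE),
    iqr    = function(v) IQR(v, na.rm = TRUE),
    stop("unknown method: ", method)
  )
  score  <- apply(x, 2, stat_fun)                  # one score per variable
  n_keep <- min(remain_num, max_vars, ncol(x))     # never keep more than the cap
  keep   <- order(score, decreasing = TRUE)[seq_len(n_keep)]
  x[, sort(keep), drop = FALSE]                    # preserve original column order
}

# Toy data: a constant variable, a nearly constant one, and a variable one.
x <- cbind(
  flat  = rep(5, 6),
  small = c(1, 1.1, 0.9, 1, 1.05, 0.95),
  wide  = c(1, 10, 3, 8, 2, 12)
)
filtered <- rank_filter_sketch(x, method = "iqr", remain_num = 2)
colnames(filtered)  # "small" "wide"
```

The constant column has an IQR of zero, ranks last, and is the one removed when two variables are kept.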
Usage
met.FilterVariable(
  mSetObj = NA,
  filter = "none",
  remain.num = NULL,
  qcFilter = "F",
  qc.rsd = 0.25,
  all.rsd = NULL
)
Arguments
- mSetObj
Enter the name of the created mSet object (see InitDataObjects and Read.TextData).
- filter
(Character) Select an option for non-specific filtering based on the following ranking criteria:
"none": applies no non-specific filtering.
"rsd": filters features with low relative standard deviation across the dataset.
"nrsd": filters features with low non-parametric relative standard deviation across the dataset.
"mean": filters features with low mean intensity value across the dataset.
"median": filters features with low median intensity value across the dataset.
"sd": filters features with low absolute standard deviation across the dataset.
"mad": filters features with low median absolute deviation across the dataset.
"iqr": filters features with a low inter-quartile range across the dataset.
- remain.num
(Numeric) Enter the number of variables to keep in your dataset. If NULL, the following empirical rules are applied during data filtering with the method specified in filter:
Less than 250 variables: 5% will be filtered
250 - 500 variables: 10% will be filtered
500 - 1000 variables: 25% will be filtered
More than 1000 variables: 40% will be filtered
- qcFilter
(Logical) Filter the variables based on the relative standard deviation of features in QC samples (TRUE), or not (FALSE). This filter can be applied in addition to other, non-specific filtering methods.
- qc.rsd
(Numeric) Define the relative standard deviation cut-off in %. Variables with an RSD greater than this number will be removed from the dataset. It is only necessary to specify this argument if qcFilter is TRUE. Otherwise, it will be ignored.
- all.rsd
(Numeric or NULL) Apply a filter based on the in-group relative standard deviation (RSD, in %), or not (NULL). For this filter, the RSD of every feature is calculated for every group in the dataset. If the RSD of a variable in any group exceeds the indicated threshold, it is removed from the dataset. This filter can be applied in addition to other filtering methods and is especially useful for data with technical replicates.
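Two of the rules described in the arguments above can be sketched in base R. The helpers `n_keep_sketch` and `keep_by_group_rsd_sketch` are hypothetical and are not taken from the package. The sketch assumes that bracket boundaries (exactly 250, 500, or 1000 variables) fall into the upper bracket, that the number of removed variables is rounded to the nearest integer, that the threshold is given in percent, and that every group has at least two samples with no missing values.

```r
# Hypothetical helper: number of variables kept when remain.num is NULL.
# Assumption: boundary counts (250, 500, 1000) use the higher removal fraction.
n_keep_sketch <- function(n_vars, max_vars = 5000) {
  if (n_vars < 250) {
    frac_removed <- 0.05
  } else if (n_vars < 500) {
    frac_removed <- 0.10
  } else if (n_vars < 1000) {
    frac_removed <- 0.25
  } else {
    frac_removed <- 0.40
  }
  # Assumption: round the removed count, then apply the 5000-variable cap.
  min(n_vars - round(n_vars * frac_removed), max_vars)
}

# Relative standard deviation in percent.
rsd_pct <- function(v) 100 * sd(v) / abs(mean(v))

# Hypothetical helper: TRUE for variables (columns) whose RSD stays at or
# below `max_rsd_pct` in every group; FALSE if any group exceeds it.
keep_by_group_rsd_sketch <- function(x, groups, max_rsd_pct) {
  worst <- apply(x, 2, function(v) max(tapply(v, groups, rsd_pct)))
  worst <= max_rsd_pct
}

n_keep_sketch(200)    # 190
n_keep_sketch(10000)  # 5000 (capped)

# Toy data: two groups of three replicates each.
x <- cbind(
  stable = c(10, 10.2, 9.8, 20, 20.5, 19.5),
  noisy  = c(10, 30, 5, 20, 21, 19)
)
groups <- c("A", "A", "A", "B", "B", "B")
keep <- keep_by_group_rsd_sketch(x, groups, max_rsd_pct = 20)
x_filtered <- x[, keep, drop = FALSE]
colnames(x_filtered)  # "stable"
```

The `noisy` variable is consistent in group B but varies strongly in group A, so it is dropped: exceeding the threshold in a single group is enough. A QC-based filter follows the same pattern if the RSD is computed over the QC sample rows only.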
References
Adapted from FilterVariable (https://github.com/xia-lab/MetaboAnalystR).
Author
Nicolas T. Wirth mail.nicowirth@gmail.com Technical University of Denmark License: GNU GPL (>= 2)