Reads transcriptomics data from one or several files, or from a data frame, and creates a
SummarizedExperiment object.
Usage
rna.read_data(
data = NULL,
files.ind = NULL,
expdesign = NULL,
csvsep = ";",
dec = ".",
sheet = 1,
name = "SymbolID",
id = "gene_id",
values = "FPKM",
id2name.table = NULL,
id2name.id = NULL,
id2name.name = NULL,
pfx.counts = "counts.",
rsd_thresh = NULL,
filt_type = c("condition", "complete", "fraction", NULL),
filt_thr = 3,
filt_min = NULL
)
Arguments
- data
File or data frame containing transcriptomics data, if files.ind is not used.
- files.ind
Prefixes of several files in the working directory containing transcriptomics data, if data is not used.
- expdesign
Experimental design as file path or data frame, if made previously.
- csvsep
Delimiter if reading CSV file(s).
- dec
Decimal separator if reading CSV, TSV, or TXT files.
- sheet
Sheet of an Excel file to be read.
- name
Header of column containing primary gene IDs (e.g., gene names).
- id
Header of column containing alternative gene IDs (e.g., locus IDs).
- values
Name of the column containing the values (if files.ind != NULL).
- id2name.table
File containing a table with ID to name mappings.
- id2name.id
Header of column containing alternative gene IDs in id2name.table.
- id2name.name
Header of column containing primary gene IDs in id2name.table.
- pfx.counts
Prefix in headers of columns containing gene abundances (if data != NULL).
- rsd_thresh
Relative standard deviation (RSD) threshold in % for genes. The RSD is calculated for each condition, and if the maximum RSD value determined for a given gene exceeds rsd_thresh, the gene is discarded. The RSD filter is applied before the missing value filters controlled by the three filt_ arguments.
- filt_type
(Character string) "complete", "condition" or "fraction". Sets the type of filtering applied. "complete" will only keep genes with valid values in all samples. "condition" will keep genes that have a maximum of filt_thr missing values in at least one condition. "fraction" will keep genes that have a filt_min fraction of valid values in all samples.
- filt_thr
(Integer) Sets the threshold for the allowed number of missing values in at least one condition if filt_type = "condition". In other words: "keep genes that have a maximum of 'filt_thr' missing values in at least one condition."
- filt_min
(Numeric) Sets the threshold for the minimum fraction of valid values required for any gene if filt_type = "fraction".
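
The filtering rules described for rsd_thresh, filt_type, filt_thr and filt_min can be illustrated with a small base R sketch. This is not the function's actual implementation: the toy matrix, the object names (counts, condition, keep_rsd, ...) and the choice to keep genes whose RSD cannot be computed are all assumptions made for illustration, and the thresholds are picked so that the three filter types give different results.

```r
# Toy abundance matrix (hypothetical data): 4 genes x 6 samples,
# two conditions (A, B) with three replicates each.
counts <- matrix(
  c(10, 11, 12, 20, 21, 22,   # g1: no missing values
    NA, NA,  5,  8,  9, 10,   # g2: 2 missing in A, 0 in B
    NA, NA, NA, NA, NA,  7,   # g3: 3 missing in A, 2 in B
    10, 50, 90, 20, 20, 20),  # g4: complete, but highly variable in A
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4),
                  c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3"))
)
condition <- c("A", "A", "A", "B", "B", "B")

# --- RSD filter: RSD (%) = 100 * sd / mean, computed per condition ---
rsd_thresh <- 50
rsd_by_condition <- sapply(unique(condition), function(cond) {
  x <- counts[, condition == cond, drop = FALSE]
  100 * apply(x, 1, sd, na.rm = TRUE) / rowMeans(x, na.rm = TRUE)
})
# Maximum RSD per gene; NA if it cannot be computed in any condition.
max_rsd <- apply(rsd_by_condition, 1, function(r) {
  if (all(is.na(r))) NA_real_ else max(r, na.rm = TRUE)
})
# Assumption: genes without a computable RSD are kept at this stage.
keep_rsd <- is.na(max_rsd) | max_rsd <= rsd_thresh

# --- Missing value filters ---
# Number of missing values per gene and condition.
na_by_condition <- sapply(unique(condition), function(cond) {
  rowSums(is.na(counts[, condition == cond, drop = FALSE]))
})

# "complete": valid values in all samples.
keep_complete <- rowSums(is.na(counts)) == 0

# "condition": at most filt_thr missing values in at least one condition.
filt_thr <- 1
keep_condition <- rowSums(na_by_condition <= filt_thr) > 0

# "fraction": at least a filt_min fraction of valid values over all samples.
filt_min <- 0.5
keep_fraction <- rowMeans(!is.na(counts)) >= filt_min

print(names(which(keep_rsd)))
print(names(which(keep_complete)))
print(names(which(keep_condition)))
print(names(which(keep_fraction)))
```

In this toy example g4 is the only gene removed by the RSD filter (its RSD in condition A is 80%), "complete" keeps g1 and g4, and both "condition" (with filt_thr = 1) and "fraction" (with filt_min = 0.5) keep g1, g2 and g4. With three replicates per condition, the default filt_thr = 3 would keep every gene, which is why a smaller threshold is used here.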