prot.workflow performs variance stabilization normalization (prot.normalize_vsn), missing value imputation (prot.impute), principal component analysis (prot.pca), differential enrichment testing (prot.test_diff), and pathway enrichment analysis. If desired, standardized plots and a report are generated and exported as separate files.
Usage
prot.workflow(
  se,
  normalize = TRUE,
  imp_fun = c("SampMin", "man", "bpca", "knn", "QRILC", "MLE", "MinDet", "MinProb",
    "min", "zero", "mixed", "nbavg"),
  q = 0.01,
  knn.rowmax = 0.5,
  type = c("all", "control", "manual"),
  control = NULL,
  contrast = NULL,
  alpha = 0.05,
  alpha_pathways = 0.1,
  lfc = 1,
  heatmap.show_all = TRUE,
  heatmap.kmeans = FALSE,
  k = 6,
  heatmap.col_limit = NA,
  heatmap.show_row_names = TRUE,
  heatmap.row_font_size = 6,
  volcano.add_names = FALSE,
  volcano.label_size = 2.5,
  volcano.adjusted = TRUE,
  plot = FALSE,
  export = FALSE,
  report = TRUE,
  report.dir = NULL,
  pathway_enrichment = FALSE,
  pathway_kegg = FALSE,
  kegg_organism = NULL,
  custom_pathways = NULL,
  out.dir = NULL
)
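As an illustration of the signature above, a call restricted to manually defined contrasts might look like the following sketch. The condition names ("TreatA", "TreatB", "Ctrl") are hypothetical, and the call itself only runs if prot.workflow and an se object are available in the session; everything else is base R.

```r
# Hypothetical condition names; replace them with the conditions present in `se`.
contrast <- c("TreatA_vs_Ctrl", "TreatB_vs_Ctrl")

# Check that every contrast follows the "ConditionA_vs_ConditionB" pattern.
parts <- strsplit(contrast, "_vs_", fixed = TRUE)
stopifnot(all(lengths(parts) == 2))

# Arguments for the workflow; the names are taken from the Usage block.
workflow_args <- list(
  normalize = TRUE,
  imp_fun   = "MinProb",
  q         = 0.01,
  type      = "manual",
  contrast  = contrast,
  alpha     = 0.05,
  lfc       = 1,
  plot      = FALSE,
  export    = FALSE,
  report    = FALSE
)

# Run only when the package providing prot.workflow is loaded and `se` exists.
if (exists("prot.workflow") && exists("se")) {
  results <- do.call(prot.workflow, c(list(se = se), workflow_args))
}
```

Collecting the arguments in a list keeps the settings of a run in one place, so they can be stored alongside the results.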
Arguments
- se
SummarizedExperiment object containing proteomics data parsed with prot.read_data.
- normalize
(Logical) Should the data be normalized via variance stabilization normalization?
- imp_fun
(Character string) Function used for data imputation: "SampMin", "man", "bpca", "knn", "QRILC", "MLE", "MinDet", "MinProb", "min", "zero", "mixed", or "nbavg". See prot.impute for details.
- q
(Numeric) q value for imputing missing values with method imp_fun = 'MinProb'.
- knn.rowmax
(Numeric) The maximum proportion of missing data allowed in any row for imp_fun = 'knn'. Default: 0.5 (50%).
- type
(Character string) Type of differential analysis to perform: "all" (contrast each condition with every other condition), "control" (contrast each condition with a defined control condition), or "manual" (test only manually defined contrasts).
- control
(Character string) The name of the control condition if type = "control".
- contrast
(Character string or vector of strings) The contrasts to be tested if type = "manual", in the form "ConditionA_vs_ConditionB" or c("ConditionA_vs_ConditionC", "ConditionB_vs_ConditionC").
- alpha
(Numeric) Significance threshold for adjusted p values.
- alpha_pathways
(Numeric) Significance threshold for adjusted p values in pathway enrichment analysis.
- lfc
(Numeric) Relevance threshold for log2(fold change) values. Only proteins with a |log2(fold change)| value above lfc for a given contrast are considered "significant" (if they additionally fulfill the alpha criterion).
- heatmap.show_all
(Logical) Shall all samples be displayed in the heat map (TRUE) or only the samples contained in the defined contrast (FALSE)?
- heatmap.kmeans
(Logical) Shall the proteins be clustered in the heat map with the k-means method (TRUE) or not (FALSE)?
- k
(Integer) Number of protein clusters in the heat map if heatmap.kmeans = TRUE.
- heatmap.col_limit
(Integer) Define the outer breaks in the heat map legend. Example: if heatmap.col_limit = 3, the color scale will span from -3 to 3. All values below -3 will have the same color as -3, and all values above 3 will have the same color as 3.
- heatmap.show_row_names
(Logical) Show protein names in the heat map (TRUE) or not (FALSE).
- heatmap.row_font_size
(Numeric) Font size of protein names in heat maps if heatmap.show_row_names = TRUE.
- volcano.add_names
(Logical) Show protein names in volcano plots (TRUE) or not (FALSE).
- volcano.label_size
(Numeric) Font size of protein names in volcano plots if volcano.add_names = TRUE.
- volcano.adjusted
(Logical) Shall adjusted p values be shown on the y axis of volcano plots (TRUE) or raw p values (FALSE)?
- plot
(Logical) Show the generated plots in the Plots pane of RStudio (TRUE) or not (FALSE).
- export
(Logical) Export the generated plots as PNG and PDF files (TRUE) or not (FALSE).
- report
(Logical) Render and export a report in PDF and HTML format that summarizes the results (TRUE) or not (FALSE).
- report.dir
(Character string) Provide the name of or path to a folder into which the report will be saved.
- pathway_enrichment
(Logical) Perform pathway over-representation analysis for each tested contrast (TRUE) or not (FALSE).
- pathway_kegg
(Logical) Perform pathway over-representation analysis with gene sets in the KEGG database (TRUE) or not (FALSE).
- kegg_organism
(Character string) Identifier of the organism in the KEGG database (if pathway_kegg = TRUE).
- custom_pathways
(Data frame) Custom pathway annotations. The table must contain a "Pathway" column listing the pathways identified in the studied organism, and an "Accession" column listing the proteins (or genes) each pathway is composed of. The Accession entries must match the protein IDs.
- out.dir
(Character string) Absolute path to the location where the result TXT files should be exported.
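The alpha and lfc arguments described above act together: a protein is flagged for a contrast only if it passes both thresholds. The following base R sketch shows that combined criterion on made-up values. The data frame, its column names, and the strict handling of the boundaries are assumptions for illustration, not the package's internal representation.

```r
# Made-up results for a single contrast; column names are hypothetical.
res <- data.frame(
  protein = c("P1", "P2", "P3", "P4"),
  log2_fc = c(2.3, -1.8, 0.4, 1.5),
  p_adj   = c(0.001, 0.03, 0.002, 0.20),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # significance threshold for adjusted p values
lfc   <- 1     # relevance threshold for |log2(fold change)|

# A protein counts as significant only if both criteria hold
# (strict inequalities are an assumption of this sketch).
res$significant <- res$p_adj < alpha & abs(res$log2_fc) > lfc

# Proteins passing both thresholds: P1 and P2.
significant_proteins <- res$protein[res$significant]
print(significant_proteins)
```

P3 has a small adjusted p value but a fold change below lfc, and P4 has a large fold change but fails the alpha criterion, so neither is flagged.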