Title: | Dysregulated Pathway Identification Analysis |
---|---|
Description: | Identifies dysregulated pathways from a pre-ranked gene pair list, using a fast algorithm to keep computation time low. Requires the data in the package 'DysPIAData'. |
Authors: | Limei Wang [aut, cre], Jin Li [aut, ctb] |
Maintainer: | Limei Wang <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.4 |
Built: | 2024-11-07 04:26:50 UTC |
Source: | https://github.com/lemonwang2020/dyspia |
Calculates DysPIA statistics for a given query gene pair set.
calcDyspiaStat( stats, selectedStats, DyspiaParam = 1, returnAllExtremes = FALSE, returnLeadingEdge = FALSE )
stats |
Named numeric vector with gene pair-level statistics sorted in decreasing order (order is not checked). |
selectedStats |
Indexes of selected gene pairs in the 'stats' array. |
DyspiaParam |
DysPIA weight parameter (0 is unweighted, suggested value is 1). |
returnAllExtremes |
If TRUE return not only the most extreme point, but all of them. Can be used for enrichment plot. |
returnLeadingEdge |
If TRUE return also leading edge gene pairs. |
The value of the DysPIA statistic if both returnAllExtremes and returnLeadingEdge are FALSE. Otherwise, returns a list with the following elements:
res – value of DysPIA statistic;
tops – vector of top peak values of cumulative enrichment statistic for each gene pair;
bottoms – vector of bottom peak values of cumulative enrichment statistic for each gene pair;
leadingEdge – vector with indexes of leading edge gene pairs that drive the enrichment.
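As a rough illustration of what such a statistic looks like, the sketch below computes a GSEA-style weighted running sum in base R. It assumes that the DysPIA statistic follows the familiar scheme of stepping up at selected gene pairs (weighted by the absolute score raised to the weight parameter) and stepping down elsewhere; that assumption, the helper name `runningSumStat` and the toy data are not taken from the package.

```r
# Hypothetical helper, not part of DysPIA: a GSEA-style weighted running sum.
runningSumStat <- function(stats, selectedStats, param = 1) {
  selected <- sort(selectedStats)
  n <- length(stats)
  k <- length(selected)
  weights <- abs(stats[selected])^param
  steps <- rep(-1 / (n - k), n)           # step down at every unselected position
  steps[selected] <- weights / sum(weights)  # step up at selected positions
  running <- cumsum(steps)
  running[which.max(abs(running))]        # most extreme deviation from zero
}

# Toy scores, already sorted in decreasing order
stats <- c(a = 3, b = 2, c = 1, d = -1, e = -2, f = -3)
topSet <- runningSumStat(stats, c(1, 2))     # set concentrated at the top
bottomSet <- runningSumStat(stats, c(5, 6))  # set concentrated at the bottom
print(c(top = topSet, bottom = bottomSet))
```

A set concentrated at the top of the ranking gives a positive value and a set concentrated at the bottom gives a negative one, which is the behaviour an enrichment plot visualizes.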
Calculates DysPIA statistic values for all the prefixes of a gene pair set
calcDyspiaStatCumulative(stats, selectedStats, DyspiaParam)
stats |
Named numeric vector with gene pair-level statistics sorted in decreasing order (order is not checked). |
selectedStats |
Indexes of selected gene pairs in the 'stats' array. |
DyspiaParam |
DysPIA weight parameter (0 is unweighted, suggested value is 1). |
Numeric vector of DysPIA statistics for all prefixes of selectedStats.
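To make "all prefixes" concrete, here is a naive base-R sketch that recomputes a GSEA-style running-sum statistic for the first 1, 2, ..., k entries of the selected indexes. Both helpers (`esStat`, `prefixStats`) are hypothetical, the statistic itself is an assumed stand-in for the DysPIA one, and the loop is far slower than a dedicated cumulative algorithm would be.

```r
# Hypothetical stand-in for the enrichment statistic (GSEA-style running sum)
esStat <- function(stats, selected, param = 1) {
  steps <- rep(-1 / (length(stats) - length(selected)), length(stats))
  w <- abs(stats[selected])^param
  steps[selected] <- w / sum(w)
  running <- cumsum(steps)
  running[which.max(abs(running))]
}

# Naive version: one full recomputation per prefix of selectedStats
prefixStats <- function(stats, selectedStats, param = 1) {
  vapply(seq_along(selectedStats),
         function(i) esStat(stats, selectedStats[seq_len(i)], param),
         numeric(1))
}

stats <- c(3, 2, 1, -1, -2, -3)        # sorted in decreasing order
result <- prefixStats(stats, c(2, 6, 1))
print(result)                          # one value per prefix: {2}, {2, 6}, {2, 6, 1}
```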
Calculates DysPIA statistic values for the gene pair sets
calcDyspiaStatCumulativeBatch( stats, DyspiaParam, pathwayScores, pathwaysSizes, iterations, seed )
stats |
Named numeric vector with gene pair-level statistics sorted in decreasing order (order is not checked). |
DyspiaParam |
DysPIA weight parameter (0 is unweighted, suggested value is 1). |
pathwayScores |
Vector with enrichment scores for the pathways in the database. |
pathwaysSizes |
Vector of pathway sizes. |
iterations |
Number of iterations. |
seed |
Seed vector. |
List of DysPIA statistics for gene pair sets.
Calculates differential mutual information.
calEdgeCorScore_ESEA( dataset, class.labels, controlcharacter, casecharacter, background )
dataset |
Matrix of gene expression values (row names are genes, column names are samples). |
class.labels |
Vector of binary labels. |
controlcharacter |
Character label of the control group in the class labels. |
casecharacter |
Character label of the case group in the class labels. |
background |
Matrix of the edges' background. |
A vector of the aberrant correlation in phenotype P based on mutual information (MI) for each edge.
data(gene_expression_p53, class.labels_p53, sample_background)
ESEAscore_p53 <- calEdgeCorScore_ESEA(gene_expression_p53, class.labels_p53, "WT", "MUT", sample_background)
The labels for the 50 cell lines in p53 data. Control group's label is 'WT', case group's label is 'MUT'.
data(class.labels_p53)
Calculates the dysregulated gene pair score (DysGPS) for each gene pair, using a two-sample Welch's t-test of gene pairs between case and control samples. The package 'DysPIAData', which includes the background data, must be loaded.
DysGPS( dataset, class.labels, controlcharacter, casecharacter, background = combined_background )
dataset |
Matrix of gene expression values (row names are genes, column names are samples). |
class.labels |
Vector of category labels. |
controlcharacter |
Character label of the control group in the class labels. |
casecharacter |
Character label of the case group in the class labels. |
background |
Matrix of the gene pairs' background. The default is 'combined_background', which includes real pathway gene pairs and randomly produced gene pairs. The 'combined_background' is included in 'DysPIAData'. |
A vector of DysGPS for each gene pair.
data(gene_expression_p53, class.labels_p53, sample_background)
DysGPS_sample <- DysGPS(gene_expression_p53, class.labels_p53, "WT", "MUT", sample_background)
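The Welch's t-test mentioned in the description can be tried out in base R with `t.test()`, which does not assume equal variances by default. The sketch below is only an illustration: how DysPIA derives a per-sample value for a gene pair is not documented here, so the product of standardized expression values is used purely as a placeholder, and `pairWelchScore` and the toy matrix are hypothetical.

```r
# Hypothetical helper, not the DysPIA implementation.
pairWelchScore <- function(dataset, class.labels, controlcharacter,
                           casecharacter, geneA, geneB) {
  z <- t(scale(t(dataset)))             # standardize each gene across samples
  pairValue <- z[geneA, ] * z[geneB, ]  # placeholder per-sample pair value (assumption)
  test <- t.test(pairValue[class.labels == casecharacter],
                 pairValue[class.labels == controlcharacter])  # Welch by default
  unname(test$statistic)
}

# Toy data: two genes that are anti-correlated in controls and correlated in cases
toyExpr <- rbind(
  g1 = c(1, 2, 3, 4, 5,   1, 2, 3, 4, 5),
  g2 = c(5, 4, 3, 2, 1.5, 1, 2, 3, 4, 5.5)
)
colnames(toyExpr) <- paste0("s", 1:10)
toyLabels <- rep(c("WT", "MUT"), each = 5)

score <- pairWelchScore(toyExpr, toyLabels, "WT", "MUT", "g1", "g2")
print(score)
```

With this placeholder, a positive score means the pair's co-expression is stronger in the case group than in the control group.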
The score vector of 164923 gene pairs from the p53 dataset. It can be loaded from the example datasets of the R package 'DysPIA', or obtained by running DysGPS(); for details, see DysGPS.R.
data(DysGPS_p53)
Runs Dysregulated Pathway Identification Analysis (DysPIA). The package 'DysPIAData', which includes the background data, must be loaded.
DysPIA( pathwayDB = "kegg", stats, nperm = 10000, minSize = 15, maxSize = 1000, nproc = 0, DyspiaParam = 1, BPPARAM = NULL )
pathwayDB |
Name of the pathway database. Eight databases are available: reactome, kegg, biocarta, panther, pathbank, nci, smpdb, pharmgkb. The default value is "kegg". |
stats |
Named vector of CILP scores for each gene pair. Names should be the same as in pathways. |
nperm |
Number of permutations to do. The minimal possible nominal p-value is about 1/nperm. The default value is 10000. |
minSize |
Minimal size of a gene pair set to test. All pathways below the threshold are excluded. The default value is 15. |
maxSize |
Maximal size of a gene pair set to test. All pathways above the threshold are excluded. The default value is 1000. |
nproc |
If not equal to zero, sets BPPARAM to use nproc workers (default = 0). |
DyspiaParam |
DysPIA parameter value; all gene pair-level statistics are raised to the power of 'DyspiaParam' before calculation of DysPIA enrichment scores. |
BPPARAM |
Parallelization parameter used in bplapply. Can be used to specify the cluster to run on. If not initialized explicitly or by setting 'nproc', the default value 'bpparam()' is used. |
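The effect of 'minSize' and 'maxSize' can be pictured with a small base-R sketch. It assumes that a pathway's size is counted after dropping gene pairs that have no score in 'stats', as the 'size' column of the result suggests; `filterBySize`, the pathway names and the pair names are all hypothetical.

```r
# Hypothetical helper, not a DysPIA function.
filterBySize <- function(pathways, stats, minSize = 15, maxSize = 1000) {
  # Keep only gene pairs that actually have a score in 'stats'
  present <- lapply(pathways, intersect, names(stats))
  sizes <- lengths(present)
  # Drop pathways below minSize or above maxSize
  present[sizes >= minSize & sizes <= maxSize]
}

# Toy scores; the pair names are placeholders, not the DysPIA naming scheme
stats <- c(pair1 = 2.1, pair2 = 1.4, pair3 = -0.3, pair4 = -1.8)
pathways <- list(
  tooSmall = c("pair1", "pairX"),
  kept     = c("pair1", "pair2", "pair4", "pairY"),
  tooLarge = c("pair1", "pair2", "pair3", "pair4")
)

filtered <- filterBySize(pathways, stats, minSize = 2, maxSize = 3)
print(names(filtered))
```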
A table with DysPIA results. Each row corresponds to a tested pathway. The columns are the following:
pathway – name of the pathway as in 'names(pathway)';
pval – an enrichment p-value;
padj – a BH-adjusted p-value;
DysPS – enrichment score, same as in Broad DysPIA implementation;
NDysPS – enrichment score normalized to mean enrichment of random samples of the same size;
nMoreExtreme – the number of times a random gene pair set had a more extreme enrichment score value;
size – size of the pathway after removing gene pairs not present in 'names(stats)';
leadingEdge – vector with indexes of leading edge gene pairs that drive the enrichment.
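The relationship between these columns can be sketched with base R. The code below assumes a common permutation scheme in which the observed score is compared only with random scores of the same sign and the p-value is (nMoreExtreme + 1) / (number of same-sign random scores + 1); this scheme, the helper `permSummary` and all numbers are illustrative assumptions rather than the package's documented internals. The BH adjustment uses the standard `p.adjust()` function.

```r
# Hypothetical helper: summarize one pathway against its random scores.
permSummary <- function(observed, nullScores) {
  # Assumption: compare only with random scores of the same sign
  sameSign <- nullScores[sign(nullScores) == sign(observed)]
  nMoreExtreme <- sum(abs(sameSign) >= abs(observed))
  c(pval = (nMoreExtreme + 1) / (length(sameSign) + 1),
    normalized = observed / mean(abs(sameSign)),  # score relative to the random mean
    nMoreExtreme = nMoreExtreme)
}

# Toy example: one observed score and eight random scores
summaryOne <- permSummary(0.8, c(0.2, 0.4, 0.5, 0.9, -0.3, -0.2, 0.6, 0.1))
print(summaryOne)

# BH adjustment across several pathways' p-values
padj <- p.adjust(c(0.01, 0.04, 0.03), method = "BH")
print(padj)
```

Under this scheme the smallest attainable p-value shrinks roughly in proportion to 1/nperm, which matches the note on the 'nperm' argument.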
data(pathway_list, package = "DysPIAData")
data(DysGPS_p53)
DyspiaRes_p53 <- DysPIA("kegg", DysGPS_p53, nperm = 100, minSize = 20, maxSize = 100)
The list includes 81 pathway results from 'DysPIA.R', used as an example in 'DyspiaSig.R'.
data(DyspiaRes_p53)
Returns the significant summary of DysPIA results.
DyspiaSig(DyspiaRes, fdr)
DyspiaRes |
Table with results of running DysPIA(). |
fdr |
Significance threshold for 'padj' (a BH-adjusted p-value). |
A list of significant DysPIA results, including correlation gain and correlation loss.
data(pathway_list, package = "DysPIAData")
data(DyspiaRes_p53)
summary_p53 <- DyspiaSig(DyspiaRes_p53, 0.05) # filter with padj < 0.05
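The kind of filtering this function performs can be approximated on a plain data frame. The sketch below keeps rows with 'padj' below the threshold and splits them by the sign of 'DysPS'; the split by sign is an assumption, and which sign corresponds to correlation gain or loss is not stated here, so the two groups are simply labelled positive and negative. `sigSummary` and the toy table are hypothetical.

```r
# Hypothetical helper, not the DyspiaSig implementation.
sigSummary <- function(res, fdr) {
  sig <- res[res$padj < fdr, , drop = FALSE]   # keep significant pathways only
  list(positive = sig[sig$DysPS > 0, , drop = FALSE],
       negative = sig[sig$DysPS < 0, , drop = FALSE])
}

# Toy result table with the columns used above
toyRes <- data.frame(
  pathway = c("pwA", "pwB", "pwC", "pwD"),
  padj    = c(0.01, 0.20, 0.03, 0.049),
  DysPS   = c(0.7, 0.5, -0.6, -0.2)
)

out <- sigSummary(toyRes, 0.05)
print(out)
```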
Runs dysregulated pathway identification analysis for preprocessed input data.
DyspiaSimpleImpl( pathwayScores, pathwaysSizes, pathwaysFiltered, leadingEdges, permPerProc, seeds, toKeepLength, stats, BPPARAM )
pathwayScores |
Vector with enrichment scores for the pathways in the database. |
pathwaysSizes |
Vector of pathway sizes. |
pathwaysFiltered |
Filtered pathways. |
leadingEdges |
Leading edge gene pairs. |
permPerProc |
Parallelization parameter for permutations. |
seeds |
Seed vector. |
toKeepLength |
Number of 'pathways' that meet the condition for 'minSize' and 'maxSize'. |
stats |
Named vector of gene pair-level scores. Names should be the same as in pathways of 'pathwayDB'. |
BPPARAM |
Parallelization parameter used in bplapply. Can be used to specify the cluster to run on. If not initialized explicitly or by setting 'nproc', the default value 'bpparam()' is used. |
A table with DysPIA results. Each row corresponds to a tested pathway. The columns are the following:
pathway – name of the pathway as in 'names(pathway)';
pval – an enrichment p-value;
padj – a BH-adjusted p-value;
DysPS – enrichment score, same as in Broad DysPIA implementation;
NDysPS – enrichment score normalized to mean enrichment of random samples of the same size;
nMoreExtreme – the number of times a random gene pair set had a more extreme enrichment score value;
size – size of the pathway after removing gene pairs not present in 'names(stats)';
leadingEdge – vector with indexes of leading edge gene pairs that drive the enrichment.
A dataset of transcriptional profiles from p53+ and p53 mutant cancer cell lines. It includes the normalized gene expression for 6385 genes in 50 samples. Row names are genes, column names are samples.
data(gene_expression_p53)
The background list used in 'DysGPS.R' and 'calEdgeCorScore_ESEA.R', which is a part of the 'combined_background' in 'DysPIAData'.
data(sample_background)
Sets up parameter BPPARAM value.
setUpBPPARAM(nproc = 0, BPPARAM = NULL)
nproc |
If not equal to zero, sets BPPARAM to use nproc workers (default = 0). |
BPPARAM |
Parallelization parameter used in bplapply. Can be used to specify the cluster to run on. If not initialized explicitly or by setting 'nproc', the default value 'bpparam()' is used. |
The value of the BPPARAM parameter.