The Plant Metabolic Network (PMN) provides a broad network of plant metabolic pathway databases that contain curated information from the literature and computational analyses about the genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism in plants.
For Table 1, you might consider adding another Gene Set resource based on curated pathways. These are analogous to GO-Biological Process terms, but are much more focused and constrained. In fact, they also include small molecules and drugs, so they can serve as more than just gene sets.
Likewise, for Table 2, you could add a number of interaction resources with pathway data. As co-founder of WikiPathways, I have to recommend that one in particular! :) It's 100% free, open source and open access. I can also recommend Reactome and Pathway Commons, the latter of which compiles pathway data from multiple sources into BioPAX data format.
You can download all human pathways from WikiPathways in multiple formats, or parse just the Entrez Genes in human pathways from this single dump file. The advantage of the first option is that you get the original data, as curated by contributors; the disadvantage is that you have to perform the ID mapping yourself to unify to Entrez and your preferred small molecule system. The advantage of the second option (the dump file) is that the Entrez ID unification has been done for you; the disadvantage is that anything that didn't map to Entrez is simply discarded (including drugs and small molecules!).
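To make the trade-off of the second option concrete, here is a minimal sketch of parsing a pathway gene-set file. It assumes a GMT-like tab-separated layout (pathway name, description, then identifiers); the real dump file may be laid out differently, and `parse_gmt` and the sample lines are hypothetical, for illustration only.

```python
from typing import Dict, Set


def parse_gmt(text: str) -> Dict[str, Set[str]]:
    """Parse GMT-like text into {pathway name: set of Entrez Gene IDs}.

    Assumed layout per line: name <TAB> description <TAB> id1 <TAB> id2 ...
    """
    pathways: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *ids = fields
        # Keep only purely numeric IDs (Entrez-style); everything else,
        # such as small-molecule identifiers, is dropped at this step.
        entrez = {i for i in ids if i.isdigit()}
        if entrez:
            pathways[name] = entrez
    return pathways


# Hypothetical two-pathway sample; the CHEBI entry stands in for a small molecule.
sample = "Pathway A\tdesc\t1017\t1019\tCHEBI:15377\nPathway B\tdesc\t7157\n"
parsed = parse_gmt(sample)
```

Note how the non-Entrez identifier silently disappears from `parsed`, which is exactly the information loss described above.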
In the past, we used the versions of Reactome, KEGG, and BioCarta provided by MSigDB [1, 2]. MSigDB version 5.0 was released in April, but it's unclear whether the pathway resources were updated. However, the "C2: Canonical Pathways" (CP) collection integrates 9 pathway resources, so I think we should create a C2: CP metanode with a node for each MSigDB CP gene set.
We can have a separate metanode for WikiPathways [3, 4]. The open and crowdsourced nature of WikiPathways is ideal. The inclusion of compounds, tissues, and diseases in addition to genes in these pathways could provide a major performance boost for our method. The benefit will depend on how frequently non-gene entities are included in these pathways. What percent of pathways include diseases, tissues, or drugs? Additionally, are non-gene entities identified as free text, or are they structured by a standardized vocabulary?
@alexanderpico, thanks. Figures 1 and 2 do help; however, I am more interested in edge-based measures of overlap. Do you have a general sense of whether the same pathways are represented in multiple databases?
My interpretation of Figure 1 is that it provides a lower bound of uniqueness. The fact that there are many genes unique in KEGG, Reactome, and WikiPathways warrants the inclusion of all three resources. However, it doesn't answer whether the common genes are from duplicated pathways or not.
That measure of overlap is fraught with caveats relating to exactly how edges are modeled. When each of the three resources mentioned here converts to a single exchange format, like BioPAX, for example, we each make a unique set of mapping decisions and compromises. Nevertheless, you're absolutely right that node overlap is a lower bound, but I don't have a good estimate for edge overlap. Just browsing the pathway titles is the most convincing way to see that we cover much of the same ground: metabolism, signaling and gene regulation.
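One rough way to probe whether shared genes come from duplicated pathways, short of a true edge-based comparison, is to compare gene sets pathway by pathway. The sketch below uses the Jaccard index for this; it is an illustration under the assumption that each resource has already been reduced to {pathway name: set of gene IDs}. The function names and the toy data are hypothetical, and the approach ignores edges entirely.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(resource_a, resource_b):
    """For each pathway in resource_a, find its closest pathway in resource_b."""
    matches = {}
    for name_a, genes_a in resource_a.items():
        scored = [
            (jaccard(genes_a, genes_b), name_b)
            for name_b, genes_b in resource_b.items()
        ]
        score, name_b = max(scored) if scored else (0.0, None)
        matches[name_a] = (name_b, score)
    return matches


# Toy gene sets, made up for illustration only.
resource_a = {"Cell cycle": {"1017", "1019", "595", "7157"}}
resource_b = {
    "Cell Cycle, Mitotic": {"1017", "1019", "595"},
    "Apoptosis": {"7157", "581"},
}
overlap = best_matches(resource_a, resource_b)
```

A high best-match score suggests the same pathway is represented in both resources, while many low scores alongside shared genes would suggest the genes recur in genuinely different pathways.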
We have downloaded, parsed, and combined MSigDB and WikiPathways (notebook, tsv results). In total, we identified 1,516 human pathways after removing a single duplicated pathway. Most pathways have fewer than 100 genes, but some have up to 1,000.
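For readers who want the gist of the combining step without opening the notebook, here is a minimal sketch. It assumes a duplicate means two pathways with identical gene sets, which may not be the criterion the notebook actually used; `combine_pathways` and the toy inputs are hypothetical.

```python
def combine_pathways(*resources):
    """Merge (resource name, {pathway: genes}) pairs into one dict.

    A pathway is dropped when an identical gene set has already been seen
    (an assumed definition of "duplicate").
    """
    combined = {}
    seen = set()
    for resource_name, pathways in resources:
        for name, genes in pathways.items():
            key = frozenset(genes)
            if key in seen:
                continue  # identical gene set already included
            seen.add(key)
            combined[(resource_name, name)] = set(genes)
    return combined


# Toy inputs, made up for illustration only.
first = ("resource_1", {"P1": {"1", "2", "3"}, "P2": {"4", "5"}})
second = ("resource_2", {"Q1": {"3", "2", "1"}, "Q2": {"6"}})
merged = combine_pathways(first, second)

# Pathway sizes, useful for checking the size distribution.
sizes = sorted(len(genes) for genes in merged.values())
```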
We extracted pathways from the previously-suggested dump file. We removed pathways without any human genes. From a total of 669 wikipathways, 187 were human. @alexanderpico, can you confirm that these are the expected numbers?
Hmm... You are correct that the dump file contains 187 human pathways (just did a browser FIND on the page for 'homo sapiens'), but there are ~293 human pathways in the standard collection. You can access these on the bulk download page in multiple formats, including plain text lists of (non-unified) datanode identifiers. This number is climbing as folks continue to add new content. For example, we have over 300 additional human pathways that are in the works at various stages of completion (or disrepair) that are not included in these bulk downloads.
Due to licensing issues with MSigDB, we've removed MSigDB pathways and switched to Pathway Commons as @alexanderpico initially suggested. Pathway Commons aggregates pathway and binary interaction data from many providers.
Pathway Commons data is freely available, but the data is licensed under the terms of each contributing database. For example, Pathway Commons includes KEGG pathways, which have a problematic license. Accordingly, we only include pathways from Pathway Commons resources that are openly licensed.
Specifically, I identified only two appropriate resources from the 8 Pathway Commons resources that contribute pathways (see notebook cell 7). These resources were Reactome and the Pathway Interaction Database (PID). Reactome is licensed as CC BY, while I believe PID data is in the public domain since it was created by US Government employees. At least the PID publication states, "All data in PID is freely available, without restriction on use." Since Reactome and PID contributed the majority of MSigDB pathways, I suspect that we didn't lose much information by abandoning MSigDB.
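The license-based selection amounts to an allowlist filter. Here is a minimal sketch, not the notebook's actual code: the license labels for Reactome and PID follow the statements above, the "restricted" label for KEGG is a placeholder for "problematic license", and the record layout and names are hypothetical.

```python
# Licenses treated as open for this sketch (an assumption, not a legal judgment).
OPEN_LICENSES = {"CC BY", "public domain"}

# Resource-to-license table; "restricted" is a placeholder label.
RESOURCE_LICENSE = {
    "Reactome": "CC BY",
    "PID": "public domain",
    "KEGG": "restricted",
}


def keep_open(pathways, resource_license=RESOURCE_LICENSE):
    """Return only pathways whose contributing resource is openly licensed.

    Resources missing from the table are excluded rather than assumed open.
    """
    return [
        p for p in pathways
        if resource_license.get(p["resource"]) in OPEN_LICENSES
    ]


# Hypothetical pathway records.
records = [
    {"name": "Pathway 1", "resource": "Reactome"},
    {"name": "Pathway 2", "resource": "KEGG"},
    {"name": "Pathway 3", "resource": "PID"},
    {"name": "Pathway 4", "resource": "Unknown provider"},
]
open_records = keep_open(records)
```

Excluding unknown providers by default is the conservative choice: a pathway only gets in once its license has been checked.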
Site-specific IGW soil remediation standards are back-calculated from the health-based Ground Water Quality Criteria, N.J.A.C. 7:9C, using the USEPA soil-water partition equation (USEPA 1996). The soil-water partition equation may be used with default assumptions, and is the only method that does not require any site-specific information. For this reason, this method may be used to develop an initial screening level to determine whether site-specific information is needed. In response to numerous requests for screening levels for the Impact to Ground Water pathway, the Department has provided such a table in the Soil Water Partition Document. These levels can be used as screening levels where no site-specific information exists. The soil-water partition equation can also be used when a site-specific Dilution Attenuation Factor (DAF) is developed, when site-specific organic carbon content is available, and for ionizable phenols when a soil pH is available that is different from the default pH of 5.3. Further guidance on this procedure is available in the Soil-Water Partition Equation guidance document. [pdf 89 Kb]
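As an illustration of the back-calculation, the sketch below implements the general form of the USEPA (1996) soil-water partition equation for organic contaminants: the target leachate concentration (ground water criterion times DAF) is multiplied by the sum of the soil-water partition coefficient and a pore water/pore air term. The function name is hypothetical, and every input value in the example is a made-up placeholder, not a Department default; consult the guidance document for the actual parameters. The pH dependence for ionizable phenols is not modeled.

```python
def igw_soil_standard(
    gwqc_mg_per_l,          # ground water quality criterion (mg/L)
    daf,                    # dilution attenuation factor (unitless)
    koc_l_per_kg,           # organic carbon partition coefficient (L/kg)
    foc,                    # fraction organic carbon in soil (unitless)
    theta_w,                # water-filled soil porosity (L water / L soil)
    theta_a,                # air-filled soil porosity (L air / L soil)
    henry_dimensionless,    # dimensionless Henry's law constant
    bulk_density_kg_per_l,  # dry soil bulk density (kg/L)
):
    """Back-calculate a soil concentration (mg/kg) from a ground water criterion."""
    target_leachate = gwqc_mg_per_l * daf  # C_w, mg/L
    k_d = koc_l_per_kg * foc               # soil-water partition coefficient, L/kg
    pore_term = (theta_w + theta_a * henry_dimensionless) / bulk_density_kg_per_l
    return target_leachate * (k_d + pore_term)


# Placeholder inputs for illustration only; NOT regulatory defaults.
example = igw_soil_standard(
    gwqc_mg_per_l=0.001,
    daf=20,
    koc_l_per_kg=100,
    foc=0.002,
    theta_w=0.3,
    theta_a=0.13,
    henry_dimensionless=0.2,
    bulk_density_kg_per_l=1.5,
)
```

Because the result scales linearly with the DAF and, through the partition coefficient, with organic carbon content, substituting site-specific values for either one changes the standard directly, which is why those two inputs are singled out above.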
The Department has identified methods to evaluate the impacts to ground water without the need to develop a site-specific IGW soil remediation standard. The nature and extent of contamination and other site-specific conditions will dictate whether there will be future impacts to ground water and determine if further remediation is required. When specified site conditions are met, the Department would not require further remediation for the impact to ground water pathway.
The Department modeled the transport of contaminants that exhibit very low mobility in soil and has determined that, under certain conditions, existing soil contamination is not likely to migrate to ground water. If the person conducting the remediation can demonstrate that at least a two-foot clean zone is present between the contamination and the water table, no remediation may be required for the impact to ground water pathway. A list of the chemicals that are considered to be immobile and further guidance on this option are available in the Immobile Contaminants Guidance Document. [pdf 27 Kb]
In this notebook we showcase how to use decoupleR for pathway activity inference with a down-sampled PBMCs 10X data-set. The data consists of 160 PBMCs from a Healthy Donor. The original data is freely available from 10x Genomics.
PROGENy is a comprehensive resource containing a curated collection of pathways and their target genes, with weights for each interaction. For this example we will use the human weights (mouse is also available) and we will use the top 100 responsive genes ranked by p-value. We can use decoupleR to retrieve it from OmniPath.
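To clarify what "top 100 responsive genes ranked by p-value" means, the sketch below performs that selection on a small in-memory table. It does not call decoupleR or OmniPath; `top_targets`, the row layout (pathway, gene, weight, p-value), and all numbers are hypothetical and made up for illustration.

```python
from collections import defaultdict


def top_targets(rows, top=100):
    """Keep, per pathway, the `top` target genes with the smallest p-values.

    rows: iterable of (pathway, gene, weight, p_value) tuples.
    Returns {pathway: {gene: weight}}.
    """
    by_pathway = defaultdict(list)
    for pathway, gene, weight, p_value in rows:
        by_pathway[pathway].append((p_value, gene, weight))

    net = {}
    for pathway, entries in by_pathway.items():
        entries.sort()  # ascending p-value: most significant first
        net[pathway] = {gene: weight for _p, gene, weight in entries[:top]}
    return net


# Made-up toy rows; real PROGENy weights and p-values will differ.
rows = [
    ("EGFR", "DUSP6", 12.0, 1e-30),
    ("EGFR", "SPRY2", 8.5, 1e-20),
    ("EGFR", "MYC", 1.2, 1e-3),
    ("p53", "MDM2", 9.1, 1e-25),
    ("p53", "CDKN1A", 7.4, 1e-22),
]
net = top_targets(rows, top=2)
```

A smaller `top` keeps only the most significant targets per pathway, while a larger value trades specificity for coverage of the genes present in the data.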
MetaboAnalystR 2.0 contains the R functions and libraries underlying the popular MetaboAnalyst web server, including > 500 functions for metabolomic data analysis, visualization, and functional interpretation. The package is synchronized with the MetaboAnalyst web server. After installing and loading the package, users will be able to reproduce the same results from their local computers using the corresponding R command history downloaded from MetaboAnalyst, thereby achieving maximum flexibility and reproducibility. With version 2.0, we aim to address two important gaps left in the previous version. First, raw spectral processing: the previous version offered very limited support for raw spectra processing and peak annotation. Therefore, we have implemented comprehensive support for raw LC-MS spectral data processing including peak picking, peak alignment and peak annotations leveraging the functionality of the xcms (PMIDs: 16448051, 19040729, and 20671148; version 3.4.4) and CAMERA (PMID: 22111785; version 1.38.1) R packages. Second, we have enhanced support for functional interpretation directly from m/z peaks. In addition to an efficient implementation of the mummichog algorithm (PMID: 23861661), we have added a new method to support pathway activity prediction based on the well-established GSEA algorithm (PMID: 16199517). To demonstrate this new functionality, we provide the "MetaboAnalystR 2.0 Workflow: From Raw Spectra to Biological Insights" vignette, available here as a PDF. In this vignette, we perform end-to-end metabolomics data analysis on a subset of clinical IBD samples.