Metaproteomics Enrichment Analysis

Comprehensive enrichment analysis module for metaproteomic data


The application has three major functions: function enrichment, taxon enrichment, and function vs. taxon correlation.
Several new diagrams can be generated from the enrichment analysis, such as Sankey diagrams, Circos plots, and treemaps.
In the function analysis module, a protein list is assigned to function terms (COG/NOG/COG category/NOG category/KEGG/GO), or a list of COG/NOG terms is assigned to COG/NOG categories. Enrichment is then measured by the hypergeometric distribution p-value. Taxon enrichment analysis works in a similar way. Note that the hypergeometric p-value is based on the hits matched to the database.
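The hypergeometric test mentioned above can be sketched as follows. This is a minimal illustration in Python, not the application's own code: the function name and parameter names are hypothetical, and the background size N is assumed to be the number of proteins matched to the database, following the note above.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: background size (assumed here: proteins matched to the database)
    K: background proteins annotated with the term
    n: proteins in the sample list
    k: sample proteins annotated with the term
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

# Illustrative numbers only: 10 background proteins, 4 carry the term,
# 3 proteins in the sample, 2 of them carry the term.
p = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(round(p, 4))  # 40/120 -> 0.3333
```

A small p-value indicates that the term appears in the sample list more often than expected from random draws out of the background.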
Note that the FASTA database used for the metaproteomics search must be either the human IGC or the mouse gene catalog.
Choose the meta source accordingly in the following analysis. Refer to the iMetaLab website for more information.

Sample figure: enrichment bar

Sample figure: circos interaction

Sample figure: sunburst treemap

Sample figure: compositional map

Sample figure: correlation profiling

Sample figure: sankey diagram

Sample figure: rectangle treemap

Sample figure: MetaMap


How is function enrichment done?
For function analysis, the protein list of each sample (samples as columns) is assigned to functional categories (COG/NOG terms can also be used for higher-level analysis). The hypergeometric p-value is calculated and FDR-corrected within each dataset. Users can then filter by p-value. Functions with at least one match passing the p-value cutoff across all samples are kept in the final list and used for all downstream visualizations. Both the original function assignment and the filtered data can be exported afterwards.
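The correction and filtering steps described above can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: "FDR" is assumed to mean the Benjamini-Hochberg procedure, "within each dataset" is assumed to mean within each sample column, and all names and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def filter_terms(p_table, cutoff=0.05):
    """Keep terms whose adjusted p-value passes the cutoff in at least one sample.

    p_table: {sample_name: {term: raw_p_value}}
    """
    kept = set()
    for term_p in p_table.values():
        terms = list(term_p)
        adj = bh_adjust([term_p[t] for t in terms])
        kept.update(t for t, q in zip(terms, adj) if q <= cutoff)
    return sorted(kept)

# Made-up p-values for two samples and two terms.
p_table = {
    "S1": {"COG0001": 0.001, "COG0002": 0.8},
    "S2": {"COG0001": 0.9, "COG0002": 0.7},
}
print(filter_terms(p_table))  # ['COG0001']
```

A term that passes in a single sample is retained for every sample, so the downstream visualizations share one common list of terms.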
How is taxon enrichment done?
For taxon analysis, the protein list of each sample (samples as columns) is assigned to taxon nodes at all levels. The hypergeometric p-value is calculated and FDR-corrected within each dataset. Users can then filter by p-value. Taxa with at least one match passing the p-value cutoff across all samples are kept in the final list and used for further visualization. Both the original taxon assignment and the filtered data can be exported afterwards.
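Assigning proteins to taxon nodes at all levels can be pictured as counting each protein once at every node along its lineage. The sketch below is only an illustration: the input layout, the function name, and the example lineages are hypothetical and do not reflect the application's internal data format.

```python
from collections import Counter

def count_taxon_nodes(protein_lineages):
    """Count proteins per taxon node across all levels.

    protein_lineages: {protein_id: [(rank, name), ...]} ordered root to leaf.
    Each protein contributes once to every node on its lineage.
    """
    counts = Counter()
    for lineage in protein_lineages.values():
        for node in set(lineage):  # guard against repeated nodes
            counts[node] += 1
    return counts

# Illustrative lineages only.
lineages = {
    "P1": [("phylum", "Bacteroidetes"), ("genus", "Bacteroides")],
    "P2": [("phylum", "Bacteroidetes"), ("genus", "Prevotella")],
    "P3": [("phylum", "Firmicutes")],
}
counts = count_taxon_nodes(lineages)
print(counts[("phylum", "Bacteroidetes")])  # 2
```

Per-node counts like these, computed for a sample list and for the background, are the inputs that a hypergeometric test needs at each taxon level.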
How is the function and taxon correlation analysis done?
For function and taxon interaction/correlation analysis, the protein list of each sample (samples as columns) is assigned to both function categories and taxon nodes. The hypergeometric p-value is calculated and FDR-corrected within each dataset for both the function and the taxon data tables. Users can then filter by p-value. Functions and taxa with at least one qualified match across all samples are kept in the final lists. The taxon and function lists are then compared, and the overlapping proteins are kept in a data matrix, which is the basis for further visualizations. Both the original assignments and the filtered data can be exported afterwards.
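One way to picture the overlap step is a function-by-taxon matrix whose cells count the proteins assigned to both. The sketch below assumes that cell definition, which the description above does not spell out; the function name and the example data are hypothetical.

```python
def overlap_matrix(function_to_proteins, taxon_to_proteins):
    """Build {function: {taxon: number of shared proteins}}.

    Both arguments map a term to the set of protein IDs assigned to it.
    """
    return {
        func: {
            taxon: len(func_proteins & taxon_proteins)
            for taxon, taxon_proteins in taxon_to_proteins.items()
        }
        for func, func_proteins in function_to_proteins.items()
    }

# Illustrative assignments only.
functions = {"COG0001": {"P1", "P2"}, "COG0002": {"P3"}}
taxa = {"Bacteroidetes": {"P1", "P2"}, "Firmicutes": {"P3", "P4"}}
matrix = overlap_matrix(functions, taxa)
print(matrix["COG0001"]["Bacteroidetes"])  # 2
```

A matrix of this shape maps naturally onto the Sankey, Circos, and correlation plots, where each non-zero cell becomes a link between a function and a taxon.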


Contact

Author: Zhibin Ning
Email: ningzhibin@gmail.com
Suggestions and bug reports
This application uses many open source R packages. Many thanks to the authors of gplots, d3heatmap, corrplot, colourpicker, htmlwidgets, shinydashboard, shiny, DT, networkD3, circlize and many more.
The source code will be organized and released soon.

Change log

V1.4: 20180925, revisions and bug fixes (sankey plot). Revised: slider input changed to numeric input, and the range sign ':' changed to '_' to avoid a car::recode error
V1.3: 20180515, bug fix: p-value filtering was not working for function and taxon correlation. Revised: added more help information and a few layout modifications
V1.2: 20171216, added options to specify the taxon levels to plot; some layout adjustments to the UI and plots
V1.1: 20171205, major update, with lots of function updates and bug fixes
V0.1: 20170815, functional version online