1. Data Processing
Genome: This platform converts gene expression data from Homo sapiens, Bos taurus, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana in RPKM, TPM, ReadCounts, FPKM or RPM format to log2(TPM) format. Users can choose GeneID, EnsemblID or Symbol as the gene nomenclature, and the generated result file also lists the gene Symbol corresponding to each identifier. Click the "Example" button to download example human gene expression data.
Transcriptome: This platform converts mRNA expression data from Homo sapiens, Bos taurus, Mus musculus and Drosophila melanogaster in RPKM, TPM, ReadCounts, FPKM or RPM format to log2(TPM) format. Users can choose RefSeq Accession, Transcript name or EnsemblID as the mRNA nomenclature, and the generated result file also lists the transcript name corresponding to each identifier. Click the "Example" button to download the example data.
Protein: This platform converts protein expression values from Homo sapiens, Bos taurus, Mus musculus and Drosophila melanogaster to log2(expression value) format. Users can choose UniProt Entry, RefSeq Accession, AlphaFoldDB ID or EnsemblID as the protein nomenclature, and the generated result file also lists the entry name corresponding to each identifier. Click the "Example" button to download the example data.
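The unit conversions above can be sketched as follows. This is a minimal illustration of the standard TPM definitions, not the platform's own code. It assumes that gene or transcript lengths in base pairs are available for ReadCounts input (TPM cannot be derived from counts or RPM without lengths), and it adds a pseudocount of 1 before taking log2 so that zero values stay finite; the pseudocount is an assumption, since the platform's handling of zeros is not stated. For protein data only the log2 step applies. The function names are hypothetical.

```python
import math

def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM using feature lengths in base pairs."""
    # Reads per kilobase (RPK) for each gene/transcript.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Rescale so that the RPK values of the sample sum to one million.
    per_million = sum(rpk) / 1e6
    return [r / per_million for r in rpk]

def fpkm_to_tpm(fpkm):
    """RPKM/FPKM to TPM: each value divided by the sample total, times one million."""
    total = sum(fpkm)
    return [v / total * 1e6 for v in fpkm]

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount (an assumption) keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]

# Usage: three genes with equal reads per kilobase receive equal TPM.
tpm = counts_to_tpm([10, 20, 30], [1000, 2000, 3000])
print([round(t) for t in tpm])          # [333333, 333333, 333333]
print(fpkm_to_tpm([1.0, 3.0]))          # [250000.0, 750000.0]
print(log2_transform([0.0, 1.0, 3.0]))  # [0.0, 1.0, 2.0]
```

Because TPM values always sum to one million within a sample, FPKM/RPKM can be rescaled to TPM without lengths, whereas counts need the lengths first.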
2. Model
Train: This platform provides model training, including model selection, hyperparameter settings (EarlyStopping, Epochs, Learning Rate, Batch Size, Loss Function, Optimizer, and the number of labels for multi-class tasks) and the choice of training method (validation set, StratifiedKFold, or train-validation-test split). The generated result folder contains: model.pt, the trained model; terminal.log, the training log, which lists the training and validation loss and accuracy for each epoch and ends with the test results; figure.png, the accuracy and loss curves over the course of training; and test_result.txt, the metrics obtained by evaluating the trained model on the test set.
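The StratifiedKFold option splits the samples so that each fold keeps roughly the class proportions of the whole data set. The sketch below illustrates that idea in plain Python. It is not the platform's implementation (which may rely on scikit-learn's StratifiedKFold); the function name and the per-class round-robin assignment are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, n_splits=5, seed=0):
    """Return (train_indices, validation_indices) pairs, one per fold.

    Samples of each class are shuffled and dealt round-robin across the folds,
    so every fold receives a near-equal share of every class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, sorted(fold)))
    return splits

# Usage: 10 negatives and 5 positives, so every validation fold holds 2 negatives and 1 positive.
labels = [0] * 10 + [1] * 5
for train_idx, val_idx in stratified_kfold(labels, n_splits=5):
    print(len(train_idx), [labels[i] for i in val_idx])
```

Each sample appears in exactly one validation fold and in the training part of every other fold.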
Use: Users can upload a model trained on this platform or their own trained model (NOTE: the model must be a .pth file saved with PyTorch to allow cross-platform prediction), together with a test set, and the platform generates a zip file of test results. NOTE: the test set must have the same feature dimension as the training set used to train the model. Include the true label of each sample in the test set to obtain the test metrics; if you only want the model to predict whether each sample is positive, any labels of 0 or 1 can be used. In the result zip, test_metrics_result.txt contains the metrics obtained by evaluating the model on the test set, and test_prediction.txt contains the prediction for each sample, including its predicted probability.
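Before uploading a test set, it can help to check it locally. The sketch below assumes the test set has already been read into a list of numeric feature rows. It checks the feature dimension against the training set and shows how metrics could be derived from predicted probabilities. The 0.5 decision threshold and the choice of accuracy, precision and recall are assumptions for illustration; the source does not list which metrics test_metrics_result.txt contains. With placeholder labels, only the probabilities are meaningful. The function names are hypothetical.

```python
def check_feature_dimension(test_rows, n_train_features):
    """Raise ValueError if any test sample's feature count differs from the training set."""
    for i, row in enumerate(test_rows):
        if len(row) != n_train_features:
            raise ValueError(
                f"sample {i} has {len(row)} features, expected {n_train_features}"
            )

def binary_metrics(labels, probabilities, threshold=0.5):
    """Accuracy, precision and recall for 0/1 labels and positive-class probabilities."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    pairs = list(zip(labels, predictions))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Usage with made-up values:
check_feature_dimension([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], n_train_features=3)
print(binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```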
3. Data Analysis
Example data are provided for each analysis method. Keep the column names of your data the same as those in the example data to ensure the analysis runs smoothly.
All Analysis Flow:
This function performs differential expression analysis on ReadCounts or FPKM data, using the DESeq2 and limma packages, respectively.
Expression Data: The first column contains the row names and the first row contains the column names.
Sample Information: The samples must be in the same order in the expression file and in the group information file, and the row names of the group file must match the column names of the expression file, e.g. Control_1, Control_2, Treatment_1, Treatment_2. The group file must contain a Group column whose values are either 'Control' or 'Treatment'.
log2(FC) threshold: Genes with |log2(FC)| > threshold are selected as significantly differentially expressed, ensuring that the magnitude of change is large enough.
padj threshold: Genes with padj < threshold are selected as significantly differentially expressed, ensuring statistical significance.
Output results: a volcano plot, a differential gene cluster heatmap, and data files of the significantly up-regulated, significantly down-regulated and non-significant genes. GO and KEGG enrichment analysis can then be performed on the significantly differentially expressed genes by clicking the "GO" button.
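The input rules and thresholds above can be checked offline with a small sketch. It assumes the expression header and the group file have already been read into Python lists (the sample names exclude the row-name column). It also assumes a gene counts as up- or down-regulated only when it passes both the log2(FC) and the padj threshold. The default values 1.0 and 0.05 are common choices, not the platform's documented defaults. The function names are hypothetical.

```python
def check_sample_info(expression_samples, group_rows):
    """Check the expression/group file rules described above.

    expression_samples: sample names from the expression file header, in order.
    group_rows: (sample_name, group) pairs from the group file, in order.
    """
    group_samples = [name for name, _ in group_rows]
    if group_samples != list(expression_samples):
        raise ValueError("sample names or order differ between the two files")
    invalid = sorted({group for _, group in group_rows} - {"Control", "Treatment"})
    if invalid:
        raise ValueError(f"Group values must be 'Control' or 'Treatment', got {invalid}")

def classify_genes(results, lfc_threshold=1.0, padj_threshold=0.05):
    """Split genes into up, down and not significant.

    results: mapping gene -> (log2FC, padj).
    """
    classes = {"up": [], "down": [], "not_significant": []}
    for gene, (log2fc, padj) in results.items():
        if padj < padj_threshold and log2fc > lfc_threshold:
            classes["up"].append(gene)
        elif padj < padj_threshold and log2fc < -lfc_threshold:
            classes["down"].append(gene)
        else:
            classes["not_significant"].append(gene)
    return classes

# Usage with the sample names from the example above and made-up results:
check_sample_info(
    ["Control_1", "Control_2", "Treatment_1", "Treatment_2"],
    [("Control_1", "Control"), ("Control_2", "Control"),
     ("Treatment_1", "Treatment"), ("Treatment_2", "Treatment")],
)
print(classify_genes({"A": (2.0, 0.01), "B": (-1.5, 0.001), "C": (3.0, 0.2), "D": (0.5, 0.001)}))
```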
Correlation Heatmap:
This function draws a correlation heatmap.
Data Set: The first row of the input file contains the column names. The separator is a tab ('\t').
Output results: Correlation heatmap.
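A correlation heatmap is built from the pairwise correlations between columns. The sketch below reads a tab-separated file whose first row holds the column names and computes the correlation for every pair of columns. It assumes every cell below the header is numeric and that Pearson correlation is used; the source specifies neither (Spearman is another common choice). The function names are hypothetical, and drawing the heatmap itself is left out.

```python
import csv
import io
import math

def read_numeric_columns(tsv_text):
    """Read tab-separated text (first row = column names) into {name: [floats]}."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Correlation for every ordered pair of column names."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Usage with made-up data:
data = "A\tB\tC\n1\t2\t3\n2\t4\t1\n3\t6\t2\n"
matrix = correlation_matrix(read_numeric_columns(data))
print(round(matrix[("A", "B")], 3), round(matrix[("A", "C")], 3))  # 1.0 -0.5
```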
Principal Component Analysis:
This function performs principal component analysis (PCA).
Data Set: The first column contains the group information. The separator is a tab ('\t').
Boundary plot: Whether to add boundary plots, i.e. the sub-plots along the top and right edges of the figure.
PERMANOVA analysis: Whether to add a PERMANOVA analysis.
Output results: PCA plot.
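PCA projects the samples onto the directions of largest variance. The sketch below shows the computation in plain Python using power iteration on the sample covariance matrix. It assumes the group column has already been removed and that each row holds one sample's numeric values. This illustrates PCA in general, not the platform's code, which may use an SVD-based routine and may standardize features first. The function name is hypothetical, and the boundary plots and PERMANOVA are not covered.

```python
import math
import random

def pca(X, n_components=2, n_iter=200, seed=0):
    """Return (components, scores) for the samples in the rows of X.

    Each component is found by power iteration on the sample covariance
    matrix, keeping it orthogonal to the components already found.
    """
    n, d = len(X), len(X[0])
    if n < 2 or n_components > d:
        raise ValueError("need at least 2 samples and n_components <= number of features")
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def orthonormalize(v, basis):
        """Gram-Schmidt step; returns None if (numerically) nothing is left."""
        start_norm = math.sqrt(sum(a * a for a in v))
        for c in basis:
            dot = sum(a * b for a, b in zip(v, c))
            v = [a - dot * b for a, b in zip(v, c)]
        norm = math.sqrt(sum(a * a for a in v))
        return None if norm <= 1e-10 * start_norm else [a / norm for a in v]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = orthonormalize([rng.uniform(-1.0, 1.0) for _ in range(d)], components)
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = orthonormalize(w, components)
            if w is None:  # no variance left outside the components found so far
                break
            v = w
        components.append(v)

    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]
    return components, scores

# Usage: these points lie on y = 2x, so the first component is (1, 2)/sqrt(5)
# up to sign and every score on the second component is numerically zero.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
components, scores = pca(X)
print(components[0])
```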
Volcano Plot with GSEA:
This function draws a volcano plot, optionally combined with GSEA.
Data Set: The column names of the data set must be gene, logFC and padj. The separator is a tab ('\t').
GSEA Analysis: Whether to run GSEA. If 'Yes', please upload a GMT file.
Emphasize gene names: The entered gene names must be present in the uploaded data set and must be separated by commas (',').
TOP number: The number of top-ranked genes to display. The emphasized genes are shown in a dashed box.
log2FC and padj threshold for TOP genes: These values should be the same as or stricter than the thresholds set in the previous step.
Output results: Volcano with GSEA plot.
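The two text inputs of this tool can be checked before upload. The sketch below parses a GMT file, in which each line holds a gene set name, a description and then the member genes, all tab-separated. It also checks that the comma-separated emphasized gene names occur in the data set. Trimming spaces around the names is a convenience assumption, and the function names are hypothetical.

```python
def parse_gmt(gmt_text):
    """Parse GMT text into {gene_set_name: [genes]}.

    Each non-empty line: name<TAB>description<TAB>gene1<TAB>gene2...
    """
    gene_sets = {}
    for line_number, line in enumerate(gmt_text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected name, description and genes")
        gene_sets[fields[0]] = [gene for gene in fields[2:] if gene]
    return gene_sets

def parse_emphasized(names_text, dataset_genes):
    """Split a comma-separated gene list and check every name is in the data set."""
    names = [name.strip() for name in names_text.split(",") if name.strip()]
    missing = [name for name in names if name not in dataset_genes]
    if missing:
        raise ValueError(f"not found in the data set: {missing}")
    return names

# Usage with made-up inputs:
print(parse_gmt("SET_A\tdesc\tTP53\tEGFR\nSET_B\tdesc\tMYC\n"))
print(parse_emphasized("TP53, MYC", {"TP53", "EGFR", "MYC"}))
```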
VENN Analysis:
This function draws a Venn diagram.
Data Set: The first row contains the column names, and each column lists the items of one group. The separator is a tab ('\t').
Plot Type: The type of the output figure.
Output results: Venn plot.
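A Venn diagram shows how many items fall into each combination of groups. The sketch below reads the tab-separated input described above and collects the items that belong to exactly each combination of groups. It assumes the first row holds the group names, one group per column, with shorter columns padded by empty cells; that padding is an assumption about the file layout. The function names are hypothetical, and drawing the diagram is left out.

```python
import csv
import io

def read_groups(tsv_text):
    """Read one set of items per column; empty cells are ignored."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    groups = {name: set() for name in header}
    for row in reader:
        for name, item in zip(header, row):
            if item.strip():
                groups[name].add(item.strip())
    return groups

def venn_regions(groups):
    """Map each combination of group names to the items found in exactly those groups."""
    membership = {}
    for name, items in groups.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    regions = {}
    for item, names in membership.items():
        key = tuple(name for name in groups if name in names)  # keep column order
        regions.setdefault(key, set()).add(item)
    return regions

# Usage with made-up data:
data = "A\tB\ng1\tg2\ng2\tg3\ng4\t\n"
for combination, items in sorted(venn_regions(read_groups(data)).items()):
    print(combination, sorted(items))
```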
Differential Gene Cluster Analysis:
This function performs differential gene cluster analysis.
Data Set: The first row of the input file contains the column names and the first column contains the row names. The separator is a tab ('\t').
Show Column Name: Whether to show the column names, which are usually the sample names.
Show Row Name: Whether to show the row names, which are usually the gene names.
Add Annotation: Adds row and/or column annotations. In the example figure, the row annotation is Path/Celltype and the column annotation is group/Age/Grade/Stage/Sex.
Output results: Differential Gene Cluster Heatmap.
GO Enrichment:
This function performs GO enrichment analysis.
Data Set: The first column contains the row names and the first row contains the column names.
pvalue: pvalue threshold for GO enrichment analysis.
qvalue: qvalue threshold for GO enrichment analysis.
Plot Type: The type of the output figure.
Term Number: The number of terms to display for each ontology.
Output results: GO enrichment analysis results and figure.
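GO enrichment of a gene list is commonly assessed with an over-representation test based on the hypergeometric distribution. The source does not say which test the platform uses, so the sketch below is an illustration under that assumption. It computes the one-sided p-value for a single term; the q-value (a multiple-testing correction across all terms) is not shown, and the function name is hypothetical.

```python
from math import comb

def hypergeometric_p(k, n, K, N):
    """One-sided over-representation p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the input list, k: input genes annotated to the term.
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Usage with made-up counts: both input genes hit a term covering 5 of 10 background genes.
print(hypergeometric_p(k=2, n=2, K=5, N=10))  # 10/45, about 0.222
```

The pvalue threshold above would then be compared with this value for each term.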
KEGG Enrichment:
This function performs KEGG enrichment analysis.
Data Set: The first column contains the row names and the first row contains the column names.
pvalue: pvalue threshold for KEGG enrichment analysis.
qvalue: qvalue threshold for KEGG enrichment analysis.
Plot Type: The type of the output figure.
Term Number: The number of KEGG terms (pathways) to display.
Output results: KEGG enrichment analysis results and figure.
Dumbbell with Bar Plot:
This function draws a dumbbell plot combined with a bar plot.
Data Set for dumbbell: This file is used to draw the dumbbell plot. The separator is a tab ('\t').
Data Set for bar: This file is used to draw the bar plot. The first column must have the same name in both files, and its contents should be similar in the two files.
Label: the label for the x-axis
Emphasize terms: The entered terms must be present in the uploaded data set and must be separated by commas (',').
Output results: Dumbbell with Bar Plot.
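The requirement that the two files share their first column can be checked before upload. The sketch below assumes both files are tab-separated (the source states this only for the dumbbell file). Because the source asks for similar rather than identical contents, it reports the first-column entries that appear in only one file instead of rejecting them. The function names are hypothetical.

```python
import csv
import io

def first_column(tsv_text):
    """Return (column name, entries) for the first column of tab-separated text."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    return rows[0][0], [row[0] for row in rows[1:]]

def compare_first_columns(dumbbell_text, bar_text):
    """Require the same first-column name; return entries found in only one file."""
    dumbbell_name, dumbbell_terms = first_column(dumbbell_text)
    bar_name, bar_terms = first_column(bar_text)
    if dumbbell_name != bar_name:
        raise ValueError(f"first-column names differ: {dumbbell_name!r} vs {bar_name!r}")
    return sorted(set(dumbbell_terms) ^ set(bar_terms))

# Usage with made-up files:
dumbbell = "Term\tbefore\tafter\nA\t1\t2\nB\t3\t4\n"
bar = "Term\tcount\nA\t5\nC\t6\n"
print(compare_first_columns(dumbbell, bar))  # ['B', 'C']
```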
4. Mail
Users can enter their email address, and the results will be sent to that address when the task is completed.
5. Download
We provide a Docker version of AutoMATA, as well as the code for model training and data analysis, for use by researchers.