PROST Search

  • This webserver is still experimental. If you encounter any issues or want to perform batch processing, you can send an email to: mesih@iastate.edu
  • PROST performs homology search and GO annotation enrichment at the same time
  • PROST is faster and better at finding remote homologs than traditional homology search tools

PROST Method

Protein Language Search Tool (PROST) is a fast and accurate homology search tool designed for challenging remote homology detection tasks. PROST performs better than current state-of-the-art tools such as CS-BLAST and PHMMER. PROST uses a protein language model and a quantization technique to represent proteins in a numerical format that preserves biophysical, biochemical, and evolutionary information. You can enter your sequence in FASTA format in the form provided or use the example input. PROST calculates the distance from your query protein to every protein in the database and performs a statistical test based on the z-score of each distance within the distribution of distances over the whole database. The results are presented with an expected value (e-value) that indicates how likely it is that a match arose by chance. This value is calculated from the cumulative distribution function (CDF) of the z-score and corrected for multiple testing using the Bonferroni method. A preprint describing PROST is available on bioRxiv.
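
The scoring step described above can be illustrated with a minimal sketch. It is not PROST's actual implementation: it assumes that smaller distances mean closer matches, that the distances are approximately normally distributed, and that the Bonferroni correction multiplies each p-value by the number of database proteins (capped at 1). The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def normal_cdf(z):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def evalues_from_distances(distances):
    """Return one Bonferroni-corrected value per database protein.

    Assumption: a small query-to-protein distance is a good match, so the
    lower tail of the z-score distribution is the interesting one.
    """
    n = len(distances)
    mu = mean(distances)
    sigma = pstdev(distances)
    if sigma == 0:
        raise ValueError("all distances are identical; z-scores are undefined")

    evalues = []
    for d in distances:
        z = (d - mu) / sigma           # z-score of this distance
        p = normal_cdf(z)              # chance of a distance this small or smaller
        evalues.append(min(1.0, p * n))  # Bonferroni: multiply by number of tests
    return evalues


# Hypothetical distances: the last protein is much closer to the query.
example = [100, 102, 98, 101, 99, 103, 97, 100, 20]
for dist, e in zip(example, evalues_from_distances(example)):
    print(dist, e)
```

In this toy input only the last protein is far enough below the mean distance to receive a small corrected value; the others are capped at 1.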

Automatic GO Enrichment Analysis

You can select a separate e-value cutoff for the GO annotation enrichment pipeline. PROST performs GO annotation enrichment based on this threshold. The significance of each GO term is assessed with a contingency table. Each table compares the number of occurrences of the term among the homologs and the total number of terms in the homolog list with the number of occurrences of the term in the Swissprot database and the total number of terms in the Swissprot database. The significance (p-value) of each term is then calculated by applying the chi-square test to its contingency table, and the p-values are corrected for multiple testing using the Bonferroni method. GO terms with a p-value smaller than 0.05 are reported here. You can see your query protein's automated GO annotations by clicking the Show/Hide GO Annotations button.
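
The enrichment test described above can be sketched as follows. This is an illustration under stated assumptions, not PROST's actual code: the exact layout of the contingency table is inferred from the description, the chi-square test is applied without continuity correction, and the Bonferroni factor is taken to be the number of distinct GO terms found among the homologs. All function names, GO identifiers, and counts are hypothetical.

```python
import math
from collections import Counter


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of the chi-square test (1 degree of freedom) on [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def significant_terms(homolog_terms, background_counts, alpha=0.05):
    """Return {term: corrected p-value} for terms that pass the alpha cutoff.

    homolog_terms: list of GO terms collected from the homologs (with repeats).
    background_counts: term -> occurrence count in the background database.
    """
    hits = Counter(homolog_terms)
    hit_total = sum(hits.values())
    bg_total = sum(background_counts.values())
    n_tests = len(hits)  # assumed Bonferroni factor

    result = {}
    for term, a in hits.items():
        c = background_counts.get(term, 0)
        p = chi2_pvalue_2x2(a, hit_total - a, c, bg_total - c)
        p_adj = min(1.0, p * n_tests)
        if p_adj < alpha:
            result[term] = p_adj
    return result


# Hypothetical counts: GO:A is rare in the background but common among homologs.
background = {"GO:A": 50, "GO:B": 5000, "GO:C": 4950}
homologs = ["GO:A"] * 10 + ["GO:B"] * 45 + ["GO:C"] * 45
print(significant_terms(homologs, background))
```

Note that the chi-square test is two-sided, so a term that is strongly depleted among the homologs would also pass the cutoff in this sketch; a production pipeline might additionally check the direction of the effect.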