Bioinformatics Tools from Bioinformatics Analysis of Macromolecules class
COMMON ERRORS
Make sure you are not putting a nucleotide sequence into a protein query, and vice-versa.
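A quick sanity check before pasting into a query box can catch this mistake. The sketch below guesses the sequence type from its letters. The function name and the 0.9 cutoff are my own assumptions, not something from class, and short or unusual sequences can fool it.

```python
def guess_sequence_type(text: str) -> str:
    """Guess 'nucleotide' or 'protein' from the letters in a sequence.

    FASTA header lines (starting with '>') are ignored. The 0.9 cutoff is
    an arbitrary assumption chosen for illustration.
    """
    lines = [ln for ln in text.splitlines() if not ln.startswith(">")]
    residues = [c for c in "".join(lines).upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no letters")
    nucleotide_letters = set("ACGTUN")
    fraction = sum(c in nucleotide_letters for c in residues) / len(residues)
    return "nucleotide" if fraction >= 0.9 else "protein"


if __name__ == "__main__":
    print(guess_sequence_type(">demo\nATGCGTACGTTAGC"))
    print(guess_sequence_type("MNRGVPFRHLLLVLQLALLPAATQG"))
```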
GENSCAN
http://genes.mit.edu/GENSCAN.html
Intr - internal
Term - terminal
Prom - promoter
Sngl - single
FGENESH, Animal Version - Ab initio gene prediction (SOFTBERRY)
http://www.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind
GeneID
GeneFinder
Notes
Chimeric Sequences
t(9;22)(q34;q11) Chromosomes 9 and 22 are affected, at bands q34 and q11 respectively.
Comparison of Fgenesh (softberry) and Genscan.
(MUST BE IN FASTA FORMAT)
NG_011759 Both returned 15 exons total, with 11 on one strand and 4 on the other.
EU445484 Genscan - 7 exons. Softberry - 6 exons.
AF083883 Both returned a result of 3 total exons.
Y16787 GenScan returned a result of 6 exons, while Fgenesh returned a result of 8 exons. In reality there are 7 exons total. Fgenesh (Softberry) had results that were superior to GenScan.
Notes from prof and students
Use both in the lab practical. BLAST putative exons for quality control. GenScan usually had the exons in the right location; Fgenesh was generally wrong in this regard. Fgenesh was more sensitive in one of the 4 cases.
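To BLAST putative exons, each one first has to be cut out of the query sequence. Here is a minimal sketch of that step. It assumes exon coordinates are 1-based, inclusive, and given as start <= end on the forward strand, with a "+" or "-" strand flag; the predictors' own output layouts differ, so the coordinates would need to be copied over by hand. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def exon_fasta(sequence: str, exons) -> str:
    """Build FASTA records for predicted exons.

    `exons` is a list of (start, end, strand) tuples with 1-based,
    inclusive forward-strand coordinates (an assumed convention).
    """
    records = []
    for number, (start, end, strand) in enumerate(exons, start=1):
        fragment = sequence[start - 1:end]
        if strand == "-":
            fragment = reverse_complement(fragment)
        records.append(f">exon_{number}_{start}-{end}({strand})\n{fragment}")
    return "\n".join(records)


if __name__ == "__main__":
    # Toy sequence and made-up coordinates, for illustration only.
    print(exon_fasta("AAACCCGGGTTT", [(1, 3, "+"), (7, 12, "-")]))
```

Each record can then be pasted into nucleotide BLAST on its own.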
Bacterial Genomes
When accessing a particular region at NCBI search for the ID number of the gene, then click on “change region shown”.
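As an alternative to clicking through the web page, a region can also be requested through NCBI's E-utilities `efetch` endpoint, which accepts `seq_start` and `seq_stop` parameters. The sketch below only builds the URL and does not contact NCBI. The helper name is hypothetical, and the coordinates in the example are made up.

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_fasta_url(accession: str, start: int, stop: int) -> str:
    """Build an efetch URL for a sub-region of a nucleotide record (FASTA)."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Accession from these notes; the region 1000-2000 is an arbitrary example.
    print(region_fasta_url("NC_002952", 1000, 2000))
```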
fgenesB
don’t forget to select the given organism
http://www.softberry.com/berry.phtml?topic=fgenesb&group=programs&subgroup=gfindb
Notes: Tu/Op = transcription units/ operons
Glimmer
http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi
GeneMark (preferred method)
Note, GeneMark not only creates a prediction, but gives alternative predictions and their likelihood.
http://www.ncbi.nlm.nih.gov/genomes/MICROBES/genemark.cgi
MRSA sequence NC_002952 GeneMark accurately predicted two of the three genes, which was closer than either of the other two programs. The class agreed that GeneMark is the superior program overall.
NC_012578 GeneMark and Softberry both analyzed the sequence and predicted accurately; however, Softberry requires selecting a particular organism -_-
NC_002505 - GeneMark lined up the numbers perfectly! Softberry was close.
Secondary Structure Prediction (for proteins)
alpha helix
beta strand
coils, turns, disorganized regions
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
http://cib.cf.ocha.ac.jp/bitool/MIX/
http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=6043
With regard to CD4, all three structure prediction programs on the Japanese website turned out to be utterly useless. All three programs predicted an abundance of alpha helices, yet the crystallized structure contained none. I feel like I am spinning chicken bones to try and predict the harvest with these useless programs. (Note: when I used these programs during my practice practical for a general bioinformatics class, they turned out to be more useful.)
predictprotein.org is far superior to the three protein prediction methods on the Japanese website.
Use Cn3D to view the 3D protein structure. Don’t forget Style - Coloring Shortcuts - Secondary Structure
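One way to put a number on "useless" versus "useful" is a per-residue agreement score between the predicted and observed secondary structure (often called Q3). The sketch below assumes both are written as equal-length strings of H (helix), E (strand), and C (coil). The strings in the example are invented, not real CD4 data.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H/E/C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must be the same length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


if __name__ == "__main__":
    # Invented example: helices predicted where the structure has coil.
    print(q3_accuracy("HHHHCCEEEE", "CCCCCCEEEE"))
```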
Protein Location Prediction Tools
PSORT II - Protein Sorting
http://psort.hgc.jp/form2.html
PROTCOMP
Protein Compartment
http://www.softberry.com/berry.phtml?topic=index&group=programs&subgroup=proloc
Between Protcomp and PSORT II, Protcomp accurately predicted the location of CD4 (plasma membrane); PSORT II did not.
Protcomp accurately predicted the location of all 3 isoforms of CD4. It said that all three were located in the plasma membrane.
Accession Number | Protein Location | PSORTII | Protcomp |
AAK64604 | mitochondrial protein | Accurate | Accurate |
CAA47024 | cytoplasmic protein | Accurate | Accurate |
AAA59599 | plasma membrane | Not Accurate | Accurate |
NP_001081025 | nuclear protein | Accurate | Not Accurate |
NP_000509 | cytoplasmic protein | Accurate | Accurate |
AAA61140 | extracellular protein | Not Accurate | Accurate |
NP_001019820 | endoplasmic reticulum | Accurate | Accurate |
NP_006735 | extracellular protein | Accurate | Accurate |
Note: Protcomp was inaccurate for NP_000509 at first. However, when the sequence was copy/pasted into the query in FASTA format, instead of the version at the bottom of the protein database record that includes numbered lines etc., the prediction came back accurate.
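If only the numbered version of a sequence is at hand, it can be cleaned into FASTA before pasting. A minimal sketch, assuming the input is just the numbered sequence lines (no other text from the record); the function name and the 60-character line width are my own choices.

```python
import re


def numbered_block_to_fasta(block: str, header: str, width: int = 60) -> str:
    """Strip line numbers and spaces from a numbered sequence block and emit FASTA."""
    residues = re.sub(r"[^A-Za-z]", "", block).upper()
    if not residues:
        raise ValueError("no sequence letters found")
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return f">{header}\n" + "\n".join(lines)


if __name__ == "__main__":
    # Made-up fragment in the numbered layout, for illustration only.
    block = "        1 mnrgvpfrhl lllvlqlall\n       21 paatqgkkvv"
    print(numbered_block_to_fasta(block, "demo"))
```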
Conclusion for Protein Prediction Location
Both programs have similar degrees of accuracy. Use both programs to predict protein locations and compare the predictions.
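The table above can be tallied directly. This sketch hard-codes the Accurate / Not Accurate results from the table as booleans (PSORT II first, Protcomp second) and counts them; the function name is hypothetical.

```python
# (PSORT II accurate?, Protcomp accurate?) per accession, copied from the table.
RESULTS = {
    "AAK64604": (True, True),
    "CAA47024": (True, True),
    "AAA59599": (False, True),
    "NP_001081025": (True, False),
    "NP_000509": (True, True),
    "AAA61140": (False, True),
    "NP_001019820": (True, True),
    "NP_006735": (True, True),
}


def tally(results):
    """Count accurate predictions per program and list accessions where they differ."""
    psort = sum(p for p, _ in results.values())
    protcomp = sum(c for _, c in results.values())
    disagreements = [acc for acc, (p, c) in results.items() if p != c]
    return {
        "total": len(results),
        "psort_ii": psort,
        "protcomp": protcomp,
        "disagreements": disagreements,
    }


if __name__ == "__main__":
    print(tally(RESULTS))
```

On these eight proteins PSORT II was accurate for 6 and Protcomp for 7, and every miss by one program was a hit for the other, which is the argument for running both.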
Discrepancy During Lab Practical
Make note of each program's prediction and interpret the results rationally. One prediction may need to be discarded, or there may be a close tie between two predictions. Also remember that protein locations are not static within the cell: proteins move from one location to another and can often be found in multiple locations.
In Professional Setting
Follow the steps described above for a Lab Practical discrepancy. Additionally, it may be a good idea to search the professional databases (such as NCBI) and infer the location of the protein from similar proteins.
Transmembrane Proteins Prediction
(When comparing to NCBI ctrl F for “transmembrane region”)
Sosui - (may require Firefox.)
In Windows, go to the Control Panel. You may need to allow the Sosui website and update Java.
On Mac, click the Security tab in Firefox, edit the site list, and add Sosui.
http://harrier.nagahama-i-bio.ac.jp/sosui/sosui_submit.html
TM Predict
http://www.ch.embnet.org/software/TMPRED_form.html
TMHMM2
http://cbs.dtu.dk/services/TMHMM/
Helical Wheel program - A great complement to the other three programs. A section of a protein sequence thought to be a helical structure can be pasted into this program for analysis. (NOTE: This program assumes a priori that any sequence copy/pasted into the query box is a helical region.)
http://rzlab.ucr.edu/scripts/wheel/wheel.cgi
Main site for the Helical Wheel lab
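The idea behind a helical wheel can be sketched in a few lines: in an ideal alpha helix there are about 3.6 residues per turn, so each residue is placed 100 degrees further around the wheel than the previous one. The code below is my own illustration of that geometry, not the code behind the website.

```python
def wheel_positions(segment: str, degrees_per_residue: int = 100):
    """Return (residue, angle in degrees) pairs for an assumed alpha-helical segment.

    100 degrees per residue corresponds to roughly 3.6 residues per turn.
    """
    return [
        (residue, (index * degrees_per_residue) % 360)
        for index, residue in enumerate(segment.upper())
    ]


if __name__ == "__main__":
    for residue, angle in wheel_positions("LKALKE"):
        print(residue, angle)
```

Residues with similar angles sit on the same face of the helix, which is what makes a hydrophobic face visible on the wheel.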
Protein | Accession | Sosui | TMPredict | TMHMM2 | Notes |
Human glycophorin A | NP_002090 | | | | |
Vitamin K epoxide reductase | ADN49753 | | | | |
Bovine rhodopsin | NP_001014890 | Prediction somewhat off | Prediction was very accurate | Nearly identical to NCBI flat file | TMPredict seemed to be the most clear tool, and the most accurate, with TMHMM2 being a close second. |
Gorilla ABC-transporter | AAA91199 | | | | |
CFTR protein | NP_000483 | | | | |
Class verdict - all three programs were pretty accurate and performed with minimal error.
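For intuition about what these tools are looking for, here is a minimal hydropathy scan using the Kyte-Doolittle scale. This is not the method used by Sosui, TMPredict, or TMHMM2; it is a much cruder sketch. The 19-residue window and 1.6 cutoff are commonly quoted values that I am assuming here, and unknown residues are scored as 0.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 19):
    """Mean Kyte-Doolittle hydropathy for every full window along the protein."""
    protein = protein.upper()
    scores = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        total = sum(KYTE_DOOLITTLE.get(res, 0.0) for res in segment)
        scores.append(total / window)
    return scores


def candidate_tm_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return (1-based window start, mean score) for windows at or above the cutoff."""
    return [
        (index + 1, round(score, 2))
        for index, score in enumerate(hydropathy_windows(protein, window))
        if score >= threshold
    ]


if __name__ == "__main__":
    # Artificial sequence: a 19-leucine stretch flanked by lysines.
    print(candidate_tm_windows("K" * 5 + "L" * 19 + "K" * 5))
```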
Protein Structural Prediction
Predicts signal to go to the ER and associated cleavage site:
Signal P
http://www.cbs.dtu.dk/services/SignalP/
Signal P explanation
http://www.cbs.dtu.dk/services/SignalP-4.1/output.php
Disulfide Bonds (Cysteine - Cysteine bonds) (See also the Prosite Tool under Protein Signatures)
CYS_REC
Used to predict the presence of disulfide bonds in a protein.
http://www.softberry.com/berry.phtml?topic=cys_rec&group=programs&subgroup=propt
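Before running a predictor, it helps to know how many cysteines there are and where. The sketch below only counts them; it makes no prediction about which cysteines actually pair, which is what CYS_REC is for. The function names are hypothetical.

```python
def cysteine_positions(protein: str):
    """Return the 1-based positions of every cysteine (C) in a protein sequence."""
    return [i for i, residue in enumerate(protein.upper(), start=1) if residue == "C"]


def max_disulfide_bonds(protein: str) -> int:
    """Upper bound on disulfide bonds: each bond uses two cysteines."""
    return len(cysteine_positions(protein)) // 2


if __name__ == "__main__":
    toy = "ACDCC"  # made-up sequence
    print(cysteine_positions(toy), max_disulfide_bonds(toy))
```

An odd cysteine count means at least one cysteine cannot be in an intramolecular disulfide bond.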
DiANNA (not used in BAM class) - A tool for predicting disulfide bonds.
http://clavius.bc.edu/~clotelab/DiANNA/
Predict coiled-coils
COILS
http://www.ch.embnet.org/software/COILS_form.html
Marcoil
http://toolkit.tuebingen.mpg.de/marcoil
COILS/ PCOILS (not used in BAM class)
Predicts coiled coils and compares the prediction to a database of known sequences.
http://toolkit.tuebingen.mpg.de/pcoils
Repetitive Elements Search
Nucleotide BLAST against the dbALU database. Finds Alu repetitive elements in your nucleotide sequence query.
Leucine Zipper
2zip Server
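The classic leucine zipper signature is a leucine at every seventh position, four times in a row, written in PROSITE style as L-x(6)-L-x(6)-L-x(6)-L. The sketch below searches for that pattern with a regular expression. Whether the 2zip Server does anything like this internally is not something these notes cover, so treat it as a rough pre-check only.

```python
import re

# Lookahead so that overlapping matches are also reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein: str):
    """Return (1-based start, matched segment) for each L-x(6)-L-x(6)-L-x(6)-L hit."""
    return [
        (match.start() + 1, match.group(1))
        for match in LEUCINE_ZIPPER.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # Artificial sequence with one heptad-spaced leucine run.
    print(find_leucine_zippers("MK" + "LAAAAAA" * 3 + "L" + "GG"))
```

This pattern occurs by chance fairly often, so a hit is only a hint that the region is worth a closer look.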
Protein Signatures
Scan Prosite Tool - By default the tool excludes motifs with a high probability of occurrence (this is a good thing).
http://prosite.expasy.org/scanprosite/
Fingerprint Scan - A tool that scans for motif fingerprints.
http://www.bioinf.manchester.ac.uk/cgi-bin/dbbrowser/fingerPRINTScan/FPScan_fam.cgi
Nucleotide Signatures
Search for Human Promoters (nucleic acid analysis) - A threshold of 0.8 is used by default to avoid false positives (this is a good thing).
http://linux1.softberry.com/berry.phtml?topic=fprom&group=programs&subgroup=promoter
CpG - Searches for CpG islands.
http://www.softberry.com/berry.phtml?topic=cpgfinder&group=programs&subgroup=promoter
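The two numbers usually quoted for CpG islands are the GC fraction and the observed/expected CpG ratio, where expected = (number of C x number of G) / length. The sketch below computes both for a sequence. The cutoffs that the Softberry tool applies are not in these notes; a commonly cited rule of thumb (length of at least 200 bp, GC above 50%, ratio above 0.6) is mentioned here only as an assumption.

```python
def cpg_stats(sequence: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    sequence = sequence.upper()
    length = len(sequence)
    c_count = sequence.count("C")
    g_count = sequence.count("G")
    cpg_count = sequence.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    if c_count and g_count:
        ratio = (cpg_count * length) / (c_count * g_count)
    else:
        ratio = 0.0
    return gc_fraction, ratio


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))  # toy sequence
```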
Searching with Putative Chimeric Proteins
BLAST-P against the RefSeq database
https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
Miscellaneous Bioinformatics Links
3D Visualization and Prediction
CN3D
Download http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3dwin.shtml
CN3D Exercise
α, β, α/β, α+β, and a transmembrane protein
Molecule | Inferred Class | Actual Class |
1rnb, G-specific endonuclease | ||
1cd8, CD8 molecule | ||
3hhb, hemoglobin | ||
1kzu, light-harvesting complex | ||
1a50, tryptophan synthase β subunit | ||
MEGA - MEGA is a program that I have found to be exceptionally useful for phylogenetics analysis. When I took Evolutionary Biology in 2013, MEGA was the main tool that I used for my phylogenetics project. My project involved comparing 5 different genes between marsupials and placentals, and paying special attention to counterparts in each group.