INTRODUCTION
You get a bacterial isolate. You sequence it. You manage to get some contigs after mucking around with some de novo assembly software. Now what? Annotation, of course! Your FASTA file is teeming with lifeless chunks of bacterial DNA yearning to be adorned with insightfully labelled features, so they can get some more attention from you, and maybe even be reunited with some old friends in GenBank/ENA. If this sounds familiar, then this blog post is for you.
WHAT IS GENOME ANNOTATION?
Genome annotation is the process of identifying features of interest on a genome sequence. Some of the features relevant to bacterial genomes are protein-coding genes, non-coding RNAs, and operons. Features can have all sorts of useful information associated with them in addition to their genomic location and feature type. For example, a protein-coding gene annotation could include items such as the predicted protein product, whether it has a signal peptide, a gene abbreviation and an enzyme classification number. The accuracy and richness of a genome annotation are important, and sometimes critical, to downstream biological interpretation.
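To make the idea of a "feature with associated information" concrete, here is a minimal sketch of how one annotated protein-coding gene might be held in memory. The class name, field names and example values are all made up for illustration; they are not taken from any particular annotation tool or file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """One annotated feature on a contig (hypothetical layout, for illustration only)."""
    contig: str                    # name of the contig the feature sits on
    start: int                     # 0-based start coordinate (inclusive)
    end: int                       # 0-based end coordinate (exclusive)
    strand: str                    # "+" or "-"
    feature_type: str              # e.g. "CDS", "tRNA", "rRNA"
    product: Optional[str] = None  # predicted protein product, if any
    gene: Optional[str] = None     # gene abbreviation, if one was assigned
    ec_number: Optional[str] = None        # enzyme classification number
    signal_peptide: Optional[bool] = None  # None means "not tested"

    @property
    def length(self) -> int:
        """Length of the feature in bases."""
        return self.end - self.start


# Illustrative values only, not a real annotation.
cds = Feature(
    contig="contig_1",
    start=100,
    end=400,
    strand="+",
    feature_type="CDS",
    product="hypothetical protein",
)
```

Location and type are mandatory, everything else is optional, which mirrors how real annotations range from bare gene calls to richly described genes.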
In the old days, a basic ORF finder would be run over the contigs. Then the truly dedicated curators would comb over the ORFs, trim back to good-looking start codon sites, delete spurious-looking ORFs, and so on. Later, gene prediction software and BLASTX helped bootstrap this process further. Now there are various "automatic annotation" systems which do a reasonably good job. Manual refinement of the automatic annotation can then be done using curation applications.
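For a feel of what a "basic ORF finder" amounts to, here is a rough sketch. It is deliberately naive and is not how any of the tools listed in this post work: it only accepts ATG as a start codon (bacteria also use alternatives such as GTG and TTG), it uses the standard three stop codons, and it ignores ORFs that run off the end of a contig. The function names and the `min_codons` parameter are my own inventions for this example.

```python
from typing import List, Tuple

START = "ATG"                      # simplification: ATG only
STOPS = {"TAA", "TAG", "TGA"}      # standard stop codons
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _orfs_one_strand(seq: str, min_codons: int) -> List[Tuple[int, int]]:
    """Scan the three frames of one strand.

    Returns 0-based, end-exclusive (start, end) pairs running from the
    first ATG after the previous stop up to and including the next
    in-frame stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3
                # Length is counted in codons, stop codon included.
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None
    return orfs


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, str]]:
    """Find ORFs in all six frames; coordinates refer to the forward strand."""
    seq = seq.upper()
    n = len(seq)
    hits = [(s, e, "+") for s, e in _orfs_one_strand(seq, min_codons)]
    for s, e in _orfs_one_strand(reverse_complement(seq), min_codons):
        # Map reverse-strand coordinates back onto the forward strand.
        hits.append((n - e, n - s, "-"))
    return sorted(hits)
```

Calling `find_orfs(contig_sequence)` on each contig gives a list of candidate regions. The trimming of start sites and deletion of spurious ORFs described above is exactly the part this sketch does not attempt.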
Below I list the tools I am aware of for performing and curating bacterial genome annotation. If I've missed any, please let me know and I will add them.
WEB SUBMISSION SYSTEMS
- RAST - Rapid Annotation using Subsystem Technology
- BaSYS - Bacterial Annotation System
- xBASE Bacterial Genome Annotation Service
- JCVI Annotation Service
- IGS Annotation Engine
- JGI/DOE IMG annotation service
- PGAAP - NCBI Prokaryotic Genome Automatic Annotation Pipeline
- MAKER Web Annotation Service
- Prokka Web Annotation Server (disclaimer - this is our software - will be public soon)
STANDALONE SYSTEMS
- BG7 bacterial genome annotation system
- AGeS - Annotation/Analysis of Genome Sequences
- MAKER
- Prokka - prokaryotic annotation (disclaimer - this is our software)
CURATION SYSTEMS
CONCLUSION
Beware of systems claiming to do "microbial" annotation. Most of them are only designed for annotating bacteria. They will perform poorly on viruses, fungi and other microbes.