Biotechvana Bioinformatics (BiBi)
Journal of Biotechvana software and computational
resources in bioinformatics
The GyDB (Gypsy Database) collection is a non-redundant repository of multiple alignments, hidden Markov model profiles, and majority-rule consensus sequences. The collection is based on all currently known protein domains of the distinct mobile genetic elements (MGEs) and related host genes classified in the Gypsy Database of Mobile Genetic Elements. Alignments are available in six formats: FASTA, PIR, MSF, Stockholm, Clustal, and PHYLIP. Hidden Markov model profiles and consensus sequences are constructed from the protein domain consensus accepted for each monophyletic group of classified MGEs and each protein domain. The GyDB collection was originally launched as a resource of Biotechvana Bioinformatics that is related to, but independent of, the Gypsy Database. This relationship recently motivated us to deposit the GyDB collection within the Gypsy Database, where the resource is now publicly accessible as a permanent section.
In this paper, we present the second release of the GyDB collection: an in-progress repository of manually refined multiple alignments, hidden Markov model (HMM) profiles, and majority-rule consensus (MRC) sequences of viruses and mobile genetic elements (MGEs). Alignments are available in multiple formats, plus a color-shaded HTML format for visualizing conserved motifs. The HTML format includes links to each sequence's GenBank accession at NCBI. HMM profiles and MRCs are based on the protein domain consensus accepted for each monophyletic group of MGEs and each protein domain. The collection is the repository of the Gypsy Database (GyDB) of Mobile Genetic Elements and covers all protein domains encoded by all phylogenetic subsets of MGEs classified at GyDB; as such, it is subject to continuous growth.
In this paper, we introduce the GyDB collection, a non-redundant compilation of manually refined multiple alignments, hidden Markov model profiles, and majority-rule consensus sequences based on all currently known protein products encoded by Ty3/Gypsy and Retroviridae LTR retroelements and related nonviral proteins. Alignments are available in six formats (FASTA, PIR, MSF, Stockholm, Clustal, and PHYLIP), plus a shaded HTML format that facilitates visualization of conserved motifs. The HTML format includes hyperlinks to each sequence's GenBank accession at NCBI. Hidden Markov model profiles and majority-rule consensus sequences were constructed from the protein domain consensus accepted for each monophyletic group of LTR retroelements and each protein domain.
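As an illustration of what a majority-rule consensus sequence is, the following minimal Python sketch derives one from a small set of aligned sequences. It is not the procedure used to build the GyDB collection: the function name `majority_rule_consensus`, the 50% threshold, the use of `X` for columns with no majority, and the choice to drop columns where a gap is the majority symbol are all assumptions made for this example.

```python
from collections import Counter


def majority_rule_consensus(aligned, threshold=0.5, gap="-", ambiguous="X"):
    """Return a majority-rule consensus for equal-length aligned sequences.

    Assumptions of this sketch (not taken from the GyDB papers):
      * a residue is accepted only if its frequency exceeds `threshold`;
      * columns whose majority symbol is a gap are omitted;
      * columns without a majority are written as `ambiguous`.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):  # iterate over alignment columns
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) > threshold:
            if symbol != gap:  # skip columns dominated by gaps
                consensus.append(symbol)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Toy alignment invented for the example; not real GyDB data.
toy_alignment = [
    "MKV-LA",
    "MKVDLA",
    "MRV-LG",
    "MKI-LA",
]
print(majority_rule_consensus(toy_alignment))
```

In the toy alignment, the fourth column is mostly gaps and is therefore omitted from the consensus, while every other column has a residue present in at least three of the four sequences.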