dbEST
dbEST (Nature Genetics 4:332-3;1993) is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of organisms. |
ENCODE Project
The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003, to carry out a project to identify all functional elements in the human genome sequence. The project started with two components - a pilot phase and a technology development phase.
The pilot phase tested and compared existing methods to rigorously analyze a defined portion of the human genome sequence (See: ENCODE Pilot Project). The conclusions from this pilot project were published in June 2007 in Nature and Genome Research [genome.org]. The findings highlighted the project's success in identifying and characterizing functional elements in the human genome. The technology development phase has also been a success, promoting several new technologies for generating high-throughput data on functional elements.
With the success of the initial phases of the ENCODE Project, NHGRI funded new awards in September 2007 to scale the ENCODE Project to a production phase on the entire genome along with additional pilot-scale studies. Like the pilot project, the ENCODE production effort is organized as an open consortium and includes investigators with diverse backgrounds and expertise in the production and analysis of data (See: ENCODE Participants and Projects). This production phase also includes a Data Coordination Center [genome.ucsc.edu] to track, store and display ENCODE data along with a Data Analysis Center to assist in integrated analyses of the data. All data generated by ENCODE participants will be rapidly released into public databases (See: Accessing ENCODE Data) and available through the project's Data Coordination Center. |
Ensembl
The Ensembl project was started in 1999, some years before the draft human genome was completed. Even at that early stage it was clear that manual annotation of 3 billion base pairs of sequence would not be able to offer researchers timely access to the latest data. The goal of Ensembl was therefore to automatically annotate the genome, integrate this annotation with other available biological data and make all this publicly available via the web. Since the website's launch in July 2000, many more genomes have been added to Ensembl and the range of available data has also expanded to include comparative genomics, variation and regulatory data.
The number of people involved in the project has also steadily increased. Currently, the Ensembl group consists of between 40 and 50 people, divided into a number of teams. The Genebuild team creates the gene sets for the various species. The result of their work is stored in the core databases, which are maintained by the Software team. This team also develops and maintains the BioMart data mining tool. The Compara, Variation and Regulation teams are responsible for the comparative, variation and regulatory data, respectively. The Web team makes sure that all data are presented on the website in a clear and user-friendly way. Finally, the Outreach team answers questions from users and gives workshops worldwide about the use of Ensembl. The Ensembl project is headed by Paul Flicek and receives input from an independent scientific advisory board.
Ensembl is a joint project between the European Bioinformatics Institute (EBI), an outstation of the European Molecular Biology Laboratory (EMBL), and the Wellcome Trust Sanger Institute (WTSI). Both institutes are located on the Wellcome Trust Genome Campus in Hinxton, south of the city of Cambridge, United Kingdom. |
Entrez-Gene
The Gene database provides detailed information for known and predicted genes defined by nucleotide sequence or map position. Currently, Gene contains more than 14 million entries and includes data from all major taxonomic groups. Each record in the database corresponds to a single gene and is derived from processing by the NCBI Reference Sequence and genome annotation groups. Gene data can be accessed on the web through the Gene home page, programmatically through the Entrez Programming Utilities, or by file transfer through its FTP site. |
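As a rough illustration of the programmatic route mentioned above, the sketch below builds an E-utilities ESearch request against the Gene database and extracts the ID list from a JSON response. The helper names (`build_gene_search_url`, `parse_gene_ids`), the example query string, and the sample response are illustrative assumptions rather than part of any NCBI client library, and no network request is made.

```python
from urllib.parse import urlencode

# Base address of the NCBI Entrez Programming Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_gene_search_url(term, retmax=5):
    """Return an ESearch URL that queries the Gene database (hypothetical helper)."""
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_gene_ids(esearch_json):
    """Extract the list of Gene IDs from a decoded ESearch JSON response."""
    return esearch_json["esearchresult"]["idlist"]


# Example query (an assumption for illustration): a gene symbol restricted to human.
url = build_gene_search_url("BRCA1[sym] AND human[orgn]")
print(url)

# A hand-written stand-in for a decoded response, not real server output.
sample_response = {"esearchresult": {"idlist": ["672"]}}
print(parse_gene_ids(sample_response))
```

To run the query for real, the URL could be fetched with any HTTP client and the body decoded as JSON before calling `parse_gene_ids`.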
EPD - Eukaryotic Promoter Database
This resource provides access to several databases of experimentally validated promoters: the EPD and EPDnew databases. They differ in the validation technique used and in coverage. EPD is a collection of eukaryotic promoters derived from published articles. The EPDnew databases (HT-EPD), by contrast, are the result of merging EPD promoters with in-house analysis of promoter-specific high-throughput data, for selected organisms only. This process gives EPDnew high precision and high coverage.
The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. This database contains 4806 promoters from several species.
EPDnew is a new collection of experimentally validated promoters in the human, mouse, D. melanogaster, zebrafish, worm and A. thaliana genomes. Evidence comes from TSS mapping in high-throughput experiments such as CAGE and Oligocapping. ChIP-seq experiments on H2AZ, H3K4me3, Pol-II and DNA methylation are also taken into account during the analysis. The resulting database contains 23360 promoters for the human (H. sapiens) collection, 21239 promoters for the mouse (M. musculus) collection, 15073 promoters for the D. melanogaster collection, 10728 promoters for the zebrafish (D. rerio) collection, 7120 promoters for the worm (C. elegans) collection and 10229 promoters for the A. thaliana collection. |
European Conditional Mouse Mutagenesis Programme
|
Fantom Functional Annotation of Mouse
|
Fantom Functional Annotation of the Mammalian Genome
|
Fgenesh
|
FirstEF
|
FlyBase
|
GenBank
|
Gene Set Enrichment Analysis (GSEA)
Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes). |
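To make the idea concrete, here is a minimal sketch of an unweighted running-sum enrichment score over a ranked gene list. It is a simplification for illustration only, not the GSEA software itself: the function name `enrichment_score` and the toy gene names are assumptions, and the weighting of steps and the permutation-based significance testing used by the full method are not shown.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum (Kolmogorov-Smirnov-like) score.

    ranked_genes: genes ordered from most to least associated with a phenotype.
    gene_set: the a priori defined set of genes to test.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise; the walk ends at zero.
        if gene in members:
            running += 1.0 / n_hits
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (hypothetical gene names): set members concentrated at the top of the ranking.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))
```

A score near +1 indicates that the set members cluster at the top of the ranking, and a score near -1 that they cluster at the bottom.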
Genomic Diversity and Phenotype Connection (GDPC)
|
mips RE-dat
|
NCBI Structure Group
|
NCI Cancer Gene Index
|
NIH Knockout Mouse Project
|
OMIM
|
Oryza Map Alignment Project
|