UniGene is an NCBI database that clusters EST sequences from dbEST and GenBank mRNA into gene-oriented clusters. Only ESTs with 3' ends are clustered to provide a more unique representation of transcripts. Contaminant sequences are removed before clustering the cleaned ESTs based on sequence overlaps. The final UniGene clusters represent unique genes and are annotated with gene and tissue information.
UniGene
• UniGene is NCBI's EST cluster database.
• Each cluster is a set of overlapping EST sequences.
• The database is constructed from the combined information in dbEST and the GenBank mRNA database.
• Only ESTs with 3' ends are clustered.
• The resulting 3' EST sequences provide a more unique representation of the transcripts.
• The next step is to remove contaminant sequences, which include bacterial vectors.
• The cleaned ESTs are searched against a database of known unique genes with BLAST.
• The compiling step identifies sequence overlaps and derives the final sequence.
• During this step, errors in individual ESTs are corrected; the sequences are then partitioned into clusters and assembled into contigs.
• The final result is a set of nonredundant gene clusters known as UniGene clusters.
• Each UniGene cluster represents a unique gene and is further annotated with its gene locus information, as well as information on the tissue type where the gene has been expressed.
GSS
• In bioinformatics and computational biology, genome survey sequences (GSSs) are nucleotide sequences similar to ESTs.
• The only difference is that most of them are genomic in origin rather than derived from mRNA.
• Genome survey sequences are typically generated and submitted to NCBI by labs performing genome sequencing.
• They are used, amongst other things, as a framework for the mapping and sequencing of
• Genome survey sequencing is a new way to map genome sequences.
• Current genome sequencing approaches are mostly high-throughput shotgun methods, and GSS is often used in the first step of sequencing.
• GSSs can provide an initial global view of a genome, which includes both coding and non-coding DNA and contains the repetitive sections of the genome.
UCSC
• The UCSC Genome Browser is an online genome browser hosted by the University of California, Santa Cruz.
• It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species.
• The UCSC Genome Browser hosts genomes from a variety of organisms. As of September 2009, this included 24 vertebrates, 14 mammals, 13 insects, 11 species of
• The UCSC Genome Browser is part of a package of tools accessible from the UCSC Genome Bioinformatics website.
• The UCSC Genome Browser lets users visualize results from genome studies, such as SNP association studies, linkage studies, chromosomal positions of genes, evolutionary relationships, and alignments.
• The site includes tools such as the Genome Browser, BLAT, Gene Sorter, and Genome Graphs.
TIGR
• TIGR Gene Indices (www.tigr.org/tdb/tgi.shtml) is an EST database that uses a different clustering method from UniGene.
• It compiles data from dbEST, GenBank mRNA and genomic DNA data, and TIGR's own sequence database.
• Sequences are clustered only if they are more than 95% identical over a 40-nucleotide region in pairwise comparisons.
• BLAST and FASTA are used to identify sequence overlaps.
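The TIGR clustering criterion (more than 95% identity over a 40-nucleotide region) can be illustrated with a small Python sketch. It is not the TIGR pipeline: the slides say BLAST and FASTA find the overlaps, whereas this toy version swaps in a naive ungapped sliding-window comparison. The function names, the toy sequences, and the choice to merge pairwise hits by single linkage are all assumptions made for illustration.

```python
from itertools import combinations

MIN_IDENTITY = 0.95  # "more than 95% identical"
WINDOW = 40          # "over a 40-nucleotide region"


def shares_similar_region(a: str, b: str) -> bool:
    """True if some ungapped 40-nt window of a and b exceeds 95% identity.

    Naive stand-in for a BLAST/FASTA hit (assumption): no gaps, no scoring matrix.
    """
    # offset = position in `a` where b[0] is placed; try every diagonal.
    for offset in range(-(len(b) - WINDOW), len(a) - WINDOW + 1):
        start_a, start_b = max(offset, 0), max(-offset, 0)
        overlap = min(len(a) - start_a, len(b) - start_b)
        if overlap < WINDOW:
            continue
        matches = [a[start_a + i] == b[start_b + i] for i in range(overlap)]
        count = sum(matches[:WINDOW])  # matches in the first window
        if count / WINDOW > MIN_IDENTITY:
            return True
        for i in range(WINDOW, overlap):  # slide the window by one base
            count += matches[i] - matches[i - WINDOW]
            if count / WINDOW > MIN_IDENTITY:
                return True
    return False


def cluster_sequences(seqs: dict[str, str]) -> list[set[str]]:
    """Group sequence names by single linkage over pairwise hits (assumption)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in combinations(seqs, 2):
        if shares_similar_region(seqs[x], seqs[y]):
            parent[find(x)] = find(y)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy data (hypothetical): two reads from the same 80-nt "gene", one unrelated read.
GENE = ("ATGGCCAAGT" "CTGACCGTAC" "GTTAGCCATG" "GACTTCGGAA"
        "CCTGAGTCCA" "TGCAAGGTCT" "ACGGATCCGT" "AGCTTGACCA")
ests = {"est_1": GENE[:60], "est_2": GENE[20:], "est_3": "AT" * 30}
clusters = cluster_sequences(ests)
print(sorted(sorted(c) for c in clusters))
# → [['est_1', 'est_2'], ['est_3']]
```

Here est_1 and est_2 share the 40 nucleotides GENE[20:60] exactly, so they fall into one cluster, while the AT-only read never reaches the threshold against either. With a 40-nt window, "more than 95%" means at most one mismatch (39/40), since 38/40 is exactly 95% and does not pass.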