GenBank

One of the first prominent bioinformatics databases was GenBank, maintained by the U.S. National Center for Biotechnology Information (NCBI). Over the years, the information maintained at NCBI's site has grown to include much more than just GenBank.

At NCBI's main web site, http://www.ncbi.nlm.nih.gov/, locate the "Search" panel at the very top. Select "Nucleotide" in the pull-down menu (GenBank contains nucleotide sequences), enter "GFP" in the text box beside it, and click on the "Search" button.

The search you have initiated will return about 311692 results. The results are spread across databases labelled "Nucleotide" (the main collection of nucleotide sequence records), "EST" (a database of expressed sequence tags), and "GSS" (genome survey sequences). By default, the preliminary results shown for your search are those for "Nucleotide". Although this is the smallest subset of the search results, it still contains too many records to examine by hand (8506 of them), so we will need to narrow the search. Before you proceed with that, notice the other information reported. For example, the number of search results associated with certain taxa is reported in the top left column, as is the number of search results associated with a particular molecule type or genetic compartment.

To narrow your search, click on the "Advanced" link just below the search text box at the top of the window. Use the resulting "Advanced Search Builder" window to better control your search. From the pull-down menu titled "All Fields", select "Gene Name". Now enter "GFP" in the text box to the right. Leave the rest of the query unspecified. Click the "Search" button again. This time you should have approximately 329 results. Further restrict your search to just mRNA sequences by clicking on "mRNA" under "Molecule Types" on the left-hand side of the window. You should be left with 24 results.
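The same narrowed query can also be expressed as a URL for NCBI's E-utilities `esearch` service. The following is only a minimal sketch, not part of the original exercise: the helper names `build_query` and `esearch_url` are made up for illustration, `nuccore` is the E-utilities name for the "Nucleotide" database, and the assumption is that the filter term `biomol_mrna[PROP]` corresponds to the "mRNA" choice in the sidebar. The sketch only builds the URL and does not contact NCBI, and the result counts returned today may differ from those quoted above.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(gene, mrna_only=False):
    """Build an Entrez query string restricted to the Gene Name field.

    Hypothetical helper; the mRNA filter term is an assumption about how
    the "Molecule Types > mRNA" sidebar choice maps to query syntax.
    """
    term = f"{gene}[Gene Name]"
    if mrna_only:
        term += " AND biomol_mrna[PROP]"
    return term


def esearch_url(term, db="nuccore", retmax=20):
    """Return an esearch URL for the given query (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_url(build_query("GFP", mrna_only=True))
print(url)
```

Pasting the printed URL into a browser should return a JSON document containing a hit count and a list of record identifiers, which can be compared with the numbers shown in the web interface.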

Two entries in the list, with accession numbers M62653.1 and M62654.1, are from a seminal 1992 paper. Click on the entry for M62653.1, the shorter sequence, and look over the resulting record. The information is in "GenBank format". Note how it is text-oriented, but carries much more information than FASTA format. Explore the various fields in the record that have URL links. Note that the "Feature" keyword "CDS" should be taken to mean "protein-coding sequence".

Based on the information in the record and its links, answer the following questions:

a) What is the Latin name of the organism whose DNA was sequenced for this GFP gene?

b) What is the taxonomic ID of the organism whose DNA was sequenced for this GFP gene?

c) How long is the nucleotide sequence? (Note there is a quicker way to get the answer than by counting.)

d) What is the affiliation of the first author on the research paper in which this information was published?

e) This nucleotide sequence codes for a protein. What is the accession number of that protein?

f) At what (nucleotide) position does translation start?

To confirm your answer to the last question, click on "Graphics" in the line above the record, near the top of the window. Note that the output may take some moments to appear, depending on how busy the NCBI servers are. In the graphics window, hold your cursor over the thick line labeled with the accession number of the protein from part e. You should get a pop-up window with information that includes the answer to the question in part f.
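The sequence length and the CDS location can also be read directly from the text of a GenBank-format record, without counting bases or opening the graphics view. The sketch below is illustrative only: `TOY_RECORD` is a made-up 30-base record (not M62653.1), the function names are hypothetical, and the parser handles only simple `start..end` locations, whereas real records can use more complex location syntax.

```python
import re

# Made-up record for illustration; it is NOT a real GenBank entry.
TOY_RECORD = """\
LOCUS       TOY00001                 30 bp    mRNA    linear   SYN 01-JAN-2000
DEFINITION  Made-up record for illustration only.
ACCESSION   TOY00001
FEATURES             Location/Qualifiers
     source          1..30
                     /organism="synthetic construct"
     CDS             4..27
                     /protein_id="TOY_P0001.1"
ORIGIN
        1 gccatggcta aaggtgaaga attataagcc
//
"""


def locus_length(record):
    """Read the sequence length from the LOCUS line (the quick way)."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            match = re.search(r"(\d+)\s+bp", line)
            if match:
                return int(match.group(1))
    raise ValueError("no LOCUS line with a length found")


def counted_length(record):
    """Count the bases listed after ORIGIN (the slow way)."""
    origin = record.split("ORIGIN", 1)[1]
    return len(re.findall(r"[a-z]", origin))


def first_cds(record):
    """Return start, end and protein_id of the first CDS feature."""
    result = {}
    in_cds = False
    for line in record.splitlines():
        # Feature keys begin after exactly five spaces of indentation.
        feature = re.match(r"^ {5}(\S+)\s+(\S+)\s*$", line)
        if feature:
            if in_cds:
                break  # the next feature starts, so the CDS block is over
            in_cds = feature.group(1) == "CDS"
            if in_cds:
                loc = re.fullmatch(r"(\d+)\.\.(\d+)", feature.group(2))
                if loc is None:
                    raise ValueError("only simple start..end locations handled")
                result["start"] = int(loc.group(1))
                result["end"] = int(loc.group(2))
        elif in_cds:
            pid = re.search(r'/protein_id="([^"]+)"', line)
            if pid:
                result["protein_id"] = pid.group(1)
    return result


print(locus_length(TOY_RECORD), counted_length(TOY_RECORD))
print(first_cds(TOY_RECORD))
```

In the toy record, the first number of the CDS location is where translation starts, which is the same kind of information asked for in part f. For real work, an established parser such as Biopython's `SeqIO` would be a more robust choice than hand-written regular expressions.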

a) The GFP gene was sequenced from Aequorea victoria.

b) The taxonomy ID of Aequorea victoria is 6100.

c) The nucleotide sequence is 966 nucleotides long.

d) The first author of the research paper ("Primary structure of the Aequorea victoria green-fluorescent protein", listed in PubMed) is Prasher, D.C., affiliated with the Biology Department, Woods Hole Oceanographic Institution, MA 02543.

e) The sequence codes for green-fluorescent protein [Aequorea victoria], which has accession number AAA27721.
