SEQUENCE (5’ to 3’ sequence of the coding strand)
   1 ccctgtggag ccacacccta gggttggcca atctactccc aggagcaggg agggcaggag
  61 ccagggctgg gcataaaagt cagggcagag ccatctattg cttacatttg cttctgacac
 121 aactgtgttc actagcaacc tcaaacagac accatggtgc acctgactcc tgaggagaag
 181 tctgccgtta ctgccctgtg gggcaaggtg aacgtggatg aagttggtgg tgaggccctg
 241 ggcaggttgg tatcaaggtt acaagacagg tttaaggaga ccaatagaaa ctgggcatgt
 301 ggagacagag aagactcttg ggtttctgat aggcactgac tctctctgcc tattggtcta
 361 ttttcccacc cttaggctgc tggtggtcta cccttggacc cagaggttct ttgagtcctt
 421 tggggatctg tccactcctg atgctgttat gggcaaccct aaggtgaagg ctcatggcaa
 481 gaaagtgctc ggtgccttta gtgatggcct ggctcacctg gacaacctca agggcacctt
 541 tgccacactg agtgagctgc actgtgacaa gctgcacgtg gatcctgaga acttcagggt
 601 gagtctatgg gacccttgat gttttctttc cccttctttt ctatggttaa gttcatgtca
 661 taggaagggg agaagtaaca gggtacagtt tagaatggga aacagacgaa tgattgcatc
 721 agtgtggaag tctcaggatc gttttagttt cttttatttg ctgttcataa caattgtttt
 781 cttttgttta attcttgctt tctttttttt tcttctccgc aatttttact attatactta
 841 atgccttaac attgtgtata acaaaaggaa atatctctga gatacattaa gtaacttaaa
 901 aaaaaacttt acacagtctg cctagtacat tactatttgg aatatatgtg tgcttatttg
 961 catattcata atctccctac tttattttct tttattttta attgatacat aatcattata
1021 catatttatg ggttaaagtg taatgtttta atatgtgtac acatattgac caaatcaggg
1081 taattttgca tttgtaattt taaaaaatgc tttcttcttt taatatactt ttttgtttat
1141 cttatttcta atactttccc taatctcttt ctttcagggc aataatgata caatgtatca
1201 tgcctctttg caccattcta aagaataaca gtgataattt ctgggttaag gcaatagcaa
1261 tatttctgca tataaatatt tctgcatata aattgtaact gatgtaagag gtttcatatt
1321 gctaatagca gctacaatcc agctaccatt ctgcttttat tttatggttg ggataaggct
1381 ggattattct gagtccaagc taggcccttt tgctaatcat gttcatacct cttatcttcc
1441 tcccacagct cctgggcaac gtgctggtct gtgtgctggc ccatcacttt ggcaaagaat
1501 tcaccccacc agtgcaggct gcctatcaga aagtggtggc tggtgtggct aatgccctgg
1561 cccacaagta tcactaagct cgctttcttg ctgtccaatt tctattaaag gttcctttgt
1621 tccctaagtc caactactaa actgggggat attatgaagg gccttgagca tctggattct
1681 gcctaataaa aaacatttat tttcattgca atgatgtatt taaattattt ctgaatattt
1741 tactaaaaag ggaatgtggg aggtcagtgc atttaaaaca taaagaaatg atgagctgtt
1801 caaaccttgg gaaaatacac tatatcttaa actccatgaa agaaggtgag gctgcaacca
1861 gctaatgcac attggcaaca gcccctgatg cctatgcctt attcatccct cagaaaagga
1921 ttcttgtaga ggcttgattt gcaggttaaa gttttgctat gctgtatttt acattactta
1981 ttgttttagc tgtcctcatg aatgtctttt cactacccat ttg
The different elements of the gene’s structure correspond to the nucleotides in the sequence as follows:
Pro = promoter region, nucleotides 1-103
E1 = exon 1, nucleotides 104-245
I1 = intron 1, nucleotides 246-375
E2 = exon 2, nucleotides 376-598
I2 = intron 2, nucleotides 599-1448
E3 = exon 3, nucleotides 1449-1709
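The coordinates above can be checked with simple arithmetic. Below is a minimal Python sketch that assumes only the six ranges listed. `REGIONS` and the helper names are illustrative and do not come from the original. The sketch confirms that the regions tile the sequence with no gaps or overlaps. It then adds up the exon lengths, which is what remains in the mature mRNA once the introns are spliced out.

```python
from typing import Dict, Tuple

# 1-based, inclusive (start, end) coordinates copied from the annotation above.
REGIONS: Dict[str, Tuple[int, int]] = {
    "Pro": (1, 103),
    "E1": (104, 245),
    "I1": (246, 375),
    "E2": (376, 598),
    "I2": (599, 1448),
    "E3": (1449, 1709),
}


def region_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive range."""
    return end - start + 1


def is_contiguous(regions: Dict[str, Tuple[int, int]]) -> bool:
    """True if each region starts immediately after the previous one ends."""
    spans = list(regions.values())  # dicts preserve insertion order (Python 3.7+)
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(spans, spans[1:]))


def exon_total(regions: Dict[str, Tuple[int, int]]) -> int:
    """Combined exon length. Introns are spliced out and the promoter lies
    upstream of the transcription start, so only exons remain in the mRNA
    (the poly(A) tail is not counted)."""
    return sum(region_length(*span) for name, span in regions.items()
               if name.startswith("E"))


if __name__ == "__main__":
    for name, (start, end) in REGIONS.items():
        print(f"{name}: {start}-{end} ({region_length(start, end)} nt)")
    print("contiguous:", is_contiguous(REGIONS))
    print("exons combined:", exon_total(REGIONS), "nt")
```

For these coordinates, the three exons total 626 nt and the two introns total 980 nt. Most of the transcribed region (nucleotides 104-1709) is therefore intron.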
Imagine you are sequencing the genome of a newly discovered eukaryotic organism, and you want to determine which genes are in the organism’s genome. You put your organism’s sequence into a gene discovery program. Please name five things the program will look for as a means of identifying the genes that lie in the organism’s genome. Indicate where they lie in the sequence of the human beta-globin gene above.
[Figure: schematic of the gene regions, Pro and E1 through E3]
Solution
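The program will look for the following features to identify genes:
1. Promoter sequences -> sites upstream of the transcription start where RNA polymerase II and transcription factors bind (for example, the TATA box and the CCAAT box)
2. Start codon (ATG) -> where the coding sequence for the gene product begins
3. Splice sites -> the exon-intron boundaries; introns almost always begin with GT (5’ splice site) and end with AG (3’ splice site)
4. Stop codon (TAA, TAG or TGA) -> where the coding sequence for the gene product ends, closing the open reading frame that begins at the start codon once the introns are removed
5. Polyadenylation signal (AATAAA) -> marks the 3’ end of the transcript; the pre-mRNA is cleaved about 10-30 nucleotides downstream and a poly(A) tail is added
They lie in the following:
1. Promoter sequences -> Pro: CCAAT box at nucleotides 28-32 and TATA box (CATAAAA) at 72-78
2. Start codon -> E1: ATG at nucleotides 154-156
3. Splice sites -> the ends of each intron: GT at 246-247 and AG at 374-375 (I1); GT at 599-600 and AG at 1447-1448 (I2)
4. Stop codon -> E3: TAA at nucleotides 1575-1577
5. Polyadenylation signal -> E3 (3’ untranslated region): AATAAA at nucleotides 1685-1690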
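A gene finder could look for promoter elements, start and stop codons, splice sites and the polyadenylation signal, and several of these can be located with a toy scanner. Below is a minimal Python sketch that assumes exact string matching on two excerpts copied from the sequence above. The function names are hypothetical. Real gene-prediction programs typically weigh such signals statistically, for example with hidden Markov models, rather than matching them literally.

```python
import re
from typing import List, Optional, Tuple

STOP_CODONS = {"taa", "tag", "tga"}

# Two excerpts of the beta-globin sequence above, in the same numbered layout.
FIVE_PRIME = """
  1 ccctgtggag ccacacccta gggttggcca atctactccc aggagcaggg agggcaggag
 61 ccagggctgg gcataaaagt cagggcagag ccatctattg cttacatttg cttctgacac
121 aactgtgttc actagcaacc tcaaacagac accatggtgc acctgactcc tgaggagaag
"""

THREE_PRIME = """
1441 tcccacagct cctgggcaac gtgctggtct gtgtgctggc ccatcacttt ggcaaagaat
1501 tcaccccacc agtgcaggct gcctatcaga aagtggtggc tggtgtggct aatgccctgg
1561 cccacaagta tcactaagct cgctttcttg ctgtccaatt tctattaaag gttcctttgt
1621 tccctaagtc caactactaa actgggggat attatgaagg gccttgagca tctggattct
1681 gcctaataaa aaacatttat tttcattgca atgatgtatt taaattattt ctgaatattt
"""


def parse_numbered(text: str) -> Tuple[int, str]:
    """Turn numbered lines ('1441 tcccacagct ...') into (first position, sequence),
    checking that each line number continues from the previous line."""
    first: Optional[int] = None
    seq = ""
    for line in text.strip().splitlines():
        pos, *blocks = line.split()
        if first is None:
            first = int(pos)
        elif int(pos) != first + len(seq):
            raise ValueError(f"line {pos} does not follow position {first + len(seq) - 1}")
        seq += "".join(blocks).lower()
    if first is None:
        raise ValueError("no sequence lines found")
    return first, seq


def find_motif(seq: str, motif: str, first: int) -> List[int]:
    """1-based positions of every exact (possibly overlapping) match of motif."""
    pattern = re.compile(f"(?={re.escape(motif.lower())})")
    return [m.start() + first for m in pattern.finditer(seq)]


def first_in_frame_stop(seq: str, first: int, frame_start: int) -> Optional[int]:
    """Read codons from frame_start (1-based) and return the position of the
    first stop codon, or None if the excerpt ends before one is found."""
    i = frame_start - first
    while i + 3 <= len(seq):
        if seq[i:i + 3] in STOP_CODONS:
            return i + first
        i += 3
    return None


if __name__ == "__main__":
    p5, s5 = parse_numbered(FIVE_PRIME)
    p3, s3 = parse_numbered(THREE_PRIME)
    print("CCAAT box:", find_motif(s5, "ccaat", p5))
    print("TATA box (CATAAAA):", find_motif(s5, "cataaaa", p5))
    print("first ATG:", find_motif(s5, "atg", p5)[0])
    print("intron 2 ends with:", s3[1447 - p3:1449 - p3])
    print("first in-frame stop from start of E3:", first_in_frame_stop(s3, p3, 1449))
    print("AATAAA:", find_motif(s3, "aataaa", p3))
```

On these excerpts, the scanner reports:

- CCAAT at 28
- CATAAAA at 72
- the first ATG at 154
- the AG acceptor at 1447-1448
- an in-frame TAA at 1575
- AATAAA at 1685

The stop search starts at 1449 because, in this gene, exon 3 begins on a codon boundary. On the unspliced sequence, the reading frame from the ATG at 154 continues into intron 1. This is why a gene finder must also recognize the GT...AG splice sites to follow the coding sequence across exons.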

