
By accident, a bioinformatics lab mixed up DNA sequences in a single data file for two organisms: fruit fly (Drosophila melanogaster) and a grape (Vitis vinifera). Your goal is to figure out which sequences belong to each organism while also learning about the NCBI databases, specifically BLAST. Create a Python program that reads the sequences from a text file (one per line), performs a BLAST search if not previously done, stores the results of each search in a separate file, and then performs analytics on the search results to help solve our sequence mix-up problem. Details:

Put a comment at the top of the Python file called lab6q1 with your name and student number. (0.5 mark)

Download the data file called input.txt and place it in your local directory for reading. Each line of the data file contains a DNA sequence. There are 10 sequences in the file.

Read each line of the data file and print to the console the first 20 characters in the sequence and the length of the query sequence. (1 mark)
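A minimal sketch of the reading step, which also keeps the running count that the next step asks for. The function name summarize_sequences and the choice to skip blank lines are assumptions made for illustration, not part of the assignment.

```python
def summarize_sequences(path="input.txt"):
    """Print the first 20 characters and the length of each sequence.

    Returns the number of sequences processed (the running count).
    """
    count = 0
    # The with-statement closes the input file automatically when the loop ends.
    with open(path) as infile:
        for line in infile:
            seq = line.strip()
            if not seq:  # assumption: blank lines are not sequences
                continue
            count += 1
            print(f"{count}: {seq[:20]} (length {len(seq)})")
    return count
```

Calling summarize_sequences("input.txt") on the provided data file should report 10 sequences if the file is as described.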

Use a counter to keep track of how many sequences have been processed. We will call this count. (0.5 mark)

Using a try-except clause, look for a previously created BLAST result file that should be named dna_lab6_count.xml. For example, dna_lab6_1.xml. Start the number from 1. (2 marks)

If the file exists, open it. Print Using saved file. (1 mark)

If the file does not exist, make a BLAST request using the NCBIWWW.qblast("blastn", "nt", seq) method and store the results in a file with the filename described above. seq is a variable storing your sequence string. Make sure to close the BLAST request and the save file. Re-open the file for reading. (2 marks)
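The lookup-or-fetch logic in the last three steps could be sketched as follows. The helper names fetch_with_qblast and open_blast_result, and the fetch parameter (which lets the search be swapped out for offline testing), are hypothetical. The live search assumes Biopython is installed and that NCBIWWW.qblast returns a text handle, as it does in recent Biopython releases.

```python
def fetch_with_qblast(seq):
    """Run a live BLAST search; needs Biopython and network access."""
    # Imported lazily so the rest of the module loads without Biopython.
    from Bio.Blast import NCBIWWW

    handle = NCBIWWW.qblast("blastn", "nt", seq)
    try:
        return handle.read()
    finally:
        handle.close()  # close the BLAST request


def open_blast_result(seq, count, fetch=fetch_with_qblast):
    """Return an open file holding the BLAST XML for sequence number `count`."""
    filename = f"dna_lab6_{count}.xml"
    try:
        blast_infile = open(filename)
        print("Using saved file.")
    except FileNotFoundError:
        # No saved result yet: run the search and save the XML first.
        xml_text = fetch(seq)
        with open(filename, "w") as save_file:
            save_file.write(xml_text)
        blast_infile = open(filename)  # re-open for reading
    return blast_infile
```

The caller is responsible for closing the returned file after parsing. A live search can take a minute or more per sequence, which is why saving the result is worthwhile.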

At this point, you will have an open file containing the BLAST output. Use NCBIXML.parse(blast_infile) to get the BLAST record. (1 mark)

Based on the title of the first BLAST record alignment (blast_record.alignments[0].title), determine whether the DNA sequence belongs to the fruit fly or the grape. Use a list that holds a 1 if the DNA sequence is for the grape and a 0 if it is for the fruit fly. (1 mark)

Use another list to store the sizes of the alignment matched (blast_record.alignments[0].length). (1 mark)

Make sure to close the input file storing the BLAST result. (0.5 marks)

After all sequences in the DNA input.txt file are processed, print out the list containing 1 if the DNA sequence is for the grape and a 0 if it is for the fruit fly. Also print out the list storing the sizes of the alignments. (1 mark)

Make sure to close the input.txt file when done. (0.5 marks)

Create a histogram with 50 bins that displays the alignment sequence sizes (from sizes list). Use import matplotlib.mlab. (1 mark)

Create a bar chart that shows how many of the sequences were for the fruit fly and for the grape. Use import matplotlib.pyplot. (2 marks)
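One possible sketch of the two charts, assuming matplotlib is installed. It uses matplotlib.pyplot for both plots, since pyplot.hist is what draws the histogram. The function names count_organisms and plot_results and the axis labels are illustrative choices; labels is the 0/1 list and sizes is the list of alignment lengths from the earlier steps.

```python
def count_organisms(labels):
    """Return (fruit_fly_count, grape_count) from a list of 0/1 labels."""
    grape = sum(labels)  # each grape sequence contributes a 1
    return len(labels) - grape, grape


def plot_results(labels, sizes):
    """Draw a 50-bin histogram of alignment sizes and an organism bar chart."""
    # Imported lazily so count_organisms works without matplotlib.
    import matplotlib.pyplot as plt

    fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

    ax_hist.hist(sizes, bins=50)
    ax_hist.set_xlabel("Alignment length")
    ax_hist.set_ylabel("Number of sequences")

    fly, grape = count_organisms(labels)
    ax_bar.bar(["Fruit fly", "Grape"], [fly, grape])
    ax_bar.set_ylabel("Number of sequences")

    fig.tight_layout()
    plt.show()
```

With only 10 sequences, most of the 50 histogram bins will be empty; the bin count here simply follows the assignment.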

Bonus: Create a different chart (either a different type besides bar/histogram) or use a different Python chart library. (1 mark)

Bonus: Apply a linear regression or k-means clustering in a useful way to the BLAST query and/or result data. (1 mark)

Thanks;)

Solution

There are too many parts to the question, so I'm solving only the main BLAST implementation part.

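    from Bio.Blast import NCBIXML

    # request_handle, outputList and matchedSize are assumed to be defined earlier.
    # NCBIXML.parse returns an iterator of records, so take the first one.
    blast_record = next(NCBIXML.parse(request_handle))
    # Assumption: the title of a grape hit contains the species name.
    if "Vitis vinifera" in blast_record.alignments[0].title:
        outputList.append(1)
    else:
        outputList.append(0)
    matchedSize.append(blast_record.alignments[0].length)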

            
