Consider the following read returned from the sequencing facility:
@SRR 001666.1 071 112 SLXA-EAS1 s 7: 1 817 345
GCATGTGGTGAGGTGGTAGTGATGGTGATATAGAGTGGTAGTATAAGTGT
IIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIAIIGIICI

Recommendation: Refer to the Wikipedia page on FASTQ format for the encoding schemes discussed below.
(a) Assume the quality scores are encoded using the Sanger offset (Phred+33). Is this sequence of generally good quality?
(b) Under this encoding, what base is the lowest quality? (You may circle it in the above) What is the probability of this position being correct?
(c) You realize that you were mistaken in the encoding and it is given in the Illumina 1.3+ (Phred+64) format. Under this encoding scheme, is this sequence of generally good quality? Is the worst position still the one you circled in question (b)?

Solution
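a) Yes, it is of good quality. Under Phred+33, the quality score of a base is the ASCII code of its quality character minus 33, giving scores from 0 to about 41. Almost every position is I (ASCII 73), which corresponds to Q = 73 - 33 = 40, an error probability of 10^-4. The few other characters (G, C, and A) correspond to Q38, Q34, and Q32, which are still high.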
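b) The lowest-quality base is the A (adenine) at the 43rd position of the sequence. Its quality character, A (ASCII 65), has the lowest score in the string: Q = 65 - 33 = 32. The error probability is 10^(-32/10), about 6.3 x 10^-4, so the probability that this position is correct is about 0.9994.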
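c) No. Under Phred+64 every score is 31 lower than under Phred+33, so the sequence is of poor quality: I becomes Q = 73 - 64 = 9 (error probability about 0.13), and G, C, and A become Q7, Q3, and Q1. The worst position is still the 43rd position (adenine), because changing the offset shifts all scores by the same amount and leaves their order unchanged. Its quality character A now gives Q = 65 - 64 = 1, so this base is correct with a probability of only about 0.21.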
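
The arithmetic behind these answers can be checked with a short script. This is only a sketch: it assumes the space inside the printed sequence is a formatting artifact and that the read is a single 50-base sequence, and the helper names (phred_scores, error_probability, summarize) are made up for illustration. It relies on the standard definitions Q = ASCII code - offset and P(error) = 10^(-Q/10).

```python
# Read data copied from the question (space inside the sequence removed,
# which is an assumption about the original formatting).
SEQ = "GCATGTGGTGAGGTGGTAGTGATGGTGATATAGAGTGGTAGTATAAGTGT"
QUAL = "IIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIAIIGIICI"


def phred_scores(quality, offset):
    """Convert each quality character to a Phred score: Q = ASCII code - offset."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def summarize(quality, offset):
    """Return (index of worst base, its score, P(correct), most common score)."""
    scores = phred_scores(quality, offset)
    worst = min(range(len(scores)), key=scores.__getitem__)
    typical = max(set(scores), key=scores.count)
    return worst, scores[worst], 1 - error_probability(scores[worst]), typical


# Compare the two encodings discussed in parts (a) to (c).
for name, offset in [("Phred+33", 33), ("Phred+64", 64)]:
    worst, q, p_correct, typical = summarize(QUAL, offset)
    print(f"{name}: typical Q={typical}, worst base {SEQ[worst]} "
          f"at position {worst + 1} with Q={q}, P(correct)={p_correct:.4f}")
```

Because the two encodings differ only by a constant offset, the position of the minimum is the same in both runs; only the scores, and therefore the probabilities, change.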
