please help me to solve this In the lecture notes I displaye

please help me to solve this

In the lecture notes, I displayed the BLAST scoring matrix for DNA: M=(S_i, j) = (matrix) Derive the equation for the Altschul-Dembo-Karlin variable lambda. Estimate the value of lambda.

Solution

In bioinformatics, BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotide sequences of DNA. A BLAST search lets a researcher compare a query sequence against a library or database of sequences and identify the library sequences that resemble the query above a certain threshold.

Different types of BLAST are available depending on the query sequence. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on sequence similarity. The BLAST algorithm and program were designed by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers, and David J. Lipman at the National Institutes of Health; the work was published in the Journal of Molecular Biology in 1990 and has been cited over 50,000 times.

The total score for an alignment (67 in the above case) is simply the sum of pairwise scores. The individual pairwise scores are listed beneath the alignment above (+5 on the left through +2 on the right).

Note that some pairs of letters may yield the same score. For example, in the BLOSUM62 matrix used to find the above alignment, S(S,T) = S(A,S) = S(E,K) = S(T,S) = S(D,N) = +1. While alignments are usually reported without the individual pairwise scores shown, the statistics of alignment scores depend implicitly on the probabilities of occurrence of the scores, not the letters. Dependency on the letter frequencies is factored out by the scoring matrix. We see then that the "message" obtained from a similarity search consists of a sequence of pairwise scores indicated by the high-scoring alignment.

If we search two sequences X and Y using a scoring matrix Sij to identify the maximal-scoring segment pair (MSP), and if the following conditions hold:

then Karlin-Altschul statistics tell us:

$$S_{ij} = \frac{1}{\lambda} \ln \frac{Q_{ij}}{P_X(i)\, P_Y(j)}$$

$$E = K M N e^{-\lambda S}$$

Since gaps are disallowed in the MSP, a pairwise sequence comparison for the MSP is analogous to a linear search for the MSS along the diagonals of a 2-d search space. The sum total length of all the diagonals searched is just M N.
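To make the E formula concrete, here is a minimal Python sketch that evaluates E = K M N e^(-λS) for a given score. The function name `expected_hits` and the numeric values of K, λ, M, N, and S are illustrative placeholders, not values taken from the lecture notes or from any particular scoring matrix.

```python
import math

def expected_hits(score, m, n, k, lam):
    """Expected number of MSPs scoring >= score: E = K * M * N * exp(-lam * score)."""
    return k * m * n * math.exp(-lam * score)

# Placeholder parameters (assumptions for illustration only).
K_EXAMPLE = 0.1        # hypothetical K
LAMBDA_EXAMPLE = 0.2   # hypothetical lambda, per score unit
M_EXAMPLE = 1_000      # hypothetical query length
N_EXAMPLE = 1_000_000  # hypothetical database length

for s in (50, 100, 150):
    e_value = expected_hits(s, M_EXAMPLE, N_EXAMPLE, K_EXAMPLE, LAMBDA_EXAMPLE)
    print(f"S = {s:3d}  ->  E = {e_value:.3e}")
```

E scales linearly with the search space M N and falls off exponentially with the score, so each additional score unit multiplies E by e^(-λ).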

Another way to express the pairwise scores is:

$$S_{ij} = \log_b \frac{Q_{ij}}{P_X(i)\, P_Y(j)}$$

where logarithms to some base b are used instead of natural logarithms. λ, which was used in the earlier expression for Sij, is often called the scale of the scoring matrix; it is related to the base of the logarithm as follows:

$$\lambda = \ln b$$
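The equation for λ asked for in the question follows from the log-odds form of the scores. Solving the score definition for the target frequencies gives Q_ij = P_X(i) P_Y(j) e^(λ S_ij), and because the Q_ij are probabilities that sum to 1, λ must be the positive root of

$$\sum_{i,j} P_X(i)\, P_Y(j)\, e^{\lambda S_{ij}} = 1.$$

The sketch below estimates that root by bisection. The lecture's DNA matrix is not reproduced in the question, so the +5 match / -4 mismatch matrix and the uniform base frequencies of 0.25 used here are assumptions for illustration only; substitute the actual matrix and frequencies. The helper name `solve_lambda` is likewise hypothetical.

```python
import math

def solve_lambda(scores, px, py, tol=1e-12):
    """Positive root of sum_ij px[i]*py[j]*exp(lam*scores[i][j]) = 1, found by bisection."""
    pairs = [(px[i] * py[j], scores[i][j])
             for i in range(len(px)) for j in range(len(py))]

    def f(lam):
        return sum(p * math.exp(lam * s) for p, s in pairs) - 1.0

    # A positive root exists only if the expected score is negative
    # and at least one score is positive.
    expected_score = sum(p * s for p, s in pairs)
    if expected_score >= 0 or not any(s > 0 and p > 0 for p, s in pairs):
        raise ValueError("need a negative expected score and a reachable positive score")

    # f(0) = 0 is the trivial root; f dips below zero, then crosses zero once.
    hi = 1.0
    while f(hi) <= 0:
        hi *= 2.0
    lo = hi
    while f(lo) >= 0:
        lo /= 2.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed example: +5 for a match, -4 for a mismatch, uniform base frequencies.
SCORES = [[5 if i == j else -4 for j in range(4)] for i in range(4)]
PX = [0.25] * 4
PY = [0.25] * 4

lam = solve_lambda(SCORES, PX, PY)
print(f"lambda = {lam:.4f} (natural-log units per score unit)")
```

Under these assumed inputs the equation reduces to 0.25 e^(5λ) + 0.75 e^(-4λ) = 1, whose positive root is close to 0.19. A different matrix or different base frequencies will give a different λ.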

By summing the scores Sij for the aligned pairs of letters in the MSP, one obtains the sum of log-odds ratios that is the MSP score. The MSP score is then seen to be the log_b (logarithm to some base b) of the odds that an MSP with its score occurs by chance at any given starting location within the random background, without yet considering how large an area, or how many starting locations, were actually examined in finding the MSP.

Considering the size of the examined area alone, the expected description length of the MSP (measured in information) is log(K M N). As the relative entropy, H, has units of information per length (or letter pair), the expected length of the MSP (measured in letter pairs) is

$$E(L) = \frac{\log(K M N)}{H},$$

where H is the relative entropy of the target and background frequencies. H can be computed as:

$$H = \sum_{i,j} Q_{ij} \log \frac{Q_{ij}}{P_X(i)\, P_Y(j)}$$

Here PX(i) PY(j) is the product frequency expected for letter i paired with j in the background search space; and Qij is the frequency at which we expect to observe i paired with j in the MSP.
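As a rough illustration of the two formulas above, the following Python sketch computes the relative entropy H from a table of target frequencies and then the expected MSP length log(K M N) / H. Natural logarithms are used for both quantities so the units (nats) cancel consistently. The target frequencies, K, M, and N below are made-up placeholders, and the function names are hypothetical.

```python
import math

def relative_entropy(q, px, py):
    """H = sum_ij q[i][j] * ln(q[i][j] / (px[i] * py[j])), in nats per letter pair."""
    return sum(q[i][j] * math.log(q[i][j] / (px[i] * py[j]))
               for i in range(len(px)) for j in range(len(py))
               if q[i][j] > 0)  # terms with q = 0 contribute nothing

def expected_msp_length(k, m, n, h):
    """E(L) = ln(K * M * N) / H, in letter pairs (H must also be in nats)."""
    return math.log(k * m * n) / h

# Placeholder inputs (assumptions for illustration only).
PX = [0.25] * 4
PY = [0.25] * 4
# Hypothetical target frequencies: 0.2 for each identical pair, the
# remaining 0.2 spread evenly over the 12 mismatched pairs.
Q = [[0.2 if i == j else 0.2 / 12 for j in range(4)] for i in range(4)]

h = relative_entropy(Q, PX, PY)
length = expected_msp_length(k=0.1, m=1_000, n=1_000_000, h=h)
print(f"H = {h:.4f} nats per pair, expected MSP length = {length:.1f} pairs")
```

When the target frequencies equal the background product P_X(i) P_Y(j), H is zero and the expected length is undefined, which reflects that such a scoring system cannot distinguish an MSP from background.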

By definition, the MSP has the highest observed score. Since this score is expected to occur just once, the (expected) MSP score has an (expected) frequency of occurrence, E, of 1.

The appearance of MSPs can be modeled as a Poisson process with characteristic parameter E, as it is possible for multiple, independent segments of the background sequences to achieve the same high score. The Poisson "events" are the individual MSPs having the same score S or greater. The probability of one or more MSPs having score S or greater is simply one minus the probability that no MSPs appear having score S or greater. Thus,

$$P = 1 - e^{-E}$$
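The conversion from E to P is a one-liner; this short Python sketch shows it. The function name `p_value_from_e` is hypothetical. It uses `math.expm1`, which computes e^x - 1 accurately for small x, so that very small E values are not lost to rounding.

```python
import math

def p_value_from_e(e_value):
    """P = 1 - exp(-E): probability of at least one MSP scoring >= S."""
    return -math.expm1(-e_value)

for e_value in (1e-10, 0.01, 1.0, 10.0):
    print(f"E = {e_value:g}  ->  P = {p_value_from_e(e_value):.6g}")
```

For small E the two quantities are nearly equal (P ≈ E), while for large E the probability saturates toward 1.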

Tutor: Dr Jack