Can someone please explain in a lot of detail how to go about solving these questions? I have provided the correct answers, but I am not very sure how to come up with them.
2. How many bits?

The information in DNA is stored as a code made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Human DNA consists of about 3 billion bases. Computer representations of DNA consist of long strings of the letters A, C, G, and T. Assume we need to store a sequence of length N.

a. How many bits will be required if we use a null-terminated ASCII string with one character per DNA base to store the sequence? Write an expression using N that will evaluate to the total number of bits required.

b. Using ASCII characters to represent 4 different equally likely possibilities and the null byte at the end is wasteful. Suppose instead we change two things. 1) Use the minimum number of bits to represent the 4 possibilities for each letter. 2) Instead of reserving a code to mark the end, we can store the length of the sequence at the beginning in a fixed-size unsigned int of sufficient width to represent lengths from 0 to 3 billion.

Hint: For example, if you need 16 bits to represent the length and 5 bits to represent each letter, your answer will be an expression involving N, 16, and 5.
Hint: The numbers 16 and 5 given in the previous hint are not correct.
Hint: This problem does not require advanced math or functions. Simple arithmetic will get you the answer.

How many bits will be required to represent a sequence of length N?

Correct Answers: 2a) 8*N+8 2b) 32+2*N

Solution
a)
Each ASCII character is stored in one byte, which is 8 bits (this is also the size of a char in C on virtually all platforms). So if there are N characters in the DNA sequence, the characters take 8*N bits. Because the string is null-terminated, one extra byte (8 more bits) is needed for the terminating null character.
Hence total number of bits = 8*N + 8
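To sanity-check the arithmetic for part a), here is a minimal Python sketch. The helper name ascii_bits is made up for illustration, and it assumes one 8-bit byte per ASCII character plus one null byte at the end.

```python
def ascii_bits(n: int) -> int:
    """Bits for a null-terminated ASCII string of n bases (hypothetical helper)."""
    bits_per_char = 8  # assumption: one 8-bit byte per ASCII character
    # n bases, plus one more byte for the null terminator
    return bits_per_char * n + bits_per_char


# Cross-check against an actual byte string for a tiny example sequence.
seq = "GATTACA"
stored = seq.encode("ascii") + b"\x00"  # the bases followed by the null byte
print(ascii_bits(len(seq)), len(stored) * 8)
```

Both printed numbers should agree, since the formula just counts the bytes in the stored string and multiplies by 8.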
b) First note that 4 different characters can be represented using only 2 bits, because 2 bits give 2^2 = 4 distinct patterns. One possible assignment is
00 --> A
01 --> C
10 --> G
11 --> T
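The 2-bit mapping above can be tried out with a short Python sketch. The names CODE, pack and unpack are invented for this example, and packing everything into one Python integer is just a convenient way to show the bits, not a claim about how real DNA file formats work.

```python
# The 2-bit assignment from the table above (one possible choice).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # index i is the letter whose code is i


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base, first base highest."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, n: int) -> str:
    """Recover n bases from a packed integer; n must be stored separately."""
    return "".join(
        LETTERS[(value >> (2 * (n - 1 - i))) & 0b11] for i in range(n)
    )


seq = "GATTACA"
packed = pack(seq)
print(format(packed, "0{}b".format(2 * len(seq))))  # 2 bits per base
print(unpack(packed, len(seq)))
```

Note that unpack needs the length n passed in. The bit pattern 00 already means A, so no pattern is left over to mark the end, which is why the length has to be stored up front.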
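In this part the DNA sequence itself takes 2*N bits, since each character takes 2 bits. The length field must be able to hold any value from 0 to 3 billion. Since 2^31 = 2,147,483,648 is less than 3 billion while 2^32 = 4,294,967,296 is greater, 31 bits are not enough and 32 bits are. That matches the usual 32-bit size of an unsigned int in C.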
Hence total 32 + 2*N bits.
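The width of the length field and the final total can be checked with a few lines of Python. This is only a sketch: MAX_LENGTH, LENGTH_BITS and packed_bits are names chosen for this example, and "3 billion" is taken as exactly 3,000,000,000.

```python
MAX_LENGTH = 3_000_000_000  # assumption: "3 billion" taken as an exact value

# int.bit_length() gives the number of bits needed to write the value in binary.
LENGTH_BITS = MAX_LENGTH.bit_length()
BITS_PER_BASE = 2  # 4 possibilities need 2 bits


def packed_bits(n: int) -> int:
    """Bits for the length header plus n bases at 2 bits each (hypothetical helper)."""
    return LENGTH_BITS + BITS_PER_BASE * n


print(LENGTH_BITS)
print(2**31 < MAX_LENGTH < 2**32)
print(packed_bits(MAX_LENGTH))
```

For a full 3-billion-base sequence this comes to roughly a quarter of the 8*N+8 bits needed by the ASCII version in part a).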

