
A DNA sequence is a sequence of some combination of the characters A (adenine), C (cytosine), G (guanine), and T (thymine), which correspond to the four nucleobases that make up DNA. Given a long DNA sequence, it's often necessary to count the number of instances of a certain subsequence. For this exercise, you will develop a program that reads a DNA sequence from a file and, given a subsequence s, searches the DNA sequence and counts the number of times s appears. As an example, consider the following sequence:

GGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC

If we were to search for the subsequence GTA, it appears twice. You will write a program (place your source in a file named dnaSearch.c) that takes, as command-line inputs, an input file name and a valid DNA (sub)sequence. That is, it should be callable from the command line as:

./dnaSearch dna01.txt GTA

Hand in dnaSearch.c. No additional files should be handed in.
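The statement does not say whether overlapping matches count: AA occurs twice in AAA if overlaps count, but only once if they do not. The sketch below shows both conventions side by side. The helper names count_overlapping and count_nonoverlapping are hypothetical, and the approach is a plain sliding-window comparison, not necessarily what a grader expects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count matches of sub in seq, allowing overlaps.
   Every start position is tried, so "AA" in "AAA" counts twice. */
size_t count_overlapping(const char *seq, const char *sub)
{
    size_t n = strlen(seq), m = strlen(sub), count = 0;
    if (m == 0 || m > n)
        return 0;
    for (size_t i = 0; i + m <= n; i++)
        if (memcmp(seq + i, sub, m) == 0)
            count++;
    return count;
}

/* Hypothetical helper: count non-overlapping matches by jumping past
   the end of each match, so "AA" in "AAA" counts once. */
size_t count_nonoverlapping(const char *seq, const char *sub)
{
    size_t n = strlen(seq), m = strlen(sub), count = 0;
    if (m == 0 || m > n)
        return 0;
    size_t i = 0;
    while (i + m <= n)
    {
        if (memcmp(seq + i, sub, m) == 0)
        {
            count++;
            i += m; /* skip the matched bases */
        }
        else
        {
            i++;
        }
    }
    return count;
}
```

For the example sequence both helpers return 2 for GTA, because GTA cannot overlap itself. They differ only for self-overlapping patterns, for example AA in AAAA gives 3 with overlaps and 2 without.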

Solution

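// C program dnaSearch.c
// Usage: ./dnaSearch <input file> <subsequence>

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

int main(int argc, char const *argv[])
{
    FILE *inputFile;

    if (argc < 3)
    {
        fprintf(stderr, "Usage: %s <input file> <subsequence>\n", argv[0]);
        return 1;
    }

    // Open the file once; exit if it cannot be opened.
    if ((inputFile = fopen(argv[1], "r")) == NULL)
    {
        perror("Error opening file");
        return 1;
    }

    const char *sub = argv[2];

    // Read the whole file into a growable buffer, skipping whitespace,
    // so a long sequence (or one split over several lines) is kept intact.
    size_t cap = 1024, len = 0;
    char *str = malloc(cap);
    if (str == NULL)
    {
        fclose(inputFile);
        return 1;
    }

    int c;
    while ((c = fgetc(inputFile)) != EOF)
    {
        if (isspace(c))
            continue;
        if (len + 1 == cap) // keep room for the terminating '\0'
        {
            char *bigger = realloc(str, cap * 2);
            if (bigger == NULL)
            {
                free(str);
                fclose(inputFile);
                return 1;
            }
            str = bigger;
            cap *= 2;
        }
        str[len++] = (char)c;
    }
    str[len] = '\0';
    fclose(inputFile);

    // Count occurrences; advancing one character past each match means
    // overlapping occurrences are counted too. An empty subsequence is
    // rejected, since strstr would match it at every position.
    int count = 0;
    const char *tmp = str;
    while (*sub != '\0' && (tmp = strstr(tmp, sub)) != NULL)
    {
        count++;
        tmp++;
    }

    printf("%s appears %d times\n", sub, count);

    free(str);
    return 0;
}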


/*
dna01.txt
GGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC

output:
GTA appears 2 times


*/
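The solution above loads the entire file before searching. Since the exercise stresses long sequences, one alternative is to count while reading, keeping only a window of the last strlen(s) bases in memory. This is a minimal sketch, assuming whitespace (such as line breaks inside the sequence) should be ignored; count_in_stream is a hypothetical name, not part of the assignment.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count overlapping occurrences of sub in the
   sequence read from fp, holding only the last strlen(sub) bases in
   memory. Whitespace is skipped, so matches may span line breaks.
   Returns -1 for an empty subsequence or an allocation failure. */
long count_in_stream(FILE *fp, const char *sub)
{
    size_t m = strlen(sub);
    if (m == 0)
        return -1;

    char *window = malloc(m);
    if (window == NULL)
        return -1;

    size_t filled = 0;
    long count = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
    {
        if (isspace(c))
            continue;
        if (filled < m)
        {
            window[filled++] = (char)c; /* still filling the first window */
        }
        else
        {
            /* Slide the window left by one base and append the new one. */
            memmove(window, window + 1, m - 1);
            window[m - 1] = (char)c;
        }
        if (filled == m && memcmp(window, sub, m) == 0)
            count++;
    }

    free(window);
    return count;
}
```

In main this could stand in for the read-then-strstr step, for example long count = count_in_stream(inputFile, argv[2]);. Memory use then depends only on the subsequence length; the cost is one memmove per base, which is negligible for short patterns such as GTA.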

Tutor: Dr Jack