The intended output is to first reverse the whole DNA string and then convert A <-> T and C <-> G. However, the first character of the actual output prints as "p", which seems to come out of nowhere, while the rest of the output string is correct. Here's the code:
#include <stdio.h>
#include <string.h>

int main() {
    const char dna[] = "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT"
                       "TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG"
                       "GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT"
                       "CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA"
                       "AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT";
    int dna_len = strlen(dna);
    char rev_comp[dna_len + 1];
    char temp = '\0';
    char temp_dna[dna_len + 1];
    for (int i = 0; i < dna_len + 1; i++) {
        temp_dna[i] = dna[dna_len - i];
        if (temp_dna[i] == 'A') {
            temp = 'T';
            rev_comp[i] = temp;
        }
        else if (temp_dna[i] == 'T') {
            temp = 'A';
            rev_comp[i] = temp;
        }
        else if (temp_dna[i] == 'C') {
            temp = 'G';
            rev_comp[i] = temp;
        }
        else if (temp_dna[i] == 'G') {
            temp = 'C';
            rev_comp[i] = temp;
        }
    }
    rev_comp[dna_len + 1] = '\0';
    printf("original: %s\n", dna);
    printf("rev_comp: %s\n", rev_comp);
    return 0;
}
Answer:
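Your loop is wrong: it runs one iteration too many (i goes up to and including dna_len), and on the first iteration dna[dna_len - 0] is the null terminator. That matches none of your if branches, so rev_comp[0] is never assigned and holds whatever happened to be in memory. It should be: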
for (int i = 0; i < dna_len; i++) {          // corrected loop
    temp_dna[i] = dna[dna_len - i - 1];      // corrected calculation
    // ... rest of the loop body unchanged
}
Also, the final null terminator in rev_comp should be assigned at index dna_len, not dna_len + 1; the latter is out of bounds, so your program has undefined behavior. Printing p is one possible outcome of undefined behavior.
rev_comp[dna_len] = '\0';
You could also make a small helper function that only does the reversing, and call it before you start swapping characters in the string. Small dedicated functions that do only one thing make debugging easier later, because it is then easier to isolate and find problems. Example:
void rev(const char *in, size_t len, char *out) {
    for (size_t i = 0; i < len; ++i) {
        out[len - i - 1] = in[i];
    }
    out[len] = '\0';
}
And call it with
rev(dna, dna_len, rev_comp);
before swapping the letters:
for (int i = 0; i < dna_len; i++) {
    char *ch = &rev_comp[dna_len - i - 1];
    switch (*ch) {
    case 'A': *ch = 'T'; break;
    case 'T': *ch = 'A'; break;
    case 'G': *ch = 'C'; break;
    case 'C': *ch = 'G'; break;
    }
}
Answer:
@TedLyngmo has already pointed out the indexing errors in your original code. Another thing worth considering is being able to reuse the code you write now in other programs later. Rather than writing specialized code over and over again for each program, identify the parts you are likely to need again and put each of them in a short function.
You will likely have the need to reverse a string more times than just in this program, so writing a reusable function to reverse a string that you can use wherever it is needed makes sense. Depending on your career path, you may also have the need to transform A <-> T, C <-> G more than in this one program, so a short function to do that may make sense as well.
Caveat: if utmost efficiency is required (e.g. when dealing with strings of billions of characters), it would make sense to combine the two operations so the DNA sequence string is traversed only once. By working from each end of the string towards the middle, you can handle two characters per iteration, halving the number of iterations needed.
To make a reusable function for each of the reversal and the transformation, you can write the functions as follows. The string reversal function shows how to work from each end toward the middle, so it needs only half as many iterations as the string has characters:
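#include <stdio.h>
#include <string.h>

/* reverse src into dest, copying 2 characters per iteration.
 * note: names starting with "str" followed by a lowercase letter are reserved
 * by the C standard for <string.h>, and some libraries (e.g. Microsoft's CRT)
 * already declare a strrev(), so you may need to rename this function. */
void strrev(char *dest, const char *src)
{
    size_t begin = 0, end = strlen(src);    /* begin and 1-past-end indexes */
    dest[end] = 0;                          /* nul-terminate dest */
    for (; begin < end--; begin++) {
        dest[begin] = src[end];             /* end to begin */
        dest[end] = src[begin];             /* begin to end */
    }
}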
/* transform A <-> T, C <-> G */
void xformATCG (char *s)
{
    do {
        if (*s == 'A')
            *s = 'T';
        else if (*s == 'T')
            *s = 'A';
        else if (*s == 'C')
            *s = 'G';
        else if (*s == 'G')
            *s = 'C';
    } while (*s++);
}
If you like, you can also write a simple print function that breaks long lines of output every brk characters, similar to how you split your initialization of dna[] into 50-character pieces. For what it's worth, you could add:
/* simple print with break at brk chars function */
void prnwbrk (const char *s, size_t brk)
{
    size_t n = 0;                           /* counter */
    while (s[n]) {                          /* loop until end-of-string */
        if (n && n % brk == 0)              /* if brk chars, output \n */
            putchar ('\n');
        putchar (s[n++]);                   /* output char */
    }
    putchar ('\n');                         /* final \n */
}
Now reversing and transforming the string simply becomes a matter of calling strrev() and xformATCG() in main(). You can print the result after each operation to check each step, which makes debugging a bit easier. A short main() could be:
int main (void) {
    const char dna[] = "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT"
                       "TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG"
                       "GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT"
                       "CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA"
                       "AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT";
    char rev_comp[sizeof dna];

    prnwbrk (dna, 50);          /* print original dna */
    putchar ('\n');

    strrev (rev_comp, dna);     /* reverse and print */
    prnwbrk (rev_comp, 50);
    putchar ('\n');

    xformATCG (rev_comp);       /* transform chars and print */
    prnwbrk (rev_comp, 50);
}
Example Use/Output
If I understood your question and the operations properly, the reversed and transformed strings would look like:
$ ./bin/revdna
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG
GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT
CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT

TAAGTTAACAATAATAATACAGGATGTTCGTAATTAATTAATTGTGTGAA
ATCATCCATACAAGCGGACATTATAACTTGCATCCACGCTATTTATTATC
TTACTCCGTCCTTAGTTTCTGTCTATGACGCTGTATCCCACGAGGCCGAG
GTCGCAGAGCGTTACGATAGCGCACGTGTGGGGGGTCTGCTTTTATGGTT
TACGTACCTCTCGAGGGCACTCACCAATTATCCCACTATCTGGACACTAG

ATTCAATTGTTATTATTATGTCCTACAAGCATTAATTAATTAACACACTT
TAGTAGGTATGTTCGCCTGTAATATTGAACGTAGGTGCGATAAATAATAG
AATGAGGCAGGAATCAAAGACAGATACTGCGACATAGGGTGCTCCGGCTC
CAGCGTCTCGCAATGCTATCGCGTGCACACCCCCCAGACGAAAATACCAA
ATGCATGGAGAGCTCCCGTGAGTGGTTAATAGGGTGATAGACCTGTGATC
There is nothing wrong with doing it all in main(), but thinking ahead can save you from reinventing the wheel each time you need to do the same thing in another program. (Additionally, writing and debugging a function once keeps new bugs from slipping in, as they can when you reinvent the same code later.)
Look things over and let me know if you have further questions.
