Hi I am working on a function that receives a file (.txt) pointer, a first struct (could be a list or a hash's table) and a second struct (could be list or a hash's table) and a function which receive 2 structures and store the file words on one of them. I made a word_struct function because I want to have got only function to open a dictionary and store the words in a hash table, and open other text file, and save only the words which isn't in the dictionary in a list. So just I need to change the strct_f function to open each file type. But should be there a faster way to do that, because it takes a few seconds to work (maybe because the text file with the dictionary has got 700.000 words).
It is important to say, that this method avoids storing a big array for the dictionary, and just work with each word. So I haven`t spend a lot of memory at the same time.
Is it a complex and dirty code? and is there other way to do this? Thanks you so much!
void *word_struct(FILE *fptr, void *strct, void *strct2, strct_f strct_f) {
int empty_file = 0, i = 0;
char a;
char *temp = malloc(sizeof(char) * BUFFER);
while (empty_file != 1) {
if ((a = fgetc(fptr)) == EOF) {
empty_file = 1;
if (i != 0) {
temp[i] = '\0';
strct_f(strct, strct2, temp);
}
free(temp);
} else
if ((a >= 97 && a <= 122) || (a >= 65 && a <= 90)) {
temp[i] = (char)tolower(a);
i = 1;
} else {
if (i != 0) {
temp[i] = '\0';
strct_f(strct, strct2, temp);
free(temp);
temp = malloc(sizeof(char) * BUFFER);
i = 0;
}
}
}
return strct;
}
CodePudding user response:
There are some issues in your code:
- using opaque
void *arguments and return value is unsafe: this defeats the compiler type checking features. strct_f strct_fis ugly: you shadow the name of the type with the variable name. Usestrct_f funinstead.amust be defined as anintto testEOFproperly and calltolower()with an argument in the proper range. It is more idiomatic to use the namecfor this variable.- you always allocate
BUFFERbytes for all words. This is wasteful and may cause the program to run slower - you do not test for buffer overflow when storing the next character into the
temparray, invoking undefined behavior on bogus dictionaries. - the last
tempbuffer is not freed, causing a memory leak. - hard coding ASCII values for the letters is non portable, hard to read and less efficient than using
isalpha(c) - the words in the dictionary might legitimately contain dashes and some other non alpha bytes.
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void *word_struct(FILE *fp, void *p1, void *p2, strct_f fun) {
char buf[BUFFER];
size_t i = 0;
for (;;) {
int c = getc(fp);
if (isalpha(c)) {
if (i 1 < sizeof(buf))
buf[i ] = (char)tolower(c);
}
} else {
if (i > 0) {
buf[i] = '\0';
fun(p1, p2, strdup(buf));
i = 0;
}
if (c == EOF)
break;
}
}
return p1;
}
