I'm a beginner in C programming and I have some questions regarding how to deal with files.
Let us suppose that we have a binary file with N int values stored, and that we want to read the i-th int value in the file.
Is there any real advantage of using fseek for positioning the file pointer to the i-th int value and reading it after the fseek instead of using a sequence of i fread calls?
Intuitively, I think that fseek is faster. But how does the function find the i-th value in the file without reading the intermediary information?
I think that this is implementation-dependent. So, I tried to find the implementation of fseek function, without much success.
CodePudding user response:
Is there any real advantage of using fseek for positioning the file pointer to the i-th int value and reading it after the fseek instead of using a sequence of i fread calls?
Yes, if you want to read a value from the file and you know where it is, there is no reason to read anything else.
Intuitively, I think that fseek is faster. But how does the function find the i-th value in the file without reading the intermediary information?
Your intuition is correct: reading one value is more efficient than reading several. How it finds the value is simple. Generally speaking, each position in the file corresponds to 1 byte, so if you pass an offset of, for example, 7, the next read starts at the 8th byte. Imagine your file has the following data:
-58 10 12 14 7 9
^      ^
|      |
0      offset of 7
if (fseek(fp, 7, SEEK_SET) == 0 && fscanf(fp, "%d", &num) == 1) {
    printf("%d", num);  /* parses the digits that start at byte offset 7 */
}
Will output 12.
The file position indicator was set to offset 7 (the 8th byte, since offsets start at 0), and the next read begins at that byte. It's as if you had an array and wanted the element at index 7: you'd just use arr[7].
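To try the example above without creating a file by hand, here is a minimal sketch. It assumes the sample text is written to a temporary stream first; read_int_at_offset is a made-up helper name, not a standard function. tmpfile opens the stream in binary update mode, which is why a plain byte offset is acceptable here.

```c
#include <stdio.h>

/* Hypothetical helper: writes `text` to a temporary binary stream, seeks to
   byte `offset`, and parses the next integer into *out.
   Returns 0 on success, -1 on any failure. */
int read_int_at_offset(const char *text, long offset, int *out)
{
    FILE *fp = tmpfile();               /* opened in "wb+" mode */
    if (fp == NULL)
        return -1;

    int status = -1;
    if (fputs(text, fp) != EOF &&
        fseek(fp, offset, SEEK_SET) == 0 &&   /* jump straight to the offset */
        fscanf(fp, "%d", out) == 1)
        status = 0;

    fclose(fp);
    return status;
}
```

Calling read_int_at_offset("-58 10 12 14 7 9", 7, &num) should leave 12 in num, and an offset of 0 should give -58.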
I think that this is implementation-dependent.
Though there are some small details that can be implementation defined, the overall behavior is standardised.
Synopsis:
#include <stdio.h>
int fseek(FILE *stream, long int offset, int whence);
Description:
The fseek function sets the file position indicator for the stream pointed to by stream. If a read or write error occurs, the error indicator for the stream is set and fseek fails.
For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. The specified position is the beginning of the file if whence is SEEK_SET, the current value of the file position indicator if SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
After determining the new position, a successful call to the fseek function undoes any effects of the ungetc function on the stream, clears the end-of-file indicator for the stream, and then establishes the new position. After a successful fseek call, the next operation on an update stream may be either input or output.
Returns:
- The fseek function returns nonzero only for a request that cannot be satisfied.
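As a small illustration of the quoted description, the sketch below checks fseek's return value and then asks ftell where the position indicator ended up. seek_and_tell is a hypothetical helper name, and the stream is assumed to be a binary stream.

```c
#include <stdio.h>

/* Hypothetical helper: moves the position indicator of fp by `offset`
   relative to `whence` (SEEK_SET, SEEK_CUR or SEEK_END) and returns the
   resulting position, or -1L if the seek request could not be satisfied. */
long seek_and_tell(FILE *fp, long offset, int whence)
{
    if (fseek(fp, offset, whence) != 0)   /* nonzero means failure */
        return -1L;
    return ftell(fp);                     /* position from the start of the file */
}
```

For example, on a binary stream holding at least 7 bytes, seek_and_tell(fp, 4, SEEK_SET) followed by seek_and_tell(fp, 3, SEEK_CUR) should report positions 4 and then 7.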
CodePudding user response:
But how does the function find the i-th value in the file without reading the intermediary information?
It doesn't. It's up to you to provide the correct (absolute or relative) offset. You can request, for example, to advance the file pointer by i*sizeof(X).
It still needs to follow the chain of sectors in which the file is located to find the right one, but that doesn't require reading those sectors. That metadata is stored outside of the file itself.
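A minimal sketch of that offset calculation for the question's file of int values. It assumes the file was written with fwrite on the same platform (same int size and byte order), that fp was opened in binary mode, and that i counts from 0; read_nth_int is a made-up name.

```c
#include <stdio.h>

/* Hypothetical helper: reads the int at zero-based index i from a binary
   stream of ints. Returns 0 on success, -1 on failure (bad index, seek
   error, or fewer than i+1 values in the file). */
int read_nth_int(FILE *fp, long i, int *out)
{
    if (i < 0)
        return -1;

    /* The caller computes the byte offset; fseek only moves the indicator. */
    if (fseek(fp, i * (long)sizeof(int), SEEK_SET) != 0)
        return -1;

    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}
```

Only one fread is issued, no matter how large i is.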
Is there any real advantage of using fseek for positioning the file pointer to the i-th int value and reading it after the fseek instead of using a sequence of i fread calls?
There are potential benefits at every level.
By seeking, the system may have to read less from the disk. The system reads from the disk in sectors, so short seeks might not have this benefit because of caching. But seeking over entire sectors reduces the amount of data that needs to be fetched from the disk.
Similarly, by seeking, the stdio library may have to request less from the OS.
The stdio library normally reads more than it requires so that future calls to fread don't need to touch the OS or the disk. A short seek might not require making any system calls, but seeking beyond the end of the buffered data could reduce the total amount of data fetched from the OS.
Finally, the skipped data doesn't need to be copied from the stdio library's buffers to the user's buffer at all when using fseek, no matter how far you seek.
Oh, and let's not forget that you were considering i-1 small reads instead of just one large one. Each of those reads consumes CPU, both in the library (error checking) and in the caller (error handling).
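For comparison, the sequential alternative from the question might look like the sketch below. It assumes a binary stream of int values and a zero-based index i; read_nth_int_by_skipping is a made-up name. Every skipped value costs one fread call, one error check, and one copy into tmp.

```c
#include <stdio.h>

/* Hypothetical helper: reaches the int at zero-based index i by reading
   every value before it. Returns 0 on success, -1 on failure. */
int read_nth_int_by_skipping(FILE *fp, long i, int *out)
{
    int tmp;

    if (i < 0)
        return -1;

    rewind(fp);                          /* start from the first value */
    for (long k = 0; k <= i; k++) {      /* i + 1 fread calls in total */
        if (fread(&tmp, sizeof tmp, 1, fp) != 1)
            return -1;                   /* each call needs its own check */
    }

    *out = tmp;                          /* only the last value is kept */
    return 0;
}
```

It yields the same value as a single seek followed by one read, but the amount of work grows with i.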
