SIGABRT inside fgets

Time:02-03

I have a relatively simple program that runs a bunch of shell scripts, concatenates their output into one string, and sets it as the status bar text for my display manager. For the most part everything works fine, but from time to time it crashes for no apparent reason. Having inspected the core dump, I found the following backtrace:

#0  0x00007fda71dc3d22 raise (libc.so.6   0x3cd22)
#1  0x00007fda71dad862 abort (libc.so.6   0x26862)
#2  0x00007fda71e05d28 __libc_message (libc.so.6   0x7ed28)
#3  0x00007fda71e0d92a malloc_printerr (libc.so.6   0x8692a)
#4  0x00007fda71e1109c _int_malloc (libc.so.6   0x8a09c)
#5  0x00007fda71e12397 malloc (libc.so.6   0x8b397)
#6  0x00007fda71dfb564 _IO_file_doallocate (libc.so.6   0x74564)
#7  0x00007fda71e09db0 _IO_doallocbuf (libc.so.6   0x82db0)
#8  0x00007fda71e08cbc _IO_file_underflow@@GLIBC_2.2.5 (libc.so.6   0x81cbc)
#9  0x00007fda71e09e66 _IO_default_uflow (libc.so.6   0x82e66)
#10 0x00007fda71dfcf2c _IO_getline_info (libc.so.6   0x75f2c)
#11 0x00007fda71dfbe8a _IO_fgets (libc.so.6   0x74e8a)
#12 0x0000564c2b290484 getcmd (dwmblocks   0x1484)
#13 0x0000564c2b2906ab getsigcmds (dwmblocks   0x16ab)
#14 0x0000564c2b290b6f sighandler (dwmblocks   0x1b6f)
#15 0x00007fda71dc3da0 __restore_rt (libc.so.6   0x3cda0)
#16 0x00007fda71e112cc _int_malloc (libc.so.6   0x8a2cc)
#17 0x00007fda71e13175 __libc_calloc (libc.so.6   0x8c175)
#18 0x00007fda71f83d23 XOpenDisplay (libX11.so.6   0x30d23)
#19 0x0000564c2b290952 setroot (dwmblocks   0x1952)
#20 0x0000564c2b290b1a statusloop (dwmblocks   0x1b1a)
#21 0x0000564c2b290e28 main (dwmblocks   0x1e28)
#22 0x00007fda71daeb25 __libc_start_main (libc.so.6   0x27b25)
#23 0x0000564c2b29020e _start (dwmblocks   0x120e)

The last function of the program to run before the crash looks something like this:

void getcmd(const Block *block, char *output)
{
...
    char *cmd = block->command;
    FILE *cmdf = popen(cmd,"r");
    if (!cmdf){
        return;
    }
    char tmpstr[CMDLENGTH] = "";
    char *s;
    int e;
    do {
        errno = 0;
        s = fgets(tmpstr, CMDLENGTH-(strlen(delim)+1), cmdf);
        e = errno;
    } while (!s && e == EINTR);
    pclose(cmdf);
...
}

So it's just calling popen and trying to read the output with fgets.

From the backtrace it is apparent that SIGABRT is generated inside the fgets call. I have two questions:

  • How is it even possible? Shouldn't fgets return a string, or an error if anything went wrong, and let me deal with that error instead of bringing the whole program down?
  • What should I do to prevent that behavior?

UPDATE: Inspecting the strings in the core dump, I found that the error malloc_printerr was trying to report was malloc(): mismatching next->prev_size (unsorted). I don't know if it means anything...

UPDATE: It appears the problem is that getcmd is called from a signal handler, but popen and fgets are not async-signal-safe.

UPDATE: I've added setvbuf(cmdf, NULL, _IONBF, 0); after the popen call to make the stream unbuffered, so that fgets doesn't try to allocate a buffer with malloc, which will hopefully prevent the crash. Unfortunately, I can't reliably reproduce the crash, so I can't tell whether this hack helps.

CodePudding user response:

From the stack trace, I can see two calls to malloc with a signal handler between them. This is going to fail because malloc is (generally) not reentrant, so calling it from a signal handler is never a good idea. In general, you should not call ANY function that POSIX does not list as async-signal-safe from a signal handler, unless you can somehow guarantee that the signal will never be delivered while any other async-signal-unsafe function is running [1].

So the real question here is why your signal handler needs to call popen or fgets (both async-signal-unsafe), and what you can do about it. Which signal is being caught? Is it likely to be fatal anyway (SIGSEGV or SIGBUS), or is it an informational signal like SIGIO?

If it is a fatal signal, you should be looking into why it is occurring; the failure in the signal handler is secondary.

If it is a non-fatal signal, then you should move the async-signal-unsafe code out of the signal handler and have the handler either set some global variable that the main program will check, or arrange for another thread to do whatever work is needed.
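
A minimal sketch of the first option (set a flag, do the work in the main loop), assuming a POSIX system. The names pending_signal, record_signal, install_handler and take_pending_signal are made up for illustration and are not part of dwmblocks:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main loop. volatile sig_atomic_t is
   the type the C standard guarantees can be safely written from a handler. */
static volatile sig_atomic_t pending_signal = 0;

/* The handler only records which signal arrived: no popen, fgets or malloc. */
static void record_signal(int signum)
{
    pending_signal = signum;
}

/* Install the handler for signum. Returns 0 on success, -1 on failure. */
int install_handler(int signum)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart interrupted reads instead of failing with EINTR */
    return sigaction(signum, &sa, NULL);
}

/* Called from the main loop, outside any handler. Returns the recorded
   signal number (0 if none) and clears it. A single slot coalesces signals,
   and one that lands between the read and the clear is lost; a real program
   may want one flag per signal or to block signals around this function. */
int take_pending_signal(void)
{
    int signum = pending_signal;
    pending_signal = 0;
    return signum;
}
```

The main loop would then call take_pending_signal() on each iteration and, if it returns non-zero, run the popen/fgets work (what sighandler does via getsigcmds in the backtrace) from normal program context, where malloc is safe to call.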


[1] This is possible but quite hard: it generally requires wrapping signal-blocking calls (sigprocmask, or the older sigblock) around every call to an async-signal-unsafe function. However, if you only have a few of those in your main program, it may be practical.

CodePudding user response:

  • Your code is calling popen() to run some arbitrary Linux command.

  • The "arbitrary command" is calling XOpenDisplay() to display an X Windows GUI to the user.

  • The crash is occurring in malloc(), deep inside XOpenDisplay. Many other C library functions also use malloc() - including popen().

  • THEORY: You've corrupted memory, hence the "malloc()" failure.

  • LIKELY CANDIDATE: fgets(tmpstr, CMDLENGTH-(strlen(delim)+1), cmdf);

    <= You need to ensure that "n" (the second argument) is NEVER larger than sizeof(tmpstr); fgets stores at most n-1 characters plus the terminating null.

It certainly looks like you're trying to do that ("n" should always be less than CMDLENGTH)... but it's worth double-checking.
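
For the double-check, here is a small sketch of a bounded read that reserves room for a suffix, in the spirit of the CMDLENGTH-(strlen(delim)+1) expression. The helper name read_line_reserving and its parameters are made up for illustration. fgets stores at most n-1 characters followed by a terminating null, so any n up to the buffer size stays in bounds:

```c
#include <stdio.h>
#include <string.h>

/* Read one line from stream into buf, leaving `reserve` bytes of buf unused
   so that a suffix (such as a delimiter) can be appended afterwards.
   Returns buf on success, NULL on EOF, read error, or if nothing fits. */
char *read_line_reserving(FILE *stream, char *buf, size_t bufsize, size_t reserve)
{
    if (reserve >= bufsize)
        return NULL; /* no room left for even the terminating null */
    /* n = bufsize - reserve is at most bufsize, so fgets cannot overrun buf. */
    return fgets(buf, (int)(bufsize - reserve), stream);
}
```

If the arithmetic ever went wrong (for example, strlen(delim)+1 exceeding CMDLENGTH), the unsigned subtraction would wrap around to a huge value, which is why the helper checks reserve against bufsize before subtracting.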

SUGGESTION: try Valgrind
