IdeaMonk

thoughts, ideas, code and other things...

Tuesday, September 22, 2009

Some more GDB fun.

Last weekend I was there for a 3 hour battle of brains for ICPC-Amritapuri prelims. Though our team did not qualify this time, and only one team would be going onsite, I had a really good time debugging with gdb and helping out others debug fast. The problem looked something like this -
  • There is F(n) = a*F(n-1) + b where 1 <= a,b,n <= 1000000000
  • F(0) = 1
  • Find F(n)
Now one would simply begin with a code similar to following -
#include <iostream>
using namespace std;

#define ull unsigned long long

ull a,b;

ull f( ull n){
if (n == 0)
return 1;
else
return a*f(n-1) + b;
}

int main (){
ull n;
cin >> a >> b >> n;
cout << f(n) << endl;
return 0;
}

Having put unsigned long long one would think it will be able to contain numbers as large as 10^9, but there is one thing that everyone missed, what range would F(n-1) fall in? Here we're trying to multiply 'a' (which is as big as 10^9) with F(n-1) (whose range we don't know).

$ g++ -g fn.cpp
$ ./a.out
2 3 4
61
$ ./a.out
10000 100203 2323
11085757089601029659
$ ./a.out
100000000 100000000 100000000
Segmentation fault

There we go, larger values within given limits kill our program. A genius would realize that this would happen even before writing this implementation, a smart one would intuitively know what happened, but what about the average joe? That's where gdb comes to give you inside story of what went wrong, so that instead of spending even 5 minutes figuring out what happened, you actually see and say 'oh, so this is what broke my code'. Here's how -

ideamonk@sacea:~$ gdb ./a.out
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) run
Starting program: /home/ideamonk/a.out
1000000000 100000000 100000000

Program received signal SIGSEGV, Segmentation fault.
0x08048738 in f
(n=99825467) at fn.cpp:12
12 return a*f(n-1) + b;
(gdb)

One thing you can easily notice is that we entered a very large value for 'a' and at each step 'a' is getting multiplied to f(n-1), added with 'b', and multiplied again with 'a' in next recursion. Even though unsigned long long int has huge limit of +18,446,744,073,709,551,615 for 32bit processors, successive multiplications of numbers of range 10^9 is enough to quickly eat out this limit. Hence the segmentation fault... Nopes as pointed out by Prasanna though the compiler marks it as Segmentation Fault, its a Stack overflow actually (read comments for details). Well that's about the scene at ICPC, many people got interested in what gdb is and how did I do that, it's common, everyone has been annoyed by the seg faults in their 1st year c/c++ course.

The other day
tilda - ( a Quake/Doom like terminal for linux ) which I extensively use, stopped working. It happened right after I restarted my Ubuntu after update. Wonder why, Alt+F2 [tilda]... nothing came up. I headed onto terminal, and then ran tilda from there - bingo - a nice little 'Segmentation Fault' came up. When you're aware of scissors like gdb that help you make precise cut, even with a blind fold, who would be scared of entering the foreign territory of someone else's code?
So there started the bughunt....

apt-get source tilda .... OK
./configure
..
...
.....
hmm I dont have vte something... okay lets get it....

apt-get source vte ... OK
cd vte-0.20.0
./configure
..
..
....
.....
Nopes.. what's this tgetent thing... Google... okay so I need to get libncurses-dev before I compile vte.
Thank god they packaged libncurses-dev :)
sudo apt-get install libncurses-dev
Then I make and install vte, and finally make tilda with debugging flag on as -

$ ./configure CFLAGS="-g"
$ make
// lets debug
$ gdb ./src/tilda
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) run
Starting program: /home/ideamonk/Desktop/tilda-0.9.6/src/tilda
[Thread debugging using libthread_db enabled]
[New Thread 0xb7132710 (LWP 1549)]
tilda: No such file or directory
Unable to set tilda's icon: /usr/local/share/pixmaps/tilda.png
[New Thread 0xb6f0eb90 (LWP 1557)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7132710 (LWP 1549)]
0x08051365 in tilda_keygrabber_bind (keystr=0x0, tw=0x9855bd8) at key_grabber.c:260
260 if (strcmp ("", keystr) == 0)

Clearly caught! and now we know that we need to fix right here - key_grabber.c:260
keystr happens to be pointing to 0x00 in this case, I think my tilda's configuration was erased by some action, which now makes strcmp do a segmentation fault.
A check for null pointers before strcmp would be good.
So, I attach a check for null pointer before performing a strcmp and make the function return FALSE in case its null. I compile again, and tilda sweetly informs me that I haven't set a keybinding for terminal popup, and nicely shows me the preference box to set one. So now I know what happened, the keybinding... wherever it was stored, got erased. Did I have to look at complete source of tilda? No way :) Didn't I tell you that its like a scissor which is precise even if you use it with a blindfold over your eyes. Here's the tiny patch that would save you from such tilda crashes.

Labels: , ,

4 Comments:

At September 22, 2009 at 2:56 PM , Blogger Prasanna said...

A seg fault happens when you access a memory not on your bounds. I don't see anything wrong with the code. I think, it's a stack overflow. It has nothing to do with 64-bit ints.

Btw, this is a case of tail recursion. If I remember right, gcc will make it a loop. Try -O2 compile flag.

 
At September 22, 2009 at 6:16 PM , Blogger IdeaMonk said...

Yes, I tested the following code right now -
--------------------
#include <iostream>
using namespace std;

unsigned long long foo( unsigned long long bar){
if (bar == 0)
return 1;
else
return foo(bar-1) + 1;
}

main (){
cout << foo(1000000000);
}
-----------------------

Without the -O2 flag, it ends into a segmentation fault, and there seems no sign on big number multiplications in this case. It must be a stack overflow then, while with -O2 flag, I'm getting a fine output.

Is there a way debuggers can actually differentiate between a stack overflow an a seg fault?

While the tilda example is the real seg fault I think, as it tries to access 0x00.

Thanks :)

 
At September 22, 2009 at 11:47 PM , Blogger Prasanna said...

seg fault is the failure to map a logical address to a physical one. Yes, you can emit checks for stack overflow: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gnat_ugn_unw/Stack-Overflow-Checking.html#Stack-Overflow-Checking

 
At September 28, 2009 at 2:26 PM , Blogger IdeaMonk said...

Thanks!

 

Post a Comment

Subscribe to Post Comments [Atom]

<< Home