|
Linux, alleggl, segmentation fault? |
razor
Member #2,256
April 2002
|
OK, reinstalled (I think installing the drivers a bunch of times just fried it). Here's the log. How do you get a stack dump? Whoooo Oregon State University |
bcoconni
Member #2,942
November 2002
|
razor: You have to run 'gdb excamera' then type 'run' to start the app debugging then type 'bt' to display the backtrace.
Clay: That's odd, my backtraces are a bit different from yours. Here's my log:
(gdb) run
Starting program: /home/internet/alleggl/examp/excamera
Program received signal SIG32, Real-time event 32.
0x40331479 in sigsuspend () from /lib/i686/libc.so.6
(gdb) bt
#0 0x40331479 in sigsuspend () from /lib/i686/libc.so.6
#1 0x402fae58 in pthread_create () from /lib/i686/libpthread.so.0
#2 0x402faaa1 in pthread_create () from /lib/i686/libpthread.so.0
#3 0x400a3d0c in _unix_get_executable_name () from /usr/local/lib/liballeg.so.4.0
#4 0x400a909e in stretch_sprite () from /usr/local/lib/liballeg.so.4.0
#5 0x4003acec in install_allegro () from /usr/local/lib/liballeg.so.4.0
#6 0x0805e4f5 in _mangled_main () at examp/excamera.c:391
#7 0x400a0aa1 in main () from /usr/local/lib/liballeg.so.4.0
#8 0x40320082 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x40331479 in sigsuspend () from /lib/i686/libc.so.6
(gdb) bt
#0 0x40331479 in sigsuspend () from /lib/i686/libc.so.6
#1 0x402fae58 in pthread_create () from /lib/i686/libpthread.so.0
#2 0x402faaa1 in pthread_create () from /lib/i686/libpthread.so.0
#3 0x400a2fb6 in seqbuf_dump () from /usr/local/lib/liballeg.so.4.0
#4 0x40099b25 in install_timer () from /usr/local/lib/liballeg.so.4.0
#5 0x40076961 in install_keyboard () from /usr/local/lib/liballeg.so.4.0
#6 0x0805e502 in _mangled_main () at examp/excamera.c:393
#7 0x400a0aa1 in main () from /usr/local/lib/liballeg.so.4.0
#8 0x40320082 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) cont
Continuing.
Notice that I have to 'continue' twice before excamera actually runs. AFAIK Allegro apps (including AllegroGL's) have always behaved this way with gdb on Linux. Also, I can't figure out why stretch_sprite() is called right after install_allegro() in my log. I'm using allegro-4.0.3 / alleggl-0.2.0 and Linux Mandrake 9.0 with some updates, and everything works fine here. |
Clay Smith
Member #4,320
February 2004
|
razor, to get a stack dump go into the X terminal/console and type 'gdb <program-name>'.
OK bcoconni, following your style I get a lot more seg faults, but it'll probably be more useful:
(gdb) run
Starting program: /home/clayasaurus/alleggl/examp/excamera
Program received signal SIG32, Real-time event 32.
0x4035c281 in sigpending () from /lib/libc.so.6
(gdb) bt
#0 0x4035c281 in sigpending () from /lib/libc.so.6
#1 0x4035c347 in sigsuspend () from /lib/libc.so.6
#2 0x402ea238 in pthread_getconcurrency () from /lib/libpthread.so.0
#3 0x402e9aa5 in pthread_create () from /lib/libpthread.so.0
#4 0x400ae5da in _unix_get_executable_name () from /usr/lib/liballeg.so.4.0
#5 0x400f0c40 in _seqbufptr () from /usr/lib/liballeg.so.4.0
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x4035c281 in sigpending () from /lib/libc.so.6
(gdb) bt
#0 0x4035c281 in sigpending () from /lib/libc.so.6
#1 0x4035c347 in sigsuspend () from /lib/libc.so.6
#2 0x402ea238 in pthread_getconcurrency () from /lib/libpthread.so.0
#3 0x402e9aa5 in pthread_create () from /lib/libpthread.so.0
#4 0x400adb20 in seqbuf_dump () from /usr/lib/liballeg.so.4.0
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x4035c281 in sigpending () from /lib/libc.so.6
(gdb) bt
#0 0x4035c281 in sigpending () from /lib/libc.so.6
#1 0x4035c347 in sigsuspend () from /lib/libc.so.6
#2 0x402ea238 in pthread_getconcurrency () from /lib/libpthread.so.0
#3 0x402e6d28 in pthread_cond_wait () from /lib/libpthread.so.0
#4 0x402499c6 in _XUnregisterFilter () from /usr/X11R6/lib/libX11.so.6
#5 0x40249bc5 in _XUnregisterFilter () from /usr/X11R6/lib/libX11.so.6
#6 0x4024a115 in XLockDisplay () from /usr/X11R6/lib/libX11.so.6
#7 0x400c08d9 in _xwin_change_keyboard_control () from /usr/lib/liballeg.so.4.0
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x080b86bf in aglXGetProcAddress ()
(gdb) bt
#0 0x080b86bf in aglXGetProcAddress ()
#1 0x0806970c in __allegro_gl_load_extensions (ext=0x8121f80) at gl_ext_api.h:97
#2 0x0807aec2 in __allegro_gl_manage_extensions () at src/glext.c:468
#3 0x080807fc in allegro_gl_x_create_screen (w=640, h=480, vw=0, vh=0, depth=8, fullscreen=0) at src/x.c:279
#4 0x080808c6 in allegro_gl_x_windowed_init (w=640, h=480, vw=0, vh=0, depth=8) at src/x.c:320
#5 0x0805fcdb in allegro_gl_default_gfx_init (w=640, h=480, vw=0, vh=0, depth=8) at src/alleggl.c:1020
#6 0x4007337a in set_gfx_mode () from /usr/lib/liballeg.so.4.0
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x40072ebe in set_gfx_mode () from /usr/lib/liballeg.so.4.0
(gdb) bt
#0 0x40072ebe in set_gfx_mode () from /usr/lib/liballeg.so.4.0
#1 0x080ad8a1 in _IO_stdin_used ()
(gdb) cont
Continuing.
Program received signal SIGABRT, Aborted.
0x4035c151 in kill () from /lib/libc.so.6
(gdb) bt
#0 0x4035c151 in kill () from /lib/libc.so.6
#1 0x402ea9a1 in pthread_kill () from /lib/libpthread.so.0
#2 0x402eacab in raise () from /lib/libpthread.so.0
#3 0x4035bd94 in raise () from /lib/libc.so.6
#4 0x4035d548 in abort () from /lib/libc.so.6
#5 0x400b3aa0 in stretch_sprite () from /usr/lib/liballeg.so.4.0
#6 0x0000000b in ?? ()
(gdb) cont
Continuing.
Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
(gdb) bt
No stack.
|
Bob
Free Market Evangelist
September 2000
|
Quote:
Program received signal SIGSEGV, Segmentation fault. This is odd - it's crashing inside aglXGetProcAddress() in the local program memory area, as if part of the driver got statically linked. This doesn't seem right. -- |
Clay Smith
Member #4,320
February 2004
|
Any suggestions? I think I'm going to reinstall Slackware 9 sometime soon, not sure if that would help or not. I wonder how a driver can become part of the program? I didn't think it was something wrong with my drivers or card because other GL apps can run, but now it seems maybe it is. Oh well. |
razor
Member #2,256
April 2002
|
I might try an earlier version of Mandrake (or heck, another distro) but I would really rather not. Is anyone running Mandrake 9.2 w/AGL? Whoooo Oregon State University |
Kitty Cat
Member #2,815
October 2002
|
Not sure if this has much bearing, but AllegroGL crashes for me too, in Linux (some programs quit with an Aborted message, others with a Segment Violation). Gentoo 1.4, kernel version 2.4.20, AllegroGL version 0.2.0, Allegro version 4.1.12. Compiled with DEBUGMODE=1 and LOGLEVEL=2. I ran extext, and this is the backtrace:
#0 0x400869e4 in allegro_gl_mouse () from /usr/local/lib/libagld.so
#1 0x4005298d in __allegro_gl_load_extensions (ext=0x8073a29) at gl_ext_api.h:97
#2 0x4006425f in __allegro_gl_manage_extensions () at src/glext.c:464
#3 0x4006a7e7 in allegro_gl_x_create_screen (w=640, h=480, vw=0, vh=0, depth=32, fullscreen=0) at src/x.c:279
#4 0x4006a8c3 in allegro_gl_x_windowed_init (w=640, h=480, vw=0, vh=0, depth=32) at src/x.c:320
#5 0x40046e96 in allegro_gl_default_gfx_init (w=640, h=480, vw=0, vh=0, depth=32) at src/alleggl.c:1020
And I attached allegro.log. Seems to be related to their problem, since the last line in it is:
glXGetProcAddress Extension: Supported
though I don't have an nVidia card like they do. -- |
Bob
Free Market Evangelist
September 2000
|
Kitty: that stack trace doesn't make sense - __allegro_gl_load_extensions() certainly doesn't call allegro_gl_mouse() :/ As for the cause of this problem - I have no idea. A possibility is stack corruption. Any way of using gdb to detect that? -- |
Kitty Cat
Member #2,815
October 2002
|
I don't really know how to use gdb, other than backtracing, signalling, and such. But maybe something's getting clobbered in __allegro_gl_load_extensions()/glXGetProcAddress() or something? I should mention when I run extext through gdb, it stops at that backtrace with an Illegal Instruction signal. If I continue it, then it SIGSEGVs in set_gfx_mode, with no backtrace beyond that.
Starting program: /mnt/d/programs/alleggl/examp/extext
[New Thread 16384 (LWP 16773)]
[New Thread 32769 (LWP 16774)]
[New Thread 16386 (LWP 16775)]
[New Thread 32771 (LWP 16776)]
Program received signal SIGILL, Illegal instruction.
[Switching to Thread 16384 (LWP 16773)]
0x400869e4 in allegro_gl_mouse () from /usr/local/lib/libagld.so
(gdb) continue
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x400cd2dc in set_gfx_mode () from /usr/local/lib/liballeg.so.4.1
(gdb) bt
#0 0x400cd2dc in set_gfx_mode () from /usr/local/lib/liballeg.so.4.1
#1 0x00000001 in ?? ()
The window does get created, but it crashes immediately after. -- |
bcoconni
Member #2,942
November 2002
|
Since the stack traces are strange, may I suggest using make DEBUGMODE=1 LOGLEVEL=2 DEBUGALLEG=1 in order to link the example programs with both the AllegroGL and Allegro debug libraries? This may give better results under gdb. You should also look at the glibc version on your distribution (mine is 2.2.5), since signal handling seems to differ between our Linux distributions. |
Kitty Cat
Member #2,815
October 2002
|
Bob and I determined, at least for my problem, that the cause is an invalid address returned by dlsym(), which points into the allegro_gl_mouse struct (and thus tries to run it as code when executed). Still clueless as to why and everything, though. And I'm also not sure how to tell what my glibc version is.. :/ -- |
Thomas Fjellstrom
Member #476
June 2000
|
Quote: Still clueless as to why and everything, though. And I'm also not sure how to tell what my glibc version is.. :/ If GDB's backtrace is invalid, it means that somewhere you overwrote something you shouldn't have. In my experience, anyhow. Though what it gave you was more confusing than the normal bad backtrace full of "??" functions -- |
Kitty Cat
Member #2,815
October 2002
|
The backtrace is right. dlsym is supposed to return the address for the glXGetProcAddressARB function, but instead returns some other place in memory, so when the "function" is executed, it jumps into the allegro_gl_mouse structure and tries to run it as code before finally dying with SIGILL. -- |
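For anyone who wants to check this kind of mis-resolution without stepping through gdb, here is a minimal sketch, assuming glibc on Linux (dladdr() is a GNU extension). The helper name report_symbol_owner is made up for illustration and is not part of AllegroGL. It looks a symbol up without a library handle and prints which shared object the returned address actually lives in:

```c
#define _GNU_SOURCE   /* dladdr() is a GNU extension; must precede all includes */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not AllegroGL code): resolve `name` through the
 * global symbol scope, then ask the loader which object owns the address.
 * Returns 1 if an owner was found and printed, 0 otherwise. */
static int report_symbol_owner(const char *name)
{
    Dl_info info;
    void *addr = dlsym(RTLD_DEFAULT, name);

    if (addr == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "%s: not found (%s)\n", name, err ? err : "no details");
        return 0;
    }
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL) {
        fprintf(stderr, "%s: %p is not inside any loaded object\n", name, addr);
        return 0;
    }
    /* dli_fname is the path of the object containing addr;
     * dli_sname is the nearest exported symbol at or below addr, if any. */
    printf("%s -> %p in %s (nearest symbol: %s)\n", name, addr,
           info.dli_fname, info.dli_sname ? info.dli_sname : "?");
    return 1;
}
```

Calling report_symbol_owner("glXGetProcAddressARB") from a program linked against both libraries should, under these assumptions, show whether the address comes from libGL or from the AGL library. Older glibc versions need -ldl at link time.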
bcoconni
Member #2,942
November 2002
|
Your backtraces show that you use the debug shared library of AGL and:
Quote: dlsym is supposed to return the address for the glXGetProcAddressARB function, but instead returns some other place in memory, so when the "function" is executed
Aaargghhh!!! I forgot to build position-independent code (PIC) for the debug shared library. Can you edit make/makefile.unx and find these lines:
    #-------------------------------#
    # --- Compiler optimizations ---#
    ifdef DEBUGMODE
    CFLAGS = -g -W -Wall -Wno-unused
    CFLAGS += -DDEBUGMODE=$(DEBUGMODE)
    ifdef LOGLEVEL
    CFLAGS += -DLOGLEVEL=$(LOGLEVEL)
    endif
    else
    CFLAGS = -O2 -Wall -ffast-math -fomit-frame-pointer
    SHARED = @SHARED@
    endif
and move the 'SHARED' parameter outside the ifdef/endif pair:
    #-------------------------------#
    # --- Compiler optimizations ---#
    ifdef DEBUGMODE
    CFLAGS = -g -W -Wall -Wno-unused
    CFLAGS += -DDEBUGMODE=$(DEBUGMODE)
    ifdef LOGLEVEL
    CFLAGS += -DLOGLEVEL=$(LOGLEVEL)
    endif
    else
    CFLAGS = -O2 -Wall -ffast-math -fomit-frame-pointer
    endif
    SHARED = @SHARED@
Of course, you have to rebuild the lib:
    make DEBUGMODE=1 veryclean
    ./configure --enable-shared
    make DEBUGMODE=1
    make install
Does it fix your issue? |
Bob
Free Market Evangelist
September 2000
|
So, does AGL always need to be built with ./configure --enable-shared? What does --enable-shared do? -- |
Kitty Cat
Member #2,815
October 2002
|
--enable-shared makes the makefile create the shared/.so lib instead of the default staticlink one. And it may be a bit before I can really try that. Even though AllegroGL is on a different drive, configure still wants to put stuff on the main partition, which somehow is all filled up (4GB!). Though if I can safely delete /var/tmp (500+MB), I should be a-ok. If not, I'm working on copying my root partition to a 13GB partition, but it's gonna take a while.. EDIT: -- |
bcoconni
Member #2,942
November 2002
|
Bob said: So, does AGL always need to be built with ./configure --enable-shared? What does --enable-shared do?
Under *nix, AGL can optionally be built as a shared lib (the default build is static, however). Kitty Cat said: Okay, got it compiled and all. Still crashes in allegro_gl_mouse when trying to use glXGetProcAddressARB.
So dlsym() can't resolve glXGetProcAddressARB ? Then let's try to help it to find the symbol.
Add a new variable 'handle' and a call to dlopen(), and don't forget to modify dlsym()'s first parameter (lines 473 and 476):
This way, we explicitly tell dlsym() to look for "glXGetProcAddressARB" in "libGL.so" so that it can't get mixed up with AGL's symbols. |
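The patch itself did not survive in this copy of the thread, so what follows is only a rough sketch of the idea described above, not AGL's actual src/glext.c code. The helper name lookup_in_library and the generic_fn typedef are made up for illustration:

```c
#include <dlfcn.h>
#include <stdio.h>

typedef void (*generic_fn)(void);

/* Hypothetical helper (not AllegroGL code): open one specific shared
 * library and look the symbol up through that handle, so the search is
 * limited to that library and its dependencies instead of the global
 * symbol scope. Returns NULL on any failure. */
static generic_fn lookup_in_library(const char *soname, const char *symbol)
{
    generic_fn fn = NULL;
    void *addr;
    void *handle = dlopen(soname, RTLD_LAZY);

    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", soname, err ? err : "?");
        return NULL;
    }

    dlerror();                      /* clear any stale error state */
    addr = dlsym(handle, symbol);   /* first parameter is the handle */
    if (addr == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s) failed: %s\n", symbol, err ? err : "?");
        dlclose(handle);
        return NULL;
    }

    /* POSIX-recommended way to turn dlsym()'s void* into a function pointer */
    *(void **)(&fn) = addr;

    /* The handle is deliberately left open so the pointer stays valid. */
    return fn;
}
```

With this, the lookup discussed here would be something like lookup_in_library("libGL.so", "glXGetProcAddressARB"). On some systems the library may only be installed under a versioned name such as libGL.so.1, and older glibc versions need -ldl at link time.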
Kitty Cat
Member #2,815
October 2002
|
Okay, so adding in the handle explicitly works. Though I don't understand why it couldn't find glXGetProcAddressARB (or rather, not return the proper address to it) without the handle, but glXGetProcAddress worked just fine.. -- |
bcoconni
Member #2,942
November 2002
|
Kitty Cat said: Though I don't understand why it couldn't find glXGetProcAddressARB (or rather, not return the proper address to it) without the handle, but glXGetProcAddress worked just fine
All this mess is nothing but a name clash. Some symbols are defined in both AGL and libGL.so, and ld's resolution rules are rather opaque. |
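One common way to sidestep this kind of clash is for the wrapper library to export its function pointers under its own prefix and let a macro map the familiar name onto them, so that only the prefixed name ever reaches the linker. A minimal sketch with made-up names follows. glDemoProc, myglDemoProc and driver_double are purely illustrative and exist in neither AGL nor OpenGL:

```c
/* Signature of the (hypothetical) entry point being wrapped. */
typedef int (*demo_proc)(int);

/* Stand-in for the driver's real function. In a real library this address
 * would be obtained at run time instead of being defined locally. */
static int driver_double(int x)
{
    return 2 * x;
}

/* The wrapper library exports its pointer under its OWN prefix, so the
 * symbol table never contains a second definition of the driver's name. */
demo_proc myglDemoProc = 0;

/* User code keeps writing the familiar name; the preprocessor rewrites it
 * to the prefixed pointer before the linker ever sees it. */
#define glDemoProc myglDemoProc

/* Fill in the prefixed pointer (normally done when extensions are loaded). */
static void load_demo_procs(void)
{
    myglDemoProc = driver_double;
}
```

After load_demo_procs() has run, a call written as glDemoProc(21) goes through myglDemoProc, and the run-time linker has nothing ambiguous left to choose between.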
Kitty Cat
Member #2,815
October 2002
|
Ok.. so dlsym needs to be passed a handle to the shared lib for OpenGL to properly find the function. This brings to mind a few questions. First, how come it only started happening recently? Some people don't seem to have the problem.. and Second, when should the handle be closed? At the end of the program, or after you no longer use the handle pointer? Should we expect a new release coming soon, or should I just leave the source hacked and use it as is for now? -- |
razor
Member #2,256
April 2002
|
The fix above didn't work for me:
Program received signal SIG32, Real-time event 32.
0x40327714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
(gdb) bt
#0 0x40327714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
#1 0x403272b8 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
#2 0x40326a61 in pthread_create () from /lib/i686/libpthread.so.0
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x40327714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
(gdb) bt
#0 0x40327714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
#1 0x403272b8 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
#2 0x40326a61 in pthread_create () from /lib/i686/libpthread.so.0
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x4005cc78 in glVertexAttrib2fvARB () from /usr/local/lib/libagl.so
(gdb) bt
#0 0x4005cc78 in glVertexAttrib2fvARB () from /usr/local/lib/libagl.so
#1 0x406fd1d1 in _nv000044gl () from /usr/lib/libGLcore.so.1
Cannot access memory at address 0x1
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x40327714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
(gdb) bt
#0 0x40327714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
#1 0x403272b8 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
#2 0x4032453f in pthread_join () from /lib/i686/libpthread.so.0
#3 0x400d4fea in seqbuf_dump () from /usr/local/lib/liballeg.so.4.0
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x40327714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
(gdb) bt
#0 0x40327714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
#1 0x403272b8 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
#2 0x4032453f in pthread_join () from /lib/i686/libpthread.so.0
#3 0x400d5abc in _unix_get_executable_name () from /usr/local/lib/liballeg.so.4.0
(gdb) cont
Continuing.
Shutting down Allegro due to signal #11
Program received signal SIGSEGV, Segmentation fault.
0x4005cc78 in glVertexAttrib2fvARB () from /usr/local/lib/libagl.so
(gdb) bt
#0 0x4005cc78 in glVertexAttrib2fvARB () from /usr/local/lib/libagl.so
#1 0x406fd1d1 in _nv000044gl () from /usr/lib/libGLcore.so.1
Cannot access memory at address 0x1
(gdb) cont
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
There's my backtrace. I compiled using --enable-shared too. Whoooo Oregon State University |
Kitty Cat
Member #2,815
October 2002
|
You should try typing "bt" (without quotes) when it hits a SIGSEGV, to retrieve a longer and more verbose backtrace. I've never seen a SIG32/Real-time event signal before though. I wonder why you're getting them, or even if they're supposed to be fatal.. it doesn't seem it.. -- |
razor
Member #2,256
April 2002
|
I did type bt, or do you mean type it again? Whoooo Oregon State University |
bcoconni
Member #2,942
November 2002
|
Kitty Cat said: First, how come it only started happening recently? Some people don't seem to have the problem.
The extension mechanism has been completely revamped in AGL 0.2.0, but in my case the problem arose when I installed the latest NVidia drivers. At first, I thought that the drivers were the culprit since the crash did not occur with the previous releases of the drivers. After some investigation, it appeared that AGL was the actual culprit. Obviously NVidia has changed the way they build the drivers, but how?
AFAIK in Linux, one symbol should not be defined in two different libraries that are both linked into an executable: in this case, the run-time linker may screw up the program because it may fail to choose the right symbol between the 2 libs. Moreover, in our case, the symbol does not have the same type in the two libs: for instance, glXGetProcAddressARB can be resolved as the address of a function pointer in libagl or as the address of an actual function in libGL.so. The behavior of the program obviously depends on the "choice" the run-time linker makes. I could not find any good documentation on how the run-time linker resolves symbols in Linux and I'm too lazy to read the sources. Maybe some Linux guru can help us here?
This issue has already been encountered in the GLEW project, and they fixed it by using a namespace different from libGL's (glewXGetProcAddressARB instead of glXGetProcAddressARB) and aliasing it to the real names (#define glXGetProcAddressARB glewXGetProcAddressARB).
Quote: Second, when should the handle be closed? At the end of the program, or after you no longer use the handle pointer?
In our case, it doesn't matter: since the loader keeps a reference count for the lib (which dlclose() only decrements) and since our executable is linked to libGL.so, we are assured that the number of references won't drop below one. This means that "our" dlclose() won't actually close the lib and that the function pointer "aglXGetProcAddress" will still be valid even after dlclose() is called.
Quote: Should we expect a new release coming soon, or should I just leave the source hacked and use it as is for now?
Since this is a major bug in AGL 0.2.0, I would vote for a new release ASAP. Bob, any objections?
Razor: 0x4005cc78 in glVertexAttrib2fvARB () from /usr/local/lib/libagl.so
The run-time linker has screwed up your program, since the symbol "glVertexAttrib2fvARB" has been wrongly resolved in libagl.so instead of libGL.so. This is definitely the same bug as the one exposed above. |
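The reference-counting argument can be tried out in isolation. Here is a small sketch that uses libc as a stand-in for libGL.so, assuming a dynamically linked glibc program (so libc.so.6 is already loaded by the executable itself). The helper name call_after_dlclose is made up for illustration:

```c
#include <dlfcn.h>
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical helper: fetch strlen() through a handle, drop the handle,
 * and only then call the function. Returns the computed length, or -1 if
 * the library or the symbol could not be found. */
static long call_after_dlclose(const char *soname, const char *text)
{
    strlen_fn fn = NULL;
    void *handle = dlopen(soname, RTLD_LAZY);   /* reference count + 1 */

    if (handle == NULL)
        return -1;

    *(void **)(&fn) = dlsym(handle, "strlen");
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    /* Drops only OUR reference. The executable's own link-time dependency
     * on the library keeps it mapped, so fn remains callable. */
    dlclose(handle);

    return (long)fn(text);
}
```

This only holds because the program already depends on the library. For a library that was loaded solely through dlopen(), the final dlclose() may unmap it, and any pointers obtained from it must not be used afterwards.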
Bob
Free Market Evangelist
September 2000
|
Quote: This is definitely the same bug as the one exposed above. However, razor claims to have applied your fix, to no avail. Quote: Bob any objections ? As soon as we resolve this issue with razor, I'm good for a new release. Something I'd like though is to add the new extensions which aren't in AGL. -- |
|
|