Title: Segfault on hardened Linux systems Post by: BioMike on December 03, 2010, 09:12:31 PM
Cross-posting this into its own topic; maybe more people know about this type of thing but missed it in the Gentoo ebuild topic. Building bitcoind on hardened Linux seems to cause a segfault when setting the number of cores to use for generating bitcoins. Starting with -gen=0 doesn't trigger the segfault.
-----
OK, I've changed one of my systems to a hardened system and was able to reproduce the bug on version 0.3.17. Here is the backtrace. No idea what is going wrong.
Code: gdb bitcoind
The misbehaving line is:
Code: if (!fPrinted)
in the following code:
Code: // AMD reports a lower model number in 64-bit mode
fPrinted is false according to the backtrace. I don't know why this would cause the segfault.
Title: Re: Segfault on hardened Linux systems Post by: wumpus on December 04, 2010, 09:17:13 AM
fPrinted is uninitialized. Give it a value before testing it!
Title: Re: Segfault on hardened Linux systems Post by: jgarzik on December 04, 2010, 09:22:13 AM
Not sure about C++, but in C, "static bool foo;" is guaranteed to initialize 'foo' to false.
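For reference, the same guarantee holds in C++: an object with static storage duration is zero-initialized before anything else happens to it, so a function-local "static bool" with no initializer starts out false. The sketch below is only an illustration of that rule and of the print-once pattern being discussed; the name ReportCpuOnce and the counter are hypothetical and are not taken from the bitcoin source.

```cpp
#include <cstdio>

// Hypothetical stand-in for a function that logs a CPU info line
// only on its first call. Returns how many times the line was printed.
int ReportCpuOnce()
{
    static bool fPrinted;  // static storage duration: zero-initialized, i.e. false
    static int nPrints;    // likewise starts at 0

    if (!fPrinted)
    {
        fPrinted = true;
        ++nPrints;
        std::printf("CPU info line (printed on first call only)\n");
    }
    return nPrints;
}
```

Calling ReportCpuOnce() repeatedly prints the line once and keeps returning 1, which suggests that testing fPrinted is not in itself a read of an indeterminate value.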
Title: Re: Segfault on hardened Linux systems Post by: davout on December 04, 2010, 10:02:22 AM
I think it's randomly initialized.
http://stackoverflow.com/questions/827393/default-value-for-bool-in-c
They're not talking about the static case, though. Nonetheless, this code looks really, really weird; it is certainly missing lines or context.
// Okay let's initialize a variable to a fixed or random value
Reminds me of facepalming hard when reading stuff like
if (v == true) {
Title: Re: Segfault on hardened Linux systems Post by: bencoder on December 04, 2010, 11:25:50 AM
Quote (davout): Nonetheless this code looks really really weird certainly missing lines or context. // Okay let's initialize a variable to a fixed or random value
I don't think it's so weird; it's just ensuring that it only prints that CPU info line on the first call of the function.
Title: Re: Segfault on hardened Linux systems Post by: davout on December 04, 2010, 11:32:46 AM
Quote (bencoder): I don't think it's so weird, it's just ensuring it will only print that cpu info line for the first call of the function.
lol, my bad
Title: Re: Segfault on hardened Linux systems Post by: wumpus on December 05, 2010, 06:41:28 PM
Quote (jgarzik): Not sure about C++, but in C, "static bool foo;" is guaranteed to initialize 'foo' to false.
You are right. 'static' moves the variable from the stack to the BSS segment, which is always initialised with zeroes. I didn't notice.
Title: Re: Segfault on hardened Linux systems Post by: Hal on December 06, 2010, 12:30:12 AM
Traces on optimized code are unreliable. Can you build without optimization?
Title: Re: Segfault on hardened Linux systems Post by: BioMike on December 06, 2010, 06:27:07 AM
Quote (Hal): Traces on optimized code are unreliable. Can you build without optimization?
Sure, although there is little optimization in place, and what there is is fairly standard (-O2, -march=pentium4, -pipe, -ggdb). Is "-O1, -ggdb" sufficiently unoptimized?
Title: Re: Segfault on hardened Linux systems Post by: BioMike on January 09, 2011, 05:41:30 PM
Took some time. Removed all CFLAGS except -ggdb. Same problem; it does, however, give an extra line in the backtrace (at position 0): memcpy@plt
No idea what to do with that.
Title: Re: Segfault on hardened Linux systems Post by: Gavin Andresen on January 09, 2011, 05:57:43 PM
The CPU miner code has all sorts of now-mostly-worthless (because GPU mining is so much more energy-efficient than CPU mining) optimizations. Maybe hardened Linux doesn't like the assembly code or SSE instructions?
Title: Re: Segfault on hardened Linux systems Post by: BioMike on January 09, 2011, 07:04:16 PM
Doesn't seem to be SSE related (building without SSE support also causes it to segfault). There is no assembly in that function as far as I can see (there might be some in one of the boost calls, though).
I also noticed that the backtrace points to a different line than the one I saw earlier (still in the same function, only a few lines above, pointing to a memcpy), although nothing changed in that code. This is where it points now:
Code: memcpy(&cpu, &a, sizeof(cpu));
<edit> This seems to be caused by the changed optimization. </edit>
Title: Re: Segfault on hardened Linux systems Post by: Hal on January 10, 2011, 06:52:42 PM
Maybe try putting
return false;
as the first line of Detect128BitSSE2(), and see if that fixes it. If so, and you're really energetic, move the return down to just before, then just after the memcpy, to confirm that's where the problem is.
Title: Re: Segfault on hardened Linux systems Post by: ArtForz on January 11, 2011, 05:08:02 PM
Hrrrm, could this be caused by the asm in CallCPUID? CPUID clobbers ebx and edx, yet we don't seem to be saying so...
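To illustrate ArtForz's point: CPUID writes eax, ebx, ecx and edx, and GCC-style inline asm has to tell the compiler about every one of them, either as outputs or as clobbers. The body of the thread's CallCPUID is not shown here, so the following is only a sketch under assumptions; the names QueryCpuid and CpuidRegs are hypothetical. It also assumes the usual complication on 32-bit position-independent builds (the default on hardened toolchains), where ebx is reserved as the GOT pointer and so is swapped through a scratch register rather than named in a constraint.

```cpp
#include <cstdint>

struct CpuidRegs { uint32_t eax, ebx, ecx, edx; };

// Hypothetical helper: runs CPUID for the given leaf (sub-leaf 0).
// Returns false, with zeroed registers, on non-x86 or non-GCC-style compilers.
bool QueryCpuid(uint32_t leaf, CpuidRegs& out)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#if defined(__i386__) && defined(__PIC__)
    // 32-bit PIC/PIE: ebx may hold the GOT pointer. Park it in a scratch
    // register around the instruction and hand the result back from there.
    __asm__ __volatile__(
        "xchgl %%ebx, %1\n\t"
        "cpuid\n\t"
        "xchgl %%ebx, %1"
        : "=a"(out.eax), "=&r"(out.ebx), "=c"(out.ecx), "=d"(out.edx)
        : "0"(leaf), "2"(0u));
#else
    // All four registers are declared as outputs, so the compiler knows
    // that none of them survives the instruction.
    __asm__ __volatile__(
        "cpuid"
        : "=a"(out.eax), "=b"(out.ebx), "=c"(out.ecx), "=d"(out.edx)
        : "0"(leaf), "2"(0u));
#endif
    return true;
#else
    out = CpuidRegs();
    return false;
#endif
}
```

On an x86 machine, QueryCpuid(0, regs) should leave the highest supported basic leaf in regs.eax and the vendor string spread over ebx, edx and ecx. On GCC and Clang, the compiler-provided <cpuid.h> header offers __get_cpuid for the same job and may be the simpler choice.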
Title: Re: Segfault on hardened Linux systems Post by: BioMike on January 11, 2011, 08:03:04 PM
@Hal, I'll give that a try this weekend.
@ArtForz, it falls over there when there is no optimization in place (gcc default); with optimizations in place it falls over in the other part.
Title: Re: Segfault on hardened Linux systems Post by: Gavin Andresen on January 22, 2011, 09:08:45 PM
BioMike: any progress tracking this down?
I just committed a fix to the git integration tree CallCPUID code to declare ebx/edx clobbered...
Title: Re: Segfault on hardened Linux systems Post by: BioMike on January 23, 2011, 06:26:18 PM
Quote (Gavin Andresen): BioMike: any progress tracking this down? I just committed a fix to the git integration tree CallCPUID code to declare ebx/edx clobbered...
I haven't spent any more time on this. I'll do a check this evening.
Title: Re: Segfault on hardened Linux systems Post by: BioMike on January 23, 2011, 08:36:23 PM
Code: g++ -c -O2 -march=pentium4 -pipe -ggdb -Wno-invalid-offsetof -Wformat -DFOURWAYSSE2 -DUSE_SSL -I"/usr/include" -I"/usr/include/crypto++" -I"/usr/include/db4.8" -o obj/nogui/main.o main.cpp
Doesn't seem OK.
Code: void CallCPUID(int in, int& aret, int& cret)
Title: Re: Segfault on hardened Linux systems Post by: Gavin Andresen on January 24, 2011, 03:25:29 PM
What version of gcc are you using? After a little googling I found this thread (http://lists.xensource.com/archives/html/xen-devel/2005-02/msg01045.html) about the same issue:
Quote: CPUID returns information in eax, ebx, ecx, and edx. With -fPIC you have to push ebx onto the stack before calling cpuid and pop it afterward as Bin points out is what the patch to xen-unstable does. The compiler used to generate the push/pop just fine for gcc-3.3. This is an issue specific to gcc-3.4.
Unless somebody volunteers to fix/maintain this, I'm inclined to simply remove all of the "try to make the CPU miner go faster" optimizations from bitcoin. CPU mining is, for most people, a waste of electricity.
Title: Re: Segfault on hardened Linux systems Post by: BioMike on January 24, 2011, 07:13:29 PM
Quote (Gavin Andresen): What version of gcc are you using?
gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.4.4-r2/work/gcc-4.4.4/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.4.4 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.4.4/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.4 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.4/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.4/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.4.4/include/g++-v4 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --disable-fixed-point --without-ppl --without-cloog --disable-nls --with-system-zlib --disable-werror --enable-secureplt --disable-multilib --enable-libmudflap --disable-libssp --enable-esp --enable-libgomp --with-python-dir=/share/gcc-data/i686-pc-linux-gnu/4.4.4/python --enable-checking=release --disable-libgcj --with-arch=i686 --enable-languages=c,c++ --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo Hardened 4.4.4-r2 p1.2, pie-0.4.5'
Thread model: posix
gcc version 4.4.4 (Gentoo Hardened 4.4.4-r2 p1.2, pie-0.4.5)
Quote (Gavin Andresen): Unless somebody volunteers to fix/maintain this, I'm inclined to simply remove all of the "try to make the CPU miner go faster" optimizations from bitcoin. CPU mining is, for most people, a waste of electricity.
Fine with me.
Title: Re: Segfault on hardened Linux systems Post by: ribuck on January 24, 2011, 08:12:57 PM
Quote (Gavin Andresen): Unless somebody volunteers to fix/maintain this, I'm inclined to simply remove all of the "try to make the CPU miner go faster" optimizations from bitcoin.
How about leaving the optimizations in there, unless/until they cause some problem? If an optimization causes a maintenance problem, then it can be removed.
If there's hostile action against bitcoin, it might be valuable to muster every last CPU cycle by encouraging everyone to turn on generation. When the standard client was patched to fix the overflow bug, the "valid" block chain overtook the "sabotaged" one in less than a day. One of the reasons for that was the success of the pleading (in this forum) for everyone to install the new client as soon as possible, and to turn on generation.
Title: Re: Segfault on hardened Linux systems Post by: spidr_mnky on January 29, 2011, 03:27:06 PM
I'm running bitcoin on hardened Gentoo. Everything works short of generation. If I understand the conversation so far, some optimizations fail on hardened systems, but if they are disabled, generation will likely work. On the other hand, the integrity of the network as a whole is bolstered by legitimate clients working efficiently, so removing optimization will probably be a net loss.
How about a compile-time switch? It's not uncommon for optimized code to get along poorly with hardening measures. I'm not familiar with the code base, let alone the developers, so I couldn't intelligently guess about the tradeoffs involved, but it seems to me that it would make sense to include a toggle that defaults to "optimize" (current condition), but can be flipped to "just do it the slow ugly way". That way I could contribute my CPU cycles (if somewhat inefficiently), and the vast majority of the rest of the world, who don't run extremely hardened systems, don't have to be drastically affected. Ideally that could trickle down to a Gentoo USE flag. :)
I'll be happy to help with testing, provide traces, etc. My system is protected by ASLR, non-executable stacks, GCC's stack-smashing protection, and any other bit I could flip in the kernel or elsewhere to harden the system, excluding mandatory access control (so no selinux, grsecurity, etc). If it runs on my rig, it should run anywhere. If I disappear, my email username is aabugher, provider is gmail.
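A compile-time switch of the kind proposed above could look roughly like the sketch below. The macro name BITCOIN_NO_CPU_OPT, the flag fUseOptimizedMiner and the function SelectHashPath are all hypothetical and exist only for this illustration; the build command quoted earlier in the thread passes -DFOURWAYSSE2, which suggests a define of this general kind is already in use, but how the code consumes it is not shown here.

```cpp
#include <string>

// Hypothetical opt-out: building with -DBITCOIN_NO_CPU_OPT would select
// the plain code path; leaving it undefined keeps the optimized default.
#ifndef BITCOIN_NO_CPU_OPT
static const bool fUseOptimizedMiner = true;
#else
static const bool fUseOptimizedMiner = false;
#endif

// Reports which hashing path this build would take.
std::string SelectHashPath()
{
    return fUseOptimizedMiner ? "optimized" : "generic";
}
```

With no extra flags, SelectHashPath() reports the optimized path, matching the "defaults to optimize" behaviour asked for. A Gentoo USE flag could then map onto adding or omitting the define in the ebuild's compiler flags.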