Bitcoin Forum
Author Topic: Segfault on hardened Linux systems  (Read 3814 times)
BioMike (OP)
December 03, 2010, 09:12:31 PM
 #1

Cross-posting this into its own topic. Maybe more people here know about this type of thing
but missed it in the Gentoo ebuild topic.

Building bitcoind on hardened Linux seems to cause a segfault when setting the number of cores to
use for generating bitcoins. Starting with -gen=0 doesn't trigger the segfault.

-----
Ok, I've changed one of my systems to a hardened system and was able to reproduce the bug
on version 0.3.17. Here is the backtrace. No idea what is going wrong.

Code:
gdb bitcoind
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb) run
Starting program: /usr/bin/bitcoind
[Thread debugging using libthread_db enabled]
bitcoin server starting
[New Thread 0xb6aa46d0 (LWP 6475)]
[New Thread 0xb6477b70 (LWP 6478)]
[New Thread 0xb5c76b70 (LWP 6479)]
[New Thread 0xb5475b70 (LWP 6480)]
[New Thread 0xb4c74b70 (LWP 6481)]
[New Thread 0xb4473b70 (LWP 6482)]
[New Thread 0xb3c57b70 (LWP 6483)]
[New Thread 0xb3456b70 (LWP 6484)]
[New Thread 0xb2c39b70 (LWP 6485)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb2c39b70 (LWP 6485)]
Detect128BitSSE2 () at main.cpp:2956
2956 main.cpp: No such file or directory.
in main.cpp
(gdb) bt full
#0  Detect128BitSSE2 () at main.cpp:2956
fUseSSE2 = <value optimized out>
fPrinted = false
nFamily = 15
nModel = 2
#1  0xb768fdfc in BitcoinMiner () at main.cpp:3313
reservekey = {nIndex = 0, vchPubKey = {<std::_Vector_base<unsigned char, std::allocator<unsigned char> >> = {
      _M_impl = {<std::allocator<unsigned char>> = {<__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, _M_start = 0x0,
        _M_finish = 0x0, _M_end_of_storage = 0x0}}, <No data fields>}}
nExtraNonce = <value optimized out>
nPrevTime = <value optimized out>
__PRETTY_FUNCTION__ = "void BitcoinMiner()"
#2  0xb7690e32 in ThreadBitcoinMiner (parg=0x0) at main.cpp:2884
No locals.
#3  0xb6acd96e in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0xb6baeb5e in clone () from /lib/libc.so.6
No symbol table info available.

The misbehaving line is:
Code:
if (!fPrinted)

in the following code:

Code:
    // AMD reports a lower model number in 64-bit mode
    if (fAMD && sizeof(void*) > 4 && nFamily * 10000 + nModel >= 160000)
        fUseSSE2 = true;

    static bool fPrinted;
    if (!fPrinted)
    {
        fPrinted = true;
        printf("CPUID %08x family %d, model %d, stepping %d, fUseSSE2=%d\n", nBrand, nFamily, nModel, cpu.nStepping, fUseSSE2);
    }
    return fUseSSE2;

fPrinted is false according to the backtrace, so I don't know why this would cause the segfault.
wumpus
December 04, 2010, 09:17:13 AM
 #2

fPrinted is uninitialized. Give it a value before testing on it!

Bitcoin Core developer [PGP] Warning: For most, coin loss is a larger risk than coin theft. A disk can die any time. Regularly back up your wallet through FileBackup Wallet to an external storage or the (encrypted!) cloud. Use a separate offline wallet for storing larger amounts.
jgarzik
December 04, 2010, 09:22:13 AM
 #3

Not sure about C++, but in C, "static bool foo;" is guaranteed to initialize 'foo' to false.

Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
davout
December 04, 2010, 10:02:22 AM
 #4

I think it's randomly initialized.

http://stackoverflow.com/questions/827393/default-value-for-bool-in-c

They're not talking about the static case tho.

Nonetheless, this code looks really, really weird; it is certainly missing lines or context.

// Okay let's initialize a variable to a fixed or random value
// If it's true
   // Set it to true,
   // then do something



Reminds me of facepalming hard when reading stuff like
if (v == true) {
    return(true);
}

bencoder
December 04, 2010, 11:25:50 AM
 #5

Quote from: davout
Nonetheless this code looks really really weird certainly missing lines or context.

// Okay let's initialize a variable to a fixed or random value
// If it's true
   // Set it to true,
   // then do something


I don't think it's so weird; it just ensures that the CPU info line is printed only on the first call of the function.
davout
December 04, 2010, 11:32:46 AM
 #6

Quote from: bencoder
I don't think it's so weird, it's just ensuring it will only print that cpu info line for the first call of the function.

lol, my bad

wumpus
December 05, 2010, 06:41:28 PM
 #7

Quote from: jgarzik
Not sure about C++, but in C, "static bool foo;" is guaranteed to initialize 'foo' to false.

You are right. 'static' moves the variable from the stack to the BSS segment, which is always initialised with zeroes. I didn't notice.
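
The same guarantee holds in C++: a variable with static storage duration is zero-initialized before anything else touches it, so a function-local `static bool` starts out false. A minimal sketch of the print-once pattern follows. `PrintOnce` and `g_prints` are made-up names for illustration, not code from main.cpp.

```cpp
#include <cstdio>

static int g_prints = 0;   // hypothetical counter, only here to observe the behavior

// Returns true only on the call that actually prints.
bool PrintOnce()
{
    static bool fPrinted;  // static storage duration: zero-initialized, so it starts false
    if (!fPrinted)
    {
        fPrinted = true;
        ++g_prints;
        std::printf("printed on first call only\n");
        return true;
    }
    return false;
}
```

Calling `PrintOnce()` twice should print once and leave `g_prints` at 1, which suggests the `if (!fPrinted)` test itself is unlikely to be the real cause of the crash.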

Hal
December 06, 2010, 12:30:12 AM
 #8

Traces on optimized code are unreliable. Can you build without optimization?

Hal Finney
BioMike (OP)
December 06, 2010, 06:27:07 AM
 #9

Quote from: Hal
Traces on optimized code are unreliable. Can you build without optimization?

Sure, although there is little optimization in place, and what there is is fairly standard (-O2, -march=pentium4, -pipe, -ggdb).
Is "-O1, -ggdb" sufficiently unoptimized?
BioMike (OP)
January 09, 2011, 05:41:30 PM
 #10

Took some time. Removed all cflags except -ggdb. Same problem. It does, however, give an extra line in the backtrace (at position 0): memcpy@plt
No idea what to do with that.
Gavin Andresen
Chief Scientist
January 09, 2011, 05:57:43 PM
 #11

The CPU miner code has all sorts of now-mostly-worthless (because GPU mining is so much more energy-efficient than CPU mining) optimizations.  Maybe hardened Linux doesn't like the assembly code or SSE instructions?

How often do you get the chance to work on a potentially world-changing project?
BioMike (OP)
January 09, 2011, 07:04:16 PM
 #12

Doesn't seem to be SSE related (building without SSE support also causes it to segfault). Assembly isn't in that function as far as I can see (might be in one of the boost calls though).

I also noticed that the backtrace points to another line than the one I saw earlier (still in the same function, only a few lines above, pointing to a memcpy), although nothing changed in that code. This is where it points now:
Code:
memcpy(&cpu, &a, sizeof(cpu));

<edit>
This seems to be caused by the changed optimization settings.
</edit>
Hal
January 10, 2011, 06:52:42 PM
 #13

Maybe try putting

return false;

as the first line of Detect128BitSSE2(), see if that fixes it. If so, and you're really energetic, move the return down to just before, then just after the memcpy, to confirm that's where the problem is.


ArtForz
January 11, 2011, 05:08:02 PM
 #14

Hrrrm, could this be caused by the asm in CallCPUID? CPUID clobbers ebx and edx, yet we don't seem to be saying so...

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
BioMike (OP)
January 11, 2011, 08:03:04 PM
 #15

@Hal, I'll be giving that a try this weekend.

@ArtForz, it falls over that when there is no optimization in place (gcc default); with optimizations in place it falls over the other part.
Gavin Andresen
January 22, 2011, 09:08:45 PM
 #16

BioMike:  any progress tracking this down?

I just committed a fix to the git integration tree CallCPUID code to declare ebx/edx clobbered...

BioMike (OP)
January 23, 2011, 06:26:18 PM
 #17

Quote from: Gavin Andresen
BioMike:  any progress tracking this down?

I just committed a fix to the git integration tree CallCPUID code to declare ebx/edx clobbered...


I haven't spent any more time on this. I'll do a check this evening.
BioMike (OP)
January 23, 2011, 08:36:23 PM
 #18

Code:
g++ -c -O2 -march=pentium4 -pipe -ggdb -Wno-invalid-offsetof -Wformat -DFOURWAYSSE2 -DUSE_SSL -I"/usr/include" -I"/usr/include/crypto++" -I"/usr/include/db4.8"  -o obj/nogui/main.o main.cpp
main.cpp: In function 'void CallCPUID(int, int&, int&)':
main.cpp:2981: error: PIC register clobbered by '%ebx' in 'asm'
main.cpp: In function 'bool Detect128BitSSE2()':
main.cpp:2981: error: PIC register clobbered by '%ebx' in 'asm'
main.cpp:2981: error: PIC register clobbered by '%ebx' in 'asm'
make: *** [obj/nogui/main.o] Error 1

Doesn't seem ok.

Code:
void CallCPUID(int in, int& aret, int& cret)
{
    int a, c;
    asm (
        "mov %2, %%eax; " // in into eax
        "cpuid;"
        "mov %%eax, %0;" // eax into a
        "mov %%ecx, %1;" // ecx into c
        :"=r"(a),"=r"(c) /* output */
        :"r"(in) /* input */
        :"%eax","%ebx","%ecx","%edx" /* clobbered register */
        //:"%eax","%ecx" /* clobbered register */
    );
    aret = a;
    cret = c;
}
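
For reference, one way to keep the compiler happy on 32-bit PIC builds is to stop naming ebx as a clobber and instead save and restore it inside the asm. The following is only a sketch under assumptions: it targets GCC or Clang inline asm, `CallCPUIDSafe` is a made-up name, and it is not the fix that went into the tree. It parks ebx in a scratch register rather than using push/pop, and the thread does not establish whether it cures the original segfault.

```cpp
#include <cstdint>

// Hypothetical variant of CallCPUID, not the code from main.cpp.
// Returns false when not compiled for x86, true otherwise.
bool CallCPUIDSafe(uint32_t in, uint32_t& aret, uint32_t& cret)
{
#if defined(__i386__)
    uint32_t a, c, saved;
    asm volatile(
        "movl %%ebx, %2\n\t"   // park ebx (the PIC register) in a scratch register
        "cpuid\n\t"
        "movl %2, %%ebx"       // restore ebx before the compiler relies on it again
        : "=a"(a), "=c"(c), "=&r"(saved)
        : "0"(in), "1"(0)      // eax = leaf, ecx = 0
        : "edx");
    aret = a;
    cret = c;
    return true;
#elif defined(__x86_64__)
    // On x86-64, rbx is not the PIC register, so it can be a plain output.
    uint32_t a, b, c, d;
    asm volatile("cpuid"
                 : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                 : "0"(in), "2"(0));
    (void)b;
    (void)d;
    aret = a;
    cret = c;
    return true;
#else
    (void)in;
    (void)aret;
    (void)cret;
    return false;
#endif
}
```

Letting the compiler pick eax and ecx through the "a" and "c" constraints also removes the explicit mov instructions, so there is no need to list those registers as clobbered.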
Gavin Andresen
January 24, 2011, 03:25:29 PM
 #19

What version of gcc are you using?  After a little googling I found this thread about the same issue:
Quote
Quote
CPUID returns information in eax, ebx, ecx, and edx. With -fPIC you have to push ebx onto the stack before calling cpuid and pop it afterward as Bin points out is what the patch to xen-unstable does.
The compiler used to generate the push/pop just fine for gcc-3.3. This is an issue specific to gcc-3.4.
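
With a GCC or Clang toolchain, another option is to skip hand-written asm and use the compiler's own `<cpuid.h>`, whose `__get_cpuid` helper is meant to deal with preserving ebx where that is required. A minimal sketch, assuming that header is available: `HasSSE2` is a made-up helper that only checks the SSE2 feature bit and does not reproduce the family/model heuristics in Detect128BitSSE2().

```cpp
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   // GCC/Clang header providing __get_cpuid
#endif

// Hypothetical helper: true if CPUID leaf 1 reports SSE2 (EDX bit 26).
bool HasSSE2()
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;               // leaf 1 not supported by this CPU
    return (edx & (1u << 26)) != 0; // SSE2 feature flag
#else
    return false;                   // not an x86 build
#endif
}
```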

Unless somebody volunteers to fix/maintain this, I'm inclined to simply remove all of the "try to make the CPU miner go faster" optimizations from bitcoin.  CPU mining is, for most people, a waste of electricity.


BioMike (OP)
January 24, 2011, 07:13:29 PM
 #20

Quote from: Gavin Andresen
What version of gcc are you using?

gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.4.4-r2/work/gcc-4.4.4/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.4.4 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.4.4/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.4 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.4/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.4/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.4.4/include/g++-v4 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --disable-fixed-point --without-ppl --without-cloog --disable-nls --with-system-zlib --disable-werror --enable-secureplt --disable-multilib --enable-libmudflap --disable-libssp --enable-esp --enable-libgomp --with-python-dir=/share/gcc-data/i686-pc-linux-gnu/4.4.4/python --enable-checking=release --disable-libgcj --with-arch=i686 --enable-languages=c,c++ --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo Hardened 4.4.4-r2 p1.2, pie-0.4.5'
Thread model: posix
gcc version 4.4.4 (Gentoo Hardened 4.4.4-r2 p1.2, pie-0.4.5)

Quote from: Gavin Andresen
Unless somebody volunteers to fix/maintain this, I'm inclined to simply remove all of the "try to make the CPU miner go faster" optimizations from bitcoin.  CPU mining is, for most people, a waste of electricity.

Fine with me.