Some 'technical commentary' about Core code esp. hardware utilisation

Last of the V8s (OP)

July 06, 2017, 10:22:38 AM

thought this was worth preserving - it was thoroughly off topic in another thread and may get deleted.
haven't edited the quote, so there's lots of political stuff which would be off topic here (but it is your board!)
please see @-ck's thread for some initial commentary on point 13. otherwise,
i invite dev and tech regulars to comment

Quote from: Troll Buster on July 06, 2017, 12:53:20 AM

Quote from: DooMAD on July 05, 2017, 07:55:47 PM

The thing to bear in mind is that Core have an exemplary record for testing, bugfixing and just generally having an incredibly stable and reliable codebase. So while people may run SegWit2x code in the interim to make sure it's activated, I envision many of them would switch back to Core the moment Core release compatible code. As such, any loss in Core's dominance would probably only be temporary.

In short, I agree there's probably enough support to activate a 2MB fork, but I disagree that Core will lose any significant market share over the long term, even if the 2MB fork creates the longest chain and earns the Bitcoin mantle.

Nokia was also good at testing and reliability, where are they now?

And Core code is shit, anyone experienced in writing kernels/drivers, or ultra low latency communication/financial/military/security systems would instantly notice:

1. The general lack of regard for L0/L1/TLB/L2/L3/DRAM latency and data locality.
2. Lack of cache line padding and alignment.
3. Lack of inline assembly in critical loops.
4. Lack of CPU and platform specific speed ups.
5. Inefficient data structures and data flow.
6. Not replacing simple if/else with branchless operations.
7. Not using __builtin_expect() to make branch predictions more accurate.
8. Not breaking bigger loops into smaller loops to make use of L0 cache (Loop tiling).
9. Not coding in a way that deliberately helps the CPU prefetcher cheat time (hide memory latency).
10. Unnecessary memory copying.
11. Unnecessary pointer chasing.
12. Using pointers instead of registers in performance sensitive areas.
13. Inefficient data storage (LevelDB? Come on, the best LevelDB devs moved onto RocksDB years ago)
14. Lack of simplicity.
15. Lack of clear separation of concerns.
16. The general pile-togetherness commonly seen in projects involving too many people of different skill levels.
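Item 2 (cache-line padding and alignment) is the easiest of these to show concretely. A minimal C++ sketch, assuming a 64-byte cache line (common on current x86-64 parts, but hard-coded here rather than queried from the CPU); the struct names are hypothetical and not taken from Bitcoin Core. Two counters written by different threads are placed on separate cache lines, so a write from one core does not keep invalidating the line the other core is using (false sharing):

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes; 64 is typical on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// Both counters share one 64-byte line, so two threads incrementing
// a and b respectively will bounce that line between cores.
struct UnpaddedCounters {
    std::uint64_t a;
    std::uint64_t b;
};

// alignas on each member starts it on its own cache line; the compiler
// inserts the padding and rounds the struct size up to a multiple of 64.
struct PaddedCounters {
    alignas(kCacheLine) std::uint64_t a;
    alignas(kCacheLine) std::uint64_t b;
};
```

The trade-off is memory: the padded struct is 128 bytes instead of 16, so padding like this only pays off for data that is actually written concurrently.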

The bottleneck of performance today is memory: a CPU register is 150-400 times faster than main memory, and 10x that if you use the newest CPUs and code in a way that makes use of all the execution units in parallel and makes use of SIMD (out-of-order execution window size: 168 in Sandy Bridge, 192 in Haswell, 224 in Skylake).
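The data-locality and pointer-chasing points (items 1 and 11) come down to access patterns. A sketch with hypothetical names: summing the same values stored in a contiguous std::vector versus a singly linked list. Both return the same result; the difference is that the vector walk is sequential and easy for the hardware prefetcher, while each list step has to wait for the previous node's pointer to load before the next address is even known. No timings are claimed here; they depend on the machine and on where the allocator happened to place the nodes.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Contiguous storage: element i+1 sits right after element i in memory.
std::uint64_t sum_vector(const std::vector<std::uint32_t>& v) {
    std::uint64_t s = 0;
    for (std::uint32_t x : v) s += x;
    return s;
}

// Linked storage: each node is a separate heap allocation.
struct Node {
    std::uint32_t value;
    std::unique_ptr<Node> next;
};

// Each iteration must load n->next before the next address is known
// ("pointer chasing"), so cache misses cannot overlap as easily.
std::uint64_t sum_list(const Node* n) {
    std::uint64_t s = 0;
    for (; n != nullptr; n = n->next.get()) s += n->value;
    return s;
}

// Builds a list holding the same values as v, in the same order.
std::unique_ptr<Node> make_list(const std::vector<std::uint32_t>& v) {
    std::unique_ptr<Node> head;
    for (auto it = v.rbegin(); it != v.rend(); ++it) {
        auto node = std::make_unique<Node>();
        node->value = *it;
        node->next = std::move(head);
        head = std::move(node);
    }
    return head;
}
```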

One simple cache miss and you end up wasting the time for 30-400 CPU instructions. Even moving 1 byte from one core to another takes 40 nanoseconds, that's enough time for 160 instructions on a 4GHz CPU.
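The cycle counts in that paragraph are just latency multiplied by clock rate. A tiny helper (the name is illustrative) makes the arithmetic explicit. Note that it yields cycles; cycles only equal instructions if one assumes roughly one instruction retired per cycle, which the post implicitly does (a wide out-of-order core can retire several per cycle, which would make the number of lost instructions larger).

```cpp
// Converts a latency in nanoseconds into clock cycles at a given frequency.
// 1 GHz means 1 cycle per nanosecond, so cycles = ns * GHz.
constexpr double ns_to_cycles(double latency_ns, double clock_ghz) {
    return latency_ns * clock_ghz;
}

// The post's example: a 40 ns core-to-core transfer on a 4 GHz CPU.
static_assert(ns_to_cycles(40.0, 4.0) == 160.0, "40 ns at 4 GHz is 160 cycles");
```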

You take one look at Core's code and you know instantly that most of the people who wrote it know only software but not hardware. They know how to write the logic, they know how to allocate and release memory, but they don't understand the hardware they're running the code on; they don't know how electrons are being moved from one place to another inside the CPU at the nanometer level. If you don't have instinctive knowledge of hardware, you'll never be able to write great code. Good maybe, but not great.

Since inception, Core was written by amateurs or semi-professionals, picked up by other amateurs or semi-professionals. It works, and there are small nuggets of good code here and there, contributed by people who knew what they were doing, but overall the code is nowhere near good, not even close: really just a bunch of slow crap code written by people of different skill levels.

There are plenty of gurus out there who can make Core's code run two to four times faster without even trying. But most of them won't bother, if they're going to work for the bankers they'd expect to get paid handsomely for it.

Quote from: DooMAD on July 05, 2017, 07:55:47 PM

So while people may run SegWit2x code in the interim to make sure it's activated, I envision many of them would switch back to Core the moment Core release compatible code. As such, any loss in Core's dominance would probably only be temporary.

In short, I agree there's probably enough support to activate a 2MB fork, but I disagree that Core will lose any significant market share over the long term, even if the 2MB fork creates the longest chain and earns the Bitcoin mantle.

So even a Core fan boy has to agree that Core must fall in line to stay relevant.

A fan boy can fantasize about everyone flocking back to Core after they lose the first-to-market advantage.

But the key is even if Core decide to fall in line to stay relevant, they can no longer play god like before.

So what's your point?

gmaxwell

July 06, 2017, 11:28:18 AM
Last edit: July 08, 2017, 12:30:35 AM by gmaxwell

What you're seeing here is someone trying to pump his ego by crapping on the work of others and trying to show off to impress you with how uber technical he is-- not the first or the last one of those we'll see.

A quarter of the items in the list, like "Lack of inline assembly in critical loops.", are both untrue and also show up in other abusive folks' lists as things Bitcoin Core is doing and is awful for doing, because it's antithetical to portability, reliability, or the poster's idea of code aesthetics (or because MSVC stopped supporting inline assembly, thus anyone who uses it is a "moron").

Here is the straight dope: If the comments had merit and the author were qualified to apply them-- where is the patch? Oh look at that, no patches.

Many of the people working on the project have long-term experience with low level programming (for example I spent many years building multimedia codecs; wladimir does things like video drivers and IIRC used to work in the semiconductor industry), and the codebase reflects many points of optimization with micro-architectural features in mind. But _most_ of the codebase is not a hot-path and _all_ of the codebase must be optimized for reliability and reviewability above pretty much all else.

Some of these pieces of advice are just a bit outdated as well-- it makes little sense to bake in an optimization that a compiler will reliably perform on its own at the expense of code clarity and maintainability; especially in the 99% of code that isn't hot or on a latency critical path. (Examples being loop invariant code motion and use of conditional moves instead of branching).
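To make the conditional-move example concrete: the two functions below compute the same maximum. The first is the plain ternary; GCC and Clang at -O2 commonly compile it to a cmov on x86-64 without any help, which is the sense in which hand-writing the second, mask-based form is often redundant. Whether a particular compiler and target actually emits a cmov is something to check in the generated assembly (for example with -S) rather than assume; the function names are illustrative.

```cpp
#include <cstdint>

// Plain version: the compiler is free to choose a branch or a cmov.
std::int32_t max_plain(std::int32_t a, std::int32_t b) {
    return a > b ? a : b;
}

// Hand-written branchless version using a mask:
// mask is all ones (-1) when a > b, and 0 otherwise.
std::int32_t max_branchless(std::int32_t a, std::int32_t b) {
    const std::int32_t mask = -static_cast<std::int32_t>(a > b);
    return (a & mask) | (b & ~mask);
}
```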

Similarly, some are true for generic non-hot-path code: E.g. it's pretty challenging in idiomatic, safe C++ to avoid some amount of superfluous memory copying (especially prior to C++11 which we were only able to upgrade to in the last year due to laggards in the userbase), but in the critical path for validation there is virtually none (though there are an excess of small allocations, help improving that would be very welcome). Though, you're not likely to know that if you're just tossing around insults on the internet instead of starting up a profiler.
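A sketch of the C++11 point about superfluous copying, using a hypothetical Payload type that counts its own deep copies. Passing by const reference and storing forces a copy; taking the argument by value and std::move-ing it into the container lets a caller hand over a temporary (or an explicitly moved object) with no deep copy at all. This only illustrates the language feature; it says nothing about how Bitcoin Core's validation code is actually structured.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counts how many deep copies of Payload have been made (illustration only).
int g_payload_copies = 0;

struct Payload {
    std::vector<unsigned char> bytes;

    explicit Payload(std::size_t n) : bytes(n) {}
    Payload(const Payload& other) : bytes(other.bytes) { ++g_payload_copies; }
    Payload(Payload&&) noexcept = default;
    Payload& operator=(const Payload& other) {
        bytes = other.bytes;
        ++g_payload_copies;
        return *this;
    }
    Payload& operator=(Payload&&) noexcept = default;
};

// Pre-C++11 style: the container has to copy from the const reference.
void store_copy(std::vector<Payload>& out, const Payload& p) {
    out.push_back(p);
}

// C++11 style: take by value, then move into the container.
void store_move(std::vector<Payload>& out, Payload p) {
    out.push_back(std::move(p));
}
```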

And of course, we're all quite busy keeping things running reliably and improving-- and pulling out the big tens of percent performance improvements that come from high level algorithmic improvements. Eking out the last percent in micro-optimizations isn't always something that we have the resources to do, even where they make sense from a maintainability perspective. But instead we're off building the fastest ECC validation code that exists out there, bar none, because that's simply more important.

Could there be more micro-optimizations? Absolutely. So step on up and get your hands dirty, because there is 10x as much work needed as there are resources. There is almost no funding (unlike the millions poured into BU just to crank out crashware), and we can't have basically any failures-- at least not in the consensus critical parts. Oh yeah, anonymous people will be abusive to you on the internet too. It's great fun.

Quote

Inefficient data storage

Oh please. Cargo cult bullshit at its worst. Do you even know what leveldb is used for in Bitcoin? What reason do you have to believe that $BUZZWORD_PACKAGE_DEJURE is any better for that? Did it occur to you that perhaps people have already benchmarked other options? Rocks has a large feature set which is completely irrelevant for our very narrow use of leveldb-- I see in your other posts that you're going on about superior compression in rocksdb. Guess what: we disable compression and rip it out of leveldb, because it HURTS PERFORMANCE and actually makes the database larger-- for our use case. It turns out that cryptographic hashes are not very compressible. (And as CK pointed out, no, the blockchain isn't stored in it-- that would be pretty stupid.)
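The claim that hashes compress poorly can be sanity-checked with a back-of-the-envelope measure: the Shannon entropy of the byte distribution. Data whose bytes are close to uniformly distributed sits near 8 bits per byte, leaving essentially nothing for a general-purpose compressor to squeeze out, while highly repetitive data sits near 0. The sketch below uses std::mt19937 output as a stand-in for hash-like data (an assumption: it is not a cryptographic hash, only uniformly distributed), and byte-level entropy is a rough indicator, not a bound on what every compressor can achieve. For reference, LevelDB exposes the setting under discussion as `leveldb::Options::compression`, which accepts `leveldb::kNoCompression`.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Order-0 Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0).
double byte_entropy(const std::vector<std::uint8_t>& data) {
    if (data.empty()) return 0.0;
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t b : data) ++counts[b];
    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / data.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Uniformly distributed bytes as a stand-in for hash output (not a real hash).
std::vector<std::uint8_t> random_bytes(std::size_t n, std::uint32_t seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<std::uint8_t> out(n);
    for (auto& b : out) b = static_cast<std::uint8_t>(dist(gen));
    return out;
}
```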

Pretty sad that you feel qualified to throw out that long list of insults without having much of an idea about the architecture of the software.

Quote

Since inception, Core was written by amateurs or semi-professionals, picked up by other amateurs or semi-professionals

The regular contributors who have written most of the code are the same people pretty much through the entire life of the project; and they're professionals with many years of experience. Perhaps you'd care to share with us your lovely and impressive works?

Quote

run two to four times faster without even trying.

Which wouldn't even hold a candle to the multiple orders of magnitude speedup we've produced so far cumulatively through the life of the project-- exactly my point about micro-optimizations. Of course, contributions are welcome. But it's a heck of a lot easier to wave your arms and insult people who've produced hundred fold improvements, because you think a laundry list of magic moves is going to get another couple times (and they might-- but at what cost?)

If you'd like to help out it's open and ready-- though you'll be held to the same high standard of review and validation and not just given a pass because a micro-benchmark got 1% faster-- reliability is the first concern... but 2x-level improvements in latency or throughput critical paths would be very very welcome even if they were a bit painful to review.

If you're not interested or able-- well then maybe you're just another drunken sports fan throwing concessions from the stand convinced that you could do so much better than the team, though you won't ever take to the field yourself.

It doesn't impress, quite the opposite: because you're effectively exploiting the fact that we don't self-promote much, and so you can get away with slinging some rubbish about how terrible we are just to try to make yourself look impressive. It's a low blow against some very hard working people who owe nothing to you.

If you do a really outstanding job perhaps you'll be able to overcome the embarrassment of:

Quote

2) Say what you will about Craig, he's still a mathematician, the math checks out.

(Hint: Wright's output is almost all pure gibberish; though perhaps you were too busy having fuck screamed at you to notice little details like his code examples for quadratic signature hashing being code from a testing harness that has nothing to do with validation, his fix being a total no op, his false claims that quadratic sighashing is an implementation issue, false claims about altstack having anything to do with turing completeness, false claims that segwit makes the system quadratically slower, false claim that Bitcoin Core removed opcode, yadda yadda. )
I for one am not impressed. Show us some contributions if you want to show that you know something useful, not hot air.

Last of the V8s (OP)

July 06, 2017, 12:13:11 PM

^thank you so much

Troll Buster

July 06, 2017, 07:23:41 PM
Last edit: July 07, 2017, 08:24:19 AM by Troll Buster

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

What you're seeing here is someone trying to pump his ego by shitting on other things and show off to impress you with how uber technical he is-- not the first or the last one of those we'll see.

What you're seeing here is someone trying to defend obvious bad design choices.

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

A quarter of the items in the list, like "Lack of inline assembly in critical loops.", are both untrue and also show up in other abusive folks' lists as things Bitcoin Core is doing and is awful for doing, because it's antithetical to portability, reliability, or the poster's idea of code aesthetics (or because MSVC stopped supporting inline assembly, thus anyone who uses it is a "moron").

What aesthetics? Your code is ugly anyway, wait, you think your code has aesthetics? Shit.

You know decades ago people invented this little thing called #ifdef, right?

Just use #ifdef _MSC_VER/#else/#endif around the inline assembly if you want to bypass MSVC.
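A sketch of the #ifdef approach being described, using a 32-bit rotate-right (the ROTR operation SHA-256 is built from) as the example; the function name is hypothetical. MSVC, which has no inline assembly on x64, gets its `_rotr` intrinsic; GCC/Clang on x86-64 get a one-instruction extended-asm version; everything else falls back to portable C++. Worth noting for gmaxwell's maintainability point: GCC and Clang usually recognise the portable shift-or pattern and emit a rotate instruction on their own.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // _rotr
#endif

// Rotates x right by n bits (n taken modulo 32).
inline std::uint32_t rotr32(std::uint32_t x, unsigned n) {
    n &= 31u;
#if defined(_MSC_VER)
    // MSVC: compiler intrinsic instead of inline assembly.
    return _rotr(x, static_cast<int>(n));
#elif (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
    // GCC/Clang extended asm: rotate the register holding x right by CL.
    __asm__("rorl %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
#else
    // Portable fallback; (32 - n) & 31 avoids an undefined shift by 32 when n == 0.
    return (x >> n) | (x << ((32u - n) & 31u));
#endif
}
```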

This is basic stuff; anyone who pretends to be an expert and doesn't know this is also a "moron".

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Here is the straight dope: If the comments had merit and the author were qualified to apply them-- where is the patch? Oh look at that, no patches.

You ignored the part that was already explained to you:
There are plenty of gurus out there who can make Core's code run two to four times faster without even trying. But most of them won't bother, if they're going to work for the bankers they'd expect to get paid handsomely for it.

Like any self-respecting coder is going to clean up your crap and let you claim all the credit.

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Many of the people working on the project have long-term experience with low level programming

And many people on the project quit because they didn't like working with you, what's your point?

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

(for example I spent many years building multimedia codecs; wladimir does things like video drivers and IIRC used to work in the semiconductor industry), and the codebase reflects many points of optimization with micro-architectural features in mind. But _most_ of the codebase is not a hot-path and _all_ of the codebase must be optimized for reliability and reviewability above pretty much all else.

gmaxwell must think I am the only one who knows the code sucks.

Why don't you walk outside your little church and look around once in a while.

It's not just the micro-optimizations that are in question; even the basic design choices are obviously flawed.

People have been laughing at your choices for years and here you are defending it because you wrote some codec to watch porn with higher fps some years ago.

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Some of these pieces of advice are just a bit outdated as well-- it makes little sense to bake in an optimization that a compiler will reliably perform on its own at the expense of code clarity and maintainability; especially in the 99% of code that isn't hot or on a latency critical path. (Examples being loop invariant code motion and use of conditional moves instead of branching).

Translation: My code is great, everyone else is wrong, nobody else can possibly improve it.

It's your style and your choices, it tells people you don't understand performance at the instinctive level.

Even simple crap like switching to --i instead of ++i will reduce assembly instructions regardless of what optimization flags you use on the compiler.

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Similarly, some are true for generic non-hot-path code: E.g. it's pretty challenging in idiomatic, safe C++ to avoid some amount of superfluous memory copying (especially prior to C++11 which we were only able to upgrade to in the last year due to laggards in the userbase), but in the critical path for validation there is virtually none (though there are an excess of small allocations, help improving that would be very welcome). Though, you're not likely to know that if you're just tossing around insults on the internet instead of starting up a profiler.

Translation: My code is great, everyone else is wrong.

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

And of course, we're all quite busy keeping things running reliably and improving-- and pulling out the big tens of percent performance improvements that come from high level algorithmic improvements. Eking out the last percent in micro-optimizations isn't always something that we have the resources to do, even where they make sense from a maintainability perspective. But instead we're off building the fastest ECC validation code that exists out there, bar none, because that's simply more important.

Could there be more micro-optimizations? Absolutely. So step on up and get your hands dirty, because there is 10x as much work needed as there are resources. There is almost no funding (unlike the millions poured into BU just to crank out crashware), and we can't have basically any failures-- at least not in the consensus critical parts. Oh yeah, anonymous people will be abusive to you on the internet too. It's great fun.

Again, it's not just the micro-optimizations, it's the big fat bad design choices.

We didn't come here to bash you or bash your code, the topic just came up, and in that 1 page people were already mocking your choices.

Like this one from ComputerGenie:

Quote from: ComputerGenie on July 06, 2017, 04:11:25 AM

Quote from: Troll Buster on July 06, 2017, 02:32:48 AM

Why the hell is Core still stuck on LevelDB anyway?

The same reason BDB hasn't ever been replaced: even after a soft fork and a hard fork, new wallets must still be backwards-compatible with already nonfunctional 2011 wallets.

That should tell you something.

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Inefficient data storage

Oh please. Cargo cult bullshit at its worst. Do you even know what leveldb is used for in Bitcoin? What reason do you have to believe that $BUZZWORD_PACKAGE_DEJURE is any better for that? Did it occur to you that perhaps people have already benchmarked other options? Rocks has a large feature set which is completely irrelevant for our very narrow use of leveldb-- I see in your other posts that you're going on about superior compression in rocksdb. Guess what: we disable compression and rip it out of leveldb, because it HURTS PERFORMANCE for our use case. It turns out that cryptographic hashes are not very compressible.

Everyone knows compression costs performance, it's for space efficiency, wtf are you even on about.

Most CPU is running idle most of the time, and SSD is still expensive.

So just use RocksDB, or just toss in an lz4 lib, add an option in the config and let people with a decent CPU enable compression and save 20G or more.

I just copied the entire bitcoind dir (blocks, index, exec, everything) onto a ZFS pool with lz4 compression enabled and at 256k record size it saved over 20G for me.

Works just fine, and ZFS isn't even known for its performance.

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

(And as CK pointed out, no the blockchain isn't stored in it-- that would be pretty stupid)

That was not what CK said, what CK said was: I'm not a fan of its performance either [#1058]

Do you have difficulty reading or are you just being intentionally dishonest?

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Doesn't matter how many life time people spent on it, when you see silly shit like sha256() twice, you know it's written by amateurs.

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

All this bullshit talk is meaningless when your basic level silly choices are all over the place.

Here, your internal sha256 lib, the critical hashing function all encode/decode operations rely on, the one that hasn't been updated since 2014:

https://github.com/bitcoin/bitcoin/blob/master/src/crypto/sha256.cpp

SHA256 is one of the key pieces of Bitcoin operations, the blocks use it, the transactions use it, the addresses even use it twice.
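For readers unfamiliar with the "sha256() twice" point: Bitcoin's block and transaction hashes are SHA-256 applied to the output of SHA-256 (often written SHA256d). The sketch below is a compact, unoptimised, byte-oriented SHA-256 in portable C++ plus the double-hash wrapper, so the structure is visible. It is not Bitcoin Core's implementation and not meant for production; it is the kind of generic routine that the accelerated implementations linked below aim to replace. The function names are illustrative.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Digest = std::array<std::uint8_t, 32>;

namespace {
// SHA-256 round constants (FIPS 180-4, section 4.2.2).
constexpr std::uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

// Rotate right; only called with 0 < n < 32.
std::uint32_t rotr(std::uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// Processes one 64-byte block, updating the eight state words in h.
void compress(std::uint32_t h[8], const std::uint8_t* p) {
    std::uint32_t w[64];
    for (int i = 0; i < 16; ++i)
        w[i] = (std::uint32_t(p[4 * i]) << 24) | (std::uint32_t(p[4 * i + 1]) << 16) |
               (std::uint32_t(p[4 * i + 2]) << 8) | std::uint32_t(p[4 * i + 3]);
    for (int i = 16; i < 64; ++i) {
        const std::uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        const std::uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int i = 0; i < 64; ++i) {
        const std::uint32_t t1 = hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[i] + w[i];
        const std::uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
        hh = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}
}  // namespace

// Single SHA-256: pad the message, then compress it block by block.
Digest sha256(const std::uint8_t* data, std::size_t len) {
    std::uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                          0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    std::vector<std::uint8_t> msg(data, data + len);
    msg.push_back(0x80);                                // the mandatory '1' bit
    while (msg.size() % 64 != 56) msg.push_back(0x00);  // zero-fill up to the length field
    const std::uint64_t bits = std::uint64_t(len) * 8;
    for (int i = 7; i >= 0; --i) msg.push_back(std::uint8_t(bits >> (8 * i)));  // big-endian bit length
    for (std::size_t off = 0; off < msg.size(); off += 64) compress(h, msg.data() + off);
    Digest out{};
    for (int i = 0; i < 8; ++i)
        for (int j = 0; j < 4; ++j) out[4 * i + j] = std::uint8_t(h[i] >> (24 - 8 * j));
    return out;
}

// Bitcoin-style double hash: SHA-256 of the SHA-256 digest.
Digest sha256d(const std::uint8_t* data, std::size_t len) {
    const Digest first = sha256(data, len);
    return sha256(first.data(), first.size());
}

// Lower-case hex encoding, for comparing against published test vectors.
std::string to_hex(const Digest& d) {
    const char* digits = "0123456789abcdef";
    std::string s;
    for (std::uint8_t byte : d) {
        s += digits[byte >> 4];
        s += digits[byte & 0x0f];
    }
    return s;
}
```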

So what's your excuse for not making use of SSE/AVX/AVX2 and the Intel SHA extension? Aesthetics? Portability? Pfft.

There are mountains of accelerated SHA2 libs out there, like this one, which supports the Intel SHA extensions and ARMv8, and even has MSVC headers:

Quote

https://github.com/noloader/SHA-Intrinsics/blob/master/sha256-x86.c

SHA-1, SHA-224 and SHA-256 compression functions using Intel SHA intrinsics and ARMv8 SHA intrinsics

For AVX2, here is one from Intel themselves:

Quote

https://patchwork.kernel.org/patch/2343841/

Optimized sha256 x86_64 routine using AVX2's RORX instructions

Provides SHA256 x86_64 assembly routine optimized with SSE, AVX and
AVX2's RORX instructions. Speedup of 70% or more has been
measured over the generic implementation.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>

There is your 70% speed up for a single critical operation on your hot-path.

This isn't some advanced shit: that Intel patch was created on March 26, 2013, and your sha256 lib was last updated on Dec 19, 2014, so the patch existed over a year before your last update. We have even faster stuff now using Intel SHA intrinsics.
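On the SHA-extension point: a binary that has to run on arbitrary x86 machines can't simply be compiled with SHA-NI or AVX2 enabled, so the usual pattern is runtime CPU detection followed by dispatch to the best available routine. A sketch of the detection half, assuming GCC or Clang on x86 (it uses `__get_cpuid_count` from `<cpuid.h>`, which recent versions of both provide) and checking CPUID leaf 7, subleaf 0, where EBX bit 5 reports AVX2 and EBX bit 29 reports the SHA extensions. The `Sha256Impl` enum and the function name are hypothetical; on other compilers or architectures it reports the generic path. A complete AVX2 check would also confirm via XGETBV that the OS saves YMM state, which is omitted here.

```cpp
#include <cstdint>
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

// Hypothetical labels for the SHA-256 code paths a program might choose between.
enum class Sha256Impl { Generic, Avx2, ShaNi };

// Queries CPUID leaf 7, subleaf 0 and picks the fastest path the CPU reports.
Sha256Impl detect_sha256_impl() {
#ifdef HAVE_GNU_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // Returns 0 if leaf 7 is not supported by this CPU.
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        if (ebx & (1u << 29)) return Sha256Impl::ShaNi;  // SHA extensions
        if (ebx & (1u << 5)) return Sha256Impl::Avx2;    // AVX2 (OS support not checked here)
    }
#endif
    return Sha256Impl::Generic;
}
```

A program would call this once at startup and store a function pointer to the matching routine, keeping the generic implementation as the reference the accelerated ones are tested against.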

You talk a lot of shit but your code and choices are like they're made by amateurs.

Working in "cryptocurrency" doesn't automatically make you an expert because the word has "crypto" in it.

Fix your silly shit instead of continuing to talk about it.

cr1776

July 06, 2017, 07:55:40 PM

Quote from: Troll Buster on July 06, 2017, 07:23:41 PM

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

What you're seeing here is someone trying to pump his ego by shitting on other things and show off to impress you with how uber technical he is-- not the first or the last one of those we'll see.

What you're seeing here is someone trying to defend obvious bad design choices.
...
--i instead of ++i
...
Fix your silly shit instead of continuing to talk about it.

As someone who has 30 years of experience plus a BS in CS and CE, and an MS in CS (from top 10 US CS/CE programs), this kind of language isn't a way to (a) make your point, and (b) get anyone to listen to you with any degree of respect.

In open source projects, if you have something like your --i and ++i change, open a pull request or at minimum link to the specific code you are talking about. Most well written, non-student compilers will handle cases like that, and there will be no difference between things like ++i and i++ in the code generated, except perhaps in a class that obfuscates the operation in some extremely obscure way. But, as I said, if it is that easy, please point out what you are talking about.
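The caveat about class types is the one case where prefix versus postfix increment can genuinely differ. For a built-in integer used as a loop counter, `++i` and `i++` as a standalone statement mean the same thing. For a class, postfix `operator++(int)` has to return the old value, so it copies the object first. A sketch with a hypothetical copy-counting type:

```cpp
// Counts copies of Counter made so far (illustration only).
int g_counter_copies = 0;

struct Counter {
    int value = 0;

    Counter() = default;
    Counter(const Counter& other) : value(other.value) { ++g_counter_copies; }
    Counter& operator=(const Counter&) = default;

    // Prefix: increment in place and return a reference; no copy is needed.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix: must hand back the previous value, so it copies *this first.
    Counter operator++(int) {
        Counter old(*this);
        ++value;
        return old;
    }
};
```

If the copy is cheap and the operator is inlined, the optimiser can often remove it as well; the distinction mostly matters for heavyweight iterator types.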

tspacepilot

July 06, 2017, 08:04:49 PM
Last edit: July 06, 2017, 08:17:31 PM by tspacepilot

@TrollBuster

You replied with a lot of "translations", but I think gmaxwell put it pretty clearly:

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Here is the straight dope: If the comments had merit and the author were qualified to apply them-- where is the patch? Oh look at that, no patches.

Some of your "translations" are really questionable:

Quote from: Troll Buster on July 06, 2017, 07:23:41 PM

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Translation: My code is great, everyone else is wrong, nobody else can possibly improve it.

That doesn't seem right. My reading of gmaxwell was a very strongly worded invitation for you to go ahead and improve it.

Quote

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

All this bullshit talk is meaningless when your basic level silly choices are all over the place.

Couldn't you, like, fix a few of the 'basic level silly choices' in order to strengthen your argument?

As far as I can tell you've been invited to offer improvements rather than just insults, but it seems that you chose to reply with further insults.

If, for some reason, you can't provide a patch but can provide some helpful discussion which might lead to improvements then it seems like you might need to alter your approach.

I'm not worshipping at anyone's "church" here, I'm just noticing the dynamic: you've been invited to prove the worth of your assumptions, but your reply doesn't seem to be headed in that direction.

Troll Buster

July 06, 2017, 08:32:03 PM
Last edit: July 06, 2017, 09:34:44 PM by Troll Buster

Quote from: cr1776 on July 06, 2017, 07:55:40 PM

If greg wants to be treated with respect, he shouldn't begin and end a reply with insults.

This --i and ++i is basic stuff and you want to argue about it? wtf have you been doing for the past 30 years?

And it's not just the speed, it's the smaller code size, which allows you to pack more code into the tiny L0 instruction cache and reduce cache misses, which still cost you 4 cycles when you re-fetch from L1 to L0.

It also means you can fit more code in that tiny 32 KB L1 instruction cache, so your other loops/threads can run faster by not being kicked out of the cache by other code. It also saves power on embedded systems.

This is what I was talking about, the world is flooded with "experts" with "30 years experience" and "50 alphabet soup titles" but still have absolutely no idea wtf actually happens inside a CPU.

Only talentless coders talk about credentials instead of the code.

This is not some super advanced stuff, this is entry level knowledge that's not even up for debate.
The information is everywhere, this took 1 second to find, look:

Quote

https://stackoverflow.com/questions/2823043/is-it-faster-to-count-down-than-it-is-to-count-up/2823164#2823164

Which loop has better performance? Increment or decrement?

What your teacher has said was some oblique statement without much clarification. It is NOT that decrementing is faster than incrementing, but you can create a much, much faster loop with decrement than with increment.

int i;
for (i = 0; i < 10; i++){
   //something here
}

After compilation (without optimisation), the compiled version may look like this (VS2015):

-------- C7 45 B0 00 00 00 00 mov dword ptr [i],0
-------- EB 09 jmp labelB
labelA 8B 45 B0 mov eax,dword ptr [i]
-------- 83 C0 01 add eax,1
-------- 89 45 B0 mov dword ptr [i],eax
labelB 83 7D B0 0A cmp dword ptr [i],0Ah
-------- 7D 02 jge out1
-------- EB EF jmp labelA

The whole loop is 8 instructions (26 bytes). In it there are actually 6 instructions (17 bytes) with 2 branches. Yes, yes, I know it can be done better (it's just an example).

Now consider this frequent construct, which you will often find written by embedded developers:

i = 10;
do{
   //something here
} while (--i);

It also iterates 10 times (yes, I know the value of i is different from the for loop shown, but we care about the iteration count here). This may be compiled into this:

00074EBC C7 45 B0 01 00 00 00 mov dword ptr [i],1
00074EC3 8B 45 B0 mov eax,dword ptr [i]
00074EC6 83 E8 01 sub eax,1
00074EC9 89 45 B0 mov dword ptr [i],eax
00074ECC 75 F5 jne main+0C3h (074EC3h)

5 instructions (18 bytes) and just one branch. Actually there are 4 instructions in the loop (11 bytes).

The best thing is that some CPUs (x86/x64 compatible included) have an instruction that may decrement a register, compare the result with zero and branch if the result is different from zero. Virtually ALL PC CPUs implement this instruction. Using it, the loop is actually just one (yes, one) 2-byte instruction:

00144ECE B9 0A 00 00 00 mov ecx,0Ah
label:
   // something here
00144ED3 E2 FE loop label (0144ED3h) // decrement ecx and jump to label if not zero

Do I have to explain which is faster?
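The two loop shapes from the quoted answer can be written as functions and checked for equivalence. A sketch (function names are hypothetical): both sum the same array, one counting up with a for loop and one counting down with the do/while form. One practical trap of the do/while form is that its body always runs at least once, so a zero count has to be handled separately. Note also that the quoted listings were compiled without optimisation; whether the count-down form saves anything at -O2 is best settled by comparing the compiler's assembly output for the target in question, since compilers frequently rewrite loop counters themselves, and mainstream compilers rarely emit the `loop` instruction.

```cpp
#include <cstddef>
#include <cstdint>

// Count-up form: for loop with an increasing index.
std::uint64_t sum_up(const std::uint32_t* v, std::size_t n) {
    std::uint64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Count-down form modelled on the quoted do/while loop.
// The body runs before the test, so n == 0 must be rejected up front;
// otherwise the body would read v[i - 1] with i == 0, far out of bounds.
std::uint64_t sum_down(const std::uint32_t* v, std::size_t n) {
    std::uint64_t s = 0;
    if (n == 0) return s;
    std::size_t i = n;
    do {
        s += v[i - 1];  // visits elements from the end back to the start
    } while (--i);
    return s;
}
```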

Here is more on the L0 and uops instruction cache:

Quote

http://www.realworldtech.com/haswell-cpu/2/

Sandy Bridge made tremendous strides in improving the front-end and ensuring the smooth delivery of uops to the rest of the pipeline. The biggest improvement was a uop cache that essentially acts as an L0 instruction cache, but contains fixed length decoded uops. The uop cache is virtually addressed and included in the L1 instruction cache. Hitting in the uop cache has several benefits, including reducing the pipeline length by eliminating power hungry instruction decoding stages and enabling an effective throughput of 32B of instructions per cycle. For newer SIMD instructions, the 16B fetch limit was problematic, so the uop cache synergizes nicely with extensions such as AVX.

The Haswell uop cache is the same size and organization as in Sandy Bridge. The uop cache lines hold up to 6 uops, and the cache is organized into 32 sets of 8 cache lines (i.e., 8 way associative). A 32B window of fetched x86 instructions can map to 3 lines within a single way. Hits in the uop cache can deliver 4 uops/cycle and those 4 uops can correspond to 32B of instructions, whereas the traditional front-end cannot process more than 16B/cycle. For performance, the uop cache can hold microcoded instructions as a pointer to microcode, but partial hits are not supported. As with the instruction cache, the decoded uop cache is shared by the active threads.
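The capacity implied by the quoted organisation follows directly from its figures (32 sets, 8 ways, up to 6 uops per line). It is an upper bound, since lines are not always full:

```latex
\[
32~\text{sets} \times 8~\text{ways} = 256~\text{lines}, \qquad
256~\text{lines} \times 6~\tfrac{\text{uops}}{\text{line}} = 1536~\text{uops}
\]
```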

Troll Buster

July 06, 2017, 08:42:19 PM

Quote from: tspacepilot on July 06, 2017, 08:04:49 PM

Couldn't you, like, fix a few of the 'basic level silly choices' in order to strengthen your argument?

As far as I can tell you've been invited to offer improvements rather than just insults, but it seems that you chose to reply with further insults.

If, for some reason, you can't provide a patch but can provide some helpful discussion which might lead to improvements then it seems like you might need to alter your approach.

I'm not worshipping at anyone's "church" here, I'm just noticing the dynamic: you've been invited to prove the worth of your assumptions, but your reply doesn't seem to be headed in that direction.

By "you can't provide a patch" you mean things like the Intel sha256 patch I posted at the end?

LOL what kind of bullshit echo chamber is this? You guys are funny.

cr1776

July 06, 2017, 10:11:09 PM

Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i. I asked for specifics in the code about which you were referring - which should be easy to provide - and pointed out compilers are quite smart about optimizations, but without knowing what code you are referencing it is impossible to review.

Easy question: where is this "switching to --i instead of ++i" which you are speaking about? Just post a link to it on GitHub.

And regarding your "talentless coders talking about credentials" you again seem to have a huge chip on your shoulder. I spoke about my experience - when people come in and start insulting, attacking, denigrating with a lot of hand-waving and a big chip on their shoulder and no specifics, they are ignored (or not hired) in my experience at big (22000 plus people) and small organizations (3+). And rightly so. I think everyone would appreciate specifics instead of baseless, groundless, inaccurate attacks.

Without more detail no one can evaluate whether you are good at coding or just insulting.

Quote from: Troll Buster on July 06, 2017, 08:32:03 PM

Quote from: cr1776 on July 06, 2017, 07:55:40 PM

If greg wants to be treated with respect, he shouldn't begin and end a reply with insults.

This --i and ++i is basic stuff and you want to argue about it? wtf have you been doing for the past 30 years?

And it's not just the speed, it's the smaller machine code, which allows you to pack more code into the tiny L0 instruction cache and reduce cache misses, each of which still costs you 4 cycles when you re-fetch from L1 to L0.

It also means you can fit more code in that tiny 32 KB L1 instruction cache, so your other loops/threads can run faster by not being kicked out of the cache by other code. It also saves power on embedded systems.

This is what I was talking about, the world is flooded with "experts" with "30 years experience" and "50 alphabet soup titles" but still have absolutely no idea wtf actually happens inside a CPU.

Only talentless coders talk about credentials instead of the code.

This is not some super advanced stuff, this is entry level knowledge that's not even up for debate.
The information is everywhere, this took 1 second to find, look:

Quote

Here is more on the L0 and uops instruction cache:

Quote

Troll Buster

Newbie

Activity: 42
Merit: 0

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 06, 2017, 10:29:01 PM
Last edit: July 07, 2017, 08:29:36 AM by Troll Buster

#10

Quote from: cr1776 on July 06, 2017, 10:11:09 PM

Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i.

Yes you did, I even highlighted it:

Quote from: cr1776 on July 06, 2017, 07:55:40 PM

In open source projects, if you have something like your --i and ++i change, open a pull request or at minimum link to the specific code you are talking about. Most well-written, non-student compilers will handle cases like that and there will be no difference between things like ++i and i++ and the code generated except perhaps in a class that obfuscates the operation in some extremely obscure way. But, as I said, if it is that easy, please point out what you are talking about.

Stop talking bullshit.

Quote from: cr1776 on July 06, 2017, 10:11:09 PM

And regarding your "talentless coders talking about credentials" you again seem to have a huge chip on your shoulder. I spoke about my experience - when people come in and start insulting, attacking, denigrating with a lot of hand-waving and a big chip on their shoulder and no specifics, they are ignored (or not hired) in my experience at big (22000 plus people) and small organizations (3+). And rightly so. I think everyone would appreciate specifics instead of baseless, groundless, inaccurate attacks.

Here is a tip, if you don't want to be mocked, next time don't start an argument with:
"As someone who has 30 years of experience plus a BS in CS and CE, and an MS in CS (from top 10 US CS/CE programs)"

You walked in here knowing you had no idea wtf was going on inside a CPU, threw out a bunch of titles, made a bunch of false claims while making demands, and you want to talk about etiquette?

Your code sucks, everyone else is doing better, I showed you the proof, I pointed you in the right direction, take it or leave it.

You're a nothing burger with 50 stickers on it and I simply don't give a shit what you think.

gmaxwell

Moderator
Legendary

Activity: 4774
Merit: 10891

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 06, 2017, 10:58:09 PM
Last edit: July 09, 2017, 06:40:51 AM by gmaxwell

#11

Quote

And many people on the project quit because they didn't like working with you, what's your point?

Name one.

Quote

People have been laughing at your choices for years and here you are defending it because you wrote some codec to watch porn with higher fps some years ago.

Says the few days old account...

Quote

Quote from: gmaxwell on July 06, 2017, 11:28:18 AM

Everyone knows compression costs performance, it's for space efficiency, wtf are you even on about.

Most people's CPU is running idle most of the time, and SSD is still expensive.

So just use RocksDB, or just toss in an LZ4 lib, add an option in the config and let people with a decent CPU enable compression and save 20+ GB.

Reading failure on your part. The blocks are not in a database. Doing so would be very bad for performance. The chainstate is not meaningfully compressible beyond key sharing (and if it were, who would care, it's 2GBish). The chainstate is small and entirely about performance. In fact we just made it 10% larger or so in order to create a 25%-ish initial sync speedup.

If you care about how much space the blocks are using, turn on pruning and you'll save 140GB. LZ4 is a really inefficient way to compress blocks-- it mostly just exploits repeated pubkeys from address reuse.

The compact serialization we have is better (28% reduction), but it's not clear if it's worth the slowdown, especially since you can just prune and save a lot more.

Especially since if what you want is generic compression of block files you can simply use a filesystem that implements it... and it will helpfully compress all your other data, logs, etc.
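The point about LZ4 mostly exploiting repeated pubkeys can be illustrated with a minimal sketch. It assumes synthetic data only: 32 random bytes stand in for a hash and a 33-byte value for a compressed pubkey, and Python's zlib stands in for LZ4 (which is not in the standard library). The record layout and helper names are hypothetical and are not Bitcoin's real serialization.

```python
import os
import zlib


def synthetic_outputs(n: int, reuse_pubkey: bool) -> bytes:
    """Build n fake records: a 32-byte 'hash' followed by a 33-byte 'pubkey'.

    Hypothetical layout for illustration only; real transactions differ.
    """
    shared_key = b"\x02" + os.urandom(32)  # one pubkey reused by every record
    parts = []
    for _ in range(n):
        parts.append(os.urandom(32))  # random bytes behave like hash output
        parts.append(shared_key if reuse_pubkey else b"\x02" + os.urandom(32))
    return b"".join(parts)


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


unique = synthetic_outputs(2000, reuse_pubkey=False)
reused = synthetic_outputs(2000, reuse_pubkey=True)
print(f"unique pubkeys: {ratio(unique):.3f}")
print(f"reused pubkeys: {ratio(reused):.3f}")
```

With unique keys the output is about the same size as the input, because random hash-like bytes give a generic compressor nothing to work with. The saving in the second case comes almost entirely from the repeated key. Real block files also contain scripts, amounts and other structure, so these synthetic ratios say nothing precise about actual block savings.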

Quote

So what's your excuse for not making use of SSE/AVX/AVX2 and the Intel SHA extension? Aesthetics? Portability? Pfft.

There was an incomplete PR for that, it was something like a 5% performance difference for initial sync at the time; it would be somewhat more now due to other optimizations. Instead we spent more time eliminating redundant sha256 operations in the codebase, which got a lot more speedup than this final bit of optimization will. It's used in the fibre codebase without autodetection. Please feel free to finish up the autodetection for it. It's a perfect project for a new contributor. We also have a new AMD host so that x86_64 sha2 extensions can be tested on it.
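Autodetection of this kind usually means choosing an implementation once, at startup, and falling back to portable code when the CPU feature is missing. A minimal Python sketch of that dispatch step follows. It assumes Linux, where `/proc/cpuinfo` lists the x86 SHA extensions as the `sha_ni` flag. The function names are hypothetical, and both "implementations" delegate to `hashlib` because Python cannot call SHA-NI directly. A real version would be C++ using cpuid or compiler intrinsics, and this only shows the selection logic.

```python
import hashlib
from typing import Callable


def cpu_has_sha_ni(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Best-effort check for the 'sha_ni' flag Linux reports for x86 SHA extensions.

    Returns False when the file is missing (e.g. non-Linux) or unreadable.
    """
    try:
        with open(cpuinfo_path, encoding="utf-8") as f:
            for line in f:
                if line.startswith("flags"):
                    return "sha_ni" in line.split()
    except OSError:
        pass
    return False


def sha256_generic(data: bytes) -> bytes:
    # Stand-in for a portable implementation.
    return hashlib.sha256(data).digest()


def sha256_shani(data: bytes) -> bytes:
    # Hypothetical stand-in for an intrinsics-based implementation;
    # it delegates to hashlib so the result is identical.
    return hashlib.sha256(data).digest()


def select_sha256() -> Callable[[bytes], bytes]:
    """Pick an implementation once, based on detected CPU features."""
    return sha256_shani if cpu_has_sha_ni() else sha256_generic


sha256 = select_sha256()
print(sha256.__name__, sha256(b"abc").hex())
```

Selecting once and storing the result in a function pointer (here a module-level name) keeps the feature check out of the hot path. Every later call goes straight to the chosen implementation.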

Troll Buster

Newbie

Activity: 42
Merit: 0

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 06, 2017, 11:35:17 PM
Last edit: July 07, 2017, 12:07:08 AM by Troll Buster

#12

Quote from: gmaxwell on July 06, 2017, 10:58:09 PM

Quote

And many people on the project quit because they didn't like working with you, what's your point?

Name one.

How about the entire XT team for starters:

Quote

https://bulldozer00.com/2016/01/19/ready-and-waiting/

Because of the steadfast stubbornness of the Bitcoin Core software development team to refuse to raise the protocol’s maximum block size in a timely fashion, three software forks are ready and waiting in the wings to save the day if the Bitcoin network starts to become overly unreliable due to continuous growth in usage.

So, why would the Bitcoin Core team drag its feet for 2+ years on solving the seemingly simple maximum block issue? Perhaps it’s because some key members of the development team, most notably Greg Maxwell, are paid by a commercial company whose mission is to profit from championing side-chain technology: Blockstream.com. Greg Maxwell is a Blockstream co-founder.

Quote from: gmaxwell on July 06, 2017, 10:58:09 PM

Says the few days old account...

Right, if the logic doesn't work, just fall back to using registration date and post counts to establish authority.

Like the guy above you who claimed to have "30 years experience" while demonstrating less knowledge about CPU and compilers than a snot nosed newbie drone programmer.

Quote from: gmaxwell on July 06, 2017, 10:58:09 PM

At the time I didn't even know you guys were stupid enough to not compress the 150G of blocks, until someone reminded me in that thread. Seriously what is the point leaving blocks from 2009 uncompressed? SSD is cheap these days but not that cheap.

Quote from: gmaxwell on July 06, 2017, 10:58:09 PM

If you care about how much space the blocks are using, turn on pruning and you'll save 140GB.

So after all the talk about your l33t porn codec skills, your solution to save space is to just prune the blocks? LOL. You might as well say "Just run a thin wallet".

Why do you think compression experts around the world invented algorithms like LZ4? Why do you think it's part of ZFS? Because it is fast enough and it works, it is simple proven tech used by millions of low-power NAS boxes around the world for years.

Quote from: gmaxwell on July 06, 2017, 10:58:09 PM

There is a PR for that, it was something like a 5% performance difference for initial sync at the time; it would be somewhat more now due to other optimizations. It's used in the fibre codebase without autodetection. Please feel free to finish up the autodetection for it.

I would have made patches a long time ago if the whole project wasn't already rotten to the core.

I see you just added this part:

Quote from: gmaxwell on July 06, 2017, 10:58:09 PM

LZ4 is a really inefficient way to compress blocks-- it mostly just exploits repeated pubkeys from address reuse. The compact serialization we have is better (28% reduction), but it's not clear if it's worth the slowdown, especially since you can just prune and save a lot more.

Here, there are over 100 compression algorithms, all invented and benchmarked for you.
You'll easily find one that has a size/speed/mem profile that just happens to work great on bitcoin block files and is better than LZ4.

Just pick ONE.

Quote

http://mattmahoney.net/dc/text.html
Large Text Compression Benchmark

Program           Options                enwik8 (bytes)
----------------  ---------------------  --------------
cmix v13                                     15,323,969
durilca'kingsize  -m13000 -o40 -t2           16,209,167
paq8pxd_v18       -s15                       16,345,626
paq8hp12any       -8                         16,230,028
drt|emma 0.1.22                              16,679,420
zpaq 6.42         -m s10.0.5fmax6            17,855,729
drt|lpaq9m        9                          17,964,751
mcm 0.83          -x11                       18,233,295
nanozip 0.09a     -cc -m32g -p1 -t1 -nm      18,594,163
cmv 00.01.01      -m2,3,0x03ed7dfb           18,122,372

cr1776

Legendary

Activity: 4662
Merit: 1378

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 06, 2017, 11:41:07 PM

#13

How about just posting a link (as I've asked 3 times now) to where you advocate "switching to --i instead of ++i"?

I am quite aware of what "goes on inside a CPU" and have actually done several CPU designs. Although I think you need to drop the "Buster" since you are just trolling us.

Quote from: Troll Buster on July 06, 2017, 10:29:01 PM

Quote from: cr1776 on July 06, 2017, 10:11:09 PM

Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i.

Yes you did, I even highlighted it:

Quote from: cr1776 on July 06, 2017, 07:55:40 PM

Most well-written, non-student compilers will handle cases like that and there will be no difference between things like ++i and i++ and the code generated except perhaps in a class that obfuscates the operation in some extremely obscure way.

Well, technically you posted ++i and i++, but this whole time I've been talking about ++i and --i; that was what you were responding to. You stated that compilers can handle everything. They can't, and that's entry-level knowledge.

Quote from: cr1776 on July 06, 2017, 10:11:09 PM

gmaxwell

Moderator
Legendary

Activity: 4774
Merit: 10891

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 07, 2017, 03:24:09 AM
Last edit: July 07, 2017, 07:51:39 PM by gmaxwell

#14

Quote from: Troll Buster on July 06, 2017, 11:35:17 PM

XT team for starters:

Fun fact: Mike Hearn contributed a grand total of something like 6 relatively minor pull requests-- most just changing strings. It's popular disinformation that he was some kind of major contributor to the project. Several of his changes that weren't string changes introduced remote vulnerabilities (but fortunately we caught them with review.)

Quote

Right, if the logic doesn't work, just fall back to using registration date and post counts to establish authority.

Yes, I've been using Bitcoin pretty much its entire life and I can easily demonstrate it. My expertise is well established, why is it that you won't show us yours though you claim to be so vastly more skilled than everyone here?

Quote

From 2009? ... you know that the blocks are not accessed at all, except by new peers that read all of them, right? They're not really accessed any less than blocks from 6 months ago. (They're also pretty much completely incompressible with LZ4, since unlike modern blocks they're not full of reused addresses.)

As to why? Because a 10% decrease in size isn't all that interesting, especially at the cost of making fetching blocks for Bloom-filtered lite nodes much more CPU-intensive, as that's already a DoS vector.

[Edit: dooglus points out the very earliest blocks are actually fairly compressible presumably because they consist of nothing but coinbase transactions which have a huge wad of zeros in them.]

Quote

So after all the talk about your l33t porn codec skills, your solution to save space is to just prune the blocks? LOL. You might as well say "Just run a thin wallet".

Uh, sounds like you're misinformed on this too: Pruning makes absolutely no change in the security, privacy, or behavior of your node other than that you no longer help new nodes do their initial sync/scanning. Outside of those narrow things a pruned node is completely indistinguishable. And instead of only reducing the storage 10%, it reduces it 99%.

Quote

Why do you think compression experts around the world invented algorithms like LZ4? Why do you think it's part of ZFS? Because it is fast enough and it works, it is simple proven tech used by millions of low-power NAS boxes around the world for years.

Here, there are over 100 compression algorithms, all invented and benchmarked for you.
You'll easily find one that has a size/speed/mem profile that just happens to work great on bitcoin block files and is better than LZ4.

LZ4 is fine stuff, but it isn't the right tool for Bitcoin: almost all the data in Bitcoin is cryptographic hashes, which are entirely incompressible. This is why a simple change to more efficient serialization can get over 28% reduction while your LZ4 only gets 10%. As far as other things-- no we won't: block data is not like ordinary documents and traditional compressors don't do very much with it.

(And as an aside, every one of the items in your list is exceptionally slow. lol, for example I believe the top item in it takes about 12 hours to decompress its 15MB enwik8 file. heh way to show off your ninja recommendation skills)

If you'd like to work on compression, I can point you to the compacted serialization spec that gets close to 30%... but if you think you're going to use one of the paq/ppm compressors ... well, hope you've got a fast computer.
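The difference between re-encoding fields and running a generic compressor can be sketched with variable-length integers. Fixed 8-byte amounts shrink without any compressor, while random 32-byte hashes do not shrink with or without one. This uses a generic LEB128-style varint purely as an illustration. It is not the compacted serialization spec mentioned above and not Bitcoin Core's own VARINT format, and the amounts are made-up values.

```python
import os
import struct
import zlib
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """LEB128-style unsigned varint: 7 bits per byte, high bit set means 'more follows'."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# Made-up output amounts in satoshis, for illustration only.
amounts = [5_000_000_000, 100_000, 25_000, 1_234_567, 546]
fixed = b"".join(struct.pack("<Q", a) for a in amounts)  # 8 bytes each, little-endian
compact = b"".join(encode_varint(a) for a in amounts)
print(len(fixed), len(compact))  # 40 16

hashes = os.urandom(32 * 100)  # hash-like data
print(len(hashes), len(zlib.compress(hashes, 9)))  # compressed size is not smaller
```

The varint saving comes from knowing the structure (most amounts do not need all 64 bits). A byte-level compressor cannot recover that structure reliably from hash-heavy data.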

Quote

I would have made patches a long time ago if the whole project wasn't already rotten to the core.

Can you show us a non-trivial patch you made to any other project anywhere?

Troll Buster

Newbie

Activity: 42
Merit: 0

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 07, 2017, 04:34:47 AM
Last edit: July 07, 2017, 08:35:02 AM by Troll Buster

#15

Quote

Irrelevant.
You challenged me to find one person who quit because of you. I gave you a whole team.
Here is another team, the Bitcoin Classic team, they left for similar reasons.

Quote

I didn't claim anything about myself.
Your code sucks, you said nope they're great, so I showed you where, I showed you how to improve it.
Then you went all "Says the few days old account", "i spent years on a porn codec" ego authority bullshit.
I don't care what you think about yourself or me.
Stick to the tech or stfu.

Quote

So now you're going to use new peers as an excuse to not compress the blocks?

That is so stupid.

When compression is enabled and a new peer requests an old block,
just send him the entire compressed block as is and let him process it.
It'll actually save bandwidth and download time.

Just add the compression feature and setting.
Some users would like to save 20 GB on their SSD by changing a 0 to 1, some wouldn't.
Just add the feature and move on, what's so complicated.
Compression is standard stuff, don't argue over stupid shit.

Quote

Who said anything about security or privacy?
To suggest pruning over simple compression was silly enough.
One minute you go all "My expertise is well established"
Next minute you talk total nonsense.
It's like amateur hour.

Quote

LZ4 is fine stuff, but it isn't the right tool for Bitcoin: almost all the data in Bitcoin is cryptographic hashes, which are entirely incompressible. This is why a simple change to more efficient serialization can get over 28% reduction while your LZ4 only gets 10%. As far as other things-- no we won't: block data is not like ordinary documents and traditional compressors don't do very much with it.

(And as an aside, every one of the items in your list is exceptionally slow. lol, for example I believe the top item in it takes about 12 hours to decompress its 15MB enwik8 file. heh way to show off your ninja recommendation skills)

If you'd like to work on compression, I can point you to the compacted serialization spec that gets close to 30%... but if you think you're going to use one of the paq/ppm compressors ... well, hope you've got a fast computer.

Look, here is the bottom line.
Compression is a common feature used everywhere for decades.
It's not some new high tech secret, why talk so much bullshit making it sound so complicated.

The point is you're already a few years late.
10%, 20%, 30%, Lz4, not Lz4, who gives a shit, in the end it's a space/time trade off.
If you can't decide what settings to use, just offer 3 settings, low/medium/high.
If you can't decide which algorithm to use, let users choose 1 out of 3 algorithms, give users the choice.
Compression is simple, libs and examples are everywhere, just figure it out.
Stop giving stupid excuses and stop mumbling irrelevant bullshit.
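The suggestion above (an off switch plus low/medium/high levels) can be sketched in a few lines. This is a minimal illustration, not anything in Bitcoin Core. zlib stands in for LZ4, and the setting names, the one-byte flag format, and the `store`/`load` helpers are all hypothetical. It does not address the CPU cost of serving blocks to peers raised earlier in the thread.

```python
import zlib

# Hypothetical mapping from a user-facing setting to zlib compression levels.
LEVELS = {"off": None, "low": 1, "medium": 6, "high": 9}

FLAG_RAW = b"\x00"
FLAG_ZLIB = b"\x01"


def store(record: bytes, setting: str) -> bytes:
    """Prefix each record with a one-byte flag so raw and compressed records can coexist."""
    level = LEVELS[setting]
    if level is None:
        return FLAG_RAW + record
    packed = zlib.compress(record, level)
    # Keep the raw form when compression would not save space (e.g. hash-heavy data).
    if len(packed) >= len(record):
        return FLAG_RAW + record
    return FLAG_ZLIB + packed


def load(stored: bytes) -> bytes:
    """Reverse store(), dispatching on the flag byte."""
    flag, body = stored[:1], stored[1:]
    if flag == FLAG_RAW:
        return body
    if flag == FLAG_ZLIB:
        return zlib.decompress(body)
    raise ValueError("unknown flag byte")


block = b"example block payload " * 100
for setting in LEVELS:
    stored = store(block, setting)
    assert load(stored) == block  # every setting must round-trip
    print(setting, len(stored))
```

The flag byte lets records written with compression off and on live in the same file. The raw fallback keeps incompressible records, such as the hash-heavy data discussed in the thread, from growing.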

Quote

Can you show us a non-trivial patch you made to any other project anywhere?

Like they said, "I could tell you but then I'd have to kill you."
Too much hassle.

tspacepilot

Legendary

Activity: 1456
Merit: 1085

I may write code in exchange for bitcoins.

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 07, 2017, 06:49:24 AM

#16

Quote from: Troll Buster on July 06, 2017, 11:35:17 PM

Quote from: gmaxwell on July 06, 2017, 10:58:09 PM

I would have made patches a long time ago if the whole project wasn't already rotten to the core.

So here you're just admitting that you're only here to troll?

Quote from: Troll Buster on July 07, 2017, 04:34:47 AM

Like they said, "I could tell you but then I'd have to kill you."
Too much hassle.

There was this other thing that they said, something about talk being cheap. Then there was another one I heard once that went something like 'put up or shut up'. Maybe those are relevant here.

At this point it's pretty clear to me that Troll Buster is just here to spew bile. It's really striking how puffed up he is about his skills and badassery and then when someone asks him to point to a project he's worked on or generally to prove his talk with something more than a google search his reply is all 'hey, look over there!'

I'll keep watching this thread because amongst all the chest thumping are some interesting technical details, but I think we can go ahead and recognize that Troll Buster isn't going to be contributing anything more than the chest thumping.

Troll Buster

Newbie

Activity: 42
Merit: 0

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 07, 2017, 07:29:17 AM

#17

Quote from: tspacepilot on July 07, 2017, 06:49:24 AM

but I think we can go ahead and recognize that Troll Buster isn't going to be contributing anything more than the chest thumping.

Shit, you mean I could end up just like you?

wiffwaff

Newbie

Activity: 6
Merit: 0

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 07, 2017, 08:49:14 AM

#18

Quote from: tspacepilot on July 07, 2017, 06:49:24 AM

Troll Buster is pointing out poor decisions that can be improved upon and people here are trying to find something of theirs to be able to bash. This is typical Core tactics, whereby they fail to address the issue being highlighted and instead attempt to launch personal attacks on the person stepping forward.

This is exactly why bitcoin development fragmented under the fifth column attacks that forced out the best and brightest, leaving us with the cesspit we have today.

Go on, fire up some BIP148 hashing power. I double-dare you.

Last of the V8s (OP)

Legendary

Activity: 1652
Merit: 4402

Be a bank

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 07, 2017, 09:04:29 AM

#19

Quote from: wiffwaff on July 07, 2017, 08:49:14 AM

Quote from: tspacepilot on July 07, 2017, 06:49:24 AM

https://bitcointalk.org/index.php?action=profile;u=221980;sa=showPosts

Quote from: wiffwaff on December 28, 2016, 10:00:48 AM

Quote

What is the benefit of bitcoin core?

Bitcoin Core will signal support for and recognise SegWit enabled blocks, amongst other additions. Depending on your stance in the max block size issue, you might like to consider using one of the other many bitcoin clients such as https://www.bitcoinunlimited.info/ which supports bigger maximum block sizes as a solution to the full blocks we currently are experiencing.

Gosh, Roger Ver/fake Satoshi Craig Wright, you forked out for quite an old account.
Frightened?

https://i.imgur.com/UIm67kh.jpg

tspacepilot

Legendary

Activity: 1456
Merit: 1085

I may write code in exchange for bitcoins.

Re: Some 'technical commentary' about Core code esp. hardware utilisation

July 07, 2017, 03:35:32 PM

#20

Quote from: wiffwaff on July 07, 2017, 08:49:14 AM

Quote from: tspacepilot on July 07, 2017, 06:49:24 AM

My reading was that Troll Buster's points were all replied to by gmaxwell. Once you cut away all of the "stupid", "worthless", etc, invective there were a few criticisms in there and I'm pretty sure they were addressed. Then, when Mr. Buster continued with the "you're all so stupid" style posts, I think people naturally respond with "ok, can you, like, show us why you're so smart", to which Mr. Buster sorta ran away screaming:

Quote

Like they said, "I could tell you but then I'd have to kill you."
Too much hassle.

Quote

This is exactly why bitcoin development fragmented under the fifth column attacks that forced out the best and brightest, leaving us with the cesspit we have today.

Go on, fire up some BIP148 hashing power. I double-dare you.

Hardly seems unreasonable to reply with a technical answer to the few bits of detail in Troll Buster's post (which gmaxwell did) and then to address his invective and screaming by asking him to show something more productive than insults.
