Parallelism in block verification [subthread from:"Are there any benchmark ...?]

aliashraf (OP)

Legendary

Offline

Activity: 1456
Merit: 1174

Always remember the cause!

Parallelism in block verification [subthread from:"Are there any benchmark ...?]

May 09, 2021, 01:42:51 PM
Last edit: May 10, 2021, 05:56:50 PM by aliashraf

#1

Quote from: gmaxwell on May 06, 2021, 03:07:48 AM

Quote from: NotATether on May 05, 2021, 01:16:30 PM

Transaction verification is single threaded

Hasn't been since 2012.

I'm afraid, this PR wasn't helpful enough for utilizing modern multicore CPUs:

Firstly it is NOT about parallel transaction verification, instead it postpones script verification by queuing it for a future multi thread processing which is called right before ending the block verification that is done single threaded. This thread joins to the final multi-thread script processing.
The logic behind such a sophisticated scheme apparently has something to do with the complexities produced by transaction packages where we have transactions with fresh inputs that are outputs of other transactions in the same block. I've proposed an algorithm for handling this situation, BTW.

Secondly, it uses cs_main lock excessively, keeping the lock constantly active through the script verification itself, hence leaving little room for parallelism to have a scalable impact on the block verification process. In practice, it has been experimentally shown (and even discussed in the PR 2060 related comments) that the overhead of the scheme rapidly outperforms gains once it is configured for utilizing more threads than the actual CPU cores and it stays around 30% improvement with 4 Cores, not an encouraging result and far from scalable behavior.

gmaxwell

Moderator
Legendary

Offline

Activity: 4158
Merit: 8382

Are there any benchmark about Bitcoin full node client resource usage?

May 09, 2021, 02:44:55 PM
Last edit: May 10, 2021, 08:18:47 PM by gmaxwell

Merited by Welsh (6)

#2

Quote from: aliashraf on May 09, 2021, 01:42:51 PM

I'm afraid, this PR wasn't helpful enough for utilizing modern multicore CPUs:
[...] In practice, it has been experimentally shown (and even discussed in the PR 2060 related comments) that the overhead of the scheme rapidly outperforms gains once it is configured for utilizing more threads than the actual CPU cores and it stays around 30%

Of course something stops getting more gains if it uses more threads than CPU cores. There are only so many cores and once you're using them entirely adding more threads just adds overheads.

The claim that transaction validation is single threaded is just false and that is all my reply was pointing out. That PR is from 2012-- I linked it to show just how long tx processing has been in parallel, there have been subsequent changes that further improved the performance further.

Quote

Firstly it is NOT about parallel transaction verification, instead it postpones script verification by queuing it for a future multi thread processing which is called right before ending the block verification that is done single threaded. This thread joins to the final multi-thread script processing.
The logic behind such a sophisticated scheme apparently has something to do with the complexities produced by transaction packages where we have transactions with fresh inputs that are outputs of other transactions in the same block.

Stop engaging in abusive technobabbling. There is no interaction with "transaction packages" or anything of the sort there.

The verification of transactions in blocks runs in parallel with everything else. One thread loads transactions from the block into a queue of transactions that need to be validated, other threads pull transactions from the queue and validate them. When the main thread is done loading the queue, it too joins into the validation which has been in progress the whole time. There is nothing particularly fancy about this.

Quote

Secondly, it uses cs_main lock excessively, keeping the lock constantly active through the script verification itself, hence leaving little room for parallelism to have a scalable impact on the block verification process.

All you're doing is copying from the PR summary, but it doesn't even sound like you understand what the words mean. What the PR is saying is that it holds cs_main until the block finished validating, so that the user never sees a incompletely validated block. When blocks contain only a few inputs this limits parallelism, but once blocks have a couple times more inputs than you have cores it achieves full parallelism. Remember again, this was written in 2012 when most of the chain had only a few transactions per block, so the fact that it was parallel for only a block at a time was a limitation then-- it's not much of a limitation now. Today a typical block contains over 6000 inputs, so there is ample parallelism even when limited to a single block at a time.

Of course, there is always room for improvement. But one improves things by actually understanding them, writing code, and testing it. Not by slinging verbal abuse on a forum in a desperate attempt to impress the peanut gallery, so I guess you wouldn't know anything about that. Cheesy

aliashraf (OP)

Legendary

Offline

Activity: 1456
Merit: 1174

Always remember the cause!

Are there any benchmark about Bitcoin full node client resource usage?

May 09, 2021, 05:36:45 PM

#3

Quote from: gmaxwell on May 09, 2021, 02:44:55 PM

Quote from: aliashraf on May 09, 2021, 01:42:51 PM

Firstly it is NOT about parallel transaction verification, instead it postpones script verification by queuing it for a future multi thread processing which is called right before ending the block verification that is done single threaded. This thread joins to the final multi-thread script processing.
The logic behind such a sophisticated scheme apparently has something to do with the complexities produced by transaction packages where we have transactions with fresh inputs that are outputs of other transactions in the same block.

Stop engaging in abusive technobabbling. There is no interaction with "transaction packages" or anything of the sort there.

You better stop talking nonsense, I believe ...
Your comments are misleading and non-constructive. My comments are not "abusive" by any measure.
Your claim about transaction packages" not being relevant here is totally false and laughable just like what you are saying below:

Quote

The verification of transactions runs in parallel with everything else. One thread loads transactions from the block into a queue of transactions that need to be validated, other threads pull transactions from the queue and validate them. When the main thread is done loading the queue, it too joins into the validation which has been in progress the whole time. There is nothing particularly fancy about this.

Absolutely false and misleading. The code does not "load" transactions to "queue of transactions" you are deliberately misrepresenting the code for some mysterious purpose that I don't understand.

Now let's clean your mess:

In the real world, the real bitcoin core client, does NOT validate transactions in parallel, because in bitcoin there is a possibility for transactions to be chained in a single block, forming a transaction package, hence you can't simply "dispatch" txns of a block between threads waiting for them to join, it is why block verification was implemented single thread.

PR 2060 was an improvement, based on "deferring" the CPU intensive part of the task, i.e. script verification by queuing this part for future parallel processing.

It was good but not a complete scaling solution because the main block validation process remaining single thread, occasionally waits for UTXO checks, so, we don't get linear improvement in terms of block processing times with installing more cpus/cores. Period.

Quote

When blocks contain only a few inputs this limits parallelism, but once blocks have a couple times more inputs than you have cores it achieves full parallelism.

Absolutely false and misleading. I have not checked, but I suppose in 2012 most blocks had ways more than "a couple times more inputs" than an average node's cpu cores.

Quote

Of course, there is always room for improvement. But one improves things by actually understanding them, writing code, and testing it. Not by slinging verbal abuse on a forum in a desperate attempt to impress the peanut gallery, so I guess you wouldn't know anything about that. Cheesy

Come on

This is btctalk, not github, and it is very inappropriate asking people to write code or STFU. Roll Eyes

Reading comments like this, I'm realizing more and more that bitcoin is overtaken by junior programmers who do not miss any occasion for bragging with their if-then-else stuff just like what children do.

gmaxwell

Moderator
Legendary

Offline

Activity: 4158
Merit: 8382

Re: Are there any benchmark about Bitcoin full node client resource usage?

May 09, 2021, 06:53:14 PM
Last edit: May 09, 2021, 07:13:19 PM by gmaxwell

#4

Quote from: aliashraf on May 09, 2021, 05:36:45 PM

Your claim about transaction packages" not being relevant here is totally false and laughable

There is no such thing as 'transaction packages' in blocks from the perspective of validation.

Packages are a concept used in selecting transactions to mine so that low-fee ancestor transactions will be selected in order to mine their high fee children. They don't exist elsewhere.

Quote

The verification of transactions runs in parallel with everything else. One thread loads transactions from the block into a queue of transactions that need to be validated, other threads pull transactions from the queue and validate them. When the main thread is done loading the queue, it too joins into the validation which has been in progress the whole time. There is nothing particularly fancy about this.

Absolutely false and misleading. The code does not "load" transactions to "queue of transactions" you are deliberately misrepresenting the code for some mysterious purpose that I don't understand.

Sure it does. That is exactly how it works. The validation loop iterates over each transaction and each input in each transaction, and for each one it loads it into a queue. Concurrently, background threads take work from the queue to validate it.

When the master thread is done loading work it also joins the processing until the queue is empty (if it isn't already empty by the time it gets there).

Quote

In the real world, the real bitcoin core client, does NOT validate transactions in parallel,

It does. Since you're non-technical I don't expect you to read the code to check for yourself, but you can simply run reindex on a node with -assumevalid set to false. You'll see that once it gets up to 2014 or so that it is using all the cores.

Quote

because in bitcoin there is a possibility for transactions to be chained in a single block, forming a transaction package, hence you can't simply "dispatch" txns of a block between threads waiting for them to join, it is why block verification was implemented single thread.

Transactions in a block are required by the consensus rules to be topologically ordered. That means that all the ancestors of a transaction come first. There is no concept of a 'package' in a block.

When the validation is iterating through the block to load the validation queues it saves the new outputs created by each (as of yet unvalidated) transaction, so that they're available when dispatching the work off for any future transactions that consume them. They don't have to be validated before other transactions can consume them, because if there is any invalidity anywhere the whole block will be invalid.

So you can have e.g. an invalid TxA whos outputs are spent by valid TxB whos outputs are spent by valid txC, and its perfectly fine that the validation accepts TxB and TxC before later detecting that TxA is invalid. A's invalidity will trigger the rejection of the block.

Extracting the outputs for other transactions to use does require a linear pass through the transactions but it's fairly inexpensive and doesn't require any validation. It is required in any case because the block serialization can only be decoded in-order too (because you can't tell where transaction 2 begins until you've parsed transaction 1-- the format doesn't have explicit lengths).

Similarly, checking if a UTXO has already been consumed is also inherently somewhat sequential (e.g. consider when the 5th and 50th txn both spend the same input), but these checks are cheap. Often attempts to make processes like that more parallel just slow them down because of the synchronization overheads. That's why it's not possible to give much in the way of useful parallelism advice without testing.

Quote

PR 2060 was an improvement, based on "deferring" the CPU intensive part of the task, i.e. script verification by queuing this part for future parallel processing.

It's not deferred, it's queued to different threads and run in parallel.

Quote

It was good but not a complete scaling solution because the main block validation process remaining single thread, occasionally waits for UTXO checks, so, we don't get linear improvement in terms of block processing times with installing more cpus/cores. Period.

You essentially never get linear improvement with more cores, as memory bandwidth and disk bandwidth don't scale with them and there are synchronization overheads. ... so that statement is essentially empty.

You've now moved the goal post entirely. In the original thread NotATether misadvised ETFbitcoin that to benchmark validation he only needed to use a single core because 'verification is single threaded'. I stepped in to point out that it's been parallel since 2012. Benchmarking just a single core isn't representative as a result.

Your reply-- basically doubling down and defending misinformation you spread in the past-- is essentially off-topic. If you want to argue that the parallelism isn't perfect, sure it's not-- it pretty much never is except for embarrassingly parallel tasks. And surely there are opportunities to improve it but they're not likely to be correctly identified from the armchair of someone who hasn't tried implementing them or benchmarking them. Your response is full of criticisms essentially copy and pasted from the benchmarks and authors comments about the limitations back in 2012, but the code in Bitcoin has continued to advance since 2012.

But absolutely none of that has anything to do with a person saying that its single threaded (it isn't) so you can just benchmark on a single core (you can't, at least not with representative results).

Ironically, if the parallelism actually were perfect then benchmarking on a single thread would again be a more reasonable thing to do (because you could just multiply up the performance).

Quote

When blocks contain only a few inputs this limits parallelism, but once blocks have a couple times more inputs than you have cores it achieves full parallelism.

Absolutely false and misleading. I have not checked, but I suppose in 2012 most blocks had ways more than "a couple times more inputs" than an average node's cpu cores.

In 2012 most blocks were created in 2010 or before and had few transactions at all. Today most blocks were created in 2015 or later and the nearly empty blocks early in the chain are a rounding error in the sync time, and so most blocks have a great many inputs and make good use of multiple cores. Could it be even better? Almost certainly, though many things have been tried and found to not pay off... and other things have turned out to be so complicated that people haven't gotten far enough to benchmark them to see if they'll help yet. I wouldn't be too surprised to learn that there was more speedup to be had from reducing synchronization overheads at the *expense* of parallelism-- esp since as cpu performance increases the relative cost of synchronization tends to increase (mostly because memory bandwidth has increased slower than cpu speeds have).

Regardless, my comment that it is parallel and has been since 2012 in no way suggests that the parallelism is magically perfect.

Your interactions have been cited to me multiple times by actual experts as to why they don't post here entirely. It really makes it unenjoyable to post here to know that so predictably a simple correction of a factual error with a cite will generate a rabid gish-gallop substantially off-topic response defending some bit of misleading info some poster had previously provided. It makes the forum worse for everyone and there is just no need for it.

Quote

Reading comments like this, I'm realizing more and more that bitcoin is overtaken by junior programmers who do not miss any occasion for bragging with their

Essentially no one who programs bitcoin comments in this subforum anymore when in the past almost all of them did. I believe the only remaining person is Achow and he comments fairly infrequently. Responses like your make them not bother.

aliashraf (OP)

Legendary

Offline

Activity: 1456
Merit: 1174

Always remember the cause!

Re: Are there any benchmark about Bitcoin full node client resource usage?

May 09, 2021, 08:46:01 PM

#5

Quote from: gmaxwell on May 09, 2021, 06:53:14 PM

Quote from: aliashraf on May 09, 2021, 05:36:45 PM

Your claim about transaction packages" not being relevant here is totally false and laughable

There is no such thing as 'transaction packages' in blocks from the perspective of validation.

Packages are a concept used in selecting transactions to mine so that low-fee ancestor transactions will be selected in order to mine their high fee children. They don't exist elsewhere.

Besides the fact that throughout the source code, in the comments, transactions with dependent inputs are referenced using this term, I tried but didn't find any other term to use for such "packaged" transactions when they are present in a block, it'll be highly appreciated if you would give me one.

Quote

The verification of transactions runs in parallel with everything else. One thread loads transactions from the block into a queue of transactions that need to be validated, other threads pull transactions from the queue and validate them. When the main thread is done loading the queue, it too joins into the validation which has been in progress the whole time. There is nothing particularly fancy about this.

Absolutely false and misleading. The code does not "load" transactions to "queue of transactions" you are deliberately misrepresenting the code for some mysterious purpose that I don't understand.

Sure it does. That is exactly how it works. The validation loop iterates over each transaction and each input in each tran saction, and for each one it loads it into a queue. Concurrently, background threads take work from the queue to validate it.

To what "it" refers?
The transaction?
No! The input script of the transaction! And it is the point:

For queuing the transaction input scripts, the code checks for the existence of the input in the UTXO set. Doesn't it?
So, it is not about parallel transaction processing because other than their input scripts, transaction are processed sequentially.

Quote

In the real world, the real bitcoin core client, does NOT validate transactions in parallel,

It does. Since you're non-technical I don't expect you to read the code to check for yourself, but you can simply run reindex on a node with -assumevalid set to false. You'll see that once it gets up to 2014 or so that it is using all the cores.

No, it doesn't. I've read the code, actually I'm "technical" enough to understand that it is queuing the script processing part and there may be room for making it better ...
Comparatively your take from the code is, in the best sense, somehow "raw" because you think it is the transaction that is getting queued but it is not.
BTW, I don't get it, how "using all the core" is enough for justifying your idea.

Quote

because in bitcoin there is a possibility for transactions to be chained in a single block, forming a transaction package, hence you can't simply "dispatch" txns of a block between threads waiting for them to join, it is why block verification was implemented single thread.

Transactions in a block are required by the consensus rules to be topologically ordered. That means that all the ancestors of a transaction come first. There is no concept of a 'package' in a block.

When the validation is iterating through the block to load the validation queues it saves the new outputs created by each (as of yet unvalidated) transaction, so that they're available when dispatching the work off for any future transactions that consume them. They don't have to be validated before other transactions can consume them, because if there is any invalidity anywhere the whole block will be invalid.

So you can have e.g. an invalid TxA whos outputs are spent by valid TxB whos outputs are spent by valid txC, and its perfectly fine that the validation accepts TxB and TxC before later detecting that TxA is invalid. A's invalidity will trigger the rejection of the block.

Extracting the outputs for other transactions to use does require a linear pass through the transactions but it's fairly inexpensive and doesn't require any validation. It is required in any case because the block serialization can only be decoded in-order too (because you can't tell where transaction 2 begins until you've parsed transaction 1-- the format doesn't have explicit lengths).

Similarly, checking if a UTXO has already been consumed is also inherently somewhat sequential (e.g. consider when the 5th and 50th txn both spend the same input), but these checks are cheap. Often attempts to make processes like that more parallel just slow them down because of the synchronization overheads. That's why it's not possible to give much in the way of useful parallelism advice without testing.

Good. Very good. Now you are talking, and I'm not against any point you made here, and I didn't say a word contradicting it!

And it is the problem, I suppose:
You get offended by any comment, no matter what! o
Few weeks ago someone in this forum told something about block verification in bitcoin being sequential because of "packaged transactions" (take it easy). I proposed an algorithm for this and the discussion went on in other directions ...
Later, I checked the code and noted the queuing thing and input scripts being subject to multi-threading instead of the whole transaction asking myself why not going after the whole txn? ...
Additionally, I noted the excessive use of locks and became concerned about their impact on parallelism.
Put these alongside a true story: being stuck with a week long synchronization nightmare just few days ago Wink

Then I saw your comment on the other topic and felt it is appropriate to remind the partial nature of the way parallelism has been implemented rather than leaving people confused with a simple sentence: Transaction validation is multi-threaded!

My impression about the way PR2060 is implemented was and still is that there is a concern about the "packaged transactions" (come on, t is good term for the purpose) and I thought it is adequate to share.

But when it comes to Greg Maxwell, things get super sensitive. No?
You think every point is made for a purpose, a bad purpose probably, ...

It is not true. As an engineer and a researcher, I can't, and I won't blindly follow every single verdict BIG Maxwell issues, don't feel offended about it, because at the same time I have deep respect for you and other guys who have been serving on the board for a decade,

Honestly, this discussion didn't add anything to my knowledge about the code and bitcoin but I' sure many readers didn't know how exactly things get done in the validation process and found it useful and if you could keep it more informative rather than "challenging" it'd be even more useful.

gmaxwell

Moderator
Legendary

Offline

Activity: 4158
Merit: 8382

Re: Are there any benchmark about Bitcoin full node client resource usage?

May 09, 2021, 09:41:25 PM
Last edit: May 10, 2021, 08:29:24 PM by gmaxwell

#6

Quote from: aliashraf on May 09, 2021, 08:46:01 PM

Besides the fact that throughout the source code, in the comments, transactions with dependent inputs are referenced using this term, I tried but didn't find any other term to use for such "packaged" transactions when they are present in a block, it'll be highly appreciated if you would give me one.

100% of the use of the word is in the context of the mempool and tx selection. As mentioned 'packages' were introduced as part of child-pays-for-parent mining. Blocks have always contained transactions whos parents were in the same block from day one, the package concept in the mempool was introduced so that it could include high feerate children when the parents wouldn't otherwise make it into the blocks.

There are simply no packages in blocks or in block validation. The term you want is just *transaction*, maybe child transactions, or "transactions with unconfirmed parents". There really isn't a term for what you're looking for because supporting the case of child txn in the same blocks just works in Bitcoin with no special additional code, so it's never been a separate freestanding thing that needed a name.

Calling that packages would be inappropriate because it suggests they're in some way grouped or something-- but they're not. A dependent transaction can be anywhere in the block so long as it's after its parents.

Quote

So, it is not about parallel transaction processing because other than their input scripts, transaction are processed sequentially.

The validation of a transaction is almost exclusively the validation of its inputs. Outputs are only checked for size and total amount, nlocktime is an integer comparison, as is the maximum weight. For these kinds of checks bouncing the cacheline over to another thread would take more time than the check.

Unless the signatures are cached the signatures take almost all the time. And when the signatures are cached the inputs almost always are (unless the coinscache was recently flushed), in cases when everything is cached memory bandwidth (e.g. against sync overheads and copying) is probably the limiting factor... particularly because the validity of entire transactions is also cached and if those lookups are all cache hits then the only validation that happens is nlocktimes and checking for conflicting spends, which is absurdly fast. If you look at the bench numbers I posted in the other thread you'll see that with cold caches validation took 725ms (and that was on an 18 core system, though POWER so its threading is higher overhead) while the hot cache example was 73ms. So clearly performance considerations are already dominated by worst cases, not average cases as they're already MUCH faster than the worst case.

It could queue the entire transactions rather than the individual inputs but this would achieve worse parallelism e.g. because a block pretty reliably will have >6000 inputs these days, but may have anywhere between 500 and 2000 txn just the same. At least at the time that was written working tx at a time was slower per my chat logs. That might not be true anymore because more of the workload is full blocks, because more recent cpus suffer more from overhead, etc. This was the sort of thing I was referring to in my prior post when I said that less parallelism might turn out to be faster now, due to overheads.

Quote

BTW, I don't get it, how "using all the core" is enough for justifying your idea

Because if its hitting a load average equal to the number of cores it's using all the available resources so adding more parallelism won't make it faster (and if it's not quite equal but a little lower that bounds the speedup that can be had by more parallelism). Of course, the load could be substantially overhead so something with a high load avg could be made faster-- but it would be made faster by reducing overheads. Nothing specific you've said addresses overhead, your focus is exclusively on parallelism and your suggested algorithm would likely have extremely high overheads.

But most critically: It shows that the validation is multithreaded, which was the entire point of my message.

Quote

Few weeks ago someone in this forum told something about block verification in bitcoin being sequential because of "packaged transactions" (take it easy).

Turns out that people around here often have mistaken beliefs. Among other sources they get propagated by altcoin scammers that claim to have fixed "problems" that don't even exist. But just because someone was mistaken about something that is no reason to go on at length repeating the error, especially when someone shows up, with citations, to show otherwise.

The validation is multi-threaded. Period. Stop claiming that the validation is single threaded simply because it's not inconceivable that it could be made faster or more parallel.

Quote

I proposed an algorithm for this and the discussion went on in other directions ...

Your proposed algorithm would almost certainly not work well in practice: Every one iterations requires multiple read/write accesses to global data (the set of existing outputs). Also in the worst case it would have to attempt to validate all transactions N^2 times: consider what happens if all transactions are dependent on the transaction before them and the parallel selection is unlucky so that it attempts the first transaction last. It would process and fail every transaction except the first, then process every transaction except the first and fail every one except the second and so on. More commonly each thread would just end up spending all its time waiting for control of the cacheline it needs to check for its inputs.

It's just trying to impose too strong an ordering requirement, as there is no need to validate in order. Bitcoin core doesn't today. Deserialization must be in-order because the encoding makes anything else effectively impossible. And the consensus rules require that outputs exist before they're consumed but that can be (and is) satisfied without ordering validation by first gathering up all the new outputs before resolving the inputs from txn that might consume them.

Quote

Additionally, I noted the excessive use of locks and became concerned about their impact on parallelism.

What use of locks did you note? Search in the browser in that thread shows the word lock never comes up. And during the input validation the only mutable data that gets accessed is the signature cache which has been lock free since 2016, so the only lock the validation threads should even wait on is the one protecting their workqueue.

Quote

Honestly, this discussion didn't add anything to my knowledge about the code and bitcoin

Doesn't seem that much of anything ever does...

aliashraf (OP)

Legendary

Offline

Activity: 1456
Merit: 1174

Always remember the cause!

Re: Are there any benchmark about Bitcoin full node client resource usage?

May 10, 2021, 10:55:41 AM
Last edit: May 10, 2021, 11:22:09 AM by aliashraf

#7

Quote from: gmaxwell on May 09, 2021, 09:41:25 PM

Quote from: aliashraf on May 09, 2021, 08:46:01 PM

Besides the fact that throughout the source code, in the comments, transactions with dependent inputs are referenced using this term, I tried but didn't find any other term to use for such "packaged" transactions when they are present in a block, it'll be highly appreciated if you would give me one.

100% of the use of the word is in the context of the mempool and tx selection.

Being "in the context of mempool" is a correct remark, but I don't understand why should we invent any other term for packages when they migrate to blocks? I mean other than just being offensive against an imaginary opponent Cheesy

Quote

Calling that packages would be inappropriate because it suggests they're in some way grouped or something-- but they're not. A dependent transaction can be anywhere in the block so long as it's after its parents.

But they are actually "grouped or something" and there is no "dependent transaction" in bitcoin literature. Now you are inventing a new term with ambiguous meaning just for fighting with me Tongue

Quote

So, it is not about parallel transaction processing because other than their input scripts, transaction are processed sequentially.

The validation of a transaction is almost exclusively the validation of its inputs. Outputs are only checked for size and total amount, nlocktime is an integer comparison, as is the maximum weight. For these kinds of checks bouncing the cacheline over to another thread would take more time than the check.

You are deliberately "overlooking" the most expensive task here: checking for the inputs to exist in the UTXO db which is currently done in the main thread, sequetially.

Quote

It could queue the entire transactions rather than the individual inputs but this would achieve worse parallelism e.g. because a block pretty reliably will have >6000 inputs these days, but may have anywhere between 500 and 2000 txn just the same. At least at the time that was written working tx at a time was slower per my chat logs. That might not be true anymore because more of the workload is full blocks, because more recent cpus suffer more from overhead, etc. This was the sort of thing I was referring to in my prior post when I said that less parallelism might turn out to be faster now, due to overheads.

Besides contradicting yourself in a single paragraph, your reasoning suffers from an architectural weakness:
Look, when it comes to parallelism, you don't do it by breaking apart cpu intensive tasks and I/O bound ones artificially, it is OS (and not a programmer's) job. Violating this simple rule of thumb would lead to excessive context switching overheads, exactly the same problem we are dealing with here.
Mission critical projects desperately need software architects, ways more than "genius" if-then-else coders. It is a known issue for open-source projects as they are subject to deviation of engineering principles and best practices because there is a tendency toward coding before thinking or even discussing. The way they like to put it: thinking by code! The most stupid slogan ever.

Software engineers/architects do not think by code, never, ever. I, personally postpone coding as much as possible because I've been there for a long time and I regret writing down thousands of lines of stupid code that although they "work" they are much of a burden rather than a relief for the project and users.

I'm telling you that the strategy chosen by sipa in PR 2060, which has survived until now in spite of further improvements, was not a good one because it does too much context switching.
A single thread should take care of a single transaction, a software architect speaks here, so don't bother wasting your time and mine, arguing about basics. Bitcoin is a disruptive technology but not in the field of software engineering and architecture, programmers are not allowed to bend the rules whenever they "feel" something, it is chaos and should be avoided. Nobody will be confused or intimidated by a programmer who is making excuses for his wrong choices or bragging with his if-then-else writing "power".

Quote

Few weeks ago someone in this forum told something about block verification in bitcoin being sequential because of "packaged transactions" (take it easy).

Turns out that people around here often have mistaken beliefs. Among other sources they get propagated by altcoin scammers that claim to have fixed "problems" that don't even exist.

Isn't it over? The anti-alt crusade?
Let's move on and let them to breath. Bitcoin is doing well and is not threatened by Alt-coins, no need to run such a crusade at all. It is not a sect or a cult, it is an open source project and ecosystem, forks happen, divergence is natural and a sign of maturing. Take it easy.

Quote

I proposed an algorithm for this and the discussion went on in other directions ...

Your proposed algorithm would almost certainly not work well in practice:

OK. Challenge accepted. Cheesy

I have not the luxury of living in a wealthy country, I usually do jobs for a living, I have to, but after this mess, I feel it is necessary to allocate a part of my time for proving you wrong in your way: proving by code. Tongue

So, I'll implement the algorithm that you are "almost" (thanks god) certain about it not working well, in a few weeks and we will see. it is totally unnecessary, TBH, because your following objections are totally wrong:

Quote

Every one iterations requires multiple read/write accesses to global data (the set of existing outputs).

The thread will be blocked leaving room for the next thread, don't worry about threads being locked, on the contrary, assigning cpu intensive tasks to threads that never get locked is somehow a counter-purpose design choice, let threads to block, please!!

Quote

Also in the worst case it would have to attempt to validate all transactions N^2 times: consider what happens if all transactions are dependent on the transaction before them and the parallel selection is unlucky so that it attempts the first transaction last.

Firstly, it is actually N^2/2 for the worst case.
Secondly, it is not how we design algorithms for real world problems, being too much concerned about a case that simply doesn't happen. It is enough for an algorithm to finish in real time when the worst case happens, if ever.

Quote

It's just trying to impose too strong an ordering requirement, as there is no need to validate in order. Bitcoin core doesn't today.

Oppositely, it is the current implementation of block verification that imposes an ordering requirement, my algorithm even doesn't enforce parent txns to show up earlier in the block. It has nothing to do with the order of transactions being dependent (your term) or not, once a transaction fails because of its input(s) not being present in the UTXO, it 'll be re-scheduled for a retry as long as at least one more confirmed transaction is added to the set, I don't even look at the txn's index in the block, though, it may be necessary because of consensus, if so, I can keep track of the index, waiting for a confirmed txn with lower index to retry.

gmaxwell

Moderator
Legendary

Offline

Activity: 4158
Merit: 8382

Re: Are there any benchmark about Bitcoin full node client resource usage?

May 10, 2021, 02:12:01 PM
Last edit: May 10, 2021, 08:00:20 PM by gmaxwell

#8

Quote from: aliashraf on May 10, 2021, 10:55:41 AM

But they are actually "grouped or something"

They aren't the mempool package stuff makes groups, but in blocks transactions aren't grouped in any way, the ordering is arbitrary beyond being topological. But thanks for validating my point that applying term used for actual grouping in the mempool would lead to confusion.

Quote

and there is no "dependent transaction" in bitcoin literature.

LMGTFY.

Quote

Firstly, it is actually N^2/2 for the worst case.

One doesn't generally write in the constant factors when discussing asymptotic behavior because the true constant factors depend on the exact operations performed. Usually it's sufficient to just think in terms of the highest exponent.

Quote

Secondly, it is not how we design algorithms for real world problems, being too much concerned about a case that simply doesn't happen

It is when you write systems where reliability is important or when they must resist attack. As much as possible Bitcoin shouldn't have any quadratic operations in validation, even if one is "probably safe most of the time" it just wastes effort analyzing the attack potential (and risk of getting it wrong or some other change violating the safety assumptions).

In your case there is no just justification for the quadratic behavior as the same processing can be accomplished without it. If it were a question of trade-off that would be one thing, but it isn't. The quadratic behavior exist in your proposal purely because you're out of your depth. I even explained how to fix it, but you didn't notice because you're too invested in claiming everyone else is wrong.

Quote

Oppositely, it is the current implementation of block verification that imposes an ordering requirement, my algorithm even doesn't enforce parent txns to show up earlier in the block. It has nothing to do with the order of transactions being dependent (your term) or not, once a transaction fails because of its input(s) not being present in the UTXO, it 'll be re-scheduled for a retry as long as at least one more confirmed transaction is added to the set, I don't even look at the txn's index in the block, though, it may be necessary because of consensus, if so, I can keep track of the index, waiting for a confirmed txn with lower index to retry.

You've missed my point. Yes, consensus requires that txn in blocks are in topological order, but that doesn't mean you need to process them that way. The fact that your suggestion won't actually validate a transaction until it has validated its parents is unnecessary and inefficient, as is its retry and fail approach, instead of picking things up again when it actually can make progress. Consider what your algorithm would do when all the transactions were dependent on the one before them. It would process them all sequentially without any parallelism at all *at best* (and have quadratic behavior at worst). Meanwhile, in that same case Bitcoin does the majority of the processing entirely in parallel. One needs to have collected the outputs from the prior transactions to use them later, but one need not have validated any of the prior transactions.

Maybe if you'd spare us a minute of lecturing us on what a brilliant architect you are-- and how Bitcoin developers are evil idiots, especially me even though I haven't been a bitcoin developer for years-- you might actually learn something.

But who am I kidding?

If you can't manage to notice that you never accomplish anything but raging on a forum while the people you constantly call idiots accomplish stuff you're probably not going to learn anything.

aliashraf (OP)

Legendary

Offline

Activity: 1456
Merit: 1174

Always remember the cause!

Re: Are there any benchmark about Bitcoin full node client resource usage?

May 10, 2021, 04:54:42 PM
Last edit: May 11, 2021, 09:02:58 PM by aliashraf

#9

Quote from: gmaxwell on May 10, 2021, 02:12:01 PM

Quote from: aliashraf on May 10, 2021, 10:55:41 AM

Secondly, it is not how we design algorithms for real world problems, being too much concerned about a case that simply doesn't happen

It is when you write systems where reliability is important or when they must resist attack. As much as possible Bitcoin shouldn't have any quadratic operations in validation, even if one is "probably safe most of the time" it just wastes effort analyzing the attack potential (and risk of getting it wrong or some other change violating the safety assumptions).

We hear excuses like this too much from you: reliability, attack resistance, blah, blah.

We are all aware of the requirements, it is mission critical and subject to adversarial behavior, Bitcoin, so what?
Does it imply programmers are free to impose weird assumptions about a crazy attacker who puts like a million Dollar at stake to keep nodes busy validating his block a few more seconds, jeopardizing his premium?

More generally speaking, I've been observing such talking points so many times, there is always an excuse for bad architectural decisions, experts don't buy them, unfortunately average users/customers do.

I regret saying this, but almost every discussion with you eventually reaches this point where you bring up adversarial behaviors, risks, etc.

Bitcoin is a trillion Dollars network now, and this fact alone is a source of stress and induces hyper sensitivity problems, having a prominent figure who repeatedly reminds devs/advocates of the risks and threats, intimidating them, is just too much. One could classify it as an act of terror Cheesy

I'm not afraid, nobody should, no matter how BIG is Bitcoin it is nothing compared to human's talent and reason, actually it is made by us, human beings, we won't give up developing and improving it ever, it is why discussions should take place and proceed free from superstition and intimidation.

Quote

... lecturing us on what a brilliant architect you are-- and how Bitcoin developers are evil idiots, especially me ...

I found it necessary to remind you of the existence of such a field of expertise in software development that obviously you are not familiar with, it was not about me and I don't think it is necessary for an active forum member to provide detailed information about herself/himself to be treated fairly and respectfully.

As of your accusation of me being disrespectful toward you or other devs just because I've said something about PR 2060 not being perfect and there is room for improvements, ...
I think it is very rude to draw such red lines and biting people once they cross.

midnightmagic

Member

Offline

Activity: 88
Merit: 37

Re: Are there any benchmark about Bitcoin full node client resource usage?

May 10, 2021, 09:19:42 PM

#10

Quote from: aliashraf on May 10, 2021, 04:54:42 PM

We hear excuses like this too much from you: reliability, attack resistance, blah, blah.

We are all aware of the requirements, it is mission critical and subject to adversarial behavior, Bitcoin, so what?
Does it imply programmers are free to impose weird assumptions about a crazy attacker who puts like a million Dollar at stake to keep nodes busy validating his block a few more seconds, jeopardizing his premium?

More generally speaking, I've been observing such talking points so many times, there is always an excuse for bad architectural decisions, experts don't buy them, unfortunately average users/customers do.

I regret saying this, but almost every discussion with you eventually reaches this point where you bring up adversarial behaviors, risks, etc.

Bitcoin is a trillion Dollars network now, and this fact alone is a source of stress and induces hyper sensitivity problems, having a prominent figure who repeatedly reminds devs/advocates of the risks and threats, intimidating them, is just too much. One could classify it as an act of terror Cheesy

I'm not afraid, nobody should, no matter how BIG is Bitcoin it is nothing compared to human's talent and logic, actually it is made by us, human beings, we won't give up developing and improving it ever, it is why discussions should take place and proceed free from superstition and intimidation.

You are making a fool of yourself. I have directly vetted and hired a number software architects who went on to be extremely successful. You are definitely not someone I would hire. You do not exhibit any of the characteristics of any software architect who would be capable of succeeding in any environment I've ever built. And now your unfounded arrogance and unthinking mindlessness are on display forever. Congratulations.

NotATether

Legendary

Offline

Activity: 1582
Merit: 6715

bitcoincleanup.com / bitmixlist.org

Re: Parallelism in block verification [subthread from:"Are there any benchmark ...?]

May 11, 2021, 12:56:57 AM

#11

My god, if there was less bickering here and more github commits then we could actually start seeing these improvements you're talking about.

.
.^{BLACKJACK^♠FUN.}

███▄██████
██████████████▀
████████████
█████████████████
████████████████▄▄
░█████████████▀░▀▀
██████████████████
░██████████████
█████████████████▄
░██████████████▀
████████████
███████████████░██
██████████

CRYPTO CASINO &
SPORTS BETTING

│

.
REWARD SYSTEM
.ELITE SEAT..

│

▄▄███████▄▄
▄███████████████▄
███████████████████
█████████████████████
███████████████████████
█████████████████████████
█████████████████████████
█████████████████████████
███████████████████████
█████████████████████
███████████████████
▀███████████████▀
███████████████████

.

Whitepaper
JACK

aliashraf (OP)

Legendary

Offline

Activity: 1456
Merit: 1174

Always remember the cause!

Re: Are there any benchmark about Bitcoin full node client resource usage?

May 11, 2021, 08:41:35 AM
Last edit: May 11, 2021, 01:47:04 PM by aliashraf

#12

Quote from: midnightmagic on May 10, 2021, 09:19:42 PM

I have directly vetted and hired a number software architects ...

Who are you? Bill Gates?
Go and take care of the army of software architects you have hired instead of insulting forum members here. The supreme leader has done enough in this respect. Angry

P.S:
BTW, I'm ready for demolishing any of your "successful" software architects/gladiators right here who dares to show-up trying to justify the sort of "parallelism" suggested by PR 2060 : Queuing input script processing tasks for further threading! I don't expect billionaires like you to have a clue about what I'm saying but your successful servants should have something to share, or maybe they are too busy serving you and becoming more "successful" Huh

aliashraf (OP)

Legendary

Offline

Activity: 1456
Merit: 1174

Always remember the cause!

Re: Parallelism in block verification [subthread from:"Are there any benchmark ...?]

May 11, 2021, 08:57:28 AM
Last edit: May 11, 2021, 11:33:10 AM by aliashraf

#13

Quote from: NotATether on May 11, 2021, 12:56:57 AM

My god, if there was less bickering here and more github commits then we could actually start seeing these improvements you're talking about.

You mean discussing by code Huh

I suppose the whole forum is useless, a waste of time, isn't it? Let's migrate to github for committing our codes Tongue

It is exactly the culture I hate