Bitcoin Forum
Author Topic: Is gossip protocol in Bitcoin perfect?  (Read 799 times)
the_big_feet (OP)
Newbie
*
Offline

Activity: 14
Merit: 4


View Profile
July 01, 2018, 05:07:18 PM
Merited by ABCbits (1)
 #1

With Bitcoin's gossip protocol, can we take for granted that if I sent a transaction at, say, 1:00 PM UTC on 1 January 2018, then every node connected to the network between 1:00 PM and (allowing for network latency) 1:05 PM would have received it, because of how the gossip protocol works? And by every node I mean absolutely every node that was connected to the network at that time. Or could my transaction fail to reach some node(s), and if so, why?
In short: given some allowance for network latency, do messages reach absolutely all connected peers, and how long does that take?
achow101
Moderator
Legendary
*
expert
Offline

Activity: 3388
Merit: 6635


Just writing some code


View Profile WWW
July 01, 2018, 06:41:06 PM
Merited by Foxpup (3), ABCbits (1)
 #2

There is no guarantee that your transaction will reach all nodes, nor is it really possible to measure the time that takes, nor to check whether a transaction has reached all nodes.

Nodes come online and go offline continuously, so a node that just came online probably does not have your transaction yet. Other nodes may be configured with policies under which your transaction is not accepted into their mempool, and a transaction that a node does not accept is not relayed by it, so those nodes won't have your transaction either.
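For example, here is a minimal sketch of bitcoin.conf settings that make a node not accept (and therefore not relay) certain transactions. The option names are real Bitcoin Core options; the values are just illustrative assumptions:
Code:
# Illustrative bitcoin.conf snippet -- assumed values, not recommendations
blocksonly=1            # don't request or relay loose transactions at all
minrelaytxfee=0.00005   # ignore transactions paying below this feerate (BTC/kB)
datacarrier=0           # don't accept or relay OP_RETURN data-carrier transactions
A transaction that trips any such policy on a peer simply stops propagating along that path and has to route around it.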

bobthegrownup
Member
**
Offline

Activity: 194
Merit: 29


View Profile
December 26, 2018, 03:10:04 AM
 #3

Gossip is extremely redundant, and that's why it's been around since the 1980s. Just like an epidemic, it eventually spreads everywhere, even when some nodes suffer unexpected downtime.

I'm still learning a lot about it now that I'm working with the Marlin Protocol, and I've learned that gossip is quite likely one of the biggest bottlenecks in scaling blockchains, since it sits at the networking layer. By using cut-through routing, FEC, IP multicast, and more, we can improve the speed of these communications.

I know BloxRoute is a solution that Bitcoin could adopt, but the likelihood that we move onto a centralized solution is zero. Marlin's decentralized proposition is something that we could slowly migrate to, but it's not something that will happen tomorrow. Even Dandelion++ has yet to be implemented, and it's a very small upgrade to gossip.
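As a toy illustration of that epidemic behaviour (a hypothetical sketch: the graph size, degree, and synchronous rounds are assumptions, not how Bitcoin actually schedules announcements), a simulation shows coverage growing exponentially per round:
Code:
import random

def simulate_gossip(num_nodes=1000, peers_per_node=8, seed=1):
    """Toy epidemic spread: each round, every informed node announces to
    all of its peers. Returns (rounds, nodes reached)."""
    rng = random.Random(seed)
    # Random peer graph, loosely modelling nodes making 8 outbound connections.
    adj = [set() for _ in range(num_nodes)]
    for v in range(num_nodes):
        while len(adj[v]) < peers_per_node:
            w = rng.randrange(num_nodes)
            if w != v:
                adj[v].add(w)
                adj[w].add(v)
    informed = {0}                      # node 0 originates the transaction
    rounds = 0
    while len(informed) < num_nodes:
        newly = {w for v in informed for w in adj[v]} - informed
        if not newly:                   # a disconnected node would never hear it
            break
        informed |= newly
        rounds += 1
    return rounds, len(informed)

print(simulate_gossip())                # typically ~4 rounds to reach all 1000
Note the caveat in the code: a node that is disconnected (or refuses the message) is never reached, which is exactly why the guarantee the OP asked about doesn't exist.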
aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 26, 2018, 03:10:47 PM
 #4

Blockstream has launched a satellite service that streams bitcoin transactions and blocks as a push protocol (no up-link needed).
FIBRE is a more sophisticated system mainly focused on miners; an alternative is the Bitcoin Relay Network, and there are more similar projects one could mention ...

I think bitcoin's legacy P2P protocol is an infrastructure that should be kept almost intact for security purposes, because of its decentralized nature. Complementary centralized services like the projects above seem to be auditable by running the original P2P protocol simultaneously as reference data. For instance, it would be hard for a centralized relay service to censor transactions while the original protocol remains active alongside it, and once a service has shown patterns of such behavior it would eventually be identified and isolated.

gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 26, 2018, 05:07:52 PM
Merited by Foxpup (4), suchmoon (4), ABCbits (1), bones261 (1)
 #5

I know BloxRoute is a solution that Bitcoin could adopt,
BloxRoute appears to be a straight-up ICO scam as far as I can tell-- it doesn't appear to propose anything new or interesting that would be an advance over what is in use in the network today, but it uses a lot of hype and jargon to try to sound new and exciting.

Then again, googling around suggests the thing you claim to be working on is a competing ICO scam... and presumably you only posted here to hype it-- especially since your post is offtopic for the thread. Sad

Quote
complementary centralized services like the projects above
Fibre is a protocol, not a service (and certainly not a centralized service). It's like calling Linux a centralized service because Amazon runs it.
aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 26, 2018, 05:47:19 PM
 #6

Fibre is a protocol, not a service (and certainly not a centralized service). It's like calling Linux a centralized service because Amazon runs it.
Bad analogy. Cheesy
If it were a requirement for Linux to be hosted in a data center, it would fit in such a category. That is not the case for Linux, but for FIBRE it almost is.

One needs to set up a relay network of nodes that trust each other, to eliminate as many hops as possible; thereafter this network is used by bitcoin nodes just as a service.

My point is that you can't eliminate hops and reduce the distance between bitcoin nodes without giving up crucial features of the bitcoin p2p relay network. This is why you need to keep the legacy protocol running, no matter what.
gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 26, 2018, 07:12:26 PM
Merited by Foxpup (2), ABCbits (1)
 #7

If it were a requirement for Linux to be hosted in a data center, it would fit in such a category. That is not the case for Linux, but for FIBRE it almost is.
No it isn't, not at all.

The reason Matt runs a network of well-maintained nodes for people to connect to is that doing so is beneficial (well-maintained, well-positioned, well-networked hosts have better latency than random stuff), and that if someone doesn't do it and allow the public to use it, large miners will be the only parties to benefit from that kind of service (because they do it for themselves) and will enjoy an advantage over everyone else.

Fibre is just a protocol that renders bitcoin block transmission free of round trips and exploits pre-knowledge of transaction data, which allows blocks to be transferred in very close to the one-way delay 99.9999% of the time.  There is nothing "service", "centralized", or "data-center" about it. Please discontinue spreading that misinformation.
aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 26, 2018, 09:33:41 PM
 #8

If it were a requirement for Linux to be hosted in a data center, it would fit in such a category. That is not the case for Linux, but for FIBRE it almost is.
No it isn't, not at all.

The reason Matt runs a network of well-maintained nodes for people to connect to is that doing so is beneficial (well-maintained, well-positioned, well-networked hosts have better latency than random stuff), and that if someone doesn't do it and allow the public to use it, large miners will be the only parties to benefit from that kind of service (because they do it for themselves) and will enjoy an advantage over everyone else.

Fibre is just a protocol that renders bitcoin block transmission free of round trips and exploits pre-knowledge of transaction data, which allows blocks to be transferred in very close to the one-way delay 99.9999% of the time.  There is nothing "service", "centralized", or "data-center" about it. Please discontinue spreading that misinformation.
The way you put it, it sounds like we have a superior p2p solution out there. Well, we don't. I'm ok with Matt running this protocol, but we need someone to do this anyway, don't we? So please give me a label for such a 'thing' and I will use it instead of "centralized service"; no matter what, it is not good in terms of being censorship resistant, permissionless, decentralized, ... at least it is not an exemplary "thing" in these regards. Deal?

My point is about the indisputable importance of keeping the legacy p2p gossip protocol active, whether we use such "things" or not.

Personally, I cautiously recommend both.
gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 27, 2018, 12:16:29 AM
 #9

Well, we don't.
Sure we do.  Simply saying that we don't, without any reasoning or justification (except perhaps avoiding admitting that you were mistaken before), doesn't make it so.

Quote
So please give me a label for such a 'thing'

I'm not sure what you mean by label. Do you mean a link?

Quote
I'm ok with Matt running this protocol,
Well that's good, because anyone can run anything they want; they don't need your or anyone else's approval, and in fact there is nothing you can do to stop them.
bobthegrownup
Member
**
Offline

Activity: 194
Merit: 29


View Profile
December 27, 2018, 01:22:33 AM
 #10

BloxRoute appears to be a straight-up ICO scam as far as I can tell-- it doesn't appear to propose anything new or interesting that would be an advance over what is in use in the network today, but it uses a lot of hype and jargon to try to sound new and exciting.

Then again, googling around suggests the thing you claim to be working on is a competing ICO scam... and presumably you only posted here to hype it-- especially since your post is offtopic for the thread. Sad

If I wanted to hype it, I wouldn't choose a low-visibility thread. And yes, it's on topic, because I'm suggesting an alternative to the gossip protocol: a decentralized relay network. The thread is called "Is gossip protocol in Bitcoin perfect?", my opinion is that it's not, and that's because the alternatives I'm bringing to light are better.

I'd much prefer a discussion around why a decentralized relay network is labelled a scam -- can't you criticize the idea instead of making a low-level attack on my integrity?

I understand why people get jaded quickly, but this is my livelihood. If I'm wasting my time working on something that is a scam, I'd like to know that, but not in such a brushoff manner that suggests your googling revealed a treasure trove of truth that has eluded me for months.



gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 27, 2018, 02:29:30 AM
Last edit: December 27, 2018, 09:21:58 AM by gmaxwell
Merited by Foxpup (12), ABCbits (3), aliashraf (2), bones261 (1)
 #11

I'd much prefer a discussion around why a decentralized relay network is labelled a scam -- can't you criticize the idea instead of making a low-level attack on my integrity?
Because it very clearly is a scam. Just because you've managed to bamboozle yourself doesn't make it right.  Selling some ICO token as a pretext to collect money from unsophisticated investors for a business that makes little sense is flat-out unethical, makes everyone in the cryptocurrency space look bad, and the stink of it scares off competent technical contributors, because they don't want to get anywhere near the scamminess of it.

Your post really has nothing to do with the OP's question-- it seems like you didn't even bother to read it, and only read the subject line.  The OP was asking whether gossiped messages reach all nodes or, in other words, "is the communication lossless?". Achow101 responded pretty much completely: it can't be guaranteed to reach all (no physically realizable network could make such a guarantee, since nodes can become disconnected), though in practice almost all transactions reach almost all nodes quickly (subject to the intentional delays added to help obscure origins).

Quote
but this is my livelihood. If I'm wasting my time working on something that is a scam, I'd like to know that,
That is a fair enough request, but at the same time anyone in this space is utterly flooded with garbage. It's literally impossible for any person to explain why all of it is junk; worse, in some cases the perpetrators are actively malicious and aggressively retaliate (the ones that are pure marketing have literally nothing better to do than spend time and money attacking their critics), so people simply don't.

Why is what you're working on a scam?  Let's start with the fact that it's soliciting investments from the general public (likely in violation of US securities law) for an enterprise which expects to make money ultimately from resources provided by third parties... to accomplish a task which is already accomplished by existing distributed software.  Even if the result manages to do something better compared to existing systems, in the end there would be no reason for users not to take the software and rip out the monetization, leaving the investing public holding nothing (and using non-free licensing wouldn't help, since few are foolish enough to run closed systems with their cryptocurrency applications)... and I find it personally pretty doubtful that BloxRoute or Marlin will manage to construct something that outperforms the existing free state of the art, considering that both of the whitepapers massively misrepresent the state of the art.

Right now the state of the art in Bitcoin block propagation is that >99.99% of blocks are transmitted to locations everywhere in the world within the one-way network delay plus less than 10ms, while running on commodity hardware and networks.  The one-way network delay is the physical limit, which can only be improved on by putting in networks with more efficient physical paths (which someone can do, e.g. what HFT companies do... but isn't what you're doing [1]). The 10ms could be improved on (especially with ASIC decoders, or creative protocol/software optimization), but it's not clear that going from 10ms to (say) 1ms would actually matter much-- perhaps worth doing, but it isn't something that is going to recoup the investors' investment.
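As a rough sanity check on that physical limit (illustrative numbers: light in fiber travels at roughly two-thirds of c, about 200,000 km/s):
Code:
# One-way propagation delay over fiber: rough, illustrative figures only.
C_FIBER_KM_S = 200_000            # light in fiber, ~2/3 of c

for label, km in [("London -> New York", 5_600),
                  ("halfway around the globe", 20_000)]:
    print(f"{label}: ~{km / C_FIBER_KM_S * 1000:.0f} ms one-way")
# -> ~28 ms and ~100 ms: a floor no relay protocol can beat; the protocol's
#    job is just to avoid stacking round trips on top of it.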

Without a clear statement of how it is going to outperform light+10ms and by how much [2], why Bitcoin participants would be willing to _pay_ to gain access to that 0-10ms improvement, and why they wouldn't just cut out the 'middleman' token to do it, the whole thing just looks like something designed to bilk unsophisticated investors who don't know the most basic questions to ask, or who have been deceived by misrepresentations about the performance of the existing state of the art.

Scamminess bonus round:  Text from the Marlin Protocol whitepaper was clearly plagiarized directly from the BloxRoute whitepaper. As an example, "The Falcon Network was deployed (by two members of our bloXroute Labs team)" (emphasis mine; and this is not the only material copied outright-- e.g. it wholesale copies the incorrect description of what FIBRE is).  So not only are you working on a scammy ICO, it's an unoriginal scammy ICO that is ripping off another scammy ICO.

[1] BloXroute claims to be doing that with AWS, but anyone can run the existing free and open state-of-the-art software for block relay on AWS and gain whatever advantages locating relay on AWS has... also, I have measured relay on AWS and it's actually not impressive compared to other hosting options.

[2] To be clear: we already know how to improve on the existing state of the art-- use a more efficient sketch encoding, use a more efficient serialization of encoded transactions, use a faster-to-decode FEC, optimize the software (in particular, use more precomputation), improve packet scheduling, support NICs with integrated FPGAs for faster decode and packet relay, and get more users onto state-of-the-art protocols-- but even if those improvements achieved the maximum possible gain, it doesn't seem like it would matter that much, so currently the efforts of open source developers are being spent in other areas.  But since both Marlin and BloxRoute misrepresent the state of the art-- and fail to propose any performance advancement beyond it that I'm aware of-- I am doubtful that either group understands it well enough to even match it, much less exceed it.
aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 27, 2018, 07:27:35 AM
Last edit: December 27, 2018, 10:40:04 AM by aliashraf
 #12

No, we don't. Simply saying that we do, "without any reasoning or justification (except perhaps avoiding admitting that you were mistaken before), doesn't make it so".  Tongue

Let's see what a FIBRE network looks like when it is actually implemented and presented to the public:
Quote
from: http://bitcoinfibre.org/public-network.html
Each FIBRE node in the network exposes its Bitcoin P2P Port to registered users (see the bottom of the page). The FIBRE codebase running on each is optimized to ensure that Compact Blocks for each new block are made available to peers after only SPV validation of new blocks, without blocking for lock acquisition.
So, it is for "registered users" and there is some kind of trust involved, because FIBRE nodes do not bother wasting their valuable time fully validating blocks.
Saying that something is a service, or is centralized, is not an act of humiliation or cursing. It is just helping people understand its limits and domain of applicability. FIBRE as a protocol, when deployed on a set of nodes, acts like a service-- a centralized service-- that lubricates block propagation in bitcoin. That is what a FIBRE network looks like from the point of view of the nodes that connect to it. Nodes have to keep using legacy connections with other peers as well, to stay safe against the risks involved.

FIBRE is not a disruptive alternative to bitcoin's p2p gossip protocol, and it has never been suggested as such. I think requiring 'good guys' to set up rings of mutually trusting nodes around the globe, implementing a specialized UDP protocol so that they act like a single giant node capable of handling hundreds of connections, is neither an idea in full conformance with bitcoin culture nor an alternative approach to p2p in bitcoin.

FIBRE is great, I have to admit, and I've never said and am not saying anything different. I love it and I recommend using it, but I'm aware of the risks and penalties involved, and all I'm trying to express is my deep respect for the legacy p2p gossip protocol as the ultimate reliable infrastructure: the thing that enables us to keep everything under control and to use supplementary services like FIBRE.

Quote
I'm ok with Matt running this protocol,
Well that's good, because anyone can run anything they want; they don't need your or anyone else's approval, and in fact there is nothing you can do to stop them.
Actually, people have a voice and their opinions matter.

I don't understand why you argue about such a simple fact (other than for trolling purposes):
there is, and there will be, no purely decentralized, trustless, censorship-resistant p2p protocol other than what is being used in bitcoin. You can improve it, and you can use complementary services and protocols along with it, but you can't disruptively replace it with a fancy new idea like FIBRE or anything else. The bitcoin p2p network (at least after BIP 151) is not a good candidate for radical topological or conceptual re-engineering, because it is supposed to keep the network trustless, decentralized, and permissionless/open.


P.S:
I just went through your last post; it was great, and I sent some merits. I'm realising that you have been busy working on the subject recently, which is good news: we need improvements and complementary services/techniques, and we need to keep an eye on the fundamentals at the same time.
gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 27, 2018, 10:03:56 AM
 #13

So, it is for "registered users" and there is some kind of trust involved, because FIBRE nodes do not bother wasting their valuable time fully validating blocks.
Oh man, _please_ go re-read my first message to you in this thread.  You are confusing Matt's public systems with the protocol-- Linux with Amazon.

Fibre is a protocol for distributing blocks very fast.  The protocol is open source and distributed, and I helpfully linked to it above.

Having a great protocol alone isn't enough to get block propagation as fast as a large, well-run centralized miner could achieve for themselves, because such a miner can carefully operate a well-run, DoS-protected network of their own nodes to forward their blocks.  It would be bad if only the largest centralized miners had access to those facilities.  As a result, Matt runs a public network of nodes-- geographically distributed, well maintained, DoS-protected-- that anyone who wants to can connect their ordinary stock bitcoin nodes to, so that the advantages of that kind of well-maintained infrastructure are available to more parties.

Read the heading of the very page you are quoting: "Public highly optimized fibre network: As there is significant benefit in having at least one publicly-available high-speed relay network, I maintain one."  Fibre is analogous to QUIC-- a freely licensed protocol with higher performance than the earlier alternatives-- and Matt's public relay network is analogous to Google's webservers-- privately operated infrastructure using a high-speed protocol and making its services available to the public.

The experience of running this infrastructure is a lot of what made it possible to design fibre (and BIP152) in the first place, since it created a network-in-a-bottle that could be well measured and characterized, exposing the limits of the legacy protocol, and made it easy to deploy experimental adjustments on a rapid-fire basis without risking disruption to the rest of the network.

Do you see how you are intermixing two different things?

What you're saying about complementary/supplementary services and their value as an addition applies perfectly to Matt's public relay network (and I agree!). It just doesn't apply to the protocol-- which is what I was pointing to above, not Matt's public relay network.

Quote
but you can't disruptively replace it with a fancy new idea like FIBRE or anything else. The bitcoin p2p network is not a good candidate

This is amusing, because we already completely replaced how blocks are propagated in Bitcoin with technology from FIBRE (optimized for minimizing bandwidth somewhat more than minimizing latency, which FIBRE optimizes) over two years ago now.

I've heard the old adage that people saying “It can’t be done,” are always being interrupted by somebody doing it. Being corrected by people who have been doing it for years is a bit of a twist... Tongue

It would be perfectly possible to use FIBRE for block relay everywhere and abandon BIP152, just like we abandoned the original block relaying mechanism. But the vast majority of nodes presumably don't want to use a couple times the bandwidth in exchange for shaving off the last few milliseconds, so the benefit of doing so hasn't yet been worth the effort of bringing it up to that level of maturity (in particular, one has to be pretty sure there are no dangerous bugs in software that will run on all the nodes). This is especially true because most of the latency benefit of FIBRE can be achieved with just a small number of nodes running it, since they do the heavy lifting of carrying blocks over the 100+ms RTT transcontinental links, while BIP152 has pretty similar performance to Fibre where link RTTs are 10ms or less. Block propagation with BIP152 flows along the lowest-delay paths, so when some nodes running Fibre manage to wormhole blocks around the globe faster than BIP152 does, all the other nodes happily exploit that fact.

Presumably, the Fibre protocol will eventually make it into default node software once doing so becomes the lowest-hanging fruit-- likely with more knobs and/or automation to set the bandwidth-vs-latency trade-off more intelligently.   In BIP152 there are already two different trade-offs: normal and high-bandwidth (HB) mode. HB mode essentially eliminates one round-trip time for relay, at the expense of wasting perhaps 13KB of bandwidth per block; nodes figure out which three peers would be most useful to have in HB mode and use it with those, so as to usually achieve the latency advantages of HB mode without much bandwidth cost.  The addition of Fibre into this mix would probably act as a third, higher-bandwidth lower-latency trade-off that nodes would use if either manually configured or if they detected they had over some threshold of available bandwidth, so we could get the X percent fastest nodes on the network using it, thereby capturing most of the latency advantage without most of the bandwidth overhead. Until then, anyone who wants to run it is still free to do so, ... and without ever interacting with Matt's servers, because, again-- Matt's servers are just a user of a protocol that is open and useful to anyone.
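A back-of-the-envelope illustration of that HB-mode trade-off, using the ~13KB figure above plus an assumed average of 144 blocks per day:
Code:
# Rough daily cost of BIP152 high-bandwidth mode (figures from the post above,
# block rate assumed at the usual ~144/day average).
blocks_per_day = 144
wasted_kb_per_block = 13        # redundant compact-block data across HB peers

print(f"~{blocks_per_day * wasted_kb_per_block / 1024:.1f} MB/day")
# -> ~1.8 MB/day traded for saving one round trip on nearly every block.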



aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 27, 2018, 12:02:32 PM
Last edit: December 27, 2018, 05:53:14 PM by aliashraf
 #14

So, it is for "registered users" and there is some kind of trust involved, because FIBRE nodes do not bother wasting their valuable time fully validating blocks.
Oh man, _please_ go re-read my first message to you in this thread.  You are confusing Matt's public systems with the protocol-- Linux with Amazon.

Fibre is a protocol for distributing blocks very fast.  The protocol is open source and distributed, and I helpfully linked to it above.
{...}
What you're saying about complementary/supplementary services and their value as an addition applies perfectly to Matt's public relay network (and I agree!). It just doesn't apply to the protocol-- which is what I was pointing to above, not Matt's public relay network.

I get it, but unlike what you claim, FIBRE will never be deployed in any other way: somebody has to set up a handful of high-end nodes on the backbone, with luxurious backup/support facilities, ... and s/he should be compensated somehow for this.

I'm aware that FIBRE has its roots in BIP 152 (of which I am a great fan) and was practically developed by the same devs (Matt, you, others), but there is no way to go further than BIP 152 and keep everything smooth, imo. FIBRE is there to fill this gap.

Quote
{...} Matt's public relay network is analogous to Google's webservers-- privately operated infrastructure using a high-speed protocol and making its services available to the public.
And web services are not p2p; there is a heterogeneous client/server-style communication between the engaging parties.

Quote
Do you see how you are intermixing two different things?
No, what I see is you mixing up BIP 152 with FIBRE; they are not the same. BIP 152 enjoys compact blocks and avoids redundant txn data transmission; FIBRE prepares an infrastructure-as-a-service for nodes that are preferably compatible with BIP 152, besides using it internally.

Quote
Quote
but you can't disruptively replace it with a fancy new idea like FIBRE or anything else. The bitcoin p2p network is not a good candidate

This is amusing, because we already completely replaced how blocks are propagated in Bitcoin with technology from FIBRE (optimized for minimizing bandwidth somewhat more than minimizing latency, which FIBRE optimizes) over two years ago now.

I've heard the old adage that people saying “It can’t be done,” are always being interrupted by somebody doing it. Being corrected by people who have been doing it for years is a bit of a twist... Tongue
BIP 152 was a huge step forward, thanks to Matt Corallo and you and the other contributors, I admit that, but you are the one who is mixing things up, imo.

To have a FIBRE network you need more than just compact blocks and compression/FEC and other techniques: you definitely need a handful of distributed, well-connected, specialized nodes (centrally owned and managed by one entity) that can trust each other enough to bypass full block validation and lock acquisition.

If you are speaking of BIP 152 and the technologies/concepts it uses, that is already implemented in bitcoin and is what I mean by the bitcoin p2p gossip protocol. FIBRE, on the other side, goes beyond BIP 152:
Quote
from: http://bitcoinfibre.org/
FIBRE is designed to be easy to operate for anyone already running a network of Bitcoin Core instances, instantly providing high-speed transfer of blocks.
You need to run multiple instances of bitcoin core nodes to put FIBRE on, because you need the nodes to trust each other and not revalidate every block in full; otherwise you would simply configure them to use BIP 152 in HB mode, no additives needed.

As I understand it, you are speaking of FIBRE as a technology, and I'm seeing it as a work-around: a complementary service available for large pools, service providers, and altruists to improve latency for themselves or for the public, respectively.

Quote
It would be perfectly possible to use FIBRE for block relay everywhere and abandon BIP152 just like we abandoned the original block relaying mechanism.
I doubt it. Actually, I denounce it completely.
If you mean improving BIP 152 to use FEC over UDP and things like that, that is not FIBRE, it is FIBRE technology, and I've no comments to make about its feasibility, just my blessings. But FIBRE itself is open-source software ready to be installed on a centrally owned network of nodes, present in the bitcoin ecosystem to form a relay network alongside bitcoin itself, such that it helps with propagation delay for privately or publicly connected participants. Amazingly, the presence of such relay networks, regardless of whether they are public or private, improves the bitcoin network's overall performance.

gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 27, 2018, 06:12:54 PM
Last edit: December 27, 2018, 06:28:44 PM by gmaxwell
Merited by Foxpup (4), ABCbits (1)
 #15

I get it, but unlike what you claim, FIBRE will never be deployed in any other way
It already is; Matt isn't the only user of it. Several mining pools also use it on their own, and Blockstream's satellite uses it (not for high speed, but because it also needed a round-trip-less protocol for relaying blocks over a lossy channel).

Quote
but there is no way to go further than BIP 152
Sure there is.

Quote
And web services are not p2p; there is a heterogeneous client/server-style communication between the engaging parties.
Good thing it's not a webservice then.

Quote
FIBRE prepares an infrastructure-as-a-service for nodes
No it doesn't. It is simply a protocol for relaying blocks that never requires a round trip.

Quote
To have a FIBRE network you need more than just compact blocks and compression/FEC and other techniques: you definitely need a handful of distributed, well-connected, specialized nodes (centrally owned and managed by one entity) that can trust each other enough to bypass full block validation and lock acquisition.
You absolutely do not.  In fact, the text "The FIBRE codebase running on each is optimized to ensure that Compact Blocks for each new block are made available to peers after only SPV validation of new blocks, without blocking for lock acquisition" also describes BIP152 in Bitcoin Core right now.   BIP152 was designed to permit blocks to be relayed after only SPV validation. This wasn't implemented in the very first release with BIP152, because it was optional and required a lot of additional work, but it was implemented long ago; similarly, the ability to relay a compact block without acquiring cs_main was first done in Fibre but has also been in Bitcoin Core for more than a year.

So if it's already a standard part of BIP152 deployment, why is it mentioned on that page?  Because the page pre-dates these updates.

No trust is required for full nodes to use this sort of optimization for the reasons explained in BIP152.

So again, what you say can't be done not only can be done, but has been done for a long time already.


Quote
because you need the nodes to trust each other and not revalidate every block in full; otherwise you would simply configure them to use BIP 152 in HB mode, no additives needed.
As pointed out above, BIP152 also relays blocks without validating them. To what BIP152 HB mode does, fibre adds: unconditionally never using any round trips-- not in the protocol, not at the network layer-- even when the block contains transactions that the network has never seen before. It also results in packet loss not causing any delay (other than the time until the next packet arrives).

Quote
As I understand it, you are speaking of FIBRE as a technology, and I'm seeing it as a work-around: a complementary service available for large pools, service providers, and altruists to improve latency for themselves or for the public, respectively.
Yes, you are confusing the fibre protocol with Matt's public relay network.

Quote
Quote
It would be perfectly possible to use FIBRE for block relay everywhere and abandon BIP152, just like we abandoned the original block relaying mechanism.
I doubt it. Actually, I denounce it completely.
If you mean improving BIP 152 to use FEC over UDP and things like that, that is not FIBRE, it is FIBRE technology, and I've no comments to make about its feasibility, just my blessings.

You don't get to decide what Fibre is; the people who made it do. Fibre is a protocol for relaying blocks without ever making even a single round trip, even in the presence of packet loss, which allows the receiver to recover the block as soon as it has received only as much data as it was missing.  This is how Fibre has always been described.  Matt's public relay network is a service run by Matt which pre-dated fibre by many years and helped drive its development. We specifically introduced the name Fibre because people were confusing the earlier relay protocol used by Matt's public relay network with the relay network itself, since the protocol didn't have a name (for a long time it was just called the relay network protocol, though later I started calling it the fast-block-relay-protocol). When we created Fibre we decided to fix that problem by making sure that the free-standing nature of the protocol was clearer. It seemed to work, more or less, until recently, when an ICO and its perpetrators started promoting otherwise.

Don't fall for the re-contextualization by an ICO scam that wants to fraudulently call the innovative, free and open protocol we invented for relaying blocks as fast as possible a "service", so that they can sell unsophisticated investors on investing in their competing service.

And as far as "no comments to make about its feasibility" goes-- you just wrote "but there is no way to go further than BIP 152".  So you do have comments-- you're simply incorrect about them.
aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 27, 2018, 07:12:37 PM
Merited by ABCbits (1)
 #16

@gmaxwell,
I just realised that your comment about BIP 152 and the current bitcoin gossip protocol-- allowing SPV-validated block relay, and not banning peers if a relayed block turns out to be invalid-- is correct, and that I've been overlooking this fact in our debate. Actually, I now remember my previous readings in this regard, and I think it's Matt's fault Grin seriously, he has highlighted the SPV-validation thing in a confusing way.

Anyway, you are right that Fibre is not the same thing as Matt's relay network, and that it is possible to implement it directly in the bitcoin relay network. I hereby retract my statements that may have caused confusion in this regard.

About the current bitcoin relay network-- and this is what I prefer to discuss in detail-- I doubt that adding broadband features like FEC over UDP to bitcoin itself (rather than spontaneously in some relay networks) would be of much help, at least with current technology.

To be more specific: packet loss is more probable in wireless/broadband communications, and it is reasonable for a service like the one Blockstream has launched to use FEC as extra information to prevent retransmissions (which are not possible in a unidirectional setup anyway), but for terrestrial links it is not a critical issue and doesn't look likely to change things disruptively.

From this I conclude that we are not expecting a major event in the networking layer in the near future. Do you think otherwise?
gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 27, 2018, 08:34:23 PM
Merited by Foxpup (3), ABCbits (1)
 #17

To be more specific: packet loss is more probable in wireless/broadband communications
On a good day, transcontinental internet links lose more than 1% of packets. The way congestion control in TCP works essentially guarantees loss on links unless they are massively over-provisioned/under-utilized-- packet loss is the mechanism the network uses to tell senders that a link is full-- especially since ECN is not ubiquitous. Periods of 3% loss are not uncommon on international links, and even worse is common in and out of China due to firewalling and weird national telecom policies.

To give some concrete examples: Akamai claims 1.94% packet loss from Atlanta to Amsterdam, 4.5% from Seattle to Sydney, and 4.12% from LA to Tokyo. (Akamai's numbers seem a bit high to me, but only a bit.)

Without a round-trip-less protocol, packet loss during block transmission imposes an additional delay of at least one round trip (and with TCP the effect is even worse). Wireless wasn't a consideration in the design of Fibre.  Without handling this, you cannot get the 99th-percentile worldwide block transmission times under 450ms regardless of your link bandwidths (or the 95th percentile, per Akamai's loss figures).
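To see why even ~1% loss bites on essentially every block, a quick illustrative calculation (assuming a block that spans ~1300 packets and independent per-packet loss):
Code:
# Chance that a 1300-packet block transfer sees at least one lost packet.
loss_rate = 0.01        # 1% per-packet loss, assumed independent
packets = 1300          # roughly a full block's worth of UDP-sized packets

p_clean = (1 - loss_rate) ** packets
print(f"P(no loss) = {p_clean:.1e}")    # ~2e-6
# Virtually every block transfer hits a loss event, so a retransmission-based
# protocol pays at least one extra RTT on virtually every block.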

The FEC in Fibre isn't used exclusively, or even primarily, to deal with packet loss in any case.  It's used to compensate for missing transactions without the sender knowing which transactions the receiver is missing. Without doing this, you cannot get the ~90th-percentile worldwide block transmission times under 450ms, and you cannot avoid massive propagation penalties when unexpected transactions are included.

The only reason that use of fibre wouldn't be a several-times speedup in the 99th-percentile worldwide block propagation is simply that it's already in use, and we gain most of its benefits even if only a few nodes in each geographic region use it, since they blast through the lossy paths and the other nodes in their regions simply get the blocks from the places that have them.

This is important for mitigating harm from mining centralization, because a centralized miner can achieve equally fast transmission of their own blocks using simpler protocols: they know in advance which transactions they'll include, and they can extend their own next block without waiting for propagation.

Quote
We are not expecting a major event in the networking layer in the near future. Do you think otherwise?
I know otherwise, since we are working to replace transaction rumouring with a reconciliation protocol, which will cut tx-rumouring bandwidth usage by on the order of 40x. After that's complete, we'll probably work on integrating fibre, for the above reasons.
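For a rough feel of where a ~40x saving can come from, consider how redundant flooding is (all numbers here are illustrative assumptions):
Code:
# Naive INV flooding: every transaction is announced on every link.
inv_bytes = 36            # approx. bytes per INV announcement on one link
connections = 8           # assumed outbound peers; listening nodes have many more
txs_per_day = 300_000     # assumed daily transaction volume

flood = txs_per_day * inv_bytes * connections
print(f"~{flood / 1e6:.0f} MB/day in announcements alone")   # ~86 MB/day
# A reconciliation protocol exchanges compact sketches sized by the set
# *difference* between peers, so nearly all of that redundancy disappears.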

We're also still actively merging improvements that were first deployed in the fibre implementation, e.g. supporting multiple outstanding concurrent getblocktxn decodes for compact blocks.
aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 27, 2018, 09:43:33 PM
 #18

The FEC in Fibre isn't used exclusively, or even primarily, to deal with packet loss in any case.  It's used to compensate for missing transactions without the sender knowing which transactions the receiver is missing. Without doing this, you cannot get the ~90th-percentile worldwide block transmission times under 450ms, and you cannot avoid massive propagation penalties when unexpected transactions are included.
I've become somewhat confused by this comment. The way Matt puts it, FIBRE uses UDP and streaming-like FEC extra bits to tackle network-layer (IP) packet loss. Now you are speaking of application-layer (bitcoin) extra information requirements for pushing txn data. What am I missing here?

I'll come to other points later.
gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 27, 2018, 11:45:53 PM
Last edit: December 28, 2018, 12:22:59 AM by gmaxwell
Merited by Foxpup (5), ABCbits (1), bones261 (1)
 #19

FIBRE uses UDP and streaming-like FEC extra bits to tackle network-layer (IP) packet loss. Now you are speaking of application-layer (bitcoin) extra information requirements for pushing txn data. What am I missing here?
The whole idea, essentially! Smiley   Don't worry, you're in common company.

Fibre never sends the block data itself. Not a single blinking packet of it.

Many people get stuck thinking that it sends the block and then additional correction data to recover from packet losses, because they've heard some description of how FEC is used elsewhere-- but Fibre doesn't work like other things.  This misunderstanding gets in the way of actually understanding what it does.

The way fibre works is that it sends only correction data.  The receiver uses the correction data to correct holes in its knowledge. The holes come from transactions in the block that weren't known to the receiver-- not from packet loss. In technical language: we construct a perfectly efficient list reconciliation protocol out of a non-systematic erasure code.

Say a block packetizes out to 1300 packets.  Because of missing transactions in your mempool, you only have 1297 packets' worth of transaction data for this block. Fibre sends you a compact block and additional packets. As the additional packets come in you also stream them to your fibre peers, and as soon as you have received _three_ of those packets (1300 - 1297 = 3) you will reconstruct the block, tell your fibre peers to stop sending data for that block, and start relaying the block to your non-fibre peers via BIP152.  At the same time, you'll start generating novel correction data of your own and streaming it to the fibre peers that haven't told you they have had enough.
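To make the "only correction data" idea concrete, here is a toy sketch (not Fibre's actual code-- the real thing is a carefully engineered linear-time erasure code-- just random linear combinations over GF(2) with Gaussian elimination). A receiver that already knows 13 of 16 fragments recovers the other 3 from any few coded packets, regardless of which packets arrive or in what order:
Code:
import random

rng = random.Random(42)
N = 16                                       # fragments per "block" (toy size)
fragments = [rng.randrange(256) for _ in range(N)]   # one byte per fragment

def coded_packet():
    """Sender: XOR of a random nonempty subset of ALL fragments.
    Non-systematic: no packet ever carries a raw fragment by itself."""
    mask = rng.randrange(1, 1 << N)          # bitmask choosing the subset
    val = 0
    for i in range(N):
        if (mask >> i) & 1:
            val ^= fragments[i]
    return mask, val

# The receiver already knows every fragment except these (unseen transactions):
missing = {3, 7, 12}
known = {i: fragments[i] for i in range(N) if i not in missing}

pivots = {}                                  # pivot bit -> reduced (mask, val)

def absorb(mask, val):
    """Receiver: fold one coded packet into a GF(2) linear system."""
    # XOR out everything already known; leftover bits are the unknowns.
    for i, frag in known.items():
        if (mask >> i) & 1:
            mask ^= 1 << i
            val ^= frag
    # Gaussian elimination: reduce against stored rows, keep if independent.
    while mask:
        p = mask.bit_length() - 1
        if p not in pivots:
            pivots[p] = (mask, val)
            return
        pmask, pval = pivots[p]
        mask ^= pmask
        val ^= pval

received = 0
while len(pivots) < len(missing):            # need rank == number missing
    absorb(*coded_packet())
    received += 1

# Back-substitute (lowest pivot first) to recover the missing fragments.
recovered = {}
for p in sorted(pivots):
    mask, val = pivots[p]
    for q, qval in recovered.items():
        if (mask >> q) & 1:
            val ^= qval
    recovered[p] = val

assert all(recovered[i] == fragments[i] for i in missing)
print(f"recovered {len(missing)} fragments from {received} coded packets")
Over GF(2) a random packet is occasionally linearly dependent, so this toy receiver typically needs a packet or two beyond the bare minimum; Fibre's real code is engineered so that overhead is negligible and decoding is linear-time.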

If two fibre peers get a block at about the same time, they will both start streaming it at once.  You only need enough packets from all peers combined to recover your block: in the example above, packets one and two might have been generated by peer A, and three might have originally been generated by peer B.

When a source generates packets, it round-robins them out to its peers, and those peers send them to each other as they come in. So imagine five fibre-speaking nodes: A, B, C, D, E.  A is originating the block.  B, C, D are missing three packets and E is missing four.  After A has sent three packets out its network interface, B, C, and D will all recover the block, because they shared the data with each other. E will recover once it gets a fourth packet, which either comes from A or can be generated by any of B/C/D, because having recovered the block they can generate more novel packets on their own. This makes good use of the full bisection bandwidth of the network, rather than being bottlenecked by the outbound bandwidth of the source(s).

Even if you knew nothing (your mempool was empty, or the block contained no previously relayed transactions), you'll recover the block once 1300 packets (again, from any source) come in. It doesn't matter which packets come in: each contributes equally to your effort to recover the missing data. You never need to make a round trip to recover missing data, no matter what is missing or how much: this is critical because a link around the earth can have an RTT of hundreds of milliseconds.

In this protocol, packet loss is pretty much irrelevant-- it's not that Fibre "repairs it" so much as it never matters in the first place.  If you need six packets, you don't care if you get packets 1, 2, 3, 4, 5, 6 or 2, 6, 4, 8, 10, 12 or 4, 8, 15, 16, 23 and 42... you just need six, any six, from any source, in any order, because that's how much you didn't know in advance before the block was even created. The only effect packet loss has: if packet 4 is sent 0.01ms later than packet 3, and you needed three packets to recover, and got 1 and 2 but packet 3 got lost along the way, your recovery ends up happening 0.01ms later than it would have absent the loss, when packet 4 comes in-- unlike ordinary transmission, where it would happen hundreds of milliseconds later, after you've requested and received a retransmission. If you only needed two packets, because you knew more of the transactions in advance, you'd finish after 1 and 2 came in.

If your link is momentarily interrupted and you miss the first 1400 packets sent, you'll recover from the next N when the link comes back up-- however many you were missing.  Fibre keeps sending until the peer says it has enough-- effectively it's a rateless code (well, technically the implementation defaults to a maximum of eight times the size of the block, simply to avoid wasting endless amounts of bandwidth on a peer that has gone offline and will never say stop; on blocksat-- where there is no back-channel-- it's configured to send 2x the block size, and it repeats data on a ~24-hour loop as excess capacity is available).

I've left out a bunch of fine details to capture the basic understanding (e.g. the correction doesn't really need to work on a whole-packet basis; it's just easier to explain that way).

UDP is used so that TCP's loss correction retransmissions, reordering, and rate control don't get in the way.
aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 28, 2018, 02:20:34 PM
Last edit: December 28, 2018, 02:55:43 PM by aliashraf
 #20

@gmaxwell
You did great job describing FIBRE, I appreciate that, your dedication is very impressive Smiley

As I understand from your explanation, FIBRE is an application-layer implementation of FEC that uses an erasure-code algorithm, i.e. bit erasures instead of bit errors. Essentially, the receiver treats missing txns as lost packets and known txns as already-received packets (their encoded versions), which is very smart.

I definitely need more specific information about the erasure-code algorithm used for FEC in FIBRE. A block-coding algorithm allows sequential pre-computation of code words using pre-knowledge of txns, which is essential for our purpose, but the blocks are too local and don't carry message-wide information: you decompose the message into adjacent segments and encode each segment independently, which is not helpful in recovering long-distance errors. A convolutional coding method, on the other hand, is inadequate for pre-computation of code words and blocks by the receiver, while being good for long-distance errors.

Despite the above confusion, I've tried to storyboard block relay in FIBRE as follows:

Upon reception of the compact block (typically consisting of the header, the coinbase txn, and the ids of the block's txns), the receiver uses its pre-knowledge of txns to build the message (block) in its entirety.
In the very common case of missing txns:
A- The receiver constructs/guesses the sequence of encoded message fragments (code words) using its partial pre-knowledge of the embedded txns, which ends in realising that there are m missing fragments, where it suffices to get m' distinct ones to recover the block, and m' < m for being able to act as a transmitter for other peers, besides relaying the incoming stream meanwhile.

B- The receiver initiates the protocol by requesting encoded data (the sequence of code words/packets) from peer(s).

C- The sender(s) start transmitting datagrams containing the fragments of the encoded message.

D- Because of the erasure-code FEC used, the receiver is able to decode/recover the message at the very moment it has successfully received m' of the m missing code words.

E- Once it is ready to decode the whole message and fill the gap in its knowledge of the block, the receiver interrupts and ends the session with the sender, while continuing block relay to other peers.

I'm curious how exact this illustration is, but I won't personally be satisfied until I'm informed of the exact coding schema.

For now:
1- I suppose the packet-loss concept in the networking/transport layer is radically different from its counterpart in the context of this discussion, missing txns. So I don't see enough reason to go for UDP the way BlueMatt highlights it: it is happening at the application layer, and it would be easy to push reasonably sized messages using TCP over the same connection we send the original compact block on. For raw multi-kB compact blocks, TCP is a hell of a reliable transport protocol.
As for reordering and retransmission in TCP, I think they are negligible compared to the overhead of missing txns.

2- I think the non-systematic nature of the coding implies frequently used encoding/decoding that is a good candidate for hardware acceleration (like what happens for DVB). If I knew the exact FEC algorithm, or at least its class, I could discuss this issue a bit more, but it is worth mentioning that such a possibility is an encouraging factor for porting the protocol to UDP.


P.S.
I'm a bad boy Tongue
I deliberately pushed you into telling us more about how promising the future of latency and propagation-delay control is, and you did it. Now you are exhausted, but I'm happy. You know how enthusiastic I am about on-chain scaling opportunities, and it is very promising that you guys are working on even more improvements after BIP 152. I'm happy, as I said.  Smiley
gmaxwell
Moderator
Legendary
*
expert
Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 28, 2018, 03:16:49 PM
Merited by ABCbits (1)
 #21

A block-coding algorithm allows sequential pre-computation of code words using pre-knowledge of txns, which is essential for our purpose, but the blocks are too local and don't carry message-wide information: you decompose the message into adjacent segments and encode each segment independently, which is not helpful in recovering long-distance errors.
The block for error correction is the entire block, so it is entirely non-local. You normally hear about error coding using small blocks because, when you are correcting arbitrary errors and not just erasures, the computation for decoding becomes prohibitive when coding large amounts at once. But we only need to handle erasures, and that makes it tractable to code everything at once.

Quote
Upon reception of the compact block (typically consisting of the header, the coinbase txn, and the ids of the block's txns), the receiver uses its pre-knowledge of txns to build the message (block) in its entirety.
In Fibre the short-ids are somewhat different from compact blocks, because they also include transaction lengths. These are needed so the receiver knows how big the holes for the missing transactions are.

Quote
In the very common case of missing txns:
A- The receiver constructs/guesses the sequence of encoded message fragments (code words) using its partial pre-knowledge of the embedded txns, which ends in realising that there are m missing fragments, where it suffices to get m' distinct ones to recover the block, and m' < m for being able to act as a transmitter for other peers, besides relaying the incoming stream meanwhile.
Yep-- well, m' = m: it needs to get as many fragments as are missing.  The block, broken into fragments, is n fragments long. The receiver is missing m fragments, m <= n (obviously), and can decode once it has received any m correction fragments sent by the sender.

Quote
B- The receiver initiates the protocol by requesting encoded data (the sequence of code words/packets) from peer(s).
Not quite-- that would require a round trip, which is fatal in terms of delay: bam, several hundred milliseconds out the door for the request to reach the sender and for the sender's reply to get back.

So instead the sender immediately sends encoded data when it gets a block, and the receiver tells it to stop when it has had enough. If the receiver was missing nothing, it just gets some fragments it doesn't need, sent during the time it took the sender to hear its stop request. It throws them out (or relays them on to other peers that haven't said stop, then throws them out).

Otherwise your description seems okay.

Quote
original compact block on. For raw multi-kB compact blocks, TCP is a hell of a reliable transport protocol.
As for reordering and retransmission in TCP, I think they are negligible compared to the overhead of missing txns.
TCP is a hell of a slow transport over long-distance links.  As I pointed out, on international internet links there is typically 1%+ packet loss (Akamai says more, but let's use 1% for discussion).  If TCP were used and even one packet were dropped, then WHAM: the connection is stalled for hundreds of milliseconds waiting for a retransmission, and the transmission speed is further cut down until the window regrows. Measurements from before fibre and such showed that the effective block transmission speed between mining pools was about 750kbit/sec.

TCP's reliability only hurts and doesn't help-- fibre recovery doesn't care which of the sent fragments it gets, it just needs enough of them... so pausing the whole stream just to retransmit a lost one doesn't make much sense.

Missed transactions and lost packets on long links happen at about similar rates-- 1%-10%. Regardless, if TCP were used, the transmission would stall every time a packet was dropped, so losses couldn't be disregarded unless they were very rare.
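To put that 750kbit/sec figure in perspective (a hypothetical worked example; the block size is assumed):
Code:
# Time to move a 1 MB block at the ~750 kbit/s effective rate measured
# between pools before fibre-style relay (illustrative only).
block_bytes = 1_000_000
effective_Bps = 750_000 / 8             # 750 kbit/s in bytes per second

print(f"~{block_bytes / effective_Bps:.1f} s per block")    # ~10.7 s,
# versus roughly the one-way delay plus ~10 ms with a round-trip-less protocol.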

Quote
2- I think the non-systematic nature of the coding implies frequently used encoding/decoding that is a good candidate for hardware acceleration (like what happens for DVB).
The code we are using was designed specially to be very, very fast on general-purpose hardware. Almost all of its decoding inner work is accomplished via SIMD XORs. This was a major barrier to making fibre realistic after I originally proposed the general idea back in 2013-- a naive implementation took too long to decode. Above, I said that the decode happens within the speed-of-light delay plus 10ms or so; much of that 10ms is the FEC decode (obviously, if nothing was missing, no FEC decode is needed). The FEC decode is also faster in the common case that relatively few fragments are missing.



aliashraf
Legendary
*
Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 28, 2018, 05:18:40 PM
Last edit: December 28, 2018, 06:23:34 PM by aliashraf
 #22

A block-coding algorithm allows sequential pre-computation of code words using pre-knowledge of txns, which is essential for our purpose, but the blocks are too local and don't carry message-wide information: you decompose the message into adjacent segments and encode each segment independently, which is not helpful in recovering long-distance errors.
The block for error correction is the entire block, so it is entirely non-local. You normally hear about error coding using small blocks because, when you are correcting arbitrary errors and not just erasures, the computation for decoding becomes prohibitive when coding large amounts at once. But we only need to handle erasures, and that makes it tractable to code everything at once.
But coding everything at once requires total knowledge of the message, which is not guaranteed for the receiver. Hence its partial knowledge becomes useless.

Actually, the Iranian scientist Amin Shokrollahi invented Raptor codes, with the excellent property of being linear-time in encoding/decoding. Raptor codes are the only known linear-time codes in the Fountain class of coding schemes; they are widely used in industry, and I think one flavor of these codes, Online Codes, seems pretty much adequate for what we are discussing.

Raptor codes in general, and Online Codes especially, are important because of their linear complexity. Online Codes are described by their inventor, Petar Maymounkov, as
Quote
from: https://cs.nyu.edu/media/publications/TR2002-833.pdf (Extended abstract of original paper)
a class of near-optimal codes for a very general loss channel which we call the free channel. Online codes are linear encoding/decoding time codes, based on sparse bipartite graphs, similar to Tornado codes, with a couple of novel properties: local encode-ability and rateless-ness. Local encode-ability is the property that each block of the encoding of a message can be computed independently from the others in constant time. This also implies that each encoding block is only dependent on a constant-sized part of the message and a few preprocessed bits. Rateless-ness is the property that each message has an encoding of practically infinite size.

I strongly recommend taking a closer look at this class of FEC erasure codes.
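
For a feel of how fountain-class codes get local encode-ability and rateless-ness, here is a toy LT-style encoder (a simplification: real Online Codes add an outer pre-code, and real LT codes draw degrees from a soliton-like distribution rather than the capped uniform one used here):

Code:
import random

def encode_block(message_blocks, seed):
    """Each coded block is the XOR of a small random subset of message blocks,
    so it is computable independently of every other coded block."""
    rng = random.Random(seed)    # the seed lets the receiver re-derive the subset
    degree = rng.randint(1, min(4, len(message_blocks)))
    indices = rng.sample(range(len(message_blocks)), degree)
    out = bytes(len(message_blocks[0]))
    for i in indices:
        out = bytes(a ^ b for a, b in zip(out, message_blocks[i]))
    return seed, out

msg = [b"ab", b"cd", b"ef", b"gh"]
stream = (encode_block(msg, seed) for seed in range(10**9))  # practically infinite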

The thing is, I don't know which FEC coding algorithm is used or intended to be used by you guys. Would you please enlighten me about it?

Quote
Quote
Upon reception of the compact block (typically consisting of the header, coinbase txn and ids of block txns), the receiver uses its pre-knowledge of txns to build the message (block) in its entirety.
In Fibre the short-ids are somewhat different from compact blocks because they also include transaction lengths. These are needed so it knows how big the holes for the missing transactions are.
I knew it! The first thing that came to my mind was the absolute necessity of txn length information; still, Matt says BIP 152 and compact blocks were intentionally designed to be compatible with FIBRE  Roll Eyes

gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 28, 2018, 06:08:08 PM
Last edit: December 28, 2018, 06:42:29 PM by gmaxwell
 #23

Quote
But coding everything at once requires total knowledge of the message, which is not guaranteed for the receiver. Hence the receiver's partial knowledge becomes useless.
It is guaranteed for the sender, which is the party doing the coding. The receiver will know the total message when he is done decoding.

Quote
Raptor codes, which have the excellent property of being linear-time in encoding/decoding. Raptor codes are the only known linear-time codes in the fountain class of coding schemes,
The code we use (feel free to check out the code base) is functionally similar (it is a linear-time-to-decode quasi-fountain code) but without the obvious patent problems.

Careful though: "linear in time" does not necessarily mean fast.  

There is now also a plain 16-bit Reed-Solomon implementation which is _almost_ as fast, through use of FFT-based methods. I expect that when we make fibre-like forwarding a part of Bitcoin Core we'll use an RS code, since it's just so much easier to specify, is more likely to get even faster still on future hardware, and is also easy to hardware-accelerate. It's also much easier to make a functional but slow implementation of (part of why it would be easier to specify).

For the purpose of understanding fibre, it's sufficient to imagine that we already made that change and it's coding, say, a 2-million-byte block into 3906 fragments, each 512 bytes long and containing 256 16-bit RS code symbols, for 256 parallel RS codes that are each 3906 symbols long, spanning the whole block.
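
As a back-of-the-envelope check on those parameters (illustrative arithmetic only, not FIBRE code):

Code:
import math

block_bytes    = 2_000_000
fragment_bytes = 512
symbol_bytes   = 2                       # 16-bit RS symbols

fragments        = math.ceil(block_bytes / fragment_bytes)  # 3907 (~3906 above)
codes_per_block  = fragment_bytes // symbol_bytes           # 256 parallel RS codes
symbols_per_code = fragments             # one symbol of each code per fragment,
                                         # so losing one fragment costs each of
print(fragments, codes_per_block)        # the 256 codes exactly one erasure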

The primary functional difference between that and what's actually in use is that it would be limited to sending ~4x the size of the block in data without repetition... and what's actually in use has a decoder that is maybe 10% faster than the state-of-the-art decoder for a 16-bit RS code of that size.

[I originally proposed the idea using RS codes back in 2013, but back then the fastest suitable implementation we had was >10x slower... which made it uninteresting.]

That understanding lets you see that, e.g., a missing transaction causes a need for reconstruction data equal to its own size plus zero to two more fragments, depending on how the txn aligned to the fragments. There is a trade-off between fragment size and decode time... smaller fragments would cause less loss overhead but take longer to decode. The current parameters in fibre were just tuned to give the minimum high-Nth-percentile total decode delay.
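
A small sketch of that alignment effect, assuming the 512-byte fragments from the example above (the helper is hypothetical, not FIBRE code):

Code:
FRAGMENT = 512

def fragments_spanned(offset, length):
    """Number of whole fragments touched by a missing byte range."""
    first = offset // FRAGMENT
    last = (offset + length - 1) // FRAGMENT
    return last - first + 1

# A 300-byte txn that sits inside one fragment needs 1 fragment's worth of
# reconstruction data; the same txn straddling a boundary needs 2.
print(fragments_spanned(0, 300))    # 1
print(fragments_spanned(400, 300))  # 2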

An earlier version of the fibre protocol used the lengths to reorder transactions for coding purposes, so that they aligned much more often with fragment boundaries.  But it turned out the extra copies required to reorder the block added more delay than the alignment saved, so it was dropped.

Quote
they are widely used in the industry
Actually, I was unable to find pretty much any production systems using fountain codes. That's probably part of the reason for the relative lack of actually fast implementations, versus theoretically fast ones that aren't. [Not that it really matters, but this delayed the deployment of this class of technique by a couple of years.]
aliashraf
Legendary
*
Offline Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 28, 2018, 07:54:27 PM
 #24

Quote
Careful though: "linear in time" does not necessarily mean fast.
For Tornado, Raptor and Online codes, time linearity comes with no hidden big constant; hence they are theoretically the most optimized algorithms available nowadays.

Quote
There is now also a plain 16-bit reed solomon implementation which is _almost_ as fast through use of FFT based methods.  I expect that when we make fibre-like forwarding a part of bitcoin core we'll use an RS code since it's just so much easier to specify, also more likely to get even faster still on future hardware... also easy to hardware accelerate. It's also much easier to make a functional but slow implementation (part of why it would be easier to specify).
Reed-Solomon coding suffers from two weaknesses:
1) It is O(n log n) in encoding/decoding.

2) Along with many other algorithms, it relies on assumptions about the probability of loss, δ, to perform optimally. Such assumptions seem problematic for a coupon-collector-like problem such as missing txns in bitcoin.

For instance, suppose you want to tune RS parameters to support a 10% loss channel, i.e. a channel with 90% capacity; you need to choose n, k such that R = k/n <= 0.9. Now even if you choose them to be, say, (8, 10), you definitely run a high risk of not being able to recover the original message from 8 out of 10 encoded blocks on a 30%-noisy day.
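
That risk can be computed directly for an (n, k) MDS erasure code under independent symbol loss p; a quick sketch (illustrative only) for the (8, 10) example:

Code:
from math import comb

def p_decode_fail(n, k, p):
    """Decode fails if fewer than k of the n coded symbols arrive."""
    return sum(comb(n, i) * (1 - p)**i * p**(n - i) for i in range(k))

print(p_decode_fail(10, 8, 0.10))  # ~0.07: ~7% failure on a 10%-loss day
print(p_decode_fail(10, 8, 0.30))  # ~0.62: ~62% failure on a 30%-noisy day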

In the bitcoin block relay problem, having no assumption about δ looks more sensible, which is achievable via the rateless-ness feature of Online Codes and their support of general loss channels, also called free channels.

Quote
For the purpose of understanding fibre, it's sufficient to imagine that we already made that change and it's coding, say, a 2-million-byte block into 3906 fragments, each 512 bytes long and containing 256 16-bit RS code symbols, for 256 parallel RS codes that are each 3906 symbols long, spanning the whole block.
I'll do my best to decode your numbers and the source code for further contributions.


As for Raptor not being used in industry, you are correct; I mixed it up with something else. But a strength of this class of coding is the extensive effort Shokrollahi has dedicated to standardization, and this has been widely recognized, such that he has received
Quote
the IEEE Eric E. Sumner Award in 2007 together with Michael Luby "for bridging mathematics, Internet design and mobile broadcasting as well as successful standardization"[4] and the IEEE Richard W. Hamming Medal in 2012 together with Michael Luby "for the conception, development, and analysis of practical rateless codes"
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8419



View Profile WWW
December 28, 2018, 09:22:25 PM
Merited by Foxpup (8), danda (2), ABCbits (1), aliashraf (1), DaCryptoRaccoon (1)
 #25

Quote
For Tornado, Raptor and Online codes, time linearity comes with no hidden big constant; hence they are theoretically the most optimized algorithms available nowadays.
And existing implementations that I've seen are not just several times slower than the code we use, but also several times slower than a 16-bit RS code that is implemented very well, at least for message sizes of interest to us.

It sounds like you are parroting stuff on the internet written by solution partisans and by people without a lot of practical experience, without any experience with it yourself. We tried these things. They were slow. Feel free to knock yourself out looking into them further; I agree they are very interesting.

On a modern computer a "fast algorithm" that cannot make effective use of SIMD will lose access to (say) 93% of the cpu's peak throughput. Turns out that can hide a pretty big log n factor! Smiley One thing to keep in mind when looking at algorithm asymptotics is that it is often reasonable to read "log n" as "a small constant", because no problem that fits in memory on your target computer will make log n larger than (say) 30... and often (but not always) any problem that doesn't fit in memory will have its time utterly dominated by IO costs, and then cache behaviour is critical, which asymptotics say nothing about. (Peeling decoders, in particular, have poor cache locality and poor branch-prediction friendliness.)

It's very easy to have a factor-of-thirty difference between implementations. So it's not especially surprising to see an O(N log N) algorithm matching an O(N) one... even on large inputs. I've also found that it's not that uncommon for N^2 algorithms on realistic problems to beat N log N ones in practice (often due to cache locality), and it's fairly common for O(N) algorithms to beat O(log N) ones for problems small enough to fit in L3 cache (usually due to branch prediction and parallelizability).

In any case, for the workload presented by fibre the fastest code implemented for it that we've been able to find is a mixed-density quasi-random graph code... and the fairly close runner-up is an FFT-based 16-bit RS code (not coincidentally written by the same author).

Quote
For instance, suppose you want to tune RS parameters to support a 10% loss channel, i.e. a channel with 90% capacity; you need to choose n, k such that R = k/n <= 0.9. Now even if you choose them to be, say, (8, 10), you definitely run a high risk of not being able to recover the original message from 8 out of 10 encoded blocks on a 30%-noisy day.

In the bitcoin block relay problem, having no assumption about δ looks more sensible, which is achievable via the rateless-ness feature of Online Codes

You can just as well use one "ratelessly" by sending as much as is needed: you just end up with a minimum bound on the rate that arises from the RS symbol size, since you can only generate so many non-duplicate symbols. There is no overhead created by doing this (and in fact all the other options are not MDS codes, so they have overhead, though e.g. for ours the probability of a failed decode when you have enough data is 0.03^(extrapackets+1), so negligible, but technically it's overhead).

Say you have 512 bits you want to send. You can code them as 32 symbols in a 16-bit RS code, then send up to 65535 symbols (128 kB) (or 65503 non-systematic symbols), and the receiver will recover once they've received any 32 of them. So this code works for a rate of 1.0 down to 0.00048. A true rateless code could drive the rate arbitrarily close to zero, but for our application mempool losses alone mean we never need a code with a rate of less than 0.5.
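
Recomputing those numbers (a sketch of the arithmetic, nothing FIBRE-specific):

Code:
payload_bits = 512
symbol_bits = 16
k = payload_bits // symbol_bits      # 32 source symbols
n_max = 2**16 - 1                    # 65535 distinct 16-bit RS symbols

max_send_bytes = n_max * symbol_bits // 8   # ~128 kB of non-duplicate symbols
min_rate = k / n_max                        # ~0.00048
print(k, max_send_bytes, round(min_rate, 6))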

A code with a minimum rate of 0.00048 is sufficient with 100% of the mempool missed for up to _99.95%_ packet loss.

While "we will not simultaneously lose more than 80% of all packets and have no mempool hits" is not technically "no assumption" it's close enough for engineering purposes: at 80% loss the FEC is not really the system performance limiting consideration in any case (the limiting one is that any link that is currently losing 80% packets is very likely about to start losing 100% Smiley ). If you have more than 80% packet loss, you have worse things to worry about, also if you have 80% packet loss the efficiency loss from repeating packets is not particularly substantial (since you're losing most packets, its unlikely that you received any particular duplicate), so if you really did want a lower rate code you could just start duplicating packets.   Aside, a couple messages up you were also trying to argue that packet loss wasn't relevant. Smiley

In practice, because we don't want to waste unbounded amounts of bandwidth on a down peer, fibre currently only sends up to 8x the size of the block, so we never use a rate less than 0.125. Even under extremely high packet loss, a rate of 0.333 is more than low enough to accommodate the loss with no mempool hits at all. The mean rate we need for decode is more like 0.94, since missing tx data and packet loss together are about 6% or so.

Quote
As of Raptor, not being used in the industry, you are correct, I mixed it up with something else,
I expect that we won't see those codes widely deployed until after the patents expire. There are few enough applications where actual ratelessness is all that useful compared to the wide rate flexibility that you can achieve with many fast, well-understood codes-- especially by using a product of two codes (like two layers of RS codes, or RS + a duplication code, as I mentioned above).

aliashraf
Legendary
*
Offline Offline

Activity: 1456
Merit: 1174

Always remember the cause!


View Profile WWW
December 28, 2018, 11:08:54 PM
 #26

Greg,
Your latest replies totally convinced me that your approach to block relay is professional and thoughtful. I wanted to say it in words, because unfortunately I'm short of sendable merit right now  Smiley

I need more time to do my own research, both on the different options available and on the FIBRE code; in case of any interesting findings, I won't hesitate to share.