Hello everyone, my apologies for not quoting or following up on a specific post. I've read and enjoyed them all, and now I have something to share:
A tribute to redundancy
Any software is susceptible to bugs lurking in the shadows, waiting for showtime.
However, bugs are not magic: they can be identified and fixed through code review and testing. Yet there is a trade-off between 1) the resources one dedicates to this process, plus the opportunities lost to delays in launching the code, and 2) the actual risks involved that have to be addressed. Unlimited resources and time could, in theory, address almost all the risks, but that is not how we develop software in the real world.
So, what alternatives do we have for mission-critical software where failure is not tolerable? The established practice is redundancy. Contrary to the prevailing view here, I support redundant alternative clients.
As for the fear, uncertainty and doubt that Satoshi triggered about possible chain splits: I'm not proposing a naive scenario in which competing clients expose the network to unintentional forks, etc. On the contrary, I think it was Satoshi who exposed bitcoin to bug-related risks by discouraging alternative implementations. Hence, CVE-2018-17144 was his fault and not Pieter Wuille's. Devs are entitled to make a mistake once or twice a year; it is the architect who is in charge of fault tolerance and risk management.
Obviously, for bitcoin as a new class of distributed system, naively recommending multiple implementations is not reasonable, and neither is the unbelievable recommendation of "one god, one client"! As a more detailed assessment of the risks reveals, there are 3 classes of intolerable, high-priority failures due to bugs, and none of them is truly addressed by either of the two architectures under consideration. Let's take a look:
Risk #1: Unintentional forks happen, with disastrous consequences, because different nodes run different clients or different versions.
Risk #2: Nodes explicitly and widely violate the official protocol specs, yet consensus is reached by a considerable portion of the network.
Risk #3: Healthy, protocol-compliant nodes are unable to bootstrap as a result of milder versions of the above case, where the breach is considered tolerable and hence no roll-back or even bug-fix is applied.
Single (dominating) client architecture:
#1- Is not fully adequate even for risk #1, because of versioning and the slow adoption process. Despite its huge dominance in the network, bitcoin development and versioning follows an exaggerated conservatism and spends an exhaustive amount of time and resources on the upgrade process, a clear deviation from state-of-the-art agile development methodologies, at a huge cost in resources and lost opportunities.
#2- Offers no measure at all against this risk. The CVE-2018-17144 mentioned above is the nearest case (yet not an exemplary one) of a bug that can stay in the shadows for a long time, inherited from version to version, gaining enough dominance in the network and waiting for showtime, usually the worst time possible.
#3- Can enforce workarounds: Core devs treat such cases as "consensus bugs" that BIPs are required to comply with. A reasonable approach, I have to admit.
Naive multiple clients architecture (where nodes deliberately install different implementations):
It suffers from the same problems as the single client model, with different degrees of severity: better for risk #2 and worse for #1, to be specific.
I'd suggest a third model, an Enhanced Heterogeneous Multiple Clients Architecture, as follows:
1-We classify nodes into 2 categories: high-end nodes, whose mission is more critical in terms of their owners' expectations and resource allocation, and ordinary nodes (we expect expectations and resources to be properly aligned, i.e., more expectations ---> stronger nodes installed).
2-We encourage/support the development of multiple alternative clients by different parties, on diverse platforms.
3-We certify and introduce client software after it passes the appropriate checklists.
4-Implementations are supposed to meet a well-defined integration spec via RPC/API/..., to support a robust voting module that is called before changing the state of the blockchain and (optionally) the mempool.
5-We recommend that first-class users install 3 different clients of their choice; ordinary, less critical users are free to run a single client of their choice.
As a result, bugs cause disagreements within high-end nodes, which are resolved by voting and merely trigger bug alerts, while the affected low-end clients are not supposed to cause catastrophic damage. Besides very high fault tolerance, this has an astonishingly large boosting effect on the development process.
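To make the voting idea a bit more concrete, here is a minimal sketch in Python of what such a voting module could look like on a high-end node. Everything in it is my own assumption for illustration: the names vote_on_block and VoteResult, the idea of modelling each client as a plain function that accepts or rejects a block, and the toy validation rule. A real module would talk to the clients over the RPC/API integration spec, which is not defined here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a client: a function that accepts or rejects a block.
Validator = Callable[[bytes], bool]


@dataclass
class VoteResult:
    accepted: bool         # verdict of the majority of clients
    dissenters: List[str]  # clients that disagreed: candidates for a bug alert


def vote_on_block(block: bytes, clients: Dict[str, Validator]) -> VoteResult:
    """Ask every installed client to validate the block, then take the majority."""
    if len(clients) % 2 == 0:
        raise ValueError("an odd number of clients is needed to avoid ties")
    verdicts = {name: validate(block) for name, validate in clients.items()}
    accept_votes = sum(verdicts.values())
    accepted = accept_votes * 2 > len(verdicts)
    dissenters = sorted(name for name, v in verdicts.items() if v != accepted)
    return VoteResult(accepted, dissenters)


# Toy rule, purely illustrative: a block is valid if it is at most 4 bytes long.
def client_a(block: bytes) -> bool:
    return len(block) <= 4


def client_b(block: bytes) -> bool:
    return len(block) <= 4


def client_c(block: bytes) -> bool:
    return True  # simulated bug: accepts everything


result = vote_on_block(
    b"12345",
    {"client_a": client_a, "client_b": client_b, "client_c": client_c},
)
if result.dissenters:
    print("bug alert, dissenting clients:", result.dissenters)
print("apply block to chain state:", result.accepted)
```

In this toy run the buggy client is outvoted, the block is not applied, and the disagreement itself becomes the bug alert. How a client that crashes or times out should be counted is left open in this sketch.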
AFAICT, there is no other comparable architecture for a system such as bitcoin, while, interestingly, huge impacts on everything discussed here are on the horizon, positive impacts for sure. The best of both worlds is achieved, and the network is guaranteed to have an astronomical level of resistance to fatal failures. Use your own imagination.