Multi-language consensus library

TierNolan (OP)

February 17, 2015, 01:05:41 PM

#1

The "consensus" critical part of bitcoin is being split into a separate library.

This is (will be) the part of the code that checks if blocks are valid. The idea is that any new versions of bitcoin will use this library. If no changes are made to the library, then the block validation rules will not change.

More than one release of bitcoin might end up using the library.

This has the nice feature that any updates to the consensus rules would only have to change that library (soft forks at least).

In addition, other client implementations could include this library. This would give very high likelihood that the alternative client would be consensus compatible with the reference client.

If they linked dynamically, users could simply download the new consensus .dll, but that might be a little too easy.

However, not all clients are written in c++.

Auto-generating the source code for other languages from the c++ code would be a way around this.

Ideally, the consensus library would use code that is simple. The alternative language source code would have a warning at the top of each file that it must not be manually modified.

When a new version of the consensus library is released, an official implementation in a few other languages could be produced (or maybe some kind of shared library: .dll, .so, etc.).

The conversion tool could be kept simpler if the reference source code were kept simple.

The API for the library would also have to be portable. The easiest would be to restrict it to primitive types and byte arrays. Structs and/or classes might be definable in a portable way for the converter. The reference source code might not even be in a specific programming language, and even the c++ source code would be generated from it.
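One way to picture a reference that "might not even be in a specific programming language" is a small data description from which every language version, C++ included, is generated. The following Python sketch is purely illustrative: the reference is reduced to a hypothetical table of named integer constants (real consensus logic would need functions, which is the hard part), and the names `REFERENCE`, `WARNING`, `emit_cpp` and `emit_python` are invented here. Each generated file starts with a do-not-edit warning.

```python
# Hypothetical language-neutral "reference": named integer constants only.
REFERENCE = {
    "MAX_BLOCK_SIZE": 1000000,   # serialized block size limit in bytes (2015)
    "COINBASE_MATURITY": 100,    # coinbase maturity, in blocks
}

WARNING = "AUTO-GENERATED FROM THE CONSENSUS REFERENCE. DO NOT EDIT BY HAND."


def emit_cpp(ref: dict) -> str:
    """Emit a C++ header declaring each constant."""
    lines = ["// " + WARNING, "#pragma once", "#include <cstdint>", ""]
    for name, value in ref.items():
        lines.append(f"static const int64_t {name} = {value};")
    return "\n".join(lines) + "\n"


def emit_python(ref: dict) -> str:
    """Emit a Python module declaring each constant."""
    lines = ["# " + WARNING, ""]
    for name, value in ref.items():
        lines.append(f"{name} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(emit_cpp(REFERENCE))
    print(emit_python(REFERENCE))
```

Each emitter only walks the same table, so a C# or Java emitter would follow the same pattern and could never drift from the reference values.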

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

instagibbs

February 17, 2015, 02:02:10 PM

#2

For clarification: Do you mean port the GUI/wallet parts into different languages?

Porting consensus code would defeat the whole purpose.

TierNolan (OP)

February 17, 2015, 02:10:57 PM

#3

I was thinking that the consensus code would be generated from a single reference (probably in c++).

If the API for the library used byte arrays and primitives, then auto-conversion would be pretty easy. The key would be using a subset of c++ to keep things easy to convert.
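As a sketch of what "a subset that is easy to convert" might look like, here is Bitcoin's variable-length integer prefix (CompactSize) decoded from a byte array in Python, using only integers, indexing and shifts. The function name and the `(value, next_offset)` return convention are our own; rejecting non-minimal encodings is a design choice here so that each value has exactly one valid encoding.

```python
# Smallest value that legitimately needs each wider encoding; anything
# smaller written in a wider form is rejected as non-minimal.
_MIN_FOR_WIDTH = {2: 0xFD, 4: 0x10000, 8: 0x100000000}


def read_compact_size(buf: bytes, pos: int) -> tuple:
    """Decode a CompactSize integer starting at buf[pos].

    Returns (value, offset just past the encoding); raises ValueError on
    truncated or non-minimal input.
    """
    if pos >= len(buf):
        raise ValueError("truncated CompactSize")
    first = buf[pos]
    if first < 0xFD:             # values 0..252 are stored in one byte
        return first, pos + 1
    if first == 0xFD:            # 0xFD prefix: 2-byte little-endian value
        width = 2
    elif first == 0xFE:          # 0xFE prefix: 4-byte value
        width = 4
    else:                        # 0xFF prefix: 8-byte value
        width = 8
    if pos + 1 + width > len(buf):
        raise ValueError("truncated CompactSize")
    value = 0
    for i in range(width):       # little-endian: least significant byte first
        value |= buf[pos + 1 + i] << (8 * i)
    if value < _MIN_FOR_WIDTH[width]:
        raise ValueError("non-minimal CompactSize")
    return value, pos + 1 + width
```

Nothing here depends on Python-specific features, so a mechanical translation into C, Java or C# is close to line-for-line.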

A Python implementation (say, Armory) could use the Python version of the library.

Complex crypto functions would have to be implemented differently, but sanity checks could still be performed in the reference code.

For example, a signature's encoding could be checked in the translated code, and the signature then passed to the underlying crypto library. This would reduce the odds of differences in the crypto library's signature encoding rules causing a divergence between the two implementations.

A change to the reference would automatically update all the alternative library versions.
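A minimal Python sketch of the signature-encoding check described above, assuming a client that owns its own ECDSA backend. The encoding rules follow the strict DER checks proposed in BIP 66 (a DER signature followed by one sighash byte); `ecdsa_verify` is a hypothetical placeholder for whatever native crypto library is underneath, and `verify_with_backend` is an invented name.

```python
def is_strict_der_signature(sig: bytes) -> bool:
    """Strict DER check for a signature with a trailing sighash byte.

    Layout: 0x30 [total-len] 0x02 [R-len] [R] 0x02 [S-len] [S] [sighash]
    """
    if len(sig) < 9 or len(sig) > 73:
        return False
    if sig[0] != 0x30 or sig[1] != len(sig) - 3:
        return False
    len_r = sig[3]
    if 5 + len_r >= len(sig):
        return False
    len_s = sig[5 + len_r]
    if len_r + len_s + 7 != len(sig):
        return False
    # R: integer marker, non-empty, non-negative, no unnecessary leading zero.
    if sig[2] != 0x02 or len_r == 0 or sig[4] & 0x80:
        return False
    if len_r > 1 and sig[4] == 0x00 and not sig[5] & 0x80:
        return False
    # S: the same rules.
    if sig[len_r + 4] != 0x02 or len_s == 0 or sig[len_r + 6] & 0x80:
        return False
    if len_s > 1 and sig[len_r + 6] == 0x00 and not sig[len_r + 7] & 0x80:
        return False
    return True


def verify_with_backend(sig: bytes, pubkey: bytes, msg_hash: bytes,
                        ecdsa_verify) -> bool:
    # Reject anything the reference rules reject before the crypto library
    # ever sees it, so its own parsing quirks cannot change the outcome.
    if not is_strict_der_signature(sig):
        return False
    return ecdsa_verify(sig[:-1], pubkey, msg_hash)  # strip the sighash byte
```

Because the backend only ever sees signatures that already passed the reference check, a backend that happens to accept some odd encoding cannot make this client disagree with the reference on that point.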

ScripterRon

February 17, 2015, 02:59:06 PM

#4

I'm in the process of changing my Java node to use the bitcoin consensus library. I'm using the Java Native Interface (JNI) to directly call the consensus library routines. So I can use the so/dll libraries shipped with Bitcoin Core 0.10.

There are just two routines in the library at this time: one returns the library version and the other verifies that a transaction correctly spends an output. The inputs are byte arrays and integers, so there shouldn't be a problem.

I do miss the extended error messages though. The consensus library returns a TRUE/FALSE indicator and nothing to say why it failed (the error code just applies to interface problems such as deserialization failure). It would be nice to get back a message describing exactly why the verification failed (bad public key, signature failure, etc).
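For languages with a built-in foreign function interface, the C stub can disappear entirely. A Python `ctypes` sketch, assuming the 0.10 `bitcoinconsensus.h` declares `unsigned int bitcoinconsensus_version()` and `int bitcoinconsensus_verify_script(const unsigned char *scriptPubKey, unsigned int scriptPubKeyLen, const unsigned char *txTo, unsigned int txToLen, unsigned int nIn, unsigned int flags, bitcoinconsensus_error *err)`. The error enum values are likewise our reading of that header, so check it against your release. `load_consensus` and `verify_script` are hypothetical wrapper names. The returned pair makes the limitation above visible: a pass/fail flag, plus an error code that only covers interface problems.

```python
import ctypes
import ctypes.util
from enum import IntEnum


class ConsensusError(IntEnum):
    # Assumed values of bitcoinconsensus_error in the 0.10 header. They
    # report interface problems only, never *why* a script failed.
    OK = 0
    TX_INDEX = 1
    TX_SIZE_MISMATCH = 2
    TX_DESERIALIZE = 3


def load_consensus(path=None):
    """Load libbitcoinconsensus and declare the (assumed) C prototypes."""
    path = path or ctypes.util.find_library("bitcoinconsensus")
    if path is None:
        raise OSError("libbitcoinconsensus not found")
    lib = ctypes.CDLL(path)
    lib.bitcoinconsensus_version.argtypes = []
    lib.bitcoinconsensus_version.restype = ctypes.c_uint
    lib.bitcoinconsensus_verify_script.argtypes = [
        ctypes.c_char_p, ctypes.c_uint,  # scriptPubKey being spent, length
        ctypes.c_char_p, ctypes.c_uint,  # serialized spending tx, length
        ctypes.c_uint,                   # index of the input to check
        ctypes.c_uint,                   # verification flags
        ctypes.POINTER(ctypes.c_int),    # out: bitcoinconsensus_error
    ]
    lib.bitcoinconsensus_verify_script.restype = ctypes.c_int
    return lib


def verify_script(lib, script_pubkey: bytes, tx_to: bytes, n_in: int,
                  flags: int = 0):
    """Return (script_valid, error_code); only byte arrays and ints cross over.

    flags=0 is assumed to request no optional verification rules.
    """
    err = ctypes.c_int(0)
    ok = lib.bitcoinconsensus_verify_script(
        script_pubkey, len(script_pubkey), tx_to, len(tx_to),
        n_in, flags, ctypes.pointer(err))
    return ok == 1, ConsensusError(err.value)


if __name__ == "__main__":
    lib = load_consensus()
    print("libbitcoinconsensus API version", lib.bitcoinconsensus_version())
```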

TierNolan (OP)

February 17, 2015, 03:30:30 PM

#5

Quote from: ScripterRon on February 17, 2015, 02:59:06 PM

I'm in the process of changing my Java node to use the bitcoin consensus library. I'm using the Java Native Interface (JNI) to directly call the consensus library routines. So I can use the so/dll libraries shipped with Bitcoin Core 0.10.

That works too. It does mean that you can't do "pure" anything other than c++.

ScripterRon

February 17, 2015, 06:06:17 PM

#6

Quote from: TierNolan on February 17, 2015, 03:30:30 PM

That works too. It does mean that you can't do "pure" anything other than c++.

It depends on what you do in your JNI stub (the stub is what is loaded by the Java runtime). I simply turn around and call the consensus library routines after getting the appropriate C-style pointers to the Java data.

By the way, it seems to be working. I brought my node up on Windows and it successfully synchronized with the network, using the consensus library to verify block transactions. I need to compile the stub on Linux and then I can switch my VPS node to use the consensus library.

gmaxwell

February 17, 2015, 08:42:35 PM

#7

Quote from: TierNolan on February 17, 2015, 01:05:41 PM

However, not all clients are written in c++.

Libconsensus is intentionally C callable, so it can be used from any language that can call an external library.

Beyond libconsensus there is the idea of reducing the consensus code to a bytecode with a trivial interpreter. We're not yet sure how well this will work, but it's something people are also working towards. Libconsensus is a necessary first step which is useful even if the bytecode path doesn't work out.

Mike Hearn

February 18, 2015, 11:47:36 AM

#8

ScripterRon, rather than use JNI you could look at JNA. It will let you eliminate all the C code from your project and just call into the library with a regular Java interface. It's much more convenient.

ScripterRon

February 18, 2015, 03:03:29 PM

#9

Quote from: Mike Hearn on February 18, 2015, 11:47:36 AM

ScripterRon, rather than use JNI you could look at JNA. It will let you eliminate all the C code from your project and just call into the library with a regular Java interface. It's much more convenient.

Thanks, Mike. I did look at JNA but I decided to stay with JNI since the interface stub is only a couple of lines of C code and I didn't want to add another dependency to my project. I was already familiar with JNI (I just finished writing native hash routines for a Nxt project).

DeathAndTaxes

Gerald Davis

February 18, 2015, 07:11:05 PM

#10

Quote from: gmaxwell on February 17, 2015, 08:42:35 PM

Quote from: TierNolan on February 17, 2015, 01:05:41 PM

However, not all clients are written in c++.

Libconsensus is intentionally C callable, so it can be used from any language that can call an external library.

Are there any languages which can't call a static external c library? I think this is a solid solution and one of the things I am excited about in the latest release. To my knowledge C# (.net), Java, go, and python all support calling c libraries. Maybe we can put together some requirements (data types, etc) to ensure the library remains easily callable in a variety of languages. I hope to see libconsensus expanded significantly in the future. It is the first step forward in ensuring the safe development of alternative full nodes.

Anyone know if bitcoinj and other libraries intend to integrate libconsensus?

Quote

Beyond libconsensus there is the idea of reducing the consensus code to a bytecode with a trivial interpreter. We're not yet sure how well this will work, but it's something people are also working towards. Libconsensus is a necessary first step which is useful even if the bytecode path doesn't work out.

Interesting. Do you have any links?

gmaxwell

February 19, 2015, 02:08:39 AM

#11

Quote from: DeathAndTaxes on February 18, 2015, 07:11:05 PM

Are there any languages which can't call a static external c library? I think this is a solid solution and one of the things I am excited about in the latest release. To my knowledge C# (.net), Java, go, and python all support calling c libraries. Maybe we can put together some requirements (data types, etc) to ensure the library remains easily callable in a variety of languages. I hope to see libconsensus expanded significantly in the future. It is the first step forward in ensuring the safe development of alternative full nodes.

Kinda. There are hosting providers that will only allow you to run code written in some trendy language or another, with no native code libraries (I don't know the details as to why). There have been some large and high-profile bitcoin services running in those hosting environments, and thus "unable" to run native code, which made them very interested in complete reimplementations in other languages. I don't know how much relevance that kind of motivation will have in the future.

Quote

Anyone know if bitcoinj and other libraries intend to integrate libconsensus?

I think it's really too immature to say right now. At the moment it's just script.

Quote

Beyond libconsensus there is the idea of reducing the consensus code to a bytecode with a trivial interpreter. We're not yet sure how well this will work, but it's something people are also working towards. Libconsensus is a necessary first step which is useful even if the bytecode path doesn't work out.

Interesting. Do you have any links?

It's mostly been IRC discussion over the last couple of years. It's a pretty low-priority effort, especially since libconsensus is a hard prerequisite: it's unreasonable to put a whole implementation in a slow bytecode, so first the consensus parts must be completely isolated into parts with limited interaction. There has been some experimental work which has had some payoffs, e.g. http://moxielogic.org/blog/real-world-multiply.html.

The idea is simple enough: you can create a C-targetable load/store machine instruction set that can be run with a <1000-line switch statement (Moxie is such an example), one simple enough to formally specify, and even to prove that multiple distinct implementations match the specification. The consensus code just gets compiled to bytecode, and then everyone can use the same bytecode.

The challenge is that a simple machine may have unacceptably low performance, while adding general JIT-like features to your VM carries insane risks and makes it much harder to reason about or implement exactly. One possible solution is to extend the architecture with some crypto blocks, similar to how many embedded processors have multimedia accelerators (macroscopic hardcoded units that do things like perform a whole 8x8 DCT). The instruction set would then be a big switch statement, with a small amount of special-case handling that does things like compute SHA-256 in native code. It's relatively easy to be quite confident that an implementation of SHA-256's compression function is correct... other crypto implementations, less so.

Hopefully it's possible to add just enough native accelerators to get acceptable performance without greatly increasing the implementation complexity/risk. Otherwise the pure bytecode approach will be slow enough that people would either JIT it or replace it with a native implementation, defeating the safety gains.
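To make "a big switch statement plus a few native accelerators" concrete, here is a toy interpreter in Python. Everything about the machine is invented for illustration (it is not Moxie's instruction set): four-field instructions, eight 32-bit registers, a byte-array memory, and a step limit so execution always terminates. Python's `if`/`elif` chain stands in for a C `switch`, and the single `SHA256` opcode plays the role of a hardcoded crypto block.

```python
import hashlib

# Invented opcodes for a toy register machine (not Moxie's real ISA).
LOADI, ADD, SUB, JNZ, SHA256, HALT = range(6)


def run(program, memory: bytearray, max_steps: int = 100_000):
    """Interpret a list of (opcode, a, b, c) instructions; return the registers."""
    regs = [0] * 8
    pc = 0
    for _ in range(max_steps):      # bounded so every run terminates
        op, a, b, c = program[pc]
        pc += 1
        if op == LOADI:             # regs[a] = b (immediate)
            regs[a] = b & 0xFFFFFFFF
        elif op == ADD:             # regs[a] = regs[b] + regs[c] mod 2^32
            regs[a] = (regs[b] + regs[c]) & 0xFFFFFFFF
        elif op == SUB:             # regs[a] = regs[b] - regs[c] mod 2^32
            regs[a] = (regs[b] - regs[c]) & 0xFFFFFFFF
        elif op == JNZ:             # if regs[a] != 0: jump to instruction b
            if regs[a] != 0:
                pc = b
        elif op == SHA256:
            # Accelerator: hash regs[c] bytes starting at regs[b] in native
            # code and write the 32-byte digest at regs[a].
            dst, src, n = regs[a], regs[b], regs[c]
            if src + n > len(memory) or dst + 32 > len(memory):
                raise IndexError("SHA256 operand out of bounds")
            data = bytes(memory[src:src + n])
            memory[dst:dst + 32] = hashlib.sha256(data).digest()
        elif op == HALT:
            return regs
        else:
            raise ValueError(f"invalid opcode {op} at {pc - 1}")
    raise RuntimeError("step limit exceeded")


# Example: r0 = 5 + 4 + 3 + 2 + 1 using a counted loop.
SUM_1_TO_5 = [
    (LOADI, 0, 0, 0),   # r0 = 0 (accumulator)
    (LOADI, 1, 5, 0),   # r1 = 5 (counter)
    (LOADI, 2, 1, 0),   # r2 = 1 (decrement)
    (ADD, 0, 0, 1),     # instruction 3: r0 += r1
    (SUB, 1, 1, 2),     # r1 -= 1
    (JNZ, 1, 3, 0),     # loop back to instruction 3 while r1 != 0
    (HALT, 0, 0, 0),
]

if __name__ == "__main__":
    print(run(SUM_1_TO_5, bytearray())[0])  # 15
```

The interpreter core stays small enough to audit line by line; only the accelerator opcode calls out to native code, which is where the "just enough accelerators" trade-off would be made.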

grau

bits of proof

February 19, 2015, 10:28:36 AM

#12

JVM bytecode binaries have proven to protect enterprise software investments against hardware and operating system changes over the last two decades. The cost of dealing with incompatibilities at every operating system or hardware change is simply prohibitive for those whose primary business is not software development. Non-IT enterprises do not want machine-code artefacts in their business processes. (Actually, they would prefer not to have any processes in house at all, but SaaS.) Services targeting enterprise software deployment consequently ceased to support native code.

The Bitcoin consensus algorithm will likely be considered a specialized commodity process in enterprises, like a database server, and used in native-compiled form from a trusted distributor.

Dealing with consensus-accepted or newly created transactions (composing, signing, parsing), however, will become part of the business process and is therefore preferably done in JVM bytecode. This is where an alternative implementation is the right choice today and even in the future, at least until we have a feature-rich C++-to-JVM compiled artifact.

For those who want a second opinion from the core consensus library now, e.g. before broadcasting a self-composed transaction, a language binding to the consensus library might come in handy. Therefore I initiated a Lighthouse project for a Java language binding of the consensus library.

https://bitsofproof.com/?page_id=944

TierNolan (OP)

February 20, 2015, 03:13:34 PM

#13

Quote from: gmaxwell on February 19, 2015, 02:08:39 AM

The challenge is that a simple machine has performance that may be unacceptably low, adding general JIT-like things to your VM has insane risks and makes it much harder to reason about or implement exactly. One possible solution to that is extending the architecture to add some crypto blocks similar to how many embedded processors have multimedia accelerators (macroscopic hardcoded units that do things like perform a whole 8x8 DCT), so the instruction set is a big switch statement with a small amount of special-case handling that does things like compute sha256 with native code.

A compromise would be to do both. The formal consensus definition could be in bytecode with certain functions being accelerated.

The VM would use the accelerator 99% of the time, but 1% of the time it would use the bytecode. Fraud (or coding error) proofs could be used to indicate when to use the non-accelerated code.
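A minimal Python sketch of that compromise, with everything hypothetical: CRC-32 stands in for an accelerated consensus primitive, a bit-by-bit version plays the role of the reference bytecode, `zlib.crc32` plays the native accelerator, and an "error proof" is simply an input on which the two disagree. The class and method names are invented.

```python
import zlib
from typing import Callable


def crc32_reference(data: bytes) -> int:
    """Bit-by-bit CRC-32 (reflected polynomial 0xEDB88320): the slow reference."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


class AcceleratedFunction:
    """Run the fast path by default; fall back to the reference once an
    error proof (an input where the two implementations disagree) is verified."""

    def __init__(self, reference: Callable[[bytes], int],
                 accelerated: Callable[[bytes], int]):
        self.reference = reference
        self.accelerated = accelerated
        self.use_accelerator = True

    def __call__(self, data: bytes) -> int:
        if self.use_accelerator:
            return self.accelerated(data)
        return self.reference(data)

    def submit_error_proof(self, data: bytes) -> bool:
        # Checking a proof costs one run of each implementation, so any node
        # can verify it independently before switching to the reference.
        if self.accelerated(data) != self.reference(data):
            self.use_accelerator = False
            return True
        return False


if __name__ == "__main__":
    crc = AcceleratedFunction(crc32_reference, zlib.crc32)
    print(hex(crc(b"123456789")))  # 0xcbf43926, the standard CRC-32 check value
```

This sketch drops the accelerator for every input once one valid proof is seen; a per-input fallback would be a finer-grained alternative, at the cost of keeping a list of disputed inputs.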
