Bitcoin Forum
Author Topic: Thoughts on type safety and crypto RNGs  (Read 3603 times)
Mike Hearn (OP)
Legendary
*
expert
Offline Offline

Activity: 1526
Merit: 1128


View Profile
December 11, 2014, 01:06:39 PM
Merited by vapourminer (1), ABCbits (1)
 #1

I wrote an article about some of the failures in wallet randomness we've seen in the past 12 months:

  https://medium.com/@octskyward/type-safety-and-rngs-40e3ec71ab3a

It's a 6 minute read, but the tl;dr summary is:

1) Find ways to make the type systems you are working with stronger, either through better tools or better languages

2) Try and get entropy as directly from the kernel as possible, bypassing userspace RNGs

I should practice what I preach - bitcoinj could be upgraded to use the Checker Framework for stricter type checking, and we currently only bypass the userspace RNG when Android is detected. I'll be looking at ways to make things stricter and more direct next year.
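A minimal sketch of point 2, assuming a Unix-like system that exposes /dev/urandom. The class name KernelEntropy is hypothetical and this is not how bitcoinj does it; it only illustrates reading key material from the kernel and refusing to fall back to a userspace generator.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: reads key material straight from the kernel's RNG device. */
final class KernelEntropy {
    // Assumption: a Unix-like system that exposes /dev/urandom.
    private static final String DEVICE = "/dev/urandom";

    private KernelEntropy() {}

    /** Returns n bytes read directly from the kernel, or throws if the device is unavailable. */
    static byte[] read(int n) {
        byte[] out = new byte[n];
        try (DataInputStream in = new DataInputStream(new FileInputStream(DEVICE))) {
            in.readFully(out); // fails loudly instead of silently accepting a short read
        } catch (IOException e) {
            // Deliberately no fallback to an in-process generator.
            throw new UncheckedIOException("kernel RNG unavailable", e);
        }
        return out;
    }
}
```

Something like `byte[] seed = KernelEntropy.read(32);` would then be the only place a wallet obtains randomness, which keeps the path from kernel to key short enough to audit.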
bcearl
Full Member
***
Offline Offline

Activity: 168
Merit: 103



View Profile
December 11, 2014, 01:46:13 PM
 #2

You should not do crypto in JS or Java in the first place. In those languages, you do not have control over memory management. For example, in JS you have no control over how and where the browser stores your secret data (keys etc.). There is no way to enforce the physical deletion of private data.

Misspelling protects against dictionary attacks NOT
hexafraction
Sr. Member
****
Offline Offline

Activity: 392
Merit: 259

Tips welcomed: 1CF4GhXX1RhCaGzWztgE1YZZUcSpoqTbsJ


View Profile
December 11, 2014, 09:48:09 PM
 #3

You should not do crypto in JS or Java in the first place. In those languages, you do not have control over memory management. For example, in JS you have no control over how and where the browser stores your secret data (keys etc.). There is no way to enforce the physical deletion of private data.

Java allows very specific off-heap allocation on OpenJDK's VM, that allows for crypto data to live in a specific place in memory without fear of being copied by an eager GC, and to be erased from memory before deallocation. Netty also has some specific buffer types that are zero-copy for performance, that are useful even in non-network applications.
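A rough sketch of the off-heap idea using only the standard java.nio API. SecretBuffer is a hypothetical name, and the sketch assumes that a direct buffer's contents live outside the garbage-collected heap, which the JDK documentation describes as typical rather than guaranteed. It does nothing about copies held in registers, swap, or other buffers.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical holder for key bytes kept outside the garbage-collected heap. */
final class SecretBuffer implements AutoCloseable {
    private final ByteBuffer buf;

    SecretBuffer(byte[] secret) {
        // Direct buffers are normally allocated outside the Java heap, so a
        // compacting collector has no reason to relocate (and duplicate) them.
        buf = ByteBuffer.allocateDirect(secret.length);
        buf.put(secret);
        buf.flip();
        Arrays.fill(secret, (byte) 0); // wipe the caller's on-heap copy
    }

    byte get(int i) { return buf.get(i); }

    int length() { return buf.limit(); }

    /** Overwrites the off-heap bytes with zeros before the buffer is released. */
    @Override
    public void close() {
        for (int i = 0; i < buf.limit(); i++) {
            buf.put(i, (byte) 0);
        }
    }
}
```

Used in a try-with-resources block, the key is zeroed as soon as the block exits, whether normally or through an exception.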

I have recently become active again after a long period of inactivity. Cryptographic proof that my account has not been compromised is available.
grau
Hero Member
*****
Offline Offline

Activity: 836
Merit: 1021


bits of proof


View Profile WWW
December 20, 2014, 09:21:04 PM
 #4

1) Find ways to make the type systems you are working with stronger, either through better tools or better languages

+1

Unfortunately, most crypto developers still build on their perceived superior programming skills instead of using modern languages.

Most exploits arise from programming errors in low level weakly typed languages and not from those exotic "timing" and "memory" attacks that they use to justify their ancient tool set.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
December 20, 2014, 09:54:57 PM
 #5

in low level
Except the issues with poor cryptographic security Mike is talking about have only been observed-- so far-- in tools written in Java, Javascript, and Python in our ecosystem. None of these are low level languages.
grau
Hero Member
*****
Offline Offline

Activity: 836
Merit: 1021


bits of proof


View Profile WWW
December 21, 2014, 10:40:15 AM
Last edit: December 21, 2014, 11:39:42 AM by grau
 #6

in low level
Except the issues with poor cryptographic security Mike is talking about have only been observed-- so far-- in tools written in Java, Javascript, and Python in our ecosystem. None of these are low level languages.

You are right that I should not have written "low level" but "type unsafe", since there are high level unsafe languages, like Javascript or Python. Java is type safer and Scala is even better, and that is what Mike said.

Added:
Not using the compile time checks that type safety gives is pure arrogance.

BTW what about the Heartbleed bug in SSL, was it not in Bitcoin Core?

Unfortunately you only use your intelligence to pinpoint inaccuracy in my sentences.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
December 21, 2014, 08:10:06 PM
 #7

BTW what about the Heartbleed bug in SSL, was it not in Bitcoin Core?
It was an issue in OpenSSL (bitcoind doesn't expose SSL to the public in a default, or even sane, configuration at least).  Every other language also depends on system libraries too. So the language Bitcoin core was written in was irrelevant in this example.

Quote
Unfortunately you only use your intelligence to pinpoint inaccuracy in my sentences.
I'm sorry you feel that I'm nitpicking, but I'm not trying to.

So far our experience in this space is that there is more irresponsible and broken software written in higher level languages; there have been virtually no issues in this space from cryptographic weaknesses (or even conventional software security) in Bitcoin applications written in C / C++. I agree that sounds somewhat paradoxical... but it's not that shocking: The security of these systems depends on the finest details of the behaviour of each part of the software and the interactions; when your system obscures the details, some extra work is required to review through the indirection. This somewhat offsets the gains. In cryptographic (and especially consensus) systems it's much harder to "fail safe" and a much wider spectrum of unexpected behaviour is actually bad and exploitable. Languages like Java make some kinds of errored software "more safe" when the software is incorrect, but making software more correct is still something that is largely not reaching production industrial software development yet (languages with dependent types and facilities for formal analysis seem like they _may_ result in more correct software).

There is no replacement for hard work and many view higher level languages as an escape from drudgery, so there may be some language selection bias from the attitude of the authors that has nothing to do with the language itself.  In any case, I think your barb was misplaced, at least in this thread: We've seen bad RNG behaviour from Java software several times, and not just in system libraries. (And not just RNG safety, also things like attempts at full node code being shattered by underlying crypto libraries bubbling up null pointer exceptions that cause false block rejections which would have created forks if it were widely used).

(I do agree though that using untyped languages is basically suicide for any, even moderately large, system where correctness matters.)
grau
Hero Member
*****
Offline Offline

Activity: 836
Merit: 1021


bits of proof


View Profile WWW
December 22, 2014, 08:17:05 AM
 #8

BTW what about the Heartbleed bug in SSL, was it not in Bitcoin Core?
It was an issue in OpenSSL (bitcoind doesn't expose SSL to the public in a default, or even sane, configuration at least).  Every other language also depends on system libraries too. So the language Bitcoin core was written in was irrelevant in this example.

That bug in that library was exemplary of the potentially disastrous consequences of the weak memory model present in C and C++. It did not put Bitcoin at risk, but it likely would have if the payment protocol had been in Core already. The argument that the bug was in a library is weak and applies to the RNG problem we saw with Java on Android too. We have seen a very similar bad RNG problem in Debian Linux too, written in C. Errors like those are not language specific; the consequence of the Heartbleed bug, however, was. The bug itself would not have been such a disaster had it not been paired with a weak memory model.

Bitcoin core can not change its technology as it would likely result in a hard fork between its older and newer versions. We can't touch Satoshi's bugs and should one of the used libraries blurp up or even store some junk, chances are good that those "features" have to be preserved.

On a side chain however the technology is not set in stone. Whatever features, even bugs, another tool and library set displays there defines the consensus of that side chain.

I am using Java and more recently Scala not just because they relieve me from some drudgery, but because they do help me to create more robust and correct programs. Ignoring major advances of computer science should be well justified. I see good reasons to stick with the tool set for Bitcoin Core, but not around that. Higher level interfaces and new side chains need not use the same hammer for all nails.

Mike gave good hints for the selection of new hammers, and that's what I applauded.

Peter Todd
Legendary
*
expert
Offline Offline

Activity: 1120
Merit: 1149


View Profile
December 22, 2014, 03:34:17 PM
 #9

So far our experience in this space is that there is more irresponsible and broken software written in higher level languages; there have been virtually no issues in this space from cryptographic weaknesses (or even conventional software security) in Bitcoin applications written in C / C++.

That's an incredibly bold statement given that there's almost no-one writing Bitcoin applications in C / C++ with the exception of Bitcoin Core itself. Equally the demographics of people writing the tiny amount of C / C++ code out there is very different than the demographics writing in more modern languages.

Fact is right now we just can't say anything about what approach is better based solely on where the most bugs have been found; we can say other industries have consistently been moving away from C and to a lesser extent C++ due to the difficulty of writing secure code in those languages.

You're also conflating two separate problems. It may turn out that writing consensus-critical code in other languages is harder, but that's a very different problem than writing secure code in the more general sense. Equally it may turn out that better approaches to writing consensus-critical code are more important than what language you choose to write it in. But right now we just don't know.

Mike Hearn (OP)
Legendary
*
expert
Offline Offline

Activity: 1526
Merit: 1128


View Profile
December 22, 2014, 06:39:22 PM
 #10

I would say that we've got very lucky with respect to Bitcoin Core:  Satoshi was a very careful developer who knew C++ very well and maximised use of its features to increase safety. The developers who followed him are also very skilled, know C++ very well and know how to avoid the worst traps.

The main concern with Core is not that the code is insecure today, but what happens in the years to come. Will the people who follow Gavin, Pieter, Gregory etc be as good? What about alt coins? What if a refactoring or multi-threading of some performance bottleneck introduces a double free? Anyway, not much we can do about this except try and make the environment as safe as possible. I've made some suggestions on how to do this in the past (auto restart on crash, use Boehm GC) and normally Gregory likes to point out possible downsides :) but I'm not super comfortable relying on "don't make mistakes" as a policy over the long run.

WRT RNG issues in Java, I'm not aware of any beyond the Android bugs, which were very severe but didn't have anything to do with Java as a language or platform. If there have been issues in Java SE I don't recall hearing about them. Bypassing in-process RNGs is still a good idea though.

Quote
You should not do crypto in JS or Java in the first place. In those languages, you do not have control over memory management. For example, in JS you have no control over how and where the browser stores your secret data (keys etc.). There is no way to enforce the physical deletion of private data.

It's also true of C (e.g. AES keys can persist in XMM registers for a long time after use). Although hexafraction is right that on HotSpot you can do manual heap allocations, it doesn't matter much. If an attacker has complete access to your address space then this is so close to "game over" that it hardly makes any odds whether there are multiple copies in RAM. Even if the password isn't lying around, they can just wait until it is. I'm not a big fan of spending time trying to "clean" address spaces of passwords or keys.

Note that for core crypto, it's looking more and more like long term everything will have to be done in assembly anyway. Pain.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
December 22, 2014, 08:09:22 PM
Last edit: December 22, 2014, 08:46:24 PM by gmaxwell
 #11

I've made some suggestions on how to do this in the past (auto restart on crash, use Boehm GC)
Our process is not "don't make mistakes"; Bitcoin Core largely uses a safer subset of C++ that structurally prevents certain kinds of errors (assuming the subset is followed, we don't have any mechanical enforcement).  I don't believe anyone writing or reviewing code for the project would describe the primary safety strategy as coming from "don't make mistakes", not with the level of review and the general avoidance of riskier techniques.

Though even equipped with automatic theorem provers that could reason about cryptographic constructs, no language or language facility can free you from having to avoid errors (though avoiding errors is much more than "just don't make them").

Things like "restart on crash" can be quite dangerous, because they let an attacker try their attack over and over, or keep the software running (and mining / authoring irreversible transactions) on a failing system. In most cases if we know that something that the software hasn't accounted for has happened just being shut down is better. If doing this results in a DOS attack, ... DOS attacks against the network are bad, but they're preferable to less recoverable outcomes. I think if anything we'd be likely to go the other way: On a "can never happen" indication of corruption, write out a "your_system_appears_busted_and_bitcoin_wont_run_until_you_test_it_and_remove_this_file.txt" that gets checked for at startup.
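The marker-file idea could look roughly like the following sketch. CorruptionGuard and its file name are hypothetical and not taken from any Bitcoin Core release; the point is only that an impossible state leaves something on disk that blocks every later startup until a human removes it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical fail-stop guard: corruption leaves a marker that blocks later startups. */
final class CorruptionGuard {
    // Hypothetical file name, chosen for illustration only.
    static final String MARKER = "system_appears_busted_remove_after_testing.txt";

    private final Path marker;

    CorruptionGuard(Path dataDir) {
        this.marker = dataDir.resolve(MARKER);
    }

    /** True only if no earlier run recorded a "can never happen" condition. */
    boolean mayStart() {
        return !Files.exists(marker);
    }

    /** Records the reason on disk; the caller is expected to shut down afterwards. */
    void reportCorruption(String reason) {
        try {
            Files.write(marker, reason.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike an automatic restart, this gives an attacker exactly one attempt per manual intervention, at the cost of availability.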

Quote
WRT RNG issues in Java
There have been Java bitcoin software, e.g. a vanity-generator that generated predictable keys, altcoin software that failed in various ways, bouncycastle causing inconsistency in node software from throwing surprise null pointer exceptions on weird inputs. I wasn't saying that there was any language issue there, but pointing out that even using the most confined language you can find will not prevent people from writing unsound cryptographic software. (And perhaps even making things worse, if the protection against idiotic mistakes makes people forget that they're playing with fire.)

You're also conflating two separate problems. It may turn out that writing consensus-critical code in other languages is harder, but that's a very different problem than writing secure code in the more general sense.
Actually no, you're catching the point I'm making but missing it.  Cryptographic systems in general have the property that you live or die based on implicit details. Cryptographic consensus makes the matter worse only in that a larger class of surprises turn out to be fatal security vulnerabilities. It's quite possible, and has been observed in practise, to end up with exploitable systems because some buried/abstracted behaviour is different than you expected. A common example is propagating errors up to the far side when authentication fails and leaking data about the failure, allowing incremental recovery of secret data.  Other examples are implicit padding behaviour leaking information about keys (there is an example of this in Bitcoin Core: OpenSSL's symmetric crypto routines had implicit padding behaviour that made the wallet encryption faster to crack than had been intended.)
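To make the error-propagation example concrete, here is a small sketch of the usual countermeasure, assuming HMAC-SHA256 as the authenticator. QuietVerifier is a hypothetical name. Verification returns a single boolean and compares tags with MessageDigest.isEqual, so neither an error message nor the comparison time tells the remote side which byte was wrong.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical message authenticator that reveals nothing about why a check failed. */
final class QuietVerifier {
    private final SecretKeySpec key;

    QuietVerifier(byte[] keyBytes) {
        this.key = new SecretKeySpec(keyBytes, "HmacSHA256");
    }

    /** Computes the HMAC-SHA256 tag of the message. */
    byte[] tag(byte[] message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(message);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Single yes/no result: no error codes and no partial-match detail. */
    boolean verify(byte[] message, byte[] claimedTag) {
        // isEqual examines all bytes, so timing does not depend on where they differ.
        return MessageDigest.isEqual(tag(message), claimedTag);
    }
}
```

A naive loop that returns at the first mismatching byte, or an exception carrying the offset of the failure, is exactly the kind of abstracted detail that turns into an oracle.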

I'm certainly a fan of smarter tools that make software safer (I'm conceptually a big fan of Rust, for example). But what I'm seeing deployed out in the wider world is that more actual deployed weak cryptography software is resulting from reasons unrelated to language.  This doesn't necessarily mean anything about non-cryptographic software. And some of it is probably just an attitude correlation; you don't get far in C if you're not willing to pay attention to details. So we might expect other languages to be denser in sloppy approaches. But that doesn't suggest that someone equally attentive might not do better, generally, in something with better properties. (I guess this is basically your demographic correlation).  So I'm certainly not disagreeing with these points; but I am disagreeing with the magic bullet thinking which is provably untrue: Writing in FooLang will absolutely not make your programs safe for people to use. It _may_ be helpful, indeed, but it is neither necessary nor sufficient, as demonstrated by the software deployed in the field.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1065



View Profile
December 22, 2014, 11:08:36 PM
 #12

Equally the demographics of people writing the tiny amount of C / C++ code out there is very different than the demographics writing in more modern languages.
Maybe it is true where you live. Where I live C++ enjoys resurgence in the form of superset/subset language SystemC, where certain things about the programs can be proven.

Likewise, gmaxwell posted here information about new research where a specific C subset (targeting specific TinyRAM architecture) can be used to produce machine-verifiable proofs. AFAIK this is still a long-shot option for Bitcoin, not something usable currently.

My comment here pertains to the consensus-critical code in the dichotomy you've mentioned later.
 

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
grau
Hero Member
*****
Offline Offline

Activity: 836
Merit: 1021


bits of proof


View Profile WWW
December 23, 2014, 10:36:11 PM
 #13

So I'm certainly not disagreeing with these points; but I am disagreeing with the magic bullet thinking which is provably untrue: Writing in FooLang will absolutely not make your programs safe for people to use. It _may_ be helpful, indeed, but it is neither necessary nor sufficient, as demonstrated by the software deployed in the field.

Neither Mike nor myself advertized a language as a magic bullet that makes programs safe.

You however seem to believe in superior powers of maintainers that outweigh the advances in languages and runtime environments of the last decades.

I'd say you play a more dangerous game than us.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
December 24, 2014, 01:50:07 AM
 #14

So I'm certainly not disagreeing with these points; but I am disagreeing with the magic bullet thinking which is provably untrue: Writing in FooLang will absolutely not make your programs safe for people to use. It _may_ be helpful, indeed, but it is neither necessary nor sufficient, as demonstrated by the software deployed in the field.

Neither Mike nor myself advertized a language as a magic bullet that makes programs safe.

You however seem to believe in superior powers of maintainers that outweigh the advances in languages and runtime environments of the last decades.

I'd say you play a more dangerous game than us.

You wrote, "Most exploits arise from programming errors in low level weakly typed languages". I pointed out that in our space we've observed the opposite: There have been more serious cryptographic weaknesses in software written in very high level languages like Python, JavaScript, PHP, Java, etc. That's all.  Please tone down the personal insults. You're very close to earning an ignore button press from me. I have scrupulously avoided besmirching your skills-- or even saying that I think your preferred tools are not _good_, only that people using them suffer errors too-- but in every response you make you attack my competence.
Sergio_Demian_Lerner
Hero Member
*****
expert
Offline Offline

Activity: 549
Merit: 608


View Profile WWW
December 24, 2014, 02:31:14 AM
 #15

All coders make mistakes. In every language, in every library. Formal verification methods are generally too expensive. That's why peer review and audits exist: to detect those errors. And the more auditors, the better.
 
C++ code is generally more concise because of the higher versatility of its grammar (e.g. overloaded operators), but not as easy to understand for anyone but the programmer. C++ is very powerful, but can more easily hide information from the auditor. However, the programmer has greater control regarding timing side-channels and leakage of secrets.
 
Java code is generally more explicit and descriptive. It forces to do things that make the auditor's work simpler, such as class-file separation.
Obviously you can program C++ as if it were Java, but that's not how C++ libraries are built, nor how C++ programmers have learned. Nobody changes a language's standard semantics.

Dynamically-typed languages are the worst, because you cannot fully understand the consequences of a function without looking at every existing function call to see the argument types (and sometimes you cannot infer those without going deeper in the call tree!)

One example I remember now is Python's strong pseudo-random generator seeding function. If you call the seeding function with a BigInt, it uses the BigInt as the seed, but if you call it with a hexadecimal or binary string (and I've seen this), it performs a 32-bit hash of the string, and then seeds the generator with a 32-bit number. And this is allowed because a 32-bit hash is a default for every object. You can write Python that does not make use of dynamic typing, but that requires checking the type of every argument received, which nobody does.
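For contrast, here is a sketch of how a statically typed language can turn that mistake into a compile-time error. Seed256 and KeyDeriver are hypothetical names, and hashing the seed with SHA-256 is only a stand-in for whatever a real wallet would do with it. Because deriveKey accepts nothing but a Seed256, passing a hex string does not compile, and a short byte array is rejected when the seed is constructed.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical wrapper type: a seed is exactly 32 raw bytes, never a String or an int. */
final class Seed256 {
    private final byte[] bytes;

    Seed256(byte[] raw) {
        if (raw.length != 32) {
            throw new IllegalArgumentException("seed must be 32 bytes, got " + raw.length);
        }
        this.bytes = raw.clone(); // defensive copy so callers cannot mutate the seed later
    }

    byte[] toBytes() {
        return bytes.clone();
    }
}

/** Hypothetical consumer whose only entry point takes a Seed256. */
final class KeyDeriver {
    private KeyDeriver() {}

    /** Stand-in derivation: SHA-256 over the full 256-bit seed. */
    static byte[] deriveKey(Seed256 seed) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(seed.toBytes());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Java has its own 32-bit trap in String.hashCode(), but a caller has to write `new Random(s.hashCode())` explicitly to fall into it, which a reviewer can spot at the call site.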

I would prefer that low-level crypto code (key management, PRNG, signature, encryption, authentication) be written in C/C++ (e.g. Sipa's secp256k1 library in Bitcoin) and every other layer be written in a more modern statically typed language, such as Java. For most projects, that probably means that 90% of the code would be in Java and 10% would be in C/C++ (and that would probably be crypto library code).
The 90% Java code would be more secure not because Java code is more secure per se, but because it would be easier to audit. The 10% would be harder, but since it would be small you would be able to double the audit time for that part.
 
At the end, you get a more secure system having used the same audit or peer review time.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1065



View Profile
December 24, 2014, 03:04:17 AM
Last edit: December 24, 2014, 04:38:28 AM by 2112
 #16

I would prefer that low-level crypto code (key management, PRNG, signature, encryption, authentication) be written in C/C++ (e.g. Sipa's secp256k1 library in Bitcoin) and every other layer be written in a more modern statically typed language, such as Java.
I disagree that such a combination would be safer and easier to audit. Java and C++ runtimes are very hard to properly interface, especially in the exception handling and threading aspects. So the purported audit would not only involve auditing the code of the Bitcoin core but also auditing a large portion of the Java runtime.

One could make one or two restrictions in the mixed architecture you're proposing:

1) C/C++ code are only "leaves" on the call tree, i.e. only Java calls C++, C++ never calls Java.

2) "Java" is understood to mean not "validated standard conforming Java" but "subset of Java supported by the gcj ahead-of-time compiler" matched with the gcc/g++ used for the C/C++ code.

otherwise the mixed-language program will have a large minefield in the inter-language interface layer.

Edit:

Historical note: if "Java" would mean "Microsoft Visual J++" with J/Direct instead of JNI as an inter-language layer that could also work relatively smoothly. Those things are of historical interest only although there is at least one vendor in Russia that still maintains a Java toolchain that is unofficially compatible with the historical code: http://www.excelsior-usa.com/ .

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
grau
Hero Member
*****
Offline Offline

Activity: 836
Merit: 1021


bits of proof


View Profile WWW
December 24, 2014, 05:27:18 AM
 #17

So I'm certainly not disagreeing with these points; but I am disagreeing with the magic bullet thinking which is provably untrue: Writing in FooLang will absolutely not make your programs safe for people to use. It _may_ be helpful, indeed, but it is neither necessary nor sufficient, as demonstrated by the software deployed in the field.

Neither Mike nor myself advertized a language as a magic bullet that makes programs safe.

You however seem to believe in superior powers of maintainers that outweigh the advances in languages and runtime environments of the last decades.

I'd say you play a more dangerous game than us.

You wrote, "Most exploits arise from programming errors in low level weakly typed languages". I pointed out that in our space we've observed the opposite: There have been more serious cryptographic weaknesses in software written in very high level languages like Python, JavaScript, PHP, Java, etc. That's all.  Please tone down the personal insults. You're very close to earning an ignore button press from me. I have scrupulously avoided besmirching your skills-- or even saying that I think your preferred tools are not _good_, only that people using them suffer errors too-- but in every response you make you attack my competence.

If you define your space as Bitcoin Core, then yes, it shows very high quality, maintained by remarkable talents of which you are one.
No doubt on that. I had no intention to insult you with incompetence.

The model that has been successful with Bitcoin Core has, however, failed so many times that it fills libraries with dos and don'ts of pointer arithmetic, anatomy of buffer overflows and zero-delimited string exploits. I know Bitcoin Core developers carefully avoid those sources, yet that still did not protect against a bug in OpenSSL. That bug was not cryptographic in nature, but exposed the memory of the process as a consequence of a missing array bounds check in the C/C++ runtime. Sure there are arguments for not having those checks at run time, but those arguments work especially well with languages that check more at compile time, such that runtime violations are less probable.

While exceptional care can be successful, as we observe, it is hard to scale and sustain. This is why the software industry has been moving away from C/C++. It retained relevance in certain areas just like any good technology.

We need orders of magnitude more code and developers than Bitcoin Core has to build this economy, therefore it is sane to take any attainable help to sustain quality. I believe that type safe and functional languages and modern runtime environments do help. I do not think you doubt this, so please calm down too. I am not attacking you personally, I just doubt the extensibility of your successful model to all projects that use Bitcoin or its innovations.









Peter Todd
Legendary
*
expert
Offline Offline

Activity: 1120
Merit: 1149


View Profile
December 24, 2014, 07:37:19 AM
Last edit: December 24, 2014, 07:48:33 AM by Peter Todd
 #18

Actually no, you're catching the point I'm making but missing it.  Cryptographic systems in general have the property that you live or die based on implicit details. Cryptographic consensus makes the matter worse only in that a larger class of surprises turn out to be fatal security vulnerabilities. It's quite possible, and has been observed in practise, to end up with exploitable systems because some buried/abstracted behaviour is different than you expected. A common example is propagating errors up to the far side when authentication fails and leaking data about the failure, allowing incremental recovery of secret data.  Other examples are implicit padding behaviour leaking information about keys (there is an example of this in Bitcoin Core: OpenSSL's symmetric crypto routines had implicit padding behaviour that made the wallet encryption faster to crack than had been intended.)

I'm mainly concerned about whether or not using C(++) with manual memory management is acceptable practice. Screwing up manual memory management exposes you to the king of all implicit details: what garbage happens to be in memory at that very moment.

Given that we have at least C++ available which can insulate you from manual memory management(1), there's just no excuse to be writing code that way anymore by default. Equally writing C++ in a way that exposes you to that class of errors is generally unacceptable.

Bitcoin itself is a perfect example, where some simple "don't be an idiot" development practices have resulted in a whole class of errors having never been an issue for us, letting development focus on the remaining types of errors.

1) Where manual memory management == things that can cause memory corruption and invalid accesses. There are of course other meanings of the term that refer to practices where memory is still "managed" manually at some level, e.g. allocation, but corruption and invalid accesses are not possible.

I'm certainly a fan of smarter tools that make software safer (I'm conceptually a big fan of Rust, for example). But what I'm seeing deployed out in the wider world is that more actual deployed weak cryptography software is resulting from reasons unrelated to language.  This doesn't necessarily mean anything about non-cryptographic software. And some of it is probably just an attitude correlation; you don't get far in C if you're not willing to pay attention to details. So we might expect other languages to be denser in sloppy approaches. But that doesn't suggest that someone equally attentive might not do better, generally, in something with better properties. (I guess this is basically your demographic correlation).  So I'm certainly not disagreeing with these points; but I am disagreeing with the magic bullet thinking which is provably untrue: Writing in FooLang will absolutely not make your programs safe for people to use. It _may_ be helpful, indeed, but it is neither necessary nor sufficient, as demonstrated by the software deployed in the field.

And since when did I say anything about "magic bullets"? I'm talking about acceptable bare minimum practices. Over and over again we've seen that doing manual memory management requires Herculean efforts to get right, yet people do get far enough in C(++) to cause serious problems doing it.

It's no surprise that easier languages attract even less skilled programmers who make more mistakes, but it's foolish to think that giving skilled programmers a tool other than a footgun is going to result in more mistakes. I think the unfortunate thing - maybe the root cause of this problem in the industry - is you definitely do need to teach programmers C at some point in their education so they understand how computers actually work. For that matter we need to teach them assembler too. The problem is C is nice enough to actually use - even the nicest machine architectures aren't - and people trained that way tend to reach for that footgun over and over again in the rest of their careers when really the language should be put on a shelf and only brought out to solve highly specialized tasks - just like assembler.

Equally, how many computer science graduates finish their education with a good understanding of the fact that a programming language is fundamentally a user interface layer between them and machine code?

Peter Todd
December 24, 2014, 07:46:35 AM
 #19

Equally the demographics of people writing the tiny amount of C / C++ code out there is very different than the demographics writing in more modern languages.
Maybe it is true where you live. Where I live, C++ is enjoying a resurgence in the form of SystemC, a superset/subset language in which certain things about programs can be proven.

I'm referring specifically to the demographics of people writing code for Bitcoin-related applications.

Likewise, gmaxwell posted information here about new research in which a specific C subset (targeting the TinyRAM architecture) can be used to produce machine-verifiable proofs. AFAIK this is still a long-shot option for Bitcoin, not something usable currently.

My comment here pertains to the consensus-critical code in the dichotomy you've mentioned later.

C with machine-verifiable proofs has nothing to do with the type of C programming I'm criticizing; neither does SystemC. Those types of environments are so far removed from the vanilla and unsafe C(++) programming that gets people into trouble that you might as well call them different languages in all but name.

Peter Todd
December 24, 2014, 07:57:58 AM
 #20

Dynamically-typed languages are the worst, because you cannot fully understand the consequences of a function without looking at every existing call to it to see the argument types (and sometimes you cannot infer those without going deeper in the call tree!)

You might be interested to find out that Python is actually moving towards static types; the language recently added support for specifying function argument types in the syntax. How the types are actually checked is undefined in the language itself - you can use third-party modules to impose your desired rules. IIRC the next major version, 3.5, will include a module in the standard library (typing) that standardises the notation for those argument types, though the checking itself is still left to external tools. Similarly, class attributes have syntax support for specifying types, and again you can already use third-party modules to enforce those rules.
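
A minimal sketch of the idea above, in Python: the annotations are ordinary syntax, and a separate piece of code decides what to do with them. The `enforce_types` decorator and the `fee_for` function are hypothetical names made up for illustration, standing in for whatever third-party checker you prefer; only plain classes are checked here, and only at call time.

```python
import functools
import inspect


def enforce_types(func):
    """Hypothetical checker: validate annotated arguments with isinstance()."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            # Skip parameters that carry no annotation at all.
            if expected is inspect.Parameter.empty:
                continue
            # Only plain classes are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def fee_for(size_bytes: int, rate_per_byte: int) -> int:
    """Illustrative function: the annotations are what the checker reads."""
    return size_bytes * rate_per_byte
```

Calling `fee_for(250, 2)` goes through, while `fee_for("250", 2)` raises a TypeError instead of silently returning the string repeated twice. A static checker would catch the same mistake before the code runs at all.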

I wouldn't be surprised if the "sweet spot" for most tasks is a language much like Python with the ability to specify type information as well as the ability to easily enforce 100% usage of that technique in important code, while still giving programmers the option of writing quick-n-dirty untyped code where desired. And of course, with a bit of type information writing compilers that produce reasonably fast code becomes fairly easy - Cython does that already for Python without too much fuss.

grau
December 24, 2014, 10:02:10 AM
Last edit: December 24, 2014, 10:13:59 AM by grau
 #21

It's no surprise that easier languages attract even less skilled programmers who make more mistakes, but it's foolish to think that giving skilled programmers a tool other than a footgun is going to result in more mistakes. I think the unfortunate thing - maybe the root cause of this problem in the industry - is you definitely do need to teach programmers C at some point in their education so they understand how computers actually work. For that matter we need to teach them assembler too. The problem is C is nice enough to actually use - even the nicest machine architectures aren't - and people trained that way tend to reach for that footgun over and over again in the rest of their careers when really the language should be put on a shelf and only brought out to solve highly specialized tasks - just like assembler.

"Easy" applies to e.g. Python, but is unlikely to be the motivation for those who turn to Haskell or Scala. It is rather skilled programmers who turn to functional languages, after shooting themselves in the foot often enough to reconsider what they stand on.

Equally, how many computer science graduates finish their education with a good understanding of the fact that a programming language is fundamentally a user interface layer between them and machine code?

Unfortunately, many of those who get that think they are better than the compiler and its runtime. Some might really be better, maybe even consistently, but staying ahead of compiler and runtime development is getting harder, and their advantage less and less likely.
gmaxwell
December 24, 2014, 08:23:37 PM
 #22

I'm mainly concerned about whether or not using C(++) with manual memory management is acceptable practice. Screwing up manual memory management exposes you to the king of all implicit details: what garbage happens to be in memory at that very moment.
We mostly do not use manual memory management in Bitcoin Core. Virtually all use of delete is in explicit destructors; most things are just RAII. I looked a while back and I think I found only something like three or four uses of delete outside of destructors, and I assume those cases will all be changed the next time they're touched (examples include the wallet encryption, which hasn't been touched in years).

(I thought it was really weird that Mike brought up manual memory management; I see I made an error in not correcting him.)
Peter Todd
December 24, 2014, 09:02:05 PM
 #23

I'm mainly concerned about whether or not using C(++) with manual memory management is acceptable practice. Screwing up manual memory management exposes you to the king of all implicit details: what garbage happens to be in memory at that very moment.
We mostly do not use manual memory management in Bitcoin Core. Virtually all use of delete is in explicit destructors; most things are just RAII. I looked a while back and I think I found only something like three or four uses of delete outside of destructors, and I assume those cases will all be changed the next time they're touched (examples include the wallet encryption, which hasn't been touched in years).

(I thought it was really weird that Mike brought up manual memory management; I see I made an error in not correcting him.)


Huh? I quite clearly give Bitcoin Core as an example of C++ done right, precisely because it uses a safe subset of the language that is a higher-level language via the abstractions used. You brought up C, which just isn't a safe language to write code in.

I read Mike's post as pointing out that knowing how to use C++ correctly - what subset to use - is something that does take skill. It's notable that we've changed a few things in Bitcoin Core to, for instance, use pointers where we didn't before, gradually decreasing the safety of the system by using parts of the language beyond that safe subset.

rsvoter
December 24, 2014, 09:25:13 PM
 #24

It's no surprise that easier languages attract even less skilled programmers who make more mistakes, but it's foolish to think that giving skilled programmers a tool other than a footgun is going to result in more mistakes. I think the unfortunate thing - maybe the root cause of this problem in the industry - is you definitely do need to teach programmers C at some point in their education so they understand how computers actually work. For that matter we need to teach them assembler too. The problem is C is nice enough to actually use - even the nicest machine architectures aren't - and people trained that way tend to reach for that footgun over and over again in the rest of their careers when really the language should be put on a shelf and only brought out to solve highly specialized tasks - just like assembler.

"Easy" applies to e.g. Python, but is unlikely to be the motivation for those who turn to Haskell or Scala. It is rather skilled programmers who turn to functional languages, after shooting themselves in the foot often enough to reconsider what they stand on.

Equally, how many computer science graduates finish their education with a good understanding of the fact that a programming language is fundamentally a user interface layer between them and machine code?

Unfortunately, many of those who get that think they are better than the compiler and its runtime. Some might really be better, maybe even consistently, but staying ahead of compiler and runtime development is getting harder, and their advantage less and less likely.

Couldn't have said this better myself.
grau
December 25, 2014, 09:45:58 AM
 #25

Huh? I quite clearly give Bitcoin Core as an example of C++ done right, precisely because it uses a safe subset of the language that is a higher-level language via the abstractions used.

Yes, that "safe" subset of C++ is emulating a simple and restricted reference counting runtime by hand. Certainly doable. Apple, for example, has been successful in forcing reference counting onto application-level programmers on iOS, although Objective-C gives nice support for that pattern.

Continuing that line of thought, you could define a "safe" subset of C using only the stack, or maybe even functional programming with assembler macros. Runtimes and compilers do no magic, so talented programmers can emulate any feature of them in any language. It requires skill and it can be fun.

Bitcoin is not the first program to move around billions of dollars in value; it is just a new one. Most programs I wrote moved more value than Bitcoin's market capitalization, so I know that once you deal with other people's money it gets difficult to argue for not using the help technology offers.
gmaxwell
December 25, 2014, 10:26:13 AM
Last edit: December 25, 2014, 10:37:23 AM by gmaxwell
 #26

Yes, that "safe" subset of C++ is emulating a simple and restricted reference counting runtime by hand. Certainly doable.
That isn't the case. Yes, reference counting is one tool in that box, but it has considerable costs. Most things are handled by RAII. And then there is unique_ptr... "By hand" is also perhaps misleading, in that, for better or worse, the developers themselves don't see the machinery under the hood any more than they see bounds checking in Java.
grau
December 25, 2014, 11:04:51 AM
 #27

Yes, that "safe" subset of C++ is emulating a simple and restricted reference counting runtime by hand. Certainly doable.
That isn't the case. Yes, reference counting is one tool in that box, but it has considerable costs. Most things are handled by RAII. And then there is unique_ptr... "By hand" is also perhaps misleading, in that, for better or worse, the developers themselves don't see the machinery under the hood any more than they see bounds checking in Java.
RAII and unique_ptr implement a reference counting store where the (implicit) use count can be 1 or 0, right?
"By hand" does not mean copy and paste. The solutions you use are likely the best attainable in the C++ environment.

My point is that better support for program correctness is available elsewhere and should be used where permissible. When it is permissible is open for discussion, but I do not buy that the answer would be never.

I even suspect that a language and runtime that more safely excludes side effects and implicit inputs, and aids reasoning about the correctness of algorithms, is a better choice even for consensus definition.
Mike Hearn (OP)
December 27, 2014, 08:36:06 PM
 #28

(I thought it was really weird that Mike brought up manual memory management; I see I made an error in not correcting him.)

Sigh. You know I have the deepest respect for you, Gregory, but this is not the first time I get the feeling you're commenting on things I've written without having read them closely :(

I said:

Quote
The main concern with Core is not that the code is insecure today, but what happens in the years to come

I know how the code is currently written. I first read it a few months after it was released, remember ;) But being concerned about an extremely common class of errors is hardly weird. Multiple people on this thread have brought it up.

My experience of working on several large C++ server codebases at Google is that it's quite possible to write robust code ... for a while. When you have a single thread, everything is written by one guy and all data is request scoped, the tools C++ provides can work very well.

But eventually one of the following happens:

1) Someone introduces multi-threading for better scalability, resource management, use of a blocking library etc, and accidentally writes code that races
2) Someone refactors code written by someone else and uninitialised data creeps in
3) Someone starts using a third party library that isn't written in the same way and requires manual heap management (like OpenSSL)
4) Someone profiles and decides to reduce the amount of copying that is going on

More generally: things change, teams change and software gets more complicated. Because nothing in C++ forbids manual memory management and some things require it, eventually it ends up being used. And some time after that, someone makes a mistake.

We can't magically convert Bitcoin Core to a safer language with a stricter type system. We can anticipate that mistakes will happen, and try to put in place systems to automatically catch and handle them.

Quote from: petertodd
You might be interested to find out that Python is actually moving towards static types; the language recently added support for specifying function argument types in the syntax. How the types are actually checked is undefined in the language itself - you can use third-party modules to impose your desired rules.

This sounds somewhat like the Checker Framework, a pluggable type system for Java. I'd like to see it adapted for Scala and Kotlin too. It has a number of very practical type systems that catch real-world errors like mixing up seconds and milliseconds and other unit mismatches.
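
To illustrate the kind of unit mismatch meant here, a rough Python sketch follows. It is not the Checker Framework, which works at compile time on annotated Java; it is only a runtime analogue, and the `Seconds` and `Milliseconds` classes are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seconds:
    """A duration that can only be added to other Seconds values."""
    value: float

    def __add__(self, other):
        if not isinstance(other, Seconds):
            # Returning NotImplemented makes Python raise TypeError.
            return NotImplemented
        return Seconds(self.value + other.value)


@dataclass(frozen=True)
class Milliseconds:
    """A duration that can only be added to other Milliseconds values."""
    value: float

    def __add__(self, other):
        if not isinstance(other, Milliseconds):
            return NotImplemented
        return Milliseconds(self.value + other.value)

    def to_seconds(self) -> Seconds:
        """Explicit conversion, so the unit change is visible in the code."""
        return Seconds(self.value / 1000)
```

With these wrappers, `Seconds(5) + Milliseconds(250)` fails with a TypeError, while `Seconds(5) + Milliseconds(250).to_seconds()` works. A pluggable static type system gives the same protection without the runtime cost, and without needing the faulty line to actually execute.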
moni3z
December 27, 2014, 08:58:13 PM
Last edit: December 28, 2014, 01:03:21 AM by moni3z
 #29

Every OS has a proper method for obtaining a keystream (/dev/urandom): https://news.ycombinator.com/item?id=8049739. The problem comes if you or somebody else chroots the application and forgets to make the userspace CSPRNG available with it, so obtaining entropy directly from the kernel is a good idea.
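
As a small sketch of asking the kernel directly, here is how it might look in Python. `os.urandom` is the standard library call for OS-provided randomness; as far as I know, on recent Linux it uses the getrandom() syscall where available and so does not depend on a device node existing inside a chroot, but treat that detail as an assumption to verify for your platform. The `new_secret` helper is a hypothetical name for illustration.

```python
import os


def new_secret(num_bytes: int = 32) -> bytes:
    """Return num_bytes of randomness requested from the operating system."""
    # No userspace RNG state is involved: every call goes to the OS.
    return os.urandom(num_bytes)


if __name__ == "__main__":
    # Print a fresh 256-bit value as hex, e.g. for use as key material.
    print(new_secret().hex())
```

The point of keeping the helper this thin is that there is no seeding step and no library-level generator state to get wrong.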

Another problem is people making browser client-side JS wallets and not containing them inside a browser add-on: http://matasano.com/articles/javascript-cryptography/