Title: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on March 31, 2013, 10:00:59 AM
Are transaction IDs (32-byte hashes of the TX body) on blockchain.info written backwards, i.e. with the bytes reversed?
Or is the mistake on my end, and I have it backwards myself?
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on March 31, 2013, 03:45:18 PM
No one willing to check it? :o
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: GoldenWings91 on March 31, 2013, 04:11:11 PM
TX ids aren't written backwards. Did you mean the block hash? The block hash goes through an endianness change.
http://en.wikipedia.org/wiki/Endianness
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on March 31, 2013, 04:43:59 PM
No, I did not mean the block hash. I mean the transaction hash (sha256(sha256(TX))), as used in the inv message and to identify transaction inputs.
Okay. Take this example TX message:
Code:
00000000  f9 be b4 d9 74 78 00 00 00 00 00 00 00 00 00 00  |....tx..........|
bd21ae6383d48c044714cb6a2f48834dcefb755390ad7e765e48fc24bf590e20
If you search for it on blockchain.info, you will find nothing. If you search for it byte-reversed (200e59bf24fc485e767ead905375fbce4d83482f6acb1447048cd48363ae21bd), you will find the transaction in question (https://blockchain.info/tx/200e59bf24fc485e767ead905375fbce4d83482f6acb1447048cd48363ae21bd). It seems unmistakably reversed to me.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: GoldenWings91 on March 31, 2013, 07:48:35 PM
Seems I was mistaken. I was looking at the txid and didn't realize the byte order was already changed.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: christop on March 31, 2013, 07:57:17 PM
It's in Little Endian byte order (least-significant byte first) in the protocol, but it's written out in Big Endian byte order (most-significant byte first), as most other numbers in English normally are.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on March 31, 2013, 08:21:06 PM
Endianness has meaning when we talk about integers. Tx ids are not integers, but arrays of bytes (chars).
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: christop on March 31, 2013, 08:52:42 PM
A transaction id is a very large integer. Or you could say that an integer is also an array of bytes.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on March 31, 2013, 09:28:11 PM
No, it is not.
Code:
32    hash    char[32]    The hash of the referenced transaction.
But it seems like this strange custom (reversing the representation of tx ids) goes deep into the history of Bitcoin. Someone (Satoshi?) implemented it that way, and everyone just follows.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: christop on March 31, 2013, 11:44:19 PM
A cryptographic hash is a big integer. What C++ integer type would you use to store a 256-bit integer besides an array of a smaller integer type (char in this case)?
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on April 01, 2013, 06:25:11 AM
Code:
A cryptographic hash function is a hash function; that is, an algorithm that takes an arbitrary block of data and returns a fixed-size [b]bit string[/b]
Calculation of hash is defined with bit string operations, i.e. shifts and xors.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: Zeilap on April 01, 2013, 07:33:49 AM
Quote: "Code: A cryptographic hash function is a hash function; that is, an algorithm that takes an arbitrary block of data and returns a fixed-size [b]bit string[/b] Calculation of hash is defined with bit string operations, i.e. shifts and xors."
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: Schleicher on April 01, 2013, 03:18:50 PM
If you want to read the actual definition of the sha256 algorithm look here:
http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf
or here:
http://tools.ietf.org/html/rfc6234
The hash is supposed to be a 256-bit integer.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: christop on April 01, 2013, 03:29:10 PM
Quote: "Code: A cryptographic hash function is a hash function; that is, an algorithm that takes an arbitrary block of data and returns a fixed-size [b]bit string[/b] Calculation of hash is defined with bit string operations, i.e. shifts and xors."
Keep in mind that an integer is also a bit string in a binary computer, so Wikipedia's definition of a cryptographic hash function is accurate but incomplete when discussing a specific hash function like SHA-256.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on April 01, 2013, 06:32:28 PM
Quote: "http://tools.ietf.org/html/rfc6234 The hash is supposed to be a 256bit integer."
Please provide an exact citation. It talks about 8-, 32-, and 64-bit integers, but I see nothing about the hash being a 256-bit integer.
Quote: "SHA-256 (the hash function used to compute Bitcoin transaction IDs) treats the hash value as an integer."
Please provide a credible citation.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: christop on April 01, 2013, 07:31:59 PM
You're right, r.willis. The SHA spec does not explicitly point out that a SHA-256 hash is a 256-bit integer.
However, as a programmer I tend to "read between the lines" and simplify specs to manage their complexity and try to understand them better. In the case of SHA, the spec mentions that all words are stored and represented in the Big-Endian order, so I came to the logical conclusion that SHA-256 outputs a 256-bit integer, with H0 being the most-significant 32-bit word and H7 the least-significant (H0 through H7 are appended from left to right).
It also simplifies understanding how the Bitcoin protocol treats the SHA-256 hash bit string: as an integer stored in Little Endian. This is consistent with the rest of the protocol, as almost every other integer is stored in the Little-Endian byte order (IP addresses and TCP port numbers being notable exceptions).
Dealing with the hash as an array of 32 char becomes straightforward: hash[0] is the least-significant digit (base-256 digit because a char is 8 bits wide) and hash[31] is the most-significant digit.
To print out the hash, it's a simple matter of printing out hash[31] through hash[0], as the Western convention is to write numbers in Big-Endian order.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on April 01, 2013, 07:54:35 PM
There is nothing little-endian in SHA-256. First byte is first byte, and should be printed as such (like in hex dump I provided, for example).
One approach to get rid of such inconsistency would be use of base58 encoding (which explicitly treats values as big-endian), with a new version/application byte. It will be shorter, too.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: Zeilap on April 01, 2013, 08:58:12 PM
Quote: "There is nothing little-endian in SHA-256. First byte is first byte, and should be printed as such (like in hex dump I provided, for example). One approach to get rid of such inconsistency would be use of base58 encoding (which explicitly treats values as big-endian), with new version/application byte. It will be shorter, too."
It wouldn't be shorter at all. With base 58, you have 58 possible values per byte; with a byte string you get 256 values per byte. It would only look shorter when printed.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on April 01, 2013, 09:16:58 PM
For human-readable form, I mean. Now they are printed as byte-reversed hex values.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: christop on April 01, 2013, 09:26:28 PM
Quote: "There is nothing little-endian in SHA-256. First byte is first byte, and should be printed as such (like in hex dump I provided, for example)."
If we consider the SHA-256 hash to be an integer, it can be stored in either byte order. The Bitcoin protocol stores it in Little Endian.
Quote: "One approach to get rid of such inconsistency would be use of base58 encoding (which explicitly treats values as big-endian), with new version/application byte. It will be shorter, too."
The integer in your example is 200e...21bd. In Little Endian byte order it is the byte sequence bd 21 ... 0e 20. This is exactly like storing/sending a smaller integer like 12345678 as 78 56 34 12 in Little Endian. The only difference is the number of bits.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on April 01, 2013, 09:38:15 PM
It's storing it as a bit string (like in the standard). It seems that when printing it, it interprets it as a little-endian integer, so it comes out byte-reversed.
It's counter-intuitive, so I'm proposing use of base58 encoding for display purposes.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: christop on April 01, 2013, 09:57:08 PM
Quote: "It's storing it as a bit string (like in the standard). It seems that when printing it, it interprets it as a little-endian integer, so it comes out byte-reversed."
An integer is neither little-endian nor big-endian. An integer must be stored/sent/printed in one of the orders. So the big integer is stored in little-endian byte order in the Bitcoin protocol, but it is printed in big-endian order because that's how Westerners write numbers.
Quote: "It's counter-intuitive, so I'm proposing use of base58 encoding for display purposes."
Think of the integer 123. It is neither little-endian nor big-endian. I wrote it out in big-endian (the '1' digit is written first, which is on the left-hand side when writing left-to-right), but the integer itself is not big-endian. If I wrote it out in little-endian order, contrary to the Western convention, it would be 321. It's still the same number (one hundred and twenty-three), but it's only written out in a different order. The same thing is happening with integers (including the transaction id) in the Bitcoin protocol. Each byte is a single "digit" in that case.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on April 01, 2013, 10:07:56 PM
There is no integer at the output of the hash function. There are 32 bytes. And bitcoin stores/transmits it as it gets it from the hash function, in the same order.
However, it prints it interpreting it as a little-endian integer. I.e. the hash function returns [1,2,3]. It's stored like this, used in internal structures etc. Bitcoin (I suppose) prints it like this: "030201". Which I find strange.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: kokjo on April 01, 2013, 10:11:02 PM
yes, the satoshi client prints it out backwards. and it's the de facto standard in all blockchain handling software.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on April 01, 2013, 10:20:10 PM
Thanks for the conclusive answer. Do you feel that base58-encoded txids (and possibly txouts too) are better alternatives?
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: christop on April 01, 2013, 11:09:12 PM
Quote: "There is no integer at the output of the hash function. There are 32 bytes. And bitcoin stores/transmits it as it gets it from the hash function, in the same order."
Ah, I think I finally figured out what you mean now. The client interprets the bytes from the hash function as an integer stored in little-endian order (not as a "little-endian integer", which has no meaning) in various places internally and in the protocol, but prints out that integer in big-endian order. It should have treated the hash as an integer in big-endian order and then stored that integer in little-endian order to keep the protocol self-consistent, but that's water under the bridge now. So yes, that is strange.
Quote: "However, it prints it interpreting it as a little-endian integer. I.e. the hash function returns [1,2,3]. It's stored like this, used in internal structures etc. Bitcoin (I suppose) prints it like this: "030201". Which I find strange."
I wonder if it does the same thing with block hashes too.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: kokjo on April 02, 2013, 07:32:05 AM
Quote: "Thanks for the conclusive answer. Do you feel that base58-encoded txids (and possibly txouts too) are better alternatives?"
no. it would just annoy people when converting back and forth.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: wumpus on April 02, 2013, 07:38:07 AM
Quote: "no. it would just annoy people when converting back and forth."
And would make people confuse them with addresses.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: r.willis on April 02, 2013, 08:12:42 AM
It will have a different first letter (and length), so no more confusion than the current way.
Title: Re: Are transaction IDs on blockchain.info written backwards?
Post by: 2112 on May 16, 2013, 12:48:01 AM
This thread just got referenced in another piece of misinformation regarding the internals of Bitcoin.
I used to think that big-endian vs. little-endian is something that confuses only undergraduates. But apparently many more people continue to get confused by the strange byte ordering in the Bitcoin code: it is neither big-endian nor little-endian. It was most likely defined accidentally to use the internal representation of the OpenSSL library, which used hand-written assembly for speed on the 32-bit Intel architecture.
I used to recommend Mac OS X as the bi-endian platform that is easiest to work with. But Snow Leopard is the version that officially supports bi-endianness, and the hardware to run it is getting hard to come by.
I'm going to post a short demonstration program that is probably the quickest way to convince the wondering programmer that the Bitcoin protocol is neither big-endian nor little-endian.
To compile, use the following command on Mac OS X 10.[56]:
Code:
gcc -arch i386 -arch ppc mojibake.c -o mojibake
Code:
#include <stdio.h>
Code:
$ arch -i386 ./mojibake l
Code:
$ arch -ppc ./mojibake l |