Bitcoin Forum
April 19, 2024, 07:52:02 PM *
News: Latest Bitcoin Core release: 26.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Merkle Tree with no pointers  (Read 181 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
Shymaa-Arafat (OP)
Full Member
***
Offline Offline

Activity: 228
Merit: 156


View Profile
September 14, 2021, 08:46:40 PM
Last edit: September 14, 2021, 09:37:51 PM by Shymaa-Arafat
Merited by vapourminer (2), hugeblack (2), Halab (2), ABCbits (1), TryNinja (1)
 #1

I wasn't sure if this point is suitable to this group, as I think it involves data structures not just Bitcoin.
The idea was first suggested in response to a discussion in the Utreexo project, when storage space in the Pollard CSN node is not as free as in a bridge server.
The tree needs at least 2N pointers for the internal nodes (non leaves), if we add a flag to distinguish leaves with different struct & took care of the messy tree traversals we have to do in getting the proof in addition to deletions & insertions.
https://github.com/mit-dci/utreexo/discussions/316
....
-At the end they decided to use a pointer to aunt


type polNode struct {
   data     Hash
   aunt    *polNode
}



map[miniHash]*polNode



map[uint64]*polNode


.
I came up with this idea of storing the tree simply as a car size 2D array with no pointers at all, just use math indexing to traverse any direction u like

I draw a figure,
https://user-images.githubusercontent.com/83260375/132422816-0fc6605b-9986-4c13-93a2-769ad573bb5e.jpg

And from his 7 nodes example

 14
  |----------------------\
12                        13
|----------\               |--------\
08         09         10        11
|---\       |---\        |---\       |---\
00  01  02  03  04  05  06  07

R[0]=0,1,2,...,7
R[1]=8,9,10,11
R[2]=12,13
R[3]=14
.
the proof becomes
//direction to know + or -
If ((i mod 2)==0) drct=1;
else drct=-1;
// first, the sibling node
proof=R[0,i+drct]

//add the rest thru loop
For(j=1; j≤logN; j++)
{ index= i/(2^j)+drct;
proof=Add(R[j,index]);
}
.
Now, I'm asking what's wrong with my design?I mean if it's correct why didn't they save 2N =2*76m=152M*pointer size in bytes???
(for a pointer to address a space of 150m node it must be at least 28bit so it can't be less than4 bytes, ie 600M saving of ur 4G~15%)
In addition, it's more compact and neat coding
I mean there must be a reason
...
-Last but not least, would the transaction Merkle Tree benefit from this idea, or it's not stored this way?
1713556322
Hero Member
*
Offline Offline

Posts: 1713556322

View Profile Personal Message (Offline)

Ignore
1713556322
Reply with quote  #2

1713556322
Report to moderator
1713556322
Hero Member
*
Offline Offline

Posts: 1713556322

View Profile Personal Message (Offline)

Ignore
1713556322
Reply with quote  #2

1713556322
Report to moderator
1713556322
Hero Member
*
Offline Offline

Posts: 1713556322

View Profile Personal Message (Offline)

Ignore
1713556322
Reply with quote  #2

1713556322
Report to moderator
"Governments are good at cutting off the heads of a centrally controlled networks like Napster, but pure P2P networks like Gnutella and Tor seem to be holding their own." -- Satoshi
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1713556322
Hero Member
*
Offline Offline

Posts: 1713556322

View Profile Personal Message (Offline)

Ignore
1713556322
Reply with quote  #2

1713556322
Report to moderator
1713556322
Hero Member
*
Offline Offline

Posts: 1713556322

View Profile Personal Message (Offline)

Ignore
1713556322
Reply with quote  #2

1713556322
Report to moderator
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8382



View Profile WWW
September 15, 2021, 12:22:25 AM
Merited by EFS (4), ABCbits (2), hugeblack (2)
 #2

Bitcoin nodes do not store merkle trees, so there is nothing to save.

The particular layout you suggest has poor locality.  E.g. the first node above 0 is at 8 instead of being as soon as it could possibly be, at 2. Even in ram access locality matters. (at least to the extent that performance matters at all)

It also only works for trees which are fully populated at every level-- ones where the size is a power of two-- or otherwise it requires padding up to the next power of two size which wastes a *lot* of space.
Shymaa-Arafat (OP)
Full Member
***
Offline Offline

Activity: 228
Merit: 156


View Profile
September 15, 2021, 01:37:05 AM
Last edit: September 16, 2021, 05:56:34 AM by Shymaa-Arafat
 #3

Bitcoin nodes do not store merkle trees, so there is nothing to save.

The particular layout you suggest has poor locality.  E.g. the first node above 0 is at 8 instead of being as soon as it could possibly be, at 2. Even in ram access locality matters. (at least to the extent that performance matters at all)

It also only works for trees which are fully populated at every level-- ones where the size is a power of two-- or otherwise it requires padding up to the next power of two size which wastes a *lot* of space.

First, thank you Sir for replying:

1-I was not sure about Transactions Merkle Trees, that's why I asked at the last paragraph; if not stored for each block, then no use in TX Merkle Tree. As I said, I originally suggested it for the Utreexo design.
-but in any case, where is the transaction Merkle Tree is stored?since I think the number of transactions do not deviate too much from powers of 2. With the current average it could be 2¹¹ or 2¹¹ + 2⁹, from what I understand the numbers will increase after applying Schnorr signature but still near powers of2; and even easier to handle than UTXOS as it's not dynamic.

2-The Utreexo design works by keeping logN roots of only complete full binary trees (a forest) to make sure there's at max 2logN nodes in any proof: all the roots + the regular proof from the tree the UTXO in
(it's like a binary representation of a number, u can always partition a number to a summation of powers of2. So, whatever way he is storing the trees in his forest they are always full binary trees.

3-I don't understand what u meant about the example; if u mean it's not the typical way they compact a tree to 1D array in data structures literature, I made it this way to visualize it more easy; when u insert or delete u need to look the true parent (for 0&1 it's 8), while in the proof fetching which I wrote a Pseudocode u need 9 for 0&1 (not 8), and 8 for 2&3.
They did say that was one of their problems in addition to space saving (if no care for space saving, they would have stored 3pointers :2 to children & 1 to aunt or niece)
So this way they can reach parent/aunt/sibling/... whatever they want with the right math indexing.

4-I thought this 2D (like half pyramid side for the forest, where the 2D array looks like a horizontal right angle triangle layer for each power of2 tree) is easier and do the job of saving the pointers. If they can compact it even more into 1D and get all the math indexing, shift when insert/delete correct ( I'm not sure which makes less shifts with insert/delete the 2D or 1D)fine, but the way it seems they still use pointers.
Shymaa-Arafat (OP)
Full Member
***
Offline Offline

Activity: 228
Merit: 156


View Profile
September 17, 2021, 10:47:41 AM
 #4

I may add that 2D arrays are eventually stored in RAM as 1D array (contiguous physical locations) in row major order. So whether we should use this 2D representation or the classic 1D described in Algorithms & data structures literature depends on whether we prefer more locality of reference for breadth (my 2D) or depth (traditional 1D) traversal.
I think we use both, but we prefer to benefit from UTXOS (leaves) stored contiguous.
Anyways, that's a detail decision both don't need pointers.
.
Also, to be honest I got mixed up about where the TX Merkle Tree is stored. I know full nodes don't need it as they have the full blocks with TXs in it, but then who send the proofs to SPV? whomever does that must keep it & can benefit from the space reduction of moving the pointers
odolvlobo
Legendary
*
Online Online

Activity: 4298
Merit: 3196



View Profile
September 19, 2021, 09:01:22 AM
 #5

There is a clever way to store a binary tree in an array. Nodes are stored in breadth-first order beginning with index 1. (index 0 is unused).

Code:
            1
       2          3
    4    5     6     7
  8  9 10 11 12 13 14 15
...

This scheme enables fast traversal of the tree:

Indexes of nodes at depth D (root is 0) are 2D to 2D+1-1.
The parent of a node at index N is at index N/2.
The indexes of the children of a node at index N are 2N and 2N+1.

Join an anti-signature campaign: Click ignore on the members of signature campaigns.
PGP Fingerprint: 6B6BC26599EC24EF7E29A405EAF050539D0B2925 Signing address: 13GAVJo8YaAuenj6keiEykwxWUZ7jMoSLt
Shymaa-Arafat (OP)
Full Member
***
Offline Offline

Activity: 228
Merit: 156


View Profile
September 19, 2021, 09:28:37 AM
 #6

There is a clever way to store a binary tree in an array. Nodes are stored in breadth-first order beginning with index 1. (index 0 is unused).

Code:
            1
       2          3
    4    5     6     7
  8  9 10 11 12 13 14 15
...

This scheme enables fast traversal of the tree:

Indexes of nodes at depth D (root is 0) are 2D to 2D+1-1.
The parent of a node at index N is at index N/2.
The indexes of the children of a node at index N are 2N and 2N+1.


You probably mean the 1D array presentation I mentioned and said it's known in the literature for so long, maybe I forgot to add the links to MIT Algorithms lecture 8 & the screenshots in this post but they're in the Utreexo GitHub discussion
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-spring-2020/lecture-notes/index.htm
min 15 I think in this video
https://youtu.be/Xnpo1atN-Iw


(I have a browser problem in adding images here)
https://user-images.githubusercontent.com/83260375/133132854-bb10362f-7c82-4b14-9341-b20b1e01a9ec.jpg
https://user-images.githubusercontent.com/83260375/133132783-1c2a8f66-4950-44f7-98f9-234759b6605b.jpg

Anyways, we kind of stand on the same ground; ie no need to store pointers we can save this storage space.
It's just that I think one row for each tree level would be easier to handle as we have kind of an extra un-ordinary tree traversal needed to fetch the proof thru sibling.
"For me", not necessarily for everyone else, it's easier this way to write the math indexing and the merge or split of full trees when insert or delete in the Utreexo forest design
Shymaa-Arafat (OP)
Full Member
***
Offline Offline

Activity: 228
Merit: 156


View Profile
September 24, 2021, 11:21:47 AM
 #7

In fact, from this reference
https://www.oreilly.com/library/view/mastering-bitcoin/9781491902639/ch07.html
& tracing the push_back function in C++, it's used in Vector data structure. I knew before that TXIDs are stored as a vector
std::vector<CTransactionRef> vtx;
https://github.com/bitcoin/bitcoin/blob/7fcf53f7b4524572d1d0c9a5fdc388e87eb02416/src/primitives/block.h#L66

But I didn't knew it's the Merkle Tree; ie the compact array storage is already used to store the Bitcoin Merkle Tree
It seems that is what is called serialization
Shymaa-Arafat (OP)
Full Member
***
Offline Offline

Activity: 228
Merit: 156


View Profile
October 06, 2021, 01:52:14 PM
Last edit: October 06, 2021, 02:04:44 PM by Shymaa-Arafat
 #8

 I think I should post parts of the Utreexo team reply, especially Tadje Dyraj, and you can read the whole thing in here
https://github.com/mit-dci/utreexo/discussions/316
( The bottom line, their forest in the bridge node does use the compact format with no pointers; he was talking about Pollard nodes which stores only a portion ie not a complete tree. Also maps is used as dictionary bet 2 sorting orders of UTXOS hashes & birth)

Quote
@Shymaa-Arafat, you've mostly posted about storing complete trees. We already have a implementation for that, called forest in the code, which doesn't use any pointers.

The positionMap in the forest struct is there to find a position within the forest given an outpoint hash.

(One downside of inserting elements in the order they show up in the blocks is that we need an additional lookup table to find them, but only the bridge node needs that.)

I really don't know why didn't they just say that from the beginning. Anyways, I had to deliver the info since I discussed it here too
NotATether
Legendary
*
Offline Offline

Activity: 1582
Merit: 6670


bitcoincleanup.com / bitmixlist.org


View Profile WWW
October 06, 2021, 02:10:12 PM
 #9

In fact, from this reference
https://www.oreilly.com/library/view/mastering-bitcoin/9781491902639/ch07.html
& tracing the push_back function in C++, it's used in Vector data structure. I knew before that TXIDs are stored as a vector
std::vector<CTransactionRef> vtx;
https://github.com/bitcoin/bitcoin/blob/7fcf53f7b4524572d1d0c9a5fdc388e87eb02416/src/primitives/block.h#L66

But I didn't knew it's the Merkle Tree; ie the compact array storage is already used to store the Bitcoin Merkle Tree
It seems that is what is called serialization

On a related note, this isn't really serialization - that would be if you compact the data representation to a string or byte sequence that can be sent over the network or to a file. A std::vector data structure cannot be sent over the network without conversion to string, and even then that would depend on whether the element type - CTransactionRef - is serializable.

.
.BLACKJACK ♠ FUN.
█████████
██████████████
████████████
█████████████████
████████████████▄▄
░█████████████▀░▀▀
██████████████████
░██████████████
████████████████
░██████████████
████████████
███████████████░██
██████████
CRYPTO CASINO &
SPORTS BETTING
▄▄███████▄▄
▄███████████████▄
███████████████████
█████████████████████
███████████████████████
█████████████████████████
█████████████████████████
█████████████████████████
███████████████████████
█████████████████████
███████████████████
▀███████████████▀
█████████
.
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!