Bitcoin Forum
September 20, 2021, 12:15:31 PM
 News: Latest Bitcoin Core release: 22.0 [Torrent]
 Home Help Search Login Register More
 Pages: [1]
 Author Topic: Merkle Tree with no pointers  (Read 97 times) This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
Shymaa-Arafat
Member

Online

Activity: 152
Merit: 84

 September 14, 2021, 08:46:40 PMLast edit: September 14, 2021, 09:37:51 PM by Shymaa-ArafatMerited by vapourminer (2), hugeblack (2), Halab (2), ETFbitcoin (1), TryNinja (1)

I wasn't sure if this point is suitable to this group, as I think it involves data structures not just Bitcoin.
The idea was first suggested in response to a discussion in the Utreexo project, when storage space in the Pollard CSN node is not as free as in a bridge server.
The tree needs at least 2N pointers for the internal nodes (non leaves), if we add a flag to distinguish leaves with different struct & took care of the messy tree traversals we have to do in getting the proof in addition to deletions & insertions.
https://github.com/mit-dci/utreexo/discussions/316
....
-At the end they decided to use a pointer to aunt

type polNode struct {
data     Hash
aunt    *polNode
}

map[miniHash]*polNode

map[uint64]*polNode

.
I came up with this idea of storing the tree simply as a car size 2D array with no pointers at all, just use math indexing to traverse any direction u like

I draw a figure,

And from his 7 nodes example

14
|----------------------\
12                        13
|----------\               |--------\
08         09         10        11
|---\       |---\        |---\       |---\
00  01  02  03  04  05  06  07

R[0]=0,1,2,...,7
R[1]=8,9,10,11
R[2]=12,13
R[3]=14
.
the proof becomes
//direction to know + or -
If ((i mod 2)==0) drct=1;
else drct=-1;
// first, the sibling node
proof=R[0,i+drct]

For(j=1; j≤logN; j++)
{ index= i/(2^j)+drct;
}
.
Now, I'm asking what's wrong with my design?I mean if it's correct why didn't they save 2N =2*76m=152M*pointer size in bytes???
(for a pointer to address a space of 150m node it must be at least 28bit so it can't be less than4 bytes, ie 600M saving of ur 4G~15%)
In addition, it's more compact and neat coding
I mean there must be a reason
...
-Last but not least, would the transaction Merkle Tree benefit from this idea, or it's not stored this way?
1632140131
Hero Member

Offline

Posts: 1632140131

Ignore
 1632140131

1632140131
 Report to moderator
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
1632140131
Hero Member

Offline

Posts: 1632140131

Ignore
 1632140131

1632140131
 Report to moderator
1632140131
Hero Member

Offline

Posts: 1632140131

Ignore
 1632140131

1632140131
 Report to moderator
gmaxwell
Moderator
Legendary

Offline

Activity: 3542
Merit: 5622

 September 15, 2021, 12:22:25 AMMerited by ETFbitcoin (2), hugeblack (2)

Bitcoin nodes do not store merkle trees, so there is nothing to save.

The particular layout you suggest has poor locality.  E.g. the first node above 0 is at 8 instead of being as soon as it could possibly be, at 2. Even in ram access locality matters. (at least to the extent that performance matters at all)

It also only works for trees which are fully populated at every level-- ones where the size is a power of two-- or otherwise it requires padding up to the next power of two size which wastes a *lot* of space.
Shymaa-Arafat
Member

Online

Activity: 152
Merit: 84

 September 15, 2021, 01:37:05 AMLast edit: September 16, 2021, 05:56:34 AM by Shymaa-Arafat

Bitcoin nodes do not store merkle trees, so there is nothing to save.

The particular layout you suggest has poor locality.  E.g. the first node above 0 is at 8 instead of being as soon as it could possibly be, at 2. Even in ram access locality matters. (at least to the extent that performance matters at all)

It also only works for trees which are fully populated at every level-- ones where the size is a power of two-- or otherwise it requires padding up to the next power of two size which wastes a *lot* of space.

First, thank you Sir for replying:

1-I was not sure about Transactions Merkle Trees, that's why I asked at the last paragraph; if not stored for each block, then no use in TX Merkle Tree. As I said, I originally suggested it for the Utreexo design.
-but in any case, where is the transaction Merkle Tree is stored?since I think the number of transactions do not deviate too much from powers of 2. With the current average it could be 2¹¹ or 2¹¹ + 2⁹, from what I understand the numbers will increase after applying Schnorr signature but still near powers of2; and even easier to handle than UTXOS as it's not dynamic.

2-The Utreexo design works by keeping logN roots of only complete full binary trees (a forest) to make sure there's at max 2logN nodes in any proof: all the roots + the regular proof from the tree the UTXO in
(it's like a binary representation of a number, u can always partition a number to a summation of powers of2. So, whatever way he is storing the trees in his forest they are always full binary trees.

3-I don't understand what u meant about the example; if u mean it's not the typical way they compact a tree to 1D array in data structures literature, I made it this way to visualize it more easy; when u insert or delete u need to look the true parent (for 0&1 it's 8), while in the proof fetching which I wrote a Pseudocode u need 9 for 0&1 (not 8), and 8 for 2&3.
They did say that was one of their problems in addition to space saving (if no care for space saving, they would have stored 3pointers :2 to children & 1 to aunt or niece)
So this way they can reach parent/aunt/sibling/... whatever they want with the right math indexing.

4-I thought this 2D (like half pyramid side for the forest, where the 2D array looks like a horizontal right angle triangle layer for each power of2 tree) is easier and do the job of saving the pointers. If they can compact it even more into 1D and get all the math indexing, shift when insert/delete correct ( I'm not sure which makes less shifts with insert/delete the 2D or 1D)fine, but the way it seems they still use pointers.
Shymaa-Arafat
Member

Online

Activity: 152
Merit: 84

 September 17, 2021, 10:47:41 AM

I may add that 2D arrays are eventually stored in RAM as 1D array (contiguous physical locations) in row major order. So whether we should use this 2D representation or the classic 1D described in Algorithms & data structures literature depends on whether we prefer more locality of reference for breadth (my 2D) or depth (traditional 1D) traversal.
I think we use both, but we prefer to benefit from UTXOS (leaves) stored contiguous.
Anyways, that's a detail decision both don't need pointers.
.
Also, to be honest I got mixed up about where the TX Merkle Tree is stored. I know full nodes don't need it as they have the full blocks with TXs in it, but then who send the proofs to SPV? whomever does that must keep it & can benefit from the space reduction of moving the pointers
odolvlobo
Legendary

Offline

Activity: 3346
Merit: 2009

 September 19, 2021, 09:01:22 AM

There is a clever way to store a binary tree in an array. Nodes are stored in breadth-first order beginning with index 1. (index 0 is unused).

Code:
1
2          3
4    5     6     7
8  9 10 11 12 13 14 15
...

This scheme enables fast traversal of the tree:

Indexes of nodes at depth D (root is 0) are 2D to 2D+1-1.
The parent of a node at index N is at index N/2.
The indexes of the children of a node at index N are 2N and 2N+1.

Buy stuff on Amazon with BTC or convert Amazon points to BTC here: Purse.io
Join an anti-signature campaign: Click ignore on the members of signature campaigns.
PGP Fingerprint: 6B6BC26599EC24EF7E29A405EAF050539D0B2925 Signing address: 13GAVJo8YaAuenj6keiEykwxWUZ7jMoSLt
Shymaa-Arafat
Member

Online

Activity: 152
Merit: 84

 September 19, 2021, 09:28:37 AM

There is a clever way to store a binary tree in an array. Nodes are stored in breadth-first order beginning with index 1. (index 0 is unused).

Code:
1
2          3
4    5     6     7
8  9 10 11 12 13 14 15
...

This scheme enables fast traversal of the tree:

Indexes of nodes at depth D (root is 0) are 2D to 2D+1-1.
The parent of a node at index N is at index N/2.
The indexes of the children of a node at index N are 2N and 2N+1.

You probably mean the 1D array presentation I mentioned and said it's known in the literature for so long, maybe I forgot to add the links to MIT Algorithms lecture 8 & the screenshots in this post but they're in the Utreexo GitHub discussion
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-spring-2020/lecture-notes/index.htm
min 15 I think in this video
https://youtu.be/Xnpo1atN-Iw

(I have a browser problem in adding images here)
https://user-images.githubusercontent.com/83260375/133132854-bb10362f-7c82-4b14-9341-b20b1e01a9ec.jpg
https://user-images.githubusercontent.com/83260375/133132783-1c2a8f66-4950-44f7-98f9-234759b6605b.jpg

Anyways, we kind of stand on the same ground; ie no need to store pointers we can save this storage space.
It's just that I think one row for each tree level would be easier to handle as we have kind of an extra un-ordinary tree traversal needed to fetch the proof thru sibling.
"For me", not necessarily for everyone else, it's easier this way to write the math indexing and the merge or split of full trees when insert or delete in the Utreexo forest design
 Pages: [1]