Title: SQL block database: about nested sets Post by: grondilu on June 22, 2012, 01:02:12 PM As I try to implement a SQL bitcoin database for my perl library, I was looking for a way to efficiently store the tree block structure. So I found out about an idea which is kind of neat: nested sets. Here is one document which explains it (it's not the one I actually used as the one I used is not in english, so I had to find an other one for you): https://communities.bmc.com/communities/docs/DOC-9902 EDIT: the wikipedia page (http://en.wikipedia.org/wiki/Nested_set_model) seems fine also. I think it's quite a cleaver method, but a bit tough to grasp. Here is the mysql code I wrote so far (I didn't put here some views, but I did put the main tables, even if they are not related to the subject of this thread): Code: -- Table "blocks" actually only contains block headers The trigger "add_block_in_tree" should work when a block whose parent is known is inserted. But it will not work if block is inserted before its parent is. It should be possible to code this but it is a bit tricky. Anyway if anyone is also interested in this model, I'd gladly hear suggestions. Title: Re: SQL block database: about nested sets Post by: pc on June 22, 2012, 03:08:59 PM I regularly use the nested set model of trees in SQL for my day job (which is completely unrelated to bitcoin). I think Joe Celko was one of the originators (or at least an advocate) for using SQL in this way, and his book "SQL for Smarties" is one of the best books I've ever used professionally. It handles a lot of "clever" uses of SQL of a variety of things, including nested sets, and I highly recommend it.
Title: Re: SQL block database: about nested sets Post by: maaku on June 22, 2012, 05:20:13 PM I wrote/improved-upon a nested-set implementation in Python using SQLAlchemy for this exact purpose. It's hosted on github (https://github.com/monetizeio/sqlalchemy-orm-tree).
It'll probably not be worth the effort to integrate it into your perl library directly (being in Python and on top of SQLAlchemy, etc.), but you can port it, or maybe you'll just find it a useful reference as you write your own. EDIT: I suppose the codebase might be intimidating to someone who not versed in Python or the SQLAlchemy framework. You'll find the insertion/update/deletion code in 'sqlalchemy_tree/orm.py', and the higher-level methods for manipulating the tree in 'sqlalchemy_tree/manager/{class_,instance}.py'. Title: Re: SQL block database: about nested sets Post by: grondilu on June 27, 2012, 08:26:52 AM I wrote/improved-upon a nested-set implementation in Python using SQLAlchemy for this exact purpose. It's hosted on github (https://github.com/monetizeio/sqlalchemy-orm-tree). It'll probably not be worth the effort to integrate it into your perl library directly (being in Python and on top of SQLAlchemy, etc.), but you can port it, or maybe you'll just find it a useful reference as you write your own. EDIT: I suppose the codebase might be intimidating to someone who not versed in Python or the SQLAlchemy framework. You'll find the insertion/update/deletion code in 'sqlalchemy_tree/orm.py', and the higher-level methods for manipulating the tree in 'sqlalchemy_tree/manager/{class_,instance}.py'. I'll have a look at it someday. But meanwhile I've been working on neat stuff and I'd like to share it to everyone. It's not implemented in SQL right now, but I should do it soon. Here it is. I've spent quite some time reading the work of a guy named Vadim Tropashko on this subject. It's quite fascinating, and I think his technique would be perfect for a SQL model of the bitcoin block structure. Tropashko uses rational values as borders for his intervals. One of the main advantage is that you can work on a fix global interval (0 to 1 for instance): you don't have to shift half of the block tree each time you insert a new node. One cool aspect of this technique is the relation with continued fractions. As you know any fraction can be written as a finite continued fraction: Code: p 1 The idea is that the a, b, c, ... sequence provides a natural path for the node's branch. For instance, a sequence such as 1, 1, 2, 1 would mean: "take the second branch at the third node from the root". Now, nested intervals are all about intervals, so we can not be happy with just one rational for each node. We need an upper limit, assuming the rational described above is the lower limit (it is not necessarly the case though, but we'll deal with this problem later). The idea is to associate a Möbius function of a variable x ranging from 0 to 1 (same as our global interval): Code: 1 Then we have p/q = M(0). The parent of f/p is then M(infinity). This is actually already enough to understand that M(x) can be written: Code: p + r x Where r/s is the parent node written as a rational. I have no exact proof for that, but considerations on limits at infinity and value at 0 clearly indicates it should be true. Without the possibility to write M(x) like this, it would be pretty tough to compute any M(x) value in SQL. Anyway, what is important to remember is that to compute M(x) we only need two rational numbers: the one from the actual lowest edge of the node, and the one from its direct parent. The only thing which is lacking is a way to get the first child. M(1) wouldn't do as this is actually the first 'sibling'. Tropashko then gets into quite a lot of complicated stuff and I had a lot of difficulties grasping them, and when I managed to understand them I was wondering if they would be useful anyway. Yet there is one remark that he makes at the very end of his last article, a remark that can make, in my humble opinion, the whole thing much easier. Tropashko remarks that simple continued fractions are notoriously known not to be monotonuous. Depending of the parity of the position of one of the number a, b, c..., sometimes the whole value increases with the number, sometimes it decreases. He then proposes to use positive continued fractions (mind the minus signs): Code: 1 Althouth this expression seems more complicated at first glance, it is actually much easier to visualize and to work with. The main reason for this is that it's easy to convince yourself that this function of x is growing (I mean: x > y => M(x) > M(y) ). This changes everything. We now can be sure that the lowest edge for an interval will be M(0), and the highest edge will be M(1). We'll usually identify a node with M(0). Parent is still M(infinity), so we can still write the function as: Code: p - r x The only difference are the minus signs, but they are not a problem. The first child of M(0) is easy to spot: it's M(1/2). The second child (first sibling of M(1/2)) is M(1/3), the third is M(1/4), and so on. The first sibling of M(0) is M(-1), second sibling is M(-2), and so on. Here is a small part of a genealogy tree, to make things clearer: Code: M(infinity)-->M(0)-------->M(1/2)--> M(1/(2-1/2)) --> ...asymptotically ends at M(1) Then there is still a catch: how do you compute the parent from the lower edge of the interval? Normally, as Tropashko explains it, you'd have to use the extended euclidian algorithm, but we actually don't need to do that for a bitcoin block tree, since each parent is present already in the database anyway. We can therefore retrieve the parent rational value, and use it to compute the Möbius function, and thus the rest of the tree as we need it. So, to summarize, it's pretty easy. Each node tree will have two rational numbers, defining an interval. If the lower edge of node A is inside the interval of node B, then B is an ancestor of node A. To add a child to a node, we just need to get the parent and compute the Möbius function for x = 1/(n+2), where n is the number of already existing children. It shouldn't be difficult to get n, we basically just need to make a loop and test for the existence of M(1/(k+2)) in the database. Moving a whole branch should not be too difficult either. Imagine we have an orphan block with children, and then suddenly the parent block of the orphan is added to the database. To patch the new branch to the database, we need to change intervals for each node of the previously orphan branch. It sure will cost more than adding a single child to a leaf, but hopefully this situation does not occur too often anyway. One cool thing with the fact that positive continued fractions are increasing functions, is that we will not have to store the ordinal value (height, depth, however you name it) of each node. If we sort our queries by rational lower edges, then we'll automatically get the sequencial order of our chain. |