Author Topic: Indexing technology used on BTC/altcoins  (Read 1379 times)
AngryDwarf (OP)
Sr. Member
****
Offline Offline

Activity: 476
Merit: 501


View Profile
January 29, 2016, 11:51:02 PM
 #1

Sorry if this topic has been covered before or is in the wrong place, but a search brought back a lot of results.

One of the important design goals of bitcoin is decentralisation. However, running a full Bitcoin Core node is proving very heavy on the average user's PC.
The block size is only 1 MB, produced once every 10 minutes on average. This really is not a lot of data to write to disk, and most users have the capacity to store the block data.
When I am catching up on the block chain, I am seeing huge amounts of disk writes, orders of magnitude in excess of the actual data being downloaded. This leads me to believe that it could be the indexing technology that is causing a lot of disk data reorganisation. Is this the case?
I would have thought that the block chain indexing requirements should be quite simple, being rather linear apart from a few forks. Performance could be improved significantly by index technology dedicated to the task. I would look at a virtual memory mmap solution.
Are there any developments in this area?
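To illustrate the sort of thing I have in mind, here is a rough sketch of an mmap-backed, append-only block index: one fixed-size record per block height, so a lookup is a single array access. The record layout is hypothetical and purely for illustration, not how Bitcoin Core actually stores its index.

Code:
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct IndexRecord {          // hypothetical 48-byte record, one per block height
    uint8_t  block_hash[32];  // hash of the block header
    uint32_t file_number;     // which blk*.dat file holds the block
    uint32_t file_offset;     // byte offset of the block in that file
    uint64_t reserved;        // padding / future use
};

class MmapBlockIndex {
public:
    bool Open(const char* path, size_t max_blocks) {
        fd_ = open(path, O_RDWR | O_CREAT, 0644);
        if (fd_ < 0) return false;
        size_ = max_blocks * sizeof(IndexRecord);
        if (ftruncate(fd_, size_) != 0) return false;
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
        if (p == MAP_FAILED) return false;
        records_ = static_cast<IndexRecord*>(p);
        return true;
    }
    // Writing a record is just a store into the mapping; the kernel flushes
    // dirty pages lazily, and msync() forces a durable linear write.
    void Put(uint32_t height, const IndexRecord& rec) { records_[height] = rec; }
    const IndexRecord& Get(uint32_t height) const { return records_[height]; }
    void Sync() { msync(records_, size_, MS_SYNC); }
    void Close() {
        if (records_) munmap(records_, size_);
        if (fd_ >= 0) close(fd_);
    }
private:
    int fd_ = -1;
    size_t size_ = 0;
    IndexRecord* records_ = nullptr;
};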

Scaling and transaction rate: https://bitcointalk.org/index.php?topic=532.msg6306#msg6306
Do not allow demand to exceed capacity. Do not allow mempools to forget transactions. Relay all transactions. Eventually confirm all transactions.
"Governments are good at cutting off the heads of a centrally controlled networks like Napster, but pure P2P networks like Gnutella and Tor seem to be holding their own." -- Satoshi
achow101
Staff
Legendary
*
Offline Offline

Activity: 3374
Merit: 6535


Just writing some code


View Profile WWW
January 30, 2016, 02:50:27 AM
 #2

Sorry if this topic has been covered before or is in the wrong place, but a search brought back a lot of results.

One of the important design goals of bitcoin is decentralisation. However, running a full Bitcoin Core node is proving very heavy on the average user's PC.
The block size is only 1 MB, produced once every 10 minutes on average. This really is not a lot of data to write to disk, and most users have the capacity to store the block data.
The network has been running for several years now, and those MB add up.

When I am catching up on the block chain, I am seeing huge amounts of disk writes, orders of magnitude in excess of the actual data being downloaded. This leads me to believe that it could be the indexing technology that is causing a lot of disk data reorganisation. Is this the case?
How much in excess? The blockchain is currently 60+ GB. When you are still syncing, most of the data comes quickly, not at 10-minute intervals.

I would have thought that the block chain indexing requirements should be quite simple, being rather linear apart from a few forks. Performance could be improved significantly by index technology dedicated to the task. I would look at a virtual memory mmap solution.
Are there any developments in this area?

The indexing is implementation specific. Bitcoin Core (which pretty much all alts fork off of) uses LevelDB as its database system. You can examine the performance of LevelDB if you want to figure out how to make it perform better. If you do have a solution that would improve the indexing process, please write the code and submit a pull request on GitHub.
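For reference, the LevelDB C++ API is quite small; a minimal open/put/get sketch looks roughly like this. The key and value contents are made up for illustration and are not Bitcoin Core's actual schema.

Code:
#include <cassert>
#include <string>
#include "leveldb/db.h"

int main() {
    leveldb::DB* db = nullptr;
    leveldb::Options options;
    options.create_if_missing = true;

    // Open (or create) a database directory.
    leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);
    assert(s.ok());

    // Write a key/value pair, then read it back.  The key/value layout here
    // is invented for illustration, not Bitcoin Core's actual schema.
    s = db->Put(leveldb::WriteOptions(), "block:000000", "file=0,offset=8");
    assert(s.ok());

    std::string value;
    s = db->Get(leveldb::ReadOptions(), "block:000000", &value);
    assert(s.ok() && value == "file=0,offset=8");

    delete db;
    return 0;
}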

Moloch
Hero Member
*****
Offline Offline

Activity: 798
Merit: 722



View Profile
January 30, 2016, 03:59:50 AM
 #3

Bitcoin Core (which pretty much all alts fork off of) uses LevelDB as its database system.

FWIW, I believe bitcoin is moving away from leveldb.  They are/were researching alternatives like sqlite:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-October/011633.html
achow101
Staff
Legendary
*
Offline Offline

Activity: 3374
Merit: 6535


Just writing some code


View Profile WWW
January 30, 2016, 04:07:34 AM
 #4

Bitcoin Core (which pretty much all alts fork off of) uses LevelDB as its database system.

FWIW, I believe bitcoin is moving away from leveldb.  They are/were researching alternatives like sqlite:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-October/011633.html
There are no official plans to move off of LevelDB, but there is work going on to test out other databases like SQLite. There will probably be a move to a better database eventually, but for now no such plans exist.

AngryDwarf (OP)
Sr. Member
****
Offline Offline

Activity: 476
Merit: 501


View Profile
January 30, 2016, 08:03:21 AM
 #5

Based on a 22 hour catch up, that would be 1 MB * 6 * 22 = <132 MB of average block data.
Running on an SSD write-enabled cache, the catch up took over 2.5 minutes, with writes in the 25-55 MB/s range, maybe 40 MB/s on average. So as a rough guesstimate we are talking over 6 GB of disk writes to catch up on potentially 132 MB of data. A lot of users with mechanical disks do not want to run a full node now, and it is the catch-up time rather than network bandwidth and storage space which is the biggest issue for some.
Does Bitcoin Core use a full database implementation, or just implement the indexing technology?
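Spelling the write-amplification estimate out in full (the 40 MB/s and 2.5 minute figures are my rough observations from above):

Code:
#include <cstdio>

int main() {
    // Rough figures from the observations above.
    const double block_data_mb   = 1.0 * 6 * 22;       // ~132 MB of new block data in 22 hours
    const double write_rate_mbs  = 40.0;               // average observed write rate
    const double catchup_seconds = 2.5 * 60;           // observed catch-up wall time
    const double written_mb      = write_rate_mbs * catchup_seconds;  // ~6000 MB actually written

    printf("downloaded ~%.0f MB, wrote ~%.0f MB, amplification ~%.0fx\n",
           block_data_mb, written_mb, written_mb / block_data_mb);
    return 0;
}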


Scaling and transaction rate: https://bitcointalk.org/index.php?topic=532.msg6306#msg6306
Do not allow demand to exceed capacity. Do not allow mempools to forget transactions. Relay all transactions. Eventually confirm all transactions.
AngryDwarf (OP)
Sr. Member
****
Offline Offline

Activity: 476
Merit: 501


View Profile
January 30, 2016, 11:19:53 AM
 #6

Another issue is that after a computer crash, Bitcoin Core frequently requires a full database reindex. This is bad now, and will become an absolutely unmanageable nightmare as the blockchain grows.

So does Bitcoin really need to use a fully fledged database system? If not, a specialised data structure could be the way forward. Periodic local syncing of last known good blocks would mean that in the event of failure a full reindex would not be required; the node would simply redownload blocks from the last sync point on recovery. This would also keep small disk writes to a minimum, using periodic linear disk writes instead.
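As a rough sketch of what I mean by periodically recording a last known good point (a hypothetical file format, not something Bitcoin Core does today): write a tiny record to a temporary file, fsync it, then rename it over the old one, so a crash can only lose the most recent interval.

Code:
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

struct Checkpoint {
    uint32_t height;          // last block height known to be fully indexed
    uint8_t  block_hash[32];  // hash of that block
};

bool WriteCheckpoint(const char* path, const Checkpoint& cp) {
    char tmp[256];
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = write(fd, &cp, sizeof(cp)) == (ssize_t)sizeof(cp)
              && fsync(fd) == 0;          // force the record onto disk
    close(fd);

    // rename() is atomic on POSIX filesystems, so after a crash the file is
    // either the old checkpoint or the new one, never a torn write.
    return ok && rename(tmp, path) == 0;
}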

I think there needs to be improvements made here, and depending on the requirements there could be the potential for significant performance and resilience improvements.

Scaling and transaction rate: https://bitcointalk.org/index.php?topic=532.msg6306#msg6306
Do not allow demand to exceed capacity. Do not allow mempools to forget transactions. Relay all transactions. Eventually confirm all transactions.
achow101
Staff
Legendary
*
Offline Offline

Activity: 3374
Merit: 6535


Just writing some code


View Profile WWW
January 30, 2016, 03:35:38 PM
 #7

Based on a 22 hour catch up, that would be 1 MB * 6 * 22 = <132 MB of average block data.
Running on an SSD write-enabled cache, the catch up took over 2.5 minutes, with writes in the 25-55 MB/s range, maybe 40 MB/s on average. So as a rough guesstimate we are talking over 6 GB of disk writes to catch up on potentially 132 MB of data. A lot of users with mechanical disks do not want to run a full node now, and it is the catch-up time rather than network bandwidth and storage space which is the biggest issue for some.
I do not know why that would be the case, but try asking on GitHub or searching through the issues and pull requests there to find answers.

Also, it is known that the bottleneck when syncing a full node is usually the CPU or the hard drive.

Does Bitcoin Core use a full database implementation, or just implement the indexing technology?
Not sure, but I think the indexing only puts into the database the on-disk location of each block. However, as part of a reindex and a full sync, the blocks need to be reverified, which taxes the CPU.

Another issue is that after a computer crash, Bitcoin Core frequently requires a full database reindex. This is bad now, and will become an absolutely unmanageable nightmare as the blockchain grows.

So does Bitcoin really need to use a fully fledged database system? If not, a specialised data structure could be the way forward. Periodic local syncing of last known good blocks would mean that in the event of failure a full reindex would not be required; the node would simply redownload blocks from the last sync point on recovery. This would also keep small disk writes to a minimum, using periodic linear disk writes instead.
I think the issue is that it cannot know where the last good block was located because the database with the indices is corrupted so it can't be read. It then has to start from the beginning and reindex all of the blocks so that it can know the location of the blocks and be able to pull data from them when needed. If you were to simply start at the last known good block, then it wouldn't be able to access data earlier in the blockchain.

Also, reindexing does not mean that it is redownloading. Reindexing means that Bitcoin Core is reading from the block files and recording in a database the location of those blocks.
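Roughly, a reindex amounts to something like the sketch below; the blk*.dat layout assumed here (4-byte network magic, 4-byte length, then the raw block) is my understanding and is just an illustration, not a spec.

Code:
#include <cstdint>
#include <cstdio>
#include <vector>

struct BlockLocation {
    long     offset;  // byte offset of the raw block within the file
    uint32_t size;    // serialized block size
};

// Walk one block file and record where each block starts, so later reads can
// seek straight to it.  In the real client these locations end up in the
// LevelDB block index.
std::vector<BlockLocation> ScanBlockFile(const char* path) {
    std::vector<BlockLocation> index;
    FILE* f = fopen(path, "rb");
    if (!f) return index;

    uint8_t magic[4];
    uint32_t size = 0;
    while (fread(magic, 1, 4, f) == 4 && fread(&size, 4, 1, f) == 1) {
        index.push_back({ftell(f), size});           // block starts right after this header
        if (fseek(f, (long)size, SEEK_CUR) != 0)     // skip over the block itself
            break;
    }
    fclose(f);
    return index;
}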

I think there needs to be improvements made here, and depending on the requirements there could be the potential for significant performance and resilience improvements.
Probably, but again, ask on GitHub, as the developers are active there and also know the ins and outs of Bitcoin Core much better than I do.

2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1065



View Profile
January 30, 2016, 09:42:30 PM
Last edit: January 30, 2016, 09:56:52 PM by 2112
 #8

Man, you seem rather naïve. The non-use of serious database technology and the deep enmeshing of an educational-toy database engine (LevelDB) into the Core client source code are key ways of maintaining control of the project within the small core team.

It is a kind of article of faith and a way of keeping the anarchist spirit of the project: if you want to use a serious DBMS, you are a corporatist shill and a bankster bent on taking control of the project from the people!

Also, don't even think of abstracting the storage layers used by Bitcoin Core! Look at what happened to etotheipi and his company when he decided to switch.

Edit: The Core Team has actually achieved the holy grail of billing-by-the-hour software consultants: they included their own fork of LevelDB in Core and maintain it separately from Google's.

Another issue is that after a computer crash, Bitcoin Core frequently requires a full database reindex. This is bad now, and will become an absolutely unmanageable nightmare as the blockchain grows.

So does Bitcoin really need to use a fully fledged database system? If not, a specialised data structure could be the way forward. Periodic local syncing of last known good blocks would mean that in the event of failure a full reindex would not be required; the node would simply redownload blocks from the last sync point on recovery. This would also keep small disk writes to a minimum, using periodic linear disk writes instead.

I think there needs to be improvements made here, and depending on the requirements there could be the potential for significant performance and resilience improvements.
Based on a 22 hour catch up, that would be 1 MB * 6 * 22 = <132 MB of average block data.
Running on an SSD write-enabled cache, the catch up took over 2.5 minutes, with writes in the 25-55 MB/s range, maybe 40 MB/s on average. So as a rough guesstimate we are talking over 6 GB of disk writes to catch up on potentially 132 MB of data. A lot of users with mechanical disks do not want to run a full node now, and it is the catch-up time rather than network bandwidth and storage space which is the biggest issue for some.
Does Bitcoin Core use a full database implementation, or just implement the indexing technology?


Sorry if this topic has been covered before or is in the wrong place, but a search brought back a lot of results.

One of the important design goals of bitcoin is decentralisation. However, running a full Bitcoin Core node is proving very heavy on the average user's PC.
The block size is only 1 MB, produced once every 10 minutes on average. This really is not a lot of data to write to disk, and most users have the capacity to store the block data.
When I am catching up on the block chain, I am seeing huge amounts of disk writes, orders of magnitude in excess of the actual data being downloaded. This leads me to believe that it could be the indexing technology that is causing a lot of disk data reorganisation. Is this the case?
I would have thought that the block chain indexing requirements should be quite simple, being rather linear apart from a few forks. Performance could be improved significantly by index technology dedicated to the task. I would look at a virtual memory mmap solution.
Are there any developments in this area?



Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
AngryDwarf (OP)
Sr. Member
****
Offline Offline

Activity: 476
Merit: 501


View Profile
January 30, 2016, 10:52:04 PM
 #9

Man, you seem rather naïve. The non-use of serious database technology and the deep enmeshing of an educational-toy database engine (LevelDB) into the Core client source code are key ways of maintaining control of the project within the small core team.

It is a kind of article of faith and a way of keeping the anarchist spirit of the project: if you want to use a serious DBMS, you are a corporatist shill and a bankster bent on taking control of the project from the people!

Also, don't even think of abstracting the storage layers used by Bitcoin Core! Look at what happened to etotheipi and his company when he decided to switch.

Edit: The Core Team has actually achieved the holy grail of billing-by-the-hour software consultants: they included their own fork of LevelDB in Core and maintain it separately from Google's.

Man, that is really a very uneducated and unwarranted attack. My background is in high-performance memory-based database systems. I am questioning whether a fully fledged database system is really needed, in preference to a more optimised data storage structure better suited to blockchain technology. At no point have I suggested the use of a serious DBMS, quite the opposite. I was simply trying to get an understanding of the technology requirements. I would seriously question why a system requires over 6 GB of disk writes to secure less than 132 MB of data, and why a simple computer crash requires a complete rebuild of the blockchain index. There is possibly great potential to improve in this area, but without understanding the full requirements I cannot state that there definitely is.

Scaling and transaction rate: https://bitcointalk.org/index.php?topic=532.msg6306#msg6306
Do not allow demand to exceed capacity. Do not allow mempools to forget transactions. Relay all transactions. Eventually confirm all transactions.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1065



View Profile
January 31, 2016, 01:26:54 AM
Last edit: January 31, 2016, 01:37:40 AM by 2112
 #10

Man, that is really a very uneducated and unwarranted attack. My background is in high-performance memory-based database systems. I am questioning whether a fully fledged database system is really needed, in preference to a more optimised data storage structure better suited to blockchain technology. At no point have I suggested the use of a serious DBMS, quite the opposite. I was simply trying to get an understanding of the technology requirements. I would seriously question why a system requires over 6 GB of disk writes to secure less than 132 MB of data, and why a simple computer crash requires a complete rebuild of the blockchain index. There is possibly great potential to improve in this area, but without understanding the full requirements I cannot state that there definitely is.
Man, I'm sorry to have stepped on your toes. You really seem to have a proper technical outlook on the issue. But you seriously lack in the "street smarts" department. The salespeople must love you. You just telegraphed your insecurity in the first two sentences.

In case you didn't read my first post: I haven't questioned anything technical in your observations. You just missed the proverbial forest for the trees. There are multiple areas in which the Core Bitcoin client has "possibly great potential to improve". But very few people care; the issue is who controls the direction of the project. The existing bugs and deficiencies are carefully maintained through the code pulls to preserve future opportunities for soft forks.

A couple of years ago I had to quickly help a friend revive his ailing Windows machine. Every hour counted, so we decided to buy the required discrete graphics card at a nearby mall's store catering to computer gamers. Unfortunately we hit it at a really busy time, and the checkout line was over an hour. During that time I learned that many average teenage boys have a good handle on the politics of various 3D graphics APIs (e.g. Microsoft DirectX vs. OpenGL) and the various quirks of how they affect game releases, prices, platform availability, etc. They were quite "street smart" despite shopping in a typical mall under the watchful eyes of their parents.

Edit: Since your background is in high-performance memory-based database systems, here's an idea: open a new thread about replacing the mempool with a general memory-based database. Stick to the technical benefits. And then watch the comments flow about how not using a sensible database for the mempool is crucial to maintaining the security of Bitcoin. Maybe that will open your eyes to what is really going on here.
 

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
AngryDwarf (OP)
Sr. Member
****
Offline Offline

Activity: 476
Merit: 501


View Profile
January 31, 2016, 06:37:30 AM
 #11

Quote
In case you didn't read my first post: I haven't questioned anything technical in your observations. You just missed the proverbial forest for the trees. There are multiple areas in which the Core Bitcoin client has "possibly great potential to improve". But very few people care; the issue is who controls the direction of the project. The existing bugs and deficiencies are carefully maintained through the code pulls to preserve future opportunities for soft forks.

Sorry for misunderstanding you.  Wink

Scaling and transaction rate: https://bitcointalk.org/index.php?topic=532.msg6306#msg6306
Do not allow demand to exceed capacity. Do not allow mempools to forget transactions. Relay all transactions. Eventually confirm all transactions.