Bitcoin Forum
Author Topic: Performance of Account structures in bitcoind  (Read 746 times)
arosca (OP)
April 26, 2014, 04:40:48 AM
Last edit: April 26, 2014, 04:53:51 AM by arosca
 #1

I was curious to see what kind of negative performance effect a large number of accounts has on bitcoind. The results are not pretty. My tests were not particularly scientific, but here's what I've learned.

Methodology

I created 50K accounts in an empty wallet, each with a small balance. The resulting wallet file is approximately 13MB. Creating accounts takes approximately 0.03 seconds per account.

Code:
for n in range(50000):
    client.move('', 'account%d' % n, smallAmount)

I then executed a sequence of 10K random transfers between these accounts. These transfers take approximately 0.1 seconds each, on average. Again, this is in an empty wallet, and all of these transfers are internal to the wallet (no transactions are actually sent to the bitcoin network).

Code:
client.move(account1, account2, smallAmount)

Next, I executed the following sequence of external transfers (i.e. actual network transactions), on each transfer sending funds from a random account to another random account.
  • send 100 transfers
  • send 500 transfers
  • send 10 transfers
  • send 50 transfers

Code:
    account1 = getRandomAccount()
    account2 = getRandomAccount(account1)  #pick any account other than account1
        
    address = client.getaccountaddress(account2)

    if (client.getbalance(account1) > 1e-4):
        tx = client.sendfrom(account1, address, 1e-4)              
    elif client.getbalance() > 1e-4:
        tx = client.sendtoaddress(address, 1e-4)
    else:
        raise Exception('No balance available in any account')  

Results

After each step above, I recorded the size of the wallet file, the time it took for bitcoind to start up (i.e. initialize by reading the wallet and other database files), and the time it took to actually execute the transfers. Here is the summary of my results:
http://i.snag.gy/1Zh8Z.jpg
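
The measurements described above (wallet file size and elapsed time) could be captured with a couple of small helpers. This is only a minimal sketch: the names wallet_size_mb and timed are illustrative and do not appear in the test scripts in this thread, and the path passed in is whatever wallet file is being tested.

```python
import os
import time


def wallet_size_mb(path):
    """Return the size of the file at `path` in megabytes (1 MB = 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024.0 * 1024.0)


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return a (result, elapsed_seconds) tuple."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start
```

For example, `tx, seconds = timed(client.sendtoaddress, address, 1e-4)` would time a single external transfer, and calling wallet_size_mb on the wallet.dat path after each batch would track how the file grows.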

The results are surprisingly bad. File wallet.dat ballooned to 85MB (!) after only 660 transfers. I have no idea what could possibly take up so much space, but I'll try to inspect the file using BerkeleyDB tools and will add to this post if I gain some insight.

The really bad news is that transfers end up taking several seconds each, on average. As expected, the duration increases as the number of transactions in the wallet goes up.

I inspected the bitcoind logs and it appears that most of the delay is because wallet.dat is flushed to disk after each transfer.

Other Observations
I was able to severely corrupt the wallet file by terminating the bitcoind process. I did not lose any keys, but the account balance information was corrupted. In essence, I was able to lose track of the correct balance in each account without any effort at all.

Conclusions
Others have said, both here in this forum and elsewhere: don't use Accounts in a server environment. More importantly, bitcoind itself does not seem to be suitable for any type of system where a large number of transactions is expected to occur. A different solution is needed. There is only one commercial, enterprise-level solution I am aware of (https://bitsofproof.com/?page_id=323).

Additionally, BerkeleyDB (which bitcoind uses to store account, address, and all other data) does not appear to be a sufficiently robust solution if you really care about account balances. I do not know enough about it to say for sure, but it is possible that it would perform better if bitcoind used it differently. For example, I would like to see an option for transactional replication of all wallet data to a separate disk or server. This would at least ensure an internally-consistent copy of the wallet database exists. As things stand now, if the wallet file gets corrupted, everything is lost, and I was able to corrupt the file very easily (and unintentionally).

I am contemplating starting an open source alternative to the built-in bitcoind account management infrastructure. It would still use bitcoind for interfacing with the network, but would use a more robust database setup to store and handle account data. More about this in a separate post.
gmaxwell
April 26, 2014, 09:33:29 AM
Last edit: April 26, 2014, 09:44:35 AM by gmaxwell
 #2

Quote
I was able to severely corrupt the wallet file by terminating the bitcoind process. I did not lose any keys, but the account balance information was corrupted. In essence, I was able to lose track of the correct balance in each account without any effort at all.
Can you provide some more information here?  Were you running the release binaries? What version? What operating system? How did you kill the process? What state was it in when you brought it back up? What errors did you receive?  Would it be possible for you to provide the corrupted wallet and database/ directory to me?

I ask because last year I ran a loop killing the process under load for more than a month, killing it thousands and thousands of times trying to tease out some rare issues, and was not able to generate a single instance of corruption that way. Before I start trying to reproduce your experience I want to have a comparable setup.

Generally, use of the 'account' functionality is not recommended: it wasn't designed for what most people who try to use it expect it to do, and other methods (which support durability across hardware failure) should be used instead.  Wrt large amounts of transactions, there I must disagree: for better or worse, some of the largest bitcoin-using sites collect their transactions in a bitcoind wallet. Unfortunately, none of the people interested in those high-transaction-load applications are contributing to the code base, but they tell me that they don't need to because it currently works for them with reasonable considerations.  If you've automated your tests enough that they could be run against a testnet/regtest wallet out of a script, it might be useful to get them imported into the integration testing used for Bitcoin Core; it's quite shy on wallet-related tests.

Quote
The really bad news is that transfers end up taking several seconds each, on average
I assume you were spending unconfirmed coins in these transactions?   Taking several seconds per spend is a known artifact of the current software behavior: the code that traverses unspent coins has factorial-ish complexity. It could be improved (there are patches available, and simply disabling spending of unconfirmed outputs avoids it), but since the overall network capacity is not very great, I've mostly considered this bug helpful at discouraging inept denial of service attacks, so I haven't personally considered it a priority. (And most of the people who've noticed it and mentioned it to me appear to have just been conducting tests or attempting denial of service attacks…)
arosca (OP)
April 26, 2014, 01:58:08 PM
 #3

Quote
Can you provide some more information here?  Were you running the release binaries? What version? What operating system? How did you kill the process? What state was it in when you brought it back up? What errors did you receive?  Would it be possible for you to provide the corrupted wallet and database/ directory to me?

I want to start by saying I think bitcoind overall is solid. This whole experiment started informally. A friend of mine is working on a project that requires accounts and I'm mostly exploring the topic out of curiosity. I saw a lot of posts recommending not to use the account features, and I wanted to see for myself how far I can take things before they break.

I was running an older version (which happened to be installed with my Armory instance), 0.8.2.2-beta (80202). You're absolutely right, I should probably try this again on the latest version.

I'll be happy to provide the database files to you (it's all on testnet), but they are currently very large. Wallet.dat is 85MB. Contact me directly please and I'll send you a download link.

I am running on Windows 7 and making calls from Python 2.7.

I didn't intentionally kill the process, but when I initially set up my code I used this construct, which seems to have caused the problem:
Code:
    process = subprocess.Popen([r'C:\Program Files (x86)\Bitcoin\daemon\bitcoind.exe', '-testnet', '-rpcuser=test', '-rpcpassword=test1'])
    time.sleep(20) #give bitcoind time to start up; a smarter way would be to check the network connection in a loop, but for our purposes this is fine
   
    try:
        pass  #run various tests in a loop
    finally:
        process.terminate()
The terminate() call is what appears to cause the corruption. Later on I got a bit smarter, as I gained experience both with bitcoind and Python:
Code:
    process = subprocess.Popen([r'C:\Program Files (x86)\Bitcoin\daemon\bitcoind.exe', '-testnet', '-rpcuser=test', '-rpcpassword=test1'])
    time.sleep(20) #give bitcoind time to start up; a smarter way would be to check the network connection in a loop, but for our purposes this is fine
   
    client = None
    try:
        client = ServiceProxy("http://test:test1@localhost:18332")
        #run various tests in a loop
    finally:
        if client is not None:
            client.stop()
            time.sleep(5) 

        process.terminate()
But things are still not 100% OK. I currently get this when I start up:
Code:
Warning: Warning: error reading wallet.dat! All keys read correctly, but transaction data or address book entries might be missing or incorrect.
In its current state, whether the issue is with the wallet database or with the chain database, I am unable to perform some of the wallet operations:
Code:
>>> client.getbalance()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\jsonrpc\proxy.py", line 45, in __call__
    raise JSONRPCException(resp['error'])
JSONRPCException

Quote
I assume you were spending unconfirmed coins in these transactions?   Taking several seconds per spend is a known artifact of the current software behavior: the code that traverses unspent coins has factorial-ish complexity. It could be improved (there are patches available, and simply disabling spending of unconfirmed outputs avoids it), but since the overall network capacity is not very great, I've mostly considered this bug helpful at discouraging inept denial of service attacks, so I haven't personally considered it a priority. (And most of the people who've noticed it and mentioned it to me appear to have just been conducting tests or attempting denial of service attacks…)
I'm not sure, but I believe the inputs were all confirmed. I started out with 5 confirmed BTC and sent 0.0001 to a random address in the wallet on each iteration. Code is below.

It seems to me all or most of the delay was not in code but rather with disk operations, and more specifically flushing wallet.dat (which is now 85MB). In any case I don't consider this to be a major issue.

This is the code I used in my test:

Populate wallet with 50K accounts and test duration of moving funds internally between accounts:
Code:
import subprocess
import time
import datetime
import os
import shutil
import timeit
from jsonrpc import ServiceProxy
from random import randrange


#config

    #location of an "empty" wallet file (only the default ('') account exists)
blankWalletFile = r'C:\Users\User\AppData\Roaming\Bitcoin\testnet3\wallet.empty.dat'

    #location of the live wallet file; this file will be backed up and then restored after the test is complete
liveWalletFile = r'C:\Users\User\AppData\Roaming\Bitcoin\testnet3\wallet.dat'

    #the number of accounts to be created
account_count = 50000

    #the number of random transfers to perform between accounts
transfer_count = 10000


def resetWallet(account_count = 1):
    #back up live wallet
    shutil.copy(liveWalletFile, liveWalletFile + '.bak')

    #default to the blank wallet; prefer a saved wallet that already has this number of accounts
    source_wallet_file = blankWalletFile
    saved_wallet_file = liveWalletFile + '.' + str(account_count)

    if account_count > 1 and os.path.isfile(saved_wallet_file):
        print('wallet file exists; re-using existing wallet file')
        source_wallet_file = saved_wallet_file
    else:
        print('wallet file does not exist; starting with a blank file')

    #overwrite live wallet with the selected source wallet
    shutil.copy(source_wallet_file, liveWalletFile)

def restoreLiveWallet():
    #make a copy of this test wallet file for future use
    shutil.copy(liveWalletFile, liveWalletFile + '.' + str(account_count))
    #overwrite test wallet with live wallet backup
    shutil.move(liveWalletFile + '.bak', liveWalletFile)

def createAccounts(client, count):
    for i in range(0, count):
        client.move('', 'account%d' % i, 1e-8)
        #print(client.getbalance('account%d' % i))

def performRandomTransfers(client, count):
    account1 = ''
    account2 = getRandomAccount(account_count)

    for i in range(0, count):
        client.move(account1, account2, 1e-8)
        account1 = account2
        account2 = getRandomAccount(account_count, account1)

def getRandomAccount(account_count, except_account = ''):
    account = except_account
   
    while account == except_account:
        account = 'account%d' % randrange(0, account_count)

    return account

def main():
    print(datetime.datetime.now().time())

    resetWallet(account_count)

    print('starting bitcoind...')
    process = subprocess.Popen([r'C:\Program Files (x86)\Bitcoin\daemon\bitcoind.exe', '-testnet', '-rpcuser=test', '-rpcpassword=test1'])
    time.sleep(10) #give bitcoind time to start up; a smarter way would be to check the network connection in a loop, but for our purposes this is fine
    print('bitcoind started')
   
    try:   
        client = ServiceProxy("http://test:test1@localhost:18332")

        actual_account_count = len(client.listaccounts())
        print ('there are currently {} accounts'.format(actual_account_count))

        to_create_count =  account_count - actual_account_count + 1

        if actual_account_count < account_count:
            print('creating %d accounts' % to_create_count)
            total_time = timeit.timeit(lambda: createAccounts(client, to_create_count), number=1)
            print('time elapsed creating {0:d} accounts: {1:.2f} seconds'.format(to_create_count, total_time))

        print(datetime.datetime.now().time())

        print('performing %d random transfers' % transfer_count)
        total_time = timeit.timeit(lambda: performRandomTransfers(client, transfer_count), number=1)
        print('time elapsed performing {0:d} transfers: {1:.2f} seconds'.format(transfer_count, total_time))

        print('there are currently {} accounts'.format(len(client.listaccounts())))
        #print(client.listaccounts())

        print(datetime.datetime.now().time())

        print('shutting down bitcoind...')
        client.stop()
        time.sleep(5)       
           
    finally:
        process.terminate()
        restoreLiveWallet()


if __name__ == "__main__":
    main()


Perform external transfers between accounts:
Code:
import subprocess
import time
import datetime
import os
import shutil
import timeit
from jsonrpc import ServiceProxy
from random import randrange


#config
transfer_count = 50  #the number of random transfers to perform between accounts
account_count = 50000 #number of existing accounts in the wallet


def performRandomTransfer(client, actual_count):
    account1 = getRandomAccount()
    account2 = getRandomAccount(account1)
       
    address = client.getaccountaddress(account2)

    if (client.getbalance(account1) > 1e-4):
        tx = client.sendfrom(account1, address, 1e-4)             
    elif client.getbalance() > 1e-4:
        tx = client.sendtoaddress(address, 1e-4)
    else:
        raise Exception('No balance available in any account') 
   
    actual_count[0] += 1

    print(tx)

def getRandomAccount(except_account = ''):
    account = except_account
   
    while account == except_account:
        account = 'account%d' % randrange(0, account_count)

    return account

def main():
    print(datetime.datetime.now().time())

    print('starting bitcoind...')
    process = subprocess.Popen([r'C:\Program Files (x86)\Bitcoin\daemon\bitcoind.exe', '-testnet', '-rpcuser=test', '-rpcpassword=test1'])
    time.sleep(20) #give bitcoind time to start up; a smarter way would be to check the network connection in a loop, but for our purposes this is fine
    print('bitcoind started')
   
    client = None  #so the finally block can tell whether the RPC proxy was created
    try:
        client = ServiceProxy("http://test:test1@localhost:18332")

        print(datetime.datetime.now().time())

        print('performing %d random transfers' % transfer_count)

        actual_count = [0]
        total_time = timeit.timeit(lambda: performRandomTransfer(client, actual_count), number=transfer_count)

        print('time elapsed performing {0:d} transfers: {1:.2f} seconds'.format(actual_count[0], total_time))
        print(datetime.datetime.now().time())
                 
    finally:
        print('shutting down bitcoind...')
        if client is not None:
            client.stop()
            time.sleep(5) 

        process.terminate()


if __name__ == "__main__":
    main()
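
The fixed time.sleep() calls and the unconditional process.terminate() in the scripts above could be replaced by polling. The following is a minimal sketch under stated assumptions, not a tested fix for the corruption: wait_until and shutdown are hypothetical helper names, `process` is assumed to be the subprocess.Popen object, and `stop_rpc` is assumed to be a zero-argument callable such as client.stop.

```python
import time


def wait_until(predicate, timeout=60.0, interval=0.5):
    """Poll predicate() until it returns a true value or `timeout` seconds pass.

    Exceptions raised by predicate() are treated as "not ready yet".
    Returns True on success, False on timeout.
    """
    deadline = time.time() + timeout
    while True:
        try:
            if predicate():
                return True
        except Exception:
            pass  # e.g. connection refused while the daemon is still starting
        if time.time() >= deadline:
            return False
        time.sleep(interval)


def shutdown(process, stop_rpc, timeout=60.0):
    """Ask the daemon to stop, then wait for the process to exit on its own.

    terminate() is used only as a last resort, after the timeout expires.
    Returns True if the process exited without being terminated.
    """
    try:
        stop_rpc()
    except Exception:
        pass  # the RPC server may already be down
    exited = wait_until(lambda: process.poll() is not None, timeout)
    if not exited:
        process.terminate()
    return exited
```

Startup could then be waited on with something like `wait_until(lambda: client.getblockcount() >= 0)` instead of sleeping for 20 seconds, and the finally block would call `shutdown(process, client.stop)` so the wallet has time to flush before the process goes away.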
DocJeff
April 26, 2014, 03:54:40 PM
 #4



https://en.bitcoin.it/wiki/Accounts_explained
From the wiki:
Code:
Account Weaknesses
Since the accounts feature was introduced, several services have used it to keep track of customer's bitcoin balances and have had the following problems:

  • Wallet backups are an issue; if you rely on a good backup of wallet.dat then a backup must be done every time an address is associated with an account and every time the 'move' command is used.
  • The accounts code does not scale up to thousands of accounts with tens of thousands of transactions, because by-account (and by-account-by-time) indices are not implemented. So many operations (like computing an account balance) require accessing every wallet transaction.
  • Most applications already have a customer database, implemented with MySQL or some other relational database technology. It is awkward at best to keep the bitcoin-maintained Berkely DB wallet database and the application database backed up and synchronized at all times.
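
To make the scaling point in the wiki excerpt concrete, here is a toy comparison of the two approaches. It is purely illustrative and does not reflect how bitcoind actually stores wallet data: ScanLedger and IndexedLedger are made-up classes, and amounts are assumed to be integers (e.g. satoshis).

```python
from collections import defaultdict


class ScanLedger(object):
    """Stores entries in one flat list; a balance query scans every entry."""

    def __init__(self):
        self.entries = []  # (account, amount) pairs

    def add(self, account, amount):
        self.entries.append((account, amount))

    def balance(self, account):
        # Cost grows with the total number of entries, not with this account's.
        return sum(amt for acct, amt in self.entries if acct == account)


class IndexedLedger(object):
    """Keeps a running per-account total, so a balance query is one dict lookup."""

    def __init__(self):
        self.totals = defaultdict(int)

    def add(self, account, amount):
        self.totals[account] += amount

    def balance(self, account):
        return self.totals.get(account, 0)
```

Both classes return the same balances; the difference is that the scanning version touches every stored entry on each query, which is the behavior the wiki attributes to the missing by-account index.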
arosca (OP)
April 26, 2014, 08:06:10 PM
 #5


Yep, no doubt, but I wanted to quantify this and see what the limits of bitcoind are in terms of managing accounts. How many users can I realistically handle before I run into trouble?
2112
April 26, 2014, 08:38:11 PM
 #6

Quote
How many users can I realistically handle before I run into trouble?
One, before running into trouble. With two users the trouble starts: the bitcoind "accounts" are unlike any other "accounts" anywhere in the known universe. Any accountant will object to using them because it violates the principles of accounting.

As with many things in Bitcoin, there is however an unexpected benefit: the enterprises interested in using the built-in accounts have a history of losing customers' bitcoins due to fraud or gross negligence. The two best-known cases are Instawallet and BitFloor.

Again, as with many things Bitcoin, it is hard to come by definite proof of a cause-effect relationship in enterprises that are by design made un-auditable and un-accountable. But it seems to be a useful quick litmus test.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
arosca (OP)
April 26, 2014, 09:12:17 PM
 #7

After everything that I've read and experienced, I completely agree. I think the fundamental principles of how accounts are implemented are OK (you can get a list of transactions that "explain" the balance in each account), but the technical implementation is not great. In my opinion this is in part due to the BerkeleyDB implementation. A more robust database solution is needed to handle accounts, including an option for offsite transactional replication. I don't know enough about BerkeleyDB, but it does appear to support replication. This feature is not used by bitcoind, however.
wumpus
April 27, 2014, 06:35:04 AM
 #8

Quote
Others have said, both here in this forum and elsewhere: don't use Accounts in a server environment.
We already know that. You could just have asked :) It is one of the worst parts of the bitcoind code.

Everyone wants something else from the account system, but the conclusion is that it belongs at a higher level (with the database) not with the wallet. Maintaining third-party balances is not part of the responsibility of Bitcoin Core.

There are plans to completely remove the account system in a future revision of JSON RPC API (see https://github.com/bitcoin/bitcoin/issues/3816 ). Labelling of addresses will be kept, but not accounts-with-balances.

Quote
I am contemplating starting an open source alternative to the built-in bitcoind account management infrastructure. It would still use bitcoind for interfacing with the network, but would use a more robust database setup to store and handle account data. More about this in a separate post.
Great idea. That's what it should be, a solution on top.
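
As a rough illustration of what such a layer on top could look like, here is a minimal sketch that keeps account balances in a separate SQLite database and moves funds inside a transaction. Everything in it is an assumption made for the example: the table layout, the function names (open_ledger, credit, move, balance), and the choice of SQLite with integer satoshi amounts. It does not talk to bitcoind at all.

```python
import sqlite3


def open_ledger(path=':memory:'):
    """Open (or create) a ledger database with one balance row per account."""
    db = sqlite3.connect(path)
    db.execute('CREATE TABLE IF NOT EXISTS accounts ('
               'name TEXT PRIMARY KEY, '
               'balance INTEGER NOT NULL CHECK (balance >= 0))')
    db.commit()
    return db


def credit(db, account, satoshis):
    """Add funds to an account, creating it if needed."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute('INSERT OR IGNORE INTO accounts (name, balance) VALUES (?, 0)',
                   (account,))
        db.execute('UPDATE accounts SET balance = balance + ? WHERE name = ?',
                   (satoshis, account))


def move(db, src, dst, satoshis):
    """Move funds between two accounts atomically; raises if src is short."""
    if satoshis <= 0:
        raise ValueError('amount must be positive')
    with db:
        db.execute('INSERT OR IGNORE INTO accounts (name, balance) VALUES (?, 0)',
                   (dst,))
        cur = db.execute('UPDATE accounts SET balance = balance - ? '
                         'WHERE name = ? AND balance >= ?',
                         (satoshis, src, satoshis))
        if cur.rowcount != 1:
            raise ValueError('insufficient funds or unknown account: %r' % (src,))
        db.execute('UPDATE accounts SET balance = balance + ? WHERE name = ?',
                   (satoshis, dst))


def balance(db, account):
    """Return the account's balance in satoshis (0 if the account is unknown)."""
    row = db.execute('SELECT balance FROM accounts WHERE name = ?',
                     (account,)).fetchone()
    return row[0] if row else 0
```

Because both updates in move happen inside one transaction, a failure partway through leaves neither account changed, and the database file can be backed up or replicated with ordinary database tooling, independently of wallet.dat.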

Bitcoin Core developer [PGP] Warning: For most, coin loss is a larger risk than coin theft. A disk can die any time. Regularly back up your wallet through File → Backup Wallet to an external storage or the (encrypted!) cloud. Use a separate offline wallet for storing larger amounts.
arosca (OP)
April 27, 2014, 01:14:19 PM
 #9

Quote
Great idea. That's what it should be, a solution on top.

I posted some ideas here:
https://bitcointalk.org/index.php?topic=586013.0

I appreciate any feedback!