Topic: Starting preliminary 0.94 testing - "Headless fullnode"
goatpig (Armory Developer)
June 09, 2015, 09:40:07 PM  #81

This isn't the thread adjustment; it's either failing to grab the block data, or the data expired before it could be processed. That's consistent with the symptoms of the issue I'm going after at the moment, so I'll have you wait for the fix before trying this again. Right now I don't think there is anything else you can do about it.

If you want to manually set your thread counts regardless, go inside BlockWriteBatcher.cpp and do the following:

1) comment out line 2004 to prevent thread adjustment
2) hardcode values in the BlockWriteBatcher ctor (line 112 for fullnode, 99 for supernode)

Roy Badami
June 09, 2015, 10:46:05 PM  #82

Thanks. I'm sure you're right, but I might try nailing down the thread count anyway. (Probably not tonight, though, given how long it takes to rebuild on this slow machine. Is there no way of doing an incremental build on a Mac?)

roy
Roy Badami
June 10, 2015, 12:15:09 AM (last edit: 12:25:37 AM)  #83

Quote from: goatpig
If you want to manually set your thread counts regardless, go inside BlockWriteBatcher.cpp and do the following:

1) comment out line 2004 to prevent thread adjustment
2) hardcode values in the BlockWriteBatcher ctor (line 112 for fullnode, 99 for supernode)

Could well be just chance, since the behaviour before was clearly somewhat nondeterministic. But I did the above and it synced my wallets perfectly first time (and I hadn't managed to get a successful sync at all up till now). If I get the chance tomorrow I'll nuke the databases and try again to confirm whether this is repeatable.

Just to confirm, the only change I made was as you suggested:

Code:
diff --git a/cppForSwig/BlockWriteBatcher.cpp b/cppForSwig/BlockWriteBatcher.cpp
index 153558b..c11348f 100644
--- a/cppForSwig/BlockWriteBatcher.cpp
+++ b/cppForSwig/BlockWriteBatcher.cpp
@@ -109,6 +109,8 @@ BlockWriteBatcher::BlockWriteBatcher(
       if (readers < 1)
          readers = 1;
 
+      workers = readers = 1;
+
       setThreadCounts(readers, workers, 1);
    }
 }
@@ -2001,7 +2003,7 @@ void BlockBatchProcessor::writeThread()
 
       commitedId_.fetch_add(1, memory_order_release);
       commitObject->writeTime_ = clock() - commitObject->writeTime_;
-      commitObject->processor_->adjustThreadCount(commitObject);
+//      commitObject->processor_->adjustThreadCount(commitObject);
 
       thread cleanup(cleanUpThread, commitObject);
       if (cleanup.joinable())

doug_armory (Senior Developer - Armory)
June 10, 2015, 12:17:42 AM  #84

Quote from: Roy Badami
I should also point out that this is my first time building in a very long time (I think the last time was before you offered prebuilt Mac binaries), so in particular I have absolutely no evidence that there isn't something subtly borked in my build environment.

The OS X build process is mostly self-contained. Knock on wood and all, but I've found that even the slightest hiccup causes Armory to not build at all. In other words, since yours built, your environment's probably fine. Smiley

goatpig (Armory Developer)
June 10, 2015, 02:42:03 AM  #85

Quote from: Roy Badami
Could well be just chance, since the behaviour before was clearly somewhat nondeterministic. But I did the above and it synced my wallets perfectly first time (and I hadn't managed to get a successful sync at all up till now). If I get the chance tomorrow I'll nuke the databases and try again to confirm whether this is repeatable.

Well, yes and no. I expect the issue is object lifespan: one group of threads creates some data that gets cleaned up before the next group of threads is done with it. If you reduce the number of threads per group to the bare minimum, you also reduce the "surface area" for the bug and thus the chance it will occur.
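To make that lifespan race concrete, here is a minimal C++ sketch (hypothetical names, not Armory code): a producer hands data to a consumer thread and frees it on its own schedule, whereas shared ownership ties the data's lifetime to its last user.

Code:
#include <cstdint>
#include <memory>
#include <thread>
#include <vector>

struct BlockData { std::vector<uint8_t> raw; };

void process(const BlockData&) { /* reader thread consumes the data */ }

// Buggy pattern (sketch): the producer deletes `data` while the
// consumer may still be reading it -- a use-after-free race.
//
//    BlockData* data = new BlockData();
//    std::thread consumer([data] { process(*data); });
//    delete data;      // may run before process() finishes
//    consumer.join();

// Safer pattern: shared ownership keeps the object alive until the
// last thread holding a reference lets go of it.
void producerSafe()
{
   auto data = std::make_shared<BlockData>();
   std::thread consumer([data] { process(*data); }); // lambda copies the shared_ptr
   consumer.join();
} // data is freed here, once both owners are gone

int main() { producerSafe(); }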

Quote from: Roy Badami
Just to confirm, the only change I made was as you suggested:

That's the proper change to force thread count for all groups down to 1.

jl2012
June 10, 2015, 09:19:54 AM  #86

Quote from: goatpig
New version out in ffreeze, please test away. There's a wealth of robustness changes and it should also be a bit faster. However, no one at Armory has experienced the issues Carlton Banks and btcchris ran into, so most of these changes were speculative. It's possible these bugs will still be there, partially or in full.

As usual, looking forward to your bug reports and logs, and again thank you for the time and effort you put in helping us to test things out =)

Quote from: jl2012
Have you made any big change re supernode? In the previous ffreeze it did not complete even after a week. In this version it took only a few hours to complete.

Quote from: goatpig
I made changes to some stuff that makes supernode faster (some weird opts I should have rolled out earlier, but figured they could wait for fear they would be unstable). The other issue, with pulling block data and the object lifespan, remains. It's a rare occurrence, but common enough to trip certain setups quite often (2 out of my 3 testers T_T). I ended up rolling the optimizations in with the stability changes (it didn't cost that much more to add the opts at that stage).

Now I'm also convinced that the code speeding up supernode is not responsible for the instability. That gives me room to add some more optimizations (again supernode only) without fear of making bugs more convoluted (and thus harder to isolate).

Not having a clue what the issue was, I ended up fixing what you were suffering from (which was a lot more obvious than the remaining issue) in an attempt to fix the bug they are experiencing. In the grand scheme of things it doesn't matter what order I fix these in, since I have to go after both anyway. In the current scope it doesn't speak so well of me, since I thought I was fixing one thing and got the other instead =P

Quote from: jl2012
I can now get the details of a random tx with "armoryd.py gettransaction txid". I guess this means the supernode is running properly?

Quote from: goatpig
Not really. That just means you are indeed using supernode (which is now the only DB format that supports random txhash resolution), but it doesn't mean supernode is working per se. What you need to do to test it is add a random bitcoin address to a wallet and getledger on the wallet, which will display that random address's history along with the rest.

You can also verify this in the UI by importing some publicly known private keys into a wallet (kind of an oxymoron, I know) and verifying they get imported quasi-instantly and that their history shows up in the main ledger.

It fails again. This is what I did and saw:

1. I compiled the latest ffreeze and removed the previous databases folder.
2. I started armoryd in supernode mode. It finished "Reading raw blocks finished at file xxx offset xxxxx" in a few hours.
3. At this point I could get the details of a random tx with "armoryd.py gettransaction txid". However, I didn't test by importing a private key.
4. At block 360206, armoryd said "BDM thread failed: The scanning process interrupted unexpectedly" and asked me to rebuild and rescan. I killed armoryd with Control-C.
5. I started armory-qt. I didn't request a rescan but it seemed to do one automatically.
6. The console started to show "Finished applying blocks up to xxxxxx". This message did not show up before the crash. The progress became slower and slower. I knew it would never complete and closed armory-qt.
7. I restored the previous databases folder from the last ffreeze. It was incomplete; I had stopped it at about block 300,000 after more than a week of scanning.
8. I started armory-qt. No rescan.
9. I imported this private key: 5JdeC9P7Pbd1uGdFVEsJ41EkEnADbbHGq6p1BwFxm6txNBsQnsw (SHA256 of "1") for the address 12AKRNHpFhDSBDD9rSn74VAzZSL3774PxQ. It only shows the 2 txs from 2014 but not the latest ones from 2015.

So I guess both databases are somehow corrupted?

goatpig (Armory Developer)
June 10, 2015, 03:48:37 PM  #87

Quote from: jl2012
4. At block 360206, armoryd said "BDM thread failed: The scanning process interrupted unexpectedly" and asked me to rebuild and rescan. I killed armoryd with Control-C.

That's consistent with the bug I'm going after currently. That error is too aggressive; half the time you can ignore it and restart without rescanning.

Quote from: jl2012
5. I started armory-qt. I didn't request a rescan but it seemed to do one automatically.

There will be an empty file named either rescan.flag or rebuild.flag in your datadir when the previous instance of Armory wants to signal the next one to rebuild/rescan. Delete it and it won't.
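As a rough sketch of that flag mechanism (generic C++ with a hypothetical datadir path and helper name; not Armory's actual startup code), the next instance only has to test for the file and clear it:

Code:
#include <cstdio>
#include <string>

// Returns true if the flag file exists, deleting it so it only
// triggers once (hypothetical helper, not Armory code).
bool popFlag(const std::string& datadir, const std::string& flagName)
{
   std::string path = datadir + "/" + flagName;
   if (std::FILE* f = std::fopen(path.c_str(), "r"))
   {
      std::fclose(f);
      std::remove(path.c_str());
      return true;
   }
   return false;
}

int main()
{
   // assumed datadir; Armory's default differs per platform
   if (popFlag("/home/user/.armory", "rescan.flag"))
      std::printf("previous instance requested a rescan\n");
   if (popFlag("/home/user/.armory", "rebuild.flag"))
      std::printf("previous instance requested a rebuild\n");
}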

Quote from: jl2012
7. I restored the previous databases folder from the last ffreeze. It was incomplete; I had stopped it at about block 300,000 after more than a week of scanning.
8. I started armory-qt. No rescan.

That's odd; it should try to catch up from 300,000 onwards.

Quote from: jl2012
9. I imported this private key: 5JdeC9P7Pbd1uGdFVEsJ41EkEnADbbHGq6p1BwFxm6txNBsQnsw (SHA256 of "1") for the address 12AKRNHpFhDSBDD9rSn74VAzZSL3774PxQ. It only shows the 2 txs from 2014 but not the latest ones from 2015.

That's to be expected if the scan ended around block 300,000.

Quote from: jl2012
So I guess both databases are somehow corrupted?

They shouldn't be, but I guess they are. Actually, what's off is that they don't pick up where they left off. I'd say hang on to your 300k DB; I intend to add some more checks to allow the DB to fix itself (instead of rebuilding/rescanning).

goatpig (Armory Developer)
June 19, 2015, 02:26:08 AM  #88

New version. This one should fix the segfaults mid-scan. Still working on the other ones. Pull away =)

Carlton Banks
June 19, 2015, 09:50:25 AM (last edit: 10:00:30 AM)  #89

Same problems (not a segfault).

The DB build gets this far:

Code:
[Thread 0x7fffca34b700 (LWP 3812) exited]
-DEBUG - 1434706596: (Blockchain.cpp:214) Organizing chain w/ rebuild
-INFO  - 1434706599: (BlockUtils.cpp:1385) Looking for first unrecognized block
-INFO  - 1434706599: (BlockUtils.cpp:1389) Updating Headers in DB

...and then it gets stuck; the python process is more or less idling at ~1% CPU. No errors in the logs. Quitting Armory doesn't work in this state, though it does respond to killing the process.
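One generic way to localize a hang like this (standard gdb practice, not a step anyone prescribed in this thread) is to attach to the stuck process and dump every thread's backtrace; a deadlock shows up as threads parked in lock-wait frames:

Code:
$ gdb -p <pid of the python process>
(gdb) thread apply all bt
(gdb) detach
(gdb) quit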

goatpig (Armory Developer)
June 19, 2015, 10:15:50 AM  #90

That's quite interesting, considering I added verbose logging at this stage for this specific reason (I was curious how deep it would get before failing). I guess it doesn't even get that far.

Carlton Banks
June 19, 2015, 10:26:06 AM  #91

Double-checked the BTCARMORY_BUILD string, definitely compiling from the latest commit.

goatpig (Armory Developer)
June 19, 2015, 12:16:06 PM  #92

How long do you let this part run before concluding it is hanging?

Carlton Banks
June 19, 2015, 12:36:51 PM  #93

Quote from: goatpig
How long do you let this part run before concluding it is hanging?

5-10 mins. Could something intended be happening for longer than that, consuming ~1% CPU on all cores? It's not consistent with the behaviour of your first couple of 0.94 commits; they took seconds to handle that stage.

goatpig (Armory Developer)
June 19, 2015, 12:49:52 PM  #94

Quote from: goatpig
How long do you let this part run before concluding it is hanging?

Quote from: Carlton Banks
5-10 mins. Could something intended be happening for longer than that, consuming ~1% CPU on all cores? It's not consistent with the behaviour of your first couple of 0.94 commits; they took seconds to handle that stage.

Glancing at the code, it's pretty old and inefficient, but that wouldn't warrant that long a wait. This is probably some sort of deadlock. Unfortunately I haven't managed to reproduce it in this version (which is why I thought it was fixed), so it's gonna take me some time to figure it out. I expect this is the last hurdle left for this code to be stable and functional.
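For readers following along, the classic shape of such a deadlock is two threads taking the same pair of locks in opposite orders. A minimal C++ sketch (illustrative names, not Armory code), including the std::lock fix:

Code:
#include <mutex>
#include <thread>

std::mutex dbLock, scanLock;

// Buggy shape (sketch): thread A holds dbLock and waits on scanLock,
// thread B holds scanLock and waits on dbLock -- both park forever
// near 0% CPU, which looks exactly like a silent hang.
//
//    A: lock(dbLock);   lock(scanLock);
//    B: lock(scanLock); lock(dbLock);

// Fix: acquire both locks atomically, so neither thread can hold
// one lock while blocking on the other.
void touchBoth()
{
   std::unique_lock<std::mutex> a(dbLock, std::defer_lock);
   std::unique_lock<std::mutex> b(scanLock, std::defer_lock);
   std::lock(a, b); // deadlock-free acquisition of both
}

int main()
{
   std::thread t1(touchBoth), t2(touchBoth);
   t1.join();
   t2.join();
}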

Carlton Banks
June 19, 2015, 01:06:55 PM  #95

Quote from: goatpig
How long do you let this part run before concluding it is hanging?

Quote from: Carlton Banks
5-10 mins. Could something intended be happening for longer than that, consuming ~1% CPU on all cores? It's not consistent with the behaviour of your first couple of 0.94 commits; they took seconds to handle that stage.

Quote from: goatpig
Glancing at the code, it's pretty old and inefficient, but that wouldn't warrant that long a wait. This is probably some sort of deadlock. Unfortunately I haven't managed to reproduce it in this version (which is why I thought it was fixed), so it's gonna take me some time to figure it out. I expect this is the last hurdle left for this code to be stable and functional.

Just to totally clarify: I cannot get this build, or the build from your last commit, to behave in any way except for this bug. It does this every time I load it up, and I'm carefully deleting the old DB and logs every run. I tested that Armory 0.93.1 & 0.93.2 work normally on the same VM (I cloned it from a working bitcoin-configured VM to begin with, but you never know).

I'm pretty sure what you're saying about deadlocks sounds about right, from what I recall about how threading works.

doug_armory (Senior Developer - Armory)
June 19, 2015, 02:50:49 PM  #96

Quote from: Carlton Banks
I'm pretty sure what you're saying about deadlocks sounds about right, from what I recall about how threading works.

Deadlocks can be a nightmare to debug. Threads are great in many ways. Debugging is not one of them. Sad

Carlton Banks
June 19, 2015, 02:58:29 PM  #97

Quote from: Carlton Banks
I'm pretty sure what you're saying about deadlocks sounds about right, from what I recall about how threading works.

Quote from: doug_armory
Deadlocks can be a nightmare to debug. Threads are great in many ways. Debugging is not one of them. Sad

I heard there are new processor instructions to abstract a bit of the complexity away from the programming logic, but Intel screwed up the implementation somehow and it's not quite "on the streets" yet. All the specs are finalised as far as I remember; I believe they're calling it "lock elision" in C++, although I may be remembering wrong. So there's some more C++ to render the old standard obsolete for us all to look forward to Cheesy.

goatpig (Armory Developer)
June 19, 2015, 07:14:53 PM  #98

Quote from: Carlton Banks
Just to totally clarify: I cannot get this build, or the build from your last commit, to behave in any way except for this bug. It does this every time I load it up, and I'm carefully deleting the old DB and logs every run. I tested that Armory 0.93.1 & 0.93.2 work normally on the same VM (I cloned it from a working bitcoin-configured VM to begin with, but you never know).

And that bit is what's disturbing me, considering this part uses the same code across versions. Again, this is pretty old code, so I'm puzzled as to why it starts failing now. Anyway, I think I've got a clue as to what's going on.

Quote from: Carlton Banks
I heard there are new processor instructions to abstract a bit of the complexity away from the programming logic, but Intel screwed up the implementation somehow and it's not quite "on the streets" yet. All the specs are finalised as far as I remember; I believe they're calling it "lock elision" in C++, although I may be remembering wrong. So there's some more C++ to render the old standard obsolete for us all to look forward to Cheesy.

That's pretty fancy, but it doesn't really simplify much. It looks like it's meant to speed up interlocked operations that don't overlap in memory, and it's only in some Haswell CPUs so far. I suspect it's the kind of extra assembly optimization that I will personally never have to use, as either compilers or OSes will upgrade their locks with HLE. The way it looks, there is little reason not to standardize it for every lock operation.
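For the curious, this is roughly what HLE-annotated locking looks like with GCC's atomic builtins (a toy spinlock, compiled with -mhle on a TSX-capable CPU; a sketch under those assumptions, nothing from the Armory codebase):

Code:
#include <immintrin.h>

static int lockvar = 0;

void hleLock()
{
   // The HLE hint lets the CPU elide the lock write and run the
   // critical section as a hardware transaction; non-conflicting
   // sections then proceed in parallel.
   while (__atomic_exchange_n(&lockvar, 1,
                              __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
      _mm_pause(); // spin politely until the lock is free

}

void hleUnlock()
{
   __atomic_store_n(&lockvar, 0,
                    __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}

int main()
{
   hleLock();
   // ... critical section ...
   hleUnlock();
}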

Roy Badami
June 19, 2015, 07:27:40 PM  #99

Quote from: Roy Badami
Could well be just chance, since the behaviour before was clearly somewhat nondeterministic. But I did the above and it synced my wallets perfectly first time (and I hadn't managed to get a successful sync at all up till now). If I get the chance tomorrow I'll nuke the databases and try again to confirm whether this is repeatable.

Quote from: goatpig
Well, yes and no. I expect the issue is object lifespan: one group of threads creates some data that gets cleaned up before the next group of threads is done with it. If you reduce the number of threads per group to the bare minimum, you also reduce the "surface area" for the bug and thus the chance it will occur.

Quote from: Roy Badami
Just to confirm, the only change I made was as you suggested:

Quote from: goatpig
That's the proper change to force thread count for all groups down to 1.

Is there any point in my trying the new version or will this version just behave the same for me?
goatpig (Armory Developer)
June 19, 2015, 07:31:58 PM  #100

Quote from: Roy Badami
Is there any point in my trying the new version, or will this version just behave the same for me?

I'd like you to run it; the main fix in this version is supposed to cover your issue.
