zielar
May 25, 2020, 05:40:05 PM
10 minutes on the fresh debug version without any restart.
If you want - you can send me a donation to my BTC wallet address 31hgbukdkehcuxcedchkdbsrygegyefbvd
dextronomous
May 25, 2020, 05:44:46 PM
@jeanLuc maybe I am wrong, but I can't see in the code any limit on the number of DPs transferred at one time. A packet in a TCP connection can send only 65536 bytes max. I don't know what size one DP has. Maybe a huge rig can produce a lot of DPs, and sending that amount to the server overflows the packet and causes an unexpected error? When I send a BIG file to the server in my app, I divide the file into parts of not more than 65536 bytes each.
Hi there, just reading along: someone found the problem after looking at dmesg output ("nf_conntrack: table full, dropping packet"). So the problem was in the TCP redirection 80 => 8000 using iptables. He removed that iptables rule and made Erlang listen on both ports 80 and 8000 directly, and the 65k limit was removed.
Jean_Luc (OP)
May 25, 2020, 05:49:05 PM
@jeanLuc maybe I am wrong, but I can't see in the code any limit on the number of DPs transferred at one time. A packet in a TCP connection can send only 65536 bytes max. I don't know what size one DP has. Maybe a huge rig can produce a lot of DPs, and sending that amount to the server overflows the packet and causes an unexpected error? When I send a BIG file to the server in my app, I divide the file into parts of not more than 65536 bytes each.
Yes, this is why the Read function has a loop until it reaches the end of the transmission. I prefer to do it like this rather than extending the packet length. The new server is still running: 2h07, 2^22.28 DP. I cross my fingers.
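The read-until-done loop Jean_Luc describes is the standard way to handle the fact that a stream socket may deliver a payload in arbitrary chunks. A minimal sketch in Python (the real code is C++; `read_exact` is an illustrative name, not the actual Kangaroo function):

```python
import socket

def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket, looping because
    recv() may return fewer bytes than requested per call."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before full payload arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With this, the server can first read a fixed-size header (e.g. the DP count), then loop until count × record_size bytes have arrived, regardless of how the kernel fragments the transmission.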
zielar
May 25, 2020, 06:03:52 PM
35 minutes without restart (and still running)
Etar
May 25, 2020, 06:07:00 PM
-snip- Yes, this is why the Read function has a loop until it reaches the end of the transmission. I prefer to do it like this rather than extending the packet length. The new server is still running: 2h07, 2^22.28 DP. I cross my fingers.
I tested release 1.5 (not the debug version): connect to the server, send byte=2 to the server (the command to send DPs), then send word=1 (meaning 1 DP will be sent), then send 40 random bytes (like a DP). The server returned status 0 (meaning OK) and crashed! I did this 5 times and got the same result all 5 times. That means that if there is an invalid DP, the server can crash, so I think it needs to check a CRC or something before using this buffer of DPs.
Jean_Luc (OP)
May 25, 2020, 06:17:59 PM
-snip- I tested release 1.5 (not the debug version): connect to the server, send byte=2 to the server (the command to send DPs), then send word=1 (meaning 1 DP will be sent), then send 40 random bytes (like a DP). The server returned status 0 (meaning OK) and crashed!
The number of DPs is on 4 bytes. But if you send a random hash, on the original 1.5 there is no check of the hash validity, so when it is added to the hashtable you get an illegal access. This check is added in the dbg version. However, in a normal situation this should not happen. The server is badly protected against protocol attacks. 2h35 without crash...
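The validity check being discussed can be sketched as follows. A distinguished point is defined by its x coordinate matching the DP mask, so random garbage almost never passes. The DP size and mask value below match the server log quoted later in this thread ("DP size: 18 [0xFFFFC00000000000]"); the function names are illustrative, not the actual Kangaroo source:

```python
# Distinguished-point validation before hashtable insertion (sketch).
DP_SIZE = 18
DP_MASK = ((1 << DP_SIZE) - 1) << (64 - DP_SIZE)  # 0xFFFFC00000000000

def is_valid_dp(x_msw: int) -> bool:
    """The top 64-bit word of the point's x coordinate must have all
    masked bits clear, otherwise it cannot be a distinguished point."""
    return (x_msw & DP_MASK) == 0

def add_dp(table: dict, x_msw: int, payload: bytes) -> bool:
    """Insert a DP only after validation, rejecting forged records
    instead of letting them corrupt the table (the crash Etar hit)."""
    if not is_valid_dp(x_msw):
        return False  # reject instead of crashing
    table.setdefault(x_msw, payload)
    return True
```

This is the cheap first line of defense; a CRC over the whole batch, as Etar suggests, would additionally catch transmission corruption in the non-masked fields.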
Etar
May 25, 2020, 06:23:25 PM
-snip- The number of DPs is on 4 bytes. But if you send a random hash, on the original 1.5 there is no check of the hash validity, so when it is added to the hashtable you get an illegal access. This check is added in the dbg version. However, in a normal situation this should not happen. The server is badly protected against protocol attacks. 2h35 without crash...
That is what we found. You say that "this check is added in the dbg version", so in that case this is what led to errors in the early version. Thanks a lot for the job! Hope the server will work like a charm.
Jean_Luc (OP)
May 25, 2020, 06:52:14 PM
That is what we found. You say that "this check is added in the dbg version", so in that case this is what led to errors in the early version. Thanks a lot for the job! Hope the server will work like a charm.
This check was already there in the first debug release tested by zielar, which crashed. As I said, in normal conditions a corrupted hash should not happen. The thing that seems to solve the problem is this, in thread.cpp:

#ifndef WIN64
  pthread_mutex_init(&ghMutex, NULL); // Why ?
  setvbuf(stdout, NULL, _IONBF, 0);
#else
  ghMutex = CreateMutex(NULL, FALSE, NULL);
#endif

I had already done that on Linux, and added a "why?" in the comment because I didn't understand it. On Linux, without the reinitialization of the mutex in the ProcessThread function of the server, the result is an immediate hang, although there was no lock on this mutex before. On Windows, it seems that the mutex does not work correctly and the local DP cache can be corrupted. The mutexes are initialised in the class constructor, so in the main thread. So I think the issue is due to the ownership of the mutex, but this is not yet fully clear. 3h10 without crash...
zielar
May 25, 2020, 07:21:59 PM
Success! 01:40:00 has passed since the server was started and I don't have any restarts!!!
MrFreeDragon
May 25, 2020, 07:33:06 PM
Success! 01:40:00 has passed since the server was started and i don't have any restart !!!
How many jumps do you have in total at the moment for #110?
dextronomous
May 25, 2020, 07:42:41 PM
could you share your work file?
zielar
May 25, 2020, 07:55:51 PM
How many jump do you have in total at the moment for #110?
This is my -winfo from the fully merged job up to this time:
could you share your work file?
I don't see a problem... as soon as I start working on #115 :-)
Jean_Luc (OP)
May 25, 2020, 08:18:12 PM
I would be very happy if #110 is solved tomorrow
WanderingPhilospher
May 25, 2020, 08:36:17 PM
This is my server/client app. https://drive.google.com/open?id=1pnMcVPEV8b-cJszBiQKcZ6_AHIScgUO8
Both apps work only on Windows x64! The archive contains the compiled files, source code and example .bat files, so you can compile the executable yourself or use the ready-made one. Example of a bat file to start the server:

REM puzzle #110 (109 bit)
SET dpsize=31
SET wi=7200
SET beginrange=2000000000000000000000000000
SET endrange=3fffffffffffffffffffffffffff
SET pub=0309976ba5570966bf889196b7fdf5a0f9a1e9ab340556ec29f8bb60599616167d
SET workfile=savework
serverapp.exe -workfile %workfile% -dp %dpsize% -wi %wi% -beginrange %beginrange% -endrange %endrange% -pub %pub%
pause

-workfile - the filename of your masterfile, where all clients' jobs are merged
-wi - the job saving interval for the client; 7200 means the client will save its job every 2h and send it to the server. Do not set this value too small: the client must have time to send work before a new one appears.
Note! If you use an already existing masterfile, use only a copy of the masterfile and save the original masterfile in a safe place!!!

Example of a bat file to start the client:

clientapp.exe -name rig1 -pool 127.0.0.1:8000 -t 0 -gpu -gpuId 0
pause

-name - the name of your rig, just for stats
-pool - server address:port
-gpuId - set depending on how many GPUs the rig has (-gpuId 0,1,2,3 e.g. for 4 GPUs)
Note! Before using the app, make sure that you have good internet bandwidth, because the client will send BIG files (which also contain kangaroos)!
When the client connects for the first time, it gets the job params from the server (dpsize, wi, beginrange, endrange, pub). You can see the downloaded params in the clientapp console. After the client sends its job to the server, the server merges this job into the masterfile and checks for collisions during the merge. If the server or a client solves the key, the server app will create a log file with a dump of the private key (the same as in the server console). It is possible to get a Telegram notification when the key is solved, but I don't think that is needed. Try the server and client app on a small range first to make sure you're doing everything right.
Was trying to recompile because my Kangaroo.exe is a different name (have a few that I use with diff configs) but I keep getting errors such as:
Line 35: Constant not found: #STANDARD_RIGHTS_REQUIRED
Is it because I am using the free/demo version of PureBasic?
zielar
May 25, 2020, 08:37:08 PM
I would be very happy if #110 is solved tomorrow
Thank you very much for your commitment. Without your immediate action it would not have happened in a month :-) 3 hours have just passed without a restart with 92 machines loaded, so I can confirm that the problem has been solved!
Jean_Luc (OP)
May 26, 2020, 03:36:05 AM
Thank you very much for your commitment. Without your immediate action it would not have happened in a month :-) 3 hours have just passed without restarting with 92 machines loaded, so I can confirm that the problem has been solved!
That's great! On my side the server ended correctly after 5h25 on the 80-bit test and the clients were correctly closed.

Kangaroo v1.5dbg
Start: B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000
Stop : B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF
Keys : 1
Range width: 2^80
Expected operations: 2^41.05
Expected RAM: 344.2MB
DP size: 18 [0xFFFFC00000000000]
Kangaroo server is ready and listening to TCP port 17403 ...
[Client 3][DP Count 2^20.69/2^23.05][Dead 0][46:36][53.5/87.4MB]
New connection from 127.0.0.1:59904
[Client 4][DP Count 2^23.68/2^23.05][Dead 158][05:25:15][412.7/522.3MB]
Key# 0 [1S]Pub: 0x0284A930C243C0C2F67FDE3E0A98CE6DB0DB9AB5570DAD9338CADE6D181A431246
       Priv: 0xB60E83280258A40F9CDF1649744D730D6E939DE92A2BE19B0D19A3D64A1DE032
Closing connection with 172.24.9.18:51060
Closing connection with 172.24.9.18:51058
Closing connection with 127.0.0.1:59813
Closing connection with 127.0.0.1:59904
Etar
May 26, 2020, 04:59:53 AM (last edit: May 26, 2020, 07:22:51 AM by Etar)
-snip- Was trying to recompile because my Kangaroo.exe is a different name (have a few that I use with diff configs) but I keep getting errors such as:
Line 35: Constant not found: #STANDARD_RIGHTS_REQUIRED
Is it because I am using the free/demo version of purebasic?
What Windows version are you using? You can delete this constants line:
#PROCESS_ALL_ACCESS = #STANDARD_RIGHTS_REQUIRED | #SYNCHRONIZE | $FFF
This constant is not used anywhere (it is probably left over from a previous project).
edit: in the worst case, the running time should be 2√N group operations. I think that zielar has already done this, no?
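Etar's "2√N group operations" figure can be checked against the numbers earlier in this thread. The server log for the 80-bit test reported "Expected operations: 2^41.05", i.e. roughly 2·√(2^80) = 2^41 (the extra 0.05 comes from the method's constant being slightly above 2). A quick sketch:

```python
import math

def expected_ops_log2(range_bits: int, constant: float = 2.0) -> float:
    """log2 of the expected group operations for the kangaroo method,
    ~ constant * sqrt(N) over an interval of width N = 2^range_bits."""
    return range_bits / 2 + math.log2(constant)

# 80-bit test range -> ~2^41, consistent with the log's 2^41.05.
# #110's interval (0x2000...0 to 0x3fff...f) is 2^109 wide -> ~2^55.5.
```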
MrFreeDragon
May 26, 2020, 09:15:33 AM
How many jumps do you have in total at the moment for #110?
This is my -winfo from the fully merged job up to this time:
45 million dead kangaroos... Is it due to "bad" random (different machines create the same kangaroos), or are these the same kangaroos killed several times during synchronization with the server? Since the server does not send back a signal to kill the duplicate kangaroos on the different machines, those kangaroos continue their paths, and all their subsequent DPs are also equal across machines. I.e. the server kills the kangaroo, but the "dead" kangaroo continues jumping on the client machine. Is that the case?
Etar
May 26, 2020, 09:21:56 AM
45 million dead kangaroos...
Is it due to "bad" random (different machines create the same kangaroos), or are these the same kangaroos killed several times during synchronization with the server? Since the server does not send back a signal to kill the duplicate kangaroos on the different machines, those kangaroos continue their paths, and all their subsequent DPs are also equal across machines. I.e. the server kills the kangaroo, but the "dead" kangaroo continues jumping on the client machine. Is that the case?
Agree, the random is not so good: 1 rig with 8x2080ti produces ~35000 dead kangaroos every 2h. It does not matter how many times the rig was restarted; every time it gets +30-35k dead kangaroos. I mean when merging files; on the client side there are zero dead kangaroos.
MrFreeDragon
May 26, 2020, 09:50:51 AM
45 million dead kangaroos...
Is it due to "bad" random (different machines create the same kangaroos), or are these the same kangaroos killed several times during synchronization with the server? Since the server does not send back a signal to kill the duplicate kangaroos on the different machines, those kangaroos continue their paths, and all their subsequent DPs are also equal across machines. I.e. the server kills the kangaroo, but the "dead" kangaroo continues jumping on the client machine. Is that the case?
Agree, the random is not so good: 1 rig with 8x2080ti produces ~35000 dead kangaroos every 2h. It does not matter how many times the rig was restarted; every time it gets +30-35k dead kangaroos. I mean when merging files; on the client side there are zero dead kangaroos.
I mean that random could be only one part of these "dead kangaroos". The other thing is that the server kills the kangaroos (while merging files) but does not give feedback to the clients, so the kangaroos continue their paths, and during the next communication with the server they are "killed" again, and so on. Without feedback from the server these "zombie" kangaroos will never be killed and will continue their useless jumping on the client machines. If the majority of a client machine's kangaroos become zombies, they will burn GPU resources and the real collision will not happen, or at least the probability of such a collision becomes much lower.
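The "zombie kangaroo" point follows from the walk being deterministic: the next jump depends only on the current position, so two kangaroos that ever share a state emit identical DP streams forever unless the client is told to re-seed one of them. A toy walk (not the real Kangaroo jump table) illustrates this:

```python
def toy_walk(start: int, steps: int, modulus: int = 2**32) -> list[int]:
    """Deterministic pseudo-random walk: the jump size is chosen by
    the current position, so identical starts give identical
    trajectories (hence identical distinguished points, forever)."""
    jump_table = [1 << i for i in range(16)]  # toy jump sizes
    pos, trail = start % modulus, []
    for _ in range(steps):
        pos = (pos + jump_table[pos % len(jump_table)]) % modulus
        trail.append(pos)
    return trail

# Unless the server tells the client to re-seed a duplicated kangaroo,
# every DP it reports after the first server-side "kill" duplicates
# one already in the masterfile and contributes nothing new.
```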