Latest posts of: bipben

Hi everyone,

After many hours of setup I finally made it. I have a 1 TB generation in progress and 3 x 100 GB already finished.
I would like to test the V2 pool but I don't have any BURST yet. Could someone please send me 1 BURST to test it? Here is my address: BURST-YA29-QCEW-QXC3-BKXDL.

Regarding the plot generation, I found an OpenCL implementation of Shabal (https://github.com/aznboy84/X15GPU/blob/master/kernel/shabal.cl) that could be used to make a GPU version of the generator. I will try to work on it when I have some free time.

Regards

Hi everyone,

As promised, I have been working on a GPU plot generator over the last few days. I wrote a small program built on top of OpenCL, and it seems to work pretty well in CPU mode. Unfortunately, I can't test the GPU mode as it requires a very powerful graphics card (with at least 46kB private memory per compute unit, because the algorithm needs at least 4096*64 static bytes to store an entire plot).

Here is a preview you can test for now :
gpuPlotGenerator-src-1.0.0.7z : https://mega.co.nz/#!bcF2yKKL!3Ud86GaibgvwBehoxkbO4UNdiBgsaixRx7ksHrgNbDI
gpuPlotGenerator-bin-win-x86-1.0.0.7z : https://mega.co.nz/#!HJsziTCK!UmAMoEHQ3z34R4RsXoIkYo9rYd4LnFtO_pw-R4KObJs

I will build another release at the end of the day with some minor improvements (threads per compute unit selection, output of OpenCL error codes, improvement of the Makefile to generate the distribution directly).
I will also try to figure out another way to distribute the work between the GPU threads to reduce the amount of private memory needed by the program.

For Windows users, you can use the binary version directly.
For Linux users, just download the source archive, make sure to modify the OpenCL library and lib path in the Makefile (and maybe the executable name), and build the project with "make". To run the program, you need the "kernel" and the "plots" directories beside the executable.

The executable usage is : ./gpuPlotGenerator <address> <start nonce> <nonces> <stagger size>
The parameters are the same as the original plot generator, without the threads number.
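
To pick a value for <nonces>, a rough calculation like the one below may help. This is only a sketch: it assumes each nonce takes 4096*64 bytes on disk, and rounding the count down to a multiple of the stagger size is an assumption made for the example, not something the plotter is confirmed to require. The function name is hypothetical.

```python
# Sketch: how many nonces fit in a target plot size.
# Assumption: one nonce = 4096 scoops * 64 bytes = 262144 bytes on disk.
NONCE_SIZE = 4096 * 64

GIB = 1024 ** 3


def nonces_for_size(target_bytes, stagger_size):
    """Nonce count for target_bytes, rounded down to a multiple of stagger_size.

    The rounding is an assumption made for this example.
    """
    nonces = target_bytes // NONCE_SIZE
    return (nonces // stagger_size) * stagger_size


# Roughly 200 GiB of plots with a stagger size of 4096.
print(nonces_for_size(200 * GIB, 4096))  # prints 819200
```

The result would go in the <nonces> position of the command line above.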

If you find bugs or if you want some new features, let me know.

If you want to support me, here are my Bitcoin and Burst addresses :
Bitcoin: 138gMBhCrNkbaiTCmUhP9HLU9xwn5QKZgD
Burst: BURST-YA29-QCEW-QXC3-BKXDL

Regards

Quote from: bipben on September 08, 2014, 09:02:10 AM

Quote from: burstcoin on September 08, 2014, 08:43:32 AM

Quote from: bipben on September 08, 2014, 08:20:35 AM

Unfortunately, I can't test the GPU mode as it requires a very powerful graphics card (with at least 46kB private memory per compute unit, because the algorithm needs at least 4096*64 static bytes to store an entire plot).

It's nice to see someone else working on this, since I seem to have failed in it.

Private memory is actually part of global memory on AMD cards, so storing it in private isn't any better than just using global for everything; it's local memory that needs to be aimed for to get the massive speedup. However, no AMD card has more than 64KB of local memory per workgroup, which makes storing it all in local impossible.

I haven't tried your implementation yet, but on my own first attempt I also used global for everything, and the result was faster than the Java plotter but slower than dcct's C plotter. My second attempt used a 32KB local buffer that I rotated through to store the data currently being hashed; however, I couldn't figure out how to copy it to global fast enough, and the local -> global copy killed the performance.

You might be interested in those kernels here: https://bitcointalk.org/index.php?topic=731923.msg8695829#msg8695829

Thanks, I will look at your kernels to see if I can find a better solution.

Here is the new version. I reduced the amount of memory used from 40KB to about 1KB per unit. The only drawback is that it requires twice as much global memory as before. I will look for a way to reduce this overhead later.
In CPU mode (when no graphics card is detected), it all goes pretty well.
The GPU mode is still kind of buggy on my graphics card (an old GeForce 9300M GS), and I don't know the exact reason yet. Sometimes it works, sometimes not. I will try to fix this issue tomorrow.

Here are the files :
gpuPlotGenerator-src-1.1.0.7z : https://mega.co.nz/#!iYFWAL5B!BvtmRQ5qGq4gGwjDglFNtDtNIX4LDaUvATBtClBdTlQ
gpuPlotGenerator-bin-win-x86-1.1.0.7z : https://mega.co.nz/#!aBVGBBQD!tBsRtb8VrHR12_anrFTrl41U0fPQu_OqFnxyi5nCyBY

For Linux users, the Makefile has a new target named "dist" that builds the project and copies all the necessary files to the "bin" directory.

The executable usage is : ./gpuPlotGenerator <path> <address> <start nonce> <nonces> <stagger size> <threads>
<path> : the path to the plots directory
<threads> : number of parallel threads for each work group

Found the cause of the "randomness". NVIDIA caches the kernel after the first build and only rebuilds it from time to time. By cleaning the cache, I can force the kernel to be rebuilt and speed up the debugging process.
I will notify you as soon as the cause of the crash is found and fixed.

Bad news guys. There is no actual "bug" in the implementation. It seems that the graphics card is being stressed too much by the Shabal core, so the driver shuts down the kernel (there is a watchdog timer hard-coded in the display driver for this purpose, to ensure that the display doesn't freeze for too long). I will try to improve the whole algorithm and its memory consumption to fit within what the graphics card can handle.

In the meantime, I found this thread (http://stackoverflow.com/questions/12259044/limitations-of-work-item-load-in-gpu-cuda-opencl) that discusses this particular issue. The available options are :
- If you have more than one graphics card, you can launch the plotter on the one that does not drive the display. There is still no option to select the graphics card in the plotter, but I will code it soon so that you can test it in a multi-GPU environment.
- You can try to turn off the watchdog timer by following the provided link, but be CAREFUL, you may experience terrible display lags, or even full black screens until the plotter process finishes its work.

You don't need to improve it to avoid this issue, just split it. One kernel for first half, one kernel for second half.

The new major update is in progress, thanks to burstcoin's advice. I think that a lot more graphics cards will be compatible with this version (at least I hope so).

Quote from: yellowduck2 on September 09, 2014, 01:42:47 PM

Anyone know who is the creator of gpu plotter ?

Sorry for the confusion between "bipben" and "cryo", but "cryo" was already taken on this forum so I picked a random name...

Quote from: alphateam on September 09, 2014, 01:30:19 PM

After 3h30 I have plotted nearly 1 TB with the GPU plotter.

I will keep you informed whether everything works well once it finishes, bipben (3.5 TB more to fill).

Thanks for the news. Good to see that it works well on your side.

Quote from: unsoindovo on September 09, 2014, 12:51:44 PM

Quote from: unsoindovo on September 09, 2014, 12:46:47 PM

Is there any problem if I mix plots generated with different stagger sizes in the same directory?

Maybe the miner will have some issues?

I can't run gpuplotter with a stagger size > 4000 (see below).
With 8000 I get a bad_alloc error.

To plot with a stagger size of 8000, you will need 2GB of RAM on the CPU side and 4GB of RAM on the GPU side.
Here, the error refers to the CPU side.
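
As a rough guide, the figures above can be estimated with a few lines of Python. This is only a sketch, not the plotter's actual code: it assumes one nonce takes 4096*64 bytes and that the GPU side needs about twice the CPU-side buffer, as mentioned for version 1.1.0. The function names are hypothetical.

```python
# Sketch: estimate the buffers needed for a given stagger size.
# Assumptions: one nonce = 4096 * 64 bytes, GPU buffer = 2 x CPU buffer.
NONCE_SIZE = 4096 * 64  # 262144 bytes


def cpu_buffer_bytes(stagger_size):
    """Bytes needed on the CPU side to hold one stagger group of nonces."""
    return stagger_size * NONCE_SIZE


def gpu_buffer_bytes(stagger_size):
    """Assumed GPU requirement: twice the CPU-side buffer."""
    return 2 * cpu_buffer_bytes(stagger_size)


def to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / (1024 * 1024)


for stagger in (500, 1000, 4096, 8000):
    print(stagger, to_mib(cpu_buffer_bytes(stagger)), to_mib(gpu_buffer_bytes(stagger)))
```

With a stagger size of 8000 this gives about 2000 MiB on the CPU side and about 4000 MiB on the GPU side, which is in line with the 2GB and 4GB mentioned above.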

Quote from: alexzerg11 on September 09, 2014, 12:22:35 PM

I'm using gpuPlotGenerator 1.1.
I get an error on a system with two video cards (HD7850 & R9_270x with driver 14.6b):

Code:

Path: plots
Nonces: 68847637 to 69036053 (46 GB)
Process memory: 1024MB
Threads number: 256
--------------
Retrieving OpenCL platform
Retrieving OpenCL GPU device
Creating OpenCL context
Creating OpenCL command queue
Creating CPU buffer
Creating OpenCL GPU generation buffer
An OpenCL error occured in the generation process, aborting...
>>> [-61] Unable to create the OpenCL GPU generation buffer

and another error on a system with an HD5870 (driver 13.12):

Code:

Path: plots
Nonces: 0 to 10000 (2 GB)
Process memory: 500MB
Threads number: 128
--------------
Retrieving OpenCL platform
Retrieving OpenCL GPU device
Creating OpenCL context
Creating OpenCL command queue
Creating CPU buffer
Creating OpenCL GPU generation buffer
Creating OpenCL GPU scoops buffer
Creating OpenCL program
Building OpenCL program
Creating OpenCL kernel
Setting OpenCL kernel arguments
Generating from nonce #0
An OpenCL error occured in the generation process, aborting...
>>> [-54] Error in kernel launch

For the first error (-61 = CL_INVALID_BUFFER_SIZE), it is due to a lack of memory space in your GPU. Try with a lower stagger size.
For the second error (-54 = CL_INVALID_WORK_GROUP_SIZE), it is due to an incorrect threads number. Try with a lower threads number.
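
For anyone who wants to decode the number in the plotter output, here is a small lookup sketch. The code names and values are the standard ones from the OpenCL headers; the hints are only suggestions based on the answers in this thread, and the function name is hypothetical.

```python
# Sketch: map the OpenCL error codes reported in this thread to their names.
# Names and values are from the OpenCL headers (CL/cl.h); hints are suggestions only.
OPENCL_ERRORS = {
    -5: ("CL_OUT_OF_RESOURCES",
         "the kernel may have been stopped, for example by the display driver watchdog"),
    -54: ("CL_INVALID_WORK_GROUP_SIZE",
          "try a lower threads number"),
    -61: ("CL_INVALID_BUFFER_SIZE",
          "not enough GPU memory, try a lower stagger size"),
}


def describe(code):
    """Return a readable description for an OpenCL error code."""
    name, hint = OPENCL_ERRORS.get(code, ("unknown error", "check the OpenCL headers"))
    return "[%d] %s: %s" % (code, name, hint)


print(describe(-61))
print(describe(-54))
```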

Quote from: alexrossi on September 09, 2014, 10:29:12 AM

Any advice? (GTX 770 2GB) 100 BURST bounty

Quote

C:\Users\x\Desktop>gpuPlotGenerator.exe C:\Users\x\Desktop\plots 7733xxxxxxxxxxx 5700001 2000000 1000 64
GPU plot generator v1.1.0
Author: Cryo
Bitcoin: 138gMBhCrNkbaiTCmUhP9HLU9xwn5QKZgD
Burst: BURST-YA29-QCEW-QXC3-BKXDL
--------------
Path: C:\Users\x\Desktop\plots
Nonces: 5700001 to 7700001 (488 GB)
Process memory: 250MB
Threads number: 64
--------------
Retrieving OpenCL platform
Retrieving OpenCL GPU device
Creating OpenCL context
Creating OpenCL command queue
Creating CPU buffer
Creating OpenCL GPU generation buffer
Creating OpenCL GPU scoops buffer
Creating OpenCL program
Building OpenCL program
Creating OpenCL kernel
Setting OpenCL kernel arguments
Generating from nonce #5700001
An OpenCL error occured in the generation process, aborting...
>>> [-54] Error in kernel launch

C:\Users\x\Desktop>pause

The -54 error is "CL_INVALID_WORK_GROUP_SIZE". Maybe 64 threads is too high for your GPU. Try a lower number like 32.

Quote from: burstcoin on September 09, 2014, 09:57:06 AM

You don't need to improve it to avoid this issue, just split it. One kernel for first half, one kernel for second half.

What do you mean by splitting it ?
If you mean "splitting the nonce chunks": I don't think that two kernel instances, one for each half of the stagger size, will resolve the stress problem. The current algorithm crashes my NVIDIA Quadro 4000 in the middle of the P application. Moreover, it would need twice as much memory to run, on both the GPU and the CPU side.
If you mean "splitting the Shabal from the nonce generation": I don't see how for now.

Quote from: BurstBurst on September 09, 2014, 08:12:42 AM

What is your full parameter ?

Quote from: alphateam on September 09, 2014, 07:52:10 AM

Quote from: bipben on September 09, 2014, 07:17:23 AM

Quote from: SpeedDemon13 on September 09, 2014, 12:38:04 AM

So the usage would be like this: "D:/gpuPlotGenerator <numerical_account_address> 0 819200 4096 <cpu/gpu_threads?>"

Is that format correct? Is the thread count needed for GPU plotting (pointed out in bold)? What's the nonces/minute rate?

Hi,

This is still a buggy early-stage version. I am posting it here to get feedback from people who own more powerful graphics cards (the behaviour may vary from one card to another).
But yes, the final usage would be the one you mentioned. The threads parameter is the number of threads used in the local work group. In GPU mode, the value should be a multiple of 64; 256 is the typical value for most cards.

OK, I ran a test with my R9 290.

I put 256 for the threads (apparently I can't put more).

In 1 min 15 s I generated from nonce 888597 to nonce 900885, so about 9830 nonces per minute. Not bad at all.

Wow! So it really works on some models after all! Glad to read it. I am still investigating the bug that occurs on the other graphics cards.
Thank you for your feedback.
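
For reference, the rate reported in the quoted test can be checked with a quick calculation. This is only a sketch; the MiB figure assumes one nonce takes 4096*64 bytes, and the function name is hypothetical.

```python
# Sketch: plotting rate from a nonce range and an elapsed time.
# Assumption for the size figure: one nonce = 4096 * 64 bytes.
NONCE_SIZE = 4096 * 64


def nonces_per_minute(first_nonce, last_nonce, seconds):
    """Average number of nonces generated per minute."""
    return (last_nonce - first_nonce) * 60 / seconds


# Figures from the R9 290 test: 1 min 15 s = 75 seconds.
rate = nonces_per_minute(888597, 900885, 75)
mib_per_minute = rate * NONCE_SIZE / (1024 * 1024)

print(rate)            # prints 9830.4
print(mib_per_minute)  # about 2457.6 MiB written per minute
```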

Quote from: SpeedDemon13 on September 09, 2014, 07:22:14 AM

Quote from: bipben on September 09, 2014, 07:08:22 AM

Quote from: BurstBurst on September 08, 2014, 11:53:14 PM

Can't get it to work. What is the recommended <threads> value for R9 series cards?

Hi,

I am aware of the current bug and I will work on it today.
For the <threads> parameter, use a multiple of 64, the typical value is 256 for most of the graphic cards.

So, the <threads> parameter is not needed for the Windows exe? Is the <address> parameter the numerical address created from the passphrase? How does this GPU plotter perform vs the CPU plotter? Are the plots error-free, or is there a margin of error compared to the CPU-plotted ones?

Update: I didn't read the part about the threads parameter properly; I understand it's required on both Windows and Linux. Why a multiple of 64 (same as the scoop size) and a typical value of 256 (same as the plot size) for GPUs?

For the <address> parameter, yes it's the numerical one.
Once finished, the plotter will generate the exact same file as the CPU plotter, but faster I hope. There is no performance comparison available for now as I am still correcting it.
For the <threads> parameter, it depends on your graphic card architecture. Most of the graphic cards perform computation with workgroups of 64*N threads (256 most of the time). The maximum workgroup size is available on the manufacturer website for each model.
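
If you know the maximum workgroup size of your card (from the manufacturer website, or from the CL_DEVICE_MAX_WORK_GROUP_SIZE value reported by OpenCL), a starting value for the threads parameter can be picked like this. It is only a sketch with a hypothetical function name, and the limit for a particular kernel may be lower than the device maximum, so a -54 error can still occur.

```python
# Sketch: choose a threads value that is a multiple of 64 and fits the device limit.
# max_work_group_size is a value you look up yourself; nothing is queried here.
def pick_threads(max_work_group_size, multiple=64):
    """Largest multiple of `multiple` not exceeding max_work_group_size.

    Devices with a limit below `multiple` simply get their own limit back.
    """
    if max_work_group_size < multiple:
        return max_work_group_size
    return (max_work_group_size // multiple) * multiple


print(pick_threads(256))  # prints 256
print(pick_threads(300))  # prints 256
```

If the plotter still reports CL_INVALID_WORK_GROUP_SIZE with the value found this way, try a lower multiple of 64.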

Quote from: twig123 on September 08, 2014, 11:01:11 PM

Quote from: bipben on September 08, 2014, 10:45:17 PM

Tried the Windows version on my Lappy (has a nvidia geforce 335M, 1GB) and I get the following:

Code:

gpuPlotGenerator.exe C:\plots 11111222223333344444 0 5000 500 1

Code:

GPU plot generator v1.1.0
Author: Cryo
Bitcoin: 138gMBhCrNkbaiTCmUhP9HLU9xwn5QKZgD
Burst: BURST-YA29-QCEW-QXC3-BKXDL
--------------
Path: C:\plots
Nonces: 0 to 5000 (1 GB)
Process memory: 125MB
Threads number: 1
--------------
Retrieving OpenCL platform
Retrieving OpenCL GPU device
Creating OpenCL context
Creating OpenCL command queue
Creating CPU buffer
Creating OpenCL GPU generation buffer
Creating OpenCL GPU scoops buffer
Creating OpenCL program
Building OpenCL program
Creating OpenCL kernel
Setting OpenCL kernel arguments
Generating from nonce #0
An OpenCL error occured in the generation process, aborting...
>>> [-5] Error in synchronous read

Each time I run it, the screen goes black for a moment and then comes back. Then I get a popup stating that the Nvidia display driver has stopped responding and has recovered.

I have my desktop with an ATI 7800 already in the process of plotting with dcct's plot generator on Linux. But I will see if I can figure out how to test this on Linux on an ATI card.

Edit: Tested it on my Ubuntu install and I get the following when issuing the 'make' command:

Code:

Compiling [gpuPlotGenerator.cpp]
make: /c/_data/cryo/_apps/mingw/bin/g++: Command not found
make: *** [gpuPlotGenerator.o] Error 127

Seems this may be a hard coded location?
I've already installed 'build-essential'... how do I change this so I can actually test it?

Thanks for the test. There is a persistent bug in this implementation, I will work on it today.

For the compilation, yes, I forgot to remove the hardcoded path to my g++. Just replace the two locations in the Makefile with "g++". Also, replace the hardcoded path to the OpenCL library with your own OpenCL install path. Sorry for the lack of a Linux build for now.