Bitcoin Forum
Topic: Would it be smart to unload some of the work onto the cpu?--read this PDF
joulesbeef (OP)
Sr. Member
Activity: 476
Merit: 250
moOo
August 02, 2011, 05:37:26 PM
 #1

This is a PDF about using AMD GPUs for password recovery; they are mainly talking about the upcoming SDK 2.5.
If you look at page 10 onward, you'll see they suggest that offloading some of the work onto the CPU speeds up their password recovery greatly.

Now, I'm not a programmer (probably should have said that first, but you probably wouldn't have read this far), so I don't know if this will help you guys at all. But I figure solving these hashes is similar. I don't think we have to do the trial password thing, just the hashes and keys, but surely we validate the keys.

Eh, if it doesn't help, just ignore me. I hate it when people who don't know what they are talking about try to diagnose crap: cars, computers, etc.

But if it helps y'all then it helps me, so I figured it was worth the large chance of looking like a fool to share.

mooo for rent
Starlightbreaker
Legendary
Activity: 1764
Merit: 1006
August 02, 2011, 05:38:47 PM
 #2

I read somewhere that people sell their GPU time for password cracking.

It has been done for quite a while, I think?

djex
Full Member
Activity: 196
Merit: 100
August 02, 2011, 05:54:12 PM
 #3

Interesting find. This could prove to be useful.

Smiley  : 1LbvSEJwtQZKLSQQVYxQJes8YneQk2yhE3
PLaci1982
Full Member
Activity: 168
Merit: 100
Live long and prosper. \\//,
August 02, 2011, 07:54:37 PM
 #4

What I find more interesting in this PDF is that this experienced team of GPGPU programmers writes their programs not in OpenCL, but in Intermediate Language (IL)...

Hardware Expert / WinXP, Win7 Expert

1J5oPkyGVdb4mv44KGZQYsHS2ch6e1t4rc
kokjo
Legendary
Activity: 1050
Merit: 1000
You are WRONG!
August 02, 2011, 08:01:15 PM
 #5

No, there is nothing to speed up.

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell
joulesbeef (OP)
August 02, 2011, 08:38:05 PM
 #6

Quote
no there is nothing to speedup.


While I admit I know nothing about programming miners, that is a bit definitive, considering the kernel mods that have resulted in increased speed.

Not that I doubt you, but did you look at the PDF?


kokjo
August 02, 2011, 09:03:56 PM
 #7

Quote
while I admit I know nothing about programming miners, that is a bit definitive, considering the kernel mods which have resulted in increased speed.
It was optimization of the GPU kernel code that resulted in the speedup.

Quote
Not that I doubt you, but did you look at the pdf?
No, I did not have to, but I'm going to now.
There is nothing mining-related that a CPU can do that a GPU cannot.

The only things that separate the GPU and the CPU are:
GPUs have a lot of cores, and are therefore good for heavy computation.
CPUs have a lot of system capabilities, like paging, privilege separation, task switching and other nasty stuff.

All this stuff makes CPUs more complicated, heavier, and more robust.

I have now read the paper. There is still nothing to speed up.
What the paper suggests is that we put the CPU to work while the GPU works.
But there is nothing for the CPU to do while the GPU works.

joulesbeef (OP)
August 02, 2011, 09:41:06 PM
 #8

Cool, I knew I was an idiot, but I just wanted a clearer explanation of why.

thanks for the reply.

fpgaminer
Hero Member
Activity: 560
Merit: 517
August 03, 2011, 12:36:25 AM
 #9

Quote
cool, i knew i was an idiot, but I just wanted a clearer def on why.
Your question was definitely not stupid. I don't like to see perfectly viable ideas get shot down for no reason :P

Quote
the thing that the paper suggest is that we put the cpu to work, while the gpu works.
It's more complicated than that. The slides are a bit scatterbrained, but I'll just reference the "Profit!" slide, because I think that is what the OP is referring to. What it suggests is moving the two components of the algorithm that GPUs perform poorly at onto the CPU, which can process them quickly while the GPU is doing more intensive and suitable work. It's like hiring someone at your restaurant to peel potatoes and take out the trash, so your cooks can do what they do best.

In some sense, this has already been applied to mining. bitcoind actually does part of the work for you: it "pre-hashes" the first 64 bytes of the 80-byte block header in the work it returns (this is where midstate comes from). Hence, some of the work has been offloaded to the CPU-based bitcoind.
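
To make the midstate idea concrete, here is a minimal Python sketch (not code from bitcoind or any miner). It assumes the standard 80-byte block header and the SHA-256 padding rules from FIPS 180-4; the names `midstate` and `hash_with_midstate` are made up for illustration. The first 64 header bytes are compressed once per unit of work, and only the last 16 bytes (which contain the nonce) plus the second SHA-256 pass are redone for each nonce.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    """Return the first n primes by trial division."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(x):
    """Floor of the integer cube root, found by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# SHA-256 constants: fractional bits of square roots (initial state)
# and cube roots (round constants) of the first primes.
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]
K = [_icbrt(p << 96) & MASK for p in _primes(64)]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """Apply the SHA-256 compression function to one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + s1 + ch + K[i] + w[i]) & MASK
        s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def midstate(header):
    """Once per unit of work (CPU side): hash the first 64 header bytes."""
    return compress(H0, header[:64])


def hash_with_midstate(mid, tail12, nonce):
    """Per-nonce work: finish the first pass, then do the second pass."""
    # Second block of pass one: 12 tail bytes, 4 nonce bytes, then padding
    # for an 80-byte (640-bit) message.
    block2 = tail12 + struct.pack("<I", nonce) + b"\x80" + b"\x00" * 39 + struct.pack(">Q", 640)
    first = struct.pack(">8I", *compress(mid, block2))
    # Pass two hashes the 32-byte digest (256 bits), which fits in one block.
    block3 = first + b"\x80" + b"\x00" * 23 + struct.pack(">Q", 256)
    return struct.pack(">8I", *compress(H0, block3))
```

Comparing `hash_with_midstate(midstate(header), header[64:76], nonce)` against `hashlib.sha256(hashlib.sha256(header).digest()).digest()` for a full 80-byte header is an easy way to confirm that the split gives the same result.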

And a lot of the recent improvements to the GPU mining algorithms have come from pre-computing, on the CPU, various values.

And we also only generate Difficulty-1 shares from the GPU, because anything else would slow the GPU down. The CPU is responsible for determining if that share is actually useful in generating a block.
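
The split between the cheap on-GPU test and the full check on the host can be sketched like this. It is only an illustration: the "top 32 bits are zero" test is an assumption about how a kernel might approximate Difficulty 1 (it is slightly looser than the exact Difficulty-1 target), and the function names are hypothetical. The compact "bits" decoding and the hash <= target comparison follow the usual Bitcoin rules.

```python
import hashlib

DIFF1_BITS = 0x1D00FFFF  # compact encoding of the Difficulty-1 target


def bits_to_target(bits):
    """Decode the compact 'bits' field into a 256-bit integer target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))


def hash_value(header80):
    """Double SHA-256 of the header, read as a little-endian integer."""
    digest = hashlib.sha256(hashlib.sha256(header80).digest()).digest()
    return int.from_bytes(digest, "little")


def gpu_style_check(value):
    """Cheap test a kernel could run per nonce: top 32 bits all zero."""
    return value >> 224 == 0


def cpu_check(value, bits):
    """Full comparison against the real target, done on the host."""
    return value <= bits_to_target(bits)


# A value that passes the cheap test is only a candidate share; the host
# decides whether it also meets a harder target (example bits value).
candidate = 1 << 223
print(gpu_style_check(candidate), cpu_check(candidate, 0x1B0404CB))
```

The point is that the GPU only has to compare one 32-bit word, while the wide comparison runs rarely and on the CPU.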

However, those are only once-per-unit-of-work optimizations. Unlike the password recovery slides, there is no repeated computation that the CPU could help with in mining, because there are no parts of the mining algorithm that are better suited for the CPU. No complicated generation or verification. Since the CPU isn't better at any part of it, it's best to leave the whole algorithm on the GPU.

Quote
but there are nothing to do for the cpu while the gpu works.
That's not the case at all. The CPU can be hashing as well, and on most modern CPUs that would give you an extra 3 or 4 MH/s ... roughly the same increment you see in each new revision of the GPU code.

The real reason why we don't utilize the CPU in this manner is not because you can't. You most certainly can run a CPU miner alongside your GPU miners (with a bit of effort). People don't usually do it, though, because the CPUs have terrible MH per Watt performance. i.e. you're wasting electricity. It's far more suitable to buy a low power, cheap CPU and have it just twiddling its thumbs so it can snap into action when it's time to give the GPU new work. Like the squeegee boys on porn shoots.
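
The MH-per-Watt argument is just a division, sketched below. Only the 4 MH/s CPU figure comes from this post; the wattages and the GPU hash rate are made-up placeholder numbers, so plug in your own hardware's values.

```python
def mh_per_watt(mhash_per_second, watts):
    """Mining efficiency: megahashes per second for each watt drawn."""
    return mhash_per_second / watts


# 4 MH/s is the CPU estimate from the post; 100 W is a hypothetical draw.
cpu_efficiency = mh_per_watt(4.0, 100.0)
# Both GPU figures are hypothetical, chosen only for illustration.
gpu_efficiency = mh_per_watt(300.0, 200.0)

print(f"CPU: {cpu_efficiency:.2f} MH/s per W, GPU: {gpu_efficiency:.2f} MH/s per W")
```

With numbers in that ballpark, every watt spent on CPU hashing buys far fewer hashes than the same watt spent on the GPU.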

kokjo
August 03, 2011, 08:46:49 AM
 #10

Quote
And a lot of the recent improvements to the GPU mining algorithms have come from pre-computing, on the CPU, various values.
Please define "various values". If it's the midstate you are talking about, it was an obvious optimization.

fpgaminer
August 03, 2011, 09:26:47 AM
 #11

Quote
please define "various values". if its the midstate you are talking about, it was a obvious optimization.
https://bitcointalk.org/index.php?topic=33817.0

They list out lots of the constant and semi-constant optimizations in that thread.

gat3way
Sr. Member
Activity: 256
Merit: 250
August 03, 2011, 10:11:11 AM
 #12

What they do is overlap kernel execution with host-device transfers. This is generally possible, but troublesome, with OpenCL and the AMD APP SDK. It is done by using asynchronous kernel invocations and buffer reads (supplying CL_FALSE as the "blocking" function argument, using callback functions, and supplying CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE when creating the queue). This requires DMA support, which works with 2.4 only. Keep in mind that it depends very much on the task being performed: the way all the bitcoin kernels are written would not allow this.

It would be possible if there were no "reduction", i.e. if all work-items wrote their output to their own part of the grid. For example, for an NDRange of 1,000,000 nonces, you would have an output buffer of 1,000,000 uints. This way you could overlap transfers and kernel executions.

It wouldn't be much faster (in fact I wouldn't be surprised if it turns out to be slower). Also, people that underclock memory will definitely suffer.
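
A rough back-of-the-envelope sketch of why the per-work-item output buffer is costly. The 1,000,000-uint buffer is from the example above; the 300 MH/s hash rate and the "one uint per launch" reduced output are assumptions picked only to show the scale.

```python
UINT_BYTES = 4  # size of an OpenCL uint


def readback_bytes_per_second(hashes_per_second, nonces_per_launch, uints_per_launch):
    """Bytes the host must read back each second for a given output layout."""
    launches_per_second = hashes_per_second / nonces_per_launch
    return launches_per_second * uints_per_launch * UINT_BYTES


# Hypothetical 300 MH/s card, 1,000,000 nonces per kernel launch.
no_reduction = readback_bytes_per_second(300e6, 1_000_000, 1_000_000)
with_reduction = readback_bytes_per_second(300e6, 1_000_000, 1)

print(f"one uint per nonce: {no_reduction / 1e9:.1f} GB/s")
print(f"one uint per launch: {with_reduction:.0f} B/s")
```

Under these assumptions the unreduced layout moves on the order of a gigabyte per second over the bus, which is the traffic the overlap would then have to hide.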

P.S. These might be helpful:

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=144609
http://forums.amd.com/devforum/messageview.cfm?FTVAR_FORUMVIEWTMP=Linear&catid=390&threadid=134450
kokjo
August 03, 2011, 10:32:16 AM
 #13

Quote
please define "various values". if its the midstate you are talking about, it was a obvious optimization.
https://bitcointalk.org/index.php?topic=33817.0

They list out lots of the constant and semi-constant optimizations in that thread.
Still, the CPU cannot help. All of these are optimizations in the algorithm used to calculate the SHA-256 hashes.
