
Title: delete
Post by: BitcoinEXpress on October 18, 2011, 12:11:42 PM

delete

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: Spacy on October 18, 2011, 12:18:17 PM

Quote from: BitcoinEXpress on October 18, 2011, 12:11:42 PM

Supercoder and all-around genius Coinhunter is looking for someone to develop an OpenCL miner for Solidcoin 2.0. In a totally hypocritical move he requires it to be "Open Source" LOL ;D ;D ;D

http://solidcointalk.org/topic/318-first-opencl-miner-for-solidcoin-v20/page__view__findpost__p__3206

As your coding slaves are already working on one, you could publish it and get the bounty ;-)

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: ElectricMucus on October 18, 2011, 12:19:13 PM

Ehm how is anyone supposed to develop that? By using his "white paper", not much to work with. The guy is trolling himself...

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 18, 2011, 05:13:27 PM

My colleague is working on a CUDA miner (in C++) for SC2 right now, started last Thursday.

It's Alpha status right now, many crashes and lots of rejects but it gets about 105Khash/s on his GTX470.
My good old 9800GTX is doing around 26Khash/s but there's still a lot of room for improvement.
It looks like nVidia is actually better on scrypt type chains.

Source will be released in about a week but I will be testing it a few days before launch.
Still very buggy...

I need a volunteer to test with GTX5xx series soon, so please PM me if you have one available!

Linux only! (we had some problems with latest Ubuntu and CUDA but back with old version it works again)

And we need a name for it!

edit: I will let BitcoinEXpress in on the Beta as soon as it is available. His knowledge would really help us huge!

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 18, 2011, 05:28:51 PM

If anyone has hashrates for ATI 5xxx and 6xxx series, I would like to compare!

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: sd on October 18, 2011, 05:47:52 PM

Quote from: MaGNeT on October 18, 2011, 05:28:51 PM

If anyone has hashrates for ATI 5xxx and 6xxx series, I would like to compare!

Zero. He said CUDA not OpenCL and CUDA doesn't run on ATI hardware.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 18, 2011, 05:51:40 PM

Quote from: sd on October 18, 2011, 05:47:52 PM

Quote from: MaGNeT on October 18, 2011, 05:28:51 PM

If anyone has hashrates for ATI 5xxx and 6xxx series, I would like to compare!

Zero. He said CUDA not OpenCL and CUDA doesn't run on ATI hardware.

Duh, I know... I'm a (former) hardcore SETI@home miner.
My colleague too, he also did some software optimizations for a 3rd-party SETI@home cruncher.

We'd like to compare our hashrate to the upcoming OpenCL miner.
If OpenCL does much better, he'll stop; there would be no use in developing this further.
He knows a lot of CUDA, not OpenCL.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: johnj on October 18, 2011, 06:00:27 PM

Honest question: If miners are being developed for GPU's, doesn't this effectively push CPU's out of the race? I thought the whole point of a CPU 'only' chain was a wider distribution.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: Spacy on October 18, 2011, 06:06:02 PM

Quote from: johnj on October 18, 2011, 06:00:27 PM

Honest question: If miners are being developed for GPU's, doesn't this effectively push CPU's out of the race? I thought the whole point of a CPU 'only' chain was a wider distribution.

It depends on the efficiency => hardware cost + running costs per hashrate.
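
One way to make that comparison concrete is to compute a total cost per unit of hashrate over a fixed period. This is only a minimal sketch: the hardware prices, wattages, hashrates, electricity price and time horizon below are hypothetical placeholders, not measurements from this thread.

```python
def cost_per_khs(hardware_cost, watts, khs, price_per_kwh, hours):
    """Total cost (hardware + electricity) per kh/s of hashrate over `hours`."""
    energy_cost = watts / 1000.0 * hours * price_per_kwh
    return (hardware_cost + energy_cost) / khs

# All inputs below are assumptions chosen for illustration only.
HOURS = 24 * 365        # one year of continuous mining
PRICE_PER_KWH = 0.10    # assumed electricity price per kWh

cpu = cost_per_khs(hardware_cost=200.0, watts=150.0, khs=30.0,
                   price_per_kwh=PRICE_PER_KWH, hours=HOURS)
gpu = cost_per_khs(hardware_cost=300.0, watts=200.0, khs=100.0,
                   price_per_kwh=PRICE_PER_KWH, hours=HOURS)

print(f"CPU: {cpu:.2f} per kh/s, GPU: {gpu:.2f} per kh/s")
```

Whichever device has the lower figure is the cheaper way to buy hashrate under those assumptions. Hardware that is already paid for can be modelled with `hardware_cost=0`.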

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: johnj on October 18, 2011, 06:10:04 PM

Quote from: Spacy on October 18, 2011, 06:06:02 PM

Quote from: johnj on October 18, 2011, 06:00:27 PM

Honest question: If miners are being developed for GPU's, doesn't this effectively push CPU's out of the race? I thought the whole point of a CPU 'only' chain was a wider distribution.

It depends on the efficiency => hardware cost + running costs per hashrate.

True, but many of the people involved with mining cryptocurrencies already have an arsenal of GPU's just waiting - which have (probably) already been paid for with Bitcoins.

I can never seem to keep my finger on where CH is headed with his chain... seems like it changes every week.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: Spacy on October 18, 2011, 06:17:50 PM

Quote from: johnj on October 18, 2011, 06:10:04 PM

Quote from: Spacy on October 18, 2011, 06:06:02 PM

It depends on the efficiency => hardware cost + running costs per hashrate.

True, but many of the people involved with mining cryptocurrencies already have an arsenal of GPU's just waiting - which have (probably) already been paid for with Bitcoins.

True, but more people have CPUs :) But so you can mine with multiple xPUs.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: johnj on October 18, 2011, 06:24:10 PM

Quote from: Spacy on October 18, 2011, 06:17:50 PM

Quote from: johnj on October 18, 2011, 06:10:04 PM

Quote from: Spacy on October 18, 2011, 06:06:02 PM

It depends on the efficiency => hardware cost + running costs per hashrate.

True, but many of the people involved with mining cryptocurrencies already have an arsenal of GPU's just waiting - which have (probably) already been paid for with Bitcoins.

True, but more people have CPUs :) But so you can mine with multiple xPUs.

Right, but that puts SC mining back into the same state as BTC mining. I (genuinely) thought CH's goal was to be different in those regards.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: Ahimoth on October 18, 2011, 06:35:26 PM

The reason for a GPU version is exactly as viperjbm pointed out, there is a sizeable audience that desires a gpu miner. The goal of SC2's new hash was to make it GPU unfriendly, not GPU impossible. The idea was to level the playing field, so your average person with a not so spectacular video card could still have a chance at mining. Our preliminary investigations into a GPU miner for SC2 indicate that a GPU version will be somewhat faster than the CPU miner, but nowhere near the gap that GPU miners have over CPU's in BitCoin.

As a side bonus, by releasing a GPU miner, we would be taking away from bitcoins hash power.

And btw, MaGNeT, I certainly hope you are not implying SC2 uses scrypt, because it does not.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: johnj on October 18, 2011, 06:37:07 PM

Quote from: viperjbm on October 18, 2011, 06:31:22 PM

Quote from: johnj on October 18, 2011, 06:24:10 PM

Right, but that puts SC mining back into the same state as BTC mining. I (genuinely) thought CH's goal was to be different in those regards.

If one could mine the same as or better on a CPU but have an option to mine GPU as well even if the efficiency is not ideal.... is that not different?

I suspect that is more for.... educational masturbation, just to see what could be done and open up the options that are out there. Ideally, it is likely a proof that indeed CPU's are better at mining this than GPU's.... at least for the time being.

I admit I haven't hashed on SC 2, so I'm unaware of what my CPU hashing would be, I was taking it for granted that it would be roughly somewhere the same as LTC/TBX. Given that, if a i7-920 can get ~10-15 kh/s @ 200w, and a GTX 470 can get ~100 kh/s @ 205 (from above)...

Those were the numbers I was going on, which in that case makes GPU's obviously better. But of course I may be in error with some of my numbers.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: Spacy on October 18, 2011, 08:41:01 PM

Quote from: BitcoinEXpress on October 18, 2011, 07:55:28 PM

A single decent GPU rig will totally overpower the entire SC network.

Yeah... not ;D

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: SuperTramp on October 18, 2011, 09:02:31 PM

Quote from: johnj on October 18, 2011, 06:37:07 PM

Quote from: viperjbm on October 18, 2011, 06:31:22 PM

Quote from: johnj on October 18, 2011, 06:24:10 PM

Right, but that puts SC mining back into the same state as BTC mining. I (genuinely) thought CH's goal was to be different in those regards.

I am getting 34KH/s with my AMD x6 1090T (overclocked to 3.93ghz), my guess is with an i7-920 you would probably be around that number as well.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 18, 2011, 09:22:52 PM

Quote from: BitcoinEXpress on October 18, 2011, 07:55:28 PM

Quote from: viperjbm on October 18, 2011, 06:31:22 PM

Quote from: johnj on October 18, 2011, 06:24:10 PM

Right, but that puts SC mining back into the same state as BTC mining. I (genuinely) thought CH's goal was to be different in those regards.

@Coinhunter

Since I know you are watching this forum. The only reason you are mentioning GPU mining now is that I said THIS (https://bitcointalk.org/index.php?topic=48648.msg578639#msg578639) about you having GPU mining and you know people are disassembling your code. Funny I post that and just a few hours later, you're "Looking" for an OpenCL miner. LOL

Anyway Magnet, send me what you have and I will have a couple of CUDA/OpenCL experts look at it. I just PM'd you and yes, as you can see, it is from an apple.com email address. ;D

I am especially interested in the CUDA as I think it will be poetic justice that a "GeForce Army" takes out SC by essentially turning it into a BTC type of difficulty rather quickly.

GPU mining will effectively end CPU mining in Solidcoin and you KNOW IT, even if only one person drops a decent rig on it. A single decent GPU rig will totally overpower the entire SC network.

But I highly suspect there has been one person GPU mining all alone.

~BCX~

Thanks! Yeah, the first mail from Apple I ever got ;D
I sent you all the relevant info I have.
Thanks already for the suggestions you made, this is very promising!
I wish I could code like that. :(

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 18, 2011, 10:38:01 PM

GTX470 is now at ~170Khash/s and it didn't crash for 3 hours :)
Stales are still 15-20% but he'll be working on that.
Will test on my 9800GTX but got to get some sleep now first.

I only received a few PMs from people with a GTX5xx series card. Seems logical, all ATI fanboys over here :P

1 person with 2 GTX580 cards received our Alpha and if he gets it up and running (he has to set up Ubuntu first) we hope to get some nice results :)

edit:

Please people, stop asking for the CUDA miner by PM. We won't release it yet, and no, not even for 10 BTC.
There will be a public release in about a week, maybe a bit longer (it's all done in the programmer's spare time), but for now it's still "work in progress".
I suck at coding so all the kudos go to him (well, I did a bit of Basic programming in the 80's and 90's :P)

Source code will be released so it can be reviewed or compiled by those who don't like it precompiled.
I don't know how to compile it myself but I'm sure some smart guys will figure that out.

We still need a name! PM your suggestions :)

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: coblee on October 18, 2011, 11:10:32 PM

Quote from: MaGNeT on October 18, 2011, 10:38:01 PM

Is the SC2 algorithm more GPU-friendly than Scrypt? Do you also plan to release a Scrypt version so I can play around with mining Litecoins on a GPU?

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: Bobnova on October 19, 2011, 12:48:17 AM

Oh man, 170kh from a 470 means probably 200-230kh from a gtx580, or right about one kh/watt.
Compare that to a Thuban's 34kh/s from ~170w or my 2600k's 50kh/s from 150w, and you're looking at the end of CPU mining alright. CUDA GPUs will be three or more times more profitable. And that's just the alpha miner.

nvidia GPUs have a lot more cache in them than ATI GPUs you see.
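
As a rough check of the arithmetic in this post, the per-watt figures can be worked out directly. A small sketch using only the numbers quoted above; the GPU value of about 1 kh/W is the poster's own estimate, not a measurement, and the function name is just illustrative.

```python
def khs_per_watt(khs, watts):
    """Hashing efficiency in kh/s per watt."""
    return khs / watts

thuban = khs_per_watt(34, 170)   # Thuban: 34 kh/s from ~170 W
i2600k = khs_per_watt(50, 150)   # 2600K: 50 kh/s from 150 W
gpu = 1.0                        # "right about one kh/watt" (estimate from the post)

print(f"Thuban: {thuban:.2f} kh/W, 2600K: {i2600k:.2f} kh/W")
print(f"GPU advantage: {gpu / thuban:.1f}x over Thuban, {gpu / i2600k:.1f}x over 2600K")
```

This compares energy efficiency only. Hardware cost is left out, so it is not a full profitability estimate.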

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 19, 2011, 06:08:38 AM

Quote from: coblee on October 18, 2011, 11:10:32 PM

Quote from: MaGNeT on October 18, 2011, 10:38:01 PM

Is the SC2 algorithm more GPU-friendly than Scrypt? Do you also plan to release a Scrypt version so I can play around with mining Litecoins on a GPU?

It's not profitable to create a scrypt miner, as far as my knowledge goes. But I'm sure other people are working on that too.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 19, 2011, 06:10:17 AM

Quote from: BitcoinEXpress on October 19, 2011, 12:48:22 AM

Quote from: MaGNeT on October 18, 2011, 09:22:52 PM

Quote from: BitcoinEXpress on October 18, 2011, 07:55:28 PM

Quote from: viperjbm on October 18, 2011, 06:31:22 PM

Quote from: johnj on October 18, 2011, 06:24:10 PM

Right, but that puts SC mining back into the same state as BTC mining. I (genuinely) thought CH's goal was to be different in those regards.

Thanks! Yeah, the first mail from Apple I ever got ;D
I sent you all the relevant info I have.
Thanks already for the suggestions you made, this is very promising!
I wish I could code like that. :(

No prob, but realize it's not my coding! A really cool guy that just seems to be able to just bang stuff out that works did it!

We're still getting crashes due to loops of some kind. Working on it, will keep all posted.

~BCX~

Sent you the latest version. Does Amazon have a CUDA farm?

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: 3phase on October 19, 2011, 07:19:07 AM

Quote from: doublec on October 19, 2011, 06:16:38 AM

Quote from: BitcoinEXpress on October 19, 2011, 06:12:45 AM

Quote

Sent you the latest version. Does Amazon have a CUDA farm?

uh that would be no...

But don't sweat it, it's not that hard to put together a small pool of 15-20 for testing.

Aren't their GPU instances CUDA compatible?

Yes they are. Spot instances are currently $0.68 per hour for 2 x M2050 GPUs and, I might add, 2 x X5570 Xeons. Two birds with one stone.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: saethan on October 19, 2011, 07:31:09 AM

Quote

Cluster GPU Quadruple Extra Large Instance

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge

- the M2050 is the same GPU as the GTX470, not sure about the differences in the memory layout of a Tesla though

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 19, 2011, 04:25:50 PM

pm me the latest version, I have 4x gtx570, i'll give it a try if you want

edit: give me the source code, I'm on linux amd64

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 19, 2011, 08:02:15 PM

Quote from: BitcoinEXpress on October 19, 2011, 07:08:26 AM

Quote from: doublec on October 19, 2011, 06:16:38 AM

Quote from: BitcoinEXpress on October 19, 2011, 06:12:45 AM

Quote

Sent you the latest version. Does Amazon have a CUDA farm?

uh that would be no...

But don't sweat it, it's not that hard to put together a small pool of 15-20 for testing.

Aren't their GPU instances CUDA compatible?

I am not familiar at all with CUDA anything, never used it.

The Tesla Fermi has CUDA support. Would be big if we could rent a farm.
I don't have the money to do it though. Once miner is Beta, could you do an attempt?

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 19, 2011, 08:08:16 PM

Quote from: BitcoinEXpress on October 19, 2011, 08:06:03 PM

Quote from: MaGNeT on October 19, 2011, 08:02:15 PM

Quote from: BitcoinEXpress on October 19, 2011, 07:08:26 AM

Quote from: doublec on October 19, 2011, 06:16:38 AM

Quote from: BitcoinEXpress on October 19, 2011, 06:12:45 AM

Quote

Sent you the latest version. Does Amazon have a CUDA farm?

uh that would be no...

But don't sweat it, it's not that hard to put together a small pool of 15-20 for testing.

Aren't their GPU instances CUDA compatible?

I am not familiar at all with CUDA anything, never used it.

The Tesla Fermi has CUDA support. Would be big if we could rent a farm.
I don't have the money to do it though. Once miner is Beta, could you do an attempt?

We have availability to use as much EC2 as we want for anything that pertains to us; I will see what I can do. I was just informed that it may be possible. If I get it, the cost will be ZERO.

That's cheap ;D

Alpha 0.1.4 cuda_miner is now stable but he had to do some coding again. Gained a bit on efficiency, ~140Khash/s per thread now for the GTX470 (running 2 threads @ 1 GPU).
You got the update?

Funny part is: the GTX470 is only consuming ~120Watts with one thread and ~165Watts running 2 threads. That's a lot lower than I got when running SETI@home so I guess we are not on the limits of the card yet.
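
For reference, the two-thread figures above work out as follows. This is a sketch that assumes the ~140 Khash/s per-thread rate holds while both threads are running, which is how the numbers are reported here.

```python
khs_per_thread = 140       # ~140 Khash/s per thread, as reported
threads = 2
watts_one_thread = 120     # ~120 W with one thread
watts_two_threads = 165    # ~165 W with two threads

total_khs = khs_per_thread * threads                    # combined hashrate on one GPU
efficiency = total_khs / watts_two_threads              # kh/s per watt with both threads
marginal_watts = watts_two_threads - watts_one_thread   # extra power drawn by thread 2

print(f"{total_khs} kh/s total, {efficiency:.2f} kh/W, "
      f"second thread costs {marginal_watts} W extra")
```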

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: tacotime on October 19, 2011, 08:12:36 PM

Quote from: MaGNeT on October 19, 2011, 08:08:16 PM

Funny part is: the GTX470 is only consuming ~120Watts with one thread and ~165Watts running 2 threads. That's a lot lower than I got when running SETI@home so I guess we are not on the limits of the card yet.

Proper coding for CUDA requires knowing both CUDA and the nVidia architecture well. It's very easy to code for it inefficiently if you are not very familiar with it.

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: MaGNeT on October 19, 2011, 08:17:10 PM

Quote from: tacotime on October 19, 2011, 08:12:36 PM

Quote from: MaGNeT on October 19, 2011, 08:08:16 PM

Proper coding for CUDA requires knowing both CUDA and the nVidia architecture well. It's very easy to code for it inefficiently if you are not very familiar with it.

He did some Boinc / Seti@home improvements for one of the forked crunchers so I think he knows what he's doing. But he says this is something else.
Reverse-engineering the SC2 client was lots of work and only on Thursday could he start on the miner. He had some friends who helped him, he had to admit.

I should have asked BCX earlier, Apple knows a lot about stealing and reverse-engineering software ;)

I'm a total coding noob, so can't give you much of the details.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: sd on October 19, 2011, 08:50:09 PM

Can anyone tell me anything about the SC2 algorithm? I'm guessing it's something recursive?

I know people who work on transferring C functions to CUDA devices using Kahn process networks. It works very nicely for some applications. I can't afford their rates but I might get some free advice.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 20, 2011, 12:07:28 AM

Also, you should post the algorithm here. If it does rely on recursion, there is likely an iterative form of the algorithm which would not be hard to write and which would perform orders of magnitude faster than the recursive one (honestly, if this is what coinhunter did it's incredibly stupid).
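
To illustrate the general idea of replacing recursion with iteration, here is a toy sketch. It is not the SC2 algorithm, which is not described in this thread. The repeated SHA-256 chain and both function names are made up for the example.

```python
import hashlib

def chain_recursive(seed: bytes, rounds: int) -> bytes:
    """Hash `seed` `rounds` times, with each round expressed as a nested call."""
    if rounds == 0:
        return seed
    return hashlib.sha256(chain_recursive(seed, rounds - 1)).digest()

def chain_iterative(seed: bytes, rounds: int) -> bytes:
    """Same result as chain_recursive, using a loop and one state variable."""
    state = seed
    for _ in range(rounds):
        state = hashlib.sha256(state).digest()
    return state

# Both forms produce identical output; only the control flow differs.
assert chain_recursive(b"example", 50) == chain_iterative(b"example", 50)
```

The loop form needs no call stack, which is generally what a GPU kernel wants. How much it would help a real miner depends on the actual algorithm.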

Title: Re: Solidcoin 2.0 Coinhunter Looking For an OpenCL Miner
Post by: DeathAndTaxes on October 20, 2011, 12:25:54 AM

Quote from: MaGNeT on October 19, 2011, 06:10:17 AM

Sent you the latest version. Does Amazon have a CUDA farm?

Yes. They have instances w/ a pair of Tesla M2050s.

Quote

Cluster GPU Instances

http://aws.amazon.com/ec2/instance-types/

Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors. Learn more about use of this instance type for HPC applications.

Cluster GPU Quadruple Extra Large Instance

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Bobnova on October 20, 2011, 04:34:57 AM

Just shy of 2kh/w, that's insane compared to the CPU version.

I bet you could run a few more threads, too!

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: MaGNeT on October 24, 2011, 07:41:18 PM

We got a nice offer in SC if we stopped this project and never release source, so we decided to stop.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: DeathAndTaxes on October 24, 2011, 07:43:05 PM

Oh awesome. Another win for ScamCoin. Got to love closed source code (ScamCoin not CUDA Miner).

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: simonk83 on October 24, 2011, 08:00:10 PM

Quote from: MaGNeT on October 24, 2011, 07:41:18 PM

We got a nice offer in SC if we stopped this project and never release source, so we decided to stop.

Dammit Magnet.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: johnj on October 24, 2011, 08:03:08 PM

Take the SC, cash out in BTC, 'accidentally' leak the code ;)

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: MaGNeT on October 24, 2011, 08:04:47 PM

Quote from: simonk83 on October 24, 2011, 08:00:10 PM

Quote from: MaGNeT on October 24, 2011, 07:41:18 PM

We got a nice offer in SC if we stopped this project and never release source, so we decided to stop.

Dammit Magnet.

Hmm, I understand how you and some others might feel but I hope you can all forgive me for this.
I have bills to pay, just like you...

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Bobnova on October 24, 2011, 08:06:20 PM

Nope, sorry.
Can't forgive it.

Last time I heard you were dedicated to SC2's downfall, and now you're supporting it.
I'm disappointed in you.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Ten98 on October 24, 2011, 08:07:31 PM

Quote from: MaGNeT on October 24, 2011, 07:41:18 PM

We got a nice offer in SC if we stopped this project and never release source, so we decided to stop.

Lol, from who? Show the transaction in blockexplorer?

I think what happened is Reaper was released which beat you to the bounty.

Reaper is already in its 7th version, mtrlt is working hard on it and adding cool features every day:

http://zerosignalgame.com/reaper0.07src.zip
http://zerosignalgame.com/reaper0.07.zip (Win32 Binary)

We still need a Win64 version as well as binaries for Linux, so any compiling whizzkids out there please have a go...

Bring your geforce2mx's to the party ;D

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: MaGNeT on October 24, 2011, 08:22:55 PM

Quote from: BitcoinEXpress on October 24, 2011, 08:18:32 PM

Quote from: MaGNeT on October 24, 2011, 07:41:18 PM

We got a nice offer in SC if we stopped this project and never release source, so we decided to stop.

Speak for yourself, I didn't agree to it.

@Ten98
The transaction if it happens will occur in a transferred wallet file, avoiding the block explorer.

What will you do with the code?
I forgot to tell him you got a copy of the code :o

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Ten98 on October 24, 2011, 08:25:27 PM

Just thought I'd mention the new SolidCoin Beta 7 is out now and working very nicely.

http://dl.dropbox.com/u/38674765/beta/solidcoin-201b7.exe

8)

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: BitterTea on October 24, 2011, 08:27:38 PM

Quote from: Ten98 on October 24, 2011, 08:25:27 PM

Just thought I'd mention the new SolidCoin Beta 7 is out now and working very nicely.

http://dl.dropbox.com/u/38674765/beta/solidcoin-201b7.exe

8)

Dropbox... professional... not scammy at all.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: ohforf on October 24, 2011, 08:28:40 PM

Quote from: MaGNeT on October 24, 2011, 07:41:18 PM

We got a nice offer in SC if we stopped this project and never release source, so we decided to stop.

Bullshit! Your CUDA Miner never existed. :P ;D

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: coblee on October 24, 2011, 08:36:34 PM

Quote from: BitterTea on October 24, 2011, 08:27:38 PM

Quote from: Ten98 on October 24, 2011, 08:25:27 PM

Just thought I'd mention the new SolidCoin Beta 7 is out now and working very nicely.

http://dl.dropbox.com/u/38674765/beta/solidcoin-201b7.exe

8)

Dropbox... professional... not scammy at all.

Seriously, I hope people don't download and run random binaries from other users. But then again, I don't even trust binaries posted by CoinHunter. But if you do trust CoinHunter, only run binaries that he posted.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: MaGNeT on October 24, 2011, 08:42:16 PM

Quote from: Ten98 on October 24, 2011, 08:07:31 PM

Quote from: MaGNeT on October 24, 2011, 07:41:18 PM

We got a nice offer in SC if we stopped this project and never release source, so we decided to stop.

I don't have SC2 on my computer, got it on my BTC-E account. How do I show that? By receiving address?

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: ohforf on October 24, 2011, 08:43:28 PM

Quote from: BitcoinEXpress on October 24, 2011, 08:33:02 PM

@Ten98 Nice try asshole.

http://a75.org/b7.jpg

Do you understand what your antivirus software is telling you?
I don't think so. You are a Noob. :P

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: MSAvenger on October 24, 2011, 08:44:21 PM

Quote from: Bobnova on October 24, 2011, 08:06:20 PM

Last time I heard you were dedicated to SC2's downfall, and now you're supporting it.
I'm disappointed in you.

http://media.tumblr.com/tumblr_llt1c7LvPO1qfk9xh.jpg
"Gee, Brain, what do you want to do tonight?"
"The same thing we do every night, Pinky—try to destroy the Solidcoin!"
...
But really, how would a CUDA miner destroy Solidcoin? Was it supposed to be 3 times faster than Reaper or what? Even if it was, it would only help Coinhunter's project. There are a lot of GeForce owners and Nvidia's cards suck at Bitcoin.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: ineededausername on October 24, 2011, 08:44:43 PM

You should've taken CH's shitcoins and then released it anyways.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: BitterTea on October 24, 2011, 09:00:37 PM

Quote from: BitcoinEXpress on October 24, 2011, 08:54:07 PM

Yeah Actually I do, I like flushing out you Coinhunter clones and ass lickers.

But if it's no threat, why the twisted panty reaction ;D ;D ;D

BCE, you discredit yourself in my eyes when you make weak arguments like this... There's no indication from the provided screenshot that there is anything malicious in this file.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Spacy on October 24, 2011, 09:08:00 PM

Quote from: BitcoinEXpress on October 24, 2011, 09:03:33 PM

Quote from: BitterTea on October 24, 2011, 09:00:37 PM

Quote from: BitcoinEXpress on October 24, 2011, 08:54:07 PM

Yeah Actually I do, I like flushing out you Coinhunter clones and ass lickers.

But if it's no threat, why the twisted panty reaction ;D ;D ;D

BCE, you discredit yourself in my eyes when you make weak arguments like this... There's no indication from the provided screenshot that there is anything malicious in this file.

I just like screwing with these guys, I am well aware this means maybe, who knows, probably false positive. But nonetheless it was a Norton Alert LOL

Noob at the keyboard? Maybe you should ask your pro buddies what to do now, they surely will fix your computer ;)

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: johnj on October 24, 2011, 09:12:14 PM

Quote from: BitterTea on October 24, 2011, 09:00:37 PM

Quote from: BitcoinEXpress on October 24, 2011, 08:54:07 PM

Yeah Actually I do, I like flushing out you Coinhunter clones and ass lickers.

But if it's no threat, why the twisted panty reaction ;D ;D ;D

BCE, you discredit yourself in my eyes when you make weak arguments like this... There's no indication from the provided screenshot that there is anything malicious in this file.

I believe BCX is just getting wearisome from hashing the same things out with SC. At some point you kinda 'give up' on getting any reliable answers (source code, not just CH's word) and start pokin the fire for fun.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Lolcust on October 24, 2011, 09:31:10 PM

Perhaps evil, evil BTCEx could hack the workstation with the magical miner code ;) ?

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Bobnova on October 24, 2011, 11:05:20 PM

Quote from: MaGNeT on October 24, 2011, 08:22:55 PM

Quote from: BitcoinEXpress on October 24, 2011, 08:18:32 PM

Quote from: MaGNeT on October 24, 2011, 07:41:18 PM

We got a nice offer in SC if we stopped this project and never release source, so we decided to stop.

Speak for yourself, I didn't agree to it.

@Ten98
The transaction if it happens will occur in a transferred wallet file, avoiding the block explorer.

What will you do with the code?
I forgot to tell him you got a copy of the code :o

My guess? Talk about it a lot but never release it, as it never existed in reality.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Bitcoin Oz on October 25, 2011, 02:08:26 AM

Quote from: Spacy on October 24, 2011, 09:08:00 PM

Quote from: BitcoinEXpress on October 24, 2011, 09:03:33 PM

Quote from: BitterTea on October 24, 2011, 09:00:37 PM

Quote from: BitcoinEXpress on October 24, 2011, 08:54:07 PM

Yeah Actually I do, I like flushing out you Coinhunter clones and ass lickers.

But if it's no threat, why the twisted panty reaction ;D ;D ;D

BCE, you discredit yourself in my eyes when you make weak arguments like this... There's no indication from the provided screenshot that there is anything malicious in this file.

I just like screwing with these guys, I am well aware this means maybe, who knows, probably false positive. But nonetheless it was a Norton Alert LOL

Noob at the keyboard? Maybe you should ask your pro buddies what to do now, they surely will fix your computer ;)

Mac's dont get viruses ::)

lol

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Raoul Duke on October 25, 2011, 02:22:09 AM

@Magnet Well, at least they didn't offer you coins because you threatened to attack them, like lolclown did to douchebagexpress

Quote from: BitcoinMedia on October 25, 2011, 02:08:26 AM

Mac's dont get viruses ::)

lol

keep repeating that until you are convinced it's the truth and then watch yourself get owned...
There is no effective anti-virus against human stupidity...

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Raoul Duke on October 25, 2011, 02:35:58 AM

Quote from: BitcoinEXpress on October 25, 2011, 02:28:32 AM

@Psytard

Get the facts straight, Lolcust never paid me anything. Anonymous Namecoin people gave me 35,000 NMC after I called off the attack to show gratitude. That's all.

yes he did... 1,25million shitbrix... as you kindly admitted on a PM to someone else who will soon chime in and make you feel like a fool...

And they didn't give you the NMC as a sign of gratitude after you called off the attack, you called off the attack because they gave you the NMC...

Thanks for reminding me about namecoins also. That showed everybody who the retarded is here... and it ain't me, right, retardexpress?

I really like you, the more you talk the more you show what a dumbass you are!

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Raoul Duke on October 25, 2011, 02:41:25 AM

here is the thread and a moderator confirming you said it.

https://bitcointalk.org/index.php?topic=48245.0

So, who's the retard now?

You are a moron... btw, Tor is for pedo's like you...

And the fact that you are 850+ BTC richer from namecoin only makes you an extortionist criminal...

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: BitterTea on October 25, 2011, 02:42:33 AM

Quote from: psy on October 25, 2011, 02:41:25 AM

here is the thread and a moderator confirming you said it.

https://bitcointalk.org/index.php?topic=48245.0

So, who's the retard now?

You are a moron... btw, Tor is for pedo's like you...

I'm Satoshi. Any mod can confirm that I, BitterTea did in fact say this.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Raoul Duke on October 25, 2011, 02:43:31 AM

Quote from: BitterTea on October 25, 2011, 02:42:33 AM

Quote from: psy on October 25, 2011, 02:41:25 AM

here is the thread and a moderator confirming you said it.

https://bitcointalk.org/index.php?topic=48245.0

So, who's the retard now?

You are a moron... btw, Tor is for pedo's like you...

I'm Satoshi. Any mod can confirm that I, BitterTea did in fact say this.

ask lolclown...

but i follow you, we all know that BCX is a dumbass full of shit that never tells the truth, so why believe him...

It's fun to pwn retards(BCX and BitterTea) on their own game...

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: simonk83 on October 25, 2011, 02:46:47 AM

Quote from: psy on October 25, 2011, 02:35:58 AM

That showed everybody who the retarded is here... and it ain't me

Ah, the irony is delicious.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Raoul Duke on October 25, 2011, 03:03:53 AM

I told you before and I'll tell you again:

Chupa-me a pila e chama-me tarzan...

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: BitterTea on October 25, 2011, 03:05:23 AM

Quote from: psy on October 25, 2011, 03:03:53 AM

I told you before and I'll tell you again:

Chupa-me a pila e chama-me tarzan...

If you've got to obfuscate what you say so that others can't understand it, why say it at all?

Translated:

Quote

Suck my dick and call me tarzan

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Raoul Duke on October 25, 2011, 03:12:48 AM

Quote from: BitcoinEXpress on October 25, 2011, 03:06:08 AM

Ha ha if you only knew who I really was...

Is that you, Cheetah?

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 25, 2011, 04:42:40 PM

reaper 0.7's error handling is befuckered at best... to keep it from crashing with libcurl, change:

Code:

CURLcode code = curl_easy_perform(curl);
	if(code != CURLE_OK)
	{
		if (code == CURLE_COULDNT_CONNECT)
		{
			cout << "Could not connect. Server down?" << endl;
		}
		else
		{
			cout << "Error " << code << " submitting work. See http://curl.haxx.se/libcurl/c/libcurl-errors.html for error code explanations." << endl;
		}
	}
	curl_slist_free_all(headerlist);

So that it sleeps and retries until the request succeeds, rather than carrying on with a null response

edit: quickfix

Code:

        // Retry the request every 5 seconds until it succeeds, instead of
        // carrying on after a failed curl_easy_perform().
        // sleep() is the POSIX call from <unistd.h>.
        CURLcode code = curl_easy_perform(curl);
        while (code != CURLE_OK)
        {
            if (code == CURLE_COULDNT_CONNECT)
            {
                cout << "Could not connect. Server down?" << endl;
            }
            else
            {
                cout << "Error " << code << " submitting work. See http://curl.haxx.se/libcurl/c/libcurl-errors.html for error code explanations." << endl;
            }
            sleep(5);
            code = curl_easy_perform(curl);
        }
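
The quickfix above retries forever, which is fine for a miner you babysit but will spin indefinitely if the pool is gone for good. A minimal sketch of a bounded retry in C++, assuming a made-up helper name (retry_with_limit is not part of reaper) and a caller-supplied callback that would wrap the curl_easy_perform(curl) == CURLE_OK check:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

// Hypothetical helper, not part of reaper: call `attempt` until it returns
// true or `max_tries` attempts have been made, sleeping `delay` between
// failed attempts. Returns the number of the successful attempt (1-based),
// or 0 if every attempt failed.
int retry_with_limit(const std::function<bool()>& attempt,
                     int max_tries,
                     std::chrono::milliseconds delay)
{
    assert(max_tries > 0);
    for (int n = 1; n <= max_tries; ++n) {
        if (attempt()) {
            return n;
        }
        std::cerr << "attempt " << n << " of " << max_tries << " failed" << std::endl;
        if (n < max_tries) {
            std::this_thread::sleep_for(delay);  // no point sleeping after the last try
        }
    }
    return 0;
}
```

A return value of 0 lets the caller decide what to do next (drop the share, switch pool, exit), instead of blocking inside the submit call.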

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 26, 2011, 01:47:00 AM

Okay, this fix is better and just restarts the miner if it dies. simply run this script in bash:

Code:

until ./run_reaper.sh; do
    echo "Server 'run_reaper.sh' crashed with exit code $?.  Respawning.." >&2
    sleep 1
done

where run_reaper.sh is the script to run reaper

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 27, 2011, 01:06:36 AM

Bumping again, having a lot of fun here... optimized Coinhunter's code for mining and now I'm pulling 150 kh/s on a single GTX 570. I'll publish it later when I'm done with more optimizations. The OpenCL multidevice coding is totally fucked by the use of pthread, and multi-GPU setups will take about a 70% performance hit for the second GPU. This cannot be fixed by running separate instances, because there are segfaults if you set it to use only a 2nd or 3rd etc. device.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 27, 2011, 01:22:26 AM

Here is the more optimized RSHash.cpp

Code:

#include "RSHash.h"
#include "Blake512.h"
#include "SHA256.h"
#include <stdint.h>
#include <iostream>
using std::cout;
using std::endl;

#define PHI 0x9e3779b9
#define BLOCKHASH_1_PADSIZE (1024*1024*4)

typedef unsigned int uint32;
typedef unsigned long long int uint64;

static uint32 BlockHash_1_Q[4096],BlockHash_1_c,BlockHash_1_i;
unsigned char *BlockHash_1_MemoryPAD8;
uint32 *BlockHash_1_MemoryPAD32;

uint32 BlockHash_1_rand(void)
{
    uint32 x, r = 0xfffffffe;
    uint64 t, a = 18782LL;
    BlockHash_1_i = (BlockHash_1_i + 1) & 4095;
    t = a * BlockHash_1_Q[BlockHash_1_i] + BlockHash_1_c;
    BlockHash_1_c = (t >> 32);
    x = (t + BlockHash_1_c)&0xFFFFFFFF;
    (x < BlockHash_1_c) && ( ++x, ++BlockHash_1_c );  // comma, not &&: with "x++ && c++" the carry bump is skipped whenever x was 0
    return (BlockHash_1_Q[BlockHash_1_i] = r - x);
}

#include <cstdio>

void BlockHash_Init()
{
    static unsigned char SomeArrogantText1[]="Back when I was born the world was different. As a kid I could run around the streets, build things in the forest, go to the beach and generally live a care free life. Sure I had video games and played them a fair amount but they didn't get in the way of living an adventurous life. The games back then were different too. They didn't require 40 hours of your life to finish. Oh the good old days, will you ever come back?";
    static unsigned char SomeArrogantText2[]="Why do most humans not understand their shortcomings? The funny thing with the human brain is it makes everyone arrogant at their core. Sure some may fight it more than others but in every brain there is something telling them, HEY YOU ARE THE MOST IMPORTANT PERSON IN THE WORLD. THE CENTER OF THE UNIVERSE. But we can't all be that, can we? Well perhaps we can, introducing GODria, take 2 pills of this daily and you can be like RealSolid, lord of the universe.";
    static unsigned char SomeArrogantText3[]="What's up with kids like artforz that think it's good to attack other's work? He spent a year in the bitcoin scene riding on the fact he took some other guys SHA256 opencl code and made a miner out of it. Bravo artforz, meanwhile all the false praise goes to his head and he thinks he actually is a programmer. Real programmers innovate and create new work, they win through being better coders with better ideas. You're not real artforz, and I hear you like furries? What's up with that? You shouldn't go on IRC when you're drunk, people remember the weird stuff.";
    BlockHash_1_MemoryPAD8 = new unsigned char[BLOCKHASH_1_PADSIZE+8];  //need the +8 for memory overwrites
    BlockHash_1_MemoryPAD32 = (uint32*)BlockHash_1_MemoryPAD8;

    BlockHash_1_Q[0] = 0x6970F271;
    BlockHash_1_Q[1] = 0x6970F271 + PHI;
    BlockHash_1_Q[2] = 0x6970F271 + PHI + PHI;
    for (int i = 3; i < 4096; ++i)  BlockHash_1_Q[i] = BlockHash_1_Q[i - 3] ^ BlockHash_1_Q[i - 2] ^ PHI ^ i;
    BlockHash_1_c=362436;
    BlockHash_1_i=4095;

    int count1=0,count2=0,count3=0;
    for(int x=0;x<(BLOCKHASH_1_PADSIZE/4)+2;++x)  BlockHash_1_MemoryPAD32[x] = BlockHash_1_rand();
    for(int x=0;x<BLOCKHASH_1_PADSIZE+8;++x)
    {
        switch(BlockHash_1_MemoryPAD8[x]&3)
        {
            case 0: BlockHash_1_MemoryPAD8[x] ^= SomeArrogantText1[count1++]; if(count1>=sizeof(SomeArrogantText1)) count1=0; break;
            case 1: BlockHash_1_MemoryPAD8[x] ^= SomeArrogantText2[count2++]; if(count2>=sizeof(SomeArrogantText2)) count2=0; break;
            case 2: BlockHash_1_MemoryPAD8[x] ^= SomeArrogantText3[count3++]; if(count3>=sizeof(SomeArrogantText3)) count3=0; break;
            case 3: BlockHash_1_MemoryPAD8[x] ^= 0xAA; break;
        }
    }
}

void BlockHash_DeInit()
{
    delete[] BlockHash_1_MemoryPAD8;
}

const uint32 PAD_MASK = BLOCKHASH_1_PADSIZE-1;
typedef unsigned char uchar;

bool BlockHash_1(unsigned char *p512bytes, unsigned char* final_hash)
{
    //0->127   is the block header      (128)
    //128->191 is blake(blockheader)    (64)
    //192->511 is scratch work area     (320)

    unsigned char *work1 = p512bytes;
    unsigned char *work2=work1+128;
    unsigned char *work3=work1+192;

    blake512_hash(work2,work1);

    //setup the 320 scratch with some base values
    work3[0] = work2[15];
    for(int x=1;x<320;++x)
    {
        work3[x-1] ^= work2[x&63];
        (work3[x-1]<0x80) ? work3[x]=work2[(x+work3[x-1])&63] : work3[x]=work1[(x+work3[x-1])&127];
    }
	
    #define READ_PAD8(offset) BlockHash_1_MemoryPAD8[(offset)&PAD_MASK]
    #define READ_PAD32(offset) (*((uint32*)&BlockHash_1_MemoryPAD8[(offset)&PAD_MASK]))

	uint64 qCount = *((uint64*)&work3[310]);
    int nExtra=READ_PAD8(qCount+work3[300])>>3;
    for(int x=1;x<512+nExtra;++x)
    {
        qCount+= READ_PAD32( qCount );
        qCount&0x87878700 && work3[qCount%320]++;

        qCount-= READ_PAD8( qCount+work3[qCount%160] );
        qCount&0x80000000 ? qCount+= READ_PAD8( qCount&0x8080FFFF ) : qCount+= READ_PAD32( qCount&0x7F60FAFB );

        qCount+= READ_PAD32( qCount+work3[qCount%160] );
        qCount&0xF0000000 && work3[qCount%320]++;

        qCount+= READ_PAD32( *((uint32*)&work3[qCount&0xFF]) );
        work3[x%320]=work2[x&63]^uchar(qCount);

        qCount+= READ_PAD32( (qCount>>32)+work3[x%200] );
        *((uint32*)&work3[qCount%316]) ^= (qCount>>24)&0xFFFFFFFF;
        ((qCount&0x07)==0x03) && x++;
        qCount-= READ_PAD8( (x*x) );
        ((qCount&0x07)==0x01) && x++;
     }

     Sha256(work1, final_hash);
     return true;
}

I eliminated all of the cool if/else statements coinhunter was for some reason so proud of before, I'm not sure why exactly. It appears to get +3% or so performance for me.
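
For anyone doing the same kind of rewrite, here is a small C++ sketch of one thing to watch when turning an if block with two increments into a short-circuit expression. The function names are made up for illustration; the point is only that a chained `x++ && c++` is not the same as `if (...) { x++; c++; }`, because `x++` evaluates to the old value of x.

```cpp
#include <cstdint>

// Reference form: both counters are bumped whenever x < c.
void carry_if(uint32_t& x, uint32_t& c)
{
    if (x < c) { x++; c++; }
}

// Chained short-circuit form: c++ only runs when the OLD value of x was
// non-zero, because x++ yields the value before the increment.
void carry_chained(uint32_t& x, uint32_t& c)
{
    (void)((x < c) && (x++ && c++));
}

// Comma form: both increments always run when x < c, matching carry_if.
void carry_comma(uint32_t& x, uint32_t& c)
{
    (void)((x < c) && (++x, ++c));
}
```

The chained form and the if form only disagree when x is 0 going in, so a quick benchmark run will almost never show the difference.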

Quote

Wait. Isn't he a $150 an hour, unemployed coding genius who spent 20,000 hours on this? I think you are just misunderstanding the code.

The multiGPU coding appears fucked, actually I seem to get lower hash rates using two GPUs instead of putting "device 0" in the config file and only using the first one. ??? If you put "device 1" in the reaper config though, SEGFAULT!

Edit: device 1 works on another motherboard, I'm wondering if maybe something is weird with my operating system on this computer... too tired, going home.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Intention on October 27, 2011, 03:04:23 AM

Attempted to try your code taco but I get a build error from /usr/lib/ld about -lOpenCL missing.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 27, 2011, 03:48:23 AM

Quote from: Intention on October 27, 2011, 03:04:23 AM

Attempted to try your code taco but I get a build error from /usr/lib/ld about -lOpenCL missing.

You need to install the opencl library

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Intention on October 27, 2011, 06:17:37 AM

Quote from: tacotime on October 27, 2011, 03:48:23 AM

Quote from: Intention on October 27, 2011, 03:04:23 AM

Attempted to try your code taco but I get a build error from /usr/lib/ld about -lOpenCL missing.

You need to install the opencl library

I attempted to; however, as a Linux n00b, everything is pretty open-ended. I believe I installed the ATI Stream SDK, which came with the OpenCL stuff, even though the computer itself has an older Nvidia card. I was just hoping to compile it on there for my Windows PC, which has Radeon cards.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 28, 2011, 05:19:40 PM

Okay, I've been playing around with the code for a few days now.

- The if/else statements can be avoided but have virtually no effect on the speed of GPU (while the CPU code seems to paradoxically benefit a little). This goes against everything CH said about if/else statements being difficult for the GPU.
- The main problem with reaper's implementation is that it does not allow per-compute-unit parallelization. The code is run in the OCL kernel, but not in parallel; ideally the search() function that is executed should be handed 32 or more data sets to work on, process all of them in a for loop, then sync with a barrier and write the results to global memory. The current code hands one data set to one compute unit and likely takes a very hard hit in terms of parallelization. With all the coprocessors working on the data set, a speedup on the order of a magnitude for GPUs should be possible (several megahashes per second).

Whoever has time to do this and cares, have a look at the OCL examples from nVidia and how they parallelize much more effectively than the current reaper code: http://developer.nvidia.com/opencl-sdk-code-samples
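
To show what "handing 32 or more data sets" to one invocation would mean in terms of indexing, here is a rough C++ sketch of the nonce arithmetic only. The function name and the base/batch parameters are assumptions for illustration, not anything in reaper, and it says nothing about whether the batched layout would actually run faster.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical index arithmetic: work-item `global_id` handles `batch`
// consecutive nonces, starting at base + global_id * batch, so neighbouring
// work-items cover adjacent, non-overlapping ranges.
std::vector<uint32_t> nonces_for_work_item(uint32_t base, uint32_t global_id, uint32_t batch)
{
    assert(batch > 0);
    std::vector<uint32_t> out;
    out.reserve(batch);
    const uint32_t first = base + global_id * batch;
    for (uint32_t k = 0; k < batch; ++k) {
        out.push_back(first + k);
    }
    return out;
}
```

With batch set to 1 this collapses to one nonce per work-item, i.e. the global id itself plus the base.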

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: makomk on October 28, 2011, 05:53:23 PM

Quote from: tacotime on October 28, 2011, 05:19:40 PM

- The if/else statements can be avoided but have virtually no effect on the speed of GPU (while the CPU code seems to paradoxically benefit a little). This goes against everything CH said about if/else statements being difficult for the GPU.

Well yeah, it wouldn't. Certain kinds of if-else statements cause poor performance on GPUs, but I don't think the kind of branching he's using will, whereas if you don't have a clever compiler you will pay a penalty for it on CPUs.
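
To make the "certain kinds of if-else" point a bit more concrete, here is a toy cost model in C++. It is a simplification and an assumption, not a measurement of any real card: lanes in one SIMD group are taken to run in lockstep, so the group pays for every path that at least one lane takes. The function name and the cycle counts are invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Simplified divergence model (an assumption, not a benchmark): if any lane
// in the group takes the "then" path, the whole group spends then_cost; if
// any lane takes the "else" path, the whole group also spends else_cost.
int group_cost(const std::vector<bool>& lane_takes_then, int then_cost, int else_cost)
{
    assert(!lane_takes_then.empty());
    bool any_then = false;
    bool any_else = false;
    for (bool takes_then : lane_takes_then) {
        if (takes_then) {
            any_then = true;
        } else {
            any_else = true;
        }
    }
    return (any_then ? then_cost : 0) + (any_else ? else_cost : 0);
}
```

Under this model a branch only hurts when lanes disagree and the two sides are expensive; a one-instruction conditional increment costs next to nothing either way.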

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: Ten98 on October 29, 2011, 10:32:38 AM

Tacotime I've tried your optimised code and it runs no faster.

OpenCL automatically sends multiple workloads to the GPU compute units to work on in parallel; you don't have to do it yourself in the code, so I don't think your theory about making the code more parallel to speed things up will hold true.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on October 31, 2011, 02:22:48 PM

I benched it earlier and it's about 3% faster, you should see some improvement.

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: tacotime on November 10, 2011, 05:36:41 PM

bump, I'm wondering if anyone with an AMD card has tried AMD's bitwise rotation function, it's supposed to be much faster than coding for it in OCL:

Code:

//#pragma OPENCL EXTENSION cl_amd_media_ops : enable
//#define rot(x,y) amd_bitalign(x, x, (32-y))
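
For anyone without an AMD card who wants to check that the macro really is a rotate-left, here is a CPU-side sketch in C++. bitalign_emul is my own emulation, based on my understanding that amd_bitalign(src0, src1, src2) treats src0:src1 as a 64-bit value, shifts it right by (src2 & 31) and returns the low 32 bits; treat that definition as an assumption and check it against the extension documentation. All three function names are made up.

```cpp
#include <cstdint>

// Plain rotate-left; n is reduced mod 32 so n == 0 never shifts by 32.
uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31u;
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

// Assumed semantics of amd_bitalign: low 32 bits of
// ((src0 << 32) | src1) >> (src2 & 31).
uint32_t bitalign_emul(uint32_t src0, uint32_t src1, uint32_t src2)
{
    const uint64_t wide = (static_cast<uint64_t>(src0) << 32) | src1;
    return static_cast<uint32_t>(wide >> (src2 & 31u));
}

// Mirrors "#define rot(x,y) amd_bitalign(x, x, (32-y))".
uint32_t rot_via_bitalign(uint32_t x, uint32_t y)
{
    return bitalign_emul(x, x, 32u - y);
}
```

Under that assumed definition the two agree for every rotate count from 0 to 31, so the macro is a drop-in replacement for the shift-and-or version.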

edit: Appears someone uploaded an AMD optimized version of reaper.cl to pastebin, here it is:

Code:

typedef uint uint32_t;
typedef ulong uint64_t;
typedef uchar uint8_t;

typedef uint uint32;
typedef ulong uint64;

#define U8TO32(p) \
  (((uint32_t)((p)[0]) << 24) | ((uint32_t)((p)[1]) << 16) | \
   ((uint32_t)((p)[2]) <<  8) | ((uint32_t)((p)[3])      ))
#define U8TO64(p) \
  (((uint64_t)U8TO32(p) << 32) | (uint64_t)U8TO32((p) + 4))
#define U32TO8(p, v) \
    (p)[0] = (uint8_t)((v) >> 24); (p)[1] = (uint8_t)((v) >> 16); \
    (p)[2] = (uint8_t)((v) >>  8); (p)[3] = (uint8_t)((v)      ); 
#define U64TO8(p, v) \
    U32TO8((p),     (uint32_t)((v) >> 32));	\
    U32TO8((p) + 4, (uint32_t)((v)      )); 

#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#pragma OPENCL EXTENSION cl_amd_media_ops : enable

/*typedef struct  { 
  uint64_t h[8];
  uint8_t buf[128];
} state;*/

__constant uint8_t sigma[256] = 
{
     0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15 ,
    14,10, 4, 8, 9,15,13, 6, 1,12, 0, 2,11, 7, 5, 3 ,
    11, 8,12, 0, 5, 2,15,13,10,14, 3, 6, 7, 1, 9, 4 ,
     7, 9, 3, 1,13,12,11,14, 2, 6, 5,10, 4, 0,15, 8 ,
     9, 0, 5, 7, 2, 4,10,15,14, 1,11,12, 6, 8, 3,13 ,
     2,12, 6,10, 0,11, 8, 3, 4,13, 7, 5,15,14, 1, 9 ,
    12, 5, 1,15,14,13, 4,10, 0, 7, 6, 3, 9, 2, 8,11 ,
    13,11, 7,14,12, 1, 3, 9, 5, 0,15, 4, 8, 6, 2,10 ,
     6,15,14, 9,11, 3, 0, 8,12, 2,13, 7, 1, 4,10, 5 ,
    10, 2, 8, 4, 7, 6, 1, 5,15,11, 9,14, 3,12,13 ,0 ,
     0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15 ,
    14,10, 4, 8, 9,15,13, 6, 1,12, 0, 2,11, 7, 5, 3 ,
    11, 8,12, 0, 5, 2,15,13,10,14, 3, 6, 7, 1, 9, 4 ,
     7, 9, 3, 1,13,12,11,14, 2, 6, 5,10, 4, 0,15, 8 ,
     9, 0, 5, 7, 2, 4,10,15,14, 1,11,12, 6, 8, 3,13 ,
     2,12, 6,10, 0,11, 8, 3, 4,13, 7, 5,15,14, 1, 9 
};

__constant uint64_t cst[16] = 
{
  0x243F6A8885A308D3UL,0x13198A2E03707344UL,0xA4093822299F31D0UL,0x082EFA98EC4E6C89UL,
  0x452821E638D01377UL,0xBE5466CF34E90C6CUL,0xC0AC29B7C97C50DDUL,0x3F84D5B5B5470917UL,
  0x9216D5D98979FB1BUL,0xD1310BA698DFB5ACUL,0x2FFD72DBD01ADFB7UL,0xB8E1AFED6A267E96UL,
  0xBA7C9045F12C7F99UL,0x24A19947B3916CF7UL,0x0801F2E2858EFC16UL,0x636920D871574E69UL
};
  
__constant uint K[64] = 
{ 
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

//uint rotl(uint x, uint y)
//{
//	return (x<<y)|(x>>(32-y));
//}

#define rotl(x, y) amd_bitalign(x, x, (uint)(32 - y))


//#define Ch(x, y, z) (z ^ (x & (y ^ z)))
#define Ma(x, y, z) ((y & z) | (x & (y | z)))

#define Ch(x, y, z) bitselect(z,y,x)
// Ma can also be implemented in terms of bitselect
//#define Ma(y, z, x) bitselect(z^x,y,x)


#define Tr(x,a,b,c) (rotl(x,a)^rotl(x,b)^rotl(x,c))

#define R(x) (work[x] = (rotl(work[x-2],15)^rotl(work[x-2],13)^((work[x-2])>>10)) + work[x-7] + (rotl(work[x-15],25)^rotl(work[x-15],14)^((work[x-15])>>3)) + work[x-16])
#define sharound(a,b,c,d,e,f,g,h,x,K) h+=Tr(e,7,21,26)+Ch(e,f,g)+K+x; d+=h; h+=Tr(a,10,19,30)+Ma(a,b,c);
#define sharound_s(a,b,c,d,e,f,g,h,x) h+=Tr(e,7,21,26)+Ch(e,f,g)+x; d+=h; h+=Tr(a,10,19,30)+Ma(a,b,c);

uint EndianSwap(uint n)
{
	return ((n&0xFF)<<24) | ((n&0xFF00)<<8) | ((n&0xFF0000)>>8) | ((n&0xFF000000)>>24);
}

void Sha256_round(uint* s, unsigned char* data)
{
	uint work[64];

	uint* udata = (uint*)data;
#pragma unroll
	for(uint i=0; i<16; ++i)
	{
		work[i] = EndianSwap(udata[i]);
	}

	uint A = s[0];
	uint B = s[1];
	uint C = s[2];
	uint D = s[3];
	uint E = s[4];
	uint F = s[5];
	uint G = s[6];
	uint H = s[7];
	sharound(A,B,C,D,E,F,G,H,work[0],K[0]);
	sharound(H,A,B,C,D,E,F,G,work[1],K[1]);
	sharound(G,H,A,B,C,D,E,F,work[2],K[2]);
	sharound(F,G,H,A,B,C,D,E,work[3],K[3]);
	sharound(E,F,G,H,A,B,C,D,work[4],K[4]);
	sharound(D,E,F,G,H,A,B,C,work[5],K[5]);
	sharound(C,D,E,F,G,H,A,B,work[6],K[6]);
	sharound(B,C,D,E,F,G,H,A,work[7],K[7]);
	sharound(A,B,C,D,E,F,G,H,work[8],K[8]);
	sharound(H,A,B,C,D,E,F,G,work[9],K[9]);
	sharound(G,H,A,B,C,D,E,F,work[10],K[10]);
	sharound(F,G,H,A,B,C,D,E,work[11],K[11]);
	sharound(E,F,G,H,A,B,C,D,work[12],K[12]);
	sharound(D,E,F,G,H,A,B,C,work[13],K[13]);
	sharound(C,D,E,F,G,H,A,B,work[14],K[14]);
	sharound(B,C,D,E,F,G,H,A,work[15],K[15]);
	sharound(A,B,C,D,E,F,G,H,R(16),K[16]);
	sharound(H,A,B,C,D,E,F,G,R(17),K[17]);
	sharound(G,H,A,B,C,D,E,F,R(18),K[18]);
	sharound(F,G,H,A,B,C,D,E,R(19),K[19]);
	sharound(E,F,G,H,A,B,C,D,R(20),K[20]);
	sharound(D,E,F,G,H,A,B,C,R(21),K[21]);
	sharound(C,D,E,F,G,H,A,B,R(22),K[22]);
	sharound(B,C,D,E,F,G,H,A,R(23),K[23]);
	sharound(A,B,C,D,E,F,G,H,R(24),K[24]);
	sharound(H,A,B,C,D,E,F,G,R(25),K[25]);
	sharound(G,H,A,B,C,D,E,F,R(26),K[26]);
	sharound(F,G,H,A,B,C,D,E,R(27),K[27]);
	sharound(E,F,G,H,A,B,C,D,R(28),K[28]);
	sharound(D,E,F,G,H,A,B,C,R(29),K[29]);
	sharound(C,D,E,F,G,H,A,B,R(30),K[30]);
	sharound(B,C,D,E,F,G,H,A,R(31),K[31]);
	sharound(A,B,C,D,E,F,G,H,R(32),K[32]);
	sharound(H,A,B,C,D,E,F,G,R(33),K[33]);
	sharound(G,H,A,B,C,D,E,F,R(34),K[34]);
	sharound(F,G,H,A,B,C,D,E,R(35),K[35]);
	sharound(E,F,G,H,A,B,C,D,R(36),K[36]);
	sharound(D,E,F,G,H,A,B,C,R(37),K[37]);
	sharound(C,D,E,F,G,H,A,B,R(38),K[38]);
	sharound(B,C,D,E,F,G,H,A,R(39),K[39]);
	sharound(A,B,C,D,E,F,G,H,R(40),K[40]);
	sharound(H,A,B,C,D,E,F,G,R(41),K[41]);
	sharound(G,H,A,B,C,D,E,F,R(42),K[42]);
	sharound(F,G,H,A,B,C,D,E,R(43),K[43]);
	sharound(E,F,G,H,A,B,C,D,R(44),K[44]);
	sharound(D,E,F,G,H,A,B,C,R(45),K[45]);
	sharound(C,D,E,F,G,H,A,B,R(46),K[46]);
	sharound(B,C,D,E,F,G,H,A,R(47),K[47]);
	sharound(A,B,C,D,E,F,G,H,R(48),K[48]);
	sharound(H,A,B,C,D,E,F,G,R(49),K[49]);
	sharound(G,H,A,B,C,D,E,F,R(50),K[50]);
	sharound(F,G,H,A,B,C,D,E,R(51),K[51]);
	sharound(E,F,G,H,A,B,C,D,R(52),K[52]);
	sharound(D,E,F,G,H,A,B,C,R(53),K[53]);
	sharound(C,D,E,F,G,H,A,B,R(54),K[54]);
	sharound(B,C,D,E,F,G,H,A,R(55),K[55]);
	sharound(A,B,C,D,E,F,G,H,R(56),K[56]);
	sharound(H,A,B,C,D,E,F,G,R(57),K[57]);
	sharound(G,H,A,B,C,D,E,F,R(58),K[58]);
	sharound(F,G,H,A,B,C,D,E,R(59),K[59]);
	sharound(E,F,G,H,A,B,C,D,R(60),K[60]);
	sharound(D,E,F,G,H,A,B,C,R(61),K[61]);
	sharound(C,D,E,F,G,H,A,B,R(62),K[62]);
	sharound(B,C,D,E,F,G,H,A,R(63),K[63]);

	s[0] += A;
	s[1] += B;
	s[2] +=	C;
	s[3] += D;
	s[4] += E;
	s[5] += F;
	s[6] += G;
	s[7] += H;
}

__constant uint P[64] =
{
	0xc28a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19c0174,
	0x649b69c1, 0xf9be478a, 0x0fe1edc6, 0x240ca60c, 0x4fe9346f, 0x4d1c84ab, 0x61b94f1e, 0xf6f993db,
	0xe8465162, 0xad13066f, 0xb0214c0d, 0x695a0283, 0xa0323379, 0x2bd376e9, 0xe1d0537c, 0x03a244a0,
	0xfc13a4a5, 0xfafda43e, 0x56bea8bb, 0x445ec9b6, 0x39907315, 0x8c0d4e9f, 0xc832dccc, 0xdaffb65b,
	0x1fed4f61, 0x2f646808, 0x1ff32294, 0x2634ccd7, 0xb0ebdefa, 0xd6fc592b, 0xa63c5c8f, 0xbe9fbab9,
	0x0158082c, 0x68969712, 0x51e1d7e1, 0x5cf12d0d, 0xc4be2155, 0x7d7c8a34, 0x611f2c60, 0x036324af,
	0xa4f08d87, 0x9e3e8435, 0x2c6dae30, 0x11921afc, 0xb76d720e, 0x245f3661, 0xc3a65ecb, 0x43b9e908
};

void Sha256_round_padding(uint* s)
{
	uint A = s[0];
	uint B = s[1];
	uint C = s[2];
	uint D = s[3];
	uint E = s[4];
	uint F = s[5];
	uint G = s[6];
	uint H = s[7];
	sharound_s(A,B,C,D,E,F,G,H,P[0]);
	sharound_s(H,A,B,C,D,E,F,G,P[1]);
	sharound_s(G,H,A,B,C,D,E,F,P[2]);
	sharound_s(F,G,H,A,B,C,D,E,P[3]);
	sharound_s(E,F,G,H,A,B,C,D,P[4]);
	sharound_s(D,E,F,G,H,A,B,C,P[5]);
	sharound_s(C,D,E,F,G,H,A,B,P[6]);
	sharound_s(B,C,D,E,F,G,H,A,P[7]);
	sharound_s(A,B,C,D,E,F,G,H,P[8]);
	sharound_s(H,A,B,C,D,E,F,G,P[9]);
	sharound_s(G,H,A,B,C,D,E,F,P[10]);
	sharound_s(F,G,H,A,B,C,D,E,P[11]);
	sharound_s(E,F,G,H,A,B,C,D,P[12]);
	sharound_s(D,E,F,G,H,A,B,C,P[13]);
	sharound_s(C,D,E,F,G,H,A,B,P[14]);
	sharound_s(B,C,D,E,F,G,H,A,P[15]);
	sharound_s(A,B,C,D,E,F,G,H,P[16]);
	sharound_s(H,A,B,C,D,E,F,G,P[17]);
	sharound_s(G,H,A,B,C,D,E,F,P[18]);
	sharound_s(F,G,H,A,B,C,D,E,P[19]);
	sharound_s(E,F,G,H,A,B,C,D,P[20]);
	sharound_s(D,E,F,G,H,A,B,C,P[21]);
	sharound_s(C,D,E,F,G,H,A,B,P[22]);
	sharound_s(B,C,D,E,F,G,H,A,P[23]);
	sharound_s(A,B,C,D,E,F,G,H,P[24]);
	sharound_s(H,A,B,C,D,E,F,G,P[25]);
	sharound_s(G,H,A,B,C,D,E,F,P[26]);
	sharound_s(F,G,H,A,B,C,D,E,P[27]);
	sharound_s(E,F,G,H,A,B,C,D,P[28]);
	sharound_s(D,E,F,G,H,A,B,C,P[29]);
	sharound_s(C,D,E,F,G,H,A,B,P[30]);
	sharound_s(B,C,D,E,F,G,H,A,P[31]);
	sharound_s(A,B,C,D,E,F,G,H,P[32]);
	sharound_s(H,A,B,C,D,E,F,G,P[33]);
	sharound_s(G,H,A,B,C,D,E,F,P[34]);
	sharound_s(F,G,H,A,B,C,D,E,P[35]);
	sharound_s(E,F,G,H,A,B,C,D,P[36]);
	sharound_s(D,E,F,G,H,A,B,C,P[37]);
	sharound_s(C,D,E,F,G,H,A,B,P[38]);
	sharound_s(B,C,D,E,F,G,H,A,P[39]);
	sharound_s(A,B,C,D,E,F,G,H,P[40]);
	sharound_s(H,A,B,C,D,E,F,G,P[41]);
	sharound_s(G,H,A,B,C,D,E,F,P[42]);
	sharound_s(F,G,H,A,B,C,D,E,P[43]);
	sharound_s(E,F,G,H,A,B,C,D,P[44]);
	sharound_s(D,E,F,G,H,A,B,C,P[45]);
	sharound_s(C,D,E,F,G,H,A,B,P[46]);
	sharound_s(B,C,D,E,F,G,H,A,P[47]);
	sharound_s(A,B,C,D,E,F,G,H,P[48]);
	sharound_s(H,A,B,C,D,E,F,G,P[49]);
	sharound_s(G,H,A,B,C,D,E,F,P[50]);
	sharound_s(F,G,H,A,B,C,D,E,P[51]);
	sharound_s(E,F,G,H,A,B,C,D,P[52]);
	sharound_s(D,E,F,G,H,A,B,C,P[53]);
	sharound_s(C,D,E,F,G,H,A,B,P[54]);
	sharound_s(B,C,D,E,F,G,H,A,P[55]);
	sharound_s(A,B,C,D,E,F,G,H,P[56]);
	sharound_s(H,A,B,C,D,E,F,G,P[57]);
	sharound_s(G,H,A,B,C,D,E,F,P[58]);
	sharound_s(F,G,H,A,B,C,D,E,P[59]);
	sharound_s(E,F,G,H,A,B,C,D,P[60]);

	s[7] += H;
}



#define ROT(x,n) (((x)<<(64-n))|( (x)>>(n)))

#define G(m,a,b,c,d,e,i)					\
  v[a] += (m[sigma[i+e]] ^ cst[sigma[i+e+1]]) + v[b];	\
  v[d] = ROT( v[d] ^ v[a],32);				\
  v[c] += v[d];						\
  v[b] = ROT( v[b] ^ v[c],25);				\
  v[a] += (m[sigma[i+e+1]] ^ cst[sigma[i+e]])+v[b];	\
  v[d] = ROT( v[d] ^ v[a],16);				\
  v[c] += v[d];						\
  v[b] = ROT( v[b] ^ v[c],11);				

//assumes input is 512 bytes
__kernel void search(__global uint8_t* in_param, __global uint* out_param, __global uint8_t* pad) 
{
	uchar in[512];
#pragma unroll
	for(uint i=0; i<128; ++i)
	in[i] = in_param[i];

	uint nonce = get_global_id(0);
	
	*(uint*)(in+108) = nonce;

	uint64_t h[8];
	h[0]=0x6A09E667F3BCC908UL;
	h[1]=0xBB67AE8584CAA73BUL;
	h[2]=0x3C6EF372FE94F82BUL;
	h[3]=0xA54FF53A5F1D36F1UL;
	h[4]=0x510E527FADE682D1UL;
	h[5]=0x9B05688C2B3E6C1FUL;
	h[6]=0x1F83D9ABFB41BD6BUL;
	h[7]=0x5BE0CD19137E2179UL;

	uint64_t v[16];
#pragma unroll
	for(uint i=0; i< 8;++i)  v[i] = h[i];
	v[ 8] = 0x243F6A8885A308D3UL;
	v[ 9] = 0x13198A2E03707344UL;
	v[10] = 0xA4093822299F31D0UL;
	v[11] = 0x082EFA98EC4E6C89UL;
	v[12] = 0x452821E638D01777UL;
	v[13] = 0xBE5466CF34E9086CUL;
	v[14] = 0xC0AC29B7C97C50DDUL;
	v[15] = 0x3F84D5B5B5470917UL;

	{
		uint64_t m[16];
#pragma unroll
		for(uint i=0; i<16;++i)  m[i] = U8TO64(in + i*8);
		uint i=0;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
	} 

#pragma unroll
	for(uint i=0; i<16;++i)  h[i&7] ^= v[i]; 

#pragma unroll
	for(uint i=0; i< 8;++i)  v[i] = h[i];
	v[8] = 0x243F6A8885A308D3UL;
	v[9] = 0x13198A2E03707344UL;
	v[10] = 0xA4093822299F31D0UL;
	v[11] = 0x082EFA98EC4E6C89UL;
	v[12] = 0x452821E638D01377UL;
	v[13] = 0xBE5466CF34E90C6CUL;
	v[14] = 0xC0AC29B7C97C50DDUL;
	v[15] = 0x3F84D5B5B5470917UL;

	{
		uint64_t m2[16] = {1UL << 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0x400};
		uint i=0;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
	} 

#pragma unroll
	for(uint i=0; i<16;++i)  h[i&7] ^= v[i];

	uint8_t* work2 = in+128;

	U64TO8( work2 + 0, h[0]);
	U64TO8( work2 + 8, h[1]);
	U64TO8( work2 +16, h[2]);
	U64TO8( work2 +24, h[3]);
	U64TO8( work2 +32, h[4]);
	U64TO8( work2 +40, h[5]);
	U64TO8( work2 +48, h[6]);
	U64TO8( work2 +56, h[7]);
	
	uint8_t* work3 = work2+64;
//a = x-1, b = x, c = x&63
#define WORKINIT(a,b,c)   work3[a] ^= work2[c]; \
		if(work3[a]&0x80) work3[b]=in[(b+work3[a])&0x7F]; \
		else              work3[b]=work2[(b+work3[a])&0x3F];

	
	work3[0] = work2[15];
	WORKINIT(0,1,1);
	WORKINIT(1,2,2);
	WORKINIT(2,3,3);
#pragma unroll
	for(int x=4;x<64;++x)
	{
		WORKINIT(x-1,x,x);
		++x;
		WORKINIT(x-1,x,x);
		++x;
		WORKINIT(x-1,x,x);
		++x;
		WORKINIT(x-1,x,x);
	}
#pragma unroll
	for(int x=64;x<320;++x)
	{
		WORKINIT(x-1,x,x&63);
		++x;
		WORKINIT(x-1,x,x&63);
		++x;
		WORKINIT(x-1,x,x&63);
		++x;
		WORKINIT(x-1,x,x&63);
	}

	#define READ_PAD32_R(offset) ((uint)pad[offset] | (((uint)pad[offset+1])<<8) | (((uint)pad[offset+2])<<16) | (((uint)pad[offset+3])<<24))
	
	#define READ_W32(offset) ((uint)work3[offset] + (((uint)work3[(offset)+1])<<8) + (((uint)work3[(offset)+2]&0x3F)<<16))

	ushort* shortptr = (ushort*)(work3+310);
	uint64 qCount = shortptr[0];
	qCount |= ((uint64)shortptr[3])<<48;
	uint* uintptr = (uint*)(work3+312);
	qCount |= ((uint64)*uintptr)<<16;

	uint nExtra=(pad[(qCount+work3[300])&0x3FFFFF]>>3)+512;
#pragma unroll
	for(uint x=1;x<nExtra;++x)
	{
		uint res = 0;
		qCount += READ_PAD32_R((qCount&0x3FFFFF));
		work3[qCount%320] += (qCount&0x87878700) ? 1 : 0;

		qCount-= pad[(qCount+work3[qCount%160])&0x3FFFFF];
		
		if(qCount&0x80000000)   { qCount+= pad[qCount&0xFFFF]; }
		else                    { res = qCount&0x20FAFB; qCount+= READ_PAD32_R(res); }

		res = (qCount+work3[qCount%160]) & 0x3FFFFF;
		qCount+= READ_PAD32_R(res);
		if(qCount&0xF0000000)        ++work3[qCount%320];

		res = READ_W32(qCount&0xFF);
		qCount+= READ_PAD32_R(res);
		work3[x%320]=work2[x&63]^(qCount&0xFF);

		res = ((qCount>>32)+work3[x%200]) & 0x3FFFFF;
		qCount+= READ_PAD32_R(res);
		
		#define OFFS (qCount&3)
		uint* ram = (uint *)(work3+((qCount%316)-OFFS));
		uint val = amd_bytealign((uint32)(qCount>>24), (uint32)(qCount>>24), (uint32)(4-OFFS));
		ram[0] ^= val&(0xFFFFFFFFL<<(OFFS<<3));
		ram[1] ^= val&(0xFFFFFFFFL>>(32-(OFFS<<3)));

		x += ((qCount&7)==3);

		qCount-= pad[x*x];
		if((qCount&0x07)==0x01) ++x;
	}

	uint s[8]= {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};

	Sha256_round(s, in);
	Sha256_round(s, in+64);
	Sha256_round(s, in+128);
	Sha256_round(s, in+192);
	Sha256_round(s, in+256);
	Sha256_round(s, in+320);
	Sha256_round(s, in+384);
	Sha256_round(s, in+448);
	Sha256_round_padding(s);
	
	if ((s[7] & 0x80FFFF) == 0)
	{
		out_param[nonce&0xFF] = get_global_id(0);
	}
}
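
For anyone comparing the SHA-256 helpers: the AMD-tuned source defines Ch(x, y, z) as bitselect(z,y,x) and keeps Ma in its and/or form. Below is a minimal host-side C sketch, not part of reaper.cl, that checks the bitselect forms against the plain ones. `bitselect_emulated` is a hypothetical stand-in that assumes the OpenCL rule "take bits from b where c is 1, otherwise from a", and the other helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed OpenCL bitselect(a, b, c) semantics: bits of b where c is 1, else bits of a. */
static uint32_t bitselect_emulated(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Plain forms, as written in the kernel source. */
static uint32_t ch_reference(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

static uint32_t ma_reference(uint32_t x, uint32_t y, uint32_t z)
{
    return (y & z) | (x & (y | z));
}

/* bitselect forms: Ch picks y where x is set, else z. */
static uint32_t ch_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_emulated(z, y, x);
}

/* Majority: where x and z disagree the result is y, otherwise it is x. */
static uint32_t ma_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_emulated(x, y, z ^ x);
}

/* The three words below contain all eight (x, y, z) bit combinations,
   so one comparison covers every case of these bitwise functions. */
static int check_sha_selectors(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;
    assert(ch_bitselect(x, y, z) == ch_reference(x, y, z));
    assert(ma_bitselect(x, y, z) == ma_reference(x, y, z));
    return 1;
}
```

Calling `check_sha_selectors()` from a test harness is enough to confirm the equivalence. Whether the bitselect form of Ma is actually faster on a given card is something only a benchmark can tell.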

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: bulanula on November 10, 2011, 06:53:32 PM

Quote from: tacotime on November 10, 2011, 05:36:41 PM

Bump. I'm wondering whether anyone with an AMD card has tried AMD's bit-align function for rotations; it's supposed to be much faster than coding the rotation by hand in OpenCL:

Code:

//#pragma OPENCL EXTENSION cl_amd_media_ops : enable
//#define rot(x,y) amd_bitalign(x, x, (32-y))
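
To see why that macro is a left rotation, here is a small host-side C sketch, not taken from any miner. `bitalign_emulated` is a hypothetical stand-in that assumes the cl_amd_media_ops behaviour as I understand it: concatenate the first two arguments into 64 bits, shift right by the third argument masked to 5 bits, and keep the low 32 bits. The other function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed amd_bitalign semantics: ((hi:lo) >> (shift & 31)), low 32 bits. */
static uint32_t bitalign_emulated(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31u));
}

/* Hand-coded rotate-left, written so that y == 0 is well defined in C. */
static uint32_t rotl_portable(uint32_t x, uint32_t y)
{
    y &= 31u;
    return (x << y) | (x >> ((32u - y) & 31u));
}

/* Same shape as the macro: rot(x,y) -> amd_bitalign(x, x, 32 - y). */
static uint32_t rotl_bitalign(uint32_t x, uint32_t y)
{
    return bitalign_emulated(x, x, 32u - y);
}

/* Compare both forms for a few words and every rotation count 0..31. */
static int check_rotl_equivalence(void)
{
    const uint32_t samples[] = {0u, 1u, 0x80000000u, 0x6a09e667u, 0xFFFFFFFFu};
    for (unsigned s = 0; s < sizeof samples / sizeof samples[0]; ++s)
        for (uint32_t y = 0; y < 32u; ++y)
            assert(rotl_bitalign(samples[s], y) == rotl_portable(samples[s], y));
    return 1;
}
```

For example, `rotl_bitalign(0x80000000u, 1u)` wraps the top bit around to bit 0. OpenCL C also has a portable built-in, `rotate(v, i)`, which is worth benchmarking against the AMD extension before committing to either.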

Edit: It appears someone uploaded an AMD-optimized version of reaper.cl to Pastebin; here it is:

Code:

typedef uint uint32_t;
typedef ulong uint64_t;
typedef uchar uint8_t;

typedef uint uint32;
typedef ulong uint64;

#define U8TO32(p) \
  (((uint32_t)((p)[0]) << 24) | ((uint32_t)((p)[1]) << 16) | \
   ((uint32_t)((p)[2]) <<  8) | ((uint32_t)((p)[3])      ))
#define U8TO64(p) \
  (((uint64_t)U8TO32(p) << 32) | (uint64_t)U8TO32((p) + 4))
#define U32TO8(p, v) \
    (p)[0] = (uint8_t)((v) >> 24); (p)[1] = (uint8_t)((v) >> 16); \
    (p)[2] = (uint8_t)((v) >>  8); (p)[3] = (uint8_t)((v)      ); 
#define U64TO8(p, v) \
    U32TO8((p),     (uint32_t)((v) >> 32));	\
    U32TO8((p) + 4, (uint32_t)((v)      )); 

#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#pragma OPENCL EXTENSION cl_amd_media_ops : enable

/*typedef struct  { 
  uint64_t h[8];
  uint8_t buf[128];
} state;*/

__constant uint8_t sigma[256] = 
{
     0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15 ,
    14,10, 4, 8, 9,15,13, 6, 1,12, 0, 2,11, 7, 5, 3 ,
    11, 8,12, 0, 5, 2,15,13,10,14, 3, 6, 7, 1, 9, 4 ,
     7, 9, 3, 1,13,12,11,14, 2, 6, 5,10, 4, 0,15, 8 ,
     9, 0, 5, 7, 2, 4,10,15,14, 1,11,12, 6, 8, 3,13 ,
     2,12, 6,10, 0,11, 8, 3, 4,13, 7, 5,15,14, 1, 9 ,
    12, 5, 1,15,14,13, 4,10, 0, 7, 6, 3, 9, 2, 8,11 ,
    13,11, 7,14,12, 1, 3, 9, 5, 0,15, 4, 8, 6, 2,10 ,
     6,15,14, 9,11, 3, 0, 8,12, 2,13, 7, 1, 4,10, 5 ,
    10, 2, 8, 4, 7, 6, 1, 5,15,11, 9,14, 3,12,13 ,0 ,
     0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15 ,
    14,10, 4, 8, 9,15,13, 6, 1,12, 0, 2,11, 7, 5, 3 ,
    11, 8,12, 0, 5, 2,15,13,10,14, 3, 6, 7, 1, 9, 4 ,
     7, 9, 3, 1,13,12,11,14, 2, 6, 5,10, 4, 0,15, 8 ,
     9, 0, 5, 7, 2, 4,10,15,14, 1,11,12, 6, 8, 3,13 ,
     2,12, 6,10, 0,11, 8, 3, 4,13, 7, 5,15,14, 1, 9 
};

__constant uint64_t cst[16] = 
{
  0x243F6A8885A308D3UL,0x13198A2E03707344UL,0xA4093822299F31D0UL,0x082EFA98EC4E6C89UL,
  0x452821E638D01377UL,0xBE5466CF34E90C6CUL,0xC0AC29B7C97C50DDUL,0x3F84D5B5B5470917UL,
  0x9216D5D98979FB1BUL,0xD1310BA698DFB5ACUL,0x2FFD72DBD01ADFB7UL,0xB8E1AFED6A267E96UL,
  0xBA7C9045F12C7F99UL,0x24A19947B3916CF7UL,0x0801F2E2858EFC16UL,0x636920D871574E69UL
};
  
__constant uint K[64] = 
{ 
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

//uint rotl(uint x, uint y)
//{
//	return (x<<y)|(x>>(32-y));
//}

#define rotl(x, y) amd_bitalign(x, x, (uint)(32 - y))


//#define Ch(x, y, z) (z ^ (x & (y ^ z)))
#define Ma(x, y, z) ((y & z) | (x & (y | z)))

#define Ch(x, y, z) bitselect(z,y,x)
// Ma can also be implemented in terms of bitselect
// (select y where x and z differ, otherwise x)
//#define Ma(x, y, z) bitselect(x,y,z^x)


#define Tr(x,a,b,c) (rotl(x,a)^rotl(x,b)^rotl(x,c))

#define R(x) (work[x] = (rotl(work[x-2],15)^rotl(work[x-2],13)^((work[x-2])>>10)) + work[x-7] + (rotl(work[x-15],25)^rotl(work[x-15],14)^((work[x-15])>>3)) + work[x-16])
#define sharound(a,b,c,d,e,f,g,h,x,K) h+=Tr(e,7,21,26)+Ch(e,f,g)+K+x; d+=h; h+=Tr(a,10,19,30)+Ma(a,b,c);
#define sharound_s(a,b,c,d,e,f,g,h,x) h+=Tr(e,7,21,26)+Ch(e,f,g)+x; d+=h; h+=Tr(a,10,19,30)+Ma(a,b,c);

uint EndianSwap(uint n)
{
	return ((n&0xFF)<<24) | ((n&0xFF00)<<8) | ((n&0xFF0000)>>8) | ((n&0xFF000000)>>24);
}

void Sha256_round(uint* s, unsigned char* data)
{
	uint work[64];

	uint* udata = (uint*)data;
#pragma unroll
	for(uint i=0; i<16; ++i)
	{
		work[i] = EndianSwap(udata[i]);
	}

	uint A = s[0];
	uint B = s[1];
	uint C = s[2];
	uint D = s[3];
	uint E = s[4];
	uint F = s[5];
	uint G = s[6];
	uint H = s[7];
	sharound(A,B,C,D,E,F,G,H,work[0],K[0]);
	sharound(H,A,B,C,D,E,F,G,work[1],K[1]);
	sharound(G,H,A,B,C,D,E,F,work[2],K[2]);
	sharound(F,G,H,A,B,C,D,E,work[3],K[3]);
	sharound(E,F,G,H,A,B,C,D,work[4],K[4]);
	sharound(D,E,F,G,H,A,B,C,work[5],K[5]);
	sharound(C,D,E,F,G,H,A,B,work[6],K[6]);
	sharound(B,C,D,E,F,G,H,A,work[7],K[7]);
	sharound(A,B,C,D,E,F,G,H,work[8],K[8]);
	sharound(H,A,B,C,D,E,F,G,work[9],K[9]);
	sharound(G,H,A,B,C,D,E,F,work[10],K[10]);
	sharound(F,G,H,A,B,C,D,E,work[11],K[11]);
	sharound(E,F,G,H,A,B,C,D,work[12],K[12]);
	sharound(D,E,F,G,H,A,B,C,work[13],K[13]);
	sharound(C,D,E,F,G,H,A,B,work[14],K[14]);
	sharound(B,C,D,E,F,G,H,A,work[15],K[15]);
	sharound(A,B,C,D,E,F,G,H,R(16),K[16]);
	sharound(H,A,B,C,D,E,F,G,R(17),K[17]);
	sharound(G,H,A,B,C,D,E,F,R(18),K[18]);
	sharound(F,G,H,A,B,C,D,E,R(19),K[19]);
	sharound(E,F,G,H,A,B,C,D,R(20),K[20]);
	sharound(D,E,F,G,H,A,B,C,R(21),K[21]);
	sharound(C,D,E,F,G,H,A,B,R(22),K[22]);
	sharound(B,C,D,E,F,G,H,A,R(23),K[23]);
	sharound(A,B,C,D,E,F,G,H,R(24),K[24]);
	sharound(H,A,B,C,D,E,F,G,R(25),K[25]);
	sharound(G,H,A,B,C,D,E,F,R(26),K[26]);
	sharound(F,G,H,A,B,C,D,E,R(27),K[27]);
	sharound(E,F,G,H,A,B,C,D,R(28),K[28]);
	sharound(D,E,F,G,H,A,B,C,R(29),K[29]);
	sharound(C,D,E,F,G,H,A,B,R(30),K[30]);
	sharound(B,C,D,E,F,G,H,A,R(31),K[31]);
	sharound(A,B,C,D,E,F,G,H,R(32),K[32]);
	sharound(H,A,B,C,D,E,F,G,R(33),K[33]);
	sharound(G,H,A,B,C,D,E,F,R(34),K[34]);
	sharound(F,G,H,A,B,C,D,E,R(35),K[35]);
	sharound(E,F,G,H,A,B,C,D,R(36),K[36]);
	sharound(D,E,F,G,H,A,B,C,R(37),K[37]);
	sharound(C,D,E,F,G,H,A,B,R(38),K[38]);
	sharound(B,C,D,E,F,G,H,A,R(39),K[39]);
	sharound(A,B,C,D,E,F,G,H,R(40),K[40]);
	sharound(H,A,B,C,D,E,F,G,R(41),K[41]);
	sharound(G,H,A,B,C,D,E,F,R(42),K[42]);
	sharound(F,G,H,A,B,C,D,E,R(43),K[43]);
	sharound(E,F,G,H,A,B,C,D,R(44),K[44]);
	sharound(D,E,F,G,H,A,B,C,R(45),K[45]);
	sharound(C,D,E,F,G,H,A,B,R(46),K[46]);
	sharound(B,C,D,E,F,G,H,A,R(47),K[47]);
	sharound(A,B,C,D,E,F,G,H,R(48),K[48]);
	sharound(H,A,B,C,D,E,F,G,R(49),K[49]);
	sharound(G,H,A,B,C,D,E,F,R(50),K[50]);
	sharound(F,G,H,A,B,C,D,E,R(51),K[51]);
	sharound(E,F,G,H,A,B,C,D,R(52),K[52]);
	sharound(D,E,F,G,H,A,B,C,R(53),K[53]);
	sharound(C,D,E,F,G,H,A,B,R(54),K[54]);
	sharound(B,C,D,E,F,G,H,A,R(55),K[55]);
	sharound(A,B,C,D,E,F,G,H,R(56),K[56]);
	sharound(H,A,B,C,D,E,F,G,R(57),K[57]);
	sharound(G,H,A,B,C,D,E,F,R(58),K[58]);
	sharound(F,G,H,A,B,C,D,E,R(59),K[59]);
	sharound(E,F,G,H,A,B,C,D,R(60),K[60]);
	sharound(D,E,F,G,H,A,B,C,R(61),K[61]);
	sharound(C,D,E,F,G,H,A,B,R(62),K[62]);
	sharound(B,C,D,E,F,G,H,A,R(63),K[63]);

	s[0] += A;
	s[1] += B;
	s[2] +=	C;
	s[3] += D;
	s[4] += E;
	s[5] += F;
	s[6] += G;
	s[7] += H;
}

__constant uint P[64] =
{
	0xc28a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19c0174,
	0x649b69c1, 0xf9be478a, 0x0fe1edc6, 0x240ca60c, 0x4fe9346f, 0x4d1c84ab, 0x61b94f1e, 0xf6f993db,
	0xe8465162, 0xad13066f, 0xb0214c0d, 0x695a0283, 0xa0323379, 0x2bd376e9, 0xe1d0537c, 0x03a244a0,
	0xfc13a4a5, 0xfafda43e, 0x56bea8bb, 0x445ec9b6, 0x39907315, 0x8c0d4e9f, 0xc832dccc, 0xdaffb65b,
	0x1fed4f61, 0x2f646808, 0x1ff32294, 0x2634ccd7, 0xb0ebdefa, 0xd6fc592b, 0xa63c5c8f, 0xbe9fbab9,
	0x0158082c, 0x68969712, 0x51e1d7e1, 0x5cf12d0d, 0xc4be2155, 0x7d7c8a34, 0x611f2c60, 0x036324af,
	0xa4f08d87, 0x9e3e8435, 0x2c6dae30, 0x11921afc, 0xb76d720e, 0x245f3661, 0xc3a65ecb, 0x43b9e908
};

void Sha256_round_padding(uint* s)
{
	uint A = s[0];
	uint B = s[1];
	uint C = s[2];
	uint D = s[3];
	uint E = s[4];
	uint F = s[5];
	uint G = s[6];
	uint H = s[7];
	sharound_s(A,B,C,D,E,F,G,H,P[0]);
	sharound_s(H,A,B,C,D,E,F,G,P[1]);
	sharound_s(G,H,A,B,C,D,E,F,P[2]);
	sharound_s(F,G,H,A,B,C,D,E,P[3]);
	sharound_s(E,F,G,H,A,B,C,D,P[4]);
	sharound_s(D,E,F,G,H,A,B,C,P[5]);
	sharound_s(C,D,E,F,G,H,A,B,P[6]);
	sharound_s(B,C,D,E,F,G,H,A,P[7]);
	sharound_s(A,B,C,D,E,F,G,H,P[8]);
	sharound_s(H,A,B,C,D,E,F,G,P[9]);
	sharound_s(G,H,A,B,C,D,E,F,P[10]);
	sharound_s(F,G,H,A,B,C,D,E,P[11]);
	sharound_s(E,F,G,H,A,B,C,D,P[12]);
	sharound_s(D,E,F,G,H,A,B,C,P[13]);
	sharound_s(C,D,E,F,G,H,A,B,P[14]);
	sharound_s(B,C,D,E,F,G,H,A,P[15]);
	sharound_s(A,B,C,D,E,F,G,H,P[16]);
	sharound_s(H,A,B,C,D,E,F,G,P[17]);
	sharound_s(G,H,A,B,C,D,E,F,P[18]);
	sharound_s(F,G,H,A,B,C,D,E,P[19]);
	sharound_s(E,F,G,H,A,B,C,D,P[20]);
	sharound_s(D,E,F,G,H,A,B,C,P[21]);
	sharound_s(C,D,E,F,G,H,A,B,P[22]);
	sharound_s(B,C,D,E,F,G,H,A,P[23]);
	sharound_s(A,B,C,D,E,F,G,H,P[24]);
	sharound_s(H,A,B,C,D,E,F,G,P[25]);
	sharound_s(G,H,A,B,C,D,E,F,P[26]);
	sharound_s(F,G,H,A,B,C,D,E,P[27]);
	sharound_s(E,F,G,H,A,B,C,D,P[28]);
	sharound_s(D,E,F,G,H,A,B,C,P[29]);
	sharound_s(C,D,E,F,G,H,A,B,P[30]);
	sharound_s(B,C,D,E,F,G,H,A,P[31]);
	sharound_s(A,B,C,D,E,F,G,H,P[32]);
	sharound_s(H,A,B,C,D,E,F,G,P[33]);
	sharound_s(G,H,A,B,C,D,E,F,P[34]);
	sharound_s(F,G,H,A,B,C,D,E,P[35]);
	sharound_s(E,F,G,H,A,B,C,D,P[36]);
	sharound_s(D,E,F,G,H,A,B,C,P[37]);
	sharound_s(C,D,E,F,G,H,A,B,P[38]);
	sharound_s(B,C,D,E,F,G,H,A,P[39]);
	sharound_s(A,B,C,D,E,F,G,H,P[40]);
	sharound_s(H,A,B,C,D,E,F,G,P[41]);
	sharound_s(G,H,A,B,C,D,E,F,P[42]);
	sharound_s(F,G,H,A,B,C,D,E,P[43]);
	sharound_s(E,F,G,H,A,B,C,D,P[44]);
	sharound_s(D,E,F,G,H,A,B,C,P[45]);
	sharound_s(C,D,E,F,G,H,A,B,P[46]);
	sharound_s(B,C,D,E,F,G,H,A,P[47]);
	sharound_s(A,B,C,D,E,F,G,H,P[48]);
	sharound_s(H,A,B,C,D,E,F,G,P[49]);
	sharound_s(G,H,A,B,C,D,E,F,P[50]);
	sharound_s(F,G,H,A,B,C,D,E,P[51]);
	sharound_s(E,F,G,H,A,B,C,D,P[52]);
	sharound_s(D,E,F,G,H,A,B,C,P[53]);
	sharound_s(C,D,E,F,G,H,A,B,P[54]);
	sharound_s(B,C,D,E,F,G,H,A,P[55]);
	sharound_s(A,B,C,D,E,F,G,H,P[56]);
	sharound_s(H,A,B,C,D,E,F,G,P[57]);
	sharound_s(G,H,A,B,C,D,E,F,P[58]);
	sharound_s(F,G,H,A,B,C,D,E,P[59]);
	sharound_s(E,F,G,H,A,B,C,D,P[60]);

	s[7] += H;
}



#define ROT(x,n) (((x)<<(64-n))|( (x)>>(n)))

#define G(m,a,b,c,d,e,i)					\
  v[a] += (m[sigma[i+e]] ^ cst[sigma[i+e+1]]) + v[b];	\
  v[d] = ROT( v[d] ^ v[a],32);				\
  v[c] += v[d];						\
  v[b] = ROT( v[b] ^ v[c],25);				\
  v[a] += (m[sigma[i+e+1]] ^ cst[sigma[i+e]])+v[b];	\
  v[d] = ROT( v[d] ^ v[a],16);				\
  v[c] += v[d];						\
  v[b] = ROT( v[b] ^ v[c],11);				

//assumes input is 512 bytes
__kernel void search(__global uint8_t* in_param, __global uint* out_param, __global uint8_t* pad) 
{
	uchar in[512];
#pragma unroll
	for(uint i=0; i<128; ++i)
	in[i] = in_param[i];

	uint nonce = get_global_id(0);
	
	*(uint*)(in+108) = nonce;

	uint64_t h[8];
	h[0]=0x6A09E667F3BCC908UL;
	h[1]=0xBB67AE8584CAA73BUL;
	h[2]=0x3C6EF372FE94F82BUL;
	h[3]=0xA54FF53A5F1D36F1UL;
	h[4]=0x510E527FADE682D1UL;
	h[5]=0x9B05688C2B3E6C1FUL;
	h[6]=0x1F83D9ABFB41BD6BUL;
	h[7]=0x5BE0CD19137E2179UL;

	uint64_t v[16];
#pragma unroll
	for(uint i=0; i< 8;++i)  v[i] = h[i];
	v[ 8] = 0x243F6A8885A308D3UL;
	v[ 9] = 0x13198A2E03707344UL;
	v[10] = 0xA4093822299F31D0UL;
	v[11] = 0x082EFA98EC4E6C89UL;
	v[12] = 0x452821E638D01777UL; // cst[4] ^ 0x400: bit counter of the first 128-byte block XORed in
	v[13] = 0xBE5466CF34E9086CUL; // cst[5] ^ 0x400
	v[14] = 0xC0AC29B7C97C50DDUL;
	v[15] = 0x3F84D5B5B5470917UL;

	{
		uint64_t m[16];
#pragma unroll
		for(uint i=0; i<16;++i)  m[i] = U8TO64(in + i*8);
		uint i=0;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
		i+=16;
		G( m, 0, 4, 8,12, 0, i); G( m, 1, 5, 9,13, 2, i); G( m, 2, 6,10,14, 4, i); G( m, 3, 7,11,15, 6, i);
		G( m, 3, 4, 9,14,14, i); G( m, 2, 7, 8,13,12, i); G( m, 0, 5,10,15, 8, i); G( m, 1, 6,11,12,10, i);
	} 

#pragma unroll
	for(uint i=0; i<16;++i)  h[i&7] ^= v[i]; 

#pragma unroll
	for(uint i=0; i< 8;++i)  v[i] = h[i];
	v[8] = 0x243F6A8885A308D3UL;
	v[9] = 0x13198A2E03707344UL;
	v[10] = 0xA4093822299F31D0UL;
	v[11] = 0x082EFA98EC4E6C89UL;
	v[12] = 0x452821E638D01377UL;
	v[13] = 0xBE5466CF34E90C6CUL;
	v[14] = 0xC0AC29B7C97C50DDUL;
	v[15] = 0x3F84D5B5B5470917UL;

	{
		uint64_t m2[16] = {1UL << 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0x400};
		uint i=0;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
		i+=16;
		G(m2, 0, 4, 8,12, 0, i); G(m2, 1, 5, 9,13, 2, i); G(m2, 2, 6,10,14, 4, i); G(m2, 3, 7,11,15, 6, i);
		G(m2, 3, 4, 9,14,14, i); G(m2, 2, 7, 8,13,12, i); G(m2, 0, 5,10,15, 8, i); G(m2, 1, 6,11,12,10, i);
	} 

#pragma unroll
	for(uint i=0; i<16;++i)  h[i&7] ^= v[i];

	uint8_t* work2 = in+128;

	U64TO8( work2 + 0, h[0]);
	U64TO8( work2 + 8, h[1]);
	U64TO8( work2 +16, h[2]);
	U64TO8( work2 +24, h[3]);
	U64TO8( work2 +32, h[4]);
	U64TO8( work2 +40, h[5]);
	U64TO8( work2 +48, h[6]);
	U64TO8( work2 +56, h[7]);
	
	uint8_t* work3 = work2+64;
//a = x-1, b = x, c = x&63
#define WORKINIT(a,b,c)   work3[a] ^= work2[c]; \
		if(work3[a]&0x80) work3[b]=in[(b+work3[a])&0x7F]; \
		else              work3[b]=work2[(b+work3[a])&0x3F];

	
	work3[0] = work2[15];
	WORKINIT(0,1,1);
	WORKINIT(1,2,2);
	WORKINIT(2,3,3);
#pragma unroll
	for(int x=4;x<64;++x)
	{
		WORKINIT(x-1,x,x);
		++x;
		WORKINIT(x-1,x,x);
		++x;
		WORKINIT(x-1,x,x);
		++x;
		WORKINIT(x-1,x,x);
	}
#pragma unroll
	for(int x=64;x<320;++x)
	{
		WORKINIT(x-1,x,x&63);
		++x;
		WORKINIT(x-1,x,x&63);
		++x;
		WORKINIT(x-1,x,x&63);
		++x;
		WORKINIT(x-1,x,x&63);
	}

	#define READ_PAD32_R(offset) ((uint)pad[offset] | (((uint)pad[offset+1])<<8) | (((uint)pad[offset+2])<<16) | (((uint)pad[offset+3])<<24))
	
	#define READ_W32(offset) ((uint)work3[offset] + (((uint)work3[(offset)+1])<<8) + (((uint)work3[(offset)+2]&0x3F)<<16))

	ushort* shortptr = (ushort*)(work3+310);
	uint64 qCount = shortptr[0];
	qCount |= ((uint64)shortptr[3])<<48;
	uint* uintptr = (uint*)(work3+312);
	qCount |= ((uint64)*uintptr)<<16;

	uint nExtra=(pad[(qCount+work3[300])&0x3FFFFF]>>3)+512;
#pragma unroll
	for(uint x=1;x<nExtra;++x)
	{
		uint res = 0;
		qCount += READ_PAD32_R((qCount&0x3FFFFF));
		work3[qCount%320] += (qCount&0x87878700) ? 1 : 0;

		qCount-= pad[(qCount+work3[qCount%160])&0x3FFFFF];
		
		if(qCount&0x80000000)   { qCount+= pad[qCount&0xFFFF]; }
		else                    { res = qCount&0x20FAFB; qCount+= READ_PAD32_R(res); }

		res = (qCount+work3[qCount%160]) & 0x3FFFFF;
		qCount+= READ_PAD32_R(res);
		if(qCount&0xF0000000)        ++work3[qCount%320];

		res = READ_W32(qCount&0xFF);
		qCount+= READ_PAD32_R(res);
		work3[x%320]=work2[x&63]^(qCount&0xFF);

		res = ((qCount>>32)+work3[x%200]) & 0x3FFFFF;
		qCount+= READ_PAD32_R(res);
		
		#define OFFS (qCount&3)
		uint* ram = (uint *)(work3+((qCount%316)-OFFS));
		uint val = amd_bytealign((uint32)(qCount>>24), (uint32)(qCount>>24), (uint32)(4-OFFS));
		ram[0] ^= val&(0xFFFFFFFFL<<(OFFS<<3));
		ram[1] ^= val&(0xFFFFFFFFL>>(32-(OFFS<<3)));

		x += ((qCount&7)==3);

		qCount-= pad[x*x];
		if((qCount&0x07)==0x01) ++x;
	}

	uint s[8]= {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};

	Sha256_round(s, in);
	Sha256_round(s, in+64);
	Sha256_round(s, in+128);
	Sha256_round(s, in+192);
	Sha256_round(s, in+256);
	Sha256_round(s, in+320);
	Sha256_round(s, in+384);
	Sha256_round(s, in+448);
	Sha256_round_padding(s);
	
	if ((s[7] & 0x80FFFF) == 0)
	{
		out_param[nonce&0xFF] = get_global_id(0);
	}
}
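
A note on the two BLAKE-512 compressions in the kernel above: the first sets v[12] and v[13] to values that differ from cst[4] and cst[5] by 0x400, while the second uses the plain constants. That is consistent with BLAKE XORing the block's bit counter into those words (1024 bits after the first 128-byte block, 0 for a block that holds only padding). Below is a small host-side C sketch of that arithmetic. The helper names are my own and are not part of reaper.cl.

```c
#include <assert.h>
#include <stdint.h>

/* cst[4] and cst[5] from the kernel's constant table. */
static const uint64_t BLAKE512_C4 = 0x452821E638D01377ULL;
static const uint64_t BLAKE512_C5 = 0xBE5466CF34E90C6CULL;

/* Compute v[12] and v[13] for a block whose running bit counter is
   counter_bits (assumed to fit in 64 bits, so v[14] and v[15] stay plain). */
static void blake512_counter_words(uint64_t counter_bits, uint64_t out[2])
{
    out[0] = BLAKE512_C4 ^ counter_bits;
    out[1] = BLAKE512_C5 ^ counter_bits;
}

/* Reproduce the literals used by the two compressions in the kernel. */
static int check_counter_words(void)
{
    uint64_t w[2];

    blake512_counter_words(128u * 8u, w);   /* first block: 128 bytes = 0x400 bits */
    assert(w[0] == 0x452821E638D01777ULL);
    assert(w[1] == 0xBE5466CF34E9086CULL);

    blake512_counter_words(0u, w);          /* padding-only block: counter is 0 */
    assert(w[0] == BLAKE512_C4);
    assert(w[1] == BLAKE512_C5);
    return 1;
}
```

So the differing literals are precomputed counter values rather than typos, which matters if anyone adapts the kernel to a different input length.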

Good work!

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: ThiagoCMC on November 15, 2011, 08:39:37 AM

Guys,

Can this Reaper miner be used to mine Litecoins?
Does it compile and run on Linux?

Thanks!
Thiago

Title: Re: EDITED TITLED: Solidcoin 2.0 GPU CUDA Nvidia Miner by MaGNET in Alpha Test
Post by: DeathAndTaxes on November 15, 2011, 02:21:20 PM

Quote from: ThiagoCMC on November 15, 2011, 08:39:37 AM

Guys,

Can this Reaper miner be used to mine Litecoins?
Does it compile and run on Linux?

Thanks!
Thiago

Unlikely. SC doesn't use the same hashing algorithm as LTC, which uses scrypt.
