Bitcoin Forum
December 18, 2017, 04:23:08 AM *
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 [23] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 ... 75 »
Author Topic: [ANN] [SKC] Skeincoin 0.9.3.1 | Skein-SHA2  (Read 158147 times)
oroqen
January 02, 2014, 03:40:54 PM
#441

Are you sure it's optimized? I can't compare the GPU performance, but when it comes to CPU, poclbm-skc hashes around 50% slower than skeincoin-cpuminer on the same CPU. I don't have much time right now, but I can compare it tomorrow.

Maybe we can create some kind of benchmark to see the performance of particular GPUs?
In my case it's:
7950 - 180MH/s
R9 280X - 200MH/s
Overclocking can get you another 5-10MH/s

5870 at stock core (850MHz) - 95MH/s
7870 (Tahiti) at 880MHz - 94MH/s
6950 at 900MHz - 103MH/s
6930 at 860MHz - 93MH/s

Basically all cards give around 90-105 MH/s on my setup, and overclocking doesn't change that much. On scrypt, the 5870, 7870 and 6930 perform at close to 400 kH/s and the 6950 at about 450 kH/s, so the current Skein-SHA2 implementation gets 230-280 times the hashrate of scrypt.

Thanks for the numbers.
Looks like all your cards are also mining about 3x slower than SHA256 coins running on cgminer.

Not sure what is needed to get that 3x speed boost; cgminer has a lot of functions/tricks to be that fast...
This isn't a pure SHA256 coin, so comparing it to SHA256 cgminer performance is moot, especially given the missing optimizations between the two; I'm not the first to point this out. However, an R9 280X overclocked to 1100 MHz does 207 MH/s.
broketech
January 02, 2014, 04:43:16 PM
#442

My 5870 gets around 110 MH/s.

Sysadmin - Troubleshooter - Armchair Debugger
BTC: 1PCocLTxLJP4L1d1Gigjhxoy2WypifA4Cy - UN: uQAR2PhjtdvNvbbh4JC4wJdx3SCh2W4xB4 - SKC: SR81M5iqLkRB6PjZgkpNkpz1G7KmY3zceL
escobol
January 02, 2014, 04:52:29 PM
#443

WTS 10k, PM with offer.
daemonfox
January 02, 2014, 05:29:26 PM
#444

No dice on the above... not a valid Win32 application.

Dug into the Windows error report, and it seems to be something to do with amdocl.dll.

EDIT: Found out why.

On 64-bit Windows 7 you do not use the System32 folder; you need to rename amdocl.dll in the SysWOW64 folder and then add the one linked here.

Working now... time to tweak!

EDIT2: Of course... this has now broken my cgminer for scrypt mining unless I swap the files back... NICE!
Yeah, figured as much. -march=nocona targets 64-bit Intel CPUs (GCC describes nocona as a Pentium 4 with 64-bit extensions and SSE3). I didn't have a DLL issue at all and can switch between the two easily, so it must be a driver/SDK issue.
I'm using driver 13-12_win7_win8_64_dd_ccc_whql, AMD-APP-SDK-v2.9-Windows-641 and cgminer 3.7.1 on an R9 280X, Win7 64-bit Basic.


Well, I am all set now... 193 MH/s on my 7950, and I got it running on the APU's Devastator GPU core on my 6800K... another 30 MH/s.

Question, if someone knows: what edits can be made to make this look for a DIFFERENTLY named amdocl.dll? I would like to make it call something like amdoclskc.dll and rename the custom amdocl64.dll made for this, so I can put my regular one back and not have to swap files just to go back to scrypt mining.

1K SKC to the first functioning answer.

The fastest option would be to create two .bat files that do the renaming for you.
The first one, for SKC, renames the original amdocl.dll to amdocl.new and the "new" amdocl64.dll to amdocl.dll.
The second one restores the names above.

Good luck!

Tip SKC address: SkHZ8rTEehrmjFB6VffWh8K7eKgdppG2bq

Yes, I can do that on my own... that was not what I requested, though.

Still looking... what modifications can be made to the existing scripts so that this can be forced to call amdoclskc.dll?

1K SKC still up for grabs.

3dcgminer
January 02, 2014, 06:01:07 PM
#445

(Quoting daemonfox, #444.)

Reorder had a better suggestion:
"I suggest that somebody steps in and replaces my quickhack dll with a pure Python implementation from http://pythonhosted.org/pyskein/"

Maybe it would be better to offer the 1K bounty for a proper miner upgrade instead of another DLL hack.
madjihad
January 02, 2014, 06:07:06 PM
#446

I also noticed Skein doesn't need a high memory frequency, so you can decrease it almost all the way down without any performance hit. The Skein miner is probably not optimized yet, and that's why it needs less power, I guess.
The kernel does not use VRAM at all (save for a tiny bit to pass results back to the miner), so yes, you can downclock it to the minimum. It is optimized, though, but for GCN. Chances are that if you replace the rolhack functions with rotates and arrays with variables, and unroll the Skein rounds, it will perform better on VLIW architectures.

The current sha256_res implementation from skein.cl is too slow. I tried testing double SHA256 hashing on the current kernel by replacing the Skein call with one more sha256_res call:

Code:
    if(sha256_res((sha256_res(as_uint16(state)))) & 0xf0ffffff)
        return;
    output[OUTPUT_SIZE] = output[nonce & OUTPUT_MASK] = nonce;

And I got 125 MH/s on a single 5870. So it's probably the bottleneck of the current Skein-SHA256 OpenCL implementation.

I have never worked with OpenCL before, so I may be missing something (or even everything :) )

3dcgminer
January 02, 2014, 06:12:27 PM
#447

(Quoting oroqen, #441.)

Let's compare Skeincoin with Blakecoin instead, running on an AMD 7970/R9 280X at 1170 MHz core and 300 MHz memory:

Skein (poclbm-skc): 225 MH/s
Blake (cgminer): 2800 MH/s
SHA256 (cgminer): 700 MH/s
Scrypt (cgminer): 700 kH/s (1050/1500 MHz)

Moot or not, benchmarking is fun.
reorder
January 02, 2014, 07:46:49 PM
#448

(Quoting madjihad, #446.)
Yes, that W[] array is moved into registers by the compiler on GCN, but apparently on VLIW it is not, so it ends up in global memory, which is slow. This can be improved, of course (first of all, it does not have to be 62 elements long; 16 elements are enough if you reuse them). I just wonder how you managed to compile sha256_res(sha256_res()): it takes a uint16 vector as its parameter, but returns only one uint.
madjihad
January 02, 2014, 08:22:28 PM
#449

(Quoting reorder, #448.)

I've tried both
Code:
(sha256_res((uint16)sha256_res(as_uint16(skein512_mid_impl(state, msg)))) & 0xf0ffffff)
and
Code:
(sha256_res(sha256_res(as_uint16(skein512_mid_impl(state, msg)))) & 0xf0ffffff)

And it compiles, probably just producing wrong results. But that is still enough for a test, as sha256_res runs twice, maybe only with the wrong input on the second run :)

Besides, double Skein runs at 780 MH/s on a 5870, so SHA256 is definitely the current bottleneck. With a good SHA implementation we will be able to reach even better performance than SHA256D :D


reorder
January 02, 2014, 11:10:34 PM
#450

(Quoting madjihad, #449.)


Casting uint to uint16 compiles on your system? Then something is awfully wrong with it; that is against the OpenCL spec (and common sense :) ). Anyway, it would be great if you manage to optimize sha256. I only quickly threw together something that worked for me, and I feel somewhat embarrassed now that it is public.
budabob07
January 02, 2014, 11:23:14 PM
#451

Is Skeincoin's SHA256 hash function the same as the one used in normal SHA256 coins, or is it a variation?

I am no longer using this account. I have switched to the account 'JHermz'
madjihad
January 02, 2014, 11:49:02 PM
#452

(Quoting reorder, #450.)

Yeah, it compiles; I even tried a fresh checkout :D
Code:
    if(sha256_res(sha256_res(as_uint16(state))) & 0xf0ffffff)
        return;
Agreed, that's a really weird cast :) And no worries about the code: your kernel works, and thank you very much for developing and publishing it!!!

I've already moved W[62] and all its usages into the kernel's search method and declared it as local, but didn't get any significant speedup. I tried to get rid of all the vectors, but that made it even worse :) Now I will try to use the original poclbm SHA256D search with reduced rounds (without the second call).

(Quoting budabob07, #451.)

It's the same function, but it isn't double SHA256 as in Bitcoin and other coins, so we can't just reuse Bitcoin's well-optimized kernel.

reorder
January 03, 2014, 12:48:35 AM
#453

(Quoting madjihad, #452.)
Looks like you misunderstand local memory: it is shared between threads in a workgroup, and you do not want all threads writing into this array simultaneously. The idea is to get rid of the array and replace it with just 16 uint variables. AMD stores arrays in global memory when optimization cannot coerce them into registers (and the optimizer is not necessarily super smart), whereas uint variables are always mapped to registers (there may then be register spilling if you don't have enough of them, but at least you get a warning).
madjihad
January 03, 2014, 01:07:01 AM
#454

(Quoting reorder, #453.)

Ah, big thanks for the clarification. Probably the time has come to read the OpenCL documentation :)
And it's not my last fail for today :-[ I tested double Skein today, and it was running at 780 MH/s on the 5870 only because it skipped the last bitwise AND check (hash & 0xf0ffffff). So that was wrong data :( Maybe your SHA256 implementation performs really well and there is no reason to change it. I will start testing and searching for the bottleneck from scratch tomorrow... sorry for the disinformation :-[

qiuzhixin15
January 03, 2014, 03:57:55 AM
#455

good coin
broketech
January 03, 2014, 04:36:02 PM
#456

Quote
good coin

this is going on the hard money.

madjihad
January 03, 2014, 04:46:25 PM
#457

I've managed to run CodeXL's GPU profiler on poclbm-skc :) I used cx_Freeze-4.3.2 to generate an executable from the .py files. Here is the detailed profiler output:
http://s000.tinyupload.com/index.php?file_id=35957564424055866207

Only a few memory leaks were detected, plus a lot of unnecessary-synchronization warnings. So probably the only thing left to look at is the kernel's search method itself. I will try to use only vector types there, but it could take a while with my current poor knowledge of OpenCL :(

surfer43
January 04, 2014, 01:10:24 AM
#458

skeincon ad astra! 120
0_0
escobol
January 04, 2014, 10:36:23 AM
#459

Per aspera... :)
MethMatician
January 04, 2014, 11:15:28 AM
#460

skeincon ad astra! 120
0_0

et ultra! :D