ecohash
Newbie
Offline
Activity: 34
Merit: 0
|
|
January 17, 2017, 09:43:57 PM |
|
Each RX 480 CU hosts four texture units, 16KB of L1 cache, a 64KB local data share, and register space for the vector and scalar units. AMD says it made a number of tweaks to improve the CU’s efficiency, including the addition of native FP16 (and Int16) support, tuned cache access and better instruction prefetching. Altogether, the changes purportedly yield up to 15% more performance per CU than the Radeon R9 290’s Hawaii GPU, which is based on a second-gen GCN architecture.
|
|
|
|
reb0rn21
Legendary
Offline
Activity: 1901
Merit: 1024
|
|
January 17, 2017, 10:23:01 PM |
|
As far as I know, both the RX 4xx and the GTX 1070 have "just" 2MB of L2 cache.
|
|
|
|
th00ber
|
|
January 18, 2017, 12:21:18 AM |
|
vcruntime140.dll is missing on both Win7 and Win10. I tried reinstalling the VC Redist and downloading the missing lib,
but it's still not working... Any tips on how to run this on Windows?
|
|
|
|
Jdope
|
|
January 18, 2017, 12:47:43 AM |
|
It might not be the best place to ask, but what skills are used in making such mining software? What are the core subjects one needs a good grasp of to have that low-level (I assume) knowledge?
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 18, 2017, 01:16:23 AM |
|
vcruntime140.dll is missing on both Win7 and Win10. I tried reinstalling the VC Redist and downloading the missing lib,
but it's still not working... Any tips on how to run this on Windows?
That's pretty weird... Are you using a 32-bit version of Windows by any chance?
|
Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 18, 2017, 01:28:36 AM |
|
It might not be the best place to ask, but what skills are used in making such mining software? What are the core subjects one needs a good grasp of to have that low-level (I assume) knowledge?
I am pretty much self-taught as far as programming is concerned, so my approach to it is fairly idiosyncratic. I was originally interested in the internal workings of operating systems, device drivers, and compilers, and that background has definitely helped me so far. Now if only I could get this assembly version right...
|
|
|
|
cryptominer420
|
|
January 18, 2017, 03:04:17 AM Last edit: January 18, 2017, 03:53:48 AM by cryptominer420 |
|
Well guys, I'm using Version 1.1 on suprnova with my 5 BIOS-modded PowerColor RX 470s clocked at 1250MHz core and 1720MHz memory. If I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152h/s over 1 hr, which breaks down to an effective hashrate of 230.4h/s per card.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
January 18, 2017, 06:14:03 AM |
|
I was playing with the cryptonight kernel for a change and was able to get it to work on a GTX 1060 with "--gpu-threads 1". I also dug out a NeoScrypt kernel I optimized a while back, which runs at 780kh/s on an RX 480. I will include them in the next version.
|
|
|
|
toptek
Legendary
Offline
Activity: 1274
Merit: 1000
|
|
January 18, 2017, 06:16:58 AM Last edit: January 18, 2017, 07:17:32 AM by toptek |
|
Well guys, I'm using Version 1.1 on suprnova with my 5 BIOS-modded PowerColor RX 470s clocked at 1250MHz core and 1720MHz memory. If I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152h/s over 1 hr, which breaks down to an effective hashrate of 230.4h/s per card.
So I need to mod my 470 to get 230 like CM with the fee off at stock settings? No complaints; if I have to, I have to... I'd rather use Gateless Gate. Would you be willing to post your exact settings? Before someone misjudges me: I'm not complaining, and I'm not against paying fees. That said, I know it will take a little longer and zawawa is doing his best to catch up, but if someone is actually getting such high speeds, please share your exact settings with us. This is not a demand; only if you're willing and don't mind. Otherwise, it's cool if not.
|
|
|
|
chronek
Sr. Member
Offline
Activity: 273
Merit: 250
BD People Are Legend
|
|
January 18, 2017, 07:41:19 AM Last edit: January 18, 2017, 08:25:54 AM by chronek |
|
So tell me, wise one, how can any developer get >40MB of data to fit into the L2 cache on something like a Rx 480? Do you even know how big the cache is?
Who said anything about getting all the data at once? Just do it asynchronously: while the table is being filled, it is also being used, like a buffer. Right now memory is accessed only when the cores need it, and they do not need it all the time, which is why the MCU is not fully used. With a buffer table, a separate process would fill it continuously, even in the gaps when the cores do not need memory, so more data would be ready to process later when the cores want it. The table can be only a few KB, but the benefit would be faster access and less waiting. It would require redesigning the whole process, though: every calculation would have to push data to one table and get its result from a second one, and each unit would do only its own job, and only when data is in the table. It would not be simple. Sorry, my English is not good and I cannot express everything I want.
|
|
|
|
nerdralph
|
|
January 18, 2017, 11:26:34 AM |
|
So tell me, wise one, how can any developer get >40MB of data to fit into the L2 cache on something like a Rx 480? Do you even know how big the cache is?
Who said anything about getting all the data at once? Just do it asynchronously: while the table is being filled, it is also being used, like a buffer. Right now memory is accessed only when the cores need it, and they do not need it all the time, which is why the MCU is not fully used. With a buffer table, a separate process would fill it continuously, even in the gaps when the cores do not need memory, so more data would be ready to process later when the cores want it. The table can be only a few KB, but the benefit would be faster access and less waiting. It would require redesigning the whole process, though: every calculation would have to push data to one table and get its result from a second one, and each unit would do only its own job, and only when data is in the table. It would not be simple. Sorry, my English is not good and I cannot express everything I want.
No problem, you've clearly expressed that you are talking out your arse.
|
|
|
|
chronek
Sr. Member
Offline
Activity: 273
Merit: 250
BD People Are Legend
|
|
January 18, 2017, 11:42:12 AM |
|
No problem, you've clearly expressed that you are talking out your arse.
And you have shown that you cannot think creatively and that you prefer to reject any new ideas.
|
|
|
|
OhGodAGirl
Full Member
Offline
Activity: 199
Merit: 108
Look, I'm really not that interesting. Promise.
|
|
January 18, 2017, 11:44:39 AM |
|
No problem, you've clearly expressed that you are talking out your arse.
And you have shown that you cannot think creatively and that you prefer to reject any new ideas.
This has nothing to do with creative thinking - there's a limit to how much data can fit in the cache. That's it. You can't add more. You're not being creative, you're being illogical.
|
|
|
|
chronek
Sr. Member
Offline
Activity: 273
Merit: 250
BD People Are Legend
|
|
January 18, 2017, 12:07:48 PM |
|
This has nothing to do with creative thinking - there's a limit to how much data can fit in the cache. That's it. You can't add more. You're not being creative, you're being illogical.
You didn't read it, and neither did he. A 4KB table can fit in the cache.
|
|
|
|
laik2
|
|
January 18, 2017, 03:40:11 PM |
|
This has nothing to do with creative thinking - there's a limit to how much data can fit in the cache. That's it. You can't add more. You're not being creative, you're being illogical.
You didn't read it, and neither did he. A 4KB table can fit in the cache.
I may not understand much about OpenCL or graphics, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above. AFAIK there is no queue mechanism or the like that can be used to queue the solution rate. The protocol specifics don't allow workarounds.
|
|
|
|
th00ber
|
|
January 18, 2017, 03:43:24 PM |
|
vcruntime140.dll is missing on both Win7 and Win10. I tried reinstalling the VC Redist and downloading the missing lib,
but it's still not working... Any tips on how to run this on Windows?
That's pretty weird... Are you using a 32-bit version of Windows by any chance?
64-bit on both... Do you have a release with the full DLL dependencies?
|
|
|
|
joaocha
|
|
January 18, 2017, 03:46:22 PM |
|
vcruntime140.dll is missing on both Win7 and Win10. I tried reinstalling the VC Redist and downloading the missing lib,
but it's still not working... Any tips on how to run this on Windows?
That's pretty weird... Are you using a 32-bit version of Windows by any chance?
64-bit on both... Do you have a release with the full DLL dependencies?
https://www.microsoft.com/en-us/download/confirmation.aspx?id=48145
|
|
|
|
OhGodAGirl
Full Member
Offline
Activity: 199
Merit: 108
Look, I'm really not that interesting. Promise.
|
|
January 18, 2017, 03:53:13 PM |
|
This has nothing to do with creative thinking - there's a limit to how much data can fit in the cache. That's it. You can't add more. You're not being creative, you're being illogical.
You didn't read it, and neither did he. A 4KB table can fit in the cache.
I may not understand much about OpenCL or graphics, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above. AFAIK there is no queue mechanism or the like that can be used to queue the solution rate. The protocol specifics don't allow workarounds.
You're correct.
|
|
|
|
chronek
Sr. Member
Offline
Activity: 273
Merit: 250
BD People Are Legend
|
|
January 18, 2017, 04:18:25 PM Last edit: January 18, 2017, 04:33:16 PM by chronek |
|
I may not understand much about OpenCL or graphics, but as a network engineer I still think that hardware limits cannot be exceeded for the purpose mentioned above. AFAIK there is no queue mechanism or the like that can be used to queue the solution rate. The protocol specifics don't allow workarounds.
Yes, we cannot exceed hardware limits, but right now the miner uses 63% of the MCU and 80% of the power, so it does not utilize the full hardware capacity. Why? I suspect the miner's computation has two phases: one where it fetches from memory and the cores wait (are blocked), and a second where the cores compute and memory is not used.
Put simply, the threads currently do this:
[[external memory read to registers][comp][external memory write result]][[external memory read to registers][comp][external memory write result]]
So during one part the cores are waiting, and during the other the memory is not used...
What I want (all at the same time):
thread1: [external memory read to cache][external memory read to cache][external memory read to cache]
thread2: [[read cache][comp][write cache]][[read cache][comp][write cache]][[read cache][comp][write cache]]
thread3: [external memory write result][external memory write result][external memory write result]
Yes, it has flaws in its logic, but why can't I discuss it?
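To put rough numbers on the serial-versus-overlapped schedule described above, here is a minimal timing sketch in Python. It is only an idealized model, not how any actual miner or GPU works: the stage times (6, 3, and 1 units) and the function names are made up for illustration, and it assumes the three stages overlap perfectly with no contention. Real GPUs already hide part of the memory latency by switching between wavefronts, so the gain in practice would likely be smaller than this model suggests.

```python
def serial_time(n_items, t_read, t_comp, t_write):
    """Every item does read -> compute -> write with nothing overlapped."""
    return n_items * (t_read + t_comp + t_write)


def pipelined_time(n_items, t_read, t_comp, t_write):
    """Three stages work on different items at the same time (ideal overlap).

    The first item has to pass through all three stages; after that, one
    item finishes per interval of the slowest stage.
    """
    if n_items == 0:
        return 0
    bottleneck = max(t_read, t_comp, t_write)
    return (t_read + t_comp + t_write) + (n_items - 1) * bottleneck


if __name__ == "__main__":
    # Hypothetical stage times in arbitrary units; memory reads dominate.
    n, t_read, t_comp, t_write = 1000, 6, 3, 1
    s = serial_time(n, t_read, t_comp, t_write)
    p = pipelined_time(n, t_read, t_comp, t_write)
    print(f"serial: {s}, pipelined: {p}, speedup: {s / p:.2f}x")
```

Even in this best case the pipelined schedule is bounded by the slowest stage, so if the memory reads are the bottleneck, overlapping them with compute cannot push throughput past what the memory can deliver.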
|
|
|
|
cryptominer420
|
|
January 18, 2017, 04:20:34 PM |
|
Well guys, I'm using Version 1.1 on suprnova with my 5 BIOS-modded PowerColor RX 470s clocked at 1250MHz core and 1720MHz memory. If I set a static difficulty of 1500, my reported average hashrate on the pool side is 1152h/s over 1 hr, which breaks down to an effective hashrate of 230.4h/s per card.
So I need to mod my 470 to get 230 like CM with the fee off at stock settings? No complaints; if I have to, I have to... I'd rather use Gateless Gate. Would you be willing to post your exact settings? Before someone misjudges me: I'm not complaining, and I'm not against paying fees. That said, I know it will take a little longer and zawawa is doing his best to catch up, but if someone is actually getting such high speeds, please share your exact settings with us. This is not a demand; only if you're willing and don't mind. Otherwise, it's cool if not.
My cards all have custom memory straps, and I use driver 16.6.2. Power is +5, the core runs at 1250, and the memory runs at 1720. The miner itself reports an average of 201h/s for each card, but when I mine at a static difficulty of 1500 on suprnova, I get an average pool-side hashrate of over 1150h/s for my 5-card rig. I do have to say that I killed my 6th RX 470 finding the ideal memory straps for my PowerColor cards. The first thing to try is setting your difficulty to 1500 by using d=1500 as your password on suprnova pools and seeing what you get. I am not planning to share my straps, as I would feel terrible if someone used them and fried their card.
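The miner-side and pool-side figures above do not match, so here is a quick Python sketch of the arithmetic. It takes 1150h/s as the pool-side total (the post says "over 1150"), and the function names are just for illustration. Keep in mind that pool-side hashrate is typically estimated from submitted shares, so a one-hour average can drift away from what the miner itself reports.

```python
CARDS = 5
MINER_REPORTED_PER_CARD = 201.0  # h/s, as shown by the miner
POOL_REPORTED_TOTAL = 1150.0     # h/s, pool-side average (lower bound from the post)


def per_card(total_hashrate, cards):
    """Split a rig-wide hashrate evenly across the cards."""
    return total_hashrate / cards


def percent_gap(pool_total, miner_total):
    """How far the pool-side figure sits above the miner-side figure, in percent."""
    return (pool_total - miner_total) / miner_total * 100.0


if __name__ == "__main__":
    miner_total = MINER_REPORTED_PER_CARD * CARDS
    print(f"miner-side total:   {miner_total:.1f} h/s")
    print(f"pool-side per card: {per_card(POOL_REPORTED_TOTAL, CARDS):.1f} h/s")
    print(f"gap:                {percent_gap(POOL_REPORTED_TOTAL, miner_total):.1f}%")
```

A gap of this size over a single hour is worth re-checking over a longer window before treating the pool-side number as the real speed.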
|
|
|
|
|