Bitcoin Forum
Topic: Core, Bus, Cache... What is more important for mining?
Come-from-Beyond
December 25, 2011, 08:23:29 PM
 #1

I tested my LTC miner an hour ago. The main task was to adjust the ratio of hashes per GHz. I lowered it to 1.87 (1870 hashes/s per GHz), and it was 1.8-1.9 on most machines. But one computer had a ratio of 2.1, and I noticed no difference in hardware compared to the other machines. That puzzled me. Could u look at this screenshot and tell me what is so special about this machine? I have no idea.

0ni0ns
December 25, 2011, 10:14:00 PM
 #2

Instruction set/pipelining/instructions per cycle are just as important as GHz when looking at CPUs... What were the other ones? The i5 probably takes fewer cycles per hash.

ssvb
December 25, 2011, 10:51:43 PM
 #3

Quote
I tested my LTC miner an hour ago. The main task was to adjust the ratio of hashes per GHz. I lowered it to 1.87 (1870 hashes/s per GHz), and it was 1.8-1.9 on most machines. But one computer had a ratio of 2.1, and I noticed no difference in hardware compared to the other machines. That puzzled me. Could u look at this screenshot and tell me what is so special about this machine? I have no idea.
It could also be the effect of turbo boost. Can you share any additional details about your implementation? Are you calculating more than one hash in parallel? If so, it is important for the working set not to exceed the L2 cache size per CPU core.

Right now, with my own tweaks coincidentally developed just today (for now using intrinsics for prototyping purposes), I'm getting ~3.4 khash/s when calculating one hash at a time and ~5.1 khash/s when calculating two hashes at once, running a single thread on an Intel Core i7 860 @2.8GHz (turbo boost disabled). pooler's hand-tuned assembly code runs at ~4.0 khash/s under the same conditions on my machine. But if hyperthreading comes into action, it ruins everything :) With 8 threads in total, both my implementation and pooler's get roughly the same ~3 khash/s per thread. Two hashes at once use double the working set (~256K of memory), and that's exactly the size of the L2 cache. There is no room for the second hardware thread on the same core, and hyperthreading is apparently thrashing the L2 cache, killing the performance benefits. Converting the code to hand-tuned assembly may turn the tables though.

BTW, your Sandy Bridge CPU should also have an advantage per MHz over my Nehalem because it does not suffer from register read stalls, which are a pest under register pressure.
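
The cache argument in this post can be checked with a little arithmetic. The sketch below is not anyone's miner code. It assumes the standard scrypt scratchpad size of N blocks of 128*r bytes, takes N = 1024 and r = 1 from the configuration mentioned later in the thread, and ignores the small per-hash state outside the scratchpad. The function names and the 256K L2 constant are illustrative.

```python
# Sketch: does the scrypt scratchpad working set fit in one core's L2 cache?
# Assumptions (illustrative, not taken from any miner's source):
#   - scratchpad size per hash is 128 * r * N bytes (scrypt's V array)
#   - every hardware thread on a core works on its own set of hashes
#   - L2 is 256 KiB per core, as quoted in the post above

L2_BYTES = 256 * 1024


def scratchpad_bytes(n: int, r: int) -> int:
    """Bytes of scratchpad needed for one scrypt hash."""
    return 128 * r * n


def working_set_bytes(n: int, r: int, hashes_per_thread: int, threads_per_core: int) -> int:
    """Total scratchpad bytes competing for one core's L2 cache."""
    return scratchpad_bytes(n, r) * hashes_per_thread * threads_per_core


def fits_in_l2(n: int, r: int, hashes_per_thread: int, threads_per_core: int) -> bool:
    return working_set_bytes(n, r, hashes_per_thread, threads_per_core) <= L2_BYTES


if __name__ == "__main__":
    for hashes, threads in [(1, 1), (2, 1), (1, 2), (2, 2)]:
        ws = working_set_bytes(1024, 1, hashes, threads)
        verdict = "fits" if fits_in_l2(1024, 1, hashes, threads) else "does not fit"
        print(f"{hashes} hash(es) x {threads} thread(s): {ws // 1024} KiB, {verdict}")
```

Under these assumptions, two hashes on one thread fill the 256K exactly, and adding a second hardware thread with two hashes of its own doubles the demand, which matches the thrashing described above.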
Come-from-Beyond
December 26, 2011, 05:31:29 AM
 #4

Quote
Instruction set/pipelining/instructions per cycle are just as important as GHz when looking at CPUs... What were the other ones? The i5 probably takes fewer cycles per hash.

As far as I recall, there was no i5. Only this one...
Come-from-Beyond
December 26, 2011, 05:37:57 AM
 #5

Quote
It could also be the effect of turbo boost. Can you share any additional details about your implementation? Are you calculating more than one hash in parallel? If so, it is important for the working set not to exceed the L2 cache size per CPU core.

Right now, with my own tweaks coincidentally developed just today (for now using intrinsics for prototyping purposes), I'm getting ~3.4 khash/s when calculating one hash at a time and ~5.1 khash/s when calculating two hashes at once, running a single thread on an Intel Core i7 860 @2.8GHz (turbo boost disabled). pooler's hand-tuned assembly code runs at ~4.0 khash/s under the same conditions on my machine. But if hyperthreading comes into action, it ruins everything :) With 8 threads in total, both my implementation and pooler's get roughly the same ~3 khash/s per thread. Two hashes at once use double the working set (~256K of memory), and that's exactly the size of the L2 cache. There is no room for the second hardware thread on the same core, and hyperthreading is apparently thrashing the L2 cache, killing the performance benefits. Converting the code to hand-tuned assembly may turn the tables though.

BTW, your Sandy Bridge CPU should also have an advantage per MHz over my Nehalem because it does not suffer from register read stalls, which are a pest under register pressure.

I calculate 2 hashes at once to avoid register/memory read stalls (xmm0-xmm3 for the 1st hash and xmm4-xmm7 for the 2nd one). So there was something else that increased performance. After u mentioned cache, I noticed that the computer from the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
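
The "two hashes at once" idea relies on the two Salsa20/8 computations being completely independent, so their instructions can be issued alternately. The sketch below only illustrates that data independence in plain Python. It is not the poster's SSE code, it does not reproduce the xmm register layout, and it shows no speedup by itself. The helper names (`schedule`, `salsa20_8_pair`) are made up for this example.

```python
# Sketch: Salsa20/8 core on one block, and on two independent blocks in lockstep.
# Blocks are lists of 16 unsigned 32-bit words.

MASK = 0xFFFFFFFF

# Quarter-round index tuples (a, b, c, d): four column rounds, then four row rounds.
QUARTERS = [
    (0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),
    (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14),
]


def rotl(value, bits):
    """Rotate a 32-bit word left."""
    value &= MASK
    return ((value << bits) | (value >> (32 - bits))) & MASK


def schedule():
    """Yield (dst, src1, src2, rotation) for every step of one double round."""
    for a, b, c, d in QUARTERS:
        yield b, a, d, 7
        yield c, b, a, 9
        yield d, c, b, 13
        yield a, d, c, 18


def step(x, dst, src1, src2, bits):
    x[dst] ^= rotl(x[src1] + x[src2], bits)


def salsa20_8(block):
    """Salsa20 core with 8 rounds (4 double rounds) on one block."""
    x = list(block)
    for _ in range(4):
        for dst, s1, s2, bits in schedule():
            step(x, dst, s1, s2, bits)
    return [(x[i] + block[i]) & MASK for i in range(16)]


def salsa20_8_pair(block_a, block_b):
    """Same computation on two blocks, alternating one step of A with one step of B."""
    xa, xb = list(block_a), list(block_b)
    for _ in range(4):
        for dst, s1, s2, bits in schedule():
            step(xa, dst, s1, s2, bits)  # hash A
            step(xb, dst, s1, s2, bits)  # hash B never reads A's state
    out_a = [(xa[i] + block_a[i]) & MASK for i in range(16)]
    out_b = [(xb[i] + block_b[i]) & MASK for i in range(16)]
    return out_a, out_b
```

Because no step of B depends on a result from A, a CPU running the interleaved instruction stream can start a step of B while a step of A is still waiting on its previous result. That is the stall-hiding effect described in the post.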
ssvb
December 26, 2011, 07:54:52 AM
 #6

Quote
I calculate 2 hashes at once to avoid register/memory read stalls (xmm0-xmm3 for the 1st hash and xmm4-xmm7 for the 2nd one).
OK, this explains why we get similar performance improvements (using pooler's code as a baseline for comparison). Do you have any other tricks up your sleeve? Unless you want to keep them secret, of course ;) Anyway, I got the impression that you managed to somehow simplify the algorithm (for the N = 1024, r = 1, p = 1 configuration) and reduce the number of arithmetic operations based on the information from this post.

Quote
So there was something else that increased performance.
My guess is that it's because Sandy Bridge is a nice microarchitecture improvement over previous processors (eliminated register read stalls and instruction decoder bottlenecks). Agner Fog explains it quite well. Various benchmarkers/reviewers have also confirmed better performance per MHz.

Quote
After u mentioned cache, I noticed that the computer from the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
I think 4 x 32 just means four cores with 32K of L1 each. The other processor was likely dual-core. 32K is much smaller than the 128K needed for a hash calculation, which means that the L2 cache plays a more significant role. Because each core has 256K of L2 cache, parallel computation of two hashes at once makes a lot of sense, at least for processors without hyperthreading.

What are your plans regarding your miner?
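
For readers who want to see the N = 1024, r = 1, p = 1 configuration in action, here is a minimal sketch using the scrypt function from Python's standard library (available when Python is built against a recent OpenSSL). It is a plain reference call, not an optimized miner. Using the input as both password and salt, the 80-byte all-zero placeholder header, and the function name `scrypt_1024_1_1` are assumptions made for this illustration.

```python
# Sketch: one scrypt hash with the N=1024, r=1, p=1 parameters from the post above.
import hashlib


def scrypt_1024_1_1(header: bytes) -> bytes:
    """Return a 32-byte scrypt digest of `header`.

    Assumption: the header serves as both password and salt.
    """
    return hashlib.scrypt(header, salt=header, n=1024, r=1, p=1, dklen=32)


if __name__ == "__main__":
    header = bytes(80)  # placeholder: 80 zero bytes standing in for a block header
    print(scrypt_1024_1_1(header).hex())
```

Timing a loop of such calls on one thread gives a rough hashes-per-second baseline to compare against the khash/s figures quoted in this thread.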
Come-from-Beyond
December 26, 2011, 08:13:45 AM
 #7

Quote
I calculate 2 hashes at once to avoid register/memory read stalls (xmm0-xmm3 for the 1st hash and xmm4-xmm7 for the 2nd one).
OK, this explains why we get similar performance improvements (using pooler's code as a baseline for comparison). Do you have any other tricks up your sleeve? Unless you want to keep them secret, of course ;) Anyway, I got the impression that you managed to somehow simplify the algorithm (for the N = 1024, r = 1, p = 1 configuration) and reduce the number of arithmetic operations based on the information from this post.

Quote
So there was something else that increased performance.
My guess is that it's because Sandy Bridge is a nice microarchitecture improvement over previous processors (eliminated register read stalls and instruction decoder bottlenecks). Agner Fog explains it quite well. Various benchmarkers/reviewers have also confirmed better performance per MHz.

Quote
After u mentioned cache, I noticed that the computer from the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
I think 4 x 32 just means four cores with 32K of L1 each. The other processor was likely dual-core. 32K is much smaller than the 128K needed for a hash calculation, which means that the L2 cache plays a more significant role. Because each core has 256K of L2 cache, parallel computation of two hashes at once makes a lot of sense, at least for processors without hyperthreading.

What are your plans regarding your miner?

Yes, I removed some redundant calculations from salsa (not all though, I use them to tweak the ratio per GHz). It's the only trick I use. All others are just standard optimization techniques.

I'd like to publish my miner but I wish to have it bonded to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do u know anyone btw?
Also I'm not going to create Linux or MacOS editions of the miner to avoid situations when all miners (human) will use only my miner (soft).
ssvb
December 26, 2011, 02:29:05 PM
 #8

Quote
Yes, I removed some redundant calculations from salsa (not all though, I use them to tweak the ratio per GHz). It's the only trick I use. All others are just standard optimization techniques.
I'm not sure if I fully understand this part. What does "tweak the ratio per GHz" mean?

Quote
I'd like to publish my miner but I wish to have it bonded to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do u know anyone btw?
Also I'm not going to create Linux or MacOS editions of the miner to avoid situations when all miners (human) will use only my miner (soft).
Have you noticed that pooler has also updated his miner and introduced processing of two hashes at once? And his commit on GitHub actually seems to be 2 days old already. Does your miner have any speed advantage over it? After all, your benchmark numbers look pretty normal for "two hashes at once" processing. How much does the algorithmic optimization actually contribute?
Come-from-Beyond
December 26, 2011, 04:03:16 PM
 #9

>> I'm not sure if I fully understand this part. What does "tweak the ratio per GHz" mean?

When I said "tweak" I meant "lowering". Yes, I lower the mining rate, coz there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

>> Have you noticed that pooler has also updated his miner and introduced processing of two hashes at once?

No, I didn't know that. If he didn't follow my advice to calculate 2 hashes in a manner where the instructions are doubled to avoid CPU stalls, then he still has a trick to use.

>> Does your miner have any speed advantage over it?

The current version calculates 2000 hashes per 1 GHz. If pooler's miner has the same ratio, then there is no speed advantage in using my miner. I can't test it myself coz the Windows version of pooler's miner crashes on my computer.

>> After all, your benchmark numbers look pretty normal for "two hashes at once" processing.

Yes, seems so.

>> How much does the algorithmic optimization actually contribute?

Not much. When I remove all the unnecessary code I get 2500 hashes per 1 GHz.
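
The "hashes per GHz" figures used throughout the thread are just hash rate divided by clock speed. A minimal sketch of that arithmetic follows, using only numbers quoted in the posts (5.1 khash/s at 2.8 GHz, and 2000 versus 2500 hashes per GHz). The function name is illustrative.

```python
# Sketch: the hashes-per-GHz ratio discussed in this thread.

def hashes_per_ghz(hashes_per_second: float, clock_ghz: float) -> float:
    """Normalize a hash rate by CPU clock so different machines can be compared."""
    return hashes_per_second / clock_ghz


# ssvb's two-hashes-at-once figure: ~5.1 khash/s on a 2.8 GHz core
ssvb_ratio = hashes_per_ghz(5100, 2.8)

# Published (deliberately lowered) ratio versus the ratio with the extra code removed
published, unthrottled = 2000, 2500
headroom = unthrottled / published - 1  # fraction of speed held back

if __name__ == "__main__":
    print(f"ssvb: about {ssvb_ratio:.0f} hashes per GHz")
    print(f"held back: {headroom:.0%}")
```

By this measure ssvb's result lands in the same 1.8-1.9 range reported in the first post, and the unthrottled build would be a quarter faster than the published one.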
ineededausername
December 27, 2011, 11:49:12 PM
 #10

Quote
When I said "tweak" I meant "lowering". Yes, I lower the mining rate, coz there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

Why would you lower it when you can make it better and when you're going to give it out for free anyways?  That seems really counterintuitive.

Graet
December 28, 2011, 09:26:55 AM
 #11



Quote
I'd like to publish my miner but I wish to have it bonded to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do u know anyone btw?
Also I'm not going to create Linux or MacOS editions of the miner to avoid situations when all miners (human) will use only my miner (soft).
why "bond to a pool" and not allow to be used by all

all of the pools have threads in this forum, you could post there or private message one? (they don't hide) :D

in my experience - even when there is an excellent miner (soft) many users (for whatever reason) will continue to use the one they have or other miners they think work better for them, if your software is truly revolutionary someone will port it - even if you don't :)

Come-from-Beyond
December 28, 2011, 03:35:00 PM
 #12

Quote
When I said "tweak" I meant "lowering". Yes, I lower the mining rate, coz there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

Why would you lower it when you can make it better and when you're going to give it out for free anyways?  That seems really counterintuitive.

If everyone uses a fast miner, it's the same as everyone using a slow one, coz of DIFFICULTY adjustment. So there's no point in giving it to everyone.
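
The difficulty argument can be made concrete with a simplified model: if difficulty retargets so that blocks keep arriving at a fixed rate, a miner's expected share of the blocks equals their fraction of the total network hash rate. The sketch below assumes exactly that and nothing more. The hash-rate numbers are made up for illustration.

```python
# Sketch: why a universal speedup changes nobody's expected share of blocks.
# Simplifying assumption: difficulty keeps the block rate constant, so
# expected share of blocks = own hash rate / total network hash rate.
from fractions import Fraction


def expected_share(my_rate, others_rate):
    """Expected fraction of blocks found by one miner."""
    return Fraction(my_rate) / (Fraction(my_rate) + Fraction(others_rate))


# Hypothetical network: I run 2000 h/s, everyone else combined runs 98000 h/s.
baseline = expected_share(2000, 98000)

# Everyone, including me, switches to a miner that is 25% faster.
everyone_faster = expected_share(2500, 122500)

# Only I get a 10% faster miner.
only_me_faster = expected_share(2200, 98000)

if __name__ == "__main__":
    print("baseline share:       ", float(baseline))
    print("everyone 25% faster:  ", float(everyone_faster))
    print("only me 10% faster:   ", float(only_me_faster))
```

In this model the share only moves when one miner is faster relative to the rest, which is the reasoning behind keeping the fast miner to a limited group.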
Come-from-Beyond
December 28, 2011, 03:37:56 PM
 #13

Quote
why "bond to a pool" and not allow to be used by all

Just to earn some money for beer.
bitlane
January 19, 2012, 07:26:42 PM
 #14

Quote
If everyone uses a fast miner, it's the same as everyone using a slow one, coz of DIFFICULTY adjustment. So there's no point in giving it to everyone.

Why not offer it up for sale then? Require a key or some sort of license validation and make some cash from it.

There is a pretty fancy BTC Mining Utility on this forum that is the flashiest I have ever seen and looks awesome, but it is also tied to a pool and unfortunately, that pool is not as popular as I am sure the Pool owner would like it to be......regardless of his awesome performing/looking software.
The 'Pool Binding' idea is bad....just sell it.

CAMOPEJB
January 20, 2012, 06:08:55 PM
 #15

Quote
why "bond to a pool" and not allow to be used by all

Just to earn some money for beer.

I would recommend http://pool-x.eu

Contact info:
skype: ag2x3k
irc:#pool-x.eu @ Freenode iRC ( http://webchat.freenode.net/?channels=#pool-x.eu )
web: g2x3k@bitcointalk.org
Come-from-Beyond
January 20, 2012, 08:05:26 PM
 #16

Thx, but I switched to another project.