Bitcoin Forum
Author Topic: Core, Bus, Cache... What is more important for mining?  (Read 2184 times)
Come-from-Beyond (OP)
Legendary
*
Offline

Activity: 2142
Merit: 1010

Newbie


View Profile
December 25, 2011, 08:23:29 PM
 #1

I tested my miner for LTC an hour ago. The main task was to adjust the ratio of hashes per GHz. I lowered it to 1.87 (1870 hashes per 1 GHz), and it was 1.8–1.9 on most machines. But one computer had a 2.1 ratio, and I noticed no difference in hardware compared to the other machines. That puzzled me. Could you look at this screenshot and tell me what is so special about this machine? I have no idea.

0ni0ns
Newbie
*
Offline

Activity: 40
Merit: 0



View Profile
December 25, 2011, 10:14:00 PM
 #2

Instruction set, pipelining, and instructions per cycle are just as important as raw GHz when comparing CPUs... What were the other ones? The i5 probably takes fewer cycles per hash.
ssvb
Newbie
*
Offline

Activity: 39
Merit: 0


View Profile
December 25, 2011, 10:51:43 PM
 #3

I tested my miner for LTC an hour ago. The main task was to adjust the ratio of hashes per GHz. I lowered it to 1.87 (1870 hashes per 1 GHz), and it was 1.8–1.9 on most machines. But one computer had a 2.1 ratio, and I noticed no difference in hardware compared to the other machines. That puzzled me. Could you look at this screenshot and tell me what is so special about this machine? I have no idea.
It could also be the effect of turbo boost. Can you share any additional details about your implementation? Are you calculating more than one hash in parallel? If so, it is important for the working set not to exceed the L2 cache size per CPU core.

Right now, with my own tweaks coincidentally developed just today (using intrinsics for prototyping purposes for now), I'm getting ~3.4 khash/s when calculating one hash at a time and ~5.1 khash/s when calculating two hashes at once, running a single thread on an Intel Core i7 860 @ 2.8 GHz (turbo boost disabled). Pooler's hand-tuned assembly code runs at ~4.0 khash/s under the same conditions on my machine. But when hyperthreading comes into action, it ruins everything. With 8 threads in total, both my and pooler's implementations get roughly the same ~3 khash/s per thread. Two hashes at once use a double working set (~256K of memory), and that's exactly the size of the L2 cache. There is no room for a second hardware thread on the same core, and hyperthreading is apparently thrashing the L2 cache, killing the performance benefits. Converting the code to hand-tuned assembly may turn the tables, though.

BTW, your Sandy Bridge CPU should also have an advantage per MHz over my Nehalem, because it does not suffer from the register read stalls which are a pest under register pressure.
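The 128K/256K figures above follow directly from scrypt's memory formula. A minimal sketch, assuming the standard 128·r·N scratchpad size from the scrypt definition with Litecoin's N = 1024, r = 1 parameters:

```c
#include <stddef.h>

/* scrypt's scratchpad (the V array) is 128 * r * N bytes. With
 * Litecoin's N = 1024, r = 1 that is 128 KiB per hash, so two
 * interleaved hashes need 256 KiB -- exactly the 256K per-core L2
 * on Nehalem, leaving no room for a second hyperthread. */
static size_t scrypt_working_set(size_t N, size_t r) {
    return 128u * r * N;
}
```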
Come-from-Beyond (OP)
Legendary
*
Offline

Activity: 2142
Merit: 1010

Newbie


View Profile
December 26, 2011, 05:31:29 AM
 #4

Instruction set, pipelining, and instructions per cycle are just as important as raw GHz when comparing CPUs... What were the other ones? The i5 probably takes fewer cycles per hash.

I recall there was no i5. Only this one...
Come-from-Beyond (OP)
Legendary
*
Offline

Activity: 2142
Merit: 1010

Newbie


View Profile
December 26, 2011, 05:37:57 AM
 #5

It could also be the effect of turbo boost. Can you share any additional details about your implementation? Are you calculating more than one hash in parallel? If so, it is important for the working set not to exceed the L2 cache size per CPU core.

Right now, with my own tweaks coincidentally developed just today (using intrinsics for prototyping purposes for now), I'm getting ~3.4 khash/s when calculating one hash at a time and ~5.1 khash/s when calculating two hashes at once, running a single thread on an Intel Core i7 860 @ 2.8 GHz (turbo boost disabled). Pooler's hand-tuned assembly code runs at ~4.0 khash/s under the same conditions on my machine. But when hyperthreading comes into action, it ruins everything. With 8 threads in total, both my and pooler's implementations get roughly the same ~3 khash/s per thread. Two hashes at once use a double working set (~256K of memory), and that's exactly the size of the L2 cache. There is no room for a second hardware thread on the same core, and hyperthreading is apparently thrashing the L2 cache, killing the performance benefits. Converting the code to hand-tuned assembly may turn the tables, though.

BTW, your Sandy Bridge CPU should also have an advantage per MHz over my Nehalem, because it does not suffer from the register read stalls which are a pest under register pressure.

I calculate 2 hashes at once to avoid register/memory read stalls (xmm0–xmm3 for the 1st hash and xmm4–xmm7 for the 2nd). So there was something else that increased performance. After you mentioned cache, I noticed that the computer in the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
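The register-splitting idea can be illustrated with the Salsa20 quarter-round (a hypothetical scalar sketch, not the miner's actual SSE2 code): processing two independent states in one routine gives the CPU adjacent instructions with no data dependency between them, so they can overlap in the pipeline instead of stalling.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Salsa20 quarter-round over one 4-word state: every line depends on
 * the previous one, so a single stream leaves pipeline slots idle. */
static void quarter_single(uint32_t s[4]) {
    s[1] ^= rotl32(s[0] + s[3], 7);
    s[2] ^= rotl32(s[1] + s[0], 9);
    s[3] ^= rotl32(s[2] + s[1], 13);
    s[0] ^= rotl32(s[3] + s[2], 18);
}

/* Same round over two independent states, interleaved: adjacent
 * instructions touch different states (like xmm0-xmm3 vs xmm4-xmm7),
 * so the CPU can overlap their latencies. */
static void quarter_dual(uint32_t a[4], uint32_t b[4]) {
    a[1] ^= rotl32(a[0] + a[3], 7);   b[1] ^= rotl32(b[0] + b[3], 7);
    a[2] ^= rotl32(a[1] + a[0], 9);   b[2] ^= rotl32(b[1] + b[0], 9);
    a[3] ^= rotl32(a[2] + a[1], 13);  b[3] ^= rotl32(b[2] + b[1], 13);
    a[0] ^= rotl32(a[3] + a[2], 18);  b[0] ^= rotl32(b[3] + b[2], 18);
}
```

Both routines compute identical per-state results; only the instruction scheduling differs.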
ssvb
Newbie
*
Offline

Activity: 39
Merit: 0


View Profile
December 26, 2011, 07:54:52 AM
 #6

I calculate 2 hashes at once to avoid register/memory read stalls (xmm0–xmm3 for the 1st hash and xmm4–xmm7 for the 2nd).
OK, this explains why we get similar performance improvements (using pooler's code as a baseline for comparison). Do you have any other tricks up your sleeve? Unless you want to keep them secret, of course. Anyway, I got the impression that you managed to somehow simplify the algorithm (for the N = 1024, r = 1, p = 1 configuration) and reduce the number of arithmetic operations, based on the information from this post.

Quote
So there was something else that increased performance.
My guess is that it's because Sandy Bridge is a nice microarchitecture improvement over previous processors (it eliminated register read stalls and instruction decoder bottlenecks). Agner Fog explains it quite well. Various benchmarkers/reviewers have also confirmed better performance per MHz.

Quote
After you mentioned cache, I noticed that the computer in the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
I think 4 x 32 just means four cores with 32K of L1 each. The other processor was likely a dual core. 32K is much smaller than the 128K needed for hash calculation, which means that the L2 cache plays a more significant role. Because each core has 256K of L2 cache, parallel computation of two hashes at once makes a lot of sense, at least for processors without hyperthreading.

What are your plans regarding your miner?
Come-from-Beyond (OP)
Legendary
*
Offline

Activity: 2142
Merit: 1010

Newbie


View Profile
December 26, 2011, 08:13:45 AM
 #7

I calculate 2 hashes at once to avoid register/memory read stalls (xmm0–xmm3 for the 1st hash and xmm4–xmm7 for the 2nd).
OK, this explains why we get similar performance improvements (using pooler's code as a baseline for comparison). Do you have any other tricks up your sleeve? Unless you want to keep them secret, of course. Anyway, I got the impression that you managed to somehow simplify the algorithm (for the N = 1024, r = 1, p = 1 configuration) and reduce the number of arithmetic operations, based on the information from this post.

Quote
So there was something else that increased performance.
My guess is that it's because Sandy Bridge is a nice microarchitecture improvement over previous processors (it eliminated register read stalls and instruction decoder bottlenecks). Agner Fog explains it quite well. Various benchmarkers/reviewers have also confirmed better performance per MHz.

Quote
After you mentioned cache, I noticed that the computer in the picture had a very big cache size, 4 x 32 instead of 2 x 32 like on the other machines. I suspect that cache size is more important.
I think 4 x 32 just means four cores with 32K of L1 each. The other processor was likely a dual core. 32K is much smaller than the 128K needed for hash calculation, which means that the L2 cache plays a more significant role. Because each core has 256K of L2 cache, parallel computation of two hashes at once makes a lot of sense, at least for processors without hyperthreading.

What are your plans regarding your miner?

Yes, I removed some redundant calculations from salsa (not all of them, though; I use them to tweak the ratio per GHz). It's the only trick I use. All the others are just standard optimization techniques.

I'd like to publish my miner, but I want to have it bound to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do you know anyone, by the way?
Also, I'm not going to create Linux or MacOS editions of the miner, to avoid a situation where all miners (human) use only my miner (software).
ssvb
Newbie
*
Offline

Activity: 39
Merit: 0


View Profile
December 26, 2011, 02:29:05 PM
 #8

Yes, I removed some redundant calculations from salsa (not all of them, though; I use them to tweak the ratio per GHz). It's the only trick I use. All the others are just standard optimization techniques.
I'm not sure I fully understand this part. What does "tweak the ratio per GHz" mean?

Quote
I'd like to publish my miner, but I want to have it bound to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do you know anyone, by the way?
Also, I'm not going to create Linux or MacOS editions of the miner, to avoid a situation where all miners (human) use only my miner (software).
Have you noticed that pooler has also updated his miner and introduced processing of two hashes at once? His commit on GitHub actually seems to be 2 days old already. Does your miner have any speed advantage over it? After all, your benchmark numbers look pretty normal for "two hashes at once" processing. How much does the algorithmic optimization actually contribute?
Come-from-Beyond (OP)
Legendary
*
Offline

Activity: 2142
Merit: 1010

Newbie


View Profile
December 26, 2011, 04:03:16 PM
 #9

>> I'm not sure I fully understand this part. What does "tweak the ratio per GHz" mean?

When I said "tweak" I meant "lowering". Yes, I lower the mining rate, because there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

>> Have you noticed that pooler has also updated his miner and introduced processing of two hashes at once?

No, I didn't know that. If he didn't follow my advice to calculate 2 hashes in a manner where the instructions are doubled (interleaved) to avoid CPU stalls, then he still has a trick left to use.

>> Does your miner have any speed advantage over it?

The current version calculates 2000 hashes per 1 GHz. If pooler's miner has the same ratio, then there is no speed advantage to using my miner. I can't test it myself, because the Windows version of pooler's miner crashes on my computer.

>> After all, your benchmark numbers look pretty normal for "two hashes at once" processing.

Yes, it seems so.

>> How much does the algorithmic optimization actually contribute?

Not much; when I remove all the unnecessary code I get 2500 hashes per 1 GHz.
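Reading "hashes per GHz" as hashes per second per GHz of clock, these ratios convert to absolute rates with a simple multiplication. A back-of-the-envelope sketch, assuming (hypothetically) the 2.8 GHz i7 860 mentioned earlier in the thread:

```c
/* Convert the thread's "hashes per GHz" ratio into an absolute rate.
 * 2000 h/GHz at 2.8 GHz gives 5.6 khash/s, while the fully stripped
 * 2500 h/GHz gives 7.0 khash/s -- so the algorithmic trims are worth
 * about 25% on top of the published build. */
static double khash_per_sec(double hashes_per_ghz, double ghz) {
    return hashes_per_ghz * ghz / 1000.0;
}
```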
ineededausername
Hero Member
*****
Offline

Activity: 784
Merit: 1000


bitcoin hundred-aire


View Profile
December 27, 2011, 11:49:12 PM
 #10

When I said "tweak" I meant "lowering". Yes, I lower the mining rate, because there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

Why would you lower it when you can make it better, and when you're going to give it out for free anyway? That seems really counterintuitive.

Graet
VIP
Legendary
*
Offline

Activity: 980
Merit: 1001



View Profile WWW
December 28, 2011, 09:26:55 AM
 #11



I'd like to publish my miner, but I want to have it bound to a particular pool, so now I'm looking for a pool owner who could cooperate with me. Do you know anyone, by the way?
Also, I'm not going to create Linux or MacOS editions of the miner, to avoid a situation where all miners (human) use only my miner (software).
Why "bond to a pool" and not allow it to be used by all?

All of the pools have threads on this forum; you could post there or private-message one of the owners (they don't hide).

In my experience, even when there is an excellent miner (software), many users will (for whatever reason) continue to use the one they have, or other miners they think work better for them. If your software is truly revolutionary, someone will port it even if you don't.

Come-from-Beyond (OP)
Legendary
*
Offline

Activity: 2142
Merit: 1010

Newbie


View Profile
December 28, 2011, 03:35:00 PM
 #12

When I said "tweak" I meant "lowering". Yes, I lower the mining rate, because there is no need to publish a very fast miner. It's enough for me if it's just 10% faster than any other miner.

Why would you lower it when you can make it better, and when you're going to give it out for free anyway? That seems really counterintuitive.

If everyone uses a fast miner, that is the same as everyone using a slow one, because of DIFFICULTY adjusting. So there is no point in giving it to everyone.
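The difficulty argument can be made precise with a one-liner (a sketch, under the usual simplifying assumption that block finds are proportional to hashrate): a miner's expected share of the reward is its hashrate over the network total, and that ratio is unchanged when everyone's hashrate scales by the same factor.

```c
/* Expected fraction of blocks found by a miner, assuming finds are
 * proportional to hashrate. Scaling both arguments by the same factor
 * (everyone adopting the same faster miner, with difficulty retargeting
 * to match) leaves the share -- and hence each miner's revenue --
 * unchanged. */
static double reward_share(double my_hashrate, double network_hashrate) {
    return my_hashrate / network_hashrate;
}
```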
Come-from-Beyond (OP)
Legendary
*
Offline Offline

Activity: 2142
Merit: 1010

Newbie


View Profile
December 28, 2011, 03:37:56 PM
 #13

Why "bond to a pool" and not allow it to be used by all?

Just to earn some money for beer.
bitlane
Internet detective
Sr. Member
****
Offline

Activity: 462
Merit: 250


I heart thebaron


View Profile
January 19, 2012, 07:26:42 PM
 #14

If everyone uses a fast miner, that is the same as everyone using a slow one, because of DIFFICULTY adjusting. So there is no point in giving it to everyone.

Why not offer it up for sale, then? Require a key or some sort of license validation and make some cash from it.

There is a pretty fancy BTC mining utility on this forum that is the flashiest I have ever seen and looks awesome, but it is also tied to a pool, and unfortunately that pool is not as popular as I am sure the pool owner would like it to be, regardless of his awesome-performing and awesome-looking software.
The "pool binding" idea is bad. Just sell it.

CAMOPEJB
Full Member
***
Offline

Activity: 132
Merit: 100



View Profile
January 20, 2012, 06:08:55 PM
 #15

why "bond to a pool" and not allow to be used by all

Just to earn some money for beer.

I would recommend http://pool-x.eu

Contact info:
skype: ag2x3k
irc: #pool-x.eu @ Freenode IRC ( http://webchat.freenode.net/?channels=#pool-x.eu )
web: g2x3k@bitcointalk.org
Come-from-Beyond (OP)
Legendary
*
Offline

Activity: 2142
Merit: 1010

Newbie


View Profile
January 20, 2012, 08:05:26 PM
 #16

Thanks, but I switched to another project.