Bitcoin Forum
  Show Posts
361  Bitcoin / Development & Technical Discussion / Re: Modular FPGA Miner Hardware Design Development on: August 19, 2011, 01:57:29 AM
Quote
Im currently learning to programm the msp 430. Anybody interested is invited to join me as this is the biggest obstacle for our project to continue.
I'd be happy to write firmware for the MSP430, and supporting PC software. Heck, I even have three MSP430 kits :P I couldn't resist their $4+free shipping price.

Correct me if I am mistaken: The MSP430 you have selected will be present on each DIMM board. It will be an MSP430 with built-in USB support. The MSP430 will be connected by SPI to pins on both of the FPGAs.

If that is correct I can:

1) Write the SPI module and test it on my Spartan-6 dev kit.
2) Use my MSP430 dev kit to write the MSP430 code and talk to my S6 dev kit.
3) Write a Python interface, console/UI, for the PC software to talk to the MSP430.

Caveat: My MSP430 dev kit has the lowest form of life MSP430 on it. USB is supported through extra chips, so I don't have an actual MSP430 with built-in USB support. So I, or someone else, will have to eventually get one of those chips and tweak up the code to adjust for any differences.

Please correct my understanding of the current design as I described it above, and let me know whether the steps I listed seem correct. If all is well, I'll go off and get your firmware and software all ready :D
362  Economy / Computer hardware / Re: Custom FPGA Board for Sale! on: August 19, 2011, 01:33:06 AM
Quote
so basically similar algo optimizations were implemented on your fpga code?
Not really. Those are GPU optimizations, and are focused on reducing the total number of calculations needed to compute the final hash. That benefits the GPU's performance, but none of them would really have a useful impact on an FPGA's MH/s performance. So I haven't implemented most of them, because they aren't too helpful, would clutter the code, and my time is better spent elsewhere for now.
363  Economy / Computer hardware / Re: Custom FPGA Board for Sale! on: August 19, 2011, 12:56:28 AM
I'm mostly just the firmware designer, but I will try to chime in on a few things here for everyone's benefit :)

First and foremost, I do not think it has been made clear that these boards are firmware upgradeable. As improvements are made to the Open Source FPGA Bitcoin Mining project, new firmware will be generated and made available, just as improvements to GPU software let you drive more out of your dusty 5850s ;)

Personally, I am very excited about this little board as a development platform :) Beats the heck out of my $1000 dev kit >:(


Connecting Multiple Boards
As newMeat1 has hinted at, this was being fleshed out and worked on even before these first boards were sent into production. I'll be playing around with FTDI chips and/or micros, and see if we can get a nice, simple solution that will be backwards compatible with existing boards, streamline future designs, and scale well for multiple boards.

Quote
This is really really cool and I'm glad to see a working board for sale so quickly! Not needing PCIe is like a dream and mining via usb on my old AMD duron machine would rock! Best of luck!
I have an ~8 year old laptop that could probably drive these things :P All current GPU mining rigs can also drive it, in addition to driving their GPUs.

Quote
Also, is there any chance that the FPGA miner code will run on a Spartan XC2S30?
The Spartan-2 series is likely too old to push many MH, if any. That particular chip only has ~1K CLBs, for example.

Quote
Are you certain? Aren't the LUTs involved in routing too? I always thought that they were either configured as logic or routing node.
Think of an FPGA as a massive breadboard, with LUTs glued onto it in columns and rows. The breadboards are your "routing fabric" and allow you to choose how you connect the LUTs, and of course you can load the LUTs with whatever configuration you want. That's your most basic FPGA architecture. Just note that there is a "routing cost" proportional to the distance you try to route. For example, connecting one LUT to another half way across your massive breadboard would lead to a massive signal delay.

Quote
On optimization of mining algo (be it on GPUs, CPUs or FPGAs) is there a post that describes all available optimizations?
This thread gives a good run down of all the optimizations being implemented on GPUs.
364  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 18, 2011, 04:30:49 AM
Quote
well i had used LX150_test. and i dnt have a lx150 dev board, so i think i will just share my ideas here..
Oops, sorry, LX150_Test isn't really usable at the moment. I really need to add a useful README outlining all those different project variations ...

Thank you for contributing your idea!

Please take a look at the project variation I linked: https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/LX150_makomk_Test

You will find that your idea, for the most part, has already been implemented in there. Specifically look around this line.

BUT: You did point something out that I think I missed. In the code I linked you'll see that the pre-calculated T1 value is stored in a separate register, not tx_state[7] as you listed in your example. On looking at my code, I believe you are correct; tx_state[7] is never used (except for the last round) so it could be removed or replaced with the partial calculation. Good catch, Anoynomous!

Not sure if the compiler catches this optimization automatically or not.

Quote
again s0_w can be calculated a loop ahead and added to  rx_w[31:0]. this way our new_w will be shortened to:
Now that, I hadn't thought of. Another fantastic catch, Anoynomous!

Double check me on this:

Code:
tx_pre_w <= s0(rx_w[2]) + rx_w[1];     // Calculate the next round's s0 + the next round's w[0].
tx_new_w <= s1(rx_w[14]) + rx_w[9] + rx_pre_w;
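
For anyone who wants to double check the reordering without running a compile, here is a minimal Python sketch. It assumes the standard SHA-256 message schedule (new_w = s1(w[14]) + w[9] + s0(w[1]) + w[0]) and models the registers as plain variables; the function names are made up for illustration and are not from the project.

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def s0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def s1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def schedule_direct(w16, rounds):
    """Textbook schedule: every new word is a 4-way add."""
    w = list(w16)
    out = []
    for _ in range(rounds):
        new_w = (s1(w[14]) + w[9] + s0(w[1]) + w[0]) & MASK
        out.append(new_w)
        w = w[1:] + [new_w]  # slide the 16-word window
    return out


def schedule_pre_added(w16, rounds):
    """Same schedule, with s0 + w[0] computed one round ahead (pre_w)."""
    w = list(w16)
    pre_w = (s0(w[1]) + w[0]) & MASK  # seed value for the first round
    out = []
    for _ in range(rounds):
        new_w = (s1(w[14]) + w[9] + pre_w) & MASK  # now only a 3-way add
        # Computed from the current window; after the shift below,
        # w[2] becomes w[1] and w[1] becomes w[0].
        pre_w = (s0(w[2]) + w[1]) & MASK
        out.append(new_w)
        w = w[1:] + [new_w]
    return out


sample = [(i * 0x9E3779B1) & MASK for i in range(16)]  # arbitrary test words
print(schedule_direct(sample, 48) == schedule_pre_added(sample, 48))
```

If the two functions agree, the pre-add only moves work into the previous clock cycle and does not change the schedule.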

Quote
if the above solution is applied, the calculation of new_w will be the new critical path...
The calculation of tx_state[0] is the current critical path:
Code:
t1 = rx_t1_part + e1_w + ch_w
tx_state[0] <= t1 + e0_w + maj_w;
Which is actually pretty good, since it's implemented as only two adders.
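
To make the split concrete, here is a rough Python model of one round. It assumes rx_t1_part holds h + K + W for the round (computed one cycle earlier from the value that is about to become h), which is my reading of the snippet above rather than a copy of the Verilog. All names are illustrative.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def e0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def e1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def ch(e, f, g):
    return ((e & f) ^ (~e & g)) & MASK


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def round_textbook(state, k, w):
    """Standard SHA-256 round: T1 is a 5-way add on the critical path."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + e1(e) + ch(e, f, g) + k + w) & MASK
    t2 = (e0(a) + maj(a, b, c)) & MASK
    return [(t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g]


def round_split(state, t1_part):
    """Same round, with h + k + w supplied pre-added as t1_part."""
    a, b, c, d, e, f, g, _h = state
    t1 = (t1_part + e1(e) + ch(e, f, g)) & MASK       # first 3-way add
    new_a = (t1 + e0(a) + maj(a, b, c)) & MASK        # second 3-way add
    new_e = (t1 + d) & MASK                           # 2-way add
    return [new_a, a, b, c, new_e, e, f, g]


def run_both(state, ks, ws):
    """Run both variants over the same rounds and return both end states."""
    ref, split = list(state), list(state)
    t1_part = (state[7] + ks[0] + ws[0]) & MASK
    for i in range(len(ks)):
        # g of the current state becomes h of the next round, so the next
        # t1_part can be formed in parallel with this round.
        nxt = (split[6] + ks[i + 1] + ws[i + 1]) & MASK if i + 1 < len(ks) else 0
        ref = round_textbook(ref, ks[i], ws[i])
        split = round_split(split, t1_part)
        t1_part = nxt
    return ref, split


IV = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
KS = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5]  # first four round constants
WS = [0x01234567, 0x89abcdef, 0xdeadbeef, 0x00000001]  # arbitrary message words
ref_state, split_state = run_both(IV, KS, WS)
print(ref_state == split_state)
```

With the pre-add in place, the longest chain in a round is t1 followed by tx_state[0], i.e. two 3-way adders back to back.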
365  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 17, 2011, 10:54:34 PM
Quote
i am having a little trouble here. I had some experience in designing sha1 hash cracker on fpga, so this project caught my interest. When i downloaded the code and tried to compile it for S6 lx150, it took about an hour to just synthesize the code and then the software said i had overused my resources.. so i wanted to knw, where did i go wrong?...
Which project did you use?

For S6-LX150, this is probably the preferred project to start from:
https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/LX150_makomk_Test
You'll want to adjust main_pll.v:98 to 5 for 50MHz, to make the compile easier and the firmware actually usable (assuming you have the S6-LX150T dev board) without cooling.
366  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 14, 2011, 11:27:54 PM
Quote
My criticism of this design (your design?)  is that there is too much pipelining.
Thank you for the criticism. I really do appreciate the feedback, and I am by no means an expert :)

My intuition is similar to yours, in that a more traditional serial design should achieve better utilization and performance on the Spartan-6 architecture. But it is very easy to underestimate the massive amount of optimizations that occur in the fully unrolled design that takes my current primary focus.

I have a functioning serial implementation, but so far my estimates for its total performance once put in parallel on the S6-LX150 are not exciting: something like 120MH/s. It's in the back of my mind, and there is plenty more work to be done in optimizing and perfecting it, but it hasn't shown me enough promise to warrant being in my mental spotlight like the unrolled design.

Quote
The logic you are using to compute the basic hashes is not optimal, and you have not spent any time trying to optimize for your critical path.
The current critical path is approximately two 3-way 32-bit adders implemented as 16 total slices, thanks to the Spartan-6 fast carry look-ahead chains. Is there a means of optimizing that logic that I have missed?
367  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 14, 2011, 10:46:29 PM
Quote
200MH is simply way out of the question for an S6-LX150.
That won't stop me from trying ;D

As far as I can tell with the poking around I've done so far, the current bottleneck on the S6-LX150 is the far dependencies caused by the W calculations. These references make it so that the rounds are not isolated, and so cannot be routed into a uniform chain. This forces ISE to do completely absurd routing, splattering the placement of a round's components across a good 1/4th of the chip. And that, obviously, leads to massive routing delays. On my last few compiles, the worst-case paths were >80% routing (8ns+ of routing, with 2ns of logic).

If W is buffered between each round as a 512-bit register, instead of chains of shift registers and BRAMs, then the rounds can be isolated, but ISE fails to Map such a design for reasons I have not yet nailed down. 512-bits*~100 is quite a lot of registers :-\

If I, or someone else, can find a way to isolate the rounds and put them into a more consistent chain, then I highly suspect that both performance and area will improve considerably.

I may create a "fake" design that focuses specifically on the W calculations (without digester rounds), and see if I can somehow get them routed into a sensible structure (even if it requires manual placement >:( )
368  Bitcoin / Mining software (miners) / Re: Modified Kernel for Phoenix 1.5 on: August 14, 2011, 06:30:05 PM
Quote
Still, why not do the share/H6 test in GPU - it would certainly be faster - shares are also rare compared to a job (about 1 in 2 billion)
Is that an issue with the CL not being able to be changed based on the difficulty?
There are several reasons.

99.99% of the time the mining software only needs to look for Difficulty 1 (a share, H7==0), so there is rarely a need to check for anything else.
GPUs absolutely hate branching; a full Difficulty check involves many branches.
Smaller GPU programs are better GPU programs.
The CPU runs in parallel to the GPU. Since the CPU is fully capable of checking for extra Difficulty levels, why would you burden the GPU with such work?
The CPU should double-check the GPU's results anyway, to detect errors. Since the CPU will thus be recomputing the full two SHA-256 passes for each result returned by the GPU, it again makes sense to only check for higher difficulties on the CPU.
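
As a rough illustration of that split, here is a small Python sketch of what the CPU side could look like. It is not taken from any particular miner; the function names are made up, and it assumes the usual Bitcoin convention of comparing the double SHA-256 digest as a little-endian 256-bit integer against the target.

```python
import hashlib

# Difficulty-1 target (compact form 0x1d00ffff expanded to 256 bits).
DIFF1_TARGET = 0xFFFF * 2**208


def block_hash(header80: bytes) -> bytes:
    """Recompute the full two SHA-256 passes over an 80-byte block header."""
    return hashlib.sha256(hashlib.sha256(header80).digest()).digest()


def is_share(digest: bytes) -> bool:
    """The cheap test the GPU does: last 32-bit word of the digest is zero."""
    return digest[28:32] == b"\x00\x00\x00\x00"


def meets_target(digest: bytes, target: int) -> bool:
    """The full check, done on the CPU only for nonces the GPU returned."""
    return int.from_bytes(digest, "little") <= target


def cpu_verify(header80: bytes, target: int) -> bool:
    """Double-check a GPU result and apply the real difficulty target."""
    digest = block_hash(header80)
    return is_share(digest) and meets_target(digest, target)


# A dummy header will almost certainly not be a share; this only shows the call.
print(cpu_verify(bytes(80), DIFF1_TARGET))
```

Note that H7==0 is slightly looser than the exact Difficulty 1 target, which is one more reason to let the CPU make the final call.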
369  Bitcoin / Mining software (miners) / Re: Modified Kernel for Phoenix 1.5 on: August 14, 2011, 10:11:42 AM
I've compiled a Win32 EXE for my poclbm fork (which has phatk, phatk2, phatk2.1, and phatk2.2 support):

http://www.bitcoin-mining.com/poclbm-progranism-win32-20110814a.zip
md5sum - df623a45f8cb0a50fcded92728f12c14

Let me know if it works, I was only able to test it on one machine so far.

Quote
Well I've been talking to a few people about this but got no real response from anyone, that it was possible ...
The optimization you've spelled out is more or less already implemented in most, if not all GPU miners.

The way GPU miners currently work is that they check in the GPU code whether h7==0. If it does, the result (a nonce) is returned, otherwise nothing is returned. It is the responsibility of the CPU software to do any further difficulty checks if needed.

Since the only thing the GPU miners care about is H7, they completely skip the last 3 rounds (stopping after the 61st round).

Also note that GPU miners don't calculate the first 3 rounds of the first pass. Those rounds are pre-computed, because the inputs to those rounds remain constant for a given unit of getwork. So a GPU miner really only computes a grand total of 122 rounds, minus various other small pre-calculations here and there.
370  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 14, 2011, 05:07:33 AM
Quote
Finally got around to coding some maximum clock speed improvements for users of smaller Cyclone III and IV devices - now available from my new partial-unroll-speed branch. Expected minimum device size and speed is roughly as follows:
More fantastic work, makomk! 8) *applause*

Quote
I've been playing around with the xilinx-verilog port in the github repo and can confirm that it works just fine on the Xilinx Spartan-6 XC6LX9 microboard eval board from Avnet for $69.
Thank you for taking the time to share your experiences with all these mini eval boards, jonand. That's great information.

It's a shame that the LX9 microboard uses a 324 landing. Would be neat to re-solder an LX150 to it, but the LX150 doesn't come in a 324 package :/

Quote
But finally, my experiment is working @ 2250 Mhash/s and about 100w!  The cost is out of control, but since I had these cards laying around from other experiments, I figured I'd give it a shot.
*drools* For reference, an AMD 5850 only gets ~350MH/s for ~150W. :P
371  Bitcoin / Mining software (miners) / Re: Modified Kernel for Phoenix 1.5 on: August 10, 2011, 04:35:50 AM
Updated my poclbm branch to support phatk2.2 through the --phatk2_2 command line option:

https://github.com/progranism/poclbm

372  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 04, 2011, 12:07:12 PM
Quote
Not sure to what use they can be put either.
I'm guessing you could pair them up to get the equivalent of a 16BWER, but I'm not sure.

I just tried adding an extra register to the shifter's inferred RAM. After compilation it failed timing (80MHz) and ... the register was gone. I'm guessing it optimized the register away somehow, or balanced it. Either way, it ended up having a negative impact. I'm running another compile with USE_XILINX_BRAM_FOR_W off to see how that works. Perhaps we need to find a way for ISE not to optimize the shifter so much when USE_XILINX_BRAM_FOR_W is being used?
373  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 04, 2011, 10:13:50 AM
Regarding Performance Optimizing Spartan-6 LX150:
DANGER: Long detailed post coming... sorry, I hope the information is useful though

I'm working with my LX150_makomk_speed_Test project, where I'm trying to nail down the performance bottlenecks and remove them. I'm learning FPGA Editor so I can better visualize what the router is doing, and I've read through some of the Spartan-6 UGs to get a better understanding of the architecture.

First off, I will say I am quite impressed by Xilinx's work and foresight on the logic of the S6's slices. They can perform a 3 component, 32-bit addition in 8 chained slices, with 4-bits being computed per slice. That blew my mind when I saw it in the FPGA Editor. This is great for our mining algorithm, and you can see why in this critical path analysis:

Code:
W:           16 slices + 0 slices = 16
tx_t1_part:  8  slices + 0 slices = 8

t1:          8  slices + 0 slices
tx_state[0]: 8  slices + 8 slices = 16
tx_state[4]: 8  slices + 8 slices = 16
The worst critical paths are only 16 slices long, with a single break in the carry chain (AFAIK). W is a 4-way, performing a 3-way of the first 3, and a 2-way of the result and the remaining component. tx_state[4] is a 2-way with t1 and rx_state[3].

I haven't fully analyzed the router's behavior on the 2-ways yet, but it appears to include work from other operations ... somehow. Not sure yet.

So, that's the good news. The bad news is, of course, only half of the slices are useful. There are two slices in a CLB. One slice always has fast-carry logic and chains to the slice directly above it (in the CLB above it). The other slice is a lowest form of life slice. It's still a powerful slice, with 4 6-LUTs (or 8 5-LUTs, or combinations thereof), and 8 flip flops, but the mining algorithm has rare use for it.

The next bad news is, only half of the "good" slices can be used as RAM or shift registers. That's not a terrible thing since most will be consumed as adders anyway.

And that's about all I could find that's particularly good or bad with the S6 slices. Since the good slices are all in columns, and spaced evenly, the impact of the useless slices should actually be far less severe than I thought.


For the S6's routing architecture, the quick overview basically said routing costs roughly Manhattan Distance between CLBs. I haven't dug into the details more than that at this point.
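
As a back-of-the-envelope aid, here is a tiny Python helper for that rule of thumb. It works on slice site names of the form SLICE_XnYm as printed in ISE timing reports, and it measures distance on the slice grid, which is only a stand-in for CLB distance; treat the result as a relative indicator, not a delay prediction. The helper names are made up.

```python
import re


def slice_xy(site):
    """Parse a site name such as 'SLICE_X60Y127' into (x, y)."""
    m = re.fullmatch(r"SLICE_X(\d+)Y(\d+)", site)
    if m is None:
        raise ValueError("not a slice site name: %r" % site)
    return int(m.group(1)), int(m.group(2))


def manhattan(site_a, site_b):
    """Manhattan distance between two slices on the slice grid."""
    ax, ay = slice_xy(site_a)
    bx, by = slice_xy(site_b)
    return abs(ax - bx) + abs(ay - by)


print(manhattan("SLICE_X60Y127", "SLICE_X78Y122"))  # 18 columns + 5 rows = 23
```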



With that knowledge in hand, and some beginner's experience with FPGA Editor, I dived in and found what appears to be the largest bottleneck in the current code:

Code:
Slack (setup path):     0.264ns (requirement - (data path - clock path skew + uncertainty))
  Source:               uut2/HASHERS[41].shift_w1/Mram_m (RAM)
  Destination:          uut2/HASHERS[41].upd_w/tx_w15_30 (FF)
  Requirement:          12.500ns
  Data Path Delay:      11.716ns (Levels of Logic = 6)
  Clock Path Skew:      -0.260ns (0.780 - 1.040)
  Source Clock:         hash_clk rising at 0.000ns
  Destination Clock:    hash_clk rising at 12.500ns
  Clock Uncertainty:    0.260ns

  Clock Uncertainty:          0.260ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.450ns
    Phase Error (PE):           0.000ns

  Maximum Data Path at Slow Process Corner: uut2/HASHERS[41].shift_w1/Mram_m to uut2/HASHERS[41].upd_w/tx_w15_30
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    RAMB16_X2Y46.DOA2    Trcko_DOA             1.850   uut2/HASHERS[41].shift_w1/Mram_m
                                                       uut2/HASHERS[41].shift_w1/Mram_m
    SLICE_X60Y126.A2     net (fanout=4)        5.845   uut2/HASHERS[41].cur_w1<2>
    SLICE_X60Y126.COUT   Topcya                0.379   uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd1_cy<19>
                                                       uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd1_lut<16>
                                                       uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd1_cy<19>
    SLICE_X60Y127.CIN    net (fanout=1)        0.003   uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd1_cy<19>
    SLICE_X60Y127.BMUX   Tcinb                 0.292   uut2/HASHERS[41].shift_w0/r<27>
                                                       uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd1_cy<23>
    SLICE_X78Y122.B3     net (fanout=1)        1.995   uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd_211
    SLICE_X78Y122.BMUX   Tilo                  0.251   uut2/HASHERS[41].upd_w/tx_w15<23>
                                                       uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd221
    SLICE_X78Y122.C5     net (fanout=2)        0.383   uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd221
    SLICE_X78Y122.COUT   Topcyc                0.295   uut2/HASHERS[41].upd_w/tx_w15<23>
                                                       uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd2_lut<0>22
                                                       uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd2_cy<0>_22
    SLICE_X78Y123.CIN    net (fanout=1)        0.003   uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd2_cy<0>23
    SLICE_X78Y123.COUT   Tbyp                  0.076   uut2/HASHERS[41].upd_w/tx_w15<27>
                                                       uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd2_cy<0>_26
    SLICE_X78Y124.CIN    net (fanout=1)        0.003   uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd2_cy<0>27
    SLICE_X78Y124.CLK    Tcinck                0.341   uut2/HASHERS[41].upd_w/tx_w15<31>
                                                       uut2/HASHERS[41].upd_w/ADDERTREE_INTERNAL_Madd2_xor<0>_30
                                                       uut2/HASHERS[41].upd_w/tx_w15_30
    -------------------------------------------------  ---------------------------
    Total                                     11.716ns (3.484ns logic, 8.232ns route)
                                                       (29.7% logic, 70.3% route)

It's being forced to route from a RAMB16BWER to a CLB that's right smack dab in the middle of a group of columns, furthest possible position from possible RAM locations. Here, check this image out, it will make you go insane, so don't stare too long:
https://i.imgur.com/gBv5R.png (RAM is on the left).
No, seriously, don't stare at it. The router will drive insanity into the depths of your soon rotting brain fleshes. After exploding the Universe, of course.

Oh, you looked at it anyway and are wondering about that little path heading downward? Yeah, it keeps going ... and going ... (into my damned soul).


And as you can read from the timing report above, routing accounts for *drum roll* 70.3%! Yay! That's 8.2ns of routing, and only 3.5ns of logic! Imagine if we got rid of all the routing...
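
For anyone who wants to check the arithmetic in that report, here is a short Python sketch that recomputes the slack, the clock uncertainty, and the logic/route split. The two formulas are the ones printed in the report itself; the delay values are copied from the table, and the variable names are my own.

```python
import math

# Delays (ns) copied from the data path table in the timing report.
LOGIC_DELAYS = [1.850, 0.379, 0.292, 0.251, 0.295, 0.076, 0.341]
ROUTE_DELAYS = [5.845, 0.003, 1.995, 0.383, 0.003, 0.003]


def clock_uncertainty(tsj, tij, dj, pe):
    """((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE, as printed in the report."""
    return (math.sqrt(tsj**2 + tij**2) + dj) / 2 + pe


def setup_slack(requirement, data_path, clock_skew, uncertainty):
    """requirement - (data path - clock path skew + uncertainty)."""
    return requirement - (data_path - clock_skew + uncertainty)


logic = sum(LOGIC_DELAYS)
route = sum(ROUTE_DELAYS)
data_path = logic + route
uncertainty = clock_uncertainty(0.070, 0.000, 0.450, 0.000)
slack = setup_slack(12.500, data_path, -0.260, uncertainty)

print("logic %.3f ns, route %.3f ns" % (logic, route))
print("route share %.1f%%" % (100 * route / data_path))
print("uncertainty %.3f ns, slack %.3f ns" % (uncertainty, slack))
```

The 12.500ns requirement corresponds to the 80MHz clock, so the design only just meets timing with 0.264ns to spare.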


I see four solutions at the moment, and will investigate as time allows:

1) Get rid of the RAMB16BWER to some extent.
2) Add an extra register to the output of shifter_32b when inferring RAM logic. Flip-flops should route close to the logic and mask RAM routing delay.
3) Add two, duplicate registers to the output of shifter_32b when inferring RAM logic.
4) Ditch the RAM infer completely and try to coax ISE into using all those flip-flops in the useless slices (which are peppered throughout the routed design at the moment).

I will try 3 first, and hope ISE does the intelligent thing. My hope is that flip-flops in the useless slices will get utilized, since they're mingled in with the useful logic and so should provide somewhat fast local routing.

The interesting thing is that we've got lots of RAM to play with. The design is using ~30% of the 16BWERs, and none of the 8BWERs. It seems like a good idea to try to use them and bring slice consumption down if possible, but only if their awkward placement can be solved appropriately.
374  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 04, 2011, 08:36:59 AM
Quote
If yes, we should probably decide on a common simple serial protocol for all these FPGA designs, so that one software will fit them all. I'll happily implement that in my python miner, so you don't need to care about the software side of things.
I completely agree, and Python is a far better choice for controller software than Tcl :P

I took the time to write out some preliminary specifications for the internal hardware interfaces:
https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/wiki/Specification:-Components,-Interfaces,-and-Protocols-%5BWIP%5D

TL;DR: I abstracted away all of the hardware and implementation specifics and shielded them with a single, memory mapped component called the Comm. Being memory mapped, it's a trivial interface that protocols like I2C and SPI can wrap easily.

Relevant to the topic at hand, the specifics of this wrapping by SPI, I2C, UART, JTAG, or what have you will be documented here:
https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/wiki/Specification:-PHY-Interfaces

And yes, I've abused the term PHY in that document and the previous specification. Because I'm a horrible person and I want to watch language burn for my own personal pleasure :P

Currently in that PHY specification I've only outlined the requirements. TL;DR: It must support multiple devices (addressing a single device specifically), allow reading registers, and writing registers. That's it.
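
To show how little the requirements ask for, here is a purely hypothetical Python sketch of the software-facing side. Nothing in it comes from the wiki specification: the class name, method names, and the 32-bit register width are all assumptions, and the "bus" is just an in-memory dictionary standing in for whatever UART, SPI, or JTAG wrapper ends up being specified.

```python
class FakeComm:
    """In-memory stand-in for a PHY that addresses several devices.

    Hypothetical interface: a real implementation would translate these
    calls into UART/SPI/JTAG transactions instead of dictionary lookups.
    """

    def __init__(self, device_ids):
        # One register map per device; unwritten registers read as 0.
        self._regs = {dev: {} for dev in device_ids}

    def _check(self, device):
        if device not in self._regs:
            raise KeyError("unknown device %r" % (device,))

    def write_register(self, device, address, value):
        """Write one register on one specifically addressed device."""
        self._check(device)
        self._regs[device][address] = value & 0xFFFFFFFF  # assumed 32-bit

    def read_register(self, device, address):
        """Read one register from one specifically addressed device."""
        self._check(device)
        return self._regs[device].get(address, 0)


bus = FakeComm(device_ids=[0, 1])
bus.write_register(1, 0x10, 0xDEADBEEF)
print(hex(bus.read_register(1, 0x10)), hex(bus.read_register(0, 0x10)))
```

Any controller software written against three calls like these should not need to care which physical protocol sits underneath.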

I haven't laid down any ground work for any specific protocols yet, but I listed out what the most immediate ones probably are: UART, SPI, and JTAG. Feel free to begin specifying those in this thread.

And as always, feel free to critique my work, call me stupid, and redo the whole thing :P
375  Other / CPU/GPU Bitcoin mining hardware / Re: estimating/calculating hash rates for FPGA chips on: August 04, 2011, 01:12:59 AM
Quote
if I take x number of some fpga chips, assume a bunch of other things, here's how to do a rough estimate on the expected hashing rate....  That's what I'm after.  I can't find that post...
It's easiest to do with Altera Cyclone chips, and roughly works out to:

1,000 LEs = 1MH/s

Could be more in some cases, less in others, but that's a fairly good baseline. For Spartan-6 it's roughly:

1,000 LEs = 0.5MH/s

For high-performance families like Virtex or Stratix you should multiply by 3 or 4.
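
Put as a quick Python helper, the rule of thumb looks roughly like this. The per-family figures are the ones above; the function name, the family keys, and the way the high-performance multiplier is applied on top of a baseline are my own assumptions for the sketch.

```python
# Rough MH/s per 1,000 logic elements, from the baselines above.
MHS_PER_1000_LE = {
    "cyclone": 1.0,
    "spartan6": 0.5,
}


def estimate_mhs(logic_elements, family, chips=1, multiplier=1.0):
    """Very rough hash rate estimate in MH/s.

    multiplier is for high-performance families (3 or 4 per the rule of
    thumb); which baseline it applies to is an assumption of this sketch.
    """
    per_chip = logic_elements / 1000 * MHS_PER_1000_LE[family]
    return chips * per_chip * multiplier


# Example: a hypothetical 150,000-LE Spartan-6 part.
print(estimate_mhs(150_000, "spartan6"))
```

Treat the output as a baseline only; real designs can land above or below it.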
376  Bitcoin / Mining software (miners) / Estimated Mining Rate Exceeds Measured Rate? on: August 04, 2011, 12:48:06 AM
I'm hoping someone here can help me decipher this mystery. poclbm displays two hashing rates when it is running. The real hashing rate, and the estimated hashing rate:



The first rate is calculated directly from the number of hashes the GPU has processed. So it should be the most accurate number.

The second number is the estimated hashing rate, and it's calculated from the number of shares poclbm actually submits within a window of time (15 minutes by default). It should not directly reflect the GPU's actual processing speed, but rather it's a good estimate of "all things considered" hashing rate.


I made a small modification to the code for that second value:

Code:
if self.options.estimate == -1:
    total_shares = self.share_count[1] + self.share_count[0]
    estimated_rate = Decimal(total_shares) * (work.targetQ) / int(now - start_time) / 1000
else:

If I use the command line option "-e -1" it will now estimate the hashing rate over the entire run-time of poclbm, and over all submitted hashes, even rejected ones. I wanted this so I could be sure that experimental kernels were really producing the expected hashrate. For example, if the kernel or I screwed up nonce calculations, it could end up re-hashing the same nonce, and hence reduce real performance.

The Problem: In the above screenshot, I have run with "-e -1" for a hair under a day now ... and the estimated rate has remained at approximately 363MH/s since last night. That's 9 MH/s (2.5%) more than what should be the most accurate number. Can anyone figure out what might account for this discrepancy?
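
For reference, here is a standalone Python sketch of the share-based estimate, separate from the poclbm code. It assumes every share is a Difficulty 1 share worth 2**32 hashes on average, and it treats share arrivals as a Poisson process so that the relative noise on a count of N shares is about 1/sqrt(N). Both are assumptions of this sketch, and the function names are made up.

```python
import math

HASHES_PER_SHARE = 2**32  # assumed: average hashes per Difficulty 1 share


def estimated_rate_mhs(total_shares, elapsed_seconds):
    """Hash rate implied by the number of shares found, in MH/s."""
    return total_shares * HASHES_PER_SHARE / elapsed_seconds / 1e6


def expected_shares(rate_mhs, elapsed_seconds):
    """Average number of shares a given hash rate should produce."""
    return rate_mhs * 1e6 * elapsed_seconds / HASHES_PER_SHARE


def relative_std(total_shares):
    """Approximate relative standard deviation of a Poisson count."""
    return 1 / math.sqrt(total_shares)


day = 24 * 60 * 60
shares = expected_shares(354.0, day)  # 363 - 9 = 354 MH/s measured
print("expected shares in a day: %.0f" % shares)
print("relative noise on the estimate: %.2f%%" % (100 * relative_std(shares)))
```

Comparing the printed noise figure against the 2.5% gap gives a rough idea of how much of the discrepancy luck alone could cover before looking for a counting bug.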

As far as I can tell, share_count should be accurate; no share should be counted twice.

All the code is here:
https://github.com/progranism/poclbm

And should be up-to-date with m0mchil's code except for my modifications.
377  Bitcoin / Mining / Re: Theoretical limit on hashing speed on: August 03, 2011, 09:52:49 AM
Quote
Add in 8 values: 1 vs. naive 8   (saves 7)
Theoretically you don't need that last add either:

Code:
if(h64 + state[7] == 0) // Yay! Money!
optimized to -> if(h64 == -state[7]) // Yay! Money!

Where state[7] is constant. So you save 8 ops, instead of just 7. I don't know if this is applicable to GPUs specifically (perhaps it's cheaper to add than compare to 0!?), but it's an optimization in general.
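
Here is a quick Python check of that identity under 32-bit wraparound. It assumes state[7] is the standard SHA-256 initial value H7 = 0x5be0cd19, and the function names are made up for the sketch.

```python
MASK = 0xFFFFFFFF

STATE7 = 0x5be0cd19               # standard SHA-256 initial H7 (assumed here)
NEG_STATE7 = (-STATE7) & MASK     # two's complement negation, precomputed once


def check_with_add(h64):
    """Naive form: do the final add, then compare to zero."""
    return (h64 + STATE7) & MASK == 0


def check_with_compare(h64):
    """Optimized form: compare directly against the precomputed constant."""
    return h64 == NEG_STATE7


samples = [0, 1, NEG_STATE7 - 1, NEG_STATE7, NEG_STATE7 + 1, MASK]
print(all(check_with_add(v) == check_with_compare(v) for v in samples))
```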
378  Bitcoin / Mining software (miners) / Re: Would it be smart to unload some of the work onto the cpu?--read this PDF on: August 03, 2011, 09:26:47 AM
Quote
please define "various values". if its the midstate you are talking about, it was a obvious optimization.
https://bitcointalk.org/index.php?topic=33817.0

They list out lots of the constant and semi-constant optimizations in that thread.
379  Other / Beginners & Help / Re: cpu, FPGA, gpu simultaneous mining rig on: August 03, 2011, 09:14:40 AM
Quote
I had a class about how CPUs work internally last semester and think this might be interesting. It'd probably be prohibitively expensive, but how about having a fully pipelined SHA256?
https://bitcointalk.org/index.php?topic=9047.0

Quote
is a cpu, FPGA, gpu simultaneous mining rig possible say on a 6 core i7, or an amd opteron?
Sure is. I've run GPU + FPGA before. CPU is worthless since I pay for my electricity :P
380  Other / CPU/GPU Bitcoin mining hardware / Re: estimating/calculating hash rates for FPGA chips on: August 03, 2011, 09:09:20 AM
https://en.bitcoin.it/wiki/Mining_hardware_comparison#FPGA_Devices