zefir
Donator
Hero Member
Offline
Activity: 919
Merit: 1000
|
|
July 26, 2012, 11:10:44 PM |
|
Isn't that easier to just flick a switch?
As far as I understood Yohan and Glasswalker, the baud rate depends on the array controller's clock. Glasswalker's recent bitstream operates in 25 MHz mode, which effectively requires setting the serial line to 57.6 kBd. I just posted a pull request adding support for this in fpgautils.c, in preparation for a CM1 driver for cgminer.
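The clock/baud relationship mentioned above can be sketched as a simple proportion. This is only an illustration: it assumes the UART divider is fixed in the bitstream, so the baud rate scales linearly with the controller clock, and it assumes that the usual Icarus rate of 115200 Bd corresponds to a 50 MHz clock. The 50 MHz reference is inferred from the 25 MHz / 57.6 kBd pair and is not confirmed anywhere in this thread; `baud_for_clock` is a hypothetical helper, not part of cgminer.

```python
# Assumed reference point (not confirmed in the thread): 115200 Bd at 50 MHz.
REF_CLOCK_HZ = 50_000_000
REF_BAUD = 115_200


def baud_for_clock(clock_hz, ref_clock_hz=REF_CLOCK_HZ, ref_baud=REF_BAUD):
    """Scale the baud rate linearly with the controller clock.

    Valid only if the UART divider baked into the bitstream is constant.
    """
    return ref_baud * clock_hz // ref_clock_hz


if __name__ == "__main__":
    # 25 MHz mode, as used by Glasswalker's bitstream
    print(baud_for_clock(25_000_000))
```

Under these assumptions, halving the clock halves the baud rate, which matches the 57.6 kBd setting reported for 25 MHz mode.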
|
|
|
|
ebereon
|
|
July 26, 2012, 11:49:21 PM |
|
Short comparison of the two new bitstreams:
Glasswalker's bitstream (special Glasswalker controller rev.):
fpga0: working but bad, U ~1.8
fpga1: not working
fpga2: not working
fpga3: working but bad, U ~1.6
Makomk's bitstream 140 (controller rev. 1.3):
fpga0: working
fpga1: working
fpga2: working (invalids <1%)
fpga3: working (invalids <1%)
Makomk's bitstream 150 (controller rev. 1.3):
fpga0: working (invalids >10%!!)
fpga1: working (invalids >10%!!)
fpga2: working (invalids >15%!!)
fpga3: working (invalids >15%!!)
That's on my worst board, SN#62-409, with MPBM as the miner software.
eb
EDIT: Glasswalker's bitstream hangs after 1h 11min :-/
|
|
|
|
Glasswalker
|
|
July 27, 2012, 12:56:31 AM |
|
Glasswalker's bitstream hangs after 1h 11min :-/
To confirm, this is on your worst board though, right? Any longer success on your other boards? The reason I ask is that this bitstream still uses the "flaky" original Icarus UART code, among other things, and Enterpoint has tweaked drive strength and other settings to reduce noise and help the UART along. The new bitstream I'm building now has new UART code which should solve that problem; in concert with the improvements from Enterpoint, it should be rock solid even on the most unstable of boards.
The part that confuses me is that on the same board the 140 MHz Makomk bitstream is stable (which, from what I understand, is also a direct port of the Icarus code). Odd indeed.
Anyway, keep reporting results; I should have a new bitstream soon. This next one will be 150 MHz, and then I'll push for faster, but at least you can plop the 150 MHz onto any "poor" chip positions and the 175 on the others for a hybrid boost. Once that one is confirmed stable, I'll start pushing to get its speed up higher. I'll post more once I've got something to report.
|
|
|
|
rampone
Sr. Member
Offline
Activity: 339
Merit: 250
dafq is goin on
|
|
July 27, 2012, 01:03:43 AM |
|
sn:62-0016, glasswalker v1.0 sp3 fpgamining_top.bit flashed.
All flash red when getting work/golden nonce test.
fpga 0: stays yellow
fpga 1: sometimes after power-up a wrong golden nonce is returned, sometimes like fpga 0
fpga 3: like fpga 1
fpga 2: like fpga 0
(Tested via MPBM, 57600 bps, DIPs sp3 all on, DIPs fpga1/2 #1 off, rest on. Is that the right setting?!)
If I switch DIPs fpga1/2 with #12 both off, like the twin_test build, fpga1 seems to hang (no yellow, no heat), and the golden nonce test is even more wrong on fpga3.
Long walk home to the 2*50 MH/s build now.
|
|
|
|
kano
Legendary
Offline
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
|
|
July 27, 2012, 01:26:32 AM |
|
Glasswalker's bitstream hangs after 1h 11min :-/
To confirm this is on your worst board though right? Any longer success on your other boards? The reason I ask is this bitstream still uses the "flaky" original icarus UART code, among other things ...
Odd, I've never had either of my original Icarus boards fail during hashing since I got them almost 5 months ago. Runtime is random (because I often restart to test cgminer code), but the longest runtime was 164.5 hours before I stopped it.
|
|
|
|
misternoodle
Member
Offline
Activity: 108
Merit: 10
|
|
July 27, 2012, 02:19:14 AM |
|
Finally got MPBM working, using Makomk's 140 bitstream. Followed the settings: SW1 & 6 per the Twin Build running settings, SW2-5 in low-performance mode. Only 2 of my chips are mining right now. https://i.imgur.com/df6ne.jpg
Any idea? The two that are hashing don't have any issues. Board#171
|
|
|
|
Glasswalker
|
|
July 27, 2012, 02:27:49 AM |
|
Odd, I've never had either of my original Icarus fail during hashing since I got them ... almost 5 months ago. Runtime is random (coz I often restart to test cgminer code) ... but the longest runtime is 164.5 hours before I stopped it.
That's fair. Many people had a great time with them, and ngzhang offered great service and was helpful. I'm not slamming him as an individual. But to provide a counterpoint, my Icarus boards froze fairly frequently, or randomly locked up, and exhibited all kinds of unstable behavior. Some of this was attributed to the USB->UART chip he used. But now that I've dug into his code, there is a vulnerability there that, combined with the different clocking structure on the Cairnsmore and a little bit of noise on the clock lines, results in a problem. On his boards his code was fine, but on other boards it has the potential (under the right circumstances) to be highly susceptible to noise.
The reason for this is the way his UART samples the waveform. It ends up sampling very near the edge, so in a situation where there might be bounce or ringing on the serial lines, the edge might not be clear, and the result could be the UART clocking in a 0 instead of a 1, or vice versa. The other problem is that it can cause the UART to "lock" onto the wrong spot in the waveform, resulting in garbled nonsense coming in altogether.
This works as a cheap and dirty UART concept, but ideally you want to sample reliably in the center of the waveform, and in a perfect world you want oversampling with filtering to smooth out noise. In my case I'm using the center-of-waveform method, with oversampling to detect the center effectively, but not using smoothing, as that's a bit more complex. I can go that route if I have to, though.
Anyway, hopefully that explains a bit better what I'm doing here. On my "from scratch" bitstream I intend to use a proper serial protocol, with checksums for validation and error checking/correction. The current Icarus protocol is fairly crude: it spews out bytes which are just shifted into a register, with no synchronization attempted, no error checking, nothing. So if it ends up shifted by even a single bit, from then on the data is garbage, and it will keep on going as if nothing has gone wrong. But that's an overall design thing, and changing it on the Icarus would likely be a fair bit of work.
As for why the noise on the lines exists, Yohan can likely explain that better. It's likely a combination of things. For one, the Cairnsmore has a programmable, synthetic clock rather than a single fixed crystal; also, because of the different board layout, it might be more sensitive to inductive noise in places where the Icarus was not. But overall I'm not really sure; like I said, that's a question for Yohan.
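The sampling-point issue described above can be illustrated with a small software model. This is a sketch under stated assumptions, not the Icarus or Cairnsmore HDL: the 16x oversampling factor, the 8N1 framing, and the way ringing is modelled (inverting two samples right after each edge) are all hypothetical choices made for the example.

```python
OVERSAMPLE = 16  # samples per bit; a hypothetical choice for this sketch


def encode(byte):
    """Return an 8N1 frame as oversampled line levels:
    one idle bit, start bit (0), 8 data bits LSB first, stop bit (1)."""
    bits = [1, 0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [b for b in bits for _ in range(OVERSAMPLE)]


def add_ringing(samples, width=2):
    """Invert `width` samples right after every edge to mimic bounce."""
    noisy = list(samples)
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            for j in range(i + 1, min(i + 1 + width, len(samples))):
                noisy[j] ^= 1
    return noisy


def decode(samples, offset):
    """Find the start-bit falling edge, then read each data bit
    `offset` samples after the nominal beginning of that bit."""
    start = next(
        i for i in range(1, len(samples))
        if samples[i - 1] == 1 and samples[i] == 0
    )
    byte = 0
    for k in range(8):
        level = samples[start + (k + 1) * OVERSAMPLE + offset]
        byte |= level << k
    return byte


if __name__ == "__main__":
    line = add_ringing(encode(0x55))
    print(hex(decode(line, offset=1)))                # sampling near the edge
    print(hex(decode(line, offset=OVERSAMPLE // 2)))  # sampling mid-bit
```

In this model, sampling one tick after each edge lands inside the ringing and misreads bits, while sampling at the middle of the bit period reads the byte correctly. No filtering or majority voting is applied, which mirrors the "center of waveform, no smoothing" approach mentioned in the post.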
|
|
|
|
zefir
Donator
Hero Member
Offline
Activity: 919
Merit: 1000
|
|
July 27, 2012, 05:18:44 PM |
|
[...] Anyway keep reporting results, [...]
Since you asked for results, here are mine, collected over the past 20h. I re-programmed an array of 12 boards with the given controller FW and bitstream. Before, the boards operated in twin_bitstream mode with
- 7 boards being stable (both U values above ~2.3)
- 3 boards impaired (at least one FPGA below ~2.3U)
- 2 unstable boards (both FPGA below 1U)
They totaled somewhere around 44U. After the programming I have:
- 6 boards working 'ok', with all 3 FPGAs per board at more than 2.3U
- 5 boards that do not mine. They pass the golden nonce test, but even after an hour had still not generated a single share.
- 1 board that barely mines, i.e. after 1h all three FPGAs are below 0.1U
Running the 6 'good' boards for ~20h:
cgminer version 2.5.0 - Started: [2012-07-27 10:04:11]
--------------------------------------------------------------------------------
(5s):10425.6 (avg):9003.0 Mh/s | Q:8973 A:23300 R:76 HW:0 E:260% U:43.3/m
TQ: 24 ST: 25 SS: 0 DW: 3006 NB: 65 LW: 129448 GF: 0 RF: 0
Connected to http://eu.ozco.in:8332 with LP as user zeta-mining.1
Block: 000003b506a76f202a4544008ce4ed0e... Started: [19:01:42]
--------------------------------------------------------------------------------
[P]ool management [S]ettings [D]isplay options [Q]uit
ICA 0: | 373.1/373.6Mh/s | A:1286 R:8 HW:0 U: 2.39/m
ICA 1: | 371.0/373.7Mh/s | A:1307 R:4 HW:0 U: 2.43/m
ICA 2: | 378.4/373.7Mh/s | A:1266 R:1 HW:0 U: 2.35/m
ICA 3: | 380.0/379.9Mh/s | A: 0 R:0 HW:0 U: 0.00/m
ICA 4: | 379.3/373.0Mh/s | A:1382 R:4 HW:0 U: 2.57/m
ICA 5: | 370.4/373.7Mh/s | A:1288 R:3 HW:0 U: 2.39/m
ICA 6: | 374.2/373.4Mh/s | A:1291 R:5 HW:0 U: 2.40/m
ICA 7: | 380.0/379.9Mh/s | A: 0 R:0 HW:0 U: 0.00/m
ICA 8: | 376.6/373.6Mh/s | A:1291 R:4 HW:0 U: 2.40/m
ICA 9: | 378.1/373.7Mh/s | A:1266 R:5 HW:0 U: 2.35/m
ICA 10: | 373.7/373.6Mh/s | A:1272 R:3 HW:0 U: 2.36/m
ICA 11: | 380.0/379.9Mh/s | A: 0 R:0 HW:0 U: 0.00/m
ICA 12: | 378.8/373.7Mh/s | A:1266 R:3 HW:0 U: 2.35/m
ICA 13: | 372.3/373.4Mh/s | A:1314 R:2 HW:0 U: 2.44/m
ICA 14: | 370.4/373.4Mh/s | A:1308 R:3 HW:0 U: 2.43/m
ICA 15: | 380.0/379.9Mh/s | A: 0 R:0 HW:0 U: 0.00/m
ICA 16: | 377.9/373.8Mh/s | A:1260 R:6 HW:0 U: 2.34/m
ICA 17: | 373.3/373.8Mh/s | A:1296 R:3 HW:0 U: 2.41/m
ICA 18: | 373.2/373.5Mh/s | A:1291 R:9 HW:0 U: 2.40/m
ICA 19: | 380.0/379.9Mh/s | A: 0 R:0 HW:0 U: 0.00/m
ICA 20: | 375.8/373.4Mh/s | A:1297 R:6 HW:0 U: 2.41/m
ICA 21: | 371.8/373.5Mh/s | A:1255 R:4 HW:0 U: 2.33/m
ICA 22: | 372.7/373.0Mh/s | A:1364 R:3 HW:0 U: 2.54/m
ICA 23: | 380.0/379.9Mh/s | A: 0 R:0 HW:0 U: 0.00/m
--------------------------------------------------------------------------------
... I get almost the same total U value as I had before, but now I can power down half of the boards.
One important observation: the boards that work now are not exactly those that were stable before, and vice versa. To me it looks completely random whether a board is going to mine with a given bitstream or not. And this bothers me even more than the lack of a bitstream that unleashes the full potential of CM1. With all the fights I've had so far to get at least some of the FPGAs mining, if I had a choice I would much prefer a 600 MH/s FW that runs flawlessly on 100% of the boards over a 1000 MH/s one that runs on 70% of boards. I hope your ongoing improvement of the communication module will resolve the current flaws. Good luck and thanks for your efforts.
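For anyone who wants to tally such a status screen instead of reading it by eye, here is a minimal sketch that parses per-device lines of the form shown in the cgminer output above. The regular expression is written against that pasted text only; it is an assumption about the layout, not a documented cgminer format, and `parse_devices` is a hypothetical helper.

```python
import re

# Matches lines such as:
#   ICA 0: | 373.1/373.6Mh/s | A:1286 R:8 HW:0 U: 2.39/m
LINE_RE = re.compile(
    r"ICA\s+(\d+):\s*\|\s*([\d.]+)/([\d.]+)Mh/s\s*\|"
    r"\s*A:\s*(\d+)\s+R:\s*(\d+)\s+HW:\s*(\d+)\s+U:\s*([\d.]+)/m"
)


def parse_devices(text):
    """Extract id, average MH/s, accepted shares and utility per device."""
    devices = []
    for m in LINE_RE.finditer(text):
        devices.append({
            "id": int(m.group(1)),
            "avg_mhs": float(m.group(3)),
            "accepted": int(m.group(4)),
            "utility": float(m.group(7)),
        })
    return devices


sample = """
ICA 0: | 373.1/373.6Mh/s | A:1286 R:8 HW:0 U: 2.39/m
ICA 1: | 371.0/373.7Mh/s | A:1307 R:4 HW:0 U: 2.43/m
ICA 2: | 378.4/373.7Mh/s | A:1266 R:1 HW:0 U: 2.35/m
ICA 3: | 380.0/379.9Mh/s | A: 0 R:0 HW:0 U: 0.00/m
"""

devices = parse_devices(sample)
idle = [d["id"] for d in devices if d["accepted"] == 0]
total_u = sum(d["utility"] for d in devices)

if __name__ == "__main__":
    print("devices with no accepted shares:", idle)
    print("total U:", round(total_u, 2))
```

Run over the full listing, the same filter would flag every device that reports a hash rate but has no accepted shares, which in the output above is every fourth ICA entry.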
|
|
|
|
Glasswalker
|
|
July 27, 2012, 05:44:18 PM |
|
Thanks! This data is helpful, and interesting.
I ran into an ongoing battle with the Xilinx tools. In my last build I attempted to do too much at once and ended up having to backtrack. But I'm now caught up.
What I have now:
- Rewritten UART core, improved and a bit more noise-resilient; it should be better at locking onto the data cleanly
- Adjusted the timing some more
- Reworked the project to be buildable purely in Xilinx ISE (no more requirement for third-party tools!)
- Using the above code I've built a 175 MHz version that doesn't fully meet timing (1.5 ns off)
- I'm now running that build through SmartXplorer on my workhorse to get it to close timing
- In the meantime I'm trying builds of both a 125 MHz and a 150 MHz version of the same bitstream, hoping one of the two will meet timing so we can at least do functional testing to ensure the new code actually works as expected, and if so, do some stability tests at the lower hash rate
I'll post once I have a finished build of one of the "slower test builds" so some of you can try them out. Unfortunately I leave town this evening for the weekend and won't be able to get any work done. But the optimization run will run all weekend long and should be done early next week, so provided this new code actually works (and the slower cores test out fine), the 175 MHz version should finish sometime soon.
I'll be sending all my work to Yohan at Enterpoint, and he'll do some additional poking and testing on their end; perhaps another improved controller will help as well.
So I'm in rush mode right now to validate the code on a slower bitstream before I leave for the weekend, so I don't waste too much compute time on invalid code (if that's the case). Here's hoping that it's all solid and that this gets us running reliably on all 4 chips (at a slower rate), and that early next week we'll see a 175 MHz version of the same.
Hopefully within an hour or two I'll post a .bit file for you to test out.
|
|
|
|
Doff
|
|
July 27, 2012, 06:05:37 PM Last edit: July 27, 2012, 06:20:52 PM by Doff |
|
I have a strange problem where it only works on one of my 5 machines, and the only difference I know of is that the one it works on runs Debian with a custom zen 3.4 patched kernel. I'm hoping some of your updates miraculously fix that. The FTDI driver on all the Debian machines is 1.6, and that's the only serial driver I added to the one it works on.
The current Glasswalker bitstream worked for about an hour before cutting out on a second FPGA, and I figured out that if I remove the broken workers and restart, it runs stable on 2 FPGAs at about 370 MH/s. Not that it really helps anything other than to confirm the bitstream obviously still needs tuning.
I did figure out why my MPBM didn't work: it turns out that at some point I had downloaded a non-working version of eb's modified version and had that loaded. Once I re-downloaded the testing branch, it worked for the one PC. Looking forward to a new bitstream to test; it's nice to see things moving ahead.
Small update: after another hour I started to get errors on another FPGA.
Ozcoin accepted share 1bed5606 (difficulty 1.01941)
2012-07-27 11:18:53.933 [100] Untitled Cairnsmore worker: Traceback (most recent call last):
File "/home/doff/bin/testing/Modular-Python-Bitcoin-Miner/modules/glasswalker/cairnsmore/cairnsmoreworker.py", line 201, in main <------------------------- This error, and then it stops working.
if not self.checksuccess: raise Exception("Timeout waiting for validation job to finish")
Exception: Timeout waiting for validation job to finish
2012-07-27 11:19:02.250 [400] Untitled Cairnsmore worker: Mining Ozcoin:0000000173f517c30b30dc172356f762d03a88545e8aa071cd7adcea0000001800000000ef650d81c86066601ffbcdf2d4fbf6eeb3ee15e3f07506e708dcd1173399b6b45012dbba1a08fd2e
2012-07-27 11:19:11.352 [400] Untitled Cairnsmore worker: Mining Ozcoin:0000000173f517c30b30dc172356f762d03a88545e8aa071cd7adcea0000001800000000ef650d81c86066601ffbcdf2d4fbf6eeb3ee15e3f07506e708dcd1173399b6b45012dbbb1a08fd2e
2012-07-27 11:19:19.691 [350] Untitled Cairnsmore worker: Found share: Ozcoin:0000000173f517c30b30dc172356f762d03a88545e8aa071cd7adcea0000001800000000ef650d81c86066601ffbcdf2d4fbf6eeb3ee15e3f07506e708dcd1173399b6b45012dbbb1a08fd2e:815ad556
2012-07-27 11:19:19.736 [400] Untitled Cairnsmore worker: Mining Ozcoin:0000000173f517c30b30dc172356f762d03a88545e8aa071cd7adcea0000001800000000ef650d81c86066601ffbcdf2d4fbf6eeb3ee15e3f07506e708dcd1173399b6b45012dbbc1a08fd2e
2012-07-27 11:19:19.820 [250] Untitled Cairnsmore worker: Ozcoin accepted share 815ad556 (difficulty 1.14627)
2012-07-27 11:19:28.836 [400] Untitled Cairnsmore worker: Mining Ozcoin:0000000173f517c30b30dc172356f762d
Don't know if that helps anything, but there it is.
|
|
|
|
steamboat
|
|
July 27, 2012, 08:03:46 PM |
|
which usb hubs are you guys using?
|
|
|
|
Glasswalker
|
|
July 27, 2012, 08:04:37 PM |
|
OK, I'm leaving town now... Unfortunately neither the 150 nor the 125 closed timing on the first attempt (which is silly, stupid Xilinx tools)...
Anyway, I'm now running SmartXplorer on BOTH the 125 and the 175. The 175 is set to run all possible options and find the best; the 125 is set to find the first option that closes timing at all, and then quit.
With both running, all 24 of the processor cores in my machine are maxed out at 100% and my RAM is pegged at 98%... So it's bordering on bursting into flames lol...
Hopefully it will close timing very quickly on the 125. I don't know why it didn't on the first try. Amusingly, the 175's first attempt would likely run fine at 125 out of the gate, but because the DCMs are configured differently, it would fail to work if I simply underclocked it. And when I rebuilt with the needed clocking changes, it resulted in this crippled bitstream.
Anyway, I'll keep checking on it remotely, and when it's closed the 125, I'll release it for testing.
Once I get something viable on the 175 I'll do the same.
I've sent all my work to Enterpoint, hopefully they can continue running with it over the weekend while I'm gone and maybe we'll get an improved controller that helps mitigate more of the problems.
Also while waiting on the build today, I made some fairly major headway on my "from scratch" bitstream... It may be closer than I originally estimated (ie weeks instead of months)
Thanks!
|
|
|
|
rjk
Sr. Member
Offline
Activity: 448
Merit: 250
1ngldh
|
|
July 27, 2012, 08:57:31 PM |
|
Sheeit, how much RAM do you have anyway? I've been considering building a monster, but with 24 cores I'm assuming you already have a lot of RAM.
|
|
|
|
this time
Newbie
Offline
Activity: 55
Merit: 0
|
|
July 27, 2012, 09:48:30 PM Last edit: July 29, 2012, 09:15:53 PM by this time |
|
Finally got MPBM working, using the Makomk's 140 Bitstream. Followed the settings, SW1 & 6 per the Twin Build Running settings. SW2-5 on low performance mode. Only 2 of my chips are mining right now. https://i.imgur.com/df6ne.jpg Any idea? The two that are hashing, don't have any issues. Board#171
This is my experience also. All 4 FPGA positions produce the occasional green flash. Sometimes the 2 FPGA positions next to each other produce a simultaneous, very quick red flash. I've tried different settings for SW2-5. None seem to make a difference to anything.
|
|
|
|
ebereon
|
|
July 27, 2012, 10:05:36 PM |
|
Finally got MPBM working, using the Makomk's 140 Bitstream. Followed the settings, SW1 & 6 per the Twin Build Running settings. SW2-5 on low performance mode. Only 2 of my chips are mining right now. https://i.imgur.com/df6ne.jpg Any idea? The two that are hashing, don't have any issues. Board#171
This is my experience also. All 4 fpga positions produce the occasional green flash. Sometimes the 2 fpga positions next to each other produce a simultaneous very quick red flash. I've tried different setting for sw 2-5. None seem to make a difference to anything. 2 cores mine at around 140 mh/s each, the other 2 show 0 in mpbm. I had 1 board that I had to flash permanently (SPI) to get it working. All the others run in temporary mode. I don't know why, but Yohan knows...
@Makomk: Please post your wallet address, I will throw some coins to you =D - your bitstream is the best at the moment; nothing is faster or easier to flash.
eb
|
|
|
|
this time
Newbie
Offline
Activity: 55
Merit: 0
|
|
July 28, 2012, 12:22:33 AM |
|
I tried on a different board with the same results. The 140 Makomk bitstream works on fpga 0,3 just like twin_test. It does not work on fpga 1,2. I double-checked the comms. I can only think I'm making a procedure error.
For programming:
SW6 1 off, all else on
SW1 3 off, all else on
SW2,5 all on
SW3,4 2 off, all else on
For mining:
SW1 all on
no other changes
|
|
|
|
ebereon
|
|
July 28, 2012, 12:41:53 AM |
|
Should do... hmm. Controller rev. 1.3? Flashed to SPI and powered down completely? Changed the USB cable? What SN? My boards are all working and the SN#s are between 62-406 and 62-415. U is between 3.76 and 4.00 for each pair.
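As a sanity check on U values like these, the expected utility can be estimated from the hash rate. The sketch below assumes that U counts accepted difficulty-1 shares per minute and that one difficulty-1 share takes 2^32 hashes on average; actual readings will wander around the estimate because share finding is random. The function names are hypothetical.

```python
HASHES_PER_DIFF1_SHARE = 2 ** 32  # average hashes per difficulty-1 share


def expected_utility(hashrate_mhs):
    """Expected accepted shares per minute for a given rate in MH/s."""
    return hashrate_mhs * 1e6 * 60 / HASHES_PER_DIFF1_SHARE


def effective_mhs(utility):
    """Effective hash rate in MH/s implied by a utility reading."""
    return utility * HASHES_PER_DIFF1_SHARE / 60 / 1e6


if __name__ == "__main__":
    # A pair of FPGAs at 140 MH/s each on one COM port
    print(round(expected_utility(280), 2))
    # The U range reported per pair
    print(round(effective_mhs(3.76)), round(effective_mhs(4.00)))
```

Under these assumptions a pair hashing at 2 x 140 MH/s should settle at a U of roughly 3.9, so readings between 3.76 and 4.00 are consistent with both FPGAs of a pair doing useful work.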
|
|
|
|
this time
Newbie
Offline
Activity: 55
Merit: 0
|
|
July 28, 2012, 01:26:30 AM Last edit: July 29, 2012, 09:16:59 PM by this time |
|
Yes, to all.
|
|
|
|
Doff
|
|
July 28, 2012, 03:14:45 AM |
|
I think you need to program all 4, but only address 2 COM ports, as it drives them as a pair in this mode. Am I wrong? I couldn't get this bitstream to work better than twin_test either. It either failed Icarus detect in cgminer, hashed too slowly, or did not hash at all. I had the same results. If I read correctly, ebereon uses a JTAG cable; maybe that makes it work? I'm just using the USB cable.
|
|
|
|
this time
Newbie
Offline
Activity: 55
Merit: 0
|
|
July 28, 2012, 03:27:01 AM |
|
I think you need to program all 4, but only address 2 COM ports as it drives them as a pair in this mode. Am I wrong? I couldn't get this bitstream to work better than twin_test either. It either failed Icarus detect in CGminer or hashed too slow, or not at all.
Yes, that's how it should work. You should get 280 MH/s per COM port, 2 COM ports per board. The screenshot of the person I was quoting is a little misleading, as it looks like he was trying to open 4 COM ports per board. I don't know which FPGA of each pair is actually doing the work, but it looks like the same situation as before: only 1 active FPGA per COM channel. My 280 MH/s is composed of 140/140 as opposed to 280/0. It should be 280/280.
|
|
|
|
|