Bitcoin Forum

Alternate cryptocurrencies => Mining (Altcoins) => Topic started by: Genoil on November 28, 2015, 10:24:19 AM



Title: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 28, 2015, 10:24:19 AM
During the development of the CUDA miner for Ethereum, I ran into an issue where the hashrate on the GTX 750 Ti drops dramatically when the size of the memory buffer the miner operates on exceeds a certain threshold (1GB on Win7/Linux, 512MB on Win8/10). After a long discussion on the CUDA forums, one of the designers of CUDA weighed in and identified the issue as TLB thrashing. I'm currently doing some research on the subject and have created a simple test program that measures these effects. It simulates the 'dagger' part of the Ethereum algorithm at different memory buffer (DAG) sizes and writes the results to a CSV file. So far, I have concluded that it is not an Nvidia-only issue; it manifests on AMD hardware as well. Apparently it is not an ETH-only issue either: I've received some reports from scrypt-jane miners too.
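
As a rough illustration of what "simulating the dagger part" means, the sketch below performs the same kind of dependent, pseudo-random 128-byte reads on the CPU. It is not the dagSimCL kernel: the function name `walkDag`, the seeding, and the choice to run on the host are assumptions made for illustration. Only the FNV mixing step and the 64 accesses of 128 bytes per hash follow the published Ethash parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-style mixing step used by Ethash: (a * prime) ^ b, in 32-bit arithmetic.
static inline std::uint32_t fnv(std::uint32_t a, std::uint32_t b)
{
    return (a * 0x01000193u) ^ b;
}

// Hypothetical helper: one simulated "hash" = `accesses` dependent reads of
// 128-byte pages at pseudo-random positions in the buffer. Each page index
// depends on the data read so far, so the reads cannot be predicted or
// prefetched, which is what stresses the TLB once the buffer grows large.
std::uint32_t walkDag(const std::vector<std::uint32_t>& dag,
                      std::uint32_t seed, unsigned accesses = 64)
{
    const std::size_t wordsPerPage = 128 / sizeof(std::uint32_t);
    const std::size_t pages = dag.size() / wordsPerPage;
    assert(pages > 0 && "buffer must hold at least one 128-byte page");

    std::uint32_t mix = seed;
    for (unsigned i = 0; i < accesses; ++i) {
        // Pick the next page from the running mix value.
        const std::size_t page = fnv(i ^ seed, mix) % pages;
        const std::uint32_t* p = &dag[page * wordsPerPage];
        // Fold all 32 words of the page into the mix.
        for (std::size_t w = 0; w < wordsPerPage; ++w)
            mix = fnv(mix, p[w]);
    }
    return mix;
}
```

Timing many calls to `walkDag` over buffers of increasing size, and dividing the bytes read by the elapsed time, gives a bandwidth curve of the kind reported in this thread.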

I'm currently looking for as many hardware/OS combinations as possible, so that I can come to a recommendation for miners as well as designers of new algorithms. Below is an example of the ETH hashrate on a GTX 780 on Windows with increasing buffer size (in MB):

http://i.snag.gy/JQhR3.jpg

The test program can be downloaded from https://github.com/Genoil/dagSimCL. Win-64 binaries are in the x64/Release folder. You can also build it yourself, but I only have supporting MSVC files targeted at Nvidia OpenCL. On AMD hardware you may want to run

Code:
set GPU_MAX_ALLOC_PERCENT=100

first. By default, the program tries to use all of your GPU's RAM, up to 4096MB. If you have less system RAM, you can add a command-line parameter to test up to a lower maximum:

Code:
dagsimCL.exe 2048

If you have multiple GPUs, you need to add a second parameter:

Code:
dagsimCL.exe 4096 1

If you have multiple OpenCL platforms installed:

Code:
dagsimCL.exe 4096 0 1
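
The three invocations above suggest positional parameters: the maximum buffer size in MB, then a GPU index, then an OpenCL platform index. The sketch below shows one way such arguments could be parsed with defaults. The struct and function names are hypothetical, the meaning of the second and third positions is inferred from the examples, and the real dagSimCL source may handle its arguments differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

struct SimOptions {
    unsigned maxMB = 4096;   // upper bound for the tested buffer size (default from the post)
    unsigned device = 0;     // assumed: index of the GPU to test
    unsigned platform = 0;   // assumed: index of the OpenCL platform
};

// Parses positional arguments; anything not supplied keeps its default.
SimOptions parseArgs(const std::vector<std::string>& args)
{
    assert(!args.empty());   // args[0] is the program name
    SimOptions opt;
    if (args.size() > 1)
        opt.maxMB = static_cast<unsigned>(std::strtoul(args[1].c_str(), nullptr, 10));
    if (args.size() > 2)
        opt.device = static_cast<unsigned>(std::strtoul(args[2].c_str(), nullptr, 10));
    if (args.size() > 3)
        opt.platform = static_cast<unsigned>(std::strtoul(args[3].c_str(), nullptr, 10));
    return opt;
}
```

In `main`, this would be called as `parseArgs(std::vector<std::string>(argv, argv + argc))`.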

I would be very grateful if you could participate in this bit of research and possibly discuss any workarounds. Thanks!

P.S. Note that the hashrates achieved with the test program can be significantly higher than what you actually get with ethminer. This is because it only simulates the Dagger stages, not the Keccak stages.
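
The Bandwidth and Hashrate columns that the test program writes appear to be tied by a fixed factor. Ethash reads 64 pages of 128 bytes per hash, i.e. 8192 bytes, and the figures posted in this thread match that if "GB" is taken to mean 2^30 bytes. The function below is a sketch of that conversion under those assumptions; the name `hashrateMHs` is made up for illustration and is not taken from dagSimCL.

```cpp
#include <cassert>

// Converts memory bandwidth (in units of 2^30 bytes per second) into the
// hashrate of a Dagger-only loop, assuming 8192 bytes are read per hash.
double hashrateMHs(double bandwidthGiBs)
{
    assert(bandwidthGiBs >= 0.0);
    const double bytesPerHash = 64.0 * 128.0;                    // 64 accesses of 128 bytes
    const double bytesPerSecond = bandwidthGiBs * 1073741824.0;  // 2^30 bytes per "GB"
    return bytesPerSecond / bytesPerHash / 1e6;                  // hashes/s -> MH/s
}
```

For example, 130.915 GB/s comes out at roughly 17.16 MH/s, which agrees with the first row of the 7850 results posted later in the thread.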




Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: MaxDZ8 on November 29, 2015, 09:05:57 AM
Interesting.
Please, can you share links to all discussions?

Considering OpenCL is possibly higher level than GL ever was, I'm quite surprised someone pinpointed a hardware construct issue, especially as GPUs are traditionally managed and there's a huge gap between different OSes, which in my experience should not be there for HW constructs... odd.

I have a 1GiB card so there's little I can do. I will try to take a look in the next few days if I can set aside some time. Initial analysis in CodeXL gave me inconsistent results.

Have you investigated different access patterns?


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 29, 2015, 01:41:46 PM
This thread on the CUDA forums is the most relevant:
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/
Somebody over there (@allnamac) wrote a completely independent test that verified my findings.

This one is less interesting, but it shows that the problem affects both Nvidia and AMD:
http://gathering.tweakers.net/forum/list_messages/1659186





Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: gielbier on November 30, 2015, 01:29:57 PM
I'm still trying to get my 7850 2GB to go above 1280MB, but I keep getting the out-of-memory error.
Even with:
set GPU_MAX_ALLOC_PERCENT=100 / GPU_MAX_ALLOC_PERCENT=95
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1

Code:
DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128 130.915 17.1593
256 130.547 17.111
384 129.763 17.0083
512 129.429 16.9645
640 129.359 16.9553
768 129.501 16.9739
896 130.307 17.0796
1,024 130.303 17.0791
1,152 113.466 14.8722
1,280 103.826 13.6086
But it does seem to drop hard from 1,024 to 1,280.
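
To compare results across cards, it helps to pin down where the drop starts. The sketch below scans (DAG size, hashrate) pairs and reports the first size at which the hashrate falls below a chosen fraction of the first measurement. The 90% threshold and all names are arbitrary choices for illustration, not part of the test program.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns the first DAG size (MB) whose hashrate is below `fraction` of the
// first sample's hashrate, or -1 if no sample drops that far.
int findDropOffMB(const std::vector<std::pair<int, double>>& samples,
                  double fraction = 0.9)
{
    assert(!samples.empty());
    const double baseline = samples.front().second;
    for (const auto& s : samples)
        if (s.second < fraction * baseline)
            return s.first;
    return -1;
}
```

Fed with the unchunked 7850 figures above, this returns 1152, the first row after the plateau.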

Chunked (512) version below:
Code:
DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128 130.953 17.1643
256 130.552 17.1117
384 130.483 17.1027
512 129.715 17.002
640 160.314 21.0126
768 166.186 21.7823
896 162.538 21.3042
1,024 166.417 21.8126
1,152 135.096 17.7073
1,280 38.5741 5.05599
1,408 23.306 3.05476
1,536 17.4977 2.29346
1,664 12.6435 1.65721
1,792 12.2781 1.60932
1,920 10.8921 1.42764

Chunked (256) version below:
Code:
DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128 131.008 17.1715
256 130.584 17.1158
384 124.342 16.2977
512 114.388 14.9931
640 178.814 23.4376
768 160.401 21.0241
896 166.627 21.8401
1,024 156.984 20.5762
1,152 141.14 18.4996
1,280 123.989 16.2515
1,408 122.695 16.0819
1,536 51.0244 6.68787
1,664 29.0346 3.80563
1,792 21.4296 2.80881
1,920 17.2236 2.25754


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 30, 2015, 03:23:57 PM
I've modified the source code a bit to allocate in 256MB chunks. It should now be possible for AMD cards to use more RAM. On my GTX 780, the hashrate curve is just about the same (a tiny bit slower) when using 256MB chunks.


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 30, 2015, 04:35:27 PM
I've modified the source code a bit to allocate in user-definable chunks. It should now be possible for AMD cards to use more RAM. On my GTX 780, the hashrate curve is just about the same (a tiny bit slower) when using 256MB chunks.
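
Allocating in chunks means splitting the total DAG size into several smaller buffers, each within the driver's per-allocation limit. A minimal sketch of that split follows; `planChunks` is a hypothetical helper, not the code from the dagSimCL repo, and the actual OpenCL buffer creation is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits `totalMB` into consecutive chunks of at most `chunkMB` each.
// The last chunk holds whatever remains.
std::vector<std::size_t> planChunks(std::size_t totalMB, std::size_t chunkMB)
{
    assert(chunkMB > 0);
    std::vector<std::size_t> sizes;
    for (std::size_t left = totalMB; left > 0; ) {
        const std::size_t n = left < chunkMB ? left : chunkMB;
        sizes.push_back(n);
        left -= n;
    }
    return sizes;
}
```

`planChunks(1280, 256)` yields five 256MB chunks; a kernel then has to map each global index to a chunk and a local offset.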


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: apriyoni on November 30, 2015, 04:47:31 PM
Do you have a binary for the variable chunk size? I wonder if a future ethminer could also let the user choose the chunk size for optimization.


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 30, 2015, 05:16:32 PM
Quote from: apriyoni
Do you have a binary for the variable chunk size? I wonder if a future ethminer could also let the user choose the chunk size for optimization.

Binaries are in the x64/Release folder on the git repo.


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: apriyoni on November 30, 2015, 05:24:51 PM
Quote from: apriyoni
Do you have a binary for the variable chunk size? I wonder if a future ethminer could also let the user choose the chunk size for optimization.

Quote from: Genoil
Binaries are in the x64/Release folder on the git repo.

What parameter do I use on the command line to change the chunk size?


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 30, 2015, 06:43:01 PM
Quote from: apriyoni
Do you have a binary for the variable chunk size? I wonder if a future ethminer could also let the user choose the chunk size for optimization.

Quote from: Genoil
Binaries are in the x64/Release folder on the git repo.

Quote from: apriyoni
What parameter do I use on the command line to change the chunk size?

Just a number:

Code:
dagSimCL.exe 128


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Omegasun on November 30, 2015, 07:19:48 PM
It seems the code can be optimised further.

I used a chunk size of 640, and the speed varies.


Code:
DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
1,024 171.293 22.4517
1,040 205.211 26.8974
1,056 134.719 17.6579
1,072 308.554 40.4428
1,088 163.232 21.3952
1,104 305.887 40.0932
1,120 158.693 20.8002
1,136 306.434 40.165
1,152 156.652 20.5326
1,168 301.328 39.4956
1,184 152.128 19.9397
1,200 299.754 39.2894
1,216 147.419 19.3225
1,232 293.886 38.5203
1,248 143.319 18.7851
1,264 295.562 38.7399
1,280 143.378 18.7928
1,296 420.797 55.1547
1,312 187.432 24.5671
1,328 297.962 39.0545
1,344 179.346 23.5073
1,360 302.645 39.6683
1,376 184.994 24.2476
1,392 296.381 38.8472
1,408 187.815 24.6173
1,424 295.367 38.7143
1,440 197.081 25.8318
1,456 288.925 37.87
1,472 201.804 26.4508
1,488 289.776 37.9815
1,504 206.96 27.1267
1,520 287.797 37.7221
1,536 208.31 27.3037
1,552 291.33 38.1852
1,568 219.074 28.7145
1,584 291.814 38.2486
1,600 219.739 28.8016
1,616 292.338 38.3173
1,632 219.81 28.811
1,648 293.074 38.4138
1,664 231.998 30.4084
1,680 291.659 38.2283
1,696 235.582 30.8782
1,712 291.056 38.1493
1,728 242.35 31.7653
1,744 292.548 38.3449
1,760 250.939 32.8911
1,776 290.89 38.1275
1,792 248.618 32.5868
1,808 290.146 38.0301
1,824 250.495 32.8329
1,840 290.413 38.065
1,856 252.41 33.0838
1,872 291.722 38.2365
1,888 256.025 33.5578
1,904 289.27 37.9152
1,920 255.669 33.511
1,936 8.29736 1.08755
1,952 277.308 36.3474
1,968 272.499 35.717
1,984 275.355 36.0913
2,000 273.839 35.8926
2,016 271.384 35.5709
2,032 270.993 35.5196
2,048 253.742 33.2585
2,064 269.63 35.3409
2,080 270.119 35.405
2,096 273.058 35.7903
2,112 271.964 35.6468
2,128 274.38 35.9635
2,144 268.958 35.2529
2,160 273.752 35.8812
2,176 262.851 34.4524
2,192 272.935 35.7741
2,208 258.234 33.8473
2,224 272.811 35.7579
2,240 250.648 32.8529
2,256 274.222 35.9429
2,272 246.125 32.2601
2,288 278.434 36.4949
2,304 240.086 31.4686
2,320 276.598 36.2543
2,336 235.735 30.8982
2,352 274.523 35.9823
2,368 226.982 29.751
2,384 281.238 36.8625
2,400 223.112 29.2438
2,416 282.779 37.0644
2,432 218.77 28.6746
2,448 283.579 37.1693
2,464 212.667 27.8748
2,480 287.033 37.6219
2,496 209.21 27.4216
2,512 287.018 37.62
2,528 202.401 26.5291
2,544 285.671 37.4435
2,560 198.479 26.0151
2,576 23.6545 3.10044
2,592 10.8985 1.42849


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Omegasun on November 30, 2015, 07:38:22 PM
More details:

AMD 7970, Catalyst 15.7, Win 8.1

Code:
DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
1,024 169.584 22.2277
1,028 201.148 26.3649
1,032 137.039 17.962
1,036 304.848 39.957
1,040 168.196 22.0458
1,044 306.698 40.1995
1,048 168.269 22.0554
1,052 306.984 40.237
1,056 165.834 21.7362
1,060 304.877 39.9609
1,064 163.642 21.4489
1,068 307.009 40.2403
1,072 165.116 21.6421
1,076 308.427 40.4261
1,080 164.043 21.5015
1,084 308.633 40.4532
1,088 163.075 21.3746
1,092 307.799 40.3439
1,096 160.319 21.0133
1,100 306.587 40.185
1,104 159.124 20.8566
1,108 306.287 40.1457
1,112 159.627 20.9226
1,116 307.489 40.3032
1,120 158.726 20.8045
1,124 305.745 40.0746
1,128 159.441 20.8983
1,132 306.6 40.1867
1,136 158.056 20.7168
1,140 303.038 39.7198
1,144 157.432 20.6349
1,148 305.833 40.0861
1,152 156.652 20.5326
1,156 305.136 39.9947
1,160 155.543 20.3874
1,164 303.046 39.7209
1,168 153.005 20.0547
1,172 299.497 39.2556
1,176 152.261 19.9571
1,180 300.396 39.3734
1,184 152.327 19.9657
1,188 301.339 39.4971
1,192 151.254 19.8251
1,196 300.391 39.3729
1,200 150.914 19.7806
1,204 299.912 39.31
1,208 149.358 19.5767
1,212 298.042 39.065
1,216 146.716 19.2303
1,220 294.716 38.6291
1,224 144.464 18.9352
1,228 293.327 38.4469
1,232 142.717 18.7062
1,236 293.99 38.5338
1,240 143.653 18.8288
1,244 293.986 38.5333
1,248 143.824 18.8513
1,252 294.655 38.6211
1,256 143.908 18.8624
1,260 294.698 38.6266
1,264 143.466 18.8044
1,268 295.105 38.6799
1,272 143.061 18.7513
1,276 294.811 38.6414
1,280 143.163 18.7647
1,284 422.562 55.386
1,288 188.832 24.7506
1,292 295.35 38.7121
1,296 189.787 24.8757
1,300 296.379 38.847
1,304 188.287 24.6791
1,308 298.672 39.1475
1,312 186.728 24.4748
1,316 297.253 38.9615
1,320 183.426 24.042
1,324 295.835 38.7757
1,328 181.456 23.7837
1,332 297.77 39.0293
1,336 179.955 23.5871
1,340 297.201 38.9548
1,344 179.453 23.5213
1,348 299.408 39.244
1,352 180.218 23.6215
1,356 301.928 39.5743
1,360 181.53 23.7935
1,364 298.281 39.0963
1,368 185.617 24.3292
1,372 302.053 39.5907
1,376 185.688 24.3385
1,380 299.94 39.3137
1,384 186.133 24.3968
1,388 298.336 39.1035
1,392 186.56 24.4528
1,396 295.824 38.7742
1,400 187.393 24.562
1,404 295.826 38.7746
1,408 188.098 24.6543
1,412 294.417 38.5898
1,416 190.336 24.9477
1,420 294.859 38.6477
1,424 193.624 25.3786
1,428 293.79 38.5076
1,432 195.098 25.5718
1,436 292.607 38.3526
1,440 197.223 25.8504
1,444 291.125 38.1583
1,448 198.542 26.0233
1,452 289.802 37.985
1,456 199.387 26.1341
1,460 288.238 37.7799
1,464 200.996 26.345
1,468 288.351 37.7947
1,472 202.043 26.4822
1,476 288.643 37.8331
1,480 202.655 26.5624
1,484 289.962 38.006
1,488 203.537 26.678
1,492 290.97 38.138
1,496 205.972 26.9972
1,500 288.161 37.7699
1,504 207.029 27.1358
1,508 287.617 37.6986
1,512 206.491 27.0651
1,516 287.496 37.6826
1,520 205.916 26.9899
1,524 287.823 37.7255
1,528 206.337 27.045
1,532 287.394 37.6693
1,536 208.207 27.2901
1,540 288.091 37.7606
1,544 210.887 27.6414
1,548 289.533 37.9496
1,552 212.232 27.8177
1,556 291.393 38.1934
1,560 215.511 28.2475
1,564 293.224 38.4335
1,568 219.839 28.8147
1,572 293.074 38.4139
1,576 219.205 28.7316
1,580 292.927 38.3946
1,584 219.551 28.777
1,588 292.805 38.3786
1,592 220.017 28.838
1,596 292.63 38.3556
1,600 220.541 28.9067
1,604 292.791 38.3767
1,608 219.929 28.8266
1,612 290.807 38.1167
1,616 220.381 28.8857
1,620 292.645 38.3575
1,624 219.375 28.7539
1,628 290.738 38.1077
1,632 220.587 28.9128
1,636 292.731 38.3689
1,640 221.095 28.9794
1,644 292.084 38.284
1,648 227.149 29.7729
1,652 291.895 38.2593
1,656 230.119 30.1621
1,660 293.108 38.4182
1,664 231.49 30.3419
1,668 291.726 38.2372
1,672 232.928 30.5303
1,676 293.916 38.5242
1,680 234.557 30.7439
1,684 291.779 38.2441
1,688 235.908 30.9209
1,692 291.393 38.1934
1,696 238.068 31.204
1,700 291.31 38.1826
1,704 238.506 31.2614
1,708 290.866 38.1244
1,712 240.209 31.4846
1,716 290.223 38.0402
1,720 241.831 31.6973
1,724 293.142 38.4227
1,728 243.461 31.911
1,732 291.057 38.1495
1,736 244.488 32.0455
1,740 291.978 38.2701
1,744 247.518 32.4427
1,748 292.685 38.3627
1,752 247.4 32.4272
1,756 291.118 38.1575
1,760 249.833 32.7461
1,764 290.777 38.1127
1,768 249.076 32.6469
1,772 291.167 38.1639
1,776 248.571 32.5807
1,780 291.049 38.1484
1,784 247.46 32.4351
1,788 291.456 38.2018
1,792 246.624 32.3255
1,796 290.457 38.0708
1,800 248.342 32.5506
1,804 291.001 38.1421
1,808 248.408 32.5593
1,812 290.238 38.042
1,816 248.047 32.512
1,820 283.815 37.2002
1,824 251.754 32.9978
1,828 291.202 38.1685
1,832 248.827 32.6143
1,836 289.854 37.9918
1,840 249.792 32.7407
1,844 291.752 38.2406
1,848 249.65 32.7222
1,852 291.76 38.2416
1,856 251.274 32.935
1,860 290.769 38.1117
1,864 249.961 32.7629
1,868 289.805 37.9854
1,872 252.607 33.1097
1,876 293.063 38.4123
1,880 253.033 33.1656
1,884 291.514 38.2093
1,888 254.853 33.4041
1,892 289.493 37.9444
1,896 254.539 33.363
1,900 290.336 38.0549
1,904 256.004 33.5549
1,908 289.006 37.8806
1,912 256.708 33.6473
1,916 285.48 37.4185
1,920 255.686 33.5133
1,924 8.20708 1.07572
1,928 270.429 35.4457
1,932 274.044 35.9195
1,936 274.753 36.0125
1,940 273.052 35.7895
1,944 272.767 35.7521
1,948 272.661 35.7382
1,952 271.728 35.6159
1,956 272.43 35.708
1,960 264.622 34.6845
1,964 272.508 35.7181
1,968 262.64 34.4248
1,972 272.497 35.7168
1,976 264.491 34.6673
1,980 270.625 35.4714
1,984 264.232 34.6334
1,988 271.633 35.6035
1,992 266.205 34.892
1,996 270.912 35.509
2,000 266.598 34.9436
2,004 271.738 35.6173
2,008 261.33 34.253
2,012 271.001 35.5206
2,016 260.38 34.1285
2,020 269.918 35.3786
2,024 256.16 33.5754
2,028 271.426 35.5763
2,032 256.371 33.6031
2,036 270.266 35.4243
2,040 255.108 33.4375
2,044 267.721 35.0907
2,048 253.612 33.2415
2,052 266.675 34.9536
2,056 261.409 34.2634
2,060 268.425 35.183
2,064 263.32 34.5139
2,068 269.019 35.2608
2,072 260.561 34.1523
2,076 268.476 35.1897
2,080 263.256 34.5055
2,084 272.322 35.6937
2,088 264.666 34.6903
2,092 273.405 35.8357
2,096 263.937 34.5948
2,100 272.581 35.7277
2,104 265.341 34.7788
2,108 269.691 35.349
2,112 265.216 34.7624
2,116 274.05 35.9203
2,120 264.289 34.6409
2,124 273.125 35.799
2,128 262.044 34.3467
2,132 273.024 35.7858
2,136 251.644 32.9835
2,140 270.812 35.4959
2,144 263.824 34.5799
2,148 270.174 35.4122
2,152 261.542 34.2809
2,156 273.66 35.8691
2,160 262.08 34.3513
2,164 270.529 35.4588
2,168 261.311 34.2505
2,172 271.981 35.6491
2,176 259.845 34.0584
2,180 271.602 35.5994
2,184 257.208 33.7127
2,188 275.531 36.1143
2,192 255.897 33.5409
2,196 272.157 35.6721
2,200 255.044 33.4292
2,204 274.641 35.9978
2,208 253.2 33.1874
2,212 275.592 36.1224
2,216 252.516 33.0978
2,220 271.312 35.5614
2,224 249.935 32.7595
2,228 274.28 35.9504
2,232 249.433 32.6937
2,236 275.829 36.1535
2,240 247.638 32.4584
2,244 276.536 36.2461
2,248 247.598 32.4532
2,252 274.525 35.9825
2,256 246.511 32.3107
2,260 276.15 36.1955
2,264 244.45 32.0405
2,268 275.098 36.0576
2,272 243.543 31.9217
2,276 277.45 36.366
2,280 241.421 31.6435
2,284 277.051 36.3136
2,288 239.379 31.3758
2,292 277.136 36.3247
2,296 239.134 31.3438
2,300 278.188 36.4626
2,304 237.303 31.1038
2,308 280.271 36.7356
2,312 236.599 31.0115
2,316 278.368 36.4863
2,320 234.686 30.7608
2,324 278.092 36.4501
2,328 233.993 30.6699
2,332 277.043 36.3126
2,336 234.41 30.7246
2,340 276.13 36.1929
2,344 232.09 30.4205
2,348 278.554 36.5107
2,352 228.067 29.8932
2,356 277.828 36.4154
2,360 224.97 29.4872
2,364 278.849 36.5492
2,368 224.327 29.4031
2,372 280.193 36.7255
2,376 224.697 29.4515
2,380 280.091 36.7121
2,384 222.211 29.1256
2,388 279.278 36.6056
2,392 220.997 28.9665
2,396 281.534 36.9013
2,400 221.729 29.0625
2,404 282.74 37.0593
2,408 220.127 28.8525
2,412 282.141 36.9807
2,416 217.894 28.5599
2,420 282.731 37.0581
2,424 218.422 28.629
2,428 281.104 36.8448
2,432 216.018 28.3139
2,436 283.408 37.1469
2,440 214.819 28.1568
2,444 282.6 37.041
2,448 214.679 28.1384
2,452 283.089 37.1051
2,456 212.861 27.9001
2,460 285.962 37.4817
2,464 210.865 27.6384
2,468 286.566 37.5608
2,472 210.772 27.6264
2,476 285.056 37.3629
2,480 208.119 27.2786
2,484 286.163 37.508
2,488 207.207 27.159
2,492 287.098 37.6305
2,496 206.643 27.0851
2,500 286.126 37.5031
2,504 205.334 26.9136
2,508 285.176 37.3785
2,512 204.538 26.8093
2,516 287.837 37.7274
2,520 201.79 26.449
2,524 288.896 37.8662
2,528 201.035 26.35
2,532 286.915 37.6065
2,536 199.967 26.21
2,540 288.168 37.7707
2,544 198.046 25.9582
2,548 287.959 37.7433
2,552 197.707 25.9139
2,556 287.504 37.6837
2,560 196.663 25.777
2,564 24.5104 3.21262
2,568 10.8853 1.42676


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on December 01, 2015, 09:06:10 AM
That's interesting. Many thanks.


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: MaxDZ8 on December 01, 2015, 10:09:12 AM
I've looked at the resources. Considering the linked threads are 1) in German and 2) hundreds of messages long, I cannot be 100% sure I understood everything.

What I can tell you is that I've observed considerably lower than expected memory performance on GCN 1.0 even with much smaller buffers. I think it is also worth noting that before 'compute on GPU' became a thing, graphics resources always had an upper bound (different from the CL max alloc). It is my understanding that no such limitation should be in place at this point (and it should be bigger than 1GiB anyway)...
I'm still very surprised that this hardware issue manifests itself at such large sizes (unless the historical limitation still applies).

Leaving aside the max alloc, have you tried how varying the stride affects the result (for a K MiB buffer)?


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: virasog on December 01, 2015, 01:59:38 PM
With the 640MB chunk size above, the hash rate alternates between about 20 and 40 MH/s, and then the gap narrows from about 20 MH/s to a much lower value. Does this indicate an optimisation opportunity?

Can anybody make chunking work in ethminer?

The latest ethminer does not display the hash rate, which makes it difficult to compare results. I wonder if this could be added as well.

Code:
catch (cl::Error const& err)
{
	ETHCL_LOG("Allocating/mapping single buffer failed with: " << err.what() << "(" << err.err() << "). GPU can't allocate the DAG in a single chunk. Bailing.");
	return false;
#if 0 // Disabling chunking for release since it seems not to work. Never manages to mine a block. TODO: Fix when time is found.
	int errCode = err.err();
	// NOTE: as written this condition is always true (the two codes differ); '&&' was probably intended.
	if (errCode != CL_INVALID_BUFFER_SIZE || errCode != CL_MEM_OBJECT_ALLOCATION_FAILURE)
		ETHCL_LOG("Allocating/mapping single buffer failed with: " << err.what() << "(" << errCode << ")");
	cl_ulong result;
	// if we fail midway on the try above make sure we start clean
	m_dagChunks.clear();
	device.getInfo(CL_DEVICE_MAX_MEM_ALLOC_SIZE, &result);
	ETHCL_LOG(
		"Failed to allocate 1 big chunk. Max allocateable memory is "
		<< result << ". Trying to allocate 4 chunks."
	);
	// The OpenCL kernel has a hard coded number of 4 chunks at the moment
	m_dagChunksCount = 4;
	for (unsigned i = 0; i < m_dagChunksCount; i++)
	{
		// TODO Note: If we ever change to _dagChunksNum other than 4, then the size would need recalculation
		ETHCL_LOG("Creating buffer for chunk " << i);
		m_dagChunks.push_back(cl::Buffer(
			m_context,
			CL_MEM_READ_ONLY,
			(i == 3) ? (_dagSize - 3 * ((_dagSize >> 9) << 7)) : (_dagSize >> 9) << 7
		));
	}
	ETHCL_LOG("Loading chunk kernels");
	m_hashKernel = cl::Kernel(program, "ethash_hash_chunks");
	m_searchKernel = cl::Kernel(program, "ethash_search_chunks");
	// TODO Note: If we ever change to _dagChunksNum other than 4, then the size would need recalculation
	void* dag_ptr[4];
	for (unsigned i = 0; i < m_dagChunksCount; i++)
	{
		ETHCL_LOG("Mapping chunk " << i);
		dag_ptr[i] = m_queue.enqueueMapBuffer(m_dagChunks[i], true, m_openclOnePointOne ? CL_MAP_WRITE : CL_MAP_WRITE_INVALIDATE_REGION, 0, (i == 3) ? (_dagSize - 3 * ((_dagSize >> 9) << 7)) : (_dagSize >> 9) << 7);
	}
	for (unsigned i = 0; i < m_dagChunksCount; i++)
	{
		memcpy(dag_ptr[i], (char *)_dag + i*((_dagSize >> 9) << 7), (i == 3) ? (_dagSize - 3 * ((_dagSize >> 9) << 7)) : (_dagSize >> 9) << 7);
		m_queue.enqueueUnmapMemObject(m_dagChunks[i], dag_ptr[i]);
	}
#endif
}
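
The size expression in the disabled code is easier to follow once unpacked. `(_dagSize >> 9) << 7` is a quarter of the DAG size rounded down to a multiple of 128 bytes, and the fourth chunk takes whatever remains. The sketch below reproduces that arithmetic and adds a lookup from a byte offset to a chunk; the names `ChunkLayout`, `makeLayout` and `locate` are invented for illustration and do not exist in ethminer.

```cpp
#include <cassert>
#include <cstdint>

struct ChunkLayout {
    std::uint64_t chunkSize;  // size in bytes of chunks 0..2
    std::uint64_t lastSize;   // size in bytes of chunk 3 (includes the remainder)
};

// Same arithmetic as the quoted code: floor(dagSize / 512) * 128 per chunk.
ChunkLayout makeLayout(std::uint64_t dagSize)
{
    const std::uint64_t chunk = (dagSize >> 9) << 7;
    return { chunk, dagSize - 3 * chunk };
}

struct ChunkPos {
    unsigned chunk;           // which of the four buffers
    std::uint64_t offset;     // byte offset inside that buffer
};

// Maps a byte offset in the flat DAG to (chunk, offset within chunk).
ChunkPos locate(const ChunkLayout& l, std::uint64_t byteOffset)
{
    assert(l.chunkSize > 0);
    std::uint64_t c = byteOffset / l.chunkSize;
    if (c > 3) c = 3;         // bytes past 4 * chunkSize live in the last chunk
    return { static_cast<unsigned>(c), byteOffset - c * l.chunkSize };
}
```

A chunked kernel needs an equivalent of `locate` on every DAG read, which is one place where a chunked implementation can go wrong or lose speed.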


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Eliovp on December 01, 2015, 10:18:30 PM
Hey,


As you already noticed, it is indeed correct: a bigger DAG file decreases speed drastically.

I've done some tests too.

First results: 390X

Code:
DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128 261.88 34.3251
256 253.74 33.2583
384 261.27 34.2452
512 263.394 34.5236
640 256.832 33.6635
768 262.678 34.4298
896 253.595 33.2392
1,024 262.77 34.4418
1,152 260.289 34.1166
1,280 262.375 34.3901
1,408 262.426 34.3968
1,536 261.992 34.3398
1,664 263.126 34.4884
1,792 262.872 34.4552
1,920 262.028 34.3446
2,048 262.432 34.3975
2,176 246.609 32.3235
2,304 236.27 30.9684
2,432 222.884 29.2138
2,560 206.16 27.0218
2,688 192.144 25.1847
2,816 180.781 23.6953
2,944 170.977 22.4103
3,072 162.634 21.3168
3,200 155.515 20.3837
3,328 149.158 19.5505
3,456 143.477 18.8058
3,584 138.379 18.1376
3,712 133.821 17.5402
3,840 129.724 17.0031
3,968 126.137 16.533

Second result: Fury X (stopped @ 2816MB)

Code:
DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128 254.497 33.3574
256 251.412 32.953
384 250.2 32.7942
512 249.919 32.7574
640 249.457 32.6969
768 249.345 32.6821
896 249.108 32.6511
1,024 248.899 32.6237
1,152 248.822 32.6136
1,280 248.888 32.6223
1,408 248.679 32.5948
1,536 248.822 32.6137
1,664 248.653 32.5914
1,792 248.686 32.5957
1,920 248.547 32.5775
2,048 248.564 32.5798
2,176 244.005 31.9823
2,304 217.763 28.5426
2,432 165.121 21.6427
2,560 135.737 17.7913
2,688 118.606 15.5459
2,816 106.813 14.0002


"If needed, it's perfectly possible to run some more tests. I have other cards as well..."


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on December 02, 2015, 12:12:36 PM
Quote from: virasog
With the 640MB chunk size above, the hash rate alternates between about 20 and 40 MH/s, and then the gap narrows from about 20 MH/s to a much lower value. Does this indicate an optimisation opportunity?

Can anybody make chunking work in ethminer?

The latest ethminer does not display the hash rate, which makes it difficult to compare results. I wonder if this could be added as well.

It may be an opportunity for an optimization. The chunked implementation in the current ethminer is disabled because it doesn't work. I'll see if I can find some time to check whether this could work in ethminer.


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Dofnatues on December 02, 2015, 05:54:48 PM
Quote from: virasog
With the 640MB chunk size above, the hash rate alternates between about 20 and 40 MH/s, and then the gap narrows from about 20 MH/s to a much lower value. Does this indicate an optimisation opportunity?

Can anybody make chunking work in ethminer?

The latest ethminer does not display the hash rate, which makes it difficult to compare results. I wonder if this could be added as well.

Quote from: Genoil
It may be an opportunity for an optimization. The chunked implementation in the current ethminer is disabled because it doesn't work. I'll see if I can find some time to check whether this could work in ethminer.

If you can make it work, you will save a lot of AMD cards from becoming useless in a month or two.

By the way, why does the latest ethminer (1.1.0) not show the hash rate?


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Grim on December 10, 2015, 08:16:43 AM
Besides all that TLB thrashing, how come the 280X has more bandwidth (~300 GB/s) than

the 390X, which only has 262 GB/s

the Fury X, which only has 249 GB/s

 ???


(Also, besides bandwidth, the GPU memory timings seem to be a major factor.)

PS: maybe the 280X has optimized timings from The Stilt (BIOS update)?


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: MaxDZ8 on December 13, 2015, 09:08:23 AM
It's a possibility. I am positive that the ratio of math operations to memory accesses has a major impact on GCN; AMD's OpenCL driver is super fast but also very stupid.


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Fasdurcas on December 13, 2015, 07:51:42 PM
We have about 60 days left during which most AMD cards will work without an update of ethminer. Who is responsible for updating the software?


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Masked_Immortal on December 14, 2015, 08:32:16 AM
Is this issue just related to bandwidth? The GTX 970 has less bandwidth than the 280X.
And what about Maxwell GPUs?


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on December 15, 2015, 01:13:26 PM
Quote from: Fasdurcas
We have about 60 days left during which most AMD cards will work without an update of ethminer. Who is responsible for updating the software?

I filed this as a potential threat in the Ethereum bug bounty program but haven't heard anything from their end. Keep in mind I'm not 100% certain about this bug. It was an issue with my dagSimCL test program that may apply to ethminer as well. Unfortunately I don't have any time at the moment to look into this further. If it really is an issue, rest assured the private kernel gang has already jumped on it and resolved it, possibly using the approach that's publicly available (https://github.com/Genoil/dagSimCL/commit/cd900ffd83559a3764abfe2fbc6aa5d509c7a448) in the dagSimCL repo. The owners of such modded kernels should be in for some serious profit...

Quote from: Masked_Immortal
Is this issue just related to bandwidth? The GTX 970 has less bandwidth than the 280X.
And what about Maxwell GPUs?

Maxwell cards with Compute 5.2 (GTX 9xx) only start suffering badly from TLB thrashing with allocations above 2GB, so they are fine until the switch to PoS. Maxwell cards with Compute 5.0 (GTX 750) have already bitten the dust and are useless for ETH mining.

Note that TLB thrashing and the AMD max allocation problem are two separate issues.
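
A simple model shows why a size threshold appears at all. If a TLB can map `reach` bytes in total (number of entries times page size) and accesses are spread uniformly at random over a buffer, the chance that the next access hits an already cached translation is about reach / buffer size once the buffer exceeds the reach. The sketch below encodes that; the function name is hypothetical, and real GPUs have multi-level TLBs whose sizes are not stated in this thread, so no actual hardware figures are implied.

```cpp
#include <cassert>

// Approximate TLB hit rate for uniformly random accesses over a buffer.
// reachMB  = memory the TLB can map at once (entries * page size), assumed known
// bufferMB = size of the randomly accessed buffer
double tlbHitRate(double reachMB, double bufferMB)
{
    assert(reachMB > 0.0 && bufferMB > 0.0);
    return bufferMB <= reachMB ? 1.0 : reachMB / bufferMB;
}
```

With a hypothetical reach of 1024MB, a 2048MB buffer gives a hit rate of 0.5, so half of all random accesses would pay for a page-table walk.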


Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: vatusasid on December 15, 2015, 02:30:22 PM
The developers of Ethereum were paid 13 million Ether. How come they could not solve this problem? ethminer 1.0.1 is still not stable.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Omegasun on January 08, 2016, 09:11:37 AM
Is there any news about the development of ethminer so that it can cope with the larger DAG size? We are approaching 1280MB.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Bagdar13 on January 12, 2016, 08:54:00 PM
Is there any news about the development of the etherminer so that it can cope with the larger DAG size. We are approaching 1280MB.

I have heard nothing.  I also tested this and found a problem at ~1280 as well.

I started looking on forums as I noticed a substantial drop-off on the 7970 cards at ~1.2 GB.

At this point my 280Xs are down from 27 to about 24
and
At this point my 7970s are down from 22 to about 17 each (this was what surprised me, and this problem is present on XFX, PowerColor and one other)

This drop in performance seems to be larger than the expected drop.

Oddly enough my 7870s seem to have suffered little if any performance hit and are still happy doing 15, same as at launch.

Edit: my 7870s are the GHz Edition; however, I also have a Sapphire and a 7870 MIST (which is really a broken 7950), and both of these are also unaffected.

Food for thought.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Venon on January 13, 2016, 08:32:35 PM
Is there any news about the development of the etherminer so that it can cope with the larger DAG size. We are approaching 1280MB.

I have heard nothing.  I also tested this and found a problem at ~1280 as well.

I started looking on forums as I noticed a substancially drop off on the 7970 cards at ~1.2 GB.  

At this point my 280xs are down from 27 to about 24
and
At this point my 7970s are down from 22 to about 17 each (this was what supprised me and this problem is present on XFX, powercolor and one other)

This drop in performance seems to be larger than the expected drop

Oddly enough my 7870s seem to have suffered little if any performance hit and are still happy doing 15 same as at launch.

Edit my 7870s are the ghz edition, however, i also have a sapphire and a 7870 MIST (which is really a broken 7950) both of these are also unaffected.

Food for thought.

Do you see the drop during the test or during actual mining?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: sp_ on January 13, 2016, 09:05:42 PM
Let the dagger grow. The ether algo will be perfect for the botnets.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Bagdar13 on January 15, 2016, 04:24:49 AM
Is there any news about the development of the etherminer so that it can cope with the larger DAG size. We are approaching 1280MB.

I have heard nothing.  I also tested this and found a problem at ~1280 as well.

I started looking on forums as I noticed a substancially drop off on the 7970 cards at ~1.2 GB.  

At this point my 280xs are down from 27 to about 24
and
At this point my 7970s are down from 22 to about 17 each (this was what supprised me and this problem is present on XFX, powercolor and one other)

This drop in performance seems to be larger than the expected drop

Oddly enough my 7870s seem to have suffered little if any performance hit and are still happy doing 15 same as at launch.

Edit my 7870s are the ghz edition, however, i also have a sapphire and a 7870 MIST (which is really a broken 7950) both of these are also unaffected.

Food for thought.

Do you find the drop during the test or the actual mining?

I am now seeing the drop in actual mining on this hardware with the DAG update at block 840000; the point is that my drop in hashrate seems to be larger than the DAG size increase would predict.
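
For anyone who wants to check how large the DAG is at a given block, here is a minimal Python sketch of the size calculation. The constants and the prime adjustment follow the public Ethash specification; the function names are my own, and the sketch says nothing about how hashrate responds to the size.

```python
# Constants as given in the Ethash specification.
DATASET_BYTES_INIT = 2**30     # 1 GiB at epoch 0
DATASET_BYTES_GROWTH = 2**23   # the DAG grows by 8 MiB per epoch
EPOCH_LENGTH = 30000           # blocks per epoch
MIX_BYTES = 128


def is_prime(n):
    """Trial division; fast enough for the sizes involved here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def dag_linear_bytes(block_number):
    """Upper bound on the DAG size: linear growth per epoch."""
    epoch = block_number // EPOCH_LENGTH
    return DATASET_BYTES_INIT + DATASET_BYTES_GROWTH * epoch


def dag_full_bytes(block_number):
    """Actual DAG size: largest prime multiple of MIX_BYTES below the bound."""
    size = dag_linear_bytes(block_number) - MIX_BYTES
    while not is_prime(size // MIX_BYTES):
        size -= 2 * MIX_BYTES
    return size


if __name__ == "__main__":
    for block in (0, 840000, 960000):
        bound_mib = dag_linear_bytes(block) // 2**20
        print(block, "bound:", bound_mib, "MiB, actual:", dag_full_bytes(block), "bytes")
```

By this calculation block 840000 falls in epoch 28, so the bound is 1024 + 28 * 8 = 1248 MiB, and the 1280 MiB bound is reached at block 960000.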


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Dofnatues on January 15, 2016, 06:15:27 PM
Because of the drop in hash rate, I decided to reduce the core clock frequency and keep the memory frequency the same. Is that a good idea?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on January 18, 2016, 04:09:29 PM
I just finished implementing the chunk allocation into my fork of ethminer.

https://github.com/Genoil/cpp-ethereum/tree/opencl-chunks

By allocating DAG memory in chunks (--cl-chunks <chunkSizeInMB>), issues with RAM allocation may be averted. A nice side effect of this may be (significantly) higher hashrates. Based on what I've seen from people using dagSimCL, --cl-chunks 640 yields quite good results. However, the optimal chunk size may well depend on the DAG size.

I wrote this change without access to AMD hardware, so your mileage may vary. Don't bother trying this on CUDA devices; using chunks there only has a negative impact on hashrate.
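
To illustrate what chunked allocation means in terms of bookkeeping, here is a minimal Python sketch. It is not the code from the opencl-chunks branch: plan_chunks and locate are hypothetical helper names, and the only assumption is that the DAG is split into equal fixed-size buffers with a smaller final one.

```python
MIB = 1024 * 1024


def plan_chunks(dag_bytes, chunk_mib):
    """Split dag_bytes into buffers of at most chunk_mib MiB each."""
    if chunk_mib <= 0:
        raise ValueError("chunk size must be positive")
    chunk_bytes = chunk_mib * MIB
    sizes = []
    remaining = dag_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def locate(byte_offset, chunk_mib):
    """Return (chunk index, offset inside that chunk) for a DAG byte offset."""
    return divmod(byte_offset, chunk_mib * MIB)


if __name__ == "__main__":
    dag = 1280 * MIB
    print([s // MIB for s in plan_chunks(dag, 640)])    # sizes in MiB
    print([s // MIB for s in plan_chunks(dag, 1300)])   # chunk larger than DAG
    print(locate(700 * MIB, 640))
```

In this sketch a chunk size at or above the DAG size collapses the plan to a single buffer, which is the same layout as an unchunked allocation.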

  


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: RustyNoman on January 18, 2016, 05:54:13 PM
I just finished implementing the chunk allocation into my fork of ethminer.

https://github.com/Genoil/cpp-ethereum/tree/opencl-chunks

By allocating DAG memory in chunks (--cl-chunks <chunkSizeInMB>), issues with RAM allocation may be averted. A nice side effect of this may be (significantly) higher hashrates. Based on what I've seen from people using dagSimCL, --cl-chunks 640 yields quite good results. It may be however that there is a correlation between optimal setting of chunk size vs dag size.

I wrote this change without access to AMD hardware, so your mileage may vary. Don't bother trying this on CUDA devices, using chunks there only has a negative impact on hashrate.

  

Do you have instructions for building the program? Do you have an exe version so that we can try it?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on January 18, 2016, 06:02:29 PM
The binary is on the ETH forum in the mining section.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Justicemaxx on January 19, 2016, 07:42:35 PM
Binary is on the eth forum in mining section
I tried different chunk settings (640, 660); these figures reduce the hash rate to about a third on R9 280X and R9 290. 6x 280X give about 50 MH/s. Setting 1300 or more, on the other hand, does not affect the speed: it returns to normal, about 150 MH/s. Maybe I am doing something wrong? Before it starts hashing, the miner writes that it can't create the 2 block DAG file because it is blocked by the GPU.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: mandica on January 19, 2016, 09:04:29 PM
Binary is on the eth forum in mining section
I tried with different settings of chunks, 640, 660, these figures reduce hash rate about 3 times on R280x, R290. 6x280x give about 50 MGh. At the same time setting 1300 or more does not affect the speed, the speed becomes normal, about 150 MGh, and chunks 1300 give 150 MGh. Maybe I do something wrong? ....Before starting hash, miner writes that he can't create 2 block DAG file because it is blocked GPU.

Did your miner submit valid shares?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Justicemaxx on January 20, 2016, 01:12:28 PM
I tried solo mining: the miner showed solutions being found, but the node seems to ignore them, that is, the shares from the miner with chunks.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Assanger on January 20, 2016, 04:37:58 PM
I tried solo mining: the miner showed solutions being found, but the node seems to ignore them, that is, the shares from the miner with chunks.

Does it mean the etherminer is mining, but the shares are not recognized?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Justicemaxx on January 20, 2016, 04:47:56 PM
Yes


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on January 25, 2016, 09:00:14 AM
I tried solo mining: the miner showed solutions being found, but the node seems to ignore them, that is, the shares from the miner with chunks.

LOL true. I'm sorry man, I just knocked this out blindly without access to an actual AMD card. For now, some further testing by others has indicated there is presently no need to worry about allocation problems in the near future. I will have to verify for myself to be absolutely sure though.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Marvell1 on January 25, 2016, 10:01:22 PM
I have extracted a solo miner showed resolve, but the server node seems to be ignored, that is, the balls from the miner with chunks.

LOL true. I'm sorry man, just knocked this out blindly without access to an actual AMD card. For now, some further testing by others have indicated there presently no need to worry about allocation problems in the near future. I wil have to verify for myself to be absolutely sure though.

I could send you one of my 7950s if you want to pay for shipping. I have a bunch lying around because I have no motherboards to host them in.

This DAG problem is getting huge: my 900 MH/s farm is down to about 700 MH/s.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: mandica on January 26, 2016, 08:30:57 PM
I have extracted a solo miner showed resolve, but the server node seems to be ignored, that is, the balls from the miner with chunks.

LOL true. I'm sorry man, just knocked this out blindly without access to an actual AMD card. For now, some further testing by others have indicated there presently no need to worry about allocation problems in the near future. I wil have to verify for myself to be absolutely sure though.

I'd could send you one of my 7950s if you want to pay for shipping i have a bunch laying around due to no motherboards to host them in.

This dag problem is getting huge for my my 900mh/s farm is down to like 700mh/s

The DAG problem is not really a problem, as it affects all graphics cards. But I heard that it affects the R9 380 less.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Akarabzie on January 26, 2016, 10:00:18 PM
I have extracted a solo miner showed resolve, but the server node seems to be ignored, that is, the balls from the miner with chunks.

LOL true. I'm sorry man, just knocked this out blindly without access to an actual AMD card. For now, some further testing by others have indicated there presently no need to worry about allocation problems in the near future. I wil have to verify for myself to be absolutely sure though.

I'd could send you one of my 7950s if you want to pay for shipping i have a bunch laying around due to no motherboards to host them in.

This dag problem is getting huge for my my 900mh/s farm is down to like 700mh/s

The Dag problem is not a problem as it affect all the graphics cards. But I heard that it affects R9 380 less.

I keep hearing this as well, but I don't think I've seen enough data to be sure about this yet, or about the reason why the 380s aren't affected. Is it the difference in memory types or what? Also, what kind of difference, if any, does the trashing make on the 380 vs the 380X?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: RustyNoman on January 27, 2016, 08:58:23 AM
I have extracted a solo miner showed resolve, but the server node seems to be ignored, that is, the balls from the miner with chunks.

LOL true. I'm sorry man, just knocked this out blindly without access to an actual AMD card. For now, some further testing by others have indicated there presently no need to worry about allocation problems in the near future. I wil have to verify for myself to be absolutely sure though.

I'd could send you one of my 7950s if you want to pay for shipping i have a bunch laying around due to no motherboards to host them in.

This dag problem is getting huge for my my 900mh/s farm is down to like 700mh/s

The Dag problem is not a problem as it affect all the graphics cards. But I heard that it affects R9 380 less.

I keep hearing this as well, but i don't think I've seen enough data to be sure about this yet, or the reason why the 380s aren't affected. Is it the difference in memory types or what? Also what kind of difference if any does the trashing have on the 380 vs 380X?

Yes. We need more data to assess the situation. I am also interested in knowing the performance of 380 vs 380x.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Marvell1 on January 27, 2016, 05:45:52 PM
I have extracted a solo miner showed resolve, but the server node seems to be ignored, that is, the balls from the miner with chunks.

LOL true. I'm sorry man, just knocked this out blindly without access to an actual AMD card. For now, some further testing by others have indicated there presently no need to worry about allocation problems in the near future. I wil have to verify for myself to be absolutely sure though.

I'd could send you one of my 7950s if you want to pay for shipping i have a bunch laying around due to no motherboards to host them in.

This dag problem is getting huge for my my 900mh/s farm is down to like 700mh/s

The Dag problem is not a problem as it affect all the graphics cards. But I heard that it affects R9 380 less.

I keep hearing this as well, but i don't think I've seen enough data to be sure about this yet, or the reason why the 380s aren't affected. Is it the difference in memory types or what? Also what kind of difference if any does the trashing have on the 380 vs 380X?

Yes. We need more data to assess the situation. I am also interested in knowing the performance of 380 vs 380x.

I have both the 380 and 380X 4G cards and the hash rate is pretty underwhelming: 18 MH/s vs 19.5 MH/s max, it seems. They are both pretty power hungry too, around 240 watts, maybe 250 for the X.

A 7950 gets close to 23 MH/s for around the same power. One thing I do notice is that the hash rate on the 380 and 380X has remained constant regardless of DAG size, vs the drop in hash rate of the 7950s to around 21-22 MH/s. Not sure what to make of all this.

I think the best bet right now is to get 390s and mix and match them with 380s, so at least you get better resale value on your GPUs vs the older cards, unless you can get them really cheap.

The problem with the 390 and 390X is that they run crazy hot and consume close to 300 watts of power; that's even worse with a 290X.

I'm trying out various brands of 380X cards this week, but from my estimation it's not worth paying anything more for the 380X, at least for mining, since it hashes only 5% higher than the 380 and uses more power. Basically a worthless card.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: RustyNoman on January 28, 2016, 11:06:29 AM
I have extracted a solo miner showed resolve, but the server node seems to be ignored, that is, the balls from the miner with chunks.

LOL true. I'm sorry man, just knocked this out blindly without access to an actual AMD card. For now, some further testing by others have indicated there presently no need to worry about allocation problems in the near future. I wil have to verify for myself to be absolutely sure though.

I'd could send you one of my 7950s if you want to pay for shipping i have a bunch laying around due to no motherboards to host them in.

This dag problem is getting huge for my my 900mh/s farm is down to like 700mh/s

The Dag problem is not a problem as it affect all the graphics cards. But I heard that it affects R9 380 less.

I keep hearing this as well, but i don't think I've seen enough data to be sure about this yet, or the reason why the 380s aren't affected. Is it the difference in memory types or what? Also what kind of difference if any does the trashing have on the 380 vs 380X?

Yes. We need more data to assess the situation. I am also interested in knowing the performance of 380 vs 380x.

I have both the 380 and 380x 4G cards and the hash rate is pretty underwhelming 18mh/s vs 19.5 mh/s max it seems.  They are both pretty power hungry too around 240 watts maybe 250 for the x.

a 7950 gets close to 23 mhs/s for around the same power.  One thing i do notice is the hash rate on the 380 and 380x has remained constant regardless of DAG size vs the drop in hash rate of the 7950s to around 22-21 mh/s   not sure to make of all of this .

I think the best bet right now is to get 390s and mix and match them with 380 so at least you get better relsae value on your GPU's vs the older cards unles you can get them really cheap.

the problem with the 390 and 390x is the run crazy hot and consume close to 300 wats of power , thats even worse with a 290x

I'm trying out various brands of 380x cards this week but form my estimation its not worth it to pay anthing more for the 380x at least for mining since it hashes only 5% higer than the 380 and uses more power basically a worthless card.   


The 380X has 2048 cores while the 380 has 1792. The core count is 14% higher, but the hash rate is just 5% higher, with higher power consumption. So it is not worth it.
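
A quick way to check these percentages is the small Python sketch below. The core counts are the ones given in this post and the hash rates are the 18 and 19.5 MH/s figures quoted above; percent_more is just an illustrative helper name.

```python
def percent_more(new, old):
    """How many percent larger new is than old."""
    return (new / old - 1.0) * 100.0


cores = percent_more(2048, 1792)       # 380X vs 380 core count
hashrate = percent_more(19.5, 18.0)    # MH/s figures quoted in this thread

if __name__ == "__main__":
    print(f"cores: +{cores:.1f}%  hashrate: +{hashrate:.1f}%")
```

Taking the quoted 18 and 19.5 MH/s at face value, the hashrate gain works out nearer 8% than 5%, which is still well below the 14% gain in core count.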


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on February 25, 2016, 08:33:27 AM
So we are currently at 1280MB for the DAG file size and most people are still mining. Was the bug fixed?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on February 25, 2016, 08:53:33 AM
So we are currently at 1280MB for the DAG file size and most people are still mining. Was the bug fixed?

It turned out the big bug wasn't really there. My (false) assumptions were based on reports by testers of dagSimCL who apparently didn't know how to tune their AMD cards correctly.

The impact of DAG size on hashrate is a fact though. While on Nvidia it has the most dramatic effects in certain circumstances, the impact on AMD cards has been growing steadily now to such a level that the 280X is now dethroned as most cost-effective card to mine on, losing its position to GTX970 on Win7/Linux. 


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on February 25, 2016, 09:11:11 AM
So we are currently at 1280MB for the DAG file size and most people are still mining. Was the bug fixed?

It turned out the big bug wasn't really there. My (false) assumptions were based on reports by testers of dagSimCL who apparently didn't know how to tune their AMD cards correctly.

The impact of DAG size on hashrate is a fact though. While on Nvidia it has the most dramatic effects in certain circumstances, the impact on AMD cards has been growing steadily now to such a level that the 280X is now dethroned as most cost-effective card to mine on, losing its position to GTX970 on Win7/Linux. 

I noticed the decrease in speed also.


The 970 however seems to be at least double in price compared to the 280X.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on February 25, 2016, 09:16:23 AM
So we are currently at 1280MB for the DAG file size and most people are still mining. Was the bug fixed?

It turned out the big bug wasn't really there. My (false) assumptions were based on reports by testers of dagSimCL who apparently didn't know how to tune their AMD cards correctly.

The impact of DAG size on hashrate is a fact though. While on Nvidia it has the most dramatic effects in certain circumstances, the impact on AMD cards has been growing steadily now to such a level that the 280X is now dethroned as most cost-effective card to mine on, losing its position to GTX970 on Win7/Linux. 

I noticed the decrease in speed also.


The 970 however seems to be at least double in price compared to the 280X.

Yes it only counts when you have already ROI'd on the cards mining other coins :)


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: sp_ on February 25, 2016, 09:27:14 AM
You can get the GTX 970 to 21 MHASH by putting it in P1 mode (nvidia-smi tool).

The best card for mining Ethereum is the R9 Nano. It does 28 MHASH.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Realetim on February 25, 2016, 09:51:42 AM
You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).

The best card for mining etherum is the r9 Nano. It does 28MHASH.


Does the R9 nano use more electricity? Which is more efficient in terms of hash per watt? Nano or 970?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: sp_ on February 25, 2016, 10:07:50 AM
You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).
The best card for mining etherum is the r9 Nano. It does 28MHASH.
Does the R9 nano use more electricity? Which is more efficient in terms of hash per watt? Nano or 970?

The Nano uses less electricity, but costs more.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: apriyoni on February 25, 2016, 12:55:14 PM
You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).
The best card for mining etherum is the r9 Nano. It does 28MHASH.
Does the R9 nano use more electricity? Which is more efficient in terms of hash per watt? Nano or 970?

The NANO use less electricity, but cost more.

The R9 Nano costs £388 while the 970 costs £250. So there is a difference of £138, or about $200. That is quite a lot.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: sp_ on February 25, 2016, 01:53:10 PM
You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).
The best card for mining etherum is the r9 Nano. It does 28MHASH.
Does the R9 nano use more electricity? Which is more efficient in terms of hash per watt? Nano or 970?
The NANO use less electricity, but cost more.
The R9 nano costs £388 while the 970 costs £250. So there is £138 or $200 difference. that is quite a lot.

33% faster and 55% more expensive, but it draws less power.
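
The comparison can be reproduced with a few lines of Python, using only the figures quoted in this thread (28 vs 21 MH/s, £388 vs £250). The dictionary layout and names are just for illustration, and power cost is left out because no wattage figures were given.

```python
# Figures as quoted in this thread; nothing here is measured by the sketch.
nano = {"mhs": 28.0, "gbp": 388.0}
gtx970 = {"mhs": 21.0, "gbp": 250.0}

speed_gain = nano["mhs"] / gtx970["mhs"] - 1.0    # relative hashrate advantage
price_gain = nano["gbp"] / gtx970["gbp"] - 1.0    # relative price premium


def gbp_per_mhs(card):
    """Purchase price per MH/s of hashrate."""
    return card["gbp"] / card["mhs"]


if __name__ == "__main__":
    print(f"speed: +{speed_gain:.0%}  price: +{price_gain:.0%}")
    print(f"Nano: {gbp_per_mhs(nano):.2f} GBP per MH/s")
    print(f"970:  {gbp_per_mhs(gtx970):.2f} GBP per MH/s")
```

On purchase price alone the 970 comes out cheaper per MH/s; whether the Nano's lower power draw makes up for that depends on electricity cost and how long the card is run.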


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: rednoW on February 25, 2016, 02:07:19 PM
You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).

The best card for mining etherum is the r9 Nano. It does 28MHASH.

Nope, the best card for ETH is the 390X now; the Fury is good for Decred.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on February 26, 2016, 07:19:10 AM
You guys are all wrong: the best card to mine with is probably the 7950/7970, since it can be bought second-hand dirt cheap. And it gets 20 MH/s.

Buying the Nano or Fury? What are the chances that ETH will still be profitable the day you get ROI ?



Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Satlite on February 26, 2016, 08:34:50 AM
You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).
The best card for mining etherum is the r9 Nano. It does 28MHASH.
Does the R9 nano use more electricity? Which is more efficient in terms of hash per watt? Nano or 970?
The NANO use less electricity, but cost more.
The R9 nano costs £388 while the 970 costs £250. So there is £138 or $200 difference. that is quite a lot.

33% faster and 55% more expensive, but it draws less power..

In percentage terms, it could be a good deal if you can squeeze in 6 GPUs and reduce the overhead of the system.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: RustyNoman on March 10, 2016, 10:27:47 AM
I usually use 8 GPUs in a system: 4x7990 + 4x other GPUs. AMD allows up to 8 GPUs in a Windows system, so I use 8.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Akarabzie on March 10, 2016, 02:53:49 PM
I usually use 8 GPU in a system. 4x7990 + 4x other GPUs. AMD allow up to 8 GPU in the Windows sytem. So I use 8.

Most people don't like using the 7990s because they are pretty finicky and a pain to keep cool. I haven't had too many problems with mine after a pretty big underclock. I had one GPU go out on me while the other worked, and I had some problems with another one constantly crashing my system. I'd rather just run (5) 280Xs with no system downtime. Hey, if you actually got 4x7990s running with no issues, more power to you. Your rig is like what, 2100 watts?


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: asrilani on March 10, 2016, 04:03:09 PM
I usually use 8 GPU in a system. 4x7990 + 4x other GPUs. AMD allow up to 8 GPU in the Windows sytem. So I use 8.

Most people don't like using the 7990s becuase they are pretty finicky and a pain to keep cool. I haven't had too much problem with mine after a pretty big underclock. I had one GPU go out on me while the other worked, and I had some problems with another one constantly crashing my system. I'd rather just run (5) 280Xs with no system downtime. Hey if you actually got 4x7990s running with no issues, more power to you. Your rig is like what 2100 watts?

I have 4x7990+4x7970. I undervolt and underclock them a lot: 950 mV, 850/1500 MHz. The power draw is about 1330 W and the hash rate is 156 MH/s.
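
To put those numbers in terms of efficiency, here is a small Python sketch. It assumes the 1330 figure is watts for the whole rig and compares it with the 2100 watt guess from the question above; mhs_per_watt is an illustrative helper name.

```python
def mhs_per_watt(hashrate_mhs, power_watts):
    """Hashrate delivered per watt of power draw."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return hashrate_mhs / power_watts


tuned = mhs_per_watt(156.0, 1330.0)     # undervolted rig, figures from this post
guessed = mhs_per_watt(156.0, 2100.0)   # same hashrate at the guessed 2100 W

if __name__ == "__main__":
    print(f"tuned:   {tuned:.3f} MH/s per W")
    print(f"guessed: {guessed:.3f} MH/s per W")
```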


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: vatusasid on March 28, 2016, 07:58:09 AM
I usually use 8 GPU in a system. 4x7990 + 4x other GPUs. AMD allow up to 8 GPU in the Windows sytem. So I use 8.

Most people don't like using the 7990s becuase they are pretty finicky and a pain to keep cool. I haven't had too much problem with mine after a pretty big underclock. I had one GPU go out on me while the other worked, and I had some problems with another one constantly crashing my system. I'd rather just run (5) 280Xs with no system downtime. Hey if you actually got 4x7990s running with no issues, more power to you. Your rig is like what 2100 watts?

I have 4x7990+4x7970. I undervolt and underclock them a lot. 950mv, 850/1500 MHz, the power is about 1330 and hah rate = 156 MH/s.

I have similar configurations. The hash rate is just 149 MH/s. So the DAG file size has reduced the hash rate by 3%.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Akarabzie on March 29, 2016, 08:35:13 PM
I usually use 8 GPU in a system. 4x7990 + 4x other GPUs. AMD allow up to 8 GPU in the Windows sytem. So I use 8.

Most people don't like using the 7990s becuase they are pretty finicky and a pain to keep cool. I haven't had too much problem with mine after a pretty big underclock. I had one GPU go out on me while the other worked, and I had some problems with another one constantly crashing my system. I'd rather just run (5) 280Xs with no system downtime. Hey if you actually got 4x7990s running with no issues, more power to you. Your rig is like what 2100 watts?

I have 4x7990+4x7970. I undervolt and underclock them a lot. 950mv, 850/1500 MHz, the power is about 1330 and hah rate = 156 MH/s.

I have similar configurations. The hash rate is just 149 MH/s. So the DAG file size has reduced the hash rate by 3%.

What are you guys using to undervolt your 7990s? I just looked at mine and realized they weren't actually changing from stock speeds. This may help me keep them running 24/7, since I still get the occasional crash from my 7990.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Venon on April 01, 2016, 12:58:36 PM
I undervolt my 7990 to 950 mV, and the frequency is from 820 to 880 MHz, depending on the cards.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on September 28, 2016, 11:35:40 AM
Bumping this thread...


Wondering if the RX470/RX480 will be affected by the sudden drop in hashpower when the DAG file goes to 2050MB.

Is that Dag Simulator accurate? Seems that all cards would suffer at >2GB and not just the Tahiti based cards.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: nerdralph on September 28, 2016, 11:42:56 AM
Bumping this thread...


Wondering if the RX470/RX480 will be affected by the sudden drop in hashpower when the DAG files goes to 2050MB.

Is that Dag Simulator accurate? Seems that all cards would suffer at >2GB and not just the Tahiti based cards.

AMD GCN does not have a TLB to trash.  See pg. 10.
https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: ahsbqt on November 30, 2018, 07:45:45 PM
Old thread, but R9 390s are doing very badly these days, 26 MH/s, thanks to the TLB bug.


Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on December 01, 2018, 08:57:00 AM
Old thread, but R9 390 are are doing very bad these days 26mhs thanks to tlb bug.

Yes, with the R9 290 it's even worse. I think I got 29 MH/s with the stock clock settings (947/1250) with the Stilt BIOS. Now it gets less than 25 MH/s, and despite the speed decrease the power consumption more or less remains the same, hence it's no longer profitable to mine with those GPUs.

Surprisingly they still hold a decent value for gamers and are selling on eBay for fair prices. Will most likely be putting mine up for auction soon. Highly doubt AMD will release a fix for the Hawaii chipsets.