
Title: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 28, 2015, 10:24:19 AM

During the development of the CUDA miner for Ethereum, I ran into an issue where the hashrate on the GTX750Ti drops dramatically once the memory buffer the miner operates on exceeds a certain size (1GB on Win7/Linux, 512MB on Win8/10). After a long discussion on the CUDA forums, one of the designers of CUDA weighed in and identified the issue as TLB thrashing: the random accesses touch more memory pages than the TLB can hold translations for, so most accesses miss. I'm currently conducting a bit of research on the subject and have created a simple test program that measures these effects. It simulates the 'dagger' part of the Ethereum algorithm at different memory buffer (DAG) sizes and writes the results to a CSV file. So far, I have concluded that it is not an Nvidia-only issue, but manifests on AMD hardware as well. And apparently it is not an ETH-only issue either: I've received similar reports from scrypt-jane miners.
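To make the threshold effect concrete, here is a minimal sketch of the usual back-of-the-envelope model: with uniformly random accesses, a TLB that can hold translations for E pages out of P hits with probability E/P. The function name, the entry count and the page size are illustrative assumptions; the actual TLB parameters of these GPUs are not given anywhere in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Expected TLB hit rate for uniformly random page accesses.
// tlbEntries and pageBytes are hypothetical parameters, not vendor figures.
double expectedTlbHitRate(std::uint64_t bufferBytes,
                          std::uint64_t tlbEntries,
                          std::uint64_t pageBytes)
{
    assert(bufferBytes > 0 && pageBytes > 0);
    // Number of pages the buffer spans, rounded up.
    const std::uint64_t pages = (bufferBytes + pageBytes - 1) / pageBytes;
    // A TLB holding E of P equally likely pages hits with probability E/P.
    return std::min(1.0, static_cast<double>(tlbEntries) / static_cast<double>(pages));
}
```

With an assumed 512 entries and 2 MiB pages, `expectedTlbHitRate(512ull << 20, 512, 2ull << 20)` stays at 1.0, while a 2048 MiB buffer drops to 0.5. Real hardware will not follow this curve exactly, but it shows why performance can stay flat up to some buffer size and degrade beyond it.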

I'm currently looking for results from as many hardware/OS combinations as possible, so I can come to a recommendation for miners as well as for designers of new algos. Below is an example of the ETH hashrate on a GTX780 on Windows with increasing buffer size (in MB):

http://i.snag.gy/JQhR3.jpg

The test program can be downloaded from https://github.com/Genoil/dagSimCL. Win-64 binaries are in the x64/Release folder. You can also build it yourself, but I only have supporting MSVC project files, targeted at Nvidia OpenCL. On AMD hardware you may want to run

Code:

set GPU_MAX_ALLOC_PERCENT=100

first. By default, the program tries to use all of your GPU's RAM, up to 4096MB. If you have less system RAM, you can add a command-line parameter to test up to a lower maximum:

Code:

dagsimCL.exe 2048

If you have multiple GPUs, you need to add a second parameter:

Code:

dagsimCL.exe 4096 1

If you have multiple OpenCL platforms installed, add a third parameter:

Code:

dagsimCL.exe 4096 0 1

I would be very grateful if you could participate in this bit of research and possibly discuss any workarounds. Thanks!

P.S. Note that the hashrates achieved with the test program can be significantly higher than what you actually get with ethminer. This is because it only simulates the Dagger stages, not the Keccak stages.
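For readers unfamiliar with the Dagger stage, the following is a simplified CPU-side sketch of the access pattern it produces. It is not the dagSimCL kernel and not real Ethash: it reads one 32-bit word per access instead of a 128-byte page, and the function names are made up for illustration. The only part taken from Ethash is the FNV-style mixing step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-style mixing step as used by Ethash: multiply by the FNV prime, then XOR.
inline std::uint32_t fnv(std::uint32_t a, std::uint32_t b)
{
    return (a * 0x01000193u) ^ b;
}

// Walks `accesses` data-dependent pseudo-random positions in `dag` and
// returns the final mix value. Simplified: one word per access.
std::uint32_t daggerWalk(const std::vector<std::uint32_t>& dag,
                         std::uint32_t seed,
                         unsigned accesses = 64)
{
    assert(!dag.empty());
    std::uint32_t mix = seed;
    for (unsigned i = 0; i < accesses; ++i)
    {
        // The next index depends on the current mix, so the reads
        // cannot be predicted or prefetched.
        const std::size_t index = fnv(i ^ seed, mix) % dag.size();
        mix = fnv(mix, dag[index]);
    }
    return mix;
}
```

Because every read lands at an effectively random position in the buffer, each access is likely to hit a different memory page, which is the pattern that puts pressure on the TLB as the buffer grows.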

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: MaxDZ8 on November 29, 2015, 09:05:57 AM

Interesting.
Please, can you share links to all discussions?

Considering OpenCL is possibly higher level than GL ever was, I'm quite surprised someone pinpointed a hardware construct issue, especially as GPUs are traditionally managed. There's also a huge gap between different OSes, which in my experience should not be there for HW constructs... odd.

I have a 1GiB card so there's little I can do. I will try to take a look in the next few days if I can set aside some time. Initial analysis in CodeXL gave me inconsistent results.

Have you investigated different access patterns?

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 29, 2015, 01:41:46 PM

This thread on the CUDA forums is most relevant:
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/
Somebody over there (@allnamac) wrote a completely independent test that verified my findings.

This one is less interesting, but it shows that the problem affects both Nvidia and AMD:
http://gathering.tweakers.net/forum/list_messages/1659186

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: gielbier on November 30, 2015, 01:29:57 PM

I'm still trying to get my 7850 2GB to go above 1280MB, but I keep getting the out-of-memory error,
even with
set GPU_MAX_ALLOC_PERCENT=100 / GPU_MAX_ALLOC_PERCENT=95
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1

Code:

DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128	130.915	17.1593
256	130.547	17.111
384	129.763	17.0083
512	129.429	16.9645
640	129.359	16.9553
768	129.501	16.9739
896	130.307	17.0796
1,024	130.303	17.0791
1,152	113.466	14.8722
1,280	103.826	13.6086

But it does seem to drop hard from 1,024 to 1,280.
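The two result columns in these tables appear to be tied together by a fixed factor. Assuming the standard Ethash parameters (64 DAG accesses of 128 bytes per hash) and assuming the tool's "GB/s" means 2^30 bytes per second (an inference from the numbers, not something stated in the thread), the conversion can be sketched as follows. The function name is hypothetical.

```cpp
#include <cassert>

// Each Ethash hash performs 64 DAG accesses of 128 bytes each.
constexpr double kBytesPerHash = 64.0 * 128.0;
// Assumption: the bandwidth column uses 2^30 bytes per "GB".
constexpr double kBytesPerGiB = 1024.0 * 1024.0 * 1024.0;

// Converts a bandwidth figure (in units of 2^30 bytes per second) to MH/s.
double hashrateFromBandwidth(double gibPerSecond)
{
    assert(gibPerSecond >= 0.0);
    return gibPerSecond * kBytesPerGiB / kBytesPerHash / 1e6;
}
```

For the first row above, `hashrateFromBandwidth(130.915)` gives about 17.159 MH/s, which agrees with the reported 17.1593. In other words, the hashrate column is the measured bandwidth divided by 8192 bytes per hash.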

Chunked (512) version below:

Code:

DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128	130.953	17.1643
256	130.552	17.1117
384	130.483	17.1027
512	129.715	17.002
640	160.314	21.0126
768	166.186	21.7823
896	162.538	21.3042
1,024	166.417	21.8126
1,152	135.096	17.7073
1,280	38.5741	5.05599
1,408	23.306	3.05476
1,536	17.4977	2.29346
1,664	12.6435	1.65721
1,792	12.2781	1.60932
1,920	10.8921	1.42764

Chunked (256) version below:

Code:

DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128	131.008	17.1715
256	130.584	17.1158
384	124.342	16.2977
512	114.388	14.9931
640	178.814	23.4376
768	160.401	21.0241
896	166.627	21.8401
1,024	156.984	20.5762
1,152	141.14	18.4996
1,280	123.989	16.2515
1,408	122.695	16.0819
1,536	51.0244	6.68787
1,664	29.0346	3.80563
1,792	21.4296	2.80881
1,920	17.2236	2.25754

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 30, 2015, 03:23:57 PM

I've modified the source code a bit to allocate in 256MB chunks. Now it should be possible for AMD cards to use more RAM. On my GTX780, the hashrate curve is just about the same (a tiny bit slower) when using 256MB chunks.

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 30, 2015, 04:35:27 PM

Quote from: Genoil on November 30, 2015, 03:23:57 PM

I've modified the source code a bit to allocate in ~~256MB~~ user-definable chunks. Now it should be possible for AMD cards to use more RAM. On my GTX780, the hashrate curve is just about the same (a tiny bit slower) when using 256MB chunks.
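Allocating in chunks means the kernel has to translate a position in the logical DAG into a buffer and an offset within that buffer. The following is a minimal sketch of that bookkeeping, assuming equally sized chunks; the names are hypothetical and this is not the dagSimCL code.

```cpp
#include <cassert>
#include <cstdint>

struct ChunkAddress
{
    std::uint64_t chunk;   // which buffer holds the byte
    std::uint64_t offset;  // byte offset inside that buffer
};

// Number of buffers needed to hold totalBytes when each holds at most chunkBytes.
std::uint64_t chunkCount(std::uint64_t totalBytes, std::uint64_t chunkBytes)
{
    assert(chunkBytes > 0);
    return (totalBytes + chunkBytes - 1) / chunkBytes;
}

// Translates a byte offset in the logical DAG into (chunk, offset).
ChunkAddress locate(std::uint64_t byteOffset, std::uint64_t chunkBytes)
{
    assert(chunkBytes > 0);
    return ChunkAddress{byteOffset / chunkBytes, byteOffset % chunkBytes};
}
```

For example, a 1280 MiB DAG in 256 MiB chunks needs five buffers, and the byte at offset 300 MiB lives 44 MiB into the second one. The extra division and modulo per access is one plausible reason for the "tiny bit slower" result on hardware that did not need chunking in the first place.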

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: apriyoni on November 30, 2015, 04:47:31 PM

Do you have a binary for the variable chunk size? I wonder if a future ethminer could also let the user choose the chunk size for optimization.

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 30, 2015, 05:16:32 PM

Binaries are in the x64/Release folder on the git repo

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: apriyoni on November 30, 2015, 05:24:51 PM

What parameter do I use in the command line to change the chunk size?

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on November 30, 2015, 06:43:01 PM

Just a number:

dagSimCL.exe 128

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Omegasun on November 30, 2015, 07:19:48 PM

It seems the code can be optimised further.

I used a chunk size of 640, and the speed varies.

Code:

DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
1,024	171.293	22.4517
1,040	205.211	26.8974
1,056	134.719	17.6579
1,072	308.554	40.4428
1,088	163.232	21.3952
1,104	305.887	40.0932
1,120	158.693	20.8002
1,136	306.434	40.165
1,152	156.652	20.5326
1,168	301.328	39.4956
1,184	152.128	19.9397
1,200	299.754	39.2894
1,216	147.419	19.3225
1,232	293.886	38.5203
1,248	143.319	18.7851
1,264	295.562	38.7399
1,280	143.378	18.7928
1,296	420.797	55.1547
1,312	187.432	24.5671
1,328	297.962	39.0545
1,344	179.346	23.5073
1,360	302.645	39.6683
1,376	184.994	24.2476
1,392	296.381	38.8472
1,408	187.815	24.6173
1,424	295.367	38.7143
1,440	197.081	25.8318
1,456	288.925	37.87
1,472	201.804	26.4508
1,488	289.776	37.9815
1,504	206.96	27.1267
1,520	287.797	37.7221
1,536	208.31	27.3037
1,552	291.33	38.1852
1,568	219.074	28.7145
1,584	291.814	38.2486
1,600	219.739	28.8016
1,616	292.338	38.3173
1,632	219.81	28.811
1,648	293.074	38.4138
1,664	231.998	30.4084
1,680	291.659	38.2283
1,696	235.582	30.8782
1,712	291.056	38.1493
1,728	242.35	31.7653
1,744	292.548	38.3449
1,760	250.939	32.8911
1,776	290.89	38.1275
1,792	248.618	32.5868
1,808	290.146	38.0301
1,824	250.495	32.8329
1,840	290.413	38.065
1,856	252.41	33.0838
1,872	291.722	38.2365
1,888	256.025	33.5578
1,904	289.27	37.9152
1,920	255.669	33.511
1,936	8.29736	1.08755
1,952	277.308	36.3474
1,968	272.499	35.717
1,984	275.355	36.0913
2,000	273.839	35.8926
2,016	271.384	35.5709
2,032	270.993	35.5196
2,048	253.742	33.2585
2,064	269.63	35.3409
2,080	270.119	35.405
2,096	273.058	35.7903
2,112	271.964	35.6468
2,128	274.38	35.9635
2,144	268.958	35.2529
2,160	273.752	35.8812
2,176	262.851	34.4524
2,192	272.935	35.7741
2,208	258.234	33.8473
2,224	272.811	35.7579
2,240	250.648	32.8529
2,256	274.222	35.9429
2,272	246.125	32.2601
2,288	278.434	36.4949
2,304	240.086	31.4686
2,320	276.598	36.2543
2,336	235.735	30.8982
2,352	274.523	35.9823
2,368	226.982	29.751
2,384	281.238	36.8625
2,400	223.112	29.2438
2,416	282.779	37.0644
2,432	218.77	28.6746
2,448	283.579	37.1693
2,464	212.667	27.8748
2,480	287.033	37.6219
2,496	209.21	27.4216
2,512	287.018	37.62
2,528	202.401	26.5291
2,544	285.671	37.4435
2,560	198.479	26.0151
2,576	23.6545	3.10044
2,592	10.8985	1.42849

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Omegasun on November 30, 2015, 07:38:22 PM

More details:

AMD 7970 Catalyst 15.7 Win 8.1

Code:

DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
1,024	169.584	22.2277
1,028	201.148	26.3649
1,032	137.039	17.962
1,036	304.848	39.957
1,040	168.196	22.0458
1,044	306.698	40.1995
1,048	168.269	22.0554
1,052	306.984	40.237
1,056	165.834	21.7362
1,060	304.877	39.9609
1,064	163.642	21.4489
1,068	307.009	40.2403
1,072	165.116	21.6421
1,076	308.427	40.4261
1,080	164.043	21.5015
1,084	308.633	40.4532
1,088	163.075	21.3746
1,092	307.799	40.3439
1,096	160.319	21.0133
1,100	306.587	40.185
1,104	159.124	20.8566
1,108	306.287	40.1457
1,112	159.627	20.9226
1,116	307.489	40.3032
1,120	158.726	20.8045
1,124	305.745	40.0746
1,128	159.441	20.8983
1,132	306.6	40.1867
1,136	158.056	20.7168
1,140	303.038	39.7198
1,144	157.432	20.6349
1,148	305.833	40.0861
1,152	156.652	20.5326
1,156	305.136	39.9947
1,160	155.543	20.3874
1,164	303.046	39.7209
1,168	153.005	20.0547
1,172	299.497	39.2556
1,176	152.261	19.9571
1,180	300.396	39.3734
1,184	152.327	19.9657
1,188	301.339	39.4971
1,192	151.254	19.8251
1,196	300.391	39.3729
1,200	150.914	19.7806
1,204	299.912	39.31
1,208	149.358	19.5767
1,212	298.042	39.065
1,216	146.716	19.2303
1,220	294.716	38.6291
1,224	144.464	18.9352
1,228	293.327	38.4469
1,232	142.717	18.7062
1,236	293.99	38.5338
1,240	143.653	18.8288
1,244	293.986	38.5333
1,248	143.824	18.8513
1,252	294.655	38.6211
1,256	143.908	18.8624
1,260	294.698	38.6266
1,264	143.466	18.8044
1,268	295.105	38.6799
1,272	143.061	18.7513
1,276	294.811	38.6414
1,280	143.163	18.7647
1,284	422.562	55.386
1,288	188.832	24.7506
1,292	295.35	38.7121
1,296	189.787	24.8757
1,300	296.379	38.847
1,304	188.287	24.6791
1,308	298.672	39.1475
1,312	186.728	24.4748
1,316	297.253	38.9615
1,320	183.426	24.042
1,324	295.835	38.7757
1,328	181.456	23.7837
1,332	297.77	39.0293
1,336	179.955	23.5871
1,340	297.201	38.9548
1,344	179.453	23.5213
1,348	299.408	39.244
1,352	180.218	23.6215
1,356	301.928	39.5743
1,360	181.53	23.7935
1,364	298.281	39.0963
1,368	185.617	24.3292
1,372	302.053	39.5907
1,376	185.688	24.3385
1,380	299.94	39.3137
1,384	186.133	24.3968
1,388	298.336	39.1035
1,392	186.56	24.4528
1,396	295.824	38.7742
1,400	187.393	24.562
1,404	295.826	38.7746
1,408	188.098	24.6543
1,412	294.417	38.5898
1,416	190.336	24.9477
1,420	294.859	38.6477
1,424	193.624	25.3786
1,428	293.79	38.5076
1,432	195.098	25.5718
1,436	292.607	38.3526
1,440	197.223	25.8504
1,444	291.125	38.1583
1,448	198.542	26.0233
1,452	289.802	37.985
1,456	199.387	26.1341
1,460	288.238	37.7799
1,464	200.996	26.345
1,468	288.351	37.7947
1,472	202.043	26.4822
1,476	288.643	37.8331
1,480	202.655	26.5624
1,484	289.962	38.006
1,488	203.537	26.678
1,492	290.97	38.138
1,496	205.972	26.9972
1,500	288.161	37.7699
1,504	207.029	27.1358
1,508	287.617	37.6986
1,512	206.491	27.0651
1,516	287.496	37.6826
1,520	205.916	26.9899
1,524	287.823	37.7255
1,528	206.337	27.045
1,532	287.394	37.6693
1,536	208.207	27.2901
1,540	288.091	37.7606
1,544	210.887	27.6414
1,548	289.533	37.9496
1,552	212.232	27.8177
1,556	291.393	38.1934
1,560	215.511	28.2475
1,564	293.224	38.4335
1,568	219.839	28.8147
1,572	293.074	38.4139
1,576	219.205	28.7316
1,580	292.927	38.3946
1,584	219.551	28.777
1,588	292.805	38.3786
1,592	220.017	28.838
1,596	292.63	38.3556
1,600	220.541	28.9067
1,604	292.791	38.3767
1,608	219.929	28.8266
1,612	290.807	38.1167
1,616	220.381	28.8857
1,620	292.645	38.3575
1,624	219.375	28.7539
1,628	290.738	38.1077
1,632	220.587	28.9128
1,636	292.731	38.3689
1,640	221.095	28.9794
1,644	292.084	38.284
1,648	227.149	29.7729
1,652	291.895	38.2593
1,656	230.119	30.1621
1,660	293.108	38.4182
1,664	231.49	30.3419
1,668	291.726	38.2372
1,672	232.928	30.5303
1,676	293.916	38.5242
1,680	234.557	30.7439
1,684	291.779	38.2441
1,688	235.908	30.9209
1,692	291.393	38.1934
1,696	238.068	31.204
1,700	291.31	38.1826
1,704	238.506	31.2614
1,708	290.866	38.1244
1,712	240.209	31.4846
1,716	290.223	38.0402
1,720	241.831	31.6973
1,724	293.142	38.4227
1,728	243.461	31.911
1,732	291.057	38.1495
1,736	244.488	32.0455
1,740	291.978	38.2701
1,744	247.518	32.4427
1,748	292.685	38.3627
1,752	247.4	32.4272
1,756	291.118	38.1575
1,760	249.833	32.7461
1,764	290.777	38.1127
1,768	249.076	32.6469
1,772	291.167	38.1639
1,776	248.571	32.5807
1,780	291.049	38.1484
1,784	247.46	32.4351
1,788	291.456	38.2018
1,792	246.624	32.3255
1,796	290.457	38.0708
1,800	248.342	32.5506
1,804	291.001	38.1421
1,808	248.408	32.5593
1,812	290.238	38.042
1,816	248.047	32.512
1,820	283.815	37.2002
1,824	251.754	32.9978
1,828	291.202	38.1685
1,832	248.827	32.6143
1,836	289.854	37.9918
1,840	249.792	32.7407
1,844	291.752	38.2406
1,848	249.65	32.7222
1,852	291.76	38.2416
1,856	251.274	32.935
1,860	290.769	38.1117
1,864	249.961	32.7629
1,868	289.805	37.9854
1,872	252.607	33.1097
1,876	293.063	38.4123
1,880	253.033	33.1656
1,884	291.514	38.2093
1,888	254.853	33.4041
1,892	289.493	37.9444
1,896	254.539	33.363
1,900	290.336	38.0549
1,904	256.004	33.5549
1,908	289.006	37.8806
1,912	256.708	33.6473
1,916	285.48	37.4185
1,920	255.686	33.5133
1,924	8.20708	1.07572
1,928	270.429	35.4457
1,932	274.044	35.9195
1,936	274.753	36.0125
1,940	273.052	35.7895
1,944	272.767	35.7521
1,948	272.661	35.7382
1,952	271.728	35.6159
1,956	272.43	35.708
1,960	264.622	34.6845
1,964	272.508	35.7181
1,968	262.64	34.4248
1,972	272.497	35.7168
1,976	264.491	34.6673
1,980	270.625	35.4714
1,984	264.232	34.6334
1,988	271.633	35.6035
1,992	266.205	34.892
1,996	270.912	35.509
2,000	266.598	34.9436
2,004	271.738	35.6173
2,008	261.33	34.253
2,012	271.001	35.5206
2,016	260.38	34.1285
2,020	269.918	35.3786
2,024	256.16	33.5754
2,028	271.426	35.5763
2,032	256.371	33.6031
2,036	270.266	35.4243
2,040	255.108	33.4375
2,044	267.721	35.0907
2,048	253.612	33.2415
2,052	266.675	34.9536
2,056	261.409	34.2634
2,060	268.425	35.183
2,064	263.32	34.5139
2,068	269.019	35.2608
2,072	260.561	34.1523
2,076	268.476	35.1897
2,080	263.256	34.5055
2,084	272.322	35.6937
2,088	264.666	34.6903
2,092	273.405	35.8357
2,096	263.937	34.5948
2,100	272.581	35.7277
2,104	265.341	34.7788
2,108	269.691	35.349
2,112	265.216	34.7624
2,116	274.05	35.9203
2,120	264.289	34.6409
2,124	273.125	35.799
2,128	262.044	34.3467
2,132	273.024	35.7858
2,136	251.644	32.9835
2,140	270.812	35.4959
2,144	263.824	34.5799
2,148	270.174	35.4122
2,152	261.542	34.2809
2,156	273.66	35.8691
2,160	262.08	34.3513
2,164	270.529	35.4588
2,168	261.311	34.2505
2,172	271.981	35.6491
2,176	259.845	34.0584
2,180	271.602	35.5994
2,184	257.208	33.7127
2,188	275.531	36.1143
2,192	255.897	33.5409
2,196	272.157	35.6721
2,200	255.044	33.4292
2,204	274.641	35.9978
2,208	253.2	33.1874
2,212	275.592	36.1224
2,216	252.516	33.0978
2,220	271.312	35.5614
2,224	249.935	32.7595
2,228	274.28	35.9504
2,232	249.433	32.6937
2,236	275.829	36.1535
2,240	247.638	32.4584
2,244	276.536	36.2461
2,248	247.598	32.4532
2,252	274.525	35.9825
2,256	246.511	32.3107
2,260	276.15	36.1955
2,264	244.45	32.0405
2,268	275.098	36.0576
2,272	243.543	31.9217
2,276	277.45	36.366
2,280	241.421	31.6435
2,284	277.051	36.3136
2,288	239.379	31.3758
2,292	277.136	36.3247
2,296	239.134	31.3438
2,300	278.188	36.4626
2,304	237.303	31.1038
2,308	280.271	36.7356
2,312	236.599	31.0115
2,316	278.368	36.4863
2,320	234.686	30.7608
2,324	278.092	36.4501
2,328	233.993	30.6699
2,332	277.043	36.3126
2,336	234.41	30.7246
2,340	276.13	36.1929
2,344	232.09	30.4205
2,348	278.554	36.5107
2,352	228.067	29.8932
2,356	277.828	36.4154
2,360	224.97	29.4872
2,364	278.849	36.5492
2,368	224.327	29.4031
2,372	280.193	36.7255
2,376	224.697	29.4515
2,380	280.091	36.7121
2,384	222.211	29.1256
2,388	279.278	36.6056
2,392	220.997	28.9665
2,396	281.534	36.9013
2,400	221.729	29.0625
2,404	282.74	37.0593
2,408	220.127	28.8525
2,412	282.141	36.9807
2,416	217.894	28.5599
2,420	282.731	37.0581
2,424	218.422	28.629
2,428	281.104	36.8448
2,432	216.018	28.3139
2,436	283.408	37.1469
2,440	214.819	28.1568
2,444	282.6	37.041
2,448	214.679	28.1384
2,452	283.089	37.1051
2,456	212.861	27.9001
2,460	285.962	37.4817
2,464	210.865	27.6384
2,468	286.566	37.5608
2,472	210.772	27.6264
2,476	285.056	37.3629
2,480	208.119	27.2786
2,484	286.163	37.508
2,488	207.207	27.159
2,492	287.098	37.6305
2,496	206.643	27.0851
2,500	286.126	37.5031
2,504	205.334	26.9136
2,508	285.176	37.3785
2,512	204.538	26.8093
2,516	287.837	37.7274
2,520	201.79	26.449
2,524	288.896	37.8662
2,528	201.035	26.35
2,532	286.915	37.6065
2,536	199.967	26.21
2,540	288.168	37.7707
2,544	198.046	25.9582
2,548	287.959	37.7433
2,552	197.707	25.9139
2,556	287.504	37.6837
2,560	196.663	25.777
2,564	24.5104	3.21262
2,568	10.8853	1.42676

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on December 01, 2015, 09:06:10 AM

That's interesting. Many thanks.

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: MaxDZ8 on December 01, 2015, 10:09:12 AM

I've looked at the resources. Considering the linked threads are 1) in German and 2) hundreds of messages long, I cannot be 100% sure I got everything.

What I can tell you is that I've observed considerably lower than expected memory performance on GCN1.0 even with much smaller buffers. I think it is also worth noting that before 'compute on GPU' became a thing, graphical resources always had an upper bound (not the same as the CL max alloc). It is my understanding that no such limitation should be in place at this point (and it should be bigger than 1GiB anyway)...
I'm still very surprised that this hardware issue manifests itself at such large sizes (unless the historical limitation still applies).

Leaving aside the max alloc, have you tried how varying the stride affects the result (for a K MiB buffer)?

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: virasog on December 01, 2015, 01:59:38 PM

With the 640MB chunk size above, the hash rate alternates between 20 and 40 MH/s all the time, and then the difference shrinks from 20 MH/s to a much lower value. Does this indicate an optimisation opportunity?

Can anybody make the chunking work in ethminer?

The latest ethminer does not display the hash rate, which makes it difficult to compare the results. I wonder if this can be added as well.

Code:

	catch (cl::Error const& err)
		{
			ETHCL_LOG("Allocating/mapping single buffer failed with: " << err.what() << "(" << err.err() << "). GPU can't allocate the DAG in a single chunk. Bailing.");
			return false;
#if 0		// Disabling chunking for release since it seems not to work. Never manages to mine a block. TODO: Fix when time is found.
			int errCode = err.err();
			// Note: with ||, this condition is always true; && was presumably intended.
			if (errCode != CL_INVALID_BUFFER_SIZE || errCode != CL_MEM_OBJECT_ALLOCATION_FAILURE)
				ETHCL_LOG("Allocating/mapping single buffer failed with: " << err.what() << "(" << errCode << ")");
			cl_ulong result;
			// if we fail midway on the try above make sure we start clean
			m_dagChunks.clear();
			device.getInfo(CL_DEVICE_MAX_MEM_ALLOC_SIZE, &result);
			ETHCL_LOG(
				"Failed to allocate 1 big chunk. Max allocateable memory is "
				<< result << ". Trying to allocate 4 chunks."
			);
			// The OpenCL kernel has a hard coded number of 4 chunks at the moment
			m_dagChunksCount = 4;
			for (unsigned i = 0; i < m_dagChunksCount; i++)
			{
				// TODO Note: If we ever change to _dagChunksNum other than 4, then the size would need recalculation
				ETHCL_LOG("Creating buffer for chunk " << i);
				m_dagChunks.push_back(cl::Buffer(
					m_context,
					CL_MEM_READ_ONLY,
					(i == 3) ? (_dagSize - 3 * ((_dagSize >> 9) << 7)) : (_dagSize >> 9) << 7
				));
			}
			ETHCL_LOG("Loading chunk kernels");
			m_hashKernel = cl::Kernel(program, "ethash_hash_chunks");
			m_searchKernel = cl::Kernel(program, "ethash_search_chunks");
			// TODO Note: If we ever change to _dagChunksNum other than 4, then the size would need recalculation
			void* dag_ptr[4];
			for (unsigned i = 0; i < m_dagChunksCount; i++)
			{
				ETHCL_LOG("Mapping chunk " << i);
				dag_ptr[i] = m_queue.enqueueMapBuffer(m_dagChunks[i], true, m_openclOnePointOne ? CL_MAP_WRITE : CL_MAP_WRITE_INVALIDATE_REGION, 0, (i == 3) ? (_dagSize - 3 * ((_dagSize >> 9) << 7)) : (_dagSize >> 9) << 7);
			}
			for (unsigned i = 0; i < m_dagChunksCount; i++)
			{
				memcpy(dag_ptr[i], (char *)_dag + i*((_dagSize >> 9) << 7), (i == 3) ? (_dagSize - 3 * ((_dagSize >> 9) << 7)) : (_dagSize >> 9) << 7);
				m_queue.enqueueUnmapMemObject(m_dagChunks[i], dag_ptr[i]);
			}
#endif
		}
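The expression `(_dagSize >> 9) << 7` in the disabled code is easy to misread, so here is a small standalone sketch of the chunk sizes it produces. The helper name `fourChunkSizes` is hypothetical; only the arithmetic is taken from the snippet above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sizes of the four chunks as computed in the quoted (disabled) ethminer code.
std::array<std::uint64_t, 4> fourChunkSizes(std::uint64_t dagSize)
{
    // (dagSize >> 9) << 7 is floor(dagSize / 512) * 128: roughly a quarter
    // of the DAG, rounded down to a multiple of 128 bytes.
    const std::uint64_t regular = (dagSize >> 9) << 7;
    // The last chunk takes whatever the first three leave over.
    const std::uint64_t last = dagSize - 3 * regular;
    assert(3 * regular + last == dagSize);
    return std::array<std::uint64_t, 4>{{regular, regular, regular, last}};
}
```

For a DAG of 1,073,739,904 bytes this yields three chunks of 268,434,944 bytes and a final chunk of 268,435,072 bytes, so the first three chunks each hold a whole number of 128-byte DAG pages.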

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Eliovp on December 01, 2015, 10:18:30 PM

Hey,

As you already noticed, it is indeed correct: a bigger DAG file will decrease speed drastically.

I've done some tests too.

First result: 390X

Code:

DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128	261.88	34.3251
256	253.74	33.2583
384	261.27	34.2452
512	263.394	34.5236
640	256.832	33.6635
768	262.678	34.4298
896	253.595	33.2392
1,024	262.77	34.4418
1,152	260.289	34.1166
1,280	262.375	34.3901
1,408	262.426	34.3968
1,536	261.992	34.3398
1,664	263.126	34.4884
1,792	262.872	34.4552
1,920	262.028	34.3446
2,048	262.432	34.3975
2,176	246.609	32.3235
2,304	236.27	30.9684
2,432	222.884	29.2138
2,560	206.16	27.0218
2,688	192.144	25.1847
2,816	180.781	23.6953
2,944	170.977	22.4103
3,072	162.634	21.3168
3,200	155.515	20.3837
3,328	149.158	19.5505
3,456	143.477	18.8058
3,584	138.379	18.1376
3,712	133.821	17.5402
3,840	129.724	17.0031
3,968	126.137	16.533

Second result: Fury X (stopped @ 2816MB)

Code:

DAG size (MB)	Bandwidth (GB/s)	Hashrate (MH/s)
128	254.497	33.3574
256	251.412	32.953
384	250.2	32.7942
512	249.919	32.7574
640	249.457	32.6969
768	249.345	32.6821
896	249.108	32.6511
1,024	248.899	32.6237
1,152	248.822	32.6136
1,280	248.888	32.6223
1,408	248.679	32.5948
1,536	248.822	32.6137
1,664	248.653	32.5914
1,792	248.686	32.5957
1,920	248.547	32.5775
2,048	248.564	32.5798
2,176	244.005	31.9823
2,304	217.763	28.5426
2,432	165.121	21.6427
2,560	135.737	17.7913
2,688	118.606	15.5459
2,816	106.813	14.0002

"If needed, I can certainly run some more tests. I have other cards as well..."

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on December 02, 2015, 12:12:36 PM

Quote from: virasog on December 01, 2015, 01:59:38 PM

It may be an opportunity for an optimization. The chunked implementation in the current ethminer is disabled because it doesn't work. I'll see if I can find some time to check whether this could work in ethminer.

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Dofnatues on December 02, 2015, 05:54:48 PM

Quote from: Genoil on December 02, 2015, 12:12:36 PM

If you can make it work, you will save a lot of AMD cards from becoming useless in a month or two.

By the way, why does the latest ethminer (1.1.0) not show the hash rate?

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Grim on December 10, 2015, 08:16:43 AM

Besides all that TLB thrashing, how come the 280x has more bandwidth (~300 GB/s) compared to

the 390x having only 262 GB/s

and the Fury X only 249 GB/s

???

(Also, besides bandwidth, the GPU memory timings seem to be a major factor.)

PS: maybe the 280x has optimized timings from The Stilt (BIOS update)?

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: MaxDZ8 on December 13, 2015, 09:08:23 AM

It's a possibility. I am positive the ratio of math operations to memory accesses has a major impact on GCN; the AMD OpenCL driver is super fast but also very stupid.

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Fasdurcas on December 13, 2015, 07:51:42 PM

We have about 60 days left in which most AMD cards will work without an update of ethminer. Who is responsible for updating the software?

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Masked_Immortal on December 14, 2015, 08:32:16 AM

Is this issue just related to bandwidth? The GTX970 has less bandwidth than the 280x.
And what about Maxwell GPUs?

Title: Re: Assessing the impact of TLB thrashing on memory-hard algorithms
Post by: Genoil on December 15, 2015, 01:13:26 PM

Quote from: Fasdurcas on December 13, 2015, 07:51:42 PM

We have about 60 days left in which most AMD cards will work without an update of ethminer. Who is responsible for updating the software?

I filed this as a potential threat in the Ethereum bug bounty program but haven't heard anything from their end. Keep in mind I'm not 100% certain about this bug. It was an issue with my dagSimCL test program that may apply to ethminer as well. Unfortunately I don't have any time at the moment to look into this further. If it really is an issue, rest assured the private kernel gang has already jumped on it and resolved it, possibly using the approach that's publicly available (https://github.com/Genoil/dagSimCL/commit/cd900ffd83559a3764abfe2fbc6aa5d509c7a448) in the dagSimCL repo. The owners of such modded kernels should be in for some serious profit...

Quote from: Masked_Immortal on December 14, 2015, 08:32:16 AM

Is this issue just related to bandwidth? The GTX970 has less bandwidth than the 280x.
And what about Maxwell GPUs?

Maxwell cards with Compute 5.2 (GTX 9xx) only start suffering badly from TLB thrashing with allocations above 2GB, so they are fine until the switch to PoS. Maxwell cards with Compute 5.0 (GTX750) have already bitten the dust and are useless for ETH mining.

Note that TLB thrashing and the AMD max allocation problem are two separate issues.
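The deadlines discussed in this thread follow from how the Ethash DAG grows over time. Here is a minimal sketch of the size calculation, using the constants from the Ethash specification (1 GiB initial size, 8 MiB growth per epoch, 30,000 blocks per epoch, page count reduced to a prime). The function names are hypothetical, and the simple trial-division primality test is a stand-in chosen for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Constants from the Ethash specification.
constexpr std::uint64_t kDatasetBytesInit = 1ull << 30;    // 1 GiB at epoch 0
constexpr std::uint64_t kDatasetBytesGrowth = 1ull << 23;  // 8 MiB per epoch
constexpr std::uint64_t kMixBytes = 128;                   // bytes per DAG page
constexpr std::uint64_t kEpochLength = 30000;              // blocks per epoch

// Trial-division primality test; adequate for the magnitudes involved here.
bool isPrime(std::uint64_t n)
{
    if (n < 2) return false;
    for (std::uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// DAG size in bytes for the epoch containing blockNumber.
std::uint64_t dagSizeBytes(std::uint64_t blockNumber)
{
    const std::uint64_t epoch = blockNumber / kEpochLength;
    std::uint64_t size = kDatasetBytesInit + kDatasetBytesGrowth * epoch - kMixBytes;
    // Shrink until the number of 128-byte pages is prime.
    while (!isPrime(size / kMixBytes))
        size -= 2 * kMixBytes;
    assert(size % kMixBytes == 0);
    return size;
}
```

With these constants the DAG grows by about 8 MiB per epoch, so it sits just under 1280 MiB at epoch 32 and just under 2 GiB at epoch 128. Any card whose performance collapses at a given allocation size can be mapped to a block number this way.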

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: vatusasid on December 15, 2015, 02:30:22 PM

The developers of Ethereum were paid 13 million Ether. How come they could not solve this problem? ethminer 1.0.1 is not stable yet.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Omegasun on January 08, 2016, 09:11:37 AM

Is there any news about the development of ethminer so that it can cope with the larger DAG size? We are approaching 1280MB.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Bagdar13 on January 12, 2016, 08:54:00 PM

Quote from: Omegasun on January 08, 2016, 09:11:37 AM

Is there any news about the development of ethminer so that it can cope with the larger DAG size? We are approaching 1280MB.

I have heard nothing. I also tested this and found a problem at ~1280MB as well.

I started looking on forums after I noticed a substantial drop-off on the 7970 cards at ~1.2 GB.

At this point my 280Xs are down from 27 to about 24 MH/s, and my 7970s are down from 22 to about 17 MH/s each (this is what surprised me; the problem is present on XFX, PowerColor and one other brand).

This drop in performance seems to be larger than the expected drop.

Oddly enough, my 7870s seem to have suffered little if any performance hit and are still happily doing 15 MH/s, the same as at launch.

Edit: my 7870s are the GHz Edition; however, I also have a Sapphire and a 7870 MIST (which is really a broken 7950), and both of these are also unaffected.

Food for thought.

Food for thought.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Venon on January 13, 2016, 08:32:35 PM

Quote from: Bagdar13 on January 12, 2016, 08:54:00 PM

Quote from: Omegasun on January 08, 2016, 09:11:37 AM

Is there any news about the development of ethminer so that it can cope with the larger DAG size? We are approaching 1280MB.

Do you find the drop during the test or the actual mining?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: sp_ on January 13, 2016, 09:05:42 PM

Let the dagger grow. The ether algo will be perfect for the botnets.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Bagdar13 on January 15, 2016, 04:24:49 AM

Quote from: Venon on January 13, 2016, 08:32:35 PM

Do you find the drop during the test or the actual mining?

I am now seeing the drop in actual mining on this hardware, since the DAG update at block 840000. The point is that my drop in hashrate seems to be larger than the DAG size increase would predict.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Dofnatues on January 15, 2016, 06:15:27 PM

Because of the drop in hash rate, I decided to reduce the core clock frequency and keep the memory frequency the same. Is that a good idea?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on January 18, 2016, 04:09:29 PM

I just finished implementing the chunk allocation into my fork of ethminer.

https://github.com/Genoil/cpp-ethereum/tree/opencl-chunks

By allocating DAG memory in chunks (--cl-chunks <chunkSizeInMB>), issues with RAM allocation may be averted. A nice side effect of this may be (significantly) higher hashrates. Based on what I've seen from people using dagSimCL, --cl-chunks 640 yields quite good results. It may be, however, that the optimal chunk size depends on the DAG size.

I wrote this change without access to AMD hardware, so your mileage may vary. Don't bother trying this on CUDA devices; using chunks there only has a negative impact on hashrate.
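
As a rough illustration of what allocating in chunks means, here is a small Python sketch. It is not the code from the fork above: the helper names `chunk_sizes` and `locate` are hypothetical, and it only shows the bookkeeping, namely splitting one large buffer into fixed-size pieces and translating a global byte offset into a chunk index plus an offset inside that chunk.

```python
MB = 1024 * 1024


def chunk_sizes(dag_bytes, chunk_mb):
    """Sizes (in bytes) of the chunks needed to hold dag_bytes."""
    chunk_bytes = chunk_mb * MB
    full, rest = divmod(dag_bytes, chunk_bytes)
    # All chunks are full-sized except possibly a smaller last one.
    return [chunk_bytes] * full + ([rest] if rest else [])


def locate(offset, chunk_mb):
    """Map a global byte offset to (chunk index, offset within that chunk)."""
    return divmod(offset, chunk_mb * MB)


sizes = chunk_sizes(1280 * MB, 640)
print([s // MB for s in sizes])   # [640, 640]
print(locate(700 * MB, 640))      # (1, 62914560), i.e. 60 MB into the second chunk
```

Each chunk would then be a separate device allocation, so no single allocation has to be as large as the whole DAG.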

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: RustyNoman on January 18, 2016, 05:54:13 PM

Quote from: Genoil on January 18, 2016, 04:09:29 PM

Do you have instructions for building the program? Do you have an exe version so that we can try it?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on January 18, 2016, 06:02:29 PM

Binary is on the eth forum in mining section

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Justicemaxx on January 19, 2016, 07:42:35 PM

Quote from: Genoil on January 18, 2016, 06:02:29 PM

Binary is on the eth forum in mining section

I tried different chunk settings (640, 660); these reduce the hash rate by about a factor of three on the R9 280X and R9 290: 6x 280X give about 50 MH/s. Setting 1300 or more, on the other hand, does not affect the speed; it goes back to normal, about 150 MH/s. Maybe I am doing something wrong? Before it starts hashing, the miner reports that it can't create the second DAG block because it is blocked by the GPU.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: mandica on January 19, 2016, 09:04:29 PM

Quote from: Justicemaxx on January 19, 2016, 07:42:35 PM

Quote from: Genoil on January 18, 2016, 06:02:29 PM

Binary is on the eth forum in mining section

Did your miner submit valid shares?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Justicemaxx on January 20, 2016, 01:12:28 PM

I tried solo mining: the miner reported solutions, but the node seems to ignore them, that is, the shares from the miner with chunks.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Assanger on January 20, 2016, 04:37:58 PM

Quote from: Justicemaxx on January 20, 2016, 01:12:28 PM

I tried solo mining: the miner reported solutions, but the node seems to ignore them, that is, the shares from the miner with chunks.

Does it mean the etherminer is mining, but the shares are not recognized?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Justicemaxx on January 20, 2016, 04:47:56 PM

Yes

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on January 25, 2016, 09:00:14 AM

Quote from: Justicemaxx on January 20, 2016, 01:12:28 PM

I tried solo mining: the miner reported solutions, but the node seems to ignore them, that is, the shares from the miner with chunks.

LOL true. I'm sorry man, I just knocked this out blindly without access to an actual AMD card. For now, some further testing by others has indicated there is presently no need to worry about allocation problems in the near future. I will have to verify this for myself to be absolutely sure though.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Marvell1 on January 25, 2016, 10:01:22 PM

Quote from: Genoil on January 25, 2016, 09:00:14 AM

Quote from: Justicemaxx on January 20, 2016, 01:12:28 PM

I tried solo mining: the miner reported solutions, but the node seems to ignore them, that is, the shares from the miner with chunks.

I could send you one of my 7950s if you want to pay for shipping. I have a bunch lying around because I have no motherboards to host them in.

This DAG problem is getting huge for me: my 900 MH/s farm is down to about 700 MH/s.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: mandica on January 26, 2016, 08:30:57 PM

Quote from: Marvell1 on January 25, 2016, 10:01:22 PM

Quote from: Genoil on January 25, 2016, 09:00:14 AM

Quote from: Justicemaxx on January 20, 2016, 01:12:28 PM

I tried solo mining: the miner reported solutions, but the node seems to ignore them, that is, the shares from the miner with chunks.

The DAG problem is not really a problem, as it affects all graphics cards. But I heard that it affects the R9 380 less.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Akarabzie on January 26, 2016, 10:00:18 PM

Quote from: mandica on January 26, 2016, 08:30:57 PM

The DAG problem is not really a problem, as it affects all graphics cards. But I heard that it affects the R9 380 less.

I keep hearing this as well, but I don't think I've seen enough data to be sure about it yet, or about the reason why the 380s aren't affected. Is it the difference in memory types, or what? Also, what kind of difference, if any, does the thrashing make on the 380 vs the 380X?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: RustyNoman on January 27, 2016, 08:58:23 AM

Quote from: Akarabzie on January 26, 2016, 10:00:18 PM

Quote from: mandica on January 26, 2016, 08:30:57 PM

Quote from: Marvell1 on January 25, 2016, 10:01:22 PM

Quote from: Genoil on January 25, 2016, 09:00:14 AM

Quote from: Justicemaxx on January 20, 2016, 01:12:28 PM

I tried solo mining: the miner reported solutions, but the node seems to ignore them, that is, the shares from the miner with chunks.

The Dag problem is not a problem as it affect all the graphics cards. But I heard that it affects R9 380 less.

Yes. We need more data to assess the situation. I am also interested in knowing the performance of 380 vs 380x.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Marvell1 on January 27, 2016, 05:45:52 PM

Quote from: RustyNoman on January 27, 2016, 08:58:23 AM

Yes. We need more data to assess the situation. I am also interested in knowing the performance of 380 vs 380x.

I have both the 380 and 380X 4GB cards and the hash rate is pretty underwhelming: 18 MH/s vs 19.5 MH/s max, it seems. They are both pretty power hungry too, around 240 watts, maybe 250 for the X.

A 7950 gets close to 23 MH/s for around the same power. One thing I do notice is that the hash rate on the 380 and 380X has remained constant regardless of DAG size, whereas the 7950s have dropped to around 21-22 MH/s. Not sure what to make of all this.

I think the best bet right now is to get 390s and mix and match them with 380s, so at least you get better resale value on your GPUs than with the older cards, unless you can get those really cheap.

The problem with the 390 and 390X is that they run crazy hot and consume close to 300 watts of power; that's even worse with a 290X.

I'm trying out various brands of 380X cards this week, but by my estimation it's not worth paying anything more for the 380X, at least for mining, since it hashes only 5% higher than the 380 and uses more power. Basically a worthless card.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: RustyNoman on January 28, 2016, 11:06:29 AM

Quote from: Marvell1 on January 27, 2016, 05:45:52 PM

Quote from: RustyNoman on January 27, 2016, 08:58:23 AM

Quote from: Akarabzie on January 26, 2016, 10:00:18 PM

Quote from: mandica on January 26, 2016, 08:30:57 PM

Quote from: Marvell1 on January 25, 2016, 10:01:22 PM

Quote from: Genoil on January 25, 2016, 09:00:14 AM

Quote from: Justicemaxx on January 20, 2016, 01:12:28 PM

I tried solo mining: the miner reported solutions, but the node seems to ignore them, that is, the shares from the miner with chunks.

The Dag problem is not a problem as it affect all the graphics cards. But I heard that it affects R9 380 less.

Yes. We need more data to assess the situation. I am also interested in knowing the performance of 380 vs 380x.

The 380X has 2048 cores while the 380 has 1792. The core count is 14% higher, but the hash rate is just 5% higher, with higher power consumption. So it is not worth it.
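
The comparison can be checked with a few lines of Python, using only the figures posted in this thread (1792 vs 2048 cores, 18 vs 19.5 MH/s, roughly 240 vs 250 W). The power numbers are rough forum estimates, so treat this as a sketch rather than a benchmark.

```python
def pct_increase(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


cores = pct_increase(2048, 1792)       # 380X vs 380 core count
hashrate = pct_increase(19.5, 18.0)    # 380X vs 380 hash rate in MH/s
eff_380 = 18.0 / 240.0                 # MH/s per watt, 380
eff_380x = 19.5 / 250.0                # MH/s per watt, 380X

print(f"cores: +{cores:.1f}%")           # cores: +14.3%
print(f"hash rate: +{hashrate:.1f}%")    # hash rate: +8.3%
print(f"380: {eff_380:.3f} MH/s per W, 380X: {eff_380x:.3f} MH/s per W")
```

With these inputs the hash rate gain comes out nearer 8% than 5%, which is still well below the 14% increase in core count, and the hash per watt of the two cards is almost identical.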

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on February 25, 2016, 08:33:27 AM

So we are currently at 1280MB for the DAG file size and most people are still mining. Was the bug fixed?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on February 25, 2016, 08:53:33 AM

Quote from: adaseb on February 25, 2016, 08:33:27 AM

So we are currently at 1280MB for the DAG file size and most people are still mining. Was the bug fixed?

It turned out the big bug wasn't really there. My (false) assumptions were based on reports by testers of dagSimCL who apparently didn't know how to tune their AMD cards correctly.

The impact of DAG size on hashrate is a fact though. While on Nvidia it has the most dramatic effects in certain circumstances, the impact on AMD cards has been growing steadily, to such a level that the 280X is now dethroned as the most cost-effective card to mine on, losing its position to the GTX970 on Win7/Linux.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on February 25, 2016, 09:11:11 AM

Quote from: Genoil on February 25, 2016, 08:53:33 AM

Quote from: adaseb on February 25, 2016, 08:33:27 AM

So we are currently at 1280MB for the DAG file size and most people are still mining. Was the bug fixed?

I noticed the decrease in speed also.

The 970 however seems to be at least double in price compared to the 280X.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Genoil on February 25, 2016, 09:16:23 AM

Quote from: adaseb on February 25, 2016, 09:11:11 AM

I noticed the decrease in speed also.

The 970 however seems to be at least double in price compared to the 280X.

Yes it only counts when you have already ROI'd on the cards mining other coins :)

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: sp_ on February 25, 2016, 09:27:14 AM

You can get the GTX 970 to 21 MH/s by putting it in P1 mode (using the nvidia-smi tool).

The best card for mining Ethereum is the R9 Nano. It does 28 MH/s.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Realetim on February 25, 2016, 09:51:42 AM

Quote from: sp_ on February 25, 2016, 09:27:14 AM

You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).

The best card for mining etherum is the r9 Nano. It does 28MHASH.

Does the R9 nano use more electricity? Which is more efficient in terms of hash per watt? Nano or 970?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: sp_ on February 25, 2016, 10:07:50 AM

Quote from: Realetim on February 25, 2016, 09:51:42 AM

Quote from: sp_ on February 25, 2016, 09:27:14 AM

You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).
The best card for mining etherum is the r9 Nano. It does 28MHASH.

Does the R9 nano use more electricity? Which is more efficient in terms of hash per watt? Nano or 970?

The Nano uses less electricity, but costs more.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: apriyoni on February 25, 2016, 12:55:14 PM

Quote from: sp_ on February 25, 2016, 10:07:50 AM

The NANO use less electricity, but cost more.

The R9 Nano costs £388 while the 970 costs £250. So there is a £138 (about $200) difference. That is quite a lot.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: sp_ on February 25, 2016, 01:53:10 PM

Quote from: apriyoni on February 25, 2016, 12:55:14 PM

The R9 nano costs £388 while the 970 costs £250. So there is £138 or $200 difference. that is quite a lot.

33% faster and 55% more expensive, but it draws less power.
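
Those percentages follow from the numbers quoted in this thread (28 vs 21 MH/s, £388 vs £250). A quick Python check, which also works out the purchase price per MH/s; it ignores power cost and resale value, so it is only part of the picture:

```python
nano_mhs, gtx970_mhs = 28.0, 21.0      # hash rates quoted in this thread
nano_gbp, gtx970_gbp = 388.0, 250.0    # prices quoted in this thread

speedup = (nano_mhs / gtx970_mhs - 1) * 100   # how much faster the Nano is, in %
premium = (nano_gbp / gtx970_gbp - 1) * 100   # how much more the Nano costs, in %
nano_cost = nano_gbp / nano_mhs               # GBP per MH/s
gtx970_cost = gtx970_gbp / gtx970_mhs         # GBP per MH/s

print(f"Nano: +{speedup:.1f}% hash rate, +{premium:.1f}% price")      # +33.3%, +55.2%
print(f"GBP per MH/s: Nano {nano_cost:.2f}, 970 {gtx970_cost:.2f}")   # 13.86, 11.90
```

On purchase price alone the 970 is cheaper per MH/s; whether the Nano's lower power draw makes up for that depends on electricity cost and how long the cards run.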

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: rednoW on February 25, 2016, 02:07:19 PM

Quote from: sp_ on February 25, 2016, 09:27:14 AM

You can get the gtx 970 to 21 MHASH by putting the gtx 970 in P1 mode. (nvidia-smi tool).

The best card for mining etherum is the r9 Nano. It does 28MHASH.

Nope, the best card for ETH is the 390X now; the Fury is good for Decred.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on February 26, 2016, 07:19:10 AM

You guys are all wrong: the best card to mine with is probably the 7950/7970, since it can be bought second-hand dirt cheap, and it gets 20 MH/s.

Buying the Nano or Fury? What are the chances that ETH will still be profitable the day you get ROI ?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Satlite on February 26, 2016, 08:34:50 AM

Quote from: sp_ on February 25, 2016, 01:53:10 PM

33% faster and 55% more expensive, but it draws less power..

In percentage terms, it could be a good deal if you can squeeze in 6 GPUs and so reduce the per-card overhead of the system.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: RustyNoman on March 10, 2016, 10:27:47 AM

I usually use 8 GPUs in a system: 4x7990 + 4x other GPUs. AMD allows up to 8 GPUs in a Windows system, so I use 8.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Akarabzie on March 10, 2016, 02:53:49 PM

Quote from: RustyNoman on March 10, 2016, 10:27:47 AM

I usually use 8 GPUs in a system: 4x7990 + 4x other GPUs. AMD allows up to 8 GPUs in a Windows system, so I use 8.

Most people don't like using the 7990s because they are pretty finicky and a pain to keep cool. I haven't had too much of a problem with mine after a pretty big underclock. I had one GPU go out on me while the other worked, and I had some problems with another one constantly crashing my system. I'd rather just run (5) 280Xs with no system downtime. Hey, if you actually got 4x7990s running with no issues, more power to you. Your rig is like what, 2100 watts?

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: asrilani on March 10, 2016, 04:03:09 PM

Quote from: Akarabzie on March 10, 2016, 02:53:49 PM

Quote from: RustyNoman on March 10, 2016, 10:27:47 AM

I usually use 8 GPUs in a system: 4x7990 + 4x other GPUs. AMD allows up to 8 GPUs in a Windows system, so I use 8.

I have 4x7990 + 4x7970. I undervolt and underclock them a lot: 950 mV, 850/1500 MHz. The power draw is about 1330 W and the hash rate is 156 MH/s.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: vatusasid on March 28, 2016, 07:58:09 AM

Quote from: asrilani on March 10, 2016, 04:03:09 PM

I have 4x7990 + 4x7970. I undervolt and underclock them a lot: 950 mV, 850/1500 MHz. The power draw is about 1330 W and the hash rate is 156 MH/s.

I have a similar configuration. The hash rate is just 149 MH/s. So the DAG file size has reduced the hash rate by 3%.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Akarabzie on March 29, 2016, 08:35:13 PM

Quote from: vatusasid on March 28, 2016, 07:58:09 AM

I have a similar configuration. The hash rate is just 149 MH/s. So the DAG file size has reduced the hash rate by 3%.

What are you guys using to undervolt your 7990s? I just looked at mine and realized they weren't actually changing from stock speeds. This may help me keep them running 24/7, since I still get the occasional crash from my 7990.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: Venon on April 01, 2016, 12:58:36 PM

I undervolt my 7990 to 950 mV, and the frequency is from 820 to 880 MHz, depending on the cards.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on September 28, 2016, 11:35:40 AM

Bumping this thread...

Wondering if the RX470/RX480 will be affected by the sudden drop in hashpower when the DAG file grows to 2050MB.

Is that Dag Simulator accurate? Seems that all cards would suffer at >2GB and not just the Tahiti based cards.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: nerdralph on September 28, 2016, 11:42:56 AM

Quote from: adaseb on September 28, 2016, 11:35:40 AM

AMD GCN does not have a TLB to trash. See pg. 10.
https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: ahsbqt on November 30, 2018, 07:45:45 PM

Old thread, but R9 390s are doing very badly these days (26 MH/s) thanks to the TLB bug.

Title: Re: Assessing the impact of TLB trashing on memory hard algorhitms
Post by: adaseb on December 01, 2018, 08:57:00 AM

Quote from: ahsbqt on November 30, 2018, 07:45:45 PM

Old thread, but R9 390s are doing very badly these days (26 MH/s) thanks to the TLB bug.

Yes, with the R9 290 it's even worse. I think I got 29MH/s at the stock clock settings (947/1250) with the Stilt BIOS. Now it gets less than 25MH/s, and despite the speed decrease the power consumption remains more or less the same, hence it's no longer profitable to mine with those GPUs.

Surprisingly they still hold a decent value for gamers and are selling on eBay for fair prices. Will most likely be putting mine up for auction soon. Highly doubt AMD will release a fix for the Hawaii chipsets.

Bitcoin Forum

Alternate cryptocurrencies => Mining (Altcoins) => Topic started by: Genoil on November 28, 2015, 10:24:19 AM