Bitcoin Forum
April 20, 2024, 03:00:21 AM *
1001  Alternate cryptocurrencies / Altcoin Discussion / Re: The impact of bad crypto (DASH, SDC, etc). How much does math matter? on: April 23, 2016, 12:47:04 AM
Cost for the attacker: Millions of USD to buy masternodes

There is no such cost, since nothing is consumed. You still have the masternodes when you are done. If done as a malicious attack that reduces the value of the token, an attacker would have already stripped his exposure from his stake and resold it via derivatives. If merely done for spying purposes then you can continue to both spy and collect masternode rewards. This will outcompete honest masternodes over time since spying has economic value.

Yeah, well, if you want it that way, buying the monero supply to sybil all the mixins (a la BCN-83% "their mixin is insecure"), would also be a feasible economic attack. So XMR = REKT by the "theoretical" buyer of most coins  Cry Cry Cry

In practice it doesn't work that way.
1002  Alternate cryptocurrencies / Altcoin Discussion / Re: The impact of bad crypto (DASH, SDC, etc). How much does math matter? on: April 23, 2016, 12:27:06 AM
Mixin 0 was always a bad idea and a security weakness, no matter if a deanonymizing implementation was getting it right, wrong, or guessing.
1003  Alternate cryptocurrencies / Altcoin Discussion / Re: The impact of bad crypto (DASH, SDC, etc). How much does math matter? on: April 22, 2016, 11:16:52 PM
The poll was not focused on privacy, the danger of high school level mathematics is far greater than that.

The "high school mathematics" mentioned were of the following type:

"If someone has XXX masternodes then they can jam an InstantX transaction X% of the time because InstantX locking is performed on the masternodes".

Cost for the attacker: Millions of USD to buy masternodes
Gains for the attacker: No gains. Only losses by devaluing his investment.

Elementary game theory logic = violated.
1004  Alternate cryptocurrencies / Altcoin Discussion / Re: The impact of bad crypto (DASH, SDC etc). How much does math matter? on: April 22, 2016, 10:51:14 PM
This issue is related to privacy not "bad crypto" or math.

It's bad crypto alright. Monero users were transacting "anonymously" for a year only to discover later that they could be trivially deanonymized because those in charge hadn't fixed a "hole" in the system from the start.

As to the InstantX jamming theoretical attack:

The attack vector on InstantX was about the attacker owning hundreds or thousands of masternodes (i.e. paying tens of millions of USD to acquire them) just to ...jam an InstantX transaction, which, if the lock failed, would simply go through as a standard transaction.

So, the game theory of the attack vector is that someone will pay tens of millions of dollars to jam an InstantX transaction, while undermining his own investment in the process.

Do you see that the game theory of the attack vector is completely broken in terms of costs to the attackers and gains for the attacker?

That's elementary logic right there.

It would be like saying "bitcoin is fundamentally flawed because someone could buy 51% of the mining equipment and attack it". Yeah, well, if they did that, their equipment would then be useless. It's economic suicide for the attacker, so to speak. The game theory has to account for this, no?
1005  Alternate cryptocurrencies / Altcoin Discussion / Re: The impact of bad crypto (DASH, SDC etc). How much does math matter? on: April 22, 2016, 10:39:36 PM
XMR / Monero broken crypto:

I think chainradar are using all the 0 mixin transactions from exchanges and pools in order to guess - the things in https://lab.getmonero.org/pubs/MRL-0004.pdf. I tried some transactions with mixing 7 and 5 between my wallets and they are successfully guessing most of them. This issue is already addressed in the MRL-0004 and we knew that, but it's scary seeing it in chainradar. Everybody should stop using mixing of 0 until this is enforced in the protocol - including pools and exchanges. I suppose some mixings between your own wallets with high mixing should resolve the issue for now. Trollfest incoming Sad.

Cry Cry Cry
1006  Alternate cryptocurrencies / Altcoin Discussion / Re: Why the bitmonero/monero Ninjalaunched Cripplemined Fastmine matters on: April 22, 2016, 01:45:42 AM
Monero - Cloning the premined uber-scam "Bytecoin" and avoid scanning the code for fraudulent algorithms
Monero - Insisting on having oh so much integrity yet irresponsibly/willfully shipping a crippled shit-miner to unsuspecting victims/"users"
Monero - Ninjamining an unknown shit-ton of coins within our inner circle through optimized miners while the public gets the crippled one
Monero - The perfect instrument for ninjamining because our CryptoNote-blockchain (aka copycat technology) is opaque
Monero - Projecting our fraud on DASH and yelling "instamine" because no one can see that we are the actual scammers!
Monero - Evan should have chosen CryptoNote for DASH to hide his 5 trillion DASH instamine like we did with our ninjamine!
Monero - Failing to come up with a single innovation of our own and yelling "stop playing the innovation card" when DASH is proven as superior (butthurt)

Monero - Cloning a scam, putting make up on the pig, hoping no one notices

Monero - Too little, too late

Monero -  Embarrassed

Forgot the artificial barrier to entry (to keep noobz out and increase insider sharing) due to the ...cmd-line nature of the coin...
1007  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN]: cpuminer-opt, New v3.1.16 on: April 22, 2016, 01:34:05 AM
Useful link for replacing slow & obsolete implementations: http://bench.cr.yp.to/primitives-hash.html

Perhaps if one googles algo by algo, they can find even better (?).
1008  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN]: cpuminer-opt, New v3.1.16 on: April 21, 2016, 09:26:20 PM
No idea, haven't looked into CUDA mining.
1009  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN]: cpuminer-opt, New v3.1.16 on: April 21, 2016, 08:22:18 PM
It's not a new idea. It was used back in the GPU bitcoin mining days to get better speed on amd VLIW cards.
It's easy to adapt the miner itself to process multiple nonces per thread, not sure about how much work is needed to work on the algos themselves. Maybe we could make a test with a simple algo like blake. But I'm not the man because I'm not proficient in those cpu instruction extensions.

Neither am I, but it's not that difficult.

Say for example you have a loop like:


for (i = 0; i < 100000000; i++) {
   b = sqrt(b);
   bb = sqrt(bb);
   bbb = sqrt(bbb);
   bbbb = sqrt(bbbb);
}


...gcc will make it something like:

40072e:   0f 84 9b 00 00 00       je     4007cf <main+0x12f>
  400734:   f2 0f 51 d6             sqrtsd %xmm6,%xmm2
  400738:   66 0f 2e d2             ucomisd %xmm2,%xmm2
  40073c:   0f 8a 63 02 00 00       jp     4009a5 <main+0x305>
  400742:   66 0f 28 f2             movapd %xmm2,%xmm6
  400746:   f2 0f 51 cd             sqrtsd %xmm5,%xmm1
  40074a:   66 0f 2e c9             ucomisd %xmm1,%xmm1
  40074e:   0f 8a d9 01 00 00       jp     40092d <main+0x28d>
  400754:   66 0f 28 e9             movapd %xmm1,%xmm5
  400758:   f2 0f 51 c7             sqrtsd %xmm7,%xmm0
  40075c:   66 0f 2e c0             ucomisd %xmm0,%xmm0
  400760:   0f 8a 47 01 00 00       jp     4008ad <main+0x20d>
  400766:   66 0f 28 f8             movapd %xmm0,%xmm7
  40076a:   f2 0f 51 c3             sqrtsd %xmm3,%xmm0
  40076e:   66 0f 2e c0             ucomisd %xmm0,%xmm0
  400772:   0f 8a b5 00 00 00       jp     40082d <main+0x18d>

...which is sqrt-scalar-double.

4 instructions / 4 math operations.

What could be done differently (intel syntax follows):

     movlpd xmm1, b      //loading the first variable "b" to the lower part of xmm1
     movhpd xmm1, bb     //loading the second variable "bb" to the higher part of xmm1
     SQRTPD xmm1, xmm1   //batch processing both variables for their square root, with one SIMD command
     movlpd xmm2, bbb    //loading the third variable "bbb" to the lower part of xmm2
     movhpd xmm2, bbbb   //loading the fourth variable "bbbb" to the higher part of xmm2
     SQRTPD xmm2, xmm2   //batch processing their square roots
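     movlpd b, xmm1      //storing the lower part of xmm1 back to "b"
     movhpd bb, xmm1     //storing the higher part of xmm1 back to "bb"
     movlpd bbb, xmm2    //storing the lower part of xmm2 back to "bbb"
     movhpd bbbb, xmm2   //storing the higher part of xmm2 back to "bbbb"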

SQRTPD - Square root - P(acked)-Double.
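For anyone who doesn't want to touch asm, roughly the same packing can be written in C with the SSE2 intrinsics from emmintrin.h. This is only a sketch: the helper name sqrt4 is made up for illustration, and the four variables are assumed to sit in one small array so they can be loaded two at a time. Without SSE2 it falls back to plain scalar sqrt.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_loadu_pd, _mm_sqrt_pd, _mm_storeu_pd */
#else
#include <math.h>        /* scalar fallback only */
#endif

/* Hypothetical helper (not from any miner): replaces each of the four
   doubles in v with its square root, two per instruction when SSE2 is
   available. v[0..3] play the role of b, bb, bbb, bbbb. */
void sqrt4(double v[4])
{
#if defined(__SSE2__)
    __m128d lo = _mm_loadu_pd(&v[0]);   /* b and bb into one xmm register      */
    __m128d hi = _mm_loadu_pd(&v[2]);   /* bbb and bbbb into another register  */
    lo = _mm_sqrt_pd(lo);               /* SQRTPD: two square roots at once    */
    hi = _mm_sqrt_pd(hi);
    _mm_storeu_pd(&v[0], lo);           /* results back to memory              */
    _mm_storeu_pd(&v[2], hi);
#else
    for (int i = 0; i < 4; i++)         /* one root at a time */
        v[i] = sqrt(v[i]);
#endif
}
```

Calling sqrt4 on an array holding 4, 9, 16 and 25 should leave 2, 3, 4 and 5 in it, with two packed square roots instead of four scalar ones on the SSE2 path.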

So now the 4 math instructions became 2 and the run time dropped to about half (I've actually benchmarked the above and it comes close to half). But in order to pack instructions (math or logical) you need a similar processing load and similar operations. You can't have that in a scenario where it goes like

sqrt
add
shift
xor

and the function is changing...

But if you loaded 4x hashes together, you'd be looking at

sqrt(of the first) sqrt (of the second) sqrt (third) sqrt (fourth) (<=pack them)
add add add add (<=pack them)
shift shift shift shift (<=pack them)
xor xor xor xor (<=pack them)

...etc
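To make the lockstep idea concrete, here is a toy sketch in plain C. The round function (add, rotate, xor) is invented purely for illustration and is not any real hash; LANES and both function names are assumptions. The point is only the shape: the x4 version applies one kind of operation to all four states before moving to the next, which is the layout a packed add/shift/xor needs.

```c
#include <stdint.h>

#define LANES 4   /* assumed batch size: four hash candidates in lockstep */

/* One toy round applied to a single state word. NOT a real hash,
   only a stand-in for "one step of the algo". */
uint32_t toy_round_scalar(uint32_t x, uint32_t k)
{
    x += k;                          /* add            */
    x = (x << 7) | (x >> 25);        /* rotate left 7  */
    return x ^ k;                    /* xor            */
}

/* The same round applied to LANES independent states, one operation
   type at a time, so each inner loop is a candidate for a single
   packed instruction. */
void toy_round_x4(uint32_t s[LANES], uint32_t k)
{
    for (int i = 0; i < LANES; i++)  /* add add add add */
        s[i] += k;
    for (int i = 0; i < LANES; i++)  /* shift shift shift shift */
        s[i] = (s[i] << 7) | (s[i] >> 25);
    for (int i = 0; i < LANES; i++)  /* xor xor xor xor */
        s[i] ^= k;
}
```

Each lane ends up with exactly what toy_round_scalar would give for that lane alone. Whether a compiler actually turns the three loops into packed instructions depends on the compiler and flags.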

I wasn't even aware of the above until a couple of weeks ago, when I got down to asm level to see what happens and why some Pascal output was slower than C output. Then I ran into http://x86.renejeschke.de as a reference while trying to understand the instructions and what they do, and rewrote some instructions myself, like the packed example above (I thought it was pretty easy really). More recently, I went over the asm code of the hash functions of altcoins and bitcoin, and it was full of serial operations, despite "SSE/AVX use" / "SSE/AVX enhanced". And I'm like WHAT THE F***? This is all crippled.
1010  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN]: cpuminer-opt, New v3.1.16 on: April 21, 2016, 08:06:33 PM
May I propose a different approach for much faster mining?

Currently, most, if not all of CPU-mineable coins, are cripple-mined.

The reason is simple: Under-utilizing of the SIMD nature of SSE & AVX sets.

SSE and AVX commands are used in SISD fashion (single instruction single data, instead of Multiple data / SIMD), meaning they are not processing 2 batches of information but one.

Right now hashing goes on like that:

The main mining routine sends one input to each hash, where it is subjected to a process of SERIAL transmutations / permutations, and in the end the hash outputs that data back to the miner (sometimes to send it on to the next hash).

This serial process doesn't allow for much Single Instruction Multiple Data utilization.

What should be done instead is that the miner program should issue 2-4 hash candidates to the hashing routines. The hashing routines should be able to take 2-4 inputs (instead of 1) and return 2-4 outputs. In this way the process would be parallelized and SIMD utilization (packed processing of similar instructions) would result in much faster processing.

Now this might require a lot of recoding, or, one could adjust the code in C for use with a special compiler which runs multiple instances of serial data crunching in order to process them in "packs" with SIMD or "packed" instructions - and then let the compiler do all the packing. Performance benefits of such an approach here: http://ispc.github.io/perf.html

That's a fascinating idea but I don't think it will get the visibility here that it deserves. Pooler and TPruvot are the two main guys for CPU mining although TPruvot is focussed more on other projects at the moment. Both have active threads in this forum. I suggest you present your idea to them in case they, or their followers, may want to take on the challenge. It's beyond my skill level.

It's ok, don't worry. Some people reading this thread will know what to do with it.

I'm not really into altcoin mining, as I don't have the hardware and I'm not in the mood for renting. Obviously there's a lot of money here for optimized miners that achieve multiples of the hashrate of the ordinary ones. But this idea also extends to the scaling of bitcoin and altcoins, for things like cryptographic verification etc. They are using serial functionality when it could be done in packs of 2 or 4 (or 8 in something like ...AVX3-4-5 - or AVX512 which already exists).
1011  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN]: cpuminer-opt, New v3.1.16 on: April 21, 2016, 06:48:56 PM
May I propose a different approach for much faster mining?

Currently, most, if not all of CPU-mineable coins, are cripple-mined.

The reason is simple: Under-utilizing of the SIMD nature of SSE & AVX sets.

SSE and AVX commands are used in SISD fashion (single instruction single data, instead of Multiple data / SIMD), meaning they are not processing 2 batches of information but one.

Right now hashing goes on like that:

The main mining routine sends one input to each hash, where it is subjected to a process of SERIAL transmutations / permutations, and in the end the hash outputs that data back to the miner (sometimes to send it on to the next hash).

This serial process doesn't allow for much Single Instruction Multiple Data utilization.

What should be done instead is that the miner program should issue 2-4 hash candidates to the hashing routines. The hashing routines should be able to take 2-4 inputs (instead of 1) and return 2-4 outputs. In this way the process would be parallelized and SIMD utilization (packed processing of similar instructions) would result in much faster processing.

Now this might require a lot of recoding, or, one could adjust the code in C for use with a special compiler which runs multiple instances of serial data crunching in order to process them in "packs" with SIMD or "packed" instructions - and then let the compiler do all the packing. Performance benefits of such an approach here: http://ispc.github.io/perf.html
1012  Economy / Speculation / Re: Wall Observer BTC/USD - Bitcoin price movement tracking & discussion on: April 20, 2016, 07:18:50 PM
...<Mike Hearn crap>...

+

Quote
I don’t believe fees will become high and stable if Bitcoin runs out of capacity. Instead, I believe Bitcoin will crash.

Mike Hearn on May 7th 2015

https://medium.com/@octskyward/crash-landing-f5cc19908e32#.bxovk7dwr

On a related discussion, back then, about the above FUD/"prediction".
=>

We've actually hit the soft limit before, consistently for long periods and did not see the negative effects described there (beyond confirmation times for lower fee transactions going up, of course).
....
The comments about the filling up memory stuff are grimly amusing to me in general for reasons I currently can't discuss in public (please feel free to ping in six months).
1013  Alternate cryptocurrencies / Altcoin Discussion / Re: [neㄘcash, ᨇcash, net⚷eys, or viᖚes?] Name AnonyMint's vapor coin? on: April 19, 2016, 10:24:22 PM
1) The miner's main routine sends to the hashing routine 2 (SSE) (or 4 for AVX) inputs for checking, not one.

2) A hashing routine with 2 (or 4) inputs and 2 (or 4) outputs.

That is normally how SIMD code is written yes.

But as you may have seen in the asm file above, there's hardly any SIMD in there. It's mostly serial operations. Miner program sending one input, expects one output from the hashing subroutine, thus nothing to do in batches of "same".

Quote
If you think CPU miners can be optimized to do more SIMD processing, and you are probably right in at least some cases, then go optimize and make some money.

I believe it's not only CPU miners that are at stake here. Even signatures could be validated 2-4 at a time, on the same subroutine, to get SIMD benefits. So scaling is affected as well. It would need the main routine to be adjusted accordingly: instead of sending 1 item for processing (per thread), it would send 2-4 (per thread) and expect 2-4 back. Heck, even cracking speed could be improved.
1014  Alternate cryptocurrencies / Altcoin Discussion / Re: [neㄘcash, ᨇcash, net⚷eys, or viᖚes?] Name AnonyMint's vapor coin? on: April 19, 2016, 10:01:41 PM
I'll not reply again unless this gets more interesting, it is repetitive now.

While moving some strings to get compilers better at SIMD processing (btw ICC is crappy too in non-array use -I tested it), I think I stumbled on something which could be... well.... interesting...

I was going over the asm files of known hashtypes, in files like this: https://github.com/pooler/cpuminer/blob/master/sha2-x64.S

...to check what is the level of packed instructions that can cut cycles by 2.

(Don't mind that the case above is about bitcoin (which is now ASICed) - this can be pretty relevant for a whole lotta cases out there, including cpu altcoin mining).

So I'm going over the lines... and I'm like, where can I pack stuff together to make a difference, you know... And while I saw some stuff that can be packed, they are not generally too many because the action is serial... one after the other permutation. Then I did a lookup on the file to see how many registers it uses. It's up to XMM11. So there are quite a few extra registers to play around with. And then BAM. It hit me.

It's all wrong on how these hashtypes are used for mass-hashing. You can't do it one by one.

With hashing they are inserting some data and using sequential operations, where one permutation goes to the next, doing some kind of altering to the data, moving bits around etc etc, and in the end you get the hash. One input, one output, no parallelism - except in ...another thread. You can do that, say, 4 times in a quad core.

But that's wrong because you get no SIMD action and packing per thread, to cut the processing clock cycles in half.

What is needed is a mining program that does this:

1) The miner's main routine sends to the hashing routine 2 (SSE) (or 4 for AVX) inputs for checking, not one.

2) A hashing routine with 2 (or 4) inputs and 2 (or 4) outputs.

While inside the hashing routine, and since the routine will be doing the exact same thing for 2 hashes, these operations can be done in SIMD fashion - with packed instructions, instead of serial/scalar. Say the first step of the hash is "we do this to that", but since we also have another "this" we can pack them both and do it once. And then there are some more benefits in the maths involving the prime number tables, where you can work in parallel by loading the prime into a register with movddup, moving the data from the first hash to the lower part of an xmm register and the data from the second hash to the upper part of the xmm register. Then you do them both, and you're -1 mov too. LOL. The more "fixed data" there is in the hash, the better for parallelism within the same routine - if you load 2 or 4 hashes.

The routine will have to be custom written to process at least 2 or 4 hashes in parallel in order to be able to use packed instructions. In those stages where packing can't be done for whatever reason, the routine will process the stages in a serial fashion (as it would, normally).

3) In the end the routine returns to the main mining routine 2 (or 4) outputs.
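The three steps can be sketched in C roughly like this. Everything here is hypothetical: toy_hash_batch is a made-up, non-cryptographic stand-in for a real hash, BATCH is assumed to be 4, and scan_nonces is just one way a miner's main routine could feed it. What matters is the interface, 4 inputs in and 4 outputs back per call.

```c
#include <stdint.h>

#define BATCH 4   /* assumed: 4 candidates per call (2 would match plain SSE) */

/* Step 2: a hashing routine with BATCH inputs and BATCH outputs.
   Toy mixing only, NOT a real hash. Each stage runs across all
   candidates before the next stage starts, so it can be packed. */
void toy_hash_batch(uint32_t header, const uint32_t nonce[BATCH],
                    uint32_t out[BATCH])
{
    for (int i = 0; i < BATCH; i++) out[i] = header ^ nonce[i];
    for (int i = 0; i < BATCH; i++) out[i] *= 2654435761u;
    for (int i = 0; i < BATCH; i++) out[i] ^= out[i] >> 15;
}

/* Steps 1 and 3: the main routine sends BATCH nonces per call and gets
   BATCH hashes back. Scans count nonces starting at start (count is
   assumed to be a multiple of BATCH). Returns 1 and stores the winning
   nonce in *found if some hash is below target, otherwise returns 0. */
int scan_nonces(uint32_t header, uint32_t start, uint32_t count,
                uint32_t target, uint32_t *found)
{
    for (uint32_t base = 0; base + BATCH <= count; base += BATCH) {
        uint32_t nonce[BATCH], out[BATCH];
        for (int i = 0; i < BATCH; i++)
            nonce[i] = start + base + (uint32_t)i;
        toy_hash_batch(header, nonce, out);
        for (int i = 0; i < BATCH; i++)
            if (out[i] < target) {
                *found = nonce[i];
                return 1;
            }
    }
    return 0;
}
```

A real routine would replace the toy stages with the actual rounds of the algo and fall back to serial code for the stages that can't be packed.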

Supposing the bottleneck is CPU and not RAM (but even if it is RAM, the CPU will be finishing faster) we are talking about gains that could be very serious (triple digit % on AVX).

The implications of the above, is that, well, every single cpu altcoin is currently cripple-mined.

How is that for interesting? Cheesy
1015  Bitcoin / Bitcoin Discussion / Re: Ever dreamed about Bitcoin? on: April 19, 2016, 08:00:05 PM
I dream about bitcoin very often.
Now it is not too often, but when I started with bitcoin back in 2013 I would dream about bitcoin almost every night.

Ladies and gentlemen, we have a winner right here Cheesy
1016  Alternate cryptocurrencies / Altcoin Discussion / Re: [neㄘcash, ᨇcash, net⚷eys, or viᖚes?] Name AnonyMint's vapor coin? on: April 19, 2016, 06:41:10 PM
I've been thinking about social platforms and stuff. These are kind of peaking, or are past their peak. Meaning that fatigue is increasing with their use, but there is one thing where the need is very persistent: The basic problem right now on the internet is the lack of a unified messaging system.

Before ICQ there was nothing really (IRC was very different and non-practical).

Then MSN and yahoo etc came along, ICQ was bought out and buried. Then, years later, skype came, but it too was on a fragmented market where some had skype, another had msn, etc etc. Now it's the same with viber, whatsapp, skype, fb messenger...

What people need is really an open protocol where clients can work with it, but something that a corporation can't buy out. We don't need a new ICQ or msn where suddenly a company decides to change it or even make it something else (=>skype). If continuity is a goal (=making something that can endure, like email for example), then it must operate as an open protocol.

Perhaps even the email protocol itself can be used to do that. Like an email-on-steroids, in terms of speed, with similar functionality but different ports + encryption. Max message size could be enforced at something like a few kilobytes for speed of delivery, but in order to eliminate the possibility of spam on the network, one would have to "add" another in their "contact list" with something like a key exchange. Someone would be unable to receive email-type IMs from unknown senders - so there would be no incentive by spammers to send messages that will never be received. Every client could then work on top of that protocol.
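The acceptance rule described above (size cap plus "only from people in my contact list") is simple enough to sketch in C. This is just an illustration under assumptions: MAX_MSG_BYTES, accept_message and the idea of identifying contacts by a key string are all made up here, and a real protocol would verify a signature rather than compare strings.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MSG_BYTES 4096   /* assumed cap, standing in for "a few kilobytes" */

/* Hypothetical receive-side filter: returns 1 if the message should be
   accepted, 0 if it should be dropped. contacts[] holds the keys that
   were exchanged when each contact was added. */
int accept_message(const char *const contacts[], size_t n_contacts,
                   const char *sender_key, size_t msg_len)
{
    if (msg_len > MAX_MSG_BYTES)        /* oversized messages are dropped */
        return 0;
    for (size_t i = 0; i < n_contacts; i++)
        if (strcmp(contacts[i], sender_key) == 0)
            return 1;                   /* known sender: accept */
    return 0;                           /* unknown sender: never delivered */
}
```

With a rule like this, a message from a key that was never added is simply never delivered, which is what removes the incentive to spam.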

Another approach would be a meta-messenger. Something that renders irrelevant what your messenger is, just like metacrawler used to make it irrelevant whether yahoo, altavista, infoseek or lycos was the next best site to visit. Why get 1 result when you can get all of the best results in one site? But a meta-messenger would have to somehow bypass the restrictions imposed by each application. There were many meta-messengers in the 2000s, like Trillian, Pidgin etc, but they usually left out some large network, which made them unfit as universal messengers. And they were also targeted at the more l33t users, so...
1017  Alternate cryptocurrencies / Altcoin Discussion / Re: The Ethereum Paradox on: April 18, 2016, 10:20:26 PM
I found it extremely hard to believe that floating point can't be supported. Floating point is required in a lot of financial transactions.

Fixed (decimal) point suffices in most cases, which is easily represented with integers.

Bitcoin fractions have 8 decimal places and thus we can represent arbitrary BTC amounts as integer
multiples of 10^-8 BTC (satoshis).

I was doing an excel today with the /2 structure. Look what happens at the 10th halving:

5000000000 (=50btc X 100mn satoshis)
2500000000
1250000000
625000000
312500000
156250000
78125000
39062500
19531250
9765625
4882812.5 (<fractional)

If this is specified as an integer, won't that result in a crash instead of a rounding?
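On the integer question: with integer arithmetic in C, halving does not crash, it just drops the fraction. As far as I know Bitcoin Core computes the subsidy with a right shift on a 64-bit integer, along the lines of the sketch below. The function name block_subsidy is mine, not the real one.

```c
#include <stdint.h>

#define COIN 100000000LL   /* satoshis per BTC */

/* Block subsidy in satoshis after the given number of halvings.
   Sketch only: name and signature are assumptions for illustration. */
int64_t block_subsidy(int halvings)
{
    /* Shifting a 64-bit value by a negative amount or by 64 or more
       is undefined in C, so those cases are handled explicitly. */
    if (halvings < 0 || halvings >= 64)
        return 0;

    int64_t subsidy = 50 * COIN;   /* 5000000000 satoshis */
    subsidy >>= halvings;          /* integer halving: the fraction is dropped */
    return subsidy;
}
```

For the table above, block_subsidy(9) gives 9765625 and block_subsidy(10) gives 4882812, so the half satoshi is simply truncated. The value reaches 0 once the shift exceeds the size of the number (from 33 halvings on).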

In other news, eth-related:

That sounds like the start of the end for Ethereum. When you can't even manage to keep a forum online. Let me guess: big salary for the creators and now their own wallet is empty.
....
https://bitcointalk.org/index.php?topic=1441198.20

lol?
1018  Economy / Speculation / Re: Wall Observer BTC/USD - Bitcoin price movement tracking & discussion on: April 18, 2016, 12:23:30 AM
are the blocks full because of spam tx's or genuine ones? like is someone artificially filling blocks atm or is it natural?

If there is genuine demand for blockspace, you'll see the fees rising significantly coupled with tx spillover to altcoin blockchains.

Given the above:

1) Long-established altcoins have very low tx count
2) BTC fees are EXTREMELY low / practically zero cost, indicating low tx demand (for genuine uses).

https://bitcoinfees.21.co/





1019  Alternate cryptocurrencies / Altcoin Discussion / Re: [neㄘcash, ᨇcash, net⚷eys, or viᖚes?] Name AnonyMint's vapor coin? on: April 16, 2016, 01:11:37 PM
Yes, the streaming argument is valid, but the processor is capable of more than that.

Compilers are not superoptimizers. They can't and don't promise to do everything a processor is capable of.

Basically that brings us back to the starting point... When C was first created, it promised to be very fast and suitable for creating OS'es, etc. Meaning, its compiler wasn't leaving much performance on the table. With khz of speed and few kbytes of memory there was no room for inefficiency.

Granted, the instruction set has expanded greatly since the 70's with FPUs (x87), MMX, SSE(x), AVX(x), AES, etc, but that was the promise. To keep the result close to no overhead (compared to asm). That's what C promised to be.

But that has gone out the window as the compilers failed to match the progress and expansion of the cpu's arsenal of tools. We are 15 years after SSE2 and we are still discussing why the hell isn't it using SSE2 in a packed manner. This isn't normal by my standards.

If you look at the basic C operations, they all pretty much correspond to a single (or very small number) CPU instruction from the 70s. As TPTB said, it was pretty much intended to be a thin somewhat higher level abstraction but still close to the hardware. Original C wasn't even that portable in the sense that you didn't have things like fixed size integer types and such. To get something close to that today you have to consider intrinsics for new instructions (that didn't exist in the 70s) as part of the language.

The original design of C never included all these highly aggressive optimizations that compilers try to do today, that was all added later.  Back in the day, optimizers were largely confined to the realm of FORTRAN. They succeed in some cases for C of course, but it's a bit of a square peg, round hole.

AlexGR, assembly wouldn't regroup instructions to make them SIMD if you hadn't explicitly made them that way. So don't insinuate that C is a regression.

I'm not even asking it to guess what is going to happen. It won't do it, even after profiling the logic and flow (monitoring the runtime - and recompiling after taking notes of the runtime that will be used for the PGO build).

Quote
C has no semantics for SIMD, rather it is all inferred from what the invariants don't prevent. As smooth showed, that C compiler's heuristics are able to group into SIMD over loops but for that compiler and flag choices, it isn't grouping over unrolled loops.

What they are doing is primarily confining the use to arrays. If you don't use an array you are fucked (not that if you use arrays it will produce optimal code, but it will tend to be much better).

However, with AVX gaining ground, the losses are now not 2-4x as with leaving packed SSE SIMDs out, but rather 4-8x - which is quite unacceptable.

The spread between what your hardware can do and what your code generates in terms of result is increasing by the day. When you can group together 4 or 8 instructions in one, it's like your processor is not running at 4 ghz, but rather at 16ghz or 32ghz - as you want 1/8th the clock cycles for batch processing.

Contemplate this: In the 70s, "C is fast" meant something like we are leaving 5-10% on the table. Now it's gone up to 87.5% left on the table in an AVX scenario where 8 commands could be packed and batch processed, but weren't. Is this still considered fast? I thought so.

Quote
Go write a better C compiler or contribute to an existing one, if you think that optimization case is important.

I can barely write programs, let alone compilers Tongue But that doesn't prevent me from understanding their deficiencies and pointing them out. I'm very rusty in terms of skills - I've not been doing much since the 90s. Even the asm code has changed two times from 16 bit, to 32 bit, to 64 bit...

Quote
If you want explicit SIMD, then use a language that has such semantics. Then you can blame the compiler for not adhering to the invariants.

Actually there are two-three additions in the form of #pragma that help the compiler "get it"... mainly for icc though. GCC might have ivdep.
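For reference, GCC does have it, spelled "#pragma GCC ivdep" and placed right before the loop. A minimal sketch of how it is used follows. The function scale_add is made up for illustration, and the preprocessor guard is only there so other compilers don't complain about an unknown pragma.

```c
#include <stddef.h>

/* Hypothetical example: dst[i] = a[i] * k + b[i] over an array.
   The pragma tells GCC to assume there are no loop-carried
   dependencies (e.g. that dst does not overlap a or b in a harmful
   way), which removes one common reason for refusing to vectorize. */
void scale_add(float *dst, const float *a, const float *b, float k, size_t n)
{
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#pragma GCC ivdep
#endif
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * k + b[i];
}
```

Note that it still works on arrays and loops, so it doesn't help with the case of separate scalar variables. It is also a promise by the programmer: if the arrays really do overlap, the result may be wrong.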
1020  Alternate cryptocurrencies / Altcoin Discussion / Re: [neㄘcash, ᨇcash, net⚷eys, or viᖚes?] Name AnonyMint's vapor coin? on: April 16, 2016, 02:41:52 AM
The thing with superoptimizers is that they might work differently on same instruction sets but different cpus.

Say one cpu has 32k l1 cache and another has 64k l1.

You can time all possible combinations on a 32k processor and the winner might be a totally different combo from what a 64k l1 processor needs. The unrolling and inlining may be totally different, as can the speed of execution of certain instructions where a cpu might be weaker.

So what you need is to superoptimize all profiles for all cpus, store these cpu-data-paths and then, on runtime, perform a cpu id check and execute the right instruction-path for that cpuid.
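The "check the cpu at runtime, then pick the path" part can be sketched in C with function pointers. Big caveats: this checks an instruction-set feature (AVX2) through the GCC/Clang builtins, not the cache size discussed above, and the two sum functions are made-up stand-ins for "a path tuned for cpu A" and "a path tuned for cpu B".

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_fn)(const uint32_t *, size_t);

/* Generic path: plain loop. */
uint32_t sum_generic(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a path tuned for a specific cpu: unrolled by 4. */
uint32_t sum_unrolled4(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)               /* leftover elements */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}

/* Runtime check: choose the path once, based on what the cpu reports. */
sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_unrolled4;
#endif
    return sum_generic;
}
```

The caller does pick_sum() once at startup and then calls through the returned pointer. Both paths must give the same result, only the speed differs.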

As for the

Quote
On the merits of the optimizations, you're still looking at one particular program, and not considering factors such as compile time, or how many programs those sorts of optimizations would help, how much, and how often.

Let me just add that if you go over at john the ripper's site, it has like support for dozens of ciphers. People, over there, are sitting and manually optimizing these one by one because optimizations suck. The c code is typically much slower. This is not "my" problem. It's a broader problem. The fact that very simple code can't be optimized just illustrates the point. You have to walk before you can run. How will you optimize (unprunable) complex code if you can't optimize a few lines of code? You won't. It's that simple really.