Author Topic: [XMR] Monero Improvement Technical Discussion  (Read 14659 times)
TPTB_need_war
Sr. Member
****
Offline Offline

Activity: 420
Merit: 257


View Profile
August 20, 2015, 06:16:57 AM
Last edit: August 20, 2015, 07:02:31 AM by TPTB_need_war
 #21

I am trying to recall and regenerate some of my thought processes from my 2013 research about memory-hard hash function designs.

The basic idea of Scrypt is to one-way hash a value and write the hash output to a memory location, hash that output and write the result to the next memory location, and repeat until all the memory you want to use is written. Then randomly read from memory locations, using the value read to determine the next memory location to read. The hope is that your algorithm becomes random-access latency bound. However, the problem is that a GPU or ASIC can trade computation for memory size by only storing every Nth hash output and then calculating on the fly the other hash values as they are accessed. So then the algorithm becomes more compute bound than latency bound. However, that doesn't necessarily mean hashes/watt is superior with the compute bound version of the algorithm. It depends on the relative power inefficiencies of the smaller memory versus the increased computation. Also, if hashes per capital cost is a factor in the return on investment, then hashrate versus cost of the silicon hardware for the two algorithms has to be compared, noting that smaller caches typically have nearly an order-of-magnitude lower random access latency.
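Roughly, the pattern looks like this in Python (a simplified sketch, not the actual Scrypt construction; the hash, word size, and checkpoint scheme are illustrative placeholders):

Code:
import hashlib

def H(x):
    # stand-in for the one-way hash (Scrypt actually uses Salsa20/8, not SHA-256)
    return hashlib.sha256(x).digest()

def fill(seed, size):
    # sequentially write chained hash outputs until the scratchpad is full
    mem = [H(seed)]
    while len(mem) < size:
        mem.append(H(mem[-1]))
    return mem

def mix(mem, rounds):
    # data-dependent reads: the value just read picks the next location
    x = mem[-1]
    for _ in range(rounds):
        i = int.from_bytes(x[:4], "little") % len(mem)
        x = H(x + mem[i])
    return x

def lookup_tmto(checkpoints, n, i):
    # the GPU/ASIC trade: keep only every nth entry (checkpoints[k] == mem[k*n])
    # and recompute mem[i] on the fly with at most n-1 extra hashes
    x = checkpoints[i // n]
    for _ in range(i % n):
        x = H(x)
    return x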

I suppose the ASICs and GPUs mining Litecoin proved that the factors favored the compute bound algorithm, at least for the Scrypt parameters Litecoin uses.

The next attempt at an improvement over Scrypt (which I wrote down in 2013, well before the CryptoNight proof-of-work hash was published) was to not just read pseudo-randomly as described above, but to write new pseudo-random values to each pseudo-randomly accessed location. This defeats storing only every Nth random access location. However, it is still possible to trade latency for computation by storing only the location that was accessed to compute the value to store, instead of storing the entire value. The compute bound version of the algorithm cannot be defeated by reading numerous locations before storing a value, because only the starting seed needs to be stored.
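Continuing the same sketch (reusing the placeholder H and a filled scratchpad from above), the read-write variant overwrites each location it touches, so keeping only every Nth entry no longer reconstructs an entry cheaply; an attacker has to replay the access history from the seed instead:

Code:
def mix_rw(mem, rounds):
    # pseudo-random read, then write a new pseudo-random value back to the
    # same location; the "store every Nth entry" shortcut now fails because
    # entries are continually overwritten in a data-dependent order
    x = mem[-1]
    for _ in range(rounds):
        i = int.from_bytes(x[:4], "little") % len(mem)
        x = H(x + mem[i])
        mem[i] = x
    return x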

Thus I concluded it is always possible to trade computation for latency in any memory-hard hash function.

ASICs are usually orders-of-magnitude more efficient at preordained computation, thus they will probably always be able to use a smaller cache than is used by the memory-hard CPU algorithm.

Even if the ASIC did not trade more computation for less latency, the CPU will be using more electricity to do the computational parts of the latency bound algorithm, such as the one-way hash functions or however the data is manipulated before each write to make it pseudo-random. Even though the CPU can mask the time required to do this computation by running two threads per cache, thus doing computation during the latency of the cache accesses of the partner thread, the CPU cannot make the electricity cost of the computation disappear. And ASICs are usually orders-of-magnitude more efficient at preordained computation, because the CPU has all this general computation logic such as generalized pipelines, the L1 and L2 caches (with their 8-way set associativity overhead!), and lots of other transistors that are not off when not being used.

Thus as per Litecoin's experience, I do not expect any derivative of a memory-hard hash function, including CryptoNight, to be immune from radical speedups and efficiency improvements due to well designed ASICs. This of course won't happen until a CryptoNight coin has a very large market cap.

There is another issue. Using large caches such as 1 or 2MB and smaller data words, e.g. 256-bit hashes, and then the large number of random access reads and writes necessary to attempt to defeat compute bound algorithms, means the hash can become very slow, perhaps as slow as 10 or fewer hashes per second. This introduces an additional problem because the pool will expend nearly as much resources verifying mining share hashes as the miner will expend producing them. In 2013, I had attempted to resolve this side issue by successfully designing a 4096-bit hash based on the concepts of Bernstein's ChaCha hash. Anyway, I abandoned it all because of the bolded statement above.

smooth
Legendary
*
Offline Offline

Activity: 2968
Merit: 1198



View Profile
August 20, 2015, 07:20:15 AM
Last edit: August 20, 2015, 07:35:59 AM by smooth
 #22

ASICs are usually orders-of-magnitude more efficient at preordained computation, thus they will probably always be able to use a smaller cache than is used by the memory-hard CPU algorithm.

This sort of general statement is too broad to be useful. Most of the work being done in cryptonight is already done by dedicated hardware (AES round or 64x64 multiply), which is very different from the experience with SHA256 or Scrypt, where the CPU implementation of the hashes uses many simple ALU operations. The inner loop of cryptonight is about a dozen instructions. By contrast SHA256 is roughly 3375 basic integer operations. That's a huge difference in kind. I can't speak to Scrypt as I haven't studied it at all.

For example, in the case of AES ASICs that exist, they are not orders of magnitude faster than the in-CPU implementation and likely also aren't orders-of-magnitude better in terms of power-efficiency either (though the latter numbers are harder to come by and harder to even estimate in the case of the in-CPU implementation).

Intel reports 4.52 cycles per byte single threaded, which at 2 GHz clock comes to roughly 3.5 gigabits per second. That's in line with reported commercial AES ASICs such as the Helios FAST core (3 Gbps). Their Giga core is one order of magnitude faster (40 Gbps), but carries a heavy penalty in latency and gate count (sounds like extremely heavily pipelined to me, though that could conceivably be okay for mining).
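As a quick sanity check on those numbers (Python, using only the quoted cycles/byte figure and a 2 GHz clock):

Code:
cycles_per_byte = 4.52      # Intel's published single-threaded AES-NI figure
clock_hz = 2.0e9            # 2 GHz
gbps = clock_hz / cycles_per_byte * 8 / 1e9
print(round(gbps, 1))       # ~3.5 Gbit/s, vs ~3 Gbps (Helios FAST) and 40 Gbps (Giga)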

But from that one order of magnitude you have to subtract the added costs of the more complex table lookup.

This is not to say that no TMTO is potentially possible but the assumption of orders-of-magnitude gains on the T side with which to make that tradeoff is questionable at best (and remember this doesn't get you to no memory, only to less memory). But as you say earlier in your post, the feasibility and efficiency payoff of this depends very much on the numbers. It is certainly possible that the most efficient way to implement any particular algorithm is with a simple lookup table, even if such a table is not strictly required.

I remain fairly unconvinced that cryptonight necessarily can't use the latency bound of a lookup table access (which is largely an irreducible quantity) and the inherent cost of such a table along with the significant costs of reducing it to limit the payoff from highly-parallel dedicated implementations. But that doesn't mean that it can either.

I understand your comments about associative caches vs a simple lookup table but it still isn't at all clear that large gains are possible. It seems most of the latency of larger caches just comes from them being larger. But then this too remains to be seen.

Quote
hash can become very slow, perhaps as slow as 10 or less hashes per second. This introduces an additional problem because the pool will expend nearly as much resources verifying mine share hashes as the miner will expend producing them

Believe it or not we saw one brain-dead, or more likely deliberate scam, attempt that had 0.1 (!) hashes/sec!

But in any case the pool expending more resources will never happen, regardless of the hash rate. In order to produce a valid share you have to hash many invalid ones, but only the valid share is sent to the pool. If you send invalid shares you will be banned. You can attempt to sybil attack but then the pool can respond by requiring some sort of "one time" registration fee or identity validation. These are likely desirable anyway as probably the only defenses against share withholding.

The bigger problem with super-slow hashes is verification by the network. Despite some early concerns (I guess maybe before the implementation was un-deoptimized), cryptonight doesn't seem too bad; it only requires about 20ms, which is roughly in line with how long it might take to physically deliver a block over the network. Networks vary of course. Two nodes in the same data center will be much faster than that. Across the world it might take longer even if the block is empty.




TPTB_need_war
Sr. Member
****
Offline Offline

Activity: 420
Merit: 257


View Profile
August 20, 2015, 08:20:15 AM
 #23

ASICs are usually orders-of-magnitude more efficient at preordained computation, thus they will probably always be able to use a smaller cache than is used by the memory-hard CPU algorithm.

This sort of general statement is too broad to be useful. Most of the work being done in cryptonight is already done by dedicated hardware (AES round or 64x64 multiply), which is very different from the experience with SHA256 or Scrypt, where the CPU implementation of the hashes uses many simple ALU operations. The inner loop of cryptonight is about a dozen instructions. By contrast SHA256 is roughly 3375 basic integer operations. That's a huge difference in kind. I can't speak to Scrypt as I haven't studied it at all.

Scrypt uses the very efficient Salsa (ChaCha is nearly identical) one-way hash from Bernstein (not SHA256) which has a very low count of ALU operations. Afaik, Litecoin (which uses Scrypt) has ASICs which I assume have displaced CPU and GPU mining. I haven't been following Litecoin since the ASICs were rumored to be released.

Even if you are using dedicated hardware, to some extent it must be diluting the benefit of doing so by accessing them through generalized pipelines, 3 levels of caches (L1, L2, L3), and all the glue logic to make all those parts of the CPU work for the general purpose use case of the CPU. It is very unlikely that you've been able to isolate such that you are only activating the same number of transistors that the ASIC would for the specialized silicon.

Thus I think the general statement is not too broad, but rather apropos until someone can convince me otherwise.

For example, in the case of AES ASICs that exist, they are not orders of magnitude faster than the in-CPU implementation and likely also aren't orders-of-magnitude better in terms of power-efficiency either (though the latter numbers are harder to come by and harder to even estimate in the case of the in-CPU implementation).

Intel reports 4.52 cycles per byte single threaded, which at 2 GHz clock comes to roughly 3.5 gigabits per second. That's in line with reported commercial AES ASICs such as the Helios FAST core (3 Gbps). Their Giga core is one order of magnitude faster (40 Gbps), but carries a heavy penalty in latency and gate count (sounds like extremely heavily pipelined to me, though that could conceivably be okay for mining).

Well yeah, I've cited similar references in my work. That is why I went with AES-NI for my CPU proof-of-work hash. And why I did not access caches, instead keeping a tight loop that can be loaded once into microcode without re-engaging the pipeline logic over and over, and trying to minimize the number of other transistors I would be activating in the CPU. One can assume the CPU will be highly optimized for doing encryption since OSes offer full disk encryption.

Afaics, the complexity of CryptoNight's bag-of-all-tricks (AES-NI, 64x64 multiply, Scrypt-like cache latency) makes it less likely that extra generalized logic in the CPU won't be accessed, e.g. checking whether a cache hit is in L1 or L2 before accessing L3, which an ASIC would not need to do.

But from that one order of magnitude you have to subtract the added costs of the more complex table lookup.

Not even that. Just run the ASIC with a full size cache but eliminate L1 and L2 and other generalized logic that is consuming electricity. Intel is working hard at improving the parts of the CPU that are powered down when not in use, but it is still not very fine grained and will never be the equivalent of just specializing all the logic in an ASIC.

We try to minimize the advantage the ASIC will have, but we can not be at parity with the ASIC.

This is not to say that no TMTO is potentially possible but the assumption of orders-of-magnitude gains on the T side with which to make that tradeoff is questionable at best (and remember this doesn't get you to no memory, only to less memory). But as you say earlier in your post, the feasibility and efficiency payoff of this depends very much on the numbers. It is certainly possible that the most efficient way to implement any particular algorithm is with a simple lookup table, even if such a table is not strictly required.

I think perhaps you are missing my point and perhaps because it had to be deduced from what I wrote in the prior post, i.e. I didn't state it explicitly.

In order to make it so it is not economical for the ASIC to convert the latency bound to compute bound, you end up making your "memory-hard" hash compute bound in the first place! That was a major epiphany.

Thus why waste the time pretending the memory-hard complexity is helping you. If anything it is likely hurting you. Just go directly to using AES-NI for the hash. Which is what I did in spades.

I remain fairly unconvinced that cryptonight necessarily can't use the latency bound of a lookup table access (which is largely an irreducible quantity) and the inherent cost of such a table along with the significant costs of reducing it to limit the payoff from highly-parallel dedicated implementations. But that doesn't mean that it can either.

If you are not doing the algorithm with serialized random access the way I described it as an improvement over Scrypt, then yes parallelism can eat your lunch.

And if you do it the way I described, then you end up making it compute bound in order to prevent an ASIC from making it compute bound. So no matter which way you go, you are going to be compute bound.

I understand your comments about associative caches vs a simple lookup table but it still isn't at all clear that large gains are possible. It seems most of the latency of larger caches just comes from them being larger. But then this too remains to be seen.

Afair, my concern was about the extra transistors that are being activated which the ASIC won't need to activate. Again, remember I am recalling this from memory, not even having looked at my old work today. I don't really have time to do an exhaustive reload of prior hash work today.

Quote
hash can become very slow, perhaps as slow as 10 or less hashes per second. This introduces an additional problem because the pool will expend nearly as much resources verifying mine share hashes as the miner will expend producing them

Believe it or not we saw one brain-dead, or more likely deliberate scam, attempt that had 0.1 (!) hashes/sec!

And there was MemoryCoin with afair an 8MB memory hard hash and with upwards of 1 second per hash.

But in any case the pool expending more resources will never happen, regardless of the hash rate. In order to produce a valid share you have to hash many invalid ones, but only the valid share is sent to the pool.

I am talking about the ratio. If the pool is doing 1/1000th of the work of the share miners, then the pool has to maybe charge a larger fee.

Anyway, that wasn't my major point. I am just pointing out that there is a limit to how large you can make the memory cache and thus how slow the hash function can be.

The bigger problem with super-slow hashes is verification by the network. Despite some early concerns (I guess maybe before the implementation was un-deoptimized), cryptonight doesn't seem too bad; it only requires about 20ms, which is roughly in line with how long it might take to physically deliver a block over the network. Networks vary of course. Two nodes in the same data center will be much faster than that. Across the world it might take longer even if the block is empty.

Also theoretically the relative slowness of the hash verification could become a more severe problem in some imagined consensus network designs which attempted to eliminate pools and record all mining shares directly to the block chain.

smooth
Legendary
*
Offline Offline

Activity: 2968
Merit: 1198



View Profile
August 20, 2015, 08:45:42 AM
Last edit: August 20, 2015, 09:04:05 AM by smooth
 #24

Scrypt uses the very efficient Salsa (ChaCha is nearly identical) one-way hash from Bernstein (not SHA256) which has a very low count of ALU operations. Afaik, Litecoin (which uses Scrypt) has ASICs which I assume have displaced CPU and GPU mining. I haven't been following Litecoin since the ASICs were rumored to be released.

Yes I was relating to the gains from SHA256 ASICs. I'm less familiar with Scrypt ASICs in all matters (how much they cost, how much performance they gain, etc.)

But as you are aware Scrypt is designed in a manner that makes the TMTO particularly trivial to implement. Even then it isn't necessarily the case that a Scrypt ASIC would pay off (as you said). It seems they do but that is an empirical result that depends on the particulars of the algorithm. It is not the case that the lookup table approach for read-write scratchpads is equally trivial. Still feasible, possibly, but it isn't obvious.

Quote
Even if you are using dedicated hardware, to some extent it must be diluting the benefit of doing so by accessing them through generalized pipelines, 3 levels of caches (L1, L2, L3), and all the glue logic to make all those parts of the CPU work for the general purpose use case of the CPU. It is very unlikely that you've been able to isolate such that you are only activating the same number of transistors that the ASIC would for the specialized silicon.

Clearly true, but less clear there is anything yielding "orders of magnitude" here, nor, again, that the gain is enough to offset the costs of trying to reduce latency.

Quote
We try to minimize the advantage the ASIC will have, but we can not be at parity with the ASIC

Of course, it is a given that you can take some things off the chip and yield an ASIC with better marginal economics. But equally ASICs can (probably) never equal the economies of scale of general purpose CPUs, either on production or R&D. So the actual numbers matter. If you can convincingly show orders of magnitude gain, then it is easy to dismiss the rest. Otherwise not.

Quote
In order to make it so it is not economical for the ASIC to convert the latency bound to compute bound, you end up making your "memory-hard" hash compute bound in the first place! That was a major epiphany.

I don't think you have shown it is economical, only that it is possible. In fact you have specifically agreed that the gain from ASICs vs. in-CPU dedicated circuits is likely relatively small, which is why you propose using the in-CPU circuits!

Quote
Thus why waste the time pretending the memory-hard complexity is helping you. If anything it is likely hurting you. Just go directly to using AES-NI for the hash. Which is what I did in spades.

Because I think this more clearly loses to economies of scale by replicating the relatively small AES hardware many, many times and getting rid of overhead of packaging, being instructions decoded inside a CPU (you can never eliminate all overhead from that regardless of how "tight" your algorithm) inside a computer (memory, fans, batteries, I/O, etc.) with a relatively tiny hash rate compared to a chip with 1000x AES circuits inside a mining rig with 100 chips. Memory gives physical size (and also its own cost, but mostly just size), which makes it impossible to replicate to whatever arbitrary degree is needed to amortize product-type and packaging overhead.

There may be economic arguments apart from mining efficiency why people mine on computers vs. mining rigs. But if those exist they must not exclusively rely on pure raw efficiency (because that will clearly lose by the argument in the previous paragraph) but on efficiency being "close enough" for the other factors to come into play. So again, the numbers matter. And once you resort to "close enough" then yet again numbers matter.

Anyway, I won't be surprised to be proven "wrong" (not really wrong because I'm not saying any of this won't work, just that it isn't clear that it will work) but I will be surprised if the market cap at which cryptonight ASICs appear isn't a lot higher than Scrypt ASICs (adjusting for differences in time period and such).

EDIT: there is actually a real world example of using just in-CPU crypto not being enough to compete with ASICs and that is Intel's SHA extensions. They are not competitive with Bitcoin mining ASICs.

TPTB_need_war
Sr. Member
****
Offline Offline

Activity: 420
Merit: 257


View Profile
August 20, 2015, 09:49:43 AM
Last edit: August 20, 2015, 11:39:11 AM by TPTB_need_war
 #25

Scrypt uses the very efficient Salsa (ChaCha is nearly identical) one-way hash from Bernstein (not SHA256) which has a very low count of ALU operations. Afaik, Litecoin (which uses Scrypt) has ASICs which I assume have displaced CPU and GPU mining. I haven't been following Litecoin since the ASICs were rumored to be released.

Yes I was relating to the gains from SHA256 ASICs. I'm less familiar with Scrypt ASICs in all matters (how much they cost, how much performance they gain, etc.)

But as you are aware Scrypt is designed in a manner that makes the TMTO particularly trivial to implement. Even then it isn't necessarily the case that a Scrypt ASIC would pay off (as you said). It seems they do but that is an empirical result that depends on the particulars of the algorithm. It is not the case that the lookup table approach for read-write scratchpads is equally trivial. Still feasible, possibly, but it isn't obvious.

Indeed, converting to a more compute-intensive variant so that it is less trivial to convert to compute bound thus makes it compute bound. Hehe. See where I ended up?

Even if you are using dedicated hardware, to some extent it must be diluting the benefit of doing so by accessing them through generalized pipelines, 3 levels of caches (L1, L2, L3), and all the glue logic to make all those parts of the CPU work for the general purpose use case of the CPU. It is very unlikely that you've been able to isolate such that you are only activating the same number of transistors that the ASIC would for the specialized silicon.

Clearly true, but less clear there is anything yielding "orders of magnitude" here, nor, again, that the gain is enough to offset the costs of trying to reduce latency.

Apologies, perhaps the plural "orders" in my prior post is what you meant by it being an overly general statement, and in that case you are literally correct.

When I write orders in plural form, I include the singular form as one of the applicable cases.

I am more inclined to believe both CryptoNight and my pure AES-NI design are within 1 - 2 orders of magnitude of any possible ASIC optimization, in terms of the relevant parameters of Hashes/watt and Hashes per $ of capital outlay.

I would only argue that I feel a more complex approach such as CryptoNight uses has a greater risk of slipping a bit, maybe into the 1 to 2 orders range. Again, we are just making very rough guesses based on plausible theories.

We try to minimize the advantage the ASIC will have, but we can not be at parity with the ASIC

Of course, it is a given that you can take some things off the chip and yield an ASIC with better marginal economics. But equally ASICs can (probably) never equal the economies of scale of general purpose CPUs, either on production or R&D. So the actual numbers matter. If you can convincingly show orders of magnitude gain, then it is easy to dismiss the rest. Otherwise not.

I almost mentioned Intel's superior fabs, but I didn't because, well, for one thing Intel has been failing to meet deadlines in recent years. Moore's law appears to be failing.

If we are talking about a $4 billion Bitcoin market cap then yeah. But if we are talking about a $10 trillion crypto-coin world, then maybe not so any more.

Agreed the devil is in the details.

I think what I am trying to argue is that I've hoped to stay below 2 orders-of-magnitude advantage for the ASIC and by being very conservative with which CPU features are combined, then hopefully less than 1 order-of-magnitude. And that is enough for me in my consensus network design, because I don't need economic parity with ASICs in order to eliminate the profitability of ASICs in my holistic coin design.

And I am saying that the extra CPU features that CryptoNight is leveraging seem to add more risk, not less (not only the risk of ASICs but the risk that someone else has found a way to optimize the source code). So I opted for simplicity and clarity instead. AES benchmarks against ASICs are published. CryptoNight versus ASIC benchmarks don't exist. I prefer the marketing argument that is easier to support.

In order to make it so it is not economical for the ASIC to convert the latency bound to compute bound, you end up making your "memory-hard" hash compute bound in the first place! That was a major epiphany.

I don't think you have shown it is economical, only that it is possible. In fact you have specifically agreed that the gain from ASICs vs. in-CPU dedicated circuits is likely relatively small, which is why you propose using the in-CPU circuits!

I think perhaps you are missing my point. In order to make it so it is only "possible" and not likely for the ASIC to convert the latency bound (of the CPU hash) to compute bound, it appears you are likely to end up making the CPU hash compute bound anyway by adding more computation between latency accesses, as I believe CryptoNight has done. Haven't benchmarked it though.

So that is one reason why I discarded all that latency bound circuitry and just K.I.S.S. on the documented less than 1 order-of-magnitude advantage for ASICs w.r.t. AES-NI. And then I also did one innovation better than that (secret).

Thus why waste the time pretending the memory-hard complexity is helping you. If anything it is likely hurting you. Just go directly to using AES-NI for the hash. Which is what I did in spades.

Because I think this more clearly loses to economies of scale by replicating the relatively small AES hardware many, many times and getting rid of overhead of packaging, being instructions decoded inside a CPU (you can never eliminate all overhead from that regardless of how "tight" your algorithm) inside a computer (memory, fans, batteries, I/O, etc.) with a relatively tiny hash rate compared to a chip with 1000x AES circuits inside a mining rig with 100 chips.

In terms of Hashes per capital outlay, yes. But not necessarily true for Hashes/watt.

Even you admitted upthread that the 40 Gbytes/sec for ASIC on AES might not be an order-of-magnitude advantage in terms of Hashes/watt. I had seen some stats on relative Hashes/watt and I've forgotten what they said. Will need to dig my past notes for it.

But for our application, I wasn't concerned about the capital cost because the user has a $0 capital cost because they already own a CPU.

I was supremely focused on the Hashes/watt, because I know if I can get users to mine, and they don't notice a 1% increase in their electricity bill, then their Hashes/watt is infinite.

Also so that the rest of the user's CPU isn't tied up by the hashing, so they can mine and work simultaneously.

And thus I make ASICs relatively unprofitable. They may still be able to make a "profit" if the coin value is rising to the level of the hashrate invested (which would be so awesome if you have 100 million users mining), but the ASICs can not likely dominate the network hashrate. Or at least they can't make it totally worthless for a user to mine, if there is perceived value to microtransactions and users don't want to hassle with an exchange just to use a service they want that only accepts microtransactions. But even that is not the incentive I count on to get users to mine.

There is a lot of holistic economics involved in designing your proof-of-work hash. It is very complex.

Memory gives physical size (and also its own cost, but mostly just size), which makes it impossible to replicate to whatever arbitrary degree is needed to amortize product-type and packaging overhead.

Agreed if Hashes per capital outlay is your objective, but you may also worsen your relative (to ASIC) Hashes/watt. Maybe not, but again there are no CryptoNight vs. ASIC benchmarks so we can't know. There are AES benchmarks.

There may be economic arguments apart from mining efficiency why people mine on computers vs. mining rigs. But if those exist they must not exclusively rely on pure raw efficiency (because that will clearly lose by the argument in the previous paragraph) but on efficiency being "close enough" for the other factors to come into play. So again, the numbers matter. And once you resort to "close enough" then yet again numbers matter.

Yup. Well stated to point out that as we get closer to parity on efficiencies, all sorts of game theory economic scenarios come into play. That is what I meant by it is complex.

Anyway, I won't be surprised to be proven "wrong" (not really wrong because I'm not saying any of this won't work, just that it isn't clear that it will work) but I will be surprised if the market cap at which cryptonight ASICs appear isn't a lot higher than Scrypt ASICs (adjusting for differences in time period and such).

Well, no one can be sure of that for any circuit of reasonable complexity implemented on the complex circuitry of the CPU, which is somewhat of a black box. Bernstein even complained recently that Intel isn't documenting all its timings.

The basic problem as I see it is that the only thing Cryptonote does to deal with ASICs is the proof-of-work hash. It doesn't look at other aspects of the system for ways to drive away ASICs. And in some sense, it shouldn't since Cryptonote like all existing crypto is subject to a 51% attack and thus it needs as much hashrate as it can get (up to some level).

EDIT: there is actually a real world example of using just in-CPU crypto not being enough to compete with ASICs and that is Intels SHA extensions. They are not competitive with Bitcoin mining ASICs.

Seems I had read something about that, where Intel hadn't done the circuit work necessary to be competitive. As you said, SHA is a complex algorithm. If there is some instruction which is fundamental and highly optimized, e.g. the 64x64 multiply, it might be as optimized as it can get in silicon. I doubt Intel has fully optimized anything to that degree, but apparently on the AES benchmarks they got close.

It seems to me that a slower SHA hash is not something that can slow your computer to a crawl. Whereas with full disk encryption, Intel needs AES to be very fast.

smooth
Legendary
*
Offline Offline

Activity: 2968
Merit: 1198



View Profile
August 20, 2015, 09:56:58 AM
Last edit: August 20, 2015, 10:13:10 AM by smooth
 #26

Indeed converting a more compute intensive variant that makes it less trivial to convert to compute bound thus makes it compute bound. Hehe. See where I ended up?

Well sure, certainly compute bound, but not necessarily useful at all (as you said in your first post).

Quote
It seems to me that a slower SHA hash is not something that can slow your computer to a crawl. Whereas with full disk encryption, Intel needs AES to be very fast.

Maybe maturity too. This is the first version of ISHAE (shipped for less than a month). I think AESNI has had at least two significant speed bumps (one for sure).

That may include seeing if people actually use it. Otherwise it may get slower (or even gone; as usual with these extensions you are supposed to always check CPUID before using it, so they can take it out whenever they want really). AESNI has the virtuous cycle of people using it because it was reasonably fast, which motivated Intel to make it even faster. Repeat. ISHAE may not.

Quote
the 40 Gbytes/sec for ASIC on

It was gigabits, btw. The main application for this stuff seems to be comm gear so usually quoted in bits (but yes an OOM faster than Intel in-CPU)

TPTB_need_war
Sr. Member
****
Offline Offline

Activity: 420
Merit: 257


View Profile
August 20, 2015, 11:40:49 AM
 #27

the 40 Gbytes/sec for ASIC on

It was gigabits, btw. The main application for this stuff seems to be comm gear so usually quoted in bits (but yes an OOM faster than Intel in-CPU)

But it wasn't stated whether it also consumed an OOM more electricity (than the Intel in-CPU).

ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
August 21, 2015, 04:01:40 AM
Last edit: August 21, 2015, 04:14:43 AM by ArticMine
 #28

I am proposing for discussion making the minimum per kb fee, F, a function of the difficulty as follows

F = F0*D0/DA

Where

F0 is a constant comparable to the current per kb fee
D0 is a constant comparable to the current difficulty
DA is the average difficulty of the blocks N-2K to N-K
N is the last block number
K is a constant for example 1000  

Rationale: The concept is that the difficulty is an approximate measure of the cost of hardware, bandwidth etc in terms of XMR. The minimum fee would then approximate the adverse impact of the spam it is intended to prevent. Furthermore as the purchasing power of XMR increases the fee in XMR would decrease, while if the purchasing power of XMR decreases the fee in XMR would increase. The remaining suggested fees for both nodes and miners could also be dynamically set in terms of the minimum per kb fee.
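In code form, a minimal sketch of the proposal (Python; difficulties[] is the per-block difficulty history ending at block N, and F0, D0, K are the constants above):

Code:
def min_fee_per_kb(difficulties, F0, D0, K=1000):
    N = len(difficulties) - 1                  # last block number
    window = difficulties[N - 2*K : N - K]     # blocks N-2K .. N-K
    DA = sum(window) / len(window)             # average difficulty over the window
    return F0 * D0 / DA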

GingerAle (OP)
Legendary
*
Offline Offline

Activity: 1260
Merit: 1008


View Profile WWW
August 21, 2015, 01:35:11 PM
 #29

I am proposing for discussion making the minimum per kb fee, F, a function of the difficulty as follows

F = F0*D0/DA

Where

F0 is a constant comparable to the current per kb fee
D0 is a constant comparable to the current difficulty
DA is the average difficulty of the blocks N-2K to N-K
N is the last block number
K is a constant for example 1000  

Rationale: The concept is that the difficulty is an approximate measure of the cost of hardware, bandwidth etc in terms of XMR. The minimum fee would then approximate the adverse impact of the spam it is intended to prevent. Furthermore as the purchasing power of XMR increases the fee in XMR would decrease, while if the purchasing power of XMR decreases the fee in XMR would increase. The remaining suggested fees for both nodes and miners could also be dynamically set in terms of the minimum per kb fee.

Interesting. The difficulty is possibly the best surrogate for the value of XMR. Of course there's the botnet / sysadmin issue (i.e., boost difficulty with abstract costs), but IMO that's not really addressable so it's best ignored.

So basically use the existing numbers as constants.

What are the possible attack / manipulation vectors?

1. To drive the min per kb fee down to an unusable level, one would have to increase the hashrate of the network for an extended amount of time.
2. To drive the min per kb fee up one would have to decrease the hashrate of the network, which is achievable by attacking pools (if we don't find a way to escape mining centralization)

The problem, I think, will be determining the width of the rolling window (K). Prices can skyrocket overnight, and people wouldn't transact if the network difficulty doesn't catch up to the valuation overnight.

Or maybe prices will be more gradual, who knows.

With the above proposal, I think we would have to include a hard floor and ceiling. I.e., if the entire solar system is using monero (presuming we've figured out using superposition to overcome speed of light problems in a consensus network), the network difficulty will be astronomical. Would the above equation work?

0.01*865102700/865102700e12 = 0.00000000000001. I guess realistically the floor would be a piconero
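A hard floor and ceiling would then just clamp whatever the formula returns; the bounds here are purely illustrative (e.g. a one-piconero floor), not proposed values:

Code:
FEE_FLOOR = 1e-12    # illustrative: one piconero per kb
FEE_CEILING = 1.0    # illustrative upper bound

def clamped_fee(fee):
    return max(FEE_FLOOR, min(FEE_CEILING, fee))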

Finally, when utilizing difficulty as a means to adjust fees, I think we would have to wait to implement this until after the primary emission, or phase it in somehow (perhaps couple K to block reward). Right now the difficulty is high (presumably) due to the high block reward - it's a murky connection between cost of mining, valuation of XMR, and ideal cost of transacting.

smooth
Legendary
*
Offline Offline

Activity: 2968
Merit: 1198



View Profile
August 21, 2015, 02:19:49 PM
 #30

There should be a factor for block subsidy.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
August 21, 2015, 04:48:48 PM
 #31

There should be a factor for block subsidy.


Yes. In order for the difficulty to be a surrogate for the value of XMR one needs to factor in the actual block reward, so the formula would look as follows:

F = F0*D0*RA/(DA*R0)

Where

F0 is a constant comparable to the current per kb fee
R0 is a constant comparable to the current block reward
RA is the average block reward of the blocks N-2K to N-K
D0 is a constant comparable to the current difficulty
DA is the average difficulty of the blocks N-2K to N-K
N is the last block number
K is a constant for example 1000  

One could replace RA by the average of base block reward namely by netting out fees and penalties; however then it would not track value as closely particularly if fees were to dominate the actual block reward in the future.
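Extending the earlier sketch with the reward factor (again just a sketch; rewards[] is the per-block actual reward history and R0 the reference reward constant):

Code:
def min_fee_per_kb(difficulties, rewards, F0, D0, R0, K=1000):
    N = len(difficulties) - 1
    lo, hi = N - 2*K, N - K                    # averaging window: blocks N-2K .. N-K
    DA = sum(difficulties[lo:hi]) / K
    RA = sum(rewards[lo:hi]) / K
    return F0 * D0 * RA / (DA * R0)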

The question of the responsiveness of the difficulty to a sudden increase in price is valid to a degree; however it is tempered by the fact that the increase would be temporary, the increase would hit XMR holders at a time when they are experiencing a significant increase in the value of their XMR,  and in the worst case scenario any drop in transaction volume would likely temper the sudden price movement.  Attacks based on manipulating the hash rate I do not see as an issue here simply because any impact on fees is extremely minor compared to the risk of a 51% type attack.

My main concern here is to not create a situation similar to what is currently happening in Bitcoin with the blocksize debate. It is fairly easy to get the necessary consensus to fork an 18 month old coin; it is quite another matter to try to undo the fork 5 years later, particularly when there is no immediate threat. This is the critical lesson we must learn from Bitcoin.

smooth
Legendary
*
Offline Offline

Activity: 2968
Merit: 1198



View Profile
August 21, 2015, 05:32:27 PM
 #32

My main concern here is to not create a situation similar to what is currently happening in Bitcoin with the blocksize debate. It is fairly easy to get the necessary consensus to fork an 18 month old coin; it is quite another matter to try to undo the fork 5 years later, particularly when there is no immediate threat. This is the critical lesson we must learn from Bitcoin.

Fee rates don't need a fork or consensus though. Miners can mine whatever they want. People can set their wallets to use whatever fees they want. Relays can relay whatever they want, but if that becomes an impediment to miners and wallets, then wallets can connect directly to miners.

However, I still agree there is merit in default/recommended fees following some sort of metric rather than just an arbitrary number.



ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
August 21, 2015, 05:44:53 PM
 #33

My main concern here is to not create a situation similar to what is currently happening in Bitcoin with the blocksize debate. It is fairly easy to get the necessary consensus to fork an 18 month old coin; it is quite another matter to try to undo the fork 5 years later, particularly when there is no immediate threat. This is the critical lesson we must learn from Bitcoin.

Fee rates don't need a fork or consensus though. Miners can mine whatever they want. People can set their wallets to use whatever fees they want. Relays can relay whatever they want, but if that becomes an impediment to miners and wallets, then wallets can connect directly to miners.

However, I still agree there is merit in default/recommended fees following some sort of metric rather than just an arbitrary number.


Thanks for the clarification. It is a critical distinction that the minimum fee is not enforced in the blockchain but rather by convention among the nodes. So if this became an issue, nodes could simply start relaying transactions with lower than the minimum fee without creating a hard fork.

GingerAle (OP)
Legendary
*
Offline Offline

Activity: 1260
Merit: 1008


View Profile WWW
August 21, 2015, 06:03:20 PM
 #34

There should be a factor for block subsidy.


Yes. In order for the difficulty to be a surrogate for the value of XMR one needs to factor in the actual block reward, so the formula would look as follows:

F = F0*D0*RA/(DA*R0)

Where

F0 is a constant comparable to the current per kb fee
R0 is a constant comparable to the current block reward
RA is the average block reward of the blocks N-2K to N-K
D0 is a constant comparable to the current difficulty
DA is the average difficulty of the blocks N-2K to N-K
N is the last block number
K is a constant for example 1000  

One could replace RA by the average of base block reward namely by netting out fees and penalties; however then it would not track value as closely particularly if fees were to dominate the actual block reward in the future.

The question of the responsiveness of the difficulty to a sudden increase in price is valid to a degree; however it is tempered by the fact that the increase would be temporary, the increase would hit XMR holders at a time when they are experiencing a significant increase in the value of their XMR,  and in the worst case scenario any drop in transaction volume would likely temper the sudden price movement.  Attacks based on manipulating the hash rate I do not see as an issue here simply because any impact on fees is extremely minor compared to the risk of a 51% type attack.

My main concern here is to not create a situation similar to what is currently happening in Bitcoin with the blocksize debate. It is fairly easy to get the necessary consensus to fork an 18 month old coin; it is quite another matter to try to undo the fork 5 years later, particularly when there is no immediate threat. This is the critical lesson we must learn from Bitcoin.

Exactly. That's why I wanted to get these conversations going. IMO, the fee thing is the most addressable component of the code that needs to be dehumanized ASAP.

Re: first bold - are we talking block reward (as in, fees + hard coded emission) or block subsidy (hard coded emission)?

Regarding these semantics, and the concept of using the difficulty + block reward as a surrogate for XMR value..

I think we could use fees per block as an additional surrogate (to fit somewhere in the equation). So, remove base block reward.

Ultimately, I think I'm just talking my way back to using the # of transactions / block as a surrogate for valuation. Because if we just use total fees, or fees / transaction, we run into the problem that high mixin transactions are huge, and thus have a higher fee, so the total reward per block can be skewed by an era of high mixin transactions.

here's a ~2X difficulty increase, where the emission has stopped and the average block reward is 2.2 (lots of transactions)

0.01 * 9e8 * 2.2 / (1.8e9 * 9) = 0.0012

here's a 2X difficulty increase, where the emission has stopped and average block reward is 0.2 (very few transactions)

0.01 * 9e8 * 0.2 / (1.8e9 * 9) = 1.1e-4

IMO, I don't agree with this behavior. To me, a lot of transactions indicate that the coin has higher value - the real world actually values the coin so it is used as a means to transfer value. Few transactions indicate the coin has less value, because the real world isn't using it that much. That the fee decreases is odd.

Perhaps it has to do with using the block subsidy and not just total fees. Here's using just the fees (using 0.001 as average fees for existing usage)

0.01 * 9e8 * 2.2 / (1.8e9 * 0.001) = 11

hrm...... did I do my math wrong? PMDAS
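For what it's worth, evaluating the three examples exactly as written (F = F0*D0*RA/(DA*R0)) reproduces the same numbers:

Code:
print(0.01 * 9e8 * 2.2 / (1.8e9 * 9))       # 0.00122...  (~0.0012)
print(0.01 * 9e8 * 0.2 / (1.8e9 * 9))       # 0.000111... (~1.1e-4)
print(0.01 * 9e8 * 2.2 / (1.8e9 * 0.001))   # 11.0 -- the arithmetic is right; substituting 0.001 for R0 is what makes it huge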




MoneroMooo
Legendary
*
Offline Offline

Activity: 1276
Merit: 1001


View Profile
August 21, 2015, 08:28:22 PM
 #35

Two points of interest:

- a moving min fee like this might cause delayed transactions to be spuriously rejected, if the min fee increases just after the tx is broadcast. Maybe some kind of slack is warranted. However, if some slack is allowed, it is expected that some people will pay fees taking the slack into account, nullifying the slack.

- some people might decide to "bet" on decreasing fees, and delay transactions till they can be sent with a lower fee. This might have knock-on effects on blockchain utilization: both in instantaneous decreased use, and burst use when the fees go down.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
August 21, 2015, 10:07:54 PM
 #36

The slack would be built into the spread between the minimum fee for relay and the suggested fee to the user. If the user goes out of her way to pay just the minimum fee then she would run the risk of her transaction not being relayed. The averaging over a number of blocks is designed to dampen the impact of a sudden rise in difficulty. I am not convinced that users would go out of their way to delay transactions in order to speculate on lower fees since the cost of delaying the transaction would be larger in most cases than any possible fee savings.

The problem I see with using fees paid as a measure of the value of XMR is that unlike the difficulty this could be easy to game. A spammer could gradually ramp up the spam since as the total fees per block increase then individual per kb fees would drop. This would leave the actual base reward before the inclusion of fees and / or penalties as the optimal parameter for RA.

GingerAle (OP)
Legendary
*
Offline Offline

Activity: 1260
Merit: 1008


View Profile WWW
August 23, 2015, 03:20:45 AM
 #37

The slack would be built into the spread between the minimum fee for relay and the suggested fee to the user. If the user goes out of her way to pay just the minimum fee then she would run the risk of her transaction not being relayed. The averaging over a number of blocks is designed to dampen the impact of a sudden rise in difficulty. I am not convinced that users would go out of their way to delay transactions in order to speculate on lower fees since the cost of delaying the transaction would be larger in most cases than any possible fee savings.

The problem I see with using fees paid as a measure of the value of XMR is that unlike the difficulty this could be easy to game. A spammer could gradually ramp up the spam since as the total fees per block increase then individual per kb fees would drop. This would leave the actual base reward before the inclusion of fees and / or penalties as the optimal parameter for RA.

I think the aspect that would make it easy to game is the timescale... the rolling window within which the average is calculated. I think if you made the window absurdly large it would become impossible to game. Perhaps 6 months in blocktime. The amount of spam required to ramp up over a 6 month period would be significant.
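For scale, assuming Monero's roughly 60-second block target at the time (an assumption on my part, not a quoted spec), a 6-month window works out to roughly a quarter of a million blocks:

Code:
blocks_per_day = 24 * 60            # assuming a ~60-second block target
window_K = 6 * 30 * blocks_per_day  # ~6 months of blocks
print(window_K)                     # 259200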

Hrm, and perhaps the other problem with using total fees paid as a measure of value of XMR is that the formula becomes recursive or is looped. The new minimum fee calculated will influence the total fees in the next block.

Gah, I'm trying to attack this with Excel. It's really uncharted territory. The only decentralized currency network we have to work with for an example of how the difficulty will rise is Bitcoin, and ASICs really borked that. Perhaps I need to study Litecoin. I think if we could extract some models / functions from those data, we could get a better sense of how the system operates, and thus design the fee equation. Does anyone know how to do this? If you can get the raw data, I think I have some software that could pull some equations from the data.


ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
August 24, 2015, 11:23:03 PM
 #38

The Bitcoin price data going back to July 10, 2010 can be downloaded from CoinDesk http://www.coindesk.com/price/ It is the CoinDesk BPI. The difficulty data going back to January 3, 2009 can also be downloaded from CoinDesk http://www.coindesk.com/data/bitcoin-mining-difficulty-time/. In both cases click on Export in the chart and select CSV Chart Data. There was some Bitcoin trading before July 18, 2010 on New Liberty Standard and Bitcoin Market, both of which have gone defunct. The New Liberty Standard site is still up with the 2009 data: http://newlibertystandard.wikifoundry.com/page/2009+Exchange+Rate however the early 2010 data is now gone. I did manage to download the early 2010 data in November 2013 before it was lost. Send me a PM and I can send the file. LibreOffice (.ods) format only to preserve the download timestamp.

One thing to keep in mind is that difficulty measures price in terms of computing resources. As for the time period for Bitcoin, the best is 2010 - 2012. Still, there is an evolution here from CPU, to GPU, and then some FPGA before ASIC mining. One other option that would be clean is actually Monero over the last 14 months, since in the Monero case the technology has not changed.

GingerAle (OP)
Legendary
*
Offline Offline

Activity: 1260
Merit: 1008


View Profile WWW
August 24, 2015, 11:52:06 PM
 #39

Sweet, thanks for the leads. Currently hunting down Litecoin info.

edited to add: got the difficulty / time.

I think I can hack this to get price history... it's somewhere in here

view-source:https://coinplorer.com/Charts?fromCurrency=LTC&toCurrency=USD

smooth
Legendary
*
Offline Offline

Activity: 2968
Merit: 1198



View Profile
August 24, 2015, 11:53:14 PM
 #40

The Bitcoin price data going back to July 10, 2010 can be downloaded from CoinDesk http://www.coindesk.com/price/ It is the CoinDesk BPI. The difficulty data going back to January 3, 2009 can also be downloaded from CoinDesk http://www.coindesk.com/data/bitcoin-mining-difficulty-time/. In both cases click on Export in the chart and select CSV Chart Data. There was some Bitcoin trading before July 18, 2010 on New Liberty Standard and Bitcoin Market, both of which have gone defunct. The New Liberty Standard site is still up with the 2009 data: http://newlibertystandard.wikifoundry.com/page/2009+Exchange+Rate however the early 2010 data is now gone. I did manage to download the early 2010 data in November 2013 before it was lost. Send me a PM and I can send the file. LibreOffice (.ods) format only to preserve the download timestamp.

One thing to keep in mind is that difficulty measures price in terms of computing resources. As for the time period for Bitcoin, the best is 2010 - 2012. Still, there is an evolution here from CPU, to GPU, and then some FPGA before ASIC mining. One other option that would be clean is actually Monero over the last 14 months, since in the Monero case the technology has not changed.

It can't be assumed that the technology will never change though. That's not as big a deal for transaction fees where, as I explained earlier, it is fundamentally a guideline, so the worst case is you end up with a poor guideline but it doesn't fail altogether. It could be a bigger problem if using this method for something else.