average cardinality of shares per block problem:

We have encountered an interesting problem:

In Proof of Collaborative Work proposal, miners find shares with difficulties scaled down 10,000 times (hence, easier to mine) and submit them to the network to be accumulated lately both as proof of work and as the basis for block reward distribution. Proof of work is supposed to take place by accumulating shares to reach network target difficulty, cumulatively.

Each share is assigned a score proportional to the share's difficulty compared to target difficulty, like what we do here

Code:

     SCALE = 10000; //scale by which we utilize mining
     T = targetDifficulty; // calculated difficulty for current height
     /***********************************************/
     hash = ComputeHash(share);
     assert (hash <> 0 ); //preventing crash even in the most unlikely event of the hash being exactly 0 (2^-256 probability)
     ratio = T/hash;
     assert (ratio >= 1/SCALE); //shares should have passed this condition before approaching here
     If (ratio > 1 )
         score = 1; //much rare
     else 
         score = round(ratio, -5 ) ;

According to the code, shares are in [T , T/SCALE] and consequently scores are in [1/SCALE , 1] neighborhoods.

According to the algorithm, we prove work by presenting a collection of shares with sum of scores exceeding 1. (actually it is 95% but we use 1 for convenience here).

@tromp has calculated the expected value for score to be ln(SCALE)/SCALE:

Quote from: tromp on June 16, 2018, 06:31:44 PM

Quote from: aliashraf on June 16, 2018, 03:12:25 PM

Would you please do a complete rewrite of your proposed formula, ... for clarification? Substituting ln(1/mindiff) for ln(n) just makes no sense to me or I'm missing something here.

Let T be the target threshold determined by the difficulty adjustment,
and scale be some suitably big number like 10^4.

Let shares be hashes that fall into the interval [T, T*scale], and define their score as T / hash.
When accumulating shares until their sum score exceeds 1, one is interested in the expected score of a share.

This can be seen to equal 1/scale times the expected value of 1/x for a uniformly random real x in the interval [1/scale,1]. Considering the area under a share score, the latter satisfies (1-1/scale) E(1/x) = integral of 1/x dx from 1/scale to 1 = ln 1 - ln(1/scale) = ln(scale).

So the expected score is approximately ln(scale)/scale.

for SCALE = 10,000 it yields an expected value of 0.000921 for scores.

A raw estimation of the expected value (average) number of shares per block, which is of most interest, could be simply assuming zero variance and
and as we need n shares to exceed 1 cumulatively putting
n*ln(SCALE)/SCALE = 1
Hence
n= SCALE/ln(SCALE)

For 10,000 times scaling of difficulty the latter equation yields n~1086 for the expected number of shares per block, which is very encouraging for the protocol, implying very low overheads for high ratios of scaling BUT ...

As @tromp has shown later, this is not a correct assumption from the first hand to suppose the variance of distribution function for number of shares per block as being zero.

Typically, it is not a trivial problem and I'm not personally entitled to approach it in a formal persuasive way.

On the other hand I need the original topic to cover more general aspects of the protocol, so I decided to start a new thread and ask for more contributions by @tromp or other interested people with a good taste for probability theory to be continued here.

Highly appreciate any further assessment to this problem:

For a suitably large number of blocks, what is the expected value for number of shares per block where each block contains a collection of shares each scoring [1/SCALE, 1] in a uniform random distribution, where sum of the scores is supposed to exceed 1.