Bitcoin Forum
Author Topic: POW via training/validating Deep Neural Networks  (Read 338 times)
lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 19, 2018, 03:31:22 AM
Last edit: June 19, 2018, 03:41:39 AM by lordjulian
Merited by ABCbits (1)
 #1

Has anyone seen attempts at using deep neural network (DNN) training/validation as POW?

The basic idea is to let everyone solve a randomly picked (hashed) useful but difficult machine learning problem, of which there is a continuous supply, e.g., detecting the top 1000 wanted fugitives in hundreds of thousands of live public video streams.

POW can take the form of DNN model optimization, where completed work is submitted as a Dockerfile plus a model file containing the trained neural network weights and its configuration, and can be validated by anyone running Docker or Kubernetes with the agreed-upon training dataset and validation methodology.

The lowest achieved 10-fold cross-validated error that has not been surpassed within a fixed period of, say, 1 hour will be confirmed as the POW winner, and its submitter vested with the right to package transactions into the next block. This is reminiscent of the Netflix challenge, except that here the train/test data is open.
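Purely to illustrate the scoring rule, here is a minimal sketch (Python with scikit-learn; the config format and the MLP standing in for an arbitrary DNN are my placeholders, not a GDOC spec) of how any node might compute a submission's 10-fold cross-validated error:

Code:
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier

def cv_error(X, y, model_config, fold_seed=42):
    # Rebuild the submitted network from its configuration file
    model = MLPClassifier(**model_config)
    # A fixed fold seed means every validator derives identical folds
    folds = KFold(n_splits=10, shuffle=True, random_state=fold_seed)
    accuracy = cross_val_score(model, X, y, cv=folds).mean()
    return 1.0 - accuracy  # lowest cross-validated error wins the round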

Advantage of this approach:

1. ASIC resistant, because DNNs are too varied and complex, and require a full Docker image to compute/deploy
2. Achieves a greater good, doing useful work for humanity
3. Promotes machine learning / AI

We at GDOC (Global Data Ownership Chain) are contemplating this approach, but would like to solicit input from the powers that be.

Thank you for sharing your ideas and feedback.
andrew1carlssin
Jr. Member
*
Offline

Activity: 168
Merit: 3

#Please, read: Daniel Ellsberg - The Doomsday *wk


View Profile WWW
June 19, 2018, 04:41:38 AM
 #2

with backpropagation we need lots of pulling/pushing of weights ...

quick recap

For symbolists, all intelligence can be reduced to manipulating symbols, in the same way that a mathematician solves equations by replacing expressions by other expressions. Symbolists understand that you can’t learn from scratch: you need some initial knowledge to go with the data. They’ve figured out how to incorporate preexisting knowledge into learning, and how to combine different pieces of knowledge on the fly in order to solve new problems. Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible.

For connectionists, learning is what the brain does, and so what we need to do is reverse engineer it. The brain learns by adjusting the strengths of connections between neurons, and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly. The connectionists’ master algorithm is backpropagation, which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as to bring the output closer to what it should be.

Evolutionaries believe that the mother of all learning is natural selection. If it made us, it can make anything, and all we need to do is simulate it on the computer. The key problem that evolutionaries solve is learning structure: not just adjusting parameters, like backpropagation does, but creating the brain that those adjustments can then fine-tune.

Bayesians are concerned above all with uncertainty: how to learn from noisy, incomplete, and even contradictory information without falling apart. The solution is probabilistic inference, and the master algorithm is Bayes' theorem and its derivatives. Bayes' theorem tells us how to incorporate new evidence into our beliefs, and probabilistic inference algorithms do that as efficiently as possible.

For analogizers, the key to learning is recognizing similarities between situations and thereby inferring other similarities. If two patients have similar symptoms, perhaps they have the same disease. The key problem is judging how similar two things are. The analogizers’ master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions.

Domingos, Pedro
AAAI

Satoshi's book editor; SCIpher - https://pdos.csail.mit.edu/archive/scigen/scipher.html
MuskShing
Newbie
*
Offline

Activity: 8
Merit: 0


View Profile
June 19, 2018, 08:10:21 AM
 #3

Which phase of GDOC would the DNN be deployed in?
How much impact, and of what kind, would the DNN's deployment have on GDOC?

monsterer2
Full Member
***
Offline

Activity: 351
Merit: 134


View Profile
June 19, 2018, 08:11:34 AM
Merited by ABCbits (1)
 #4

We at GDOC (Global Data Ownership Chain) are contemplating this approach, but would like to solicit input from the powers that be.

Thank you for sharing your ideas and feedback.

Not easily achievable IMO. For PoW, you need two characteristics:

1) The solution must apply to data available 'within' the chain
2) Any proposed solution must be easily verifiable using only the solution itself

What you're proposing fails both of these requirements.
HeRetiK
Legendary
*
Online

Activity: 2912
Merit: 2079




View Profile
June 19, 2018, 09:32:57 AM
Merited by ABCbits (3)
 #5

monsterer2 and ETFbitcoin pretty much stated the core of the matter. To expand on what they pointed out:

1) How to provide viable problems in a decentralized manner? Just picking one up at random from a previously agreed-upon set is not enough -- who provides the set? How does the set get agreed upon?

2) Requiring the likes of Docker and Kubernetes to verify transactions adds quite an overhead for running nodes. This also opens up the question of how the datasets are provided to validating nodes in a tamper-proof and reliable way. Additionally, the datasets would increase the overhead for running nodes even further.

3) It seems like you are suggesting block times of 1 hour which, given the flak Bitcoin occasionally gets for its 10 minutes, would definitely need to get reduced if such a cryptocurrency were to gain any form of traction.

4) How to keep block times steady? How to reliably know when 1 hour has passed without having to rely on an external, centralized oracle? Traditional PoW can easily quantify how much work is to be put into a block to keep block intervals steady. Time is derived from the timestamp of said blocks, without an external time source. How to quantify how much deep-learning PoW is to be put into a block?

lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 19, 2018, 11:34:07 AM
 #6

Which phase of GDOC would the DNN be deployed in?
How much impact, and of what kind, would the DNN's deployment have on GDOC?



We are thinking of incorporating it into the POW itself, but this is still just an idea.
lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 19, 2018, 11:40:13 AM
Last edit: June 19, 2018, 12:06:02 PM by lordjulian
 #7

Domingos, Pedro
AAAI

Thanks for the nice synopsis of the state of the art by Domingos. A probabilistic ensemble (the evolutionaries) of:

symbolism and connectionism (deep learning)

would make a formidable system, with the weights tuned by the analogizers' similarity approach.

An example is a chatbot, which can output useful facts if it has knowledge (rule-based symbolism, e.g., "What time do you get off work?", "Where do you live? Shanghai? Oh, I have a friend in Shanghai working in the Zhangjiang area", etc.), but can also banter with you on pleasantries.
lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 19, 2018, 11:48:53 AM
 #8


Not easily achievable IMO. For PoW, you need two characteristics:

1) The solution must apply to data available 'within' the chain
2) Any proposed solution must be easily verifiable using only the solution itself

What you're proposing fails both of these requirements.

Agreed. Not easy to achieve. So far only Primecoin does something useful, to the best of my knowledge.

1) Suppose we put 1000 different datasets on IPFS, and a public chain like GDOC randomly picks a dataset to start the deep neural net training. Training can take anywhere between a few seconds and a few days, but the idea is to let whoever achieves the smallest error in the shortest period of time win the bounty. We would have to define the bounty criterion as, perhaps:

achieve a verifiable accuracy 10% better than 50% of all results out there, and you walk away with the bounty

2) Verification of a DNN is relatively slow (compared to SHA, perhaps 100-1000 times slower), but can still be done in a matter of seconds. A DNN is slow to train, but validating a trained model on data is very fast (seconds for thousands of samples). Verification does require the dataset, again from IPFS, so making the dataset readily accessible is important.
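As a rough illustration of point 2 (plain SHA-256 stands in for IPFS content addressing here, and the names are mine, not any spec):

Code:
import hashlib

# The chain records only the dataset's content hash; any node fetches
# the bytes from IPFS and checks them against that on-chain reference.
def verify_dataset(raw_bytes, expected_hash):
    return hashlib.sha256(raw_bytes).hexdigest() == expected_hash

# Validation is then just a forward pass over the verified data:
#   model.fit(X_train, y_train)   # slow: seconds to days
#   model.predict(X_val)          # fast: seconds for thousands of samples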
lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 19, 2018, 12:04:52 PM
Merited by ABCbits (1)
 #9

Using computational power for science and technology research is a good idea, instead of only calculating/searching for "meaningless" hashes. However, your idea won't work, since there's no way to submit the task and verify the result without the help of a third/centralized party.

Even if you find a way to make the data required for training available on the network while keeping decentralization, it would sacrifice decentralization/scaling, since the storage required for the training data and the computational power needed to verify the result would be far bigger than for any PoW algorithm available today.

I share your views on doing useful work with computational power.

1) "miners" will periodically broadcast their model (very small file of weights), anyone can validate the results (with IPFS data) in a few seconds, and come up with a number/metric. No 3rd party is needed.

2) data will be submitted to a distributed repository like IPFS, monitored by a public chain, with "miners" randomly picking a dataset of their liking to train. Easy datasets are highly competitive. Hard (to learn) datasets are more challenging, so the dataset with the most number of miners picked (simple majority) could become the candidate dataset for the next block.

storage need not be centralized, as the dataset will have a hash. picking an obscure dataset to train may not win the miner any reward, so miner will monitor the broadcast stream to figure out what dataset everyone is training on.

Miners will periodically broadcast their current model result (10 fold Cross-validated results on the dataset), anyone can easily validate it. And the winner whose lead margin exceeds 50% of all miners by 10% will win the current round, and gets to seal the block.
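To make the winner rule concrete, here is a sketch of one possible reading of it (beating the median reported error by at least 10%; the exact rule would have to be pinned down in the protocol, so treat this as illustrative):

Code:
import statistics

# reported: {miner_id: 10-fold cross-validated error}, from broadcasts
def pick_winner(reported):
    best_id = min(reported, key=reported.get)
    median_err = statistics.median(reported.values())
    # "beats 50% of all miners by 10%": at least 10% below the median
    if reported[best_id] <= 0.9 * median_err:
        return best_id
    return None  # no clear lead yet; the round continues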
lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 19, 2018, 12:34:36 PM
Merited by ABCbits (1)
 #10

monsterer2 and ETFbitcoin pretty much stated the core of the matter. To expand on what they pointed out:

1) How to provide viable problems in a decentralized manner? Just picking one up at random from a previously agreed-upon set is not enough -- who provides the set? How does the set get agreed upon?

2) Requiring the likes of Docker and Kubernetes to verify transactions adds quite an overhead for running nodes. This also opens up the question of how the datasets are provided to validating nodes in a tamper-proof and reliable way. Additionally, the datasets would increase the overhead for running nodes even further.

Datasets will have a hash signature, so every competing miner must work on the dataset with the same signature; otherwise it is a different dataset.

3) It seems like you are suggesting block times of 1 hour which, given the flak Bitcoin occasionally gets for its 10 minutes, would definitely need to get reduced if such a cryptocurrency were to gain any form of traction.

4) How to keep block times steady? How to reliably know when 1 hour has passed without having to rely on an external, centralized oracle? Traditional PoW can easily quantify how much work is to be put into a block to keep block intervals steady. Time is derived from the timestamp of said blocks, without an external time source. How to quantify how much deep-learning PoW is to be put into a block?


1) Problems/datasets will be contributed by the populace. Selection of problems will also be decided by the miners; each miner picks a problem he is interested in working on and broadcasts his progress periodically. Picking simple problems risks immense competition; picking difficult problems makes it easier to make progress if the miner has good hardware. The dataset picked by the majority of the network will become the dataset for sealing the next block, and competition will begin, with a check on everybody's progress every 1 minute, for example (see the sketch at the end of this post).

2) I agree, Kubernetes is a bit of an overkill; probably a Docker image running on a node is sufficient to do the POW or validation. Validation should be enforced in the protocol: given a DNN model weight file, a Dockerfile, and an IPFS address for the dataset, anyone should be able to validate the results in tens of seconds. Note that testing a DNN is far faster than training it.

Training can take anywhere from seconds to days, so the goal is not to finish the training but to let the fastest runner/leaper win the bounty of each round. Each round could be 1 minute: see who beats 50% of the competition by a 10% margin in terms of 10-fold cross-validated classification error or mean squared error (enforced by the protocol, with a fixed random key for the folds), and the competition continues in the next round to pick the next winner.

As DNN training converges, the margin of lead will diminish, until there is no winner with a clear lead; then a new dataset will be selected and the next themed competition begins.

3) You are right, I should decrease the period to 1 minute, but then collecting all the reported results from the network may take time.

4) As mentioned above, we can just run a progress check every minute, and the best-progressing learning node wins the round for that time period.
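Here is the sketch promised in point 1: a minimal illustration (Python; the broadcast format and names are my own assumptions) of how the plurality pick could be computed from the broadcast stream:

Code:
from collections import Counter

# picks: one dataset content hash per miner broadcast in this round
def elect_dataset(picks):
    (dataset_hash, votes), = Counter(picks).most_common(1)
    return dataset_hash  # the plurality pick becomes the block's dataset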
monsterer2
Full Member
***
Offline

Activity: 351
Merit: 134


View Profile
June 19, 2018, 01:23:37 PM
Merited by ABCbits (1)
 #11


Not easily achievable IMO. For PoW, you need two characteristics:

1) The solution must apply to data available 'within' the chain
2) Any proposed solution must be easily verifiable using only the solution itself

What you're proposing fails both of these requirements.

Agreed. Not easy to achieve. So far only Primecoin does something useful, to the best of my knowledge.

1) Suppose we put 1000 different datasets on IPFS, and a public chain like GDOC randomly picks a dataset to start the deep neural net training. Training can take anywhere between a few seconds and a few days, but the idea is to let whoever achieves the smallest error in the shortest period of time win the bounty. We would have to define the bounty criterion as, perhaps:

achieve a verifiable accuracy 10% better than 50% of all results out there, and you walk away with the bounty

2) Verification of a DNN is relatively slow (compared to SHA, perhaps 100-1000 times slower), but can still be done in a matter of seconds. A DNN is slow to train, but validating a trained model on data is very fast (seconds for thousands of samples). Verification does require the dataset, again from IPFS, so making the dataset readily accessible is important.


It can't be called a 'proof of work' if said proof is subjective. Using IPFS presents all kinds of attack scenarios on the validation and task data sets.
HeRetiK
Legendary
*
Online

Activity: 2912
Merit: 2079




View Profile
June 19, 2018, 01:27:16 PM
Merited by ABCbits (2)
 #12

1) Problems/datasets will be contributed by the populace. Selection of problems will also be decided by the miners; each miner picks a problem he is interested in working on and broadcasts his progress periodically. Picking simple problems risks immense competition; picking difficult problems makes it easier to make progress if the miner has good hardware. The dataset picked by the majority of the network will become the dataset for sealing the next block, and competition will begin, with a check on everybody's progress every 1 minute, for example.

How is it decided whether a dataset is "simple" or "difficult"? How to prevent a sybil attack on the dataset contribution process?


4) As mentioned above, we can just run a progress check every minute, and the best-progressing learning node wins the round for that time period.

How to determine how much time has passed, i.e. how to coordinate that the network checks every minute (and not any other arbitrary timeframe that may be beneficial to would-be adversaries)?

lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 19, 2018, 05:02:51 PM
 #13


Not easily achievable IMO. For PoW, you need two characteristics:

1) The solution must apply to data available 'within' the chain
2) Any proposed solution must be easily verifiable using only the solution itself

What you're proposing fails both of these requirements.

Agreed. Not easy to achieve. So far only Primecoin does something useful, to the best of my knowledge.

1) Suppose we put 1000 different datasets on IPFS, and a public chain like GDOC randomly picks a dataset to start the deep neural net training. Training can take anywhere between a few seconds and a few days, but the idea is to let whoever achieves the smallest error in the shortest period of time win the bounty. We would have to define the bounty criterion as, perhaps:

achieve a verifiable accuracy 10% better than 50% of all results out there, and you walk away with the bounty

2) Verification of a DNN is relatively slow (compared to SHA, perhaps 100-1000 times slower), but can still be done in a matter of seconds. A DNN is slow to train, but validating a trained model on data is very fast (seconds for thousands of samples). Verification does require the dataset, again from IPFS, so making the dataset readily accessible is important.


It can't be called a 'proof of work' if said proof is subjective. Using IPFS presents all kinds of attack scenarios on the validation and task data sets.

Proof is not subjective, but determined by a collection of results, and the one with the smallest error wins.

You are right, IPFS does open a Pandora's box of attack scenarios. Maybe models should only be trained on datasets that meet certain characteristics, like recency, age, size, or transaction fee?
lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 19, 2018, 05:17:50 PM
 #14

1) Problems/datasets will be contributed by the populace. Selection of problems will also be decided by the miners; each miner picks a problem he is interested in working on and broadcasts his progress periodically. Picking simple problems risks immense competition; picking difficult problems makes it easier to make progress if the miner has good hardware. The dataset picked by the majority of the network will become the dataset for sealing the next block, and competition will begin, with a check on everybody's progress every 1 minute, for example.

How is it decided whether a dataset is "simple" or "difficult"? How to prevent a sybil attack on the dataset contribution process?

Evaluation of difficulty will depend on the individual miner's experience after reading the dataset's metadata and the existing reported results (if any), which are also recorded on the blockchain.

Sybil attacks can be discouraged by requiring each dataset that has a defined problem to come with an attached processing fee. X% of miners could collude to pick one fake dataset and pretend to do work on it, where X is simply the share of (colluding) miners behind the most-picked dataset. But in the end, only one winner earns the reward. And this does not prevent the remaining miners from also working on it, possibly with better algorithms or experience, and they may eventually win the competition.

This leads to another problem: if the dataset is artificially generated with a known formula or DNN by the colluder, then he already knows the answer (a perfect-fit model), so he can guarantee a win. But he must then control enough miners to make this dataset the chosen one for the next block, so in effect he will have to gather on average 30% of the node power in order to fake the win.

4) As mentioned above, we can just run a progress check every minute, and the best-progressing learning node wins the round for that time period.

How to determine how much time has passed, i.e. how to coordinate that the network checks every minute (and not any other arbitrary timeframe that may be beneficial to would-be adversaries)?


This timing will have to be solidified into the protocol's design as a fixed number, e.g. 2 minutes.

If the majority-picked dataset is too simple, such that training completes (converges) before the 2 minutes have elapsed, then there will be no clear winner; the next dataset is then picked as the candidate dataset and everything restarts?
monsterer2
Full Member
***
Offline

Activity: 351
Merit: 134


View Profile
June 19, 2018, 05:36:39 PM
 #15

Proof is not subjective, but determined by a collection of results, and the one with the smallest error wins.

You are right, IPFS does open a Pandora's box of attack scenarios. Maybe models should only be trained on datasets that meet certain characteristics, like recency, age, size, or transaction fee?

But the datasets containing the verification and the proof are external to the chain - the integrity of this external data is not guaranteed, therefore this entire process is subjective; you have to trust that it is valid, which is the antithesis of cryptocurrency.
HeRetiK
Legendary
*
Online

Activity: 2912
Merit: 2079




View Profile
June 19, 2018, 06:15:37 PM
Merited by ABCbits (1)
 #16

Evaluation of difficulty will depend on the individual miner's experience after reading the dataset's metadata and the existing reported results (if any), which are also recorded on the blockchain.

How to prevent sybil attacks on the evaluation process? Or is reading and evaluating a dataset also subject to a processing fee?

How would existing reported results help in determining the difficulty of a challenge?

What metric is used to objectively define the difficulty of a challenge to begin with?


Sybil attacks can be discouraged by requiring each dataset that has a defined problem to come with an attached processing fee. X% of miners could collude to pick one fake dataset and pretend to do work on it, where X is simply the share of (colluding) miners behind the most-picked dataset. But in the end, only one winner earns the reward. And this does not prevent the remaining miners from also working on it, possibly with better algorithms or experience, and they may eventually win the competition.

How to bootstrap such a cryptocurrency if mining a block requires a processing fee? I.e. where do the first miners get the coins to pay the processing fee if no coins have yet been mined?

How to determine if a dataset is fake?

Note that when referring to sybil attacks I'm not even yet talking about the mining process. I'm talking about the dataset contribution process that happens beforehand, where no miner is yet involved.

Or are you suggesting that the miners solving the challenges should also be the ones contributing the datasets, each dataset submission being attached to a fee?


This leads to another problem: if the dataset is artificially generated with a known formula or DNN by the colluder, then he already knows the answer (a perfect-fit model), so he can guarantee a win. But he must then control enough miners to make this dataset the chosen one for the next block, so in effect he will have to gather on average 30% of the node power in order to fake the win.

How did you reach the conclusion that 30% of computational power would be sufficient for faking a challenge win? 30% seems an awfully low threshold for maintaining security.


This timing will have to be solidified into the protocol's design as a fixed number, e.g. 2 minutes.

If the majority-picked dataset is too simple, such that training completes (converges) before the 2 minutes have elapsed, then there will be no clear winner; the next dataset is then picked as the candidate dataset and everything restarts?

How to stop rogue clients from DDoSing the network by flooding it with wrong timestamps and turning 2 minute block intervals into 2 days or weeks or years?

lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 20, 2018, 02:02:40 AM
Last edit: June 20, 2018, 03:07:28 AM by lordjulian
 #17

Proof is not subjective, but determined by a collection of results, and the one with the smallest error wins.

You are right, IPFS does open a Pandora's box of attack scenarios. Maybe models should only be trained on datasets that meet certain characteristics, like recency, age, size, or transaction fee?

But the datasets containing the verification and the proof are external to the chain - the integrity of this external data is not guaranteed, therefore this entire process is subjective; you have to trust that it is valid, which is the antithesis of cryptocurrency.

Hmmm, in that case, how about putting a portion (10K samples) of the validation dataset on chain, rendering them immutable, at the start of each contest epoch?

A simple way to validate data integrity is to check the data signature hash.
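One way to keep that on-chain portion compact (a sketch of a standard technique, not something we have specified for GDOC): commit a Merkle root of the validation samples, and check any sample against it.

Code:
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

# Store only this 32-byte root on chain; the 10K raw samples can live
# off-chain, each one verifiable against the immutable root.
# Assumes a non-empty list of samples given as bytes.
def merkle_root(samples):
    layer = [h(s) for s in samples]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # duplicate the odd leaf out
        layer = [h(layer[i] + layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return layer[0]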
lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 20, 2018, 02:40:38 AM
Last edit: June 20, 2018, 09:25:55 AM by lordjulian
 #18

Evaluation of difficulty will depend on the individual miner's experience after reading the dataset's metadata and the existing reported results (if any), which are also recorded on the blockchain.

How to prevent sybil attacks on the evaluation process? Or is reading and evaluating a dataset also subject to a processing fee?

How would existing reported results help in determining the difficulty of a challenge?

What metric is used to objectively define the difficulty of a challenge to begin with?


1) The evaluation process is similar to validating the hash signature of hashcash: there is zero cost in evaluation, and as long as 51% of the miners are incentivized to be honest, the evaluation will be sound.

2) Existing reported results could be the results of previous epochs, e.g.:

dataset 236 (with hash signature fadc432ad.... and random seed 342)

epoch   datetime stamp   winner's MSE   winning margin over runner-up
1       10:33            0.233          0.20
2       10:34            0.210          0.15
3       10:35            0.200          0.12
....

3) Metric = 10-fold cross-validated prediction/classification error on dataset 236 using random seed 342 (the random seed is used to generate the same K=10 equal-sized partitions of dataset 236 on every node; see the sketch after this list).

3a) Will someone manipulate the metric for his own benefit? Yes, but only if the manipulated results help his own submission, and his submission results will be validated by everyone else.

3b) At the beginning of the challenge, with zero reported results, miners can run a quick training to get the results of the first few epochs and get a feel for the problem's difficulty.

3c) Experts in the field, e.g. computer vision, will know how difficult the problem is after looking at the dataset.
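Here is the sketch mentioned in point 3 (scikit-learn is my choice for illustration; any deterministic partitioner would do). Recording the seed with the dataset makes the folds, and hence the reported error, reproducible on every node:

Code:
from sklearn.model_selection import KFold

# Seed 342 is recorded on chain alongside dataset 236's hash signature
folds = KFold(n_splits=10, shuffle=True, random_state=342)
# folds.split(X) now yields identical train/test partitions on every
# node for the same X, so a reported 10-fold CV error can be rechecked.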

Sybil attacks can be discouraged by requiring each dataset that has a defined problem to come with an attached processing fee. X% of miners could collude to pick one fake dataset and pretend to do work on it, where X is simply the share of (colluding) miners behind the most-picked dataset. But in the end, only one winner earns the reward. And this does not prevent the remaining miners from also working on it, possibly with better algorithms or experience, and they may eventually win the competition.

How to bootstrap such a cryptocurrency if mining a block requires a processing fee? I.e. where do the first miners get the coins to pay the processing fee if no coins have yet been mined?

How to determine if a dataset is fake?

Note that when referring to sybil attacks I'm not even yet talking about the mining process. I'm talking about the dataset contribution process that happens beforehand, where no miner is yet involved.

Or are you suggesting that the miners solving the challenges should also be the ones contributing the datasets, each dataset submission being attached to a fee?

1) Bootstrapping is similar to how Bitcoin started: no miners, easy to control 51% of the hash power, and the value of the coin is low. The pioneers will have to bootstrap the chain by giving out coins, etc., or follow in the footsteps of ETH. We need to do research on how these two got started.

2) Yes, authentic dataset creation is a tough problem. Dataset contribution can be constrained as follows:

only aggregate datasets are considered, e.g., 100 individual user profile pictures contributed by 100 users. This will increase the sybil attack cost. The mining problem could be to classify the user pictures into male/female, different races, etc.

3) A fake dataset contributed by a single person is hard to detect. That's why I suggested the above. Someone posting a dataset and solving it himself would not reap benefits unless he mobilizes an army of miners, as described earlier.

This leads to another problem: if the dataset is artificially generated with a known formula or DNN by the colluder, then he already knows the answer (a perfect-fit model), so he can guarantee a win. But he must then control enough miners to make this dataset the chosen one for the next block, so in effect he will have to gather on average 30% of the node power in order to fake the win.

How did you reach the conclusion that 30% of computational power would be sufficient for faking a challenge win? 30% seems an awfully low threshold for maintaining security.

30% is just a random number, assuming a free-form nomination process: the highest-voted candidate gets 30% of the votes, the second highest 25%, the 3rd 15%, etc.


How to stop rogue clients from DDoSing the network by flooding it with wrong timestamps and turning 2 minute block intervals into 2 days or weeks or years?


By attaching a cost/fee to each broadcasted result?
lordjulian (OP)
Newbie
*
Offline

Activity: 15
Merit: 3


View Profile
June 20, 2018, 02:49:06 AM
Last edit: June 20, 2018, 03:02:42 AM by lordjulian
 #19

Has anyone seen attempts at using deep neural network (DNN) training/validation as POW?


Sergey Surkov proposed a more in-depth breakdown of doing SGD (Stochastic Gradient Descent) on mini-batches of data as POW:

https://medium.com/@sergey_surkov/random-a-marriage-between-crypto-and-ai-1a3f4aa752cad

I believe he was the initiator of a similar thread on bitcointalk.org:

https://bitcointalk.org/index.php?topic=2240148.0
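The gist, as I read it (a hedged sketch, not Surkov's exact scheme; a linear model stands in for a real DNN): one unit of work is an SGD step on a mini-batch, and the hash of the resulting weights is a fingerprint that anyone holding the same batch can recompute.

Code:
import hashlib
import numpy as np

def sgd_step_proof(w, X_batch, y_batch, lr=0.01):
    # Least-squares gradient for a linear model (illustrative only)
    grad = 2.0 * X_batch.T @ (X_batch @ w - y_batch) / len(y_batch)
    w_new = w - lr * grad
    # Deterministic fingerprint of the work, re-computable by a verifier
    proof = hashlib.sha256(w_new.tobytes()).hexdigest()
    return w_new, proof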
monsterer2
Full Member
***
Offline

Activity: 351
Merit: 134


View Profile
June 20, 2018, 09:04:22 AM
 #20

Proof is not subjective, but determined by a collection of results, and the one with the smallest error wins.

You are right, IPFS does open a Pandora's box of attack scenarios. Maybe models should only be trained on datasets that meet certain characteristics, like recency, age, size, or transaction fee?

But the datasets containing the verification and the proof are external to the chain - the integrity of this external data is not guaranteed, therefore this entire process is subjective; you have to trust that it is valid, which is the antithesis of cryptocurrency.

Hmmm, in that case, how about putting a portion (10K samples) of the validation dataset on chain, rendering them immutable, at the start of each contest epoch?

A simple way to validate data integrity is to check the data signature hash.

You need both the validation and proof data sets on chain for this to work objectively. And worse yet, they need to be generated by the blockchain itself, not by some external party; otherwise you have more attack angles to worry about.