Weekly Consolidation Update 2 Containing Communications Direct From XAI Dev:
Previous consolidation (Week 1):
https://bitcointalk.org/index.php?topic=864895.msg10301510#msg10301510
There has been much progress since last week, most notably work towards an update that is due to be released very soon.
The following is a range of information that JoeMoz (XAI developer) has provided through various channels, including Slack.
In relation to comments about him listing SharePoint on his LinkedIn profile:
SharePoint just happens to be one thing I am an expert in... and focused on for a couple of years because frankly it's the highest billable rates :wink:
Age-old open source / closed source dilemma continues:
I mean the tech we are doing is sufficiently advanced that anyone who would try to rip it off right out of the gate would probably wind up pushing broken stuff.
From an open source perspective, it would probably be a valuable contribution to the community at large, like DHT, but I remember with XQN, within a few days there were a couple of clone coins with the profit explorer graph feature etc.
Choosing a Name for the Data Layer:
I'm thinking of ditching the treespaces name I came up with, because it generates confusion with the biology stuff when you google it. I think I am going to call the entire data layer PlumeDB, because it is something that can be broken out and marketed for a lot more than just the AI stuff, and it is essentially a full blown decentralized database engine running on top of the coin network, funneling network communications over the bitcoin p2p protocol.
Where does the name plume come from?
I thought it sounded cool! I was thinking of clouds of data, so "databases" are plumes in the cloud.
More notes on fee structure:
The way I am doing fees right now is... it is based on data reservations by kilobyte-hour, e.g. 0.0001 XAI/kbh. When you initialize a new data plume, you put out a request for data reservations based on the estimated size of the database and the lifetime you want. You get responses from volunteer/available slave nodes that are willing to replicate your data index, and you have to pay the total kbh fee to each of the slaves by default.
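To make the kilobyte-hour arithmetic concrete, here is a minimal sketch of how a reservation fee could be estimated from the figures quoted above. The function name and parameters are hypothetical and are not part of the Sapience codebase; the 0.0001 XAI/kbh rate is only the example value from the quote, and the sketch assumes every slave is paid the full kbh fee.

```python
from decimal import Decimal

# Example rate quoted above; the real network value may differ.
RATE_XAI_PER_KBH = Decimal("0.0001")


def reservation_fee(size_kb, lifetime_hours, slave_count, rate=RATE_XAI_PER_KBH):
    """Estimate the total XAI owed for a data plume reservation.

    Hypothetical helper: assumes each slave receives the full
    kilobyte-hour fee, as described in the quote.
    """
    if size_kb < 0 or lifetime_hours < 0 or slave_count < 0:
        raise ValueError("size, lifetime and slave count must be non-negative")
    # Kilobyte-hours reserved on a single slave.
    kbh = Decimal(str(size_kb)) * Decimal(str(lifetime_hours))
    per_slave = kbh * rate
    return per_slave * slave_count


# 500 KB reserved for 24 hours on 8 slaves:
# 12,000 kbh per slave, 1.2 XAI per slave, 9.6 XAI in total.
print(reservation_fee(500, 24, 8))
```

Decimal is used instead of float so that small per-kbh rates do not pick up rounding noise.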
The actual data itself at a low level is distributed across all nodes via a filesystem abstracted into DHT, and all nodes participate. For public data sets, you don't pay for data reservations, but consumers pay-per-use.
So, I am thinking maybe there should be a small fee for loading public data, as an anti-spam type of measure. The other thing I am thinking is that maybe all Atom data types (OpenCog atomspaces) should be public, to keep contributing to a large body-of-knowledge. Seems maybe pointless to load Atoms that are private... hmm, so who should get the fee for public data loads... maybe the first relay node.
How is data loaded onto the Sapience distributed AI network?
You load data either through RPC or the console.
More in-depth information related to XAI and PlumeDB:
http://wiki.dfx.io/display/XAI/Sapience+AIFX+Home
http://wiki.dfx.io/display/XAI/PlumeDB
Hint on Potential New Look:
I'm doing something cool with it and doing the UI in QML/Qt Quick. Eventually I want to rewrite the entire wallet using it and ditch the existing hokey interface. It'll let us get a "responsive" UI on Android and other devices, so stuff isn't sitting off the side of the screen etc., and get a wallet that actually looks like a modern app and not something from 1998.
Further development related comments:
Although in the latest sapience source I have moved the leveldb dependency out from being code included in the source to being an external dependency, so I can use the latest Google leveldb from GitHub and cross-compile for Android easily. It just means an extra step: you have to pull and compile leveldb separately and set the include/lib path. I'm trying to get it so I can use one .pro file. I suppose on Linux you could just install the libleveldb-dev package and do it that way, same as libboost-all-dev etc.
I should note that this is a test/beta build... so there's definitely TODOs etc. There's under-the-hood tweaking and stuff that we might want to adjust. For example, by default the low level DHT that is just doing mindless raw data storage tries to get a penetration of 72 nodes for a given record, but I don't know if we even have 72 live nodes on the network, let alone getting them on testnet. And there are rebuild scenarios, like what if all 8 slave nodes go down: having the network detect that and automatically assign replacements and rebuild trie indexes, etc.
There's really like 4 overlay networks running on top of each other at once... the low level DHT, a mapping & rebuild info DHT, the slave PHTs, and the master/originator. The biggest thing we'll have to play around with in testing is seeing what the latency is like. I know there will be latency, because it is a p2p database, so for some use cases it might not be suitable; for others it might mean just adapting how you work with the data.
The more nodes the better... if we had a couple thousand nodes on the network, for instance, you could do better load balancing. But just in general, basically _everything_ in the entire system is async.
It's just different, I guess, from anything I've worked with to date at least :wink: It should enable new scenarios. Actually, a good analogy is that using it is more like hitting web services asynchronously, instead of direct database access... but in exchange, you get the redundancy/massive scale-out/decentralization.
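As a rough illustration of the "hitting web services asynchronously" analogy, the sketch below fires several lookups at once and waits for all of them, rather than blocking on each one as you would with direct database access. fetch_record is a hypothetical stand-in that only simulates network latency; it is not a real PlumeDB or Sapience call.

```python
import asyncio


async def fetch_record(key: str, latency_s: float = 0.01) -> str:
    """Hypothetical lookup; the sleep simulates a p2p round trip."""
    await asyncio.sleep(latency_s)
    return f"value-for-{key}"


async def fetch_many(keys):
    # Issue every request up front, then await them together, so total
    # wait time is close to the slowest lookup, not the sum of all of them.
    return await asyncio.gather(*(fetch_record(k) for k in keys))


results = asyncio.run(fetch_many(["a", "b", "c"]))
print(results)
```

The point is the calling pattern: the caller has to be written around pending results, which is the adaptation the quote refers to.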
Is there any way for the system to determine which nodes are closest?
Well, there isn't any geolocation/location based proximity right now... but that is something I've been thinking about.
Uniqueness?
As far as I know, this is the only (DHT) implementation that is running over the bitcoin p2p protocol. Even the MaidSafe DHT is a parallel/external implementation running over UDP.
Is that useful for anything outside of AI?
There's like a million other things besides AI you can build on top of a decentralized DHT.
Process:
The way I did it is each key, in addition to a hash, can have 3 attributes, and those get indexed by the slaves in PHTs, so that is the value-added service you are getting from the slaves in exchange for the XAI/kilobyte-hour fee. It's key/value storage on the low raw DHT level but with 3 attribute components in the key. So like let's say I want to run a distributed aggregate range query to do a SUM across the data plume where attribute 2 is between A and B... the slaves provide the service of fast lookup to get the subset of infohashes that fall within the criteria, then you use those against DHT2 to get the possible nodes that have each one, that are then looked up against DHT1 to retrieve the individual records/values from the raw key/value store. So with something like a SUM, your slave can pull the subset of infohashes and then chunk them out into groups of say 100 and dole those out as individual compute operations across nodes, and then the results are concentrated and returned to the originator.
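The query flow described above can be mocked up in a single process to show how the pieces fit together. This is only a sketch under loud assumptions: the Plume class, its method names, and the use of a sorted list in place of a real PHT are all hypothetical, everything runs in memory with no networking, and SHA-256 of the key is assumed as the infohash.

```python
import hashlib
from bisect import bisect_left, bisect_right, insort

CHUNK_SIZE = 100  # group size mentioned in the quote


def infohash(key: str) -> str:
    # Assumption: the infohash is a SHA-256 digest of the key.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def chunked(items, size=CHUNK_SIZE):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


class Plume:
    """In-memory stand-in for one data plume (hypothetical API)."""

    def __init__(self):
        self.dht1 = {}    # raw key/value store: infohash -> value
        self.dht2 = {}    # mapping layer: infohash -> nodes holding it
        self.attr2 = []   # slave index: sorted (attribute 2, infohash) pairs

    def put(self, key, attrs, value, nodes):
        h = infohash(key)
        if h in self.dht1:
            raise KeyError("this sketch does not handle updates")
        self.dht1[h] = value
        self.dht2[h] = list(nodes)
        insort(self.attr2, (attrs[1], h))  # attrs[1] is "attribute 2"
        return h

    def range_hashes(self, lo, hi):
        # Slave service: infohashes whose attribute 2 lies in [lo, hi].
        start = bisect_left(self.attr2, (lo, ""))
        end = bisect_right(self.attr2, (hi, "\uffff"))
        return [h for _, h in self.attr2[start:end]]

    def _fetch(self, h):
        # Resolve holders via DHT2, then read the value from DHT1.
        if not self.dht2[h]:
            raise LookupError("no node holds this record")
        return self.dht1[h]

    def distributed_sum(self, lo, hi):
        hashes = self.range_hashes(lo, hi)
        # Each chunk stands for one compute operation handed to a node.
        partials = [sum(self._fetch(h) for h in chunk)
                    for chunk in chunked(hashes)]
        # Partial results are combined and returned to the originator.
        return sum(partials)


plume = Plume()
for i in range(250):
    plume.put(f"k{i}", (0, i, 0), i, nodes=["n1"])
print(plume.distributed_sum(10, 19))  # 145
```

In a real deployment each chunk would go to a different node and come back asynchronously; here the list comprehension just plays that role sequentially.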
DHT1 and DHT2 are just levels within the 4(?) DHTs in PlumeDB?
Layered DHTs... yeah, sort of. I mean, for something like a torrent it's pretty trivial, so a basic k/v DHT works fine, but as soon as you want to do anything more involved, you need more metadata etc., so you layer it on top. The raw DHT gives you the redundancy/resilience, etc. and can just focus on getting the values where they need to be.
In response to a question on slaves:
Slaves are responsible for building that multi-rooted PHT. The DHT only knows about hash256 + value... there's TODOs I'm probably not going to get to for the release tonight (today), like being able to configure how much of your free space you want to allocate, so don't go loading gigantic data sets.