theymos (OP)
Administrator
Legendary
Offline
Activity: 5250
Merit: 13093
|
Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:
UID -> name, merit, potential activity, posts
post ID -> topic ID, time, UID
topic ID -> board ID, first post ID
board ID -> board name
I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.
|
1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
|
|
|
SFR10
Legendary
Offline
Activity: 3052
Merit: 3474
Crypto Swap Exchange
|
|
March 16, 2018, 05:00:58 AM |
|
What dumps would be most useful? Some that I was thinking of were:
UID -> name, merit, potential activity, posts
This one (the rest aren't that important). I hope accounts with 0 posts/activity are excluded (to avoid a massive file full of information that's not needed).
Can we get another weekly dump that tracks the positive/negative ratings (e.g. who sent each rating and to whom) and also shows ratings that someone has removed? (Credit goes to Vod, based on this thread.)
|
|
|
|
MadZ
|
|
March 16, 2018, 05:11:17 AM |
|
It might be helpful to have a continuous version of the seclog without having to rely on archived pages.
|
|
|
|
botany
Legendary
Offline
Activity: 1582
Merit: 1064
|
|
March 16, 2018, 05:33:12 AM |
|
Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:
UID -> name, merit, potential activity, posts
post ID -> topic ID, time, UID
topic ID -> board ID, first post ID
board ID -> board name
I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.
Modlog definitely.
|
|
|
|
Quickseller
Copper Member
Legendary
Offline
Activity: 2926
Merit: 2347
|
|
March 17, 2018, 04:41:21 AM |
|
I might suggest dumping the post history of individual users/accounts. This could be restricted by rank and otherwise be rate limited. I think it would be difficult to recreate any meaningful mirror site with this information.
As others have mentioned, the security log would be beneficial. The mod log, not so much because of its limited information.
It would be helpful if users' outboxes (and other folders) could be downloaded, since they cannot be easily searched. Obviously, downloading this information would be restricted to users who are logged into their own account.
|
|
|
|
MyIota
Jr. Member
Offline
Activity: 41
Merit: 5
PM me to buy my sig space.
|
|
March 17, 2018, 05:17:39 AM |
|
Just FYI,
You can gauge how much sMerit someone has simply from the transparency of the system, so that is effectively a hidden field in the data dump.
You can calculate how much they've received versus how much they've sent... and from there you'll know how much sMerit they have left :/
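The calculation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the (sender UID, receiver UID, amount) tuple layout is hypothetical and not the documented format of merit.txt.xz, the "half of received merit becomes sMerit" rule is assumed, and any sMerit granted at the system's introduction is not visible in the transfers, so it has to be passed in separately.

```python
from collections import defaultdict


def estimate_smerit_left(transfers, initial_smerit=None):
    """Estimate unspent sMerit per user from (sender, receiver, amount) tuples."""
    initial_smerit = initial_smerit or {}
    received = defaultdict(int)
    sent = defaultdict(int)
    for sender, receiver, amount in transfers:
        sent[sender] += amount
        received[receiver] += amount
    users = set(received) | set(sent) | set(initial_smerit)
    # Assumption: every 2 merit received yields 1 spendable sMerit.
    return {
        uid: initial_smerit.get(uid, 0) + received[uid] // 2 - sent[uid]
        for uid in users
    }


# Made-up sample data: (sender UID, receiver UID, amount)
sample = [(1, 2, 4), (3, 2, 2), (2, 3, 1)]
estimate = estimate_smerit_left(sample, initial_smerit={1: 5, 3: 2})
print(estimate)
```

A negative estimate for a user would suggest an unknown starting balance or a merit source allocation rather than an error in the data.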
|
PM me if you're interested in my signature space.
|
|
|
LoyceV
Legendary
Offline
Activity: 3360
Merit: 16975
Thick-Skinned Gang Leader and Golden Feather 2021
|
|
March 17, 2018, 09:01:20 AM Last edit: March 17, 2018, 09:57:01 AM by LoyceV |
|
UID -> name, merit, potential activity, posts
I can think of a few:
1. Add "Activity" (not just "potential")
2. Add a banned-status to this list (ignore temporary bans)
3. Add either "merit earned" or "merit received for free at introduction"
Side note: there are more than 200 usernames with a comma, this will make processing a CSV difficult. Can you make this a file with just UID and name?
|
|
|
|
sncc
|
|
March 17, 2018, 02:37:35 PM |
|
It seems to me that some local boards do not have sufficient sMerit distribution, and it would be good to verify that directly from a data dump, which would help in designing an appropriate distribution of merit sources. It would be useful to have
post ID, topic ID, board ID, merit
and check how active each local board is and whether sufficient sMerit is distributed there. Of course, spam and low-quality posts will be counted too, but I assume they are roughly proportional to the total number of posts.
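A per-board check like the one described above might look like this minimal sketch. It assumes the dump would arrive as rows of (post ID, topic ID, board ID, merit); that layout, the function name, and the sample values are hypothetical.

```python
from collections import defaultdict


def merit_by_board(rows):
    """Aggregate post count and merit per board from (post_id, topic_id, board_id, merit) rows."""
    posts = defaultdict(int)
    merit = defaultdict(int)
    for _post_id, _topic_id, board_id, post_merit in rows:
        posts[board_id] += 1
        merit[board_id] += post_merit
    return {
        board: {
            "posts": posts[board],
            "merit": merit[board],
            "merit_per_post": merit[board] / posts[board],
        }
        for board in posts
    }


# Made-up sample rows: board 1 receives merit, board 2 receives none.
rows = [(101, 10, 1, 0), (102, 10, 1, 3), (103, 11, 2, 0), (104, 11, 2, 0)]
stats = merit_by_board(rows)
for board, values in sorted(stats.items()):
    print(board, values)
```

Boards with many posts but a merit-per-post figure near zero would be the candidates for additional merit sources.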
|
|
|
|
DdmrDdmr
Legendary
Offline
Activity: 2366
Merit: 10871
There are lies, damned lies and statistics. MTwain
|
|
March 17, 2018, 05:01:52 PM |
|
It really all comes down to what needs to be found out. That is, building a set of questions that need to be answered and deriving the raw data that enables an aggregated or derived dataset to be queried for the answers.
Some questions are answerable by a snapshot of the data, whilst others require the inclusion of a timeframe and datestamps to resolve.
For example, in order to see how long it takes members to rank up, we would need the whole history of rank changes per UserId, <UserId, Rank, Activity, Date>, where a record would only need to be created when a user is created or the Rank changes, with Date being the associated timestamp. If we wanted to see this in relation to Merit, we would need to build a record in the shape of <UserId, Rank, Activity, Merit, InitialMerit, Date>.
The other key factor is the way in which the data is currently stored. The raw data layout and capture process is part of the process of reaching our goal. For example, if there is a trigger in the database that currently logs changes on the User table with the <UserId, Rank, Activity, Date> record structure, the underlying table can be used directly, and all that has to be done, once exported, is to select the records that relate to a change in a user’s rank (and ignore those that are a mere activity change).
If, alas, the underlying user table does not hold a historical record of changes (i.e. no timestamped history is logged), then the question of how long it takes to rank up would not be answerable, or the table would need to be cross-referenced with raw data from another table.
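If such a change log did exist, selecting only the rank-change records could be sketched as below. The (UserId, Rank, Activity, Date) tuple layout follows the record structure described above, but the ISO date strings, the function name, and the sample log are assumptions made for illustration.

```python
def rank_changes(records):
    """Keep only the records where a user's rank first appears or changes."""
    last_rank = {}
    changes = []
    # Sort by user, then by date, so each user's history is read in order.
    for user_id, rank, activity, date in sorted(records, key=lambda r: (r[0], r[3])):
        if last_rank.get(user_id) != rank:
            changes.append((user_id, rank, activity, date))
            last_rank[user_id] = rank
    return changes


# Made-up log: two of the four rows are mere activity changes.
log = [
    (7, "Newbie", 0, "2018-01-01"),
    (7, "Newbie", 14, "2018-01-15"),
    (7, "Jr. Member", 30, "2018-02-12"),
    (7, "Jr. Member", 44, "2018-02-26"),
]
changes = rank_changes(log)
print(changes)
```

The time needed to rank up is then the difference between the dates of consecutive records for the same user.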
Questions that I would boldly put on the list due to the introduction of sMerit include:
- What is the average time per Rank to rank up, before and after the introduction of the Merit system? (This is not entirely comparable yet, since the merit system is only a few months old, so the top Ranks cannot be compared yet.)
- How much sMerit is assigned per rank (from/to), per forum section, per forum subsection, in relation to the number of posts in the topic, in relation to how hot the topic is, in relation to the post's position in the topic (quartiles, for example), in relation to the size of the merited post, etc.
- How much sMerit is being withheld and for how long (averages).
- Round-trip merit assignment candidates (from User A to User B and back). That is derivable from the current merit.txt file, as I’ve posted previously; it is not necessarily a cheat, but a source of study for such cases.
The match between a closed set of key questions to answer and the potential raw data structure should, in my opinion, tell us what additional files are required.
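As an illustration of the round-trip question in the list above, here is a minimal sketch that finds pairs of users who have sent merit to each other in both directions. The (sender UID, receiver UID, amount) tuple layout is an assumption, not the documented merit.txt format, and the sample data is made up.

```python
def round_trip_pairs(transfers):
    """Return sorted pairs of users who sent merit to each other in both directions."""
    directed = {
        (sender, receiver)
        for sender, receiver, _amount in transfers
        if sender != receiver
    }
    # A pair qualifies when the reverse direction is also present.
    return sorted(
        {tuple(sorted(pair)) for pair in directed if (pair[1], pair[0]) in directed}
    )


# Made-up sample data: only users 1 and 2 merited each other.
sample_transfers = [(1, 2, 1), (2, 1, 1), (1, 3, 2), (4, 3, 1)]
pairs = round_trip_pairs(sample_transfers)
print(pairs)
```

As the post notes, a hit is only a candidate for closer study, not evidence of abuse.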
|
|
|
|
suchmoon
Legendary
Offline
Activity: 3724
Merit: 8996
https://bpip.org
|
|
March 17, 2018, 09:30:36 PM |
|
Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:
UID -> name, merit, potential activity, posts
post ID -> topic ID, time, UID
topic ID -> board ID, first post ID
board ID -> board name
I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.
All of the above, plus
Starting merit, starting sMerit, activity, rank for each user
This should allow us to see who's doing well (or not) at sending merits. Ideally we would also want merit source info but you didn't seem to want to publish that.
UID -> name, merit, potential activity, posts
I can think of a few:
1. Add "Activity" (not just "potential")
2. Add a banned-status to this list (ignore temporary bans)
3. Add either "merit earned" or "merit received for free at introduction"
Side note: there are more than 200 usernames with a comma, this will make processing a CSV difficult. Can you make this a file with just UID and name?
Usernames should be double-quoted then, and double quotes should be doubled inside double quotes... Yes, CSV format sucks but there is an RFC document for it and most modern tools should be able to handle that.
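The quoting rule described above (wrap the field in double quotes, double any embedded double quotes) is what Python's standard csv module does by default. The short sketch below writes and re-reads a few rows; the usernames and column names are invented for illustration and do not come from any real dump.

```python
import csv
import io

# Made-up usernames containing a comma and embedded double quotes.
rows = [(1, "Smith, John"), (2, 'The "Real" Bob'), (3, "plain")]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
writer.writerow(["uid", "name"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original names unchanged.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed)
```

Only the fields that need quoting are quoted, so the file stays compact while the awkward usernames still survive a round trip.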
|
|
|
|
1020kingz
Full Member
Offline
Activity: 350
Merit: 106
Telegram Moderator, Hire me
|
|
March 18, 2018, 01:58:26 AM |
|
Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:
UID -> name, merit, potential activity, posts
post ID -> topic ID, time, UID
topic ID -> board ID, first post ID
board ID -> board name
I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.
I think the UID -> name, merit, potential activity, posts dump is useful. With it you could easily compile the post contents of a user and create an outbox for each user, making it easy to look up or search a user's activity and recent posts. Some other useful ideas have also been suggested, like this:
UID -> name, merit, potential activity, posts
I can think of a few:
1. Add "Activity" (not just "potential")
2. Add a banned-status to this list (ignore temporary bans)
3. Add either "merit earned" or "merit received for free at introduction"
Side note: there are more than 200 usernames with a comma, this will make processing a CSV difficult. Can you make this a file with just UID and name?
You could also monitor the give and take of merit by each user. This is what I understand from this thread; please feel free to correct me if I'm wrong.
|
|
|
|
mobilazy
Member
Offline
Activity: 308
Merit: 22
|
|
March 18, 2018, 08:45:12 AM |
|
I hope user zentdex will come up with some beautiful and informative charts. I'd love to see his posts.
Meanwhile, I will try to come up with something decent myself. That would be a perfect way to study data analysis.
|
---Bounty is a stupid use of my time---
|
|
|
JeremyB
|
|
March 18, 2018, 10:25:06 AM |
|
I was asking for such information here today and only just saw this thread. I think all dumps related to forum architecture will be great for computing local board stats. I am especially interested in analyses of this data which could point to sub-communities where the initial sMerit is exhausted and new sources are necessary, and to people who might be good merit sources. These kinds of requests would be easier to implement. And what about some automatic dump archiving, so that several people don't have to do the same thing?
|
|
|
|
esmanthra
|
|
March 18, 2018, 12:17:08 PM Last edit: March 18, 2018, 12:54:22 PM by esmanthra |
|
Recently someone asked about their account, which was hacked in December, and I didn't even have a way to look up the date it happened (since it's gone from the page). So the security log dump would indeed be helpful.
|
|
|
|
Joel_Jantsen
Legendary
Offline
Activity: 1932
Merit: 1311
Get your game girl
|
|
March 18, 2018, 04:44:00 PM |
|
Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:
How big of an operation is it to auto-update the data on a daily basis? I was thinking I could set up endpoints which download the file daily and keep the source for my charts (whatever I choose to represent) updated every day. It would be great if you could send a status flag along with the account details, like "active/inactive/banned".
|
|
|
|
TheBeardedBaby
Legendary
Offline
Activity: 2240
Merit: 3150
₿uy / $ell ..oeleo ;(
|
|
March 19, 2018, 10:28:34 AM |
|
For me it would be useful to get data on all the user IDs posting in a specific topic and their post times, like in the ANN section. If we can get a UID and time per topic, I can easily check for ICO pumpers.
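With a dump shaped like the proposed "post ID -> topic ID, time, UID", the check described above could be sketched as follows. The tuple layout, the use of Unix timestamps, the function name, and the sample posts are all assumptions made for illustration.

```python
from collections import Counter


def posters_in_topic(posts, topic_id, start, end):
    """Count posts per UID in one topic with start <= time < end."""
    return Counter(
        uid
        for _post_id, post_topic, time, uid in posts
        if post_topic == topic_id and start <= time < end
    )


# Made-up sample: (post ID, topic ID, Unix time, UID)
posts = [
    (1, 500, 1000, 11),
    (2, 500, 1010, 12),
    (3, 500, 1020, 11),
    (4, 501, 1015, 11),
    (5, 500, 2000, 13),
]
counts = posters_in_topic(posts, topic_id=500, start=1000, end=1100)
print(counts.most_common())
```

Users who post unusually often in a short window across several announcement topics would be the ones to look at more closely.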
|
|
|
|
DdmrDdmr
Legendary
Offline
Activity: 2366
Merit: 10871
There are lies, damned lies and statistics. MTwain
|
|
March 24, 2018, 07:11:54 PM |
|
Is there a possibility of including the Rank in the merit.txt file, or of having another file to complement it, so that rank analysis can be tied to the data in the merit.txt file? I've seen that Zentdex managed to cross-reference this information, but as far as I can see it's not in the public raw data files for general usage.
It's true that Rank will vary for some users within the timeframe of the data in the merit.txt file, but it would be a helpful source for breaking down the data and understanding it better.
|
|
|
|
Jet Cash
Legendary
Offline
Activity: 2744
Merit: 2462
https://JetCash.com
|
|
March 24, 2018, 07:31:22 PM |
|
Deleted posts that have been awarded merit.
|
Offgrid campers allow you to enjoy life and preserve your health and wealth. Save old Cars - my project to save old cars from scrapage schemes, and to reduce the sale of new cars. My new Bitcoin transfer address is - bc1q9gtz8e40en6glgxwk4eujuau2fk5wxrprs6fys
|
|
|
LoyceV
Legendary
Offline
Activity: 3360
Merit: 16975
Thick-Skinned Gang Leader and Golden Feather 2021
|
|
April 16, 2018, 11:50:34 AM |
|
Any follow up on this?
|
|
|
|
mobilazy
Member
Offline
Activity: 308
Merit: 22
|
|
April 16, 2018, 04:21:36 PM |
|
I wish it was in CSV format, as that's the easiest one to work with. I'd love to practice the Seaborn skills I learned from a short Udemy course.
|
---Bounty is a stupid use of my time---
|
|
|
|