Bitcoin Forum
Author Topic: Additional data dumps?  (Read 925 times)
theymos (OP)
Administrator
Legendary
*
Offline Offline

Activity: 5194
Merit: 12985


View Profile
March 16, 2018, 04:13:52 AM
Merited by mprep (1), suchmoon (1), 1020kingz (1)
 #1

Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:

 UID -> name, merit, potential activity, posts
 post ID -> topic ID, time, UID
 topic ID -> board ID, first post ID
 board ID -> board name

I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.
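
As a rough illustration of how the four proposed dumps could fit together, here is a minimal sketch that follows the ID chain from a post to its author and board. It assumes each dump ends up as a simple mapping keyed by ID. The field names, the in-memory layout and all sample values are hypothetical, since no file format has been decided.

```python
# Hypothetical in-memory versions of the four proposed dumps.
# All IDs, names and numbers below are made-up sample values.
users = {101: {"name": "alice", "merit": 12, "potential_activity": 140, "posts": 60}}
posts = {9001: {"topic_id": 501, "time": 1521173632, "uid": 101}}
topics = {501: {"board_id": 1, "first_post_id": 9001}}
boards = {1: "Meta"}


def describe_post(post_id):
    """Resolve a post ID to (user name, board name) by following the ID chain."""
    post = posts[post_id]
    topic = topics[post["topic_id"]]
    return users[post["uid"]]["name"], boards[topic["board_id"]]


print(describe_post(9001))
```

Keeping each dump keyed by a single ID means a consumer only needs dictionary lookups (or joins in a database) to answer most cross-table questions.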

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
SFR10
Legendary
*
Offline Offline

Activity: 2996
Merit: 3429


Crypto Swap Exchange


View Profile WWW
March 16, 2018, 05:00:58 AM
Merited by Vod (2)
 #2

What dumps would be most useful? Some that I was thinking of were:

 UID -> name, merit, potential activity, posts
This (the rest aren't that important). I hope accounts with 0 posts/activity are excluded (to avoid a massive file full of information that's not needed).

Can we get another weekly dump that tracks the positive/negative ratings (e.g. who sent them and who received them) and also shows which ratings have been removed? (Credit goes to Vod, based on this thread).

MadZ
Hero Member
*****
Offline Offline

Activity: 908
Merit: 657


View Profile
March 16, 2018, 05:11:17 AM
Merited by actmyname (1)
 #3

It might be helpful to have a continuous version of the seclog without having to rely on archived pages.
botany
Legendary
*
Offline Offline

Activity: 1582
Merit: 1064


View Profile
March 16, 2018, 05:33:12 AM
 #4

Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:

 UID -> name, merit, potential activity, posts
 post ID -> topic ID, time, UID
 topic ID -> board ID, first post ID
 board ID -> board name

I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.

Modlog definitely.
Quickseller
Copper Member
Legendary
*
Offline Offline

Activity: 2870
Merit: 2301


View Profile
March 17, 2018, 04:41:21 AM
 #5

I might suggest dumping the post history of individual users/accounts. This could be restricted by rank and otherwise be rate limited. I think it would be difficult to create any meaningful mirror site with this information.

As others have mentioned, the security log would be beneficial. The mod log, not so much because of its limited information.

It would be helpful if users' outboxes (and other folders) could be downloaded, since they cannot be easily searched. Obviously, downloading this information would be restricted to users who are logged into their own account.
MyIota
Jr. Member
*
Offline Offline

Activity: 41
Merit: 5

PM me to buy my sig space.


View Profile
March 17, 2018, 05:17:39 AM
 #6

Just fyi,

You can already gauge how much sMerit someone has, simply because the system is transparent. So that's effectively a hidden field in the data dump.

You can calculate how much they've received versus how much they've sent... and from there you'll know how much sMerit they have left :/
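
The received-versus-sent calculation could look something like the sketch below. It assumes merit transactions are available as (sender UID, receiver UID, amount) tuples, and that half of each merit received becomes spendable sMerit, rounded down. The tuple layout, the rounding and the function name are assumptions for illustration, and the sample numbers are made up.

```python
from collections import defaultdict


def smerit_left(transactions, smerit_per_merit=0.5, initial=None):
    """Estimate spendable sMerit per user from (sender, receiver, amount) rows.

    smerit_per_merit and the rounding down are assumptions about the rules;
    initial is an optional {uid: starting sMerit} mapping.
    """
    received = defaultdict(int)
    sent = defaultdict(int)
    for sender, receiver, amount in transactions:
        sent[sender] += amount
        received[receiver] += amount
    initial = initial or {}
    uids = set(received) | set(sent) | set(initial)
    return {
        uid: initial.get(uid, 0) + int(received[uid] * smerit_per_merit) - sent[uid]
        for uid in uids
    }


# Made-up sample: user 1 sends 4 to user 2, user 2 sends 1 to user 3, and so on.
sample = [(1, 2, 4), (2, 3, 1), (3, 2, 3)]
result = smerit_left(sample)
print(result)
```

A negative result means the user has spent more than the transactions alone can explain, which would point to starting sMerit or merit source allowances that are not part of this input.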

PM me if you're interested in my signature space.
LoyceV
Legendary
*
Offline Offline

Activity: 3304
Merit: 16655


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
March 17, 2018, 09:01:20 AM
Last edit: March 17, 2018, 09:57:01 AM by LoyceV
 #7

UID -> name, merit, potential activity, posts
I can think of a few:
1. Add "Activity" (not just "potential")
2. Add a banned-status to this list (ignore temporary bans)
3. Add either "merit earned" or "merit received for free at introduction"

Side note: there are more than 200 usernames with a comma, this will make processing a CSV difficult. Can you make this a file with just UID and name?

sncc
Hero Member
*****
Offline Offline

Activity: 536
Merit: 513


View Profile
March 17, 2018, 02:37:35 PM
 #8

It seems to me that some local boards do not receive a sufficient share of sMerit, and it would be good to be able to verify that directly from a data dump, which would help in designing an appropriate distribution of merit sources.  It would be useful to have

post ID, topic ID, board ID, merit

and check how active each local board is and whether sufficient sMerit is distributed there.  Of course, spam and low-quality posts will be counted too, but I assume they are roughly proportional to the total number of posts.
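
A minimal sketch of that per-board check, assuming the dump can be read as (post ID, topic ID, board ID, merit) tuples. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def board_stats(rows):
    """Aggregate post count and merit per board.

    rows: iterable of (post_id, topic_id, board_id, merit) tuples.
    """
    stats = defaultdict(lambda: {"posts": 0, "merit": 0})
    for _post_id, _topic_id, board_id, merit in rows:
        stats[board_id]["posts"] += 1
        stats[board_id]["merit"] += merit
    # Merit per post gives a rough, size-independent comparison between boards.
    return {
        board: {**s, "merit_per_post": s["merit"] / s["posts"]}
        for board, s in stats.items()
    }


# Made-up sample: board 201 received some merit, board 202 received none.
sample = [(1, 10, 201, 0), (2, 10, 201, 3), (3, 11, 202, 0), (4, 11, 202, 0)]
result = board_stats(sample)
print(result)
```

Boards with many posts but a merit-per-post figure near zero would be the first candidates for additional merit sources.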
DdmrDdmr
Legendary
*
Offline Offline

Activity: 2310
Merit: 10759


There are lies, damned lies and statistics. MTwain


View Profile WWW
March 17, 2018, 05:01:52 PM
Merited by suchmoon (1)
 #9

It really all comes down to what needs to be found out. That is, building a set of questions that need to be answered, and deriving the raw data that enables an aggregated or derived dataset to be queried for the answers.

Some questions are answerable from a snapshot of the data, whilst others require a timeframe and datestamps to resolve.

For example, in order to see how long it takes members to rank up, we would need the whole history of rank changes per UserId, <UserId, Rank, Activity, Date>, where a record would only need to be created when a user is created or their Rank changes, Date being the associated timestamp.
If we wanted to see this in relation to Merit, we would need to build a record in the shape of <UserId, Rank, Activity, Merit, InitialMerit, Date>.

The other key factor is the way in which the data is currently stored. The raw data layout and the capture process are part of reaching our goal.
For example, if there is a trigger in the database that currently logs changes on the User table with the <UserId, Rank, Activity, Date> record structure, the underlying table can be used directly, and all that has to be done, once exported, is to select the records that relate to a change in a user's rank (and ignore those that are a mere activity change).
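
Selecting only the rank-change records from such a log could be sketched as follows. This assumes the export is a list of (UserId, Rank, Activity, Date) tuples already sorted by date. Whether such a log exists at all is unknown, and the sample rows are made up.

```python
def rank_changes(records):
    """Keep only records where a user's rank differs from their previous record.

    records: iterable of (user_id, rank, activity, date) tuples, sorted by date.
    The first record seen for a user is kept and treated as account creation.
    """
    last_rank = {}
    changes = []
    for user_id, rank, activity, date in records:
        if last_rank.get(user_id) != rank:
            changes.append((user_id, rank, activity, date))
            last_rank[user_id] = rank
    return changes


# Made-up log for one user: two of the four rows are mere activity changes.
log = [
    (7, "Newbie", 0, "2018-01-01"),
    (7, "Newbie", 14, "2018-01-15"),
    (7, "Jr. Member", 30, "2018-02-12"),
    (7, "Jr. Member", 44, "2018-02-26"),
]
changes = rank_changes(log)
print(changes)
```

The time between consecutive kept records for the same user is then the time spent at each rank.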

If, alas, the underlying user table does not hold a historical record of changes (i.e. no timestamped change log), then the question of how long it takes to rank up would not be answerable, or would need to be cross-referenced with raw data from another table.

Questions that I would boldly put on the list, given the introduction of sMerit, would be ones such as:

- What is the average time per Rank to rank up, before and after the introduction of the Merit system? (This is not entirely comparable yet: the merit system is only a few months old, so the top Ranks cannot be compared.)

- How much sMerit is assigned per rank (from/to), per forum section, per forum subsection, in relation to the number of posts in the topic, in relation to how hot the topic is, in relation to post position in the topic (quartiles, for example), in relation to the size of the merited post, etc.?

- How much sMerit is being withheld, and for how long (averages)?

- Candidates for round merit assignment (from User A to User B and back). That is derivable from the current merit.txt file, as I've posted previously; it is not necessarily cheating, but such cases are worth studying.
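
As one example of turning a question from the list above into a query, round merit candidates could be found with a sketch like this. It assumes the merit data is available as (sender UID, receiver UID, amount) tuples. The tuple layout and the sample values are assumptions for illustration.

```python
def reciprocal_pairs(transactions):
    """Return unordered user pairs that have sent merit in both directions."""
    directed = {(sender, receiver) for sender, receiver, _amount in transactions}
    return {
        tuple(sorted(pair))
        for pair in directed
        if pair[0] != pair[1] and (pair[1], pair[0]) in directed
    }


# Made-up sample: users 1 and 2 merited each other, user 3 only sent.
sample = [(1, 2, 1), (2, 1, 2), (3, 1, 1)]
print(reciprocal_pairs(sample))
```

The result is only a list of candidates for closer study, since two users can legitimately merit each other's posts.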

The match between a closed set of key questions to answer and a potential raw data structure should tell us which additional files are required, in my opinion.
suchmoon
Legendary
*
Offline Offline

Activity: 3654
Merit: 8922


https://bpip.org


View Profile WWW
March 17, 2018, 09:30:36 PM
 #10

Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:

 UID -> name, merit, potential activity, posts
 post ID -> topic ID, time, UID
 topic ID -> board ID, first post ID
 board ID -> board name

I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.

All of the above, plus

Starting merit, starting sMerit, activity, rank for each user

This should allow us to see who's doing well (or not) at sending merits. Ideally we would also want merit source info but you didn't seem to want to publish that.

UID -> name, merit, potential activity, posts
I can think of a few:
1. Add "Activity" (not just "potential")
2. Add a banned-status to this list (ignore temporary bans)
3. Add either "merit earned" or "merit received for free at introduction"

Side note: there are more than 200 usernames with a comma, this will make processing a CSV difficult. Can you make this a file with just UID and name?

Usernames should be double-quoted then, and double quotes should be doubled inside double quotes... Yes, CSV format sucks but there is an RFC document for it (RFC 4180) and most modern tools should be able to handle that.
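
To illustrate the quoting rules, here is a small sketch using Python's standard csv module, which quotes fields containing commas and doubles embedded double quotes by default. The usernames are made-up samples, and the two-column layout is only an assumption about what such a dump might contain.

```python
import csv
import io

# Made-up sample rows: one plain name, one with a comma, one with double quotes.
rows = [
    (1, "plain_name"),
    (2, "name, with comma"),
    (3, 'name "with" quotes'),
]

# Write the rows; QUOTE_MINIMAL only quotes fields that need it.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerow(("uid", "name"))
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Read the text back and skip the header row to confirm nothing was lost.
parsed = list(csv.reader(io.StringIO(text, newline="")))[1:]
round_trip = [(int(uid), name) for uid, name in parsed]
```

Any reader that follows the same quoting convention should recover the usernames unchanged, commas and quotes included.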



1020kingz
Full Member
***
Offline Offline

Activity: 350
Merit: 106

Telegram Moderator, Hire me


View Profile
March 18, 2018, 01:58:26 AM
 #11

Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:

 UID -> name, merit, potential activity, posts
 post ID -> topic ID, time, UID
 topic ID -> board ID, first post ID
 board ID -> board name

I'm not going to dump post contents in any form, since that would both be a massive file and it'd make things very easy for those annoying phishing mirror sites.
I think the UID -> name, merit, potential activity, posts dump is useful. With it you could easily compile the posts of each user into a kind of per-user outbox, making it easy to look up or search a user's activity and recent posts. Some other useful ideas have also been suggested, like this:
UID -> name, merit, potential activity, posts
I can think of a few:
1. Add "Activity" (not just "potential")
2. Add a banned-status to this list (ignore temporary bans)
3. Add either "merit earned" or "merit received for free at introduction"

Side note: there are more than 200 usernames with a comma, this will make processing a CSV difficult. Can you make this a file with just UID and name?
You could also monitor the giving and receiving of merit by each user. This is what I understand from this thread; please feel free to correct me if I'm wrong.
mobilazy
Member
**
Offline Offline

Activity: 308
Merit: 22


View Profile
March 18, 2018, 08:45:12 AM
 #12

I hope user zentdex will come up with some beautiful and informative charts. I'd love to see his posts.

Meanwhile, I will try to come up with something decent myself. That will be the perfect way to study data analysis.

---Bounty is a stupid use of my time---
JeremyB
Sr. Member
****
Offline Offline

Activity: 812
Merit: 270



View Profile
March 18, 2018, 10:25:06 AM
 #13

I was asking for such information here today and only saw this thread now.

I think all dumps related to forum architecture would be great for computing local board stats.

I am especially interested in analyses of this data which could point to sub-communities where the initial sMerit is exhausted and new sources are necessary, and people who might be good merit sources.

These kinds of requests would be easier to implement.

And what about some automatic archiving of the dumps, so that several people don't all have to do the same thing?
esmanthra
Hero Member
*****
Offline Offline

Activity: 504
Merit: 732


View Profile
March 18, 2018, 12:17:08 PM
Last edit: March 18, 2018, 12:54:22 PM by esmanthra
 #14

Recently someone asked about their account, which was hacked in December, and I didn't even have a way to look up the date it happened (since it's gone from the page). So a security log dump would indeed be helpful.
Joel_Jantsen
Legendary
*
Offline Offline

Activity: 1876
Merit: 1308

Get your game girl


View Profile
March 18, 2018, 04:44:00 PM
 #15

Currently there are two big data dumps available which auto-update weekly, trust.txt.xz and merit.txt.xz. These auto-updating dumps are pretty easy to set up, so I was thinking that it might be a good idea to produce several more of these, perhaps in the end forming a "ghetto API". What dumps would be most useful? Some that I was thinking of were:
How big of an operation would it be to auto-update the data on a daily basis? I was thinking I could set up endpoints which download the files daily and keep the source for my charts (whatever I choose to represent) updated every day.

It would be great if you could send a status flag along with the account details, like "active/inactive/banned".
TheBeardedBaby
Legendary
*
Offline Offline

Activity: 2184
Merit: 3134


₿uy / $ell


View Profile
March 19, 2018, 10:28:34 AM
 #16

For me it would be useful to get data on all the user IDs posting in a specific topic, with times, like in the ANN section.
If we can get the UID and time for each post in a topic, I can easily check for ICO pumpers.

DdmrDdmr
Legendary
*
Offline Offline

Activity: 2310
Merit: 10759


There are lies, damned lies and statistics. MTwain


View Profile WWW
March 24, 2018, 07:11:54 PM
 #17

Is there a possibility of including the Rank in the merit.txt file, or having another file to complement it, so as to perform rank analysis tied to the data in the merit.txt file?
I've seen that Zentdex managed to cross-reference this information, but it's not in the public raw data files for general usage, as far as I can see.

It's true that Rank will vary for some users within the timeframe of the data in the merit.txt file, but it would be a helpful source for breaking down the data and understanding it better.
Jet Cash
Legendary
*
Offline Offline

Activity: 2716
Merit: 2457


https://JetCash.com


View Profile WWW
March 24, 2018, 07:31:22 PM
 #18

Deleted posts that have been awarded merit.

Offgrid campers allow you to enjoy life and preserve your health and wealth.
Save old Cars - my project to save old cars from scrapage schemes, and to reduce the sale of new cars.
My new Bitcoin transfer address is - bc1q9gtz8e40en6glgxwk4eujuau2fk5wxrprs6fys
LoyceV
Legendary
*
Offline Offline

Activity: 3304
Merit: 16655


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
April 16, 2018, 11:50:34 AM
 #19

Any follow up on this?

mobilazy
Member
**
Offline Offline

Activity: 308
Merit: 22


View Profile
April 16, 2018, 04:21:36 PM
 #20

I wish it was in CSV format, as that's the easiest one to work with. I'd love to practice the Seaborn skills I learned from a short Udemy course.


---Bounty is a stupid use of my time---