Bitcoin Forum

Other => Meta => Topic started by: ptrk on September 28, 2019, 04:39:39 PM



Title: Readable merit dataset for your own evaluations
Post by: ptrk on September 28, 2019, 04:39:39 PM
I have read that some would like to perform merit evaluations themselves, but the data provided by theymos (https://bitcointalk.org/merit.txt.xz) is not very readable. That's why I wrote a script that provides the same data with readable date, time, category path, thread name and usernames (from & to).

I make the data freely available for everyone on Github. Have fun with the data analysis.

The data was created automatically, so there is no guarantee the data is consistent. It is based on raw data provided by theymos and LoyceV.



Full History (24th January 2018 to 31st January 2020)
299935 merit records
Github (https://raw.githubusercontent.com/ptrk01/bitcointalkorg_meritdata/master/merit_full_history.csv)

Subset History (23rd May 2019 to 20th September 2019)
41948 merit records
Github (https://raw.githubusercontent.com/ptrk01/bitcointalkorg_meritdata/master/merits.csv)

October 2019 History
11912 merit records
Github (https://raw.githubusercontent.com/ptrk01/bitcointalkorg_meritdata/master/merit_october_2019.csv)

November 2019 History
18228 merit records
Github (https://raw.githubusercontent.com/ptrk01/bitcointalkorg_meritdata/master/merit_november_2019.csv)

December 2019 History
14734 merit records
Github (https://raw.githubusercontent.com/ptrk01/bitcointalkorg_meritdata/master/merit_december_2019.csv)

January 2020 History
16086 merit records
Github (https://raw.githubusercontent.com/ptrk01/bitcointalkorg_meritdata/master/merit_january_2020.csv)


Title: Re: Readable merit dataset for your own evaluations
Post by: nutildah on September 28, 2019, 05:00:00 PM
I sent you my last merit. Could you potentially do this for the entire history of the merit system? I'd like to compile my own database in a single Excel file. I know DdmrDdmr and LoyceV have done similar things but I do appreciate being able to import the data directly into Excel.


Title: Re: Readable merit dataset for your own evaluations
Post by: ptrk on September 28, 2019, 05:02:57 PM
Quote from: nutildah
I sent you my last merit. Could you potentially do this for the entire history of the merit system?

Thank you!
Sure, where do I get the entire history? I only found the data set of the last four months.


Title: Re: Readable merit dataset for your own evaluations
Post by: LoyceV on September 28, 2019, 05:15:47 PM
Quote from: ptrk
Sure, where do I get the entire history?
http://loyce.club/Merit/merit.all.txt (updated weekly, usually at the end of Friday).


Title: Re: Readable merit dataset for your own evaluations
Post by: DdmrDdmr on September 28, 2019, 06:23:29 PM
<...>
Currently, I publish all the sMerit TXs here: https://fusiontables.google.com/DataSource?docid=1wM2Op6_ol8_0iP0sDEemIGr9weKvIeLPvKsKMpFy#rows:id=1. The data is downloadable (File -> Download) as a csv, and is updated every Friday. The only issue is that I may not continue feeding that tool from December onwards, since it is going to be discontinued.

Prior to publishing the TXs there, I upload them to internal Google Sheets such as these:
https://docs.google.com/spreadsheets/d/1GTngeRJlWSEg1bFY-z0S6nqZxwSGUsUTeaYREnOQieI/edit?usp=sharing (Part I)
https://docs.google.com/spreadsheets/d/1V7kW7q-dHIK-dJj7byUbBE1PLyVjmG5lQb_wJHxgUIU/edit?usp=sharing (Part II)

The above are Google Sheets, and can be exported to Excel and csv, amongst others. The file is split to make it easier to load into the Fusion Tables structure; otherwise I would just feed a single Google spreadsheet.

Data is derived from the accumulated merit.txt files that the forum publishes every Friday. It's not a simple merge, since there are TXs in common between files: approximately 113 of the 120 days covered by each merit.txt are also covered by the previous one. All the data is kept in a single database, applying each week's cumulative file beforehand.

That is not all the info, however, since the forum structure tied to each message is obtained separately.
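The merge of overlapping weekly dumps described in this post could be sketched roughly as follows. This is only an illustration, not the process actually used here: the function name `merge_dumps` is hypothetical, and it assumes every data line has five whitespace-separated fields (time, amount, post, sender UID, receiver UID), as in the raw record quoted later in this thread, with any header line being non-numeric. For each distinct line it keeps the highest count seen in any single dump, so rows shared between dumps are not doubled, while identical rows inside one dump are preserved.

```python
from collections import Counter


def merge_dumps(dumps):
    """Merge overlapping merit dumps into one time-sorted list of rows.

    `dumps` is an iterable of iterables of text lines (hypothetical input,
    e.g. open file objects for several weekly merit.txt downloads).
    """
    merged = Counter()
    for dump in dumps:
        counts = Counter()
        for line in dump:
            fields = line.split()
            # Skip blank lines and any header line (assumed non-numeric).
            if len(fields) != 5 or not fields[0].isdigit():
                continue
            counts["\t".join(fields)] += 1
        # Multiset union: a row shared by two dumps is kept once,
        # but two identical rows inside one dump are both kept.
        for row, n in counts.items():
            merged[row] = max(merged[row], n)

    rows = [row for row, n in merged.items() for _ in range(n)]
    rows.sort(key=lambda row: int(row.split("\t")[0]))
    return rows


# Invented sample data: two "weekly" dumps that share one row.
week1 = ["100\t1\t5.msg1\t10\t20", "200\t2\t6.msg2\t11\t21"]
week2 = ["200\t2\t6.msg2\t11\t21", "300\t1\t7.msg3\t12\t22"]
merged_rows = merge_dumps([week1, week2])
print(len(merged_rows))
```

A plain set would also remove the overlap, but it would collapse two genuinely identical transactions made in the same second, which is why the sketch counts rows per dump instead.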


Title: Re: Readable merit dataset for your own evaluations
Post by: ptrk on October 01, 2019, 08:35:47 AM
Quote from: nutildah
I sent you my last merit. Could you potentially do this for the entire history of the merit system?

I uploaded the full history (from 25th January 2018 to 27th September 2019).


Title: Re: Readable merit dataset for your own evaluations
Post by: ptrk on November 02, 2019, 05:08:55 PM
Data sets were updated.

Full history file now contains merit records from 24th January 2018 to 31st October 2019.

Also, there is a file which contains merit records from October 2019 only.


Title: Re: Readable merit dataset for your own evaluations
Post by: PrimeNumber7 on November 02, 2019, 09:25:00 PM
Quote from: ptrk
Full History (24th January 2018 to 31st October 2019)
250887 merit records
Github (https://raw.githubusercontent.com/ptrk01/bitcointalkorg_meritdata/master/merit_full_history.csv)
I noticed that some of the data uses a comma as the delimiter, while other data uses a semicolon. This makes it more difficult to analyze. It is also best practice to use ID numbers instead of user-generated names in CSV files, because thread names or usernames may contain the delimiter.
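The delimiter concern can be illustrated with Python's standard `csv` module, which quotes any field that contains the delimiter so the row still splits into the right number of columns. This is a small sketch, not the script used for the published files, and the record below (including the thread title) is made up for illustration.

```python
import csv
import io

# Hypothetical record: the thread title contains the ';' delimiter.
record = ["2019-10-01", "12:00:00", "1", "Merit; a question", "alice", "bob"]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL)
writer.writerow(record)
line = buffer.getvalue()

# A naive split miscounts the columns; the csv reader recovers the original six.
naive = line.strip().split(";")
parsed = next(csv.reader(io.StringIO(line), delimiter=";"))
print(len(naive), len(parsed))
```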

You can map names into your dataset after you analyze it, for display purposes. This also helps remove any biases you may have with regard to what you are trying to prove.
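As a rough sketch of that workflow, the snippet below totals received merit by numeric UID and only attaches usernames when printing. The rows, UIDs and the `names` lookup are all invented sample data, not taken from the dataset.

```python
from collections import defaultdict

# Invented sample rows: (sender UID, receiver UID, merit amount).
rows = [(10, 20, 1), (11, 20, 2), (10, 21, 5)]

# Analysis step: work with UIDs only.
received = defaultdict(int)
for sender_uid, receiver_uid, amount in rows:
    received[receiver_uid] += amount

# Display step: map UIDs to names afterwards (hypothetical lookup table).
names = {20: "alice", 21: "bob; the builder"}
for uid, total in sorted(received.items(), key=lambda item: -item[1]):
    print(names.get(uid, f"uid {uid}"), total)
```

Because the names are only used for display, a username containing the delimiter never touches the analysis itself.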

I put the entire merit dataset into a comma-delimited CSV file with a header row and uploaded it here (http://s000.tinyupload.com/download.php?file_id=04149932748953484023&t=0414993274895348402308198).

edit: as an example, I believe the following line in your CSV file contains an incorrect name:
Quote
2018-09-17;07:02:12;1;;nkampala;BITSSA

I haven't looked, but I suspect there are also issues with the transactions involving the following UIDs:
['1053767', '1187433', '2307758', '2471646', '2471831']


Title: Re: Readable merit dataset for your own evaluations
Post by: ptrk on November 08, 2019, 09:12:12 PM
edit: as an example, I believe the following line in your CSV file contains an incorrect name:
Quote
2018-09-17;07:02:12;1;;nkampala;BITSSA

Thanks for letting me know about the issue.

I took a closer look. This is the corresponding raw data set.
Quote
1537167732   1   5025631.msg45813886   2093373   1053767

The double ; appears when the thread cannot be found. It seems the thread or post was deleted. In this case, the missing thread is https://bitcointalk.org/index.php?topic=5025631.msg45813886.0.

The second issue in the data record is the receiver's username. It is recorded as BITSSA, but it is actually BITSSA : BITCOIN EXCHANGE (https://bitcointalk.org/index.php?action=profile;u=1053767). I will adjust my script accordingly so that if there is a colon in the username (which is a very rare case), the complete username is recorded.
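For anyone reproducing the conversion, the raw record quoted above can be turned into the readable row roughly like this. It is a sketch under assumptions, not the script used to build the dataset: the function name `readable_row` and the two lookup dictionaries are hypothetical, the field order (time, amount, post, sender UID, receiver UID) is inferred from the raw line and the CSV line above, and the timestamp is read as UTC because that reproduces the date and time shown in the CSV line.

```python
import csv
import io
from datetime import datetime, timezone


def readable_row(raw_line, usernames, thread_titles):
    """Convert one raw merit record into the readable field list."""
    time_s, amount, post, from_uid, to_uid = raw_line.split()
    when = datetime.fromtimestamp(int(time_s), tz=timezone.utc)
    topic_id = post.split(".")[0]
    return [
        when.strftime("%Y-%m-%d"),
        when.strftime("%H:%M:%S"),
        amount,
        # Deleted or unknown threads yield an empty field (the double ';').
        thread_titles.get(topic_id, ""),
        usernames.get(from_uid, from_uid),
        usernames.get(to_uid, to_uid),
    ]


# Hypothetical lookup tables; the names are the ones mentioned in this thread.
usernames = {"2093373": "nkampala", "1053767": "BITSSA : BITCOIN EXCHANGE"}
thread_titles = {}  # topic 5025631 is missing, so there is nothing to look up

row = readable_row(
    "1537167732   1   5025631.msg45813886   2093373   1053767",
    usernames,
    thread_titles,
)

out = io.StringIO()
csv.writer(out, delimiter=";").writerow(row)
print(out.getvalue().strip())
```

With these inputs the printed row matches the CSV line quoted above, except that the receiver appears with the complete username.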


Title: Re: Readable merit dataset for your own evaluations
Post by: ptrk on December 07, 2019, 07:28:33 PM
Data sets were updated.

Full history file now contains merit records from 24th January 2018 to 30th November 2019.

Also, there is a file which contains merit records from November 2019 only.

Find all files at https://github.com/ptrk01/bitcointalkorg_meritdata


Title: Re: Readable merit dataset for your own evaluations
Post by: ptrk on January 04, 2020, 03:12:10 PM
Data sets were updated. New file added with December 2019 merit data.

Full history file now contains merit records from 24th January 2018 to 31st December 2019.

Find all files at https://github.com/ptrk01/bitcointalkorg_meritdata


Title: Re: Readable merit dataset for your own evaluations
Post by: ptrk on February 02, 2020, 11:20:15 AM
Data sets were updated. New file added with January 2020 merit data.

Full history file now contains merit records from 24th January 2018 to 31st January 2020.

Find all files at https://github.com/ptrk01/bitcointalkorg_meritdata