Bitcoin Forum
Author Topic: Open scraped data of all the users - SQL Lite DB - 2.481.270 users  (Read 929 times)
Piggy (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1416



View Profile WWW
November 09, 2018, 08:43:47 AM
Last edit: December 07, 2018, 09:10:49 AM by Piggy
Merited by Welsh (10), malevolent (6), Micio (6), TMAN (5), suchmoon (4), LoyceV (4), DdmrDdmr (4), mOgliE (3), r_victory (2), Daniel91 (1), TheBeardedBaby (1), cryptovigi (1), mazdafunsun (1)
 #1

December data is available here: https://drive.google.com/open?id=1mGEk6V3c_D-IhSYbuJvPrGGVEWLb0V8L

Both the raw data and the SQLite DB are included. There are now 2.481.270 users in it, about 44.206 more than in the last run.


From time to time people want all user data available for different purposes, but they either don't know how to scrape it or scrape their own copy each time (which can take a really long time), so I thought I would do it and share the data with everybody.
We have already had some great threads showing different kinds of stories and aspects of users over time. I'm sure somebody will come up with something new we haven't seen so far (this also contains a fresh snapshot of the trust details of all users), or it can simply be useful for updating your own user data.

The data was taken between the 3rd and 8th of November; there are 2.437.064 users.

This is the table, with the data available for each user in the database, which you can use to run your queries as in the example below:
Code:
CREATE TABLE UserData(
UserId Integer PRIMARY KEY,
UserName TEXT,
Trust TEXT,
Merit Integer,
Posts Integer,
Activity Integer,
Rank TEXT,
DateRegistered Integer,
LastActive Integer,
ICQ TEXT,
AIM TEXT,
MSN TEXT,
YIM TEXT,
Email TEXT,
Website TEXT,
Bitcoinaddress TEXT,
Gender TEXT,
Age TEXT,
Location TEXT,
LastUpdatedData Integer
);


You can download the data here, both the DB and the raw files (the DB is about 366MB): https://drive.google.com/open?id=1l0xz1OC4mc3FvXzX18scSbSTzsnUkD4f


Quote
Here is a brief description of the steps needed to use the database:

Download the Precompiled Binaries for your favourite platform here (we only need sqlite3):
https://www.sqlite.org/download.html
Quote
"A bundle of command-line tools for managing SQLite database files, including the command-line shell program, the sqldiff program, and the sqlite3_analyzer program."

If you want to know in more detail how to use sqlite3 directly: https://www.sqlite.org/cli.html

Example usage on Windows, from the command line:

Put the database, the scripts and sqlite3.exe in the same folder and you have everything you need to start:

Query.bat file:
Code:
sqlite3 btctalk_full.db < CommandsQuery.txt

CommandsQuery.txt:
Code:
.mode csv
.separator "\t"
.output ResultQuery.csv
SELECT * FROM UserData WHERE Activity > 100 AND LastActive <= '2015-01-01';
.output stdout
.quit

Now call Query.bat from the command line; after a bit of crunching you will get the tab-separated data in ResultQuery.csv.
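
For anyone who prefers a script to a batch file, the same query can be sketched with Python's built-in sqlite3 and csv modules. This is only a sketch under assumptions: the helper name export_query and the three demo rows are made up, and it assumes LastActive holds ISO-style text such as 2015-02-18T18:33:06 (as in the raw files), so that a plain string comparison orders the dates correctly. With the real file you would connect to btctalk_full.db instead of the in-memory database.

```python
import csv
import io
import sqlite3

QUERY = """
SELECT UserId, UserName, Activity, LastActive
FROM UserData
WHERE Activity > ? AND LastActive <= ?
ORDER BY UserId
"""


def export_query(conn, out, min_activity=100, last_active="2015-01-01"):
    """Write matching rows to `out` as tab-separated text; return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    cur = conn.execute(QUERY, (min_activity, last_active))
    writer.writerow([col[0] for col in cur.description])  # header line
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count


# Demo on made-up rows (hypothetical users, not taken from the dataset).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserData(UserId Integer PRIMARY KEY, UserName TEXT, "
    "Activity Integer, LastActive Integer)"
)
conn.executemany("INSERT INTO UserData VALUES (?, ?, ?, ?)", [
    (1, "old_active", 250, "2014-06-30T14:02:44"),
    (2, "recent_active", 300, "2018-11-01T10:00:00"),
    (3, "old_quiet", 5, "2013-04-14T23:59:11"),
])

buf = io.StringIO()
n = export_query(conn, buf)
print(n)
print(buf.getvalue())
```

To write a file instead of a string buffer, pass something like open("ResultQuery.csv", "w", newline="") as the out argument.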


DdmrDdmr
Legendary
*
Offline Offline

Activity: 2268
Merit: 10640


There are lies, damned lies and statistics. MTwain


View Profile WWW
November 09, 2018, 11:42:18 AM
Last edit: November 09, 2018, 11:54:38 AM by DdmrDdmr
Merited by Welsh (2)
 #2

<...>
Nice initiative, Piggy. My environment is normally SQL Server, so I’ve tried to get the data loaded there. In order to do so, I have installed a browser that lets me see the data in the btctalk_full.db file in an MS SQL Server studio-like manner. The DB contains 2 tables:

MeritData -> 112.345 records
UserData -> 2.437.064 records.

The only thing I can’t seem to figure out is why the dates in the UserData table show only as years (i.e. 2018, 2017, etc.) and not as full dates. The fields are defined as bigints, so I assumed they contained the Unix timestamp. Theoretically, I’m looking at the original database, and the installed environment lets me see the content, so it should not have changed the date fields.

Are they full Unix dates or just the year on the UserData table?


P.S. Great scraping speed there (I figure at least 5 processes running 24/7)

Piggy (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1416



View Profile WWW
November 09, 2018, 12:16:04 PM
 #3

The dates are originally in UTC format. I just uploaded the raw files (raw.zip) where I saved all the data while scraping; I figure it's going to be easier to import the data into SQL Server with these.
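
A possible explanation for the year-only dates, offered as a guess rather than a confirmed diagnosis: SQLite does not enforce the declared column type, so an ISO timestamp inserted into a column declared Integer is kept as text, and a tool that converts that text to an integer keeps only the leading digits, which is the year. A minimal sketch with Python's built-in sqlite3 module (the table name Demo is made up; the timestamp is the first one from the raw sample):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same declared type as DateRegistered in UserData; SQLite treats it as a hint only.
conn.execute("CREATE TABLE Demo(UserId INTEGER PRIMARY KEY, DateRegistered Integer)")
conn.execute("INSERT INTO Demo VALUES (450001, '2015-02-18T16:33:21')")

stored_type, raw_value, as_integer, full_date = conn.execute(
    """
    SELECT typeof(DateRegistered),           -- storage class actually used
           DateRegistered,                   -- value as stored
           CAST(DateRegistered AS INTEGER),  -- what an integer conversion yields
           date(DateRegistered)              -- full date parsed from the text
    FROM Demo
    """
).fetchone()

print(stored_type)
print(raw_value)
print(as_integer)
print(full_date)
```

If the real database behaves the same way, the full timestamps are still there and can be read as text.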

Remember to clean the table after you import the data: you will end up with some records that are NULL in every field besides the user id, because the raw files contain empty lines (some user ids do not correspond to any user account).
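
The cleanup step could look roughly like this once the raw files are imported. The sketch uses Python's built-in sqlite3 module on a cut-down, hypothetical RawImport table with only three columns; on the real table the WHERE clause would need to list every column besides UserId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RawImport(UserId INTEGER PRIMARY KEY, UserName TEXT, Rank TEXT)"
)
conn.executemany("INSERT INTO RawImport VALUES (?, ?, ?)", [
    (100001, "jukee", "Brand new"),
    (100002, None, None),  # made-up placeholder row: an id with no account behind it
    (250001, "Noobi3", "Newbie"),
])

# Delete rows where every field besides UserId is NULL.
cur = conn.execute(
    "DELETE FROM RawImport WHERE UserName IS NULL AND Rank IS NULL"
)
conn.commit()

deleted = cur.rowcount
remaining = conn.execute("SELECT COUNT(*) FROM RawImport").fetchone()[0]
print(deleted, remaining)
```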

The raw data looks like this:
Code:
450001 RubenDitte 0: -0 / +0 0 0 0 Brand new 2015-02-18T16:33:21 2015-02-18T18:33:06 hidden How Any Girl Can Develop into Abundant 88 Woodwark Crescent N/A Australia, Kirrama 2018-11-03T09:55:25.575278
400001 JetbuyingMoneypak 0: -0 / +0 0 2 2 Newbie 2014-12-03T13:43:16 2016-03-15T04:49:08 jetgogoing hidden Male N/A 2018-11-03T09:55:25.575182
250001 Noobi3 0: -0 / +0 0 1 1 Newbie 2014-02-14T14:47:17 2014-02-14T18:47:00 hidden N/A 2018-11-03T09:55:25.574999
100001 jukee 0: -0 / +0 0 0 0 Brand new 2013-04-14T23:59:11 2017-07-28T18:02:10 hidden N/A 2018-11-03T09:55:25.646278
350001 Kekinos 0: -0 / +0 0 0 0 Brand new 2014-06-30T14:02:44 2014-06-30T14:02:44 hidden N/A 2018-11-03T09:55:25.67539
LoyceV
Legendary
*
Offline Offline

Activity: 3262
Merit: 16316


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
November 09, 2018, 12:50:06 PM
 #4

I love the raw data, I found myself an imposter!

Will you update this once in a while? How did you scrape this many profiles in just 5 days? Did you use 6 different IP-addresses? I can only download one page per second from Bitcointalk.

Piggy (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1416



View Profile WWW
November 09, 2018, 01:05:44 PM
 #5

Quote from: LoyceV
I love the raw data, I found myself an imposter!

Will you update this once in a while? How did you scrape this many profiles in just 5 days? Did you use 6 different IP-addresses? I can only download one page per second from Bitcointalk.

Yes, you need to use different IPs; with multiple requests per second you can cut down the time quite significantly. For this kind of job, just one call per second takes too long to finish.

It should not be a problem to re-run it once in a while.
DdmrDdmr
Legendary
*
Offline Offline

Activity: 2268
Merit: 10640


There are lies, damned lies and statistics. MTwain


View Profile WWW
November 09, 2018, 01:30:08 PM
 #6

Thanks for the raw files @Piggy.

I managed to load them onto SQL Server using MS Access as a bridge, since direct import was giving me a hassle. Once I merged all the data, the raw table had 2.563.660 records, of which 116.071 are the ones you mentioned as bearing no associated real record (16K real missing IDs on Bitcointalk, and the rest are 100k IDs greater than the largest UserId, i.e. excess scrape IDs).

When I have some time I’ll cleanse the data and see whether there’s anything interesting to derive.

DdmrDdmr
Legendary
*
Offline Offline

Activity: 2268
Merit: 10640


There are lies, damned lies and statistics. MTwain


View Profile WWW
November 12, 2018, 12:43:46 PM
 #7

@Piggy, it could be interesting to run another data extraction, let’s say a month or so after your current dataset, so that both datasets can be used to see whether some meaningful insights are derivable.

With one dataset, the core derivable data will be similar to part of what @mazdafunsun posted on his OPs, but two datasets allow for the time dimension to play a role. I can see a couple of additional things I can derive with just one dataset, but two, with a one month interval in between or so, would be smashing if you have the time.

Piggy (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1416



View Profile WWW
November 12, 2018, 04:15:06 PM
 #8

Quote from: DdmrDdmr
@Piggy, it could be interesting to run another data extraction, let’s say a month or so after your current dataset, so that both datasets can be used to see whether some meaningful insights are derivable.

With one dataset, the core derivable data will be similar to part of what @mazdafunsun posted on his OPs, but two datasets allow for the time dimension to play a role. I can see a couple of additional things I can derive with just one dataset, but two, with a one month interval in between or so, would be smashing if you have the time.

Yes, this seems like a good idea for getting something sensible out of it. I'll make another run at the beginning of December.
DdmrDdmr
Legendary
*
Offline Offline

Activity: 2268
Merit: 10640


There are lies, damned lies and statistics. MTwain


View Profile WWW
November 12, 2018, 06:55:31 PM
 #9

Quote from: Piggy
<...> Yes, this seems like a good idea for getting something sensible out of it. I'll make another run at the beginning of December.
Ok, thanks, looking forward to it. With the current dataset, I’ve seen a couple of things that may be worth summing up and posting in the coming days. I just have to be able to get enough free time for it.

Note: Got the raw files into a table with the same cardinality as your original full user table, so the cleansing process I’ve applied is the same. I’ve also contrasted it to @mazdafunsun’s topics, and, as expected, the distribution by rank is very much aligned.


cryptovigi
Hero Member
*****
Offline Offline

Activity: 714
Merit: 611



View Profile
November 15, 2018, 02:13:49 PM
Last edit: November 15, 2018, 08:20:28 PM by cryptovigi
 #10

Great dataset!!!

It would be great to have a similar one from the past, for example from the 24th of January 2018; many interesting comparisons could be made then...
But even without it, it's a huge amount of material for research... thanks for sharing.

One important thing: I understand that this dataset was prepared and shared for statistical and research purposes. In that case you should consider deleting personal data that is not necessary for these purposes, such as e-mail addresses, messenger IDs or wallet addresses. Although they are publicly available, they can also be used in an undesirable manner, so it may be better not to give away full sets of them...

mazdafunsun
Full Member
***
Offline Offline

Activity: 490
Merit: 123



View Profile
November 16, 2018, 07:06:31 PM
 #11

Nice job, something I was thinking of doing but never got around to.

Quote from: DdmrDdmr
P.S. Great scraping speed there (I figure at least 5 processes running 24/7)

Last time I used 10+ processes, and that run also fetched the time of each user's last post, which slows down the process.

LoyceV
Legendary
*
Offline Offline

Activity: 3262
Merit: 16316


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
November 17, 2018, 08:41:57 AM
 #12

Quote from: cryptovigi
One important thing: I understand that this dataset was prepared and shared for statistical and research purposes. In that case you should consider deleting personal data that is not necessary for these purposes, such as e-mail addresses, messenger IDs or wallet addresses. Although they are publicly available, they can also be used in an undesirable manner, so it may be better not to give away full sets of them...
I've thought about this too, but it's the user's choice to make their data public. Some even use YOPmail, but only a few of them have ever posted. Those accounts are just waiting to be compromised.

Piggy (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1416



View Profile WWW
December 05, 2018, 01:53:34 PM
 #13

I have started a new scraping run. If there are no problems, the new data should be available within a few days.
DdmrDdmr
Legendary
*
Offline Offline

Activity: 2268
Merit: 10640


There are lies, damned lies and statistics. MTwain


View Profile WWW
December 05, 2018, 03:28:49 PM
 #14

<...>
Thank you, @Piggy. It will be interesting to compare it with the previous dataset. If you can upload the raw files upon completion (like last time), all the better. I’m not sure how quickly I can get to it, but I definitely want to see what insights are derivable.

Piggy (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1416



View Profile WWW
December 07, 2018, 09:10:06 AM
Merited by DdmrDdmr (3), babo (1)
 #15

Quote from: DdmrDdmr
<...>
Thank you, @Piggy. It will be interesting to compare it with the previous dataset. If you can upload the raw files upon completion (like last time), all the better. I’m not sure how quickly I can get to it, but I definitely want to see what insights are derivable.


Scraping has finished and the data can be found here: https://drive.google.com/open?id=1mGEk6V3c_D-IhSYbuJvPrGGVEWLb0V8L

Both the raw data and the SQLite DB are included. There are now 2.481.270 users in it, about 44.206 more than in the last run.
DdmrDdmr
Legendary
*
Offline Offline

Activity: 2268
Merit: 10640


There are lies, damned lies and statistics. MTwain


View Profile WWW
December 07, 2018, 10:56:46 AM
Last edit: December 07, 2018, 02:14:05 PM by DdmrDdmr
 #16

<…>
Ok, thanks @Piggy. I managed to download the raw files, but I’m having some trouble importing them into my environment. The issue is on my side, so I just have to keep on at it until I resolve it (import field displacement, but since I’m doing it manually, I may have changed something compared to last time). Anyway, I’ll let you know when I resolve it (I need to find some linear time and it’s week-end now plus I’m getting data for the Dashboard).

Edit: OK, I got the import working and ended up with the same 2.481.270 records as you indicated in the updated OP.

Note: Really fast data retrieval on your part! (under three days).

Edit 2: I think I found 651 profiles that were in the first extract, but that do not form part of the second extract. I've checked a few cases against the raw files and did not find them there (i.e. 553457   biteditor; 406094 0099ff; 715282 Pleasersvxuq; 812886 Sumprnma). The issue is barely noticeable from a statistical point of view, since most are Brand New forum members anyway.
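
One way to find such profiles, assuming both extracts are loaded side by side, is an anti-join on UserId. The sketch below uses Python's built-in sqlite3 module; the table names UserDataNov and UserDataDec and the rows are hypothetical stand-ins, not a description of how the two extracts were actually compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("UserDataNov", "UserDataDec"):
    conn.execute(f"CREATE TABLE {table}(UserId INTEGER PRIMARY KEY, UserName TEXT)")

# Made-up rows: bob disappears between the runs, dave is new.
conn.executemany("INSERT INTO UserDataNov VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO UserDataDec VALUES (?, ?)",
                 [(1, "alice"), (3, "carol"), (4, "dave")])

# Profiles present in the first extract but absent from the second.
missing = conn.execute("""
    SELECT n.UserId, n.UserName
    FROM UserDataNov AS n
    LEFT JOIN UserDataDec AS d ON d.UserId = n.UserId
    WHERE d.UserId IS NULL
    ORDER BY n.UserId
""").fetchall()

# The reverse direction gives the profiles that are new in the second extract.
new = conn.execute("""
    SELECT d.UserId, d.UserName
    FROM UserDataDec AS d
    LEFT JOIN UserDataNov AS n ON n.UserId = d.UserId
    WHERE n.UserId IS NULL
    ORDER BY d.UserId
""").fetchall()

print(missing)
print(new)
```

With the two real database files, SQLite's ATTACH DATABASE statement should let a single connection query both.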

Coin-1
Legendary
*
Offline Offline

Activity: 2408
Merit: 2160



View Profile
February 12, 2019, 04:21:06 AM
 #17

Thanks for sharing the standalone database you scraped.

I used your file "btctalk_full.db" to create the full list of red tagged members.

I simply executed the following SQL query:
Code:
select UserId from UserData where SUBSTR(Trust, -1, 1) = '!'

sqlite3 listed about 15000 users. It was easy. :)
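
The same filter can be extended to a count per rank. This is a small sketch with Python's built-in sqlite3 module, assuming (as the query above does) that a Trust value ending in "!" marks a red-tagged profile. The rows and their Trust strings are made up for illustration, and the string literal is written in single quotes, which is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserData(UserId INTEGER PRIMARY KEY, Trust TEXT, Rank TEXT)")
conn.executemany("INSERT INTO UserData VALUES (?, ?, ?)", [
    (1, "0: -0 / +0", "Newbie"),
    (2, "-2: -1 / +0 !", "Newbie"),
    (3, "-8: -2 / +0 !!!", "Member"),
    (4, "-1: -1 / +0 !", "Newbie"),
    (5, None, None),  # empty placeholder row, ignored by the filter
])

# Count profiles whose Trust text ends in '!' and group them by rank.
by_rank = conn.execute("""
    SELECT Rank, COUNT(*) AS Tagged
    FROM UserData
    WHERE substr(Trust, -1, 1) = '!'
    GROUP BY Rank
    ORDER BY Rank
""").fetchall()

print(by_rank)
```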
Piggy (OP)
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1416



View Profile WWW
February 12, 2019, 10:14:46 AM
 #18

Quote from: Coin-1
Thanks for sharing the standalone database you scraped.

I used your file "btctalk_full.db" to create the full list of red tagged members.

I simply executed the following SQL query:
Code:
select UserId from UserData where SUBSTR(Trust, -1, 1) = '!'

sqlite3 listed about 15000 users. It was easy. :)

Keep in mind that the list may be slightly different now because of the recent DT changes. That is a snapshot made in December.
DdmrDdmr
Legendary
*
Offline Offline

Activity: 2268
Merit: 10640


There are lies, damned lies and statistics. MTwain


View Profile WWW
February 13, 2019, 10:01:02 AM
 #19

<...>
I took a look at that with @Piggy’s prior full DB profile scrape (November 2018), and found 14.969 negatively trusted profiles at the time. I broke it down by rank and a couple of other things (see Analysis - DT Depth 2 - Profile Distribution).

If I’m correct, the scraped profiles show Trust with a DT depth 2 view (which will be different from the view of those that have a Custom Trust list).

Coin-1
Legendary
*
Offline Offline

Activity: 2408
Merit: 2160



View Profile
February 16, 2019, 01:50:18 AM
 #20

Quote from: Piggy
Quote from: Coin-1
Thanks for sharing the standalone database you scraped.

I used your file "btctalk_full.db" to create the full list of red tagged members.

I simply executed the following SQL query:
Code:
select UserId from UserData where SUBSTR(Trust, -1, 1) = '!'

sqlite3 listed about 15000 users. It was easy. :)

Keep in mind that the list may be slightly different now because of the recent DT changes. That is a snapshot made in December.

Yes, I understand. The trust ratings of some users have really changed. I created two more lists of red tagged members, called "_included" and "_excluded". I manage these lists manually. :-\

Will you scrape the data again in the future?



Quote from: DdmrDdmr
If I’m correct, the scraped profiles show Trust with a DT depth 2 view (which will be different from the view of those that have a Custom Trust list).

I guess that Piggy uses his wonderful @mention notification bot to scrape data. This auxiliary account probably only has DefaultTrust in its list of trusted users.