suchmoon
Legendary
Activity: 3738
Merit: 8999
https://bpip.org
September 12, 2020, 01:12:42 AM
In this case, you can send me the boards right away so I can figure out how to do that. ![Cheesy](https://bitcointalk.org/Smileys/default/cheesy.gif) Thanks!
Here you go: https://bpip.org/boards_202009112042.zip (~100MB compressed, ~600MB uncompressed).
You may be able to get board details from this page: https://bitcointalk.org/index.php?action=search (it's a bit messy, but it lists all boards on one page). Otherwise you'd have to recursively scrape multiple pages, starting from the front page.
Another option: if you are currently scraping recent posts including the full path (with board names and IDs), you can extract the hierarchy from that data. It might not be complete, though; boards that are rarely posted in might need to be added manually.
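The last option (deriving the hierarchy from the paths of scraped posts) could look roughly like this. It is a minimal sketch, assuming each scraped post yields its path as a list of `(board_id, name)` pairs ordered from the top-level board down, with the non-board category (such as "Bitcoin") already stripped; the function name and input shape are hypothetical. Boards that never show up in any path are simply absent from the result, which is the incompleteness mentioned above.

```python
def hierarchy_from_paths(paths):
    """Build {board_id: parent_id} and {board_id: name} maps from breadcrumb paths.

    Each path is a list of (board_id, name) pairs ordered from the top-level
    board down to the board the post was made in.
    """
    parents = {}
    names = {}
    for path in paths:
        parent_id = None  # the first board in a path is top-level
        for board_id, name in path:
            names[board_id] = name
            # setdefault keeps the first parent recorded for each board
            parents.setdefault(board_id, parent_id)
            parent_id = board_id
    return parents, names


# Example paths, using board IDs from the sample export later in the thread.
paths = [
    [(14, "Mining"), (40, "Mining support")],
    [(14, "Mining"), (42, "Mining software (miners)")],
    [(1, "Bitcoin Discussion")],
]
parents, names = hierarchy_from_paths(paths)
print(parents)  # {14: None, 40: 14, 42: 14, 1: None}
```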

TryNinja (OP)
Legendary
Activity: 2898
Merit: 7298
Top Crypto Casino
September 12, 2020, 03:40:13 AM
Thank you. I think I got it. I went to https://bitcointalk.org/sitemap.php?t=b, grabbed every board URL, scraped each one of them and linked each board to its closest parent. There are only 248 boards, so it was pretty quick.
"board_id","name","parent_id"
1, Bitcoin Discussion,
4, Bitcoin Technical Support,
14, Mining,
40, Mining support, 14
42, Mining software (miners), 14
If you want it: https://pastebin.com/raw/xhudKFZ8
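Loading such an export back is straightforward with Python's `csv` module. A sketch, assuming the pastebin file has the same layout as the sample rows above: a quoted header, a space after each comma, and an empty `parent_id` for top-level boards. `load_boards` is a hypothetical helper name.

```python
import csv
import io

# The sample rows from the post above.
SAMPLE = '''"board_id","name","parent_id"
1, Bitcoin Discussion,
4, Bitcoin Technical Support,
14, Mining,
40, Mining support, 14
42, Mining software (miners), 14
'''


def load_boards(text):
    """Return {board_id: (name, parent_id)}, with parent_id None for top-level boards."""
    # skipinitialspace drops the space that follows each comma in the export
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    boards = {}
    for row in reader:
        parent = (row["parent_id"] or "").strip()
        boards[int(row["board_id"])] = (row["name"].strip(), int(parent) if parent else None)
    return boards


boards = load_boards(SAMPLE)
print(boards[40])  # ('Mining support', 14)
```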

PrimeNumber7
Copper Member
Legendary
Activity: 1638
Merit: 1899
Amazon Prime Member #7
September 12, 2020, 06:02:40 AM
If you are looking for every post, you can do this:
    last_topic_id = 5273824  # upper bound on topic IDs (x below is a topic ID)
    for x in range(last_topic_id + 1):
        page = 0
        while True:
            # go to 'bitcointalk.org/index.php?topic={}.{}'.format(x, page)
            # if not available to you: break (move on to the next topic)
            # scrape board information
            # scrape each post via loop
            # I believe there are two classes of posts; scrape both. You will insert posts into your DB out of order, but this is okay
            page += 20
            # there is a middletext td class
            # there is a prevnext span class
            next_page = 'bitcointalk.org/index.php?topic={}.{}'.format(x, page)
            # sleep for 1 second
            # if you can find a link equal to next_page, go to that page (continue), else break
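A runnable version of the loop above might look like the sketch below. The HTTP layer is deliberately left out: `fetch` is a hypothetical callable you supply that returns a page's HTML, or `None` when the topic is missing or not visible to you. The 20-post page offset and the one-second delay come from the pseudocode; detecting the next page by searching the HTML for a `topic=ID.OFFSET` link is a heuristic that assumes the pagination links appear in the page source.

```python
import re
import time
from typing import Callable, Iterator, Optional

TOPIC_URL = "https://bitcointalk.org/index.php?topic={}.{}"
POSTS_PER_PAGE = 20  # topic pages are addressed by a post offset in steps of 20


def crawl_topic(topic_id: int,
                fetch: Callable[[str], Optional[str]],
                delay: float = 1.0) -> Iterator[str]:
    """Yield the HTML of every page of one topic, oldest page first.

    `fetch` is supplied by the caller: it takes a URL and returns the page
    HTML, or None when the topic is missing or not visible to you.
    """
    offset = 0
    while True:
        html = fetch(TOPIC_URL.format(topic_id, offset))
        if html is None:
            return  # not available to you: move on to the next topic
        yield html
        offset += POSTS_PER_PAGE
        # Heuristic: a next page exists if this page links to topic=ID.OFFSET
        # (the negative lookahead stops ".20" from matching ".200").
        if not re.search(r"topic={}\.{}(?!\d)".format(topic_id, offset), html):
            return
        time.sleep(delay)  # roughly one request per second


def crawl_all(last_topic_id: int, fetch: Callable[[str], Optional[str]]) -> Iterator[str]:
    """Walk every topic ID from 1 to last_topic_id inclusive."""
    for topic_id in range(1, last_topic_id + 1):
        yield from crawl_topic(topic_id, fetch)
```

Parsing the posts and board information out of each yielded page is left to the caller, which keeps the crawler testable with a fake `fetch`.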
In parallel with the above, starting at the same time: scrape the recent posts page and add each post to your DB. There you can also scrape the board each thread is in, adding it if it doesn't exist in your DB and updating it if it already exists.
The above will capture every post that you have access to. The first loop will take quite some time, and a thread being moved to a different board while you are in the process of scraping all posts will not cause you to miss any posts.
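The "add it if it doesn't exist, update it if it does" step is an upsert. A minimal sketch with Python's built-in `sqlite3` module, assuming a hypothetical `topics` table keyed by topic ID; the `ON CONFLICT ... DO UPDATE` syntax needs SQLite 3.24 or newer. Because the board is overwritten every time the topic is seen, a thread that gets moved ends up recorded under its new board.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical topics table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS topics ("
        " topic_id INTEGER PRIMARY KEY,"
        " board_id INTEGER NOT NULL)"
    )
    return db


def upsert_topic_board(db, topic_id, board_id):
    """Insert the topic's board, or overwrite it if the topic was seen before (e.g. moved)."""
    db.execute(
        "INSERT INTO topics (topic_id, board_id) VALUES (?, ?) "
        "ON CONFLICT(topic_id) DO UPDATE SET board_id = excluded.board_id",
        (topic_id, board_id),
    )
    db.commit()


db = open_db()
upsert_topic_board(db, 123, 14)  # first seen in Mining
upsert_topic_board(db, 123, 40)  # later seen in Mining support after a move
print(db.execute("SELECT board_id FROM topics WHERE topic_id = 123").fetchone())  # (40,)
```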

FatFork
Legendary
Activity: 1666
Merit: 2631
Top Crypto Casino
September 12, 2020, 07:48:53 AM
If you are looking for every post, you can do this: [...]
How is this relevant for this thread? Or did I miss something?

Vod
Legendary
Activity: 3766
Merit: 3103
Licking my boob since 1970
September 12, 2020, 05:44:06 PM
There is a much better way to scrape recent posts without hitting that page every few seconds. I have a feeling Theymos may have to eventually restrict parsing if everyone keeps doing it, so I'm putting an alternative up on github. ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)

suchmoon
Legendary
Activity: 3738
Merit: 8999
https://bpip.org
September 13, 2020, 01:06:42 PM
Here are the timestamps: https://bpip.org/timestamps_202009130840.zip (~160MB compressed, ~1.3GB uncompressed). As I mentioned earlier, they're in UTC, 24-hour format. Here's a sample:
28,2009-11-22 18:04:28
29,2009-11-22 18:31:44
30,2009-11-22 18:32:00
31,2009-11-22 18:34:21
32,2009-11-22 18:35:15
33,2009-11-25 18:15:57
34,2009-11-25 18:17:23
36,2009-11-27 17:17:22
37,2009-11-27 17:27:09
38,2009-11-27 22:48:39
40,2009-12-09 05:34:46
41,2009-12-09 18:45:10
42,2009-12-09 19:25:31
43,2009-12-10 13:13:51
44,2009-12-10 14:00:17
45,2009-12-10 19:31:49
46,2009-12-10 20:49:02
47,2009-12-11 04:59:19
48,2009-12-11 17:20:29
49,2009-12-11 17:58:57
50,2009-12-11 19:27:55
51,2009-12-12 06:34:21
52,2009-12-12 13:08:17
53,2009-12-12 14:11:37
54,2009-12-12 17:52:44
55,2009-12-12 18:17:10
56,2009-12-12 18:23:59
57,2009-12-12 18:47:45
58,2009-12-12 20:46:14
59,2009-12-13 06:44:04
60,2009-12-13 06:46:30
61,2009-12-13 06:50:05
62,2009-12-13 16:51:25
63,2009-12-14 09:29:44
64,2009-12-14 13:09:48
65,2009-12-14 14:46:37
66,2009-12-14 15:01:39
67,2009-12-14 17:15:56
68,2009-12-15 05:21:09
69,2009-12-15 05:30:53
70,2009-12-15 20:37:32
71,2009-12-15 21:14:13
72,2009-12-16 15:49:23
73,2009-12-16 22:45:36
74,2009-12-17 11:36:36
75,2009-12-17 13:21:49
76,2009-12-17 13:23:43
77,2009-12-17 18:38:06
78,2009-12-18 15:11:53
79,2009-12-18 17:37:48
81,2009-12-30 01:40:50
82,2009-12-30 15:28:04
83,2010-01-01 18:09:58
84,2010-01-05 01:20:06
85,2010-01-05 20:00:46
86,2010-01-07 06:14:17
87,2010-01-11 16:13:20
88,2010-01-12 19:31:22
90,2010-01-13 04:13:37
91,2010-01-13 06:24:54
92,2010-01-13 07:45:57
93,2010-01-13 08:22:56
94,2010-01-13 17:12:16
95,2010-01-13 17:44:57
96,2010-01-13 19:08:19
97,2010-01-14 20:17:20
100,2010-01-15 09:42:18
101,2010-01-16 10:39:25
102,2010-01-16 14:27:15
103,2010-01-16 23:16:56
104,2010-01-16 23:22:55
105,2010-01-16 23:44:35
106,2010-01-16 23:45:22
107,2010-01-17 10:34:44
108,2010-01-17 22:55:31
109,2010-01-18 04:49:43
110,2010-01-18 11:06:35
111,2010-01-19 08:06:15
112,2010-01-20 20:07:15
113,2010-01-20 22:05:28
114,2010-01-21 17:30:39
115,2010-01-22 10:37:09
116,2010-01-24 02:48:37
117,2010-01-24 08:27:13
118,2010-01-24 09:52:48
119,2010-01-24 20:49:47
120,2010-01-24 20:52:59
121,2010-01-24 20:53:36
122,2010-01-24 22:48:30
123,2010-01-24 23:31:48
124,2010-01-25 02:42:13
125,2010-01-25 03:11:01
126,2010-01-25 04:06:03
127,2010-01-25 05:13:23
128,2010-01-25 05:28:39
129,2010-01-25 07:40:02
130,2010-01-25 16:30:50
131,2010-01-25 17:36:10
132,2010-01-25 19:25:29
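The rows are plain `id,timestamp` pairs, so reading them back is simple. A minimal sketch, assuming the file inside the zip keeps exactly the format of the sample (no header row, an integer in the first column); the timestamps are tagged as UTC because suchmoon says they are. Since the uncompressed file is ~1.3GB, the hypothetical `iter_timestamps` helper streams it line by line rather than loading it all at once.

```python
from datetime import datetime, timezone


def parse_timestamp_line(line):
    """Parse an 'id,YYYY-MM-DD HH:MM:SS' row into (id, timezone-aware UTC datetime)."""
    raw_id, raw_ts = line.strip().split(",", 1)
    ts = datetime.strptime(raw_ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(raw_id), ts


def iter_timestamps(path):
    """Stream rows from the extracted file instead of loading it all into memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_timestamp_line(line)


row_id, ts = parse_timestamp_line("28,2009-11-22 18:04:28")
print(row_id, ts.isoformat())  # 28 2009-11-22T18:04:28+00:00
```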

TryNinja (OP)
Legendary
Activity: 2898
Merit: 7298
Top Crypto Casino
September 16, 2020, 04:09:56 AM (last edit: September 16, 2020, 09:53:52 AM by TryNinja)
Finally, the new update!
- Board filter: You can now search posts by board. It includes child boards, so you can select "Wallet software" to search "Armory", "Electrum", "Hardware Wallet", etc., or go straight to a child board to search only the "Electrum" or "Bounties (Altcoins)" board. Caveat: a very small number of posts have an unknown board, so if you search by board, there is a small chance of missing a few posts.
- Archive updated: The archive went from ~42m posts without a title or exact date to just a couple thousand without that data! This means you can now search by date range without getting a bunch of random posts that shouldn't be there (or missing most of them).
- New search backend: Searching is now a LOT better. It's faster, more accurate, returns more data and doesn't crash my server!
- User stats page: You can now visit the new page (http://ninjastic.space/user/TryNinja) to check some data about a user: most active boards, a graph of posts made in the last 7 and 30 days, and their known addresses! More data will come later (with your suggestions).
- Find addresses by author: What about finding every known address a user has posted (BTC and ETH only) by searching for their username or checking their user page? Now you can!
- Deleted posts in the post edit history: The bot will detect if a post was deleted less than 5 minutes after it was made/scraped and will mark it as such in the "post edit history" card.
Tip: press CTRL + F5 on the website if you don't see the changes. Tell me what you think about it. ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
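Searching a board "including its children" amounts to collecting every board below it in the hierarchy. The thread doesn't say how the site does it; one straightforward way, assuming a `{board_id: parent_id}` map like the one from the board export earlier in the thread, is a walk over the inverted tree. The `include_children` flag is hypothetical and corresponds to the "don't include child boards" option requested later in the thread. A post whose board is unknown belongs to no such set, which is why the caveat above applies.

```python
from collections import defaultdict


def build_children(parents):
    """Invert {board_id: parent_id} into {parent_id: [child board ids]}."""
    children = defaultdict(list)
    for board_id, parent_id in parents.items():
        if parent_id is not None:
            children[parent_id].append(board_id)
    return children


def boards_to_search(board_id, children, include_children=True):
    """Return the set of board IDs a board filter should match."""
    if not include_children:
        return {board_id}
    found, stack = set(), [board_id]
    while stack:  # iterative depth-first walk over the subtree
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


# Parent links taken from the sample export earlier in the thread.
parents = {1: None, 4: None, 14: None, 40: 14, 42: 14}
children = build_children(parents)
print(sorted(boards_to_search(14, children)))  # [14, 40, 42]
```

The resulting set can then be used as a plain membership filter on each post's board ID.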

FatFork
Legendary
Activity: 1666
Merit: 2631
Top Crypto Casino
September 16, 2020, 12:40:46 PM
- Find addresses by author: What about finding every known address a user has posted (BTC and ETH only) by searching for their username or checking their user page? Now you can!
This is a pretty nice update! I haven't explored everything yet. Some suggestions for finding addresses by author:
- When listing ETH addresses, it also displays ETH transactions (TXIDs); perhaps the results should be filtered by string length.
- You should ignore results that are within quote tags (if possible).
- Address grouping should not be case sensitive.
Otherwise, great job!
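These suggestions are easy to illustrate with regular expressions. The sketch below is not how TryNinja's bot works (that isn't described here); it assumes posts are stored as SMF BBCode with `[quote ...]...[/quote]` blocks, and it only covers ETH addresses. An ETH address is `0x` plus 40 hex characters while an ETH TXID is `0x` plus 64, so pinning the length separates them. Quotes are removed innermost first so nested quotes are handled, and addresses are grouped by their lowercased form, since ETH addresses are hex and case does not change which address it is.

```python
import re

# Innermost [quote ...]...[/quote] block (SMF BBCode): no nested [quote inside it.
INNERMOST_QUOTE = re.compile(
    r"\[quote[^\]]*\](?:(?!\[quote).)*?\[/quote\]", re.IGNORECASE | re.DOTALL
)
# 0x + exactly 40 hex characters, not followed by more hex.
# A 64-character TXID therefore does not match.
ETH_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}(?![0-9a-fA-F])")


def strip_quotes(bbcode):
    """Remove quote blocks, innermost first, until none are left."""
    while True:
        stripped = INNERMOST_QUOTE.sub("", bbcode)
        if stripped == bbcode:
            return bbcode
        bbcode = stripped


def eth_addresses_outside_quotes(bbcode):
    """Unique ETH addresses (lowercased) that appear outside quotes, in order of first use."""
    seen = {}
    for address in ETH_ADDRESS.findall(strip_quotes(bbcode)):
        seen.setdefault(address.lower(), None)
    return list(seen)
```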

bitmover
Legendary
Activity: 2366
Merit: 6137
bitcoindata.science
September 16, 2020, 02:23:04 PM
- Find addresses by author: What about finding every known address a user has posted (BTC and ETH only) by searching for their username or checking their user page? Now you can! ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
Beautiful update. I liked the design. The field "Known Addresses" is a very inaccurate name, imo. I saw about 10 addresses in my profile, and none of them were mine. Most of them were quotes or addresses that were being discussed. Maybe you could use a different name, such as "possible addresses" or "mentioned addresses".
█████████████████████████ ████████▀▀████▀▀█▀▀██████ █████▀████▄▄▄▄██████▀████ ███▀███▄████████▄████▀███ ██▀███████████████████▀██ █████████████████████████ █████████████████████████ █████████████████████████ ██▄███████████████▀▀▄▄███ ███▄███▀████████▀███▄████ █████▄████▀▀▀▀████▄██████ ████████▄▄████▄▄█████████ █████████████████████████ | BitList | | █▀▀▀▀ █ █ █ █ █ █ █ █ █ █ █ █▄▄▄▄ | ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ . REAL-TIME DATA TRACKING CURATED BY THE COMMUNITY . ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ | ▀▀▀▀█ █ █ █ █ █ █ █ █ █ █ █ ▄▄▄▄█ | | List #kycfree Websites |

TryNinja (OP)
Legendary
Activity: 2898
Merit: 7298
Top Crypto Casino
September 16, 2020, 02:35:44 PM
An option not to include the child boards of the selected board would make it better ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
Noted!
Some suggestions for finding addresses by author: - When listing ETH addresses, it also displays ETH transactions (TXIDs); perhaps the results should be filtered by string length
Hmm... I don't know. I could technically use a blockchain API for that. Will see. ![Cheesy](https://bitcointalk.org/Smileys/default/cheesy.gif)
- You should ignore the results that are within the quote tags (if possible)
It's on my TODO list.
- Address grouping should not be case sensitive
I didn't think of that, but I'll leave it that way for now. At least BTC addresses are case sensitive, so it shouldn't be an issue.
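The case-sensitivity point holds for legacy Base58 BTC addresses (`1...`, `3...`), where case is part of the encoding. It does not hold for every format the bot collects: bech32 BTC addresses (`bc1...`, BIP 173) are case-insensitive, and ETH addresses are hex, where mixed case only carries the EIP-55 checksum. A minimal sketch of a grouping key that respects those rules; `grouping_key` and `group_addresses` are hypothetical names, and the prefix checks are a simplification that ignores testnet and other networks.

```python
from collections import defaultdict


def grouping_key(address):
    """Normalise an address so that different spellings of the same one group together."""
    if address[:2].lower() == "0x":
        return address.lower()  # ETH: hex; mixed case only encodes the EIP-55 checksum
    if address[:3].lower() == "bc1":
        return address.lower()  # BTC bech32/bech32m (BIP 173/350): case-insensitive
    return address  # BTC Base58 (1..., 3...): case is significant, keep as written


def group_addresses(addresses):
    """Map each grouping key to the spellings found for it."""
    groups = defaultdict(list)
    for address in addresses:
        groups[grouping_key(address)].append(address)
    return dict(groups)
```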
The field "Known Addresses" is a very inaccurate name imo. I saw about 10 addresses in my profile, and none of them were mine. Most of them were quotes or addresses that were being discussed.
The bot doesn't know that. "Known" doesn't necessarily mean you own it; it's just the addresses the bot has found in any of your (scraped) posts. There could be others, so these are the ones it knows about. ![Cheesy](https://bitcointalk.org/Smileys/default/cheesy.gif) I can change it to "Mentioned Addresses" to make that clear.

bitmover
Legendary
Activity: 2366
Merit: 6137
bitcoindata.science
September 16, 2020, 03:00:28 PM
The bot doesn't know that. "known" doesn't necessarily mean you own it. It's just the ones the bot has found in any of your (scraped) posts. There could be others, so those are the ones he knows about. ![Cheesy](https://bitcointalk.org/Smileys/default/cheesy.gif) I can change it to "Mentioned Addresses" to make that clear.
But the definition of "known" is something that is generally recognized; in this case, addresses that are generally recognized as mine. Your bot is not recognizing them as mine, only as "mentioned". "Mentioned Addresses" is a lot better.
Definition of known: generally recognized (https://www.merriam-webster.com/dictionary/known)

PrimeNumber7
Copper Member
Legendary
Activity: 1638
Merit: 1899
Amazon Prime Member #7
September 16, 2020, 03:31:43 PM
- Find addresses by author: What about finding every known address a user has posted (BTC and ETH only) by searching for their username or checking their user page? Now you can! ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)
Beautiful update. I liked the design. The field "Known Addresses" is a very inaccurate name imo. I saw about 10 addresses in my profile, and none of them were mine. Most of them were quotes or addresses that were being discussed. Maybe you could use a different name, such as "possible addresses" or "mentioned addresses".
I would argue that the majority of the time, if you post an address in a quote, it won’t belong to you. I would suggest that any address posted inside a quote be excluded from being displayed if it does not also appear outside of a quote. To say this another way, the person must post the address outside of a quote for it to count to being in this list.

TryNinja (OP)
Legendary
Activity: 2898
Merit: 7298
Top Crypto Casino
I would argue that the majority of the time, if you post an address in a quote, it won’t belong to you. I would suggest that any address posted inside a quote be excluded from being displayed if it does not also appear outside of a quote. To say this another way, the person must post the address outside of a quote for it to count to being in this list.
It does exclude addresses inside quotes. The only addresses that show up there are the ones you mentioned in the body of your post.

FatFork
Legendary
Activity: 1666
Merit: 2631
Top Crypto Casino
September 16, 2020, 06:24:28 PM
I would argue that the majority of the time, if you post an address in a quote, it won’t belong to you. I would suggest that any address posted inside a quote be excluded from being displayed if it does not also appear outside of a quote. To say this another way, the person must post the address outside of a quote for it to count to being in this list.
It does exclude addresses inside quotes. The only addresses that show up there are the ones you mentioned in the body of your post.
It's much better now. But I have another suggestion. ![Wink](https://bitcointalk.org/Smileys/default/wink.gif) How about excluding certain boards, for example Scam Accusations and Reputation? That way it won't connect us to the scammer addresses we reported.

SiNeReiNZzz
Legendary
Activity: 1022
Merit: 1043
αLPʜα αɴd ΩMeGa
September 17, 2020, 10:34:11 AM
Finely made, nice! Really a good idea and it makes "patrols" much easier. You can include almost all parameters in your investigations. Most important for me are links to Twitter or Facebook accounts and payout addresses... All this is available! The website is clearly arranged and above all really helpful...
Thank you and best Regards SiNeReiNZzz

TryNinja (OP)
Legendary
Activity: 2898
Merit: 7298
Top Crypto Casino
September 17, 2020, 11:34:33 AM
How about excluding certain boards, for example: Scam Accusations and Reputation? That way it won't connect us to the scammer addresses we reported.
I prefer to keep them. People can read the posts where you mentioned the address and see whether you posted it as your own or were just talking about it. Also, I can see people searching for an address and wanting to know if it was previously mentioned in a Scam Accusation, the Known Alts topic, etc.; the more cases, the better. You can always exclude the false positives, but you can't easily include the ones that didn't show up. Maybe what I can do is put a mark next to the posts in those boards so you can distinguish them.
Added:
- Graph of posts in the last 24 hours.
- Edited post difference (HTML). E.g.: https://ninjastic.space/post/55206900 (click "check diff")
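The thread doesn't say how "check diff" is computed. One way to produce such a diff with only the standard library is `difflib`; the sketch assumes both scraped versions of a post body are stored as HTML strings and that line breaks inside a post are `<br>` tags, so it splits on those to get a line-oriented diff. `post_diff` is a hypothetical helper name.

```python
import difflib
import re


def post_diff(old_html, new_html):
    """Unified diff between two scraped versions of a post body (HTML strings)."""
    # Assumption: line breaks in a post body are <br> tags, so split on them.
    def to_lines(html):
        return re.split(r"<br\s*/?>", html, flags=re.IGNORECASE)

    return "\n".join(
        difflib.unified_diff(
            to_lines(old_html),
            to_lines(new_html),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


print(post_diff("Hello<br />old line", "Hello<br />new line"))
```

For a side-by-side HTML table instead of a unified diff, `difflib.HtmlDiff().make_table()` accepts the same two lists of lines.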

Stalker22
Legendary
Activity: 1568
Merit: 1376
September 17, 2020, 12:09:53 PM
I agree with TryNinja. It is better to show all the posts where the address appears than to miss important information. Maybe, in the future, when reporting ban evaders and bounty abusers, we can put their related addresses in quotes so that they won't show up in the search?
Maybe what I can do is put a mark next to the posts in those boards so you can distinguish them.
This is actually a very good idea.

LoyceV
Legendary
Activity: 3374
Merit: 17062
Thick-Skinned Gang Leader and Golden Feather 2021
September 29, 2020, 06:43:30 PM
Can you increase maximum string size in the Search ("Content") field? I now can't search for this: <a href="https://bitcointalk.org/index.php?topic=577765

TryNinja (OP)
Legendary
Activity: 2898
Merit: 7298
Top Crypto Casino
September 29, 2020, 07:18:04 PM (last edit: September 29, 2020, 08:06:29 PM by TryNinja)
Can you increase maximum string size in the Search ("Content") field?
Done. The limit was there because the old search engine worked very poorly with big strings. I also added a "total results" count (so you don't have to scroll down until there are no more results), since I guess that's the info you want. But I don't think you need the HTML tag. IIRC, the search index strips them, so you can just search for the full URL.
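The thread only says that the search index strips HTML tags. A minimal sketch of that step with the standard library's `html.parser`, under one labelled assumption: this version also keeps each link's `href`, so a post that links to a topic can be found by its URL even when the link text differs. Whether ninjastic's index does that is not stated here; `index_text` is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content plus the href of every link, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.parts.append(value)

    def handle_data(self, data):
        self.parts.append(data)


def index_text(post_html):
    """Return tag-free text suitable for a full-text index."""
    parser = _TextExtractor()
    parser.feed(post_html)
    parser.close()
    return " ".join(part.strip() for part in parser.parts if part.strip())


print(index_text('See <a href="https://bitcointalk.org/index.php?topic=577765">this</a> topic'))
```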

LoyceV
Legendary
Activity: 3374
Merit: 17062
Thick-Skinned Gang Leader and Golden Feather 2021
September 30, 2020, 08:28:41 AM
I'm taking this quote from another topic here:
I know my database is very incomplete and many posts from this year are missing
I thought you had the more recent posts already, but if you want, I can quite easily get you a compressed version of my posts archive.