Author Topic: LoyceV's Topic Details: highlight deleted and edited posts (forum wide)  (Read 2243 times)
LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3374
Merit: 17054


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
June 20, 2020, 07:42:25 AM
Last edit: May 29, 2021, 08:32:47 AM by LoyceV
Merited by dbshck (8), Welsh (6), fillippone (6), OmegaStarScream (3), Vod (2), DdmrDdmr (2), o_e_l_e_o (2), Rikafip (2), vapourminer (1), Daniel91 (1), BitMaxz (1), SFR10 (1), hosseinimr93 (1), TheBeardedBaby (1), cryptoaddictchie (1), GazetaBitcoin (1), dragonvslinux (1), Poker Player (1), Rizzrack (1)
 #1

Archiving a thread is now as easy as posting the right link anywhere on the forum (and waiting a bit)!

Short version:
Get a topicID you want to see, for instance 5145594.
Insert the topicID into the following link and post it on any public board on Bitcointalk: https://loyce.club/archive/details/topic_5145594.html
Wait a bit, then click the link!
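
The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `details_url` is made up for this example, and only the URL pattern itself comes from the post.

```python
def details_url(topic_id: int) -> str:
    """Return the Topic Details link for a Bitcointalk topicID."""
    if topic_id <= 0:
        raise ValueError("topicID must be a positive integer")
    # URL pattern as given in the short version above.
    return f"https://loyce.club/archive/details/topic_{topic_id}.html"


print(details_url(5145594))
```

Building the link is not enough on its own: the page only exists after the link has been posted on a public board and picked up by the scraper.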

Full version
Almost a year ago, I opened the topic "35M posts! View unedited/deleted posts (search per post, per user or per topic)". That archive holds a lot of data (currently around 60 GB), but it's painstaking to manually find exactly which posts in a topic have been edited or deleted.
I started archiving posts in July 2019. Especially at the beginning I missed some posts due to down time, and even now I occasionally miss some posts due to connection problems.
Since February 2020, I'm scraping and archiving all older posts too (this will take a couple more months to complete). Update: this was completed in August 2020.

What it does
I've created an on-demand service to get details from any topic for:
  • All posts that I didn't archive yet*
  • All posts that have been edited*
  • All posts that have been deleted
  • All posts that received Merit (not implemented yet)
*I create a new archive of every edited or unarchived post.

The Topic Details
If a new update for the same topic is requested, I'll include a list of all previous Topic Details.
You'll have to make a new post to be detected by my scraper. Editing an existing post won't get detected.
Please don't quote the archive link, it'll trigger another update.

Sample output explained
[Image]
Post 50988796 links to the post on Bitcointalk (even if the post itself has been deleted).
Post 235 is older than my archive. I don't have an unedited backup, so I created a new backup.
Post 236 is Deleted! I have an unedited backup, no need for a new backup.
Post 242 was Edited! I have an unedited backup, and created a new backup of the current post.
Post 246 is Unedited! No need for a new backup.
Post 251 doesn't have an unedited backup (which means my scraper was offline at the moment), so I created a new backup.
The (link) at the end of each line points at that specific row in my list.

[Image]
If I have no unedited backup, I check if I made a later backup. This backup can't tell if the post was edited in the first months (or even years), so I don't mark the post as Edited! or Unedited!. However, if the post was edited after I created the backup, I make a new backup.
If a post was removed before I tried to archive it, I (obviously) can't list it.
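
The rules in this section can be summarised as a small decision function. This is a hedged sketch of the logic as described above, not the actual scraper code: the function name, the argument names and the plain string comparison are assumptions made for illustration.

```python
from typing import Optional, Tuple


def classify(current: Optional[str],
             unedited_backup: Optional[str],
             later_backup: Optional[str] = None) -> Tuple[str, bool]:
    """Return (label, make_new_backup) for one post.

    current          -- post text on the forum right now, None if it is gone
    unedited_backup  -- text archived right after posting, None if missing
    later_backup     -- text archived some time later, None if missing
    """
    if current is None:
        if unedited_backup is not None:
            return ("Deleted!", False)   # backup exists, nothing new to save
        # Assumption: without an unedited backup the post cannot be listed.
        return ("not listed", False)
    if unedited_backup is not None:
        if current == unedited_backup:
            return ("Unedited!", False)
        return ("Edited!", True)
    # No unedited backup: an early edit cannot be ruled out, so no
    # Edited!/Unedited! label. Back up again only if there is no later
    # backup or the post changed since that backup was made.
    needs_backup = later_backup is None or current != later_backup
    return ("no unedited backup", needs_backup)


print(classify("new text", "old text"))
```

For example, a post that still matches its unedited backup gets `("Unedited!", False)`, while a post that is older than the archive and has never been backed up gets `("no unedited backup", True)`.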

Limitations
  • Only one request per post. If you post more than one request, only the first one is processed.
  • I allow 5 tasks at once! If my scraper is busy (see status.txt), you'll have to wait a bit and post a new request (in a new post). It's okay to delete or edit the post afterwards.
  • Topics in Investigation are ignored.
  • Creating an overview takes about 10 seconds per page (to limit load on the forum). However, if several tasks are running simultaneously, it slows down other tasks. It might also take a bit longer for topics with many deleted posts.
  • Every quote has "Today" in my archived post, while the actual post now shows the date. I ignore this when comparing the current post and my archive. A few seconds around the end of each day, this can lead to a post accidentally being marked as edited.
  • My initial plan was to make this for a user's post history too, but it was much more work than anticipated, so I skipped that.
  • This service is currently limited to scraping the first 250 pages of a topic. If that's not enough, or you want for instance pages 400-500, feel free to post your request.
    If there are more than 250 pages, all archived posts will be marked as "Deleted" (example). I know this isn't ideal, but I'll let it be for now.
  • Scrape time is Amsterdam time, but the time mentioned in scraped quotes is forum time.
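
A few of the limits above are easy to express in code. The sketch below is illustrative only: the names `first_request`, `can_start` and `estimated_seconds` are invented for this example, and the real scraper may work quite differently. The numbers (5 tasks, 250 pages, about 10 seconds per page) are taken from the list above.

```python
import re
from typing import Optional

MAX_TASKS = 5          # tasks allowed at once
MAX_PAGES = 250        # only the first 250 pages of a topic are scraped
SECONDS_PER_PAGE = 10  # approximate, to limit load on the forum

# Matches the Topic Details link format, with http or https.
REQUEST_RE = re.compile(
    r"https?://loyce\.club/archive/details/topic_(\d+)\.html"
)


def first_request(post_text: str) -> Optional[int]:
    """Return the topicID of the first request link in a post, or None.

    Only one request per post is processed, so later links are ignored.
    """
    match = REQUEST_RE.search(post_text)
    return int(match.group(1)) if match else None


def can_start(running_tasks: int) -> bool:
    """True if a new task fits under the limit of simultaneous tasks."""
    return running_tasks < MAX_TASKS


def estimated_seconds(pages: int) -> int:
    """Rough lower bound for the scrape time of a topic."""
    return min(pages, MAX_PAGES) * SECONDS_PER_PAGE


print(estimated_seconds(117))
```

The estimate is a lower bound: simultaneous tasks slow each other down, and topics with many deleted posts take a bit longer.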

Test it!
Please try it, and let me know if it works as expected.

Bugs
Please post! This is far from finished.

Intended use
I'm hoping this can be useful to expose certain scammers. Please don't turn this into a(nother) witch hunt.

Be nice
Don't try to abuse this. I don't want to make a blacklist, but I will if I have to.

Todo
  • Fix "Today" for today's posts. Done!
  • Image tags seem to change within the HTML code over time, so unedited posts with images might be marked as edited. I'm not sure yet how to tackle this.
  • Show Merit per post.
  • Fix the missing username. Done!
  • highlight banned users
  • Add "older posts" to my topic-lists (once I'm done), so I can catch it if those have been deleted (after I scraped them, of course). See this post. Such deleted posts (for instance in staked addresses) are currently overlooked.
  • Also list deleted "older" posts. I only found out now (May 29, 2021) that this doesn't work.

Rikafip
Legendary
*
Offline Offline

Activity: 1820
Merit: 6198



View Profile WWW
June 21, 2020, 08:15:08 AM
Last edit: June 21, 2020, 08:41:07 AM by Rikafip
 #2

Bump: I need testers Smiley

55 second early bump, that's how much I need testers
Hm, I don't know what I am doing wrong, but this thing doesn't work for me, all I get is 404 error message.

I tried inserting topicID 5256136 ([ANN] DSF - The SoFi Blockchain - Redefine Social Network with Blockchain), but nothing. Tried waiting for a few minutes, as you said to wait a bit, but that didn't help either.

This is the link
http://loyce.club/archive/details/topic_5256136.html

edit: it works now Smiley
So I guess I had to wait for it to get processed/updated, as you said. I like this new feature a lot, it makes it so much easier to find deleted/edited posts.

acroman08
Legendary
*
Online Online

Activity: 2394
Merit: 1096



View Profile
June 21, 2020, 08:38:31 AM
Last edit: July 13, 2020, 10:42:44 AM by acroman08
 #3

I'm not sure if I am doing this correctly. I will delete the post later if I made a mistake, but if not, I'll update if it's working.

testing: http://loyce.club/archive/details/topic_5257024.html

the link works! thanks! great job as always!

edit: it works now Smiley
So I guess I had to wait for it to get processed/updated, as you said. I like this new feature a lot, it makes it so much easier to find deleted/edited posts.
true

SFR10
Legendary
*
Offline Offline

Activity: 3066
Merit: 3481


Crypto Swap Exchange


View Profile WWW
June 21, 2020, 08:39:42 AM
Last edit: June 21, 2020, 10:57:19 AM by SFR10
 #4

Any way to get a "permanent archive" link prior to its current date or up to a specific date only?

Edited!
Can your scraper detect edits within the 10-minute mark?

Bump: I need testers Smiley
Another test for "this topic" Smiley
- Link: http://loyce.club/archive/details/topic_5243791.html
Update: It's working.

Wish I had some sMerits for you.

Update 2:
~Snipped~
Thank you for these Smiley

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3374
Merit: 17054


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
June 21, 2020, 08:44:39 AM
Merited by Rikafip (1)
 #5

Hm, I don't know what I am doing wrong, but this thing doesn't work for me, all I get is 404 error message.
Did you post the link in a new post?
In your post history, I can't find the link anywhere else than in your current post. My scraper didn't miss any recent posts, so I can only conclude you didn't post it.
If you don't post it, my scraper won't create it. I've underlined the "post it" in my OP.

Quote
Tried waiting for a few minutes
If all goes well, after a few seconds there should be a page telling you it's now scraping your topic. Depending on the number of pages that can take a while, but it auto-refreshes.

Quote
So I guess I had to wait for it to get processed/updated
I think all you had to do was post the link. When you posted about it, it worked Tongue

Rikafip
Legendary
*
Offline Offline

Activity: 1820
Merit: 6198



View Profile WWW
June 21, 2020, 08:51:19 AM
 #6

Did you post the link in a new post?
If you don't post it, my scraper won't create it. I've underlined the "post it" in my OP.
No I didn't, I was just trying it without actually posting, somehow I completely missed that underlined "post it" part, and I posted it when I thought it's not working. And then it worked :p


I think all you had to do was posting the link. When you posted about it, it worked Tongue
Yep, that was the issue. My bad, all good now!

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3374
Merit: 17054


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
June 21, 2020, 08:52:42 AM
 #7

Any way to get a "permanent archive" link prior to its current date or up to a specific date only?
I'm not sure what you're looking for. If I have an older version available, the link shows up.
If there is no older version, I'll mention that from now on. Example: http://loyce.club/archive/details/topic_5256854.html

Quote
Can your scraper detect edits within the 10-minute mark?
Yes. Your post for instance was scraped after 4 seconds.

No I didn't, I was just trying it without actually posting, somehow I completely missed that underlined "post it" part, and I posted it when I thought it's not working.
I'm not so advanced that I can do cool server-side stuff. I need a post to trigger a request.



The "no unedited backup" shows up for posts that are less than a few minutes old. Even though I archived the post already, my topic list gets updated only once every 5 minutes. This is a shortcoming I can live with Smiley

alani123
Legendary
*
Offline Offline

Activity: 2464
Merit: 1454


Leading Crypto Sports Betting & Casino Platform


View Profile
July 13, 2020, 09:59:53 AM
 #8

What better place to contribute to the testing other than this thread?

I'll try one of my own posts:
https://loyce.club/archive/details/topic_797299.html

Let's see.

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3374
Merit: 17054


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
July 13, 2020, 10:06:43 AM
 #9

Unfortunately, since I scraped this thread only this year, I can't highlight older deleted posts.

Timelord2067
Legendary
*
Offline Offline

Activity: 3738
Merit: 2228


💲🏎️💨🚓


View Profile
July 13, 2020, 12:48:38 PM
 #10

Only one of three requests seems to have worked:  https://bitcointalk.org/index.php?topic=2544574.msg54786188#msg54786188

Mk III has 117 pages (and counting) - far more than the Mk II & Mk I combined.  (and all three threads have multiple posts that get reviewed and edited at later dates)

Do more pages get scraped at a later date if a new request is made?

TheBeardedBaby
Legendary
*
Offline Offline

Activity: 2240
Merit: 3150


₿uy / $ell ..oeleo ;(


View Profile
July 13, 2020, 12:53:46 PM
 #11

Let's try it Smiley
https://loyce.club/archive/details/topic_996318.html

Timelord2067
Legendary
*
Offline Offline

Activity: 3738
Merit: 2228


💲🏎️💨🚓


View Profile
July 13, 2020, 12:55:00 PM
 #12


Perhaps yours is in a queue behind my three (two of which might be in a holding pattern at the moment?)

TheBeardedBaby
Legendary
*
Offline Offline

Activity: 2240
Merit: 3150


₿uy / $ell ..oeleo ;(


View Profile
July 13, 2020, 12:58:04 PM
 #13

Mine is quite a huge thread -> Stake your Bitcoin address here.
I hope I haven't killed the server Cheesy (if so, send me the bill in PM...)
It will take quite some time to analyze it myself but it could be worth it.

alani123
Legendary
*
Offline Offline

Activity: 2464
Merit: 1454


Leading Crypto Sports Betting & Casino Platform


View Profile
July 13, 2020, 03:18:01 PM
 #14

Doesn't quoting the link re-trigger the scraper? You might have killed it twice or thrice for good measure. Cheesy

Just kidding. LoyceV probably had considered the possibility. Wink

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3374
Merit: 17054


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
July 13, 2020, 03:24:34 PM
Last edit: July 13, 2020, 05:37:10 PM by LoyceV
 #15

Only one of three requests seems to have worked:
That is correct, I ignore all new requests during scraping.
Update: I now allow up to 5 requests (from one user or from different users) at once. So what you and TheBeardedBaby did earlier shouldn't be a problem from now on.

I also noticed I forgot to activate the limitation on number of pages. I'll increase it to 250 pages, just know that it will take a very long time if a topic is that long.

Quote
Do more pages get scraped at a later date if a new request is made?
No, not at the moment. However, I've quoted your posts to trigger your previously ignored requests.

Perhaps yours is in a queue behind my three (two of which might be in a holding pattern at the moment?)
Correct. TheBeardedBaby posted this while you occupied my scraper.
IF it works, you should see something on the URL you posted within a few seconds. That placeholder stays until it's done.
I restarted TheBeardedBaby's request.

Doesn't quoting the link re-trigger the scraper?
Correct.



I'll make some changes: when I'm done, I'll allow more than one request at once. But each additional request will slow down the other requests, I don't want to cause too much load on the forum.
Updates done Smiley Feel free to test it.
See status.txt for currently running tasks.



Mine is quite a huge thread -> Stake your Bitcoin address here.
It becomes interesting from the moment I started archiving all posts. It also shows the startup problems I've had with failed archives due to down time.
I currently can't catch older deleted posts (that may have been deleted after I scraped them). I'll add this to my Todo.

TheBeardedBaby
Legendary
*
Offline Offline

Activity: 2240
Merit: 3150


₿uy / $ell ..oeleo ;(


View Profile
July 14, 2020, 12:23:56 PM
 #16

I can say this is an amazing tool. I finally got the time to go through the long threads and yes, it's working perfectly well.
My bookmark list of LoyceV's threads is growing. It will soon be difficult to find my way around it, so I'll have to start dividing the bookmarks into sections based on categories.
This is only good news.

Perfect tool to spot bumping services. I'll give it more tests after we are back from vacation, we're traveling this weekend Smiley

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3374
Merit: 17054


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
July 17, 2020, 06:23:46 PM
 #17

TheBeardedBaby made this suggestion:
Can you add a link to the deleted posts from each user in the list. I'm talking about this >
Code:
https://loyce.club/archive/members/267/2670747.html
I've added it, click "more" to see all unedited/deleted posts:
[Image]

Disclaimer:
I don't check if I actually have archived older posts for that user. Users who haven't posted in the last year will lead to a 404 for now. When I'm done scraping older posts, I may add all of them.

kire - cryptzino
Newbie
*
Offline Offline

Activity: 19
Merit: 1


View Profile
July 21, 2020, 05:45:12 PM
 #18

Great tool, and I'm trying to utilize it.

It seems my last topic is deleted. But as I don't have PM or other messages, I don't know the topic ID.

Is there a way to check it?

Thank you
LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3374
Merit: 17054


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
July 21, 2020, 06:15:35 PM
 #19

I don't know the topic ID.

Is there a way to check it?
You can find your old posts using your userID here:
Viewing unedited/deleted posts

How to use it
  • Find the msgID, userID or topicID you need. Let's use msgID 51902990.
  • Remove the last 4 digits from the msgID to get the directory name (if fewer than 4 digits remain, use 0): 5190.
  • Put everything together behind the (above) URL and add ".html": http://loyce.club/archive/posts/5190/51902990.html.
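
The three steps above amount to an integer division. A minimal Python sketch, assuming the function name `post_archive_url` (invented for this example) and the URL layout shown in the steps:

```python
def post_archive_url(msg_id: int) -> str:
    """Return the archive link for a single post, given its msgID."""
    # Dropping the last 4 digits is integer division by 10000;
    # short msgIDs end up in directory 0.
    directory = msg_id // 10000
    return f"http://loyce.club/archive/posts/{directory}/{msg_id}.html"


print(post_archive_url(51902990))
```

The member link posted earlier in this thread (members/267/2670747.html) looks like it follows the same scheme with a userID, but that is an observation from one example rather than something stated here.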

So in your case, that brings you to unedited (or deleted) posts made by kire - cryptzino. This post was deleted, and I guess this is what you're looking for: https://loyce.club/archive/details/topic_5263237.html

kire - cryptzino
Newbie
*
Offline Offline

Activity: 19
Merit: 1


View Profile
July 21, 2020, 08:41:54 PM
 #20

...

Thank you! Precisely what I was looking for.

Now I will go after the mystery of why the topic was deleted. I just hope this is not a standard practice around.