Bitcoin Forum
Author Topic: 60M posts! View unedited/deleted posts (search per post, per user or per topic)  (Read 8641 times)
LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16609


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
July 21, 2019, 05:08:59 PM
Last edit: February 21, 2023, 09:23:33 AM by LoyceV
Merited by 1miau (10), DdmrDdmr (5), OmegaStarScream (3), Ucy (3), Halab (2), bitmover (2), DireWolfM14 (2), Rikafip (2), TMAN (2), vapourminer (1), JayJuanGee (1), nutildah (1), TheQuin (1), sujonali1819 (1), Coin-1 (1), The Cryptovator (1), hd49728 (1), wildan88 (1), dragonvslinux (1), Steamtyme (1), FontSeli (1), lulucrypto (1), Rrita (1), kaggie (1), 0x256 (1)
 #1

February 22, 2020: All updates are now live!
August 12, 2020: I finished scraping all oldposts!



Ever wanted to see who's lying when a post has been edited or deleted? I may be able to help!

I archive most posts within seconds after they are created (before any edits). I started this data collection around the time I started this topic. All data I have since then is available online.
I also have older posts: I've saved (most) unedited posts (6.2 million posts) since September 12, 2018, until the start of this topic. This data has not been added to this topic, and I can't really add it because I tried to remove quotes and that has some bugs. You can request to dig up unedited data when needed.

Viewing unedited/deleted posts

How to use it
Just click one of the links, and enter the msgID, userID or topicID.
Or (this older method still works):
  • Find the msgID, userID or topicID you need. Let's use msgID 51902990.
  • Remove the last 4 digits from the msgID to get the directory name (if there are 4 or fewer digits, use 0): 5190.
  • Put everything together behind the (above) URL and add ".html": https://loyce.club/archive/posts/5190/51902990.html.
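
The manual steps above can be sketched in a few lines of Python. This is only an illustration, assuming msgIDs are plain integers; the names ARCHIVE_BASE and post_archive_url are made up for this example and are not part of the archive itself.

```python
# Hypothetical helper: builds the archive URL for one post from its msgID.
ARCHIVE_BASE = "https://loyce.club/archive/posts"


def post_archive_url(msg_id: int) -> str:
    # Dropping the last 4 digits is integer division by 10,000.
    # IDs with 4 or fewer digits end up in directory 0.
    directory = msg_id // 10_000
    return f"{ARCHIVE_BASE}/{directory}/{msg_id}.html"


print(post_archive_url(51902990))
```

For msgID 51902990 this gives the same URL as the worked example in the list.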

Details
  • Files are stored with their msgID, userID or topicID as file name. I remove the last 4 digits to create the directory name. Each directory contains up to 10,000 HTML-files. Use CTRL-F to find what you're looking for.
  • I don't scrape hidden boards (such as Investigations).
  • I don't keep post titles.
  • I save raw HTML, including quotes.
  • If I run out of disk space, I might create compressed archives per 10,000 posts.
  • Although I plan to preserve all data, I make no guarantees. Feel free to archive posts.
  • My current (sponsored) webhost has enough storage space for years to come.
  • All scrape-times use Amsterdam time (CET).
  • Usually, I capture at least 99.95% of all posts. Server or internet connection problems can severely reduce this.

Examples



Older posts
Sneak preview: https://loyce.club/archive/oldposts/
How to use:
  • Find the msgID you need. Let's use 28228
  • Remove the last 5 digits from the msgID to get the directory name (if there are 5 or fewer digits, use 0): 0
  • Replace the last 2 digits of the msgID by xx, and add .html (if there are 2 or fewer digits, use 0xx): 282xx.html
  • Add "#msg" and the msgID: #msg28228
  • Put everything together and go to https://loyce.club/archive/oldposts/0/282xx.html#msg28228
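
The same steps for older posts, as a hedged Python sketch. It assumes the file name is simply the msgID without its last 2 digits followed by "xx", which matches the 282xx example; how the archive names files for very short msgIDs has not been checked here. OLDPOSTS_BASE and oldpost_archive_url are hypothetical names.

```python
# Hypothetical helper: builds the oldposts URL (with anchor) for one msgID.
OLDPOSTS_BASE = "https://loyce.club/archive/oldposts"


def oldpost_archive_url(msg_id: int) -> str:
    # Directory: drop the last 5 digits (0 for IDs with 5 or fewer digits).
    directory = msg_id // 100_000
    # File: drop the last 2 digits and append "xx" (assumed naming scheme).
    page = f"{msg_id // 100}xx.html"
    # The anchor jumps to the individual post inside the page.
    return f"{OLDPOSTS_BASE}/{directory}/{page}#msg{msg_id}"


print(oldpost_archive_url(28228))
```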

Limitations
  • Currently, the first 6.1 million posts are available.
  • I'll scrape the first 5.21 million topics and all posts in there.
  • That means I'll archive 53.36 million posts; this partially overlaps with my scraper for new posts.
  • This is a one-time thing: I won't update it with newer posts (I scrape unedited versions for those).
  • The time "scraped on" is Amsterdam time.

If no username is mentioned, it's either "Anonymous" or "random". I forgot those existed when I started scraping, and it's not important enough to start over.

If anything goes wrong, let me know here.



See [overview] LoyceV's useful data on Bitcointalk for more of my forum-related topics

suchmoon
Legendary
*
Offline Offline

Activity: 3654
Merit: 8922


https://bpip.org


View Profile WWW
July 21, 2019, 05:43:03 PM
Merited by LoyceV (2), Welsh (2), Steamtyme (1)
 #2

quotes are not very clear

If you'd like to fix that: wrap the HTML in <div class="post">...</div> and use the following CSS:

Code:
    .post {
        color: #000000;
        background-color: #ECEDF3;
        font-size: 12px;
        font-family: verdana, sans-serif;
        margin-bottom: 5px;
        padding: 5px;
    }

    .post .quoteheader {
        color: #476C8E;
        text-decoration: none;
        font-style: normal;
        font-weight: bold;
        font-size: 10px;
        line-height: 1.2em;
        margin-left: 6px;
    }

    .post .quote {
        color: #000000;
        background-color: #f1f2f4;
        border: 1px solid #d0d0e0;
        padding: 5px;
        margin: 1px 3px 6px 6px;
        font-size: 11px;
        line-height: 1.4em;
    }

It's by no means complete (it still has problems with code tags etc.) but it should help with the quotes and make the result look similar to Bitcointalk styling. You can save it as a .css file and reference it from each HTML file, so space usage would be minimal, and then adjust the CSS as needed.
LoyceV (OP)
July 21, 2019, 06:37:54 PM
 #3

If you'd like to fix that: wrap the HTML in <div class="post">...</div> and use the following CSS:
Thanks!
The "div class post" part is there already, I never removed it. I'll make some more adjustments, I was lazy using some headers from the forum HTML, but I'll recreate them on my own.

I've named it suchmoon.css :)

Code:
I'm creating a new post to test the new version. I'm also adding some code

Update: see http://loyce.club/archive/posts/5190/51903915.html

suchmoon
July 21, 2019, 06:40:59 PM
 #4

The "div class post" part is there already, I never removed it. I'll make some more adjustments, I was lazy using some headers from the forum HTML, but I'll recreate them on my own.

I've named it suchmoon.css :)

Feel free to call it theymos.css because it's mostly stolen from here: https://bitcointalk.org/Themes/custom1/style.css :)

You can [s]steal[/s] borrow more stuff from the above file, like the .code class
LoyceV (OP)
July 21, 2019, 07:02:31 PM
Last edit: July 21, 2019, 07:24:16 PM by LoyceV
 #5

Feel free to call it theymos.css because it's mostly stolen from here: https://bitcointalk.org/Themes/custom1/style.css :)

You can [s]steal[/s] borrow more stuff from the above file, like the .code class
I didn't know CSS is that easy :D

It works :D See http://loyce.club/archive/posts/5190/51904241.html

LoyceV (OP)
July 22, 2019, 07:14:16 PM
Last edit: August 20, 2019, 05:47:31 PM by LoyceV
 #6

This morning, I checked http://loyce.club/archive/posts/members/?SD and it instantly revealed a spammer: http://loyce.club/archive/posts/members/1514722.html
It got me thinking: I can create a daily list of users (sorted by the number of posts they made in the past 24 hours). That would instantly highlight users who post a lot, and make it easy to identify bump spammers. If anyone's interested in checking it once in a while, I'll make it :)
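
A minimal sketch of how such a daily list could be computed, assuming the scraped posts are available as (userID, timestamp) pairs. The input format and the name top_posters are assumptions made for this example; nothing here describes how the archive actually stores or processes its data.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_posters(posts, now, window=timedelta(hours=24)):
    """Return (user_id, post_count) pairs for the window, busiest first.

    posts: iterable of (user_id, posted_at) tuples (hypothetical format).
    """
    cutoff = now - window
    # Count only the posts made inside the window.
    counts = Counter(user_id for user_id, posted_at in posts if posted_at >= cutoff)
    return counts.most_common()


now = datetime(2019, 7, 22, 19, 0)
sample = [
    (1514722, now - timedelta(hours=1)),
    (1514722, now - timedelta(hours=2)),
    (42, now - timedelta(hours=3)),
    (42, now - timedelta(hours=30)),  # older than 24 hours, ignored
]
for user_id, count in top_posters(sample, now):
    print(user_id, count)
```

Users with unusually high counts at the top of such a list would be the first candidates to check for bump spam.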

Timelord2067
Legendary
*
Offline Offline

Activity: 3668
Merit: 2217


💲🏎️💨🚓


View Profile
July 23, 2019, 08:22:59 AM
Last edit: May 16, 2023, 11:30:17 PM by Timelord2067
 #7

[quote author=LoyceV link=topic=5167469.msg51915669#msg51915669 date=1563822856]
Bump!

[quote author=LoyceV link=topic=4720640.msg51908047#msg51908047 date=1563775702]
This morning, I checked http://loyce.club/archive/posts/members/?SD and it instantly revealed a spammer: http://loyce.club/archive/posts/members/1514722.html
It got me thinking: I can create a daily list of users (sorted by the number of posts they made in the past 24 hours). That would instantly highlight users who post a lot, and make it easy to identify bump spammers. If anyone's interested in checking it once in a while, I'll make it :)[/quote]
[/quote]

I thought I'd see how significantly the post count steps down at the 8k/4k file size:



The middle one had just one post, while the first and third have 29 and eight respectively.  I guess the posting list will ebb and surge during holidays and work days/weekends.  Perhaps a known spammer's link could be changed to another colour (red/purple/orange etc. for spammer, scammer, nuked, Flag etc.)?

Where might we post our findings?

LoyceV (OP)
July 23, 2019, 08:31:44 AM
 #8

I thought I'd see how significantly the post count steps down at the 8k/4k file size:



The middle one had just one post, while the first and third have 29 and eight respectively.
The third one in your list has 8 posts. Those 8k/4k aren't real file sizes, I think the webserver shows the block size it uses on the file system.
That means this isn't the best way to sort files. I'll add it to my TODO: create an index.html with more information.
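
A rough illustration of that rounding effect, assuming the listing shows allocated space and the file system uses 4096-byte blocks. Both are assumptions, not something confirmed in this thread, and allocated_size is a hypothetical helper.

```python
def allocated_size(size_bytes: int, block_size: int = 4096) -> int:
    """Round a file size up to a whole number of file system blocks."""
    blocks = -(-size_bytes // block_size)  # ceiling division
    return blocks * block_size


# Files of very different real sizes can show up with the same listed size.
for real_size in (300, 3900, 4097, 8000):
    print(real_size, "->", allocated_size(real_size))
```

Under these assumptions, a file with one short post and a file with many short posts can both be listed as 4k, so the listed size says little about the post count.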

Quote
Perhaps a known spammer's link could be changed to another colour (red/purple/orange etc. for spammer, scammer, nuked, Flag etc.)?
I can strike out banned users (also on my TODO now), but since they're still posting, that won't be many users yet.

Quote
Where might we post our findings?
I'm not sure, maybe a separate thread?

Timelord2067
July 23, 2019, 08:43:14 AM
 #9

Instead of the approximate 8k/4k file size, could it show an actual post count?

Also (I'm making work for you now), could it be sorted by file name, last posted, etc.?

Quote
Quote
Where might we post our findings?
I'm not sure, maybe a separate thread?

Perhaps self-moderated, with a simple template like this:

Code:
date: (time GMT)
name+uid
post count
post type: scam [] one line [] signature [] bump []

That way, people can see when the posts were last reviewed, so they aren't doubling up on work.

LoyceMobile
Hero Member
*****
Offline Offline

Activity: 1655
Merit: 687


LoyceV on the road. Or couch.


View Profile WWW
July 23, 2019, 10:17:08 PM
Last edit: July 30, 2019, 05:22:19 AM by LoyceMobile
 #10

Just a thought: if I get deleted posts from modlog, I can highlight them too.

Another idea: list posts for each topicID, so it's easier to find posts that have been deleted from a certain topic.

LoyceMobile
July 28, 2019, 08:36:31 PM
 #11

The member directory got messed up, I can't access my VPS from here so just don't look at it for the coming week.....

nutildah
Legendary
*
Offline Offline

Activity: 2982
Merit: 7976



View Profile WWW
August 01, 2019, 01:29:41 PM
 #12

So, just so that I understand what's going on here: you're saving the first version of every post made by everybody, ever? I guess I don't quite get what you're doing.

LoyceV (OP)
August 01, 2019, 01:39:26 PM
Last edit: August 01, 2019, 02:15:24 PM by LoyceV
Merited by nutildah (2)
 #13

you're saving the first version of every post made by everybody, ever?
Correct.

Update.

tranthidung
Legendary
*
Offline Offline

Activity: 2268
Merit: 4010


Farewell o_e_l_e_o


View Profile WWW
August 01, 2019, 02:34:12 PM
 #14

you're saving the first version of every post made by everybody, ever?
Correct.

Update.
I think it should be the version of each post 15 minutes (if I remember correctly) after it was published. That would match the forum data better. Only posts edited after 15 minutes are shown with an edit history and last edit time.
are all my edits within the first 10 minutes also logged?

No, edits in the grace period are not logged.

btw, is this still the same TradeFortress?

Probably.
[New Feature] "Last edit" to be shown as text on mobile. FIXED! 10x Theymos:)

LoyceV (OP)
August 01, 2019, 02:41:14 PM
 #15

I think it should be the version of each post 15 minutes (if I remember correctly) after it was published. That would match the forum data better. Only posts edited after 15 minutes are shown with an edit history and last edit time.
Posts can be edited for 10 minutes without showing (or even keeping!) an edit history. But I'm not aiming to match the forum, I'm aiming to show the unedited post. And I can only download posts from the forum's "recent" page while they're new; searching for posts that are 10 minutes old would be more work.

nutildah
August 01, 2019, 02:45:07 PM
 #16

Wow Loyce, you've really managed to outdo yourself this time.

Can we keep this on the down-low? I'm sure it could be quite a weapon, lol.

I'm just kidding about the down-low part. If it's a weapon, everyone should have equal access to it.

It's going to be a great utility for busting liars.

tranthidung
August 01, 2019, 02:48:32 PM
 #17

I think it should be the version of each post 15 minutes (if I remember correctly) after it was published. That would match the forum data better. Only posts edited after 15 minutes are shown with an edit history and last edit time.
Posts can be edited for 10 minutes without showing (or even keeping!) an edit history. But I'm not aiming to match the forum, I'm aiming to show the unedited post. And I can only download posts from the forum's "recent" page while they're new; searching for posts that are 10 minutes old would be more work.
Oops. I did not know about that page. I checked it and saw only ten pages available. Is that fixed for everyone, or is it just a default that I can change to display more pages?

LoyceV (OP)
August 03, 2019, 08:50:43 AM
 #18

I'm sure it could be quite a weapon, lol.
If the truth can be used as a weapon against someone, he totally deserves it :P

Quote
It's going to be a great utility for busting liars.
I'm not sure yet how to keep it long-term though: it currently grows at a rate of about 2 GB and a couple of hundred thousand files per month. I'll get bigger hosting soon, but long-term, I'm looking at some serious hosting requirements.
My first priority is moving more of my data to a VPS and getting a more permanent solution (the current VPS is paid per month).

Checked it, and saw only ten pages available. Is it a fixed one (for all)?
I only download the first page, there's not really a need to download other pages, as long as I get the first one often enough.



Just a thought: if I get deleted posts from modlog, I can highlight them too.
To answer my own suggestion: this won't work, modlog doesn't show which post was deleted.

nutildah
August 03, 2019, 08:58:57 AM
 #19

I'm sure it could be quite a weapon, lol.
If the truth can be used as a weapon against someone, he totally deserves it :P

Well said. Quotable LoyceV.

Of course you are inadvertently insinuating that women never lie.  :D

I'm thinking eventually it will be handy in trying to compare writing styles between users, in addition to the usual "but you originally said this" type situations.

I know you could probably parse all the text from particular users from the forum itself, but in the format on your server it's easier for me to attempt such a thing.

I encourage you to keep it up as long as you can.


nutildah
August 07, 2019, 07:34:49 AM
 #20

Hmm... Website seems to be down, might want to take a look at it...
