Bitcoin Forum
Author Topic: 60M posts! View unedited/deleted posts (search per post, per user or per topic)  (Read 8641 times)
LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16604


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
August 07, 2019, 07:51:24 AM
 #21

Hmm... Website seems to be down, might want to take a look at it...
Great! I was waiting for that :)

Now see if I can get it back up :D

Steamtyme
Legendary
*
Offline Offline

Activity: 1540
Merit: 2036


Betnomi.com Sportsbook, Casino and Poker


View Profile WWW
August 07, 2019, 07:53:01 AM
 #22

Does it have anything to do with the recount theymos did? Not sure how that would affect the data LoyceV uses.

I did a recount of post counts earlier today. There are several bugs which cause the post count to drift from its real value over time. The current count is the accurate one.

I do recounts from time to time.

Edit: I guess not, just behind the scenes


LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16604


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
August 20, 2019, 05:41:41 PM
Last edit: August 23, 2019, 08:41:08 AM by LoyceV
 #23

This is currently uploading data from the past days. When done, new posts should be available online in less than a minute.
Update August 23: Upload will take a few more hours to catch up.

Quote
See http://loyce.club/archive/posts/members/ for all posts made per a certain userID
This part is still disabled.

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16604


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
August 23, 2019, 02:12:07 PM
 #24

There was a bug that overwrote files a few times. Some of the scraped posts were as old as a minute or more.
I fixed the bug, let's see how long it takes to scrape this post.

Update: http://loyce.club/archive/posts/5224/52243599.html was scraped 1 second after posting. Let me know if anything else fails.

Timelord2067
Legendary
*
Offline Offline

Activity: 3668
Merit: 2217


💲🏎️💨🚓


View Profile
August 28, 2019, 03:25:03 PM
 #25

Bump

Can this user's posts be updated at all?

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16604


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
August 28, 2019, 03:38:59 PM
 #26

Can this user's posts be updated at all?
The scraping should be okay now, but I didn't fix the upload yet. I've uploaded an update for just this file.

I didn't notice before, but it shows many duplicate lines. The links to "scraped" don't work either, you'll have to manually change "/posts/posts/" in the URL to "/posts/".

Sorry for this, I haven't had the time to fix this yet.
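
The manual workaround above (changing "/posts/posts/" to "/posts/") can be sketched in a few lines of Python. This is only an illustration: `fix_scraped_link` is a made-up helper name, and the example URL is hypothetical, built by doubling the path segment of a link posted earlier in this thread.

```python
def fix_scraped_link(url: str) -> str:
    """Collapse the doubled '/posts/posts/' path segment into a single '/posts/'."""
    # Only the first occurrence is replaced; a correct URL is returned unchanged.
    return url.replace("/posts/posts/", "/posts/", 1)


# Hypothetical broken link, assumed to follow the pattern described in the post.
broken = "http://loyce.club/archive/posts/posts/5224/52243599.html"
print(fix_scraped_link(broken))
```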

Timelord2067
Legendary
*
Offline Offline

Activity: 3668
Merit: 2217


💲🏎️💨🚓


View Profile
August 28, 2019, 03:48:13 PM
 #27

Can this user's posts be updated at all?
The scraping should be okay now, but I didn't fix the upload yet. I've uploaded an update for just this file.

I didn't notice before, but it shows many duplicate lines. The links to "scraped" don't work either, you'll have to manually change "/posts/posts/" in the URL to "/posts/".

Sorry for this, I haven't had the time to fix this yet.

Thanks, I'll check back this time tomorrow (my free time this week is fairly limited)

Regards,

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16604


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
August 29, 2019, 06:02:34 AM
 #28

Last night, "recent" was broken for 8 hours. It's the first time I've seen that, and theymos hasn't posted yet what caused it.

As a result, I couldn't scrape anything between post 52295210 and post 52297852. That means I'm missing 2641 posts.
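
The count above can be checked with a small sketch. It assumes that post numbers are consecutive and that both endpoint posts were themselves scraped, so only the IDs strictly between them are missing; `posts_between` is a made-up name.

```python
def posts_between(last_before: int, first_after: int) -> int:
    """Number of post IDs strictly between two posts that were both scraped."""
    return max(0, first_after - last_before - 1)


# Post numbers taken from the message above.
print(posts_between(52295210, 52297852))  # 2641
```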

LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16604


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
September 03, 2019, 07:44:04 AM
Last edit: September 06, 2019, 09:36:14 AM by LoyceV
 #29

I just checked loyce.club/archive/posts/5233/: it shows 9994 files. That means my scraper missed 6 posts, or 0.06%.
This might have been caused by burst posting (faster than my scraper can handle), or those 6 posts could be in Investigation or some other hidden board that normal users can't access.
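
A rough sketch of how such a check could be automated, assuming each directory covers exactly 10,000 consecutive post numbers. The six missing IDs below are invented for the example; the real ones are not given in this thread, and `missing_ids` is a hypothetical helper.

```python
def missing_ids(prefix: int, present: set, per_dir: int = 10_000) -> list:
    """List post IDs in directory `prefix` that have no scraped file."""
    start = prefix * per_dir
    return [pid for pid in range(start, start + per_dir) if pid not in present]


# Hypothetical listing: every post in 5233 was scraped except six made-up IDs.
made_up_gaps = {52330007, 52331234, 52334321, 52335555, 52338888, 52339999}
present = set(range(52330000, 52340000)) - made_up_gaps

gaps = missing_ids(5233, present)
print(len(gaps), f"{100 * len(gaps) / 10_000:.2f}%")  # 6 0.06%
```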

legendster
Hero Member
*****
Offline Offline

Activity: 1778
Merit: 764


www.V.systems


View Profile
September 07, 2019, 12:16:13 AM
 #30

A great tool. But I have some thoughts about the possible use and misuse of this.

Let's talk about the positive first.

I am in the middle of manually collecting data from a few accounts that are operated by one individual. Now, I have an X number of accounts linked with this person but I am sure this person has 2X the number of accounts in total.

I have just been lucky enough to find the ones where he has made some errors.

So if I post this half report of these linked accounts, then the person would be alerted and would, in theory, try to delete all his errors from all of his accounts' post history.

Your tool could stop this. It basically lets you get anyone's posts the instant they hit post, which is great in this kind of example.


But now let's talk about the bad side.

Your tool is freely available for anyone to use. The very same person I am trying to bust for multi-accounting is also using information scattered about me on this forum to track my identity and send me threatening messages on my Telegram, phone, etc.

This kind of tool would essentially be a weapon in the hands of such blackmailers and extortioners.

Furthermore,

In our country, our government is banning crypto, so if I posted my information here and if tomorrow some government agency wanted to track me down - then your tool is going to enable them to do that. And possibly land a person in jail for 10 years - just for being associated with crypto.



Now, I'd like to hear your thoughts about these 2 situations. What do you intend to do to avoid / solve such issues?


suchmoon
Legendary
*
Offline Offline

Activity: 3654
Merit: 8922


https://bpip.org


View Profile WWW
September 07, 2019, 12:35:04 AM
 #31

Now, I'd like to hear your thoughts about these 2 situations. What do you intend to do to avoid / solve such issues?

It's not LoyceV's responsibility to solve it. If you post something you shouldn't have - it's your problem. The government doesn't need LoyceV's site, they can just grab the info directly from Bitcointalk, Google cache, archive sites, or NSA hard drives.
eddie13
Legendary
*
Offline Offline

Activity: 2296
Merit: 2262


BTC or BUST


View Profile
September 07, 2019, 01:33:46 AM
Merited by AdolfinWolf (1)
 #32

What I do
  • I scrape posts within seconds, but upload them in batches every minute.
  • The list of posts per user is updated once a day (5:50 AM Amsterdam time) > This is still messy, I'm working on it!
  • Files are stored with their post number as file name. I use the first 4 digits as directory name, then upload 10,000 files per directory. You're going to want to use CTRL-F :P
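
The naming scheme in the last bullet can be sketched as follows. The base URL is taken from links posted earlier in this thread; `archive_url` is a made-up helper, and the behaviour for post numbers shorter than 4 digits is an assumption, since the thread doesn't cover it.

```python
ARCHIVE_BASE = "http://loyce.club/archive/posts"  # base URL as seen in links in this thread


def archive_url(post_id: int) -> str:
    """Build an archive URL: first 4 digits are the directory, the full number is the file name."""
    name = str(post_id)
    if post_id <= 0 or len(name) < 4:
        # Assumption: shorter post numbers are simply rejected in this sketch.
        raise ValueError("post number must be a positive integer with at least 4 digits")
    return f"{ARCHIVE_BASE}/{name[:4]}/{name}.html"


print(archive_url(52243599))  # http://loyce.club/archive/posts/5224/52243599.html
```

With 8-digit post numbers, a 4-digit directory name leaves 4 digits for the file, which is where the 10,000 files per directory come from.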


You're glowing..

Chancellor on Brink of Second Bailout for Banks
legendster
Hero Member
*****
Offline Offline

Activity: 1778
Merit: 764


www.V.systems


View Profile
September 07, 2019, 03:00:39 AM
 #33

Now, I'd like to hear your thoughts about these 2 situations. What do you intend to do to avoid / solve such issues?

It's not LoyceV's responsibility to solve it. If you post something you shouldn't have - it's your problem. The government doesn't need LoyceV's site, they can just grab the info directly from Bitcointalk, Google cache, archive sites, or NSA hard drives.

Not all posts are archived.

Google caches have an expiry date.

And there is, as far as I know, no way to grab info from Bitcointalk if the info is never quoted by someone else and deleted / edited out. And with Theymos' response rate I don't think any gov agency would have much luck getting him to dish out user data. He's pro anonymity as far as I can tell.

Btw. you seem to think that by gov. I implied only US. Obviously that's not the case.

In countries like Bangladesh the gov does not have the diplomatic resources to use the same tools that NSA offers to the US or any US allies. They employ low-level white hat / grey hat servicemen to do their online digging work. It is the same with many developing and underdeveloped nations.

Besides, I am sure NSA data is accessed in case of national security - NOT detecting / tracking crypto users for political purposes.

Information, shared by anyone on this forum is kind of made with the confidence that they can perhaps edit out any unrequired information in the future - hence the existence of the edit button.

But Loyce's tool is circumventing that basic forum function that the average user takes for granted.

idk guys - this is a big grey area for me. This is delving into the areas of retaining user data, and user privacy - It is a bit uncomfortable for me to endorse it - even though this is great work and can be used to keep the forum clean. But it has a huge potential for misuse and abuse as well.

Loyce - maybe you should think this through and make your stance clear.


LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16604


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
September 07, 2019, 09:22:12 AM
Merited by Foxpup (5)
 #34

But now let's talk about the bad side.

Your tool is freely available for anyone to use. The very same person I am trying to bust for multi-accounting is also using information scattered about me on this forum to track my identity and send me threatening messages on my Telegram, phone, etc.
Anything you put on the internet, should be considered public information for the rest of eternity.

Quote
This kind of tool would essentially be a weapon in the hands of such blackmailers and extortioners.
If that's the case, you shouldn't have put the information online in the first place.

Several phishing sites clone all posts on Bitcointalk too. I just do it without trying to steal passwords. Archive.org and Archive.is store many pages too.

Quote
In our country, our government is banning crypto, so if I posted my information here and if tomorrow some government agency wanted to track me down - then your tool is going to enable them to do that. And possibly land a person in jail for 10 years - just for being associated with crypto.
As much as I hate government oppression, it's not my responsibility to follow local laws from any country on earth. If it's illegal to use crypto, and you insist on doing it anyway, all I can say is you should be very careful what you post and ensure your real identity stays hidden. Use encryption to hide yourself :)

Now, I'd like to hear your thoughts about these 2 situations. What do you intend to do to avoid / solve such issues?
It's not LoyceV's responsibility to solve it. If you post something you shouldn't have - it's your problem. The government doesn't need LoyceV's site, they can just grab the info directly from Bitcointalk, Google cache, archive sites, or NSA hard drives.
Agreed!
I don't scrape Investigations, which is the only place on Bitcointalk that allows DOXing. If someone gets DOXed anywhere else, posts can be reported and the user gets banned. If that happens, feel free to contact me to edit one of the archived posts. When I do that, I'll also edit the filename to make it very obvious it was edited by me, and I'll probably create a log here. Until now, I haven't done this.
Another thing I've been thinking about is what to do if someone posts something that's not allowed by my webhost. If that happens, I'll have to edit it too.
I'm not sure how Archive.org and the likes handle illegal stuff on their servers.

You're glowing..
I'm not sure what that means

suchmoon
Legendary
*
Offline Offline

Activity: 3654
Merit: 8922


https://bpip.org


View Profile WWW
September 07, 2019, 01:40:35 PM
 #35

In countries like Bangladesh the gov does not have the diplomatic resources to use the same tools that NSA offers to the US or any US allies. They employ low-level white hat / grey hat servicemen to do their online digging work. It is the same with many developing and underdeveloped nations.

Besides, I am sure NSA data is accessed in case of national security - NOT detecting / tracking crypto users for political purposes.

You're missing the point. NSA or not NSA, any government agency that is interested in tracking their citizens' involvement in crypto could be doing what LoyceV does and scrape those posts without making an announcement thread here. They could be doing even more, e.g. scrape posts in Investigations.

Information, shared by anyone on this forum is kind of made with the confidence that they can perhaps edit out any unrequired information in the future - hence the existence of the edit button.

You cannot "edit out" anything from the internet. The edit button is useful to fix spelling errors and such. It's utterly useless for hiding information, which could have been copied by anyone, could have ended up in Bitcointalk backups (and yes, likely to be handed over to law enforcement if theymos gets a subpoena), etc.

eddie13
Legendary
*
Offline Offline

Activity: 2296
Merit: 2262


BTC or BUST


View Profile
September 07, 2019, 02:40:09 PM
 #36

any government agency that is interested in tracking their citizens' involvement in crypto could be doing what LoyceV does and scrape those posts


Chancellor on Brink of Second Bailout for Banks
suchmoon
Legendary
*
Offline Offline

Activity: 3654
Merit: 8922


https://bpip.org


View Profile WWW
September 07, 2019, 03:16:39 PM
 #37

So the Switzerland thing is just a CIA cover? Makes sense now.
PrimeNumber7
Copper Member
Legendary
*
Offline Offline

Activity: 1624
Merit: 1899

Amazon Prime Member #7


View Profile
September 07, 2019, 07:46:50 PM
 #38


In our country, our government is banning crypto, so if I posted my information here and if tomorrow some government agency wanted to track me down - then your tool is going to enable them to do that. And possibly land a person in jail for 10 years - just for being associated with crypto.



Now, I'd like to hear your thoughts about these 2 situations. What do you intend to do to avoid / solve such issues?

If someone posts something on Facebook or Twitter, with privacy settings allowing anyone to see the post with the hashtag #crypto, anyone specifically looking at your profile can see your post, but it is not trivial for someone to obtain all posts containing a hashtag.

There are limits on how many tweets can be distributed to any one entity per day and per month. The daily number of tweets that can be sent to a third party is large (50,000), but is a small percentage of the total tweets posted every day (500 million). If someone like DPR had been posting on Twitter instead of bitcointalk, they probably would still have gotten caught. A HK protester would probably be safe on Twitter, but if they were posting on bitcointalk they might be investigated by the HK government (a sockpuppet of the Chinese government) in the way legendster describes.

LoyceV is not the one who invented scraping forum posts, nor is he the only one to be actively scraping posts. There are many ways to download forum posts via automated means, and it is not difficult to get posts into a DataFrame that can later be analyzed.  

The root cause of the problem is that the administration allows too much access to posts. A straightforward solution is a rate limit on how many page views an individual IP address/range can make per hour and per day, set somewhat above what a *person* would see in the normal course of reading but well below the number of page views required to view all posts. The scraping of posts for non-academic use should also be explicitly prohibited by the administration.
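
A minimal sketch of the kind of per-IP limit proposed above, assuming a sliding window over the last hour and the last 24 hours. It is not how Bitcointalk works: the class name and the limits are invented, and a real deployment would also have to group addresses into ranges, which this sketch leaves out.

```python
from collections import defaultdict, deque

HOUR = 3600
DAY = 24 * HOUR


class PageViewLimiter:
    """Sliding-window page-view limit per IP address (hypothetical limits)."""

    def __init__(self, hourly_limit: int, daily_limit: int):
        self.hourly_limit = hourly_limit
        self.daily_limit = daily_limit
        self._views = defaultdict(deque)  # ip -> timestamps of allowed views, oldest first

    def allow(self, ip: str, now: float) -> bool:
        views = self._views[ip]
        # Forget views that are a day old or older.
        while views and now - views[0] >= DAY:
            views.popleft()
        last_hour = sum(1 for t in views if now - t < HOUR)
        if len(views) >= self.daily_limit or last_hour >= self.hourly_limit:
            return False
        views.append(now)
        return True


limiter = PageViewLimiter(hourly_limit=2, daily_limit=3)  # tiny limits for demonstration
print([limiter.allow("203.0.113.5", t) for t in (0, 1, 2)])  # [True, True, False]
```

Timestamps are passed in explicitly so the behaviour is easy to test; a server would pass the current time.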
LoyceV (OP)
Legendary
*
Offline Offline

Activity: 3304
Merit: 16604


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
September 08, 2019, 09:32:07 AM
 #39

The root cause of the problem is that the administration allows too much access to posts. A straightforward solution is a rate limit on how many page views an individual IP address/range can make per hour and per day, set somewhat above what a *person* would see in the normal course of reading but well below the number of page views required to view all posts. The scraping of posts for non-academic use should also be explicitly prohibited by the administration.
I'm glad those restrictions aren't in place. That wouldn't stop any government agency from scraping all posts, as they can just use many different servers, but it would make it very difficult to make user contributions (such as Vod's BPIP.org or my Trust/Merit data).

The forum currently allows on average 1 page download per second, and that already means I have to set a delay of several seconds to prevent different scrapers from conflicting with each other.
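
A delay like the one described above could look roughly like this. It is a sketch, not the actual scraper: `Throttle` is a made-up class, and the 3-second interval stands in for "several seconds". The clock and sleep functions are injectable only so the logic can be tested without really waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


throttle = Throttle(min_interval=3.0)  # assumed value for "several seconds"
throttle.wait()  # the first call returns immediately; later calls sleep as needed
```

Each scraper would call `throttle.wait()` right before every page download.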

PrimeNumber7
Copper Member
Legendary
*
Offline Offline

Activity: 1624
Merit: 1899

Amazon Prime Member #7


View Profile
September 09, 2019, 05:02:46 AM
 #40

The root cause of the problem is that the administration allows too much access to posts. A straightforward solution is a rate limit on how many page views an individual IP address/range can make per hour and per day, set somewhat above what a *person* would see in the normal course of reading but well below the number of page views required to view all posts. The scraping of posts for non-academic use should also be explicitly prohibited by the administration.
I'm glad those restrictions aren't in place. That wouldn't stop any government agency from scraping all posts, as they can just use many different servers, but it would make it very difficult to make user contributions (such as Vod's BPIP.org or my Trust/Merit data).

The forum currently allows on average 1 page download per second, and that already means I have to set a delay of several seconds to prevent different scrapers from conflicting with each other.
The 1 page download per second is a standard generalized limit when no commercial relationship exists. Although this limit has been communicated by the administration, it is also what should be the assumed limit when scraping information from a website.

I believe merit data is actually published by the administration, along with trust data. This information is less intrusive than information contained in posts. If my above proposal were to be changed to 'thread page view' I understand BPIP would be entirely unaffected.

I would encourage you to review the Twitter developer terms, and the Instagram API TOS. These policy documents prohibit many of the things that are done with bitcointalk information. I would presume the 'average' bitcointalk user to care more about privacy than the 'average' Instagram or Twitter user.

From a technical perspective, there is nothing to prevent a government from collecting post information on bitcointalk. However, if bitcointalk policies explicitly disallow government law enforcement from collecting information en masse via automation, law enforcement will generally have trouble using information gained by these means as the basis for a warrant, or as admissible evidence in court.