Bitcoin Forum
November 07, 2024, 03:02:40 PM *
Author Topic: Making a backup of this website with HTTrack  (Read 88 times)
takuma sato (OP)
Sr. Member (Activity: 317, Merit: 448)
August 26, 2022, 03:39:00 PM
 #1

I had a dream the other day in which I wasn't able to log in here. The site was basically 404'ing every time I refreshed. I went to Google and looked up the news: "Bitcointalk.org website has been seized by some supranational authority". Basically, some agency with a fancy acronym had coerced theymos into handing over the database for tax purposes. Eventually I woke up.

This made me think about what would happen if all the information here were lost, so I would like to make a copy of the entire website with a handy little tool called HTTrack, which lets you get pretty much a 1:1 mirror of a site. You could then keep updating the offline copy from time to time, kind of like keeping up with the blockchain as you sync.

My question is: is it OK to do this? I'm assuming theymos wouldn't care.

I would also like to ask: how big is this website? I just want to see how long it would take to build an offline copy. The site is mostly text, so each page should be small; however, there is a lot of text, and all images from external URLs would be saved too.

I just wouldn't like to wake up some day and see that the site is gone for some reason. There's a lot of stuff to read in here that has been written over more than 10 years already, so it would be difficult to catch up. Just reading all of satoshi's posts would take a while. And I like to read things from the source, so I can be sure nothing was modified. I've seen satoshi misquoted, and even posts modified with inspect element.
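
As a rough illustration of the "nothing was modified" concern, one option is to record a SHA-256 digest for every file in an offline copy and compare digests between two snapshots. The sketch below is only that, a sketch: the function names (`file_digest`, `build_manifest`, `changed_files`) are made up for this example, and it assumes the mirror is a plain directory of saved files. It can only show that your own copy changed between two snapshots; it says nothing about whether the first snapshot matched what was originally posted, and pages with dynamic content may differ between downloads even when no post was edited.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under `root` (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files present in both manifests whose digests differ."""
    return sorted(name for name in new if name in old and old[name] != new[name])
```

Building a manifest right after each sync and keeping the previous one makes it possible to list which saved pages differ from the last run.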
jackg
Copper Member, Legendary (Activity: 2856, Merit: 3071)
August 26, 2022, 03:45:11 PM
 #2

The forum is big for an archive. There are archive sites that scour this forum and archive a lot of details.

There are 5411334 topics (going by the topic count at the time this one was created). There are also a lot of user profile pages, if you want to archive those, and many threads run to a lot of pages.

There are already projects that archive this forum, though, so you'd just be adding to those. You could also spend a day sampling the number of users online first, so that you can scrape at quiet times and aren't hammering the servers while they're already being hammered.

The forum bans hosts that send multiple requests within a 1-second period, so you'd be best off spacing your requests further apart than that.
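
A minimal sketch of how requests could be spaced out to stay under that limit, assuming you script the downloads yourself rather than relying on HTTrack's own settings. The `Throttle` class and its 1.5-second default are assumptions for illustration (chosen to leave some margin above one second), not anything the forum or HTTrack prescribes.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space out calls so consecutive ones are at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float = 1.5,  # assumed margin above the 1-second limit
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling `throttle.wait()` immediately before each page fetch keeps a single-threaded scraper from sending two requests closer together than the chosen interval.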
DVlog
Full Member (Activity: 504, Merit: 212)
August 26, 2022, 05:41:22 PM
 #3

I do not think we need to archive every thread of this forum. The forum is too big for that anyway. Also, many threads are full of spam, or are useless or unnecessary for the future (bounty threads, the altcoin section, gambling, speculation, etc.).

You can archive sections like bitcoin, project development, and economics if you really think those will be useful in the future.

HTTrack is a great tool, by the way. I have used it several times, and it works fine if you know how to use it.
FatFork
Legendary (Activity: 1778, Merit: 2663)
August 26, 2022, 08:08:45 PM
Merited by philipma1957 (2)
 #4

Quote
My question is: is it OK to do this? I'm assuming theymos wouldn't care.

In the event that this affects the site's overall performance, theymos will most likely care.

Quote
I would also like to ask: how big is this website? I just want to see how long it would take to build an offline copy. The site is mostly text, so each page should be small; however, there is a lot of text, and all images from external URLs would be saved too.

Based on LoyceV's information from two years ago, the required storage space for the entire bitcointalk forum is probably somewhere around 60GB, at the moment.

Quote
I just wouldn't like to wake up some day and see that the site is gone for some reason. There's a lot of stuff to read in here that has been written over more than 10 years already, so it would be difficult to catch up. Just reading all of satoshi's posts would take a while. And I like to read things from the source, so I can be sure nothing was modified. I've seen satoshi misquoted, and even posts modified with inspect element.

Two very good archives of bitcointalk posts already exist: LoyceV's Loyce.Club and TryNinja's Ninjastic.space. I think you can trust their authenticity.

philipma1957
Legendary (Activity: 4298, Merit: 8804)
August 26, 2022, 08:39:10 PM
 #5

I would think he could ask to back those archives up as an extra layer of security.

In fact, I have a lot of PCs and would be willing to back up either site if LoyceV or TryNinja need someone to hold a few HDDs of info.

LoyceV
Legendary (Activity: 3486, Merit: 17642)
August 26, 2022, 08:47:50 PM
Merited by FatFork (1)
 #6

Quote
I would like to make a copy of the entire website with a handy little tool called HTTrack, which lets you get pretty much a 1:1 mirror of a site. You could then keep updating the offline copy from time to time, kind of like keeping up with the blockchain as you sync.

My question is: is it OK to do this? I'm assuming theymos wouldn't care.
You're allowed one page load per second on average. That means it takes months to download everything, and it would take a long time to find edited posts to update them.
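
To put a rough number on "months", here is a back-of-the-envelope sketch. It uses the topic figure quoted earlier in this thread (5411334) as a stand-in for the number of pages and assumes exactly one page per topic, which understates the real total since many threads span several pages and profile pages are not counted. The function name `crawl_days` is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(pages: int, seconds_per_page: float = 1.0) -> float:
    """Days needed to fetch `pages` pages at one page per `seconds_per_page`."""
    return pages * seconds_per_page / SECONDS_PER_DAY


topics = 5_411_334  # topic figure quoted earlier in the thread
# Assumption: one page per topic, the bare minimum for a full crawl.
print(f"{crawl_days(topics):.1f} days at one page per second")
```

Even under that optimistic assumption the crawl runs for roughly two months without a break, so multi-page threads push the total well beyond that.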

Quote
I just wouldn't like to wake up some day and see the site is gone for some reason.
If shit hits the fan, there's some funding that can be used to set up a new forum.

Quote
Based on LoyceV's information from two years ago, the required storage space for the entire bitcointalk forum is probably somewhere around 60GB, at the moment.
My archive directory is currently 146 GB. That includes some duplicate indices because I don't use a database.

TryNinja
Legendary (Activity: 3010, Merit: 7435)
August 26, 2022, 11:21:01 PM
 #7

Quote
Based on LoyceV's information from two years ago, the required storage space for the entire bitcointalk forum is probably somewhere around 60GB, at the moment.
My archive directory is currently 146 GB. That includes some duplicate indices because I don't use a database.
Mine weighs around 58 GB (a Postgres database with parsed content/data, counting only the "posts" table), but I probably have some posts missing (not too many, I hope?).
