Bitcoin Forum
April 24, 2024, 10:02:33 AM *
Author Topic: Making a backup of this website with HTTrack  (Read 87 times)
takuma sato (OP)
Sr. Member
****
Offline Offline

Activity: 289
Merit: 409


View Profile
August 26, 2022, 03:39:00 PM
 #1

I had a dream the other day where I wasn't able to log in here. The site was basically 404'ing as I refreshed. I went to Google and looked up the news: "Bitcointalk.org website has been seized by some supranational authority". Basically, some agency with a fancy acronym had coerced theymos into giving away the database for tax purposes. I eventually woke up.

This made me think about what would happen if all the information here were lost, so I would like to make a copy of the entire website with a handy little piece of software called HTTrack, which allows you to get pretty much a 1:1 mirror of a site. You could keep updating the offline copy from time to time, kind of like keeping up with the blockchain as you sync.

My question is: Is it OK to do this? I'm assuming theymos wouldn't care.

I would also like to ask: how big is this website? I just want to see how long it would take to build an offline database. This site is mostly text, so it shouldn't be a lot; however, there's lots of text, and all images from external URLs would be saved too.

I just wouldn't like to wake up some day and see the site is gone for some reason. There's a lot of stuff to read in here that has been written over 10 years already, so it would be difficult to catch up. Just reading all of satoshi's posts would take a while. And I like to read things from the source, so I can be sure nothing was modified. I've seen satoshi misquoted, and even posts modified with inspect element.
jackg
Copper Member
Legendary
*
Offline Offline

Activity: 2856
Merit: 3071


https://bit.ly/387FXHi lightning theory


View Profile
August 26, 2022, 03:45:11 PM
 #2

The forum is big for an archive. There are archive sites that scour this forum and archive a lot of details.

There are 5411334 topics (as of when this one was created). There are also a lot of user profile pages, if you want to archive those, and many threads run to multiple pages.

There are already projects to archive this forum, though, so you'd just be adding to those. You could also spend a day sampling the number of users online, so that you scrape at quiet times and aren't hammering the servers while they're already being hammered.

The forum bans multiple requests sent from the same host within a 1-second period, so you'd be best off spacing your requests further apart than that.
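A minimal sketch of how a scraper could respect that spacing, assuming a one-second minimum gap between requests. The `Throttle` class and its parameter names are hypothetical and not part of HTTrack or any forum tooling; the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keeps successive calls to wait() at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float = 1.0,  # assumed limit: one request per second
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last = now + delay
        return delay
```

In a crawl loop you would call `throttle.wait()` immediately before each page fetch. Setting `min_interval` a little above 1.0 leaves some margin for clock jitter; that margin is a guess, not a documented forum rule.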
DVlog
Full Member
***
Offline Offline

Activity: 476
Merit: 212


Tontogether | Save Smart & Win Big


View Profile
August 26, 2022, 05:41:22 PM
 #3

I do not think we need to archive every thread of this forum. The forum is too big to do that anyway. Also, many threads are full of spam, or are useless or unnecessary for the future (bounty threads, the altcoin section, gambling, speculation, etc.).

You can archive sections like bitcoin, project development, and economics if you really think those will be useful in the future.

HTTrack is a great tool, by the way. I have used it several times and it works fine if you know how to use it.

FatFork
Legendary
*
Offline Offline

Activity: 1582
Merit: 2582


Top Crypto Casino


View Profile WWW
August 26, 2022, 08:08:45 PM
Merited by philipma1957 (2)
 #4

Quote
My question is: Is it OK to do this? I'm assuming theymos wouldn't care.

In the event that this affects the site's overall performance, theymos will most likely care.

Quote
I would also like to ask: how big is this website? I just want to see how long it would take to build an offline database. This site is mostly text, so it shouldn't be a lot; however, there's lots of text, and all images from external URLs would be saved too.

Based on LoyceV's information from two years ago, the required storage space for the entire bitcointalk forum is probably somewhere around 60GB, at the moment.

Quote
I just wouldn't like to wake up some day and see the site is gone for some reason. There's a lot of stuff to read in here that has been written over 10 years already, so it would be difficult to catch up. Just reading all of satoshi's posts would take a while. And I like to read things from the source, so I can be sure nothing was modified. I've seen satoshi misquoted, and even posts modified with inspect element.

Two very good archives of bitcointalk posts already exist: LoyceV's Loyce.Club and TryNinja's Ninjastic.space. I think you can trust their authenticity.

philipma1957
Legendary
*
Offline Offline

Activity: 4102
Merit: 7763


'The right to privacy matters'


View Profile WWW
August 26, 2022, 08:39:10 PM
 #5

I would think he could ask to back them up as an extra safety measure.

In fact, I have a lot of PCs and would be willing to back up either site if LoyceV or TryNinja need someone to hold a few HDDs of info.

LoyceV
Legendary
*
Online Online

Activity: 3290
Merit: 16541


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
August 26, 2022, 08:47:50 PM
Merited by FatFork (1)
 #6

Quote
I would like to make a copy of the entire website with a handy little piece of software called HTTrack, which allows you to get pretty much a 1:1 mirror of a site. You could keep updating the offline copy from time to time, kind of like keeping up with the blockchain as you sync.

My question is: Is it OK to do this? I'm assuming theymos wouldn't care.
You're allowed one page load per second on average. That means it takes months to download everything, and it would take a long time to find edited posts to update them.
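As a rough check on that estimate, here is a small sketch of the arithmetic. It assumes exactly one page load per topic and uses the topic count quoted earlier in this thread, so it is only a lower bound: multi-page threads, profile pages and re-checking edited posts all add to it. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(page_loads: int, loads_per_second: float = 1.0) -> float:
    """Days needed to fetch `page_loads` pages at a steady request rate."""
    return page_loads / loads_per_second / SECONDS_PER_DAY


TOPICS = 5_411_334  # topic count quoted earlier in this thread

# Lower bound: one page load per topic at one load per second.
print(f"{crawl_days(TOPICS):.1f} days")  # prints "62.6 days"
```

Roughly two months for the first page of every topic alone is consistent with "months" for the whole forum.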

Quote
I just wouldn't like to wake up some day and see the site is gone for some reason.
If shit hits the fan, there's some funding that can be used to set up a new forum.

Quote
Based on LoyceV's information from two years ago, the required storage space for the entire bitcointalk forum is probably somewhere around 60GB, at the moment.
My archive directory is currently 146 GB. That includes some duplicate indices because I don't use a database.

TryNinja
Legendary
*
Offline Offline

Activity: 2814
Merit: 6970



View Profile WWW
August 26, 2022, 11:21:01 PM
 #7

Quote
Based on LoyceV's information from two years ago, the required storage space for the entire bitcointalk forum is probably somewhere around 60GB, at the moment.
Quote
My archive directory is currently 146 GB. That includes some duplicate indices because I don't use a database.
Mine weighs around 58 GB (Postgres database with parsed content/data, counting only the "posts" table), but I probably have some posts missing (not too many, I hope?).
