Title: Automate downloading a thread
Post by: LoyceV on September 02, 2016, 02:19:25 PM

I've been running a giveaway in Games and Rounds for a few months, and I would like to build a script to download all entries once a day. I know some threads are processed that way, so it must be possible.
I can download the source per page, but it has a lot of unnecessary data in there. Is there a way to view topics without all the extra data? What I need is the user's name, level ("Member") and the post. The source looks like this:

Code:
<b><a href="https://bitcointalk.org/index.php?action=profile;u=849998" title="View the profile of salsa321">salsa321</a></b>

Any suggestions where to start searching would be much appreciated.

Title: Re: Automate downloading a thread
Post by: achow101 on September 02, 2016, 03:26:28 PM

For some help processing threads, take a look at https://github.com/achow101/BitcointalkForum/blob/master/app/src/main/java/com/achow101/bitcointalkforum/fragments/TopicFragment.java#L368 and https://github.com/achow101/BitcointalkAccountPricer/blob/master/src/com/achow101/bctalkaccountpricer/server/AccountPricer.java#L195 where I have written code (and comments) on parsing posts.
I can also write a program for you, for a fee, that can get all of the data and parse it for whatever you need.

Title: Re: Automate downloading a thread
Post by: LoyceV on September 04, 2016, 06:40:25 PM

Thanks for the links knightdk, but it's above my understanding of languages. So I've just started from scratch with what I know.
I was hoping there may be a more machine-readable version of the forum, but it's not that hard to grep the right parts out of the HTML after all. So I'll manage. I start by getting the userID, and from there each user's post. When I'm done, I might post the script here if there is any interest.

Title: Re: Automate downloading a thread
Post by: achow101 on September 04, 2016, 07:07:03 PM

Quote from: LoyceV on September 04, 2016, 06:40:25 PM
Thanks for the links knightdk, but it's above my understanding of languages. So I've just started from scratch with what I know.

Both of those use CSS search paths, which will make this a lot easier to do since the forum uses the same templates for each post. One thing you do have to keep in mind, though, is that the class name for the posts is random and changes each time you load a thread.

Quote from: LoyceV on September 04, 2016, 06:40:25 PM
I was hoping there may be a more machine-readable version of the forum, but it's not that hard to grep the right parts out of the HTML after all. So I'll manage. I start by getting the userID, and from there each user's post. When I'm done, I might post the script here if there is any interest.
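
For reference, a minimal sketch of the "grep the right parts out of the HTML" step described above. It assumes the profile link markup is exactly as in the snippet quoted in the first post, and it matches on the profile URL instead of a CSS class, since the post class name is said to change on every load. Python is used only for illustration; the names PROFILE_LINK, extract_posters and sample are hypothetical and not taken from any script in this thread.

```python
import re
from html import unescape

# Assumption: every poster's name appears in a profile link shaped like the
# snippet in the first post. The numeric user ID and the visible name are captured.
PROFILE_LINK = re.compile(
    r'<a href="https://bitcointalk\.org/index\.php\?action=profile;u=(\d+)"'
    r'\s+title="View the profile of [^"]*">([^<]*)</a>'
)


def extract_posters(page_html):
    """Return (user_id, username) pairs in the order they appear on the page."""
    return [
        (int(user_id), unescape(name))
        for user_id, name in PROFILE_LINK.findall(page_html)
    ]


# The exact markup quoted in the first post, used here as sample input.
sample = (
    '<b><a href="https://bitcointalk.org/index.php?action=profile;u=849998" '
    'title="View the profile of salsa321">salsa321</a></b>'
)

print(extract_posters(sample))  # [(849998, 'salsa321')]
```

A regular expression like this breaks as soon as the forum template changes, so it is only suitable for a quick personal script. The level ("Member") and the post body would need their own patterns, which are not shown here because the thread does not quote that part of the markup.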