Bitcoin Forum
Author Topic: Automate downloading a thread  (Read 481 times)
LoyceV (OP)
Legendary

Activity: 3304
Merit: 16634


Thick-Skinned Gang Leader and Golden Feather 2021


September 02, 2016, 02:19:25 PM
 #1

I've been running a giveaway in Games and Rounds for a few months, and I would like to build a script to download all entries once a day. I know some threads are processed that way, so it must be possible.

I can download the source of each page, but it contains a lot of unnecessary data. Is there a way to view topics without all the extra data?
What I need is each user's name, level (e.g. "Member") and post.

The source looks like this:
Code:
							<b><a href="https://bitcointalk.org/index.php?action=profile;u=849998" title="View the profile of salsa321">salsa321</a></b>
<div class="smalltext">
Member<br />
<img src="https://bitcointalk.org/Themes/custom1/images/star.gif" alt="*" border="0" /><img src="https://bitcointalk.org/Themes/custom1/images/star.gif" alt="*" border="0" /><br />
<a href="https://bitcointalk.org/index.php?action=pm;sa=send;u=849998" title="Personal Message (Online)"><img src="https://bitcointalk.org/Themes/custom1/images/useron.gif" alt="Online" border="0" style="margin-top: 2px;" /></a><span class="smalltext"> Online</span><br /><br />
Activity: 98<br /><br />
Design and dice<br />
<br />



<br />
<a href="https://bitcointalk.org/index.php?action=profile;u=849998"><img src="https://bitcointalk.org/Themes/custom1/images/icons/profile_sm.gif" alt="View Profile" title="View Profile" border="0" /></a>
<a href="https://bitcointalk.org/index.php?action=pm;sa=send;u=849998" title="Personal Message (Online)"><img src="https://bitcointalk.org/Themes/custom1/images/im_on.gif" alt="Personal Message (Online)" border="0" /></a><br /><a href="https://bitcointalk.org/index.php?action=trust;u=849998">Trust:</a> <span class="trustscore" style="color:black"><b>0</b>: -0 / +0</span>
  <br /><a href=& [b]DO NOT POST SESC LINKS[/b] ">Ignore</a>
</div>
</td>
<td valign="top" width="85%" height="100%" style="padding: 2px;" class="td_headerandpost">
<table width="100%" border="0"><tr>
<td valign="middle"><a href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945"><img src="https://bitcointalk.org/Themes/custom1/images/post/xx.gif" alt="" border="0" /></a></td>
<td valign="middle">
<div style="font-weight: bold;" class="subject" id="subject_16122945">
<a href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945">Re: ¦¯¦¦¯¦ Rollin.io Free Lottery Tickets ¦¯¦¦¯¦ Daily Giveaway ¦¯¦¦¯¦ 526 mBTC won</a>
</div>
<div class="smalltext"><b>Today</b> at 03:08:12 PM</div></td>
  <td align="right" valign="middle" height="20" style="font-size: smaller; padding-top: 4px;" class="td_buttons" ><div id="ignmsgbttns3145" style="visibility: visible;">
<a href=& [b]DO NOT POST SESC LINKS[/b] "><img src="https://bitcointalk.org/Themes/custom1/images/frostee/frostee_quote.png" alt="Reply with quote" class="reply_button" style="vertical-align: middle;" /></a>  <a class="message_number" style="vertical-align: middle;" href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945">#3145</a>
</div>
</td>
</tr></table>
<hr width="100%" size="1" class="hrcolor"  style="margin-top: 4px;" />
<div class="post">username : maleficent</div>
And this was just one post! I can probably dissect this, but it would be a lot easier if that weren't necessary.
Any suggestions where to start searching would be much appreciated.

achow101
Staff
Legendary

Activity: 3388
Merit: 6635


Just writing some code


September 02, 2016, 03:26:28 PM
 #2

For some help processing threads, take a look at https://github.com/achow101/BitcointalkForum/blob/master/app/src/main/java/com/achow101/bitcointalkforum/fragments/TopicFragment.java#L368 and https://github.com/achow101/BitcointalkAccountPricer/blob/master/src/com/achow101/bctalkaccountpricer/server/AccountPricer.java#L195 where I have written code (and comments) on parsing posts.

I can also write a program for you, for a fee, that can get all of the data and parse it for whatever you need.

LoyceV (OP)
Legendary

Activity: 3304
Merit: 16634


Thick-Skinned Gang Leader and Golden Feather 2021


September 04, 2016, 06:40:25 PM
 #3

Thanks for the links, knightdk, but the code is above my understanding of programming languages. So I've just started from scratch with what I know.
I was hoping there would be a more machine-readable version of the forum, but it turns out it's not that hard to grep the right parts out of the HTML. So I'll manage. I start by getting the user ID, and from there each user's post.
When I'm done, I might post the script here if there is any interest.
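
For readers who want to try the same grep-style approach, here is a minimal sketch in Python. It is not the script mentioned above, only an illustration. The function name `parse_post` and the shortened `sample` string are made up for this example, and the patterns assume every post looks like the fragment quoted in post #1 (including the `class="post"` on the post div, which may not hold on a live page).

```python
import re

# Patterns based only on the HTML fragment quoted in post #1.
# Assumption: every post has this shape; real pages may differ.
PROFILE_RE = re.compile(
    r'action=profile;u=(\d+)" title="View the profile of ([^"]+)"'
)
RANK_RE = re.compile(r'<div class="smalltext">\s*([^<]+?)\s*<br />')
# Assumption: the post div is literally class="post" and contains no
# nested <div> (a quoted reply would cut the match short).
POST_RE = re.compile(r'<div class="post">(.*?)</div>', re.DOTALL)


def parse_post(html):
    """Return (user_id, username, rank, post_text), or None if no profile link."""
    profile = PROFILE_RE.search(html)
    if profile is None:
        return None
    # Look for the rank and the post only after the profile link.
    rest = html[profile.end():]
    rank = RANK_RE.search(rest)
    post = POST_RE.search(rest)
    return (
        int(profile.group(1)),
        profile.group(2),
        rank.group(1) if rank else None,
        post.group(1).strip() if post else None,
    )


# Shortened copy of the fragment from post #1 (hypothetical test input).
sample = '''<b><a href="https://bitcointalk.org/index.php?action=profile;u=849998" title="View the profile of salsa321">salsa321</a></b>
<div class="smalltext">
Member<br />
Activity: 98<br /><br />
</div>
<div class="smalltext"><b>Today</b> at 03:08:12 PM</div>
<hr width="100%" size="1" class="hrcolor"  style="margin-top: 4px;" />
<div class="post">username : maleficent</div>'''

print(parse_post(sample))
# -> (849998, 'salsa321', 'Member', 'username : maleficent')
```

To handle a whole page, the same idea would be applied to each chunk of HTML between consecutive profile links; how best to split the page is left open here.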

achow101
Staff
Legendary

Activity: 3388
Merit: 6635


Just writing some code


September 04, 2016, 07:07:03 PM
 #4

Both of those use CSS selector paths, which make this a lot easier because the forum uses the same template for every post. One thing to keep in mind, though: the class name for the posts is random and changes each time you load a thread.
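
One way to cope with a changing class name is to find the post body by its position in the template instead. The sketch below is not the linked Java code and does not use CSS selectors; it swaps in Python's standard `html.parser` to keep the example dependency-free. It assumes, based only on the fragment in post #1, that each post body is the first `<div>` after `<hr class="hrcolor">`. The class `PostBodyParser` and the class names `a1b2c3` / `x9y8z7` in the sample are invented for illustration.

```python
from html.parser import HTMLParser


class PostBodyParser(HTMLParser):
    """Collect post bodies without relying on the post div's class name.

    Assumption (from the fragment in post #1): each post body is the
    first <div> that follows <hr class="hrcolor">.
    """

    def __init__(self):
        super().__init__()
        self.posts = []        # finished post texts
        self._armed = False    # saw the hr, waiting for the post div
        self._depth = 0        # div nesting depth inside a post body
        self._chunks = []      # text pieces of the current post

    def handle_starttag(self, tag, attrs):
        if tag == "hr" and ("class", "hrcolor") in attrs:
            self._armed = True
        elif tag == "div":
            if self._depth:
                self._depth += 1       # nested div, e.g. a quote box
            elif self._armed:
                self._armed = False
                self._depth = 1        # entering the post body
                self._chunks = []
        elif tag == "br" and self._depth:
            self._chunks.append("\n")  # keep line breaks as newlines

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:       # post body closed
                self.posts.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


# Hypothetical input: two posts whose div class differs on each load.
sample = (
    '<hr width="100%" size="1" class="hrcolor" />'
    '<div class="a1b2c3">username : maleficent</div>'
    '<hr width="100%" size="1" class="hrcolor" />'
    '<div class="x9y8z7">first line<br />second line</div>'
)

parser = PostBodyParser()
parser.feed(sample)
parser.close()
print(parser.posts)
```

Because nested divs are counted, a post containing a quote box is still captured as one post, with the quoted text included.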
