Bitcoin Forum
Topic: Automate downloading a thread
LoyceV
Legendary
September 02, 2016, 02:19:25 PM
 #1

I've been running a giveaway in Games and Rounds for a few months, and I would like to build a script to download all entries once a day. I know some threads are processed that way, so it must be possible.

I can download the source of each page, but it contains a lot of unnecessary data. Is there a way to view topics without all the extra data?
What I need is each user's name, level (e.g. "Member") and the post.

The source looks like this:
Code:
<b><a href="https://bitcointalk.org/index.php?action=profile;u=849998" title="View the profile of salsa321">salsa321</a></b>
<div class="smalltext">
Member<br />
<img src="https://bitcointalk.org/Themes/custom1/images/star.gif" alt="*" border="0" /><img src="https://bitcointalk.org/Themes/custom1/images/star.gif" alt="*" border="0" /><br />
<a href="https://bitcointalk.org/index.php?action=pm;sa=send;u=849998" title="Personal Message (Online)"><img src="https://bitcointalk.org/Themes/custom1/images/useron.gif" alt="Online" border="0" style="margin-top: 2px;" /></a><span class="smalltext"> Online</span><br /><br />
Activity: 98<br /><br />
Design and dice<br />
<br />



<br />
<a href="https://bitcointalk.org/index.php?action=profile;u=849998"><img src="https://bitcointalk.org/Themes/custom1/images/icons/profile_sm.gif" alt="View Profile" title="View Profile" border="0" /></a>
<a href="https://bitcointalk.org/index.php?action=pm;sa=send;u=849998" title="Personal Message (Online)"><img src="https://bitcointalk.org/Themes/custom1/images/im_on.gif" alt="Personal Message (Online)" border="0" /></a><br /><a href="https://bitcointalk.org/index.php?action=trust;u=849998">Trust:</a> <span class="trustscore" style="color:black"><b>0</b>: -0 / +0</span>
  <br /><a href=&[b]DO NOT POST SESC LINKS[/b]">Ignore</a>
</div>
</td>
<td valign="top" width="85%" height="100%" style="padding: 2px;" class="td_headerandpost">
<table width="100%" border="0"><tr>
<td valign="middle"><a href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945"><img src="https://bitcointalk.org/Themes/custom1/images/post/xx.gif" alt="" border="0" /></a></td>
<td valign="middle">
<div style="font-weight: bold;" class="subject" id="subject_16122945">
<a href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945">Re: ¦¯¦¦¯¦ Rollin.io Free Lottery Tickets ¦¯¦¦¯¦ Daily Giveaway ¦¯¦¦¯¦ 526 mBTC won</a>
</div>
<div class="smalltext"><b>Today</b> at 03:08:12 PM</div></td>
  <td align="right" valign="middle" height="20" style="font-size: smaller; padding-top: 4px;" class="td_buttons" ><div id="ignmsgbttns3145" style="visibility: visible;">
<a href="https://bitcointalk.org/index.php?action=post;quote=16122945;topic=1497343.3140;num_[b]DO NOT POST SESC LINKS[/b]"><img src="https://bitcointalk.org/Themes/custom1/images/frostee/frostee_quote.png" alt="Reply with quote" class="reply_button" style="vertical-align: middle;" /></a>  <a class="message_number" style="vertical-align: middle;" href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945">#3145</a>
</div>
</td>
</tr></table>
<hr width="100%" size="1" class="hrcolor"  style="margin-top: 4px;" />
<div class="post">username : maleficent</div>
And this is just one post! I can probably dissect it, but it would be a lot easier if that weren't necessary.
Any suggestions where to start searching would be much appreciated.
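
For the "once a day" download, the page URLs can be generated rather than collected by hand. The following is a minimal Python sketch, not a confirmed description of how the forum paginates: it assumes 20 posts per page, which is only inferred from post #3145 sitting on offset 3140 (`topic=1497343.3140`) in the sample above. The names `POSTS_PER_PAGE`, `page_offset` and `page_urls` are hypothetical.

```python
# Assumption: 20 posts per page, inferred from a single link in the sample
# (post #3145 appears on the page with offset 3140). Adjust if that is wrong.
POSTS_PER_PAGE = 20


def page_offset(post_number, per_page=POSTS_PER_PAGE):
    """Return the page offset that holds the given 1-based post number."""
    return (post_number - 1) // per_page * per_page


def page_urls(topic_id, last_post_number, per_page=POSTS_PER_PAGE):
    """Return the URL of every page up to the one holding last_post_number."""
    base = "https://bitcointalk.org/index.php?topic={}.{}"
    last = page_offset(last_post_number, per_page)
    return [base.format(topic_id, off) for off in range(0, last + 1, per_page)]


if __name__ == "__main__":
    for url in page_urls(1497343, 45):
        print(url)
```

Each URL can then be fetched with whatever download tool is already in use; a pause between requests keeps the load on the forum low.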

achow101
Staff
Legendary
September 02, 2016, 03:26:28 PM
 #2

For some help processing threads, take a look at https://github.com/achow101/BitcointalkForum/blob/master/app/src/main/java/com/achow101/bitcointalkforum/fragments/TopicFragment.java#L368 and https://github.com/achow101/BitcointalkAccountPricer/blob/master/src/com/achow101/bctalkaccountpricer/server/AccountPricer.java#L195 where I have written code (and comments) on parsing posts.

I can also write a program for you, for a fee, that can get all of the data and parse it for whatever you need.

LoyceV
Legendary
September 04, 2016, 06:40:25 PM
 #3

Thanks for the links, knightdk, but they're beyond my knowledge of programming languages. So I've just started from scratch with what I know.
I was hoping there may be a more machine-readable version of the forum, but it's not that hard to grep the right parts out of the HTML after all. So I'll manage. I start by getting the userID, and from there each user's post.
When I'm done, I might post the script here if there is any interest.
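
The grep-style approach described above (find the user ID first, then that user's post) could look roughly like this in Python. This is a sketch under stated assumptions, not the script mentioned in the post: the two patterns are written against the sample markup in the first post only, they assume the post sits in `<div class="post">`, and the lazy match stops at the first `</div>`, so posts containing nested `<div>` elements (quotes, for example) would be cut short. `AUTHOR_RE`, `POST_RE` and `extract_posts` are hypothetical names.

```python
import re

# Patterns tailored to the sample markup in the first post; real pages may differ.
AUTHOR_RE = re.compile(
    r'action=profile;u=(\d+)" title="View the profile of [^"]*">([^<]+)</a></b>\s*'
    r'<div class="smalltext">\s*([^<]+?)\s*<br />'
)
# Assumption: the post body lives in <div class="post"> and has no nested <div>.
POST_RE = re.compile(r'<div class="post">(.*?)</div>', re.S)


def extract_posts(page_html):
    """Return one dict per post: user ID, name, level and raw post HTML."""
    authors = list(AUTHOR_RE.finditer(page_html))
    posts = []
    for i, author in enumerate(authors):
        # Only search for the body between this author block and the next one.
        end = authors[i + 1].start() if i + 1 < len(authors) else len(page_html)
        body = POST_RE.search(page_html, author.end(), end)
        if body:
            user_id, name, level = author.groups()
            posts.append({
                "user_id": user_id,
                "name": name,
                "level": level,
                "post": body.group(1).strip(),
            })
    return posts


if __name__ == "__main__":
    sample = (
        '<b><a href="https://bitcointalk.org/index.php?action=profile;u=849998" '
        'title="View the profile of salsa321">salsa321</a></b>\n'
        '<div class="smalltext">\nMember<br />\n</div>\n'
        '<div class="post">username : maleficent</div>'
    )
    print(extract_posts(sample))
```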

achow101
Staff
Legendary
September 04, 2016, 07:07:03 PM
 #4

Quote from: LoyceV on September 04, 2016, 06:40:25 PM
Thanks for the links, knightdk, but they're beyond my knowledge of programming languages. So I've just started from scratch with what I know.
I was hoping there may be a more machine-readable version of the forum, but it's not that hard to grep the right parts out of the HTML after all. So I'll manage. I start by getting the userID, and from there each user's post.
When I'm done, I might post the script here if there is any interest.

Both of those use CSS search paths, which will make this a lot easier since the forum uses the same template for every post. One thing to keep in mind, though: the class name for the posts is random and changes every time you load a thread.
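
If the class name on the post `<div>` cannot be relied on, one option is to select the post by its position in the template instead of by its class. The sketch below uses Python's standard `html.parser` rather than the CSS search paths in the linked Java code. It assumes, based only on the sample in the first post, that the post body is the first `<div>` after `<hr class="hrcolor">` inside `<td class="td_headerandpost">`. `PostBodyParser` and `extract_post_bodies` are hypothetical names.

```python
from html.parser import HTMLParser


class PostBodyParser(HTMLParser):
    """Collect the text of each post without looking at the post div's class."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_cell = False    # inside <td class="td_headerandpost">
        self.after_hr = False   # passed the <hr class="hrcolor"> separator
        self.depth = 0          # <div> nesting depth inside the post body
        self.parts = []
        self.posts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            if tag == "div":
                self.depth += 1          # nested div, e.g. a quote
            elif tag == "br":
                self.parts.append("\n")
        elif tag == "td" and "td_headerandpost" in classes:
            self.in_cell, self.after_hr = True, False
        elif self.in_cell and tag == "hr" and "hrcolor" in classes:
            self.after_hr = True
        elif self.in_cell and self.after_hr and tag == "div":
            self.depth = 1               # assumed to be the post body
            self.parts = []

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.posts.append("".join(self.parts).strip())
                self.in_cell = self.after_hr = False

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_post_bodies(page_html):
    """Return the plain text of every post body found in page_html."""
    parser = PostBodyParser()
    parser.feed(page_html)
    parser.close()
    return parser.posts
```

Matching on the surrounding structure keeps working when only the class name changes, but it would break if the template itself were rearranged, so the result is worth checking against a known page now and then.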
