Bitcoin Forum
Topic: Automate downloading a thread (Read 481 times)
LoyceV (OP)
Legendary
Activity: 3304
Merit: 16604
Thick-Skinned Gang Leader and Golden Feather 2021
September 02, 2016, 02:19:25 PM
#1

I've been running a giveaway in Games and Rounds for a few months, and I would like to build a script to download all entries once a day. I know some threads are processed that way, so it must be possible.

I can download the source of each page, but it contains a lot of unnecessary data. Is there a way to view topics without all the extra data?
What I need is the user's name, level ("Member") and the post.

The source looks like this:
Code:
							<b><a href="https://bitcointalk.org/index.php?action=profile;u=849998" title="View the profile of salsa321">salsa321</a></b>
<div class="smalltext">
Member<br />
<img src="https://bitcointalk.org/Themes/custom1/images/star.gif" alt="*" border="0" /><img src="https://bitcointalk.org/Themes/custom1/images/star.gif" alt="*" border="0" /><br />
<a href="https://bitcointalk.org/index.php?action=pm;sa=send;u=849998" title="Personal Message (Online)"><img src="https://bitcointalk.org/Themes/custom1/images/useron.gif" alt="Online" border="0" style="margin-top: 2px;" /></a><span class="smalltext"> Online</span><br /><br />
Activity: 98<br /><br />
Design and dice<br />
<br />



<br />
<a href="https://bitcointalk.org/index.php?action=profile;u=849998"><img src="https://bitcointalk.org/Themes/custom1/images/icons/profile_sm.gif" alt="View Profile" title="View Profile" border="0" /></a>
<a href="https://bitcointalk.org/index.php?action=pm;sa=send;u=849998" title="Personal Message (Online)"><img src="https://bitcointalk.org/Themes/custom1/images/im_on.gif" alt="Personal Message (Online)" border="0" /></a><br /><a href="https://bitcointalk.org/index.php?action=trust;u=849998">Trust:</a> <span class="trustscore" style="color:black"><b>0</b>: -0 / +0</span>
  <br /><a href=& [b]DO NOT POST SESC LINKS[/b] ">Ignore</a>
</div>
</td>
<td valign="top" width="85%" height="100%" style="padding: 2px;" class="td_headerandpost">
<table width="100%" border="0"><tr>
<td valign="middle"><a href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945"><img src="https://bitcointalk.org/Themes/custom1/images/post/xx.gif" alt="" border="0" /></a></td>
<td valign="middle">
<div style="font-weight: bold;" class="subject" id="subject_16122945">
<a href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945">Re: ¦¯¦¦¯¦ Rollin.io Free Lottery Tickets ¦¯¦¦¯¦ Daily Giveaway ¦¯¦¦¯¦ 526 mBTC won</a>
</div>
<div class="smalltext"><b>Today</b> at 03:08:12 PM</div></td>
  <td align="right" valign="middle" height="20" style="font-size: smaller; padding-top: 4px;" class="td_buttons" ><div id="ignmsgbttns3145" style="visibility: visible;">
<a href=& [b]DO NOT POST SESC LINKS[/b] "><img src="https://bitcointalk.org/Themes/custom1/images/frostee/frostee_quote.png" alt="Reply with quote" class="reply_button" style="vertical-align: middle;" /></a>  <a class="message_number" style="vertical-align: middle;" href="https://bitcointalk.org/index.php?topic=1497343.msg16122945#msg16122945">#3145</a>
</div>
</td>
</tr></table>
<hr width="100%" size="1" class="hrcolor"  style="margin-top: 4px;" />
<div class="post">username : maleficent</div>
And this was just one post! I can probably dissect this, but it would be a lot easier if it's not necessary.
Any suggestions where to start searching would be much appreciated.

achow101
Staff
Legendary
Activity: 3388
Merit: 6581
Just writing some code
September 02, 2016, 03:26:28 PM
#2

For some help processing threads, take a look at https://github.com/achow101/BitcointalkForum/blob/master/app/src/main/java/com/achow101/bitcointalkforum/fragments/TopicFragment.java#L368 and https://github.com/achow101/BitcointalkAccountPricer/blob/master/src/com/achow101/bctalkaccountpricer/server/AccountPricer.java#L195 where I have written code (and comments) on parsing posts.

I can also write a program for you, for a fee, that can get all of the data and parse it for whatever you need.

LoyceV (OP)
Legendary
Activity: 3304
Merit: 16604
Thick-Skinned Gang Leader and Golden Feather 2021
September 04, 2016, 06:40:25 PM
#3

Thanks for the links knightdk, but it's above my understanding of languages. So I've just started from scratch with what I know.
I was hoping there may be a more machine-readable version of the forum, but it's not that hard to grep the right parts out of the HTML after all. So I'll manage. I start by getting the userID, and from there each user's post.
When I'm done, I might post the script here if there is any interest.
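
For readers who want a starting point, the grep-style approach described above could be sketched in Python roughly as follows. This is only a sketch: the regular expressions are guesses based on the single sample post quoted in post #1, and `extract_posts`, `AUTHOR_RE`, `POST_RE` and `SAMPLE` are hypothetical names, not part of any forum tooling.

```python
import re
from html import unescape

# Assumption: author blocks look like the one sample in post #1, i.e. a
# profile link titled "View the profile of ..." followed by the rank line.
AUTHOR_RE = re.compile(
    r'<a href="[^"]*action=profile;u=(?P<uid>\d+)"\s+'
    r'title="View the profile of [^"]*">(?P<name>[^<]+)</a></b>\s*'
    r'<div class="smalltext">\s*(?P<rank>[^<]+?)\s*<br />'
)
# Assumption: the body sits in <div class="post">. The non-greedy match stops
# at the first </div>, so posts containing nested <div>s would be cut short.
POST_RE = re.compile(r'<div class="post">(?P<body>.*?)</div>', re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_posts(page_html):
    """Return a list of dicts with uid, name, rank and post text."""
    posts = []
    for author in AUTHOR_RE.finditer(page_html):
        # Pair each author block with the first post body that follows it.
        body = POST_RE.search(page_html, author.end())
        if body is None:
            continue
        text = unescape(TAG_RE.sub(" ", body.group("body")))
        posts.append({
            "uid": author.group("uid"),
            "name": unescape(author.group("name")),
            "rank": author.group("rank"),
            "post": " ".join(text.split()),  # collapse whitespace
        })
    return posts


# Trimmed copy of the sample from post #1, for illustration only.
SAMPLE = '''<b><a href="https://bitcointalk.org/index.php?action=profile;u=849998" title="View the profile of salsa321">salsa321</a></b>
<div class="smalltext">
Member<br />
Activity: 98<br /><br />
</div>
<hr width="100%" size="1" class="hrcolor"  style="margin-top: 4px;" />
<div class="post">username : maleficent</div>'''

if __name__ == "__main__":
    for post in extract_posts(SAMPLE):
        print(post)
```

Pairing each author match with the next post body keeps the sketch short, but it would mis-pair posts if a body were ever missing, so the output is worth checking against the thread by hand.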

achow101
Staff
Legendary
Activity: 3388
Merit: 6581
Just writing some code
September 04, 2016, 07:07:03 PM
#4

Quote from: LoyceV on September 04, 2016, 06:40:25 PM
Thanks for the links knightdk, but it's above my understanding of languages. So I've just started from scratch with what I know.
I was hoping there may be a more machine-readable version of the forum, but it's not that hard to grep the right parts out of the HTML after all. So I'll manage. I start by getting the userID, and from there each user's post.
When I'm done, I might post the script here if there is any interest.

Both of those use CSS search paths, which make this a lot easier because the forum uses the same template for every post. One thing to keep in mind, though: the class name for the posts is randomized and changes each time you load a thread.
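
As a rough illustration of that caveat, here is a sketch using Python's standard `html.parser` that keys on attributes visible in the sample from post #1 (the profile link's `href` and `title`, and `id="subject_<message id>"`) instead of row class names. It is not the code from the linked repositories. `PostParser` and `parse_posts` are hypothetical names, and the sketch assumes the body is in `<div class="post">` as in that sample; if that turns out to be the class that changes, the condition in `handle_starttag` would have to key on something else.

```python
from html.parser import HTMLParser

PROFILE_TITLE = "View the profile of "


class PostParser(HTMLParser):
    """Collect uid, name, message id and text for each post on a page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = []        # finished posts
        self._current = None   # post being assembled
        self._depth = 0        # > 0 while inside the body <div>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._depth:
            if tag == "div":
                self._depth += 1            # nested <div>, e.g. a quote
            elif tag == "br":
                self._current["post"] += " "
            return
        href = attrs.get("href") or ""
        title = attrs.get("title") or ""
        if (tag == "a" and "action=profile;u=" in href
                and title.startswith(PROFILE_TITLE)):
            # Author link: start a new post record.
            self._current = {
                "uid": href.rsplit("u=", 1)[1],
                "name": title[len(PROFILE_TITLE):],
                "msg_id": None,
                "post": "",
            }
        elif tag == "div" and self._current is not None:
            div_id = attrs.get("id") or ""
            if div_id.startswith("subject_"):
                self._current["msg_id"] = div_id[len("subject_"):]
            elif attrs.get("class") == "post":  # assumption, see lead-in
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # Body finished: normalise whitespace and store the post.
                self._current["post"] = " ".join(self._current["post"].split())
                self.posts.append(self._current)
                self._current = None
            else:
                self._current["post"] += " "

    def handle_data(self, data):
        if self._depth:
            self._current["post"] += data


def parse_posts(page_html):
    parser = PostParser()
    parser.feed(page_html)
    parser.close()
    return parser.posts
```

Counting `<div>` depth means a post containing nested blocks is captured whole, which a single non-greedy regular expression cannot do.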
