Bitcoin Forum
Author Topic: Python wait before returning contents of web-page  (Read 61 times)
elenakretova (OP)
March 30, 2018, 11:56:04 AM
 #1

I'm trying to scrape this website: http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp , but the page loads the contents of the table (probably through AJAX) after the page itself has loaded.

My attempt:

import requests
from bs4 import BeautifulSoup

uri = 'http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp'

r = requests.get(uri)
# Name the parser explicitly; otherwise bs4 warns and picks one on its own.
soup = BeautifulSoup(r.content, 'html.parser')
print(soup)

But the div with the id='BTechPlayM' remains empty, regardless of what I do. I've tried:

Setting a timeout on the request: requests.get(uri, timeout=10)
Passing headers
Using eventlet, to set a delay
And the latest thing was to try the selenium library with PhantomJS (installed from NPM), but this rabbit hole just kept going deeper and deeper.

Is there a way to send a request to a URI, wait X seconds, and return the contents then?

... Or to send a request to a URI, keep checking whether a div contains an element, and only return the contents once it does?
shaw1
March 30, 2018, 02:23:43 PM
 #2

Hey bud.

This is in the wrong section. But I'll answer anyway.

You are making this waaaaaayy too hard.

Call this URL: http://www.fivb.org/Vis/Public/JS/Beach/TechPlayRank.aspx?Gender=1&id=BTechPlayW&Date=20180326
The "Date" field here is a 4-digit year + 2-digit month + 2-digit day (YYYYMMDD).
Valid values for "Gender" are 0 or 1. The page you linked to uses 1, as in the link above.
The "id" field changes the source code of the JS file that is returned, to specify which element the data should be loaded into (I believe).
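
To make the URL format above concrete, here is a minimal sketch that builds it for any date and fetches the raw response. The names build_rank_url and fetch_rank_js are made up for illustration, and it uses the standard library's urlopen just to stay dependency-free; requests.get(url) would do the same job. It assumes the endpoint still answers the way it did when this was posted.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the URL quoted above.
BASE_URL = "http://www.fivb.org/Vis/Public/JS/Beach/TechPlayRank.aspx"


def build_rank_url(day, gender=1, element_id="BTechPlayW"):
    """Build the ranking URL; Date is formatted as YYYYMMDD."""
    query = urlencode({
        "Gender": gender,          # 0 or 1, per the description above
        "id": element_id,          # element the returned JS targets
        "Date": day.strftime("%Y%m%d"),
    })
    return f"{BASE_URL}?{query}"


def fetch_rank_js(day, gender=1, element_id="BTechPlayW", timeout=10):
    """Download the JS source for the given day (hypothetical helper)."""
    url = build_rank_url(day, gender, element_id)
    with urlopen(url, timeout=timeout) as resp:
        # The encoding is an assumption; undecodable bytes are replaced.
        return resp.read().decode("utf-8", errors="replace")
```

Calling build_rank_url(date(2018, 3, 26)) gives back exactly the URL quoted above, and fetch_rank_js(date(2018, 3, 26)) would return the script text, no browser or waiting needed.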

That is the data that the page is loading (as you correctly guessed) via AJAX.
Note that what you get back is actually a JS file, not plain HTML. So, have fun parsing that.
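
As for the parsing, here is one rough way to go about it. It is a sketch only, and it assumes the JS stuffs the table markup into a single quoted string literal; I have not confirmed that this is what the file really looks like. SAMPLE_JS below is invented purely to show the idea, and rows_from_js and CellCollector are made-up names.

```python
import re
from html.parser import HTMLParser

# One single- or double-quoted JS string literal, backslash escapes allowed.
JS_STRING = re.compile(r"""(['"])((?:\\.|(?!\1).)*)\1""", re.DOTALL)

# Invented sample, NOT real output from the site.
SAMPLE_JS = (
    "document.getElementById('BTechPlayW').innerHTML = "
    "'<table><tr><th>Rank</th><th>Name</th></tr>"
    "<tr><td>1</td><td>Player A</td></tr></table>';"
)


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def rows_from_js(js_source):
    """Return table rows found in the longest string literal of the script."""
    literals = [m.group(2) for m in JS_STRING.finditer(js_source)]
    if not literals:
        return []
    # Assumption: the longest literal is the one holding the table markup.
    html = max(literals, key=len)
    # Undo the most common JS escapes inside the literal.
    html = html.replace("\\'", "'").replace('\\"', '"').replace("\\/", "/")
    parser = CellCollector()
    parser.feed(html)
    return [[cell.strip() for cell in row] for row in parser.rows if row]
```

If the real file builds the markup differently (several concatenated strings, document.write calls, and so on), the literal-picking step would need adjusting, but the "pull out the strings, then parse them as HTML" approach should carry over.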

But all of the data is there, and you have a more efficient way of scraping the data.
(I'm assuming that you DO have permission to be scraping this data, and if not, you should probably make sure you do.)

99.9% of the time, when you face a difficult task, there's an easier way to be doing it. ;)
Let me know what you think of the solution.
