Bitcoin Forum
November 15, 2024, 11:59:03 AM *
Topic: [CHART] Press Section Statistics (Read 4187 times)
kiko (OP)
Sr. Member
Activity: 453
Merit: 250
April 28, 2013, 09:28:12 AM #1

Getting to grips with gnuplot has been on my to-do list for a while, so I pulled down some data from this very section.

Here are the results.


You can see four clear media cycles in the last year. My only question is: how big is the next one going to be?

For any Linux geeks who want to play:

press_scraper.sh
Code:
#!/bin/bash

#  press_scraper.sh - scrape and collate bitcoin press articles, output csv.
#  usage            - ./press_scraper.sh

# This program is free software. It comes without any warranty, to
# the extent permitted by applicable law. You can redistribute it
# and/or modify it under the terms of the Do What The Fuck You Want
# To Public License, Version 2, as published by Sam Hocevar. See
# http://sam.zoy.org/wtfpl/COPYING for more details.

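total_articles=1760   # offset of the last board page (40 topics per page)
decrement=40          # topics per board page
tempfile=$(mktemp)
outfile=press_articles.csv

# Abort if mktemp failed. The quotes matter: an unquoted empty $tempfile
# turns the test into [ -f ], which is always true.
[ -f "$tempfile" ] || { echo "Error: Could not make temporary file. Exiting..." >&2; exit 1; }

# Fetch one board page and print the YYYY-MM-DD prefix of every topic title on it.
function scrape {
  curl "$1" | sed -rn 's#.*<span id="msg_[0-9]+"><a href="https://bitcointalk\.org/index\.php\?topic=[0-9]+(\.0)?">([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*</a></span>#\2#p' ;
}

# Page offsets total_articles, ..., 80, 40; offset 0 (the front page) is fetched below.
for ((x=total_articles; x>0; x-=decrement))
do
  scrape "https://bitcointalk.org/index.php?board=77.$x" >> "$tempfile"
  sleep 5 # This is here just to be kind to the server, remove for speedup.
done

scrape "https://bitcointalk.org/index.php?board=77" >> "$tempfile"

# Count articles per date and write "count,date" lines.
sort "$tempfile" | uniq -c | sed -r 's/^ *([0-9]+) (.*)/\1,\2/' > "$outfile"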
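The sed expression is the fragile part of the scraper, since it depends on the exact HTML the forum served at the time. A minimal sketch for checking it offline: the same filter is wrapped in a hypothetical `extract_dates` function and fed hand-written sample rows. The sample markup is an assumption modelled on the regex itself, not captured from the live board, and `-r` assumes GNU sed (BSD sed spells it `-E`).

```shell
#!/bin/bash
# extract_dates: the same sed filter used by scrape() above, reading HTML on
# stdin instead of from curl, so it can be checked without touching the network.
extract_dates() {
  sed -rn 's#.*<span id="msg_[0-9]+"><a href="https://bitcointalk\.org/index\.php\?topic=[0-9]+(\.0)?">([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*</a></span>#\2#p'
}

# Hand-written sample rows (assumed markup): a dated title, an undated title,
# and a dated title whose topic link has no ".0" suffix.
sample='<span id="msg_1"><a href="https://bitcointalk.org/index.php?topic=190001.0">2013-04-27 Some headline</a></span>
<span id="msg_2"><a href="https://bitcointalk.org/index.php?topic=190002.0">Headline without a date</a></span>
<span id="msg_3"><a href="https://bitcointalk.org/index.php?topic=190003">2013-04-28 Another headline</a></span>'

printf '%s\n' "$sample" | extract_dates   # prints 2013-04-27 and 2013-04-28
```

Titles that do not start with a date are silently dropped because of `-n` plus the `p` flag, so a change in the forum's markup shows up as an empty CSV rather than an error.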

gnuplot_commands
Code:
reset
clear
set xdata time
set format x "%Y-%m-%d"
set timefmt "%Y-%m-%d"
set datafile separator ","
set style fill solid noborder
set xtics rotate by -90 out nomirror 604800   # one tic per week (604800 s)
set ytics out nomirror
set grid ytics
set ylabel "Press hits/day"
set xrange ["2012-04-07":"2013-04-26"]
set yrange [0:*]
set boxwidth 43200 absolute                   # each box is half a day (43200 s) wide
set term pngcairo truecolor font "Arial,11" size 1200,1200
set output "press_hits.png"
plot "press_articles.csv" using 2:1 with boxes ti "Press Article Frequency" lt 1 linecolor rgb "#FF0000"
grondilu
Legendary
Activity: 1288
Merit: 1080
April 28, 2013, 11:09:03 AM #2

Nice script.

My first piece of advice: don't use temp files. They always clutter up your directory, because you always forget to remove them.

Just build proper Unix pipes: read from stdin, write to stdout.

Code:
#!/bin/bash
# usage: ./press_scraper.sh > press_articles.csv

total_articles=1760
decrement=40

# Print the YYYY-MM-DD prefix of every topic title on one board page.
function scrape {
  curl "$1" | sed -rn 's#.*<span id="msg_[0-9]+"><a href="https://bitcointalk\.org/index\.php\?topic=[0-9]+(\.0)?">([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*</a></span>#\2#p' ;
}

# Everything inside the braces writes to one stdout stream, so no temp file is needed.
{
    for ((x=total_articles; x>0; x-=decrement))
    do
        scrape "https://bitcointalk.org/index.php?board=77.$x"
        sleep 5 # This is here just to be kind to the server, remove for speedup.
    done

    scrape "https://bitcointalk.org/index.php?board=77"
} |
sort |
uniq -c |
sed -r 's/^ *([0-9]+) (.*)/\1,\2/'

Not tested yet, but this should work as well as your initial code.
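The last three stages of that pipeline turn a stream of dates into the `count,date` CSV that gnuplot reads. Here is that step in isolation as a hypothetical `to_csv` function, with made-up dates standing in for scraped ones:

```shell
#!/bin/bash
# to_csv: read one YYYY-MM-DD date per line on stdin, write "count,date" lines.
# sort groups identical dates, uniq -c prefixes each group with its size, and
# sed rewrites uniq's "   N date" layout as "N,date".
to_csv() {
  sort | uniq -c | sed -r 's/^ *([0-9]+) (.*)/\1,\2/'
}

# Made-up input: three articles spread over two days.
printf '%s\n' 2013-04-28 2013-04-27 2013-04-28 | to_csv
# prints:
# 1,2013-04-27
# 2,2013-04-28
```

Days with no articles produce no line at all. That is fine for `with boxes`, but it matters if you later average over rows instead of over calendar days.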

Update: second piece of advice: provide your parameters as arguments to your script, with default values.

Code:
total_articles="${1:-1760}"
decrement="${2:-40}"
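Taking parameters from the command line also means a typo can reach the loop. In particular, a `decrement` of 0 makes `x-=decrement` a no-op, so the `for` loop would never end. A small sketch of checking the values first; `parse_params` is a hypothetical helper, and the defaults are the ones from the snippet above:

```shell
#!/bin/bash
# parse_params: apply the defaults, then require a non-negative total and a
# positive step. Prints "total step" on success; prints usage and fails otherwise.
parse_params() {
  local total="${1:-1760}" step="${2:-40}"
  if ! [[ $total =~ ^[0-9]+$ && $step =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: press_scraper.sh [total_articles] [decrement]" >&2
    return 1
  fi
  echo "$total $step"
}
```

In the script itself this could be used as `params=$(parse_params "$@") || exit 1` followed by `read -r total_articles decrement <<< "$params"`.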

kiko (OP)
Sr. Member
Activity: 453
Merit: 250
April 28, 2013, 12:00:02 PM #3

Cool, thanks for the tips.
My /tmp is just a ramdisk, so I tend to abuse it.
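Whether or not /tmp lives in RAM, a leftover file is easy to avoid when a temp file really is wanted: bash's `EXIT` trap can delete it automatically. A minimal sketch; `with_tempfile` and `count_lines` are hypothetical names, and `count_lines` only stands in for the scraping loop:

```shell
#!/bin/bash
# with_tempfile: create a temp file, pass its path as the last argument to the
# given command, and delete the file when the subshell exits, whether the
# command succeeded or failed.
with_tempfile() (
  tmp=$(mktemp) || exit 1
  trap 'rm -f "$tmp"' EXIT
  "$@" "$tmp"
  status=$?
  exit "$status"   # the EXIT trap removes $tmp here
)

# Stand-in for the scraper: append some dates, then report how many were written.
count_lines() {
  printf '%s\n' 2013-04-27 2013-04-28 >> "$1"
  wc -l < "$1"
}

with_tempfile count_lines
```

The function body uses parentheses rather than braces, so it runs in a subshell and the trap stays local to it instead of replacing any `EXIT` trap the calling script may have set.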
Grinder
Legendary
Activity: 1284
Merit: 1001
April 28, 2013, 03:20:20 PM #4

It's hard to get a good impression with so many long thin lines. I suggest plotting weekly sums or a weekly moving average instead.
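The weekly-sums idea can be done in the shell before the data reaches gnuplot. A minimal sketch, assuming GNU `date` (for `-d`) and input in the `count,date` format of press_articles.csv; `weekly_sums` is a hypothetical name. Each date is rolled back to the Monday of its week and the counts per Monday are added up. Grouping by week start, rather than averaging over consecutive rows, sidesteps the fact that days with no articles have no row at all.

```shell
#!/bin/bash
# weekly_sums: read "count,date" lines (the format of press_articles.csv) on
# stdin and write "week_start,count" lines, where week_start is the Monday
# (YYYY-MM-DD) of the ISO week containing the date.
# Assumes GNU date for -d; TZ=UTC keeps daylight-saving shifts out of the maths.
weekly_sums() {
  local count day dow monday
  while IFS=, read -r count day; do
    dow=$(TZ=UTC date -d "$day" +%u)                           # 1 = Monday ... 7 = Sunday
    monday=$(TZ=UTC date -d "$day $((dow - 1)) days ago" +%F)  # back to that week's Monday
    echo "$monday,$count"
  done | awk -F, '{ sum[$1] += $2 } END { for (w in sum) print w "," sum[w] }' | sort
}
```

For example `weekly_sums < press_articles.csv > press_weekly.csv`. The columns come out as `date,count`, so the plot line would use `using 1:2`, and a boxwidth of half a week (`302400` seconds) suits weekly bars.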
kiko (OP)
Sr. Member
Activity: 453
Merit: 250
April 28, 2013, 04:42:31 PM #5

Quote from: Grinder on April 28, 2013, 03:20:20 PM
It's hard to get a good impression with so many long thin lines. I suggest plotting weekly sums or a weekly moving average instead.
Here is a version with a curve-fitted Bézier trendline:
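The post does not say exactly how that trendline was made. One way to get something similar is gnuplot's `smooth bezier` option, drawn on top of the existing boxes. A sketch, assuming the `gnuplot_commands` file above ends with its single `plot` line; `bezier_overlay` is a hypothetical function that prints a replacement plot command:

```shell
#!/bin/bash
# bezier_overlay: print a gnuplot plot command that draws the daily boxes
# plus a Bezier-smoothed curve through the same "count,date" data.
bezier_overlay() {
  local csv="${1:-press_articles.csv}"
  cat <<EOF
plot "$csv" using 2:1 with boxes ti "Press Article Frequency" lc rgb "#FF0000", \\
     "$csv" using 2:1 smooth bezier with lines lw 2 lc rgb "#0000FF" ti "Bezier trend"
EOF
}
```

To use it, drop the original last line and append the new plot command: `{ sed '$d' gnuplot_commands; bezier_overlay press_articles.csv; } | gnuplot`. Note that a Bézier curve only approximates the data; it is pulled towards the points without passing through them, which is what makes it useful as a trend.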