Bitcoin Forum
Poll
Question: is there a need to pull historical MtGox trade data without bigquery (see: https://bitcointalk.org/index.php?topic=282154.0 )
Yes, need it desperately - 16 (88.9%)
Might be interested - 2 (11.1%)
NO - 0 (0%)
Total Voters: 18

Author Topic: TradePickler - pull MtGox trade data without bigquery - please vote  (Read 3996 times)
bitranox (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0



View Profile
August 27, 2013, 09:40:09 AM
Last edit: August 30, 2013, 01:36:50 PM by bitranox
 #1

Inspired by the amazing work of nitrous, I will provide a tool (and the data) to download and update
the historical trade data of MtGox and other exchanges, for a number of currencies such as BTC and LTC, in a standardized way.

The toolbox is named TradePickler.

The toolbox will be composed of three parts :

- the data collector, which collects the data from various sources and uploads it to a number of file hosts (Google Drive, SkyDrive, Dropbox, CloudMe, others ...)
  You will not be in touch with that part, since it runs on our servers.
- tradepickler-gobble : the downloader, which downloads chunks of data to Your hard disk.
  It will also be perfectly possible to write Your own download tool, since the data format will be fully documented,
  but we don't recommend it, since the tool is written to keep API usage of the file hosts very low (so as not to get kicked out by them).
- tradepickler-transcribe : the export tool, which exports the downloaded data in a very flexible way to the user's target data format or database.
  You are welcome to contribute modules to the transcriber or write Your own implementation.

here are the reasons why :
- nitrous' service requires BigQuery, which is slow and needs registration (a full download takes more than 24 hours on my state-of-the-art machine on a 10 Mbit connection)
- since the data only ever grows (trades are appended, never changed), there is no need for a big-data service
- I want a more robust, more distributed approach (I will upload the data to several file hosts)
- the BigQuery database has not been updated by MtGox so far, and no one knows if and when that is going to happen
- the BigQuery database will only be updated by MtGox - but what about other exchanges like Bitstamp ?
- I want to provide a bulletproof and fully automatic way to fetch and export all the trade data You want, to any format or database You want
- TradePickler will not be limited to BTC and MtGox.

what I love about nitrous' download tool :
- good interface that addresses non-techies
- good packaging for Windows and Mac users

what I don't love about nitrous' download tool :
- no automation for auto-updates and so on ...
- no automation for exports / updates to other data formats. You cannot save Your settings.
- it relies on BigQuery and on the updates from MtGox

here is the concept :
- trade data will be provided on file hosts (Google Drive, SkyDrive, Dropbox, CloudMe, others ...)
- no registration, free service
- updates every day (cold data) and every 10 minutes (hot data)
- small footprint, redundant data is stripped (the full trade data for all currencies from MtGox takes up only about 150MB on the hard disk)
- no reliance on uploads from MtGox - we collect the data ourselves
- expandable to other currencies (Litecoin) in the future
- expandable to other exchanges (like Bitstamp and others) in the future

objectives :
- robust and failsafe
- very easy setup and very easy to use on any platform (Linux/Windows/Mac)
- command-line interface, so it can easily be used in batch jobs, shell scripts, macros and cron jobs to automate the updates to Your target database or data format
- settings stored in a simple, fully documented ini file with lots of examples of how to convert the data to the target You want
- possibly integration into nitrous' download tool, with the ability to update its existing sqlite database
- written in python 2.7.x
- open source (GPL 3.0)
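
A minimal sketch of what the command-line plus ini-file objective could look like. The section names, keys, option names and the default file name are all hypothetical, since the real settings format is not published in this thread. The post targets Python 2.7.x; the sketch uses Python 3 standard-library names (`configparser`, `argparse`).

```python
import argparse
import configparser

# Hypothetical settings layout; the real TradePickler ini format is not
# published in this thread.
EXAMPLE_INI = """
[source]
exchange = mtgox
currency = USD

[target]
format = csv
path = trades_mtgox_usd.csv
"""


def load_settings(ini_text):
    """Parse ini text into {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {name: dict(parser[name]) for name in parser.sections()}


def build_cli():
    """Options a cron job or batch file could pass for an unattended run."""
    cli = argparse.ArgumentParser(prog="tradepickler-transcribe")
    cli.add_argument("--config", default="tradepickler.ini",
                     help="path to the ini file (hypothetical default)")
    cli.add_argument("--update-only", action="store_true",
                     help="export only trades newer than the last run")
    return cli


# Example: parse an explicit argument list instead of sys.argv.
args = build_cli().parse_args(["--config", "my.ini", "--update-only"])
settings = load_settings(EXAMPLE_INI)
print(args.config, args.update_only, settings["target"]["format"])
```

Keeping every setting in the ini file and every switch on the command line means a scheduled job needs no interaction at all, which is the point of the objective above.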

shortcomings :
- I cannot provide the App ID of the bidding/selling party (the person buying/selling BTC), since this is not covered by the MtGox API
however, since only 4% of the trades so far carry an App ID, this should not be a big deal.

what I have got now :
- all the MtGox data has been collected up to now (as of 2013-08-27)
- I am able to update the data every 10 minutes without any problems
- the documentation of the data structure for data and meta files

what I am working on now:
- reorganizing the data according to the data format I designed
- automatic upload of the files to google drive for the first version.

todolist:
- creating the download client (that's easy)
- creating the export tool (converting the data from my own proprietary format to any format You want), starting with csv export
- cross-checking my data against nitrous' database
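
A rough sketch of the last two todo items, csv export and a cross-check against a second database. The record layout (`tid`, `date`, `price`, `amount`) is an assumption for illustration only; the real TradePickler format is not described in this thread.

```python
import csv
import io

# Hypothetical record layout, not the real TradePickler format.
FIELDS = ["tid", "date", "price", "amount"]


def export_csv(trades, out):
    """Write trades to a file-like object as csv, ordered by trade id."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for trade in sorted(trades, key=lambda t: t["tid"]):
        writer.writerow(trade)


def missing_tids(mine, reference):
    """Cross-check: trade ids present in the reference set but not in mine."""
    return sorted({t["tid"] for t in reference} - {t["tid"] for t in mine})


my_trades = [
    {"tid": 2, "date": 1377596410, "price": 120.6, "amount": 1.0},
    {"tid": 1, "date": 1377596409, "price": 120.5, "amount": 0.25},
]
reference_trades = my_trades + [
    {"tid": 3, "date": 1377596411, "price": 120.7, "amount": 0.1},
]

buffer = io.StringIO()
export_csv(my_trades, buffer)
print(buffer.getvalue())
print(missing_tids(my_trades, reference_trades))
```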

timeframe:
- since I can devote only a few hours a week to this project, things will progress slowly
- I will set up a project on bitbucket when the time comes, any help is appreciated


what I need from You is some input:

- please vote - are You interested in such a tool ? Is there any need for this tool ? vote here please : https://bitcointalk.org/index.php?topic=282154.0
- please vote - how often should the data be updated ? vote here please : https://bitcointalk.org/index.php?topic=282246.0

your comments are welcome
nitrous
Sr. Member
****
Offline Offline

Activity: 246
Merit: 250


View Profile
August 27, 2013, 10:59:14 AM
 #2

That's great! Admittedly, my tool was meant to be a very quick project relying on the availability of bigquery and exposing the data to non-techies, but it was a bit buggy and I spent more time than I really had as it is.

First and foremost, look out for anyone else who is thinking of doing the same. I know whydifficult has a project called wizbit, so you may want to make sure you're not reinventing the wheel.

Are you going to host this on a server?

With regards to how often it should update -- one thing some people mentioned is that they would like to feed data in (close to) realtime to their trade apps. Of course, if you're uploading to Google Drive that's not really feasible, but if you're hosting it on servers you could effectively create a realtime trade data mirror. The only alternative I can think of is, at some point in the future, writing in API support to the user app which could download the most recent data (but still downloading the bulk of the data from you).

One thing to watch out for is that sometimes MtGox adds new currencies without warning or updates. One method I'm considering in my own projects is to download the stream channel list from here, strip the prefixes, ignore the three letter ones (eg BTC, lag), and then strip the duplicates. I believe there are 39 currency pairs, but only 17 are active. Anyway, it's a start. Perhaps you could request MagicalTux writes an API method that returns the list of active currency pairs with the date of their first trade (unlikely due to concerns for api load, but if you explain your project he may be more agreeable).
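
The filtering described above can be sketched as follows. The sample channel names are assumptions made up for illustration (a prefix, a dot, then a symbol); the real list would come from the MtGox stream channel listing.

```python
def currency_pairs(channel_names):
    """Derive currency pairs from stream channel names.

    Steps, as described in the post: strip the prefix, ignore the
    three-letter names (e.g. BTC, lag), and drop duplicates.
    """
    pairs = set()
    for name in channel_names:
        suffix = name.split(".", 1)[-1]  # strip "ticker.", "depth.", ...
        if len(suffix) == 3:             # e.g. "BTC", "lag"
            continue
        pairs.add(suffix)                # the set removes duplicates
    return sorted(pairs)


# Hypothetical sample input.
SAMPLE_CHANNELS = [
    "ticker.BTCUSD", "depth.BTCUSD",
    "ticker.BTCEUR", "depth.BTCEUR",
    "trade.BTC", "trade.lag",
]

print(currency_pairs(SAMPLE_CHANNELS))
```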

Also, for another of my projects (though not very easy to integrate into my data tool) I am keeping up-to-date MtGox data (using the socket to hopefully get realtime), so if you want I can cross-check statistics with your database. Btw, does 150mb include indexes on your database? You may find queries and exports are quite slow without them. I would recommend one including at least the tid, or a unique index on tid,primary. Another one on date might be useful as well.
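
A small sketch of the suggested indexes, using SQLite from Python. The table and column names are assumptions; the `primary` flag is stored as `is_primary` here only to avoid the SQL keyword.

```python
import sqlite3


def create_trade_table(conn):
    """Create a trades table with the indexes suggested above."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades ("
        " tid INTEGER NOT NULL,"
        " is_primary INTEGER NOT NULL,"
        " date INTEGER NOT NULL,"
        " price REAL,"
        " amount REAL)"
    )
    # Unique index on (tid, primary): rejects duplicate rows on re-import.
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_trades_tid_primary"
        " ON trades (tid, is_primary)"
    )
    # Index on date: speeds up range queries and date-ordered exports.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_trades_date ON trades (date)"
    )


conn = sqlite3.connect(":memory:")
create_trade_table(conn)
conn.execute("INSERT INTO trades VALUES (1, 1, 1377596409, 120.5, 0.25)")
try:
    conn.execute("INSERT INTO trades VALUES (1, 1, 1377596409, 120.5, 0.25)")
except sqlite3.IntegrityError:
    print("duplicate trade rejected by the unique index")
```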

Anyway, I look forward to what you come up with! Feel free to reuse any of the code from my tool (if so, please include a thanks and a link to my original project).

Donations: 1Q2EN7TzJ6z82xvmQrRoQoMf3Tf4rMCyvL
MtGox API v2 Unofficial Documentation: https://bitbucket.org/nitrous/mtgox-api/overview
MtGox API v2 Unofficial Documentation Forum Thread: https://bitcointalk.org/index.php?topic=164404.0
bitranox (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0



View Profile
August 27, 2013, 12:18:53 PM
 #3

Dear nitrous,
thank You so much for Your friendly comment. Here are my answers :

Quote
...  and I spent more time than I really had as it is.

That is also my concern, so I will KISS (keep it simple, stupid).

Quote
First and foremost, look out for anyone else who is thinking of doing the same. I know whydifficult has a project called wizbit, so you may want to make sure you're not reinventing the wheel.

thanks for the link - I checked it and it is something different. My tool will not do any data presentation; it only acts as an interface.
It will download the data and feed it into another customizable format or database for further investigation or use by third-party tools like trade robots, data analysis, whatever ...

Quote
Are you going to host this on a server?

I don't want to expose our servers to the community because we don't want to become a victim of DDoS attacks etc.
Preventing that kind of attack, renting servers and so on would cost more time and money than I am willing to spend on it - and I want to keep the service free of charge.
So I will just upload chunks of data to a number of file hosts.
The tool just downloads and reassembles those chunks, and provides some mechanisms to translate the data into customizable formats and feed it into another database, csv or whatever.
Since there is no database operation involved on the web servers, it is very fast and failsafe - I can spread the downloads over multiple hosts, for instance Google Drive, Dropbox and others.
It is also proxy-friendly and will be totally transparent to firewalls etc.
All You need is access to port 80 and the ability to download *.gz files from the internet (some very restrictive firewalls {snort} may prevent that)
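
A minimal sketch of the download-and-reassemble step, assuming each chunk is an independent gzip file and the chunks are simply concatenated in order. The chunk naming and the mirror URLs are not defined anywhere in this thread, so `download_chunk` takes whatever URL You give it.

```python
import gzip
import urllib.request


def download_chunk(url, timeout=30):
    """Fetch one *.gz chunk over plain HTTP and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def reassemble(chunks):
    """Decompress gzip chunks in order and join them into one byte string."""
    return b"".join(gzip.decompress(chunk) for chunk in chunks)


# Offline demo with locally compressed data instead of real downloads.
demo_chunks = [
    gzip.compress(b"1,120.5,0.25\n"),
    gzip.compress(b"2,120.6,1.0\n"),
]
print(reassemble(demo_chunks).decode("ascii"))
```

Because every chunk is a plain static file, any mirror can serve it, and a failed download only means retrying one chunk from another host.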

Quote
With regards to how often it should update -- one thing some people mentioned is that they would like to feed data in (close to) realtime to their trade apps. Of course, if you're uploading to Google Drive that's not really feasible, but if you're hosting it on servers you could effectively create a realtime trade data mirror. The only alternative I can think of is, at some point in the future, writing in API support to the user app which could download the most recent data (but still downloading the bulk of the data from you).

That is exactly what I am thinking of. For statistical analysis and the like it would be enough to be able to update every 10 minutes.
If someone wants a realtime feed, they can download the bulk data with my tool and then take care of the most recent data themselves, or with another tool that may be provided later.
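
One way to combine the two sources, sketched under the assumption that every trade carries a unique trade id (`tid`) and that the bulk data and the recent feed may overlap. The dictionary layout is hypothetical.

```python
def merge_trades(bulk, recent):
    """Merge bulk history with recent trades, dropping overlaps by tid.

    When a tid appears in both inputs, the bulk record is kept.
    """
    merged = {trade["tid"]: trade for trade in bulk}
    for trade in recent:
        merged.setdefault(trade["tid"], trade)
    return [merged[tid] for tid in sorted(merged)]


bulk = [{"tid": 1, "price": 120.5}, {"tid": 2, "price": 120.6}]
recent = [{"tid": 2, "price": 120.6}, {"tid": 3, "price": 120.7}]
print([trade["tid"] for trade in merge_trades(bulk, recent)])
```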

Quote
One thing to watch out for is that sometimes MtGox adds new currencies without warning or updates. One method I'm considering in my own projects is to download the stream channel list from here, strip the prefixes, ignore the three letter ones (eg BTC, lag), and then strip the duplicates. I believe there are 39 currency pairs, but only 17 are active. Anyway, it's a start. Perhaps you could request MagicalTux writes an API method that returns the list of active currency pairs with the date of their first trade (unlikely due to concerns for api load, but if you explain your project he may be more agreeable).

KISS. If any of the exchanges adds a certain currency pair in the future, users can simply inform me that they want that data, and if possible I will integrate it into the tool.
I just take what I can get and provide it, so there is no need to bother MagicalTux or others, and no need to depend more than necessary on the goodwill or resources of different exchanges, organisations or persons.
The tool will auto-update itself (if the user allows that) in order to be able to process such future data formats, currency pairs, exchanges, whatever ...

Quote
Also, for another of my projects (though not very easy to integrate into my data tool) I am keeping up-to-date MtGox data (using the socket to hopefully get realtime), so if you want I can cross-check statistics with your database.

yeah, cool, I will come back for that later.

Quote
Btw, does 150mb include indexes on your database?

No - no indices at all, and as little redundancy as possible, to keep bandwidth and footprint as small as possible.
By the way, I am planning to let the user optionally delete the data chunks after exporting / updating to the desired target.

Quote
You may find queries and exports are quite slow without them. I would recommend one including at least the tid, or a unique index on tid,primary. Another one on date might be useful as well.

That is outside the scope of the tool. It should just provide a method to feed the data into any format or database You want - there You can do all the indexing, sorting and queries You might need.
The focus is to make a robust script to be able to use it in macros, batchjobs, cronjobs and so on, to keep Your existing database up to date, or just to download the bulk-data until now().

Quote
Anyway, I look forward to what you come up with! Feel free to reuse any of the code from my tool (if so, please include a thanks and a link to my original project).

Of course it is better to work together, and my code will be GPL anyway - so You or any other user can also reuse my code or the data provided in any way.
In order to make it completely legal and safe to reuse the code, I suggest You also add a GPL licence to Your stuff on bitbucket and to the other stuff You provide.

Yours sincerely

bitranox
MusX
Full Member
***
Offline Offline

Activity: 175
Merit: 100


View Profile
August 27, 2013, 12:45:01 PM
 #4

I like the initiative but I'm not sure if it is the right direction...
nitrous has made a workaround, you are going to create another workaround, probably a bit better.
Another way to handle this would be to host all the trade data publicly behind an easy interface and ask users to donate 0.005 every time they fetch the whole history. I'm not sure how many people would donate, but micropayments do encourage paying.

bitranox (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0



View Profile
August 27, 2013, 01:09:25 PM
 #5

Dear MusX,

there is no problem providing the full data for free.

The only problem is providing realtime data. In that case I would need to put the stuff on one of my own servers, which would make the service vulnerable to DDoS attacks.
So You probably don't want to use such a source as a feed for Your trade bot, would You ?

My opinion is that for the historical data my method will work fine (probably with updates every hour or every 10 minutes). For realtime data I suggest using the channels from MtGox.

yours sincerely

bitranox
nitrous
Sr. Member
****
Offline Offline

Activity: 246
Merit: 250


View Profile
August 27, 2013, 01:24:25 PM
 #6

Dear bitranox,

Ok, I've GPL'd my public repos, so that should fix the legal issues :)

Yours sincerely,
nitrous

bitranox (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0



View Profile
August 27, 2013, 04:55:48 PM
 #7

Dear nitrous,

Quote
Ok, I've GPL'd my public repos, so that should fix the legal issues :)

yeah, it might look ridiculous, but I don't want to get sued by a guy like that: https://bitcointalk.org/index.php?topic=278973.0
just because a trading bot went wild on occasion or something like that ...

yours sincerely

bitranox
nitrous
Sr. Member
****
Offline Offline

Activity: 246
Merit: 250


View Profile
August 27, 2013, 05:04:36 PM
 #8

Quote
Dear nitrous,

Ok, I've GPL'd my public repos, so that should fix the legal issues :)

yeah, it might look ridiculous, but I dont want to get sued by a guy like that: https://bitcointalk.org/index.php?topic=278973.0
just because a tradingbot went wild on occasion or something like that ...

yours sincerely

bitranox


Dear bitranox,

Don't worry about it, it's fine :) I probably should have licensed them a long time ago anyway. I certainly don't want any legal problems with my open source projects!

Yours sincerely,
nitrous

arvin8
Newbie
*
Offline Offline

Activity: 9
Merit: 0


View Profile
August 30, 2013, 11:45:04 AM
 #9

It's great to see that you are working on such a tool.

There are many discussions about bots, strategies, etc... but they are all useless if we don't have proper data in our hand to backtest with.
I believe your tool is going to help a lot of people.

I would like to help, since you say that you can't commit many hours. Unfortunately I'm not a developer, but I can help with the GUI if you want to make it usable and nice-looking.

Arvin



arvin8
Newbie
*
Offline Offline

Activity: 9
Merit: 0


View Profile
August 30, 2013, 12:15:10 PM
 #10

While waiting, it would be great if you could share the updated data you have so far, so we can make use of it.

Thanks!
Arvin
bitranox (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0



View Profile
September 01, 2013, 09:56:21 PM
 #11

Dear Arvin,
thanks for Your offer - but there will be no GUI since that will be completely useless for just a datapump.
I think it will take until the beginning of October to have the first version online.

Yours sincerely

bitranox
b!z
Legendary
*
Offline Offline

Activity: 1582
Merit: 1010



View Profile
September 02, 2013, 06:37:44 AM
 #12

With all of the Mt Gox problems, I don't think anybody 'desperately' needs info from them anymore :-)
bitranox (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0



View Profile
September 02, 2013, 10:00:48 PM
 #13

Dear b!z,
I will also try to cover bitstamp tradedata if possible.
yours sincerely
bitranox
nitrous
Sr. Member
****
Offline Offline

Activity: 246
Merit: 250


View Profile
September 02, 2013, 10:07:03 PM
 #14

Quote
Dear b!z,
I will also try to cover bitstamp tradedata if possible.
yours sincerely
bitranox


Dear bitranox,

Bitstamp would be great. I contacted them a while ago asking them if they would consider implementing a method to download bulk data similar to MtGox's bigquery solution in order to reduce their potential server load. They replied that they would implement this, but it seems that they too have forgotten.

If your tool develops into a bulk trade data source for the most popular exchanges, then I think this will be a very useful tool indeed! If you do cover bitstamp in the future, please also consider BTCe and Kraken as well.

Yours sincerely,
nitrous

MusX
Full Member
***
Offline Offline

Activity: 175
Merit: 100


View Profile
September 03, 2013, 12:46:23 AM
 #15

my proposal:
MtGox trades history download R script - https://bitcointalk.org/index.php?topic=286755
no bigquery, no middle hosts, simple logic, opensource, output to csv/sqlite

bitranox (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0



View Profile
September 04, 2013, 11:11:24 PM
 #16

Dear MusX,

not bad - but for my taste it has too many dependencies and a single point of failure.
However, if there are many sources, that's a good thing - also for cross-checking the data.

Yours sincerely

bitranox
Praeconium
Member
**
Offline Offline

Activity: 102
Merit: 10


View Profile
January 02, 2014, 06:17:13 PM
 #17

Hi, has there been any development? This tool is greatly needed.

All the best
MusX
Full Member
***
Offline Offline

Activity: 175
Merit: 100


View Profile
January 02, 2014, 07:51:14 PM
 #18

The R script provided above is finished for all currencies on MtGox and can be adjusted for Bitstamp too. In fact it can integrate all of the markets available through the bitcoincharts.com csv api.
In terms of its speed, the only bottleneck is the transfer rate from bitcoincharts.com, so I believe it will be hard to beat.
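
For readers who prefer Python over R, a rough sketch of reading such csv data. It assumes headerless rows of `unixtime,price,amount`; check the layout against an actual file before relying on it. The sample rows are made up.

```python
import csv
import io
from datetime import datetime, timezone


def parse_trades_csv(text):
    """Parse headerless 'unixtime,price,amount' rows into typed tuples."""
    rows = []
    for unixtime, price, amount in csv.reader(io.StringIO(text)):
        timestamp = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
        rows.append((timestamp, float(price), float(amount)))
    return rows


# Made-up sample rows in the assumed layout.
SAMPLE = "1377596409,120.5,0.25\n1377596410,120.6,1.0\n"

for timestamp, price, amount in parse_trades_csv(SAMPLE):
    print(timestamp.isoformat(), price, amount)
```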
