Bitcoin Forum
June 18, 2024, 10:58:07 AM *
1  Economy / Trading Discussion / Re: How can I reduce time wasted on data cleaning? on: July 23, 2020, 11:30:21 AM
To clarify, you're trying to build models from the data but want things to be synchronised between different datasets?

I may not have gone further than just a fairly Timeframe, but is it not possible to just aggregate the data to a point where both sets are in sync, rather than trying to fill gaps with models etc.?
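The aggregation idea above could look roughly like this in pandas. This is only a sketch under assumptions: the function name `align_on_common_bars`, the one-minute bucket size, and the toy prices are all made up for illustration, and taking the last observation per bucket is just one possible choice.

```python
import pandas as pd

def align_on_common_bars(a: pd.Series, b: pd.Series, freq: str = "1min") -> pd.DataFrame:
    """Aggregate two irregular series into fixed buckets and keep shared buckets only."""
    # Last observation seen inside each bucket (hypothetical choice; mean or OHLC also work).
    a_bars = a.resample(freq).last()
    b_bars = b.resample(freq).last()
    # Side-by-side columns; drop buckets where either source has no data.
    return pd.concat({"a": a_bars, "b": b_bars}, axis=1).dropna()

# Toy data: timestamps never match exactly between the two sources.
a = pd.Series(
    [100.0, 100.5, 101.0],
    index=pd.to_datetime(["2020-07-22 10:00:03", "2020-07-22 10:00:41", "2020-07-22 10:01:10"]),
)
b = pd.Series(
    [99.8, 100.9],
    index=pd.to_datetime(["2020-07-22 10:00:17", "2020-07-22 10:01:55"]),
)
aligned = align_on_common_bars(a, b)
print(aligned)
```

The trade-off is that anything happening inside a bucket is lost, so this helps for slower models but not for questions about which market moved first within a bucket.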

Could you give an example of the stuff you're trying to do, or a simple application others would use that links to yours, without giving away what you're trading off?

I've encountered two types of problems so far. The first is that synchronizing timestamps is really difficult. Not all events carry a timestamp for when they happened in the market, so I'm not sure how to sync them. As an example, imagine that I receive events from two markets (like order book updates) and want to see which market moves first. The problem is that one of the markets does not give a timestamp for when the event was registered in the trading engine; the only timestamp I have is the one I record on my server. I don't know how long the data is in transit from the exchange's server, so it becomes really difficult for me to estimate which market's orders were recorded first. Any suggestions on what I can do?
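One way to frame the "which market moved first" question is to treat the unknown transit time as a range instead of a single number, and only call a winner when the ranges do not overlap. The sketch below assumes things that are not given above: that my server clock is reasonably in sync with the exchange that does provide engine timestamps, and that I can put rough lower and upper bounds on the latency of the other feed (for example by measuring round trips). The function names and the 20 ms / 150 ms bounds are hypothetical.

```python
from datetime import datetime, timedelta

def event_time_bounds(recv: datetime, min_latency: timedelta, max_latency: timedelta):
    """Earliest and latest moment the event could have happened, given when I received it."""
    return recv - max_latency, recv - min_latency

def which_moved_first(a_event: datetime, b_recv: datetime,
                      min_latency: timedelta, max_latency: timedelta) -> str:
    """Compare market A's engine timestamp with market B's receive timestamp."""
    earliest_b, latest_b = event_time_bounds(b_recv, min_latency, max_latency)
    if a_event < earliest_b:
        return "A first"      # A precedes even the earliest possible B time
    if a_event > latest_b:
        return "B first"      # A follows even the latest possible B time
    return "uncertain"        # inside the latency window, cannot tell

# Assumed latency bounds for market B's feed (illustrative values only).
lo, hi = timedelta(milliseconds=20), timedelta(milliseconds=150)
a_event = datetime(2020, 7, 23, 10, 0, 0, 100_000)

clear_case = which_moved_first(a_event, datetime(2020, 7, 23, 10, 0, 0, 400_000), lo, hi)
close_case = which_moved_first(a_event, datetime(2020, 7, 23, 10, 0, 0, 200_000), lo, hi)
print(clear_case, "|", close_case)
```

Events that land in the "uncertain" bucket are probably best excluded from lead/lag statistics rather than guessed at.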

The second problem is with gaps. Some data sets (for example's sake, imagine candles) can have gaps a couple of days long. Most sources tell me to just average across the gap, but I don't like that as I'm worried it influences the models. Filling the gaps from another data source also rarely works, because the timestamps are usually not in sync, so it becomes almost impossible to fit them retrospectively.
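An alternative to averaging over the gaps is to make them explicit and keep the model away from them. The sketch below assumes candles indexed by a pandas `DatetimeIndex` with a known bar length; the helper names `find_gaps` and `label_segments` and the hourly toy index are made up for illustration.

```python
import pandas as pd

def find_gaps(index: pd.DatetimeIndex, step: pd.Timedelta) -> pd.DataFrame:
    """List every place where consecutive candles are further apart than one bar."""
    seen = index.to_series()
    prev = seen.shift(1)
    gaps = pd.DataFrame({"last_seen": prev, "next_seen": seen})
    gaps = gaps[(seen - prev) > step].copy()
    # Number of bars that should exist between the two candles but do not.
    gaps["missing_bars"] = ((gaps["next_seen"] - gaps["last_seen"]) / step).astype(int) - 1
    return gaps.reset_index(drop=True)

def label_segments(index: pd.DatetimeIndex, step: pd.Timedelta):
    """Give each contiguous run of candles its own id, starting a new id after every gap."""
    deltas = index.to_series().diff()
    return (deltas > step).cumsum().to_numpy()

# Toy hourly candles with the 02:00 and 03:00 bars missing.
idx = pd.to_datetime([
    "2020-07-01 00:00", "2020-07-01 01:00", "2020-07-01 04:00", "2020-07-01 05:00",
])
hour = pd.Timedelta(hours=1)
gaps = find_gaps(idx, hour)
segments = label_segments(idx, hour)
print(gaps)
print(segments)
```

With segment ids in place, rolling features and training windows can be computed per segment (for example with `groupby`), so no window ever spans a gap and nothing has to be invented to fill it.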

I'm currently using only a couple of exchanges and a handful of pairs. I'd like to increase the number of markets, but I'm worried about what kind of issues that will bring. I'm already spending so much time fixing these things that I don't know if I can manage more markets. Any ideas or help would be really appreciated!
2  Economy / Trading Discussion / How can I reduce time wasted on data cleaning? on: July 22, 2020, 03:15:33 PM
I'm fairly new to the trading scene: I started in 2019 with TA and this year got more into quant-based strategies. My problem is that my models constantly run into issues, either because timestamps are not synchronized (so I can't link data sets together) or because the data set has gaps. I've noticed that for every 10 hours I spend on modelling, 5 go just to cleaning the data.

This topic is a bit general, but as I'm quite new to this I'd like to know how to make it faster. Is it normal to spend this much time preparing the data? Any suggestions are welcome. I use python/jupyter, and I usually get data from free online sources and/or fetch it from exchanges.