Bitcoin Forum
May 15, 2024, 06:21:57 PM *
Poll
Question: As a quant developer/analyst/trader, how many hours do you spend weekly on fixing issues* with crypto market data?  *(e.g. timestamping inconsistencies, filling gaps, normalizing data structures, etc.)
0 hours - 0 (0%)
1-2 hours - 1 (100%)
3-5 hours - 0 (0%)
5-8 hours - 0 (0%)
8+ hours - 0 (0%)
Total Voters: 1

Author Topic: How can I reduce time wasted on data cleaning?  (Read 138 times)
yellowgreenredwhiteblue (OP)
Newbie
Activity: 2
Merit: 0
July 22, 2020, 03:15:33 PM
 #1

I'm fairly new to the trading scene: I started in 2019 with TA and this year got more into quant-based strategies. My problem is that my models keep running into issues, either because timestamps are not synchronized (so I can't link data sets together) or because a data set has gaps. I've noticed that for every 10 hours I spend on modelling, I spend 5 just cleaning the data.

This topic is a bit general, but as I'm quite new to this I'd like to know how to make this faster. Is it normal to spend this much time preparing the data? Any suggestions are welcome. I use Python/Jupyter, and I usually get data from free online sources and/or fetch it from exchanges.
jackg
Copper Member
Legendary
Activity: 2856
Merit: 3071
https://bit.ly/387FXHi lightning theory
July 22, 2020, 09:28:20 PM
 #2

To clarify, you're trying to build models from the data but want things to be synchronised between different datasets?

I may not have gone further than a fairly basic timeframe myself, but is it not possible to just aggregate the data to a point where both sets are in sync, rather than trying to fill gaps with models, etc.?
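A minimal sketch of the aggregation idea above, using only the Python standard library. The tick values, the one-minute bucket width, and the function names are all made up for illustration; the point is only that flooring every timestamp to a shared bucket gives two sources a common time axis without filling anything in.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bucket."""
    return EPOCH + ((ts - EPOCH) // width) * width

def last_price_per_bucket(ticks, width):
    """Map bucket start -> last price seen in that bucket.

    `ticks` is an iterable of (timestamp, price) pairs sorted by timestamp.
    """
    bars = {}
    for ts, price in ticks:
        bars[bucket_start(ts, width)] = price  # later ticks overwrite earlier ones
    return bars

def utc(h, m, s):
    """Helper for building the hypothetical sample timestamps below."""
    return datetime(2020, 7, 22, h, m, s, tzinfo=timezone.utc)

# Hypothetical ticks from two sources whose timestamps never line up exactly.
source_a = [(utc(15, 0, 5), 100.0), (utc(15, 0, 40), 101.0), (utc(15, 2, 10), 102.0)]
source_b = [(utc(15, 0, 30), 100.5), (utc(15, 1, 20), 101.5)]

width = timedelta(minutes=1)
bars_a = last_price_per_bucket(source_a, width)
bars_b = last_price_per_bucket(source_b, width)

# Keep only the buckets that both sources actually cover; nothing is filled in.
common = sorted(bars_a.keys() & bars_b.keys())
aligned = [(start, bars_a[start], bars_b[start]) for start in common]
print(aligned)
```

If you work in pandas, `resample` does the same kind of bucketing on a DatetimeIndex. The wider the bucket, the less the exact timestamps matter, at the cost of losing short-term detail.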

Could you give an example of what you're trying to do, or of a simple application others would use that relates to yours, without giving away what you're trading on?
yellowgreenredwhiteblue (OP)
Newbie
Activity: 2
Merit: 0
July 23, 2020, 11:30:21 AM
 #3

I've encountered two types of problems so far. The first is that synchronizing timestamps is really difficult. Not all events carry a timestamp for when they happened in the market, so I'm not sure how to sync them. As an example, imagine that I receive events (like order book updates) from two markets and want to see which market moves first. One of the markets does not give a timestamp for when the event was registered in the trading engine; the only timestamp I have is the one I record on my server. I don't know how long the data is in transit from the exchange's server, so it is really difficult to estimate which market's orders were recorded first. Any suggestions on what I can do?
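One way to make the timestamp problem above explicit is to treat every event time as a window rather than a point, and only claim that one market moved first when the windows do not overlap. This is a rough sketch, not a recommendation from the thread: the latency bounds and clock error are assumptions you would have to measure yourself, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeWindow:
    """Earliest and latest plausible event time, in seconds since the epoch."""
    earliest: float
    latest: float

def window_from_exchange_ts(exchange_ts: float, clock_error: float) -> TimeWindow:
    """Exchange-stamped event: uncertain only by the assumed clock error."""
    return TimeWindow(exchange_ts - clock_error, exchange_ts + clock_error)

def window_from_receive_ts(receive_ts: float, min_latency: float, max_latency: float) -> TimeWindow:
    """Locally stamped event: it happened somewhere between the slowest and
    the fastest plausible transit time before it was received."""
    return TimeWindow(receive_ts - max_latency, receive_ts - min_latency)

def which_first(a: TimeWindow, b: TimeWindow) -> str:
    """Return 'a' or 'b' if the order is certain, 'unknown' if the windows overlap."""
    if a.latest < b.earliest:
        return "a"
    if b.latest < a.earliest:
        return "b"
    return "unknown"

# Market A provides an engine timestamp; market B only has a local receive time.
# The 5 ms clock error and the 20-80 ms latency range are assumed values.
a = window_from_exchange_ts(1000.000, clock_error=0.005)
b_late = window_from_receive_ts(1000.100, min_latency=0.020, max_latency=0.080)
b_close = window_from_receive_ts(1000.050, min_latency=0.020, max_latency=0.080)

print(which_first(a, b_late))   # a
print(which_first(a, b_close))  # unknown
```

The "unknown" result is the useful part: it tells you which lead/lag observations are too close to call with the timestamps you actually have.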

The second problem is the gaps. Some data sets (for example's sake, imagine candles) can have gaps a couple of days long. Most sources tell me to just average across them, but I don't like that, as I'm worried it influences the models. Filling the gaps from another data source also rarely works, because the timestamps are usually not in sync, so it becomes almost impossible to fit them retrospectively.
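As an alternative to averaging across gaps, here is a small sketch that only finds the gaps and splits the series around them, so a model is never fed a window that spans missing candles. It assumes sorted candle open times at a fixed interval; the hourly sample data and function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def find_gaps(candle_times, interval):
    """Return (last_before_gap, first_after_gap, missing_count) for each gap.

    `candle_times` are candle open times sorted ascending; `interval` is the
    expected spacing between consecutive candles.
    """
    gaps = []
    for prev, cur in zip(candle_times, candle_times[1:]):
        step = cur - prev
        if step > interval:
            missing = step // interval - 1  # candles that should exist in between
            gaps.append((prev, cur, missing))
    return gaps

def split_at_gaps(candle_times, interval):
    """Split the series into contiguous segments so nothing spans a gap."""
    segments, current = [], []
    for t in candle_times:
        if current and t - current[-1] > interval:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return segments

def hour(h):
    """Helper for building the hypothetical hourly candle times below."""
    return datetime(2020, 7, 22, h, tzinfo=timezone.utc)

# Hypothetical hourly candles with the 03:00 and 04:00 candles missing.
times = [hour(0), hour(1), hour(2), hour(5), hour(6)]
interval = timedelta(hours=1)

gaps = find_gaps(times, interval)
segments = split_at_gaps(times, interval)
print(gaps)
print([len(s) for s in segments])
```

Running the gap report once per data set when you load it is cheap, and it turns "the model is behaving strangely" into a concrete list of missing ranges.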

I'm currently using only a couple of exchanges and a handful of pairs. I'd like to increase the number of markets, but I'm worried about what kinds of issues will come up. I'm already spending so much time fixing these things that I don't know if I can manage more markets. Any ideas or help would be really appreciated!
jackg
Copper Member
Legendary
Activity: 2856
Merit: 3071
https://bit.ly/387FXHi lightning theory
July 23, 2020, 03:00:55 PM
 #4

For the first problem, you're trying to measure the lag from the server to you. The only feasible way I can think of is to place an order and see how long it takes for the order to be added to the book. That probably only works if your data is detailed enough to spot your own order: either pick a quiet time to do it, or have an order ID you can search for once you are collecting and timestamping the data.
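The measurement described above could be summarised roughly like this. It is only a sketch: it assumes you have already recorded, on your own clock, when each test order was sent and when it first appeared in the order book feed. The sample numbers and the function name are hypothetical, and no exchange API is shown.

```python
from statistics import median

def summarise_round_trips(round_trips):
    """Summarise order-to-feed round trips measured on the local clock.

    `round_trips` is a list of (sent_at, seen_at) pairs in seconds: when the
    order was sent and when it first showed up in the order book feed.
    Each round trip covers sending the order, matching-engine processing,
    and the feed's transit back, so the feed delay alone is at most the
    round trip.
    """
    durations = [seen - sent for sent, seen in round_trips]
    if not durations or min(durations) < 0:
        raise ValueError("need at least one round trip with seen_at >= sent_at")
    return {
        "min_round_trip": min(durations),
        "median_round_trip": median(durations),
        "max_round_trip": max(durations),
    }

# Hypothetical measurements from three small test orders.
samples = [(0.000, 0.120), (10.000, 10.090), (20.000, 20.150)]
stats = summarise_round_trips(samples)
print(stats)
```

Because both timestamps come from the same local clock, no clock synchronisation with the exchange is needed. The minimum and maximum give a range for how stale a locally stamped event might be.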

If I were you I'd just ignore the gaps. It's a bit stupid to extend the candle by 3 days, but that's the only thing I'd consider doing, because for something like the CME futures you know that, for the 2 days, the open was Friday's close and the close of the candle was Monday's open... But it doesn't make sense to me to try to normalise the data when markets are closed.

I think every exchange has closed trading for a certain amount of time too, so everywhere has gaps that just stay unfilled.
BitcoinTurk
Hero Member
Activity: 1624
Merit: 624
July 27, 2020, 05:11:22 PM
 #5

I think the amount of time should vary depending on your daily life. For example, if this is your main profession or your main source of income, I think it would be perfectly normal to spend 8 hours a week on it; if not, 2-4 hours a week should be enough. So it will vary from person to person and should be arranged around your daily routine.
rexxarofmoknathal
Sr. Member
Activity: 966
Merit: 260
July 27, 2020, 10:20:40 PM
 #6

Dude, that sounds excessive. By the time you're done with the data modelling and cleaning, the market has shifted and most of the effort is wasted. It is best to develop something more time-efficient, something automated, or to find a way to look at data that isn't so difficult to unravel.





Kelvinid
Sr. Member
Activity: 2800
Merit: 344
when lambo...
July 27, 2020, 11:21:34 PM
 #7


Try using other exchanges. If the gaps keep happening, there might be a problem with your browser or your internet connection, so you could try another browser instead. I doubt the problem is with the site itself: they would already have heard complaints and would surely be fixing it, so in your case the cause is probably on your end.

The last option is to check your computer or try a different one and then figure out what the difference is, because there might be a problem with the installed programs or with the machine's capabilities.

Gozie51
Hero Member
Activity: 2492
Merit: 624
July 28, 2020, 10:04:44 PM
 #8

I also think it helps to minimize visits to sites that are not useful to you, to reduce unnecessary data wastage. Some people use more data than they need. It is also worth taking time to adjust how much you use your phone each day.
