Bitcoin Forum
Author Topic: Data Analysis on Altcoins  (Read 489 times)
wesR (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0


View Profile
December 13, 2017, 01:08:59 AM
 #1

Dear all,


I am new to this community.

I joined because I enjoy analyzing cryptocurrency data sets and talking about the results with people who are interested in discussing them.

The purpose of this post is to share some preliminary results with a wider community.

For starters, please consider the following charts of traded volume, which show when people like to go online and trade BTC in contrast to USD, observed over several years. Day 1 is Sunday, 2 is Monday, and so on. The absolute numbers are somewhat irrelevant for the purpose of the charts, as the most extreme outliers were removed.

USD:
https://imgur.com/wNzEEMf

BTC:
https://imgur.com/FvE71of


Looking forward to hearing your thoughts. When do you go online and trade (Central US time please)?


illiki23
Sr. Member
****
Offline Offline

Activity: 602
Merit: 295


Hail Eris!


View Profile
December 13, 2017, 03:46:43 AM
 #2

So why don't you just use something like box plots?

They would show the side-by-side differences in the daily distributions well. That is what they are for.

It is hard to tell what much of this graphic shows.
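
For reference, the numbers a box plot draws for each weekday can be computed directly. This is only a sketch: the observations are made up, and the 1.5 x IQR whisker rule is the common convention, not something taken from the charts in this thread.

```python
from collections import defaultdict
from statistics import quantiles

def box_stats(values):
    """Whisker ends, quartiles and median, as drawn by a typical box plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers reach to the most extreme points still inside the fences.
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {"whisker_lo": min(inside), "q1": q1, "median": median,
            "q3": q3, "whisker_hi": max(inside)}

# Hypothetical (weekday, volume) observations; 1 = Sunday, as in the charts.
observations = [(1, 10.0), (1, 12.0), (1, 14.0), (1, 16.0), (1, 90.0),
                (2, 20.0), (2, 22.0), (2, 24.0), (2, 26.0), (2, 28.0)]

by_day = defaultdict(list)
for weekday, volume in observations:
    by_day[weekday].append(volume)

summary = {day: box_stats(vols) for day, vols in sorted(by_day.items())}
for day, stats in summary.items():
    print(day, stats)
```

The value 90.0 on day 1 falls outside the upper fence, so a box plot would draw it as a separate outlier point rather than stretch the whisker.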

wesR (OP)
December 14, 2017, 02:00:53 AM
 #3

OK, I see your point. Density plots do have the advantage of revealing the distribution of the data, although I have to admit they are less suited to this particular case.

Have a look at version 2:

All trade volume data; the timestamps appear to be GMT+0:

USD: https://imgur.com/sK3KrUr
BTC: https://imgur.com/Cs2MsGf
ETH: https://imgur.com/Vm0hPtD

Volume started increasing around 15h with a peak around 16-19h.
 
The strongest trading day was Friday, except for Ethereum, where it was Tuesday. Interesting to know.

USD: https://imgur.com/zwr67Vv
BTC: https://imgur.com/JZRasz2
ETH: https://imgur.com/14sG3sa


All data are normalized within each week to bring out the differences in trading patterns. I included all data, and it reveals an underlying trend. Let me know what you think.
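
The exact normalization is not spelled out in the thread. One plausible reading, sketched here as an assumption, is to divide each observation by the total volume of its calendar week, so that every week sums to 1 and only the pattern within the week remains. The timestamps and volumes are made up.

```python
from collections import defaultdict
from datetime import datetime

def week_key(ts):
    """ISO (year, week) pair identifying the week a timestamp falls in."""
    iso = ts.isocalendar()
    return iso[0], iso[1]

def normalize_within_week(rows):
    """Replace each volume by its share of that week's total volume."""
    totals = defaultdict(float)
    for ts, volume in rows:
        totals[week_key(ts)] += volume
    # Weeks with zero total volume get a share of 0.0 instead of dividing by zero.
    return [(ts, volume / totals[week_key(ts)] if totals[week_key(ts)] else 0.0)
            for ts, volume in rows]

# Hypothetical hourly volumes: the second week trades ten times as much,
# but the Monday/Friday split is identical.
rows = [
    (datetime(2017, 12, 4, 16), 30.0),   # Monday
    (datetime(2017, 12, 8, 16), 70.0),   # Friday
    (datetime(2017, 12, 11, 16), 300.0), # Monday
    (datetime(2017, 12, 15, 16), 700.0), # Friday
]
shares = [share for _, share in normalize_within_week(rows)]
print(shares)
```

After this step the two weeks look the same, which is the point: the growth in absolute volume over the years no longer hides the weekday pattern.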
illiki23
December 14, 2017, 06:38:53 AM
 #4

OK, fine, and yes, density plots are definitely advantageous. I didn't recognize them as density plots at first.

How are the shapes created? They don't look like standard density plots. I mean the shapes, not the points.




wesR (OP)
December 14, 2017, 07:29:21 PM
 #5

The shapes are so-called violin plots. They show a density estimate of the data distribution, and you can also draw a box plot inside them.

This dataset may already be exhausted. What else would you like to see? Is there anything in particular I could draw up for you? I consider this forum the right place to ask such questions.

When do you prefer to trade? Do you have favorite buy or sell days, or strong/weak shopping hours?
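
To make the shape of a violin concrete: its outline is a kernel density estimate of the data, mirrored around the axis. Below is a minimal sketch with a Gaussian kernel. The sample values are made up, and the bandwidth rule (standard deviation times n^(-1/5)) is one common default, not necessarily the one used for the charts above.

```python
from math import exp, pi, sqrt
from statistics import stdev

def gaussian_kde(sample, grid, bandwidth=None):
    """Density estimate at each grid point; a violin mirrors this curve."""
    n = len(sample)
    if bandwidth is None:
        # Rule-of-thumb bandwidth for one-dimensional data (an assumption).
        bandwidth = stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * sqrt(2 * pi))
    return [norm * sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample)
            for x in grid]

# Hypothetical normalized volumes with two clusters (quiet and busy hours).
volumes = [1.0, 1.2, 1.1, 3.0, 3.2, 0.9, 1.0, 3.1]
step = 0.1
grid = [i * step for i in range(-30, 71)]   # -3.0 .. 7.0
density = gaussian_kde(volumes, grid)

# The half-width of the violin at height x is proportional to density(x).
for x, d in zip(grid[30:71:10], density[30:71:10]):
    print(f"{x:4.1f} {'#' * round(d * 60)}")
```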


illiki23
December 14, 2017, 07:38:30 PM
Last edit: December 14, 2017, 07:51:16 PM by illiki23
 #6

Oh OK, violin plots. I haven't seen those for a while and didn't recognize them.

I don't really have a preference, but I do have a strong interest in visual data mining.

We can perhaps make use of some other features to show differences by day of week and month of year.

One thing you might want to do is run some statistical tests on those metrics to see whether there is a significant difference in means, because they do look pretty similar.

What is your number of samples?

Protazio
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
December 14, 2017, 08:06:57 PM
 #7

I hate stats, but once I learned TA I had no choice but to like it, lol. Too much data? I guess not. Love the lines and candles.
wesR (OP)
December 14, 2017, 08:34:22 PM
 #8

Sounds good to me, I also have a strong interest in visualizations that actually help.

Here are the first and last measurement dates and the number of observations:

USD:
01/04/2013 13:00 - 08/12/2017 13:00
n = 41090

BTC:
01/04/2013 13:00 - 06/12/2017 18:00
n = 41061

ETH:
23/04/2016 17:00 - 08/12/2017 13:00
n = 14254

I can do a t-test, but I doubt the answer would tell you anything you didn't know before. If the means are pretty similar, the p-value is high; if I compare two groups that look really different, the p-value is small. So the groups that look different are also statistically different. I don't think you can take much from that, and I wouldn't like to run t-tests for all combinations.
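
One caveat worth checking numerically: the p-value depends on the sample size as well as on how different the means look. The sketch below computes Welch's t statistic and, as a simplification that is only reasonable for large samples, takes the two-sided p-value from the normal distribution instead of the t distribution. The data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_test(a, b):
    """Welch's t statistic and a two-sided p-value (normal approximation)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Two hypothetical weekdays whose mean normalized volume differs by 0.02.
base_a = [0.90, 1.00, 1.10]
base_b = [0.92, 1.02, 1.12]

t_small, p_small = welch_test(base_a * 10, base_b * 10)      # n = 30 each
t_large, p_large = welch_test(base_a * 1000, base_b * 1000)  # n = 3000 each

print(f"n=30:   t={t_small:.2f}, p={p_small:.3f}")
print(f"n=3000: t={t_large:.2f}, p={p_large:.3g}")
```

The same small difference in means is far from significant with 30 observations per group and highly significant with 3000, so with tens of thousands of hourly observations even visually similar distributions may test as different.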

I followed up on your idea of taking months, and I added the following visuals: trading volume per month.

USD: https://imgur.com/LtyDGKr
BTC: https://imgur.com/WxSvjDw
ETH: https://imgur.com/BGptbCV

I see some stronger trading months for ETH, which may be fueled by news rather than standard trading behavior, and there seems to be a tendency for more trade in the second half of the year, starting with August when people might be coming home from their holidays.



noloco
Full Member
***
Offline Offline

Activity: 210
Merit: 100


BLOCKCHAIN VERIFIED PRODUCT REVIEWS PLATFORM


View Profile
December 14, 2017, 08:47:30 PM
 #9

Where do you find data sources that are easy to manage and analyze? Are you compiling your own data through an API?

wesR (OP)
December 14, 2017, 09:08:10 PM
 #10

That's a great question. And thanks for liking my plots!

I am on a never-ending hunt for the perfect cryptocurrency dataset.

This first one is from https://www.bfxdata.com/datadownload/ and contains 5 currencies to compare on a per-hour basis (always at the same hour as well, very nicely structured). But it lacks relevant variables and is now more or less exhausted for my purposes. I see there are a few others on that page that may allow you to draw other conclusions, but for the purpose of finding out when people trade, this is as good as it gets.

I have analysed some data I pulled myself from http://coincap.io/, but the data are not stored very cleanly: sometimes there are multiple observations on one day, sometimes not. On the other hand, you can compare more than 1000 altcoins. I have found another dataset at https://files.coinmarketcap.com/ which is also nice to work with, but neither of those has per-hour data, and the variables are somewhat limited. Both of the last examples are pulled via API, yes.

Ideally, I would like to have more currencies and more variables to play with, which would first allow me to compute meaningful features or, if I have them already, to go from exploratory to predictive data analysis. Going beyond the question of 'when' people trade, I think it would be more interesting to find out 'what' they trade and what the exchange price is most sensitive to. It takes a certain amount of creativity to find meaningful features in a dataset. I do not claim to have a lot of experience with fintech, but I can give it a go with your help.

Let us now talk about interesting features, datasets and analyses you would like me to focus on.
illiki23
December 14, 2017, 09:42:52 PM
 #11

Glad you put the violin plots in front of the box plots (are those box plots?).
It is definitely cleaner. With the months, the difference is definitely more pronounced. With regard to t-tests (and I agree with a lot of the complaints about them): I can't tell how big your sample size is, so for the first round, where the differences were hard to see, I can't tell how much of the difference between distributions is due to variance and how much is an actual significant difference.

Anyways,  my focus is on text visualization.  I just started a project (still in early development) which involves visualizing things such as sentiment and semantics with respect to a coin or ICO.  We could look at different keyword frequencies on different days of the week or month for example, using your approach, and compare them with volume or other things to try and weed out why it is that people trade more or less for different months.  
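
A rough sketch of the keyword-frequency idea: count how often chosen keywords appear in posts, grouped by the weekday of the post. The keywords, the posts and the naive whitespace tokenizer are all placeholders for illustration.

```python
from collections import Counter, defaultdict
from datetime import date

KEYWORDS = ("buy", "sell")  # hypothetical keywords of interest

def keyword_counts_by_weekday(posts):
    """Count keyword occurrences per weekday (0 = Monday ... 6 = Sunday)."""
    counts = defaultdict(Counter)
    for day, text in posts:
        words = text.lower().split()  # naive tokenizer, ignores punctuation
        for kw in KEYWORDS:
            counts[day.weekday()][kw] += words.count(kw)
    return counts

# Made-up posts for illustration.
posts = [
    (date(2017, 12, 1), "Time to buy buy buy"),      # Friday
    (date(2017, 12, 1), "I would sell now"),         # Friday
    (date(2017, 12, 5), "Do not sell do not SELL"),  # Tuesday
]
counts = keyword_counts_by_weekday(posts)
for weekday, c in sorted(counts.items()):
    print(weekday, dict(c))
```

The per-weekday counts have the same shape as the volume-by-weekday data, so the two could be compared or plotted side by side.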

Can you recommend a good historical price source? APIs are fine.



wesR (OP)
December 14, 2017, 10:28:52 PM
 #12

For historical comparison, I would currently like to continue using https://files.coinmarketcap.com/, even though it has some limitations.

More functionality and development have been promised for http://coincap.io/ in the near future, so I might switch gears again. I suppose I am fine with either.

If you give me data with a structure similar to this one, which allows linking IDs in a relational database, let's talk again soon.

symbol   date   open   high   low   close   volume   market   name   ranknow   variance   volatility   mday   mon   year   wday   yday   delPerc
BTC   02/12/2017   10978.3   11320.2   10905.1   11074.6   5138500000   1.83E+11   Bitcoin   1   0.008695574   0.037482166   2   11   2017   6   335   POSITIVE
BTC   01/12/2017   10198.6   11046.7   9694.65   10975.6   6783120000   1.70E+11   Bitcoin   1   0.070793396   0.123186887   1   11   2017   5   334   POSITIVE
BTC   30/11/2017   9906.79   10801   9202.05   10233.6   8310690000   1.66E+11   Bitcoin   1   0.031934998   0.156245114   30   10   2017   4   333   POSITIVE
BTC   29/11/2017   10077.4   11517.4   9601.03   9888.61   11568800000   1.68E+11   Bitcoin   1   -0.019091662   0.19379569   29   10   2017   3   332   NEG
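
The derived columns in the sample above can be reproduced from date, open, high, low and close. The formulas below are inferred from the four sample rows and are not confirmed by the poster: 'variance' matches (close - open) / close, 'volatility' matches (high - low) / close, and 'delPerc' appears to be the sign of 'variance'. The calendar columns use 0-based months and days of the year with Sunday = 0, the same convention as C's struct tm.

```python
from datetime import datetime

def derive_columns(date_str, open_, high, low, close):
    """Recompute the derived columns for one daily row (formulas inferred)."""
    d = datetime.strptime(date_str, "%d/%m/%Y")
    change = (close - open_) / close
    return {
        "mday": d.day,
        "mon": d.month - 1,                 # 0-based: December is 11
        "year": d.year,
        "wday": (d.weekday() + 1) % 7,      # 0 = Sunday ... 6 = Saturday
        "yday": d.timetuple().tm_yday - 1,  # 0-based day of year
        "variance": change,
        "volatility": (high - low) / close,
        "delPerc": "POSITIVE" if change > 0 else "NEG",
    }

# First data row of the sample table.
row = derive_columns("02/12/2017", 10978.3, 11320.2, 10905.1, 11074.6)
print(row)
```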
illiki23
December 14, 2017, 10:35:03 PM
Last edit: December 14, 2017, 10:52:25 PM by illiki23
 #13


But I see no ID!

Of course, though, we can put together whatever type of table you want.

I also have a keen interest in 'feature engineering'. Oftentimes the success of a data mining operation comes down to the features it uses.

And we don't need to put everything in one table! If we have a key or ID associated with a coin, we can have a table of time-stamped sentiment features such as positive or negative sentiment, a table with certain keyword frequencies ('buy' vs 'sell') for each time step, and so on. Then, when you want to grab a subset or all of the features for an ICO, you can simply join them. I am not worried about the cost of joins.
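
A minimal sketch of that layout, using an in-memory SQLite database. All table and column names are hypothetical, and the sentiment and keyword counts are made up; only the BTC close price comes from the sample posted earlier.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE coin (coin_id INTEGER PRIMARY KEY, symbol TEXT UNIQUE);
    CREATE TABLE price (coin_id INTEGER, day TEXT, close REAL,
                        PRIMARY KEY (coin_id, day));
    CREATE TABLE sentiment (coin_id INTEGER, day TEXT,
                            positive INTEGER, negative INTEGER,
                            PRIMARY KEY (coin_id, day));
    CREATE TABLE keyword (coin_id INTEGER, day TEXT,
                          buy INTEGER, sell INTEGER,
                          PRIMARY KEY (coin_id, day));
""")
con.execute("INSERT INTO coin VALUES (1, 'BTC')")
con.execute("INSERT INTO price VALUES (1, '2017-12-02', 11074.6)")
con.execute("INSERT INTO sentiment VALUES (1, '2017-12-02', 40, 12)")  # made up
con.execute("INSERT INTO keyword VALUES (1, '2017-12-02', 25, 9)")     # made up

# Each feature table is joined on (coin_id, day); LEFT JOIN keeps price rows
# even when a feature table has no entry for that day.
rows = con.execute("""
    SELECT c.symbol, p.day, p.close, s.positive, s.negative, k.buy, k.sell
    FROM price AS p
    JOIN coin AS c ON c.coin_id = p.coin_id
    LEFT JOIN sentiment AS s ON s.coin_id = p.coin_id AND s.day = p.day
    LEFT JOIN keyword AS k ON k.coin_id = p.coin_id AND k.day = p.day
""").fetchall()
print(rows)
```

New feature tables can be added later without touching the existing ones, as long as they carry the same key columns.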

But the more features the better! Many data mining algorithms perform their own form of feature selection.

wesR (OP)
December 14, 2017, 11:03:01 PM
 #14

The primary ID could be a concatenation of the coin's symbol and the date, for instance BTC02/12/2017, which is unique per coin and day. If a related table also has a coin and a date, it is only a matter of untangling the exact structure and making it consistent (for instance, converting between DD/MM/YYYY and DD-MM-YYYY style).
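
A small sketch of such a key builder. The function name, the underscore separator and the list of accepted formats are illustrative choices; the date is normalized to ISO form so that keys from different sources compare equal.

```python
from datetime import datetime

# Accepted input styles, tried in order (an assumption about the sources).
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d")

def make_key(symbol, date_str):
    """Build a 'SYMBOL_YYYY-MM-DD' key from a symbol and a date string."""
    for fmt in DATE_FORMATS:
        try:
            day = datetime.strptime(date_str.strip(), fmt).date()
        except ValueError:
            continue  # not this format, try the next one
        return f"{symbol.strip().upper()}_{day.isoformat()}"
    raise ValueError(f"unrecognised date format: {date_str!r}")

print(make_key("BTC", "02/12/2017"))
print(make_key("btc", "02-12-2017"))
print(make_key("BTC", "2017-12-02"))
```

Note that a day-first date such as 02/12/2017 cannot be told apart from a month-first one by inspection, so the convention of each source has to be known in advance.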

I naturally like the idea of having more features, and those could go together with help of the unique ID that combines them all.

A first requirement is good data quality. I don't know why, these days with blockchain and everything, just pulling the data from somewhere seems to be the hardest thing. Wasn't that the whole premise, that everything is publicly accessible?

Happy to hear your ideas on feature engineering. For my part I think good features come naturally by agglomerating data from different places and putting them all together, running the algorithms and extracting the most relevant variables for buying or selling.
wesR (OP)
December 14, 2017, 11:48:40 PM
 #15

Just for your information, I would love to reply to your PMs, but I just got this ridiculously funny notification when trying to do so:


You have exceeded the limit of 2 personal messages per day. Buying a Copper membership may increase your limit.


Haha, seriously?!
illiki23
December 15, 2017, 12:27:31 AM
 #16

Mull everything over, sleep on it, then message me in the morning. I formally started this project two days ago, so a lot is still in the planning phase, though I am working on prototypes.


illiki23
December 15, 2017, 01:33:54 AM
Last edit: December 15, 2017, 03:23:20 AM by illiki23
 #17

Can you do 'average price movement' in terms of increase or decrease on the days of week and months of year?

One thing that will throw things off is that we don't have many years of data, so the recent surge is going to dominate the visualization and drown out yearly patterns. I mean, with only a few years, how do you deal with black swan events that happened in one year but not the others and caused significant change? Throw the year out as if it were an outlier?
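
One possible way to look at average movement per weekday without throwing a year out is to report the median next to the mean, since the median barely moves when a single extreme day is added. This is only a sketch with made-up candles, and it measures movement as (close - open) / open, which is an assumption.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

def movement_by_weekday(candles):
    """Mean and median of (close - open) / open per weekday (0 = Monday)."""
    groups = defaultdict(list)
    for day, open_, close in candles:
        groups[day.weekday()].append((close - open_) / open_)
    return {wd: (mean(r), median(r)) for wd, r in sorted(groups.items())}

# Three hypothetical Fridays: two ordinary days and one extreme surge.
candles = [
    (date(2017, 12, 1), 100.0, 101.0),
    (date(2017, 12, 8), 100.0, 102.0),
    (date(2017, 12, 15), 100.0, 150.0),
]
friday_mean, friday_median = movement_by_weekday(candles)[4]
print(f"mean {friday_mean:.3f}, median {friday_median:.3f}")
```

Here the single surge pulls the mean to about 18% while the median stays at 2%. The same grouping works for months by keying on the month instead of the weekday.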

Seasonal pattern mining is pretty neat though.

wesR (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0


View Profile
December 16, 2017, 09:50:14 AM
 #18

Yes, linking by unique ID is enough, and we don't need everything stored in one huge table. We just need to join the tables on the information they have in common.

A small consideration on my side: sentiment analysis might not be the easiest way to start. Look at the false positives that sentiment analysis software picks up. I would also imagine this is a place where most people have very strong opinions, as on all social channels. Perhaps the first victories should be won through numbers alone.

As for the features to use, there could be some that are obvious to us humans, which we then need to translate into a formula so that a simple script can pick them up.

Let us talk about how we could approach this (and please post your thoughts below):

I think one of the easiest could be pump and dump schemes. Why do I think so? Is this not defined as a sharp drop after a period of growth?
So, let's say we have open and close prices, and we calculate the percentage change between them. Now we generate our pump and dump feature: if the coin lost, let's say, 80% of its value over the span of one day, I consider that an orchestrated dump. I would create a new variable that records such an event with a simple 1. Then I would count the number of times this has happened over the whole lifetime of the coin. And because I don't know whether this count is normal or not, I would compare it with the other coins I have in my database. Finally, I would make a bar chart and sort the coins by number of occurrences to find out which coins suffered from being dumped and whether there are serial pump and dump schemes on particular coins. Perhaps you will think 80% is a bit too much, and you are right. Further variables could be dumps of 70%, 60%, 50%... and voila, you have a lot of new features.
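
To make the idea concrete, here is a rough sketch of the dump counter in plain Python. The function names, the coin names and the prices are all made up for illustration, and treating a loss of exactly the threshold as a dump is my own assumption.

```python
def daily_pct_change(open_price, close_price):
    """Percentage change from open to close for one day."""
    return (close_price - open_price) / open_price * 100.0


def count_dumps(candles, thresholds=(80, 70, 60, 50)):
    """Count the days on which a coin lost at least t percent, per threshold t.

    `candles` is a list of (open, close) tuples, one per day (hypothetical layout).
    """
    counts = {t: 0 for t in thresholds}
    for open_price, close_price in candles:
        if open_price == 0:
            continue  # percentage change is undefined, skip the day
        change = daily_pct_change(open_price, close_price)
        for t in thresholds:
            if change <= -t:  # the "simple 1" that records a dump event
                counts[t] += 1
    return counts


# Made-up toy prices, NOT real market data
coins = {
    "COIN_A": [(1.0, 1.5), (1.5, 0.25), (0.25, 0.4), (0.4, 0.18)],
    "COIN_B": [(10.0, 11.0), (11.0, 10.5), (10.5, 5.0)],
}

dump_counts = {name: count_dumps(candles) for name, candles in coins.items()}

# Sort coins by number of 50% dumps, most dumped first (input for a bar chart)
ranking = sorted(dump_counts, key=lambda name: dump_counts[name][50], reverse=True)
```

The sorted `ranking` list is what I would feed into the bar chart; each threshold gives one more feature column per coin.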

Please keep this discussion as interactive as possible and I will try to execute your ideas.
illiki23
Sr. Member
****
Offline Offline

Activity: 602
Merit: 295


Hail Eris!


View Profile
December 16, 2017, 11:12:21 PM
Last edit: December 17, 2017, 01:13:28 AM by illiki23
 #19

1.  How do you distinguish pumps/dumps from actual surges and declines due to interest?  The hand-coded rules you suggest are somewhat arbitrarily chosen.
    There are places where both supervised and unsupervised learning could help.  Too bad we don't have labeled data for training.  Clustering might be fun.
    When we did GPS stream analysis for seismic event detection, the movement time series were clustered into groups that defined basic patterns of movement. That was a fun one.
    There are many methods for working with and comparing time series. Do you know any good distance measures?  Time warping helps but might not be best, because we don't want to match a slow growth/decline pattern with a pump/dump pattern.

2.  Identifying 'pump coins' using these features and additional machine learning would be awesome.  If you have hand-coded rules or a machine-learning-generated model for pump/dump detection, then we can use the resulting time series of 'pump/dump', 'natural growth', etc. as features.  Again, having coins labeled as 'pump' coins would be very, very useful.   Supervised and unsupervised approaches could both be fruitful.

3.  We have another strategy for identifying shill/scam/pump-and-dump coins!  I will email you about this as we are still working it out.  It might be useful for identifying shill coins as well as shillers.

4.  I have experience doing sentiment analysis.  We built our own system for detecting the emotional polarity of texts, which could be useful.  It is not planned for iteration 1 though, so don't expect it.

5.  One problem with sentiment analysis is handling a time window that contains both positive and negative sentiment expressions.  We don't want to just call that section neutral, as this happens all the time, both when shilling occurs and naturally.

6.  Some legit coins are pumped and dumped.  That doesn't make them a scam, and some eventually do well if you are a long-term investor.

7.  Volume along with price is important.  One thing that happens pre-pump is a number of small purchases by the pumpers, which may or may not become a useful feature.

8.  A hype score would be a useful feature.  Most pumpers shill along with the pump.  This is my specialty right now: shilling detection.

9.  I am far, far, far from an expert.  I got a graduate degree but have many gaps.  I welcome any knowledge and advice.

10.  We will work on different approaches. We will set you up to do with the data as you please.
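
On the distance-measure question in point 1, here is a minimal sketch of dynamic time warping with an optional band (a Sakoe-Chiba style window) that limits how far the alignment may stretch in time. The band is one possible way to keep a slow pattern from being warped onto a fast one; whether it is enough for pump/dump data is untested. The function name and the toy series are made up, and in practice the series would probably need normalizing first.

```python
import math


def dtw_distance(a, b, window=None):
    """Dynamic time warping distance between two numeric sequences.

    `window` limits how many steps apart two matched points may be.
    None means unconstrained warping; 0 forces a point-by-point match.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least as wide as the length difference
    window = max(window, abs(n - m))

    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] matched again to an earlier b
                cost[i][j - 1],      # b[j-1] matched again to an earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Toy series: the same spike, shifted by one step
spike_late = [0, 0, 1, 0, 0]
spike_early = [0, 1, 0, 0, 0]

free = dtw_distance(spike_late, spike_early)             # warping allowed
rigid = dtw_distance(spike_late, spike_early, window=0)  # no warping
```

With unconstrained warping the shifted spike is matched at no cost, while `window=0` falls back to a point-by-point comparison. A small window in between would allow a little shifting without letting a week-long climb line up with a one-day spike.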

wesR (OP)
Newbie
*
Offline Offline

Activity: 13
Merit: 0


View Profile
December 17, 2017, 01:34:02 AM
Last edit: December 17, 2017, 01:44:48 AM by wesR
 #20

These are all very valid questions, and I will attempt to answer some of them in my next post.


Tonight I have a new analysis for you: Bitcoin data from 2013 to today (end of 2017).

Since you talked about seasonality, there were some questions I wanted answered. I am using a different data set now (daily data).

If we look at the open versus close price of each day as a percentage change, can we find out whether there is an underlying pattern driving Bitcoin buying and selling?

We require some data trimming, so that extreme events do not dominate the scale of the analysis. I found that a range between -10% and +10% works quite well and shows what is interesting without excluding too much relevant information (although it does effectively also exclude 'minor' swings like ±11% per day). Also, I don't take the daily high or low into account, just the start and end price of a full day, so don't be surprised that the numbers are relatively low compared to what you hear about cryptocurrencies.
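
As a rough sketch of the trimming step, assuming the daily data comes as (date, open, close) rows; the function name and the prices below are made up, and keeping values of exactly ±10% is my own choice:

```python
from datetime import date


def trimmed_daily_changes(rows, limit=10.0):
    """Return (date, pct_change) pairs whose open-to-close change is within +-limit percent.

    `rows` is an iterable of (date, open_price, close_price) tuples (hypothetical layout).
    """
    kept = []
    for day, open_price, close_price in rows:
        pct = (close_price - open_price) / open_price * 100.0
        if abs(pct) <= limit:  # drop the extreme days
            kept.append((day, pct))
    return kept


# Made-up toy prices, NOT real BTC data
rows = [
    (date(2017, 12, 11), 100.0, 105.0),  # +5%, kept
    (date(2017, 12, 12), 100.0, 115.0),  # +15%, trimmed
    (date(2017, 12, 13), 120.0, 114.0),  # -5%, kept
]

changes = trimmed_daily_changes(rows)
```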


BTC: mday-year_mon: https://imgur.com/lyaRMV3
All data plotted with a color code for day-of-month segments. Can't see much.

BTC: mday-mon: https://imgur.com/4VYpJsc
The first days of a month tend to show somewhat positive percentage changes.

BTC: mday-year: https://imgur.com/QGBI4SM
The middle of a month is not the best time to trade although there are up and down swings of course. Data aggregated over years.

BTC: mday-wday: https://imgur.com/q1roIHy
Thursday and Friday towards the end of the month are stronger and weekends are weaker. Mondays and Tuesdays are overall higher.

BTC: mday-overall: https://imgur.com/CUWC8iL
Overall observation regarding time of month for trades. Beginning and end are stronger than the middle.


Digging deeper into the weekdays, the data was aggregated with boxplots and violin plots. Results (only between -10% and +10% change of opening vs closing price) are displayed below.


BTC: wday-overall: https://imgur.com/Tft36JJ
Are certain weekdays better than others, and does the observation hold for several years or not? It looks like Monday, Tuesday, Friday and Saturday are stronger. Wednesday, Thursday and Sunday trades are average. The distribution shown by the violin shapes indicates higher densities of positive occurrences toward the maxima on Tuesdays and Saturdays, while Sunday has slightly more negative occurrences than other days.
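
The grouping behind such a weekday plot can be sketched like this, reduced to one median per weekday instead of a full box or violin. The function name and the numbers are made up; they are not the values behind the charts.

```python
from collections import defaultdict
from datetime import date
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def median_change_by_weekday(changes):
    """Group (date, pct_change) pairs by weekday and return the median per weekday."""
    groups = defaultdict(list)
    for day, pct in changes:
        # date.weekday() returns 0 for Monday through 6 for Sunday
        groups[WEEKDAYS[day.weekday()]].append(pct)
    return {name: median(groups[name]) for name in WEEKDAYS if name in groups}


# Made-up toy values, NOT the real results
toy_changes = [
    (date(2017, 12, 11), 2.0),   # Monday
    (date(2017, 12, 12), -1.0),  # Tuesday
    (date(2017, 12, 18), 4.0),   # Monday
]

by_weekday = median_change_by_weekday(toy_changes)
```

Splitting the input by year before calling the function gives the per-year view.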

BTC: wday-year: https://imgur.com/4c6kuCI
Moving away from aggregated data once more, the trend over the years reveals that certain years skewed the aggregate; however, Monday and Tuesday still hold their ground.


Hope you enjoy it as much as I do. Next time, some analysis on pump and dump.
