ltcltcltc (OP)
Newbie
Offline
Activity: 26
Merit: 64
|
|
December 23, 2023, 04:28:02 PM |
|
I know LoyceV has put together a nice scrapable archive of the topics of this forum. I want to do an analysis of the BTT forum. I could write a script to scrape the data from LoyceV's archive, but I was really hoping someone could point me towards a fully downloadable database to speed things up. Does anyone have a reference? Cheers!
|
|
|
|
LoyceV
Legendary
Offline
Activity: 3304
Merit: 16654
Thick-Skinned Gang Leader and Golden Feather 2021
|
|
December 23, 2023, 05:23:27 PM |
|
How about you just ask nicely? What do you need, and what's the goal? Or better: will you publish the results on Bitcointalk?
|
|
|
|
ltcltcltc (OP)
Newbie
Offline
Activity: 26
Merit: 64
|
|
December 23, 2023, 07:44:27 PM |
|
Haha I didn't think about that indeed. I came across this sentiment analysis of BTT. It aims to infer a correlation between the temperature/feeling of this forum and the trend of cryptos like Bitcoin. I found it interesting so I thought I'd try it myself, play around with the data, see what comes out. Might even be an intro to ML. My goal: learning. Oftentimes that leads to interesting results but one can never be certain. Still, if anything at all comes out of this you'll be the first to read about it.
|
|
|
|
LoyceV
Legendary
Offline
Activity: 3304
Merit: 16654
Thick-Skinned Gang Leader and Golden Feather 2021
|
|
December 23, 2023, 08:11:53 PM |
|
I've seen another sentiment analysis, which analysed posts concerning the block size discussion at Fork time. But it's only in my email, I can't find it online.
Have you thought about how you'd handle my data? It's a lot: millions of files, about 100 GB, and most file systems can't handle that many files without many subdirectories.
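A minimal sketch of the "many subdirectories" idea, assuming each archived post is one file named after its post number. The `shard_path` helper and the two-level layout are illustrative only, not how the archive is actually stored:

```python
from pathlib import Path


def shard_path(root: Path, post_id: int, per_dir: int = 1000) -> Path:
    """Map a post number to a nested path (hypothetical layout).

    Each leaf directory receives at most `per_dir` files, and each
    first-level directory holds at most `per_dir` leaf directories.
    """
    top = post_id // (per_dir * per_dir)   # millions bucket
    mid = (post_id // per_dir) % per_dir   # thousands bucket inside it
    return root / f"{top:03d}" / f"{mid:03d}" / f"{post_id}.html"


# Example: where post 63412345 would be stored under "archive/"
example = shard_path(Path("archive"), 63412345)
```

With `per_dir=1000`, a few tens of millions of posts end up in a few dozen top-level folders instead of one flat directory.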
|
|
|
|
ltcltcltc (OP)
Newbie
Offline
Activity: 26
Merit: 64
|
|
December 23, 2023, 09:30:35 PM |
|
How is your data classified? Tree structure or a raw recent-first stack? In the second case, I'd probably reorganize it myself into a tree-like structure. That way it should be quicker to filter out some data. Maybe start with the Bitcoin Discussion board, then scale up.
Also, I can always chop those 100 GB into various time series. Perhaps the 2020-2022 time period contains juicier data than the rest (due to the rise and drop of BTC). Everything can be explored.
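A rough sketch of that time slicing, assuming every message has already been parsed into a dict with a Unix `time` field. Both the field name and the `in_window` helper are assumptions for illustration, not part of any existing archive:

```python
from datetime import datetime, timezone


def in_window(messages, start_year, end_year):
    """Keep messages whose Unix timestamp falls within the given years (UTC, inclusive)."""
    start = datetime(start_year, 1, 1, tzinfo=timezone.utc).timestamp()
    end = datetime(end_year + 1, 1, 1, tzinfo=timezone.utc).timestamp()
    return [m for m in messages if start <= m["time"] < end]


sample = [
    {"msgId": "1", "time": 1577836800},  # 2020-01-01 00:00:00 UTC
    {"msgId": "2", "time": 1703348882},  # 2023-12-23, outside the window
]
selected = in_window(sample, 2020, 2022)
```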
|
|
|
|
LoyceV
Legendary
Offline
Activity: 3304
Merit: 16654
Thick-Skinned Gang Leader and Golden Feather 2021
|
|
December 23, 2023, 10:10:24 PM |
|
How is your data classified?
See the link you started this topic with. WYSIWYG. Best I can do is a post number.
|
|
|
|
ltcltcltc (OP)
Newbie
Offline
Activity: 26
Merit: 64
|
|
December 23, 2023, 11:26:11 PM Last edit: December 24, 2023, 07:51:21 AM by ltcltcltc |
|
Thanks, btw what do you mean by WYSIWYG? I get what it stands for, and that it's CS slang, but how does it apply here? Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".
P.S. I've seen your other work, quite impressive!
|
|
|
|
Vod
Legendary
Offline
Activity: 3696
Merit: 3074
Licking my boob since 1970
|
|
December 23, 2023, 11:41:09 PM |
|
Thanks, btw what do you mean by WYSIWYG?
Odd that you have never googled that phrase, but you've found an obscure website. Could it have something to do with my recent suggestion?
|
|
|
|
LoyceV
Legendary
Offline
Activity: 3304
Merit: 16654
Thick-Skinned Gang Leader and Golden Feather 2021
|
|
December 24, 2023, 07:55:32 AM |
|
Thanks, btw what do you mean by WYSIWYG? I get what it stands for, and that it's CS slang, but how does it apply here?
I don't know what "CS slang" is, but WYSIWYG stands for What You See Is What You Get. That's literally how my data files are.
Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".
Nope. That's TryNinja's specialty. Again: just ask nicely.
|
|
|
|
ltcltcltc (OP)
Newbie
Offline
Activity: 26
Merit: 64
|
|
December 24, 2023, 08:37:08 AM |
|
CS means computer science.
That's literally how my data files are.
So, please tell me if I'm wrong: you just scrape content with limited data treatment, so the filtering functionalities you offer are the same that the forum offers, i.e. sorting by chronological order, viewing the posts inside a topic and filtering by user.
Again: just ask nicely
Ok! I thought by the previous message that you were declining. But if that's not the case then I'd be super grateful if you shared your database with me to avoid the undesirable task of rescraping the scraped!
Ninja's website looks handy too. Harder to scrape though.
|
|
|
|
ABCbits
Legendary
Offline
Activity: 2870
Merit: 7496
Crypto Swap Exchange
|
Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".
Nope. That's TryNinja's specialty. Again: just ask nicely
TryNinja's site, which you mentioned, already offers an API whose documentation can be seen at https://docs.ninjastic.space/. If OP is willing to write a script that downloads topics/replies from the API and to wait several days, it should be a viable option.
|
|
|
|
LoyceV
Legendary
Offline
Activity: 3304
Merit: 16654
Thick-Skinned Gang Leader and Golden Feather 2021
|
|
December 24, 2023, 08:59:02 AM |
|
So, please tell me if I'm wrong: you just scrape content with limited data treatment, so the filtering functionalities you offer are the same that the forum offers, i.e. sorting by chronological order, viewing the posts inside a topic and filtering by user.
I don't process anything, I just keep the raw HTML for archiving purposes. Although I also keep a list per user and per topic.
Ok! I thought by the previous message that you were declining.
I meant ask TryNinja nicely if you want for instance only data from the Economics board.
But if that's not the case then I'd be super grateful if you shared your database with me to avoid the undesirable task of rescraping the scraped!
I don't have a database. I just have "data". And it's a lot. Hence my question if you know how you're going to handle it. Old posts for instance are stored in a different format, although I may still have a backup of individual files for each post. Update: found it. That's the part where you'll get millions of files in one directory. So you'll have to be a bit more specific before I just dump a shitload of files on you.
Ninja's website looks handy too. Harder to scrape though.
Don't scrape, ask.
|
|
|
|
TryNinja
Legendary
Offline
Activity: 2828
Merit: 6989
|
|
December 24, 2023, 09:23:11 AM |
|
I’m willing to give anyone a .csv or similar with any data that I have. Like Loyce said, all you gotta do is ask.
|
|
|
|
digaran
Copper Member
Hero Member
Offline
Activity: 1330
Merit: 899
🖤😏
|
|
December 24, 2023, 10:34:34 AM |
|
My man triple ltc, can we start over? The analysis you linked above seems interesting. Can you do a special analysis on price changes and my appearance on the Reputation and Meta boards in the past 6 months? Lol, I mean, is there a way to do that? Earlier I thought you were one of the trolls harassing me, so apologies for snapping at you. I appreciate the effort. 😉
|
|
|
|
ltcltcltc (OP)
Newbie
Offline
Activity: 26
Merit: 64
|
|
December 24, 2023, 01:47:00 PM |
|
I just keep the raw HTML for archiving purposes.
Ok. Then perhaps TryNinja's database fits my purposes better.
I’m willing to give anyone a .csv or similar with any data that I have.
I think a tree-like structure (board/subboard/topic/message) would work best so as to study conversations as a whole more than individual messages, since I don't care about individual opinions as much as I do about global sentiments. So maybe JSON? Does this work for you? I mentioned the Economy board as an example; ideally I'd want the whole data. It would be a super favour you'd be doing me.
|
|
|
|
ltcltcltc (OP)
Newbie
Offline
Activity: 26
Merit: 64
|
|
December 24, 2023, 03:54:27 PM |
|
It's a good idea too. I'll .append() it to the list. LTC stands for something other than Litecoin.
|
|
|
|
TryNinja
Legendary
Offline
Activity: 2828
Merit: 6989
|
|
December 24, 2023, 05:14:01 PM |
|
I think a tree-like structure (board/subboard/topic/message) would work best so as to study conversations as a whole more than individual messages, since I don't care about individual opinions as much as I do about global sentiments. So maybe JSON? Does this work for you? I mentioned the Economy board as an example; ideally I'd want the whole data.
JSON is probably fine. Could you provide an example of the format you want with a dummy post?
|
|
|
|
ltcltcltc (OP)
Newbie
Offline
Activity: 26
Merit: 64
|
|
December 25, 2023, 01:21:33 AM Last edit: January 03, 2024, 04:32:31 PM by ltcltcltc |
|
JSON is probably fine. Could you provide an example of the format you want with a dummy post?
Suppose there are boards B1 and B2. B1 has child boards B11 and B12. Each board is represented as a folder with the same name. The main folder, F, could be structured as follows (every instance of content.txt represents a file; the rest are folders).
F
├───B1
│   ├───content.txt
│   ├───B11
│   │   └───content.txt (*)
│   └───B12
│       └───content.txt
└───B2
    └───content.txt
Now here's what a content.txt file could look like. Suppose we're looking at (*).
{
  "name": "B11",
  "topics": [
    {
      "topicId": "1111",
      "subject": "Help me out plz",
      "op": {"userId": "3596085", "username": "ltcltcltc", "activity": 26, "merit": 60},
      "time": <timestamp of the original post>,
      "messages": [
        {
          "msgId": "6666",
          "author": {"userId": "3596085", "username": "ltcltcltc", "activity": 26, "merit": 60},
          "time": <timestamp of this message (in this case the original post)>,
          "merited": "2",
          "message": "Hey does anyone know how to speed up ecdsa signature bruteforcing?"
        },
        {
          "msgId": "6699",
          "author": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
          "time": <timestamp of this message>,
          "merited": 0,
          "message": "Stop wasting your time."
        }
      ]
    },
    {
      "topicId": "2222",
      "subject": "Test. Do not answer.",
      "op": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
      "time": <timestamp of the original post>,
      "messages": [
        {
          "msgId": "8008",
          "author": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
          "time": <timestamp of this message (in this case the original post)>,
          "merited": "3",
          "message": "Testy test."
        }
      ]
    }
  ]
}
I didn't give any example of timestamp because I don't know what your time format is, but I think I'd prefer Unix time. Also note the redundancy: the topic's timestamp is the same as the timestamp on the first message of said topic. The topics inside each board are ordered chronologically (older first) and the messages inside each topic too. What do you think about this format?
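To illustrate how this layout could be consumed, here is a rough sketch that walks the folder F and reads every content.txt. It assumes the timestamp placeholders have been replaced with real values so that each file is valid JSON. The helper names `iter_messages` and `topic_text` are made up for this example:

```python
import json
from pathlib import Path


def iter_messages(root):
    """Yield (board_path, topic, message) for every content.txt found under root."""
    for content_file in sorted(Path(root).rglob("content.txt")):
        board = json.loads(content_file.read_text(encoding="utf-8"))
        # Board path relative to the main folder, e.g. "B1/B11"
        board_path = content_file.parent.relative_to(root).as_posix()
        for topic in board["topics"]:
            for message in topic["messages"]:
                yield board_path, topic, message


def topic_text(topic):
    """Join a topic's messages into one string, to treat the conversation as a whole."""
    return "\n".join(m["message"] for m in topic["messages"])
```

Calling `list(iter_messages("F"))` would return every message together with the board path it belongs to, and `topic_text` gives one block of text per topic, which suits scoring whole conversations rather than individual posts.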
|
|
|
|
|