Bitcoin Forum
May 13, 2024, 12:55:58 AM *
Author Topic: Downloadable topic-database?  (Read 230 times)
ltcltcltc (OP)
Newbie
*
Offline Offline

Activity: 26
Merit: 64


View Profile
December 23, 2023, 04:28:02 PM
 #1

I know LoyceV has put together a nice scrapable archive of the topics of this forum.

I want to do an analysis of the BTT forum. I could write a script to scrape the data from LoyceV's archive, but I was really hoping someone could point me towards a fully downloadable database to speed things up. Does anyone have a reference?

Cheers!
LoyceV
Legendary
*
Offline Offline

Activity: 3304
Merit: 16654


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
December 23, 2023, 05:23:27 PM
 #2

How about you just ask nicely? ;) What do you need, and what's the goal? Or better: will you publish the results on Bitcointalk?

ltcltcltc (OP)
Newbie
*
Offline Offline

Activity: 26
Merit: 64


View Profile
December 23, 2023, 07:44:27 PM
 #3

Haha I didn't think about that indeed.
I came across this sentiment analysis of BTT. It aims to infer a correlation between the temperature/feeling of this forum and the price trend of cryptos like Bitcoin. I found it interesting, so I thought I'd try it myself, play around with the data, and see what comes out. Might even be an intro to ML. My goal: learning. Oftentimes that leads to interesting results, but one can never be certain. Still, if anything at all comes out of this, you'll be the first to read about it.
LoyceV
Legendary
*
Offline Offline

Activity: 3304
Merit: 16654


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
December 23, 2023, 08:11:53 PM
 #4

I've seen another sentiment analysis, which analysed posts concerning the block size discussion at Fork time. But it's only in my email, I can't find it online.

Have you thought about how you'd handle my data? It's a lot: millions of files, about 100 GB, and most file systems can't handle that many files without many subdirectories.
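
One common way around the "millions of files in one directory" problem is to shard files into nested subdirectories derived from a numeric ID. The sketch below is only an illustration: the function name, the `.html` file naming, and the idea that each file is keyed by a numeric post ID are assumptions, not a description of how the archive is actually laid out.

```python
from pathlib import Path


def sharded_path(root: Path, post_id: int, per_dir: int = 1000) -> Path:
    """Map a numeric post ID to root/<top>/<mid>/<id>.html.

    With per_dir=1000, each leaf directory holds at most 1000 files and
    each top-level directory holds at most 1000 subdirectories.
    The layout and the .html suffix are hypothetical.
    """
    top = post_id // (per_dir * per_dir)   # which block of a million IDs
    mid = (post_id // per_dir) % per_dir   # which block of a thousand within it
    return root / f"{top:03d}" / f"{mid:03d}" / f"{post_id}.html"


if __name__ == "__main__":
    print(sharded_path(Path("archive"), 63456789))
```

Because the path is computed from the ID alone, any file can be located without listing a directory.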

ltcltcltc (OP)
Newbie
*
Offline Offline

Activity: 26
Merit: 64


View Profile
December 23, 2023, 09:30:35 PM
 #5

How is your data classified? Tree structure or raw recent-first stack? In the second case, I'd probably reorganize it myself into a tree-like structure. That way it should be quicker to filter out some data. Maybe start with the Bitcoin Discussion board, then scale up.

Also, I can always chop those 100 GB into various time series. Perhaps the 2020-2022 period contains juicier data than the rest (due to the rise and drop of BTC). Everything can be explored.
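
A minimal sketch of that kind of time slicing, assuming each post carries a Unix timestamp in a field called `time` (both the field name and the sample records are made up for illustration):

```python
from datetime import datetime, timezone


def in_period(unix_time: int, start_year: int, end_year: int) -> bool:
    """Return True if the UTC timestamp falls within [start_year, end_year]."""
    year = datetime.fromtimestamp(unix_time, tz=timezone.utc).year
    return start_year <= year <= end_year


if __name__ == "__main__":
    # Hypothetical records; only the "time" field matters here.
    posts = [
        {"msgId": "1", "time": 1577836799},  # last second of 2019 (UTC)
        {"msgId": "2", "time": 1577836800},  # first second of 2020 (UTC)
        {"msgId": "3", "time": 1703289600},  # 23 December 2023 (UTC)
    ]
    selected = [p for p in posts if in_period(p["time"], 2020, 2022)]
    print([p["msgId"] for p in selected])
```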
LoyceV
Legendary
*
Offline Offline

Activity: 3304
Merit: 16654


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
December 23, 2023, 10:10:24 PM
 #6

Quote
How is your data classified?
See the link you started this topic with. WYSIWYG. Best I can do is a post number.

ltcltcltc (OP)
Newbie
*
Offline Offline

Activity: 26
Merit: 64


View Profile
December 23, 2023, 11:26:11 PM
Last edit: December 24, 2023, 07:51:21 AM by ltcltcltc
 #7

Thanks, btw what do you mean by WYSIWYG? I get what it stands for, and that it's CS slang, but how does it apply here? Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".

P.S. I've seen your other work, quite impressive!
Vod
Legendary
*
Offline Offline

Activity: 3696
Merit: 3074


Licking my boob since 1970


View Profile WWW
December 23, 2023, 11:41:09 PM
 #8

Quote
Thanks, btw what do you mean by WYSIWYG?

Odd that you have never googled that phrase, but you've found an obscure website. Could it have something to do with my recent suggestion? ;)

LoyceV
Legendary
*
Offline Offline

Activity: 3304
Merit: 16654


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
December 24, 2023, 07:55:32 AM
 #9

Quote
Thanks, btw what do you mean by WYSIWYG? I get what it stands for, and that it's CS slang, but how does it apply here?
I don't know what "CS slang" is, but WYSIWYG stands for What You See Is What You Get. That's literally how my data files are.

Quote
Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".
Nope. That's TryNinja's specialty. Again: just ask nicely :)

ltcltcltc (OP)
Newbie
*
Offline Offline

Activity: 26
Merit: 64


View Profile
December 24, 2023, 08:37:08 AM
 #10

CS means computer science.

Quote
That's literally how my data files are.

So, please tell me if I'm wrong: you just scrape content with limited data treatment, so the filtering functionalities you offer are the same as the forum's, i.e. sorting in chronological order, viewing the posts inside a topic, and filtering by user.

Quote
Again: just ask nicely

Ok! I thought from the previous message that you were declining. But if that's not the case, then I'd be super grateful if you shared your database with me to avoid the undesirable task of rescraping the scraped!

Ninja's website looks handy too. Harder to scrape though.
ABCbits
Legendary
*
Offline Offline

Activity: 2870
Merit: 7496


Crypto Swap Exchange


View Profile
December 24, 2023, 08:55:41 AM
Merited by LoyceV (4), ltcltcltc (1)
 #11

Quote
Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".
Nope. That's TryNinja's specialty. Again: just ask nicely :)

The site you mentioned, TryNinja's, already offers an API; its documentation can be found at https://docs.ninjastic.space/. If OP is willing to write a script that downloads topics and replies from the API and to wait several days, it should be a viable option.
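
Such a script could follow a simple "fetch a page, pause, repeat" loop. The sketch below deliberately does not reproduce any real endpoint or parameter from the API documentation: `fetch_page` is a hypothetical callable that you would implement yourself against the documented API, and the stop condition (an empty page) is an assumption.

```python
import time
from typing import Callable, Iterator


def download_all(fetch_page: Callable[[int], list], delay_s: float = 1.0) -> Iterator[dict]:
    """Yield items page by page until fetch_page returns an empty list.

    fetch_page(page_number) is a placeholder for a real API call.
    The pause after each non-empty page keeps the load on the server low,
    which is why a full download can take days.
    """
    page = 0
    while True:
        items = fetch_page(page)
        if not items:
            break
        yield from items
        page += 1
        time.sleep(delay_s)


if __name__ == "__main__":
    # Stand-in for the network: three "pages", the last one empty.
    fake_pages = [[{"id": 1}, {"id": 2}], [{"id": 3}], []]
    for item in download_all(lambda p: fake_pages[p], delay_s=0):
        print(item)
```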

LoyceV
Legendary
*
Offline Offline

Activity: 3304
Merit: 16654


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
December 24, 2023, 08:59:02 AM
 #12

Quote
So, please tell me if I'm wrong: you just scrape content with limited data treatment, so the filtering functionalities you offer are the same that the forum offers, i.e. sorting by chronological order, viewing the posts inside a topic and filtering by user.
I don't process anything, I just keep the raw HTML for archiving purposes. Although I also keep a list per user and per topic.
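
Anyone working from raw HTML would need an extraction step before any analysis. A rough sketch using only the Python standard library follows; it assumes post bodies sit in `<div class="post">` elements, which is an assumption about the markup and should be checked against the actual files. Text from nested elements (such as quotes) is included in the result.

```python
from html.parser import HTMLParser


class PostTextExtractor(HTMLParser):
    """Collect the text of every <div class="post"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # <div> nesting depth inside a post; 0 = outside
        self.posts = []     # extracted post texts, in document order
        self._chunks = []   # text fragments of the post being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1             # nested <div> inside a post
        elif ("class", "post") in attrs:
            self.depth = 1              # entering a new post
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:         # closed the post's own <div>
                text = "".join(self._chunks)
                self.posts.append(" ".join(text.split()))

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


if __name__ == "__main__":
    parser = PostTextExtractor()
    parser.feed('<div class="post">Hello <b>world</b></div><div class="sig">ad</div>')
    print(parser.posts)
```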

Quote
Ok! I thought by the previous message that you were declining.
I meant ask TryNinja nicely if you want for instance only data from the Economics board.

Quote
But if that's not the case then I'd be super grateful if you shared your database with me to avoid the undesirable task of rescraping the scraped!
I don't have a database. I just have "data". And it's a lot. Hence my question whether you know how you're going to handle it. Old posts, for instance, are stored in a different format, although I may still have a backup of individual files for each post. Update: found it. That's the part where you'll get millions of files in one directory. So you'll have to be a bit more specific before I just dump a shitload of files on you :P

Quote
Ninja's website looks handy too. Harder to scrape though.
Don't scrape, ask :P

TryNinja
Legendary
*
Offline Offline

Activity: 2828
Merit: 6989



View Profile WWW
December 24, 2023, 09:23:11 AM
Merited by LoyceV (4)
 #13

I’m willing to give anyone a .csv or similar with any data that I have. Like Loyce said, all you gotta do is ask. :)

digaran
Copper Member
Hero Member
*****
Offline Offline

Activity: 1330
Merit: 899

🖤😏


View Profile
December 24, 2023, 10:34:34 AM
 #14

My man triple ltc, can we start over? The analysis you linked above seems interesting. Can you do a special analysis on price changes and my appearances on the Reputation and Meta boards in the past 6 months? Lol, I mean, is there a way to do that?
Earlier I thought you were one of the trolls harassing me, so apologies for snapping at you. I appreciate the effort. 😉

ltcltcltc (OP)
Newbie
*
Offline Offline

Activity: 26
Merit: 64


View Profile
December 24, 2023, 01:47:00 PM
 #15

Quote
I just keep the raw HTML for archiving purposes.
Ok. Then perhaps TryNinja's database fits my purposes better.

Quote
I’m willing to give anyone a .csv or similar with any data that I have.
I think a tree-like structure (board/subboard/topic/message) would work best so as to study conversations as a whole more than individual messages, since I don't care about individual opinions as much as I do about global sentiments.
So maybe JSON? Does this work for you? I mentioned the Economy board as an example; ideally I'd want the whole data.
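
To make the idea concrete, here is a minimal sketch of turning flat rows (as a .csv export might provide) into a board/topic/message tree. The column names `board`, `topic_id`, `msg_id`, `time` and `text` are hypothetical; the real export may use different fields, and sub-boards are left out for brevity.

```python
from collections import defaultdict


def build_tree(rows):
    """Group flat message rows into boards -> topics -> messages.

    Topics within a board and messages within a topic are sorted
    oldest first, using the (assumed numeric) "time" field.
    """
    boards = defaultdict(lambda: defaultdict(list))
    for row in rows:
        boards[row["board"]][row["topic_id"]].append(row)

    tree = []
    for board_name, topics in boards.items():
        topic_list = []
        for topic_id, msgs in topics.items():
            msgs.sort(key=lambda m: m["time"])
            topic_list.append({
                "topic_id": topic_id,
                "time": msgs[0]["time"],  # time of the opening post
                "messages": [
                    {"msg_id": m["msg_id"], "time": m["time"], "text": m["text"]}
                    for m in msgs
                ],
            })
        topic_list.sort(key=lambda t: t["time"])
        tree.append({"name": board_name, "topics": topic_list})
    return tree


if __name__ == "__main__":
    import json
    rows = [
        {"board": "Economy", "topic_id": 2, "msg_id": 20, "time": 300, "text": "c"},
        {"board": "Economy", "topic_id": 1, "msg_id": 11, "time": 200, "text": "b"},
        {"board": "Economy", "topic_id": 1, "msg_id": 10, "time": 100, "text": "a"},
    ]
    print(json.dumps(build_tree(rows), indent=2))
```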

It would be a super favour you'd be doing me.
ltcltcltc (OP)
Newbie
*
Offline Offline

Activity: 26
Merit: 64


View Profile
December 24, 2023, 03:54:27 PM
 #16

It's a good idea too. I'll .append() it to the list. Ltc stands for something other than Litecoin.
TryNinja
Legendary
*
Offline Offline

Activity: 2828
Merit: 6989



View Profile WWW
December 24, 2023, 05:14:01 PM
 #17

Quote
I think a tree-like structure (board/subboard/topic/message) would work best so as to study conversations as a whole more than individual messages, since I don't care about individual opinions as much as I do about global sentiments.
So maybe JSON? Does this work for you? I mentioned the Economy board as an example; ideally I'd want the whole data.
JSON is probably fine. Could you provide an example of the format you want with a dummy post?

ltcltcltc (OP)
Newbie
*
Offline Offline

Activity: 26
Merit: 64


View Profile
December 25, 2023, 01:21:33 AM
Last edit: January 03, 2024, 04:32:31 PM by ltcltcltc
 #18

Quote
JSON is probably fine. Could you provide an example of the format you want with a dummy post?

Suppose there are boards B1 and B2. B1 has child boards B11 and B12. Each board is represented as a folder with the same name. The main folder, F, could be structured as follows (every instance of content.txt represents a file; the rest are folders).

F
├───B1
│   ├───content.txt
│   ├───B11
│   │   └───content.txt (*)
│   └───B12
│       └───content.txt
└───B2
    └───content.txt

Now here's what a content.txt file could look like. Suppose we're looking at (*).

{
    "name": "B11",
    "topics": [
        {
            "topicId": "1111",
            "subject": "Help me out plz",
            "op": {"userId": "3596085", "username": "ltcltcltc", "activity": 26, "merit": 60},
            "time": <timestamp of the original post>,
            "messages": [
                {
                    "msgId": "6666",
                    "author": {"userId": "3596085", "username": "ltcltcltc", "activity": 26, "merit": 60},
                    "time": <timestamp of this message (in this case the original post)>,
                    "merited": 2,
                    "message": "Hey does anyone know how to speed up ecdsa signature bruteforcing?"
                },
                {
                    "msgId": "6699",
                    "author": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
                    "time": <timestamp of this message>,
                    "merited": 0,
                    "message": "Stop wasting your time."
                }
            ]
        },
        {
            "topicId": "2222",
            "subject": "Test. Do not answer.",
            "op": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
            "time": <timestamp of the original post>,
            "messages": [
                {
                    "msgId": "8008",
                    "author": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
                    "time": <timestamp of this message (in this case the original post)>,
                    "merited": 3,
                    "message": "Testy test."
                }
            ]
        }
    ]
}


I didn't give any example of timestamp because I don't know what your time format is, but I think I'd prefer Unix time. Also note the redundancy: the topic's timestamp is the same as the timestamp on the first message of said topic. The topics inside each board are ordered chronologically (older first) and the messages inside each topic too.
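
The two properties described here (the redundant topic timestamp and the oldest-first ordering) are easy to check once a topic has been loaded. A small sketch, assuming the `time` fields end up as Unix timestamps (integers) and that the dictionary keys match the example format; the function name is made up:

```python
def check_topic(topic: dict) -> list:
    """Return a list of problems found in one topic dict (empty if none)."""
    times = [msg["time"] for msg in topic["messages"]]
    if not times:
        return ["topic has no messages"]

    problems = []
    if times != sorted(times):
        problems.append("messages are not in chronological order")
    if topic["time"] != times[0]:
        problems.append("topic time differs from first message time")
    return problems


if __name__ == "__main__":
    topic = {
        "topicId": "1111",
        "time": 1703348882,
        "messages": [
            {"msgId": "6666", "time": 1703348882},
            {"msgId": "6699", "time": 1703352207},
        ],
    }
    print(check_topic(topic))
```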

What do you think about this format?