Bitcoin Forum
July 14, 2024, 09:22:09 PM *
Author Topic: Please help me to get details on boards, sub-boards of posts/topics  (Read 382 times)
tranthidung (OP)
Legendary
Activity: 2338
Merit: 4141
Farewell o_e_l_e_o
December 17, 2019, 01:44:08 AM
 #1

Hi,

I don't know how to find out which board or sub-board each post or topic belongs to. I know that once I have a board's or sub-board's number I can look up its name, but my problem is how to get those numbers.

For example, the Meta board has number 24: https://bitcointalk.org/index.php?action=post;board=24.0

I made this post in the Meta board, in the topic "Merit & new rank requirements":
https://bitcointalk.org/index.php?topic=2818350.msg53358425#msg53358425
The URL contains a number for the topic (2818350) and one for the post (53358425), but can you guide me on how to get the number of the Meta board (24) if that URL is all I have?

I asked this because if it is easy to do, I want to get statistics on merit distributions over boards, sub-boards from raw merit data, dumped by theymos.

If I can do this, I will make a thread like "Daily merits over local boards", but covering the whole merit history.

Thank you.

PrimeNumber7
Copper Member
Legendary
Activity: 1638
Merit: 1899
Amazon Prime Member #7
December 17, 2019, 02:57:10 AM
 #2

Are you trying to figure out what board a particular thread is in from visiting the thread? Or do you want to know based on other information?
tranthidung (OP)
December 17, 2019, 03:02:19 AM
 #3

Quote from: PrimeNumber7
Are you trying to figure out what board a particular thread is in from visiting the thread? Or do you want to know based on other information?
It is easy to visit one post to see which board / sub-board it was posted in, but it is a serious issue if you have hundreds or thousands of posts to check. I have to check them automatically by machine, not by hand.

I would like to do this using the topic or post numbers I already have (as I described in the OP).

PrimeNumber7
December 17, 2019, 03:23:24 AM
Merited by LoyceV (2), nc50lc (2), tranthidung (1), TheBeardedBaby (1)
 #4

I believe the topic number of this thread, 5210219, was assigned simply because the previously created thread was topic number 5210218. I don't believe it has anything to do with the thread being created in the Meta sub-board, and if the thread were moved to another sub-board, I don't believe the topic number would change.

If you visit a single post in a thread, you know which board every post in that thread is located in. So if you need to check for 100 posts in the same thread, you only need to visit one page in that thread. If you have a relational database, you can create a new table that has a list of each thread you are looking at, and record the board ID, and the topic number (along with any other information you need about the thread).
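
The lookup table described above could be sketched with Python's built-in sqlite3 module. The table name `topic_board`, its columns, and the helper functions are illustrative assumptions, not something taken from this thread. The point is only that a board is stored once per topic and then reused for every post in that topic.

```python
import sqlite3

def open_cache(path=":memory:"):
    """Open (or create) a small SQLite database that maps topics to boards."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topic_board ("
        " topic_id INTEGER PRIMARY KEY,"
        " board_id INTEGER NOT NULL)"
    )
    return conn

def remember_board(conn, topic_id, board_id):
    """Record the board of a topic; a later value overwrites the old one."""
    conn.execute(
        "INSERT OR REPLACE INTO topic_board (topic_id, board_id) VALUES (?, ?)",
        (topic_id, board_id),
    )
    conn.commit()

def lookup_board(conn, topic_id):
    """Return the stored board ID for a topic, or None if it is unknown."""
    row = conn.execute(
        "SELECT board_id FROM topic_board WHERE topic_id = ?", (topic_id,)
    ).fetchone()
    return row[0] if row else None

conn = open_cache()
remember_board(conn, 2818350, 24)   # topic and board number from the OP
print(lookup_board(conn, 2818350))  # 24
print(lookup_board(conn, 5210219))  # None: nothing stored for this topic yet
```

With a table like this, 100 merited posts in the same topic cost one page visit and 100 cheap lookups.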

If you are visiting a thread, you can tell your program to look at the following to get the board number the thread is on:
1. Find all the "div" tags with the class "nav" ("1st Query").
2. From the 1st Query, search for all the links ("2nd Query").
3. From the 2nd Query, search for all the URLs that contain "php?board=" ("3rd Query").
4. From the 3rd Query, isolate the board number for each link ("4th Query").
5. From the 4th Query, convert each result to an integer (if you haven't already), and find the highest result. This is the board number the thread is located in.
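
The queries above can be sketched with only the Python standard library. This is a rough sketch, not a tested scraper: the class name `NavBoardParser`, the function `board_of_thread`, and the sample HTML (with made-up board numbers 1, 9 and 77) are assumptions for illustration, and the real page markup may differ.

```python
import re
from html.parser import HTMLParser

class NavBoardParser(HTMLParser):
    """Collect board IDs from links inside <div class="nav"> elements."""

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one bool per open <div>: is it a "nav" div?
        self.board_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("nav" in classes)
        elif tag == "a" and any(self.div_stack):
            # Only links inside a nav div are considered.
            match = re.search(r"php\?board=(\d+)", attrs.get("href") or "")
            if match:
                self.board_ids.append(int(match.group(1)))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()

def board_of_thread(html):
    """Return the highest board ID found in the nav links, or None."""
    parser = NavBoardParser()
    parser.feed(html)
    return max(parser.board_ids) if parser.board_ids else None

# Hypothetical, simplified page fragment (board numbers are made up).
sample = (
    '<div class="nav"><a href="index.php">Forum</a> &gt; '
    '<a href="index.php?board=1.0">Parent board</a> &gt; '
    '<a href="index.php?board=9.0">Child board</a> &gt; '
    '<a href="index.php?topic=123.0">Some topic</a></div>'
    '<div class="post"><a href="index.php?board=77.0">unrelated link</a></div>'
)
print(board_of_thread(sample))  # 9
```

If child boards do not always have a higher number than their parents, taking the last board link in the breadcrumb (`parser.board_ids[-1]`) instead of the maximum may be the safer choice; that is an untested assumption about the page layout.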
tranthidung (OP)
December 17, 2019, 03:42:57 AM
 #5

Thanks, this looks like a good guide, but I will have to learn programming to do this. It will surely take time. :)

LoyceV
Legendary
Activity: 3374
Merit: 17018
Thick-Skinned Gang Leader and Golden Feather 2021
December 17, 2019, 07:36:25 AM
 #6

Quote from: tranthidung
I asked this because if it is easy to do, I want to get statistics on merit distributions over boards, sub-boards from raw merit data, dumped by theymos.
Doesn't DdmrDdmr have this data available?
I never kept track of all topic locations.

Last Friday, there were 60612 different topics that received Merit. That means (with 1 second delay) you can scrape the data you need within 24 hours.
I do have all titles for 177716 merited posts, but that's not going to help you here.
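
The 24-hour estimate above can be checked with a little arithmetic. In this small sketch the function name `scrape_hours` and the 0.4-second fetch time are assumptions for illustration; only the 60612 topics and the 1-second delay come from the post.

```python
def scrape_hours(topics, delay_seconds=1.0, fetch_seconds=0.0):
    """Estimated wall-clock hours to fetch `topics` pages one after another."""
    return topics * (delay_seconds + fetch_seconds) / 3600

# Delay only: 60612 seconds of waiting.
print(round(scrape_hours(60612), 1))                     # 16.8
# Delay plus an assumed 0.4 s per page download.
print(round(scrape_hours(60612, fetch_seconds=0.4), 1))  # 23.6
```

So the delay alone accounts for roughly 17 hours, which leaves some room for download time within a day.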

DdmrDdmr
Legendary
Activity: 2380
Merit: 10876
There are lies, damned lies and statistics. MTwain
December 17, 2019, 07:43:29 AM
Last edit: December 17, 2019, 07:58:40 AM by DdmrDdmr
 #7

<...>
Basically you need to derive that information not from the message Id information (https://bitcointalk.org/index.php?topic=2818350.msg53358425#msg53358425), but from the page itself where the message is displayed (Bitcoin Forum > Other > Meta > Merit & new rank requirements).

The path needs to be parsed, and constitutes a text based solution (not an Id based one). On top of that, you need to do some cleansing when the moderators are included in the path (e.g. https://bitcointalk.org/index.php?topic=5103501.msg49500922#msg49500922 has as a path Bitcoin Forum > Economy > Marketplace > Goods > Collectibles (Moderators: malevolent, Cyrus, hilariousandco) > [WTS] Old peseta coins and few 1800 coins. Whole lot 500 eur now.), or when a title includes a “>” character (which is not a sublevel).
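
The cleansing described above might look roughly like the following. This is only a sketch under assumptions: the function name `split_path`, the regular expression, and the idea of passing in an already known topic title are mine, not the method used for the Dashboard.

```python
import re

# Matches " (Moderators: a, b, c)" (or the singular form) after a board name.
MODERATORS = re.compile(r"\s*\(Moderators?: [^)]*\)")

def split_path(path, title=None):
    """Split a breadcrumb string into its board levels.

    If the topic title is known, it is cut off first, so that a ">"
    inside the title is never mistaken for a level separator.
    """
    if title is not None and path.endswith(title):
        path = path[: -len(title)]
    levels = [MODERATORS.sub("", part).strip() for part in path.split(" > ")]
    return [level for level in levels if level]

path = (
    "Bitcoin Forum > Economy > Marketplace > Goods > Collectibles "
    "(Moderators: malevolent, Cyrus, hilariousandco) > "
    "[WTS] Old peseta coins and few 1800 coins. Whole lot 500 eur now."
)
title = "[WTS] Old peseta coins and few 1800 coins. Whole lot 500 eur now."
print(split_path(path, title))
# ['Bitcoin Forum', 'Economy', 'Marketplace', 'Goods', 'Collectibles']
```

Without a known title, the last element of the result is the title itself, and a " > " inside it would still be split; working from the link URLs instead of the text avoids that problem.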

It's what I do for the Dashboard, but there are some issues and I do not work with all path levels.

As I said, the output I create is text based, not Id based.

Something like this :
https://docs.google.com/spreadsheets/d/1hnuC0EadNbxm4gcK7GOTobCg-IWsOCAw1UCjunuIQlY/edit?usp=sharing

I only cleanse three levels in the path. The fourth is useful for reaching child boards, but I only cleanse it for the Spanish local board (I've omitted levels 4..10 since they do not fit, and are not cleansed homogeneously).

Every now and then, the data should be regenerated retrospectively to cover posts being deleted and moved. It's a bit of a p.i.t.a. so I only do it every few months or so.

LoyceV
December 17, 2019, 07:49:22 AM
 #8

Quote from: DdmrDdmr
The path needs to be parsed, and constitutes a text based solution (not an Id based one). On top of that, you need to do some cleansing when the moderators are included in the path
Without much new work (only a lot of scraping), I can get OP a list like this one (but with the topic/msg-ID instead of a Merit count).
@tranthidung: can you work with that?

tranthidung (OP)
December 17, 2019, 09:45:32 AM
Last edit: December 17, 2019, 10:00:25 AM by tranthidung
 #9

Quote from: LoyceV
Without much new work (only a lot of scraping), I can get OP a list like this one (but with the topic/msg-ID instead of a Merit count).
@tranthidung: can you work with that?
It is not perfect for me, but it looks cool and you already have the data available. That is a plus point, for sure.
But it looks a little bit messy and I cannot use it.

Example:
Quote
65 Merit earned in Bitcoin Forum > Other > Beginners & Help
1 Merit earned in Bitcoin Forum > Alternate cryptocurrencies > Marketplace (Altcoins) > Service Discussion (Altcoins)
Those two lines have very different formats (different hierarchical formats, I meant).

In addition, I am not sure which type of data you have:
  [1] You directly scraped the boards'/sub-boards' names (e.g. Beginners & Help, Meta).
  [2] You scraped the boards'/sub-boards' ID numbers, from which you derive their names.
If what you did is [1], you would have to do more work to help me.
I only need the sub-board (if a post was made in a sub-board), from which I can trace back to the post's main board.
In this format:
Code:
amount board/subboard_id 
65 39
1 198
39 is for Beginners & help
198 is for Service Discussion (Altcoins)

If what you did is [2], it is perfect because you can easily help me with your available data.

I don't know which format you can dump in, if you are going to help me.
This is theymos's merit data dump:
Quote
time    amount    msg    user_from    user_to
1576204400   2   5209104.msg53329921   18321   307884

This is what I need:
Code:
time          amount   board/subboard_idnumber   user_from   user_to
1576204400    2        24                        18321       307884
The fields can be separated by a colon, comma, tab or space; any of these is fine for me.
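
The conversion between the two formats above could be sketched as follows. The helper names and the `topic_to_board` lookup table are assumptions; the single entry 5209104 → 24 simply mirrors the example rows and would in practice come from scraping.

```python
def topic_id_of(msg):
    """'5209104.msg53329921' -> 5209104 (the part before '.msg')."""
    return int(msg.split(".msg")[0])

def convert_row(line, topic_to_board):
    """Rewrite one dump row, replacing the msg field with a board ID."""
    time, amount, msg, user_from, user_to = line.split()
    # Topics missing from the lookup table are marked instead of dropped.
    board = topic_to_board.get(topic_id_of(msg), "unknown")
    return "\t".join([time, amount, str(board), user_from, user_to])

topic_to_board = {5209104: 24}  # assumed lookup table, filled by scraping
row = "1576204400   2   5209104.msg53329921   18321   307884"
print(convert_row(row, topic_to_board))
```

The output is tab-separated here, but any of the separators mentioned above would work by changing the string passed to `join`.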

If you have id numbers of boards/ subboards, please give me a dump too. According to https://bitcointalk.org/index.php?action=stats, there are 252 boards at the moment.
If you don't have that list, I am going to get it myself. :)

In short, it is better if you give me a data dump with numbers rather than text, because I can only directly import numeric data from your club (it takes me only around 15-20 seconds). With text, I would have to copy and paste the data and do some other steps, and I cannot even copy and paste it (the page does not fully load in my browser, as I discussed with you previously).


< ... >
Thanks for your informative explanations, but I do not yet have the knowledge and skills to scrape data (by any method). I will try to learn. :)

LoyceV
December 17, 2019, 07:02:25 PM
 #10

Quote from: tranthidung
Those two lines have very different formats (different hierarchical formats, I meant).
It's just taken from the HTML links on top of each page. Different child boards have different "depths", that's what makes it "messy".

Quote
This one is what I need.
Code:
time                amount    board/subboard_idnumber    user_from    user_to 
1576204400      2    24                                  18321   307884
How about just the topicID (say: "5209104" taken from the msgID: "5209104.msg53329921") and the board? Is that enough to combine with my merit.all.txt ?
That shouldn't be too hard to scrape.

Quote
If you have id numbers of boards/ subboards, please give me a dump too. According to https://bitcointalk.org/index.php?action=stats, there are 252 boards at the moment.
If you don't have that list, I am going to get it myself. :)
See Ignore Boards Preferences, then view the page source.

Quote
Thanks for your informative explanations, but I do not yet have the knowledge and skills to scrape data (by any method). I will try to learn. :)
This can literally be done in one (long) line of code :)

PrimeNumber7
December 17, 2019, 07:16:55 PM
Merited by tranthidung (1)
 #11

You will need to run a for loop through the links where the boards are to get the board number.
tranthidung (OP)
December 18, 2019, 12:54:08 AM
 #12

Quote from: LoyceV
Those two lines have very different formats (different hierarchical formats, I meant).
It's just taken from the HTML links on top of each page. Different child boards have different "depths", that's what makes it "messy".

Quote
This one is what I need.
Code:
time                amount    board/subboard_idnumber    user_from    user_to 
1576204400      2    24                                  18321   307884
How about just the topicID (say: "5209104" taken from the msgID: "5209104.msg53329921") and the board? Is that enough to combine with my merit.all.txt ?
Yes, I just need at least one shared variable in the two datasets to merge them together, but what did you mean by 'the board'?
Is it the board's name or the board's ID number? I can use either, but it is more convenient for me if you have the board's ID number.

If you don't have it available, I will do it myself; it's my turn. And if you only have the board's name, please put it in the last column (last variable).
Quote
See Ignore Boards Preferences, then view the page source.
It is helpful.
Quote
This can literally be done in one (long) line of code :)
For someone who knows how, it is easy, but for those who don't, it is a challenge. :D

This is another thing I don't know. I mean, I use nearly the same ideas for my statistical work, but with computer programming I have to learn from scratch. Thanks.

LoyceV
December 18, 2019, 08:49:34 AM
 #13

Quote from: tranthidung
Yes, I just need at least one shared variable in the two datasets to merge them together, but what did you mean by 'the board'?
Is it the board's name or the board's ID number? I can use either, but it is more convenient for me if you have the board's ID number.
How's this?
topicID:boardID
Code:
1:offlimits
5:1
6:1
7:1
8:1
9:1
12:6
13:1
15:223
16:1
20:5
22:6
30:5
34:1
41:1
I'm currently not scraping Investigations as it's a hidden board. If you really want data on that board too, I'll have to scrape with an account that logs in, but I prefer not to.
If the above list works for you, I'll run it. It'll take a day to complete.
I could also wait until next Friday's Merit data dump is included.
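
A list in the topicID:boardID format above can be combined with the merit dump along these lines. This is a sketch under assumptions: the function names are illustrative, and the merit rows in the example are invented purely to show the flow. Only the mapping lines and the dump's column order come from this thread.

```python
from collections import Counter

def load_topic_boards(lines):
    """Parse 'topicID:boardID' lines into a dict, skipping off-limits topics."""
    mapping = {}
    for line in lines:
        topic, _, board = line.strip().partition(":")
        if board.isdigit():  # 'offlimits' and blank lines are skipped
            mapping[int(topic)] = int(board)
    return mapping

def merit_per_board(merit_rows, topic_to_board):
    """Sum the merit amount per board from rows in the dump's column order."""
    totals = Counter()
    for row in merit_rows:
        _time, amount, msg, _user_from, _user_to = row.split()
        board = topic_to_board.get(int(msg.split(".msg")[0]))
        if board is not None:  # topics without a known board are ignored
            totals[board] += int(amount)
    return totals

mapping = load_topic_boards(["1:offlimits", "5:1", "12:6", "15:223"])

# Invented rows (time, amount, msg, user_from, user_to) for illustration only.
rows = [
    "1576204400 2 5.msg100 11 22",
    "1576204401 1 12.msg200 33 44",
    "1576204402 3 5.msg101 55 66",
    "1576204403 1 1.msg1 77 88",   # topic 1 is off-limits, so it is ignored
]
print(dict(merit_per_board(rows, mapping)))  # {1: 5, 6: 1}
```

The resulting totals per board ID can then be matched against a list of board names.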

tranthidung (OP)
December 18, 2019, 09:02:15 AM
 #14

It works, but honestly I don't think I should ask you to run your computer for a whole day just to do this. It sounds crazy, but now I understand why you sometimes declined when I asked for your help.

I don't need the off-limits boards.

LoyceV
December 18, 2019, 09:03:19 AM
 #15

Quote from: tranthidung
It works, but honestly I don't think I should ask you to run your computer for a whole day just to do this. It sounds crazy.
It's a $2/year VPS that I can't use for anything else because it has only 128 MB of RAM :) Up to you if you want the data :)

tranthidung (OP)
December 18, 2019, 09:05:55 AM
 #16

Quote from: LoyceV
It's a $2/year VPS :) Up to you if you want the data :)
Cool.

If only the first scraping round (next Friday) takes a long time (one day), and the second and later rounds take less time (as I guess), please do it. :D

I already have the list of boards and sub-boards (some local sub-boards are not listed).

For this analysis, I think I am going to make updates monthly or quarterly.

LoyceV
December 18, 2019, 09:14:33 AM
 #17

It's running now. Results tomorrow :)

Quote from: tranthidung
If only the first scraping round (next Friday) takes a long time (one day), and the second and later rounds take less time (as I guess), please do it. :D
Wait.. You want weekly updates on this too?

tranthidung (OP)
December 18, 2019, 09:18:36 AM
 #18

Quote from: LoyceV
It's running now. Results tomorrow :)
Thanks.
Quote
Wait.. You want weekly updates on this too?
It depends on you. If you can dump the data weekly, I will definitely make weekly updates too. ;D

I just think such big stats won't change much from week to week.

LoyceV
December 18, 2019, 09:23:25 AM
Merited by tranthidung (1)
 #19

Quote from: tranthidung
It depends on you. If you can dump the data weekly, I will definitely make weekly updates too. ;D

I just think such big stats won't change much from week to week.
How about you ask me for an update every six months or so?

tranthidung (OP)
December 18, 2019, 09:25:16 AM
 #20

Quote from: LoyceV
How about you ask me for an update every six months or so?
I agree with the six-month updates. Thank you. :)
