Bitcoin Forum
May 22, 2024, 02:07:48 PM *
Author Topic: Forum BBCode to HTML script development  (Read 196 times)
NotATether (OP)
Legendary
Activity: 1610
Merit: 6753

bitcoincleanup.com / bitmixlist.org
December 07, 2020, 10:58:15 PM
 #1

I am trying to make a Python script that can convert any bitcointalk post into the equivalent in HTML. This will allow you to naturally embed them in web pages for example. To do this I have to write a BBCode parser that outputs an HTML tag for each BBCode tag, and also handle all tags specific to Bitcointalk. Since this is tantamount to writing a state machine for an entire language, a daunting task, I have decided to look for some existing program to base my work on rather than write it from scratch. A fully-working program suitable for BTT does not exist as far as I know.

https://github.com/chaomodus/ppcode This is someone's bare-bones implementation of a BBCode state machine. It needs a lot of work: handling url= and img= tags, recognizing "quote", being made case-insensitive, and handling all the other smileys, not just the smiley face. But I think it will be worth it in the long run if I manage to build this. I forked it at https://github.com/ZenulAbidin/ppcode if you want to track its progress.
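
As a rough illustration of two items on that to-do list (url= handling and case insensitivity), here is a minimal Python sketch. It is not taken from ppcode: the names URL_RE, IMG_RE and convert_links are made up for this example, and it assumes the input has already been HTML-escaped and that [img] carries no extra parameters.

```python
import html
import re

# Hypothetical patterns. re.IGNORECASE makes [URL=...] and [url=...] equivalent,
# and re.DOTALL lets the tag body span several lines.
URL_RE = re.compile(r"\[url=([^\]]+)\](.*?)\[/url\]", re.IGNORECASE | re.DOTALL)
IMG_RE = re.compile(r"\[img\](.*?)\[/img\]", re.IGNORECASE | re.DOTALL)


def convert_links(text):
    """Convert [url=...] and [img] tags in text that is already HTML-escaped."""
    text = URL_RE.sub(
        lambda m: '<a href="{}">{}</a>'.format(m.group(1), m.group(2)), text
    )
    text = IMG_RE.sub(
        lambda m: '<img src="{}" alt="">'.format(m.group(1)), text
    )
    return text


# Escape first so that quotes or angle brackets in the post cannot break the HTML.
escaped = html.escape("[URL=https://bitcointalk.org]forum[/URL]")
print(convert_links(escaped))
```

A regex like this does not validate the URL scheme, so a real converter would probably want to reject things like javascript: links.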

bitmover
Legendary
Activity: 2310
Merit: 5957

bitcoindata.science
December 08, 2020, 02:01:14 AM
 #2

Quote from: NotATether
I am trying to make a Python script that can convert any bitcointalk post into the equivalent in HTML.

This looks simple to do, just using replace.

Code:
[b] I am bold[/b]

into this:

Code:
<strong> I am bold </strong>

You could also use <b>; it is deprecated but still works.
In most cases I think replacing [ with < will do the job.
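
For the simple paired tags, the replace idea could look something like the sketch below. SIMPLE_TAGS and naive_convert are hypothetical names, and the table only covers a few tags with no parameters; it is a literal, case-sensitive substitution and nothing more.

```python
# Hypothetical lookup table of simple paired tags; not an exhaustive list.
SIMPLE_TAGS = {
    "[b]": "<strong>", "[/b]": "</strong>",
    "[i]": "<em>", "[/i]": "</em>",
    "[u]": "<u>", "[/u]": "</u>",
}


def naive_convert(text):
    """Replace each known BBCode tag with its HTML counterpart, literally."""
    for bbcode_tag, html_tag in SIMPLE_TAGS.items():
        text = text.replace(bbcode_tag, html_tag)
    return text


print(naive_convert("[b]I am bold[/b]"))  # <strong>I am bold</strong>
```

Because the match is literal, upper-case tags such as [B] and tags with parameters are left untouched by this version.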

NotATether (OP)
Legendary
Activity: 1610
Merit: 6753

bitcoincleanup.com / bitmixlist.org
December 08, 2020, 08:26:09 AM
Merited by bitmover (1)
 #3

Quote from: NotATether
I am trying to make a Python script that can convert any bitcointalk post into the equivalent in HTML.

Quote from: bitmover
This looks simple to do, just using replace.

Code:
[b] I am bold[/b]

into this:

Code:
<strong> I am bold </strong>

You could also use <b>; it is deprecated but still works.
In most cases I think replacing [ with < will do the job.

It's not as simple as you think. I can't simply find and replace all instances of a BBCode tag like [ b] with <b>. First, those tags can be inside a code block, which is not supposed to be altered. Second, this doesn't work for all tags: things like [ center] and [ color] turn into CSS in the opening tag that must not appear in the closing tag, so a plain find and replace won't work here. Third, some BBCode tags do not come in pairs and only exist as a single tag, like [ hr] and [ btc]; the first needs to be converted into <hr> and the other into the Unicode character for bitcoin. That's why I need to use a state machine. Or more accurately, someone else's state machine.
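
To make the three points concrete, here is a minimal sketch of a scanner that tracks whether it is inside a code block. It is not ppcode or SMF code: TAG_RE, PAIRED, STANDALONE and convert are names invented for this example, the tag tables are deliberately tiny, the HTML chosen for [ center] and [ color] is just one plausible mapping, and U+20BF is assumed to be the intended bitcoin character.

```python
import html
import re

# Matches [tag], [/tag] and [tag=argument], in any letter case.
TAG_RE = re.compile(r"\[(/?)([a-z]+)(?:=([^\]]+))?\]", re.IGNORECASE)

# Illustrative tables only; the real forum supports many more tags.
PAIRED = {
    "b": ("<strong>", "</strong>"),
    "i": ("<em>", "</em>"),
    "center": ('<div style="text-align: center">', "</div>"),
    "color": ('<span style="color: {arg}">', "</span>"),
}
STANDALONE = {"hr": "<hr>", "btc": "\u20bf"}  # assumed: U+20BF BITCOIN SIGN


def convert(text):
    """Convert a small BBCode subset to HTML, leaving [code] bodies untouched."""
    out = []
    pos = 0          # start of the text that has not been emitted yet
    in_code = False  # the only piece of state: are we inside [code]...[/code]?
    for m in TAG_RE.finditer(text):
        closing, name, arg = m.group(1), m.group(2).lower(), m.group(3)
        if in_code:
            if closing and name == "code":
                # Emit the whole code body verbatim (escaped), tags included.
                out.append(html.escape(text[pos:m.start()]))
                out.append("</code></pre>")
                pos = m.end()
                in_code = False
            continue  # any other tag inside a code block is plain text
        out.append(html.escape(text[pos:m.start()]))
        pos = m.end()
        if name == "code" and not closing:
            out.append("<pre><code>")
            in_code = True
        elif name in STANDALONE and not closing:
            out.append(STANDALONE[name])
        elif name in PAIRED:
            open_html, close_html = PAIRED[name]
            # The argument (e.g. the colour) only ever goes into the opening tag.
            out.append(close_html if closing
                       else open_html.format(arg=html.escape(arg or "")))
        else:
            out.append(html.escape(m.group(0)))  # unknown tag: keep as text
    out.append(html.escape(text[pos:]))
    if in_code:
        out.append("</code></pre>")  # close an unterminated code block
    return "".join(out)


print(convert("[b]bold[/b] [code][b]not bold[/b][/code]"))
```

It does not check that tags are properly nested or closed, and the colour argument is not validated as CSS; a fuller state machine with a tag stack would have to handle both.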

BlackHatCoiner
Legendary
Activity: 1526
Merit: 7408

Farewell, Leo
December 08, 2020, 10:38:25 AM
Merited by bitmover (2), NotATether (1)
 #4

Then why don't you look at the SMF code? The forum software is open source and, if I'm not mistaken, written in PHP. Also, I think that Forumotion has done what you describe, but not in Python. Technically, it works: you write your BBCode, copy it, and paste it into an HTML form (admin panel), and it is pasted correctly. You can check their forum here: help.forumotion.com (or create your own Forumotion forum, just to check the admin panel).

bitmover
Legendary
Activity: 2310
Merit: 5957

bitcoindata.science
December 08, 2020, 10:58:37 AM
 #5

I did a quick search and found these:

http://www.bbcode-to-html.com/
https://www.browserling.com/tools/bbcode-to-html

Those are JavaScript implementations, I guess. I found some of the source code here:
https://www.browserling.com/js/tools/xbbcode.js

I also think JavaScript would be easier to distribute than Python, as you could make an HTML page that does the conversion, just like those two.

From a brief look at the code, he built dictionaries (or a JSON-like object structure, in JS terms) with all the possible tags, like this:

Code:
    tags = {
        "b": {
            openTag: function(params,content) {
                return '<b>';
            },
            closeTag: function(params,content) {
                return '</b>';
            }
        },
        "center": {
            openTag: function(params,content) {
                return '<center>';
            },
            closeTag: function(params,content) {
                return '</center>';
            }
        }
        // ... remaining tags omitted from this excerpt
    };

Then he replaces them later on, using that "tags" dictionary in some functions.
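
A Python analogue of that structure might look like the following. This is a sketch, not a port of xbbcode.js: TAGS and render_tag are hypothetical names, and the "color" entry is an added assumption to show why each entry is a function that receives the tag's parameters.

```python
# Each entry maps a tag name to two callables that build the opening and
# closing HTML. "b" and "center" mirror the excerpt; "color" is an assumption.
TAGS = {
    "b": {
        "open_tag": lambda params: "<b>",
        "close_tag": lambda params: "</b>",
    },
    "center": {
        "open_tag": lambda params: "<center>",
        "close_tag": lambda params: "</center>",
    },
    "color": {
        "open_tag": lambda params: '<span style="color: {}">'.format(params),
        "close_tag": lambda params: "</span>",
    },
}


def render_tag(name, params="", closing=False):
    """Look up a tag (case-insensitively) and return its HTML, or None if unknown."""
    entry = TAGS.get(name.lower())
    if entry is None:
        return None
    key = "close_tag" if closing else "open_tag"
    return entry[key](params)


print(render_tag("color", "red"))         # opening tag uses the parameter
print(render_tag("color", closing=True))  # closing tag ignores it
```

Keeping the handlers in a table means that supporting a forum-specific tag is a matter of adding one more entry rather than touching the parsing code.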

NotATether (OP)
Legendary
Activity: 1610
Merit: 6753

bitcoincleanup.com / bitmixlist.org
December 08, 2020, 11:36:37 AM
 #6

Quote from: BlackHatCoiner
Then why don't you look at the SMF code? The forum software is open source and, if I'm not mistaken, written in PHP.

That is a good idea. I found the source code for the SMF version Bitcointalk runs on (1.1.19, as stated at the bottom of this thread) at https://download.simplemachines.org/index.php?archive;b=3;v=85. Now I just need to track down all the additions that theymos made to the BBCode.

Quote from: BlackHatCoiner
Also, I think that Forumotion has done what you describe, but not in Python. Technically, it works: you write your BBCode, copy it, and paste it into an HTML form (admin panel), and it is pasted correctly. You can check their forum here: help.forumotion.com (or create your own Forumotion forum, just to check the admin panel).

Quote from: bitmover
I did a quick search and found these:

http://www.bbcode-to-html.com/
https://www.browserling.com/tools/bbcode-to-html

Those are JavaScript implementations, I guess. I found some of the source code here:
https://www.browserling.com/js/tools/xbbcode.js

I also think JavaScript would be easier to distribute than Python, as you could make an HTML page that does the conversion, just like those two.

The problem with using an online tool is that it won't recognize any of theymos' added bbcode such as [ btc]. I'm not sure if that is the only one, but I would prefer the translation to be completely automated without having to manually replace tags.

bitmover
Legendary
Activity: 2310
Merit: 5957

bitcoindata.science
December 08, 2020, 11:44:47 AM
Merited by NotATether (1)
 #7

Quote from: NotATether
The problem with using an online tool is that it won't recognize any of theymos' added bbcode such as [ btc]. I'm not sure if that is the only one, but I would prefer the translation to be completely automated without having to manually replace tags.

I suggested that you use those online tools as a reference for your new program.
Then you could add the new forum-specific rules (and remove others that are not supported here).

For this specific case of [ btc], it is simple to replace it with <span class="BTC">BTC</span>.

I took a look at this class, and it renders the letter B in a custom font:
http://bitcointalk.org/Themes/BTC.ttf
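
In Python, that replacement could be sketched like this, assuming the tag should match in any letter case. BTC_RE and replace_btc are hypothetical names, and the span only shows the bitcoin symbol if the embedding page also loads the BTC font and defines a matching "BTC" CSS class.

```python
import re

# Hypothetical pattern: matches [btc], [BTC], [Btc], ...
BTC_RE = re.compile(r"\[btc\]", re.IGNORECASE)


def replace_btc(text):
    """Swap every [btc] tag for the span used in the forum's own markup."""
    return BTC_RE.sub('<span class="BTC">BTC</span>', text)


print(replace_btc("Price: 1 [BTC]"))
```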

Maus0728
Legendary
Activity: 1918
Merit: 1577

Bitcoin Casino Est. 2013
December 08, 2020, 02:16:06 PM
Merited by bitmover (1), NotATether (1)
 #8

Quote from: NotATether
I am trying to make a Python script that can convert any bitcointalk post into the equivalent in HTML. This will allow you to naturally embed them in web pages for example. To do this I have to write a BBCode parser that outputs an HTML tag for each BBCode tag, and also handle all tags specific to Bitcointalk. Since this is tantamount to writing a state machine for an entire language, a daunting task, I have decided to look for some existing program to base my work on rather than write it from scratch. A fully-working program suitable for BTT does not exist as far as I know.

https://github.com/chaomodus/ppcode This is someone's bare-bones implementation of a BBCode state machine. It needs a lot of work: handling url= and img= tags, recognizing "quote", being made case-insensitive, and handling all the other smileys, not just the smiley face. But I think it will be worth it in the long run if I manage to build this. I forked it at https://github.com/ZenulAbidin/ppcode if you want to track its progress.

Have you tried TryNinja's API? If you read its documentation [1], you will see that it already scrapes all of Bitcointalk's posts along with the relevant metadata and converts the content to HTML. The documentation also shows how to use the API from Python scripts. If you are building a program that shows the parsed content as embeddable HTML, you can simply iterate over the response and read the 'content' key of each entry in the JSON returned by the API. Here's an example:


Code:
{
  "result": "success",
  "message": null,
  "data": [
    {
      "post_id": 55763102,
      "topic_id": 5295719,
      "author": "Maus0728",
      "author_uid": 1289002,
      "title": "Re: 2 new Metamask phishing site thru Google Ads",
      "content": "I don&apos;t know if everyone practices installing &quot;uBlock Origin&quot; as one of their browser add-ons. Though I am fully aware that this is not an ad-blocker, however, based on my experience it can effectively help solve these kinds of phishing attempts in a form of ads that is not carefully filtered by Google <img src=\"https://bitcointalk.org/Smileys/default/rolleyes.gif\" alt=\"Roll Eyes\" border=\"0\">. Been using their services for quite some time and fortunately, I never encountered such scam attempts up to this date.<br><br>[1] <a class=\"ul\" href=\"https://ublockorigin.com/\">https://ublockorigin.com/</a><br><br>",
      "date": "2020-12-06T05:14:44.000Z",
      "board_id": 39,
      "board_name": "Beginners & Help",
      "archive": false,
      "created_at": "2020-12-06T05:14:47.342Z",
      "updated_at": "2020-12-06T05:14:50.066Z"
    }
  ]
}

from: https://api.ninjastic.space/posts/55763102

As you can see, the content is already an HTML snippet. If you are building a web app that embeds posts, his API is worth checking for an easier job.



[1] - https://docs.ninjastic.space/
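
Reading the 'content' key could be sketched in Python roughly as follows, using only the standard library. The URL pattern and field names come from the example response above; extract_contents and fetch_post_html are hypothetical helper names, and the error handling is an assumption about how a failed response looks.

```python
import json
import urllib.request

# URL pattern taken from the example link; the placeholder name is ours.
API_URL = "https://api.ninjastic.space/posts/{post_id}"


def extract_contents(payload):
    """Return the HTML 'content' of every post in a decoded API response."""
    # Assumption: any value other than "success" means the request failed.
    if payload.get("result") != "success":
        raise ValueError("API reported failure: {}".format(payload.get("message")))
    return [post["content"] for post in payload.get("data", [])]


def fetch_post_html(post_id, timeout=10):
    """Download one post by ID and return its HTML snippets (needs network access)."""
    url = API_URL.format(post_id=post_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return extract_contents(payload)


# Offline demonstration with a trimmed-down response of the same shape.
sample = json.loads(
    '{"result": "success", "message": null, '
    '"data": [{"post_id": 55763102, "content": "Hello<br>"}]}'
)
print(extract_contents(sample))
```

Calling fetch_post_html(55763102) would request the post shown in the example, provided the API is still reachable at that address.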

NotATether (OP)
Legendary
Activity: 1610
Merit: 6753

bitcoincleanup.com / bitmixlist.org
December 08, 2020, 10:40:14 PM
 #9

Quote from: Maus0728
Have you tried TryNinja's API? If you read its documentation [1], you will see that it already scrapes all of Bitcointalk's posts along with the relevant metadata and converts the content to HTML. The documentation also shows how to use the API from Python scripts. If you are building a program that shows the parsed content as embeddable HTML, you can simply iterate over the response and read the 'content' key of each entry in the JSON returned by the API.

Good lord, TryNinja deserves money for this. It translated everything in my test post perfectly.

I'll see if I can get permission to reuse his source code that translates BBCode like that for my own (commercial) use, as I will be putting some good posts on my new domain and running ads on the side. So it's not exactly for personal use.

Maus0728
Legendary
Activity: 1918
Merit: 1577

Bitcoin Casino Est. 2013
December 09, 2020, 02:31:48 PM
 #10

Quote from: Maus0728
Have you tried TryNinja's API? If you read its documentation [1], you will see that it already scrapes all of Bitcointalk's posts along with the relevant metadata and converts the content to HTML. The documentation also shows how to use the API from Python scripts. If you are building a program that shows the parsed content as embeddable HTML, you can simply iterate over the response and read the 'content' key of each entry in the JSON returned by the API.

Quote from: NotATether
Good lord, TryNinja deserves money for this. It translated everything in my test post perfectly.

I'll see if I can get permission to reuse his source code that translates BBCode like that for my own (commercial) use, as I will be putting some good posts on my new domain and running ads on the side. So it's not exactly for personal use.

Well, if you want to write another Python or JS program that scrapes particular data from this forum, you could try cheerio (JS) or BeautifulSoup: make a GET request that fetches a post by ID, then turn it into an embeddable HTML snippet. But honestly, the API gets you there with less work, and it is free to use, both commercially and personally (the API requires no token and has no rate limiter).

Meanwhile, in Python there is a library called bbcode (version 1.1.0). I suggest you write a function that accepts a BBCode-formatted string, uses the library to process it, and returns the result in a variable that you can later use either to display the embedded HTML or to show the raw HTML code. In JavaScript I guess it would be harder, as you would have to map over all of the parsed text and use regex to replace certain tags.

I don't know if I'm fully right, but I hope my knowledge was helpful.

Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines