Bitcoin Forum
July 17, 2024, 01:37:47 PM *
Author Topic: Bitcointalk Search Project  (Read 118 times)
NotATether (OP)
Legendary
*
Offline Offline

Activity: 1666
Merit: 7039


In memory of o_e_l_e_o


View Profile WWW
July 16, 2024, 08:01:35 AM
Merited by LoyceV (6), Cyrus (1), mocacinno (1), seoincorporation (1), ABCbits (1)
 #1

I am trying to make a search engine for Bitcointalk posts, since Google and the built-in one are so bad.

List all the features you want in a search engine here.

For now, I am scraping topics from the forum using my bot. I made sure to identify the requests as coming from me in my program so that the admins know where this traffic is coming from.

It doesn't look like it's exceeding the threshold of one request per second so that's good.

Private boards are not being scraped. The scraping is being done as a guest.
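
A minimal sketch of the polite-scraping setup described above: guest requests, an identifying User-Agent, and at most one request per second. The User-Agent string and the function names are hypothetical; the thread does not show the real bot's code.

```python
import time
import urllib.request

# Hypothetical identifying string; the real bot's User-Agent is not given in the thread.
USER_AGENT = "ExampleSearchBot/0.1 (contact: forum username here)"
MIN_INTERVAL = 1.0  # seconds between requests, per the threshold mentioned above


class Throttle:
    """Blocks so that successive calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def build_request(url):
    # Guest request: no cookies or login, only an identifying User-Agent header.
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, throttle):
    """Fetch one page as a guest, waiting first if the last request was too recent."""
    throttle.wait()
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        return resp.read()
```

Sharing one `Throttle(MIN_INTERVAL)` across every `fetch` call keeps the whole bot under the limit, not just each topic.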

ABCbits
Legendary
*
Offline Offline

Activity: 2940
Merit: 7666


Crypto Swap Exchange


View Profile
July 16, 2024, 08:46:29 AM
 #2

List all the features you want in a search engine here.

How about the features already available on https://ninjastic.space/search? Aside from that, I would suggest these features:
1. Sort by relevancy.
2. Show a message when the search keyword may contain a typo (such as suggesting "bitcoin" when someone enters "bitcon").
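
A minimal sketch of the typo suggestion, assuming the engine keeps a vocabulary of indexed terms. It uses `difflib.get_close_matches` from the Python standard library; the vocabulary contents and the 0.8 cutoff are illustrative assumptions.

```python
import difflib


def suggest(term, vocabulary, cutoff=0.8):
    """Return the closest known term, or None if the term is known or nothing is close."""
    term = term.lower()
    if term in vocabulary:
        return None
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Illustrative vocabulary; in practice it would come from the search index.
vocabulary = ["bitcoin", "wallet", "taproot", "mining"]
print(suggest("bitcon", vocabulary))
```

A production engine would more likely use the suggestion feature of its search backend, but the idea is the same: compare the query term against terms that actually occur in the index.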

NotATether (OP)
Legendary
*
Offline Offline

Activity: 1666
Merit: 7039


In memory of o_e_l_e_o


View Profile WWW
July 16, 2024, 09:26:57 AM
Merited by Cyrus (1)
 #3

List all the features you want in a search engine here.

How about the features already available on https://ninjastic.space/search? Aside from that, I would suggest these features:
1. Sort by relevancy.
2. Show a message when the search keyword may contain a typo (such as suggesting "bitcoin" when someone enters "bitcon").

Ninjastic is showing entire posts so it's impossible to find anything meaningful when you search for a keyword.

It needs to show only an excerpt with a link and title, like the forum search and Google do. It needs page numbers for browsing the results by page. And most importantly, it should not look inside quotes for keywords.
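
A minimal sketch of the first two requirements, excerpts and paged results. The function names, the excerpt width, and the page size are assumptions for illustration, not the project's actual code.

```python
def excerpt(text, keyword, width=60):
    """Return a short window of `text` around the first match of `keyword`."""
    pos = text.lower().find(keyword.lower())
    if pos == -1:
        # No match in the body: fall back to the start of the post.
        return text[: 2 * width].strip()
    start = max(0, pos - width)
    end = min(len(text), pos + len(keyword) + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix


def paginate(results, page, per_page=20):
    """Return the slice of `results` for a 1-based page number, plus the page count."""
    total_pages = max(1, -(-len(results) // per_page))  # ceiling division
    page = min(max(1, page), total_pages)  # clamp out-of-range page numbers
    first = (page - 1) * per_page
    return results[first : first + per_page], total_pages
```

Each result row would then carry the topic title, a link to the post, and `excerpt(body, keyword)` instead of the full post body.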

LoyceV
Legendary
*
Online Online

Activity: 3374
Merit: 17033


Thick-Skinned Gang Leader and Golden Feather 2021


View Profile WWW
July 16, 2024, 09:27:46 AM
Merited by Cyrus (1)
 #4

For now, I am scraping topics from the forum using my bot.
If it helps, I can give you a tar.gz copy of my data (note: some posts are missing). I shared it with Ninjastic years ago, and it would save you several months of scraping. A fresh scrape will get you more recent edits, though, and fewer deleted posts.

1. Sort by relevancy.
This would be the one thing I'd like to see, but also no doubt the most difficult one. Ninjastic often gives me a list of hundreds of posts. A good search engine (like Google 10 years ago) would show what I want to see first.
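
One common way to sort by relevancy is BM25 scoring, which rewards posts where the query terms are frequent and rare across the corpus, while discounting long posts. The sketch below is an illustration under that assumption; nothing in the thread says which ranking the project will use, and the parameter values are conventional defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices by descending BM25 score; non-matching docs are dropped."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores.append((score, i))
    # Highest score first; ties broken by original order.
    return [i for score, i in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

For a forum, the text score could be combined with other signals such as post date or merit, but that is a design choice beyond this sketch.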

ABCbits
Legendary
*
Offline Offline

Activity: 2940
Merit: 7666


Crypto Swap Exchange


View Profile
July 16, 2024, 10:19:21 AM
 #5

Ninjastic is showing entire posts so it's impossible to find anything meaningful when you search for a keyword.

Sorry for not being specific. I mean features such as the "Date Range (UTC)" filter, choosing one or more boards (optionally including their child boards), and operator support (+, -, | and "").
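
A minimal sketch of the date-range and board filters, including optional child boards. The `Post` fields and the board hierarchy mapping are hypothetical; the scraper's real schema is not shown in the thread.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Post:
    post_id: int
    board: str
    posted_at: datetime  # timezone-aware, UTC
    text: str


def with_children(boards, children):
    """Expand a set of board names with all of their descendants."""
    result, stack = set(), list(boards)
    while stack:
        board = stack.pop()
        if board not in result:
            result.add(board)
            stack.extend(children.get(board, ()))
    return result


def filter_posts(posts, boards=None, start=None, end=None):
    """Keep posts in any of `boards` (if given) and within [start, end] (if given)."""
    kept = []
    for post in posts:
        if boards is not None and post.board not in boards:
            continue
        if start is not None and post.posted_at < start:
            continue
        if end is not None and post.posted_at > end:
            continue
        kept.append(post)
    return kept
```

Storing timestamps in UTC, as the Ninjastic filter label suggests, avoids ambiguity when users in different time zones enter a date range.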

mocacinno
Legendary
*
Online Online

Activity: 3458
Merit: 5054


https://merel.mobi => buy facemasks with BTC/LTC


View Profile WWW
July 16, 2024, 10:49:26 AM
Merited by LoyceV (4)
 #6

If you are scraping and parsing anyway, it would be nice if your search engine indexed the most common board objects... For example, the username, DT rank, feedback, boards,... That way, you could use keywords, like you can in Google (filetype:, site:,...).

It would be nice if I could make a query like `user:Theymos board:Bitcoin\Project_Development +wallet -knots taproot` and only see posts made by Theymos in the Project Development board that contain the word wallet, do not contain the word knots, and hopefully contain the word taproot.
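
A minimal sketch of a parser for that query syntax. The `Query` structure and the set of recognised field names are assumptions; quoted phrases are supported, while the `|` operator is not handled here.

```python
import re
from dataclasses import dataclass, field

FIELDS = {"user", "board"}  # assumed field names, taken from the example query
TOKEN = re.compile(r'[+-]?"[^"]*"|\S+')  # an optionally signed "phrase", or one bare word


@dataclass
class Query:
    filters: dict = field(default_factory=dict)   # e.g. {"user": "Theymos"}
    required: list = field(default_factory=list)  # +term
    excluded: list = field(default_factory=list)  # -term
    optional: list = field(default_factory=list)  # bare term


def parse_query(text):
    query = Query()
    for token in TOKEN.findall(text):
        sign = ""
        if token[0] in "+-" and len(token) > 1:
            sign, token = token[0], token[1:]
        token = token.strip('"')
        if not token:
            continue
        name, sep, value = token.partition(":")
        if not sign and sep and value and name.lower() in FIELDS:
            query.filters[name.lower()] = value
        elif sign == "+":
            query.required.append(token.lower())
        elif sign == "-":
            query.excluded.append(token.lower())
        else:
            query.optional.append(token.lower())
    return query


def matches(query, text):
    """Crude substring check of the +required and -excluded terms against one post."""
    lowered = text.lower()
    return (all(term in lowered for term in query.required)
            and not any(term in lowered for term in query.excluded))
```

The optional terms do not filter anything; they would only feed the relevance score, which matches the "hopefully contained" wording above.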

NotATether (OP)
Legendary
*
Offline Offline

Activity: 1666
Merit: 7039


In memory of o_e_l_e_o


View Profile WWW
July 16, 2024, 12:03:39 PM
 #7

For now, I am scraping topics from the forum using my bot.
If it helps, I can give you a tar.gz copy of my data (note: some posts are missing). I shared it with Ninjastic years ago, and it would save you several months of scraping. A fresh scrape will get you more recent edits, though, and fewer deleted posts.

Sure, you can send me a copy by PM.

If you are scraping and parsing anyway, it would be nice if your search engine indexed the most common board objects... For example, the username, DT rank, feedback, boards,... That way, you could use keywords, like you can in Google (filetype:, site:,...).

It would be nice if I could make a query like `user:Theymos board:Bitcoin\Project_Development +wallet -knots taproot` and only see posts made by Theymos in the Project Development board that contain the word wallet, do not contain the word knots, and hopefully contain the word taproot.

I can't index DT information since it's invisible to guests and it changes too quickly anyway, but I'm already scraping the other stuff like boards, username (of course) etcetera. Even the user IDs are being scraped to help deal with name changes.

My bot can handle anonymous users too.
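
A minimal sketch of why scraping user IDs helps with name changes: posts are keyed on the stable ID, and every name seen for that ID is kept so that a search for an old name still finds the account. The class and method names are hypothetical.

```python
class UserDirectory:
    """Maps stable user IDs to the names seen for them, newest last."""

    def __init__(self):
        self.names_by_id = {}

    def record(self, user_id, name):
        """Remember `name` for `user_id` if it differs from the latest known name."""
        names = self.names_by_id.setdefault(user_id, [])
        if not names or names[-1] != name:
            names.append(name)

    def current_name(self, user_id):
        return self.names_by_id[user_id][-1]

    def ids_for_name(self, name):
        """All user IDs that have ever used `name` (case-insensitive)."""
        wanted = name.lower()
        return sorted(
            uid
            for uid, names in self.names_by_id.items()
            if any(n.lower() == wanted for n in names)
        )
```

A `user:` filter would then resolve the name to IDs first and filter posts by ID, so results survive a rename.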

seoincorporation
Legendary
*
Offline Offline

Activity: 3220
Merit: 3009



View Profile
July 16, 2024, 02:07:18 PM
 #8

If you are scraping and parsing anyway, it would be nice if your search engine indexed the most common board objects... For example, the username, DT rank, feedback, boards,... That way, you could use keywords, like you can in Google (filetype:, site:,...).

But wouldn't this be like rebuilding the full forum in a database?

I mean, there are 2 ways to do this:

1.- You take all the forum data and put it together in a database, and then your search engine queries that database. But for this, you will have to update the database live, or at least have a cron job add the new data at a regular interval.

2.- Search for the data directly on the site, but for that you would have to hack the current search engine in some way.

If you have another way in mind, I would love to know how it works.
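
A minimal sketch of the first option, assuming SQLite as the local store. The table layout and function names are hypothetical; the point is that each scheduled pass can insert new posts and overwrite edited ones without duplicating rows.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    post_id  INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL,
    author   TEXT NOT NULL,
    body     TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_posts(conn, posts):
    """Insert new posts and overwrite edited ones; safe to re-run on every pass."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO posts (post_id, topic_id, author, body) "
            "VALUES (?, ?, ?, ?)",
            posts,
        )


def newest_post_id(conn):
    """Highest post ID stored so far, so the next pass knows where to resume."""
    row = conn.execute("SELECT MAX(post_id) FROM posts").fetchone()
    return row[0] or 0


def search(conn, keyword):
    """Naive substring search; a real engine would use a full-text index instead."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT post_id FROM posts WHERE body LIKE ? ORDER BY post_id", (pattern,)
    )
    return [row[0] for row in rows]
```

A cron job would call the scraper for posts newer than `newest_post_id(conn)` and pass the results to `upsert_posts`.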

Vod
Legendary
*
Offline Offline

Activity: 3766
Merit: 3102


Licking my boob since 1970


View Profile WWW
July 16, 2024, 06:16:47 PM
 #9

it should not be looking inside quotes for keywords.

This is what I'm currently having issues with.  People break the BB quote code all the time in their posts.
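
One way to cope with broken quote tags is to track nesting depth instead of matching pairs, so a stray or missing tag does not break the whole post. The sketch below assumes raw BBCode is available; for scraped HTML the same depth counting would apply to the quote elements. Treating everything after an unclosed `[quote]` as quoted is a policy assumption, not something stated in the thread.

```python
import re

# Matches [quote], [quote author=... link=... date=...] and [/quote], case-insensitively.
QUOTE_TAG = re.compile(r"\[(/?)quote(?:[\s=][^\]]*)?\]", re.IGNORECASE)


def strip_quotes(bbcode):
    """Return the text outside [quote]...[/quote] blocks, tolerating unbalanced tags."""
    kept = []
    depth = 0
    pos = 0
    for match in QUOTE_TAG.finditer(bbcode):
        if depth == 0:
            kept.append(bbcode[pos:match.start()])
        if match.group(1):
            depth = max(0, depth - 1)  # a stray [/quote] is ignored
        else:
            depth += 1
        pos = match.end()
    if depth == 0:
        kept.append(bbcode[pos:])
    # If depth > 0 the last quote was never closed; the remainder is dropped as quoted.
    return "".join(kept)
```

Indexing only `strip_quotes(body)` keeps a keyword in a quoted post from matching every reply that quotes it.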

mocacinno
Legendary
*
Online Online

Activity: 3458
Merit: 5054


https://merel.mobi => buy facemasks with BTC/LTC


View Profile WWW
Today at 05:34:14 AM
 #10

If you are scraping and parsing anyway, it would be nice if your search engine indexed the most common board objects... For example, the username, DT rank, feedback, boards,... That way, you could use keywords, like you can in Google (filetype:, site:,...).

But wouldn't this be like rebuilding the full forum in a database?

I mean, there are 2 ways to do this:

1.- You take all the forum data and put it together in a database, and then your search engine queries that database. But for this, you will have to update the database live, or at least have a cron job add the new data at a regular interval.

2.- Search for the data directly on the site, but for that you would have to hack the current search engine in some way.

If you have another way in mind, I would love to know how it works.

It would be better if there was a way to improve SMF's search function or query the relational database directly, but I don't know if Theymos would give anybody direct access to the database or allow anybody to completely rework SMF's source code.
I don't think he would, and for good reason of course... It would require absolute trust in the person building the search engine.

But you're right, it would be completely rebuilding Bitcointalk's database, like several other members are doing as well (more or less)...

Whenever I see somebody building extensions or offsite tools, or proposing changes to SMF, I can't help but wonder how epochtalk is doing, and whether epochtalk would solve the problem without requiring browser plugins, offsite tools, scraping,... Don't get me wrong: the current forum software lacks several features, and I'm happy if somebody builds them (even if it's on a different domain, or requires me to install a browser plugin). I just wonder whether we'll ever switch to the new forum software.
