Bitcoin Forum
July 16, 2024, 05:44:51 PM *
Author Topic: Bitcointalk Search Project  (Read 76 times)
NotATether (OP)
Legendary

Activity: 1666
Merit: 7034


In memory of o_e_l_e_o


Today at 08:01:35 AM
Merited by LoyceV (6), mocacinno (1), ABCbits (1)
 #1

I am trying to make a search engine for Bitcointalk posts, since Google and the built-in one are so bad.

List all the features you want in a search engine here.

For now, I am scraping topics from the forum using my bot. I made sure to identify the requests as coming from me in my program so that the admins know where this traffic is coming from.

It doesn't look like it's exceeding the threshold of one request per second so that's good.

Private boards are not being scraped. The scraping is being done as a guest.
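
A minimal Python sketch of the two courtesies described above: spacing requests at least one second apart, and identifying the bot in every request. The User-Agent string, contact address, and function names are hypothetical; the bot's real code is not shown in this thread.

```python
import time
import urllib.request

# Hypothetical values: the bot's real User-Agent and contact are not given here.
USER_AGENT = "ExampleSearchBot/0.1 (contact: operator@example.com)"
MIN_INTERVAL = 1.0  # seconds between requests (one request per second)


class RateLimiter:
    """Blocks so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, None before the first

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def build_request(url):
    """Guest request (no cookies, no login) that names the bot in its User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, limiter):
    """Wait for the rate limiter, then download the page body as bytes."""
    limiter.wait()
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        return resp.read()
```

Sharing one `RateLimiter(MIN_INTERVAL)` across all `fetch` calls keeps the whole bot under the limit, not just each individual topic crawl.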

ABCbits
Legendary

Activity: 2940
Merit: 7665


Crypto Swap Exchange


Today at 08:46:29 AM
 #2

List all the features you want in a search engine here.

How about the features already available on https://ninjastic.space/search? Aside from that, I would suggest these features:
1. Sort by relevancy.
2. Show a message when the search keyword may contain a typo (such as suggesting "bitcoin" when someone enters "bitcon").
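
One possible way to do the typo suggestion is fuzzy matching against the terms already in the index. This is only a sketch using Python's standard `difflib`; the vocabulary list and the `suggest` function are made up for illustration, and a real index would have far more terms.

```python
import difflib


def suggest(term, vocabulary, cutoff=0.8):
    """Return the closest known term, or None if the term is known or nothing is close."""
    term = term.lower()
    if term in vocabulary:
        return None  # spelled like an indexed term, nothing to suggest
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical sample of indexed terms.
vocabulary = ["bitcoin", "wallet", "taproot", "mining"]

hint = suggest("bitcon", vocabulary)
if hint is not None:
    print(f'Did you mean "{hint}"?')
```

The `cutoff` value is a guess; lowering it gives more suggestions but also more false positives.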

NotATether (OP)
Legendary

Activity: 1666
Merit: 7034


In memory of o_e_l_e_o


Today at 09:26:57 AM
 #3

List all the features you want in a search engine here.

How about the features already available on https://ninjastic.space/search? Aside from that, I would suggest these features:
1. Sort by relevancy.
2. Show a message when the search keyword may contain a typo (such as suggesting "bitcoin" when someone enters "bitcon").

Ninjastic is showing entire posts so it's impossible to find anything meaningful when you search for a keyword.

It needs to show only an excerpt with a link and title, like the forum search and Google do. It needs page numbers for browsing the results. Most importantly, it should not look inside quotes for keywords.
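
A rough Python sketch of those three requirements. It assumes posts are stored as BBCode-like text with `[quote ...]...[/quote]` tags, which is an assumption about the storage format, not a description of the actual scraper. All function names are hypothetical.

```python
import re

# Matches an innermost quote block: one that contains no further "[quote" inside it.
INNER_QUOTE = re.compile(
    r"\[quote[^\]]*\](?:(?!\[quote).)*?\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)


def strip_quotes(text):
    """Remove quoted text, including nested quotes, by peeling innermost blocks first."""
    previous = None
    while previous != text:
        previous = text
        text = INNER_QUOTE.sub(" ", text)
    return text


def excerpt(text, keyword, radius=40):
    """Return a short snippet around the first hit outside quotes, or None if no hit."""
    body = " ".join(strip_quotes(text).split())  # collapse whitespace
    pos = body.lower().find(keyword.lower())
    if pos < 0:
        return None
    start = max(0, pos - radius)
    end = min(len(body), pos + len(keyword) + radius)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(body) else ""
    return prefix + body[start:end] + suffix


def paginate(results, page, per_page=20):
    """Return the slice of results for a 1-based page number."""
    start = (page - 1) * per_page
    return results[start:start + per_page]
```

A post whose only match is inside a quote returns `None` from `excerpt`, so it can be dropped from the result list entirely.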

LoyceV
Legendary

Activity: 3374
Merit: 17030


Thick-Skinned Gang Leader and Golden Feather 2021


Today at 09:27:46 AM
 #4

For now, I am scraping topics from the forum using my bot.
If it helps, I can give you a tar.gz copy of my data (note: some posts are missing). I shared it with Ninjastic years ago, and it would save you several months of scraping. A fresh scrape will get you more recent edits though, and fewer deleted posts.

1. Sort by relevancy.
This would be the one thing I'd like to see, but also no doubt the most difficult one. Ninjastic often gives me a list of hundreds of posts. A good search engine (like Google 10 years ago) would show what I want to see first.
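
For what it's worth, one common starting point for relevance ranking is BM25 scoring. The sketch below is a plain-Python illustration of that technique, not what any existing Bitcointalk search uses; the function names and the `k1`/`b` defaults are conventional choices, not anything from this thread.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return indices of matching docs, best BM25 score first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0

    # Document frequency: in how many docs each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scored = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, i))
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Rare terms and short, focused posts score higher here. Forum-specific signals such as merit or post date would have to be mixed in separately.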

ABCbits
Legendary

Activity: 2940
Merit: 7665


Crypto Swap Exchange


Today at 10:19:21 AM
 #5

Ninjastic is showing entire posts so it's impossible to find anything meaningful when you search for a keyword.

Sorry for not being specific. I mean features such as the "Date Range (UTC)" filter, choosing one or more boards (optionally including their child boards), and operator support (+, -, | and "").
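
To illustrate the date range and board filters, here is a small Python sketch. The `Post` fields, the board IDs, and the parent/child table are all hypothetical and do not correspond to real Bitcointalk boards.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Post:
    post_id: int
    board_id: int
    posted_at: datetime  # timezone-aware, stored in UTC
    text: str


# Hypothetical board tree: child board ID -> parent board ID.
BOARD_PARENT = {12: 6, 6: 1}


def board_and_children(board_id, parent_of):
    """Return the board plus every board nested below it."""
    selected = {board_id}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in selected and child not in selected:
                selected.add(child)
                changed = True
    return selected


def filter_posts(posts, start=None, end=None, boards=None):
    """Keep posts inside [start, end] (UTC) and inside the given set of board IDs."""
    kept = []
    for post in posts:
        if start is not None and post.posted_at < start:
            continue
        if end is not None and post.posted_at > end:
            continue
        if boards is not None and post.board_id not in boards:
            continue
        kept.append(post)
    return kept
```

Passing `boards={6}` searches a single board, while `boards=board_and_children(6, BOARD_PARENT)` also includes its child boards.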

mocacinno
Legendary

Activity: 3458
Merit: 5052


https://merel.mobi => buy facemasks with BTC/LTC


Today at 10:49:26 AM
Merited by LoyceV (4)
 #6

If you are scraping and parsing anyway, it would be nice if your search engine indexed the most common board objects: for example, the username, DT rank, feedback, boards, and so on. That way, you could use keywords like you can in Google (filetype:, site:, ...).

It would be nice if I could make a query like `user:Theymos board:Bitcoin\Project_Development +wallet -knots taproot` and only see posts made by Theymos in the Project Development board that contain the word wallet, do not contain the word knots, and hopefully contain the word taproot.
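
A query like that could be parsed roughly as follows. This is a Python sketch under assumptions: the field names `user` and `board` are taken from the example query, while the `Query` class and both function names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    filters: dict = field(default_factory=dict)   # field:value pairs, e.g. user
    required: list = field(default_factory=list)  # +term, must appear
    excluded: list = field(default_factory=list)  # -term, must not appear
    optional: list = field(default_factory=list)  # bare term, used for ranking


KNOWN_FIELDS = {"user", "board"}  # assumed field names


def parse_query(raw):
    """Split a query string into field filters, +required, -excluded, and bare terms."""
    query = Query()
    for token in raw.split():
        name, sep, value = token.partition(":")
        if sep and value and name.lower() in KNOWN_FIELDS:
            query.filters[name.lower()] = value
        elif token.startswith("+") and len(token) > 1:
            query.required.append(token[1:].lower())
        elif token.startswith("-") and len(token) > 1:
            query.excluded.append(token[1:].lower())
        else:
            query.optional.append(token.lower())
    return query


def matches(query, user, board, text):
    """Apply the hard constraints; optional terms only affect ranking, not matching."""
    words = set(text.lower().split())
    if "user" in query.filters and query.filters["user"].lower() != user.lower():
        return False
    if "board" in query.filters and query.filters["board"].lower() != board.lower():
        return False
    if any(term not in words for term in query.required):
        return False
    if any(term in words for term in query.excluded):
        return False
    return True


example = parse_query(r"user:Theymos board:Bitcoin\Project_Development +wallet -knots taproot")
```

Here `taproot` ends up in `example.optional`, so it would raise a post's score without being required.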

NotATether (OP)
Legendary

Activity: 1666
Merit: 7034


In memory of o_e_l_e_o


Today at 12:03:39 PM
 #7

For now, I am scraping topics from the forum using my bot.
If it helps, I can give you a tar.gz copy of my data (note: some posts are missing). I shared it with Ninjastic years ago, and it would save you several months of scraping. A fresh scrape will get you more recent edits though, and fewer deleted posts.

Sure, you can send me a copy by PM.

If you are scraping and parsing anyway, it would be nice if your search engine indexed the most common board objects: for example, the username, DT rank, feedback, boards, and so on. That way, you could use keywords like you can in Google (filetype:, site:, ...).

It would be nice if I could make a query like `user:Theymos board:Bitcoin\Project_Development +wallet -knots taproot` and only see posts made by Theymos in the Project Development board that contain the word wallet, do not contain the word knots, and hopefully contain the word taproot.

I can't index DT information since it's invisible to guests and it changes too quickly anyway, but I'm already scraping the other stuff like boards, username (of course) etcetera. Even the user IDs are being scraped to help deal with name changes.

My bot can also handle anonymous users.
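
Roughly, the idea of keying authors by user ID can be sketched like this in Python. The class and field names are hypothetical, and treating an anonymous author as one with no user ID is an assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    user_id: Optional[int]  # None when the post has no linked profile (assumption)
    name: str               # name shown on the post when it was scraped


class AuthorIndex:
    """Tracks every name seen for a user ID so old and new names find the same user."""

    def __init__(self):
        self.names_by_id = {}  # user_id -> names in the order they were first seen
        self.id_by_name = {}   # lowercase name -> user_id

    def record(self, author):
        if author.user_id is None:
            return  # anonymous posts stay searchable by text, but have no ID to track
        names = self.names_by_id.setdefault(author.user_id, [])
        if author.name not in names:
            names.append(author.name)
        self.id_by_name[author.name.lower()] = author.user_id

    def resolve(self, name):
        """Map any recorded name, old or new, to its user ID (None if unknown)."""
        return self.id_by_name.get(name.lower())

    def last_seen_name(self, user_id):
        """Most recently recorded name for the ID, which may not be the current one."""
        names = self.names_by_id.get(user_id)
        return names[-1] if names else None
```

A `user:` filter can then call `resolve` first and match posts on the ID, so a search for an old name still returns posts made under the new one.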

seoincorporation
Legendary

Activity: 3220
Merit: 3009



Today at 02:07:18 PM
 #8

If you are scraping and parsing anyway, it would be nice if your search engine indexed the most common board objects: for example, the username, DT rank, feedback, boards, and so on. That way, you could use keywords like you can in Google (filetype:, site:, ...).

But wouldn't this be like rebuilding the full forum in a database?

I mean, there are 2 ways to do this:

1. You take all the forum data, put it in a database, and your search engine queries that database. For this, you have to update the database live, or at least run a cron job that adds the new data at a regular interval.

2. You search the data directly on the site, but for that you would have to hack the current search engine somehow.

If you have another way in mind, I would love to know how it works.
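
Option 1 could look something like the Python sketch below, using the standard `sqlite3` module. The table layout is invented, `fetch_posts_after` is a hypothetical stand-in for the scraper, and the approach assumes post IDs only ever increase. A cron job would simply call `sync` on each run.

```python
import sqlite3

# Hypothetical minimal schema; a real index would store more fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    post_id   INTEGER PRIMARY KEY,
    topic_id  INTEGER NOT NULL,
    user_id   INTEGER,
    posted_at TEXT NOT NULL,
    body      TEXT NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the posts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def last_indexed_post(conn):
    """Highest post ID stored so far, or 0 for an empty database."""
    row = conn.execute("SELECT MAX(post_id) FROM posts").fetchone()
    return row[0] or 0


def sync(conn, fetch_posts_after):
    """Store every post newer than the newest indexed one; return how many were added."""
    new_posts = list(fetch_posts_after(last_indexed_post(conn)))
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)", new_posts
        )
    return len(new_posts)


def search(conn, keyword):
    """Naive substring search; returns matching post IDs in ascending order."""
    rows = conn.execute(
        "SELECT post_id FROM posts WHERE body LIKE ? ORDER BY post_id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]
```

The `LIKE` query is only a placeholder: a real search engine would put a full-text index on top of the same table. This sketch also never picks up later edits or deletions of posts it has already stored.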
