Bitcoin Forum
Poll
Question: Should I create translations of the website?
Yes - 17 (68%)
No - 8 (32%)
Total Voters: 25

Pages: « 1 2 3 4 [5] 6 7 »  All
Author Topic: Talksearch.io - Advanced Bitcointalk Search Engine  (Read 2736 times)
Hossain Risfa
Jr. Member
Offline

Activity: 51
Merit: 22

WO Buddy!!!!!

May 01, 2025, 05:38:54 PM
 #81

I've posted my translation in our Bangla local board. Thank you very much for giving me permission to translate the post. I was quite nervous and didn't know whether I would be able to translate it accurately, but after posting my translation in our local board, some senior brothers told me that I had done a great translation and that my translation skills are good, and they also gave me some advice. Thank you @NotATether for giving me permission and for giving me the chance to translate it; it gave me experience, and as a newbie I tried to do my best. My translation link:

Talksearch.io - Advanced Bitcointalk Search Engine Translated in Bangla local board by Hossain Risfa
NotATether (OP)
Legendary
Offline

Activity: 2002
Merit: 8611

Search? Try talksearch.io

May 02, 2025, 02:30:52 PM
 #82

Well, it looks like I've hit another snag during uploading. Thankfully, this has nothing to do with Elasticsearch, but with my scraping server.

As you might be aware, I scrape the posts on my server before processing them. The processing involves splitting up posts by quotes, which creates a series of chunks for each post, usually 1-3. These are saved to disk, then another part of the program reads them into memory, and after that they are uploaded to Elasticsearch.
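Roughly, the splitting step works like this (a simplified sketch; the real code, field names, and the exact quote format it handles differ, and nested quotes are not handled here):

Code:
import re

# Assumed BBCode-style quote markers; the actual scraper's format may differ.
QUOTE_RE = re.compile(r"\[quote[^\]]*\].*?\[/quote\]", re.DOTALL | re.IGNORECASE)

def split_post_into_chunks(post_text: str) -> list[str]:
    """Split a post into quoted and non-quoted chunks, dropping empty ones."""
    chunks = []
    last_end = 0
    for match in QUOTE_RE.finditer(post_text):
        before = post_text[last_end:match.start()].strip()
        if before:
            chunks.append(before)          # text written by the poster
        chunks.append(match.group(0))      # the quoted block itself
        last_end = match.end()
    tail = post_text[last_end:].strip()
    if tail:
        chunks.append(tail)
    return chunks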

It seems that the splitting process has created so many chunks that I simply cannot create any more in that folder. Any attempts to do so lead to an error.

It might have something to do with the fact that there are tens of millions of these files (inodes) in the filesystem, but I don't know if ext4 has such a limitation. And I'm definitely not out of disk space (though the Elasticsearch server could be a different story when this is all uploaded), as not even 50% of the disk space is used so far. (Strangely, I'm not out of inodes either.)

One solution to this problem could be to avoid saving these chunks to disk altogether and run the processing and upload as one step. This is what I was doing for several days, but then I had to diagnose performance issues on the cluster, so it got interrupted. Performance was bad after that, though, because I was re-reading already-uploaded chunks from the disk.

Another solution would be to simply avoid processing low-quality posts, e.g. gambling discussion. This will make for a smaller index that takes vastly less space. I estimate that around 15% of all Bitcointalk posts are made in Gambling Discussion. This is mostly sig spam that nobody wants to read, so there's no use returning it in search results. As a side effect, this brings features resembling Google de-indexing to Talksearch, but I will never knowingly de-index posts I don't agree with. There will still be an index containing all existing forum posts, but that will be reserved for detailed search and the API only.
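The routing would then look something like this (a sketch only; the board label and index names are placeholders, not the actual setup):

Code:
# Hypothetical example: every chunk goes to a full index, and only
# non-gambling chunks go to the default index used by the search UI.
LOW_QUALITY_BOARDS = {"Gambling discussion"}  # assumed board label

def target_indices(chunk: dict) -> list[str]:
    indices = ["talksearch-full"]              # detailed search / API only
    if chunk.get("board") not in LOW_QUALITY_BOARDS:
        indices.append("talksearch-main")      # default search results
    return indices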

LoyceV
Legendary
Offline

Activity: 3710
Merit: 19117

Thick-Skinned Gang Leader and Golden Feather 2021

May 02, 2025, 03:29:34 PM
Last edit: May 02, 2025, 05:42:50 PM by LoyceV
 #83

It might have something to do with the fact that there are tens of millions of these files (inodes) in the filesystem, but I don't know if ext4 has such a limitation. And I'm definitely not out of disk space (though the Elasticsearch server could be a different story when this is all uploaded), as not even 50% of the disk space is used so far. (Strangely, I'm not out of inodes either.)
I have some experience dealing with tens of millions of files, and apart from making a directory view terribly slow, it works fine as long as you have enough inodes.
On ext4, with default settings, it looks like a ten times larger disk does not get ten times more inodes. I checked a few disks, and typical limits are tens to hundreds of millions of inodes per disk.
Just enter df -hi and it tells you what you need to know.
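If you want the same check from a script instead of the shell, os.statvfs exposes the inode counters (a quick sketch; the mount point is a placeholder):

Code:
import os

def inode_usage(path: str = "/") -> float:
    """Return the fraction of inodes in use on the filesystem holding `path`."""
    st = os.statvfs(path)
    used = st.f_files - st.f_ffree      # total inodes minus free inodes
    return used / st.f_files if st.f_files else 0.0

print(f"{inode_usage('/') * 100:.1f}% of inodes used")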

Z_MBFM
Sr. Member
Online

Activity: 784
Merit: 375

May 02, 2025, 06:52:33 PM
 #84

I used to search Google to see whether there was a related topic on this forum whenever I was thinking about something. I could find information there too, but since Google is a general search engine, there would be many more results besides the forum-related ones.

Using Talksearch, however, I found that it makes forum-related searches much smoother, and it is quite effective. Nice job, OP.

NotATether (OP)
Legendary
Offline

Activity: 2002
Merit: 8611

Search? Try talksearch.io

May 03, 2025, 09:10:01 AM
 #85

I have some experience dealing with tens of millions of files, and apart from making a directory view terribly slow, it works fine as long as you have enough inodes.
On ext4, with default settings, it looks like a ten times larger disk does not get ten times more inodes. I checked a few disks, and typical limits are tens to hundreds of millions of inodes per disk.
Just enter df -hi and it tells you what you need to know.

About 18% of my inodes are used.

ls ran for a horribly long time but I finally got output:

Code:
zenulabidin@zerstrorer ~ % ls -l /opt/talksearch/processed_chunks | wc -l
30240178
command ls --color=auto -v -l /opt/talksearch/processed_chunks  435.80s user 1776.56s system 2% cpu 21:08:55.04 total
wc -l  1.46s user 2.14s system 0% cpu 21:08:54.10 total

So, that's about 30 million files. Thank goodness for zsh's automatic time reporting, otherwise I wouldn't have known the run time. I'll see whether this long directory listing time is related to the "No space left on device" errors coming from the filesystem code and/or the kernel.
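As a side note, much of that time likely goes into ls stat-ing and sorting every one of the 30 million entries; counting them lazily avoids both (an illustrative sketch, not part of the actual pipeline):

Code:
import os

def count_entries(path: str) -> int:
    """Count directory entries without stat-ing or sorting them."""
    total = 0
    with os.scandir(path) as it:        # lazy iteration over directory entries
        for _ in it:
            total += 1
    return total

print(count_entries("/opt/talksearch/processed_chunks"))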

LoyceV
Legendary
Offline

Activity: 3710
Merit: 19117

Thick-Skinned Gang Leader and Golden Feather 2021

May 03, 2025, 09:24:43 AM
Last edit: May 03, 2025, 10:44:34 AM by LoyceV
Merited by NotATether (2)
 #86

About 18% of my inodes are used.
~
So, that's about 30 million files. ~ I'll see whether this long directory listing time is related to the "No space left on device" errors coming from the filesystem code and/or the kernel.
As far as I know, there are no limits to the number of files per directory on ext4, so this is weird. I'm pretty sure I've had more files in one directory before I added subdirectories for faster listings.

I'm going to test it Smiley
I don't want this many files on my own system, so I use a temporary server:
Code:
16GB PKVM
$100/mo ($0.15/hr)
4 CPU, 16GB RAM, 400GB NVMe
Running this as a user:
Code:
i=1; while test $i -le 40000000; do echo "Hello world!" > $i; i=$((i+1)); done
This takes a while Tongue

I'll be damned: No space left on device!
I got to 29,272,362 files with 22M inodes free.

Filesystem:
Code:
/dev/vda1 on / type ext4 (rw,relatime,discard,errors=remount-ro,commit=30



It gets weirder: I can still create new files, just not all of them:
Code:
i=100000000; time while test $i -le 110000000; do echo "Hello world!" > $i; i=$((i+1)); done
-bash: 100000040: No space left on device
-bash: 100000145: No space left on device
-bash: 100002253: No space left on device
-bash: 100002567: No space left on device
-bash: 100002715: No space left on device
-bash: 100002827: No space left on device
-bash: 100003033: No space left on device
-bash: 100003445: No space left on device
-bash: 100003749: No space left on device
-bash: 100003997: No space left on device
-bash: 100004406: No space left on device
-bash: 100007839: No space left on device
That's 12 out of 7,840 files that couldn't be created; the rest are fine:
Code:
ls 10000282*
10000282   100002821  100002823  100002825  100002828
100002820  100002822  100002824  100002826  100002829

Root command dmesg shows this:
Code:
[ 2024.349441] EXT4-fs warning: 598 callbacks suppressed
[ 2024.349450] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2024.349477] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2024.349503] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2024.349505] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2024.349524] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2024.349526] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2024.349545] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2024.349547] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2024.363790] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2024.363797] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2050.577162] EXT4-fs warning: 118 callbacks suppressed
[ 2050.577169] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2050.577175] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2050.582961] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2050.582965] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2050.582990] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2050.582992] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2050.583012] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2050.583014] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2050.598773] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2050.598778] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2078.294090] EXT4-fs warning: 302 callbacks suppressed
[ 2078.294097] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2078.294103] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2078.296589] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2078.296594] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2078.296638] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2078.296641] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2078.296659] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2078.296661] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem
[ 2078.302125] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2078.302130] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem

Solution
Enabling ext4 large_dir seems to fix it:
Code:
tune2fs -O large_dir /dev/nvme2n1

The EXT4 "largedir" feature overcomes the current limit of around ten million entries allowed within a directory on EXT4. Now, EXT4 directories can support around two billion directory entries. However, you are likely to hit performance bottlenecks before hitting this new EXT4 limitation.

It looks like the safe limit is about 10 million files per directory; it may keep working up to around 30 million files, but you shouldn't get anywhere near that number without enabling large_dir, because things start failing.

I completed my test with over 51 million files in a single directory. No more errors until I actually ran out of inodes.

NotATether (OP)
Legendary
Offline

Activity: 2002
Merit: 8611

Search? Try talksearch.io

May 03, 2025, 12:35:16 PM
 #87

~
Solution
Enabling ext4 large_dir seems to fix it:
Code:
tune2fs -O large_dir /dev/nvme2n1

The EXT4 "largedir" feature overcomes the current limit of around ten million entries allowed within a directory on EXT4. Now, EXT4 directories can support around two billion directory entries. However, you are likely to hit performance bottlenecks before hitting this new EXT4 limitation.

It looks like the safe limit is about 10 million files per directory; it may keep working up to around 30 million files, but you shouldn't get anywhere near that number without enabling large_dir, because things start failing.

I completed my test with over 51 million files in a single directory. No more errors until I actually ran out of inodes.

Amazing work!

The forum should hire you as a consultant Smiley

I can restart the chunk processing now, but it's going to start from the first topic because I lost track of which topics failed to write. Fortunately, processing is much faster than uploading at the moment; I was actually working on topics from 2023 when I noticed this issue.

joker_josue
Legendary
Online

Activity: 2058
Merit: 5834

**In BTC since 2013**

May 04, 2025, 06:34:05 AM
 #88

I can restart the chunk processing now, but it's going to start from the first topic because I lost track of which topics failed to write. Fortunately, processing is much faster than uploading at the moment; I was actually working on topics from 2023 when I noticed this issue.

Don't run the system all at once!
Make it run in cycles, for example one year at a time. This way, if there is a failure in any cycle, you know up to what point everything is fine and you won't have to start from scratch.

You can do this manually by running the script for each cycle, or you can set up the script so that it runs in cycles and keeps a log of events. Whenever a cycle ends, it reports the result and whether everything is OK. This way you can follow the process.
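Something like this, for example (an untested sketch; the function names and checkpoint file are just placeholders, not the actual Talksearch code):

Code:
import json
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)
CHECKPOINT = Path("checkpoint.json")   # remembers the cycles that finished OK

def load_done() -> set[int]:
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def process_year(year: int) -> None:
    """Placeholder for processing every topic posted in `year`."""
    ...

def run_cycles(first_year: int = 2009, last_year: int = 2025) -> None:
    done = load_done()
    for year in range(first_year, last_year + 1):
        if year in done:
            continue                    # already completed in a previous run
        process_year(year)
        done.add(year)
        CHECKPOINT.write_text(json.dumps(sorted(done)))
        logging.info("cycle %d finished OK", year)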

GazetaBitcoin
Legendary
Offline

Activity: 2100
Merit: 8316

Fully-fledged Merit Cycler|Spambuster'23|Pie Baker

May 04, 2025, 12:18:06 PM
 #89

Hey NotATether, please be aware that 1 more translation was made for your topic by AOBT:

Ukrainian translation, made by DrBeer

Cheers!

Mahiyammahi
Full Member
Offline

Activity: 308
Merit: 165

The largest #BITCOINPOKER site to this day

May 09, 2025, 10:12:09 AM
 #90

Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? Since you have developed a search engine that can scrape posts, why not train an AI model on them? I don't know whether it would be helpful for forum users, but an AI model that pulls its answers from Bitcointalk topics and replies wouldn't be bad. A user could get an answer within a few seconds rather than digging through all the data on Bitcointalk. Other AI models like ChatGPT look everywhere for an answer, so a model that looks only at forum data would be great.

$crypto$
Legendary
Offline

Activity: 2772
Merit: 1122

Smart is not enough, there must be skills

May 09, 2025, 12:00:18 PM
 #91

Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? Since you have developed a search engine that can scrape posts, why not train an AI model on them? I don't know whether it would be helpful for forum users, but an AI model that pulls its answers from Bitcointalk topics and replies wouldn't be bad. A user could get an answer within a few seconds rather than digging through all the data on Bitcointalk. Other AI models like ChatGPT look everywhere for an answer, so a model that looks only at forum data would be great.
There is an AI search engine for Bitcointalk; you can do some browsing there.

[AI Search Engine] Bitcointalk

I have tried asking questions on this AI search engine; some of the answers it gives are not accurate, and it takes a few seconds to respond.

NotATether (OP)
Legendary
Offline

Activity: 2002
Merit: 8611

Search? Try talksearch.io

May 09, 2025, 01:32:16 PM
 #92

Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? Since you have developed a search engine that can scrape posts, why not train an AI model on them? I don't know whether it would be helpful for forum users, but an AI model that pulls its answers from Bitcointalk topics and replies wouldn't be bad. A user could get an answer within a few seconds rather than digging through all the data on Bitcointalk. Other AI models like ChatGPT look everywhere for an answer, so a model that looks only at forum data would be great.

I don't have a dev team, so this will take a very long time to implement. It is not a priority at the moment.

In fact, only about 5 million chunks out of almost a hundred million have been uploaded so far.

hopenotlate
Legendary
Offline

Activity: 3710
Merit: 1256

May 09, 2025, 03:10:44 PM
 #93

I had some free time, and as a sign of gratitude for the efforts you make to improve users' experience of this forum, I took the liberty of translating the opening post into Italian, as I noticed it hadn't been done yet, without even asking your permission.
I hope you don't mind; please let me know if it's okay or if I should remove it.

Translation link : Talksearch.io - Motore di ricerca avanzato per Bitcointalk


NotATether (OP)
Legendary
Offline

Activity: 2002
Merit: 8611

Search? Try talksearch.io

May 12, 2025, 06:01:34 AM
 #94

I had some free time, and as a sign of gratitude for the efforts you make to improve users' experience of this forum, I took the liberty of translating the opening post into Italian, as I noticed it hadn't been done yet, without even asking your permission.
I hope you don't mind; please let me know if it's okay or if I should remove it.

Translation link : Talksearch.io - Motore di ricerca avanzato per Bitcointalk

Anybody can make a translation of this topic without asking me. But to avoid duplicate efforts, people should make sure that a local translation doesn't already exist.

On an unrelated note: Google Cloud is so useful! It's like having a free VS Code in the cloud, along with a database and git integration, and HTTP server URLs are practically free as well. I am using it for other projects too.

It's too bad that Elasticsearch is not keeping up with the load Tongue. I guess I will have to wait a while for the upload to complete.
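For reference, a throttled bulk upload with the official Python client looks roughly like this (an illustrative sketch; the index name, document shape, and tuning values are placeholders, not the actual uploader):

Code:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch("http://localhost:9200")   # placeholder endpoint

def actions(chunks):
    for chunk in chunks:
        yield {"_index": "talksearch-main", "_source": chunk}

def upload(chunks):
    ok_count = 0
    # Smaller batches plus retries with backoff give the cluster room to breathe.
    for ok, item in streaming_bulk(es, actions(chunks), chunk_size=500,
                                   max_retries=5, initial_backoff=2,
                                   raise_on_error=False):
        if ok:
            ok_count += 1
    return ok_count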

hopenotlate
Legendary
Offline

Activity: 3710
Merit: 1256

May 12, 2025, 09:26:18 AM
 #95

I had some free time, and as a sign of gratitude for the efforts you make to improve users' experience of this forum, I took the liberty of translating the opening post into Italian, as I noticed it hadn't been done yet, without even asking your permission.
I hope you don't mind; please let me know if it's okay or if I should remove it.

Translation link : Talksearch.io - Motore di ricerca avanzato per Bitcointalk

Anybody can make a translation of this topic without asking me. But to avoid duplicate efforts, people should make sure that a local translation doesn't already exist.

-snip-

Glad to hear everything is okay with it. To avoid duplicates, you might want to add my translation link to the opening post, so everyone can see at first glance that it has already been done.

NotATether (OP)
Legendary
Offline

Activity: 2002
Merit: 8611

Search? Try talksearch.io

May 12, 2025, 09:29:43 AM
 #96

It’s missing a few key features though. Not being able to tweak the search text is a bit of a letdown since that’s pretty important for narrowing things down.

Can you elaborate on this? I don't really understand what you mean by tweaking.

Would you like variations that are more professional, casual, or critical?

As in what? Sorry but just like the other part, I'm not very sure what you're asking for here.

I am working on automatically including synonyms and verb conjugations of search terms, though, in order to capture additional relevant topics. This is something I can do independently of the document upload.
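One standard way to do that in Elasticsearch is a custom analyzer with a synonym filter plus a stemmer; a rough sketch with the 8.x Python client (the index name, field, and synonym list are placeholders and may not match how Talksearch actually does it):

Code:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # placeholder endpoint

es.indices.create(
    index="talksearch-demo",
    settings={
        "analysis": {
            "filter": {
                "talk_synonyms": {
                    "type": "synonym_graph",
                    "synonyms": ["btc, bitcoin", "sig, signature"],  # example pairs
                }
            },
            "analyzer": {
                "talk_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # porter_stem collapses conjugations (search/searching/searched)
                    "filter": ["lowercase", "talk_synonyms", "porter_stem"],
                }
            },
        }
    },
    mappings={"properties": {"content": {"type": "text", "analyzer": "talk_text"}}},
)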

nutildah
Legendary
Offline

Activity: 3388
Merit: 9542

May 18, 2025, 10:20:08 AM
Merited by NotATether (2)
 #97

It’s missing a few key features though. Not being able to tweak the search text is a bit of a letdown since that’s pretty important for narrowing things down.

Can you elaborate on this? I don't really understand what you mean by tweaking.

Would you like variations that are more professional, casual, or critical?

As in what? Sorry but just like the other part, I'm not very sure what you're asking for here.
...

The problem is you're talking with a bot, or rather a human emulating a bot. This last part is the AI asking him if he wants the output rephrased, but he just copy/pasted it because, naturally, he's a maroon:

Would you like variations that are more professional, casual, or critical?

Don't let the bots bring you down, NotATether!  Cheesy  As a human, I for one applaud your efforts and think it's great to see alternative resources being built around forum data. I'll remember to add it to my arsenal the next time I'm researching something.

Wouter Mense
Newbie
Offline

Activity: 16
Merit: 8

May 19, 2025, 09:50:57 AM
 #98

The issue is, I currently don't have a reliable way to measure post quality.

I suggest looking at "user quality". Example: post history.

A lot of users of this kind exist. I looked at recent unread topics and found this one on my third try.

The patterns to look for: in this case there are about 1,200 posts that all "look" the same:
- Each post begins with a quote.
- Followed by one or two lines of text.

Other things to look for:
- All roughly the same total length.
- All roughly the same number of paragraphs, of the same length.
- Same number of sentences, of the same length.
- Each with for example one image.

All of these are, in my opinion, the result of "forced" content generation, usually with a financial incentive I would assume.

Of course, the above metrics can be gamed. The thing here is that this pattern is predictable: the next posts from the above user will also look the same. Introducing more variety in post style would take more effort, and would possibly also be indicative of improved quality.
NotATether (OP)
Legendary
Offline

Activity: 2002
Merit: 8611

Search? Try talksearch.io

May 19, 2025, 10:53:33 AM
 #99

Don't let the bots bring you down, NotATether!  Cheesy  As a human, I for one applaud your efforts and think it's great to see alternative resources being built around forum data. I'll remember to add it to my arsenal the next time I'm researching something.

Thanks, I appreciate it.

The issue is, I currently don't have a reliable way to measure post quality.

I suggest looking at "user quality". Example: post history.

A lot of users of this kind exist. I looked at recent unread topics and found this one on my third try.

The patterns to look for: in this case there are about 1,200 posts that all "look" the same:
- Each post begins with a quote.
- Followed by one or two lines of text.

Other things to look for:
- All roughly the same total length.
- All roughly the same number of paragraphs, of the same length.
- Same number of sentences, of the same length.
- Each with for example one image.

All of these are, in my opinion, the result of "forced" content generation, usually with a financial incentive I would assume.

Of course, the above metrics can be gamed. The thing here is that this pattern is predictable: the next posts from the above user will also look the same. Introducing more variety in post style would take more effort, and would possibly also be indicative of improved quality.

Noted. I do think, however, that post quality can be quantified somehow, so I'm going to look for some research on how it could be calculated. It should probably be a score between 0 and 1.

Then user quality can be set to the mean of all post qualities from that user and used as a weight for search results, though it should not dampen results too much compared to post quality.
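Something along these lines (a very rough sketch; the heuristics, field names, thresholds, and the 0.7/0.3 blend are placeholders, not a finished formula):

Code:
from statistics import mean, pstdev

def post_quality(post: dict) -> float:
    """Crude 0..1 heuristic: penalise very short posts that are mostly quote."""
    text = post.get("text", "")
    own_text = post.get("own_text", text)        # text excluding quoted blocks
    length_score = min(len(own_text) / 500, 1.0) # saturate at ~500 characters
    quote_ratio = 1 - (len(own_text) / max(len(text), 1))
    return max(0.0, length_score * (1 - quote_ratio))

def user_quality(posts: list[dict]) -> float:
    scores = [post_quality(p) for p in posts]
    base = mean(scores) if scores else 0.5
    # Very uniform posts (same length every time) look like forced content.
    lengths = [len(p.get("text", "")) for p in posts]
    uniformity_penalty = 0.2 if len(lengths) > 10 and pstdev(lengths) < 50 else 0.0
    return max(0.0, base - uniformity_penalty)

def weighted_score(relevance: float, uq: float) -> float:
    # Dampen, but never zero out, results from low-quality users.
    return relevance * (0.7 + 0.3 * uq)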

Wouter Mense
Newbie
Offline

Activity: 16
Merit: 8

May 19, 2025, 12:10:51 PM
Last edit: May 19, 2025, 12:43:18 PM by Wouter Mense
 #100

I do assume a strong correlation between post quality and user quality, but I don't have proof.

Also, I totally ignored topic context.

post quality can be quantified somehow
Looking at just one post, without context? I guess it would take less CPU time?

Quote
look for some research
After reading your post, I posed a few questions to an AI chat, with possibly interesting results. Queries (in order, with typos, and AI chat answers between each query):
  • quantify post quality of a forum post
  • specifically site is bitcointalk.org
  • indicate which of these an be measured with low computational cost
  • rearrange the low cost metrics from best to worst
  • adjust for the fact that accounts can be bought and sold
  • adjust to the fact that users may get paid for posting
  • same analisys for comments vs opening posts
  • which are most usefule without taking context from other posts

Off-topic: I hope you appreciate getting more questions instead of more answers. I believe asking the right questions is more helpful for starting your research. I can't vouch for the quality of the AI answers, just that they looked interesting. I'm not a programmer, but it does offer to write the code for you as well.
