Hossain Risfa
Jr. Member
Offline
Activity: 51
Merit: 22
WO Buddy!!!!!
May 01, 2025, 05:38:54 PM
I've completed my translation on our Bangla local board. Thank you very much for giving me permission to translate the post. I was quite nervous and didn't know whether I would be able to translate accurately, but after posting my translation on our local board, some senior members told me that I had done a great job and that my translation skills are good, and they also gave me some advice. Thank you, @NotaTether, for giving me permission and for giving me the chance to translate it; the experience was valuable, and as a newbie I tried to do my best. My translation link: Talksearch.io - Advanced Bitcointalk Search Engine, translated on the Bangla local board by Hossain Risfa.
NotATether (OP)
Legendary
Offline
Activity: 2002
Merit: 8611
Search? Try talksearch.io
May 02, 2025, 02:30:52 PM
Well, it looks like I've hit another snag during uploading. Thankfully, this has nothing to do with Elasticsearch, but with my scraping server.
As you might be aware, I scrape the posts on my server before processing them. The processing involves splitting up posts by quotes, which creates a series of chunks for each post, usually 1-3. These are saved to disk, then another part of the program reads them into memory, and after that they are uploaded to Elasticsearch.
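In simplified form, the splitting step works something like the sketch below (illustrative only; the [quote] tag handling and the function name are assumptions, not the actual Talksearch code, and nested quotes are not handled):

Code:
import re
from typing import Iterator

# Sketch: split a post into non-quote chunks. Assumes quote markup has
# been normalized to [quote]...[/quote] beforehand.
QUOTE_RE = re.compile(r"\[quote[^\]]*\].*?\[/quote\]", re.DOTALL)

def split_post(post_text: str) -> Iterator[str]:
    """Yield the text between quotes, typically 1-3 chunks per post."""
    last = 0
    for m in QUOTE_RE.finditer(post_text):
        chunk = post_text[last:m.start()].strip()
        if chunk:
            yield chunk
        last = m.end()
    tail = post_text[last:].strip()
    if tail:
        yield tail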
It seems that the splitting process has created so many chunks that I simply cannot create any more in that folder. Any attempts to do so lead to an error.
It might have something to do with the fact that there are tens of millions of these files (inodes) in the filesystem, but I don't know if ext4 has such a limitation. And I'm definitely not out of disk space (though the Elasticsearch server could be a different story when this is all uploaded), as not even 50% of the disk space is used so far. (Strangely, I'm not out of inodes either.)
One solution to this problem could be to avoid saving these chunks to the disk altogether and run the processing and upload as one step. This is what I was doing for several days, but then I had to diagnose performance issues on the cluster, so it got interrupted. Performance was bad after that, though, because I was reading already-uploaded chunks from the disk.
Another solution would be to simply avoid processing low-quality posts, e.g. gambling discussion. This will make for a smaller set that takes vastly less space. I estimate that around 15% of all Bitcointalk posts are made on Gambling Discussion. This is mostly sig spam that nobody wants to read, so there's no use returning it in search results. As a side effect, this brings features resembling Google de-indexing to Talksearch, but I will never knowingly de-index posts I don't agree with. There will still be an index containing all existing forum posts, but that will be reserved for detailed search and the API only.
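Combining both ideas, a rough sketch of a disk-free pipeline that also skips the filtered boards (the board name, index name, server address, and the scrape_topics() helper are placeholders, and this reuses the split_post sketch above; streaming_bulk is elasticsearch-py's generator-based bulk uploader):

Code:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

SKIPPED_BOARDS = {"Gambling discussion"}  # illustrative skip list

def actions(topics):
    """Turn scraped topics directly into bulk-index actions,
    with no intermediate chunk files written to disk."""
    for topic in topics:
        if topic["board"] in SKIPPED_BOARDS:
            continue
        for n, chunk in enumerate(split_post(topic["text"])):
            yield {
                "_index": "posts",
                "_id": f"{topic['id']}-{n}",
                "_source": {"board": topic["board"], "text": chunk},
            }

es = Elasticsearch("http://localhost:9200")  # placeholder address
# scrape_topics() is a hypothetical stand-in for the scraper output.
for ok, item in streaming_bulk(es, actions(scrape_topics())):
    if not ok:
        print("failed:", item)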
LoyceV
Legendary
Offline
Activity: 3710
Merit: 19117
Thick-Skinned Gang Leader and Golden Feather 2021
May 02, 2025, 03:29:34 PM (Last edit: May 02, 2025, 05:42:50 PM by LoyceV)
Quote from NotATether: It might have something to do with the fact that there are tens of millions of these files (inodes) in the filesystem, but I don't know if ext4 has such a limitation. And I'm definitely not out of disk space (though the Elasticsearch server could be a different story when this is all uploaded), as not even 50% of the disk space is used so far. (Strangely, I'm not out of inodes either.)

I have some experience dealing with tens of millions of files, and apart from making a directory view terribly slow, it works fine as long as you have enough inodes. On ext4, with default settings, it looks like a ten times larger disk does not get ten times more inodes. I checked a few disks, and typical limits are tens to hundreds of millions of inodes per disk. Just enter df -hi and it tells you what you need to know.
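On a typical ext4 disk, the output looks something like this (values illustrative):

Code:
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/vda1         25M  4.6M   21M   19% /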
Z_MBFM
May 02, 2025, 06:52:33 PM
I used to search Google to check whether a related topic already existed on this forum before posting about something. I could find information there too, but since Google is a general search engine, the results included much more than just forum content. Using Talksearch, I found that it makes forum-related searches much smoother. It is quite effective. Nice job, OP.
NotATether (OP)
Legendary
Offline
Activity: 2002
Merit: 8611
Search? Try talksearch.io
May 03, 2025, 09:10:01 AM
Quote from LoyceV: I have some experience dealing with tens of millions of files, and apart from making a directory view terribly slow, it works fine as long as you have enough inodes. On ext4, with default settings, it looks like a ten times larger disk does not get ten times more inodes. I checked a few disks, and typical limits are tens to hundreds of millions of inodes per disk. Just enter df -hi and it tells you what you need to know.
About 18% of my inodes are used. ls ran for a horribly long time, but I finally got output:

Code:
zenulabidin@zerstrorer ~ % ls -l /opt/talksearch/processed_chunks | wc -l
30240178
command ls --color=auto -v -l /opt/talksearch/processed_chunks  435.80s user 1776.56s system 2% cpu 21:08:55.04 total
wc -l  1.46s user 2.14s system 0% cpu 21:08:54.10 total

So about 30 million files. Thank goodness for zsh, otherwise I wouldn't have known the run time of this. I'll see if this long directory listing time is the cause of the "No space left on device" bail-out in the filesystem code and/or the kernel.
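(Most of that runtime is ls sorting and stat-ing 30 million entries. For a plain count, a streaming scan is far cheaper; a minimal Python sketch:)

Code:
import os

# Count directory entries without sorting or stat-ing each one,
# which is what makes a plain "ls -l" crawl on huge directories.
count = 0
with os.scandir("/opt/talksearch/processed_chunks") as entries:
    for _ in entries:
        count += 1
print(count)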
LoyceV
Legendary
Offline
Activity: 3710
Merit: 19117
Thick-Skinned Gang Leader and Golden Feather 2021
May 03, 2025, 09:24:43 AM (Last edit: May 03, 2025, 10:44:34 AM by LoyceV). Merited by NotATether (2)
Quote from NotATether: About 18% of my inodes are used. ~ So about 30 million files. ~ I'll see if this long directory listing time is the cause of the "No space left on device" bail-out in the filesystem code and/or the kernel.

As far as I know, there is no limit to the number of files per directory on ext4, so this is weird. I'm pretty sure I've had more files in one directory before I added subdirectories for faster listings. I'm going to test it.

I don't want this many files on my own system, so I use a temporary server: a 16GB PKVM at $100/mo ($0.15/hr) with 4 CPU, 16GB RAM and 400GB NVMe.

Running this as a user:

Code:
i=1; while test $i -le 40000000; do echo "Hello world!" > $i; i=$((i+1)); done

This takes a while. I'll be damned: No space left on device! I got to 29,272,362 files with 22M inodes free. The filesystem:

Code:
/dev/vda1 on / type ext4 (rw,relatime,discard,errors=remount-ro,commit=30)

It gets weirder: I can still create new files, just not all of them:

Code:
i=100000000; time while test $i -le 110000000; do echo "Hello world!" > $i; i=$((i+1)); done
-bash: 100000040: No space left on device
-bash: 100000145: No space left on device
-bash: 100002253: No space left on device
-bash: 100002567: No space left on device
-bash: 100002715: No space left on device
-bash: 100002827: No space left on device
-bash: 100003033: No space left on device
-bash: 100003445: No space left on device
-bash: 100003749: No space left on device
-bash: 100003997: No space left on device
-bash: 100004406: No space left on device
-bash: 100007839: No space left on device

That's 12 out of 7840 files that couldn't be created; the rest are fine:

Code:
ls 10000282*
10000282   100002821  100002823  100002825  100002828
100002820  100002822  100002824  100002826  100002829

Running dmesg as root shows this (the same pair of warnings repeats over and over):

Code:
[ 2024.349441] EXT4-fs warning: 598 callbacks suppressed
[ 2024.349450] EXT4-fs warning (device vda1): ext4_dx_add_entry:2592: Directory (ino: 295416) index full, reach max htree level :2
[ 2024.349477] EXT4-fs warning (device vda1): ext4_dx_add_entry:2596: Large directory feature is not enabled on this filesystem

Solution: Enabling ext4 large_dir seems to fix it:

Code:
tune2fs -O large_dir /dev/nvme2n1

Quote: The EXT4 "largedir" feature overcomes the current limit of around ten million entries allowed within a directory on EXT4. Now, EXT4 directories can support around two billion directory entries. However, you are likely to hit performance bottlenecks before hitting this new EXT4 limitation.

It looks like the safe limit is about 10 million files per directory. It may keep working up to around 30 million files, but you shouldn't get anywhere near that number without enabling large_dir, because things start failing. I completed my test at over 51 million files in a single directory: no more errors until I actually ran out of inodes.
NotATether (OP)
Legendary
Offline
Activity: 2002
Merit: 8611
Search? Try talksearch.io
May 03, 2025, 12:35:16 PM
Quote from LoyceV: ~ Solution: Enabling ext4 large_dir seems to fix it: tune2fs -O large_dir /dev/nvme2n1 -snip-

Amazing work! The forum should hire you as a consultant. I can restart the chunk processing now, but it's going to start from the first topic because I lost track of which topics failed to write. Fortunately, processing is much faster than uploading at the moment; I was actually processing topics from 2023 when I noticed this issue.
joker_josue
Legendary
Online
Activity: 2058
Merit: 5834
**In BTC since 2013**
May 04, 2025, 06:34:05 AM
Quote from NotATether: I can restart the chunk processing now, but it's going to start from the first topic because I lost track of which topics failed to write. Fortunately, processing is much faster than uploading at the moment; I was actually processing topics from 2023 when I noticed this issue.
Don't run the whole thing at once! Make it run in cycles, for example one year at a time. That way, if a cycle fails, you know exactly how far everything is fine and you won't have to start from scratch. You can do this manually by running the script for each cycle, or you can set up the script so that it runs in cycles and keeps a log of events. Whenever a cycle ends, it informs you of the result if everything is OK. This way you can follow the process.
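As a rough sketch of that cycle-plus-log idea (process_year() is a hypothetical placeholder for whatever the real processing step is called):

Code:
import logging

logging.basicConfig(filename="upload.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

# One cycle per year: a failure only ever costs the current cycle,
# and the log tells you exactly where to resume.
for year in range(2009, 2026):
    try:
        n = process_year(year)  # hypothetical pipeline call
        logging.info("year %d done: %d chunks uploaded", year, n)
    except Exception:
        logging.exception("year %d failed, resume from here", year)
        break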
GazetaBitcoin
Legendary
Offline
Activity: 2100
Merit: 8316
Fully-fledged Merit Cycler|Spambuster'23|Pie Baker
May 04, 2025, 12:18:06 PM
Hey NotATether, please be aware that one more translation was made for your topic by AOBT: a Ukrainian translation, made by DrBeer. Cheers!
Mahiyammahi
Full Member
 
Offline
Activity: 308
Merit: 165
The largest #BITCOINPOKER site to this day
May 09, 2025, 10:12:09 AM
Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? Since you have developed a search engine that can scrape posts, why not train an AI model on that data? I don't know whether it would be helpful for forum users, but an AI model that draws all its answers from Bitcointalk topics and replies wouldn't be bad. A user could get an answer within a few seconds rather than digging through everything on Bitcointalk. Other AI models like ChatGPT look everywhere for an answer, so a model that only looks at forum data would be great.
$crypto$
Legendary
Offline
Activity: 2772
Merit: 1122
Smart is not enough, there must be skills
May 09, 2025, 12:00:18 PM
Quote from Mahiyammahi: Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? -snip-
There is an AI search engine for Bitcointalk; you can do some browsing there: [AI Search Engine] Bitcointalk. I have tried asking questions on this AI search engine; some of the answers it gives are not accurate, and it takes a few seconds to return an answer.
NotATether (OP)
Legendary
Offline
Activity: 2002
Merit: 8611
Search? Try talksearch.io
May 09, 2025, 01:32:16 PM
Quote from Mahiyammahi: Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? -snip-
I don't have a dev team, so this will take a very long time to implement. It is not a priority at the moment. In fact, only about 5 million chunks out of almost a hundred million have been uploaded so far.
hopenotlate
Legendary
Offline
Activity: 3710
Merit: 1256
May 09, 2025, 03:10:44 PM
I had some free time, and as a sign of gratitude for the efforts you make to improve users' experience of this forum, I took the liberty of translating the opening post into Italian, as I noticed it hadn't been done yet, without even asking your permission. I hope you don't mind; please let me know if it's okay or if I should remove it. Translation link: Talksearch.io - Motore di ricerca avanzato per Bitcointalk
NotATether (OP)
Legendary
Offline
Activity: 2002
Merit: 8611
Search? Try talksearch.io
May 12, 2025, 06:01:34 AM
Quote from hopenotlate: I had some free time, and as a sign of gratitude for the efforts you make to improve users' experience of this forum, I took the liberty of translating the opening post into Italian. -snip- Translation link: Talksearch.io - Motore di ricerca avanzato per Bitcointalk

Anybody can make a translation of this topic without asking me. But to avoid duplicate effort, people should make sure that a local translation doesn't already exist.

On an unrelated note: Google Cloud is so useful! It's like having a free VS Code in the cloud that doesn't cost anything extra, along with a database and git integration. HTTP server URLs are practically free as well. I'm even using it for other projects. It's too bad that Elasticsearch is not keeping up with the load; I guess I will have to wait a while for the upload to complete.
hopenotlate
Legendary
Offline
Activity: 3710
Merit: 1256
May 12, 2025, 09:26:18 AM
Quote from NotATether: Anybody can make a translation of this topic without asking me. But to avoid duplicate effort, people should make sure that a local translation doesn't already exist. -snip-

Glad to hear everything is OK with it. To avoid duplicates, you might want to add my translation link to the opening post, so that everyone can see at first glance that it has already been done.
NotATether (OP)
Legendary
Offline
Activity: 2002
Merit: 8611
Search? Try talksearch.io
May 12, 2025, 09:29:43 AM
Quote: It's missing a few key features though. Not being able to tweak the search text is a bit of a letdown since that's pretty important for narrowing things down.

Can you elaborate on this? I don't really understand what you mean by tweaking.

Quote: Would you like variations that are more professional, casual, or critical?

As in what? Sorry, but just like the other part, I'm not very sure what you're asking for here. I am working on automatically including synonyms and verb conjugations of search terms in order to capture additional relevant topics, though. This is something I can do independently of the document upload.
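On the Elasticsearch side, that boils down to index settings along these lines (a sketch only: the synonym list, analyzer name, and index name are placeholders, not the actual Talksearch configuration):

Code:
# Sketch of an analyzer that expands synonyms and stems English words.
# Synonym entries and all names are placeholders.
body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym_graph",
                    "synonyms": ["btc, bitcoin", "sig, signature"],
                },
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "forum_text": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {"text": {"type": "text", "analyzer": "forum_text"}}
    },
}
es.indices.create(index="posts_v2", body=body)  # es: Elasticsearch client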
nutildah
Legendary
Offline
Activity: 3388
Merit: 9542
May 18, 2025, 10:20:08 AM. Merited by NotATether (2)
Quote: It's missing a few key features though. Not being able to tweak the search text is a bit of a letdown since that's pretty important for narrowing things down.

Quote from NotATether: Can you elaborate on this? I don't really understand what you mean by tweaking.

Quote: Would you like variations that are more professional, casual, or critical?

Quote from NotATether: As in what? Sorry, but just like the other part, I'm not very sure what you're asking for here. ...

The problem is you're talking with a bot, or rather a human emulating a bot. That last part is the AI asking him whether he wants the output rephrased, but he just copy/pasted it because, naturally, he's a maroon: "Would you like variations that are more professional, casual, or critical?"
Don't let the bots bring you down, NotATether! As a human, I for one applaud your efforts and think it's great to see alternative resources being built around forum data. I'll remember to add it to my arsenal the next time I'm researching something.
Wouter Mense
Newbie
Offline
Activity: 16
Merit: 8
May 19, 2025, 09:50:57 AM
Quote from NotATether: The issue is, I currently don't have a reliable way to measure post quality.
I suggest looking at "user quality" instead. Example: a post history. A lot of these users exist; I looked at recent unread topics and found this one on my third try. The patterns to look for in this case: there are about 1200 posts that all "look" the same:
- Each post begins with a quote.
- Followed by one or two lines of text.
Other things to look for:
- All roughly the same total length.
- All roughly the same number of paragraphs, of the same length.
- The same number of sentences, of the same length.
- Each with, for example, one image.
All of these are, in my opinion, the result of "forced" content generation, usually with a financial incentive, I would assume. Of course, the above metrics can be gamed. The thing here is that this pattern is predictable: the next posts of the above user will also look the same. Introducing more variety in post style would take more effort, and would possibly also be indicative of improved quality.
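(For illustration only, a sketch of one such uniformity signal; a real check would also cover paragraph counts, sentence lengths, leading quotes, and image counts:)

Code:
import statistics

def length_uniformity(posts: list[str]) -> float:
    """How uniform a user's post lengths are: near 1.0 means the posts
    are suspiciously similar in size, near 0.0 means they vary a lot.
    Purely illustrative; one signal among the several described above."""
    lengths = [len(p) for p in posts]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    if mean == 0:
        return 0.0
    cv = statistics.stdev(lengths) / mean  # coefficient of variation
    return max(0.0, 1.0 - cv)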
NotATether (OP)
Legendary
Offline
Activity: 2002
Merit: 8611
Search? Try talksearch.io
May 19, 2025, 10:53:33 AM
Quote from nutildah: Don't let the bots bring you down, NotATether! As a human, I for one applaud your efforts and think it's great to see alternative resources being built around forum data. -snip-

Thanks, I appreciate it. The issue is, I currently don't have a reliable way to measure post quality.
Quote from Wouter Mense: I suggest looking at "user quality" instead. -snip-

Noted. I do think, however, that post quality can be quantified somehow, so I'm going to look for some research on how it would be calculated. It should probably be a value between 0 and 1. User quality can then be set to the mean of all post qualities from that user and used as a weight on search results, though it should not dampen results too much relative to post quality.
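In code form, the weighting idea might look like this (a sketch; the 0-to-1 post qualities and the damping constant are the assumptions described above, not an implemented feature):

Code:
import statistics

def user_quality(post_qualities: list[float]) -> float:
    """Mean of a user's per-post qualities, each assumed to lie in [0, 1]."""
    return statistics.mean(post_qualities) if post_qualities else 0.5

def weighted_score(relevance: float, uq: float, damping: float = 0.7) -> float:
    """Scale a result's relevance by user quality, with a floor (damping)
    so that user quality can never dampen results too strongly."""
    return relevance * (damping + (1.0 - damping) * uq)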
Wouter Mense
Newbie
Offline
Activity: 16
Merit: 8
May 19, 2025, 12:10:51 PM (Last edit: May 19, 2025, 12:43:18 PM by Wouter Mense)
I do assume a strong correlation between post quality and user quality, but I don't have proof. Also, I totally ignored topic context.

Quote: post quality can be quantified somehow

Looking at just one post, without context? I guess that would take less CPU time?

Quote: look for some research

After reading your post, I posed a few questions to an AI chat, with possibly interesting results. Queries (in order, with typos, and AI chat answers between each query):
- quantify post quality of a forum post
- specifically site is bitcointalk.org
- indicate which of these an be measured with low computational cost
- rearrange the low cost metrics from best to worst
- adjust for the fact that accounts can be bought and sold
- adjust to the fact that users may get paid for posting
- same analisys for comments vs opening posts
- which are most usefule without taking context from other posts
Off-topic: I hope you appreciate getting more questions instead of more answers; I do believe asking the right questions is more helpful for starting your research. I can't vouch for the quality of the AI answers, just that they looked interesting. I'm not a programmer, but it does offer to write code as well.