Title: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 11:28:08 AM https://www.talkimg.com/images/2025/03/29/lIkk1.png (https://talksearch.io)
Talksearch.io (https://talksearch.io)
It has always been a dream of mine to create a high-quality search engine for Bitcointalk. After many months of development, I am proud to announce that Talksearch is now generally available. This is an important milestone towards providing users with high-quality search. Talksearch is a simple search engine that allows you to quickly find and go to any post on Bitcointalk.
Features ____________________
- Indexed ALL posts (up to March 2025)
- Tor-friendly, with no rate limits or captchas.
- Detailed post metadata such as user, date, and topic title.
- Eliminates spammy results by enforcing basic length restrictions.
And will eventually have more accurate results than either.
Infrastructure ____________________
Talksearch is hosted on Google App Engine, utilizing several Compute Engine servers to run Elasticsearch. It has the capacity to hold up to 180GB of posts. Backups are kept off-site in case of disaster. None of this is cheap: it costs around $170/month in total to maintain.
Roadmap ____________________
- Continuously ingest posts from Bitcointalk (immediate priority)
- Refine search result quality
- Posts do not update automatically yet. I am working on that.
I will keep tuning the index to get the best possible results. For now, though, you may need to use multiple words to filter for relevant posts.
Donate ____________________
If you wish to support the maintenance of Talksearch, you can send funds to the following addresses:
Bitcoin: bc1q6dphprljdas0xl2cmqn6tlskselx5xtcpcw8kx
Ethereum and ERC-20 tokens: 0xd3CaaE5098b8Bef64A6FD415b0b1B61aE880FFF5
Tron and USDT-TRC20: TGrsWW6knTwcJxkUKvp7SoV5Fjub9KMAeb
Donations will only be used to pay for hosting costs.
Translations ____________________
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 11:28:24 AM Changelog ____________________
Code: 2025-06-16: App v1.0.5 - https://bitcointalk.org/index.php?topic=5536692.msg65488039#msg65488039 Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 11:29:45 AM Operational Expenses ____________________
This section is reserved for screenshots of the Hosting Provider invoices, with sensitive info redacted. The first one for May will be published in a few days. Additionally, technical specs of the search cluster will be posted here, along with resource usage and high-level statistics about the indexed post data. This information is published for accounting and transparency purposes, as Talksearch is operated as a public service to the forum. May 2025 invoice (https://www.talkimg.com/images/2025/06/12/UdJQ0o.png): $179.76. Donations March-May 2025 (2x): $110. Net cost May 2025: $69.76. Total running cost: $69.76. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: shahzadafzal on March 29, 2025, 11:52:23 AM None of this is cheap, and it costs around $150/month in total to maintain. $150/month? That’s expensive! However, it looks good at first glance, definitely faster, and I think the filters will be helpful. Some basic functionalities would make it even more useful. For example, there’s no option to edit the search text, which is essential for refining searches. I’ll explore it further. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 11:56:36 AM Some basic functionalities would make it even more useful. For example, there’s no option to edit the search text, which is essential for refining searches. You mean like a search bar above the results page? I can get that added quickly. Edit: done Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: arhipova on March 29, 2025, 01:48:51 PM Is Google App Engine a paid host to join?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 01:53:55 PM Is Google App Engine a paid host to join? It's part of Google Cloud. They offer a free trial of 3 months and $300 in credit, but you must add a debit card to use it at all (no VCCs allowed). App Engine is pretty cool: you can make it scale down to zero instances when nobody is using it (which hopefully won't happen here), so that you pay nothing. It's not really a VM though. You can only make use of it if you know how to code. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: TryNinja on March 29, 2025, 02:46:14 PM Talksearch is hosted on Google App Engine, utilizing several Compute Engine servers to run Elasticsearch. It has a capacity to host up to 425GB of posts. Backups are available off-site in case of disaster. None of this is cheap, and it costs around $150/month in total to maintain. What... are you serious? Why on earth are you paying this much? :P For the ninjastic.space database I pay around $26 per month. That gives me more than enough to self-host my elasticsearch node and a shit ton of other projects, and searching is very fast (see the new beta.ninjastic.space/search). Believe me, you do *not* need to pay for "several compute engine servers" for your project. That's complete insanity. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 03:29:10 PM What... are you serious? Why on earth are you paying this much? :P For the ninjastic.space database I pay around $26 per month. That gives me more than enough to self-host my elasticsearch node and a shit ton of other projects, and searching is very fast (see the new beta.ninjastic.space/search). Believe me, you do *not* need to pay for "several compute engine servers" for your project. That's complete insanity.
Wasn't my choice, I had an option to buy the cluster for $36/month without the extra storage server, but it would only fit half of the posts at 45GB. This was the next smallest config. Very crazy prices going on at cloud providers. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: TryNinja on March 29, 2025, 03:33:45 PM Wasn't my choice, I had an option to buy the cluster for $36/month without the extra storage server, but it would only fit half of the posts at 45GB. This was the next smallest config. Why not move to a VPS or at least your own dedicated server? Very crazy prices going on at cloud providers. I get 1.5 TB of NVMe for the price I pay. :P If you need space (you do, because you're indexing tons of posts), you're on the wrong service. That's the price huge companies pay because they can, and they usually need the extreme scaling possibility due to the nature of their business. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: KingsDen on March 29, 2025, 03:52:12 PM https://www.talkimg.com/images/2025/03/29/lIkk1.png (https://talksearch.io) The best search tool I have seen since I joined BTT. I searched for a few topics I created and the results were accurate. If there is a way to pin this topic to the forum's menu bar, I would really appreciate it. Kudos for this exceptional work. Quote: - Eliminates spammy results by enforcing basic length restrictions. So short posts are not indexed? If so, what is the minimum length? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on March 29, 2025, 03:55:06 PM Just a little thought inside my mind! Wouldn't it be better if only one result were shown per topic, with an expandable slider to see the rest of the replies inside that topic? What I mean is, when a search query is made, it shows every single reply from a specific topic (not even grouped, sorted, or numbered). Just look at the image.
All of the results I was shown belonged to the same topic.
Maybe you could make some changes so the results look more sorted or grouped? Or show the topic ID below the username & date? Btw, the UI is very clean; maybe you could consider adding a new theme, for example pitch black, similar to the new ninjastic. Sorry, I am having a little trouble choosing the right words to express my thoughts/suggestion/idea. Do let me know if I was confusing. https://www.talkimg.com/images/2025/03/29/lsr53.png Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: KingsDen on March 29, 2025, 04:05:32 PM Just a little thought inside my mind! Wouldn't it be better if only one result were shown per topic, with an expandable slider to see the rest of the replies inside that topic? What I mean is, when a search query is made, it shows every single reply from a specific topic (not even grouped, sorted, or numbered). Nice idea, but it will lead to complex sorting. I feel that what we have is already good, since you might not need to scroll down to find what you want. Just look at the image. All of the results I was shown belonged to the same topic. That topic has the highest concentration of the search keywords. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 04:46:46 PM Why not move to a VPS or at least your own dedicated server? I get 1.5 TB of NVMe for the price I pay. :P If you need space (you do, because you're indexing tons of posts), you're on the wrong service. That's the price huge companies pay because they can, and they usually need the extreme scaling possibility due to the nature of their business. I actually had a dedicated server rented out for this specifically at one point, with very capable specs (at 33% of the cost). But sysadmin is my pet peeve: I usually cause a lot of downtime during things like updates, which would be unacceptable for a production service like this.
I ended up mining some XMR on it while debating whether or not I should use the cloud, until about a week ago when I canceled the box. As for my main box, well, I already have a warning from the hosting provider from last year not to "send abusive port 80 traffic" again (apparently one of my HTTP services was hacked and was being used for DDoS), so I don't dare host any websites on it anymore. edit: typo Just a little thought inside my mind! Wouldn't it be better if only one result were shown per topic, with an expandable slider to see the rest of the replies inside that topic? What I mean is, when a search query is made, it shows every single reply from a specific topic (not even grouped, sorted, or numbered). Just look at the image. All of the results I was shown belonged to the same topic. This is a defect in the algorithm I'm using. It weights topic titles much more heavily than post content, which is why you see so many posts from the same thread together even though they may not be relevant themselves. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Pablo-wood on March 29, 2025, 05:14:10 PM Why not move to a VPS or at least your own dedicated server? A personal dedicated server would be a better choice, but wouldn't it be as expensive as a rented dedicated server? It might only require a one-time fee, but the cost could be discouraging. A virtual private server would probably be best because it's cheaper and more scalable, but its resources are not dedicated, which might be another drawback. I get 1.5 TB of NVMe for the price I pay. :P If you need space (you do, because you're indexing tons of posts), you're on the wrong service. That's the price huge companies pay because they can, and they usually need the extreme scaling possibility due to the nature of their business. You're on the wrong service, I agree. Even though dedicated servers are quite expensive, $150 is a lot.
A range of $36-$50 should be fair enough. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on March 29, 2025, 05:17:48 PM Just a little thought inside my mind! Wouldn't it be better if only one result were shown per topic, with an expandable slider to see the rest of the replies inside that topic? What I mean is, when a search query is made, it shows every single reply from a specific topic (not even grouped, sorted, or numbered). Just look at the image. All of the results I was shown belonged to the same topic. This is a defect in the algorithm I'm using. It weights topic titles much more heavily than post content, which is why you see so many posts from the same thread together even though they may not be relevant themselves. So do you plan on changing the algorithm in the future, or keeping it until you figure out an alternative solution?! For now I think it is good enough, not gonna lie. But at least maybe you could add the main topic ID below the username of every result (of course in a serial manner for the same topic). That would greatly help in identifying that this content belongs to the same topic! I assume you understood what I was trying to imply here! :) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Mrbluntzy on March 29, 2025, 06:17:44 PM Quote: Talksearch is a simple search engine that allows you to quickly find and go to any post on Bitcointalk. What if I don't remember the whole title of the topic I want to search for and only remember a few keywords from it? When searching on this website with just those few keywords, will it show results for other topics that have the same keywords? On the forum search, even if I don't remember the topic I'm looking for correctly, I can just type the few keywords I remember, and it will bring up several topics on that, and I have to scroll and keep clicking next until I see the right topic I'm looking for.
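NotATether's explanation of the ranking defect (titles weighted far above post content) and DYING_S0UL's request to group results per topic both map onto standard Elasticsearch request options. Below is a minimal sketch. It assumes hypothetical field names `title`, `content` and `topic_id`, since Talksearch's actual index mapping is not published in this thread. Field boosts in `multi_match` control how much a title hit outweighs a content hit. `collapse` with `inner_hits` returns one hit per topic plus a few extra posts that a UI could place behind an expandable slider.

```python
def build_search_body(text: str, title_boost: float = 1.0,
                      group_by_topic: bool = False,
                      replies_per_topic: int = 3) -> dict:
    """Return a hypothetical Elasticsearch _search request body.

    The field names (title, content, topic_id) are assumptions for illustration.
    """
    body = {
        "query": {
            "multi_match": {
                "query": text,
                # "title^N" multiplies a title match's score by N. A large N lets
                # title matches dominate, which is the effect described above.
                "fields": [f"title^{title_boost:g}", "content"],
            }
        }
    }
    if group_by_topic:
        # Field collapsing keeps only the best-scoring hit per topic_id
        # (topic_id must be a keyword or numeric field with doc_values).
        body["collapse"] = {
            "field": "topic_id",
            # inner_hits returns a few more posts from each collapsed topic,
            # e.g. for an expandable "more from this topic" section.
            "inner_hits": {"name": "more_from_topic", "size": replies_per_topic},
        }
    return body
```

Lowering `title_boost` toward 1 lets content matches compete with title matches. With collapsing on, each results page lists topics rather than individual posts.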
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Upgrade00 on March 29, 2025, 07:28:15 PM The results at the moment look to be only focused on topics, showing those with the key words selected. Is there a plan to index text within the post content too?
On the page number slider, you can only move one page at a time forward or backward. Could there be an option to jump to a specific page? If the list gets too long, it can be trimmed. The website looks good after some tries and is lightning quick; I don't think I've used any site that returns to the homepage as fast. Results load very quickly too. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 07:58:41 PM The results at the moment look to be only focused on topics, showing those with the key words selected. Is there a plan to index text within the post content too? Post content is indexed, but the results are heavily weighted towards titles at the moment. On the page number slider, you can only move one page at a time forward or backward. Could there be an option to jump to a specific page? If the list gets too long, it can be trimmed. Talksearch will only return 10 pages max, so it would be better if I made them all clickable. This will be done later, as I am on holiday tomorrow. What if I don't remember the whole title of the topic I want to search for and only remember a few keywords from it? When searching on this website with just those few keywords, will it show results for other topics that have the same keywords? On the forum search, even if I don't remember the topic I'm looking for correctly, I can just type the few keywords I remember, and it will bring up several topics on that, and I have to scroll and keep clicking next until I see the right topic I'm looking for. A search engine's #1 job is to help you find stuff, so naturally you can type a few keywords that occur in the title and get it returned to you, if you type 3-4 of them. The query doesn't have to be an exact match, as it searches by word. Also, it is not case-sensitive. One advantage Talksearch has over the Bitcointalk search is that results are not returned in random order.
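NotATether's answer says search works by word, does not need an exact match, and is not case-sensitive. That is what an analyzer does at index and query time. The sketch below is a deliberately rough stand-in: it assumes a simple regex tokenizer rather than Elasticsearch's real standard analyzer, which follows Unicode word-boundary rules and feeds BM25 scoring. It lowercases text, splits it into words, and ranks titles by how many query words they share.

```python
import re


def analyze(text: str) -> list:
    """Rough analyzer: extract runs of word characters and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def shared_terms(query: str, title: str) -> int:
    """Count distinct query words that also occur in the title, in any order."""
    return len(set(analyze(query)) & set(analyze(title)))


def rank_titles(query: str, titles: list) -> list:
    """Order titles by shared-term count, highest first (ties keep input order)."""
    return sorted(titles, key=lambda t: shared_terms(query, t), reverse=True)
```

Run on `libert19's first post`, this tokenizer yields the independent terms `libert19`, `s`, `first` and `post`. That is the keyword behaviour BenCodie describes further down the thread.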
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Vod on March 29, 2025, 11:30:09 PM Congrats OP - although the results are a bit vanilla for me (I like a lot of condensed information), I'm sure you'll modify it over time based on suggestions. Are you planning on having Theymos fund it or putting paid links in the results? (The forum has over 1,000 btc donated for this purpose, so regular users should not be supporting this.)
If you are interested in having partners send you traffic, shoot me a line. From one developer to another - the rush of seeing something you created being used is better than any man made drug. :) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 30, 2025, 01:13:10 PM Searches seem to be failing. I'm diagnosing the issue.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Agbe on March 30, 2025, 01:59:17 PM Bitcoin: bc1q6dphprljdas0xl2cmqn6tlskselx5xtcpcw8kx Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DPHOR on March 30, 2025, 02:57:49 PM Great job, boss. I tried exploring it and it was fine and cool. Do you intend to add a dark theme? White screens often bother me, so most of the time I turn on color inversion on my phone to adapt my screen to the environment I'm in. When I'm in a bright place, my screen brightness increases to match the sunlight around me. I would like to see a dark theme added. You might feel it's not important, but there are people who love accessing most sites with a dark theme.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 30, 2025, 03:03:03 PM The problem has been identified and searches are now working normally again.
Post-mortem analysis Logs in the Google Cloud backend indicated that the outage was caused by the Elasticsearch master node becoming completely unavailable after the disk on the hot-content tier server filled up. This was most likely due to a bug in the script I use to bulk import topics into Elasticsearch. After I contacted support, they were able to temporarily upscale the hot-content server to a larger capacity within 20 minutes. I am now going through the data and cleaning up any duplicates and other data that might have overflowed the server. This highlights the importance of using managed resources for production services. If I were running the server myself, it would probably have taken days to recover from this. edit: Apparently, the warm tier server was not utilized at all, not even for a single document. That's why the content server was overrun so quickly. The ingestion process ran into trouble after around 46 million documents. Since the number of posts on the forum exceeds 65 million, I need to make sure the warm tier is utilized more aggressively. For now, indexing has been halted while I design a better process for storing the posts. As a side effect, an option will probably be developed to exclude spam topics such as signature campaigns, bounties, etc. when performing a query. An efficient way to filter local board topics is also desirable but will probably not be done anytime soon. Great job, boss. I tried exploring it and it was fine and cool. Do you intend to add a dark theme? White screens often bother me, so most of the time I turn on color inversion on my phone to adapt my screen to the environment I'm in. When I'm in a bright place, my screen brightness increases to match the sunlight around me. I would like to see a dark theme added. You might feel it's not important, but there are people who love accessing most sites with a dark theme.
Dark theme is not a priority right now. The backend needs to be stabilized first. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Royal Cap on March 30, 2025, 04:38:00 PM First of all, thank you for making such a beautiful search system. I searched quite a lot with it and got very good results. However, I think it would be good to add one more thing: Sort by Date. With it, it would be easier for us to find posts that were made recently. This is just my personal feedback.
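Several filters are requested in this thread: Royal Cap's Sort by Date, Synchronice's search-by-user idea, and the spam-topic exclusion mentioned in the post-mortem. All of them can be expressed in one Elasticsearch `bool` query. Below is a minimal sketch. It assumes hypothetical fields `date`, `author` and `board` (the last two keyword-typed) and hypothetical board names. It is not Talksearch's actual query.

```python
from typing import Optional, Sequence


def build_filtered_body(text: str, newest_first: bool = False,
                        author: Optional[str] = None,
                        exclude_boards: Sequence[str] = ()) -> dict:
    """Return a hypothetical Elasticsearch request body with optional filters.

    Field names (title, content, author, board, date) are assumptions.
    """
    bool_query = {
        "must": [{"multi_match": {"query": text, "fields": ["title", "content"]}}]
    }
    if author:
        # Filter clauses restrict matches without affecting the relevance score.
        bool_query["filter"] = [{"term": {"author": author}}]
    if exclude_boards:
        # must_not drops posts from boards such as signature campaigns or bounties.
        bool_query["must_not"] = [{"terms": {"board": list(exclude_boards)}}]
    body = {"query": {"bool": bool_query}}
    if newest_first:
        # Sort by post date, newest first; _score breaks ties between equal dates.
        body["sort"] = [{"date": {"order": "desc"}}, "_score"]
    return body
```

The author check goes in `filter` rather than `must`, which keeps it out of scoring. Elasticsearch can also cache frequently used filter clauses.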
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 30, 2025, 05:08:10 PM First of all, thank you for making such a beautiful search system. I searched quite a lot with it and got very good results. However, I think it would be good to add one more thing: Sort by Date. With it, it would be easier for us to find posts that were made recently. This is just my personal feedback. I will work on this soon. For now, recent posts from March are not available, but once I configure automatic scraping, they will gradually become available. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Macabury on March 30, 2025, 07:28:13 PM This search engine is fast. This must have cost so much time and effort; thanks for the good job done. I searched for a few topics and saw threads related to my input clustered together, waiting for me to click on the exact topic I wanted. This is so innovative. Is this meant to be an extension of ninjastic.space?
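The post-mortem traces the outage to the hot-tier disk filling up. Vod's reply just below suggests alerting when disk usage stays above 90% for five minutes. A managed cluster would normally use the provider's own monitoring for this. The sketch below only illustrates the alert rule itself, assuming a local filesystem path checked with Python's `shutil.disk_usage`. `DiskAlert` is a hypothetical helper, not part of any Google Cloud API.

```python
import shutil
from typing import Optional


def usage_fraction(path: str = "/") -> float:
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


class DiskAlert:
    """Fire when usage stays above `threshold` for at least `window_seconds`."""

    def __init__(self, threshold: float = 0.90, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # Timestamp of the first sample in the current above-threshold streak.
        self.breach_started: Optional[float] = None

    def observe(self, fraction: float, now: float) -> bool:
        """Record one sample taken at `now` (seconds); return True if the alert fires."""
        if fraction <= self.threshold:
            self.breach_started = None  # back under the limit: reset the timer
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.window_seconds
```

In practice you would call `observe(usage_fraction(path), time.time())` from a scheduler, say once a minute, and send a notification when it returns True. Elasticsearch also has its own disk-based shard allocation watermarks, which stop allocating shards to a filling node and eventually block writes to affected indices.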
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Vod on March 30, 2025, 10:15:38 PM For now, indexing has been halted while I design a better process for storing the posts. A good solution for you may be Wasabi (https://wasabi.com/). They do not charge for data egress, but you must keep the data on the server for a minimum of three months. Also, Google Cloud should have some sort of monitoring for hard drive usage. I have an alert sent when any disk usage passes 90% for five minutes. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 31, 2025, 05:51:17 AM Also, Google Cloud should have some sort of monitoring for hard drive usage. I have an alert sent when any disk usage passes 90% for five minutes. They have monitors somewhere; I just haven't figured out how to use them yet. After development, I am learning Operations the hard way :-\ This search engine is fast. This must have cost so much time and effort; thanks for the good job done. I searched for a few topics and saw threads related to my input clustered together, waiting for me to click on the exact topic I wanted. This is so innovative. Is this meant to be an extension of ninjastic.space? No, this is not ninjastic.space. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on March 31, 2025, 06:33:02 AM Congratulations on the project. A good search tool for Bitcointalk is really useful.
I will look into this in more detail later and then share any suggestions (if applicable). ;) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Synchronice on March 31, 2025, 01:36:38 PM First of all, thank you for this service, I appreciate everyone who tries to improve this forum.
Why didn't you host your website on Hetzner? They have some of the cheapest and fastest servers; it's the best thing someone can get for their buck. $150 per month is a huge cost, and I don't think you'll be able to collect that every month. By the way, can you demonstrate the difference between your search engine and Bitcointalk's (Google) search engine? I did some searches just now and Bitcointalk gave me more accurate results than Talksearch.io. Do you plan to add more filters, like sort by ascending/descending, search by user X, and so on? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 31, 2025, 02:12:21 PM Why didn't you host your website on Hetzner? They have some of the cheapest and fastest servers; it's the best thing someone can get for their buck. $150 per month is a huge cost, and I don't think you'll be able to collect that every month. Because I already own a Hetzner server, and I don't want to get kicked out by the reseller due to a hack or DDoS. This is what the Google Cloud setup looks like, by the way: https://www.talkimg.com/images/2025/03/31/lyffN.png So not only are there two content servers, which aren't as big as what you can order with an HDD on other websites, but there's also a server for the internal dashboard, the ingestion server that's going to scrape Bitcointalk, and the server that actually delivers search results (enterprise search, on the right of the screen). All this allows me to easily experiment with different search algorithms and parameters. I also get free customer support, which allows me to recover from any hardware failure within hours (literally), like yesterday. Google App Engine itself is virtually free, and that's where I host the website. In my experience, it is much better than hosting it myself, because I don't have to worry about downtime or attacks like with my other sites.
By the way, can you demonstrate the difference between your search engine and Bitcointalk's (Google) search engine? I did some searches just now and Bitcointalk gave me more accurate results than Talksearch.io. I haven't gotten a chance to enhance the Talksearch result quality yet. Do you plan to add more filters, like sort by ascending/descending, search by user X, and so on? Yes, of course. Just give me a couple of days. There's other stuff I need to take care of first. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: dkbit98 on March 31, 2025, 07:41:17 PM Nice service, and it opens everything really fast, but I don't think this is worth paying $150 per month, especially if you want to make this project sustainable in the long term.
One of my suggestions is to add an optional dark theme switch, and maybe some default search terms and predefined board locations. As for donations, you should add all addresses or links to the website footer. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: libert19 on April 01, 2025, 10:20:51 AM I searched for 'libert19's first post', expecting the results to show my first post, but they didn't. It may be a stupid query, but it's a simple one and should deffo work.
Another query I did was "I was fucked, do not repeat same mistakes as me", which is the title of this (https://bitcointalk.org/index.php?topic=5289504.msg55596163#msg55596163) thread, and the results were irrelevant instead of showing the thread I was expecting. The title was literally copy-pasted here!? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BenCodie on April 01, 2025, 11:10:27 AM I searched for 'libert19's first post', expecting the results to show my first post, but they didn't. It may be a stupid query, but it's a simple one and should deffo work. This is not an AI or conversational search engine. It works from keywords. The engine is not going to know that "libert19" is the user, nor is it going to recognize "first post" as a search for the first post; it's going to search for libert19, first, and post in the topic title/content. I think you're a bit used to AI ::) I think this is better thought of as a pre-AI search engine. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: JollyGood on April 01, 2025, 11:16:39 AM Although congratulations are without a doubt in order, I think the general consensus is clear that $150 per month is far too expensive for any endeavour of this nature.
As for the search results, I searched for "Satoshi", and the results were in no chronological (date) order, with no way to filter them so I could view them how I wanted (such as new, old, mentioned in subject, mentioned in post). Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: KingsDen on April 01, 2025, 09:03:31 PM I searched for 'libert19's first post', expecting the results to show my first post, but they didn't. It may be a stupid query, but it's a simple one and should deffo work. This is not an AI or conversational search engine. It works from keywords. The engine is not going to know that "libert19" is the user, nor is it going to recognize "first post" as a search for the first post; it's going to search for libert19, first, and post in the topic title/content. I think you're a bit used to AI ::) I think this is better thought of as a pre-AI search engine. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BenCodie on April 02, 2025, 11:16:11 AM I searched for 'libert19's first post', expecting the results to show my first post, but they didn't. It may be a stupid query, but it's a simple one and should deffo work. This is not an AI or conversational search engine. It works from keywords. The engine is not going to know that "libert19" is the user, nor is it going to recognize "first post" as a search for the first post; it's going to search for libert19, first, and post in the topic title/content. I think you're a bit used to AI ::) I think this is better thought of as a pre-AI search engine. Maybe it's a consequence of replacing search engines with conversational AI, though from what I've witnessed, libert19 commonly makes these kinds of posts (to the point where I can hardly tell if it is extremely sophisticated trolling or natural). Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 03, 2025, 09:54:24 AM New content update v1.0.1 published
This update enhances search functionality and includes optimizations to the search algorithm to return more relevant results. Search features:
App features:
The old versions continue to be cached in the cloud, allowing me to roll back immediately if any bugs are detected. There should not be any, but it's a good failsafe to have. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: libert19 on April 04, 2025, 06:35:04 AM Maybe it's a consequence of replacing search engines with conversational AI, though from what I've witnessed, libert19 commonly makes these kinds of posts (to the point where I can hardly tell if it is extremely sophisticated trolling or natural). That was a genuine query and I don't troll; maybe my posts just come off like that. Regarding the query, even before AI, if you searched for something on Google, for example "Messi's first match", you would find a relevant result, so my intention was the same here. Meanwhile, I repeated the query below after the update above, and I am still not getting the expected result. Another query I did was "I was fucked, do not repeat same mistakes as me", which is the title of this (https://bitcointalk.org/index.php?topic=5289504.msg55596163#msg55596163) thread, and the results were irrelevant instead of showing the thread I was expecting. The title was literally copy-pasted here!? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 04, 2025, 06:50:59 AM Meanwhile, I repeated the query below after the update above, and I am still not getting the expected result. Another query I did was "I was fucked, do not repeat same mistakes as me", which is the title of this (https://bitcointalk.org/index.php?topic=5289504.msg55596163#msg55596163) thread, and the results were irrelevant instead of showing the thread I was expecting. The title was literally copy-pasted here!? Your topic is not in the index yet. As I mentioned earlier, a third of the topics could not be uploaded because the process was interrupted when I ran out of disk space. I'm still working on adding them.
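NotATether explains that a third of the topics never made it into the index because the import was interrupted when the disk filled. Together with the duplicate clean-up mentioned in the post-mortem, this suggests making the import safe to re-run. One common approach is to use the forum message ID as the Elasticsearch document `_id` with the Bulk API's `index` action. Re-importing a post then overwrites it instead of creating a duplicate, and the work can be sent in bounded batches. This is sketched here as an assumption, not as Talksearch's actual script. The field name `msg_id` and the index name `posts` are hypothetical.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def bulk_bodies(posts: Iterable[dict], index: str = "posts",
                batch_size: int = 500) -> Iterator[str]:
    """Yield newline-delimited Bulk API bodies of at most `batch_size` posts each.

    Each post dict is assumed to carry a unique forum message id under "msg_id"
    (a hypothetical field name). Using it as _id with the "index" action means a
    re-run overwrites existing documents instead of adding duplicates.
    """
    it = iter(posts)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        lines = []
        for post in batch:
            # Action line: target index and document id.
            lines.append(json.dumps({"index": {"_index": index, "_id": str(post["msg_id"])}}))
            # Source line: the document itself.
            lines.append(json.dumps(post))
        # The Bulk API requires the request body to end with a newline.
        yield "\n".join(lines) + "\n"
```

Each yielded string is a complete body that could be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`. Because the IDs are deterministic, resuming after a crash at worst rewrites the last batch rather than duplicating it.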
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: satscraper on April 04, 2025, 12:32:24 PM @NotATether, thanks for the powerful tool.
Talksearch.io is so important for users that I took the initiative to translate your topic and share it (https://bitcointalk.org/index.php?topic=5537277) on the Russian local board of the forum. I kept the first-person narration, i.e. as NotATether, to ensure that no details were lost in translation. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: bchannel on April 05, 2025, 03:13:13 PM Thanks for the work. There is still a lot to do, but this is a colossal amount of effort.
Sent a small donation of $100, good luck.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 06, 2025, 07:27:27 AM
New app update v1.0.2 published
This is a minor update that adds the donation addresses to the footer of the page, as suggested by Igebotz. As a reminder, I am still working to index all topics as well as new ones, so stay tuned for further developments on that front.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BenCodie on April 07, 2025, 03:39:42 AM
Maybe it's a consequence of replacing search engines with conversational AI, though from what I've witnessed, libert19 commonly makes these kinds of posts (to the point where I can hardly tell if it is extremely sophisticated trolling or natural). That was a genuine query and I don't troll; maybe my posts just come off like that. Regarding the query: even before AI, if you searched for something on Google, for example "Messi's first match", you would find relevant results, so the intention was the same here.
In that case, I think it's a bit much to expect from a tool like this, which is probably just searching through indexed data and nothing more than that...
Meanwhile, I repeated the query below after the update above, and I am still not getting the expected result. Another query I tried was "I was fucked, do not repeat same mistakes as me", which is the title of this (https://bitcointalk.org/index.php?topic=5289504.msg55596163#msg55596163) thread, and the results were irrelevant instead of showing the thread I was expecting. The title was literally copy-pasted here!?
...Though if it was just searching indexed data, this should work. I suppose the answer might be:
- Indexed ALL posts (up to February 2025)
*Operational bugs have prevented me from uploading one-third of the posts, which is actively being fixed
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 07, 2025, 09:11:06 AM
Maintenance alert
There will be planned downtime of a few hours (at most) today in order to work on the Elasticsearch cluster.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: JollyGood on April 07, 2025, 09:25:07 AM
Do you intend to move the hosting elsewhere in an attempt to lower the monthly costs? Even that generous donation of $100 only covers 66% of the monthly cost (to run the service for a month). If the project is going to have a long-term future, I think finding an alternative host with lower costs will be an important step.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 07, 2025, 11:08:12 AM
Do you intend to move the hosting elsewhere in an attempt to lower the monthly costs? Even that generous donation of $100 only covers 66% of the monthly cost (to run the service for a month). If the project is going to have a long-term future, I think finding an alternative host with lower costs will be an important step.
Dedicated servers are prepaid and often sold in units of one, so while they are useful for small projects, they're a very bad fit for a service that needs to scale and must always remain online (the irony, I know). I plan on adding Altcoinstalks and some other forums and mailing lists to the index eventually.
The current maintenance is to fix up the cluster "indices" (entities inside the search software used for storing post data) that I screwed up when attempting to upload the rest of the post data a few days ago. That's why the maintenance page is required, as search is not possible without them. Maintenance is complete.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Welsh on April 07, 2025, 03:09:44 PM
I've always been impressed by the dedication of the community to improving the user experience for us all. This one addresses probably one of the most mentioned problems in recent years (in fact, for a very long time). I've tested it out a little, and it seems to be a lot better than the forum search. I'm sure some users will be put off by the fact that it's on a third-party website, but hopefully this site gets some use in the long term.
Thanks NotATether for yet another great contribution to the community!
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: JollyGood on April 08, 2025, 09:46:55 AM
I reiterate the comments made by Welsh; it is commendable that forum members manage to find ways to improve/assist in different ways according to their skills and depending on how much time they have on their hands. Having said that, if you have paid in advance for the servers and are facing $150 payments every month subsequently, what has been your total expenditure and when does the account need replenishing?
Dedicated servers are prepaid and often sold in units of one, so while useful for small projects, they're a very bad fit for a service that needs to scale and must always remain online (the irony, I know).
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 08, 2025, 10:52:37 AM
I reiterate the comments made by Welsh; it is commendable that forum members manage to find ways to improve/assist in different ways according to their skills and depending on how much time they have on their hands. Having said that, if you have paid in advance for the servers and are facing $150 payments every month subsequently, what has been your total expenditure and when does the account need replenishing?
This infrastructure is all postpaid, and my bill arrives at the beginning of each month.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: katanic97 on April 08, 2025, 07:13:36 PM
Croatian translation by katanic97 (https://bitcointalk.org/index.php?action=profile;u=1856852;sa=summary/)
Topic: Talksearch.io - Napredni pretraživač za Bitcointalk ("Advanced search engine for Bitcointalk") (https://bitcointalk.org/index.php?topic=5537640.msg65257270#msg65257270)
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BitcoinGirl.Club on April 08, 2025, 10:41:58 PM
Roadmap
____________________
- Continuously ingest posts from Bitcointalk (immediate priority)
- Refine search result quality
- Introduce filters for user, date, etc.
You need to add features ASAP, because right now, how is it better (https://talksearch.io/search?q=Bitcoingirl) than going to a search engine like Google and searching for a string with a query such as 'site:bitcointalk.org "bitcoingirl"'? Congratulations on your project. It's nice to see community members trying to create tools that will be useful for the community.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Porfirii on April 11, 2025, 04:34:11 PM
Congratulations NotATether for your search engine, and thank you for sharing it with all users :)
Several members of the AoBT (https://bitcointalk.org/index.php?topic=5442314.0) have proposed translating this topic and posting it on our respective local boards to help spread the word, so with your permission we'd like to reserve the following languages among those that remain untranslated: Polish, Romanian, Turkish, French, Spanish, Ukrainian, Filipino, Pidgin, Bangla, Portuguese, Urdu and Hindi. Keep up the good work!
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Upgrade00 on April 11, 2025, 09:45:55 PM
Checked this out again and there have been a couple of good updates, and it's still as fast as it was. I assume it's a struggle to maintain the speed while indexing more data, but it's working well so far.
I will be adding this to my top options for searching the forum now. I think it would be great if theymos could add some custom search engines like this one and ninjastic when a user clicks on the search icon.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on April 12, 2025, 06:53:07 AM
I was doing some tests until I realized that there is no way to choose the order in which the results appear.
It would be interesting to be able to choose whether we want to see the most recent result first or the oldest one, or whether we want to see the results with the most frequent or most similar terms used in the search. I think that would be interesting, just a suggestion.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 12, 2025, 07:51:05 AM
I was doing some tests until I realized that there is no way to choose the order in which the results appear. It would be interesting to be able to choose whether we want to see the most recent result first or the oldest one, or whether we want to see the results with the most frequent or most similar terms used in the search. I think that would be interesting, just a suggestion.
I'm working on that.
Congratulations NotATether for your search engine, and thank you for sharing it with all users :) Several members of the AoBT (https://bitcointalk.org/index.php?topic=5442314.0) have proposed translating this topic and posting it on our respective local boards to help spread the word, so with your permission we'd like to reserve the following languages among those that remain untranslated: Polish, Romanian, Turkish, French, Spanish, Ukrainian, Filipino, Pidgin, Bangla, Portuguese, Urdu and Hindi. Keep up the good work!
I will add the new translated topics now.
Checked this out again and there have been a couple of good updates, and it's still as fast as it was. I assume it's a struggle to maintain the speed while indexing more data, but it's working well so far. I will be adding this to my top options for searching the forum now. I think it would be great if theymos could add some custom search engines like this one and ninjastic when a user clicks on the search icon.
Moving the data to the larger "warm" node means that searches have become slightly slower, but I plan to improve this by splitting the index of Bitcointalk posts into smaller parts, specifically non-English posts, posts in Archival, and spam (e.g. Bounties).
Then whatever is left over will not be as large, which might make search queries faster. Currently, I can scrape or update any topic, but I still have to automate it.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Adiljutt156 on April 14, 2025, 11:57:54 AM
Hello NotATether!
I am a member of the AOBT Gang (The Alliance Of Bitcointalk Translators). A new Urdu translation of this topic is now available.
Topic: Talksearch.io - Advanced Bitcointalk Search Engine (https://bitcointalk.org/index.php?topic=5536692.msg65221234#msg65221234)
Translation: Talksearch.io - ایڈوانسڈ بٹ کوائن ٹاک سرچ انجن ("Advanced Bitcoin Talk Search Engine") (https://bitcointalk.org/index.php?topic=232519.msg65274038#msg65274038)
Thanks :)
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: examplens on April 15, 2025, 01:07:36 PM
I was doing some tests until I realized that there is no way to choose the order in which the results appear.
Tested it for the first time, and that's what I'm missing too: sorting by creation time, and possibly by the number of views or replies on a topic. It would help to identify more relevant results.
It would be interesting to be able to choose whether we want to see the most recent result first or the oldest one, or whether we want to see the results with the most frequent or most similar terms used in the search. I think that would be interesting, just a suggestion.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: cygan on April 15, 2025, 05:29:51 PM
@NotATether maybe my PM has somehow disappeared in your message center, or you haven't had the time yet, or you simply forgot about it... ;)
I wanted to remind you again that the Polish translation (https://bitcointalk.org/index.php?topic=5537249.0) of your search engine topic is now ready; it would be nice if you included it in your overview :)
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 18, 2025, 11:32:45 AM
An update to the enhanced search feature:
A new dataset is being uploaded to Elasticsearch. This dataset is richer than the current unprocessed posts and includes more metadata, such as the lock type, scrape time, and check time. Along with other parameters, these will be used to determine the order in which topics should be checked for updates and how often they will be checked. An experimental quality score is also included with each post, in an attempt to deprioritize low-quality posts and sig spam from the search results. In an effort to remove irrelevant data such as quotes from the search results, posts are now divided into chunks, delimited by the presence of a quote or a horizontal line separator. This upload process was started yesterday, and about 5 million records have been indexed so far, out of a total estimated to be around 120 million.
https://www.talkimg.com/images/2025/04/18/xiIAD.png
The v2 indices contain the data which Talksearch will use for searching in the future. Also, local-language posts are categorized to facilitate local search. I continue to work on automatic scraping support. However, the v2 dataset is more recent than the original and contains posts up to March 2025. New translated ANN links will be added shortly.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BitcoinGirl.Club on April 18, 2025, 11:39:06 AM
http://talksearch.io/search?q=bitcoingirl
Taking ages to load. Is the search engine working?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on April 18, 2025, 02:27:57 PM
This upload process was started yesterday, and about 5 million records have been indexed so far, out of a total estimated to be around 120 million.
That's a lot of data. Is the system cataloging all the words and things like that? How are you processing all the information?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on April 18, 2025, 02:57:45 PM
An experimental quality score is also included with each post, in an attempt to deprioritize low-quality posts and sig spam from the search results.
I'm curious to see the posts per user sorted by your experimental algorithm. Would that be possible to search for?
about 5 million records have been indexed so far, out of a total estimated to be around 120 million.
Is 120 million the number of posts + edits?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 18, 2025, 03:37:35 PM
http://talksearch.io/search?q=bitcoingirl Taking ages to load. Is the search engine working?
The warm tier of nodes is slower than the hot/content tier when fetching data, but has about 4x more storage capacity. Currently, the v1 search (talksearch_bitcointalk in the picture) is running on the warm nodes. It used to be on the hot nodes, where searches were quite fast, but it got moved to warm while I was fixing my cluster. All v2 indices besides English are on the hot nodes. However, I'm not particularly satisfied with the amount of low-quality posts present in the English index, so I'm considering moving only the high-quality English posts to the hot nodes. Then there would be a checkbox on the site that reads "Only search high-quality posts". The issue is, I currently don't have a reliable way to measure post quality. By the way, http:// does not currently work on Talksearch. Use https://.
I am thinking about redirecting all traffic to the https:// version anyway.
That's a lot of data. Is the system cataloging all the words and things like that? How are you processing all the information?
Yes! In fact, I am excited to show you the advanced classifications that are available for the data. Elasticsearch has a number of field types for processing JSON that go well beyond strings, numbers, and booleans. I am talking about things like rank features (common in search applications), vectors, points, geolocation types, dates, binary data, and much more:
https://www.talkimg.com/images/2025/04/18/x8ydd.png
These are then used by Elasticsearch's query language. It is very advanced and can find results that are much more relevant than, e.g., a regex-based search. This is where the true power lies. Here's a list of things its query language is able to do:
- It can boost or penalize certain keywords
- It supports semantic search, e.g. "satoshi nakamoto identity" is supposed to return topics about Hal Finney or Nick Szabo
- It can apply autocorrect and "keyboard shift"
- It can match text that only appears in a certain position in the post
- It can find "More like this" results
- You can attach scripts to create complex searches (but this is slow)
- Feature-based search is fully supported, so you can make queries for posts based on username, board, topic title, date, etc. in exactly the same manner as for the post body. (This last part is sorely missing from Google Site Search.)
There's a lot more I didn't list that you can find here (https://www.elastic.co/docs/reference/query-languages/querydsl). But the exact algorithm I use is proprietary.
I'm curious to see the posts per user sorted by your experimental algorithm. Would that be possible to search for?
That's a great idea, but with the current version of Talksearch this is not possible. It will eventually be available, though.
Is 120 million the number of posts + edits?
No, 120 million is an estimate of the total number of chunks there would be if you split posts by quote body or by horizontal line ([hr]) tags. For example, this post I am writing would be indexed as 4 chunks, as they are separate pieces of information. This lets users see results that are as specific as possible on the search results page without having to navigate to the Bitcointalk post. I assume there is, on average, a bit less than one quote or line break per post, which is why there are a bit less than 2 chunks per post on average (quotes etc. plus one). I only store the latest revision of each post.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: dkbit98 on April 18, 2025, 05:24:58 PM
An update to the enhanced search feature
It is working very slowly for me, and I only entered a simple two-word search term. Maybe you should allow users to narrow down the search to specific boards/topics only; that would certainly speed things up significantly. You could also add a search for specific members only, to speed things up even more. I compared search on Ninjastic.space with the newly updated Talksearch, and Ninjastic gave much better and faster results.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 20, 2025, 04:48:39 AM
Searches are temporarily unavailable while I perform some routine maintenance on the nodes.
Thanks for understanding. Update: maintenance is complete. This should fix the search performance.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BitcoinGirl.Club on April 20, 2025, 05:18:33 AM
Update: maintenance is complete. This should fix the search performance.
It's a little faster than the last time I checked.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 20, 2025, 05:39:41 AM
It's a little faster than the last time I checked.
The nodes ran out of memory processing the uploaded posts; that's why performance sucked for a while. This was resolved by increasing the size of the hot/content node and deleting the warm node. This has cut my storage capacity by about half (I now have about 180GB of disk in total) and increases the total cost marginally, by about $10, but it will keep the node stable for some time. I will have to slow down the rate at which I'm uploading posts to the cluster in order to avoid this sort of thing happening again. However, the new data is nowhere near 100% uploaded, so I will have to figure out a way to make that faster so that I can make Talksearch use the higher-quality v2 indices.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: SamReomo on April 20, 2025, 01:43:25 PM
A big congratulations from my side, OP. You've really solved one of the biggest issues of Bitcointalk, but like others, I also think that you're paying way more for it than needed. $150 + $10 a month is a huge amount to pay for a service like this. However, I hope that you receive some good donations or find some good sponsors so that it won't be a burden on your shoulders alone.
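One way to slow down the upload rate, as planned above, is to send chunks to Elasticsearch's `_bulk` endpoint in fixed-size batches, with a pause between requests. The minimal sketch below is not the actual Talksearch uploader. It assumes one chunk of plain text per input line. The `ES_URL` default, the `body` field, and the `to_bulk_ndjson`, `send_bulk` and `bulk_upload` names are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: throttled uploads to Elasticsearch's _bulk API.
ES_URL=${ES_URL:-http://localhost:9200}   # assumption: cluster address

# Turn one chunk per input line into _bulk NDJSON: an action line followed
# by a source line. Only backslash, double quote, tab and CR are escaped;
# other control characters are assumed to be stripped beforehand.
to_bulk_ndjson() {
  local index=$1 s
  while IFS= read -r s || [[ -n $s ]]; do
    s=${s//\\/\\\\}       # backslash -> \\
    s=${s//\"/\\\"}       # quote     -> \"
    s=${s//$'\t'/\\t}     # tab       -> \t
    s=${s//$'\r'/\\r}     # CR        -> \r
    printf '{"index":{"_index":"%s"}}\n{"body":"%s"}\n' "$index" "$s"
  done
}

# POST one NDJSON batch (read from stdin) to the cluster.
send_bulk() {
  curl -sS -H 'Content-Type: application/x-ndjson' \
       --data-binary @- "$ES_URL/_bulk" > /dev/null
}

# bulk_upload INDEX BATCH_SIZE PAUSE_SECONDS [SENDER]
# Batches are counted in chunks, not NDJSON lines, so an action line is
# never separated from its source line.
bulk_upload() {
  local index=$1 batch_size=$2 pause=$3 sender=${4:-send_bulk}
  local batch=() line
  while IFS= read -r line || [[ -n $line ]]; do
    batch+=("$line")
    if (( ${#batch[@]} >= batch_size )); then
      printf '%s\n' "${batch[@]}" | to_bulk_ndjson "$index" | "$sender"
      batch=()
      sleep "$pause"   # give the cluster time to absorb the batch
    fi
  done
  # Flush the final, possibly smaller, batch.
  if (( ${#batch[@]} > 0 )); then
    printf '%s\n' "${batch[@]}" | to_bulk_ndjson "$index" | "$sender"
  fi
}
```

For example, `bulk_upload talksearch_v2 1000 2 < chunks.txt` would send 1000 chunks per request and wait two seconds between requests (the index and file names are hypothetical). Separately, Elasticsearch's own guidance on indexing speed suggests temporarily disabling or lengthening `index.refresh_interval` during large loads, then restoring it afterwards.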
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on April 20, 2025, 08:08:00 PM
Hey NotATether, please be aware that 2 more translations of your topic were made by AOBT. I hope this is good news :)
Urdu (https://bitcointalk.org/index.php?topic=232519.msg65274038#msg65274038) translation, made by Adiljutt156
Polish (https://bitcointalk.org/index.php?topic=5537249) translation, made by cygan
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: PX-Z on April 20, 2025, 10:52:49 PM
Good project and initiative!
@OP, could you clarify whether the $150 covers only the hosting costs, or whether it also includes access to Elasticsearch, since I noticed it has paid options too? I tried accessing the site just yesterday, but it was down at the time. However, it's noticeably faster now; great improvement so far.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: nakamura12 on April 21, 2025, 03:55:08 AM
On the page number slider, one has to move one page at a time forward or backward. Could there be an option to jump to the specific page you want to check out? If it gets too long, it can be trimmed. Talksearch will only return 10 pages max, so it would be better if I made them all clickable. This will be done later, as I am on holiday tomorrow.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 21, 2025, 04:19:06 AM
Good project and initiative! @OP, could you clarify whether the $150 covers only the hosting costs, or whether it also includes access to Elasticsearch, since I noticed it has paid options too? I tried accessing the site just yesterday, but it was down at the time. However, it's noticeably faster now; great improvement so far.
It includes Elasticsearch (which is "self-hosted" on Google Cloud and makes up the majority of the bill).
Edit: It appears that quote splitting over 56 million posts is going to take many days (not just a few days, as I imagined at first), and this is with the processor and the document uploader running in parallel. So while we wait, I am going to prioritize adding fine-grained search fields like user, title, and date to the website's search parameters. The remaining posts from that collection can also be uploaded to the old v1 index that's currently being used for search. That one at least shouldn't take too long, because only new or previously missing posts are going to be added there.
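Fine-grained fields like user, title and date map naturally onto an Elasticsearch `bool` query. Full-text `match` clauses score the results, while `filter` clauses narrow them down without affecting the score. The sketch below only prints a request body. The field names (`body`, `user`, `date`) and the `build_query`/`esc` helpers are assumptions for illustration, not Talksearch's real mapping or its ranking, which the author describes as proprietary. The `term` filter also assumes `user` is mapped as a keyword field.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build an Elasticsearch _search body that combines
# free text with optional user and date-range filters.

# Minimal JSON string escaping (backslash and double quote only).
esc() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# build_query TEXT [USER] [FROM_DATE] [TO_DATE]
# Empty arguments are skipped; dates are passed through as given (e.g. 2025-05-01).
build_query() {
  local text=$1 user=${2:-} from=${3:-} to=${4:-}
  local filters=()
  # Exact match on a (presumed) keyword field: filters do not affect scoring.
  [[ -n $user ]] && filters+=("{\"term\":{\"user\":\"$(esc "$user")\"}}")
  if [[ -n $from || -n $to ]]; then
    local range=()
    [[ -n $from ]] && range+=("\"gte\":\"$(esc "$from")\"")
    [[ -n $to ]] && range+=("\"lte\":\"$(esc "$to")\"")
    filters+=("{\"range\":{\"date\":{$(IFS=,; echo "${range[*]}")}}}")
  fi
  # Full-text match on the post body provides the relevance score.
  printf '{"query":{"bool":{"must":[{"match":{"body":"%s"}}],"filter":[%s]}}}\n' \
    "$(esc "$text")" "$(IFS=,; echo "${filters[*]}")"
}
```

A usage example would be `build_query 'large_dir' 'LoyceV' 2025-05-01 | curl -sS -H 'Content-Type: application/json' --data-binary @- "$ES_URL/talksearch_v2/_search"`, where the index name and `ES_URL` are hypothetical.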
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on April 24, 2025, 09:34:20 AM
Hello again NotATether!
In addition to my previous post (https://bitcointalk.org/index.php?topic=5536692.msg65298114#msg65298114), I'd like to let you know that three more translations of this thread have been made by AOBT:
Turkish (https://bitcointalk.org/index.php?topic=5539433.0) translation, made by mela65
Portuguese (https://bitcointalk.org/index.php?topic=5539449.0) translation, made by r_victory
Romanian (https://bitcointalk.org/index.php?topic=5539441.0) translation, made by myself
At this moment, this topic has 10 translations, so from AOBT's perspective it is considered "Done". However, if new translations are made, I will let you know. Cheers!
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 24, 2025, 04:09:06 PM
Hello again NotATether! In addition to my previous post (https://bitcointalk.org/index.php?topic=5536692.msg65298114#msg65298114), I'd like to let you know that three more translations of this thread have been made by AOBT: Turkish (https://bitcointalk.org/index.php?topic=5539433.0) translation, made by mela65; Portuguese (https://bitcointalk.org/index.php?topic=5539449.0) translation, made by r_victory; Romanian (https://bitcointalk.org/index.php?topic=5539441.0) translation, made by myself. At this moment, this topic has 10 translations, so from AOBT's perspective it is considered "Done". However, if new translations are made, I will let you know. Cheers!
Thank you. My apologies for not adding the translations to the OP yet. I am busy fighting my upload scripts to get them sending posts to Elasticsearch at an acceptable rate. For some reason, my usually fast hard disk drive on my server has slowed down massively, even after a reboot, and I'm not exactly sure why.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on April 24, 2025, 04:40:37 PM
my usually fast hard disk drive on my server has slowed down massively, even after a reboot, and I'm not exactly sure why.
Some (shitty) disks drop in performance after long sustained writes, but since you're using Google Cloud, that shouldn't happen.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Vod on April 24, 2025, 04:43:17 PM
For some reason, my usually fast hard disk drive on my server has slowed down massively, even after a reboot, and I'm not exactly sure why.
Are your indexes normally set for reading? You should disable them while doing massive updates.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Porfirii on April 25, 2025, 10:56:14 AM
Hello again NotATether! In addition to my previous post (https://bitcointalk.org/index.php?topic=5536692.msg65298114#msg65298114), I'd like to let you know that three more translations of this thread have been made by AOBT: Turkish (https://bitcointalk.org/index.php?topic=5539433.0) translation, made by mela65; Portuguese (https://bitcointalk.org/index.php?topic=5539449.0) translation, made by r_victory; Romanian (https://bitcointalk.org/index.php?topic=5539441.0) translation, made by myself. At this moment, this topic has 10 translations, so from AOBT's perspective it is considered "Done". However, if new translations are made, I will let you know. Cheers!
Thank you. My apologies for not adding the translations to the OP yet. I am busy fighting my upload scripts to get them sending posts to Elasticsearch at an acceptable rate. For some reason, my usually fast hard disk drive on my server has slowed down massively, even after a reboot, and I'm not exactly sure why.
Hi! Sorry for the short delay. I've just finished the Spanish (https://bitcointalk.org/index.php?topic=5539433.0) translation of this thread. Although we usually mark topics as "Done" when they reach the 10-translation mark, we are going to keep this one "in progress" for a few more days, just in case other members finish theirs soon too. Gazeta will let you know (ty!!!).
As this tool is still being developed and changes to the OP are expected soon, please let us know when we need to update our translations. You can post directly in our thread (https://bitcointalk.org/index.php?topic=5442314) or, if you prefer, send a PM to Gazeta or me. We'll also be monitoring this thread from time to time.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 30, 2025, 04:15:42 AM
I don't want to keep you guys waiting, so should I release a mode for Talksearch that queries data from the new index?
There are about 2.6 million chunks so far. A chunk is just a section of a post delimited by someone's quote or by a horizontal line. The upload is not slow: it's uploading 1000 chunks every couple of seconds, but there are simply millions of them. This mode will not replace the current search engine yet.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Hossain Risfa on May 01, 2025, 05:38:54 PM
I've finished my translation on our Bangla local board. Thank you very much for giving me permission to translate the post. I was quite nervous and didn't know whether I would be able to translate it accurately. After I posted my translation on our local board, some senior brothers told me that I had done a great translation and that my translation skills are good. They also gave me some advice. Thank you @NotATether for giving me permission and a chance to translate it; it gave me experience. Thank you also for thinking that I, as a newbie, might be able to do it. Sir, I tried to do my best. Here is the link to my translated post:
Talksearch.io - Advanced Bitcointalk Search Engine (https://bitcointalk.org/index.php?topic=631891.msg65330497#msg65330497), translated on the Bangla local board by Hossain Risfa
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 02, 2025, 02:30:52 PM
Well, it looks like I've hit another snag while uploading. Thankfully, this has nothing to do with Elasticsearch, but with my scraping server.
As you might be aware, I scrape the posts on my server before processing them. The processing involves splitting up posts by quotes, which creates a series of chunks for each post, usually 1-3. These are saved to disk. Another part of the program then reads them into memory, and after that they are uploaded to Elasticsearch. It seems that the splitting process has created so many chunks that I simply cannot create any more in that folder. Any attempt to do so leads to an error. It might have something to do with the fact that there are tens of millions of these files (inodes) in the filesystem, but I don't know if ext4 has such a limitation. And I'm definitely not out of disk space (though the Elasticsearch server could be a different story when this is all uploaded), as not even 50% of the disk space is used so far. (Strangely, I'm not out of inodes either.)
One solution to this problem could be to avoid saving these chunks to disk altogether and run the processing and upload as one step. This is what I was doing for several days, but then I had to diagnose performance issues on the cluster, so it got interrupted. Performance was bad after that, though, because I was reading already-uploaded chunks from the disk.
Another solution would be to simply avoid processing low-quality posts, e.g. Gambling Discussion. This will make for a smaller set that takes vastly less space. I estimate that around 15% of all Bitcointalk posts are made on Gambling Discussion. This is mostly sig spam that nobody wants to read, so there's no use returning it in search results. As a side effect, this will bring features resembling Google de-indexing to Talksearch, but I will never knowingly de-index posts I don't agree with. There will still be an index containing all existing forum posts, but that will be reserved for detailed search and the API only.
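The quote-splitting step described above can be sketched in a few lines of awk wrapped in a shell function. This is a guess at the rules based on this thread, not the author's actual processor. The sketch assumes:
- quoted text is dropped entirely;
- every `[quote]...[/quote]` block (nested or not) or `[hr]` tag ends the current chunk;
- tags are lowercase;
- other BBCode tags such as `[b]` are left in place.

The `split_chunks` name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split one BBCode post (read from stdin) into chunks.
# Output: one chunk per line, whitespace collapsed, empty chunks skipped.
split_chunks() {
  awk '
    { text = text $0 "\n" }
    END {
      depth = 0; chunk = ""
      while (length(text) > 0) {
        if (substr(text, 1, 6) == "[quote") {
          # Opening tag, possibly with attributes, e.g. [quote author=x]
          tag_end = index(text, "]")
          if (tag_end == 0) tag_end = length(text)   # malformed tag
          if (depth == 0) emit()                     # a quote ends the chunk
          depth++
          text = substr(text, tag_end + 1)
        } else if (substr(text, 1, 8) == "[/quote]") {
          if (depth > 0) depth--
          text = substr(text, 9)
        } else if (substr(text, 1, 4) == "[hr]") {
          if (depth == 0) emit()                     # a horizontal line ends the chunk
          text = substr(text, 5)
        } else {
          # Copy everything up to the next "[" in one step; text inside
          # a quote (depth > 0) is skipped instead of copied.
          pos = index(substr(text, 2), "[")
          n = (pos == 0) ? length(text) : pos
          if (depth == 0) chunk = chunk substr(text, 1, n)
          text = substr(text, n + 1)
        }
      }
      emit()
    }
    function emit() {
      gsub(/[ \t\n]+/, " ", chunk)   # collapse whitespace and newlines
      gsub(/^ | $/, "", chunk)       # trim the leading/trailing space
      if (chunk != "") print chunk
      chunk = ""
    }
  '
}
```

Because the chunks go to standard output, they can be piped straight into an uploader rather than written out as one file per chunk, which is the single-step approach mentioned above. For example, run `split_chunks < post.txt`, where the file name is hypothetical.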
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on May 02, 2025, 03:29:34 PM
It might have something to do with the fact that there are tens of millions of these files (inodes) in the filesystem, but I don't know if ext4 has such a limitation. And I'm definitely not out of disk space (though the Elasticsearch server could be a different story when this is all uploaded), as not even 50% of the disk space is used so far. (Strangely, I'm not out of inodes either.)
I have some experience dealing with tens of millions of files, and apart from making a directory view terribly slow, it works fine as long as you have enough inodes. On ext4, with default settings, it looks like a ten times larger disk does not get ten times more inodes. I checked a few disks, and typical limits are tens to hundreds of millions of inodes per disk. Just enter df -hi and it tells you what you need to know.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Z_MBFM on May 02, 2025, 06:52:33 PM
Whenever I had something in mind, I used to search Google first to see if there was a related topic on this forum. I could find information there too, but since Google is a general search engine, there were many more results besides the forum-related ones.
However, using Talksearch, I found that it makes forum-related searching much smoother, and it is quite effective. Nice job, OP!
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 03, 2025, 09:10:01 AM
I have some experience dealing with tens of millions of files, and apart from making a directory view terribly slow, it works fine as long as you have enough inodes. On ext4, with default settings, it looks like a ten times larger disk does not get ten times more inodes. I checked a few disks, and typical limits are tens to hundreds of millions of inodes per disk. Just enter df -hi and it tells you what you need to know.
About 18% of my inodes are used. ls ran for a horribly long time, but I finally got output:
Code: zenulabidin@zerstrorer ~ % ls -l /opt/talksearch/processed_chunks | wc -l
So about 30 million files. Thank goodness for zsh, otherwise I wouldn't have known the run time of this. I'll see if this long directory listing time is the cause of "No space left on device" bailing out in the filesystem code and/or the kernel.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on May 03, 2025, 09:24:43 AM
About 18% of my inodes are used.
As far as I know, there are no limits to the number of files per directory on ext4, so this is weird. I'm pretty sure I've had more files in one directory before I added subdirectories for faster listings.
~ So about 30 million files. ~ I'll see if this long directory listing time is the cause of "No space left on device" bailing out in the filesystem code and/or the kernel.
I'm going to test it :) I don't want this many files on my own system, so I use a temporary server:
Code: 16GB PKVM
Code: i=1; while test $i -le 40000000; do echo "Hello world!" > $i; i=$((i+1)); done
I'll be damned: No space left on device! I got to 29,272,362 files with 22M inodes free.
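LoyceV's two hints above (check inodes with df -hi, and use subdirectories) can be combined in a small sketch. It is illustrative only: `inode_usage` reads the same counters df -i reports via `os.statvfs`; `count_entries` counts names with `os.scandir`, which does not need to stat every file the way `ls -l` does; and `sharded_path`/`write_chunk` are hypothetical helpers that fan chunk files out into 256 x 256 subdirectories keyed on a hash of the file name, so no single directory gets anywhere near the ~10 million entries discussed in this thread.

```python
import hashlib
import os
from pathlib import Path


def inode_usage(path: str = "/") -> float:
    """Fraction of inodes in use on the filesystem containing `path` (what df -i shows)."""
    st = os.statvfs(path)
    if st.f_files == 0:          # some filesystems do not report inode totals
        return 0.0
    return (st.f_files - st.f_ffree) / st.f_files


def count_entries(directory: str) -> int:
    """Count directory entries without stat()-ing each one (unlike `ls -l`)."""
    with os.scandir(directory) as it:
        return sum(1 for _ in it)


def sharded_path(root: str, name: str, levels: int = 2) -> Path:
    """Hypothetical layout: root/ab/cd/name, using the first hex pairs of SHA-1(name)."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return Path(root, *parts, name)


def write_chunk(root: str, name: str, text: str) -> Path:
    """Write one chunk file into its shard directory, creating the directory if needed."""
    path = sharded_path(root, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

With two levels of 256 directories each (65,536 leaf directories), 30 million chunk files works out to roughly 460 files per directory.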
Filesystem:
Code: /dev/vda1 on / type ext4 (rw,relatime,discard,errors=remount-ro,commit=30
It gets weirder: I can still create new files, just not all of them:
Code: i=100000000; time while test $i -le 110000000; do echo "Hello world!" > $i; i=$((i+1)); done
Code: ls 10000282*
Running dmesg as root shows this:
Code: [ 2024.349441] EXT4-fs warning: 598 callbacks suppressed
Solution
Enabling ext4 large_dir (https://serverfault.com/questions/1052075/when-enabling-ext4-large-dir-how-can-you-tell-its-used) seems to fix it:
Code: tune2fs -O large_dir /dev/nvme2n1
The EXT4 "largedir" feature overcomes the current limit of around ten million entries allowed within a directory on EXT4. Now, EXT4 directories can support around two billion directory entries. However, you are likely to hit performance bottlenecks before hitting this new EXT4 limitation.
It looks like the safe limit is about 10 million files per directory, although it may work up to around 30 million files, but you shouldn't get anywhere near that number without enabling large_dir because things start failing. I completed my test at over 51 million files in a single directory. No more errors until I actually ran out of inodes.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 03, 2025, 12:35:16 PM
~ Solution Enabling ext4 large_dir (https://serverfault.com/questions/1052075/when-enabling-ext4-large-dir-how-can-you-tell-its-used) seems to fix it:
Code: tune2fs -O large_dir /dev/nvme2n1
The EXT4 "largedir" feature overcomes the current limit of around ten million entries allowed within a directory on EXT4. Now, EXT4 directories can support around two billion directory entries. However, you are likely to hit performance bottlenecks before hitting this new EXT4 limitation.
It looks like the safe limit is about 10 million files per directory, although it may work up to around 30 million files, but you shouldn't get anywhere near that number without enabling large_dir because things start failing. I completed my test at over 51 million files in a single directory. No more errors until I actually ran out of inodes.
Amazing work! The forum should hire you as a consultant :) I can restart the chunk processing now, but it's going to start from the first topic because I lost track of which topics failed to write. Fortunately, processing is much faster than uploading at the moment - I was actually processing topics from 2023 when I noticed this issue.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on May 04, 2025, 06:34:05 AM
I can restart the chunk processing now, but it's going to start from the first topic because I lost track of which topics failed to write. Fortunately, processing is much faster than uploading at the moment - I was actually processing topics from 2023 when I noticed this issue.
Don't run the system all at once! Make it run in cycles, for example one year at a time. This way, if a cycle fails, you know up to which point everything is fine and you won't have to start from scratch. You can do this manually by running the script for each cycle, or you can set up the script so that it runs in cycles and keeps a log of events. Whenever a cycle ends, it reports the result, so you can check that everything is OK and follow the process.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on May 04, 2025, 12:18:06 PM
Hey NotATether, please be aware that 1 more translation was made for your topic by AOBT:
Ukrainian (https://bitcointalk.org/index.php?topic=236982.msg65341140#msg65341140) translation, made by DrBeer
Cheers!
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Mahiyammahi on May 09, 2025, 10:12:09 AM
Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? Since you have developed a search engine that scrapes posts, why not train an AI model on that data? I don't know if it would be helpful for forum users, but I think an AI model that draws all its answers from Bitcointalk topics and replies wouldn't be bad. A user could get an answer within a few seconds rather than digging through all the data that has been posted on Bitcointalk. Other AI models like ChatGPT look everywhere for an answer, so an AI model that looks only at forum data would be great.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: $crypto$ on May 09, 2025, 12:00:18 PM
Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? Since you have developed a search engine that scrapes posts, why not train an AI model on that data? I don't know if it would be helpful for forum users, but I think an AI model that draws all its answers from Bitcointalk topics and replies wouldn't be bad. A user could get an answer within a few seconds rather than digging through all the data that has been posted on Bitcointalk. Other AI models like ChatGPT look everywhere for an answer, so an AI model that looks only at forum data would be great.
There is an AI search engine for Bitcointalk where you can do some browsing: [AI Search Engine] Bitcointalk (https://bitcointalk.org/index.php?topic=5537932.0). I have tried asking questions on this AI search engine; some of the answers the AI gives are not accurate, and it takes a few seconds to give an answer.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 09, 2025, 01:32:16 PM
Hey NotATether, how about creating an AI model specific to the Bitcointalk forum? Since you have developed a search engine that scrapes posts, why not train an AI model on that data? I don't know if it would be helpful for forum users, but I think an AI model that draws all its answers from Bitcointalk topics and replies wouldn't be bad. A user could get an answer within a few seconds rather than digging through all the data that has been posted on Bitcointalk. Other AI models like ChatGPT look everywhere for an answer, so an AI model that looks only at forum data would be great.
I don't have a dev team, so this will take a very long time to implement. It is not a priority at the moment. In fact, only about 5 million chunks out of almost a hundred million have been uploaded so far.
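joker_josue's suggestion earlier in the thread (run in cycles, for example one year at a time, and keep a log) addresses the "lost track of which topics failed to write" problem. A minimal sketch, assuming a hypothetical `process_cycle(cycle)` callback that processes one year of topics and returns the IDs of topics that failed: progress is stored in a small JSON file after every cycle, so a restart skips finished years and the failures are recorded per year.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def run_in_cycles(cycles: Iterable[str],
                  process_cycle: Callable[[str], list[int]],
                  log_path: Path) -> dict:
    """Run process_cycle once per cycle (e.g. per year), resuming from log_path.

    process_cycle is a hypothetical callback returning the topic IDs that failed.
    """
    state = {"done": [], "failed": {}}
    if log_path.exists():
        state = json.loads(log_path.read_text(encoding="utf-8"))
    for cycle in cycles:
        if cycle in state["done"]:
            continue                              # finished in an earlier run
        failed = process_cycle(cycle)
        state["failed"][cycle] = failed
        state["done"].append(cycle)
        # Save after every cycle; write-then-rename keeps the log intact if we crash mid-write.
        tmp = log_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state), encoding="utf-8")
        tmp.replace(log_path)
        print(f"cycle {cycle}: done, {len(failed)} topic(s) failed")
    return state
```

Cycle names are strings (e.g. `"2009"`) because JSON object keys are always strings, which keeps the reloaded log consistent with the in-memory one. The failed IDs per year can then be retried on their own instead of restarting from the first topic.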
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: hopenotlate on May 09, 2025, 03:10:44 PM
I had some free time, and as a sign of gratitude for the efforts you make to improve users' experience of this forum, I took the liberty of translating the opening post into Italian, as I noticed it hadn't been done yet, without even asking your permission.
I hope you don't mind; please let me know if it's okay or if I should remove it.
Translation link: Talksearch.io - Motore di ricerca avanzato per Bitcointalk (https://bitcointalk.org/index.php?topic=5542282.0)
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 12, 2025, 06:01:34 AM
I had some free time, and as a sign of gratitude for the efforts you make to improve users' experience of this forum, I took the liberty of translating the opening post into Italian, as I noticed it hadn't been done yet, without even asking your permission. I hope you don't mind; please let me know if it's okay or if I should remove it. Translation link: Talksearch.io - Motore di ricerca avanzato per Bitcointalk (https://bitcointalk.org/index.php?topic=5542282.0)
Anybody can make a translation of this topic without asking me. But to avoid duplicate efforts, people should make sure that a local translation doesn't already exist.
On an unrelated note, Google Cloud is so useful! It's like having a free VS Code in the cloud, along with a database and git integration. HTTP server URLs are practically free as well. I am using it for other projects too. It's too bad that Elasticsearch is not keeping up with the load :P I guess I will have to wait a while for the upload to complete.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: hopenotlate on May 12, 2025, 09:26:18 AM
I had some free time, and as a sign of gratitude for the efforts you make to improve users' experience of this forum, I took the liberty of translating the opening post into Italian, as I noticed it hadn't been done yet, without even asking your permission. I hope you don't mind; please let me know if it's okay or if I should remove it.
Translation link: Talksearch.io - Motore di ricerca avanzato per Bitcointalk (https://bitcointalk.org/index.php?topic=5542282.0)
Anybody can make a translation of this topic without asking me. But to avoid duplicate efforts, people should make sure that a local translation doesn't already exist. -snip-
Glad to hear it's OK with you. To avoid duplicates, you might want to add my translation link to the opening post, so everyone can see at first glance that it has already been done.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 12, 2025, 09:29:43 AM
It’s missing a few key features though. Not being able to tweak the search text is a bit of a letdown since that’s pretty important for narrowing things down.
Can you elaborate on this? I don't really understand what you mean by tweaking.
Would you like variations that are more professional, casual, or critical?
As in what? Sorry, but just like the other part, I'm not very sure what you're asking for here. I am working on automatically including synonyms and verb conjugations of search terms in order to capture additional relevant topics, though. This is something I can do independently of the document upload.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: nutildah on May 18, 2025, 10:20:08 AM
It’s missing a few key features though. Not being able to tweak the search text is a bit of a letdown since that’s pretty important for narrowing things down.
Can you elaborate on this? I don't really understand what you mean by tweaking.
Would you like variations that are more professional, casual, or critical?
As in what? Sorry, but just like the other part, I'm not very sure what you're asking for here. ...
The problem is you're talking with a bot, or rather a human emulating a bot (https://bitcointalk.org/index.php?topic=5456516.msg65392364#msg65392364).
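On NotATether's plan above to automatically include synonyms and verb conjugations: in Elasticsearch this is usually handled by analyzers rather than application code. The sketch below builds an index-creation body (a Python dict you would send with `PUT /<index>`) that applies an English `stemmer` token filter at both index and search time, so that different forms of a verb such as "mined" and "mining" should reduce to the same stem, and a `synonym_graph` filter at search time only. The field name `text`, the analyzer and filter names, and the synonym list are assumptions for illustration.

```python
import json


def search_index_body(synonyms: list[str]) -> dict:
    """Index-creation body with stemming (verb forms) and search-time synonyms."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "forum_synonyms": {"type": "synonym_graph", "synonyms": synonyms},
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                },
                "analyzer": {
                    # Used when documents are indexed: lowercase + stem.
                    "forum_index": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "english_stemmer"],
                    },
                    # Used for queries: lowercase + expand synonyms + stem the expansions too.
                    "forum_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "forum_synonyms", "english_stemmer"],
                    },
                },
            }
        },
        "mappings": {
            "properties": {
                "text": {
                    "type": "text",
                    "analyzer": "forum_index",
                    "search_analyzer": "forum_search",
                }
            }
        },
    }


if __name__ == "__main__":
    body = search_index_body(["btc, bitcoin", "segwit, segregated witness", "bch, bcash, bitcoin cash"])
    print(json.dumps(body, indent=2))
```

Keeping the synonyms on the search side only means the synonym list can change later without reindexing the stored posts, which matters given how long the initial upload takes.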
This last part is the AI asking him if he wants the output rephrased, but he just copy/pasted it because, naturally, he's a maroon:
Would you like variations that are more professional, casual, or critical?
Don't let the bots bring you down, NotATether! :D As a human, I for one applaud your efforts and think it's great to see alternative resources being built around forum data. Will remember to add it to my arsenal the next time I am researching something.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Wouter Mense on May 19, 2025, 09:50:57 AM
The issue is, I currently don't have a reliable way to measure post quality.
I suggest looking at "user quality". Example post history (https://bitcointalk.org/index.php?action=profile;u=3618422;sa=showPosts;start=0). A lot of users like this exist; I looked at recent unread topics and found this one on my third try. In this case, there are about 1200 posts that all "look" the same. The patterns to look for:
- Each post begins with a quote.
- Followed by one or two lines of text.
Other things to look for:
- All roughly the same total length.
- All roughly the same number of paragraphs, of the same length.
- Same number of sentences, of the same length.
- Each with, for example, one image.
All these are, in my opinion, the result of "forced" content generation, usually with a financial incentive, I would assume. Of course, the above metric can be gamed. The point is that this pattern is predictable: the next posts of the above user will also look the same. Introducing more variety in post style will take more effort, and would possibly also be indicative of improved quality.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 19, 2025, 10:53:33 AM
Don't let the bots bring you down, NotATether! :D As a human, I for one applaud your efforts and think it's great to see alternative resources being built around forum data.
Will remember to add it to my arsenal the next time I am researching something.
Thanks, I appreciate it.
The issue is, I currently don't have a reliable way to measure post quality.
I suggest looking at "user quality". Example post history (https://bitcointalk.org/index.php?action=profile;u=3618422;sa=showPosts;start=0). A lot of users like this exist; I looked at recent unread topics and found this one on my third try. In this case, there are about 1200 posts that all "look" the same. The patterns to look for:
- Each post begins with a quote.
- Followed by one or two lines of text.
Other things to look for:
- All roughly the same total length.
- All roughly the same number of paragraphs, of the same length.
- Same number of sentences, of the same length.
- Each with, for example, one image.
All these are, in my opinion, the result of "forced" content generation, usually with a financial incentive, I would assume. Of course, the above metric can be gamed. The point is that this pattern is predictable: the next posts of the above user will also look the same. Introducing more variety in post style will take more effort, and would possibly also be indicative of improved quality.
Noted. I do think, however, that post quality can be quantified somehow, so I'm going to look for some research on how that would be calculated. It should probably be a value between 0 and 1. The user quality can then be set to the mean of all post qualities from that user and used as a weight for search results, but one that does not dampen results as much as post quality does.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Wouter Mense on May 19, 2025, 12:10:51 PM
I do assume a strong correlation between post and user quality, but I don't have proof.
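Wouter Mense's "looks the same" patterns and NotATether's weighting idea above can both be expressed in a few lines. This is a hypothetical sketch, not a validated quality metric: `uniformity` scores how templated a user's posts are (share of posts starting with a quote, spread of post lengths, share of posts with the most common paragraph count), and `rank_score` applies user quality, taken as the mean of that user's post-quality scores in [0, 1], as a bounded multiplier so it nudges results without outweighing post quality. The feature set, the equal weighting and `user_weight=0.25` are arbitrary choices for illustration, and quote removal here assumes non-nested quotes.

```python
import re
from collections import Counter
from statistics import mean, pstdev

QUOTE_BLOCK = re.compile(r"\[quote[^\]]*\].*?\[/quote\]", re.IGNORECASE | re.DOTALL)


def own_text(body: str) -> str:
    """Author's own text with (non-nested) quotes removed."""
    return QUOTE_BLOCK.sub("", body).strip()


def uniformity(bodies: list[str]) -> float:
    """0..1 for a non-empty list; 1 means every post has the same shape."""
    texts = [own_text(b) for b in bodies]
    lengths = [len(t) for t in texts]
    avg = mean(lengths)
    cv = pstdev(lengths) / avg if avg else 0.0        # relative spread of post lengths
    length_score = 1.0 / (1.0 + cv)                  # 1.0 when all lengths are equal
    quote_first = sum(b.lstrip().lower().startswith("[quote") for b in bodies) / len(bodies)
    paragraphs = Counter(len([p for p in t.split("\n\n") if p.strip()]) for t in texts)
    same_paragraphs = paragraphs.most_common(1)[0][1] / len(bodies)
    return (length_score + quote_first + same_paragraphs) / 3


def user_quality(post_qualities: list[float], default: float = 0.5) -> float:
    """Mean post quality; `default` is a hypothetical neutral value for users with no scored posts."""
    return mean(post_qualities) if post_qualities else default


def rank_score(relevance: float, post_quality: float, user_q: float,
               user_weight: float = 0.25) -> float:
    """User quality scales the score by a factor between (1 - user_weight) and 1."""
    return relevance * post_quality * (1 - user_weight + user_weight * user_q)
```

With `user_weight=0.25`, even a user with quality 0 loses at most a quarter of a result's score, whereas post quality multiplies the score directly, which matches the intent that user quality should not dampen results as much as post quality.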
Also, I totally ignored topic context.
post quality can be quantified somehow
Looking at just one post without context? I guess it would take less CPU time?
look for some research
After reading your post, I posed a few questions to an AI chat, with possibly interesting results. Queries (in order, with typos, and AI chat answers between each query):
Off-topic: I hope you appreciate getting more questions instead of more answers. I believe asking the right questions is a more helpful way to start your research. I can't vouch for the quality of the AI answers, only that they looked interesting. I'm not a programmer, but it does offer to write your code as well.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 23, 2025, 11:33:51 AM
Off-topic: I hope you appreciate getting more questions instead of more answers. I believe asking the right questions is a more helpful way to start your research. I can't vouch for the quality of the AI answers, only that they looked interesting. I'm not a programmer, but it does offer to write your code as well.
I appreciate it greatly. I have done some looking around over the past few days, and I found a machine learning model called BERT (https://research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding/) that was made by Google in 2018 for search engines. Can you believe that? An AI model from before AI models were a thing. :)
I do have sort of a background in machine learning models, so I can summarize it briefly here: instead of vectorizing words, and thus relying on keywords to search, it vectorizes entire phrases, i.e. words that are adjacent to each other in a sentence. This makes natural language search possible (example: "block size wars" returning debates about segwit and bcash instead of only posts with "block size" in them). There are many improved versions of BERT nowadays, large ones and small ones. However, the models require dedicated hardware with GPUs to run.
The good news is, Elasticsearch makes it painfully easy to deploy a model. You literally just have to press the "Run" button next to it, and then search algorithms will use the model automatically. The bad news is, the hardware to run them doesn't come cheap.
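To make the "vectorize meaning rather than keywords" idea concrete: once a BERT-style model has turned the query and each post into vectors (embeddings), ranking becomes a nearest-neighbour search by cosine similarity. In this sketch the 3-dimensional vectors are made up by hand to stand in for real model output (real embeddings have hundreds of dimensions), and `knn_query` shows the shape of an Elasticsearch 8.x `knn` search body, assuming a hypothetical `dense_vector` field named `embedding`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction (similar meaning), 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_meaning(query_vec: list[float], docs: list[tuple[str, list[float]]]) -> list[str]:
    """Return document titles ordered from most to least similar to the query."""
    ranked = sorted(docs, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)
    return [title for title, _ in ranked]


def knn_query(query_vec: list[float], k: int = 10) -> dict:
    """Elasticsearch 8.x approximate kNN search body for a dense_vector field."""
    return {"knn": {"field": "embedding", "query_vector": query_vec,
                    "k": k, "num_candidates": 10 * k}}


# Toy "embeddings": the three axes loosely mean (scaling debate, gambling, mining).
docs = [
    ("SegWit vs. Bitcoin Cash: the scaling debate", [0.8, 0.0, 0.2]),
    ("Best casino bonus thread", [0.0, 1.0, 0.0]),
    ("Pool hashrate discussion", [0.1, 0.0, 0.9]),
]
query = [0.9, 0.0, 0.1]   # pretend embedding of "block size wars"
print(rank_by_meaning(query, docs))
```

The scaling-debate post ranks first even though it never contains the words "block size", which is the behaviour described in the example above; with a real model the vectors would come from the deployed model rather than being typed in.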
There is one ML node in my cluster, which I receive at no additional cost, but it only has 1GB of RAM and can't store any model, so it's pretty useless. Upgrading to the next hardware tier, which has 2GB, is going to bump the total monthly bill to around $300, and I am already hounded enough by Google with biweekly invoices. Therefore, I want to wait until all the new post content is uploaded before I delete the old, incomplete post content, which will allow me to slash the storage size by about half. Then, even with a larger ML node added, Talksearch's running costs will be somewhat lower than they are right now. It will be a wise investment, though: GPUs on dedicated servers are not plentiful, and are much more expensive than this.
Unfortunately, despite thousands and thousands of post chunks being uploaded a day, I am only about 10% of the way there. I can't experiment with BERT search until it's done. And I have to pray my server doesn't run out of memory mid-upload, because with my disk being the primary bottleneck, retries will not be any faster. But if I move to an SSD or something, the Elasticsearch nodes get overwhelmed with requests and run out of memory themselves. I imagine this whole process becomes much faster with even larger hardware, but that is not an amount I'm willing to spend, especially on a beta product. Good thing there is only one "initial block download" - after that I'll never have to worry about it again (unless catastrophic data loss occurs, as I'm only paying for one availability zone - ugh).
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 26, 2025, 02:49:33 PM
It appears that there is a problem with making search queries again. I will investigate this.
Please do not delete this post.
Update: The problem has been identified. It appears that the access token has expired. I am currently deploying a fix and will update you when this is done.
Update 2: It has been fixed.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 31, 2025, 08:34:00 AM
Guys, I need some suggestions. I want to move the Elasticsearch server off Google Cloud, due to AML problems I'm now facing when I attempt to load $100 onto my card to pay the bills.
What are some hosting providers that *do not* use Coingate or Cryptomus?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on May 31, 2025, 08:47:08 AM
What are some hosting providers that *do not* use Coingate or Cryptomus?
I got my last VPS from Servarica, but can't remember which payment provider they used. I checked my email, and it doesn't show anything from any external provider. I can't really test it by making a payment now; maybe just ask them?
This is the offer I took (https://lowendtalk.com/discussion/199994/servarica-black-friday-2024-dedicated-servers-unified-plans-and-storage-incredible/p1) (8 slices Slim Plan + 2 TB SAN Storage).
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 31, 2025, 09:48:17 AM
What are some hosting providers that *do not* use Coingate or Cryptomus?
I got my last VPS from Servarica, but can't remember which payment provider they used. I checked my email, and it doesn't show anything from any external provider. I can't really test it by making a payment now; maybe just ask them?
This is the offer I took (https://lowendtalk.com/discussion/199994/servarica-black-friday-2024-dedicated-servers-unified-plans-and-storage-incredible/p1) (8 slices Slim Plan + 2 TB SAN Storage).
It's enough just to know whether they support Monero payments. If so, there won't be any transaction screening, since XMR is untraceable anyway. Looks like I'm going to be scouring LowEndTalk for a while.
Some specs I'm looking for, to make searching easier:
- 512GB SSD
- At least 16GB of memory - more is obviously good, I want indexing to be instantaneous this time, instead of taking months.
- A regular Intel/AMD processor will do (Apparently, I don't need a GPU (https://discuss.elastic.co/t/elastic-machine-learning-need-gpu/359471/4). w00t!)
- 1 Gbps Ethernet
I'm fine with spending $100/month on this, but deals are obviously nice.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on May 31, 2025, 10:00:58 AM
Looks like I'm going to be scouring LowEndTalk for a while.
Note: I've seen and paid good and bad providers, and I've been burned more than once. So be careful who you trust.
I'm quite happy with Racknerd too:
Code: up 741 days, 19:49
- 512GB SSD
- At least 16GB of memory - more is obviously good, I want indexing to be instantaneous this time, instead of taking months.
- 1 Gbps Ethernet
I'm fine with spending $100/month on this, but deals are obviously nice.
At that price, you could get a Premium KVM or VDS with 4 CPU, 16 GB RAM and 400 NVMe at Ramnode Cloud. They're good, but expensive. I only use them when I need something for a short time: $0.15 per hour of use.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: psycodad on May 31, 2025, 11:35:29 AM
Looks like I'm going to be scouring LowEndTalk for a while.
Note: I've seen and paid good and bad providers, and I've been burned more than once. So be careful who you trust.
I'm quite happy with Racknerd too:
Code: up 741 days, 19:49
I can second that statement about Racknerd: I've been running a KVM VPS there for ~3 years, with not a single problem so far. But I concede that I am a few
Code: up 593 days, 16:12
Unfortunately, though, Racknerd accepts some crypto but not Monero:
Quote from: What payment methods do you accept?
We accept the following payment methods:
ALL major credit cards (AMEX, Discover, VISA, Master).
PayPal
Cryptocurrency (Bitcoin, Bitcoin Cash, Litecoin, Ethereum, USDT, USDC)
Alipay/支付宝
Wire
More payment methods are supported upon checking out.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 31, 2025, 12:03:57 PM
I've settled on this beauty from Dartnode:
Code: Model: Dual Xeon E5-2650 v4
It only costs me $100 a month, so it's a massive improvement over Google Cloud. The application itself will still be hosted there, by the way, as it costs almost nothing to run. It's just the Elasticsearch server(s) being moved. It isn't actually usable yet; it is still in the setup phase.
Edit: For some reason, the DDoS protection is a $35 add-on. Whatever. That's already been added. I only have about 3 or so days to set up the new server with Elasticsearch before I have to move money around to the cards again, and I'd like to avoid that, so I have to do it fast.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 02, 2025, 09:41:56 AM
Ingestion has now started on the new Elasticsearch server, and compared to my old cluster it's going lightning fast. If all goes well, it should be finished in about a day or two, and then I will redirect the search queries towards it and shut down the old cluster.
Edit: Wow, already over 200k posts indexed in just an hour!
https://www.talkimg.com/images/2025/06/02/UXrRfd.png
According to my calculations, about 5 million posts can be uploaded in a single day. Therefore it's going to take up to 2 weeks for everything to get in there, but guess what? No resource exhaustion this time, so no crashes. I still plan to shut off the old cluster ASAP.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Vod on June 02, 2025, 07:20:15 PM
Ingestion has now started on the new Elasticsearch server
I looked into that for my new project - it allows you to search for a minimum of TWO characters instead of three. It's expensive though...
Hopefully you'll let me add your engine to my extension so the user can choose.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 05, 2025, 06:38:50 AM
NOTICE
Planned maintenance has commenced on Talksearch. (It did not start exactly as planned, because of ongoing bullshit from my internet provider.) During this time, search queries will be redirected to the new cluster. This post will be updated periodically with the status as it progresses.
Update 06:57 UTC - migration has finished and the service is being brought back online.
Update 10:09 UTC - Talksearch service brought back online. Search traffic was moved to a new cluster. Posts may be missing while the index is filled over the next few days.
Update 10:13 UTC - the old Elasticsearch cluster on Google Cloud has been deleted. Maintenance has been completed.
I looked into that for my new project - it allows you to search for a minimum of TWO characters instead of three. It's expensive though...
It has a free, open source version, but it needs to run on very powerful hardware to be useful.
Hopefully you'll let me add your engine to my extension so the user can choose.
We can talk about that later. The immediate priority for me right now is to create significantly more powerful search parameters on the website.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 09, 2025, 10:52:31 AM
Bump (Merit overload managed to push this thread all the way down to page 2 :o)
The algorithm feels awful though - any suggestions on how I should improve it?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on June 09, 2025, 05:19:00 PM
The algorithm feels awful though - any suggestions on how I should improve it?
What do you mean by horrible? What do you think it's doing wrong, given the intended goals?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Ivystar5 on June 09, 2025, 06:45:56 PM
The algorithm feels awful though - any suggestions on how I should improve it?
I was thinking we could get to an advanced stage where I can input a prompt like "what does Satoshi say about Bitcointalk administration?" and it returns threads where Satoshi talked about the administration of the forum, from which the user can figure out the exact thread or discussion they are searching for. More like an AI-style research response with links to several related threads. I did try asking a question like this, but it only returns threads whose titles contain each of the words. The reason I want this is that when an argument requires you to provide links to threads where a user said something, finding them can be difficult, as you have to search several times or even remember specific statements from the thread.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 10, 2025, 05:28:41 AM
The algorithm feels awful though - any suggestions on how I should improve it?
What do you mean by horrible? What do you think it's doing wrong, given the intended goals?
It prioritizes occurrences too much. So when you search "casino", for example, the top results are the ones that have written casino two or three times in the title. It makes it feel spammy, but I'm waiting until all the content is uploaded before I do anything about it.
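The "casino written two or three times in the title" effect can be reproduced with the textbook BM25 term weight, which Elasticsearch's default similarity is based on (defaults k1 = 1.2, b = 0.75): a short field that repeats a term scores well both because of the repetition and because length normalization favours short fields. The sketch compares a hypothetical spammy three-word title against a normal nine-word title that mentions the term once, assuming an average title length of 8 words, under the default parameters and under a lower k1 and b. Treat the tuned values as a starting point to experiment with, not a recommendation.

```python
def bm25_term(tf: int, field_len: int, avg_len: float,
              k1: float = 1.2, b: float = 0.75, idf: float = 1.0) -> float:
    """Textbook BM25 weight of one query term in one field."""
    length_norm = 1 - b + b * field_len / avg_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


def spam_advantage(k1: float, b: float, avg_len: float = 8.0) -> float:
    """How many times more a spammy title scores than a normal one for the same term."""
    spammy = bm25_term(tf=3, field_len=3, avg_len=avg_len, k1=k1, b=b)   # "casino casino casino"
    normal = bm25_term(tf=1, field_len=9, avg_len=avg_len, k1=k1, b=b)   # 9-word title, one mention
    return spammy / normal


print(f"default  k1=1.2 b=0.75: {spam_advantage(1.2, 0.75):.2f}x")
print(f"tuned    k1=0.5 b=0.3 : {spam_advantage(0.5, 0.3):.2f}x")

# In Elasticsearch, a custom similarity is defined when the index is created and
# can be assigned to the title field only (names here are hypothetical):
TITLE_SIMILARITY = {
    "settings": {"index": {"similarity": {"title_bm25": {"type": "BM25", "k1": 0.5, "b": 0.3}}}},
    "mappings": {"properties": {"title": {"type": "text", "similarity": "title_bm25"}}},
}
```

Lowering k1 makes repeated occurrences saturate sooner, and lowering b weakens the bonus for very short fields. joker_josue's point about also weighing the topic content is complementary: scoring title and body together keeps the title alone from dominating.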
Fortunately, this time, it will only take a few more days.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on June 10, 2025, 06:58:05 AM
It prioritizes occurrences too much. So when you search "casino", for example, the top results are the ones that have written casino two or three times in the title. It makes it feel spammy, but I'm waiting until all the content is uploaded before I do anything about it. Fortunately, this time, it will only take a few more days.
Well, that's the biggest challenge for search engines. It took Google years to create an algorithm that could minimize this situation. To help minimize this, you have to create more filter criteria. For example, in addition to looking at just the title, it has to look at the topic content. An example: the topic title has the word "casino" 3 times, but how many times does the OP contain it? Throughout the topic, does the word "casino" appear often, or not at all? Is the term "casino" used in a conversational context, or as the name of something? Applying these rules and ensuring a good balance is not easy. This will undoubtedly be the biggest challenge of the project.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 10, 2025, 03:42:17 PM
Well, that's the biggest challenge for search engines. It took Google years to create an algorithm that could minimize this situation. To help minimize this, you have to create more filter criteria. For example, in addition to looking at just the title, it has to look at the topic content. An example: the topic title has the word "casino" 3 times, but how many times does the OP contain it? Throughout the topic, does the word "casino" appear often, or not at all? Is the term "casino" used in a conversational context, or as the name of something? Applying these rules and ensuring a good balance is not easy. This will undoubtedly be the biggest challenge of the project.
It's not just spam, there are for some reason a ton of topics in search results that have been deleted on the forum. So they all have to be purged. Finding out which ones are deleted is going to be a challenge as it will require another forum scrape. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on June 10, 2025, 04:09:10 PM It's not just spam, there are for some reason a ton of topics in search results that have been deleted on the forum. So they all have to be purged. Finding out which ones are deleted is going to be a challenge as it will require another forum scrape. But what kind of sweep did you do to collect topics that have already been deleted? Did you use an old database? Maybe you can just run a script to validate if a certain topic exists, if it doesn't exist it deletes it from the DB. Or you may want to use this as a historical archive. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 11, 2025, 08:28:44 AM But what kind of sweep did you do to collect topics that have already been deleted? Did you use an old database? Maybe you can just run a script to validate if a certain topic exists, if it doesn't exist it deletes it from the DB. Or you may want to use this as a historical archive. Most of the old posts came were from Ninjastic.space. While I figure out how to weed out the old posts, I've ran some tests on Google Collaboratory with three different spam-detection LLM models (well, they are not specifically for spam detection except for the first one, but it can be used to classify text) on various categories. https://pdflink.to/bert-tiny-finetuned-sms-spam-detection/ https://pdflink.to/distilbert-base-uncased-finetuned-sst-2-english/ https://pdflink.to/deberta-large-mnli/ I think they get the overall sentiment, especially the last one, but it would be unwise to rely only on a LLM as a universal quality score. Additional measures must be taken in place to identify e.g. 
application posts, obviously AI-generated posts, and the like, so that they are not returned in search results. I'm also going to set a minimum post length, to avoid indexing things like bumps.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on June 11, 2025, 05:15:01 PM I'm also going to set a minimum post length, to avoid indexing things like bumps. Have you ever thought about a rating system for post/topic authors? A higher-ranked user (more posts, merit, rank) passes the filters, while the rest have to go through tighter filters. This may help reduce the number of posts analyzed, and help filter better.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 12, 2025, 08:21:27 AM New content update v1.0.3 and backend update v1.0.2 published
These updates add advanced search capability to Talksearch. Search features:
App features:
48 million posts have been indexed now. We are approaching indexing completion. Have you ever thought about a rating system for post/topic authors? A higher-ranked user (more posts, merit, rank) passes the filters, while the rest have to go through tighter filters. This may help reduce the number of posts analyzed, and help filter better. I don't like such a system because it would bias search results toward users with a lot of merit.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: cygan on June 12, 2025, 08:56:01 AM New content update v1.0.2 published ✂️ Very nice to see another update of your search engine :) So that the translated threads (by taufik123, satscraper, Abdulzuruku01, katanic97, Adiljutt156, mela65, r_victory, GazetaBitcoin, Danica22 and Porfirii) can be updated, I would ask you to update your OP and the changelog, but you were probably planning to do that anyway ;)
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: taufik123 on June 12, 2025, 01:18:45 PM New content update v1.0.2 published Shouldn't this be a v1.0.3 update? There was already a v1.0.2 update: "New app update v1.0.2 published".
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: nutildah on June 13, 2025, 01:11:39 AM I think they get the overall sentiment, especially the last one, but it would be unwise to rely on an LLM alone as a universal quality score. Agreed: what is interesting or relevant to an LLM might not be so to the people actually using your search engine. Additional measures must be put in place to identify e.g. application posts, obviously AI-generated posts, and the like, so that they are not returned in search results. I like the initiative you're taking here.
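As a rough illustration of not relying on a single model, the following sketch combines a cheap minimum-length rule with a classifier score, and only lets the classifier reject a post when an independent heuristic agrees. Everything here is hypothetical: the 60-character minimum, the 0.9 cutoff, the "link-heavy" heuristic and the `spam_prob` field (standing in for the output of whatever classifier is used) are assumptions for the example, not values from Talksearch.

```python
import re
from dataclasses import dataclass

# Hypothetical values, chosen only for this example.
MIN_CHARS = 60
CLASSIFIER_CUTOFF = 0.9
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Post:
    body: str
    spam_prob: float  # assumed output of some text classifier, from 0.0 to 1.0


def too_short(post: Post) -> bool:
    """Rule-based check that catches bumps and one-liners without any model."""
    return len(post.body.strip()) < MIN_CHARS


def link_heavy(post: Post) -> bool:
    """Independent heuristic: more than a third of the words are URLs."""
    words = post.body.split()
    links = URL_RE.findall(post.body)
    return bool(words) and len(links) / len(words) > 1 / 3


def should_index(post: Post) -> bool:
    if too_short(post):
        return False
    # The classifier can only reject a post when a second signal agrees,
    # so one model's mistake is not enough to hide a normal post.
    if post.spam_prob >= CLASSIFIER_CUTOFF and link_heavy(post):
        return False
    return True
```

The trade-off of this design is that spam only the model can recognize still gets indexed; in exchange, a misfiring model cannot silently hide ordinary posts.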
What's interesting is that, last I checked, Google doesn't filter out AI-generated content, but it may do so in the future if it turns out that nobody wants to read such content, since it makes their search results less accurate or relevant to the query than they could be. It seems like it would be super easy to game SEO rankings with AI content, so I don't know why they wouldn't attempt to block it.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 13, 2025, 04:13:01 AM New content update v1.0.2 published Shouldn't this be a v1.0.3 update? There was already a v1.0.2 update: "New app update v1.0.2 published". You're right, but only the frontend would be v1.0.3, because there was no v1.0.2 update for the backend. I like the initiative you're taking here. What's interesting is that, last I checked, Google doesn't filter out AI-generated content, but it may do so in the future if it turns out that nobody wants to read such content, since it makes their search results less accurate or relevant to the query than they could be. It seems like it would be super easy to game SEO rankings with AI content, so I don't know why they wouldn't attempt to block it. I am fortunate that my preliminary tests can detect AI-generated posts with a similar degree of accuracy as other types of spam.
New content update v1.0.4 published
This is a minor update that adds the missing time controls for the Date From and Date To filters. All posts up to March 2025 have now been indexed. I am actively working on enabling real-time indexing.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 14, 2025, 05:36:18 AM All posts from March to June 2025 are now being uploaded to the index, while I continue to work on an automated solution to this problem.
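For the Date From / Date To filters, a search backend on Elasticsearch would typically express them as a `range` clause inside a `bool` filter. The sketch below builds such a query body as a plain Python dict; the field names `date` and `content` are assumptions, since the thread doesn't describe the index mapping. It also refuses timezone-naive datetimes, which are an easy way to end up with shifted dates during bulk uploads; whether that is what went wrong with this batch isn't stated.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def date_range_filter(date_from: Optional[datetime], date_to: Optional[datetime],
                      field: str = "date") -> Dict[str, Any]:
    """Build a `range` clause from optional, timezone-aware bounds (field name assumed)."""
    bounds: Dict[str, str] = {}
    for key, value in (("gte", date_from), ("lte", date_to)):
        if value is None:
            continue
        if value.tzinfo is None:
            # Naive datetimes are ambiguous; refuse them instead of guessing a zone.
            raise ValueError(f"{key} bound must be timezone-aware")
        bounds[key] = value.astimezone(timezone.utc).isoformat()
    if date_from is not None and date_to is not None and date_from > date_to:
        raise ValueError("Date From is after Date To")
    return {"range": {field: bounds}}


def search_body(query: str, date_from: Optional[datetime] = None,
                date_to: Optional[datetime] = None) -> Dict[str, Any]:
    """Full-text match on a hypothetical `content` field plus optional date filter."""
    filters: List[Dict[str, Any]] = []
    if date_from is not None or date_to is not None:
        filters.append(date_range_filter(date_from, date_to))
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": query}}],
                "filter": filters,  # filter context: does not affect relevance scores
            }
        }
    }


if __name__ == "__main__":
    import json

    body = search_body("casino",
                       datetime(2025, 3, 1, tzinfo=timezone.utc),
                       datetime(2025, 6, 14, tzinfo=timezone.utc))
    print(json.dumps(body, indent=2))
```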
Edit: this batch was uploaded with the wrong dates, which would cause search errors, so it has been deleted and is being re-uploaded. Edit 2: All done. I want to implement a spam score as soon as possible, but I'm still not exactly sure how I will do that without re-indexing all the posts. At any rate, I will figure something out.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Rashlyowl on June 16, 2025, 04:11:48 AM Hey bros @NotATether, is it possible to implement pagination directly on the site?
https://talkimg.com/images/2025/06/16/UdpQAJ.jpeg https://talkimg.com/images/2025/06/16/UdpM5b.png When I've gone too far and want to go back to a particular page, opening the previous pages is a hassle. The workaround is actually easy, just change the current page
Code:
https://talksearch.io/search?q=Bitcointalk&page=9
to the page I want to see
Code:
https://talksearch.io/search?q=Bitcointalk&page=4
But it's a bit annoying; after all, pagination would improve the user experience.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 16, 2025, 06:17:28 AM As you said, this sort of change is very easy to do, and I will make sure to find some time for it. I'm currently busy building a project for calculating "embeddings" with text classification LLMs, because there is a lack of software on GitHub for this purpose. This kind of software is blatantly missing from open-source repositories, and it is essential for anybody building a search application without a paid Elasticsearch subscription, which is expensive, even though it packages AI search directly. The hope is that others in the AI community will find it useful. Edit: This is what it's going to look like: https://bert-embedding-playground.lovable.app/ - it's designed to be self-hosted. This is just the frontend though; I haven't written much of the backend yet.
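To illustrate what such embeddings are used for in search: a post and a query are each turned into a vector by a model, and results are ranked by how similar the vectors are. The sketch below does a brute-force cosine-similarity ranking in pure Python. The three-dimensional vectors and post IDs are toy values standing in for real model output (BERT-style models produce vectors with a hundred or more dimensions), and a real index would use an approximate nearest-neighbour structure rather than scanning every post.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], post_vecs: Dict[str, Sequence[float]],
          k: int = 10) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: score every post, keep the k best."""
    scored = [(post_id, cosine(query_vec, vec)) for post_id, vec in post_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real model embeddings.
    posts = {
        "mining-guide": [0.9, 0.1, 0.0],
        "casino-ad": [0.0, 0.2, 0.95],
        "node-sync": [0.8, 0.3, 0.1],
    }
    query = [1.0, 0.2, 0.0]
    for post_id, score in top_k(query, posts, k=2):
        print(post_id, round(score, 3))
```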
And even the frontend was made by AI, because I suck at designing HTML by hand. :-\ (Modified again to avoid double-posting)
New app update v1.0.5 published
This is a minor update that adds detailed pagination to search results.
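Detailed pagination of this kind mostly comes down to turning the `page` query parameter into an offset and choosing which page links to show. A minimal sketch, assuming 1-based page numbers as in the `page=9` / `page=4` URLs above and an Elasticsearch-style `from`/`size` request; the page size and the two-page radius are arbitrary choices, not Talksearch's. One real constraint worth knowing: with default settings, Elasticsearch rejects requests where `from + size` exceeds `index.max_result_window` (10,000), which caps how deep plain offset paging can go.

```python
import math
from typing import List

# Elasticsearch's default index.max_result_window: from + size may not exceed it.
MAX_RESULT_WINDOW = 10_000


def page_offset(page: int, per_page: int) -> int:
    """Turn a 1-based ?page= value into a `from` offset for the search request."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    return (page - 1) * per_page


def last_page(total_hits: int, per_page: int) -> int:
    """Highest page that exists and that offset paging can still reach."""
    total_pages = math.ceil(total_hits / per_page)
    return min(total_pages, MAX_RESULT_WINDOW // per_page)


def page_links(current: int, last: int, radius: int = 2) -> List[int]:
    """Page numbers to show around the current page, e.g. 7 8 [9] 10 11."""
    start = max(1, current - radius)
    end = min(last, current + radius)
    return list(range(start, end + 1))


if __name__ == "__main__":
    per_page = 10  # hypothetical page size
    last = last_page(total_hits=523, per_page=per_page)
    current = 9
    print("from =", page_offset(current, per_page))
    print("links:", page_links(current, last), "last page:", last)
```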