Title: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 11:28:08 AM https://www.talkimg.com/images/2025/03/29/lIkk1.png (https://talksearch.io/?utm-link=bitcointalk)
Talksearch.io (https://talksearch.io/?utm-link=bitcointalk) It has always been a dream of mine to create a high-quality search engine for Bitcointalk. After many months of development, I am proud to announce that Talksearch is now generally available. This is an important milestone towards providing users with high-quality search. Talksearch is a simple search engine that allows you to quickly find and go to any posts on Bitcointalk. Features ____________________ - Real-time indexing of ALL posts (currently with several hours delay in indexing)
- Tor-friendly, lacking rate limits and captchas. - Detailed post metadata such as user, date, and topic title. - Eliminates spammy results by enforcing basic length restrictions. - Report content feature for removing sensitive posts and preventing them from being indexed again. And will eventually have more accurate results than either. Infrastructure ____________________ Talksearch's website is hosted on Google App Engine, but the search engine itself is hosted on a DDoS-protected server at Dartnode. Its IP address is shielded from public exposure by proxying all requests through Cloud Run Functions. It has a capacity to host up to 380GB of posts, including AI embeddings. Talksearch has access to 32GB of memory and a powerful dual-socket Xeon processor with 48 threads for powering natural language search. Backups are available off-site in case of disaster. Obviously, none of this is cheap, and it costs around $170/month in total to maintain. Roadmap ____________________ - Refine search result quality. In progress - Create an end-user API (working on this soon) Posts do not update automatically yet. I am working on that. I will continuously optimize the quality of results on the index to achieve the best possible results. For now though, you may need to use multiple words to filter relevant posts. Donate ____________________ If you wish to support the maintenance and future development of Talksearch, you can send funds to the following addresses: Bitcoin: bc1q6dphprljdas0xl2cmqn6tlskselx5xtcpcw8kx Ethereum and ERC-20 tokens: 0xd3CaaE5098b8Bef64A6FD415b0b1B61aE880FFF5 Tron and USDT-TRC20: TGrsWW6knTwcJxkUKvp7SoV5Fjub9KMAeb Donations will only be used to pay for hosting costs. Translations ____________________
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 11:28:24 AM Changelog ____________________
Code: 2025-09-04: App v1.0.6 & Search 1.1.1 - https://bitcointalk.org/index.php?topic=5536692.msg65767675#msg65767675 Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 11:29:45 AM Operational Expenses ____________________
This section is reserved for screenshots of the Hosting Provider invoices, with sensitive info redacted. The first one for May will be published in a few days. Additionally, technical specs of the search cluster will be posted here, along with the resource usage and high-level statistics related to the indexed post data. This information is being published for accounting and transparency purposes as this is being operated as a public service to the forum. May 2025 invoice (https://www.talkimg.com/images/2025/06/12/UdJQ0o.png) - $179.76 Donations March-May 2025 - 2x - $110 Net Cost May 2025: $69.76 Total Running Cost: $69.76 June & July invoices will be published soon. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: shahzadafzal on March 29, 2025, 11:52:23 AM None of this is cheap, and it costs around $150/month in total to maintain. $150/month? That’s expensive! However, it looks good at first glance—definitely faster, and I think the filters will be helpful. Some basic functionalities would make it even more useful. For example, there’s no option to edit the search text, which is essential for refining searches. I’ll explore it further. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 11:56:36 AM Some basic functionalities would make it even more useful. For example, there’s no option to edit the search text, which is essential for refining searches. You mean like a search bar above the results page? I can get that added quickly. Edit: done Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: arhipova on March 29, 2025, 01:48:51 PM Is Google App Engine a paid host to join ?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 01:53:55 PM Is Google App Engine a paid host to join ? It's part of Google Cloud. They offer a free trial of 3 months and $300 in credit, but you must add a debit card to use it at all (no VCCs allowed). App Engine is pretty cool - you can make it scale down to zero instances if nobody is using it, which hopefully does not happen here, so that you pay for nothing. It's not really a VM though. You can only make use of it if you know how to code.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: TryNinja on March 29, 2025, 02:46:14 PM Talksearch is hosted on Google App Engine, utilizing several Compute Engine servers to run Elasticsearch. It has a capacity to host up to 425GB of posts. Backups are available off-site in case of disaster. None of this is cheap, and it costs around $150/month in total to maintain. What... are you serious? Why on earth are you paying this much? :P For the ninjastic.space database I pay around $26 per month. That gives me more than enough to self host my elasticsearch node and a shit ton of other projects, and searching is very fast (see the new beta.ninjastic.space/search). Believe me, you do *not* need to pay for "several compute engine servers" for your project. That's complete insanity.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 03:29:10 PM What... are you serious? Why on earth are you paying this much? :P For the ninjastic.space database I pay around $26 per month. That gives me more than enough to self host my elasticsearch node and a shit ton of other projects, and searching is very fast (see the new beta.ninjastic.space/search). Believe me, you do *not* need to pay for "several compute engine servers" for your project. That's complete insanity.
Wasn't my choice, I had an option to buy the cluster for $36/month without the extra storage server, but it would only fit half of the posts at 45GB. This was the next smallest config. Very crazy prices going on at cloud providers.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: TryNinja on March 29, 2025, 03:33:45 PM Wasn't my choice, I had an option to buy the cluster for $36/month without the extra storage server, but it would only fit half of the posts at 45GB. This was the next smallest config. Why not move to a VPS or at least your own dedicated server? Very crazy prices going on at cloud providers. I get 1.5 TB of NVME for the price I pay. :P If you need space (you do, because you're indexing tons of posts), you're on the wrong service. That's the price huge companies pay because they can, and they usually need the extreme scaling possibility due to the nature of their business.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: KingsDen on March 29, 2025, 03:52:12 PM https://www.talkimg.com/images/2025/03/29/lIkk1.png (https://talksearch.io) The best search tool I have seen since I joined BTT. I searched for a few topics I created and the results were returned accurately. If there is a means to pin this topic at the menu bar of the forum, I would very much appreciate it. Kudos for this exceptional work. Quote - Eliminates spammy results by enforcing basic length restrictions. Does this mean short posts are not indexed? If yes, up to what length?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on March 29, 2025, 03:55:06 PM Just a little thought inside my mind! Wouldn't it be better if only one result were shown per topic, with an expandable slider to see the rest of the replies inside that topic? What I mean is, when a search query is made, and it is showing every single reply from a specific topic (not ever grouped or sorted or numbered). Just look at the image.
All of the results I was shown belonging to the same topic.
Maybe you could make some changes and make it look more sorted or grouped? Or the topic ID is shown below the username & date? Btw, the UI is very clean, maybe you can consider adding some new theme. For example pitch black, similar to the new ninjastic. Sorry, I am having a little trouble choosing the right words to express my thoughts/suggestion/idea. Do let me know if I was confusing. https://www.talkimg.com/images/2025/03/29/lsr53.png
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: KingsDen on March 29, 2025, 04:05:32 PM Just a little thought inside my mind! Wouldn't it be better if only one result were shown per topic, with an expandable slider to see the rest of the replies inside that topic? What I mean is, when a search query is made, and it is showing every single reply from a specific topic (not ever grouped or sorted or numbered). Nice idea but it will lead to complex sorting. I feel that what we have is already good since you might not need to scroll down to see your desired result. Just look at the image. All of the results I was shown belonging to the same topic. That topic has the major concentration of the search key words.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 04:46:46 PM Why not move to a VPS or at least your own dedicated server? I get 1.5 TB of NVME for the price I pay. :P If you need space (you do, because you're indexing tons of posts), you're on the wrong service. That's the price huge companies pay because they can, and they usually need the extreme scaling possibility due to the nature of their business. I actually had a dedicated server rented out for this specifically at one point, with very capable specs (and 33% of the cost). But sysadmin is my pet peeve - I usually cause a lot of downtime during things like updates which would be unacceptable for a production service like this.
I ended up mining some XMR on it while debating whether or not I should use the cloud, until about a week ago when I canceled the box. As for my main box, well I already have a warning from the hosting provider from last year not to "send abusive port 80 traffic" again (apparently one of my HTTP services was hacked and was being used for DDoS) so I don't dare host any websites on it anymore. edit: typo Just a little thought inside my mind! Wouldn't it be better if only one result were shown per topic, with an expandable slider to see the rest of the replies inside that topic? What I mean is, when a search query is made, and it is showing every single reply from a specific topic (not ever grouped or sorted or numbered). Just look at the image. All of the results I was shown belonging to the same topic. This is a defect in the algorithm I'm using. It weights topics much more than post content which is why you see so many posts from the same thread together even though they may not be relevant themselves.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Pablo-wood on March 29, 2025, 05:14:10 PM Why not move to a VPS or at least your own dedicated server? A personal dedicated server should be a better choice, but won't it be as expensive as a rented dedicated server? Although it might require only a one-time fee, the cost might be discouraging. A virtual private server should be the best because it's cheaper and more scalable, but the issue is with its dedicated allocation, which might be another drawback. I get 1.5 TB of NVME for the price I pay. :P If you need space (you do, because you're indexing tons of posts), you're on the wrong service. That's the price huge companies pay because they can, and they usually need the extreme scaling possibility due to the nature of their business. About being on the wrong server, I agree: even though DS are quite expensive, $150 is a lot.
A range from $36 - $50 should be fair enough Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on March 29, 2025, 05:17:48 PM Just a little thought inside my mind! Wouldn't it be better if only one result were shown per topic, with an expandable slider to see the rest of the replies inside that topic? What I mean is, when a search query is made, and it is showing every single reply from a specific topic (not ever grouped or sorted or numbered). Just look at the image. All of the results I was shown belonging to the same topic. This is a defect in the algorithm I'm using. It weights topics much more than post content which is why you see so many posts from the same thread together even though they may not be relevant themselves. So do you plan on changing the algorithm, maybe in the future or until you figure out an alternative solution?! For now I think it is good enough, not gonna lie. But I guess, at least maybe you can add the main topic ID below the usernames of every results (of course in a serial manner for the same topic). That would greatly help in identifying that this content belongs to the same topic! I assume you understood what I was trying to imply here! :) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Mrbluntzy on March 29, 2025, 06:17:44 PM Quote Talksearch is a simple search engine that allows you to quickly find and go to any posts on Bitcointalk. What if I don't remember the whole tittle of the topic that I want to search for and maybe I only remembered a few key words on the topic, when searching on this website with just those few key words, will it show results for other topics that has same key words? On the forum search, even if I don't remember the topic am looking for correctly, I could just type the few key words that I remembered and it will bring up several topics on that and I will have to scroll and keep nexting until I probably see the right topic am looking for. 
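Editor's note: DYING_S0UL's suggestion above (one result per topic, with the other matching replies tucked behind an expander) can be sketched on the client side as a simple grouping step. This is only an illustration, not how Talksearch works; the `Hit` and `TopicGroup` types and their field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    # Hypothetical shape of one search result; not Talksearch's real schema.
    topic_id: int
    post_id: int
    score: float


@dataclass
class TopicGroup:
    topic_id: int
    best: Hit                                    # shown in the result list
    others: list = field(default_factory=list)   # shown when expanded


def group_by_topic(hits):
    """Collapse a list of hits to one entry per topic, best hit first."""
    groups = {}
    # Walk hits from highest to lowest score so the first hit seen for a
    # topic is its best one, and groups end up ordered by that best score.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        group = groups.get(hit.topic_id)
        if group is None:
            groups[hit.topic_id] = TopicGroup(hit.topic_id, best=hit)
        else:
            group.others.append(hit)
    return list(groups.values())
```

With this shape, the results page would render `best` for each group and only load `others` when the user expands the topic.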
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Upgrade00 on March 29, 2025, 07:28:15 PM The results at the moment look to be only focused on topics, showing those with the key words selected. Is there a plan to index text within the post content too?
On the page number slider, one has to move one at a time forward or backwards, can there be an option to jump to the specific page you want to check out? If it gets too long, it can be trimmed out. Website looks good after some tries, lightning quick, I don't think I've used any that returns to the homepage as fast. Results load very quickly too. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 29, 2025, 07:58:41 PM The results at the moment look to be only focused on topics, showing those with the key words selected. Is there a plan to index text within the post content too? Post content is indexed but the results are heavily weighted towards titles at the moment. On the page number slider, one has to move one at a time forward or backwards, can there be an option to jump to the specific page you want to check out? If it gets too long, it can be trimmed out. Talksearch will only return 10 pages max, so it would be better if I made them all clickable. This will be done later, as I am on holiday tomorrow. What if I don't remember the whole tittle of the topic that I want to search for and maybe I only remembered a few key words on the topic, when searching on this website with just those few key words, will it show results for other topics that has same key words? On the forum search, even if I don't remember the topic am looking for correctly, I could just type the few key words that I remembered and it will bring up several topics on that and I will have to scroll and keep nexting until I probably see the right topic am looking for. A search engine's #1 job is to help you find stuff, so naturally you can type a few key words that occur in the title and get it returned to you, if you type like 3-4 of them. The term doesn't have to be an exact match as it searches by word. Also it is not case-sensitive. One advantage Talksearch has over the Bitcointalk search is that results are not returned in random order. 
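Editor's note: the behaviour described above (word-by-word matching, results weighted towards titles, at most 10 pages) maps naturally onto an Elasticsearch `multi_match` query with a per-field boost. The sketch below only builds the request body; the field names `title` and `content`, the boost value, and the page size of 10 are assumptions, not Talksearch's actual settings. Case-insensitive matching would normally come from the index's analyzer rather than from the query itself.

```python
def build_search_body(text, page=0, page_size=10, title_boost=3.0, max_pages=10):
    """Build an Elasticsearch search body that favours title matches.

    All defaults are illustrative assumptions.
    """
    # Clamp the page so callers cannot request more than max_pages pages.
    page = min(max(page, 0), max_pages - 1)
    return {
        "from": page * page_size,
        "size": page_size,
        "query": {
            "multi_match": {
                "query": text,
                # "title^3" multiplies the score of title matches by 3.
                "fields": [f"title^{title_boost:g}", "content"],
            }
        },
    }
```

Lowering `title_boost` towards 1 is one way to experiment with reducing the title bias mentioned in this thread.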
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Vod on March 29, 2025, 11:30:09 PM Congrats OP - although the results are a bit vanilla for me (I like a lot of condensed information), I'm sure you'll modify it over time based on suggestions. Are you planning on having Theymos fund it or putting paid links in the results? (The forum has over 1,000 btc donated for this purpose, so regular users should not be supporting this.)
If you are interested in having partners send you traffic, shoot me a line. From one developer to another - the rush of seeing something you created being used is better than any man made drug. :) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 30, 2025, 01:13:10 PM Searches seem to be failing. I'm diagnosing the issue.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Agbe on March 30, 2025, 01:59:17 PM Bitcoin: bc1q6dphprljdas0xl2cmqn6tlskselx5xtcpcw8kx Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DPHOR on March 30, 2025, 02:57:49 PM Great job boss.. I tried to explore it and it was fine and cool. Do you intend to add dark theme? Sometimes I usually get affected by white screen so most time I do tuned on colors inversion on my phone to control how my screen adept to the environment where I am. Sometimes if I am in a light place my screen light increases to fit the weather or sunlight of where I am. Please I would also want to see that added maybe you might feels is not important that there are people who loves accessing most of the site with dark theme.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 30, 2025, 03:03:03 PM The problem has been identified and searches are now working normally again.
Post-mortem analysis
Logs in the Google Cloud backend indicated that the outage was caused by the Elasticsearch master node becoming completely unavailable after the hard disk of the hot-content tier server filled up. This was most likely due to a bug in the script I use to bulk import topics into Elasticsearch. After I contacted support, they were able to temporarily upscale the hot-content server to a larger capacity within 20 minutes. I am now going through the data and cleaning up any duplicates and other data that might have overflowed the server. This highlights the importance of using managed resources for production services. If I were running the server myself, it would probably have taken days to recover from this.
edit: Apparently, the warm tier server was not utilized at all, not even for a single document. That's why the content server was overrun so quickly. The ingestion process ran into trouble after around 46 million documents. Considering the number of posts on the forum exceeds 65 million, I need to use the warm tier more aggressively. For now, indexing has been halted while I design a better process for storing the posts. As a side effect, an option will probably be developed to exclude spam topics such as signature campaigns, bounties, etc. when performing a query. An efficient way to filter local board topics is also desirable but will probably not be done anytime soon.
Great job boss.. I tried to explore it and it was fine and cool. Do you intend to add dark theme? Sometimes I usually get affected by white screen so most time I do tuned on colors inversion on my phone to control how my screen adept to the environment where I am. Sometimes if I am in a light place my screen light increases to fit the weather or sunlight of where I am. Please I would also want to see that added maybe you might feels is not important that there are people who loves accessing most of the site with dark theme.
Dark theme is not a priority right now. The backend needs to be stabilized first.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Royal Cap on March 30, 2025, 04:38:00 PM First of all, thank you for making such a beautiful search system. I searched quite a lot with it and got very good results. However, I think it would be good to add another thing, which is Sort by Date. If you added this, it would be easier for us to search for any post that has been made recently. Actually, this is my personal feedback.
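Editor's note: the post-mortem above mentions duplicates left behind by the bulk import script. One common way to make a re-run of an import harmless is to give every document a deterministic `_id`, so that indexing the same post twice overwrites the existing document instead of adding a copy. The sketch below builds the newline-delimited body expected by the Elasticsearch `_bulk` endpoint; the index name `posts` and the `msg_id` field are assumptions, and this is not the author's actual script.

```python
import json


def bulk_lines(posts, index_name="posts"):
    """Yield NDJSON lines for the Elasticsearch _bulk endpoint.

    Each post is a dict with a hypothetical "msg_id" key holding the forum
    message ID. Using it as _id makes repeated imports idempotent.
    """
    for post in posts:
        action = {"index": {"_index": index_name, "_id": str(post["msg_id"])}}
        yield json.dumps(action)   # action line
        yield json.dumps(post)     # document source line


def bulk_body(posts, index_name="posts"):
    """Join the lines into a request body; every line ends with a newline."""
    return "".join(line + "\n" for line in bulk_lines(posts, index_name))
```

Sending the same batch twice with such IDs leaves the document count unchanged, which also keeps disk usage predictable when an import has to be restarted.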
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 30, 2025, 05:08:10 PM First of all, thank you for making such a beautiful search system. I searched quite a lot with it and got very good results. However, I think it would be good to add another thing, which is Sort by Date. If you added this, it would be easier for us to search for any post that has been made recently. Actually, this is my personal feedback. I will work on this soon. For now, recent posts from March are not available, but once I configure automatic scraping they will gradually become available.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Macabury on March 30, 2025, 07:28:13 PM This search engine is fast. This must have cost so much time and effort, thanks for the good job done. I searched a few topics and I saw threads related to my input clustered waiting for me to click on the exact topic I wanted. This is so innovative. Is this meant to be an extension of ninjastic space?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Vod on March 30, 2025, 10:15:38 PM For now, indexing has been halted while I design a better process for storing the the posts. A good solution for you may be Wasabi (https://wasabi.com/). They do not charge for egress data, but you must keep the data on the server for a minimum of three months. Also, Google cloud should have some sort of monitoring for hard drive usage. I have an alert sent when any disk usage passes 90% for five minutes. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 31, 2025, 05:51:17 AM Also, Google cloud should have some sort of monitoring for hard drive usage. I have an alert sent when any disk usage passes 90% for five minutes. They have monitors somewhere, I just haven't figured out how to use them yet. After development, I am learning Operations the hard way :-\ This search engine is fast. This must have cost so much time and effort, thanks for the good job done. I searched a few topics and I saw threads related to my input clustered waiting for me to click on the exact topic I wanted. This is so innovative. Is this meant to be an extension of ninjastic space? No, this is not Ninjastic space. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on March 31, 2025, 06:33:02 AM Congratulations on the project. A good search tool for Bitcointalk is really useful.
I will look into this in more detail later and then share any suggestions (if applicable). ;) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Synchronice on March 31, 2025, 01:36:38 PM First of all, thank you for this service, I appreciate everyone who tries to improve this forum.
Why didn't you host your website on Hetzner? They have one of the cheapest and fastest servers, this is the best thing someone can get for their bucks. $150 per month is a huge cost, I don't think you'll be able to collect that every month. By the way, can you create a demonstration of what's the difference between your search engine and Bitcointalk's (Google) search engine? I did some searches and Bitcointalk gave me more accurate results than Talksearch.io. Do you plan to add more filters? Like sort by ascending/descending, search by a user X and so on?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on March 31, 2025, 02:12:21 PM Why didn't you host your website on Hetzner? They have one of the cheapest and fastest servers, this is the best thing someone can get for their bucks. $150 per month is a huge cost, I don't think you'll be able to collect that every month. Because I already own a Hetzner server, and I don't want to get kicked out by the reseller due to a hack or DDoS. This is what the Google Cloud setup looks like by the way: https://www.talkimg.com/images/2025/03/31/lyffN.png So not only are there two content servers, which aren't as big as what you can order with an HDD on other websites, there's also a server for the internal dashboard, the ingestion server that's going to scrape Bitcointalk, and the server that actually delivers search results (enterprise search, on the right of the screen). All this allows me to easily experiment with different search algorithms and parameters. I also get free customer support which allows me to recover from any hardware failure in just hours (literally), like yesterday. Google App Engine itself is virtually free and that's where I host the website. In my experience, it is much better than attempting to host it directly because I don't have to worry about downtime or attacks like for my other sites.
By the way, can you create a demonstration of what's the difference between your search engine and Bitcointalk's (Google) search engine? I did some searches and Bitcointalk gave me more accurate results than Talksearch.io. I haven't gotten a chance to enhance the Talksearch result quality yet. Do you plan to add more filters? Like sort by ascending/descending, search by a user X and so on? Yes, of course. Just give me a couple days. There's other stuff I need to take care of first.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: dkbit98 on March 31, 2025, 07:41:17 PM Nice service and it opens everything really fast, but I don't think this is worth paying $150 per month, especially if you want to make this project sustainable in the long term.
One of my suggestions is to add optional alternative dark theme switch, and maybe add some default search terms, and predefined board locations. As for donations, you should add all addresses or links in the website footer. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: libert19 on April 01, 2025, 10:20:51 AM I searched, 'libert19's first post' — I was expecting that results would show my first post, and it didn't. It may be stupid query, but a simple one, and should deffo work.
Another query I did was, "I was fucked, do not repeat same mistakes as me" which is title of this (https://bitcointalk.org/index.php?topic=5289504.msg55596163#msg55596163) thread — and results were irrelevant instead of showing thread which I was expecting. Title was literally copy-pasted here!? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BenCodie on April 01, 2025, 11:10:27 AM I searched, 'libert19's first post' — I was expecting that results would show my first post, and it didn't. It may be stupid query, but a simple one, and should deffo work. This is not an AI or conversational search engine. It works from keywords. The engine is not going to know that "libert19" is the user nor is it going to recognize "first post" as search for the first post, it's going to search libert19, first, and post in the topic title/content. I think you're a bit used to AI ::) I think this is better thought of like a pre-ai search engine. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: JollyGood on April 01, 2025, 11:16:39 AM Without a doubt though congratulations are in order, I think the general consensus is clear that $150 per month is far too expensive for any endeavour of this nature.
As for the search results, I searched for "Satoshi" and saw results that had no chronological (date) order nor a way to filter the results allowing me to view them how I wanted (such as new, old, mentioned in subject, mentioned in post). Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: KingsDen on April 01, 2025, 09:03:31 PM I searched, 'libert19's first post' — I was expecting that results would show my first post, and it didn't. It may be stupid query, but a simple one, and should deffo work. This is not an AI or conversational search engine. It works from keywords. The engine is not going to know that "libert19" is the user nor is it going to recognize "first post" as search for the first post, it's going to search libert19, first, and post in the topic title/content. I think you're a bit used to AI ::) I think this is better thought of like a pre-ai search engine. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BenCodie on April 02, 2025, 11:16:11 AM I searched, 'libert19's first post' — I was expecting that results would show my first post, and it didn't. It may be stupid query, but a simple one, and should deffo work. This is not an AI or conversational search engine. It works from keywords. The engine is not going to know that "libert19" is the user nor is it going to recognize "first post" as search for the first post, it's going to search libert19, first, and post in the topic title/content. I think you're a bit used to AI ::) I think this is better thought of like a pre-ai search engine. Maybe it's a consequence of replacing search engines with conversational AI, though from what I've witnessed, libert19 commonly makes these kinds of posts (to the point where I can hardly tell if it is extremely sophisticated trolling or natural). Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 03, 2025, 09:54:24 AM New content update v1.0.1 published
This update enhances search functionality and includes optimizations to the search algorithm to return more relevant results. Search features:
App features:
The old versions continue to be cached in the cloud, allowing me to roll back immediately if any bugs are detected. There should not be any, but it's a good failsafe to have. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: libert19 on April 04, 2025, 06:35:04 AM Maybe it's a consequence of replacing search engines with conversational AI, though from what I've witnessed, libert19 commonly makes these kinds of posts (to the point where I can hardly tell if it is extremely sophisticated trolling or natural). That was genuine query and I don't troll, may be my posts just come off like that. Regarding query, even before AI if you search stuff on Google, for example, "Messi's first match" you would find relevant result, so intention was same here. Meanwhile, I repeated the below query after update above, and I am still not getting expected result. Another query I did was, "I was fucked, do not repeat same mistakes as me" which is title of this (https://bitcointalk.org/index.php?topic=5289504.msg55596163#msg55596163) thread — and results were irrelevant instead of showing thread which I was expecting. Title was literally copy-pasted here!? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 04, 2025, 06:50:59 AM Meanwhile, I repeated the below query after update above, and I am still not getting expected result. Another query I did was, "I was fucked, do not repeat same mistakes as me" which is title of this (https://bitcointalk.org/index.php?topic=5289504.msg55596163#msg55596163) thread — and results were irrelevant instead of showing thread which I was expecting. Title was literally copy-pasted here!? Your topic is not in the index yet. As I mentioned earlier, a third of the topics could not be uploaded because the process was interrupted when I ran out of disk space. I'm still working on adding them. 
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: satscraper on April 04, 2025, 12:32:24 PM @NotATether, thanks for the powerful tool.
Talksearch.io is so important for users that I took the initiative to translate your topic and share it (https://bitcointalk.org/index.php?topic=5537277) in the Russian local board of the forum. I chose to use the first-person narration i.e. NotATether to ensure that no details were lost in the translation. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: bchannel on April 05, 2025, 03:13:13 PM Thanks for the work, there is still a lot to do but this is a colossal amount of effort.
Sent a small donation of $100 , good luck Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 06, 2025, 07:27:27 AM New app update v1.0.2 published
This is a minor update that publishes the donation addresses at the footer of the page. Suggested by Igebotz. As a reminder, I am still working to index all topics as well as new ones, so stay tuned for further developments on that front. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BenCodie on April 07, 2025, 03:39:42 AM Maybe it's a consequence of replacing search engines with conversational AI, though from what I've witnessed, libert19 commonly makes these kinds of posts (to the point where I can hardly tell if it is extremely sophisticated trolling or natural). That was genuine query and I don't troll, may be my posts just come off like that. Regarding query, even before AI if you search stuff on Google, for example, "Messi's first match" you would find relevant result, so intention was same here. In that case, I think it's a bit much to expect from a tool like this, which is probably just searching through indexed data and nothing more than that... Meanwhile, I repeated the below query after update above, and I am still not getting expected result. Another query I did was, "I was fucked, do not repeat same mistakes as me" which is title of this (https://bitcointalk.org/index.php?topic=5289504.msg55596163#msg55596163) thread — and results were irrelevant instead of showing thread which I was expecting. Title was literally copy-pasted here!? ...Though if it was just searching indexed data, this should work. I suppose the answer might be: - Indexed ALL posts (up to February 2025) *Operational bugs have prevented me from uploading one-third of the posts, which is actively being fixed Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 07, 2025, 09:11:06 AM Maintenance alert
There will be planned downtime for a few hours (maximum) today in order to work on the Elasticsearch cluster. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: JollyGood on April 07, 2025, 09:25:07 AM Do you intend to move the hosting elsewhere in an attempt to lower the monthly costs? Even with that generous donation of $100 it only covers 66% of the monthly cost (to run the service for a month). If the project is going to have a long term future, I think finding an alternative host with lower costs will be an important step.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 07, 2025, 11:08:12 AM Do you intend to move the hosting elsewhere in an attempt to lower the monthly costs? Even with that generous donation of $100 it only covers 66% of the monthly cost (to run the service for a month). If the project is going to have a long term future, I think finding an alternative host with lower costs will be an important step. Dedicated servers are prepaid and often sold in units of one, so while useful for small projects, they're a very bad fit for a service that needs to scale and must always remain online (the irony, I know) I plan on adding Altcoinstalks and some other forums & mailing lists into the index eventually. Currently this maintenance is to fix up the cluster "indices" (entities inside the search software used for storing post data) that I screwed up when attempting to upload the rest of the post data a few days ago. That's why the maintenance page is required, as search is not possible without them. Maintenance is complete. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Welsh on April 07, 2025, 03:09:44 PM I've always been impressed by the dedication of the community to improve the user experience for us all. This one addresses probably one of the more mentioned problems in recent years (in fact for a very long time). I've tested it out a little, and it seems to be a lot better than the forum search. I'm sure some users will be put off by the fact it's on a third party website, but hopefully this site gets some use in the long term.
Thanks NotATether for yet another great contribution to the community! Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: JollyGood on April 08, 2025, 09:46:55 AM I reiterate the comments made by Welsh, it is commendable that forum members manage to find ways to improve/assist in different ways according to their skills and depending on how much time they have on their hands. Having said that, if you have paid in advance for the servers and are facing $150 payments every month subsequently, what has been your total expenditure and when does the account need replenishing?
Dedicated servers are prepaid and often sold in units of one, so while useful for small projects, they're a very bad fit for a service that needs to scale and must always remain online (the irony, I know). Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 08, 2025, 10:52:37 AM I reiterate the comments made by Welsh, it is commendable that forum members manage to find ways to improve/assist in different ways according to their skills and depending on how much time they have on their hands. Having said that, if you have paid in advance for the servers and are facing $150 payments every month subsequently, what has been your total expenditure and when does the account need replenishing? This infrastructure is all postpaid and my bill arrives at the beginning of each month. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: katanic97 on April 08, 2025, 07:13:36 PM Croatian translation by katanic97 (https://bitcointalk.org/index.php?action=profile;u=1856852;sa=summary/)
Topic Talksearch.io - Napredni pretraživač za Bitcointalk (https://bitcointalk.org/index.php?topic=5537640.msg65257270#msg65257270) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BitcoinGirl.Club on April 08, 2025, 10:41:58 PM Roadmap ____________________ - Continuously ingest posts from Bitcointalk (immediate priority) - Refine search result quality - Introduce filters for user, date, etc. You need to add features asap because right now how it's better (https://talksearch.io/search?q=Bitcoingirl) than going to search engine like Google and ask to find a string using specific search: 'site: bitcointalk.org "bitcoingirl"'. Congratulations on your project. It's nice to see community members are trying to create tools that will be useful for the community. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Porfirii on April 11, 2025, 04:34:11 PM Congratulations NotATether for your search engine, and thank you for sharing it with all users :)
Several members of the AoBT (https://bitcointalk.org/index.php?topic=5442314.0) have proposed to translate this topic and post it in our respective local boards to help spread the word, so we'd like to reserve the following languages, with your permission, among those that remain untranslated: Polish, Romanian, Turkish, French, Spanish, Ukrainian, Filipino, Pidgin, Bangla, Portuguese, Urdu and Hindi. Keep up the good work! Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Upgrade00 on April 11, 2025, 09:45:55 PM Checked this out again and there has been a couple of good updates and it's still as fast as it was. I assume it's a struggle trying to maintain the speed while indexing more data, but it's working well so far.
I will be adding this to my top options for searching on the forum now. I think it will be great if theymos can add some custom search engines like this and ninjastic when a user clicks on the search icon. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on April 12, 2025, 06:53:07 AM I was doing some tests, until I realized that there is no possibility to choose the order in which the results appear.
It would be interesting to be able to choose whether we want to see the most recent result first, or the oldest one. Or if we want to see the most frequent or most similar terms used in the search. I think that would be interesting, just a suggestion. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 12, 2025, 07:51:05 AM I was doing some tests, until I realized that there is no possibility to choose the order in which the results appear. It would be interesting to be able to choose whether we want to see the most recent result first, or the oldest one. Or if we want to see the most frequent or most similar terms used in the search. I think that would be interesting, just a suggestion. I'm working on that. Congratulations NotATether for your search engine, and thank you for sharing it with all users :) Several members of the AoBT (https://bitcointalk.org/index.php?topic=5442314.0) have proposed to translate this topic and post it in our respective local boards to help spread the word, so we'd like to reserve the following languages, with your permission, among those that remain untranslated: Polish, Romanian, Turkish, French, Spanish, Ukrainian, Filipino, Pidgin, Bangla, Portuguese, Urdu and Hindi. Keep up the good work! I will add the new translated topics now. Checked this out again and there has been a couple of good updates and it's still as fast as it was. I assume it's a struggle trying to maintain the speed while indexing more data, but it's working well so far. I will be adding this to my top options for searching on the forum now. I think it will great it theymos can add some custom search engines like this and ninjastic when a user clicks the on the search icon. Moving the data to the larger, "warm" node means that searches became slightly slower, but I plan to improve this by splitting up the index of Bitcointalk posts into smaller parts, specifically: Non-english posts, posts in Archival, or spam (e.g. Bounties). 
Then whatever's left over after that will not be so large, and might make search queries perform faster. Currently, I have the ability to scrape any topic or update it, but I still have to automate it. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Adiljutt156 on April 14, 2025, 11:57:54 AM Hello NotATether!
I am the member of AOBT Gang ( The Alliance Of Bitcointalk Translators). New translation is now available of this topic into Urdu language. Topic: Talksearch.io - Advanced Bitcointalk Search Engine (https://bitcointalk.org/index.php?topic=5536692.msg65221234#msg65221234) Translation: Talksearch.io - ایڈوانسڈ بٹ کوائن ٹاک سرچ انجن (https://bitcointalk.org/index.php?topic=232519.msg65274038#msg65274038) Thanks :) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: examplens on April 15, 2025, 01:07:36 PM I was doing some tests, until I realized that there is no possibility to choose the order in which the results appear. Tested it for the first time and that's what I'm missing too. Sorting by time of creation and possible number of views or comments of a certain topic. It would help to identify more relevant results.It would be interesting to be able to choose whether we want to see the most recent result first, or the oldest one. Or if we want to see the most frequent or most similar terms used in the search. I think that would be interesting, just a suggestion. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: cygan on April 15, 2025, 05:29:51 PM @NotATether maybe my pm has somehow disappeared in your message center or you haven't had the time yet or simply forgot about it... ;)
i wanted to remind you again that the polish translation (https://bitcointalk.org/index.php?topic=5537249.0) of your search engine is now ready - would be nice if you would include it in your overview :) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 18, 2025, 11:32:45 AM An update to the enhanced search feature:
A new dataset is being uploaded to Elasticsearch. This dataset is more enriched than the current unprocessed posts and includes even more metadata such as the lock type, scrape time and check time, which will be used along with other parameters to determine in what order should topics be checked for updates and the frequency they will be checked. An experimental quality score is also included with each post, in an attempt to deprioritize low-quality posts and sig spam from the search results. In an effort to remove irrelevant data such as quotes from the search results, posts are now divided into chunks, delimited by the presence of a quote or a line separator. This upload process was started yesterday, and about 5 million records have been indexed so far, out of a total estimated to be around 120 million. https://www.talkimg.com/images/2025/04/18/xiIAD.png The v2 indices contain the data which Talksearch will use for searching in the future. Also, local language posts are categorized to facilitate for local search. I continue to work on automatic scraping support. However, the v2 dataset is more recent than the original, and contains posts from up to March 2025. New translated ANN links will be added shortly. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BitcoinGirl.Club on April 18, 2025, 11:39:06 AM http://talksearch.io/search?q=bitcoingirl
Taking ages to load. Is the search engine working? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on April 18, 2025, 02:27:57 PM This upload process was started yesterday, and about 5 million records have been indexed so far, out of a total estimated to be around 120 million. That's a lot of data. Is the system cataloging all the words and things like that? How are you processing all the information? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on April 18, 2025, 02:57:45 PM An experimental quality score is also included with each post, in an attempt to deprioritize low-quality posts and sig spam from the search results. I'm curious to see the posts per user sorted by your experimental algorithm. Would that be possible to search for?Quote about 5 million records have been indexed so far, out of a total estimated to be around 120 million. Is 120 million the number of posts + edits?Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 18, 2025, 03:37:35 PM http://talksearch.io/search?q=bitcoingirl Taking ages to load. Is the search engine working? The warm tier of nodes is slower than the hot/content tier when fetching data, but has about 4x more storage capacity. Currently, v1 (talksearch_bitcointalk in the picture) search is running on warm nodes. It used to be on the hot nodes, where searches were quite fast, but in the process of fixing my cluster, it got moved to warm. All v2 indices besides English are on the hot nodes, however I'm not particularly satisfied with the amount of low-quality posts present in this index, so I'm considering moving the high-quality Engilsh posts to the hot nodes. Then there would be a checkbox on the site that reads "Only search high-quality posts". The issue is, I currently don't have a reliable way to measure post quality. By the way, http:// does not currently work on Talksearch. Use https://. 
I am thinking about redirecting all traffic to the https:// version anyway. That's a lot of data. Is the system cataloging all the words and things like that? How are you processing all the information? Yes! In fact I am excited to show you the advanced classifications that are available for the data. Elasticsearch has a number of field types available for naturally processing JSON, more than just strings, numbers, and booleans. I am talking about things like rank features (common in search applications), vectors, points, geolocation stuff, dates, binary data, and many other stuff: https://www.talkimg.com/images/2025/04/18/x8ydd.png These are then used by Elasticsearch's query language. It is very advanced and can find stuff much more relevant than e.g. a regex-based search. This is where the true power lies. Here's a list of things that its query language is able to do: - It can boost or penalize certain keywords - It supports semantic search e.g. "satoshi nakamoto identity" is supposed to return topics about Hal Finney or Nick Szabo - Apply autocorrect and "keyboard shift" - It can match text that only appears in a certain position in the post - It can find "More like this" results - You can attach scripts to create complex searches (but this is slow) - Feature-based search is fully supported, so you can make queries for posts based on username, board, topic title, date etc in the exact same manner as for post body. (This last part is sorely missing from Google Site Search.) There's a lot more stuff I didn't list that you can find here (https://www.elastic.co/docs/reference/query-languages/querydsl). But the exact algorithm I use is proprietary. I'm curious to see the posts per user sorted by your experimental algorithm. Would that be possible to search for? That's a great idea, but with the current version of Talksearch this is not possible. It will eventually be available though. Is 120 million the number of posts + edits? 
No, 120 million is an estimate of the number of chunks there would be if you split posts by quote body, or by line break [ hr ] tags. For example, this post I am writing would be indexed as 4 chunks, as they are separate pieces of information being written. This allows users to see results that are as specific as possible on the search results page without having to navigate to the Bitcointalk post. I assume there is on average a bit less than one quote or line break per post, which is why there are a bit less than 2 chunks per post on average (quotes etc. plus one). I only store the latest revision of each post. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: dkbit98 on April 18, 2025, 05:24:58 PM An update to the enhanced search feature It is working very slowly for me, and I only wrote a simple two-word search term. Maybe you should allow users to narrow down search into specific boards/topics only, that would certainly speed things up significantly. You can also add search only for specific members to speed things up even more. I compared search using Ninjastic.space with the newly updated Talksearch, and Ninjastic gave much better and faster results. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 20, 2025, 04:48:39 AM Searches are temporarily unavailable while I perform some routine maintenance on the nodes.
Thanks for understanding. Update: maintenance is complete. This should fix the search performance. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: BitcoinGirl.Club on April 20, 2025, 05:18:33 AM Update: maintenance is complete. This should fix the search performance. It's a little faster than the last time I checked.Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 20, 2025, 05:39:41 AM It's a little faster than the last time I checked. The nodes ran out of memory processing the uploaded posts, that's why performance sucked for a while. This was resolved by increasing the size of the hot/content node, and deleting the warm node. This has cut my storage capacity by about half (I have about 180GB total disk now) and marginally increases the total cost by about $10, but it will keep the node stable for some time. I will have to slow down the rate at which I'm uploading posts to the cluster, in order to avoid this sort of thing happening again. However, the new data is nowhere near close to 100% uploaded, so i will have to figure out a way to make that faster so that I can make Talksearch use the higher-quality v2 indices. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: SamReomo on April 20, 2025, 01:43:25 PM A big congratulations from my side OP, you've really solved one of the biggest issues of Bitcointalk but like others I also think that you're paying way more for it than needed. $150 + $10 a month is a huge amount to pay for a service like that. However, I hope that you may receive some good donations or get some good sponsors so it won't be a burden on your shoulders alone.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on April 20, 2025, 08:08:00 PM Hey NotATether, please be aware that 2 more translations were made for your topic by AOBT. I hope this is good news :)
Urdu (https://bitcointalk.org/index.php?topic=232519.msg65274038#msg65274038) translation, made by Adiljutt156 Polish (https://bitcointalk.org/index.php?topic=5537249) translation, made by cygan Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: PX-Z on April 20, 2025, 10:52:49 PM Good project and initiative!
@OP, could you clarify whether the $150 covers only the hosting costs, or does it also include access to Elasticsearch, since I noticed it has paid options too? I tried accessing the site just yesterday, but it was down at the time. However, it's noticeably faster now, great improvement so far. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: nakamura12 on April 21, 2025, 03:55:08 AM On the page number slider, one has to move one at a time forward or backwards, can there be an option to jump to the specific page you want to check out? If it gets too long, it can be trimmed out. Talksearch will only return 10 pages max, so it would be better if I made them all clickable. This will be done later, as I am on holiday tomorrow.Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 21, 2025, 04:19:06 AM Good project and initiative! @OP, could you clarify whether the $150 covers only the hosting costs, or does it also include access to Elasticsearch, since I noticed it has paid options too? I tried accessing the site just yesterday, but it was down at the time. However, it's noticeably faster now, great improvement so far. It includes Elasticsearch (which is "self-hosted" on Google Cloud, and makes up the majority of the bill). Edit: It appears that quote splitting over 56 million posts is going to take many days (and not a few days like I imagined at first), and this is with the processor and the document uploader running in parallel, so while we wait, I am going to prioritize adding fine-grained search fields like user, title, date to the website's search parameters. The old v1 index that's currently being used for search can also have the remaining posts from that collection uploaded in there. That one at least shouldn't take too long, because only new or previously non-existent posts are going to be added there. 
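Editorial sketch: the fine-grained search fields mentioned above (user, title, date) map naturally onto an Elasticsearch Query DSL `bool` query, with the free-text part in `must` and the exact constraints in `filter`. The following is only an illustration of that shape, not Talksearch's actual query (which NotATether describes as proprietary). The field names `author`, `topic_title`, `body` and `posted_at`, and the helper `build_search_body`, are assumptions made up for this example.

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    text: str,
    user: Optional[str] = None,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    size: int = 10,
) -> Dict[str, Any]:
    """Build a request body for the Elasticsearch _search endpoint.

    All field names here are hypothetical; a real index would use
    whatever names its mapping defines.
    """
    filters: List[Dict[str, Any]] = []

    # Exact match on the poster's username (assumes a keyword field).
    if user is not None:
        filters.append({"term": {"author": user}})

    # Optional date window; only the bounds that were given are sent.
    date_range: Dict[str, str] = {}
    if date_from is not None:
        date_range["gte"] = date_from
    if date_to is not None:
        date_range["lte"] = date_to
    if date_range:
        filters.append({"range": {"posted_at": date_range}})

    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text part: title matches are boosted 2x,
                # and small typos are tolerated via fuzziness.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["topic_title^2", "body"],
                            "fuzziness": "AUTO",
                        }
                    }
                ],
                # Unscored constraints: these narrow the result set
                # without affecting relevance ranking.
                "filter": filters,
            }
        },
    }
```

Calling `build_search_body("ext4 large_dir", user="LoyceV", date_from="2025-05-01")` yields a dictionary that could be serialized to JSON and sent as a search request. Putting user and date in `filter` rather than `must` is a design choice: filters do not contribute to the score, so they only restrict which posts are considered.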
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on April 24, 2025, 09:34:20 AM Hello again NotATether!
In addition to my previous post (https://bitcointalk.org/index.php?topic=5536692.msg65298114#msg65298114), I'd like to let you know that three more translations have been made by AOBT for this thread: Turkish (https://bitcointalk.org/index.php?topic=5539433.0) translation, made by mela65 Portuguese (https://bitcointalk.org/index.php?topic=5539449.0) translation, made by r_victory Romanian (https://bitcointalk.org/index.php?topic=5539441.0) translation, made by myself In this moment, this topic has 10 translations, so from AOBT perspective it is considered as "Done". However, if new translations will be made, I will let you know. Cheers! Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 24, 2025, 04:09:06 PM Hello again NotATether! In addition to my previous post (https://bitcointalk.org/index.php?topic=5536692.msg65298114#msg65298114), I'd like to let you know that three more translations have been made by AOBT for this thread: Turkish (https://bitcointalk.org/index.php?topic=5539433.0) translation, made by mela65 Portuguese (https://bitcointalk.org/index.php?topic=5539449.0) translation, made by r_victory Romanian (https://bitcointalk.org/index.php?topic=5539441.0) translation, made by myself In this moment, this topic has 10 translations, so from AOBT perspective it is considered as "Done". However, if new translations will be made, I will let you know. Cheers! Thank you. My apologies for not adding translations to the OP yet. I am busy fighting my upload scripts to get them sending posts to Elasticsearch at an acceptable rate. For some reason, my usually fast hard disk drive on my server has slowed down massively, even after a reboot, and I'm not exactly sure why. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on April 24, 2025, 04:40:37 PM my usually fast hard disk drive on my server has slowed down massively, even after a reboot, and I'm not exactly sure why. 
Some (shitty) disks drop in performance after long sustained writes, but since you're using Google Cloud, that shouldn't happen.Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Vod on April 24, 2025, 04:43:17 PM For some reason, my usually fast hard disk drive on my server has slowed down massively, even after a reboot, and I'm not exactly sure why. Are your indexes normally set for reading? You should disable them while doing massive updates. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Porfirii on April 25, 2025, 10:56:14 AM Hello again NotATether! In addition to my previous post (https://bitcointalk.org/index.php?topic=5536692.msg65298114#msg65298114), I'd like to let you know that three more translations have been made by AOBT for this thread: Turkish (https://bitcointalk.org/index.php?topic=5539433.0) translation, made by mela65 Portuguese (https://bitcointalk.org/index.php?topic=5539449.0) translation, made by r_victory Romanian (https://bitcointalk.org/index.php?topic=5539441.0) translation, made by myself In this moment, this topic has 10 translations, so from AOBT perspective it is considered as "Done". However, if new translations will be made, I will let you know. Cheers! Thank you. My apologies for not adding translations to the OP yet. I am busy fighting my upload scripts to get them sending posts to Elasticsearch at an acceptable rate. For some reason, my usually fast hard disk drive on my server has slowed down massively, even after a reboot, and I'm not exactly sure why. Hi! Sorry for the short delay. I've just finished the Spanish (https://bitcointalk.org/index.php?topic=5539433.0) translation for this thread. Although we usually mark topics as "Done" when they reach the 10 translations mark, we are going to keep this one as "in progress" for a few more days, just in case other members finish theirs soon too. Gazeta will let you know (ty!!!). 
As this tool is still being developed and changes are expected to be made in the OP soon, although we'll be monitoring this thread from time to time, please, let us know when we have to update our translations by posting directly in our thread (https://bitcointalk.org/index.php?topic=5442314) or, if you prefer, sending a PM to Gazeta or me. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on April 30, 2025, 04:15:42 AM I don't want to keep you guys waiting, so should I release a mode for Talksearch that queries data from the new index?
There are about 2.6 million chunks so far. A chunk is just a section of a post delimited by someone's quote or by a horizontal line. The upload is not slow: it's uploading 1000 chunks every couple of seconds, but there are simply millions of them. This mode will not replace the current search engine yet. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Hossain Risfa on May 01, 2025, 05:38:54 PM I've done my translation in our Bangla local board. Thank you very much for giving me permission to translate the post. I was quite nervous and didn't know whether I would be able to translate it accurately. But after I posted my translation in our local board, some senior brothers told me that I've done a great translation and that my translation skill is good, and they also gave me some advice. Thank you @NotATether for giving me permission and a chance to translate it. It gave me experience, and even though I am a newbie you thought that I might be able to do it, and I tried to do my best. My translation link:
Talksearch.io - Advanced Bitcointalk Search Engine (https://bitcointalk.org/index.php?topic=631891.msg65330497#msg65330497) Translated in Bangla local board by Hossain Risfa Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 02, 2025, 02:30:52 PM Well, it looks like I've hit another snag during uploading. Thankfully, this has nothing to do with Elasticsearch, but with my scraping server.
As you might be aware, I scrape the posts on my server before processing them. The processing involves splitting up posts by quotes, which creates a series of chunks for each post, usually 1-3. These are saved to the disk, then another part of the program reads them into memory, and after that they are uploaded to Elasticsearch. It seems that the splitting process has created so many chunks that I simply cannot create any more in that folder. Any attempt to do so leads to an error. It might have something to do with the fact that there are tens of millions of these files (inodes) in the filesystem, but I don't know if ext4 has such a limitation. And I'm definitely not out of disk space (though the Elasticsearch server could be a different story when this is all uploaded), as not even 50% of the disk space is used so far. (Strangely, I'm not out of inodes either.) One solution to this problem could be to avoid saving these chunks to the disk altogether and run the processing and upload as one step. This is what I was doing for several days, but then I had to diagnose performance issues on the cluster so it got interrupted. Performance was bad after that though, because I was reading already-uploaded chunks from the disk. Another solution would be to simply avoid processing low-quality posts, e.g. gambling discussion. This will make for a smaller set, and it will take vastly less space. I estimate that around 15% of all Bitcointalk posts are made on Gambling Discussion. This is mostly sig spam that nobody wants to read, so there's no use returning that in search results. As a side effect of this, it will bring features resembling Google de-indexing to Talksearch, but I will never knowingly de-index posts I don't agree with. There will still be an index containing all existing forum posts, but that will be reserved for detailed search and API only. 
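Editorial sketch: the "processing and upload as one step" idea above can be pictured as a generator pipeline that splits each post into chunks in memory and hands them over in batches of 1000, so no per-chunk file is ever written. This is only an illustration under stated assumptions, not NotATether's actual code. It assumes posts arrive as BBCode strings, that quotes are `[quote ...]...[/quote]` blocks without nesting, and that separators are `[hr]` tags. The names `split_into_chunks`, `stream_chunks` and `iter_batches` are made up for this example.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, List

# A chunk boundary is either a whole quote block or a horizontal rule.
# Assumption: quotes are not nested; nested quotes would need a real parser.
_DELIMITER = re.compile(
    r"\[quote[^\]]*\].*?\[/quote\]|\[hr\]",
    re.IGNORECASE | re.DOTALL,
)


def split_into_chunks(post_body: str) -> List[str]:
    """Split one post into chunks, dropping quoted text and empty pieces."""
    pieces = _DELIMITER.split(post_body)
    return [piece.strip() for piece in pieces if piece.strip()]


def stream_chunks(posts: Iterable[str]) -> Iterator[str]:
    """Lazily yield every chunk of every post, one at a time."""
    for body in posts:
        yield from split_into_chunks(body)


def iter_batches(chunks: Iterable[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Group a stream of chunks into lists of at most batch_size items."""
    iterator = iter(chunks)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A caller would loop over `iter_batches(stream_chunks(posts))` and send each batch to the search cluster as it is produced. Because everything is lazy, memory use stays bounded by one batch, and the tens of millions of small files never need to exist on disk.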
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on May 02, 2025, 03:29:34 PM It might have something to do with the fact that there are tens of millions of these files (inodes) in the filesystem, but I don't know if ext4 has such a limitation. And I'm definitely not out of disk space (though the Elasticsearch server could be a different story when this is all uploaded), as not even 50% of the disk space is used so far. (Strangely, I'm not out of inodes either.) I have some experience dealing with tens of millions of files, and apart from making a directory view terribly slow, it works fine as long as you have enough inodes. On ext4, with default settings, it looks like a ten times larger disk does not get ten times more inodes. I checked a few disks, and typical limits are tens to hundreds of millions of inodes per disk. Just enter df -hi and it tells you what you need to know. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Z_MBFM on May 02, 2025, 06:52:33 PM Although I used to search Google to see if there was a related topic on this forum before I thought about something, I could have found information there too, but since Google is a search engine, there would have been many more search results besides the forum-related ones.
However, I found that Talksearch makes forum-related searches much smoother. It is quite effective. Nice job, OP.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 03, 2025, 09:10:01 AM
I have some experience dealing with tens of millions of files, and apart from making a directory view terribly slow, it works fine as long as you have enough inodes. On ext4, with default settings, it looks like a ten times larger disk does not get ten times more inodes. I checked a few disks, and typical limits are tens to hundreds of millions of inodes per disk. Just enter df -hi and it tells you what you need to know.
About 18% of my inodes are used. ls ran for a horribly long time but I finally got output:
Code: zenulabidin@zerstrorer ~ % ls -l /opt/talksearch/processed_chunks | wc -l
So about 30 million files. Thank goodness for zsh, otherwise I wouldn't have known the run time of this. I'll see if this long directory listing time is the cause of "No space left on device" bailing-out in the filesystem code and/or the kernel.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on May 03, 2025, 09:24:43 AM
About 18% of my inodes are used.
As far as I know, there are no limits to the number of files per directory on ext4, so this is weird. I'm pretty sure I've had more files in one directory before I added subdirectories for faster listings.
~ So about 30 million files. ~ I'll see if this long directory listing time is the cause of "No space left on device" bailing-out in the filesystem code and/or the kernel.
I'm going to test it :) I don't want this many files on my own system, so I use a temporary server:
Code: 16GB PKVM
Code: i=1; while test $i -le 40000000; do echo "Hello world!" > $i; i=$((i+1)); done
I'll be damned: No space left on device! I got to 29,272,362 files with 22M inodes free.
Filesystem:
Code: /dev/vda1 on / type ext4 (rw,relatime,discard,errors=remount-ro,commit=30
It gets weirder: I can still create new files, just not all of them:
Code: i=100000000; time while test $i -le 110000000; do echo "Hello world!" > $i; i=$((i+1)); done
Code: ls 10000282*
Root command dmesg shows this:
Code: [ 2024.349441] EXT4-fs warning: 598 callbacks suppressed
Solution
Enabling ext4 large_dir (https://serverfault.com/questions/1052075/when-enabling-ext4-large-dir-how-can-you-tell-its-used) seems to fix it:
Code: tune2fs -O large_dir /dev/nvme2n1
The EXT4 "largedir" feature overcomes the current limit of around ten million entries allowed within a directory on EXT4. Now, EXT4 directories can support around two billion directory entries. However, you are likely to hit performance bottlenecks before hitting this new EXT4 limitation.
It looks like the safe limit is about 10 million files per directory, although it may work up to around 30 million files, but you shouldn't get anywhere near that number without enabling large_dir because things start failing.
I completed my test at over 51 million files in a single directory. No more errors until I actually ran out of inodes.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 03, 2025, 12:35:16 PM
~ Solution
Enabling ext4 large_dir (https://serverfault.com/questions/1052075/when-enabling-ext4-large-dir-how-can-you-tell-its-used) seems to fix it:
Code: tune2fs -O large_dir /dev/nvme2n1
The EXT4 "largedir" feature overcomes the current limit of around ten million entries allowed within a directory on EXT4. Now, EXT4 directories can support around two billion directory entries. However, you are likely to hit performance bottlenecks before hitting this new EXT4 limitation.
It looks like the safe limit is about 10 million files per directory, although it may work up to around 30 million files, but you shouldn't get anywhere near that number without enabling large_dir because things start failing. I completed my test at over 51 million files in a single directory. No more errors until I actually ran out of inodes. Amazing work! The forum should hire you as a consultant :) I can restart the chunks processing now, but it's going to be starting from the first topic because I lost track of which topics failed to write. Fortunately it is much faster than upload at the moment - I was actually processing topics from 2023 when I noticed this issue. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on May 04, 2025, 06:34:05 AM I can restart the chunks processing now, but it's going to be starting from the first topic because I lost track of which topics failed to write. Fortunately it is much faster than upload at the moment - I was actually processing topics from 2023 when I noticed this issue. Don't run the system all at once! Make it run in cycles, for example 1 year at a time. This way, if there is a failure in any cycle, you know to what extent everything is fine and you won't have to start from scratch. You do this manually by running the script in each cycle. Or you can set up the script so that it runs in cycles and keeps a log of the events. Whenever a cycle ends, it informs you of the result, if everything is ok. This way you can follow the process. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on May 04, 2025, 12:18:06 PM Hey NotATether, please be aware that 1 more translation was made for your topic by AOBT:
Ukrainian (https://bitcointalk.org/index.php?topic=236982.msg65341140#msg65341140) translation, made by DrBeer Cheers! Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Mahiyammahi on May 09, 2025, 10:12:09 AM Hey NotATether, how about creating an AI Model only specific to the Bitcointalk forum? Since you have developed a search engine, it can scrape posts. Why not train an AI model using it? I don't know if it will be helpful for forum users. But if an AI model found that scraps all the answers from the Bitcointalk forum Topic, replies, I think it won't be bad. A user can get their answer within a few seconds rather than scraping all the data it had been on Bitcointalk. Other AI models like Chatgpt look everywhere for an answer. So, if an AI model specifically only looks at forum data, this would be great.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: $crypto$ on May 09, 2025, 12:00:18 PM Hey NotATether, how about creating an AI Model only specific to the Bitcointalk forum? Since you have developed a search engine, it can scrape posts. Why not train an AI model using it? I don't know if it will be helpful for forum users. But if an AI model found that scraps all the answers from the Bitcointalk forum Topic, replies, I think it won't be bad. A user can get their answer within a few seconds rather than scraping all the data it had been on Bitcointalk. Other AI models like Chatgpt look everywhere for an answer. So, if an AI model specifically only looks at forum data, this would be great. There is an AI search engine (Bitcointalk) you can do some browsing there.[AI Search Engine] Bitcointalk (https://bitcointalk.org/index.php?topic=5537932.0) Have tried asking questions on this AI search engine --- there are some answers that the AI gives are not accurate, and it takes a few seconds to give an answer. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 09, 2025, 01:32:16 PM Hey NotATether, how about creating an AI Model only specific to the Bitcointalk forum? Since you have developed a search engine, it can scrape posts. Why not train an AI model using it? I don't know if it will be helpful for forum users. But if an AI model found that scraps all the answers from the Bitcointalk forum Topic, replies, I think it won't be bad. A user can get their answer within a few seconds rather than scraping all the data it had been on Bitcointalk. Other AI models like Chatgpt look everywhere for an answer. So, if an AI model specifically only looks at forum data, this would be great. I don't have a dev team, so this will take a very long time to implement. It is not a priority at the moment. In fact, only about 5 million chunks out of almost a hundred million have been uploaded so far. 
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: hopenotlate on May 09, 2025, 03:10:44 PM I had some free time, and as a sign of gratitude for the efforts you make to improve users experience of this forum I took the liberty of translating opening post into Italian as I noticed it hadn't been done yet, without even asking your permission to do it.
I hope you don't mind and please let me know if it's okay or if I should remove it. Translation link : Talksearch.io - Motore di ricerca avanzato per Bitcointalk (https://bitcointalk.org/index.php?topic=5542282.0) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 12, 2025, 06:01:34 AM I had some free time, and as a sign of gratitude for the efforts you make to improve users experience of this forum I took the liberty of translating opening post into Italian as I noticed it hadn't been done yet, without even asking your permission to do it. I hope you don't mind and please let me know if it's okay or if I should remove it. Translation link : Talksearch.io - Motore di ricerca avanzato per Bitcointalk (https://bitcointalk.org/index.php?topic=5542282.0) Anybody can make a translation of this topic without asking me. But to avoid duplicate efforts, people should make sure that a local translation doesn't already exist. On an unrelated note - Google Cloud is so useful! It's like having a free VScode in the cloud that doesn't cost anything extra, along with a database and git integration. HTTP server URLs are practically free as well. I am even using it for other projects too. It's too bad that Elasticsearch is not keeping up with the load :P, I guess I will have to wait a while for the upload to complete. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: hopenotlate on May 12, 2025, 09:26:18 AM I had some free time, and as a sign of gratitude for the efforts you make to improve users experience of this forum I took the liberty of translating opening post into Italian as I noticed it hadn't been done yet, without even asking your permission to do it. I hope you don't mind and please let me know if it's okay or if I should remove it. 
Translation link : Talksearch.io - Motore di ricerca avanzato per Bitcointalk (https://bitcointalk.org/index.php?topic=5542282.0) Anybody can make a translation of this topic without asking me. But to avoid duplicate efforts, people should make sure that a local translation doesn't already exist. -snip- Glad to hear everything it's ok with it; maybe to avoid a duplicate you might want to add my translation link in opening post just for everyone to make sure at a first look it has already been done. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 12, 2025, 09:29:43 AM It’s missing a few key features though. Not being able to tweak the search text is a bit of a letdown since that’s pretty important for narrowing things down. Can you elaborate on this? I don't really understand what you mean by tweaking. Would you like variations that are more professional, casual, or critical? As in what? Sorry but just like the other part, I'm not very sure what you're asking for here. I am working on automatically including synonyms and verb conjugations of search terms in order to capture additional relevant topics though. This is something I can do independently of the document upload. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: nutildah on May 18, 2025, 10:20:08 AM It’s missing a few key features though. Not being able to tweak the search text is a bit of a letdown since that’s pretty important for narrowing things down. Can you elaborate on this? I don't really understand what you mean by tweaking. Would you like variations that are more professional, casual, or critical? As in what? Sorry but just like the other part, I'm not very sure what you're asking for here. ... The problem is you're talking with a bot, or a human emulating a bot (https://bitcointalk.org/index.php?topic=5456516.msg65392364#msg65392364), rather. 
This last part is the AI asking him if he wants the output rephrased but he just copy/pasted it because, naturally, he's a maroon:
Would you like variations that are more professional, casual, or critical?
Don't let the bots bring you down, NotATether! :D As a human, I for one applaud your efforts and think it's great to see alternative resources being built around forum data. Will remember to add it to my arsenal the next time I am researching something.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Wouter Mense on May 19, 2025, 09:50:57 AM
The issue is, I currently don't have a reliable way to measure post quality.
Suggest to look at "user quality". Example post history (https://bitcointalk.org/index.php?action=profile;u=3618422;sa=showPosts;start=0). A lot of this kind of user exists. Looked at recent unread topics and this one I found at my third try.
The patterns to look for: in this case there are about 1200 posts that all "look" the same:
- Each post begins with a quote.
- Followed by one or two lines of text.
Other things to look for:
- All roughly the same total length.
- All roughly the same number of paragraphs, of the same length.
- Same number of sentences, of the same length.
- Each with for example one image.
All these are in my opinion the result of "forced" content generation. Usually with financial incentive I would assume.
Of course above metric can be gamed. The thing here is that this pattern is predictable. The next posts of above user will also look the same. Introducing more variety in post style will take more effort, and would possibly also be indicative of improved quality.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 19, 2025, 10:53:33 AM
Don't let the bots bring you down, NotATether! :D As a human, I for one applaud your efforts and think it's great to see alternative resources being built around forum data.
Will remember to add it to my arsenal the next time I am researching something.
Thanks, I appreciate it.
The issue is, I currently don't have a reliable way to measure post quality.
Suggest to look at "user quality". Example post history (https://bitcointalk.org/index.php?action=profile;u=3618422;sa=showPosts;start=0). A lot of this kind of user exists. Looked at recent unread topics and this one I found at my third try.
The patterns to look for: in this case there are about 1200 posts that all "look" the same:
- Each post begins with a quote.
- Followed by one or two lines of text.
Other things to look for:
- All roughly the same total length.
- All roughly the same number of paragraphs, of the same length.
- Same number of sentences, of the same length.
- Each with for example one image.
All these are in my opinion the result of "forced" content generation. Usually with financial incentive I would assume.
Of course above metric can be gamed. The thing here is that this pattern is predictable. The next posts of above user will also look the same. Introducing more variety in post style will take more effort, and would possibly also be indicative of improved quality.
Noted. I do think, however, that post quality can be quantified somehow, so I'm going to look for some research on how that would be calculated. Probably it should be between 0 and 1. Then the user quality can be set to the mean of all post qualities from that user, which is then used as a weight for search results, but will not dampen results too much compared to post quality.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Wouter Mense on May 19, 2025, 12:10:51 PM
I do assume a strong correlation between post and user quality but I don't have proof.
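The weighting NotATether describes in the reply above (a post quality between 0 and 1, a user quality equal to the mean of that user's post qualities, and a user weight that dampens results less than post quality does) could be sketched roughly as follows. Everything beyond those three points is an assumption made for illustration: the function names, the neutral default of 0.5 for users with no scored posts, the `user_influence` parameter, and the choice to multiply the factors together.

```python
from statistics import mean


def user_quality(post_qualities: list[float]) -> float:
    """Mean of a user's post quality scores; 0.5 (assumed neutral) if there are none."""
    return mean(post_qualities) if post_qualities else 0.5


def weighted_score(relevance: float, post_q: float, user_q: float,
                   user_influence: float = 0.25) -> float:
    """Scale a relevance score by post quality, and more gently by user quality."""
    # Pull the user weight toward 1.0 so it dampens less than post quality does.
    # With user_influence=0.25 the user weight can never drop below 0.75.
    user_weight = 1.0 - user_influence * (1.0 - user_q)
    return relevance * post_q * user_weight


if __name__ == "__main__":
    uq = user_quality([0.2, 0.4, 0.9])
    print(weighted_score(relevance=10.0, post_q=0.8, user_q=uq))
```

With these assumed numbers, even the worst possible user score only removes a quarter of a result's weight, while a poor post score can remove almost all of it.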
Also I totally ignored topic context.
post quality can be quantified somehow
Looking at just one post without context? I guess it would be less cpu time?
Quote
look for some research
After reading your post I did pose a few questions to ai chat with possibly interesting results.
Queries (in order, with typos, and ai chat answers between each query):
Offtopic, I hope you appreciate getting more questions instead of more answers. I do believe asking the right questions is more helpful to start your research. I can't vouch for the quality of ai answers, just that it looked interesting. I'm not a programmer, but it does offer to write your code as well. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 23, 2025, 11:33:51 AM Offtopic, I hope you appreciate getting more questions instead of more answers. I do believe asking the right questions is more helpful to start your research. I can't vouch for the quality of ai answers, just that it looked interesting. I'm not a programmer, but it does offer to write your code as well. I appreciate it greatly. I have done some looking around over the past few days, and I found a machine learning model called BERT (https://research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding/) that was made by Google in 2018 for search engines. Can you believe that. An AI model from before AI models were a thing. :) I do have sort of a background in machine learning models, so I can summarize it briefly here: Instead of vectorizing words, and thus relying on keywords to search, it vectorizes entire phrases. Words that are adjacent to each other in a sentence. This makes natural language search possible (example: "block size wars" returning debates about segwit and bcash instead of only posts with "block size" in them). There are many improved versions of BERT nowadays, large ones and small ones. However, the models require dedicated hardware with GPUs to run. The good news is, Elasticsearch makes it painfully easy to deploy a model. You literally just have to press the "Run" button next to it. And then search algorithms will be using the model automatically. The bad news is, they don't come cheap. 
There is one ML node in my cluster, which I receive at no additional cost, but it only has 1GB of RAM and can't store any model, so it's pretty useless. Upgrading to the next hardware tier that has 2GB is going to bump the total monthly bill to around $300. And I am already hounded enough by Google with biweekly invoices.
Therefore I want to wait until all the new post content is uploaded before I delete the old, incomplete post content, which will allow me to slash the storage size by about half. Then adding a larger ML node will make Talksearch's running costs somewhat lower than they are right now.
It will be a wise investment, though. GPUs on dedicated servers are not plentiful, and are much more expensive than this.
Unfortunately, despite thousands and thousands of post chunks a day being uploaded, I am only about 10% of the way there. I can't experiment with BERT search until it's done. And pray my server doesn't run out of memory mid-upload, because my disk being the primary bottleneck means that retries will not be faster. But move to an SSD or something and the Elasticsearch nodes get overwhelmed with requests and run out of memory themselves.
I imagine this whole process becomes much faster with even larger hardware, but that is not an amount I'm willing to spend, especially on a beta product. Good thing there is only one "initial block download" - after that I'll never have to worry about that again (unless catastrophic data loss occurs, as I'm only paying for one availability zone - ugh).

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 26, 2025, 02:49:33 PM
It appears that there is a problem with making search queries again. I will investigate this.
Please do not delete this post.
Update: The problem has been identified. It appears that the access token has expired. I am currently deploying a fix and will update you when this is done.
Update 2: It has been fixed.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 31, 2025, 08:34:00 AM
Guys, I need some suggestions. I want to move the Elasticsearch server off of Google Cloud, due to AML problems I'm now facing when I attempt to load $100 onto my card to pay the bills.
What are some hosting providers that *do not* use Coingate or Cryptomus?

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on May 31, 2025, 08:47:08 AM
What are some hosting providers that *do not* use Coingate or Cryptomus?
I got my last VPS from Servarica, but can't remember which payment provider they used. I checked my email, and it doesn't show anything from any external provider. I can't really test it by making a payment now, maybe just ask them?
This is the offer I took (https://lowendtalk.com/discussion/199994/servarica-black-friday-2024-dedicated-servers-unified-plans-and-storage-incredible/p1) (8 slices Slim Plan + 2 TB SAN Storage).

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 31, 2025, 09:48:17 AM
What are some hosting providers that *do not* use Coingate or Cryptomus?
I got my last VPS from Servarica, but can't remember which payment provider they used. I checked my email, and it doesn't show anything from any external provider. I can't really test it by making a payment now, maybe just ask them?
This is the offer I took (https://lowendtalk.com/discussion/199994/servarica-black-friday-2024-dedicated-servers-unified-plans-and-storage-incredible/p1) (8 slices Slim Plan + 2 TB SAN Storage).
I only need to know whether they support Monero payments. If they do, then there is no transaction screening on any coins, since XMR is untraceable anyway.
Looks like I'm going to be scouring LowEndTalk for a while. Some specs I'm looking for to make searching easier:
- 512GB SSD
- At least 16GB of memory - more is obviously good, I want indexing to be instantaneous this time, instead of taking months.
- A regular Intel/AMD processor will do (Apparently, I don't need a GPU (https://discuss.elastic.co/t/elastic-machine-learning-need-gpu/359471/4). w00t!)
- 1 Gbps Ethernet
I'm fine with spending $100/month on this, but deals are obviously nice.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on May 31, 2025, 10:00:58 AM
Looks like I'm going to be scouring LowEndTalk for a while.
Note: I've seen and paid good and bad providers, and I've been burned more than once. So be careful who you trust.
I'm quite happy with Racknerd too:
Code: up 741 days, 19:49
Quote
- 512GB SSD
At that price, you could get a Premium KVM or VDS with 4 CPU, 16 GB RAM and 400 NVMe at Ramnode Cloud. They're good, but expensive. I only use them when I need one for a short time: $0.15 per hour you use it.
- At least 16GB of memory - more is obviously good, I want indexing to be instantaneous this time, instead of taking months.
- 1 Gbps Ethernet
I'm fine with spending $100/month on this, but deals are obviously nice.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: psycodad on May 31, 2025, 11:35:29 AM
Looks like I'm going to be scouring LowEndTalk for a while.
Note: I've seen and paid good and bad providers, and I've been burned more than once. So be careful who you trust.
I'm quite happy with Racknerd too:
Code: up 741 days, 19:49
I can second that statement about Racknerd, I have been running a KVM VPS there for ~3 years without a single problem so far. But I concede that I am a few
Code: up 593 days, 16:12
Though unfortunately Racknerd accepts some crypto but not Monero:
Quote from: What payment methods do you accept?
We accept the following payment methods:
ALL major credit cards (AMEX, Discover, VISA, Master).
PayPal
Cryptocurrency (Bitcoin, Bitcoin Cash, Litecoin, Ethereum, USDT, USDC)
Alipay/支付宝
Wire
More payment methods are supported upon checking out.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on May 31, 2025, 12:03:57 PM
I've settled on this beauty from Dartnode:
Code: Model: Dual Xeon E5-2650 v4 It only costs me $100 a month, so it's a massive improvement from Google Cloud. The application itself will still be hosted there by the way, as it costs almost nothing to run. It's just the Elasticsearch server(s) being moved. It isn't actually usable yet, it is still in the setup phase. Edit: For some reason, the DDoS protection is a $35 addon. Whatever. That's already been added. I only have about 3 or so days to set up the new server with elasticsearch before I have to move money around again to the cards, so I have to do it fast as I'd like to avoid that. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 02, 2025, 09:41:56 AM Ingestion has now started on the new Elasticsearch server, and compared to my old cluster it's going lightning fast. If all goes well, it should be finished in about a day or two, and then I will redirect the search queries towards it and then shut down the old cluster.
Edit: Wow, already over 200k posts indexed in just an hour! https://www.talkimg.com/images/2025/06/02/UXrRfd.png According to my calculations, about 5 million posts can be uploaded in a single day. Therefore it's going to take up to 2 weeks for everything to get in there, but guess what? No resource exhaustion this time, so no crashes. I still plan to shut off the old cluster ASAP. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Vod on June 02, 2025, 07:20:15 PM Ingestion has now started on the new Elasticsearch server I looked into that for my new project - it allows you to search for a minimum of TWO characters instead of three. It's expensive though... Hopefully you'll let me add your engine to my extension so the user can choose. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 05, 2025, 06:38:50 AM NOTICE
Planned maintenance has commenced on Talksearch. (It did not start exactly as planned, because of ongoing bullshit from my internet provider.) During this time, search queries will be redirected to the new cluster. This post will be updated periodically with the status as it progresses.
Update 06:57 utc - migration has finished and the service is being brought back online.
Update 10:09 utc - Talksearch service brought back online. Search traffic was moved to a new cluster. Posts may be missing while the index is filled over the next few days.
Update 10:13 utc - the old Elasticsearch cluster on Google Cloud has been deleted. Maintenance has been completed.
I looked into that for my new project - it allows you to search for a minimum of TWO characters instead of three. It's expensive though...
It has a free, open source version, but it needs to run on very powerful hardware to be useful.
Hopefully you'll let me add your engine to my extension so the user can choose.
We can talk about that later. The immediate priority right now for me is to create significantly more powerful search parameters on the website.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 09, 2025, 10:52:31 AM
Bump (Merit overload managed to push this thread all the way down to page 2 :o)
The algorithm feels awful though - any suggestions on how I should improve it?

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on June 09, 2025, 05:19:00 PM
The algorithm feels awful though - any suggestions on how I should improve it?
What do you mean horrible? What do you think he's doing wrong for the proposed goals?

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Ivystar5 on June 09, 2025, 06:45:56 PM
The algorithm feels awful though - any suggestions on how I should improve it?
I was thinking we could get to an advanced stage where I can input a prompt like "what does Satoshi say about Bitcointalk administration?" and it will return threads where Satoshi talked about the administration of the forum, from which the user will be able to figure out the exact thread or discussion that he or she is searching for. More like an AI type of research response with links to several related threads. I did try to ask a question like this, but it only delivers threads whose titles contain each of the words.
The reason I want this is that sometimes, in an argument that requires you to provide links or the thread where a user said something, it becomes difficult, as one has to search several times or even remember some statements that are in the thread.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 10, 2025, 05:28:41 AM
The algorithm feels awful though - any suggestions on how I should improve it?
What do you mean horrible? What do you think he's doing wrong for the proposed goals?
It prioritizes occurrences too much. So when you search "casino" for example, the top results are the ones that have written casino two or three times in the title. It makes it feel spammy, but I'm waiting until all the content is uploaded before I do anything about it.
Fortunately, this time, it will only take a few more days. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on June 10, 2025, 06:58:05 AM It prioritizes occurrences too much. So when you search "casino" for example, the top results are the ones that have written casino two or three times in the title. It makes it feel spammy, but I'm waiting until all the content is uploaded before I do anything about it. Fortunately, this time, it will only take a few more days. Well, that's the biggest challenge for search engines. It took Google years to create an algorithm that could minimize this situation. To help minimize this, you have to create more filter criteria. For example, in addition to looking at just the title, it has to look at the topic content. An example: the topic title has the word "casino" 3 times and how many times does the OP have? Throughout the topic, does the word "casino" appear more often or not at all? Is the term "casino" in a conversational context or in the context of a name of something? Applying the rules and ensuring a good balance is not easy. This will undoubtedly be the biggest challenge of the project. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 10, 2025, 03:42:17 PM Well, that's the biggest challenge for search engines. It took Google years to create an algorithm that could minimize this situation. To help minimize this, you have to create more filter criteria. For example, in addition to looking at just the title, it has to look at the topic content. An example: the topic title has the word "casino" 3 times and how many times does the OP have? Throughout the topic, does the word "casino" appear more often or not at all? Is the term "casino" in a conversational context or in the context of a name of something? Applying the rules and ensuring a good balance is not easy. This will undoubtedly be the biggest challenge of the project. 
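On the point about repeated words in a title dominating the ranking: the thread does not say which scoring Talksearch uses, but Elasticsearch's default similarity is BM25 (with k1 = 1.2 and b = 0.75), and in BM25 the parameter k1 controls how much each extra occurrence of a word adds. The sketch below is the textbook term-frequency factor only, with the IDF part left out; the function name and the 8-word title length are made up for illustration, and it is not a claim about how Talksearch is configured.

```python
def bm25_tf(freq: float, doc_len: float, avg_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 term-frequency factor (IDF left out)."""
    # Length normalisation: 1.0 when the field is exactly average length.
    norm = 1.0 - b + b * (doc_len / avg_len)
    return freq * (k1 + 1.0) / (freq + k1 * norm)


if __name__ == "__main__":
    # Compare a title mentioning "casino" three times with one mentioning it once,
    # both of average length (8 words, an assumed figure).
    for k1 in (1.2, 0.3):
        once = bm25_tf(1, 8, 8, k1=k1)
        thrice = bm25_tf(3, 8, 8, k1=k1)
        print(f"k1={k1}: three mentions score {thrice / once:.2f}x one mention")
```

Under these assumptions three mentions are worth about 1.57 times one mention at the default k1, and about 1.18 times at k1 = 0.3, so lowering k1 on the title field is one possible way to make repetition count for less.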
It's not just spam, there are for some reason a ton of topics in search results that have been deleted on the forum. So they all have to be purged. Finding out which ones are deleted is going to be a challenge as it will require another forum scrape.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on June 10, 2025, 04:09:10 PM
It's not just spam, there are for some reason a ton of topics in search results that have been deleted on the forum. So they all have to be purged. Finding out which ones are deleted is going to be a challenge as it will require another forum scrape.
But what kind of sweep did you do to collect topics that have already been deleted? Did you use an old database?
Maybe you can just run a script to validate if a certain topic exists, if it doesn't exist it deletes it from the DB. Or you may want to use this as a historical archive.

Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 11, 2025, 08:28:44 AM
But what kind of sweep did you do to collect topics that have already been deleted? Did you use an old database? Maybe you can just run a script to validate if a certain topic exists, if it doesn't exist it deletes it from the DB. Or you may want to use this as a historical archive.
Most of the old posts came from Ninjastic.space.
While I figure out how to weed out the old posts, I've run some tests on Google Colaboratory with three different spam-detection LLM models (well, they are not specifically for spam detection except for the first one, but they can be used to classify text) on various categories.
https://pdflink.to/bert-tiny-finetuned-sms-spam-detection/
https://pdflink.to/distilbert-base-uncased-finetuned-sst-2-english/
https://pdflink.to/deberta-large-mnli/
I think they get the overall sentiment, especially the last one, but it would be unwise to rely only on an LLM as a universal quality score. Additional measures must be taken in place to identify e.g.
application posts, obviously AI-generated posts, and such in order to not return them in search results. I'm also going to set a minimum post length, to avoid indexing things like bumps. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on June 11, 2025, 05:15:01 PM I'm also going to set a minimum post length, to avoid indexing things like bumps. Have you ever thought about a post/topic author rating system? A higher ranked user - more posts, merit, ranking - passes the filters. The rest have to go through tighter filters. This may help reduce the number of posts analyzed, and help filter better. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 12, 2025, 08:21:27 AM New content update v1.0.3 and backend update v1.0.2 published
These updates add advanced search capability to Talksearch. Search features:
App features:
48 million posts have been indexed now. We are approaching indexing completion. Have you ever thought about a post/topic author rating system? A higher ranked user - more posts, merit, ranking - passes the filters. The rest have to go through tighter filters. This may help reduce the number of posts analyzed, and help filter better. I don't like such a system because it will bias search results toward users with a lot of merit. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: cygan on June 12, 2025, 08:56:01 AM New content update v1.0.2 published ✂️ very nice to see another update of your search engine :) to update the translated threads from (taufik123, satscraper, Abdulzuruku01, katanic97, Adiljutt156, mela65, r_victory, GazetaBitcoin, Danica22 and Porfirii) i would ask you to update your op and the changelog - but you were probably planning to do that anyway ;) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: taufik123 on June 12, 2025, 01:18:45 PM New content update v1.0.2 published Shouldn't this be a v1.0.3 update, because there was already a v1.0.2 update? New app update v1.0.2 published Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: nutildah on June 13, 2025, 01:11:39 AM I think they get the overall sentiment, especially the last one, but it would be unwise to rely only on an LLM as a universal quality score. Agreed -- what is interesting or relevant to an LLM might not be so for people actually utilizing your search engine. Additional measures must be put in place to identify e.g. application posts, obviously AI-generated posts, and such in order to not return them in search results. I like the initiative you're taking here. 
Whats interesting is that, last I checked, Google doesn't filter out AI-generated content, but it may do so in the future if it turns out that nobody wants to read such content, thereby making their search results not as accurate or relevant to the query as could potentially be. Seems like it would be super easy to game SEO ranking with AI content, so I don't know why they wouldn't attempt to block it. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 13, 2025, 04:13:01 AM New content update v1.0.2 published Shouldn't this be a v1.0.3 update, because there was already a v1.0.2 updateNew app update v1.0.2 published You're right - But only the frontend would be v1.0.3, because there was no v1.0.2 update for the backend. I like the initiative you're talking here. Whats interesting is that, last I checked, Google doesn't filter out AI-generated content, but it may do so in the future if it turns out that nobody wants to read such content, thereby making their search results not as accurate or relevant to the query as could potentially be. Seems like it would be super easy to game SEO ranking with AI content, so I don't know why they wouldn't attempt to block it. I am fortunate that my preliminary tests can detect AI to a similar degree of accuracy to other types of spam. New content update v1.0.4 published This is a minor update that adds missing time controls for the Date From and Date To filters. All posts up to March 2025 have now been indexed. I am actively working on enabling real-time indexing. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 14, 2025, 05:36:18 AM All posts from March - June 2025 are now being uploaded to the index, while I continue to contrive an automated solution to this problem.
Edit: This batch was uploaded with the wrong dates, which would cause search errors, so it has been deleted and is being re-uploaded. Edit 2: All done. I want to implement a spam score as soon as possible, but I'm still not exactly sure how I will do that without re-indexing all the posts. At any rate, I will figure something out. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Rashlyowl on June 16, 2025, 04:11:48 AM Hey bros @NotATether, is it possible to implement pagination/paging directly on the site?
https://talkimg.com/images/2025/06/16/UdpQAJ.jpeg https://talkimg.com/images/2025/06/16/UdpM5b.png When I've gone too far, I want to go back to the page I want, but opening previous pages is a barrier for me. The solution is actually easy, just by changing: Current page Code: https://talksearch.io/search?q=Bitcointalk&page=9 To Page I want to see Code: https://talksearch.io/search?q=Bitcointalk&page=4 But it makes me a bit annoyed, after all, pagination can improve user experience to a better level. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 16, 2025, 06:17:28 AM Hey bros @NotATether, is it possible to implement pagination/paging directly on the site? https://talkimg.com/images/2025/06/16/UdpQAJ.jpeg https://talkimg.com/images/2025/06/16/UdpM5b.png When I've gone too far, I want to go back to the page I want, but opening previous pages is a barrier for me. The solution is actually easy, just by changing: Current page Code: https://talksearch.io/search?q=Bitcointalk&page=9 To Page I want to see Code: https://talksearch.io/search?q=Bitcointalk&page=4 But it makes me a bit annoyed, after all, pagination can improve user experience to a better level. As you said, this sort of change is very easy to do, and I will make sure to find some time with it. Due to a lack of software available on Github for this purpose, I'm currently busy building a project for calculating "embeddings" in text classification LLMs. It is a software that is blatantly missing from open-source repositories, and essential for anybody who is building a search application without a paid-for Elasticseearch subscription, which are expensive, even though they package AI search directly. The hope is that some others in the AI community will find it useful. Edit: This is what it's going to look like: https://bert-embedding-playground.lovable.app/ - it's designed to be self-hosted. This is just the frontend though, I haven't written much of the backend yet. 
And even the frontend was made by AI, because I suck at designing HTML by hand. :-\ (Modified again to avoid double-posting) New app update v1.0.5 published This is a minor update that adds detailed pagination to search results. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 22, 2025, 09:08:19 AM In the near future, after I set up the scraper to run automatically on the index, I will set up an API for retrieving posts, topics and users and enable limited access to it.
The Elasticsearch cluster is healthy at the moment and is running without issues. Thank you for using Talksearch. PSA: Date filtering does not work properly for posts older than September 2020. I will create a new index to fix this. New content update v1.0.3 published This is a minor update that implements a workaround for the date filtering bug described above. A long-term re-indexing will be done at a future time. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on June 28, 2025, 11:36:07 AM Maintenance Alert
I have just received notice from the upstream provider that there might be some network-related maintenance on my node between 2025-06-28 11:00 UTC and 2025-06-29 07:00 UTC. While I am not planning any modifications to Talksearch at this time, please be informed that access to the Elasticsearch server might be impacted during this window, and as a result, searches might fail. The website itself is unaffected. I will keep you guys updated in the coming days. Edit: It seems that it was a false alarm. Nothing was affected. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on July 06, 2025, 10:42:24 AM Gentlemen, I am pleased to announce that I have designed a local AI detection capability.
https://www.talkimg.com/images/2025/07/06/UwIlf2.png This prototype ML inferencing server which I have developed over the past week is able to download any model from Hugging Face and use it for the purpose of classifying content. As a side effect, it can also generate word embeddings, enabling full natural-text search. I will be performing a few optimizations on this software before I deploy it live for scoring Bitcointalk posts. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: taufik123 on July 06, 2025, 03:28:24 PM Gentlemen, I am pleased to announce that I have designed a local AI detection capability. An amazing update, Talksearch will be updated with AI that will classify the content thoroughly, so that every content that is the result of AI can be detected easily, -snip- this will be a very useful feature especially for project managers, it can detect which content is the result of AI and then combine it with the time span of the running campaign. Will any content that is considered AI or some AI-generated word insertions be tagged AI as a sign that it is AI-generated? And maybe it will be added by what percentage of AI usage. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on July 06, 2025, 05:48:32 PM An amazing update, Talksearch will be updated with AI that will classify the content thoroughly, so that every content that is the result of AI can be detected easily, this will be a very useful feature especially for project managers, it can detect which content is the result of AI and then combine it with the time span of the running campaign. Thanks. Quote Will any content that is considered AI or some AI-generated word insertions be tagged AI as a sign that it is AI-generated? And maybe it will be added by what percentage of AI usage. I have no idea. It's entirely dependent on the model I use. There are many open-source models for AI detection, and each one makes slightly different results. 
Regarding the percentage of AI, you can see it at the bottom of the screenshot. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: TryNinja on July 06, 2025, 11:56:59 PM Gentlemen, I am pleased to announce that I have designed a local AI detection capability. I tried that before but a lot of human made content was being tagged as 100% AI, I didn't think it was accurate enough so I just ditched the idea.This prototype ML inferencing server which I have developed over the past week is able to download any model from Hugging Face and use it for the purpose of classifying content. As a side effect, it can also generate word embeddings, enabling full natural-text search. I will be performing a few optimizations on this software before I deploy it live for scoring Bitcointalk posts. I even had a friend in university getting a zero because his work was "detected as AI" with one of those online tools. He tried that with something from his professor and... it was also detected as AI, so he got his work graded. :P Might go back to the idea, but I think it's better to tag something as "potential AI" than "definitely AI", becase those models aren't that accurate at the end of the day. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on July 07, 2025, 05:44:56 AM Might go back to the idea, but I think it's better to tag something as "potential AI" than "definitely AI", becase those models aren't that accurate at the end of the day. This is not going to be a huge problem, because AI posts will simply be demoted and not labeled. I guess maybe I will record the percentage in the data score though so that my results are reproducible. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on July 13, 2025, 06:09:09 AM Your opinion matters!
Do you have suggestions or improvements for Talksearch? Anything you want to see on the platform? Reply here with your idea. Together, we well make the #1 search engine on Bitcointalk, and find topics that you have never seen before. So for this reason, community feedback is essential. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: taufik123 on July 15, 2025, 05:10:24 PM Your opinion matters! A great development for Talksearch.io, This is your very optimistic project and you are devoting all your energy and mind to this project. Do you have suggestions or improvements for Talksearch? Anything you want to see on the platform? Reply here with your idea. Together, we well make the #1 search engine on Bitcointalk, and find topics that you have never seen before. So for this reason, community feedback is essential. I just want to give you some suggestions that might be useful for input or some new features such as: Add talksearch links in the BPIP browser extension such as ninjastic.space and loyce.club that will appear next to the user's account, which will automatically redirect to the user's post history. https://www.talkimg.com/images/2025/07/15/UAZhYZ.png https://www.talkimg.com/images/2025/07/15/UAZoi8.png and secondly, maybe you could add a Tab for the Historical thread, because as the title of the tagline talksearch.io "Explore BitcoinTalk History" can be easily seen on the talksearch home page and several other important threads. Maybe some kind of tab like this. https://www.talkimg.com/images/2025/07/15/UAZ5w3.png Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Little Mouse on July 16, 2025, 02:49:43 AM Your opinion matters! I have tried to find out topics created by a user in a specific board. I have found a way (https://bitcointalk.org/index.php?topic=5549409.msg65571053#msg65571053), but it doesn't allow me to filter by a date range. 
I think this would be a great edition.Do you have suggestions or improvements for Talksearch? Anything you want to see on the platform? Reply here with your idea. Together, we well make the #1 search engine on Bitcointalk, and find topics that you have never seen before. So for this reason, community feedback is essential. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on July 29, 2025, 10:31:10 AM Not very important, but I was running an encoding job on the server, which kept lagging on my desktop, and apparently it can receive several gigabits of traffic as well!
Just wanted to share that with you. I thought it was really cool. https://www.talkimg.com/images/2025/07/29/UHm7dd.png Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 05, 2025, 08:08:05 AM New topics up to August 2025 have finally been indexed.
I apologize for the delay. This took way longer than it should have. I was dealing with some personal stuff. But we are getting closer to fully automated indexing! These past few months have taught me how difficult it is to run a search engine, though. Much respect to anyone who can pull this off. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 06, 2025, 09:31:12 AM I am testing a prototype version of an automatic scraper. Can you guys please check after you make some posts today whether you can find your posts in the search engine?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: TypoTonic on August 06, 2025, 11:30:53 AM I've been searching about newbie guides for specific topics using the forum search engine and often times it will say that I am searching too quickly. It also goes on a timeout when I input something quite long, it's honestly a bit frustrating. But based on what I have read so far, I think it has something to do with my forum rank. Thankfully I came across this thread, and I immediately had a better experience. This is definitely a great help for me. ;D
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on August 06, 2025, 06:25:37 PM I've been searching about newbie guides for specific topics using the forum search engine and often times it will say that I am searching too quickly. It also goes on a timeout when I input something quite long, it's honestly a bit frustrating. But based on what I have read so far, I think it has something to do with my forum rank. Thankfully I came across this thread, and I immediately had a better experience. This is definitely a great help for me. ;D You can also use the google based search function which is avaliable inside the search, and afaik it doesn't have any search rate limit or that posting too quickly thingy. But yeah, I do agree that Talksearch is a great alternative for searching our desired topics and stuff.. ;) This below https://www.talkimg.com/images/2025/08/06/UHqv8j.jpeg Edit: I am testing a prototype version of an automatic scraper. Can you guys please check after you make some posts today whether you can find your posts in the search engine? I tried to search my last post! But it didn't show up! Or maybe I didn't searched properly or used relevant keywords. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 07, 2025, 08:42:10 AM I tried to search my last post! But it didn't show up! Or maybe I didn't searched properly or used relevant keywords. Something is wrong with the search engine algorithm when finding posts. But it is highly likely your post has been indexed. Please share the post ID. The search engine is not broken per se, but the internal search queries are quite crappy. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: taufik123 on August 07, 2025, 08:58:36 AM Can you give an example of how to find keywords that are relevant enough for a post that you might be looking for.
Because I can't find my most recent posts, I only have a few old posts and the ones that appear are also in no order from the latest to the oldest. https://www.talkimg.com/images/2025/08/07/UHx3QJ.png https://www.talkimg.com/images/2025/08/07/UHxkfC.png My post: https://bitcointalk.org/index.php?topic=5545585.msg65662997#msg65662997 Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on August 07, 2025, 06:09:36 PM Hey NotATehter, good news! A Bangla translation was made by AOBT for your topic ☺️
Here is it: https://bitcointalk.org/index.php?topic=631891.msg65666928#msg65666928 Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 08, 2025, 12:07:25 PM Hey NotATehter, good news! A Bangla translation was made by AOBT for your topic ☺️ Here is it: https://bitcointalk.org/index.php?topic=631891.msg65666928#msg65666928 Thank you! Can you give an example of how to find keywords that are relevant enough for a post that you might be looking for. Because I can't find my most recent posts, I only have a few old posts and the ones that appear are also in no order from the latest to the oldest. My post: https://bitcointalk.org/index.php?topic=5545585.msg65662997#msg65662997 Your post is not in the index. However, the topic has been indexed. This is the most recent post in the index: https://bitcointalk.org/index.php?topic=5545585.msg65461283#msg65461283 Topics made prior to Feburary 2025 do now have all of their new replies indexed. One of the challenges involved in searching posts is how to index giant topics like this one (https://bitcointalk.org/index.php?topic=1220979.61480). This is where I tap into existing forum archives - the services created by other uses such as Ninjastic.space and loyce.club allows me to get at least a mostly-complete copy of the data set. Usually, this works for the vast majority of long topics. However, the topics in the Gambling Discussion need special care. Because occasionally, there may be dozens of posts made on any given topic per day. Also, the forum's built-in rate limiter will make the indexing quite slow. In this case, I use an incremental updating method that only indexes some of the posts on any given day. Eventually, after a number of days, the topics will finally be in sync. Checking for edits is a harder challenge though. Just wanted to share that with you guys. Please consider donating to support the project. 
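The incremental updating method described above could be sketched roughly as follows. This is only an illustration under assumptions: the names `TopicCursor`, `plan_daily_batch`, `daily_budget` and `per_topic_cap` are hypothetical and do not come from Talksearch itself, and the real scraper may track its progress quite differently.

```python
from dataclasses import dataclass


@dataclass
class TopicCursor:
    topic_id: int
    indexed_posts: int  # how many posts of this topic are already in the index


def plan_daily_batch(cursors, topic_sizes, daily_budget, per_topic_cap):
    """Pick (topic_id, start, end) post ranges to fetch in one run.

    Each topic gets at most per_topic_cap posts and the whole run stays
    within daily_budget posts, so a very long topic catches up over
    several days instead of blocking everything else.
    """
    plan = []
    remaining = daily_budget
    for cursor in cursors:
        if remaining <= 0:
            break
        backlog = topic_sizes[cursor.topic_id] - cursor.indexed_posts
        take = min(backlog, per_topic_cap, remaining)
        if take <= 0:
            continue  # topic is already in sync
        start = cursor.indexed_posts
        plan.append((cursor.topic_id, start, start + take))
        # Advance the cursor; a real scraper would presumably do this
        # only after the fetch and indexing step has succeeded.
        cursor.indexed_posts += take
        remaining -= take
    return plan


if __name__ == "__main__":
    cursors = [TopicCursor(1, 100), TopicCursor(2, 50)]
    sizes = {1: 1000, 2: 60}
    for day in range(1, 4):
        print(day, plan_daily_batch(cursors, sizes, daily_budget=200, per_topic_cap=150))
```

Running the plan once per day moves each cursor forward a bounded amount, which is the "eventually in sync" behaviour mentioned in the post. Detecting edits to already-indexed posts is not covered by this sketch.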
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on August 08, 2025, 05:57:57 PM I tried to search my last post! But it didn't show up! Or maybe I didn't searched properly or used relevant keywords. Something is wrong with the search engine algorithm when finding posts. But it is highly likely your post has been indexed. Please share the post ID. The search engine is not broken per se, but the internal search queries are quite crappy. Sorry I missed your notification somehow. I tried to find it but it didn't show up. I'm not sure whether I filled the right data! This was the post: https://bitcointalk.org/index.php?topic=5450546.msg65659774#msg65659774 https://talkimg.com/images/2025/08/08/USHZUP.png https://talkimg.com/images/2025/08/08/USHnRq.png Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 09, 2025, 01:09:30 PM Sorry I missed your notification somehow. I tried to find it but it didn't show up. I'm not sure whether I filled the right data! This was the post: https://bitcointalk.org/index.php?topic=5450546.msg65659774#msg65659774 Your post is also not in the index. Moreover, it is getting more difficult for me to scrape posts. It seems that scraping a very long topic such as this one (https://bitcointalk.org/index.php?topic=5088875.4000) is causing the scraper to encounter 429 errors. These are not like the temporary 503 "back off" errors, and they have completely halted the scraper. Additionally, there is no reason for my scraper to be scraping this particular topic from the beginning, since I already have some older part of it. So I think this is a bug in my scraper. I will try to see if I can get it to resume with proxies/tor but I am not going to be hammering the forum with concurrent instances anytime soon. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 11, 2025, 09:05:37 AM Final tests are being conducted. 
If they pass, then real-time indexing will be enabled permanently.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 15, 2025, 02:24:47 PM Real-time indexing seems to be working well. But I am currently filling the index with every single post from February 2025 onwards, because not all of them were indexed the first time.
Once that is done then real-time search will be really fast. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: taufik123 on August 15, 2025, 03:43:23 PM Real-time indexing seems to be working well. But I am currently filling the index with every single post from February 2025 onwards, because not all of them were indexed the first time. Will it be faster than before?Once that is done then real-time search will be really fast. You work hard enough to make everything run smoothly and quickly. It takes a lot of effort and money to build all of this, Not impossible to achieve better progress on this tool. Making it especially useful when Indexing is already running in real-time. Good job bro Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 15, 2025, 03:50:20 PM Will it be faster than before? Yes, new posts will appear on the website much quicker now. Even most edited posts will be taken care of, because I am scanning for replied-to topics up to a certain date, which leaves some room for edits. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on August 15, 2025, 07:15:30 PM Yes, new posts will appear on the website much quicker now. Even most edited posts will be taken care of, because I am scanning for replied-to topics up to a certain date, which leaves some room for edits. How many times does a post gets re indexed/scanned actually? Or is it a one time thing? In situations where a post is edited multiple times, does it get re indexed every single times when something is changed, or is there a limit after the re indexing stops?? For example, if someone keeps making small changes to his post, adding new infos, removing old datas, will the tool still update it with latest changes?? Sorry i don't know how it works, so I was a little curious, so I asked anyway. Btw, still now the searches feels slower! 
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 18, 2025, 11:57:04 AM Yes, new posts will appear on the website much quicker now. Even most edited posts will be taken care of, because I am scanning for replied-to topics up to a certain date, which leaves some room for edits. How many times does a post gets re indexed/scanned actually? Or is it a one time thing? In situations where a post is edited multiple times, does it get re indexed every single times when something is changed, or is there a limit after the re indexing stops?? For example, if someone keeps making small changes to his post, adding new infos, removing old datas, will the tool still update it with latest changes?? Sorry i don't know how it works, so I was a little curious, so I asked anyway. Btw, still now the searches feels slower! The bitcointalk scraper goes through every board like a person would, and just clicks next until it finds topics that are older than X number of days. Then it opens each topic and reads all of them. It does this every 5 minutes. It is currently jammed at the moment due to reading really long topics (like Wall Observer) over and over again, so I'm going to modify it to read only the posts made within the last X days. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on August 18, 2025, 03:48:01 PM Yes, new posts will appear on the website much quicker now. Even most edited posts will be taken care of, because I am scanning for replied-to topics up to a certain date, which leaves some room for edits. How many times does a post gets re indexed/scanned actually? Or is it a one time thing? In situations where a post is edited multiple times, does it get re indexed every single times when something is changed, or is there a limit after the re indexing stops?? 
For example, if someone keeps making small changes to his post, adding new infos, removing old datas, will the tool still update it with latest changes?? Sorry i don't know how it works, so I was a little curious, so I asked anyway. Btw, still now the searches feels slower! The bitcointalk scraper goes through every board like a person would, and just clicks next until it finds topics that are older than X number of days. Then it opens each topic and reads all of them. It does this every 5 minutes. Ahhh I see. I didn't knew it actually worked like that. To be honest, this feels way to much work just for scrapping. I mean going through every single boards, clicking next and reading all those posts, that's basically mimicking an active user but on steroids, LOL :P. An endless cycle of browsing the forum and scanning every corner of it. But again u have powerful hardware. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 19, 2025, 05:55:59 AM I am aware of an issue occurring with the Elasticsearch service that is impacting the ability to search. I am working on a fix. Sorry for the inconvenience.
Update: This issue has been resolved. It seems to have been caused by the Elasticsearch service running out of memory. This caused the OOM killer to terminate the process. I can't say I'm happy about this because this means even 32GB of RAM may not be enough for ingesting.... Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 23, 2025, 10:36:24 AM New content update v1.1 published
This update brings the long-awaited real-time indexing feature to Bitcointalk. Please note that due to speed constraints, it takes quite a bit of time to index the recent forum topics and as a result, it might take several hours to a few days for your post to be indexed. I am working on making this happen quicker. I have also added a tracking code to the Talksearch links in the OP (only) so I can measure how many people click on the link from Bitcointalk. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on August 23, 2025, 10:58:07 AM snip Hey there NotATether, I just wanna let you know that you never updated OP with this translation. It was made several days ago. Hope that helps. Bangla (https://bitcointalk.org/index.php?topic=631891.msg65666928#msg65666928k) translation by DYING_S0UL Hey NotATehter, good news! A Bangla translation was made by AOBT for your topic ☺️ Here is it: https://bitcointalk.org/index.php?topic=631891.msg65666928#msg65666928 Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: beveryu778 on August 23, 2025, 11:24:46 AM I'm looking for an option here. But it's not working, TalkSearch can't do it.
The thing is that if someone quotes my post and posts it, then if I type the text of my post and search, I will find the links to all the posts in the search results. For example, if I am posting this, now if I copy the text of my post exactly and search, then my post shows: If someone else quotes my post, then all of them will come up on search results. Why is this necessary? What this will do is ensure that before reporting a post by quoting it, it will show someone has already reported it. This can help when reporting plagiarism posts, AI posts, malware, and to avoid duplicate reports. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 23, 2025, 01:00:49 PM snip Hey there NotATether, I just wanna let you know that you never updated OP with this translation. It was made several days ago. Hope that helps. Very sorry I didn't add this before, I totally forgot about it. I added it now. I'm looking for an option here. But it's not working, TalkSearch can't do it. The thing is that if someone quotes my post and posts it, then if I type the text of my post and search, I will find the links to all the posts in the search results. For example, if I am posting this, now if I copy the text of my post exactly and search, then my post shows: If someone else quotes my post, then all of them will come up on search results. Why is this necessary? What this will do is ensure that before reporting a post by quoting it, it will show someone has already reported it. This can help when reporting plagiarism posts, AI posts, malware, and to avoid duplicate reports. This is a known issue and I'm working on fixing this by searching by vector embeddings instead of by plain text, and these embeddings will be made by stripping the quotes from the posts first. The previous algorithm I designed is (by my own standards) quite garbage. 
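As a rough illustration of the quote-stripping step mentioned above, here is a minimal sketch. It assumes posts are available as BBCode with `[quote ...]...[/quote]` markup, which may not match how Talksearch actually stores them (scraped HTML would need a different parser), and `strip_quotes` is a hypothetical helper name. The embedding step itself is left out.

```python
import re

# Matches opening tags such as [quote] or [quote author=x link=... date=...]
# and closing [/quote] tags. Group 1 is "/" for a closing tag.
_QUOTE_TAG = re.compile(r"\[(/?)quote[^\]]*\]", re.IGNORECASE)


def strip_quotes(bbcode: str) -> str:
    """Return the post text with all (possibly nested) quote blocks removed."""
    kept = []
    depth = 0  # current quote nesting level
    pos = 0
    for match in _QUOTE_TAG.finditer(bbcode):
        if depth == 0:
            # Text before this tag is the author's own writing.
            kept.append(bbcode[pos:match.start()])
        if match.group(1):
            depth = max(depth - 1, 0)  # closing tag
        else:
            depth += 1  # opening tag
        pos = match.end()
    if depth == 0:
        kept.append(bbcode[pos:])
    # If a quote is never closed, the trailing text is treated as quoted
    # and dropped. Whitespace is collapsed to single spaces.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    post = "[quote author=a date=1]old [quote]inner[/quote] text[/quote]My reply."
    print(strip_quotes(post))
```

Only the author's own text would then be passed to the embedding model, so a post that merely quotes another one should no longer match a search for the quoted wording.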
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: beveryu778 on August 23, 2025, 01:16:27 PM 1755945384] The sound is good. this option will change the searching experience and will provide many advantage to find information easily.I'm looking for an option here. But it's not working, TalkSearch can't do it. The thing is that if someone quotes my post and posts it, then if I type the text of my post and search, I will find the links to all the posts in the search results. For example, if I am posting this, now if I copy the text of my post exactly and search, then my post shows: If someone else quotes my post, then all of them will come up on search results. Why is this necessary? What this will do is ensure that before reporting a post by quoting it, it will show someone has already reported it. This can help when reporting plagiarism posts, AI posts, malware, and to avoid duplicate reports. This is a known issue and I'm working on fixing this by searching by vector embeddings instead of by plain text, and these embeddings will be made by stripping the quotes from the posts first. The previous algorithm I designed is (by my own standards) quite garbage. Good luck, you are doing great. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: DYING_S0UL on August 23, 2025, 01:22:44 PM Very sorry I didn't add this before, I totally forgot about it. I added it now. Sorry for making you do all the extra work, but you did a typo in the language. It should be "Bangla" not bengala. Opps :-X Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on August 30, 2025, 10:19:16 PM I have spawned a second scraper running in parallel, with a shorter post creation window for indexing.
This will go a long way toward getting posts indexed faster. Your posts are now expected to be indexed within five days. The goal is to eventually get all posts indexed in 5 minutes or less. PS. Adaptive rate limiting is used in all scrapers, so we always slow down enough to never get any 503 errors.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on August 31, 2025, 03:00:29 PM Hey NotATether, please be aware that a new translation was made by AOBT for your topic:
- Hindi translation (https://bitcointalk.org/index.php?topic=5557435.0), made by M47AK16 I hope this is good news! Are you so kind, please, to add it to OP? :) Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on September 01, 2025, 06:24:16 AM Hey NotATether, please be aware that a new translation was made by AOBT for your topic: - Hindi translation (https://bitcointalk.org/index.php?topic=5557435.0), made by M47AK16 I hope this is good news! Are you so kind, please, to add it to OP? :) It has been added. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: ptrk on September 01, 2025, 01:02:14 PM @NotAThether: I'd be interested to know which programming language you used to write the site. Java?
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on September 04, 2025, 08:57:00 AM New app update v1.0.6 and search update v1.1.1 published
You can now report posts that are in the search engine using the "Report Content" feature at the bottom of the page. This allows you to remove content that exposes sensitive information about yourself or other people. There is no captcha for now. However if this feature gets abused then I might have to start using reCaptcha. Note that you can only remove your own posts or topics, and they must have already been deleted from Bitcointalk first. Or they will just be scraped again. The exception to this is posts containing malicious links - please make sure you also use the "Report to Moderator" feature. I try to delete old reports after 30 days. @NotAThether: I'd be interested to know which programming language you used to write the site. Java? The site is written in NextJS. The search algorithms themselves are using Node in Docker. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on September 06, 2025, 10:50:35 AM For some reason, the scraper has stopped. I am working on getting it back online.
Edit: resolved (since a few hours ago)
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: TESY on September 24, 2025, 12:01:48 AM I checked it and it seems very useful; as a newbie, I can find things more easily. I strongly suggest adding SSL/TLS encryption so the site is not served as "Not Secure". My second suggestion is to add target="_blank" so that when a user searches and clicks on a thread, they don't leave your site.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on September 26, 2025, 02:07:07 PM I checked it and it seems very useful; as a newbie, I can find things more easily. I strongly suggest adding SSL/TLS encryption so the site is not served as "Not Secure". My second suggestion is to add target="_blank" so that when a user searches and clicks on a thread, they don't leave your site. That's strange... Talksearch already has a TLS certificate from Google App Engine. It should not display the "not secure" prompt. Can you try with a different device and browser and see if you can replicate the issue? target=_blank sounds like an interesting idea, but it depends on whether people want to leave talksearch after finding the thread they want, or if they are using this as a way to read old posts.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on September 26, 2025, 04:15:25 PM target=_blank sounds like an interesting idea, but it depends on whether people want to leave talksearch after finding the thread they want, or if they are using this as a way to read old posts. Whenever a site does that, I middle-click on the site's tab to close it. It's extra work, and annoying. If I want to open a new tab, I use the same middle-click to open it in a new tab. I don't need any site to decide that for me.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: TryNinja on September 27, 2025, 05:30:30 AM Whenever a site does that, I middle-click on the site's tab to close it. It's extra work, and annoying. If I want to open a new tab, I use the same middle-click to open it in a new tab. I don't need any site to decide that for me. FYI, I used to use the same target=_blank for all external links in my projects/sites.
I think it was an old post of yours that changed my mind, and now I'm changing all my links so that the user has the option to decide where they want it to open. ;D
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: joker_josue on September 27, 2025, 07:17:37 AM Whenever a site does that, I middle-click on the site's tab to close it. It's extra work, and annoying. If I want to open a new tab, I use the same middle-click to open it in a new tab. I don't need any site to decide that for me. FYI, I used to use the same target=_blank for all external links in my projects/sites. I think it was an old post of yours that changed my mind, and now I'm changing all my links so that the user has the option to decide where they want it to open. ;D You're absolutely right. I'd never thought of it from that perspective. I'll soon be adjusting this situation at TalkImg and applying this "philosophy" to other projects. It's simpler, safer, and gives the website visitor more control!
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Fivestar4everMVP on September 29, 2025, 01:47:14 AM I checked it and it seems very useful; as a newbie, I can find things more easily. I strongly suggest adding SSL/TLS encryption so the site is not served as "Not Secure". My second suggestion is to add target="_blank" so that when a user searches and clicks on a thread, they don't leave your site. That's strange... Talksearch already has a TLS certificate from Google App Engine. It should not display the "not secure" prompt. Can you try with a different device and browser and see if you can replicate the issue? Quote target=_blank sounds like an interesting idea, but it depends on whether people want to leave talksearch after finding the thread they want, or if they are using this as a way to read old posts. So, instead of opening a clicked link on the same tab, I think it would make more sense if the link opens in a new tab while the search engine maintains its own tab.
Congratulations bud, this is a very useful tool for this forum for sure. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on September 29, 2025, 05:30:32 AM The target=_blank is a good idea, I don't know about other users but I personally like the search engine to remain active/open even when I had click on a search result, as there is always the possibility that the results I click on may not be exactly what I am looking for but I still may like to leave that open and return to the search engine and click on another result. So, instead of opening a clicked link on the same tab, I think it would make more sense if the link opens in a new tab while the search engine maintains its own tab. Google and DuckDuckGo don't open search result links in a new tab, to take an example. So it's one of those things where you have to go by precedent and do what all the other services are doing. So I don't think I will be adding target=_blank to the links. Besides, I feel like the search results page could use some polishing if people actually stay on it for more than a few seconds. Congratulations bud, this is a very useful tool for this forum for sure. Thanks. I've meant to add vector embeddings to each of the posts in order to make ML search possible, but I've never had the time to do that yet. I have the hardware and everything ready (48 CPU cores but no GPUs), it's just waiting for my action. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: GazetaBitcoin on September 30, 2025, 07:24:04 AM Hey NotATether, please be aware that a new translation was made by AOBT for your topic:
- Serbian translation (https://bitcointalk.org/index.php?topic=51450.msg65374182#msg65374182), made by katanic97 Are you so kind, please, to add it to OP as well? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on September 30, 2025, 01:22:05 PM Hey NotATether, please be aware that a new translation was made by AOBT for your topic: - Serbian translation (https://bitcointalk.org/index.php?topic=51450.msg65374182#msg65374182), made by katanic97 Are you so kind, please, to add it to OP as well? It has been done. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on October 02, 2025, 08:17:10 AM Add Talksearch to Google Chrome search engine!
https://www.talkimg.com/images/2025/10/02/UGXNR9.png 1. Go to chrome://settings/search 2. Click on "Manage search engines and site search" 3. Scroll down to Site search and click on "Add" 4. Fill in the values as follows: - Name: Talksearch - Shortcut: @talk - URL: https://talksearch.io/search?q=%s Now you can directly search Bitcointalk posts from the address bar. For example, you can type "@talk bitcointalk memes" to find memes on Bitcointalk. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on October 15, 2025, 12:07:54 PM Bump
Natural language search coming soon... Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on October 20, 2025, 06:52:39 PM PSA: We are not affected by the Amazon Web Services global outage. Our infrastructure is hosted on Google Cloud and on dedicated providers.
Cheers! Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: examplens on November 06, 2025, 11:34:07 AM I used TalkSearch a little, I would suggest two improvements if possible. It is related to the search result.
For example, what I miss is the sorting of search results by creation date and by the date of the last post in the topic. I'm not sure how the algorithm decides the order of the prints, but it seems that they are thrown randomly. Also, if possible, separate or at least mark archived or locked threads. It took me a long time to search and check each link, and many were just archived or locked, which was useless to me. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on November 09, 2025, 11:20:33 AM I used TalkSearch a little, I would suggest two improvements if possible. It is related to the search result. For example, what I miss is the sorting of search results by creation date and by the date of the last post in the topic. I'm not sure how the algorithm decides the order of the prints, but it seems that they are thrown randomly. Also, if possible, separate or at least mark archived or locked threads. It took me a long time to search and check each link, and many were just archived or locked, which was useless to me. I hope to finally add vector embedding search within the next few days. It would require a complete re-index though since I have to add new fields. Also the second request might be a good idea. I return the entire post data to the frontend anyway so it would be possible to do such a thing. The bitcointalk scraper is being restarted, in order to use a lower post search threshold which is expected to index posts twice as fast. There may be a minor disruption in indexing during this operation, but it is expected to be immediate. Update 14:08:00 UTC: as expected, the ML server is causing a few errors with Talksearch that are preventing searches from completing successfully. Please be patient while I fix these. Update 14:37:00 UTC: All errors have now been fixed. Working on vectorizing all the posts now. 
I really should've ordered this thing with 64GB memory :-\ Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on November 15, 2025, 06:56:10 AM I'm still working on adding ML features to the search queries, but first I am scanning through the entire post index to identify and trash deleted posts. Talksearch does not currently detect when a post is deleted so I have to do this manually.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on November 20, 2025, 05:45:44 AM Community: What is the fastest way to scan the forum to check for deleted posts given a set of post & topic ID pairs?
I'm not interested in edited posts, only deleted posts. To check over 50 million posts, at an average speed of 1 post per 2.64 seconds, it will take 132 million seconds or over four years. I need to do a one-time scan because the post set I downloaded some time ago may or may not include deleted posts, and obviously I can't wait that long. Even if I checked 20 posts per page, that would still take 2.5 months assuming no downtime. I would like to be able to query all this information taking only several days, or at worst a few weeks. These times are way too long for me.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on November 20, 2025, 07:45:31 AM Community: What is the fastest way to scan the forum to check for deleted posts given a set of post & topic ID pairs? First scrape all board lists, that gives you a list of all topics that haven't been deleted yet. Then scrape all topics. Quote To check over 50 million posts, at an average speed of 1 post per 2.64 seconds, it will take 132 million seconds or over four years. You're allowed one page request per second, so waiting 2.64 seconds isn't necessary. Maybe you can use "All (https://bitcointalk.org/index.php?topic=5160863.0;all)" for topics with no more than 26 pages to get up to 500 posts at once, but Cloudflare will probably stop you from doing that. Quote Even if I checked 20 posts per page, that would still take 2.5 months assuming no downtime. It took me several months (years back), but that included scraping non-existing topics because I hadn't thought of scraping the boards first. Quote I would like to be able to query all this information taking only several days, or at worst a few weeks. Does it help to prioritize boards?
Forget about the altcoin bounty boards, that should take off millions if not tens of millions of posts.Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on November 20, 2025, 10:37:08 AM You're allowed one page request per second, so waiting 2.64 seconds isn't necessary. Maybe you can use "All (https://bitcointalk.org/index.php?topic=5160863.0;all)" for topics with no more than 26 pages to get up to 500 posts at once, but Cloudflare will probably stop you from doing that. It appears twice as slow because the actual indexer is also running in parallel. I just ran the deletion script on top of it and it's also running at the same speed. Cloudflare would work, but I would have to embed a token of some kind. Quote Does it help to prioritize boards? Forget about the altcoin bounty boards, that should take off millions if not tens of millions of posts. Not really. The deleted posts occur randomly across any board (though I suspect the gambling discussion board has a higher proportion of deleted posts). Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on November 20, 2025, 11:57:39 AM The deleted posts occur randomly across any board (though I suspect the gambling discussion board has a higher proportion of deleted posts). Have you considered modlog (https://bitcointalk.org/modlog.php) as a first hint of where to look for deleted posts? I don't think there's a foolproof way to catch all deleted posts. You could also prioritize more recent topics over older ones: there's no need to check this topic from 2010 (https://bitcointalk.org/index.php?topic=820.40) every few weeks for deleted posts.Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on November 23, 2025, 08:41:31 AM Have you considered modlog (https://bitcointalk.org/modlog.php) as a first hint of where to look for deleted posts? I don't think there's a foolproof way to catch all deleted posts. 
You could also prioritize more recent topics over older ones: there's no need to check this topic from 2010 (https://bitcointalk.org/index.php?topic=820.40) every few weeks for deleted posts. That may not be necessary anymore - theymos suggested I should use the sitemap.xml page. It's a little convoluted, but I managed to hack together a script that will check for deleted/updated posts. There is one big pass I have to make first, before I can revert to tiny passes from the past day or two. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on November 23, 2025, 09:20:34 AM theymos suggested I should use the sitemap.xml page. Is that https://bitcointalk.org/sitemap.php? I've seen it before, but wasn't sure how to use it (and later on couldn't find it back).Note: I can't post this the way I wanted, Bitcointalk turns it into the above: [url=https://bitcointalk.org/sitemap.php]bitcointalk.org/sitemap.php[/url]? Even code tags can't post the nobbc-code correctly, the above turns into this: Code: Is that https://bitcointalk.org/sitemap.php? Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on November 23, 2025, 12:06:55 PM theymos suggested I should use the sitemap.xml page. Is that https://bitcointalk.org/sitemap.php? I've seen it before, but wasn't sure how to use it (and later on couldn't find it back).Yep, that's the one. Basically, he explained to me that the sitemap is organized into many smaller sitemaps, presumably to avoid generating a single giant sitemap. The main sitemap.php contains topic/page boundaries, where p= and o= denote the topic and page inside the topic (such as .20) respectively. The inner sitemap XML looks like this Code:
From this data, the most important are the loc and the lastmod. These basically tell you whether a page has been edited or deleted. Deleted posts will cause subsequent pages to appear modified as well. I guess the other two parameters can be used to gauge how frequently to check for updates, but I don't use those in my current implementation - I am performing a full sweep over the sitemap first. This process is also causing regular indexing to slow down because I am throttling the indexer speed in order to avoid breaching the rate limit. New posts should now appear instantly though (but only because I am using the recentposts page, nothing to do with the sitemap). But edits will appear to be much slower until I finish crawling through the pages linked by the sitemap. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on December 03, 2025, 05:59:51 PM Bump (I guess a lot of people are still using forum search. I'm purging deleted posts as fast as I can.)
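A rough sketch of how the loc/lastmod comparison from the sitemap sweep could work, assuming the inner sitemaps follow the standard sitemaps.org schema. The sample XML, the example.org URLs, and the names pages_to_recheck and last_seen are hypothetical; none of them are taken from Bitcointalk or from Talksearch's script.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical inner sitemap; real entries may carry more fields.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/index.php?topic=1.0</loc><lastmod>2025-11-20</lastmod></url>
  <url><loc>https://example.org/index.php?topic=1.20</loc><lastmod>2025-11-22</lastmod></url>
</urlset>"""


def pages_to_recheck(sitemap_xml: str, last_seen: dict) -> list:
    """Return every loc whose lastmod differs from the value stored at the previous crawl."""
    root = ET.fromstring(sitemap_xml)
    stale = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        # A page never seen before has no stored value, so it is also queued.
        if loc and last_seen.get(loc) != lastmod:
            stale.append(loc)
    return stale


# lastmod values remembered from an earlier pass (hypothetical).
last_seen = {
    "https://example.org/index.php?topic=1.0": "2025-11-20",
    "https://example.org/index.php?topic=1.20": "2025-11-01",
}
print(pages_to_recheck(SAMPLE, last_seen))
```

Only the pages returned here would need to be scraped again, which is what lets later passes cover just the last day or two instead of the whole forum.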
I'm starting to think it is a bad idea to boost search terms in titles. I'm going to adjust it to consider only post bodies, and add a new internal "effective title" field that is only set for the OP and is concatenated with the post body whenever someone searches for something.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: yhiaali3 on December 06, 2025, 04:35:15 AM Hello @NotATether
First of all, thank you for the wonderful work and great efforts you have put in so far in developing this advanced Search Engine for BitcoinTalk. I didn't see the Arabic translation among the translations, so I am honored to contribute to this effort through the Arabic translation I did at the following link: https://bitcointalk.org/index.php?topic=5567569 If you intend to translate the site into multiple languages, I would also be happy to participate in translating it into Arabic. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on December 06, 2025, 07:00:00 AM Hello @NotATether First of all, thank you for the wonderful work and great efforts you have put in so far in developing this advanced Search Engine for BitcoinTalk. I didn't see the Arabic translation among the translations, so I am honored to contribute to this effort through the Arabic translation I did at the following link: https://bitcointalk.org/index.php?topic=5567569 If you intend to translate the site into multiple languages, I would also be happy to participate in translating it into Arabic. Thanks brother. I wanted to translate it to Arabic myself but I never found the time to. So this is very helpful. Sitemap update/delete is still going strong. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: yhiaali3 on December 06, 2025, 11:29:31 AM Hello @NotATether First of all, thank you for the wonderful work and great efforts you have put in so far in developing this advanced Search Engine for BitcoinTalk. I didn't see the Arabic translation among the translations, so I am honored to contribute to this effort through the Arabic translation I did at the following link: https://bitcointalk.org/index.php?topic=5567569 If you intend to translate the site into multiple languages, I would also be happy to participate in translating it into Arabic. Thanks brother. I wanted to translate it to Arabic myself but I never found the time to. 
So this is very helpful. Sitemap update/delete is still going strong. I've been following your wonderful efforts so far, and you're still developing it. So, when all the development is complete, just let me know and send me the language file so I can translate it into Arabic immediately. I also hope you'll update the OP to include the Arabic translation in the main thread. Thank you. Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on December 09, 2025, 07:20:16 AM Already the mass edited/deleted posts updater using the sitemaps has refreshed 100k threads (a pittance when you take note that there's over 5 million topics, but we're getting there!)
Here are some logs from this operation, which runs simultaneously with the main indexer by the way: Code: 🔄 Starting real-time scrape and ingest for topic 101198 (mode: update) I also hope you'll update the OP to include the Arabic translation in the main thread. It will be added soon!
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: LoyceV on December 09, 2025, 08:47:12 AM Already the mass edited/deleted posts updater using the sitemaps has refreshed 100k threads (a pittance when you take note that there's over 5 million topics, but we're getting there!) Last time I checked (2 months ago), I counted only 1,415,773 topics (https://loyce.club/other/board_topicID_title.txt) (102 MB txt file). Almost 75% is deleted.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: Halab on December 09, 2025, 11:44:02 AM Last time I checked (2 months ago), I counted only 1,415,773 topics (https://loyce.club/other/board_topicID_title.txt) (102 MB txt file). Almost 75% is deleted. I began to dare to doubt these figures. But I forgot MindlessElectron, which fights against other spamming bots. The percentage of topics deleted by human mods must be much lower.
Title: Re: Talksearch.io - Advanced Bitcointalk Search Engine Post by: NotATether on December 09, 2025, 06:17:28 PM Already the mass edited/deleted posts updater using the sitemaps has refreshed 100k threads (a pittance when you take note that there's over 5 million topics, but we're getting there!) Last time I checked (2 months ago), I counted only 1,415,773 topics (https://loyce.club/other/board_topicID_title.txt) (102 MB txt file). Almost 75% is deleted. Right. I forgot that I ran a query on my internal Elasticsearch node some time ago for the number of unique topics, and got a number similar to this one. This significantly cuts down on runtime! I still have to search for and remove any deleted topics post-2020 though (when I loaded your archive into the storage). |
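The figures quoted in this thread can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes topic IDs are handed out sequentially, and it takes 5,567,569 (the topic ID of the Arabic translation linked earlier) as a stand-in for the number of topics ever created. The helper scan_days is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def scan_days(items: int, seconds_per_request: float, items_per_request: int = 1) -> float:
    """Days needed to check `items` when each request covers `items_per_request` of them."""
    requests_needed = items / items_per_request
    return requests_needed * seconds_per_request / SECONDS_PER_DAY


# Scan-time estimates from the November 20 post: 50 million posts at 2.64 s per request.
one_post_per_request = scan_days(50_000_000, 2.64)        # about 1,528 days, a bit over four years
twenty_posts_per_page = scan_days(50_000_000, 2.64, 20)   # about 76 days, roughly 2.5 months

# Share of topic IDs that no longer resolve to a topic.
surviving_topics = 1_415_773   # LoyceV's count from the thread
highest_topic_id = 5_567_569   # assumption: approximates all topics ever created
deleted_share = 1 - surviving_topics / highest_topic_id   # about 0.746, i.e. "almost 75%"

print(f"{one_post_per_request:.0f} days, {twenty_posts_per_page:.0f} days, {deleted_share:.1%} deleted")
```

Under these assumptions, a sweep only has to cover roughly a quarter of the topic ID range, which is consistent with the remark that the lower topic count cuts down on runtime.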