The data
See loyce.club/badposts
All categories / spam / scam / other / advertising / email
Most recent posts are shown first.
Whitelisted users
LoyceV
LoyceMobile
TheBeardedBaby
Lafu
Rizzrack
Timelord2067
marlboroza
morvillz7z
NotATether
actmyname
nutildah
hosseinimr93
Mitchell
Bthd
light_warrior
ABCbits
Post keywords
Whitelisted users can post keywords. Please don't use very common words (such as "scam") that trigger too many false positives. And please keep all keywords in one code tag in one post: edit it to add/remove keywords.
Other users can also post, with 2 possible outcomes:
- I whitelist you and process your keywords
- I remove your post
Please don't quote code-tags.
Tips
Leave out "https" from scam links.
Format
Within code-tags, post either "scam:thiswebsiteisascam" or "spam:Ikeeppostingthesamecrapeverywhere". One phrase per line. See this example.
Remove the line to exclude the keyword in the next update.
Only use a space after "scam:" if you want to include a space in your search.
Keyword "other:" is meant for text that isn't necessarily spam or a scam, but needs highlighting nonetheless.
Keyword "advertising:" is meant for sites that are often used to copy an article and create a backlink.
Keyword categories:
- spam:
- scam:
- other:
- advertising:
- email:
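To illustrate the format rules above, here is a minimal sketch of how the lines inside one code tag could be read. It is not the script that runs behind this topic: the function name `parse_keywords` is hypothetical, and the minimum lengths for "advertising" and "email" are an assumption (only those for other, spam and scam are stated in this post).

```python
# Minimum phrase length per category. The values for "advertising" and
# "email" are assumptions; the post only states 4 (other) and 5 (spam/scam).
MIN_LENGTH = {"spam": 5, "scam": 5, "other": 4, "advertising": 4, "email": 4}


def parse_keywords(code_block: str) -> list[tuple[str, str]]:
    """Return (category, phrase) pairs from the text inside one code tag."""
    keywords = []
    for line in code_block.splitlines():
        # Split on the first colon only, so URLs in the phrase stay intact.
        category, sep, phrase = line.partition(":")
        category = category.strip().lower()
        if not sep or category not in MIN_LENGTH:
            continue  # not a "category:phrase" line
        # The phrase is kept exactly as typed: a space after the colon
        # becomes part of the search string.
        if len(phrase) < MIN_LENGTH[category]:
            continue  # too short for this category
        keywords.append((category, phrase))
    return keywords
```

With this sketch, "scam:HYIP" is skipped for being too short, and "other: moron" keeps its leading space.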
Features
I search the last ~200,000 posts for new keywords. This covers approximately 1 month.
I update all keywords every 20 minutes.
I show all matches for all usernames and ranks. No exceptions. Not all of them are bad.
Whitelisted users are shown in green.
Limitations
There's a maximum of 60 keywords per post (if you add more, those will be ignored). I can increase this if needed.
There's a maximum of 4000 matches per category. If there are more, the oldest are removed.
The minimum keyword length is 4 (for other) or 5 (for spam/scam) characters.
I also search quotes.
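The two caps above can be pictured with a short sketch. It assumes matches are stored oldest first and that the first 60 keywords of a post are the ones kept; the names `limit_keywords` and `MatchStore` are made up for illustration and say nothing about how the real list is stored.

```python
from collections import defaultdict, deque

MAX_KEYWORDS_PER_POST = 60
MAX_MATCHES_PER_CATEGORY = 4000


def limit_keywords(keywords: list[str]) -> list[str]:
    """Keep the first 60 keywords of a post; any further ones are ignored."""
    return keywords[:MAX_KEYWORDS_PER_POST]


class MatchStore:
    """Holds matches per category and drops the oldest once the cap is hit."""

    def __init__(self, cap: int = MAX_MATCHES_PER_CATEGORY) -> None:
        # A deque with maxlen discards items from the left when it is full.
        self._matches = defaultdict(lambda: deque(maxlen=cap))

    def add(self, category: str, post_id: int) -> None:
        # Assumes matches arrive oldest first, so the left end is the oldest.
        self._matches[category].append(post_id)

    def newest_first(self, category: str) -> list[int]:
        """Return the stored matches with the most recent one first."""
        return list(reversed(self._matches[category]))
```

A bounded deque keeps the "oldest are removed" rule in one place instead of trimming the list after every update.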
Report posts!
This list is only useful if someone actually reports the bad posts.
Post removal
I want to keep this topic compact (so I can quickly scrape it many times). That means I'll delete almost all posts that don't contain a list from a Whitelisted user. I would say "I hope I'm not offending anyone", but really, I'm okay with that.
Q&A
What are you trying to accomplish with this thread?
See around here.
Will this look into titles?
Some types of scams involve posting very little content in the OP body but then go on to include important keywords in the title.
For now: nope
I don't keep track of titles with this data.
This could be cross-checked with your list of banned accounts.
Thanks, I've added it.
Can I use it for searching for alts? Like links to twitter, facebook, telegram usernames, etc?
Can I use it for catching plagiarism, like searching for a whole sentence, or will this clog the server? Or maybe only phrases and not-so-common words, like we did in the SpamBuster club with suchmoon?
My plan was to only look back about 100k posts (currently just over 2 weeks), so it won't really help you here. But it would (near) instantly add new posts, and that's what I'm aiming for here.
Searching all my data without database takes too long to do on a regular basis. You should try TryNinja's database though!
Without "www", the url turns into 9gag.com. I think theymos should give "www.bitcointalk.org" the same treatment.
I've added category "other" for things like "www.bitcointalk.org".
It shows every word which has keyword "moron" or "moran" inside:
umoran
Is it supposed to work like this?
I search for the exact phrase (case insensitive), so it matches anything that contains it. You can add a space in front of it (as you did already), but that might miss some matches too. It's more or less as intended: if I change this, it might overlook other matches.
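A case-insensitive exact-phrase search behaves like a plain substring test, which is why "moran" shows up inside "umoran". The sketch below only illustrates that behaviour; `phrase_matches` is a hypothetical name, not the actual search code.

```python
def phrase_matches(keyword: str, post_text: str) -> bool:
    """Case-insensitive substring test; it has no notion of word boundaries."""
    return keyword.lower() in post_text.lower()


# "moran" is found inside a longer word.
print(phrase_matches("moran", "umoran"))
# A leading space avoids that, but also misses the word at the start of a post.
print(phrase_matches(" moran", "umoran"))
print(phrase_matches(" moran", "Moran wrote this"))
```

The last call shows the trade-off mentioned above: the leading space filters out "umoran", but a post that starts with the word is missed too.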
I search my deleted posts, I can't search live posts this way.
Can we also include the reasoning behind why something is a scam? Perhaps a link to a thread that explains it?
It can be explained in the post in this topic. I don't want to add repetitive explanations to my badposts page.
It would be great if you can add this to your earlier post
Please don't quote code-tags.
Please don't quote code-tags.
You should write that on the first row of the OP.
It's a tad higher now. Don't worry about overlooking it: I only added it today. I don't think it's much of a problem though: only the first code tag in each post is processed, and I now remove duplicate keywords to reduce search time.
Each 15 minute update takes about 1 second to process.
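As a rough illustration of "only the first code tag in each post is processed" and of removing duplicate keywords, here is a sketch that assumes the post is available as BBCode with literal [code] and [/code] tags. That input format and the name `first_code_block_keywords` are assumptions; the real scraper may work on HTML instead.

```python
import re

# Non-greedy, so the match stops at the first closing tag.
CODE_TAG = re.compile(r"\[code\](.*?)\[/code\]", re.IGNORECASE | re.DOTALL)


def first_code_block_keywords(post_bbcode: str) -> list[str]:
    """Return unique, non-empty lines from the first [code] block of a post."""
    match = CODE_TAG.search(post_bbcode)
    if match is None:
        return []
    lines = match.group(1).splitlines()
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(line for line in lines if line.strip()))
```

Under this sketch a quoted code tag further down the post is simply never read, and a keyword that appears twice is searched only once.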
Each new keyword takes a few minutes to process, because reading all 200,000 posts is slow. Processing several new keywords at once is more efficient, so feel free to add them.
spam:minepi.com/
scam:github.com/pillforeth/
I'm trying to improve searching for whole words only. I now remove the trailing slash ("/") from the keyword before searching. I don't think it matters for your keywords, but it can improve other strings.
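One common way to search for whole words only is a regular expression with lookarounds. The sketch below uses that technique and strips one trailing slash first; it is an assumption about how such a search could work, not a description of the actual implementation, and both function names are hypothetical. It needs Python 3.9 or newer for `str.removesuffix`.

```python
import re


def whole_word_pattern(keyword: str) -> re.Pattern[str]:
    """Compile a case-insensitive pattern matching the keyword as a whole word."""
    cleaned = keyword.removesuffix("/")  # drop one trailing slash before searching
    # (?<!\w) and (?!\w) require that no word character touches the keyword.
    return re.compile(r"(?<!\w)" + re.escape(cleaned) + r"(?!\w)", re.IGNORECASE)


def matches_whole_word(keyword: str, post_text: str) -> bool:
    """Return True if the keyword occurs in the text as a complete word."""
    return whole_word_pattern(keyword).search(post_text) is not None
```

With this pattern "moran" no longer matches "umoran", and "minepi.com/" matches a link written with or without the trailing slash.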
other: moran
other: moron
other: moron
Try without the spaces now
I've tried with space in front and the end, it didn't find anything. I have also tried with space in front it also didn't find anything.
You were trying to adjust for my old search, right while I was adjusting it to improve matching complete words only.
This is too much for me...I am going to drink a beer and think about "drink another beer finding " "ej" space in the back" for another beer.
Enjoy!
scam:https://github.com/pillforethereum/ETHpillAN/
There's many of those altcoin pills nowadays.
You should probably omit the "https://"-part, a scammer can do the same.
How far back does your search go?
See:
I search the last ~200,000 posts for new keywords. This covers approximately 1 month.
This search takes about 2 minutes for new keywords. It's mainly meant to catch new posts, older posts can be found through other means.
My examples come up multiple times in the BTC search box, but not via this thread's search?
It might also be because I search unedited posts.
Any thoughts on why this post and user weren't caught today?
The post was edited, see the unedited post.
Unfortunately, I can't know which posts have been edited, so this is a loophole to escape my badposts list.
I think you should just make another link/html file that contains all cointelegraph spammers.
I did already, see loyce.club/badposts/advertising.html.
I don't know if it would be a hard task to migrate or create another file
I can easily add new categories.
Or should I suggest that each spam website should have its own html file
That's too much work to check, I now just quickly check a page once in a while, and report a few posts.
@Loyce can you please remove one of these keywords :
github_com/ProjectEthereumPill
github_com/ProjectEthereumPill/EthereumPill/
Thanks, done:
The following overlapping keywords have been removed:
github.com/ProjectEthereumPill/EthereumPill
github.com/pillforethereum/ETHpillAN
(because of github.com/pillforeth)
I also noticed the "Banned" notification is only useful for new keywords, because my banned list is only updated once a day. I ran a one-time update from scratch, searching the last ~800,000 posts; this updates the banned-status on older posts.
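The overlap removal mentioned above can be sketched as follows. Under a plain substring search, a longer keyword that contains a shorter one can never find a post that the shorter one misses, so it is redundant. The function name `remove_overlapping` is hypothetical and the comparison is assumed to be case insensitive, like the search itself.

```python
def remove_overlapping(keywords: list[str]) -> list[str]:
    """Drop every keyword that contains another, shorter keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    kept = []
    for i, keyword in enumerate(lowered):
        # Redundant if some different keyword is a substring of this one.
        redundant = any(
            other != keyword and other in keyword
            for j, other in enumerate(lowered)
            if j != i
        )
        if not redundant:
            kept.append(keywords[i])
    return kept
```

Exact duplicates are left alone here; they would be handled by a separate deduplication step.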
Hope I did it right.
HYIP and MLM are too short, the minimum word length is 5 for scams.
@LoyceV did you delete the old info? The scam link displays only 22 archived posts.
I do have logs, but it's too many lines to search now. I think someone must have entered a keyword with many hits, then removed the keyword again. My "badposts" only shows the latest 4000 posts each time I update it, but when a keyword is removed, all those entries are removed too.
I've manually reset it to re-check all keywords in the last ~200k posts. This restored a longer list again.
Is ETC officially a scam? I see it in the blacklist words as a scam.
That is debatable...
However the "scam : ETChash" keyword just helps catch posts like this one that have malware download links
I think the keyword "PhoenixMiner" is too general and gives out a lot of false positives...
I made a new toy:
[Newbie scrutiny instead of jail] Every new user's first post: loyce.club/patrol:
See loyce.club/patrol/
Please Report (or Merit) the posts when needed.
It's updated once a minute.
Sample: