Title: "Multiple Accounts" / Copy-pasta detection scripts/bots
Post by: Initscri on September 18, 2018, 06:31:59 PM

Hey all,
I've been planning to write a few scripts relating to BitcoinTalk. It's been on my "developer bucket list" to write something that detects users who have multiple accounts. To do this, and to produce a reliable list, I'd have to define some logic to base it on.

Content within the HR tags will be updated as the thread goes along. I have a few things in mind (and I'll keep updating this as the thread goes along, adding new ideas and such). When and if topics are created for the data (either by me or by others), I'll post the links here under the respective categories.

For account quality detection:
- Look at the number of words, paragraphs, sentences, etc., and gather each user's averages to determine an account quality number. This number can be used in tandem with other scripts to decide whether to report an account. Obviously it isn't enough to report on by itself, but usernames with low quality could be sent to a spreadsheet of some sort for manual lookup.
- [your/others' ideas here]

For multiple account detection:
- Look for the same address used across posts (BTC, ETH, etc.)
- Look for the same account used across posts (Telegram, Skype, emails, etc.)
- [your/others' ideas here]

For copy-pasta detection:
- Write a script that detects copy-pasta by matching the text of posts against similar text on other sites, returning a probability percentage that the user copy/pasted (including the source, for manual analysis). Users with percentages above a certain threshold will be put into a list and potentially reported to threads/mods. I.e.: external plagiarism detection.
- Write a script that detects copy-pasta by matching post content against other users' post content. High similarity will raise red flags. I.e.: internal plagiarism detection. *Note: suchmoon mentioned working on something similar, so other scripts may take precedence.*
- The original script may want to ignore quote tags.
  However, in that case, depending on how it's built (full text or word by word), another side-script would have to be built to stop users from simply wrapping their messages in quote tags.
- [your/others' ideas here]

For trust abuse/merit abuse:
- Detect trust abuse (users who send out a large number of negative trusts using the same text). This would obviously skip trusted members (some good campaign managers send out trusts with the same text) and is mostly targeted at members with no trust or negative trust (i.e., newer members, no trade history, etc.). Results would be posted in a thread as a list using tildes ("~") so people can copy/paste the list of abusers into their trust lists. Users could request removal from this list by public poll within the thread (this should probably be handled manually).
- [your/others' ideas here]

General ideas for all scripts:
- Automatic posting of results to anti-spam threads (in a way that doesn't create more spam, though).
- A platform where users reported by the scripts are documented, with automatic ban detection. That way, scripts don't look into users who have already been reported/banned.
- [your/others' ideas here]

Results would be posted here for mods to look at (if need be), or just to keep a record of such a connection. I'd also probably link to the results in this topic (https://bitcointalk.org/index.php?topic=1926895.0) and maybe load them up on a website of mine.

I wanted to post this thread in advance to see if anyone else had other logic/ideas in mind for these scripts/bots. I'll only be doing this when I have the time (which won't be for a couple of weeks), so I thought I'd post well in advance. I'll update the list above with approved suggestions that I plan to work on.

Thanks!

P.S.: If any mods/admins aren't OK with me scraping the site, by all means let me know.
I'd obviously write the bot/script so that it doesn't slam the server and only sends a limited number of requests per second/minute (more or less like a Google bot). I know other users have written similar bots/scraping tools, so I thought it'd be OK. But if not, just let me know :)

Change log:
Edit (September 19th, 2018): I'll be updating this thread (see under the bold headings) with new ideas as the thread progresses. Also, if anyone else wishes to contribute to my scripts (or even build their own one-offs targeting the ideas above), just let me know that you're working on it and I'll mark it in the thread. While I agree that different scripts/algorithms would be harder to avoid/abuse, I'd obviously want all of the scripts developed in a timely manner, so duplicating work probably isn't a good idea at the moment.

Post by: Piggy on September 18, 2018, 07:28:09 PM

Good idea, I have been thinking about doing something like that myself, but at the moment I got busy with other things. Automated checks are the way to go for the spam problems, plagiarism and so on.
A trivial check I was experimenting with is hashing each posted message, saving the hash in a dictionary, and seeing whether the same hash comes up again. Another simple check could be done by monitoring activity on threads: using the global average of posts per thread and then calculating the variance for a thread, you should be able to spot a spam spree. The same can be applied to users' posting. There are other, more complex techniques out there, but better to start with something simple.

Post by: Initscri on September 18, 2018, 07:33:56 PM

Quote from: Piggy
Good idea, I have been thinking about doing something like that myself [...]

Not bad ideas. I like the idea of monitoring threads for abnormal posting frequencies/numbers of posts. Of course, these threads would have to be checked manually (as there may be extenuating circumstances where a thread legitimately has a higher post frequency).

TBH, if I do create this, I may just create a repo so others can contribute. My only fear is that others will run the script (which is okay, unless many users run it; I don't want to unintentionally add unnecessary load to the BitcoinTalk servers).

Post by: TheBeardedBaby on September 18, 2018, 07:34:57 PM

I will closely follow this project.
We've been waiting for something like this for a very long time. Most bots now use word spinners to hide the copy-pasting; it's not easy to detect them, but it's not impossible either.
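As a rough illustration of the hash-dictionary check Piggy describes above, here is a minimal Python sketch. It assumes posts are available as (post_id, text) pairs, for example from a scraped copy of the Recent posts page; the lowercase/whitespace normalization and the choice of SHA-256 are assumptions, not part of the original suggestion. Only exact (normalized) copies collide, so spun text will slip straight through.

```python
import hashlib
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial edits don't change the hash."""
    return " ".join(text.split()).lower()


def find_exact_duplicates(posts):
    """Group post IDs whose normalized text has the same SHA-256 digest.

    posts: iterable of (post_id, text) pairs (hypothetical input format).
    Returns {digest: [post_id, ...]} for digests seen more than once.
    """
    seen = defaultdict(list)  # digest -> post IDs sharing that text
    for post_id, text in posts:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        seen[digest].append(post_id)
    return {digest: ids for digest, ids in seen.items() if len(ids) > 1}
```

Each returned group is a set of posts with identical normalized text; deciding which groups are actually spam still needs a human.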
Post by: LoyceV on September 18, 2018, 07:35:59 PM

Quote from: Initscri
P.S.: If any mods/admins aren't OK with me scraping the site, by all means let me know. [...]

I've recently started scraping the Recent posts page (https://bitcointalk.org/index.php?action=recent). My script saves the first, unedited version of each post in raw HTML, excluding quotes. Your post, for example, looks like this:
Code: Initscri
In compressed format, it takes about 10 MB per day. Instead of scraping the same data again, I could easily send it to you, and a few days' worth of data should be enough for you to start testing. If interested, let me know.

You'll be in for a surprise if you start looking for plagiarism! I sometimes sort a day's worth of posts and search for exact duplicates. This typically turns up a few dozen posts that were each posted a few dozen times. Most of them are spam; many are just spammers posting the same useless "proof of authentication" and more crap like that. Detecting the text spinners will be a whole different level!

Post by: Initscri on September 18, 2018, 07:42:03 PM

Quote from: LoyceV
I've recently started scraping the Recent posts page (https://bitcointalk.org/index.php?action=recent).
[...] Instead of scraping the same data again, I could easily send it to you, and a few days' worth of data should be enough for you to start testing. If interested, let me know.

Not a bad idea. I'll take that into account. I probably won't be starting for a little while, but I'll send you a message if I need it. I like the idea of scraping Recent and just grabbing the raw HTML to compare.

To minimize requests while still letting multiple filtering scripts parse the data separately, I'll probably end up scraping Recent with one bot, caching that for a set time period (sort of like a mirror), and then running multiple other scripts that parse the data on the caching server/mirror.

What I might end up doing is keeping the server that stores the cache closed source, but releasing the scripts that parse the data/identify abusers as open source. These scripts would connect to the mirror server/site instead of BitcoinTalk. That way, if others wish to volunteer some computational power to run those scripts, they can, and others can contribute code without slamming BitcoinTalk with a massive number of requests while testing.

Quote from: TheBeardedBaby
I will closely follow this project. We've been waiting for something like this for a very long time. [...]

Thanks! I'll try to keep this thread updated as much as I can.

Post by: mr_smith99 on September 18, 2018, 07:52:04 PM

That's a nice idea. You should also run the script to look for copy-pasted ETH addresses on the registration forms of the bounty threads. They have a Google Sheet with all the ETH addresses.
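For the "same address used across posts" idea in the opening post and the bounty-sheet suggestion above, a minimal Python sketch could extract Ethereum addresses and flag any address posted by more than one username. It assumes input as (username, text) pairs (a hypothetical format; a Google Sheet export would need its own parsing) and treats an address as "0x" followed by exactly 40 hex characters. A shared address is only a lead for manual review, since some members have declared alts.

```python
import re
from collections import defaultdict

# Assumed pattern: "0x" followed by exactly 40 hex characters. The \b anchors
# keep longer hex strings (e.g. 64-character transaction hashes) from matching.
ETH_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def shared_addresses(posts):
    """Return {address: usernames} for ETH addresses posted by 2+ usernames.

    posts: iterable of (username, text) pairs (hypothetical input format).
    """
    owners = defaultdict(set)  # lowercased address -> usernames that posted it
    for username, text in posts:
        for address in ETH_ADDRESS.findall(text):
            # EIP-55 checksums only change letter case, so compare lowercased.
            owners[address.lower()].add(username)
    return {addr: users for addr, users in owners.items() if len(users) > 1}
```

Matching on the address itself rather than on a label like "Ethereum Address:" means misspelled or differently formatted labels don't matter.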
Post by: my1st on September 18, 2018, 09:19:15 PM

If your script catches multi-accounts that don't hesitate to post their proof of authentication in the bounty threads one after another, with the same error as in the screenshot, it will be an excellent trap for them!
https://i.imgur.com/Uu9mRAg.png

Post by: Initscri on September 18, 2018, 09:48:44 PM

Quote from: my1st
If your script catches multi-accounts that don't hesitate to post their proof of authentication in the bounty threads [...] it will be an excellent trap for them!
https://i.imgur.com/Uu9mRAg.png

I probably wouldn't worry too much about misspellings of "address". For Ethereum addresses I would just look for strings starting with 0x (unless I'm wrong on this; I'm more familiar with Bitcoin) and then grab the entire address up to the next space. Besides, not everyone starts with "Ethereum Address:"; some threads may require other formats, so it's better to go off the string itself.

Also (and this is unrelated to the quote above), I posted the following response on theymos' announcement the other day: https://bitcointalk.org/index.php?topic=5030366.msg45889515#msg45889515

If the merit requirement is raised above 1 merit, I'll probably add a feature to my script that looks for random merit sending of whatever the amount may be. Unfortunately, with the requirement at only 1 merit, abuse of this would be much more difficult to detect from a programming perspective.

Post by: suchmoon on September 19, 2018, 04:07:13 AM

Once you get it running to some meaningful extent, I would suggest posting the scope you're working on (set of users, threads) in iasenko's thread here:
https://bitcointalk.org/index.php?topic=4720640.0

That way we don't duplicate the effort. I'm experimenting with some NLP techniques for plagiarism detection, and the results are promising, although scalability is a bit of an issue. Currently I'm only comparing Bitcointalk posts with each other (not with outside sources).

Perhaps it's better not to publicize too many specifics about how the scripts work; it might inadvertently help bot farmers.

I wish there were a section of the forum designated for spam-busting efforts; I believe hilarious has suggested this.

Post by: Initscri on September 19, 2018, 05:16:25 AM

Quote from: suchmoon
Once you get it running to some meaningful extent, I would suggest posting the scope you're working on (set of users, threads) in iasenko's thread [...] I wish there were a section of the forum designated for spam-busting efforts; I believe hilarious has suggested this.

Good point, I'll post in that thread once it's completed. I'm also hoping to hook it into BitcoinTalk and have it update threads automatically, but we'll see. Very much in the planning stage, TBH.

That's the thing: I've been debating closed source vs. open source and the perks of both. What I might end up doing is creating a repo for these scripts but keeping it private (I have a GitHub account I can do this with), and then just inviting users who wish to contribute. I might leave this to "if there's interest", but thanks for flagging the potential for abuse if it's open-sourced.
I hadn't clued into that until now.

+1 for the spam-busting forum section; it'd make it easier to keep lists of reported users there.

If you're working on plagiarism detection already, I'll probably work on multiple account detection first. Granted, multiple bots run by different developers with different sets of algorithms probably isn't a bad idea (it will make them harder for bots to avoid).

Post by: Jet Cash on September 19, 2018, 08:52:33 AM

If it helps you guys to know about declared alts, here are mine.
Talk Merit JetAid

Post by: suchmoon on September 19, 2018, 01:29:12 PM

Quote from: Initscri
If you're working on plagiarism detection already, I'll probably work on multiple account detection first. Granted, multiple bots run by different developers with different sets of algorithms probably isn't a bad idea (it will make them harder for bots to avoid).

I think we can certainly run multiple attacks on plagiarism, as long as we coordinate to reduce overlap in which users we've reported, e.g. using the thread I mentioned and also https://bpip.org to check for bans.

With the little time I have available, I'm still probably weeks away from a reasonably usable product, and even then it would cover only a relatively small set of potential plagiarism. LoyceV mentioned that the forum gets ~50k posts a day; many of them can be ignored or whitelisted, but that's still a lot of garbage to sift through.

Post by: khaled0111 on September 19, 2018, 03:53:58 PM

I don't know how it works, but I think there is a bot on Steemit, "@cheetah", that detects plagiarism, so developing a similar bot won't be a problem (there are many senior developers on this forum).
It would be great if you succeed in writing a script that detects members sending merits to each other. I don't think such a script will be hard to code, but you will need access to the merit database.

Post by: coinlocket$ on September 19, 2018, 03:55:28 PM

Quote from: khaled0111
It would be great if you succeed in writing a script that detects members sending merits to each other. I don't think such a script will be hard to code, but you will need access to the merit database.

We already have several tools for this purpose; you can see one here, made by @DdmrDdmr:
https://public.tableau.com/profile/ddmrddmr#!/vizhome/BitcointalkMeritDashboard/GlobalSummary

Post by: manfredmann on September 19, 2018, 04:10:15 PM

Quote from: coinlocket$
We already have several tools for this purpose; you can see one here, made by @DdmrDdmr [...]

This forum is full of enthusiastic people working together to make it better. I do believe this can be achieved with members collaborating with each other; collaboration will make the job easier. If only I had this kind of expertise, I'd definitely be more than willing to help you guys. Sadly, I'm just following along and noting down important details for future implementation and updates on this forum. GO! GO! GO!

Post by: qwk on September 19, 2018, 04:27:46 PM

Quote from: LoyceV
Detecting the text spinners will be a whole different level!

I guess a quick-and-dirty approach could be something like this:
1. Take samples of all occurrences of 4 consecutive words.
2. Create their MD5 (or whatever you prefer) hashes.
3. Store those hashes in a database.
4. Count the number of hash collisions with other posts.

So a simple text like:
The quick brown fox jumps over the lazy dog
would result in 6 individual hashes:
The quick brown fox
quick brown fox jumps
brown fox jumps over
fox jumps over the
jumps over the lazy
over the lazy dog

Tinker a little with the number of words and the threshold for detecting duplicates, and you're probably almost there for a large share of the copy-pasta spam.

Post by: Initscri on September 19, 2018, 04:32:11 PM

Quote from: khaled0111
I don't know how it works, but I think there is a bot on Steemit, "@cheetah", that detects plagiarism [...]

There are plenty of paid APIs for external plagiarism detection, so if I were lazy and rich, I'd use those, lol. Although I'm unsure of their reliability. But realistically, external plagiarism detection isn't super difficult, although it may be more difficult than internal detection. I won't go too far into details (hashing, storage methods, etc.), but essentially you take the text (or portions of it) and match it against search engine results/meta descriptions. I'm sure there are plenty of other methods as well. The difficulty will be finding sources to match against (unsure if scraping Google will be permitted; we'll see).

The point, though: if 3 different developers build it 3 different ways (using different sources), it will be far more difficult for bots/spammers to reverse-engineer/abuse.

Quote from: suchmoon
I think we can certainly run multiple attacks on plagiarism, as long as we coordinate to reduce overlap in which users we've reported, e.g. using the thread I mentioned and also https://bpip.org to check for bans. [...]

Maybe we can create some sort of central location that records which users have been reported by bots. If I have time, maybe I'll create something web-based and just give out API keys to users who can prove they have a working script. It would basically be a web-based platform for marking which users have been reported by scripts/bots, and it would then track whether those users actually got banned, using BPIP (if Vod permits).

Dumping the info into a thread probably isn't ideal, but if worst comes to worst, we can rely on that until a more advanced system is built.

Quote from: Jet Cash
If it helps you guys to know about declared alts, here are mine.
Talk Merit JetAid

Thanks, Jet Cash. If I do implement an alt detection system, I'd make the reporting of users more manual than automated. I'm sure there are many users (such as yourself) who have alts for various reasons, aren't being nefarious, and don't deserve a report.

If anyone has any further ideas for methods, keep 'em comin' :)

Post by: suchmoon on September 19, 2018, 05:01:24 PM

Quote from: qwk
Tinker a little with the number of words and the threshold for detecting duplicates, and you're probably almost there for a large share of the copy-pasta spam.
I experimented with n-grams a little and couldn't find a good value. A low n yields too many false positives, a high n doesn't detect spinners, etc. So I'm using a mixture of algorithms and basing the decision on the pattern of their results: e.g. if the similarity of two texts using algorithm A is 70%, then union/intersect/otherwise manipulate the texts and run algorithm B; if that scores 90%, run algorithm C to eliminate false positives. Made-up numbers, but you get the idea. It works OK-ish, but as I mentioned, it doesn't scale well, and I need to do more testing on larger samples.

Quote from: Initscri
The difficulty will be finding sources to match against (unsure if scraping Google will be permitted; we'll see).

Google has a search API. Not sure if there is a free tier, though.

Post by: LoyceV on September 19, 2018, 05:12:52 PM

Quote from: qwk
Tinker a little with the number of words and the threshold for detecting duplicates, and you're probably almost there for a large share of the copy-pasta spam.

I'm more worried about the very high number of positive results. Let me play around a bit with yesterday's data, from post 45850092 (https://bitcointalk.org/index.php?topic=3109781.msg45850092#msg45850092) up to post 45893434 (https://bitcointalk.org/index.php?topic=5032478.msg45893434#msg45893434). My scraper caught 43184 out of 43343 posts (it misses some posts made in quick bursts). This is after the new merit requirements, so there's already less spam (https://bitcointalk.org/index.php?topic=5032314.0).

I'll show the 50 most-used posts (raw HTML excluding quotes; the number at the start of each line shows how often each one appears). These posts were exactly the same each time they were posted:
Code: 288 (post was empty or only a quote)

This doesn't really catch plagiarism, but it catches spam. When you're looking for word phrases to detect plagiarism, you're likely to get even more hits than this.
The second entry came from Cidonar (https://bitcointalk.org/index.php?action=profile;u=2146397), who bumped this thread (https://bitcointalk.org/index.php?topic=5012439.msg45243118#msg45243118) 162 times. That board shouldn't allow deleting posts within 24 hours, but it does. The user isn't banned, since he deleted the evidence.
The third entry ("Proof of Authentication") came from many different users in this thread (https://bitcointalk.org/index.php?topic=4787345.msg45886294#msg45886294). I've just reported a few, asking the mods to check the thread.
The sixth entry ("microguy talks to himself") came from BitCoin ranger (https://bitcointalk.org/index.php?action=profile;u=2070881), who had 24 posts deleted (https://bpip.org/profile.aspx?p=bitcoin%20ranger) by moderators.

Manually going through this list is a lot of work, and there aren't many posts to report. It's not a very effective use of time.

Post by: Initscri on September 19, 2018, 06:25:23 PM

Quote from: suchmoon
Google has a search API. Not sure if there is a free tier, though.

Considering their pricing change on Maps, I'm going to assume not. I'll look into it, though, thanks :)

@LoyceV: I'll make a note that when comparing messages for plagiarism, our scripts should probably ignore ["quote"] tags. I know it would probably make plagiarism detection more reliable; I'd just have to write a side-script to prevent users from simply wrapping their messages in ["quote"] tags.

Post by: LoyceV on September 19, 2018, 06:55:50 PM

Quote from: Initscri
I'd just have to write a side-script to prevent users from simply wrapping their messages in ["quote"] tags.
Quoted text isn't counted toward payment for signature spammers*, so they're unlikely to hide their plagiarism that way.
* Assuming the campaign has a campaign manager who does at least some of his job.

Post by: Piggy on September 19, 2018, 07:27:54 PM

A few thoughts about spun texts:
If the spun text doesn't use synonyms, it may help to prepare the data before running any check, for example by reordering all the words of each sentence alphabetically. If it does use synonyms, it becomes quite complicated: you need to be able to identify that two different words actually mean the same thing. Perhaps, as you read the sentences, you could substitute each word with a code corresponding to a group of synonyms, and then run the checks on these cleaned sentences. Maybe there are ready-made dictionaries for this sort of thing. In any case, comparing each message with all the previous messages, perhaps running multiple checks, can be quite expensive.

Post by: prashanta on September 19, 2018, 07:47:56 PM

There are a lot of users actively doing bounty work, and most bounty posts are almost the same. The script will detect a similarity percentage, and bounty posts are approximately 60-70% similar to each other. So how would the script handle that?
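To make the spun-text discussion concrete, here is a minimal Python sketch that combines qwk's hashed 4-word windows with Piggy's normalization ideas: each word is mapped through a synonym table and, optionally, the words of each sentence are sorted alphabetically before hashing. The SYNONYMS table is a tiny hypothetical stand-in for a real dictionary, and Jaccard similarity over the hash sets is one assumed way to turn collisions into the kind of percentage a report threshold would be set against. It does not handle multi-word substitutions.

```python
import hashlib
import re

# Hypothetical synonym table: each word maps to a canonical token, so simple
# one-word spinner substitutions collapse back to the same shingle.
SYNONYMS = {
    "great": "good",
    "excellent": "good",
    "venture": "project",
    "initiative": "project",
}


def tokens(text, sort_within_sentences=False):
    """Lowercase words with synonyms canonicalized; optionally sort each sentence."""
    out = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = [SYNONYMS.get(w, w) for w in re.findall(r"[a-z0-9']+", sentence)]
        out.extend(sorted(words) if sort_within_sentences else words)
    return out


def shingle_hashes(text, n=4, sort_within_sentences=False):
    """MD5 hashes of every window of n consecutive (normalized) words."""
    words = tokens(text, sort_within_sentences)
    return {
        hashlib.md5(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(words) - n + 1)
    }


def similarity(a, b, n=4, sort_within_sentences=False):
    """Jaccard similarity (0.0 to 1.0) of two posts' shingle-hash sets."""
    sa = shingle_hashes(a, n, sort_within_sentences)
    sb = shingle_hashes(b, n, sort_within_sentences)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Sorting words destroys word order, so it catches reshuffled sentences at the cost of more false positives; that trade-off, like the choice of n, would need tuning on real data.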
Post by: suchmoon on September 19, 2018, 08:59:53 PM

Quote from: Piggy
If it does use synonyms, it becomes quite complicated [...] In any case, comparing each message with all the previous messages, perhaps running multiple checks, can be quite expensive.

There are dictionaries and other methods to deal with synonyms, but they don't work well for crypto-themed texts without a serious ML effort. Worse yet, Bitcointalk text-spinning bots don't really care whether the text makes sense, so they'll replace "cryptocurrency" with "financial encoding" or some bullshit like that. Semantic comparison has seemed quite useless to me so far in this context, though I'm not an expert by any means; I'm just learning as I go.

Quote from: prashanta
There are a lot of users actively doing bounty work [...] So how would the script handle that?

Good idea, I think we should report the whole bounty board as plagiarism :)

Post by: Initscri on September 19, 2018, 09:53:12 PM

Quote from: suchmoon
Good idea, I think we should report the whole bounty board as plagiarism :)

Code: foreach($forum_categories as $category_name => $category_values) {

Well, that's done ;) (I was going to write it in Python, but I've been coding in PHP all day, so...)

Quote from: prashanta
There are a lot of users actively doing bounty work, and most bounty posts are almost the same. The script will detect a similarity percentage, and bounty posts are approximately 60-70% similar to each other.
So how would the script handle that?

TBH, that's kind of the point. We'd have to settle on a percentage of similarity that we agree is "report-worthy", but I wouldn't be surprised if these scripts report a large number of bounty users.

Quote from: LoyceV
Quoted text isn't counted toward payment for signature spammers, so they're unlikely to hide their plagiarism that way.

I guess, but is that standard practice among all campaign managers? I'm guessing it would eventually become rather obvious to them anyway. Definitely not a top-priority script, if it's needed at all.

Post by: Getadaaa on September 20, 2018, 09:35:11 AM

Quote from: LoyceV
Quoted text isn't counted toward payment for signature spammers, so they're unlikely to hide their plagiarism that way.

I never knew that. But I can't see why someone would copy-paste. Good to know it's checked. At first glance BTT looked complicated to me, but now I understand why it's complicated, for a lot of reasons.

Post by: Initscri on September 20, 2018, 02:34:57 PM

Quote from: Getadaaa
I never knew that. But I can't see why someone would copy-paste. [...]

Copy/pasting is rampant on this forum. For bounty sig scammers, it's the easiest way to get a high post count quickly while looking like you're spending the time to write out a post.
Because most campaign managers have to manage many participants, plagiarism (copy/pasting) can get overlooked. TBH, these scripts should (in theory) make campaign managers' jobs easier.

Update: added ideas sent by a user via PM: account quality detection.
Update 2: added an idea for detecting trust abuse.

Post by: Piggy on September 22, 2018, 08:57:14 PM

Another thought I had about plagiarism.
As far as I can see, the main goal of faking content is just obtaining merits. Wouldn't it save a lot of time and resources to check only the messages that receive merits, e.g. every Friday? If you also remove messages from higher ranks, who are unlikely to risk their accounts, the total number to check drops to a very small fraction. That is more manageable, and it makes it possible to run deeper, lengthier methods to verify that the content is authentic.

Post by: Initscri on September 23, 2018, 06:03:51 AM

Quote from: Piggy
Another thought I had about plagiarism. As far as I can see, the main goal of faking content is just obtaining merits. [...]

So, like contrasting the number of merits against the quality of the post and generating a list of users to be looked into? Excluding merits sent/received by HQ members? Let me know if I got that right. It's not a bad idea, though it would still require some manual labor and not be completely automated. I'll throw it on the list, if that's cool with you?

Post by: Piggy on September 23, 2018, 06:57:45 AM

Not quite like that, but that sounds useful too, for getting more insight into merited posts.
I meant running different kinds of automated checks on merited comments written by low-rank members, since, as we can see, those may have a higher chance of being faked. Running a few different automated techniques (even if they take several seconds per post) on a few thousand messages is going to be easier than doing the same on every single unmerited post.

Post by: Initscri on September 25, 2018, 06:37:52 PM

Quote from: Piggy
Not quite like that, but that sounds useful too [...]

So, more or less, targeting "newbies"/low-ranking members with 1+ merit. We could add a script that does the above and then looks at the top senders of merit to newbies for abuse, although there may be legitimate senders in that list.
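Piggy's triage idea (deep-check only the merited posts of low-rank members) and the follow-up about top merit senders to newbies could be sketched in Python like this. MeritEvent and the LOW_RANKS set are hypothetical; real data would have to come from the forum's merit records or a scraped copy of them. A high sender total is a lead for manual review, not proof of abuse.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: which ranks count as "low"; adjust to taste.
LOW_RANKS = {"Brand new", "Newbie", "Jr. Member"}


@dataclass
class MeritEvent:
    """One merit transaction (hypothetical record format)."""
    sender: str
    receiver: str
    receiver_rank: str
    post_id: int
    amount: int


def posts_to_check(events):
    """IDs of merited posts written by low-rank members: the small set for deep checks."""
    return sorted({e.post_id for e in events if e.receiver_rank in LOW_RANKS})


def top_senders_to_low_ranks(events, limit=10):
    """Senders ranked by total merit given to low-rank members."""
    totals = Counter()
    for e in events:
        if e.receiver_rank in LOW_RANKS:
            totals[e.sender] += e.amount
    return totals.most_common(limit)
```

The first function shrinks the workload so that slower similarity checks become affordable; the second produces a list of senders to look into by hand.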