"Multiple Accounts" / Copy-pasta detection scripts/bots

Initscri (OP)

Hero Member

Offline

Activity: 1540
Merit: 759

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 19, 2018, 06:25:23 PM

#21

Quote from: suchmoon on September 19, 2018, 05:01:24 PM

Quote from: Initscri on September 19, 2018, 04:32:11 PM

The difficulty will be to find sources to match against (unsure if scraping Google will be permitted, we'll see).

Google has a search API. Not sure if there is a free tier though.

Considering their pricing change on Maps, I'm going to assume not. I'll look into it though, thanks.

@LoyceV: I'll make a note that when comparing messages for plagiarism, our scripts should probably ignore ["quote"] tags. It would probably make plagiarism detection more reliable; I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
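
A minimal sketch of the quote-stripping idea, in Python. It assumes posts are available as SMF-style BBCode with [quote ...]...[/quote] blocks; the function names and the 20-character cutoff are made up for illustration and are not part of any existing script.

```python
import re

# Matches an innermost quote block: an opening [quote ...] tag, a body that
# contains no further "[quote", and the closing [/quote] tag.
# Assumption: quotes are well-formed SMF-style BBCode.
_INNER_QUOTE = re.compile(
    r"\[quote[^\]]*\](?:(?!\[quote)[\s\S])*?\[/quote\]",
    re.IGNORECASE,
)


def strip_quotes(bbcode: str) -> str:
    """Remove all quoted text, including nested quotes."""
    previous = None
    # Each pass removes the innermost quotes; repeat until nothing changes.
    while previous != bbcode:
        previous = bbcode
        bbcode = _INNER_QUOTE.sub("", bbcode)
    return bbcode.strip()


def is_mostly_quoted(bbcode: str, min_own_chars: int = 20) -> bool:
    """Flag posts whose own text (outside quotes) is shorter than a cutoff.

    The cutoff is a hypothetical value; it would need tuning on real posts.
    """
    return len(strip_quotes(bbcode)) < min_own_chars
```

Only the text returned by strip_quotes would be fed to the plagiarism comparison, while is_mostly_quoted covers the "side-script" case of a message wrapped entirely in quote tags.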

----------------------------------
Web Developer. PM for details.
----------------------------------

LoyceV

Legendary

Online

Activity: 3304
Merit: 16609

Thick-Skinned Gang Leader and Golden Feather 2021

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 19, 2018, 06:55:50 PM
Last edit: September 20, 2018, 09:44:40 AM by LoyceV

#22

Quote from: Initscri on September 19, 2018, 06:25:23 PM

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.

Quoted text isn't counted for payment for signature spammers*, so they're unlikely to hide their plagiarism that way.

* Assuming the campaign has a campaign manager that does at least some of his job.

LoyceV's Signature for rent

Piggy

Hero Member

Offline

Activity: 784
Merit: 1416

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 19, 2018, 07:27:54 PM

Merited by LoyceV (1)

#23

A few thoughts about spun texts:

If the spun text doesn't use synonyms, it may help to prepare the data before running any check, for example by reordering all the words of each sentence alphabetically.

If it does use synonyms, it becomes quite complicated: you need to be able to identify that two different words actually mean the same thing. Perhaps as you read the sentences you could substitute each word with a code which corresponds to a set of synonyms, then use these cleaned sentences to run the checks.
Maybe there are dictionaries ready for this sort of thing. In any case, comparing one message with all previous messages, perhaps running multiple checks, can be quite expensive to perform.
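
The two preparation steps above can be sketched roughly like this in Python. The synonym groups are a tiny hand-made placeholder, not a real dictionary, and the "#0"-style codes are an arbitrary choice for illustration.

```python
import re

# Hypothetical synonym groups; a real list would come from a proper dictionary.
SYNONYM_GROUPS = [
    {"cryptocurrency", "crypto", "coin"},
    {"big", "large", "huge"},
]

# Map every word to the code of its group, e.g. "large" -> "#1".
# "#" never appears in a tokenized word, so codes cannot clash with real words.
SYNONYM_CODE = {
    word: f"#{index}"
    for index, group in enumerate(SYNONYM_GROUPS)
    for word in group
}


def normalize(sentence: str) -> str:
    """Lowercase, replace known synonyms with group codes, sort the words."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    codes = [SYNONYM_CODE.get(word, word) for word in words]
    return " ".join(sorted(codes))
```

Two sentences that differ only in word order and in synonyms from the same group normalize to the same string, so a plain equality or hash lookup can replace a pairwise comparison for that case.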

prashanta

Copper Member
Sr. Member

Offline

Activity: 728
Merit: 250

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 19, 2018, 07:47:56 PM

#24

There are a lot of users who actively do bounty work, and the majority of bounty posts are almost the same. The script will detect a similarity percentage, but bounty posts are approximately 60-70% similar to each other. So how would the script handle this?

suchmoon

Legendary

Offline

Activity: 3654
Merit: 8922

https://bpip.org

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 19, 2018, 08:59:53 PM

#25

Quote from: Piggy on September 19, 2018, 07:27:54 PM

If it does use synonyms, it becomes quite complicated: you need to be able to identify that two different words actually mean the same thing. Perhaps as you read the sentences you could substitute each word with a code which corresponds to a set of synonyms, then use these cleaned sentences to run the checks.
Maybe there are dictionaries ready for this sort of thing. In any case, comparing one message with all previous messages, perhaps running multiple checks, can be quite expensive to perform.

There are dictionaries and other methods to deal with synonyms but they don't work well for crypto-themed texts without a serious ML effort. Worse yet, Bitcointalk text spinning bots don't really care much if the text makes sense so they'll replace "cryptocurrency" with "financial encoding" or some bullshit like that. Semantic comparison seemed quite useless to me so far in this context though I'm not an expert by any means - just learning as I go.

Quote from: prashanta on September 19, 2018, 07:47:56 PM

There are a lot of users who actively do bounty work, and the majority of bounty posts are almost the same. The script will detect a similarity percentage, but bounty posts are approximately 60-70% similar to each other. So how would the script handle this?

Good idea, I think we should report the whole bounty board as plagiarism

Initscri (OP)

Hero Member

Offline

Activity: 1540
Merit: 759

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 19, 2018, 09:53:12 PM

#26

Quote from: suchmoon on September 19, 2018, 08:59:53 PM

Good idea, I think we should report the whole bounty board as plagiarism

Code:

// Joke snippet: BitcoinTalkAPI is a made-up class, not a real API.
foreach ($forum_categories as $category_name => $category_values) {
	// Only touch the bounty board.
	if ($category_name === 'Bounties (Altcoins)') {
		foreach ($category_values['posts'] as $post_id => $post_content) {
			BitcoinTalkAPI::report($post_id);
		}
	}
}

Well that's done ;)

(was gonna write it in Python, but I've been coding with PHP all day so)

Quote from: prashanta on September 19, 2018, 07:47:56 PM

There are a lot of users who actively do bounty work, and the majority of bounty posts are almost the same. The script will detect a similarity percentage, but bounty posts are approximately 60-70% similar to each other. So how would the script handle this?

TBH, that's kind of the point. We'd have to determine a percentage of similarity that we agree is "report-worthy", but I wouldn't be surprised if these scripts report a large number of bounty users.
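
One possible way to turn "percentage of similarity" into a number, sketched in Python. It swaps in Jaccard similarity over three-word shingles, which is only one of several measures that could be used; the 80% threshold is a placeholder, not an agreed value.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        # Short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, as a percentage."""
    set_a, set_b = shingles(a), shingles(b)
    if not set_a or not set_b:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(set_a | set_b)


# Hypothetical cutoff; the thread has not settled on a value.
REPORT_THRESHOLD = 80.0


def is_report_worthy(a: str, b: str, threshold: float = REPORT_THRESHOLD) -> bool:
    """True when two posts are at least `threshold` percent similar."""
    return similarity_percent(a, b) >= threshold
```

If bounty posts really sit around 60-70% similar to each other, the threshold would have to be set above that band, or bounty boards handled separately.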

Quote from: LoyceV on September 19, 2018, 06:55:50 PM

Quote from: Initscri on September 19, 2018, 06:25:23 PM

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.

Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

I guess. Is that standard among all campaign managers, though? Either way, it would eventually become rather obvious to them. Definitely not a top-priority script, if it's required at all.


Getadaaa

Jr. Member

Offline

Activity: 448
Merit: 3

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 20, 2018, 09:35:11 AM

#27

Quote from: LoyceV on September 19, 2018, 06:55:50 PM

Quote from: Initscri on September 19, 2018, 06:25:23 PM

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.

Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

Never knew that. But I can’t see why someone would copy-paste. Good to know it’s checked. At first glance BTT looked complicated to me, but I now understand it’s complicated for a lot of reasons.

BEST TIME AND PLACE TO EXIST

Initscri (OP)

Hero Member

Offline

Activity: 1540
Merit: 759

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 20, 2018, 02:34:57 PM
Last edit: September 20, 2018, 06:26:25 PM by Initscri

#28

Quote from: Getadaaa on September 20, 2018, 09:35:11 AM

Quote from: LoyceV on September 19, 2018, 06:55:50 PM

Quote from: Initscri on September 19, 2018, 06:25:23 PM

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.

Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

Never knew that. But I can’t see why someone would copy-paste. Good to know it’s checked. At first glance BTT looked complicated to me, but I now understand it’s complicated for a lot of reasons.

Copy/pasting is rampant on this forum. For bounty sig scammers it's the easiest way to get a high post count in a short amount of time while looking like they're spending the time to write out a post.

Because most campaign managers have to manage many participants, plagiarism (copy/pasting) can get overlooked.

TBH, once these scripts are built, campaign managers should have an easier time (in theory).

Update: added ideas sent from a user in PM: account quality detection

Update 2: adding idea for detecting trust abuse


Piggy

Hero Member

Offline

Activity: 784
Merit: 1416

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 22, 2018, 08:57:14 PM

#29

Another thought I had about plagiarism.

As far as I can see, the main goal of faking content is just obtaining merits. Wouldn't it save a lot of time and resources to directly check only the messages which receive merits, say every Friday?
If from those you also remove messages from higher ranks, who are unlikely to risk their accounts, this reduces the total number to be checked to a very small fraction.
That is more manageable and makes it possible to run deeper, lengthier methods to verify the content is authentic.

Initscri (OP)

Hero Member

Offline

Activity: 1540
Merit: 759

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 23, 2018, 06:03:51 AM

#30

Quote from: Piggy on September 22, 2018, 08:57:14 PM

Another thought I had about plagiarism.

As far as I can see, the main goal of faking content is just obtaining merits. Wouldn't it save a lot of time and resources to directly check only the messages which receive merits, say every Friday?
If from those you also remove messages from higher ranks, who are unlikely to risk their accounts, this reduces the total number to be checked to a very small fraction.
That is more manageable and makes it possible to run deeper, lengthier methods to verify the content is authentic.

So like contrasting the # of merits against the quality of post, generating a list of users to be looked into? Excluding merits sent/received by HQ members?

Let me know if I got that right.

It's not a bad idea. Would still require some manual labor & not be completely automated. I'll throw it on the list if it's cool with you?


Piggy

Hero Member

Offline

Activity: 784
Merit: 1416

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 23, 2018, 06:57:45 AM

#31

Not quite like that, but that sounds useful too for giving some more insight into the merited posts.

I meant running different kinds of automated checks on merited comments written by low-rank members since, as we can see, these have a higher chance of having been faked.
Running a few different automated techniques (even if they take several seconds per post) on a few thousand messages is going to be easier than doing the same on every single unmerited post.
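
A rough Python sketch of that filtering step. The Post record, the set of ranks treated as "low", and the check functions are all assumptions for illustration; nothing here reflects an actual forum API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Assumption: which ranks count as "low" is a judgment call.
LOW_RANKS = {"Newbie", "Jr. Member", "Member"}


@dataclass
class Post:
    post_id: int
    author_rank: str
    merits_received: int
    text: str


def candidates(posts: Iterable[Post]) -> List[Post]:
    """Keep only merited posts written by low-rank members."""
    return [
        post for post in posts
        if post.merits_received > 0 and post.author_rank in LOW_RANKS
    ]


def run_checks(posts: Iterable[Post],
               checks: List[Callable[[str], bool]]) -> List[int]:
    """Run every (possibly slow) check on the candidates only.

    Returns the ids of posts that at least one check flagged.
    """
    flagged = []
    for post in candidates(posts):
        if any(check(post.text) for check in checks):
            flagged.append(post.post_id)
    return flagged
```

Because the expensive checks only see the filtered list, each one can afford to take seconds per post.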

Initscri (OP)

Hero Member

Offline

Activity: 1540
Merit: 759

Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

September 25, 2018, 06:37:52 PM

#32

Quote from: Piggy on September 23, 2018, 06:57:45 AM

Not quite like that, but that sounds useful too for giving some more insight into the merited posts.

I meant running different kinds of automated checks on merited comments written by low-rank members since, as we can see, these have a higher chance of having been faked.
Running a few different automated techniques (even if they take several seconds per post) on a few thousand messages is going to be easier than doing the same on every single unmerited post.

So more or less targeting "newbie"/low-ranking members w/ 1+ merit.

We could add a script that does the above and then looks at the top senders of merit to newbies for abuse, although there may be legitimate senders within that list.
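
The "top senders" part could look something like the Python sketch below. It assumes merit transfers are available as (sender, receiver_rank, amount) tuples, which is a made-up layout, and it only produces a list for manual review since legitimate senders will show up in it too.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record layout: (sender name, receiver's rank, merit amount).
Transfer = Tuple[str, str, int]


def top_senders_to_newbies(
    transfers: Iterable[Transfer],
    newbie_ranks: frozenset = frozenset({"Newbie"}),
    limit: int = 10,
) -> List[Tuple[str, int]]:
    """Total the merit each sender gave to newbie-rank receivers.

    Returns up to `limit` (sender, total) pairs, largest total first.
    """
    totals = Counter()
    for sender, receiver_rank, amount in transfers:
        if receiver_rank in newbie_ranks:
            totals[sender] += amount
    return totals.most_common(limit)
```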
