Bitcoin Forum
Topic: "Multiple Accounts" / Copy-pasta detection scripts/bots
Initscri (OP)
Hero Member
Activity: 1540
Merit: 759
September 19, 2018, 06:25:23 PM
 #21

The difficulty will be to find sources to match against (unsure if scraping Google will be permitted, we'll see).

Google has a search API. Not sure if there is a free tier though.

Considering their pricing change on Maps, I'm going to assume not. I'll look into it though, thanks :)

@LoyceV: I'll make a note that when comparing messages for plagiarism, our scripts should probably ignore text inside ["quote"] tags. I know that would make plagiarism detection more reliable; I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
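Here is a minimal sketch of that quote-ignoring step in Python. The function name `strip_quotes` is hypothetical. It assumes SMF-style BBCode, where quotes look like `[quote]`, `[quote=name]` or `[quote author=... link=... date=...]`, close with `[/quote]`, and may be nested. The function keeps only the text outside every quote block, so quoted material never counts toward a plagiarism match:

```python
import re

# Matches an opening tag ([quote], [quote=name], [quote author=... date=...])
# or a closing [/quote] tag, case-insensitively.
_QUOTE_TAG = re.compile(r"\[quote(?:[ =][^\]]*)?\]|\[/quote\]", re.IGNORECASE)


def strip_quotes(bbcode: str) -> str:
    """Return only the text outside every (possibly nested) quote block."""
    depth = 0  # how many quote blocks we are currently inside
    pos = 0    # end position of the previous tag
    kept = []
    for match in _QUOTE_TAG.finditer(bbcode):
        if depth == 0:
            # Text between the previous tag and this one belongs to the poster.
            kept.append(bbcode[pos:match.start()])
        if match.group(0).lower() == "[/quote]":
            depth = max(depth - 1, 0)  # ignore stray closing tags
        else:
            depth += 1
        pos = match.end()
    if depth == 0:
        kept.append(bbcode[pos:])  # trailing text after the last tag
    # Collapse whitespace so removed blocks don't leave gaps in the comparison text.
    return " ".join("".join(kept).split())
```

The function treats text after an unclosed `[quote]` as quoted and drops it. That errs on the side of not comparing it.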

----------------------------------
Web Developer. PM for details.
----------------------------------
LoyceV
Legendary
Activity: 3304
Merit: 16609
Thick-Skinned Gang Leader and Golden Feather 2021
September 19, 2018, 06:55:50 PM
Last edit: September 20, 2018, 09:44:40 AM by LoyceV
 #22

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
Quoted text isn't counted for payment for signature spammers*, so they're unlikely to hide their plagiarism that way.

* Assuming the campaign has a campaign manager that does at least some of his job.

Piggy
Hero Member
Activity: 784
Merit: 1416
September 19, 2018, 07:27:54 PM
Merited by LoyceV (1)
 #23

A few thoughts about spun texts:

If the spun text doesn't use synonyms, it may help to prepare the data before running any check, for example by reordering all the words of each sentence in alphabetical order.

If it does use synonyms, it becomes quite complicated: you need to be able to identify that two different words actually mean the same thing. Perhaps, as you read the sentences, you could substitute each word with a code that corresponds to its group of synonyms, then run the checks on these cleaned sentences.
Maybe there are ready-made dictionaries for this sort of thing. In any case, comparing each message against all previous messages, possibly running multiple checks, can be quite expensive.
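The two normalizations described in this post can be sketched in Python as follows. The sketch assumes a tiny hand-made synonym table; `SYNONYM_GROUPS` and all the function names are hypothetical, not a ready-made dictionary.

- Each word is swapped for its synonym-group code.
- The words of each sentence are sorted alphabetically.
- A candidate is scored by the fraction of its normalized sentences that also appear in the source.

The `set[str]` annotation needs Python 3.9+.

```python
import re

# Hypothetical, hand-made synonym groups; a real tool would need a far larger list.
SYNONYM_GROUPS = [
    {"buy", "purchase", "acquire"},
    {"coin", "token"},
    {"great", "excellent", "awesome"},
]

# Every word in a group maps to the same code, e.g. "purchase" -> "syn0".
SYNONYM_CODE = {word: f"syn{i}" for i, group in enumerate(SYNONYM_GROUPS) for word in group}


def normalize_sentence(sentence: str) -> str:
    """Lowercase, replace synonyms with their group code, then sort the words."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return " ".join(sorted(SYNONYM_CODE.get(word, word) for word in words))


def normalized_sentences(text: str) -> set[str]:
    """Split on ., ! and ? and normalize each sentence, dropping empty ones."""
    normalized = (normalize_sentence(part) for part in re.split(r"[.!?]+", text))
    return {sentence for sentence in normalized if sentence}


def shared_sentence_ratio(candidate: str, source: str) -> float:
    """Fraction of the candidate's normalized sentences that also occur in the source."""
    cand = normalized_sentences(candidate)
    if not cand:
        return 0.0
    return len(cand & normalized_sentences(source)) / len(cand)
```

With this table, "Now you should purchase it! Excellent is this coin." normalizes to the same sentences as "This token is excellent. You should buy it now!" The sketch does nothing about the cost concern: each candidate is still compared against every source text.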
prashanta
Copper Member
Sr. Member
Activity: 728
Merit: 250
September 19, 2018, 07:47:56 PM
 #24

There are a lot of users who actively do bounty work, and most bounty posts are almost the same. The script will detect a similarity percentage, and bounty posts are roughly 60-70% similar to each other. So how would the script deal with that?
suchmoon
Legendary
Activity: 3654
Merit: 8922
https://bpip.org
September 19, 2018, 08:59:53 PM
 #25

If it does use synonyms, it becomes quite complicated: you need to be able to identify that two different words actually mean the same thing. Perhaps, as you read the sentences, you could substitute each word with a code that corresponds to its group of synonyms, then run the checks on these cleaned sentences.
Maybe there are ready-made dictionaries for this sort of thing. In any case, comparing each message against all previous messages, possibly running multiple checks, can be quite expensive.

There are dictionaries and other methods to deal with synonyms but they don't work well for crypto-themed texts without a serious ML effort. Worse yet, Bitcointalk text spinning bots don't really care much if the text makes sense so they'll replace "cryptocurrency" with "financial encoding" or some bullshit like that. Semantic comparison seemed quite useless to me so far in this context though I'm not an expert by any means - just learning as I go.

There are a lot of users who actively do bounty work, and most bounty posts are almost the same. The script will detect a similarity percentage, and bounty posts are roughly 60-70% similar to each other. So how would the script deal with that?

Good idea, I think we should report the whole bounty board as plagiarism :)
Initscri (OP)
Hero Member
Activity: 1540
Merit: 759
September 19, 2018, 09:53:12 PM
 #26

Good idea, I think we should report the whole bounty board as plagiarism :)

Code:
foreach ($forum_categories as $category_name => $category_values) {
    if ($category_name == 'Bounties (Altcoins)') {
        foreach ($category_values['posts'] as $post_id => $post_content) {
            BitcoinTalkAPI::report($post_id);
        }
    }
}

Well, that's done ;) (was gonna write it in Python, but I've been coding in PHP all day, so)

There are a lot of users who actively do bounty work, and most bounty posts are almost the same. The script will detect a similarity percentage, and bounty posts are roughly 60-70% similar to each other. So how would the script deal with that?

TBH, that's kind of the point. We'd have to determine a percentage of similarity that we agree is "report-worthy", but I wouldn't be surprised if these scripts flag a large number of bounty users.
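Here is a minimal Python sketch that makes "a percentage of similarity" concrete. It runs the standard library's `difflib.SequenceMatcher` on word lists. The 0.85 cut-off and the function names are placeholders, not an agreed value. It compares every pair of posts, which is quadratic, so it only suits small batches:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Placeholder cut-off; the thread hasn't agreed on a "report-worthy" value.
REPORT_THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]: 2 * matching words / total words."""
    # autojunk=False so common words in long posts aren't silently treated as junk.
    return SequenceMatcher(None, a.lower().split(), b.lower().split(), autojunk=False).ratio()


def report_worthy_pairs(posts: dict[int, str], threshold: float = REPORT_THRESHOLD):
    """Yield (id_a, id_b, score) for every pair of posts scoring at or above threshold."""
    for (id_a, text_a), (id_b, text_b) in combinations(posts.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            yield id_a, id_b, round(score, 2)
```

The 60-70% figure quoted for bounty posts isn't necessarily on the same scale as this ratio. Any cut-off would need tuning on real posts.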

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

I guess, but is that standard practice among all campaign managers? I'm guessing it would eventually become rather obvious to them, though. Definitely not a top-priority script, if it's required at all.

Getadaaa
Jr. Member
Activity: 448
Merit: 3
September 20, 2018, 09:35:11 AM
 #27

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

Never knew that. But I can’t see why someone would copy-paste. Good to know it’s checked. At first glance BTT looked complicated to me, but now I understand why it’s complicated, for a lot of reasons.

BEST TIME AND PLACE TO EXIST
Initscri (OP)
Hero Member
Activity: 1540
Merit: 759
September 20, 2018, 02:34:57 PM
Last edit: September 20, 2018, 06:26:25 PM by Initscri
 #28

I'd just have to write a side-script to prevent users from just wrapping their messages in ["quote"] tags.
Quoted text isn't counted for payment for signature spammers, so they're unlikely to hide their plagiarism that way.

Never knew that. But I can’t see why someone would copy-paste. Good to know it’s checked. At first glance BTT looked complicated to me, but now I understand why it’s complicated, for a lot of reasons.

Copy/pasting is rampant on this forum. For bounty sig scammers it's the easiest way to get a high post count in a short amount of time while looking like you're spending the time to write out a post.

Because most campaign managers have to manage many participants, plagiarism (copy/pasting) can get overlooked.

TBH, once these scripts are built, campaign managers should have an easier time (in theory).

Update: added ideas sent to me by a user via PM: account quality detection

Update 2: added an idea for detecting trust abuse

Piggy
Hero Member
Activity: 784
Merit: 1416
September 22, 2018, 08:57:14 PM
 #29

Another thought I had about plagiarism.

As far as I can see, the main goal of faking content is just obtaining merit. Wouldn't it save a lot of time and resources to directly check only the messages that receive merit, say every Friday?
If you also exclude messages from higher ranks, whose owners are unlikely to risk their accounts, the total number to be checked drops to a very small fraction.
That's more manageable, and it leaves room to run deeper, lengthier methods to verify that the content is authentic.
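Here is a minimal Python sketch of that pre-filter. It assumes a scraper already produces post records with the author's rank and merit count. The `Post` class, its fields, and the choice of which ranks count as "low" are all hypothetical. Only the posts it returns would go on to the slower checks:

```python
from dataclasses import dataclass

# Which ranks count as "low" is an assumption for illustration.
LOW_RANKS = {"Brand new", "Newbie", "Jr. Member", "Member"}


@dataclass
class Post:
    # Hypothetical record a scraper would have to produce.
    post_id: int
    author_rank: str
    merits_received: int
    text: str


def posts_to_check(posts: list[Post]) -> list[Post]:
    """Merited posts by low-ranked authors, most-merited first."""
    candidates = [p for p in posts if p.merits_received > 0 and p.author_rank in LOW_RANKS]
    return sorted(candidates, key=lambda p: p.merits_received, reverse=True)
```

Run weekly, this would hand the expensive checks a list covering only merited, low-rank posts instead of everything posted that week.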
Initscri (OP)
Hero Member
Activity: 1540
Merit: 759
September 23, 2018, 06:03:51 AM
 #30

Another thought I had about plagiarism.

As far as I can see, the main goal of faking content is just obtaining merit. Wouldn't it save a lot of time and resources to directly check only the messages that receive merit, say every Friday?
If you also exclude messages from higher ranks, whose owners are unlikely to risk their accounts, the total number to be checked drops to a very small fraction.
That's more manageable, and it leaves room to run deeper, lengthier methods to verify that the content is authentic.

So, something like contrasting the number of merits against the quality of the post, and generating a list of users to be looked into? Excluding merits sent/received by HQ members?

Let me know if I got that right.

It's not a bad idea. It would still require some manual labor and wouldn't be completely automated. I'll throw it on the list, if that's cool with you?

Piggy
Hero Member
Activity: 784
Merit: 1416
September 23, 2018, 06:57:45 AM
 #31

Not quite like that, but that sounds useful too, for giving some more insight into merited posts.

I meant running different kinds of automated checks on merited comments written by low-rank members, since, as we can see, those are more likely to have been faked.
So running a few different kinds of automated techniques (even if they take several seconds per post) on a few thousand messages is going to be easier than doing the same on every single post, merited or not.
Initscri (OP)
Hero Member
Activity: 1540
Merit: 759
September 25, 2018, 06:37:52 PM
 #32

Not quite like that, but that sounds useful too, for giving some more insight into merited posts.

I meant running different kinds of automated checks on merited comments written by low-rank members, since, as we can see, those are more likely to have been faked.
So running a few different kinds of automated techniques (even if they take several seconds per post) on a few thousand messages is going to be easier than doing the same on every single post, merited or not.

So, more or less, targeting newbie/low-ranking members with 1+ merit.

We could add a script that does the above and then checks the top senders of merit to newbies for abuse, although there may be legitimate senders in that list.
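Here is a rough Python sketch of that second step. It assumes merit records are available with the receiver's rank attached; `MeritTransfer`, its fields and `top_senders_to_rank` are hypothetical. It totals how much merit each sender gave to members of one rank (Newbie by default) and lists the biggest senders. Legitimate senders can rank high, so the output is a list to review by hand, not a verdict:

```python
from collections import Counter
from typing import NamedTuple


class MeritTransfer(NamedTuple):
    # Hypothetical merit record; field names are assumptions.
    sender: str
    receiver: str
    receiver_rank: str
    amount: int


def top_senders_to_rank(transfers, rank="Newbie", limit=10):
    """Return up to `limit` (sender, total merit) pairs for merit given to `rank` members."""
    totals = Counter()
    for t in transfers:
        if t.receiver_rank == rank:
            totals[t.sender] += t.amount
    return totals.most_common(limit)
```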
