helloworld (OP)
|
|
October 29, 2012, 11:49:38 PM |
|
I'm going to be helping someone set up a forum, but we haven't decided on what software yet. Anyway, it got me curious about how sockpuppets and alt accounts can be detected, with varying degrees of accuracy. What ingeniously complicated algorithms already exist to do this? And what could be coded better? Some obvious metrics to consider:
- IP address
- Language
- Grammar
- Vocabulary
- Profile preferences, e.g. timezone
- Regular login and post days/times
- Use of smileys or images
What else? Is software to do this built in to the major forum scripts or is this kind of thing studied separately by mods?
|
|
|
|
MysteryMiner
Legendary
Offline
Activity: 1512
Merit: 1049
Death to enemies!
|
|
October 30, 2012, 12:08:16 AM |
|
I guess many accounts will be detected as sockpuppets of total strangers. With more registered and active users, there will be more false positives.
|
bc1q59y5jp2rrwgxuekc8kjk6s8k2es73uawprre4j
|
|
|
helloworld (OP)
|
|
October 30, 2012, 12:44:54 AM |
|
I guess many accounts will be detected as sockpuppets of total strangers. With more registered and active users, there will be more false positives.
Agreed, but if it were score-based, an automated system could at least rank users from 1 (very unlikely to be an alt) to 100 (highly likely to be an alt), and then a human could keep an eye on the higher-scoring accounts. I'm sure that absolute certainty would be impossible; however, I'm still interested in what metrics could be used to come up with such a score. And the pursuit of accuracy through refining the algorithms is rather intriguing to me.
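To make the 1-to-100 idea concrete, here is a minimal sketch of one way it could work. Everything in it is an assumption for illustration: the metric names, the weights, and the idea that each metric has already been reduced to a similarity between 0.0 and 1.0 for a pair of accounts. It is a plain weighted average, not an established algorithm.

```python
def alt_score(similarities, weights):
    """Combine per-metric similarities (each 0.0-1.0) into a 1-100 score."""
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        return 1
    weighted = sum(similarities[name] * weights[name] for name in similarities)
    fraction = weighted / total_weight
    # Map 0.0-1.0 onto 1-100 and clamp, so bad input cannot escape the range.
    return max(1, min(100, round(1 + 99 * fraction)))


# Hypothetical weights: how much each metric counts toward the score.
WEIGHTS = {"ip": 3.0, "timezone": 1.0, "vocabulary": 2.0, "login_hours": 1.5}

# Hypothetical, hand-made similarity values for two pairs of accounts.
pairs = {
    ("alice", "bob"): {"ip": 1.0, "timezone": 1.0, "vocabulary": 0.8, "login_hours": 0.9},
    ("alice", "carol"): {"ip": 0.0, "timezone": 1.0, "vocabulary": 0.2, "login_hours": 0.1},
}

# Highest-scoring pairs first, so a human looks at those before the rest.
ranked = sorted(pairs, key=lambda pair: alt_score(pairs[pair], WEIGHTS), reverse=True)
for pair in ranked:
    print(pair, alt_score(pairs[pair], WEIGHTS))
```

The weights are where the refining would happen: a shared IP probably deserves more weight than a shared timezone, but the right numbers would have to come from testing against known alts.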
|
|
|
|
P_Shep
Legendary
Offline
Activity: 1795
Merit: 1208
This is not OK.
|
|
October 30, 2012, 12:53:55 AM |
|
Language is pretty easy.
Form histograms of: syllables per word, words per sentence, sentences per paragraph, and vocabulary.
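A rough sketch of how those histograms could be built, assuming plain-text posts with blank lines between paragraphs. The function names are made up for illustration, and the syllable counter is only a vowel-run heuristic, so it will miscount some English words.

```python
import re
from collections import Counter


def count_syllables(word):
    # Heuristic only: count runs of vowels, with a floor of one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def style_histograms(text):
    """Build the four histograms for one user's text."""
    hist = {
        "syllables_per_word": Counter(),
        "words_per_sentence": Counter(),
        "sentences_per_paragraph": Counter(),
        "vocabulary": Counter(),
    }
    # Paragraphs are assumed to be separated by blank lines.
    for paragraph in re.split(r"\n\s*\n", text):
        sentence_count = 0
        for sentence in re.split(r"[.!?]+", paragraph):
            words = re.findall(r"[A-Za-z']+", sentence)
            if not words:
                continue  # skip empty fragments left over from the split
            sentence_count += 1
            hist["words_per_sentence"][len(words)] += 1
            for word in words:
                hist["syllables_per_word"][count_syllables(word)] += 1
                hist["vocabulary"][word.lower()] += 1
        if sentence_count:
            hist["sentences_per_paragraph"][sentence_count] += 1
    return hist


sample = "Language is pretty easy. Form histograms!\n\nOne more."
hist = style_histograms(sample)
print(hist["words_per_sentence"])
```

Two accounts could then be compared histogram by histogram; how to turn that comparison into a single number is a separate design choice.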
|
|
|
|
paraipan
In memoriam
Legendary
Offline
Activity: 924
Merit: 1004
Firstbits: 1pirata
|
|
October 30, 2012, 12:57:35 AM |
|
Language is pretty easy.
Form histograms of: syllables per word, words per sentence, sentences per paragraph, and vocabulary.
And use of CRs (carriage returns) in every post too
|
BTCitcoin: An Idea Worth Saving - Q&A with bitcoins on rugatu.com - Check my rep
|
|
|
helloworld (OP)
|
|
October 30, 2012, 01:01:04 AM |
|
Language is pretty easy.
Form histograms of: syllables per word, words per sentence, sentences per paragraph, and vocabulary.
And ratio of exclamation marks to word count!!! Thanks, keep 'em coming
|
|
|
|
MysteryMiner
|
|
October 30, 2012, 01:57:10 AM |
|
The average IQ level across all posts will also help with sockpuppet detection. The problem might be that most sockpuppet masters fall in the lower 20% of the IQ scale.
|
|
|
|
helloworld (OP)
|
|
October 30, 2012, 02:05:35 AM |
|
The average IQ level across all posts will also help with sockpuppet detection. The problem might be that most sockpuppet masters fall in the lower 20% of the IQ scale.
That would probably depend on how easily they were detected! The dumb or careless puppet masters would be more obvious while the high-IQ alts would probably be more conscious of their traits and hence be better at flying under the radar.
|
|
|
|
paraipan
|
|
October 30, 2012, 02:06:34 AM |
|
The average IQ level across all posts will also help with sockpuppet detection. The problem might be that most sockpuppet masters fall in the lower 20% of the IQ scale.
Proof or STFU. They obviously have a higher-than-average IQ and suffer from some multiple personality disorder, so they can keep up a good level of post quality if they want to, or if the agenda requires it.
|
|
|
|
MysteryMiner
|
|
October 30, 2012, 02:08:48 AM |
|
If the algorithm detects that the masters have low IQ but the sockpuppets have high IQ, then we have a problem LOL
The average IQ level across all posts will also help with sockpuppet detection. The problem might be that most sockpuppet masters fall in the lower 20% of the IQ scale.
Proof or STFU
Are you a sockpuppet master?
|
|
|
|
caffeinewriter
|
|
October 30, 2012, 02:21:35 AM |
|
Similar signatures/same website on profile might also help.
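As a rough illustration of that check, here is a sketch using only the Python standard library. The function names and the sample signatures are made up; the site comparison only matches exact hostnames (ignoring a leading "www."), and the signature comparison is a plain character-level similarity ratio.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def site_host(url):
    """Return the lowercased hostname of a profile URL, without a leading 'www.'."""
    if "://" not in url:
        url = "//" + url  # lets urlparse treat a bare "example.com/x" as a host
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def same_site(url_a, url_b):
    host_a, host_b = site_host(url_a), site_host(url_b)
    # Two empty profile fields should not count as a match.
    return bool(host_a) and host_a == host_b


def signature_similarity(sig_a, sig_b):
    """Similarity ratio between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, sig_a.lower(), sig_b.lower()).ratio()


print(same_site("http://www.example.com/a", "https://example.com/b"))
print(signature_similarity("Check my rep", "check my rep!"))
```

Popular links would need special handling, since lots of unrelated users will put the same big site in their profile.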
|
|
|
|
MysteryMiner
|
|
October 30, 2012, 02:51:58 AM |
|
Similar signatures/same website on profile might also help.
most sockpuppet masters might fall in the lower 20% of the IQ scale.
Is this the case?
|
|
|
|
caffeinewriter
|
|
October 30, 2012, 03:24:13 AM |
|
Similar signatures/same website on profile might also help.
most sockpuppet masters might fall in the lower 20% of the IQ scale.
Is this the case?
If this, then that.
|
|
|
|
helloworld (OP)
|
|
October 30, 2012, 03:34:20 AM |
|
Here's a conundrum:
I'd assume that alt accounts are more likely to agree with points raised by their primary account (although some would also be set up to argue), but what would the percentages be? e.g. 99% of alts align their views with their primary account, and 1% argue opposing points, or would it be closer to say, 60/40?
You could probably write an entire thesis (and more) on the topic, and if I were still in college, perhaps that's what I'd do.
|
|
|
|
caffeinewriter
|
|
October 30, 2012, 03:41:39 AM |
|
Here's a conundrum:
I'd assume that alt accounts are more likely to agree with points raised by their primary account (although some would also be set up to argue), but what would the percentages be? e.g. 99% of alts align their views with their primary account, and 1% argue opposing points, or would it be closer to say, 60/40?
You could probably write an entire thesis (and more) on the topic, and if I were still in college, perhaps that's what I'd do.
Hmm, I can see the title now. "Detecting Recurring Patterns Between Accounts To Find Individual Users with Multiple Accounts" Maybe compare the users' introduction posts in the "Introduce Yourself" thread.
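One simple way to compare two introduction posts, sketched under the assumption that a bag-of-words comparison is good enough for a first pass. The function names and sample posts are invented; the score is the cosine similarity of the two word-count vectors, where 0.0 means no shared words and values near 1.0 mean nearly identical word usage.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercased word frequencies for one post."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(counts_a, counts_b):
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty post is not similar to anything
    return dot / (norm_a * norm_b)


# Invented sample posts, for illustration only.
intro_a = word_counts("Hi all, long time lurker, first time poster!")
intro_b = word_counts("Hi all, first time poster, long time lurker.")
intro_c = word_counts("Greetings from a mining enthusiast.")

print(cosine_similarity(intro_a, intro_b))
print(cosine_similarity(intro_a, intro_c))
```

Common words like "hi" and "the" will inflate the score between strangers, so some kind of down-weighting for frequent words would probably be needed in practice.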
|
|
|
|
helloworld (OP)
|
|
October 30, 2012, 03:57:10 AM |
|
Here's a conundrum:
I'd assume that alt accounts are more likely to agree with points raised by their primary account (although some would also be set up to argue), but what would the percentages be? e.g. 99% of alts align their views with their primary account, and 1% argue opposing points, or would it be closer to say, 60/40?
You could probably write an entire thesis (and more) on the topic, and if I were still in college, perhaps that's what I'd do.
Hmm, I can see the title now. "Detecting Recurring Patterns Between Accounts To Find Individual Users with Multiple Accounts" Maybe compare the users' introduction posts in the "Introduce Yourself" thread.
Except accounts on internet forums would be just a small subset of the research. Detectives already do similar stuff IRL when someone forges a signature, or steals an identity. The signature may look okay to the naked eye, but up close it has very specific traits that can identify the real writer. Although I am interested in the programming aspect, it's not really an I.T. topic at heart. It's probably more psychological / social.
|
|
|
|
myrkul
|
|
October 30, 2012, 04:38:13 AM |
|
I'm sure this thread will help you test your algorithms. The inevitable pony-themed companion thread will help, too.
|
|
|
|
niko
|
|
October 30, 2012, 05:01:28 AM |
|
Whatever algorithm you come up with, if you describe it in public it will quit working from that point on.
|
They're there, in their room. Your mining rig is on fire, yet you're very calm.
|
|
|
helloworld (OP)
|
|
October 30, 2012, 05:12:16 AM |
|
Whatever algorithm you come up with, if you describe it in public it will quit working from that point on.
I very much doubt that, although its effectiveness might reduce slightly. The public description of Ponzi schemes does not stop people falling for them. And police catch criminals using fingerprint matching. This is well-known, and still works despite everyone knowing that they do this. Still though, you bring up another dimension to the issue: Would researching and designing a system be made more difficult by those who would rather such detection methods remain secret?
|
|
|
|
caffeinewriter
|
|
October 30, 2012, 05:27:57 AM |
|
Whatever algorithm you come up with, if you describe it in public it will quit working from that point on.
I very much doubt that, although its effectiveness might reduce slightly. The public description of Ponzi schemes does not stop people falling for them. And police catch criminals using fingerprint matching. This is well-known, and still works despite everyone knowing that they do this. Still though, you bring up another dimension to the issue: Would researching and designing a system be made more difficult by those who would rather such detection methods remain secret?
I think this is similar to the security concerns that surround open source software. Sure, proprietary companies keep their source a secret, but when it's open source, the community can push out a fix instead of waiting for the company to do it. Not to mention everyone can improve upon it.
|
|
|
|
|