Bitcoin Forum
Topic: Quickseller escrowing for himself
tspacepilot (OP)
Legendary

Activity: 1456
Merit: 1076


I may write code in exchange for bitcoins.


September 04, 2015, 08:10:14 PM
Last edit: March 28, 2018, 04:18:15 AM by tspacepilot
Merited by suchmoon (15), LoyceV (15), bones261 (5), hugeblack (2), TMAN (2), mOgliE (2), JayJuanGee (1), Timelord2067 (1), nutildah (1), nullius (1), Coinonomous (1)
 #1

I wouldn't have come across this if it hadn't been for the long saga of Quickseller abuse I've been suffering. But recently he started using a new alt/sockpuppet to try to attack me, and I started looking more closely at the situation. I realized that Panthers52 has done several deals which were escrowed by Quickseller. Quickseller escrowing for himself seems like scammy behavior. I'm not a trader here, so it may be that there's nothing wrong with this. But in any case, I'll go ahead and present some quantitative evidence here and you guys can discuss it as you please.


I happen to have some training in Statistical Methods for Natural Language Processing, so I know a thing or two about how people use language and how to measure it quantitatively. Although QS does a few funny things to try to disguise his use of Panthers52 as an alt (he doesn't use a sig-ad, he signs each message with "Kind Regards", etc.), these techniques are not very robust: they don't disguise QS's style of writing at all when looked at from a big-picture perspective, and seeing that big picture is just what language modeling allows us to do.

One reason that I set out to do this experiment is that all of the pieces are there. QS has written a pretty large corpus of posts under his main account, and there's a secondary account as well (one of his alts, which was outed only a few months ago) to do model checking on. So, here's the big-picture setup. We're going to download the corpus of posts of Quickseller, ACCTSeller (his outed alt), Panthers52 (his accused alt), hilariousandco, dooglus, and me. We'll then build language models using maximum likelihood parameter estimation for all of the ngrams in each corpus up to n=3. For those who don't know, 1-grams (unigrams) are all of the single-word tokens in the corpus, 2-grams (bigrams) are all of the word pairs, 3-grams (trigrams) are all of the word triples, etc. The reason I don't use 4-grams or any higher n is that the data gets sparser and sparser the higher you go, unless you have an incredibly large amount of data. For this project, a 3-gram model seemed appropriate (and the 3-gram section wasn't terribly sparse). So, step one: I downloaded all of the posts of these members as raw HTML, using this script:

Code:
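#!/bin/bash
# Usage: $0 <uid> <outdir>
# Saves every page of a member's post history as raw HTML in <outdir>.
u="$1"
outdir="$2"
mkdir -p "$outdir"

# First page of the post history (start=0)
curl --data "action=profile&u=${u}&sa=showPosts" https://bitcointalk.org/index.php > "$outdir/page0.html"
# Last page number: the page link just before the "next" (prevnext) link
dend=$(sed -n -e 's/.*>\([0-9]\+\)<\/a> <span class="prevnext.*/\1/p' "$outdir/page0.html")
# dend=$(sed -n -e 's/.*Pages:.*\.\.\. <\/b><a class="navPages" href="https:\/\/bitcointalk.org\/index.php?action=profile;u=[0-9]\+;sa=showPosts;start=[0-9]\+">\([0-9]\+\).*/\1/p' "$outdir/page0.html")
end=$(echo "$dend" | head -n 1)
end=${end:-1}   # single-page history: there are no page links at all
echo "$end"

# 20 posts per page: page N starts at (N-1)*20, so fetch start=20 .. (end-1)*20
i=1
while [[ $i -lt $end ]]; do
  start=$((i * 20))
  curl --data "action=profile&u=${u}&sa=showPosts&start=${start}" https://bitcointalk.org/index.php > "$outdir/page${start}.html"
  ((i = i + 1))
done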

What's going on here is that you pass in a UID and an output directory, and the script uses curl to get the first page of the "recent posts" of this member. It then uses sed to grab the number of the last page of the post history, loops over the remaining pages with curl, and saves the entire HTML of each page into the output directory. After doing this, I had a directory called rawhtml/ with a subdirectory for each of the accounts in my experiment.
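The only slightly fiddly part of that script is working out which start= offsets to request. Here is a minimal Python sketch of that logic, assuming (as the script does) 20 posts per page and that the last page number is the link just before the `prevnext` span; `last_page` and `page_offsets` are hypothetical helper names, not part of any library. Counting pages from start=0 makes the loop bound easy to get right: a member with N pages needs offsets 0 through (N-1)*20.

```python
import re

# Roughly mirrors the sed expression above: the number in the page link
# immediately before the "prevnext" span is taken as the last page.
LAST_PAGE_RE = re.compile(r'>(\d+)</a> <span class="prevnext')


def last_page(html: str) -> int:
    """Return the last page number found in the navigation links (1 if none)."""
    matches = LAST_PAGE_RE.findall(html)
    return int(matches[-1]) if matches else 1


def page_offsets(last: int, per_page: int = 20) -> list[int]:
    """start= offsets for every page; page N (1-based) begins at (N-1)*per_page."""
    return [n * per_page for n in range(last)]


sample = ('<a class="navPages" href="...">2</a> '
          '<a class="navPages" href="...">3</a> <span class="prevnext">')
print(last_page(sample))                # 3
print(page_offsets(last_page(sample)))  # [0, 20, 40]
```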

The next step was to strip out all of the irrelevant HTML. Thankfully, each post sits in a div with the class "post", and quotes and quote headers have their own classes ("quote" and "quoteheader"), so it's pretty easy to load a page into the BeautifulSoup HTML parser and strip out the quotes and quote headers. Here's my short-n-sweet Python script, which leaves you with what I call "rawposts".

Code:
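#!/usr/bin/env python3
# Usage: strip_quotes.py <indir> <outdir>
# Keeps only the <div class="post"> elements of each saved page,
# with embedded quotes and quote headers removed.
import os
import sys

from bs4 import BeautifulSoup

indir = sys.argv[1]
outdir = sys.argv[2]
os.makedirs(outdir, exist_ok=True)

for infile in os.listdir(indir):
    with open(os.path.join(indir, infile), encoding="utf-8", errors="replace") as fh:
        soup = BeautifulSoup(fh, "html.parser")

    # Remove quoted text so that only the member's own words remain
    for qh in soup.find_all("div", class_="quoteheader"):
        qh.extract()
    for q in soup.find_all("div", class_="quote"):
        q.extract()

    posts = soup.find_all("div", class_="post")
    with open(os.path.join(outdir, infile), "w", encoding="utf-8") as f:
        for p in posts:
            print(p, file=f)
    print("done writing " + infile)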

So, I ran this script to create a subdirectory for each account in the experiment, and I ended up with a collection of posts, still as HTML, but without the embedded quotes. The next step was to tokenize the files and do some final cleanup before building the models. By tokenize, I explicitly mean dealing with punctuation and other funny stuff. If you leave periods and question marks stuck to the sides of words, you get some really funny counts which miss generalizations. In fact, a period is a really common token at the end of a sentence, so you want your model to have a high count of "." as a unigram and of ". </s>" as a bigram. But if you leave the periods stuck to words, you'll end up with lots of singletons: "something.", "do.", "find." etc. I also realized that the smiley HTML tags would be better replaced by single tokens so that we could see how they play into sentences. Finally, I wanted to replace links, which still showed up as <a href="..." target="_blank">link text, with only their href value. The rest of the markup is constant and gets in the way of measuring which URLs are actually being referenced. This latter point could be important in identifying authorship.

So, I made a sed file and tokenized the corpus.  Here's my sed file:
Code:
# remove <div class="post"> and </div>
s/\(<div class="post">\)\|\(<\/div>\)//g
# change smiley html for a tag, e.g. --cheesy.gif--
s/<img alt="[A-Za-z]\+" border="0" src="https:\/\/bitcointalk.org\/Smileys\/default\/\([A-Za-z.]\+\)"\/>/--\1--/g
# change <br> for a real line break
s/<br\/>/\n/g
s/<hr\/>/\n/g
# do sentence breaking after . and ! and ? when space cap
s/\([?!\.]\)\s\+\([A-Z]\)/\1\n\2/g

# cleanup links, just use their href as if it was text
s:</a>\|<a href=\|target="_blank">::g
# punctuation stuff
s/\([,\.?]\)\($\|\s\)/ \1 \2/g
s/'s/ 's/g
s/\([()]\)/ \1 /g

# cleanup any spurious space at the end of the lines
s/\s\+$/\n/g
I also piped the output of this through "sed -e '/^$/d'" to remove any blank lines. After doing this, I had what I thought was a pretty usable, tokenized, one-"sentence"-per-line corpus for each of the accounts in my experiment. Hand inspection of the corpus showed that there was still some noise in there, but crucially, all of the corpora were run through the same preprocessing and tokenization scripts, so any noise wouldn't be biased toward any one account.
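For anyone who would rather stay in Python, the sed rules above can be approximated with the standard `re` module. This is a sketch, not a byte-for-byte port: `tokenize_post` is a hypothetical name, it assumes the smiley markup matches the pattern in the sed file, and its `$` matches at every line end (in the sed version `$` only matches at the end of the buffer). The final filter plays the role of `sed -e '/^$/d'`.

```python
import re

# Smiley images become a single token such as --cheesy.gif--
SMILEY_RE = re.compile(
    r'<img alt="[A-Za-z]+" border="0" '
    r'src="https://bitcointalk\.org/Smileys/default/([A-Za-z.]+)"/>'
)


def tokenize_post(post_html: str) -> list[str]:
    """Return one space-separated, tokenized 'sentence' per list element."""
    text = re.sub(r'<div class="post">|</div>', "", post_html)
    text = SMILEY_RE.sub(r"--\1--", text)
    text = re.sub(r"<br/>|<hr/>", "\n", text)                     # HTML line breaks
    text = re.sub(r"([?!.])\s+([A-Z])", r"\1\n\2", text)           # sentence breaks
    text = re.sub(r'</a>|<a href=|target="_blank">', "", text)     # link markup
    text = re.sub(r"([,.?])($|\s)", r" \1 \2", text, flags=re.M)   # split off , . ?
    text = text.replace("'s", " 's")
    text = re.sub(r"([()])", r" \1 ", text)
    # Collapse runs of spaces and drop empty lines
    lines = (" ".join(line.split()) for line in text.split("\n"))
    return [line for line in lines if line]


print(tokenize_post('<div class="post">I agree. Do you?<br/>Yes, it works.</div>'))
# ['I agree .', 'Do you ?', 'Yes , it works .']
```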

So, the next step was to do ngram counts over each of these corpora. To do this, you simply count all of the 1-, 2- and 3-grams in the corpus and create a counts file that you can use to create language models. Note, I'm quite happy to share these count files with anyone who wants to see them. The thing is, I guess they're a little too large for most pastebin services; the quickseller counts file is approximately 8MB, for example. I can tar these up and email them to anyone who's interested, or if anyone has a site they don't mind hosting them on, I could send them to that person. Just let me know.

Code:
tspacepilot@computer:~/lm/counts$ ls -lah  
total 43M
drwxr-xr-x 2 tspacepilot tspacepilot 4.0K Sep  4 12:05 .
drwxr-xr-x 8 tspacepilot tspacepilot  16K Sep  4 11:55 ..
-rw-r--r-- 1 tspacepilot tspacepilot 1.3M Sep  3 10:40 as.count
-rw-r--r-- 1 tspacepilot tspacepilot  16M Sep  4 08:21 d.count
-rw-r--r-- 1 tspacepilot tspacepilot  12M Sep  4 08:20 h.count
-rw-r--r-- 1 tspacepilot tspacepilot 617K Sep  3 10:41 pan.count
-rw-r--r-- 1 tspacepilot tspacepilot 8.2M Sep  3 10:38 qs.count
-rw-r--r-- 1 tspacepilot tspacepilot 5.8M Sep  3 10:40 tsp.count
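The counting step itself isn't shown as code, so here is a minimal Python sketch of it. It assumes each sentence is padded with <s> and </s>, which is consistent with the tsp header below (294893, 287741 and 280589 tokens for 1-, 2- and 3-grams, each order dropping by the 7152 sentences), and a count-file layout of one `ngram count` per line, which is what the Perl LM builder's whitespace split expects. `count_ngrams` and `write_counts` are hypothetical names. With this padding a sentence of n words contributes n+2 unigrams, n+1 bigrams and n trigrams.

```python
from collections import Counter


def count_ngrams(sentences, max_n=3):
    """Count every 1..max_n-gram, padding each sentence with <s> and </s>."""
    counts = Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def write_counts(counts, path):
    """Write one 'ngram count' line per entry (the layout the Perl script reads)."""
    with open(path, "w", encoding="utf-8") as f:
        for gram, c in counts.items():
            f.write(f"{gram} {c}\n")


# Usage (hypothetical paths): one tokenized sentence per line in, counts out
# with open("tokenized/tsp.txt", encoding="utf-8") as f:
#     write_counts(count_ngrams(line for line in f if line.strip()), "counts/tsp.count")
```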

The next step is to generate language models from the count files. I used Good-Turing smoothing over an MLE parameter estimation in order to generate plain text files that contain the models. These models are in the standard NIST format. Here's the top of the file for tsp:

Code:
tspacepilot@computer:~/lm/lms$ head tsp.lm
\data\
ngram 1: type=21218 token=294893
ngram 2: type=117148 token=287741
ngram 3: type=215034 token=280589
\1-grams:
9787 0.0331883089798673 -1.4790148753233 ,
9243 0.0313435720752951 -1.50385151060555 the
8592 0.0291359916986839 -1.53557019528667 to
7152 0.0242528645983458 -1.61523695785429 </s>
7152 0.0242528645983458 -1.61523695785429 <s>

What you're seeing there is the counts for each ngram order. So the tspacepilot model has 294893 tokens (word instances), which fall into 21218 types. To be clear for those who don't have a background in this: if I say "the" twice, that's two tokens and one type. Then you see the start of the 1-grams section. You can see that I used a comma "," 9787 times, that the comma represents 0.033... of the probability mass of the unigram model (9787/294893), and that the next column is that probability converted to a base-10 log. Here I reused a Perl script that I had made some time ago. It's short enough to show you in its entirety here:

Code:
#!/usr/bin/perl 
# Build ngram LM for given count file
# tspacepilot
use strict;


#setting up the input file handles
$#ARGV != 1 and die "Usage: $0 <ngram_count_file> <lm_file>\n";
my $ngram_count_file = $ARGV[0];
my $lm_file_name = $ARGV[1];
open(DATA, "<:", $ngram_count_file) || die "cannot open $ngram_count_file.\n";
open(OUT, ">:", $lm_file_name) || die "cannot open $lm_file_name for writing.\n";

my @data = <DATA>;

my %unis;
my $uni_toks;
my %bis;
my %flat_bis;
my $bi_toks;
my %tris;
my %flat_tris;
my $tri_toks;

#here we build up the hash tables that we'll use to print the answer
foreach my $line (@data){
my @tokens = split(/\s+/, $line);
my $l = $#tokens;
if($l<1){
print "error on this line of count file:\n$line\n";
print "l = $l";
} elsif($l==1){
#print "this is a unigram\n";
$unis{$tokens[0]}=$tokens[1];
$uni_toks += $tokens[1];
} elsif($l==2){
#print "this is a bigram\n";
$bis{$tokens[0]}{$tokens[1]}=$tokens[2];
$flat_bis{"$tokens[0] $tokens[1]"}=$tokens[2];
$bi_toks += $tokens[2];
} elsif($l==3){
#print "this is a trigram\n";
$tris{"$tokens[0] $tokens[1]"}{$tokens[2]}=$tokens[3];
$flat_tris{"$tokens[0] $tokens[1] $tokens[2]"}=$tokens[3];
$tri_toks += $tokens[3];
}  else {
print "error on this line of count file:\n$line\n";
print "l = $l";
}
}

print OUT "\\data\\\n";
print OUT "ngram 1: type=",scalar keys %unis," token=$uni_toks\n";
print OUT "ngram 2: type=", scalar keys %flat_bis," token=$bi_toks\n";
print OUT "ngram 3: type=", scalar keys %flat_tris," token=$tri_toks\n";

print OUT "\\1-grams:\n";
foreach my $uni (sort {$unis{$b} <=> $unis{$a} or $a cmp $b } (keys %unis)){
my $prob = $unis{$uni}/$uni_toks;
my $lgprob;
$lgprob = log10($prob);
print OUT "$unis{$uni} $prob $lgprob $uni\n";
}

print OUT "\\2-grams:\n";

#compute output for two grams
my @two_gram_output;
foreach my $flat_bi(keys %flat_bis){
my ($firstword) = $flat_bi =~ m/(\S+)/;
my $denominator;
foreach my $secondword (keys % {$bis{$firstword}}){
$denominator += $bis{$firstword}{$secondword};
}
my $prob = $flat_bis{$flat_bi}/$denominator;
my $lgprob = log10($prob);
push(@two_gram_output, "$flat_bis{$flat_bi} $prob $lgprob $flat_bi\n");
}

my @sorted_two_grams = sort{(split /\s+/,$b)[0] <=> (split /\s+/,$a)[0]} @two_gram_output;

#print output for two grams
foreach (@sorted_two_grams){
print OUT;
}


#compute output for 3grams
print OUT "\\3-grams:\n";
my @three_gram_output;
foreach my $flat_tri (keys %flat_tris){
my ($first_two_words) = $flat_tri =~ m/(\S+\s+\S+)/;
my $denominator;
foreach my $thirdword (keys % {$tris{$first_two_words}}){
$denominator += $tris{$first_two_words}{$thirdword};
}
my $prob = $flat_tris{$flat_tri}/$denominator;
my $lgprob = log10($prob);
push(@three_gram_output, "$flat_tris{$flat_tri} $prob $lgprob $flat_tri\n");
}

my @sorted_three_grams = sort{(split /\s+/,$b)[0] <=> (split /\s+/,$a)[0]} @three_gram_output;
#print output for 3grams
foreach(@sorted_three_grams){
print OUT;
}

sub log10 {
my $n = shift;
return log($n)/log(10);
}
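The same conditional maximum-likelihood estimates can be sketched in a few lines of Python, which may be easier to follow than the Perl hashes. This is an illustration that assumes the `ngram count` input described above; `mle_tables` is a hypothetical name, and like the Perl script it applies no smoothing itself. Fed the tsp unigram counts, it reproduces the comma line: 9787/294893 is about 0.03319, and its log10 is about -1.479.

```python
import math
from collections import defaultdict


def mle_tables(counts):
    """Map each ngram in {"w1 ... wn": count} to (probability, log10 probability).

    Unigrams are divided by the total unigram count; bigrams and trigrams by
    the summed counts of every ngram sharing the same history (all words but
    the last), which is what the Perl script's $denominator loops compute.
    """
    unigram_total = 0
    history_totals = defaultdict(int)
    for gram, c in counts.items():
        words = gram.split()
        if len(words) == 1:
            unigram_total += c
        else:
            history_totals[" ".join(words[:-1])] += c

    table = {}
    for gram, c in counts.items():
        words = gram.split()
        if len(words) == 1:
            p = c / unigram_total
        else:
            p = c / history_totals[" ".join(words[:-1])]
        table[gram] = (p, math.log10(p))
    return table


# The comma line from the tsp model: 9787 commas out of 294893 unigram tokens
print(mle_tables({",": 9787, "everything else": 294893 - 9787})[","])
```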


Okay, with the language models all built (again, email me or PM me if you want to see the models themselves; I don't mind sharing them), we can start to get to the fun stuff. The goal of the experiment is to use the language models as predictors of the other accounts' texts. The typical measure for this is called "perplexity" (https://en.wikipedia.org/wiki/Perplexity). One nitty-gritty detail is how much weight to give the 1-, 2- and 3-gram portions of the model when calculating perplexity. Intuitively, putting more weight on the 1-grams puts more value on shared single words, i.e., the basic vocabulary of the person. Putting more weight on the 3-grams puts more value on how that person puts words together: what three-word phrases they tend to use. I ended up using weights 0.3, 0.4 and 0.3 (unigrams, bigrams, trigrams) in calculating perplexity. For each language model, I calculated the perplexity it assigns to each of the corpora of the accounts in the experiment. Here comes the fun stuff, then, the results:

As plain text, checking the QS language model against every corpus:
Code:
==> qstest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=1393
logprob=-119405.183085554 ave_logprob=-2.02254828472914 ppl=105.329078517105

==> qstest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=108735
logprob=-1963318.24588274 ave_logprob=-2.55783608776103 ppl=361.273484388214

==> qstest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=53676
logprob=-1514039.01569095 ave_logprob=-2.42022420176373 ppl=263.162620156841

==> qstest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=1093
logprob=-53775.973489288 ave_logprob=-2.07397020669089 ppl=118.568740528906

==> qstest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=29664
logprob=-666393.992923604 ave_logprob=-2.5821718218487 ppl=382.09541103913

Well, as you can see, QS's model predicts my corpus with a perplexity of 382, hilariousandco's with 263, and dooglus's with 361. But crucially, it predicts the posts of ACCTSeller and Panthers52 at 105 and 118!!!!

What this means is that QS's posting style, when measured quantitatively, shows through his attempts to hide what he was doing. This isn't too surprising for anyone who knows how language works, but it may be to others. For fun, I also ran each of the other models as a predictor against the remaining corpora.
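In case lower-is-better isn't obvious: perplexity is 10 raised to minus the average log10 probability per predicted word, so a model that found every word twice as likely would halve the perplexity. The summary lines can be checked by hand. Here is a small Python sketch (`ppl_from_log` is a hypothetical helper) that uses the same denominator as the perplexity script further down: test words, minus OOVs, plus one end-of-sentence prediction per sentence.

```python
def ppl_from_log(logprob, sent_num, num, oov_num):
    """Return (average log10 prob, perplexity) for one summary line."""
    # One prediction per test word plus one </s> per sentence; OOVs are skipped.
    ave_logprob = logprob / (sent_num + num - oov_num)
    return ave_logprob, 10 ** (-ave_logprob)


# QS model tested on ACCTSeller, numbers copied from the log above
ave, ppl = ppl_from_log(-119405.183085554, sent_num=2722, num=57708, oov_num=1393)
print(round(ave, 4), round(ppl, 1))  # -2.0225 105.3
```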

hilariousandco against all:
Code:
==> htest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=2260
logprob=-136595.372784586 ave_logprob=-2.34820994988114 ppl=222.951269646594

==> htest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=109662
logprob=-1934327.44440288 ave_logprob=-2.52311368446967 ppl=333.513704608138

==> htest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=1828
logprob=-60634.1796607556 ave_logprob=-2.40669126223528 ppl=255.088724501193

==> htest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=25750
logprob=-1193959.69530073 ave_logprob=-2.37727869117974 ppl=238.384871857193

==> htest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=26006
logprob=-662995.55023098 ave_logprob=-2.5330988076818 ppl=341.270546308425

So, we can see that hilariousandco's model doesn't predict any of the rest of us much better than the others, at least not significantly. However, it is interesting that it assigns perplexities in the same range (roughly 223 to 255) to all three of Quickseller's accounts. This provides an oblique suggestion as to the similarities of those corpora. Here is dooglus's model predicting each of the other accounts:

Code:
==> dtest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=2518
logprob=-141009.183781008 ave_logprob=-2.43488713532615 ppl=272.199382299313

==> dtest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=44764
logprob=-1532563.94318701 ave_logprob=-2.4154264735252 ppl=260.271415205445

==> dtest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=1752
logprob=-61358.7835651667 ave_logprob=-2.42812756490569 ppl=267.995538997277

==> dtest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=26384
logprob=-1223316.26268869 ave_logprob=-2.43880882666145 ppl=274.668481585288

==> dtest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=20198
logprob=-680500.394458114 ave_logprob=-2.5435368577456 ppl=349.572175864552

Here's my model predicting all the other corpora:

Code:
==> ttest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=2850
logprob=-139530.390079984 ave_logprob=-2.42324400972532 ppl=264.998862488461

==> ttest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=99717
logprob=-1946265.50900313 ave_logprob=-2.50617510057216 ppl=320.756230152803

==> ttest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=50287
logprob=-1518909.27782387 ave_logprob=-2.41492682099994 ppl=259.972147091511

==> ttest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=2043
logprob=-61310.1514410114 ave_logprob=-2.45446781060136 ppl=284.752673700336

==> ttest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=30864
logprob=-1209678.28851218 ave_logprob=-2.43335322477326 ppl=271.239680896164

Finally, we can also use the acctseller and panthers52 models to predict the other corpora. These models are a bit smaller than the QS model, so I think their results are not as impressive as the QS model's. But they do demonstrate the same pattern.

Code:
==> atest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=158655
logprob=-1864342.35403158 ave_logprob=-2.59784345298067 ppl=396.135216494324

==> atest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=87812
logprob=-1444217.53179264 ave_logprob=-2.44185825794015 ppl=276.603873729012

==> atest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=2433
logprob=-54938.2415881704 ave_logprob=-2.23426091293548 ppl=171.498731827101

==> atest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=36302
logprob=-1072293.35965131 ave_logprob=-2.18084989129508 ppl=151.652610771117

==> atest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=47163
logprob=-623320.832692272 ave_logprob=-2.59095185177354 ppl=389.898758003026

Again, dooglus, hilariousandco and I are all above 270, whereas the other known Quickseller account is at 151 and the "suspected" alt is at 171. And with the panthers52 model:

Code:
tspacepilot@computer:~/quickseller/ppls/ptest$ tail -n 3 *
==> ptest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=5835
logprob=-126943.515020739 ave_logprob=-2.32518573167395 ppl=211.439309416701

==> ptest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=200298
logprob=-1733046.66220228 ave_logprob=-2.56365194769031 ppl=366.144021870075

==> ptest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=110187
logprob=-1420281.45120892 ave_logprob=-2.49580708635173 ppl=313.18942275869

==> ptest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=55974
logprob=-1089757.40317691 ave_logprob=-2.30873957801444 ppl=203.582094424962

==> ptest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=56725
logprob=-602993.466557261 ave_logprob=-2.61020313295844 ppl=407.570866725746

Again, the panthers52 model is actually built from the smallest amount of input data, so you can see how it's a little less robust for that reason. Nevertheless, its similarity to the acctseller and quickseller corpora really stands out when compared to the values it assigns to the dooglus, hilariousandco and tspacepilot corpora.

Let's summarize this in a table (rows: language model; columns: test corpus; values: perplexity, rounded to one decimal):

model \ corpus   qs      accts   pan52   doog    hilarious   tsp
qs               X       105.3   118.6   361.3   263.2       382.1
accts            151.7   X       171.5   396.1   276.6       389.9
pan52            203.6   211.4   X       366.1   313.2       407.6
doog             274.7   272.2   268.0   X       260.3       349.6
hilarious        238.4   223.0   255.1   333.5   X           341.3
tsp              271.2   265.0   284.8   320.8   260.0       X

So, one thing I want to be clear on. Perplexity measures how well a model predicts a certain corpus. The first row shows us that the QS model predicts the acctseller and panthers52 corpora approximately equally well, and far better than it predicts any of the others. Most of the other rows here are just providing perspective. You can see that the dooglus, hilarious and tsp models don't predict any of the other corpora nearly as well: the lowest value in those three rows is 223.0 (hilariousandco's model on acctseller), still far above the 105 and 119 that the QS model assigns to acctseller and panthers52.
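Reading the summary row by row is the heart of the comparison: for each model, which corpora does it find least perplexing? A short Python sketch that sorts each row; the `PPL` dict is just a transcription of the perplexities in the logs above, rounded to one decimal, and `ranked` is a hypothetical helper.

```python
# Perplexity of each model (outer key) on each test corpus (inner key),
# transcribed from the logs above and rounded to one decimal place.
PPL = {
    "qs":        {"accts": 105.3, "pan52": 118.6, "doog": 361.3, "hilarious": 263.2, "tsp": 382.1},
    "accts":     {"qs": 151.7, "pan52": 171.5, "doog": 396.1, "hilarious": 276.6, "tsp": 389.9},
    "pan52":     {"qs": 203.6, "accts": 211.4, "doog": 366.1, "hilarious": 313.2, "tsp": 407.6},
    "doog":      {"qs": 274.7, "accts": 272.2, "pan52": 268.0, "hilarious": 260.3, "tsp": 349.6},
    "hilarious": {"qs": 238.4, "accts": 223.0, "pan52": 255.1, "doog": 333.5, "tsp": 341.3},
    "tsp":       {"qs": 271.2, "accts": 265.0, "pan52": 284.8, "doog": 320.8, "hilarious": 260.0},
}


def ranked(model):
    """Test corpora for one model, best-predicted (lowest perplexity) first."""
    row = PPL[model]
    return sorted(row, key=row.get)


for model in PPL:
    print(f"{model:>9}: " + ", ".join(f"{c} {PPL[model][c]}" for c in ranked(model)))
```

The qs row puts accts and pan52 first, and the accts and pan52 rows both put qs first; no entry in the other three rows comes near the 105 to 119 range.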

For completeness, here's the script I used to calculate perplexity:
Code:
#!/usr/bin/perl
# Compute the perplexity of test data under an interpolated ngram LM
# (weights l1, l2, l3 for the 1-, 2- and 3-gram probabilities)
use strict;

#setting up the input file handles
$#ARGV != 5 and die "Usage: $0 <lm_file> <l1> <l2> <l3> <test_data> <output>\n";
my $lm_file = $ARGV[0];
my($l1,$l2,$l3) = ($ARGV[1], $ARGV[2], $ARGV[3]);
my $test_data = $ARGV[4];
my $output = $ARGV[5];
open(LM, "<:", $lm_file) || die "cannot open $lm_file.\n";
my @data;
if ($test_data eq "-") {
  @data = <STDIN>;
} else {
  open(DATA, "<:", $test_data) || die "cannot open $test_data.\n";
  @data = <DATA>;
}
open(OUT, ">:", $output) || die "cannot open $output for writing.\n";

my $lmstring;
while (<LM>){
$lmstring .= $_;
}

#build up the lm data structures for quicker retrieval
my @lm = split (/\\data\\|\\1-grams:|\\2-grams:|\\3-grams:/ ,$lmstring);
shift @lm;
my @data_lines = split (/\n/, $lm[0]);
my @one_gram_lines = split(/\n/, $lm[1]);
my @two_gram_lines = split(/\n/, $lm[2]);
my @three_gram_lines = split(/\n/, $lm[3]);
my %unis;
foreach (@one_gram_lines){
my($prob, $w)=$_=~/\S+\s+(\S+)\s+\S+\s+(\S+)/;
$unis{$w}=$prob;
}
my %bis;
foreach (@two_gram_lines){
my($prob, $w1, $w2)=$_=~/\S+\s+(\S+)\s+\S+\s+(\S+)\s+(\S+)/;
$bis{"$w1 $w2"}=$prob;
}
my %tris;
foreach (@three_gram_lines){
my($prob, $w1, $w2, $w3)=$_=~/\S+\s+(\S+)\s+\S+\s+(\S+)\s+(\S+)\s+(\S+)/;
$tris{"$w1 $w2 $w3"}=$prob;
}


my $sum;
my $cnt;
my $word_num;
my $oov_num;
my $sent_num;

for my $s (0 .. $#data){
  if($data[$s]=~m/^\s*$/) {
    next;
  }
$sent_num++;
chomp $data[$s];
$data[$s] = "<s> ".$data[$s]." </s>";
my @words = split /\s+/, $data[$s];
print OUT "\n\nSent #".($s+1).": @words\n";
my $sprob = 0;
my $soov = 0;
  for my $i (1 .. $#words){
$word_num++;
if($i==1){
#w1 given <s>:
my ($w1, $w2) =($words[$i-1], $words[$i]);
my $onegramprob;
my $twogramprob;
my $unknown_word;
my $smoothed_prob;
if(defined($unis{$w2})){
$onegramprob = $unis{$w2};
} else {
$unknown_word = 1;
}
if (!$unknown_word){
if(defined($bis{"$w1 $w2"})){
$twogramprob = $bis{"$w1 $w2"};
} else {
$unknown_word = 1;
}
}
if ($unknown_word) {
$smoothed_prob = "-inf (unknown word)";
$soov++;
} else {
$smoothed_prob = log10((($l3+$l2) * $twogramprob)+($l1*$onegramprob));
$sprob+= $smoothed_prob;
}
print OUT ($i);
print OUT ": LogP( $w2 | $w1 ) = $smoothed_prob\n";
} else {
my ($w1, $w2, $w3) = ($words[$i-2], $words[$i-1], $words[$i]);
my $threegramprob;
my $twogramprob;
my $onegramprob;
my $unknown_word;
my $unknown_ngram;
my $smoothed_prob;

#the trigrams
if(defined($unis{$w3})){
$onegramprob = $unis{$w3};
} else {
$unknown_word = 1;
}
if(defined($bis{"$w2 $w3"})){
$twogramprob = $bis{"$w2 $w3"};
} else {
$unknown_ngram = 1;
}
if(defined($tris{"$w1 $w2 $w3"})){
$threegramprob = $tris{"$w1 $w2 $w3"};
} else {
$unknown_ngram = 1;
}

print OUT ($i);
if ($unknown_word) {
print OUT ": LogP( $w3 | $w1 $w2 ) = -inf (unknown word)\n";
$soov++;
} elsif ($unknown_ngram){
$smoothed_prob = log10(($l3*$threegramprob)+($l2*$twogramprob)+($l1*$onegramprob));
print OUT ": LogP( $w3 | $w1 $w2 ) = $smoothed_prob (unknown ngrams)\n";
} else {
$smoothed_prob = log10(($l3*$threegramprob)+($l2*$twogramprob)+($l1*$onegramprob));
print OUT ": LogP( $w3 | $w1 $w2 ) = $smoothed_prob\n";
}
$sprob+=$smoothed_prob;
}
}
my $sppl = 10**(-($sprob/($#words-1)));

print OUT "1 sentence, ".($#words-1)." words, $soov OOVs\n";
print OUT "logprob=$sprob, ppl=$sppl";
$sum+=$sprob;
$oov_num+=$soov;
$cnt += $#words-1;
}

# Denominator: one prediction per test word plus one </s> per sentence, minus OOVs
my $ave_logprob = $sum/($sent_num + $cnt - $oov_num);
my $ppl = 10**(-$ave_logprob);
print OUT "\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n";
print OUT "sent_num=$sent_num num=$cnt oov_num=$oov_num\n";
print OUT "logprob=$sum ave_logprob=$ave_logprob ppl=$ppl\n";

sub log10 {
my $n = shift;
return log($n)/log(10);
}
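For readers who don't read Perl, here is a Python sketch of what that script computes: a linear interpolation of the MLE unigram, bigram and trigram probabilities with weights (l1, l2, l3), and perplexity averaged over non-OOV predictions including </s>. The function names are hypothetical, and there is one deliberate difference: the Perl script treats the first word of a sentence as OOV when the bigram "<s> w" is unseen, whereas this sketch only treats a word as OOV when it is missing from the unigram table.

```python
import math


def sentence_logprobs(words, unis, bis, tris, lambdas=(0.3, 0.4, 0.3)):
    """Yield log10 P(w_i | history) for each predicted word, or None for OOVs.

    P = l1*P(w) + l2*P(w | prev) + l3*P(w | prev2 prev); unseen ngrams add 0.
    For the first word only <s> is available, so the trigram weight is folded
    into the bigram weight, as in the Perl script.
    """
    l1, l2, l3 = lambdas
    for i in range(1, len(words)):
        w = words[i]
        if w not in unis:
            yield None
            continue
        p_bi = bis.get(f"{words[i - 1]} {w}", 0.0)
        if i == 1:
            p = l1 * unis[w] + (l2 + l3) * p_bi
        else:
            p_tri = tris.get(f"{words[i - 2]} {words[i - 1]} {w}", 0.0)
            p = l1 * unis[w] + l2 * p_bi + l3 * p_tri
        yield math.log10(p)


def perplexity(sentences, unis, bis, tris, lambdas=(0.3, 0.4, 0.3)):
    """Corpus perplexity: 10 ** -(total log10 prob / non-OOV predictions)."""
    total, n_pred, n_oov = 0.0, 0, 0
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        for lp in sentence_logprobs(words, unis, bis, tris, lambdas):
            n_pred += 1
            if lp is None:
                n_oov += 1
            else:
                total += lp
    return 10 ** (-total / (n_pred - n_oov))
```

Here `unis`, `bis` and `tris` map space-joined ngrams to their MLE probabilities (the second column of the LM file). Keeping the weights as a parameter makes it easy to re-run the comparison with more weight on the 1-grams (vocabulary) or on the 3-grams (phrasing).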


In sum, we know that Quickseller is adept at checking the blockchain to reveal transactions signed by particular accounts and to link them. So it makes sense that he knows how to cover his tracks there, using mixers and whatnot to make it difficult to detect his alts that way. He is an expert in this, so while I haven't tried, I suspect it would be difficult to link any of his accounts on the blockchain. However, presumably he's not an expert in forensic linguistics and statistical NLP, so he didn't realize that providing a corpus of 552365 word tokens would give anyone who wanted to detect his alts a reasonably reliable way to find the statistical fingerprint that is right there in how he writes.

There's plenty of other circumstantial evidence that Panthers52 is an alt of Quickseller, but I'll leave that for others to discuss. Also, I'm not a trader here, so I'm not really affected by QS providing escrow for himself, but perhaps others who are will have more to say about whether this practice is truly a scam. I opened this thread because it seemed like scammy behavior to me, and I wanted others to be aware of it.

Here is a screenshot of QS's feedback taken today:

Again, if anyone has any questions about this experiment or wants access to the particular data I ended up using, just let me know.  I believe I've provided all the tools in this post in order to replicate these results for yourself, but if something's missing, let me know about it.
"I'm sure that in 20 years there will either be very large transaction volume or no volume." -- Satoshi
brin999
Newbie

Activity: 2
Merit: 0


September 04, 2015, 08:30:56 PM
 #2

Looks like you are just throwing shit at a wall to see what sticks. Are you jealous that quickseller is on the default trust list and you are not? This proves how much he deserves it and not you.
Panthers52
Hero Member

Activity: 675
Merit: 502


#SuperBowl50 #NFCchamps


September 04, 2015, 08:40:10 PM
 #3

Does this mean that I am a global moderator? If this is the case then I am going to need a little bit of training.

Hopefully QS will respond to this thread once he is able to post/PM again to address your concerns.

I am not an expert in this kind of analysis, I do however believe you will need a larger pool of users to compare against.

Quote from: brin999
Looks like you are just throwing shit at a wall to see what sticks. Are you jealous that quickseller is on the default trust list and you are not? This proves how much he deserves it and not you.
I agree, kind stranger, I agree.

PGP 827D2A60

Tired of annoying signature ads? Ad block for signatures
Danksagung
Newbie

Activity: 5
Merit: 0


September 04, 2015, 08:43:37 PM
 #4

please show more respect for quickseller,
he tries so hard to keep us all safe.
tspacepilot (OP)
Legendary

Activity: 1456
Merit: 1076


I may write code in exchange for bitcoins.


September 04, 2015, 08:45:29 PM
 #5

Quote from: brin999
Looks like you are just throwing shit at a wall to see what sticks. Are you jealous that quickseller is on the default trust list and you are not? This proves how much he deserves it and not you.

Not true. I have a very large corpus of text under the Quickseller account to build a model with. The fact that that model fits the text of the panthers52 corpus and the acctseller corpus equally well, and much, much better than the corpora of other accounts, is what it is: evidence that Quickseller wrote the posts of those accounts. The experiment wouldn't have been possible without the large corpus of Quickseller posts, and the outed alt acctseller only adds to the results.

People should look at the experiment and decide for themselves what it means. They should also take it in the context of all the other evidence that panthers52 is Quickseller. Personally, I'm looking forward to more sophisticated criticism than "throwing shit at a wall", but thanks for your input!

Quote from: Panthers52
Does this mean that I am a global moderator? If this is the case then I am going to need a little bit of training.
Not sure how you come up with that.  Care to elaborate?
Quote from: Panthers52

I am not an expert in this kind of analysis, I do however believe you will need a larger pool of users to compare against.
Feel free to suggest modifications to the experiment.

Or just go ahead and log in with your main account.  You really keep upping the ante.  Are you ready to deny that you're an alt of Quickseller?  This kind of evasive answer, while mildly entertaining, isn't doing you any good at this point:

I am Panthers52. I don't see the point to your question. Does me being QS (if true) make any of my points any less valid?

Yamato no Orochi
Full Member

Activity: 120
Merit: 100

HYPOCRISY!


September 04, 2015, 09:28:10 PM
 #6

this proves nothing other than Panthers52 talks in similar way as Quickseller.

Kind Regards
Yamato no Orochi

for rent. 10BTC a month. Cheesy
galbros
Legendary

Activity: 1022
Merit: 1000


September 04, 2015, 09:50:53 PM
 #7

I think this is really interesting analysis that could help with the issue of sock puppets in the forums if it achieved legitimacy here.

I have a remedial question.

For the index numbers, why is lower better? It seems like you are using one user's posts to predict the others', right? Or the amount of commonality? I tend to conceptualize this as a percent or ratio; how does that translate into the index numbers you've calculated?

What do the numbers mean in relation to each other? Are they a linear index?

Yamato has a point right?  All your analysis does is say their posts are written in a similar manner.  Can you do some statistical test to show how likely that is just random chance? e.g. a t-test?

Because I follow dooglus's posts, I have been keeping up with your other thread in Meta.  I initially didn't think Panthers52 was an alt, but when he failed to follow up on this:

.....  If you do respond like a child again then you will only get added to my permanent ignore list and I will forever leave this thread.

Kind Regards
Panthers52

I started to have my doubts.

If this type of analysis does become a useful tool on the forum I can see lots of applications, for example in detecting account sales.
tspacepilot (OP)
Legendary
Activity: 1456
Merit: 1076

I may write code in exchange for bitcoins.
September 04, 2015, 09:59:09 PM
 #8

This proves nothing other than that Panthers52 talks in a similar way to Quickseller.

Kind Regards
Yamato no Orochi

I'd say that's a reasonably good plain-language summary of what's going on here.  But to be more concrete, what it shows is that a large-ish statistical model of Quickseller's posts predicts the posts of ACCTSeller and Panthers52 equally well, and significantly better than it predicts the posts of me, dooglus, or hilariousandco.  The reason it's important is that these kinds of relationships are "hidden in plain sight" in the language we use all the time; they're not easily manipulated.  They're built up over the half million words that QS has posted on this forum.  The statistical relationships in that corpus allow us to look for matches against other accounts.  I've posted all the code and techniques here for anyone who wants to replicate the experiment or try the techniques on other accounts or pairs of known alts.
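
A minimal sketch of the kind of model described here, assuming whitespace tokenization and scoring each account's text as one long sequence; it is not the script used for the experiment, and the names `tokenize` and `TrigramModel` are hypothetical. One deliberate swap: pure maximum-likelihood trigram estimates give zero probability (and therefore infinite perplexity) to any trigram that never appears in the training text, so this sketch uses add-one (Laplace) smoothing instead.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. Real forum posts would also
    # need HTML, quoted text and signatures stripped before this step.
    return text.lower().split()


class TrigramModel:
    """Trigram language model with add-one (Laplace) smoothing.

    P(w | u, v) = (count(u, v, w) + 1) / (count(u, v, *) + V),
    where V is the vocabulary size (training words plus <unk>).
    """

    def __init__(self, tokens):
        self.vocab = set(tokens) | {"<unk>"}
        padded = ["<s>", "<s>"] + list(tokens)
        self.trigrams = Counter(zip(padded, padded[1:], padded[2:]))
        # Context counts are derived from the trigram counts so that the
        # smoothed probabilities for every context sum to 1 over the vocab.
        self.contexts = Counter()
        for (u, v, _w), count in self.trigrams.items():
            self.contexts[(u, v)] += count

    def prob(self, u, v, w):
        return (self.trigrams[(u, v, w)] + 1) / (
            self.contexts[(u, v)] + len(self.vocab)
        )

    def perplexity(self, tokens):
        """2 ** (average negative log2 probability per word)."""
        # Words never seen in training are mapped to <unk>.
        words = [w if w in self.vocab else "<unk>" for w in tokens]
        if not words:
            raise ValueError("cannot score an empty text")
        padded = ["<s>", "<s>"] + words
        log_prob = sum(
            math.log2(self.prob(u, v, w))
            for u, v, w in zip(padded, padded[1:], padded[2:])
        )
        return 2 ** (-log_prob / len(words))


if __name__ == "__main__":
    model = TrigramModel(tokenize("the cat sat on the mat and the cat slept"))
    print(model.perplexity(tokenize("the cat sat on the mat")))
    print(model.perplexity(tokenize("a dog barked at the mailman")))
```

To mirror the comparison in this thread, you would train one model on the reference account's posts and call `perplexity` on each candidate account's posts; a lower score means the model finds that text less surprising.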

It's also important to note that this isn't the only evidence that Panthers52 is QS's alt.  There's actually a sort of overwhelming amount of circumstantial evidence.  I suppose that sooner or later we'll hear from BadBear about whether there's any IP evidence.  There's also a particular "tic" that QS and his alts have, and as far as I've seen, no other accounts have it.  I won't say what it is yet, because it's better to keep it hidden in case QS creates more alts.

My issues with Quickseller have to do with the fact that he's been bullying me relentlessly for nearly half a year.  I didn't expect that I would uncover evidence of him doing an escrow scam.  This came out by accident, when he brought in another sockpuppet account to attack me.  I leave it to those who trade with him and use him as escrow to decide whether it's dishonest to trade with someone who's an alt of the escrow provider.

achow101
Staff
Legendary
Activity: 3374
Merit: 6531

Just writing some code
September 04, 2015, 10:09:16 PM
 #9

You should definitely write this into a program and put it in services or project development. It would be great for analysis on finding people's alts, not just quickseller. In fact, maybe quickseller himself might even use it.

tspacepilot (OP)
Legendary
Activity: 1456
Merit: 1076

I may write code in exchange for bitcoins.
September 04, 2015, 10:17:32 PM
 #10

I think this is really interesting analysis that could help with the issue of sock puppets in the forums if it achieved legitimacy here.
It might.  But it only becomes useful when you have a lot of text to start with.  In QS's case, he's written a whole lot of words; half a million words is not a small amount of text.  And it's especially useful here because we have a known alt to check the model's accuracy against.  That is, it's quite interesting that the model predicts ACCTSeller's text with about the same perplexity as Panthers52's.  I imagine there are few cases where the person suspected of scamming has written as much as QS.  But you're right, it could be useful.

Quote
I have a remedial question.

For the index numbers, why is lower better?  It seems like you are using the content of one user's posts to predict another's, right? Or are you measuring the amount of commonality?  I tend to conceptualize this as a percent or ratio; how does that translate into the index numbers you've calculated?

I think the Wikipedia article on perplexity is a reasonably good place to start: https://en.wikipedia.org/wiki/Perplexity  I'm not a very good teacher, and much better prose than I can produce has been written to explain the metric.  The simple way to think of it: perplexity is 2 raised to the cross-entropy between the model and the test corpus, so lower means the model is less "surprised" by the test text.

Here's another web page that talks about it and how it's used to evaluate models that predict text: http://itl.nist.gov/iad/mig/publications/proceedings/darpa98/html/lm30/lm30.htm
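
For reference, the textbook definitions, written for a trigram model and a test corpus T = w_1 ... w_N. This is the standard formulation, not necessarily the exact normalization the experiment used (for example, in how post or sentence boundaries are counted):

```latex
H(T) = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid w_{i-2}, w_{i-1})
\qquad
\mathrm{PP}(T) = 2^{H(T)}
```

H is the cross-entropy in bits per word. Lower perplexity means the model assigns the test text higher probability; a perplexity of 100 means the model is, on average, as uncertain as if it were choosing uniformly among 100 words at each step.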

Quote
What do the numbers mean in relation to each other?  Are they a linear index?

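Not exactly linear: perplexity is 2 raised to the cross-entropy, so it grows exponentially as the cross-entropy rises.  For comparing accounts, though, the ordering is what matters, and lower is better.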
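
Because PP = 2^H, every extra bit of cross-entropy per word doubles the perplexity. A small sketch of the conversion (function names are hypothetical); comparing models on the cross-entropy (log) scale turns ratios of perplexities into evenly spaced differences.

```python
import math


def cross_entropy_from_perplexity(pp):
    """Bits per word: H = log2(PP)."""
    if pp <= 0:
        raise ValueError("perplexity must be positive")
    return math.log2(pp)


def perplexity_from_cross_entropy(h):
    """PP = 2 ** H."""
    return 2 ** h


if __name__ == "__main__":
    # Doubling the perplexity adds one bit of cross-entropy per word.
    for pp in (64, 128, 256):
        print(pp, cross_entropy_from_perplexity(pp))
```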

Quote
Yamato has a point, right?  All your analysis does is say their posts are written in a similar manner.  Can you do some statistical test to show how likely it is that this is just random chance, e.g. a t-test?

You could certainly compute Pearson's r on the matrix I provided.
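
A minimal stdlib sketch of Pearson's r, for example to check whether two rows of a perplexity matrix (two models scored on the same test corpora) rise and fall together. The matrix from the opening post isn't reproduced in this part of the thread, so the demo uses placeholder numbers, not the experiment's results. Note that r measures linear association and is not by itself a significance test; on Python 3.10+, `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a sequence is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Placeholder values only, standing in for two rows of a score matrix.
    row_a = [1.0, 2.0, 3.0, 4.0]
    row_b = [1.5, 2.5, 2.9, 4.2]
    print(pearson_r(row_a, row_b))
```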

I also have the intuition that a t-test could be relevant.  But it's not clear to me at the moment how to set up the parameters.  Perhaps someone smarter than me will pick this up and/or describe how the t-test would work in this particular case.
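
One possible setup, offered as an assumption rather than a worked-out method: score each post separately under the Quickseller model to get a per-post cross-entropy, do the same for the suspected alt's posts and for a control account's posts, then compare the two groups' means with Welch's t-test (which doesn't assume equal variances). Posts by one author aren't truly independent, so a p-value from this would be optimistic; a permutation test is a common alternative. The sketch below (hypothetical function name) computes only the statistic and its degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic along with a p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: per-post scores (e.g. cross-entropies) for two groups of posts.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least 2 posts")
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Placeholder per-post scores, not data from this thread.
    suspected = [6.9, 7.1, 7.0, 7.3, 6.8]
    control = [8.1, 8.4, 7.9, 8.6, 8.2]
    print(welch_t(suspected, control))
```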

Quote
If this type of analysis does become a useful tool on the forum I can see lots of applications, for example in detecting account sales.

As I said above, I doubt it would become generally useful unless the people in question have a large number of posts.  Having a lot of data to start from makes a better model.  You couldn't use this on newb accounts.
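
One rough way to see why more text matters: measure what share of a test text's trigrams never appear in the training text. When that share is high, a model's predictions rest mostly on smoothing rather than on the author's actual habits. The helper below is a hypothetical illustration using whitespace-split tokens.

```python
def unseen_ngram_rate(train_tokens, test_tokens, n=3):
    """Share of the test text's n-grams that never occur in the training text."""

    def ngrams(tokens):
        # zip over n staggered copies of the list yields consecutive n-grams
        return list(zip(*(tokens[i:] for i in range(n))))

    seen = set(ngrams(train_tokens))
    grams = ngrams(test_tokens)
    if not grams:
        raise ValueError("test text has fewer than n tokens")
    return sum(gram not in seen for gram in grams) / len(grams)


if __name__ == "__main__":
    train = "the escrow is holding the coins".split()
    sample = "the escrow is releasing the coins now".split()
    for n in (1, 2, 3):
        print(n, unseen_ngram_rate(train, sample, n))
```

With a small training text, the unseen rate climbs quickly as n grows, which is the sparsity problem that rules out newbie accounts.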
xetsr
Legendary
Activity: 1120
Merit: 1000
September 04, 2015, 10:19:51 PM
 #11

You should definitely write this into a program and put it in services or project development. It would be great for analysis on finding people's alts, not just quickseller. In fact, maybe quickseller himself might even use it.

Great idea! Then people can do to others what tspacepilot claims Quickseller did to him: make accusations without solid proof, just assumptions based on what others say or, in this case, on a script/algorithm or whatever you want to call it. No?

Not taking sides here. Would I be shocked that quickseller escrowed for himself? No. I've seen other high ranked members do it.
achow101
Staff
Legendary
Activity: 3374
Merit: 6531

Just writing some code
September 04, 2015, 10:26:29 PM
 #12

You should definitely write this into a program and put it in services or project development. It would be great for analysis on finding people's alts, not just quickseller. In fact, maybe quickseller himself might even use it.

Great idea! Then people can do to others what tspacepilot claims Quickseller did to him: make accusations without solid proof, just assumptions based on what others say or, in this case, on a script/algorithm or whatever you want to call it. No?
It would be a good baseline to start with. Instead of hunting for proof that might not even exist, it would at least give someone something to work from. It would show that two accounts could plausibly be alts, as a lead for further research, instead of going straight to searching for blockchain evidence, which can take a very long time. Such a program would also be useful for seeing whether an account was sold and roughly when.

Panthers52
Hero Member
Activity: 675
Merit: 502

#SuperBowl50 #NFCchamps
September 05, 2015, 01:03:50 AM
 #13

I know tsp from when he created a useless thread whose point was little more than a better-veiled-than-most attempt to boost his post count. The thread was about him claiming to have a transaction that would not confirm for days. I tried calling him out on this being BS, and he eventually gave me a5e169d60a797e4585c299cfa8bd2aff457f4d80a5b0c70e0f467e35fd21e1ad, which was broadcast about an hour before it confirmed, in a block that was found days after the thread was created ::) I am fairly confident this was a transaction to himself as cover for creating the thread; however, I didn't think I had strong enough evidence to create a Meta thread about the pointless thread. For some strange reason, tsp wanted me to post in his thread about QS, and as soon as I did, he claimed that I was QS and started to flame me.

The only reason I can think of for tsp wanting me to post in his thread about QS is that I had disagreed with his push to change the trust system in a way that would make it easier to scam and easier to farm trust.

From what I can see, tsp will troll anyone who disagrees with him for months, so I would suggest that anyone reading this avoid saying that tsp is wrong, in this thread or any other. If you do disagree with tsp, then prepare to be trolled and harassed by him for months, possibly including from a sock-puppet.

PGP 827D2A60

Tired of annoying signature ads? Ad block for signatures
tspacepilot (OP)
Legendary
Activity: 1456
Merit: 1076

I may write code in exchange for bitcoins.
September 05, 2015, 01:15:30 AM
Last edit: September 05, 2015, 01:33:10 AM by tspacepilot
 #14

FUD

^^^ Still going.  Writing weird nonsense about some post or thread, intended to distract and cause confusion.  If you can't handle the message, attack the messenger.  You're looking more and more desperate with these sorts of shenanigans.  I do sort of love how, as his emotional level goes through the roof, his attempts to disguise his Quickseller demeanor have fallen away completely.

Quickseller, more on-topic here would be to confirm or deny that panthers52 is your alt.

Apparently, it might not even be an issue:

Not taking sides here. Would I be shocked that quickseller escrowed for himself? No. I've seen other high ranked members do it.

I don't trade here, so I don't know what's considered honest or dishonest.  My intuition is that proposing yourself as the escrow for your own trade is dishonest, but I leave it to the community to decide.
tspacepilot (OP)
Legendary
Activity: 1456
Merit: 1076

I may write code in exchange for bitcoins.
September 05, 2015, 01:40:56 AM
 #15

Run the script on this account versus turtlehurricane.
That doesn't make any sense to do.  This account has 13 posts.  It's nowhere near enough data to get a coherent sample.

Quote
We need more data points, people aren't taking this seriously because your experiments are lacking. You got the code built, now do some research!

People can draw their own conclusions.  I think that comparing the Quickseller model as a predictor of three classes was pretty informative.  The three classes are (1) a known alt; (2) a suspected alt; (3) people we know aren't his alts.  The data I presented show that the suspected alt and the known alt are predicted equally well, and much better than the people we know aren't his alts.  The other models provide context: when you predict one person's posts with a model trained on someone else's, you end up with perplexity from 250 to 400.  When we used Quickseller's model to predict the text of his alts, we got perplexity around 100 to 150.

We could tweak the model parameters and run more experiments.  I'm happy to help someone else do that, but I don't have the time or resources to do nothing but run models all day.  What's more, it seems that QS has given up even trying to defend himself here.  If Panthers52 weren't his alt, wouldn't the first thing he'd do be to say "nope"?  Instead, he says this:

Quote

I am Panthers52. I don't see the point to your question. Does me being QS (if true) make any of my points any less valid?
xetsr
Legendary
Activity: 1120
Merit: 1000
September 05, 2015, 01:56:25 AM
Last edit: September 05, 2015, 03:09:36 AM by xetsr
 #16


People can draw their own conclusions.  I think that comparing the Quickseller model as a predictor of three classes was pretty informative.  The three classes are (1) a known alt; (2) a suspected alt; (3) people we know aren't his alts.  The data I presented show that the suspected alt and the known alt are predicted equally well, and much better than the people we know aren't his alts.  The other models provide context: when you predict one person's posts with a model trained on someone else's, you end up with perplexity from 250 to 400.  When we used Quickseller's model to predict the text of his alts, we got perplexity around 100 to 150.

We could tweak the model parameters and run more experiments.  I'm happy to help someone else do that, but I don't have the time or resources to do nothing but run models all day.  What's more, it seems that QS has given up even trying to defend himself here.  If Panthers52 weren't his alt, wouldn't the first thing he'd do be to say "nope"?  Instead, he says this:


Someone would have brought this up sooner or later, so I guess I will now and you can get it out of the way. https://bitcointalk.org/index.php?action=profile;u=358020 - "Forced 3 day break from the forum Be back Saturday". This thread was created today.

BTW, a scammer could, and most likely will, use multiple posting styles. Your algo could make other members believe they are dealing with someone with no alts. The algo could also be wrong and flag someone as a scammer when they just post similarly to a scammer, or when that scammer has studied their posting history and decided to copy it. A scammer could claim the same: set up an alt, post normally, scam, and when caught, say someone copied him. Unlikely, but possible over time. This could result in either a scammer getting away or someone who isn't scamming having his reputation ruined. Wait a minute, that kind of reminds me of the flaws in the default trust list that a few members keep bringing up.

Once again, not taking sides here. I'm just pointing out the flaws. The algo is pretty cool but appears to need a lot more work and even then could be wrong, resulting in someone's reputation getting ruined or someone getting away. Same as now. So nothing really has changed.
Blazed
Casascius Addict
Legendary
Activity: 2128
Merit: 1119
September 05, 2015, 03:08:21 AM
 #17

Does forced break = 3 day ban?
xetsr
Legendary
Activity: 1120
Merit: 1000
September 05, 2015, 03:10:59 AM
 #18

Does forced break = 3 day ban?

Good question.

I think you should make another thread not calling out quickseller, since this technology is promising if real, and muddling it with a flame war is detrimental.

So is the code for the entire model posted here? I will happily run the necessary experiments for you.

About as promising as the default trust list, which neither of you likes. See my updated post above ;)
Bit_Happy
Legendary
Activity: 2100
Merit: 1040

A Great Time to Start Something!
September 05, 2015, 03:14:51 AM
 #19

Does forced break = 3 day ban?

I read almost the whole thread but didn't see "forced break". What is that?
Sorry about the dumb question.

tspacepilot (OP)
Legendary
Activity: 1456
Merit: 1076

I may write code in exchange for bitcoins.
September 05, 2015, 03:16:02 AM
 #20


Someone would have brought this up sooner or later, so I guess I will now and you can get it out of the way. https://bitcointalk.org/index.php?action=profile;u=358020 - "Forced 3 day break from the forum Be back Saturday". This thread was created today. So how exactly is he going to defend himself in this thread?
Except that he's right here.  He isn't even denying it.  He's just sort of posting off-topic nonsense in order to distract.
Quote
BTW, a scammer could and will most likely use multiple posting styles. ...
Actually, the point of this experiment is that it's not the kind of thing you can manipulate with your conscious brain.  To manipulate these models, you'd have to do so across hundreds of thousands of words.  It's pretty clear to anyone who reads the posts of panthers52 and quickseller that he made a conscious effort to disguise his "posting style".  But once you measure "posting style" in a more concrete way, the charade falls apart.

Another thing to keep in mind: I'm not suggesting that people turn off their brains and substitute the results of this experiment for their rational thoughts.  I provided the experiment results because I thought they shed an interesting light on a fellow who seems to be acting pretty shadily, not only in his use of sockpuppets in arguments but also in providing himself with escrow services without his trading partners being aware of it.  I'm not asking to become the new tyrant; I'm just providing information.  Which, apparently, even Quickseller values:

It is also an example as to the value of information, ...

:)