Bitcoin Forum: Show Posts
1  Other / Ivory Tower / Re: Linux without windows on: January 21, 2019, 09:05:07 PM
Quote
I want just a terminal if I can/cli.

It's actually pretty easy to end up with a non-graphical system.  In the Debian installer, just don't select the desktop environment task.

(See https://www.debian.org/releases/stable/i386/ch06s03.html.en, section 6.3.5.2, "Selecting and Installing Software".)

That said, if you want to boot into an X server and still be very keyboard-driven, tiling window managers like i3 (as pointed out by Anduck) can be quite wonderful.

Also, Carlton Banks nailed it when it comes to the popular Linux distros.  If you really want to get into rolling your own, you could try the Linux From Scratch route (warning: not for the faint of heart).
2  Alternate cryptocurrencies / Announcements (Altcoins) / Re: BYTEBALL: Totally new consensus algorithm + private untraceable payments on: March 28, 2018, 04:23:47 AM
Quote
I had the BTC - Byteball exchange bot "ignore" my deposit because it was "too small".  Admittedly, it wasn't a lot, but it was in the ballpark of 20 USD at the time of the deposit.  The message was just "Received your payment of 0.059 GB it is too small to be exchanged, will be ignored.".

In my opinion, this is broken behavior.  If the amount is too small to exchange, send it back after some time; don't just confiscate the money.

Is there anyone I can talk to to get my money back?

I was glad to see that whoever runs the BEEB exchange is around to take care of their customers.  Does anyone know who runs the BTC-Byteball exchange bot?  I still haven't heard anything.  It seems like whoever runs this bot just stole my $20.

Quote
This bot has a buyer beware and explicitly says not to deposit the amount you deposited. So I'd have to respectfully say that there's only one place to put the blame if you don't get refunded.

I mean, if this bot's policy is to take for itself all values under $20, and there's a fast-moving exchange rate, then I'm just hosed.  OK, it was $20, not my life savings, no problem.  Still, I wouldn't recommend doing business with such a person.  And I bet I'm not the only one who has had their money stolen.

Does anyone know who the dev is?  I'd like to talk to them.
3  Other / Off-topic / Re: 😎 [ANN] The Utmost Epic, Totally Tubular sMerit Giveaway ~ 170 sM Available 😎 on: March 14, 2018, 03:53:35 AM
I wouldn't have come across this if it hadn't been for the long saga of quickseller abuse I've been suffering.  But recently he's started using a new alt/sockpuppet to try to attack me, and I started looking more closely at the situation.  I realized that Panthers52 seems to have done several deals which were escrowed by Quickseller.  The fact that Quickseller is escrowing for himself seems like scammy behavior.  I'm not a trader here, so it may be that there's nothing wrong with this.  But in any case, I'll go ahead and present some quantitative evidence here and you guys can discuss it as you please.


I happen to have some training in Statistical Methods for Natural Language Processing, so I know a thing or two about how people use language and how to measure it quantitatively.  Although QS does a few funny things to try to disguise his use of Panthers52 as an alt (he doesn't use a sig-ad, he signs each message with "Kind Regards", etc.), these techniques are not very robust: they don't disguise QS's style of writing at all when looked at from a big-picture perspective, and that is just what language modeling allows us to do.

One reason that I set out to do this experiment is that all of the pieces are there.  QS has written a pretty large corpus of posts under his main account.  And there's a secondary account as well (one of his alts which was outed only a few months ago) to do model checking on.  So, here's the big-picture setup.  We're going to download the corpus of posts of Quickseller, ACCTSeller (his outed alt), Panthers52 (his accused alt), hilariousandco, dooglus, and me.  We'll then build language models using maximum likelihood parameter estimation for all of the ngrams in each corpus up to n=3.  For those who don't know, 1-grams are all of the single-word tokens in the corpus, 2-grams (called bigrams) are all of the word pairs, 3-grams are all of the word triples, etc.  The reason I don't use 4-grams or any higher n is that the data just gets sparser the higher you go, unless you have an incredibly large amount of data.  For this project, a 3-gram model seemed appropriate (and the 3-gram section wasn't terribly sparse).  So, step one: I downloaded all of the posts of these members as raw HTML.  I used this script:

Code:
#!/bin/bash
# Fetch every "show posts" page for one forum member.
# Usage: $0 <uid> <output_dir>
u=$1
outdir=$2

# grab the first page and scrape the last page number out of the pager links
curl --data "action=profile&u=${u}&sa=showPosts"  https://bitcointalk.org/index.php > "$outdir/page0.html"
dend=`cat "$outdir/page0.html" | sed -n -e 's/.*>\([0-9]\+\)<\/a> <span class="prevnext.*/\1/p'`
end=`echo "$dend" | head -n 1`
echo $end

# walk the remaining pages, 20 posts per page
i=1
while [[ $i -le $end ]]; do
  start=$(($i*20))
  curl --data "action=profile&u=${u}&sa=showPosts&start=$start"  https://bitcointalk.org/index.php > "$outdir/page${start}.html"
  ((i=i+1))
done

What's going on here is that you pass in a UID and an output directory, then use curl to get the first page of the "recent posts" of that member.  You then use sed to grab the last page number of the post history, and finally you loop, curl each page, and save the entire HTML into the output directory.  After doing this, I had a directory called rawhtml/ with subdirectories for each of the accounts in my experiment.
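
For example, fetching one account's history might look like this (the UID and the script name here are made up for illustration):

Code:
$ mkdir -p rawhtml/quickseller
$ ./getposts.sh 123456 rawhtml/quickseller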

The next step was to strip out all of the irrelevant HTML.  Thankfully, the markup has a class "post" which contains people's posts, and other classes for quotes and quoteheaders, so it's pretty easy to load a page into the BeautifulSoup HTML parser and strip out the quotes and quoteheaders.  Here's my short-n-sweet python script to leave you with what I call "rawposts".

Code:
#!/usr/bin/env python
import sys
import os
from bs4 import BeautifulSoup

indir = sys.argv[1]
outdir = sys.argv[2]
for infile in os.listdir(indir):
  soup = BeautifulSoup(open(indir+"/"+infile), 'html.parser')
  # drop quoted material so we only keep each author's own words
  for qh in soup.find_all("div", "quoteheader"):
    qh.extract()
  for q in soup.find_all("div", "quote"):
    q.extract()

  # keep just the post bodies
  posts = soup.find_all("div", "post")
  with open(outdir+"/"+infile, "w") as f:
    for p in posts:
      f.write(str(p) + "\n")
  print("done writing " + infile)
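
Invoked per account, something along these lines (directory names are placeholders):

Code:
$ mkdir -p rawposts/quickseller
$ ./stripquotes.py rawhtml/quickseller rawposts/quickseller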

So, I ran this script to create a subdirectory for each account in the experiment, and I ended up with a collection of posts, still as HTML, but without the embedded quotes.  The next step was to tokenize the files and do some final cleanup before building the models.  By tokenize, I mean explicitly dealing with punctuation and other funny stuff.  If you leave periods and question marks stuck to the sides of words, you get some really skewed counts and miss generalizations.  A period is a really common token at the end of a sentence, so you want your model to have a high count of "." as a unigram and of ". </s>" as a bigram.  But if you leave the periods stuck to words you'll end up with lots of singletons: "something.", "do.", "find.", etc.  I also realized that the smiley HTML tags would be better replaced by single tokens so that we could see how they play into sentences.  Finally, I wanted to replace links, which still showed up as <a href="..." target="_blank">link text, with only their href value.  The rest is just constant markup and gets in the way of measuring which URLs are actually being referenced.  This latter point could be important in identifying authorship.

So, I made a sed file and tokenized the corpus.  Here's my sed file:
Code:
# remove <div class="post"> and </div>
s/\(<div class="post">\)\|\(<\/div>\)//g
# change smiley html into a single --name-- token
s/<img alt="[A-Za-z]\+" border="0" src="https:\/\/bitcointalk.org\/Smileys\/default\/\([A-Za-z.]\+\)"\/>/--\1--/g
# change <br> and <hr> into a real line break
s/<br\/>/\n/g
s/<hr\/>/\n/g
# do sentence breaking after . and ! and ? when followed by space + capital
s/\([?!\.]\)\s\+\([A-Z]\)/\1\n\2/g

# cleanup links, just use their href as if it was text
s:</a>\|<a href=\|target="_blank">::g
# split punctuation off the surrounding words
s/\([,\.?]\)\($\|\s\)/ \1 \2/g
s/'s/ 's/g
s/\([()]\)/ \1 /g

# cleanup any spurious space at the end of the lines
s/\s\+$/\n/g
I also piped the output of this through "sed -e '/^$/d'" to remove any blank lines.  After doing this, I had what I thought was a pretty usable, tokenized, one-"sentence"-per-line corpus for each of the accounts in my experiment.  Hand inspection of the corpus showed that there was still some noise in there, but crucially, all of the corpora were run through the same preprocessing and tokenization scripts, so any noise wouldn't be biased.
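
Per file, that stage boiled down to something like this (paths are placeholders):

Code:
$ sed -f tokenize.sed rawposts/quickseller/page0.html | sed -e '/^$/d' >> corpus/qs.txt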

So, the next step was to do ngram counts over each of these corpora.  To do this, you simply count all of the 1-, 2-, and 3-grams in the corpus and create a counts file that you can use to create language models.  Note, I'm quite happy to share these count files with anyone who wants to see them.  The thing is that they're a little too large for most pastebin services; the quickseller counts file is approximately 8MB, for example.  I can tar these up and email them to anyone who's interested.  Or if anyone has a site they don't mind hosting them on, I could send them to that person.  Just let me know.
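
In case it helps, here's a minimal sketch of what that counting step can look like (an illustrative reconstruction, not the exact script I ran; it writes "ngram count" lines in the format the LM builder below expects):

Code:
#!/usr/bin/env python
# Count all 1-, 2-, and 3-grams in a tokenized one-sentence-per-line corpus.
# Usage: ./count_ngrams.py <corpus.txt> <out.count>
import sys
from collections import Counter

counts = Counter()
with open(sys.argv[1]) as corpus:
    for line in corpus:
        # pad with the sentence-boundary markers used by the models above
        toks = ["<s>"] + line.split() + ["</s>"]
        for n in (1, 2, 3):
            for i in range(len(toks) - n + 1):
                counts[" ".join(toks[i:i + n])] += 1

with open(sys.argv[2], "w") as out:
    for ngram, c in counts.most_common():
        out.write("%s %d\n" % (ngram, c))

For scale, here's what the resulting count files looked like on disk: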

Code:
tspacepilot@computer:~/lm/counts$ ls -lah  
total 43M
drwxr-xr-x 2 tspacepilot tspacepilot 4.0K Sep  4 12:05 .
drwxr-xr-x 8 tspacepilot tspacepilot  16K Sep  4 11:55 ..
-rw-r--r-- 1 tspacepilot tspacepilot 1.3M Sep  3 10:40 as.count
-rw-r--r-- 1 tspacepilot tspacepilot  16M Sep  4 08:21 d.count
-rw-r--r-- 1 tspacepilot tspacepilot  12M Sep  4 08:20 h.count
-rw-r--r-- 1 tspacepilot tspacepilot 617K Sep  3 10:41 pan.count
-rw-r--r-- 1 tspacepilot tspacepilot 8.2M Sep  3 10:38 qs.count
-rw-r--r-- 1 tspacepilot tspacepilot 5.8M Sep  3 10:40 tsp.count

The next step is to generate language models from the count files.  I used Good-Turing smoothing on top of the MLE parameter estimates to generate plain-text files containing the models.  These models are in the standard NIST format.  Here's the top of the file from tsp:

Code:
tspacepilot@computer:~/lm/lms$ head tsp.lm
\data\
ngram 1: type=21218 token=294893
ngram 2: type=117148 token=287741
ngram 3: type=215034 token=280589
\1-grams:
9787 0.0331883089798673 -1.4790148753233 ,
9243 0.0313435720752951 -1.50385151060555 the
8592 0.0291359916986839 -1.53557019528667 to
7152 0.0242528645983458 -1.61523695785429 </s>
7152 0.0242528645983458 -1.61523695785429 <s>

What you're seeing there is the count for each ngram type.  So the tspacepilot model has 294893 tokens/word instances, which fall into 21218 types.  To be clear for those who don't have a background in this: if I say "the" twice, that's two tokens and one type.  Then you see the start of the 1-grams section.  You can see that I used a comma "," 9787 times, that the comma represents 0.033... of the probability mass of the unigram model, and that the second column is that mass converted to a log value.  Here I reused a perl script that I had made some time ago.  It's short enough to show you in its entirety here:

Code:
#!/usr/bin/perl
# Build an ngram LM from a given count file
# tspacepilot
use strict;

# set up the input file handles
$#ARGV != 1 and die "Usage: $0 <ngram_count_file> <lm_file>\n";
my $ngram_count_file = $ARGV[0];
my $lm_file_name = $ARGV[1];
open(DATA, "<", $ngram_count_file) || die "cannot open $ngram_count_file.\n";
open(OUT, ">", $lm_file_name) || die "cannot open $lm_file_name for writing.\n";

my @data = <DATA>;

my %unis;
my $uni_toks;
my %bis;
my %flat_bis;
my $bi_toks;
my %tris;
my %flat_tris;
my $tri_toks;

# build up the hash tables that we'll use to print the answer
foreach my $line (@data) {
    my @tokens = split(/\s+/, $line);
    my $l = $#tokens;
    if ($l < 1) {
        print "error on this line of count file:\n$line\n";
        print "l = $l";
    } elsif ($l == 1) {
        # a unigram line: "word count"
        $unis{$tokens[0]} = $tokens[1];
        $uni_toks += $tokens[1];
    } elsif ($l == 2) {
        # a bigram line: "w1 w2 count"
        $bis{$tokens[0]}{$tokens[1]} = $tokens[2];
        $flat_bis{"$tokens[0] $tokens[1]"} = $tokens[2];
        $bi_toks += $tokens[2];
    } elsif ($l == 3) {
        # a trigram line: "w1 w2 w3 count"
        $tris{"$tokens[0] $tokens[1]"}{$tokens[2]} = $tokens[3];
        $flat_tris{"$tokens[0] $tokens[1] $tokens[2]"} = $tokens[3];
        $tri_toks += $tokens[3];
    } else {
        print "error on this line of count file:\n$line\n";
        print "l = $l";
    }
}

print OUT "\\data\\\n";
print OUT "ngram 1: type=", scalar keys %unis, " token=$uni_toks\n";
print OUT "ngram 2: type=", scalar keys %flat_bis, " token=$bi_toks\n";
print OUT "ngram 3: type=", scalar keys %flat_tris, " token=$tri_toks\n";

print OUT "\\1-grams:\n";
foreach my $uni (sort { $unis{$b} <=> $unis{$a} or $a cmp $b } (keys %unis)) {
    my $prob = $unis{$uni} / $uni_toks;    # MLE: count / total tokens
    my $lgprob = log10($prob);
    print OUT "$unis{$uni} $prob $lgprob $uni\n";
}

print OUT "\\2-grams:\n";

# compute output for two grams: P(w2|w1) = count(w1 w2) / count(w1 *)
my @two_gram_output;
foreach my $flat_bi (keys %flat_bis) {
    my ($firstword) = $flat_bi =~ m/(\S+)/;
    my $denominator;
    foreach my $secondword (keys %{ $bis{$firstword} }) {
        $denominator += $bis{$firstword}{$secondword};
    }
    my $prob = $flat_bis{$flat_bi} / $denominator;
    my $lgprob = log10($prob);
    push(@two_gram_output, "$flat_bis{$flat_bi} $prob $lgprob $flat_bi\n");
}

my @sorted_two_grams = sort { (split /\s+/, $b)[0] <=> (split /\s+/, $a)[0] } @two_gram_output;

# print output for two grams, most frequent first
foreach (@sorted_two_grams) {
    print OUT;
}

# compute output for 3grams: P(w3|w1 w2) = count(w1 w2 w3) / count(w1 w2 *)
print OUT "\\3-grams:\n";
my @three_gram_output;
foreach my $flat_tri (keys %flat_tris) {
    my ($first_two_words) = $flat_tri =~ m/(\S+\s+\S+)/;
    my $denominator;
    foreach my $thirdword (keys %{ $tris{$first_two_words} }) {
        $denominator += $tris{$first_two_words}{$thirdword};
    }
    my $prob = $flat_tris{$flat_tri} / $denominator;
    my $lgprob = log10($prob);
    push(@three_gram_output, "$flat_tris{$flat_tri} $prob $lgprob $flat_tri\n");
}

my @sorted_three_grams = sort { (split /\s+/, $b)[0] <=> (split /\s+/, $a)[0] } @three_gram_output;
# print output for 3grams, most frequent first
foreach (@sorted_three_grams) {
    print OUT;
}

sub log10 {
    my $n = shift;
    return log($n) / log(10);
}


Okay, with the language models all built (again, email me or PM me if you want to see the models themselves, I don't mind sharing them) we can start to get to the fun stuff.  The goal of the experiment is to use each language model as a predictor of the other accounts' texts.  The typical measure for this is called "perplexity" (https://en.wikipedia.org/wiki/Perplexity).  One nitty-gritty detail is what weights to give to the 1-, 2-, and 3-gram portions of the model when calculating perplexity.  Intuitively, putting more weight on the 1-grams puts more value on shared single words, i.e., the basic vocabulary of the person.  Putting more weight on the 3-grams puts more weight on how that person puts words together, what three-word phrases they tend to use.  I ended up using weights 0.3 0.4 0.3 (uni-, bi-, trigrams) in calculating perplexity.  For each language model, I calculated the perplexity it assigns to each of the corpora of the accounts in the experiment.  To be explicit about what's being computed, the standard definitions (in LaTeX) are:
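
Code:
P(w_i \mid w_{i-2} w_{i-1}) = \lambda_1 P_1(w_i) + \lambda_2 P_2(w_i \mid w_{i-1}) + \lambda_3 P_3(w_i \mid w_{i-2} w_{i-1}),
\qquad (\lambda_1, \lambda_2, \lambda_3) = (0.3, 0.4, 0.3)

\mathrm{PPL} = 10^{-\frac{1}{N} \sum_{i=1}^{N} \log_{10} P(w_i \mid w_{i-2} w_{i-1})}

Here N is the number of scored tokens (out-of-vocabulary tokens are skipped, as in the perplexity script at the end of this post), and lower perplexity means the model finds the text more predictable.  Here comes the fun stuff, then, the results: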

As plain text, checking the QS language model against every corpus:
Code:
==> qstest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=1393
logprob=-119405.183085554 ave_logprob=-2.02254828472914 ppl=105.329078517105

==> qstest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=108735
logprob=-1963318.24588274 ave_logprob=-2.55783608776103 ppl=361.273484388214

==> qstest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=53676
logprob=-1514039.01569095 ave_logprob=-2.42022420176373 ppl=263.162620156841

==> qstest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=1093
logprob=-53775.973489288 ave_logprob=-2.07397020669089 ppl=118.568740528906

==> qstest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=29664
logprob=-666393.992923604 ave_logprob=-2.5821718218487 ppl=382.09541103913

Well, as you can see, QS's model predicts my corpus with a perplexity of 382, predicts hilariousandco's with 263, and predicts dooglus's with 361.  But crucially, it predicts the posts of ACCTSeller and Panthers52 at 105 and 118!

What this means is that QS's posting style, when measured quantitatively, shows through his attempts to hide what he was doing.  This isn't too surprising for anyone who knows how language works, but it may be to others.  For fun, I also ran each model as a predictor against each of the other corpora.

hilariousandco against all:
Code:
==> htest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=2260
logprob=-136595.372784586 ave_logprob=-2.34820994988114 ppl=222.951269646594

==> htest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=109662
logprob=-1934327.44440288 ave_logprob=-2.52311368446967 ppl=333.513704608138

==> htest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=1828
logprob=-60634.1796607556 ave_logprob=-2.40669126223528 ppl=255.088724501193

==> htest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=25750
logprob=-1193959.69530073 ave_logprob=-2.37727869117974 ppl=238.384871857193

==> htest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=26006
logprob=-662995.55023098 ave_logprob=-2.5330988076818 ppl=341.270546308425

So, we can see that hilariousandco doesn't really have a style that predicts any one of the rest of us better than another, at least not significantly.  However, it is interesting that the hilariousandco model assigns perplexities in the same range to all three of quickseller's accounts.  This provides an oblique suggestion as to the similarities of those corpora.  Here is dooglus' model predicting each of the other accounts:

Code:
==> dtest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=2518
logprob=-141009.183781008 ave_logprob=-2.43488713532615 ppl=272.199382299313

==> dtest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=44764
logprob=-1532563.94318701 ave_logprob=-2.4154264735252 ppl=260.271415205445

==> dtest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=1752
logprob=-61358.7835651667 ave_logprob=-2.42812756490569 ppl=267.995538997277

==> dtest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=26384
logprob=-1223316.26268869 ave_logprob=-2.43880882666145 ppl=274.668481585288

==> dtest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=20198
logprob=-680500.394458114 ave_logprob=-2.5435368577456 ppl=349.572175864552

Here's my model predicting all the other corpora:

Code:
==> ttest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=2850
logprob=-139530.390079984 ave_logprob=-2.42324400972532 ppl=264.998862488461

==> ttest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=99717
logprob=-1946265.50900313 ave_logprob=-2.50617510057216 ppl=320.756230152803

==> ttest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=50287
logprob=-1518909.27782387 ave_logprob=-2.41492682099994 ppl=259.972147091511

==> ttest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=2043
logprob=-61310.1514410114 ave_logprob=-2.45446781060136 ppl=284.752673700336

==> ttest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=30864
logprob=-1209678.28851218 ave_logprob=-2.43335322477326 ppl=271.239680896164

Finally, we can also use the acctseller and panthers models to predict the other corpora.  These models are a bit smaller than the qs model, so I think the results are not as impressive as those from the QS model.  But they do demonstrate the same pattern.

Code:
==> atest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=158655
logprob=-1864342.35403158 ave_logprob=-2.59784345298067 ppl=396.135216494324

==> atest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=87812
logprob=-1444217.53179264 ave_logprob=-2.44185825794015 ppl=276.603873729012

==> atest-panthers52-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=1663 num=25359 oov_num=2433
logprob=-54938.2415881704 ave_logprob=-2.23426091293548 ppl=171.498731827101

==> atest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=36302
logprob=-1072293.35965131 ave_logprob=-2.18084989129508 ppl=151.652610771117

==> atest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=47163
logprob=-623320.832692272 ave_logprob=-2.59095185177354 ppl=389.898758003026

Again, dooglus, hilariousandco, and I are all above 270, whereas the other known quickseller account is at 151 and the "suspected" alt is at 171.  And with the panthers model:

Code:
tspacepilot@computer:~/quickseller/ppls/ptest$ tail -n 3 *
==> ptest-acctseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=2722 num=57708 oov_num=5835
logprob=-126943.515020739 ave_logprob=-2.32518573167395 ppl=211.439309416701

==> ptest-dooglus-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=48667 num=827638 oov_num=200298
logprob=-1733046.66220228 ave_logprob=-2.56365194769031 ppl=366.144021870075

==> ptest-hilariousandco-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=42799 num=636455 oov_num=110187
logprob=-1420281.45120892 ave_logprob=-2.49580708635173 ppl=313.18942275869

==> ptest-quickseller-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=24371 num=503617 oov_num=55974
logprob=-1089757.40317691 ave_logprob=-2.30873957801444 ppl=203.582094424962

==> ptest-tspacepilot-3.4.3.ppl <==
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sent_num=7150 num=280589 oov_num=56725
logprob=-602993.466557261 ave_logprob=-2.61020313295844 ppl=407.570866725746

Again, the panthers model is built from the smallest amount of input data, so you can see it's a little less robust for that reason.  Nevertheless, the similarities with the acctseller and quickseller corpora really stand out when compared to the values assigned to the dooglus, hilariousandco, and tspacepilot corpora.

Let's summarize this in a table (rows are the model doing the predicting; columns are the corpus being predicted):

model \ corpus    qs       accts    pan52    doog     hilarious    tsp
qs                X        105.3    118.1    361.2    263          382.1
accts             151.6    X        171.4    396.1    276.6        389.9
pan52             203.5    211.4    X        366.1    313.1        407.6
doog              274.6    272.1    267.9    X        260.3        349.5
hilarious         238.3    222.9    255.1    333.5    X            341.2
tsp               271.2    264.9    284.7    320.7    259.9        X

So, one thing I want to be clear on: perplexity measures how well a model predicts a certain corpus.  The first row shows us that the QS model predicts the acctseller and panthers52 corpora approximately equally well, and far better than it predicts any of the others.  Most of the other rows are just there to give you perspective.  You can see that the dooglus, hilariousandco, and tsp models don't predict any of the other corpora especially well (nothing below 250 except the hilariousandco model's values for the quickseller accounts, which is the oblique similarity noted above).

For completeness, here's the script I used to calculate perplexity:
Code:
#!/usr/bin/perl
# Compute interpolated ngram perplexity of test data under a given LM file
use strict;

# set up the input file handles
$#ARGV != 5 and die "Usage: $0 <lm_file> <l1> <l2> <l3> <test_data> <output>\n";
my $lm_file = $ARGV[0];
my ($l1, $l2, $l3) = ($ARGV[1], $ARGV[2], $ARGV[3]);
my $test_data = $ARGV[4];
my $output = $ARGV[5];
open(LM, "<", $lm_file) || die "cannot open $lm_file.\n";
my @data;
if ($test_data eq "-") {
    @data = <STDIN>;
} else {
    open(DATA, "<", $test_data) || die "cannot open $test_data.\n";
    @data = <DATA>;
}
open(OUT, ">", $output) || die "cannot open $output for writing.\n";

my $lmstring;
while (<LM>) {
    $lmstring .= $_;
}

# build up the lm data structures for quicker retrieval
my @lm = split(/\\data\\|\\1-grams:|\\2-grams:|\\3-grams:/, $lmstring);
shift @lm;
my @data_lines = split(/\n/, $lm[0]);
my @one_gram_lines = split(/\n/, $lm[1]);
my @two_gram_lines = split(/\n/, $lm[2]);
my @three_gram_lines = split(/\n/, $lm[3]);
my %unis;
foreach (@one_gram_lines) {
    my ($prob, $w) = $_ =~ /\S+\s+(\S+)\s+\S+\s+(\S+)/;
    $unis{$w} = $prob;
}
my %bis;
foreach (@two_gram_lines) {
    my ($prob, $w1, $w2) = $_ =~ /\S+\s+(\S+)\s+\S+\s+(\S+)\s+(\S+)/;
    $bis{"$w1 $w2"} = $prob;
}
my %tris;
foreach (@three_gram_lines) {
    my ($prob, $w1, $w2, $w3) = $_ =~ /\S+\s+(\S+)\s+\S+\s+(\S+)\s+(\S+)\s+(\S+)/;
    $tris{"$w1 $w2 $w3"} = $prob;
}

my $sum;
my $cnt;
my $word_num;
my $oov_num;
my $sent_num;

for my $s (0 .. $#data) {
    if ($data[$s] =~ m/^\s*$/) {
        next;
    }
    $sent_num++;
    chomp $data[$s];
    $data[$s] = "<s> " . $data[$s] . " </s>";
    my @words = split /\s+/, $data[$s];
    print OUT "\n\nSent #" . ($s + 1) . ": @words\n";
    my $sprob = 0;
    my $soov = 0;
    for my $i (1 .. $#words) {
        $word_num++;
        if ($i == 1) {
            # first word: only the unigram and bigram pieces are available
            my ($w1, $w2) = ($words[$i - 1], $words[$i]);
            my $onegramprob;
            my $twogramprob;
            my $unknown_word;
            my $smoothed_prob;
            if (defined($unis{$w2})) {
                $onegramprob = $unis{$w2};
            } else {
                $unknown_word = 1;
            }
            if (!$unknown_word) {
                if (defined($bis{"$w1 $w2"})) {
                    $twogramprob = $bis{"$w1 $w2"};
                } else {
                    $unknown_word = 1;
                }
            }
            if ($unknown_word) {
                $smoothed_prob = "-inf (unknown word)";
                $soov++;
            } else {
                $smoothed_prob = log10((($l3 + $l2) * $twogramprob) + ($l1 * $onegramprob));
                $sprob += $smoothed_prob;
            }
            print OUT ($i);
            print OUT ": LogP( $w2 | $w1 ) = $smoothed_prob\n";
        } else {
            my ($w1, $w2, $w3) = ($words[$i - 2], $words[$i - 1], $words[$i]);
            my $threegramprob;
            my $twogramprob;
            my $onegramprob;
            my $unknown_word;
            my $unknown_ngram;
            my $smoothed_prob;

            # look up the trigram and its backoffs
            if (defined($unis{$w3})) {
                $onegramprob = $unis{$w3};
            } else {
                $unknown_word = 1;
            }
            if (defined($bis{"$w2 $w3"})) {
                $twogramprob = $bis{"$w2 $w3"};
            } else {
                $unknown_ngram = 1;
            }
            if (defined($tris{"$w1 $w2 $w3"})) {
                $threegramprob = $tris{"$w1 $w2 $w3"};
            } else {
                $unknown_ngram = 1;
            }

            print OUT ($i);
            if ($unknown_word) {
                print OUT ": LogP( $w3 | $w1 $w2 ) = -inf (unknown word)";
                $soov++;
            } elsif ($unknown_ngram) {
                $smoothed_prob = log10(($l3 * $threegramprob) + ($l2 * $twogramprob) + ($l1 * $onegramprob));
                print OUT ": LogP( $w3 | $w1 $w2 ) = $smoothed_prob (unknown ngrams)\n";
            } else {
                $smoothed_prob = log10(($l3 * $threegramprob) + ($l2 * $twogramprob) + ($l1 * $onegramprob));
                print OUT ": LogP( $w3 | $w1 $w2 ) = $smoothed_prob\n";
            }
            $sprob += $smoothed_prob;
        }
    }
    my $sppl = 10**(-($sprob / ($#words - 1)));

    print OUT "1 sentence, " . ($#words - 1) . " words, $soov OOVs\n";
    print OUT "logprob=$sprob, ppl=$sppl";
    $sum += $sprob;
    $oov_num += $soov;
    $cnt += $#words - 1;
}

my $ave_logprob = $sum / ($sent_num + $cnt - $oov_num);
my $ppl = 10**(-$ave_logprob);
print OUT "\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n";
print OUT "sent_num=$sent_num num=$cnt oov_num=$oov_num\n";
print OUT "logprob=$sum ave_logprob=$ave_logprob ppl=$ppl\n";

sub log10 {
    my $n = shift;
    return log($n) / log(10);
}


In sum, we know that Quickseller is adept at checking the blockchain to reveal transactions signed by particular accounts and at linking them.  So it makes sense that he knows how to cover his tracks there and to use mixers and whatnot to make it difficult to detect his alts in that way.  He is an expert in this, so while I haven't tried, I suspect it would be difficult to link any of his accounts on the blockchain.  However, he's presumably not an expert in forensic linguistics and statistical NLP, so he didn't realize that providing a corpus of 552365 word tokens would give someone who wanted to detect his alts a reasonably reliable way to find the statistical fingerprint that is right there in how he writes.

There's plenty of other circumstantial evidence that Panthers52 is an alt of Quickseller, but I'll leave that for others to discuss.  Also, I'm not a trader here, so I'm not really affected by QS acting as escrow for himself, but perhaps others who are will have more to say about whether this practice is truly a scam.  I opened this thread because it seemed like scammy behavior to me, and I wanted others to be aware of it.

Here is a screenshot of QS feedback taken today:

Again, if anyone has any questions about this experiment or wants access to the particular data I ended up using, just let me know.  I believe I've provided all the tools in this post to replicate these results yourself, but if something's missing, let me know.

4  Alternate cryptocurrencies / Announcements (Altcoins) / Re: BYTEBALL: Totally new consensus algorithm + private untraceable payments on: March 14, 2018, 03:44:12 AM
Quote
I had the BTC - Byteball exchange bot "ignore" my deposit because it was "too small".  Admittedly, it wasn't a lot, but it was in the ballpark of 20 USD at the time of the deposit.  The message was just "Received your payment of 0.059 GB it is too small to be exchanged, will be ignored.".

In my opinion, this is broken behavior.  If the amount is too small to exchange, send it back after some time; don't just confiscate the money.

Is there anyone I can talk to to get my money back?

I was glad to see that whoever runs the BEEB exchange is around to take care of their customers.  Does anyone know who runs the BTC-Byteball exchange bot?  I still haven't heard anything.  It seems like whoever runs this bot just stole my $20.
5  Alternate cryptocurrencies / Announcements (Altcoins) / Re: BYTEBALL: Totally new consensus algorithm + private untraceable payments on: March 12, 2018, 03:27:49 AM
I had the BTC - Byteball exchange bot "ignore" my deposit because it was "too small".  Admittedly, it wasn't a lot, but it was in the ballpark of 20 USD at the time of the deposit.  The message was just "Received your payment of 0.059 GB it is too small to be exchanged, will be ignored.".

In my opinion, this is broken behavior.  If the amount is too small to exchange, send it back after some time; don't just confiscate the money.

Is there anyone I can talk to to get my money back?
6  Bitcoin / Development & Technical Discussion / Re: Hex value of ip address why reversed? on: March 10, 2018, 07:39:30 AM
I admit that I'm not exactly sure what your context is or what you're looking at, but over the wire, IP addresses are encoded in big-endian (network byte) order.

If that's the kind of 'reversal' you're talking about, then I suggest you take a look at:

Code:
$ man 3 htons
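
To see the byte swap concretely, here's a quick sketch in Python (the "reversed" output assumes a little-endian machine):

Code:
import socket, struct

packed = socket.inet_aton("1.2.3.4")   # four bytes in network (big-endian) order
host = struct.unpack("<I", packed)[0]  # reinterpret them as a little-endian integer
print(hex(host))                       # 0x4030201 -- the bytes look "reversed"
print(hex(socket.ntohl(host)))         # 0x1020304 -- swapped back to network order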
7  Economy / Reputation / Re: Can you still believe aTriz words? Reopened, too many open questions on: March 09, 2018, 12:38:16 AM
Quote
Not funny, when I already posted that hash:

ah, indeed.  oh well.

Quote
Wait so I screwed up?

sorry, yah, just teasing.  poor timing.  I'll try again next year.  gl!

Quote from: nullius
Nothing can be discerned about the script from its SHA-256 hash.  (Nothing, including whether you “got it correct or not”.  I sincerely hope you did.)

Well, strictly speaking, that isn't true, since the hash allows you to determine identity (to a very high degree of probability) with an object whose hash you already know.  That's how you knew that what I posted was the hash of null input.  But anyway, I fixate on irrelevant technicalities too often.  I return this thread to its regularly scheduled programming of intrigue and insult.
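
(Tangentially, anyone who wants to check that "hash of null input" claim can do it in one line of Python:)

Code:
import hashlib
# SHA-256 of empty input; matches the hash posted in the message below
print(hashlib.sha256(b"").hexdigest())
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855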
8  Economy / Reputation / Re: Can you still believe aTriz words? Reopened, too many open questions on: March 09, 2018, 12:27:21 AM
I sorta think I know the sha256 of this script.

Code:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

;)
9  Economy / Reputation / Re: Scammer seeks unchecked buffer on: March 07, 2018, 07:24:49 AM
Quote
Then how, exactly, do you send it?  (And how would aTriz run it?)

https://tools.ietf.org/html/rfc1149

But I suppose that may put one at risk for another sort of virus. Hehe.


Regarding generation of a sha256 hash on macOS, the world wide web suggests to me that

Code:
$ shasum -a 256 path/to/yer/gamblin/script

is the recipe (but I haven't tested it; I use GNU/Linux).
10  Economy / Reputation / Re: Scammer seeks unchecked buffer on: March 07, 2018, 07:00:17 AM
Hi everyone!  What intrigue!

Quote
(I am still also asking the SHA-256 hash of the script.)
What's this and how do I do it?

Code:
$ sha256sum path/to/your/gamblin/script

Cheers
11  Bitcoin / Bitcoin Discussion / reasonable block explorer on: January 17, 2018, 04:28:39 AM
So, it seems like blockexplorer.com has lost their minds, and I don't really want to support that.  blockchain.info is a classic fuck-up, not really into them either.  Is there a reasonable block-explorer on the web that I can use with confidence?
12  Economy / Services / Re: [CLOSED] | 🔥 Bitmixer.io Signature Campaign 🔥 | on: July 24, 2017, 06:00:19 AM
I received (what turns out to be my final) payment today.  Just FYI for those waiting to see how cleanly this wraps up.

Sorry to see a reputable service go away.  Wish them luck.
13  Bitcoin / Bitcoin Discussion / Re: [UPDATED]: The OFFICIAL SegWit2x Lock-in Thread on: July 23, 2017, 06:35:41 AM
Has anyone been following the core devs' response to this?  Are we gonna see btc1 merged into core?  Do I need to update my node (I think not, since I'm not mining with it)?
14  Economy / Services / Re: [OPEN] | 🔥 Bitmixer.io Signature Campaign 🔥 | Earn up to 0.035 BTC/week on: July 22, 2017, 11:18:23 PM
Quote
Just got a reply from the bitmixer support team about when they are going to solve this matter.

Quote
Hi,
Unfortunately there are no ETA at the moment.

Best Regards,
Bitmixer Support


That's pretty unfortunate.  The fix is as easy as funding their bot.  It really shouldn't take days to resolve this once it's come to their attention.  This makes me wonder whether this is a service I should be advertising for.
15  Bitcoin / Bitcoin Discussion / Re: About the "Unknown block versions being mined!" warning on: July 22, 2017, 06:38:21 AM

Quote
  • BIP 9 ("version bits") is a standard for proposing Bitcoin upgrades or "deployments".  Code for this is included in Bitcoin Core.
  • BIP 141, together with BIP 143 and BIP 147, ("SegWit") is a deployment which follows the BIP 9 standard.  Code for this is included in Bitcoin Core.
  • BIP 91 ("SegSignal") is a deployment which uses the BIP 9 machinery while not strictly following the standard.  Code for this is not included in Bitcoin Core but is included in a fork of Bitcoin Core called "btc1".

By modifying the version field in the block header, miners can "signal" their support for any combination of deployments.  Bit 1 of the version field corresponds to SegWit and bit 4 corresponds to SegSignal.

https://blockchain.info/charts/bip-9-segwit charts the signalling of SegWit (bit 1) where each datapoint is in fact the average signalling rate of the prior 2016 blocks (~ 2 weeks).  However, as you rightly observe, there has been a recent increase in SegWit signalling.  Of the last 144 blocks (~ 1 day), 131 have signaled for SegWit (~ 91%).  If the 2016-block moving average chart is at 95% or higher at a difficulty change then SegWit will "lock in".
That's really clear, junan1, thanks.  I appreciate "SegSignal" as a name instead of numbers, which are easier to mix up.
Quote
SegSignal is already locked in and all nodes following SegSignal will consider invalid any blocks at height 477120 or greater (~ 2017-07-23, 8:00am UTC).  Note again that Bitcoin Core does not include code for this deployment (hence the warnings) and so Bitcoin Core will not reject blocks as SegSignal requires.
I understand that it's also a theoretical possibility that those who are currently sending SegSignal may not actually follow through and orphan non-segwit blocks.  But I understand that we expect them to do so.
16  Economy / Service Discussion / Re: Overview of Bitcointalk Signature-Ad Campaigns [Last update: 18-Jul-2017] on: July 21, 2017, 07:05:53 PM
https://bitcointalk.org/index.php?topic=1657397.msg20275748#msg20275748

It may be worth adding the star to the bitmixer.io campaign.  I have confidence they'll sort this out, but as of the last few days, people aren't getting paid.
17  Bitcoin / Bitcoin Discussion / Re: About the "Unknown block versions being mined!" warning on: July 21, 2017, 07:03:49 PM

I guess the confusion here comes from the fact that you can signal BIP9 without signaling SegWit.  BIP9 signals on bit 4, and we definitely have something like 80% of blocks in the last day or so signaling on bit 4.  SegWit readiness is signaled on bit 1, and I think that's what bc.i is tracking there.  As far as I know, in a couple hundred more blocks, the miners who signaled BIP9 are supposed to start rejecting blocks that don't signal SegWit on bit 1.  We'll have to see whether all the folks who signaled BIP9 start setting bit 1, and whether or not those who signaled BIP9 (bit 4) actually do start orphaning blocks that don't signal bit 1.

Disclaimer: this topic is confusing, I may be wrong.


EDIT: see next post by achow101: everywhere above that I said BIP9, I should have said BIP91.
 
18  Bitcoin / Development & Technical Discussion / Re: UTXO amount decrease on: July 20, 2017, 07:59:18 PM
Right, OP seems to be confusing the BTC value of unspent outputs with the count of unspent outputs.

OP: the UTXO db is there to keep track of spendable outputs, irrespective of their combined value.  I think your intuition is correct that the total value in the UTXO set should be increasing over time because of the block reward (it is possible to "burn" bitcoins by sending them to unusable addresses, but I think you're right about the trend).

However, you have to consider the way transactions actually work (in the simple case): they assign a number of satoshis to a script which specifies spending conditions.  In the case of sending N satoshis to address A, what you're doing is creating a script which says that a good signature from the private key corresponding to address A is good enough to spend those N satoshis.  These N satoshis then sit in an unspent output waiting for someone with the private key for address A to send them elsewhere.  Now imagine someone else sends M satoshis to address A in the same fashion: there are now N+M spendable satoshis for the person with the private key for address A, but there are two outputs in the UTXO db that represent them (the two transactions which funded address A).  If the person with address A now sends those N+M satoshis to address B, we've got one less unspent output in the UTXO set.  The value N+M is still there, but it's now stored in a single unspent output instead of two.  Here's a toy sketch of that bookkeeping:
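
(Illustrative only; real outputs are locked by scripts rather than address labels, but the counting works the same way.)

Code:
# Toy UTXO set: two outputs fund address A (N=5 and M=3 satoshis)
utxos = {("tx1", 0): ("A", 5), ("tx2", 0): ("A", 3)}
print(len(utxos), sum(v for _, v in utxos.values()))   # 2 outputs, 8 satoshis

# A spends both outputs to B: two entries consumed, one created
del utxos[("tx1", 0)]
del utxos[("tx2", 0)]
utxos[("tx3", 0)] = ("B", 8)
print(len(utxos), sum(v for _, v in utxos.values()))   # 1 output, still 8 satoshis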

You can find out more about this stuff by looking at raw transactions and how they work.  You'll see that when your wallet sends a transaction to spend X, that transaction might have any number of inputs and may have more than one output as well (change addresses, etc).

Have fun!
19  Bitcoin / Bitcoin Discussion / Re: About the "Unknown block versions being mined!" warning on: July 20, 2017, 04:28:06 PM
Anyone know what that small percentage of blocks signaling with a 30000000 is about?  I saw one or two blocks with that version number yesterday and I couldn't find any BIPs or docs referencing that number.
20  Bitcoin / Bitcoin Discussion / Re: About the "Unknown block versions being mined!" warning on: July 19, 2017, 07:01:13 PM
Quote
I see this error as well - that is why I'm on this thread, reading through it ....

I'm trying to reconcile this thread with Coinbase's recent email.  Are we talking about here the User Activated Soft Fork (UASF) that Coinbase is referring to?

Coinbase also refers to User Activated Hard Fork (UAHF) that is scheduled for Aug, 1st.  They are not supporting this fork.

This error is related to the UASF - which does not require a software upgrade on wallets of core 14+ (or some 13s versions), and Coinbase is supporting?

Did this coinbase message go out since we started seeing the BIP 91 signaling?  What is coinbase saying in their message?  Some of us on the thread aren't coinbase customers.