1. Introduction
I’ve often seen “size doesn’t matter” versus “size does matter” come up when talking about merited posts. I wondered whether this was true, and to what degree (in this context). While at it, I came across other general features of merited posts that answered an initial set of questions I wanted to resolve (limited to sMerited posts):
Q1) How often are user images included in posts?
Q2) What about forum images (e.g. smilies)?
Q3) How about quotes?
Q4) Is the thread’s OP the main merited post in general?
Q5) How many posts from before the Merit System kick-off have been merited, and how far back do they go?
Q6) How soon after a post is published does sMerit awarding occur?
Q7) The size one: what size are merited posts? Are longer posts more sMerited?
The information presented here is taken from a global, forum-wide view of merited posts. I’m sure that if we took this analysis down a level (forum section/subsection), the profiles would not all agree, but a general view is a good starting point, at least for now.
Dataset baseline: data as of around 18/05/2018.
Total Merited Post Base: 45.947 (non-deleted messages).
I’ve decided to show the graphs and drop the data tables this time, for a more fluid read. I’ve also included extreme cases. These are, as they say, extreme cases, and should not take the focus away from the core information shown on the graphs (they are quirky, though...).
Disclaimer: Getting hold of this information is unfortunately a pain, and breaking it down further is even harder because of the HTML tags and especially the quotes (whether nested or standalone). After some time, I believe I’ve managed to extract the text from the awarded messages pretty well, excluding the quoted parts. I consider quoted text part of a message’s context, but it does not add “real length” to the message body, so I exclude it from the message length.
The algorithm still has some flaws when a post has many objects of different kinds (quotes, tables, code, images, etc.). I’m not going to build a 100% robust parser now, since I consider that even a 95% correct cleanup of the post for word counting is good enough at this stage.
Some local languages are more troublesome. For example, Chinese text often has few spaces between words, so the word count can be erratic there. I have not excluded these posts, as there are not many of them and they represent only a small amount of noise in the overall picture.
Also, each URL is counted as a word.
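As an illustration of this kind of cleanup, here is a minimal Python sketch. It is not the extraction code used for these figures. It drops quoted text and counts whitespace-separated words. It assumes the forum renders quotes as `<div class="quoteheader">` / `<div class="quote">` blocks; those class names, and the names `BodyTextExtractor` and `word_count`, are assumptions for illustration. It shares the flaws described above: a URL counts as one word, and text without spaces (e.g. Chinese) is undercounted.

```python
from html.parser import HTMLParser

# Assumption: CSS classes of the quote blocks in the rendered post HTML.
QUOTE_CLASSES = {"quote", "quoteheader"}


class BodyTextExtractor(HTMLParser):
    """Collects the text of a post, skipping anything inside a quote block."""

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one entry per open <div>: True if it is a quote block
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.div_stack.append(any(c in QUOTE_CLASSES for c in classes))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing <div> is a quote (this covers nesting).
        if not any(self.div_stack):
            self.chunks.append(data)


def word_count(post_html):
    parser = BodyTextExtractor()
    parser.feed(post_html)
    parser.close()
    # Text nodes are joined with spaces, so "<b>bit</b>coin" counts as two words;
    # a URL without spaces counts as a single word.
    return len(" ".join(parser.chunks).split())
```

Nested quotes are handled by the `div` stack: any text sitting under at least one quote `div` is skipped, however deep it is.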
So, all in all, this is a rather good approximation, but not a 100% exact one.
2. User Images
It turns out that the vast majority of awarded posts (88,58%) do not have user images. 6,61% have only one, 1,43% have 2 images, and 3,38% have 3 or more images.
Note that I’m counting images here; I cannot (nor do I wish to) retrieve information about the actual image size. Some are large images with graphs and text, whilst others are mere icons.
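The percentages above come from grouping per-post image counts into the buckets 0, 1, 2 and 3+. Here is a minimal Python sketch of that grouping. It assumes you already have one image count per post. The function names are hypothetical, and this is not necessarily how the chart was built.

```python
from collections import Counter


def image_bucket(n_images):
    """Map a per-post image count to the groups used above: 0, 1, 2 or 3+."""
    return str(n_images) if n_images < 3 else "3+"


def bucket_shares(image_counts):
    """Percentage of posts in each bucket, rounded to two decimals."""
    buckets = Counter(image_bucket(n) for n in image_counts)
    total = len(image_counts)
    return {b: round(100 * buckets[b] / total, 2) for b in ("0", "1", "2", "3+")}
```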
Examples of the extreme cases are (most of the time the images do not all load even after refreshing the page):
a) 170 images - A post from August 2015 (Services) awarded with 5 sMerits:
Case 1
b) 131 images - A post from September 2017 (Services) awarded with 27 sMerits:
Case 2
c) 118 images - A post from March 2018 (Mining - Hardware) awarded with 6 sMerits:
Case 3
d) 108 images - A post from October 2017 (Services) awarded with 9 sMerits:
Case 4
e) 106 images - A post from September 2015 (Mining Support – the small dog icons are counted) awarded with 1 sMerit:
Case 5
2. Forum Images
Forum images are icons, such as emoticons, that reference an address on the forum rather than an external link.
79,17% of awarded posts do not use any forum images, 13,53% use one, and 7,31% use two or more. Emoticons are therefore not abused and seem to be kept at bay.
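Forum images can be told apart from user images by each image's source address. Here is a minimal Python sketch. It assumes forum images are served from the forum's own host; the host set `FORUM_HOSTS` and the names below are assumptions for illustration. Hosts are compared exactly, so any subdomain counts as external.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: forum images (smilies etc.) are served from the forum's own host.
FORUM_HOSTS = {"bitcointalk.org", "www.bitcointalk.org"}


class ImageCounter(HTMLParser):
    """Counts <img> tags in a post, split into user images and forum images."""

    def __init__(self):
        super().__init__()
        self.user_images = 0
        self.forum_images = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if not src:
            return  # no address, nothing to classify
        host = urlsplit(src).hostname  # None for relative addresses
        if host is None or host in FORUM_HOSTS:
            self.forum_images += 1
        else:
            self.user_images += 1


def count_images(post_html):
    """Return (user_images, forum_images) for one post."""
    counter = ImageCounter()
    counter.feed(post_html)
    counter.close()
    return counter.user_images, counter.forum_images
```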
Examples of the extreme cases are:
a) 83 forum images - A post from April 2018 (Economics Speculation) awarded with 5 sMerits:
Case 6 (I’ll skip a few now, since the same author has the top 8 cases in the same forum area).
b) 30 forum images - A post from February 2018 (Altcoin Discussion) awarded with 1 sMerit:
Case 7 (It looks like there are more than 30, but some are ASCII characters.)
c) 27 forum images - A post from January 2018 (Economics Speculation) awarded with 1 sMerit:
Case 8
d) 24 forum images - A post from February 2018 (Italian Trading) awarded with 1 sMerit:
Case 9
e) 21 forum images - A post from April 2018 (Spanish) awarded with 2 sMerits:
Case 10
3. Quotes
Quotes, on the other hand, are a heavily used feature: 54,59% of awarded posts do not use quotes, but the remaining 45,41% do, and 3,1% use 5 or more quotes.
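Nesting makes quote counts tricky, because every nested quote is a quote block of its own. Here is a minimal Python sketch that counts all quote blocks, nested ones included, and records the deepest nesting level. It assumes quote bodies are rendered as `<div class="quote">`, which is an assumption about the markup, and it is not necessarily how the counts above were produced.

```python
from html.parser import HTMLParser


class QuoteCounter(HTMLParser):
    """Counts quote blocks (nested ones included) and the deepest nesting level."""

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one entry per open <div>: True if it is a quote block
        self.quotes = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # Assumed markup: the quote body is <div class="quote">; a "quoteheader"
        # div does not match because class names are compared exactly.
        is_quote = "quote" in (dict(attrs).get("class") or "").split()
        self.div_stack.append(is_quote)
        if is_quote:
            self.quotes += 1
            self.max_depth = max(self.max_depth, sum(self.div_stack))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def count_quotes(post_html):
    """Return (number of quote blocks, deepest nesting level) for one post."""
    counter = QuoteCounter()
    counter.feed(post_html)
    counter.close()
    return counter.quotes, counter.max_depth
```

When every nested level is counted, a single long reply chain can produce very high counts, as in some of the cases below.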
Examples of the extreme cases are:
a) 290 quotes! - A post from April 2014 (Altcoin Discussion) awarded with 7 sMerits:
Case 11
b) 84 quotes - A post from April 2018 (Meta) awarded with 9 sMerits:
Case 12
c) 55 quotes - A post from February 2018 (Marketplace Gambling) awarded with 2 sMerits (heavily nested quotes):
Case 13
d) 52 quotes - A post from May 2018 (Meta) awarded with 2 sMerits (by me!):
Case 14
e) 50 quotes - A post from March 2018 (Indonesian) awarded with 2 sMerits:
Case 15
4. Post Number
This one really startled me: 32,58% of merited posts are in mega threads (which I tend to ignore altogether), or 40,31% if we count every post from position 201 onwards. Wow! This happens especially in the Ann sections and in Economy (The Wall Observer is the extreme case).
15% of awarded posts are OPs, but if we count every post up to number 20 (the first page of a thread), we get 40,21% of awarded posts. The graph actually does look like a crypto wall.
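The position groups used in this section can be expressed as a simple bucketing function. Here is a minimal Python sketch. It assumes post positions are 1-based with the OP at position 1 and 20 posts per page, as described above. The function and group names are hypothetical.

```python
from collections import Counter

POSTS_PER_PAGE = 20  # the first page of a thread, as described above


def position_bucket(position):
    """Group a 1-based post position (the OP is position 1)."""
    if position == 1:
        return "OP"
    if position <= POSTS_PER_PAGE:
        return "rest of first page"
    if position <= 200:
        return "21-200"
    return "201+"


def position_shares(positions):
    """Percentage of awarded posts in each position group."""
    counts = Counter(position_bucket(p) for p in positions)
    total = len(positions)
    groups = ("OP", "rest of first page", "21-200", "201+")
    return {g: round(100 * counts[g] / total, 2) for g in groups}
```

The first-page share is the sum of the "OP" and "rest of first page" groups.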
Examples of the extreme cases are:
a) Post Nº 409230 - A post from May 2018 (Economics Speculation) awarded with 2 sMerits:
Case 16 (There are a trillion in the same section/subsection).
b) Post Nº 10049 - A post from May 2018 (Ann Altcoin) awarded with 40 sMerits:
Case 17
c) Post Nº 10018 - A post from January 2018 (Russian) awarded with 47 sMerits:
Case 18
d) Post Nº 9324 - A post from April 2018 (Ann Altcoin) awarded with 50 sMerits:
Case 19
e) Post Nº 8912 - A post from January 2018 (Ann Altcoin) awarded with 50 sMerits:
Case 20
5. Post Date
I thought there would be many more sMerited posts from before the Merit System kick-off (since I had seen many cases when performing previous analytical tasks), but there really are not that many. Considering that the system started in late February 2018, getting sMerit on posts going back to January 2018 is pretty normal.
All in all, 93,89% of awarded posts are from 2018, 4,52% are from 2017, and only 1,59% are from 2016 or earlier. Proportionally, old awarded posts are outliers in the overall scheme of things.
Examples of the extreme cases are:
a) A post from November 2009 (Bitcoin Discussion – Satoshi’s welcome post) awarded with 751 sMerits:
Case 21 (The 5 oldest awarded posts are all Satoshi’s.)
b) A post from January 2010 (Economy - Marketplace) awarded with 1 sMerit:
Case 22
c) A post from January 2010 (Economy - Marketplace) awarded with 1 sMerit:
Case 23
d) A post from January 2010 (Economy - Marketplace) awarded with 1 sMerit:
Case 24
e) A post from May 2010 (Economy – Marketplace -> Pizza case) awarded with 132 sMerits:
Case 25
6. Time between Publishing and Meriting
I really wanted to see this one. It seems that 13,73% are merited within the first hour after posting, and another 10,05% within the second hour.
On a day scale, 56,50% of sMerit awarding occurs within the first 24 hours after posting, and an additional 20,47% is awarded before the post reaches the tender age of one week. Even so, 23,03% is awarded two weeks or more after the post was published.
Note: time should be read as “within the” (within the first hour, within the second hour, and so on). Also, the data represents the number of Merit Txs, not the number of posts.
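The "within the Nth hour" reading in the note above corresponds to floor-dividing the elapsed time by one hour and adding 1. Here is a minimal Python sketch of that bucketing, applied per Merit transaction. The function names and the coarser day groups (first 24 hours, rest of the first week, later) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def hour_bucket(posted, merited):
    """1 = within the first hour after posting, 2 = within the second hour, ..."""
    return (merited - posted) // timedelta(hours=1) + 1


def day_bucket(posted, merited):
    """Coarser grouping: first 24 hours, rest of the first week, or later."""
    elapsed = merited - posted
    if elapsed < timedelta(days=1):
        return "first 24 hours"
    if elapsed < timedelta(weeks=1):
        return "rest of first week"
    return "later"


# Usage with hypothetical timestamps; each Merit Tx is bucketed on its own.
posted = datetime(2018, 3, 1, 12, 0)
print(hour_bucket(posted, posted + timedelta(minutes=90)))  # 2
print(day_bucket(posted, posted + timedelta(days=3)))  # rest of first week
```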
Examples of the extreme cases are:
The first ones are all Satoshi’s posts, as seen above, so I’ll skip them in these examples.
a) 71222 hours: A post from March 2010 (Economy Marketplace -> must see: 10k bitcoins for $50 and no one bought them), awarded with 2 sMerits:
Case 26
b) 70708 hours: A post from February 2010 (Economy Marketplace), awarded with 1 sMerit:
Case 27
c) 69941 hours: A post from February 2010 (Economics), awarded with 2 sMerits:
Case 28
d) 59227 hours: A post from July 2011 (Bitcoin Development and Technical Discussion), awarded with 19 sMerits:
Case 29
e) 2256 hours: A post from February 2018 (Altcoin Discussion), awarded with 105 sMerits:
Case 30
7. Post length
I’m measuring post length in words and clustering posts into groups of 100 words. As I’ve stated before, this part is not perfect: for example, URLs get counted as words, a missing space after a full stop can throw off the exact count, some HTML tags are a bother, etc. Quoted text has been removed.
On the whole, with posts grouped in blocks of 100 words, the data is pretty accurate and far better than no data at all.
It turns out that 65,07% of sMerited posts have fewer than 100 words, another 18,41% have between 100 and 200 words, 6,45% between 200 and 300 words, 3,24% between 300 and 400 words, and only 6,82% are above the 400-word barrier (roughly one MS Word page in size).
I was also interested to see whether longer posts get more merit, and it seems so. Looking at the graphs, there’s hardly any difference between posts with fewer than 100 words (avg. 2,79 sMerits) and posts with 100 to 199 words (avg. 2,76 sMerits), but it does build up from there. The larger the word group, the fewer posts it contains, so the less conclusive its average sMerits become.
Nevertheless, the conclusion is not “go and create larger posts”, since the content is what makes the difference in these cases and not the post size per se (and content analysis is another world).
Note: ‘Words’ should be interpreted as within the group of x hundred words (so on the graph, ‘0’ represents between 0 and 99 words, a ‘1’ between 100 and 199 words, and so on).
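The word groups and per-group averages described above can be sketched in a few lines of Python. This assumes each post contributes one (word count, sMerits received) pair, and the function names are hypothetical. It returns the number of posts next to each average, which makes the thinning data in the larger groups visible.

```python
from collections import defaultdict


def word_group(words):
    """0 = 0-99 words, 1 = 100-199 words, and so on (as on the graph)."""
    return words // 100


def average_smerits_by_group(posts):
    """posts: iterable of (word_count, smerits_received) pairs, one per post.

    Returns {group: (average sMerits per post, number of posts)}; the post count
    shows how much (or how little) data each average rests on.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for words, smerits in posts:
        group = word_group(words)
        totals[group] += smerits
        counts[group] += 1
    return {g: (round(totals[g] / counts[g], 2), counts[g]) for g in sorted(counts)}
```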
Examples of the extreme cases are (MS Word and my algorithm don’t always agree on word count, due to the issues pointed out before):
a) 10.159 words: A post from December 2014 (Altcoin Discussion), awarded with 20 sMerits:
Case 31
b) 8.204 words: A post from March 2018 (Bitcoin Discussion), awarded with 6 sMerits:
Case 32
c) 1 word: A post from February 2018 (Economics Speculation), awarded with 25 sMerits (for a full stop -> probably deleted text):
Case 33
d) 1 word: A post from April 2018 (French), awarded with 50 sMerits (a crypto address):
Case 34
e) 0 words: A post from March 2018 (Russian), awarded with 1 sMerit (quotes only):
Case 35
f) 0 words: A post from February 2018 (Economics Speculation), awarded with 14 sMerits (image):
Case 36