Bounty hunters in BitcoinTalk signature campaigns are usually required to write a certain number of posts per week, and they are credited with stakes for this activity. Sometimes unscrupulous users copy messages of other members, or paragraphs from external articles on the Internet, and post them here on the forum. Such posts can easily be compared and tracked by SEO services, so these bounty hunters began using homographs to complicate detection.
Put simply, homographs are characters in the international Unicode table that look the same visually. The English alphabet needs only ASCII characters. If homographs from different languages are mixed into a text, a human reader will not notice any difference, but analyzing systems will not be able to detect plagiarism by simply comparing texts encoded in UTF-8, because the underlying bytes differ.
For example:
- "SEO". Only ASCII characters are used here, with no homographs. The word length in UTF-8 is 3 bytes.
- "ЅΕО". Here the first symbol "Ѕ" is taken from the Macedonian alphabet, the second symbol "Ε" from the Greek alphabet, and the third symbol "О" from the Russian alphabet. These non-English letters look the same as the ASCII characters, but each of them is encoded with two bytes, so the word length in UTF-8 is 6 bytes.
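The byte counts in the two examples above can be checked with a short Python sketch. The variable names and the helper `utf8_len` are my own, chosen only for illustration. The look-alike word is built from explicit code points (U+0405, U+0395, U+041E) so that the result does not depend on how the characters are rendered or copied.

```python
# Plain ASCII spelling: every letter is one byte in UTF-8.
ascii_word = "SEO"

# Look-alike spelling built from explicit code points (illustrative choice):
#   U+0405 CYRILLIC CAPITAL LETTER DZE (used in Macedonian)
#   U+0395 GREEK CAPITAL LETTER EPSILON
#   U+041E CYRILLIC CAPITAL LETTER O
spoofed_word = "\u0405\u0395\u041e"


def utf8_len(text: str) -> int:
    """Return the number of bytes the text occupies when encoded in UTF-8."""
    return len(text.encode("utf-8"))


print(utf8_len(ascii_word))        # 3
print(utf8_len(spoofed_word))      # 6
print(ascii_word == spoofed_word)  # False: a plain string comparison fails
```

Both words have three characters, but a direct comparison of the strings (or of their bytes) treats them as completely different texts.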
In this way, some members write posts on the forum by simply copying other people's texts and modifying them with homographs. Therefore I decided to create a full list of homographs that can be used in English texts.
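Even without a full list, a rough first check is possible: in a text that is supposed to be English, any letter outside the ASCII range is worth a look. The sketch below is only one possible approach, not the method used by any SEO service, and the function name `find_suspicious_chars` is hypothetical. It relies on the standard `unicodedata` module to print the official Unicode name of each such letter.

```python
import unicodedata


def find_suspicious_chars(text: str):
    """Return (index, character, Unicode name) for every non-ASCII letter.

    Assumption: the text is expected to be plain English, so any letter
    with a code point above 127 is treated as a possible homograph.
    """
    hits = []
    for index, ch in enumerate(text):
        if ord(ch) > 127 and ch.isalpha():
            hits.append((index, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# The "E" in this sample is U+0395, a Greek letter, written as an escape.
sample = "S\u0395O services"
for index, ch, name in find_suspicious_chars(sample):
    print(index, repr(ch), name)
```

This check also flags legitimate non-ASCII letters, for example the "é" in "café", so a hit only marks a place to inspect and is not proof of plagiarism.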
According to the HTML code, the forum uses the following CSS style:
style="font-family: Verdana, Arial, sans-serif;"
Thus, messages use three fonts: "Verdana", "Arial" and "Sans Serif" (in CSS, "sans-serif" is a generic family, so the browser substitutes its default sans-serif font). Also, "Courier New" is used for monospace text.
The table shows the ASCII characters with their homographs next to them, written in all four of these fonts. See my next post below.