Bitcoin Forum

Other => Meta => Topic started by: TheBeardedBaby on August 27, 2018, 08:41:23 PM



Title: [FIXED] Homographs are fixed. Thank you again, theymos. See my report.
Post by: TheBeardedBaby on August 27, 2018, 08:41:23 PM
The problem with the homographs is finally solved.

Done. I only did the ones that look really similar to Latin characters, and it only applies to English sections. It's done at display time, so it's retroactive.

I've tested it and it's better than ever. See my conclusion here > https://bitcointalk.org/index.php?topic=4967143.msg44859677#msg44859677
It won't visibly affect legitimate Cyrillic posts outside the local sections; the only difference is that if you copy or quote text from such a post, the replaced letters come out in Latin. See the example in the conclusion.


So the homographs are back, and they are more active than before.
In the last 24 hours alone, I found 82 cases of homograph attacks.

Homographs from the last 24 hours:

Had a look through the whitepaper and have to say it looks great. It explained a few things around masternodes as well.
~
... and many more, up to 82 cases

What we need to do is make one simple list of all the characters, and then theymos will fix it.

~
BTW, the main blocker for me taking action was that I never got around to compiling the table of homographic characters and their ASCII counterparts. If this crops up again, it'd be helpful if someone would compile a nice plaintext "<char_from> -> <char_to>" table.

Does anyone have the time to do it?
I'm on mobile and it's awful.



[UPDATE] We have some finished lists below; now we just need to decide which one to use.

I kind of liked the homographs: they make plagiarism pretty easy to spot. Maybe it would be a good idea to just color them red, or put a dot after each homograph, so we can see them easily.
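The red-colour idea could look roughly like this. It is a minimal sketch, not a forum feature: `highlight_homographs` is a hypothetical helper, it assumes only the main Cyrillic block (U+0400 to U+04FF) matters, and it uses SMF's `[color=red]` BBCode tag for the marking.

```python
import re

# Every code point in the main Cyrillic block, U+0400..U+04FF
# (assumption: Greek, Armenian and other lookalike scripts are ignored here).
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def highlight_homographs(text: str) -> str:
    """Wrap each Cyrillic character in [color=red]...[/color] BBCode."""
    return CYRILLIC.sub(lambda m: "[color=red]" + m.group(0) + "[/color]", text)


# The first letter is CYRILLIC SMALL LETTER A (U+0430); the rest is Latin.
print(highlight_homographs("\u0430 very good project"))
```

Putting a dot after each homograph instead would only need a different replacement string, such as `m.group(0) + "."`.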


Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: mdayonliner on August 27, 2018, 08:49:59 PM
Does anyone have the time to do it?
I'm on mobile and it's awful.
Give me some resources to start with (URLs, keywords and so on). I have time to make the list, as long as it doesn't take more than a day or two.


Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: TheBeardedBaby on August 27, 2018, 08:55:38 PM
See here:

http://sites.psu.edu/symbolcodes/languages/europe/cyrillic/cyrillicchart/

I guess other alphabets could be used too, but Cyrillic is what is mainly used in the homograph attacks here.

There are more resources and info in the quoted thread:

Does someone have a table of these characters? I can automatically convert non-standard characters to ASCII.

This is the one I use:
http://sites.psu.edu/symbolcodes/languages/europe/cyrillic/cyrillicchart/
~
There are more characters than just the main Cyrillic ones (see the link I posted), for example:
CYRILLIC CAPITAL LETTER DZE   Ѕ   &#1029;   &#x0405;




Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: mdayonliner on August 27, 2018, 09:17:33 PM
Reference: http://jrgraphix.net/r/Unicode/0400-04FF

Version one:
Ѐ = 0400
Ё = 0401
Ђ = 0402
Ѓ = 0403
Є = 0404
Ѕ = 0405
І = 0406
Ї = 0407
Ј = 0408
Љ = 0409
Њ = 040a
Ћ = 040b
Ќ = 040c
Ѝ = 040d
Ў = 040e
Џ = 040f
А = 0410
Б = 0411
В = 0412
Г = 0413
Д = 0414
Е = 0415
Ж = 0416
З = 0417
И = 0418
Й = 0419
К = 041a
Л = 041b
М = 041c
Н = 041d
О = 041e
П = 041f
Р = 0420
С = 0421
Т = 0422
У = 0423
Ф = 0424
Х = 0425
Ц = 0426
Ч = 0427
Ш = 0428
Щ = 0429
Ъ = 042a
Ы = 042b
Ь = 042c
Э = 042d
Ю = 042e
Я = 042f
а = 0430
б = 0431
в = 0432
г = 0433
д = 0434
е = 0435
ж = 0436
з = 0437
и = 0438
й = 0439
к = 043a
л = 043b
м = 043c
н = 043d
о = 043e
п = 043f
р = 0440
с = 0441
т = 0442
у = 0443
ф = 0444
х = 0445
ц = 0446
ч = 0447
ш = 0448
щ = 0449
ъ = 044a
ы = 044b
ь = 044c
э = 044d
ю = 044e
я = 044f
ѐ = 0450
ё = 0451
ђ = 0452
ѓ = 0453
є = 0454
ѕ = 0455
і = 0456
ї = 0457
ј = 0458
љ = 0459
њ = 045a
ћ = 045b
ќ = 045c
ѝ = 045d
ў = 045e
џ = 045f
Ѡ = 0460
ѡ = 0461
Ѣ = 0462
ѣ = 0463
Ѥ = 0464
ѥ = 0465
Ѧ = 0466
ѧ = 0467
Ѩ = 0468
ѩ = 0469
Ѫ = 046a
ѫ = 046b
Ѭ = 046c
ѭ = 046d
Ѯ = 046e
ѯ = 046f
Ѱ = 0470
ѱ = 0471
Ѳ = 0472
ѳ = 0473
Ѵ = 0474
ѵ = 0475
Ѷ = 0476
ѷ = 0477
Ѹ = 0478
ѹ = 0479
Ѻ = 047a
ѻ = 047b
Ѽ = 047c
ѽ = 047d
Ѿ = 047e
ѿ = 047f
Ҁ = 0480
ҁ = 0481
҂ = 0482
҃ = 0483
҄ = 0484
҅ = 0485
҆ = 0486
҇ = 0487
҈ = 0488
҉ = 0489
Ҋ = 048a
ҋ = 048b
Ҍ = 048c
ҍ = 048d
Ҏ = 048e
ҏ = 048f
Ґ = 0490
ґ = 0491
Ғ = 0492
ғ = 0493
Ҕ = 0494
ҕ = 0495
Җ = 0496
җ = 0497
Ҙ = 0498
ҙ = 0499
Қ = 049a
қ = 049b
Ҝ = 049c
ҝ = 049d
Ҟ = 049e
ҟ = 049f
Ҡ = 04a0
ҡ = 04a1
Ң = 04a2
ң = 04a3
Ҥ = 04a4
ҥ = 04a5
Ҧ = 04a6
ҧ = 04a7
Ҩ = 04a8
ҩ = 04a9
Ҫ = 04aa
ҫ = 04ab
Ҭ = 04ac
ҭ = 04ad
Ү = 04ae
ү = 04af
Ұ = 04b0
ұ = 04b1
Ҳ = 04b2
ҳ = 04b3
Ҵ = 04b4
ҵ = 04b5
Ҷ = 04b6
ҷ = 04b7
Ҹ = 04b8
ҹ = 04b9
Һ = 04ba
һ = 04bb
Ҽ = 04bc
ҽ = 04bd
Ҿ = 04be
ҿ = 04bf
Ӏ = 04c0
Ӂ = 04c1
ӂ = 04c2
Ӄ = 04c3
ӄ = 04c4
Ӆ = 04c5
ӆ = 04c6
Ӈ = 04c7
ӈ = 04c8
Ӊ = 04c9
ӊ = 04ca
Ӌ = 04cb
ӌ = 04cc
Ӎ = 04cd
ӎ = 04ce
ӏ = 04cf
Ӑ = 04d0
ӑ = 04d1
Ӓ = 04d2
ӓ = 04d3
Ӕ = 04d4
ӕ = 04d5
Ӗ = 04d6
ӗ = 04d7
Ә = 04d8
ә = 04d9
Ӛ = 04da
ӛ = 04db
Ӝ = 04dc
ӝ = 04dd
Ӟ = 04de
ӟ = 04df
Ӡ = 04e0
ӡ = 04e1
Ӣ = 04e2
ӣ = 04e3
Ӥ = 04e4
ӥ = 04e5
Ӧ = 04e6
ӧ = 04e7
Ө = 04e8
ө = 04e9
Ӫ = 04ea
ӫ = 04eb
Ӭ = 04ec
ӭ = 04ed
Ӯ = 04ee
ӯ = 04ef
Ӱ = 04f0
ӱ = 04f1
Ӳ = 04f2
ӳ = 04f3
Ӵ = 04f4
ӵ = 04f5
Ӷ = 04f6
ӷ = 04f7
Ӹ = 04f8
ӹ = 04f9
Ӻ = 04fa
ӻ = 04fb
Ӽ = 04fc
ӽ = 04fd
Ӿ = 04fe
ӿ = 04ff

without the equal sign and the new line symbol:
Code:
Ѐ 0400
Ё 0401
Ђ 0402
Ѓ 0403
Є 0404
Ѕ 0405
І 0406
Ї 0407
Ј 0408
Љ 0409
Њ 040a
Ћ 040b
Ќ 040c
Ѝ 040d
Ў 040e
Џ 040f
А 0410
Б 0411
В 0412
Г 0413
Д 0414
Е 0415
Ж 0416
З 0417
И 0418
Й 0419
К 041a
Л 041b
М 041c
Н 041d
О 041e
П 041f
Р 0420
С 0421
Т 0422
У 0423
Ф 0424
Х 0425
Ц 0426
Ч 0427
Ш 0428
Щ 0429
Ъ 042a
Ы 042b
Ь 042c
Э 042d
Ю 042e
Я 042f
а 0430
б 0431
в 0432
г 0433
д 0434
е 0435
ж 0436
з 0437
и 0438
й 0439
к 043a
л 043b
м 043c
н 043d
о 043e
п 043f
р 0440
с 0441
т 0442
у 0443
ф 0444
х 0445
ц 0446
ч 0447
ш 0448
щ 0449
ъ 044a
ы 044b
ь 044c
э 044d
ю 044e
я 044f
ѐ 0450
ё 0451
ђ 0452
ѓ 0453
є 0454
ѕ 0455
і 0456
ї 0457
ј 0458
љ 0459
њ 045a
ћ 045b
ќ 045c
ѝ 045d
ў 045e
џ 045f
Ѡ 0460
ѡ 0461
Ѣ 0462
ѣ 0463
Ѥ 0464
ѥ 0465
Ѧ 0466
ѧ 0467
Ѩ 0468
ѩ 0469
Ѫ 046a
ѫ 046b
Ѭ 046c
ѭ 046d
Ѯ 046e
ѯ 046f
Ѱ 0470
ѱ 0471
Ѳ 0472
ѳ 0473
Ѵ 0474
ѵ 0475
Ѷ 0476
ѷ 0477
Ѹ 0478
ѹ 0479
Ѻ 047a
ѻ 047b
Ѽ 047c
ѽ 047d
Ѿ 047e
ѿ 047f
Ҁ 0480
ҁ 0481
҂ 0482
҃ 0483
҄ 0484
҅ 0485
҆ 0486
҇ 0487
҈ 0488
҉ 0489
Ҋ 048a
ҋ 048b
Ҍ 048c
ҍ 048d
Ҏ 048e
ҏ 048f
Ґ 0490
ґ 0491
Ғ 0492
ғ 0493
Ҕ 0494
ҕ 0495
Җ 0496
җ 0497
Ҙ 0498
ҙ 0499
Қ 049a
қ 049b
Ҝ 049c
ҝ 049d
Ҟ 049e
ҟ 049f
Ҡ 04a0
ҡ 04a1
Ң 04a2
ң 04a3
Ҥ 04a4
ҥ 04a5
Ҧ 04a6
ҧ 04a7
Ҩ 04a8
ҩ 04a9
Ҫ 04aa
ҫ 04ab
Ҭ 04ac
ҭ 04ad
Ү 04ae
ү 04af
Ұ 04b0
ұ 04b1
Ҳ 04b2
ҳ 04b3
Ҵ 04b4
ҵ 04b5
Ҷ 04b6
ҷ 04b7
Ҹ 04b8
ҹ 04b9
Һ 04ba
һ 04bb
Ҽ 04bc
ҽ 04bd
Ҿ 04be
ҿ 04bf
Ӏ 04c0
Ӂ 04c1
ӂ 04c2
Ӄ 04c3
ӄ 04c4
Ӆ 04c5
ӆ 04c6
Ӈ 04c7
ӈ 04c8
Ӊ 04c9
ӊ 04ca
Ӌ 04cb
ӌ 04cc
Ӎ 04cd
ӎ 04ce
ӏ 04cf
Ӑ 04d0
ӑ 04d1
Ӓ 04d2
ӓ 04d3
Ӕ 04d4
ӕ 04d5
Ӗ 04d6
ӗ 04d7
Ә 04d8
ә 04d9
Ӛ 04da
ӛ 04db
Ӝ 04dc
ӝ 04dd
Ӟ 04de
ӟ 04df
Ӡ 04e0
ӡ 04e1
Ӣ 04e2
ӣ 04e3
Ӥ 04e4
ӥ 04e5
Ӧ 04e6
ӧ 04e7
Ө 04e8
ө 04e9
Ӫ 04ea
ӫ 04eb
Ӭ 04ec
ӭ 04ed
Ӯ 04ee
ӯ 04ef
Ӱ 04f0
ӱ 04f1
Ӳ 04f2
ӳ 04f3
Ӵ 04f4
ӵ 04f5
Ӷ 04f6
ӷ 04f7
Ӹ 04f8
ӹ 04f9
Ӻ 04fa
ӻ 04fb
Ӽ 04fc
ӽ 04fd
Ӿ 04fe
ӿ 04ff
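Rather than copying each row by hand, a listing in the "Version one" format can be regenerated from the code points themselves. A minimal sketch, assuming the same range as the jrgraphix reference (U+0400 to U+04FF) and the lowercase four-digit hex used above; `cyrillic_table` is a made-up name.

```python
def cyrillic_table(start: int = 0x0400, end: int = 0x04FF) -> list[str]:
    """Return one 'char = hex' line per code point from start to end inclusive."""
    return [f"{chr(cp)} = {cp:04x}" for cp in range(start, end + 1)]


rows = cyrillic_table()
# The "without the equal sign" variant is the same rows with " = " swapped for " ".
plain = [row.replace(" = ", " ") for row in rows]

print(len(rows))  # 256
print(rows[5])    # the CYRILLIC CAPITAL LETTER DZE row, code 0405
```

Note that rows U+0483 to U+0489 are combining marks, so they render attached to the following space rather than as standalone glyphs.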


Version two:

&#1024; = 0400
&#1025; = 0401
&#1026; = 0402
&#1027; = 0403
&#1028; = 0404
&#1029; = 0405
&#1030; = 0406
&#1031; = 0407
&#1032; = 0408
&#1033; = 0409
&#1034; = 040a
&#1035; = 040b
&#1036; = 040c
&#1037; = 040d
&#1038; = 040e
&#1039; = 040f
&#1040; = 0410
&#1041; = 0411
&#1042; = 0412
&#1043; = 0413
&#1044; = 0414
&#1045; = 0415
&#1046; = 0416
&#1047; = 0417
&#1048; = 0418
&#1049; = 0419
&#1050; = 041a
&#1051; = 041b
&#1052; = 041c
&#1053; = 041d
&#1054; = 041e
&#1055; = 041f
&#1056; = 0420
&#1057; = 0421
&#1058; = 0422
&#1059; = 0423
&#1060; = 0424
&#1061; = 0425
&#1062; = 0426
&#1063; = 0427
&#1064; = 0428
&#1065; = 0429
&#1066; = 042a
&#1067; = 042b
&#1068; = 042c
&#1069; = 042d
&#1070; = 042e
&#1071; = 042f
&#1072; = 0430
&#1073; = 0431
&#1074; = 0432
&#1075; = 0433
&#1076; = 0434
&#1077; = 0435
&#1078; = 0436
&#1079; = 0437
&#1080; = 0438
&#1081; = 0439
&#1082; = 043a
&#1083; = 043b
&#1084; = 043c
&#1085; = 043d
&#1086; = 043e
&#1087; = 043f
&#1088; = 0440
&#1089; = 0441
&#1090; = 0442
&#1091; = 0443
&#1092; = 0444
&#1093; = 0445
&#1094; = 0446
&#1095; = 0447
&#1096; = 0448
&#1097; = 0449
&#1098; = 044a
&#1099; = 044b
&#1100; = 044c
&#1101; = 044d
&#1102; = 044e
&#1103; = 044f
&#1104; = 0450
&#1105; = 0451
&#1106; = 0452
&#1107; = 0453
&#1108; = 0454
&#1109; = 0455
&#1110; = 0456
&#1111; = 0457
&#1112; = 0458
&#1113; = 0459
&#1114; = 045a
&#1115; = 045b
&#1116; = 045c
&#1117; = 045d
&#1118; = 045e
&#1119; = 045f
&#1120; = 0460
&#1121; = 0461
&#1122; = 0462
&#1123; = 0463
&#1124; = 0464
&#1125; = 0465
&#1126; = 0466
&#1127; = 0467
&#1128; = 0468
&#1129; = 0469
&#1130; = 046a
&#1131; = 046b
&#1132; = 046c
&#1133; = 046d
&#1134; = 046e
&#1135; = 046f
&#1136; = 0470
&#1137; = 0471
&#1138; = 0472
&#1139; = 0473
&#1140; = 0474
&#1141; = 0475
&#1142; = 0476
&#1143; = 0477
&#1144; = 0478
&#1145; = 0479
&#1146; = 047a
&#1147; = 047b
&#1148; = 047c
&#1149; = 047d
&#1150; = 047e
&#1151; = 047f
&#1152; = 0480
&#1153; = 0481
&#1154; = 0482
&#1155; = 0483
&#1156; = 0484
&#1157; = 0485
&#1158; = 0486
&#1159; = 0487
&#1160; = 0488
&#1161; = 0489
&#1162; = 048a
&#1163; = 048b
&#1164; = 048c
&#1165; = 048d
&#1166; = 048e
&#1167; = 048f
&#1168; = 0490
&#1169; = 0491
&#1170; = 0492
&#1171; = 0493
&#1172; = 0494
&#1173; = 0495
&#1174; = 0496
&#1175; = 0497
&#1176; = 0498
&#1177; = 0499
&#1178; = 049a
&#1179; = 049b
&#1180; = 049c
&#1181; = 049d
&#1182; = 049e
&#1183; = 049f
&#1184; = 04a0
&#1185; = 04a1
&#1186; = 04a2
&#1187; = 04a3
&#1188; = 04a4
&#1189; = 04a5
&#1190; = 04a6
&#1191; = 04a7
&#1192; = 04a8
&#1193; = 04a9
&#1194; = 04aa
&#1195; = 04ab
&#1196; = 04ac
&#1197; = 04ad
&#1198; = 04ae
&#1199; = 04af
&#1200; = 04b0
&#1201; = 04b1
&#1202; = 04b2
&#1203; = 04b3
&#1204; = 04b4
&#1205; = 04b5
&#1206; = 04b6
&#1207; = 04b7
&#1208; = 04b8
&#1209; = 04b9
&#1210; = 04ba
&#1211; = 04bb
&#1212; = 04bc
&#1213; = 04bd
&#1214; = 04be
&#1215; = 04bf
&#1216; = 04c0
&#1217; = 04c1
&#1218; = 04c2
&#1219; = 04c3
&#1220; = 04c4
&#1221; = 04c5
&#1222; = 04c6
&#1223; = 04c7
&#1224; = 04c8
&#1225; = 04c9
&#1226; = 04ca
&#1227; = 04cb
&#1228; = 04cc
&#1229; = 04cd
&#1230; = 04ce
&#1231; = 04cf
&#1232; = 04d0
&#1233; = 04d1
&#1234; = 04d2
&#1235; = 04d3
&#1236; = 04d4
&#1237; = 04d5
&#1238; = 04d6
&#1239; = 04d7
&#1240; = 04d8
&#1241; = 04d9
&#1242; = 04da
&#1243; = 04db
&#1244; = 04dc
&#1245; = 04dd
&#1246; = 04de
&#1247; = 04df
&#1248; = 04e0
&#1249; = 04e1
&#1250; = 04e2
&#1251; = 04e3
&#1252; = 04e4
&#1253; = 04e5
&#1254; = 04e6
&#1255; = 04e7
&#1256; = 04e8
&#1257; = 04e9
&#1258; = 04ea
&#1259; = 04eb
&#1260; = 04ec
&#1261; = 04ed
&#1262; = 04ee
&#1263; = 04ef
&#1264; = 04f0
&#1265; = 04f1
&#1266; = 04f2
&#1267; = 04f3
&#1268; = 04f4
&#1269; = 04f5
&#1270; = 04f6
&#1271; = 04f7
&#1272; = 04f8
&#1273; = 04f9
&#1274; = 04fa
&#1275; = 04fb
&#1276; = 04fc
&#1277; = 04fd
&#1278; = 04fe
&#1279; = 04ff


without the equal sign and the new line symbol:
Code:
&#1024; 0400
&#1025; 0401
&#1026; 0402
&#1027; 0403
&#1028; 0404
&#1029; 0405
&#1030; 0406
&#1031; 0407
&#1032; 0408
&#1033; 0409
&#1034; 040a
&#1035; 040b
&#1036; 040c
&#1037; 040d
&#1038; 040e
&#1039; 040f
&#1040; 0410
&#1041; 0411
&#1042; 0412
&#1043; 0413
&#1044; 0414
&#1045; 0415
&#1046; 0416
&#1047; 0417
&#1048; 0418
&#1049; 0419
&#1050; 041a
&#1051; 041b
&#1052; 041c
&#1053; 041d
&#1054; 041e
&#1055; 041f
&#1056; 0420
&#1057; 0421
&#1058; 0422
&#1059; 0423
&#1060; 0424
&#1061; 0425
&#1062; 0426
&#1063; 0427
&#1064; 0428
&#1065; 0429
&#1066; 042a
&#1067; 042b
&#1068; 042c
&#1069; 042d
&#1070; 042e
&#1071; 042f
&#1072; 0430
&#1073; 0431
&#1074; 0432
&#1075; 0433
&#1076; 0434
&#1077; 0435
&#1078; 0436
&#1079; 0437
&#1080; 0438
&#1081; 0439
&#1082; 043a
&#1083; 043b
&#1084; 043c
&#1085; 043d
&#1086; 043e
&#1087; 043f
&#1088; 0440
&#1089; 0441
&#1090; 0442
&#1091; 0443
&#1092; 0444
&#1093; 0445
&#1094; 0446
&#1095; 0447
&#1096; 0448
&#1097; 0449
&#1098; 044a
&#1099; 044b
&#1100; 044c
&#1101; 044d
&#1102; 044e
&#1103; 044f
&#1104; 0450
&#1105; 0451
&#1106; 0452
&#1107; 0453
&#1108; 0454
&#1109; 0455
&#1110; 0456
&#1111; 0457
&#1112; 0458
&#1113; 0459
&#1114; 045a
&#1115; 045b
&#1116; 045c
&#1117; 045d
&#1118; 045e
&#1119; 045f
&#1120; 0460
&#1121; 0461
&#1122; 0462
&#1123; 0463
&#1124; 0464
&#1125; 0465
&#1126; 0466
&#1127; 0467
&#1128; 0468
&#1129; 0469
&#1130; 046a
&#1131; 046b
&#1132; 046c
&#1133; 046d
&#1134; 046e
&#1135; 046f
&#1136; 0470
&#1137; 0471
&#1138; 0472
&#1139; 0473
&#1140; 0474
&#1141; 0475
&#1142; 0476
&#1143; 0477
&#1144; 0478
&#1145; 0479
&#1146; 047a
&#1147; 047b
&#1148; 047c
&#1149; 047d
&#1150; 047e
&#1151; 047f
&#1152; 0480
&#1153; 0481
&#1154; 0482
&#1155; 0483
&#1156; 0484
&#1157; 0485
&#1158; 0486
&#1159; 0487
&#1160; 0488
&#1161; 0489
&#1162; 048a
&#1163; 048b
&#1164; 048c
&#1165; 048d
&#1166; 048e
&#1167; 048f
&#1168; 0490
&#1169; 0491
&#1170; 0492
&#1171; 0493
&#1172; 0494
&#1173; 0495
&#1174; 0496
&#1175; 0497
&#1176; 0498
&#1177; 0499
&#1178; 049a
&#1179; 049b
&#1180; 049c
&#1181; 049d
&#1182; 049e
&#1183; 049f
&#1184; 04a0
&#1185; 04a1
&#1186; 04a2
&#1187; 04a3
&#1188; 04a4
&#1189; 04a5
&#1190; 04a6
&#1191; 04a7
&#1192; 04a8
&#1193; 04a9
&#1194; 04aa
&#1195; 04ab
&#1196; 04ac
&#1197; 04ad
&#1198; 04ae
&#1199; 04af
&#1200; 04b0
&#1201; 04b1
&#1202; 04b2
&#1203; 04b3
&#1204; 04b4
&#1205; 04b5
&#1206; 04b6
&#1207; 04b7
&#1208; 04b8
&#1209; 04b9
&#1210; 04ba
&#1211; 04bb
&#1212; 04bc
&#1213; 04bd
&#1214; 04be
&#1215; 04bf
&#1216; 04c0
&#1217; 04c1
&#1218; 04c2
&#1219; 04c3
&#1220; 04c4
&#1221; 04c5
&#1222; 04c6
&#1223; 04c7
&#1224; 04c8
&#1225; 04c9
&#1226; 04ca
&#1227; 04cb
&#1228; 04cc
&#1229; 04cd
&#1230; 04ce
&#1231; 04cf
&#1232; 04d0
&#1233; 04d1
&#1234; 04d2
&#1235; 04d3
&#1236; 04d4
&#1237; 04d5
&#1238; 04d6
&#1239; 04d7
&#1240; 04d8
&#1241; 04d9
&#1242; 04da
&#1243; 04db
&#1244; 04dc
&#1245; 04dd
&#1246; 04de
&#1247; 04df
&#1248; 04e0
&#1249; 04e1
&#1250; 04e2
&#1251; 04e3
&#1252; 04e4
&#1253; 04e5
&#1254; 04e6
&#1255; 04e7
&#1256; 04e8
&#1257; 04e9
&#1258; 04ea
&#1259; 04eb
&#1260; 04ec
&#1261; 04ed
&#1262; 04ee
&#1263; 04ef
&#1264; 04f0
&#1265; 04f1
&#1266; 04f2
&#1267; 04f3
&#1268; 04f4
&#1269; 04f5
&#1270; 04f6
&#1271; 04f7
&#1272; 04f8
&#1273; 04f9
&#1274; 04fa
&#1275; 04fb
&#1276; 04fc
&#1277; 04fd
&#1278; 04fe
&#1279; 04ff
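The "Version two" entities and the hex column are the same number written in two bases (for example, 1029 in decimal is 0x0405). A small sketch of that conversion, using Python's standard `html.unescape` to check which character an entity stands for; `entity_to_hex` is a hypothetical helper.

```python
import html
import re

# Matches a decimal numeric character reference such as "&#1029;".
ENTITY = re.compile(r"&#(\d+);")


def entity_to_hex(entity: str) -> str:
    """Convert '&#NNNN;' to the lowercase 4-digit hex form used in the lists."""
    match = ENTITY.fullmatch(entity)
    if match is None:
        raise ValueError(f"not a decimal entity: {entity!r}")
    return f"{int(match.group(1)):04x}"


print(entity_to_hex("&#1029;"))              # 0405
print(html.unescape("&#1029;") == "\u0405")  # True: CYRILLIC CAPITAL LETTER DZE
```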


Let me know how much it helps.


Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: nkampala on August 27, 2018, 10:15:21 PM
I think I got most of them. They come from the Cyrillic, Greek, and Armenian alphabets. Info from this Wikipedia page: https://en.wikipedia.org/wiki/IDN_homograph_attack

Homograph Character -> Regular Latin Character

Uppercase

Code:
A -> A
A -> A
B -> B
B -> B
C -> C
E -> E
E -> E
Ғ -> F
G -> G
H -> H
H -> H
I -> I
I -> I
J -> J
К -> K
K -> K
Լ -> L
M -> M
M -> M
N -> N
O -> O
O -> O
O -> O
P -> P
P -> P
S -> S
S -> S
T -> T
T -> T
U -> U
X -> X
X -> X
Y -> Y
Y -> Y
Z -> Z

Lowercase

Code:
a -> a
c -> c
d -> d
e -> e
ε -> e
g -> g
h -> h
h -> h
h -> h
i -> i
ι -> i
j -> j
κ -> k
Ӏ -> l
յ -> j
n -> n
η -> n
n -> n
o -> o
o -> o
o -> o
o -> o
p -> p
ρ -> p
q -> q
զ -> q
s -> s
τ -> t
υ -> u
u -> u
u -> u
ѵ -> v
ν -> v
w -> w
ω -> w
x -> x
χ -> x
y -> y
γ -> y

Accents & Other Marks

Code:
Ӓ -> Ä
Ё -> Ë
Ї -> Ï
Ӧ -> Ö
ӓ -> ä
ё -> ë
ї -> ï
ӧ -> ö

Numbers

Code:
Ձ -> 2
շ -> 2
З -> 3
Յ -> 3
Ч -> 4
б -> 6
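Since theymos asked for a plaintext "<char_from> -> <char_to>" table, lists like the ones above can be loaded mechanically. A sketch under a few assumptions: one mapping per line, single-character sources only, and lines without "->" (headings, "Code:" labels, blanks) are skipped. `parse_table` is illustrative, not the forum's code.

```python
def parse_table(text: str) -> dict[str, str]:
    """Parse '<char_from> -> <char_to>' lines into a {char_from: char_to} dict."""
    table = {}
    for line in text.splitlines():
        src, sep, dst = line.partition("->")
        if not sep:
            continue  # headings, "Code:" labels and blank lines
        src, dst = src.strip(), dst.strip()
        if len(src) != 1 or not dst or src == dst:
            continue  # need one source character, a target, and an actual change
        table[src] = dst
    return table


sample = "Code:\n\u0410 -> A\n\u0435 -> e\n\u0417 -> 3\n"
table = parse_table(sample)
print("\u0410\u0435".translate(str.maketrans(table)))  # Ae
```

Entries where both sides are the same character (such as "A -> A") are dropped, since they would change nothing.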

CJK Compatibility (not used as much because it doesn't look as similar, but we might as well add it to the list anyway)
https://en.wikipedia.org/wiki/CJK_Compatibility

Code:
㍲ -> da
㍳ -> AU
㍴ -> bar
㍶ -> pc
㍷ -> dm
㍺ -> IU
㎅ -> KB
㎆ -> MB
㎇ -> GB
㎎ -> mg
㎏ -> kg
㎙ -> fm
㎚ -> nm
㎜ -> mm
㎝ -> cm
㎞ -> km
㎩ -> Pa
㎭ -> rad
㎰ -> ps
㎱ -> ns
㎳ -> ms
㎹ -> MV
㎿ -> MW
㏄ -> cc
㏅ -> cd
㏊ -> ha
㏌ -> in
㏐ -> lm
㏑ -> ln
㏒ -> log
㏓ -> lx
㏕ -> mil
㏖ -> mol
㏚ -> PR
㏛ -> sr
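The CJK compatibility squares are a different case from the Cyrillic and Greek lookalikes. Unicode gives them compatibility decompositions, so the standard NFKC normalization form (Python's `unicodedata.normalize`) already expands them to plain letters. The sketch below shows that, and also shows that NFKC leaves Cyrillic "а" alone, which is why a hand-made table is still needed for those characters.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


print(nfkc("\u338f"))         # kg  (U+338F SQUARE KG)
print(nfkc("\u33c4"))         # cc  (U+33C4 SQUARE CC)
print(nfkc("\u0430") == "a")  # False: Cyrillic a has no compatibility mapping
```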


Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: Xal0lex on August 27, 2018, 11:05:39 PM
Characters that look the same in Latin and Cyrillic:

Code:
a, A, c, C, e, E, p, P, o, O, y, x, X, B, H, K, T, M


Latin              Cyrillic

a  &#97;   -->  а  &#1072;
A  &#65;   -->  А  &#1040;
c  &#99;   -->  с  &#1089;
C  &#67;   -->  С  &#1057;
e  &#101;  -->  е  &#1077;
E  &#69;   -->  Е  &#1045;
K  &#75;   -->  К  &#1050;
p  &#112;  -->  р  &#1088;
P  &#80;   -->  Р  &#1056;
o  &#111;  -->  о  &#1086;
O  &#79;   -->  О  &#1054;
y  &#121;  -->  у  &#1091;
x  &#120;  -->  х  &#1093;
X  &#88;   -->  Х  &#1061;
B  &#66;   -->  В  &#1042;
H  &#72;   -->  Н  &#1053;
T  &#84;   -->  Т  &#1058;
M  &#77;   -->  М  &#1052;
....................
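The decimal codes in Xal0lex's list pair up directly, so they can be fed to Python's `str.translate`, which accepts a dict keyed by code point. A sketch covering only those 18 pairs; `to_latin` is a hypothetical name and the list is not claimed to be complete.

```python
# (Cyrillic decimal code, Latin decimal code) pairs, in the order listed above.
PAIRS = [
    (1072, 97), (1040, 65), (1089, 99), (1057, 67), (1077, 101), (1045, 69),
    (1050, 75), (1088, 112), (1056, 80), (1086, 111), (1054, 79), (1091, 121),
    (1093, 120), (1061, 88), (1042, 66), (1053, 72), (1058, 84), (1052, 77),
]

# str.translate accepts a dict mapping code point (int) -> code point (int).
TO_LATIN = dict(PAIRS)


def to_latin(text: str) -> str:
    """Replace the listed Cyrillic lookalikes with their Latin counterparts."""
    return text.translate(TO_LATIN)


print(to_latin("\u041arios"))  # Krios (Cyrillic capital Ka followed by Latin "rios")
```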







Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: Quickseller on August 28, 2018, 02:52:03 AM
Some of these are "legit" symbols in various languages, correct? For example Russian and I believe Hebrew use different symbols than English does.

Maybe someone can compile a list of the symbols used in each local section's language (along with English), and only those symbols would be allowed.

Edit : 🔑
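Quickseller's per-section whitelist could be sketched as follows. Python's standard library has no Unicode Script property, so this uses a rough stand-in: the first word of each letter's Unicode name ("LATIN", "CYRILLIC", "HEBREW", ...). The section names and allowed sets are assumptions for illustration only.

```python
import unicodedata

# Hypothetical section -> allowed letter scripts (not an existing forum setting).
# Only alphabetic characters are checked, so digits, punctuation and emoji pass.
ALLOWED_SCRIPTS = {
    "English": {"LATIN"},
    "Russian": {"LATIN", "CYRILLIC"},
}


def script_of(ch: str) -> str:
    """Rough script guess from the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def disallowed_letters(text: str, section: str) -> list[str]:
    """Return the letters in text whose script is not allowed in the section."""
    allowed = ALLOWED_SCRIPTS[section]
    return [ch for ch in text if ch.isalpha() and script_of(ch) not in allowed]


print(disallowed_letters("\u0430 very good project", "English"))  # ['а']
```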


Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: Piggy on August 28, 2018, 04:32:51 AM
If I understood correctly, they are mixing letters from different alphabets. This could be quite easy to spot by:

  • parsing the message
  • reporting it
  • then checking it manually

I mean, I don't see them getting very far with this trick.
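The "parse the message" step could start from a check like this, which flags words that mix Latin and Cyrillic letters so a human can then review and report the post. A minimal sketch; the function names are hypothetical and only the main Cyrillic block is considered.

```python
import re

WORD = re.compile(r"\w+")  # runs of word characters (Unicode-aware in Python 3)
LATIN = re.compile(r"[A-Za-z]")
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def mixed_script_words(text: str) -> list[str]:
    """Return the words that contain both Latin and Cyrillic letters."""
    return [w for w in WORD.findall(text) if LATIN.search(w) and CYRILLIC.search(w)]


def needs_manual_check(text: str) -> bool:
    """True if at least one word mixes the two scripts."""
    return bool(mixed_script_words(text))


print(mixed_script_words("the \u041arios platform"))  # ['Кrios']
```

A one-letter substitution that forms a word on its own (a lone Cyrillic "а") is not a mixed word, so this check alone would miss it.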


Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: xtraelv on August 28, 2018, 06:18:55 AM
If I understood correctly, they are mixing letters from different alphabets. This could be quite easy to spot by:

  • parsing the message
  • reporting it
  • then checking it manually

I mean, I don't see them getting very far with this trick.

I think Quickseller was hinting at an automated program that checks each language section for valid and invalid characters.

Personally, I favor posting unpleasant messages on ICO threads that employ bots and shills to promote their product (as I have done before).
I make them all different so I can't be reported for multi-posting.

If others start doing that then eventually it will be pointless to use shills to promote ICOs.

I read the white paper. I’ve also stayed around after the countless delays and dates being changed

I think that the potential of the Кrios to take advantage of the computing power of the entire Internet destroys the fictitious belief that the cryptocurrency has no value, is a bubble or the latest fashion of technology.

Кrios provides new forms of financing to companies wishing to raise funds for their startup projects. The Кrios platform includes a centralized exchange of listings with decentralized interconnection.

This is a great project, bringing great benefits to the community. Not only that, it also brings new development for all of us. It is pride and happiness to be able to participate and I've known about ico for a long time, but this is probably the first time I've been so surprised to see the benefits and the advancement of your ideas.

Go for it guys this is a great project!! This is an amazing project, When I read your white sheet, I was totally delighted with how this would change our life. I think that it depends on each of us.

Just some of the fake comments by new shills posting on this thread.

A shill is a confidence trickster or swindler who poses as a genuine customer to entice or encourage others.


https://blockogy.com/wp-content/uploads/2017/11/Shill-crypto-like-a-pro-feature-image.png

Is it wise to trust an ICO or coin that uses dishonesty to attract investors?




Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: Piggy on August 28, 2018, 06:29:43 AM
I think Quickseller was hinting at an automated program that checks each language section for valid and invalid characters.


What I meant was: if a message contains a small percentage of Cyrillic characters because some ordinary characters were substituted to hide plagiarism, that can be spotted quite easily by checking the text.

Regular expressions, if I remember correctly, can do that quite easily for characters from other alphabets.
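The percentage idea maps naturally onto a regular expression. A minimal sketch, assuming "Cyrillic" means the U+0400 to U+04FF block; the 20% threshold and the function names are hypothetical and would need tuning against real posts.

```python
import re

LETTER = re.compile(r"[^\W\d_]")  # any Unicode letter
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def cyrillic_share(text: str) -> float:
    """Fraction of the letters in text that are Cyrillic (0.0 if there are none)."""
    letters = LETTER.findall(text)
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if CYRILLIC.match(ch))
    return cyrillic / len(letters)


def suspicious(text: str, threshold: float = 0.2) -> bool:
    """Some Cyrillic letters, but far fewer than a genuinely Russian post would have."""
    return 0.0 < cyrillic_share(text) < threshold


print(cyrillic_share("\u0430 very good project"))  # 0.0625 (1 of 16 letters)
```

Because it works on the share of letters rather than on whole words, even a single substituted letter pushes a post above zero.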



Title: Re: [UPDATE]Homographs are back. We have a list now. Theymos, it's your turn.
Post by: TheBeardedBaby on August 28, 2018, 01:18:26 PM
What I meant was: if a message contains a small percentage of Cyrillic characters because some ordinary characters were substituted to hide plagiarism, that can be spotted quite easily by checking the text.

Regular expressions, if I remember correctly, can do that quite easily for characters from other alphabets.



The easiest way to spot it is to search for a single Cyrillic character, for example "а" (U+0430), while excluding the local sections.
That lists all the matching posts; there are often posts in Russian among them, which I also report. I wish I had a report button in the search results, but... no.


Great, thanks everyone for the help. Now we'll sit and wait for a reaction from headquarters.

A bump to attract theymos' attention. I think I have to hire a bumping bot here :D jk.


Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: LoyceV on August 28, 2018, 01:59:36 PM
Some of these are "legit" symbols in various languages, correct? For example Russian and I believe Hebrew use different symbols than English does.
Correct. That's why theymos wants to auto-replace them only on the English boards.
I'm not sure that's going to help, though: plagiarism by homograph attacks is much easier to detect than plagiarism through text spinners.


Title: Re: [UPDATE]Homographs are back. We have a list now. Theymos, it's your turn.
Post by: vlad230 on August 28, 2018, 02:29:21 PM
It's good that you guys created a list, but realistically, I don't think theymos will fix any of these.

I think he has more important items on his agenda.


Title: Re: [UPDATE]Homographs are back. We have a list now. Theymos, it's your turn.
Post by: TheBeardedBaby on August 28, 2018, 05:56:38 PM
It's good that you guys created a list, but realistically, I don't think theymos will fix any of these.

I think he has more important items on his agenda.

He said that if the homograph problem became more serious, he would implement this "fix". I think 80 homographs per day is serious.


Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: bitart on August 28, 2018, 10:02:40 PM
Some of these are "legit" symbols in various languages, correct? For example Russian and I believe Hebrew use different symbols than English does.
Correct. That's why theymos wants to auto-replace them only on the English boards.
I'm not sure that's going to help, though: plagiarism by homograph attacks is much easier to detect than plagiarism through text spinners.
Is it also possible to auto-replace other strings, like 'good project', with something like 'please ban me, I'm a bounty hunter'? :D
Also, you'll have to wait: report badges were first in line to be implemented :)

To be more serious:
Don't these spammers who use special characters realize that mods will easily find those posts and delete them? Their activity will decrease and they won't get paid... It's the kind of thing that can be spotted easily, so it isn't worth the effort, but that's just my opinion... Do you think they don't read the Meta section at all?



Title: Re: Homographs are back. Is anyone up for making a list of homographic characters?
Post by: TheBeardedBaby on August 28, 2018, 10:16:44 PM

To be more serious:
Don't these spammers who use special characters realize that mods will easily find those posts and delete them? Their activity will decrease and they won't get paid... It's the kind of thing that can be spotted easily, so it isn't worth the effort, but that's just my opinion... Do you think they don't read the Meta section at all?

I have asked many times for this to be added to the rules (https://bitcointalk.org/index.php?topic=3393118.msg35534083#msg35534083), but got no support from theymos. They get away with just having their posts deleted instead of being banned: they are hiding plagiarism, but it's difficult to prove.
Oh, they read Meta for sure: when I suggested banning everyone with more than one changed character in a post, they started posting with only one homograph (the popular "a very" spam).


Title: Re: [UPDATE]Homographs are back. We have a list now. Theymos, it's your turn.
Post by: theymos on August 29, 2018, 01:30:03 AM
Done. I only did the ones that look really similar to Latin characters, and it only applies to English sections. It's done at display time, so it's retroactive.
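theymos hasn't posted his code, but the three properties he describes (only close lookalikes, only English sections, applied at display time so old posts are covered too) could be modelled like this. Everything here is hypothetical: the board names, the `Post` type and the deliberately small lookalike map.

```python
from dataclasses import dataclass

# Hypothetical, deliberately small lookalike map (Cyrillic -> Latin).
LOOKALIKES = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p", "\u0441": "c",
    "\u0443": "y", "\u0445": "x", "\u0410": "A", "\u0412": "B", "\u0415": "E",
    "\u041d": "H", "\u041e": "O", "\u0420": "P", "\u0421": "C", "\u0422": "T",
})

ENGLISH_BOARDS = {"Bitcoin Discussion", "Meta"}  # hypothetical set of English boards


@dataclass
class Post:
    board: str
    body: str  # stored exactly as submitted; never modified


def render(post: Post) -> str:
    """Return the text to display; the stored body itself is left untouched."""
    if post.board in ENGLISH_BOARDS:
        return post.body.translate(LOOKALIKES)
    return post.body


old_post = Post("Meta", "\u0430 very good project")
print(render(old_post))  # a very good project
```

Because only the rendered copy changes, posts made before the rule existed are covered, and local boards keep their Cyrillic intact. The same property is what Foxpup points out: genuinely Russian text on an English board is displayed as a Latin/Cyrillic mix.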


Title: Re: [UPDATE]Homographs are back. We have a list now. Theymos, it's your turn.
Post by: Foxpup on August 29, 2018, 04:04:57 AM
Done. I only did the ones that look really similar to Latin characters, and it only applies to English sections. It's done at display time, so it's retroactive.
What does this mean for Russian text that is legitimately posted in English sections?

For reference, the correct translation of "ктo-нибyдь" is "someone" or "somebody", not "who - нибyдь". Come on, even Google Translate gets that one right. ::)
Nope, Google Translate can't make heads or tails of it now. :( This could be a problem (though whether it's a bigger problem than plagiarism remains to be seen).


Title: Re: [UPDATE]Homographs are back. We have a list now. Theymos, it's your turn.
Post by: Quickseller on August 29, 2018, 04:30:21 AM
The English sections should only contain English. A post written in Russian in one of the English sections is off-topic and should be reported.


Title: Re: [UPDATE]Homographs are back. We have a list now. Theymos, it's your turn.
Post by: TheBeardedBaby on August 29, 2018, 04:43:53 AM
The English sections should only contain English. A post written in Russian in one of the English sections is off-topic and should be reported.

Yes, I report every post I find that is written in a language other than English.

Done. I only did the ones that look really similar to Latin characters, and it only applies to English sections. It's done at display time, so it's retroactive.

Great, I'll be monitoring the next few days to see how it goes :)


Title: Re: [UPDATE]Homographs are back. We have a list now. [FIXED]
Post by: TheBeardedBaby on August 29, 2018, 06:25:27 AM
I have tested it now, and I can say that this is great! I can search for homographs as before, and the results are highlighted in yellow as before, but now the actual posts are converted to Latin, so it's easier to search for plagiarism directly.

Test Example >

Here is one from today >
Greetings  Well, forum may not be the best solution for publishing news but a lot of people are simply accustomed to forums ... anyway thank you for your efforts and answers here.
This is how it looks in the search results, same as before >
https://i.imgur.com/bqVEye3.png

Copied text from the search results and tested here  (https://www.textmagic.com/free-tools/unicode-detector) >
https://i.imgur.com/EKOpB1D.png

Copied text from the actual post and tested here (https://www.textmagic.com/free-tools/unicode-detector) >

https://i.imgur.com/vaZgzyU.png
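A similar check can be done offline. The sketch below lists every non-ASCII character with its code point and Unicode name, using Python's standard `unicodedata` module. It is not the textmagic tool's implementation, just a comparable report; `non_ascii_report` is a hypothetical name.

```python
import unicodedata


def non_ascii_report(text: str) -> list[tuple[int, str, str]]:
    """List (index, 'U+XXXX', Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


for row in non_ascii_report("Gr\u0435etings"):
    print(row)  # (2, 'U+0435', 'CYRILLIC SMALL LETTER IE')
```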


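Here is the effect of the fix on Cyrillic posts outside the local section (in English, the post reads: "I'm glad I found out about the project almost at the very beginning of its work. I like that the developers are rolling out the system gradually, without rushing things! It's better to build the product first and only then launch with it, rather than finishing it along the way. Good luck to you!") >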

я paд чтo yзнaл o пpoeктe пoчти в caмoм нaчaлe eгo paбoты, нpaвитcя чтo paзpaбoтчики внeдpяют cиcтeмy пocтeпeннo, нe фopcиpyя coбытия!лyчшe cдeлaть внaчaлe пpoдyкт и yжe пoтoм выxoдить c ним, a нe дopaбaтывaть eгo yжe в пpoцecce.yдaчи вaм!

https://i.imgur.com/HC4maIC.png


Title: Re: [FIXED] Homographs are fixed. Thank you again, theymos. See my report.
Post by: Piggy on August 29, 2018, 08:45:30 AM
Could you point me to the original content of the first message, or the source it was copied from?


Title: Re: [FIXED] Homographs are fixed. Thank you again, theymos. See my report.
Post by: TheBeardedBaby on August 29, 2018, 09:02:18 AM
Could you point me to the original content of the first message, or the source it was copied from?

No problem >

The original:
~ snip~

Well forum may not be the best solution for publishing news but a lot of people are simply accustomed to forums ... anyway thank you for your efforts and answers here.
The copy:
Greetings  Well, forum may not be the best solution for publishing news but a lot of people are simply accustomed to forums ... anyway thank you for your efforts and answers here.
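Comparing a suspected copy with its source becomes more reliable once lookalikes are mapped back to Latin. A sketch using the standard library's `difflib.SequenceMatcher`. The small lookalike map is hypothetical, the sample copy below has Cyrillic "о" inserted purely for illustration, and any cut-off for "plagiarised" would be a judgment call.

```python
import difflib

# Hypothetical, minimal Cyrillic -> Latin lookalike map.
LOOKALIKES = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o",
                            "\u0440": "p", "\u0441": "c"})


def similarity(original: str, candidate: str) -> float:
    """SequenceMatcher ratio (0.0 to 1.0) after mapping lookalikes to Latin."""
    a = original.translate(LOOKALIKES)
    b = candidate.translate(LOOKALIKES)
    return difflib.SequenceMatcher(None, a, b).ratio()


original = "Well forum may not be the best solution for publishing news"
copy = "Greetings  Well, f\u043erum may n\u043et be the best s\u043elution for publishing news"
print(round(similarity(original, copy), 2))
```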

BTW, I'll copy this to the Plagiarism accusations thread.


Title: Re: [UPDATE]Homographs are back. We have a list now. [FIXED]
Post by: inPRIVACYweBELIEVE on August 29, 2018, 09:35:11 AM
Copied text from the search results and tested here  (http://) >
The URL is missing; however, I guess it's the same URL as below:
https://www.textmagic.com/free-tools/unicode-detector


Title: Re: [UPDATE]Homographs are back. We have a list now. [FIXED]
Post by: TheBeardedBaby on August 29, 2018, 10:02:49 AM
~
The URL is missing; however, I guess it's the same URL as below:
https://www.textmagic.com/free-tools/unicode-detector

Thanks, fixed it :)