Hello Everyone,
The Achievement:
I have made an amazing discovery and spent a year trying to talk everyone I know into helping me with my solution for compressing 99.8% of all data out of a file, leaving only a basic crypto key containing the thread of how to re-create the entire file from scratch.
-snip-
Patent / Open Source or it didn't happen.
What you describe sounds like a very primitive compression method with a bit of "the engine just knows" magic.
Here is the whole thing boiled down as succinctly as I can make it. There's no magic here.
Import data, convert to binary. Save. Re-import binary, convert to Layer 1, letters a-H (aAbBcC... etc) according to the Table above. The file will double in size.
> sure, double the size. Nothing complicated or magical yet, but this probably has to be done, so let's go on.
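A minimal sketch of Layer 1 as it can be inferred from the sample conversion later in this thread: each hex digit (4 bits) becomes one letter in the order aAbBcC...hH, so every byte turns into two letters. The table itself is not reproduced here, so the mapping and the function name `layer1` are assumptions.

```python
# Assumed Layer 1 table: hex digits 0..F map to aAbBcCdDeEfFgGhH.
# Inferred from the sample output in this thread, not from the original table.
LAYER1_ALPHABET = "aAbBcCdDeEfFgGhH"


def layer1(data: bytes) -> str:
    """Map each 4-bit nibble to one letter; output has 2 letters per byte."""
    letters = []
    for byte in data:
        letters.append(LAYER1_ALPHABET[byte >> 4])    # high nibble
        letters.append(LAYER1_ALPHABET[byte & 0x0F])  # low nibble
    return "".join(letters)


sample = bytes.fromhex("EFBBBF57696C")
print(layer1(sample))  # hHFFFHCDdEdg
```

Stored as one character per letter, the output is twice the size of the input, which is the doubling described above.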
Save. Re-import and begin Layer 2. Encoding begins here. Encode 1st chunk of 1024K using Crossword Grid alignment, 20 spaces per row.
> 1024K bytes of data would be 1024*10^3 bytes, unless you mean kibibytes (1024 KiB = 1024^2 bytes)
> So with 20 spaces per row (why 20?) we have a 51200*20 2-dim array
> don't know what this is for, but I can follow without problem
Fill Array with data from Layer 1.
> left -> right? top -> bottom? the other way around? does it matter?
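A rough sketch of the grid fill, assuming left-to-right, top-to-bottom order (the post does not say) and taking 1024K as 1,024,000 symbols. `to_grid` is a hypothetical helper name.

```python
def to_grid(symbols: str, width: int = 20) -> list[str]:
    """Cut the symbol stream into rows of `width`, row-major (assumed order)."""
    return [symbols[i:i + width] for i in range(0, len(symbols), width)]


chunk_len = 1024 * 10**3          # 1024K read as decimal kilobytes
rows, leftover = divmod(chunk_len, 20)
print(rows, leftover)             # 51200 0

# Read as kibibytes instead, the chunk no longer fills whole rows.
print(divmod(1024**2, 20))        # (52428, 16)
```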
Now search data from current row, 2 rows at a time simultaneously,
> simultaneously? so this only works on parallel computers? Well, sad story, then there's nothing to sell to the masses, but let's assume it's not needed
> and can be done alternating
20 spaces apart (essentially on top of each other in the crossword puzzle grid).
> 20 spaces apart? so we have an array like this
> 0123456789abcdefhijk
> 0123456789abcdefhijk
> 0123456789abcdefhijk
> ...
> 0 and k are 19 spaces apart, so you talk about 0 in line 0 and 0 in line 1?
Let's call that the CWG from now on.
> m'kay, why? you never use CWG again?
When a match is found, replace topmost row's match with unique identifier that means the same thing to the 2nd Layering engine as the match found. For example "aa" is replaced by "i"
> how do you code "i"? you have 37 (with "space") different things in layer 2, so you need at least 6 bits to code "i",
> and that is if you have a fixed table in your algorithm. So instead of the original 4 bits we are now at 6, with just a little stop at 8 bits.
> also, as you state in your picture, you lose data.
> aA (0000 0001) cannot be coded, and neither can 220 of the 256 possible binary codes
> ~ 86% data loss in this step. You might save some with shifting, but you'd have to note at least the number of shifts
> given a fixed direction, so aA >>7 = ab, but that would cost you another 3 bits of encoding, and I doubt
> that you can shift all possible combinations with the code in layer 2
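The numbers in this reply can be checked with a few lines. This is only a sketch using the figures quoted above (37 Layer 2 symbols, 220 of 256 byte values without a code); the Layer 2 table itself is not shown in this thread.

```python
import math

layer2_symbols = 37                      # figure from the reply, incl. "space"
bits_per_symbol = math.ceil(math.log2(layer2_symbols))
print(bits_per_symbol)                   # 6

uncodable = 220                          # figure from the reply
print(f"{uncodable / 256:.1%}")          # 85.9%
```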
Now delete the bottommost match, leaving an empty space (the compression/encoding). Now shift the whole array to the right, displacing that empty space, which now becomes a zero at Row 1 Column 1. Now continue forward.
> with a chance of losing information ~86% of the time.
Find all matches until the last line is (the last 20 cells are) reached.
> what if there is no match? what if 20 places from my "i" is "t"? what makes you think that all your binary input repeats the same data
> periodically?
At the last line, no more compression can be done because there isn't a line under it to compare to. What we are left with here is essentially a boiled down key. Nothing more can be done with this chunk. That key is saved inside the file we are building as such:
1004(0)_GbcDeEafFBAAcbeEBDfga_6(0)
Where the first block above is the topmost last line to the halfway point of the line and the 2ndmost line is the bottommost line to where it ends halfway (20 spaces total combined) and the final part is a message to software that 1004 zeros precede those two keys.
> 1004+20 = 1024 bytes? where did the other 1.022.976 bytes go? we had 1024 KByte when we started, so there should be
> 1.024.000-20 = 1.023.980 leading zeros and 20 magic signs
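The byte accounting in this reply, as a quick sketch (1024K again read as 1,024,000 bytes, which is the reply's assumption):

```python
chunk = 1024 * 10**3      # bytes in one chunk, decimal reading
key_len = 20              # symbols in the final key
stated_zeros = 1004       # zeros announced in the key string

print(stated_zeros + key_len)            # 1024
print(chunk - (stated_zeros + key_len))  # 1022976 bytes unaccounted for
print(chunk - key_len)                   # 1023980 zeros one would expect
```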
Now the engine can get rid of the 0's, leaving that small chunk to retrieve all the data later.
It would do so like this ....
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
GbcDeEafFBAcbeEBDfga000000
> ... hooold on a second here, let's test what we get when we do this.
> input: William Shakespeare (26 April 1564 (baptised) – 23 April 1616)[nb 1] was an English poet and playwright, widely regarded as the
> greatest writer in the English language and the world's pre-eminent dramatist.
> the first sentence on Wikipedia regarding Shakespeare. Put it in a file, save it as UTF-8 and load that in our favorite
> hex editor to view the binary data.
EFBBBF57696C6C69616D205368616B657370656172652028323620417072696C203135363420286
2617074697365642920E2809320323320417072696C2031363136295B6E6220315D207761732061
6E20456E676C69736820706F657420616E6420706C61797772696768742C20776964656C7920726
5676172646564206173207468652067726561746573742077726974657220696E2074686520456E
676C697368206C616E677561676520616E642074686520776F726C642773207072652D656D696E6
56E74206472616D61746973742E
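For anyone without a hex editor at hand, the dump above can be reproduced roughly like this. It assumes the file was saved as UTF-8 with a byte order mark (the leading EFBBBF) and that the dash is U+2013; Python's `utf-8-sig` codec writes that mark.

```python
text = (
    "William Shakespeare (26 April 1564 (baptised) \u2013 23 April 1616)[nb 1] "
    "was an English poet and playwright, widely regarded as the greatest "
    "writer in the English language and the world's pre-eminent dramatist."
)

# "utf-8-sig" prepends the BOM bytes EF BB BF before the UTF-8 data.
hex_dump = text.encode("utf-8-sig").hex().upper()

# Print in lines of 79 hex digits, like the dump in this post.
for i in range(0, len(hex_dump), 79):
    print(hex_dump[i:i + 79])
```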
> Well, it's hex, no biggy, so we need at least 20 things, let's take a little more for fun
EFBBBF57696C6C69616D205368616B657370656172652028323620417072696C203135363420286
2617074697365642920E2809320323320417072696C2031363136295B6E6220315D207761732061
6E
> we convert this binary data with your Layer 1 table and get
hHFFFHCDdEdgdgdEdAdGbaCBdedAdFdCDBDadCdADbdCbabeBbBdbacADaDbdEdgbaBABCBdBcbabed
bdADaDcdEDBdCdcbEbahbeaEBbaBbBBbacADaDbdEdgbaBABdBABdbECFdhdbbaBACGbaDDdADBbadA
dh
> let's assume they are doubled and just take 10 in a row each
hHFFFHCDdE
dgdgdEdAdG
baCBdedAdF
dCDBDadCdA
DbdCbabeBb
BdbacADaDb
dEdgbaBABC
BdBcbabedb
dADaDcdEDB
!
dCdcbEbahb
eaEBbaBbBB
bacADaDbdE
dgbaBABdBA
BdbECFdhdb
baBACGbaDD
dADBbadAdh
> and we have 1 match, I can't even get to layer 2 with this. Maybe you can enlighten us by continuing this example.
> back to your example
Now the engine counts how many zeros are in the block before the first actual piece of data to know how many iterations it had done to reach that final sequence. It counts the zeros, here we see 120 zeros total.
> 120 zeros? didn't we just have 1004 zeros before the break??
The engine is told the key goes on the last line plus the number of zeros (empty spaces) that were left over. Now having figured out how many iterations to start from backwards, it begins comparing data back out, starting by re-ordering the entire sequence to the left. So the key would actually look like this:
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000GbcDeE
afFBAcbeEBDfga000000
We can easily see that this is the last line because the G and the a are not on top of each other, meaning this is indeed the true last line. Depending on how good the compression is, the last line can occur anywhere in the block, as such:
0000000000000000000
0000000000000000000
0000000GbcDeEafFBAc
beEBDfga00000000000
0000000000000000000
0000000000000000000
0000000000000000000
As long as the block is totally intact and none of the pieces overlap, it is complete. The reason we need to know how many empty spaces were left at the end is so we can separate how many iterations occurred from how many empty spaces were left, since not all the zeros here mean iterations.
I hope you can see this as clearly as I see it in my head. It's efficient and would totally work. I hope you will be able to see that by studying this.
> I hope you can clearly see now that all you have shown is that you can make x bytes of data as big as 1.5*x with some
> strange tables. And where has Layer 3 gone in this explanation?
> why can't you post a simple example? Take the binary data of the Shakespeare sentence I provided.
Anyway, I have to thank you for posting this. It was quite a lot of fun to read.