Say you have a 20-byte binary file and you want to embed the message "Helloworld" (10 bytes, one byte per character) in it, in a way that is not obvious to anyone but the intended recipient. In this example we divide both sizes by 5 to get our grid: the file becomes 4 rows of 5 bytes and the message 2 rows of 5 bytes. The grid can be any size that you can write an equation around.
[M][Z][0][0][0]
[0][0][0][0][0]
[0][0][0][0][0] + [H][e][l][l][o]
[0][0][0][0][0] + [w][o][r][l][d]
We don't care about the binary file; it is the message that is important. If the file just seems 'corrupted' to anyone else, all the better.
Most of us will probably remember rise over run from elementary school. Treating each row of the message separately, we can insert it evenly and easily into the binary file along a slope (using 1/1 and -1/1 for "world" and "Hello", respectively, with the sign flipping at the middle column). With some trig thrown in, you can get some nice graph-like steganography:
[H][Z][0][0][o]
  \         /
[0][e][r][l][0]
     /\ /\
[0][o][l][l][0]
  /         \
[w][0][0][0][d]
Pardon the rough mockup. Hopefully it is easy to follow.
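For anyone who wants to play with it, here is a minimal Python sketch of the embedding step. It is only one way to do it and makes a few assumptions that are not in the post: cells are addressed with zero-based (x, y) indices with [w] at (0, 0), the cover file is an "MZ" header padded with ASCII '0' characters so it prints like the mockup, and the names `hello_y`, `world_y` and `embed` are made up for illustration.

```python
ROWS, COLS = 4, 5  # 20-byte file laid out as 4 rows of 5 bytes

def hello_y(x):
    # V shape: slope -1 down to the trough at x = 2, then +1 back up
    return abs(x - 2) + 1

def world_y(x):
    # Inverted V: slope +1 up to the peak at x = 2, then -1 back down
    return 2 - abs(x - 2)

def embed(cover, word, path):
    """Return a copy of cover with word[x] written at column x, height path(x)."""
    out = bytearray(cover)
    for x, ch in enumerate(word):
        row_from_top = ROWS - 1 - path(x)  # y = 0 is the bottom row
        out[row_from_top * COLS + x] = ch
    return bytes(out)

cover = b"MZ" + b"0" * 18  # stand-in for the real binary file
stego = embed(embed(cover, b"Hello", hello_y), b"world", world_y)

# Print the result in the same [x] grid notation as the mockup
for r in range(ROWS):
    print("".join(f"[{chr(b)}]" for b in stego[r * COLS:(r + 1) * COLS]))
```

Printing the rows reproduces the mockup grid, including the overwritten first byte of the header.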
For the technicals, we assume that [w] is located at (0,0), i.e. both x and y are 0. That gives us 4 rows (y from 0 to 3) and 5 columns (x from 0 to 4). Our 'Hello' row has its trough at (2,1) and its peaks at (0,3) and (4,3). Our 'world' row has its peak at (2,2) and its troughs at (0,0) and (4,0). From this, we can derive an equation for each line and piece together the message.
Of course, as the message/data to be hidden grows, the math can become more and more complicated, but the amount of stealthiness is limited only by your imagination.
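Reading the message back out can be sketched the same way. Assuming zero-based (x, y) cell indices with [w] at (0, 0) and slopes of exactly +1 and -1, the two lines work out to y = |x - 2| + 1 for 'Hello' and y = 2 - |x - 2| for 'world'. The `extract` helper below is a hypothetical name, and the grid is typed in by hand rather than read from a real file.

```python
ROWS, COLS = 4, 5

# The embedded grid from the mockup, read row by row from the top
stego = b"HZ00o" b"0erl0" b"0oll0" b"w000d"

def extract(data, path):
    """Collect one byte per column, at the height given by path(x)."""
    # y = 0 is the bottom row, so flip it to get the row counted from the top
    return bytes(data[(ROWS - 1 - path(x)) * COLS + x] for x in range(COLS))

hello = extract(stego, lambda x: abs(x - 2) + 1)  # V-shaped line
world = extract(stego, lambda x: 2 - abs(x - 2))  # inverted-V line
message = (hello + world).decode("ascii")
print(message)  # Helloworld
```

The recipient only needs the grid width and the two equations; the cover bytes themselves are never consulted.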
That falls to Zipf's law: a simple statistical exercise that can reveal both the presence of a language (a 1/f distribution of tokens) and its characteristics, e.g. which language it is.
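A rough Python sketch of that kind of check, for illustration only: it ranks tokens by frequency and measures how far the counts sit from an ideal 1/rank curve. The `zipf_deviation` score is a made-up metric for this example, not a standard test, and it only says anything useful on a reasonably large sample of tokens.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return token counts sorted from most to least frequent."""
    return [count for _, count in Counter(tokens).most_common()]

def zipf_deviation(tokens):
    """Mean relative gap between observed counts and the ideal f(1)/rank curve.

    0.0 means a perfect 1/rank fall-off; larger values mean a worse fit.
    """
    counts = rank_frequency(tokens)
    if not counts:
        raise ValueError("need at least one token")
    top = counts[0]
    gaps = [abs(c - top / r) / (top / r) for r, c in enumerate(counts, start=1)]
    return sum(gaps) / len(gaps)

# Toy samples: one with a 12, 6, 4, 3 fall-off, one perfectly flat
zipf_like = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
flat = ["a"] * 5 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5

print(zipf_deviation(zipf_like))
print(zipf_deviation(flat))
```

The Zipf-like sample scores lower than the flat one, which is the signal an analyst would look for when deciding whether a byte or token stream hides natural-language text.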