Monday, July 7, 2008

Any compression experts out there?

I had this idea today. Can anyone tell me how feasible it is? I don't know anything about compression algorithms.

You create a mutating virus. The virus itself includes a mutating compression algorithm that uses the MD5 sum of the current virus as the basis for mutating the compression algorithm. Because the compression changes each time the virus mutates, the virus definitions would have to be very generic*, which would lead to many false positives unless you have an excellent heuristics engine (Avira has one of the best out there).

*This is speculation. I have only written small virus definitions, nothing as complex as one for a mutating virus.


  1. I can't call myself a compression expert, but there are only so many correct ways to implement MD5 in a given number of bytes. I think it would be very hard to mutate that part without changing the algorithm itself, so it would then be simple to pick out the bad eggs based on that. I'm not sure what the purpose of MD5 would be here; secretly encoding a random seed?

    I think that would be very difficult, since you'd want to include the compressed data in the MD5, but the MD5 in turn decides how the data gets compressed. Now you have a circular chain of value dependencies that won't be easily solved.
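The circular dependency described in this comment can be shown with a few lines of Python. This is only a minimal sketch using the standard `hashlib` module; the `payload` bytes and the `md5_hex` helper are hypothetical stand-ins, not anything from the original idea. It shows that storing a digest inside the data it describes changes the digest.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for "the compressed data".
payload = b"example payload bytes"
digest_before = md5_hex(payload)

# Try to store the digest inside the data it is supposed to describe.
embedded = payload + digest_before.encode("ascii")
digest_after = md5_hex(embedded)

# The stored digest no longer describes the bytes that contain it.
print(digest_before)
print(digest_after)
print(digest_before == digest_after)
```

Any scheme where the hash both depends on the data and decides how the data is encoded runs into the same problem, unless the hash is computed over a part of the file that the hash does not influence.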

  2. Any compressed virus (or any form of executable) needs to be able to decompress itself. Therefore, you will have a decompression stub at the start of any compressed executable. For compressed viruses, it's generally this stub that is used to define a virus signature.

    A clever virus would re-write the decompression routine every time. However, there is only so much you can do[1] with this, so it's normally still possible to detect it with a signature, or a fuzzy signature.

    I guess, since the decompression routine is the only thing that's not compressed, the key to what you're saying would be to write a compressor that re-writes the decompressor perfectly - something I'm sure virus writers are trying to do :-)

    [1] I don't know a huge amount, but there are a bunch of opcode sequences that have no net effect when you put them together. For instance:

    inc eax
    dec eax
    xor eax, 1
    xor eax, 1

    An intelligent anti-virus program may run this through an interpreter and strip out the useless instructions, leaving the real, detectable decoder stub.
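The detection side of that footnote can be sketched in Python. This is a toy illustration only, not how any real scanner works. Instructions are modelled as tuples of strings, the "stub" is a made-up placeholder sequence, and `cancels`, `strip_junk`, and `matches` are hypothetical helper names. The sketch only tracks register values and ignores the fact that `inc`, `dec`, and `xor` also change CPU flags. The signature is a list of mnemonics in which `None` acts as a wildcard, a crude stand-in for a fuzzy signature.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, ...]  # e.g. ("inc", "eax") or ("xor", "eax", "1")


def cancels(a: Instr, b: Instr) -> bool:
    """True if running a then b leaves the register value unchanged (flags ignored)."""
    # inc r / dec r (in either order) on the same register
    if len(a) == 2 and len(b) == 2 and a[1] == b[1]:
        return {a[0], b[0]} == {"inc", "dec"}
    # xor r, k twice restores r; xor r, r is excluded because it zeroes r
    return len(a) == 3 and a[0] == "xor" and a == b and a[1] != a[2]


def strip_junk(code: List[Instr]) -> List[Instr]:
    """Drop adjacent self-cancelling pairs; the stack also collapses nested pairs."""
    kept: List[Instr] = []
    for instr in code:
        if kept and cancels(kept[-1], instr):
            kept.pop()
        else:
            kept.append(instr)
    return kept


def matches(code: List[Instr], signature: List[Optional[str]]) -> bool:
    """True if the mnemonic sequence contains the signature; None matches anything."""
    mnemonics = [instr[0] for instr in code]
    n = len(signature)
    return any(
        all(s is None or s == m for s, m in zip(signature, mnemonics[k:k + n]))
        for k in range(len(mnemonics) - n + 1)
    )


# Made-up placeholder "stub" padded with the junk pairs from the footnote.
padded: List[Instr] = [
    ("mov", "esi", "src"),
    ("inc", "eax"),
    ("dec", "eax"),
    ("lodsb",),
    ("xor", "eax", "1"),
    ("xor", "eax", "1"),
    ("stosb",),
    ("loop", "top"),
]
signature: List[Optional[str]] = ["mov", "lodsb", None, "loop"]

cleaned = strip_junk(padded)
print(cleaned)
print(matches(padded, signature), matches(cleaned, signature))
```

In this sketch the padded sequence does not match the signature, but the normalised one does, which is the point the footnote makes: junk padding only helps until the scanner strips it out first.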