Found a couple of cases where the PHP functions array_shift and addcslashes were used in base64-encoded malware.
Adding strings to catch any references to 'cslashes', which will catch both addcslashes and stripcslashes.
Adding strings to catch any references to 'array_', which will catch the many array_* functions, array_shift among them.
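A quick way to see what a shortened raw pattern buys is a plain substring check. The Python sketch below runs 'cslashes' and 'array_' against a small, hand-picked sample of real PHP function names; the list and the helper `covered_by` are illustrative, not taken from the pattern file.

```python
# Hand-picked sample of real PHP function names (illustrative, not exhaustive).
PHP_FUNCTIONS = [
    "addcslashes", "stripcslashes", "addslashes", "stripslashes",
    "array_shift", "array_map", "array_walk", "array_filter",
    "str_replace", "base64_decode",
]


def covered_by(pattern: str, names: list[str]) -> list[str]:
    """Return every name that contains `pattern` as a substring."""
    return [name for name in names if pattern in name]


for pattern in ("cslashes", "array_"):
    print(pattern, "->", covered_by(pattern, PHP_FUNCTIONS))
```

Note that 'cslashes' does not match addslashes or stripslashes, since those names lack the 'c'.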
Removed the base64 patterns for 'mail' and added one for 'chr'.
The 'mail' patterns were generating too many false positives.
'chr' has only one base64 pattern long enough to use with any reliability, but it is one we want to watch for anyway.
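Why 'chr' yields only one usable pattern can be checked with a short Python sketch; the helper `stable_fragments` is our own illustration. Each base64 character carries 6 bits, so when a keyword starts at byte offset 0, 1 or 2 within a 3-byte group, only the characters whose bits lie entirely inside the keyword are independent of the surrounding bytes. For a 3-byte keyword that leaves 4 characters at offset 0 but only 3 at the other two offsets, which is presumably too short to match reliably.

```python
import base64


def stable_fragments(keyword: bytes) -> list[str]:
    """Return the part of base64(keyword) that is fixed at byte offsets 0, 1 and 2.

    Each base64 character encodes 6 bits. Only characters whose 6 bits fall
    entirely inside the keyword are independent of the neighbouring bytes,
    so only those are usable as a fingerprint.
    """
    fragments = []
    for offset in range(3):
        # The filler bytes are arbitrary: the characters we keep never depend on them.
        encoded = base64.b64encode(b"\x00" * offset + keyword).decode("ascii")
        start_bit = 8 * offset
        end_bit = 8 * (offset + len(keyword))
        first = -(-start_bit // 6)  # first character starting at or after start_bit (ceiling division)
        last = end_bit // 6         # one past the last character ending at or before end_bit
        fragments.append(encoded[first:last])
    return fragments


fragments = stable_fragments(b"chr")
print(fragments)                    # ['Y2hy', 'Noc', 'jaH']
print([len(f) for f in fragments])  # [4, 3, 3]
```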
Shortened the base64 fingerprints of 'base64_decode' to just 'base64'. This will also catch base64_encode, which isn't as suspicious but is still worth finding.
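A minimal Python sketch of the trade-off, assuming the keyword sits at offset 0 (a 3-byte boundary) in both hypothetical payloads; other alignments need their own fingerprints. Because 'base64' is 6 bytes, its aligned fingerprint is exactly 8 characters with no partial character at the end, and it is a prefix of the encodings of both calls.

```python
import base64


def b64(text: str) -> str:
    return base64.b64encode(text.encode("ascii")).decode("ascii")


def aligned_fingerprint(keyword: str) -> str:
    """Offset-0 fingerprint: the base64 characters fully determined by `keyword` alone."""
    return b64(keyword)[: len(keyword) * 8 // 6]


OLD = aligned_fingerprint("base64_decode")  # 17 characters
NEW = aligned_fingerprint("base64")         # 'YmFzZTY0': 6 bytes map to exactly 8 characters

# Hypothetical encoded payloads, each starting at a 3-byte boundary.
decode_payload = b64("base64_decode($p);")
encode_payload = b64("base64_encode($p);")

print(OLD in decode_payload, OLD in encode_payload)  # True False
print(NEW in decode_payload, NEW in encode_payload)  # True True
```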
Interesting note from the php.net manual on create_function:
Caution
This function internally performs an eval() and as such has the same security issues as eval(). Additionally it has bad performance and memory usage characteristics.
There are enough raw patterns in here to justify organizing the file.
Now that whitespace and comments are supported, I've been dividing it into sections.
More critical problems should be near the top, as I would rather the script identify a file as a backdoor than as a spammer.
I don't know the history behind a lot of these or the implications of the code, so I'm sure I miscategorized many. There are also many that I have not categorized yet.
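The ordering rule can be illustrated with a small Python sketch. The file layout here is an assumption for illustration only ('#' comments, blank lines ignored, '[section]' headers), as are the names `parse_patterns` and `classify`; the real scanner's format and matching may differ. The point is that when the first matching pattern wins, a file containing both a mail call and an eval is reported under whichever section comes first.

```python
from typing import List, Optional, Tuple

# Hypothetical pattern-file layout (an assumption, not the real file format):
# "#" starts a comment, blank lines are ignored, "[name]" opens a section.
PATTERNS = """
# Most critical first: anything matching here is reported as a backdoor.
[backdoor]
eval(
create_function

# Lower priority: only reported if nothing above matched.
[spammer]
mail(
"""


def parse_patterns(text: str) -> List[Tuple[str, str]]:
    """Turn the file into (section, pattern) pairs, preserving file order."""
    section = "uncategorized"
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip whitespace-only lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        rules.append((section, line))
    return rules


def classify(content: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the section of the first matching pattern; earlier sections win."""
    for section, pattern in rules:
        if pattern in content:
            return section
    return None


rules = parse_patterns(PATTERNS)
print(classify('mail($to, $s, $b); eval($_POST["c"]);', rules))  # backdoor
print(classify('mail($to, $s, $b);', rules))                     # spammer
```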
The 'preg_replace' pattern should be shortened to just 'replace', as that will also match str_replace, str_ireplace, ereg_replace, eregi_replace, and probably many others. This should increase the number of hits.
'preg_replace' base64 strings: (removed)
cHJlZ19yZXBsYWNl
ByZWdfcmVwbGFjZ
wcmVnX3JlcGxhY2
'replace' base64 strings: (added)
cmVwbGFjZ
JlcGxhY2
yZXBsYWNl
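The strings above can be regenerated with a short Python sketch; the helper `stable_fragments` is illustrative, not part of the scanner. For each of the three byte alignments it keeps only the base64 characters fully determined by the keyword, and it reproduces both lists exactly. It also shows why the shorter keyword helps: in 'str_replace', 'replace' starts at byte 4 (offset 1), so only the 'replace' set matches the encoded name.

```python
import base64


def stable_fragments(keyword: bytes) -> list[str]:
    """Base64 characters of `keyword` that are fixed at byte offsets 0, 1 and 2 (mod 3)."""
    fragments = []
    for offset in range(3):
        # Filler bytes are arbitrary; the kept characters never depend on them.
        encoded = base64.b64encode(b"\x00" * offset + keyword).decode("ascii")
        start_bit = 8 * offset
        end_bit = 8 * (offset + len(keyword))
        fragments.append(encoded[-(-start_bit // 6): end_bit // 6])
    return fragments


old = stable_fragments(b"preg_replace")
new = stable_fragments(b"replace")
print(old)  # ['cHJlZ19yZXBsYWNl', 'ByZWdfcmVwbGFjZ', 'wcmVnX3JlcGxhY2']
print(new)  # ['cmVwbGFjZ', 'JlcGxhY2', 'yZXBsYWNl']

# 'str_replace' puts 'replace' at byte 4 (offset 1), so only the new set catches it.
sample = base64.b64encode(b"str_replace").decode("ascii")
print(any(f in sample for f in old), any(f in sample for f in new))  # False True
```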
JHZpc2l0Y291bnQgPSAkSFRUUF9DT09LSUVf is correct: it is the base64 encoding of "$visitcount = $HTTP_COOKIE_".
I seem to have added a couple more characters than I should have; I'm not sure where they came from.
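Decoding a fingerprint (and re-encoding the result) before committing it is a cheap way to catch stray characters. A minimal check with Python's standard base64 module:

```python
import base64

fingerprint = "JHZpc2l0Y291bnQgPSAkSFRUUF9DT09LSUVf"

# Decode the stored pattern and confirm it round-trips to the same string.
decoded = base64.b64decode(fingerprint)
reencoded = base64.b64encode(decoded).decode("ascii")

print(decoded)                   # b'$visitcount = $HTTP_COOKIE_'
print(reencoded == fingerprint)  # True
```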
Because base64 encodes each 3-byte (24-bit) group as four 6-bit characters, a single ASCII string can produce 3 different base64 strings, depending on where its first character falls within a 3-byte group.
For example:
base64_encode("system");
base64_encode(" system");
base64_encode("( system");
The 3 input strings above all produce very different base64 signatures, even though they all contain the same keyword 'system'. This is because the first letter of 'system', 's', falls at byte offsets 0, 1, and 2 respectively.
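The three PHP calls above can be reproduced in Python, since both languages use the standard base64 alphabet with '=' padding (RFC 4648). Only the first output contains the aligned fingerprint of 'system', c3lzdGVt.

```python
import base64

# The same keyword at byte offsets 0, 1 and 2.
inputs = ["system", " system", "( system"]
encodings = [base64.b64encode(s.encode("ascii")).decode("ascii") for s in inputs]

for s, e in zip(inputs, encodings):
    print(repr(s), e)
# 'system' c3lzdGVt
# ' system' IHN5c3RlbQ==
# '( system' KCBzeXN0ZW0=
```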
I updated several of the base64 patterns to include their offset counterparts, since the originals would only catch about 1 in 3 of the matches actually present.
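A minimal Python sketch of the effect of adding the offset counterparts. The payloads are hypothetical and `stable_fragments` is our own helper; for each alignment it keeps only the base64 characters that depend on the keyword alone. Checking only the aligned fragment finds 'system' in one payload out of three, while checking all three fragments finds it in every one.

```python
import base64


def stable_fragments(keyword: bytes) -> list[str]:
    """Base64 characters of `keyword` that are fixed at byte offsets 0, 1 and 2 (mod 3)."""
    fragments = []
    for offset in range(3):
        # Filler bytes are arbitrary; the kept characters never depend on them.
        encoded = base64.b64encode(b"\x00" * offset + keyword).decode("ascii")
        start_bit, end_bit = 8 * offset, 8 * (offset + len(keyword))
        fragments.append(encoded[-(-start_bit // 6): end_bit // 6])
    return fragments


fragments = stable_fragments(b"system")

# Hypothetical payloads placing 'system' at offsets 0, 1 and 2.
payloads = [b"system('id');", b" system('id');", b"( system('id');"]
encoded_payloads = [base64.b64encode(p).decode("ascii") for p in payloads]

aligned_hits = [fragments[0] in e for e in encoded_payloads]
all_hits = [any(f in e for f in fragments) for e in encoded_payloads]
print(aligned_hits)  # [True, False, False]
print(all_hits)      # [True, True, True]
```

The offset 1 and 2 fragments are shorter (7 characters here versus 8 for the aligned one), so they are somewhat more prone to accidental matches; for very short keywords such as 'chr' this is what limits reliability.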