Previous limit of 750 chars enclosed in PHP tags on a single line was too low; false positives were being triggered by a W3 Total Cache file because some guy decided to print one gigantic message on a single line.
Raising to 1100.
- Gave each flag option a short or long option; like i:ignore or d:directory or k:hide-ok
- Added a verbose option that makes the scan check a file for ALL matches instead of stopping at the first one.
- Restructured the output code to allow for the verbose flag, mainly a new function printPath and a change to where the md5 hash is computed.
- Modified the output to be cleaner: the checksum is printed first because it is fixed-width, which also makes it easier to paste into the whitelist file.
- Modified the output to be 'bash safe', i.e. when I accidentally paste my scan results into my terminal, the leading '#' should make sure everything is treated as a comment. Otherwise the shell could attempt to execute the absolute paths of potentially malicious PHP scripts, and any '>' in the output would tell the shell to write to a file. Also enclosed each path in {} for similar purposes.
- Printing the matched string/pattern in $color... might change later depending on preference.
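A minimal Python sketch of the verbose/first-match behaviour and the bash-safe output described above. It is not the scanner's PHP code: the exact line layout `# <md5> {<path>} <match>`, the function names `find_matches` and `report_lines`, and treating "ALL matches" as "every pattern that hits" are assumptions for illustration, and colour output is left out.

```python
import hashlib
import re

def find_matches(content: str, patterns: list[str], verbose: bool) -> list[str]:
    """Return matched strings; stop at the first hit unless verbose is set."""
    hits = []
    for pattern in patterns:
        m = re.search(pattern, content)
        if m:
            hits.append(m.group(0))
            if not verbose:
                break  # default mode: the first match is enough to flag the file
    return hits

def report_lines(path: str, content: str, hits: list[str]) -> list[str]:
    """Format hits as lines shaped like '# <md5> {<path>} <match>' (hypothetical layout)."""
    # The checksum is computed once per file, not once per match.
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    # The leading '#' turns each line into a shell comment if pasted into a terminal;
    # the fixed-width checksum comes first, and the path is wrapped in braces.
    return [f"# {digest} {{{path}}} {hit}" for hit in hits]
```

For example, `find_matches(src, [r"eval\(", r"base64_decode"], verbose=True)` on a file containing `eval(base64_decode(...))` returns both matches, while `verbose=False` stops after `eval(`.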
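The 's' flag (PCRE_DOTALL) tells preg_match to let '.' match newlines too, so a pattern can span multiple lines. The 'm' flag (PCRE_MULTILINE) instead makes '^' and '$' match at the beginning and end of every line rather than only at the start and end of the whole subject, which is useful in some cases.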
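To illustrate the difference between the two modifiers, here is a small Python sketch. Python's `re.DOTALL` and `re.MULTILINE` behave like the PCRE 's' and 'm' modifiers for these simple cases; the sample PHP snippet is made up.

```python
import re

code = "<?php\n$x = 'payload';\neval($x);\n?>"

# Without DOTALL, '.' stops at a newline, so a pattern spanning lines fails.
assert re.search(r"<\?php.*eval", code) is None
# With DOTALL (PCRE 's'), '.' also matches '\n' and the pattern can span lines.
assert re.search(r"<\?php.*eval", code, re.DOTALL) is not None

# Without MULTILINE, '^' only anchors at the start of the whole string.
assert re.search(r"^eval", code) is None
# With MULTILINE (PCRE 'm'), '^' and '$' also anchor at each line start/end.
assert re.search(r"^eval", code, re.MULTILINE) is not None
```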
One common tactic is to shove all of your PHP code into a single line, often contained within its own PHP tags, and drop it into any .php file that you want. This pattern should detect if more than 750 characters are contained within PHP tags on a single line.
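A possible shape for such a pattern, sketched in Python rather than as the scanner's actual regex (which may differ). `[^\n]` keeps the match on one line, and the limit is a parameter: 750 here, raised to 1100 in a later change. The helper names are hypothetical.

```python
import re

def long_php_line_pattern(limit: int) -> re.Pattern:
    # [^\n] cannot cross a line break; {limit+1,} means "more than limit chars"
    # between the opening <?php and the closing ?> on the same line.
    return re.compile(r"<\?php[^\n]{%d,}\?>" % (limit + 1))

def has_long_php_line(source: str, limit: int = 750) -> bool:
    return long_php_line_pattern(limit).search(source) is not None
```

A one-line blob such as `"<?php " + "a" * 800 + " ?>"` is flagged at the default limit, while the same amount of code spread over short lines is not.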
Added a single short flag for every long flag and a single long flag for every short flag.
This now gives us 2 ways to set each flag.
Also updated the showhelp.
Dropped an unnecessary 'else' statement.
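A sketch of giving every flag a short and a long spelling, using Python's `getopt` module, which follows the same conventions as PHP's getopt() (Python writes a value-taking long option as `"name="` where PHP uses `"name:"`). The letters and long names come from this log; which options take a value, and collecting `--ignore` into a list, are assumptions.

```python
import getopt

# i:ignore, d:directory and k:hide-ok come from the changelog; whether each
# option takes a value is an assumption for this sketch.
SHORT_OPTS = "i:d:k"
LONG_OPTS = ["ignore=", "directory=", "hide-ok"]

def parse_args(argv: list[str]) -> dict:
    opts, _rest = getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)
    settings = {"ignore": [], "directory": None, "hide_ok": False}
    for flag, value in opts:
        # Each short flag and its long twin update the same setting.
        if flag in ("-i", "--ignore"):
            settings["ignore"].append(value)
        elif flag in ("-d", "--directory"):
            settings["directory"] = value
        elif flag in ("-k", "--hide-ok"):
            settings["hide_ok"] = True
    return settings
```

With this table, `-d /var/www -k` and `--directory=/var/www --hide-ok` produce identical settings.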
str_ will match 13 separate PHP functions, many of which can be used for string modification, i.e. obfuscation.
function added to catch function definitions.
echo added as it is a common PHP keyword, though experimental... may cause a lot of false positives.
include added as it is often used to link in other malware files.
This is a file containing a list of PHP keywords converted to base64. It's designed to be used as a pattern file to identify common keywords used in obfuscated code.
Found a couple of cases where the php functions array_shift and addcslashes were used in base64 encoded malware.
Adding strings to catch any references to 'cslashes', which will catch both addcslashes and stripcslashes.
Adding strings to catch any references to 'array_' which will catch about a dozen array modification functions.
removed mail b64, added chr b64
mail was generating too many false positives.
chr has only one pattern that is long enough to use with any sort of reliability, but it is one that we want to look out for anyway.
Example usage:
I want to see if a giant block of base64 code contains any references to the string 'base64'.
The naive approach is to convert the string to its base64 equivalent, YmFzZTY0.
There are two problems with this approach. The first is that the encoded string will be different depending on where the plain-text 'base64' starts relative to the 3-byte groups of the input: its first bit can land 0, 2 or 4 bits into a 6-bit base64 character. The above example only covers the 0-bit offset. There should be 3 separate base64 strings to look for.
The second problem is that base64 uses 6 bits per character, so the characters don't line up with 8-bit bytes. This leads to character bleeding at the beginning and end of the encoded string, where the edge characters change depending on the bytes immediately around it. This script calculates the maximum constant string length that is guaranteed to be present. Unfortunately it requires trimming characters, which can often lead to very short strings.
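The computation described above can be sketched in Python. This is an illustration, not the author's conversion script. It simulates each of the three alignments by prepending 0, 1 or 2 filler bytes (zero bytes here, an arbitrary choice), encodes, and keeps only the characters whose 6 bits all come from the word. The function name `fingerprints` is hypothetical.

```python
import base64

def fingerprints(word: str) -> list[str]:
    """Return the three base64 substrings guaranteed to appear when `word`
    occurs anywhere in base64-encoded text, one per alignment: the word
    starting 0, 1 or 2 bytes into a 3-byte group (0, 2 or 4 bits into a
    6-bit base64 character)."""
    data = word.encode("utf-8")
    n = len(data)
    result = []
    for k in range(3):
        # k filler bytes simulate the alignment; their value does not matter
        # because every character that depends on them is trimmed below.
        encoded = base64.b64encode(b"\0" * k + data).decode("ascii")
        start = (8 * k + 5) // 6   # first char whose bits all come from `word`
        end = (8 * (k + n)) // 6   # one past the last such char (exclusive)
        result.append(encoded[start:end])
    return result
```

`fingerprints("replace")` reproduces the three 'replace' strings listed in this log, and `fingerprints("chr")` yields one 4-character string and two 3-character strings, which fits the note that only one 'chr' pattern is long enough to be useful. The exclusive `end` bound is exactly where an off-by-one error would keep a trailing character that depends on the next byte.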
Found a bug in my base64 converter
My base64 conversion script is supposed to find the maximum-length string that is guaranteed to be present if the input plain-text string is somewhere in the original plain-text code. However, there was an off-by-one error which made some patterns 1 character longer than they should have been. Short patterns (i.e. 4 chars) were prone to false positives because they were really 3-character patterns, which is too short to be useful. Long patterns were likely missing results.
Should be fixed now.
shortened the base64 fingerprints of 'base64_decode' to just 'base64'. will also catch cases of base64_encode which isn't quite so bad but still worth finding.
interesting note from the php.net manual on create_function:
Caution
This function internally performs an eval() and as such has the same security issues as eval(). Additionally it has bad performance and memory usage characteristics.
This file just contains a list of internal php 7 functions (probably incomplete depending on extensions etc) and their 3 base64 fingerprints. It is designed to be used as either a pattern file to explore potential patterns that may be effective, or simply as a reference to translate between plain text php and the 3 different base64 versions.
This is a file of base64 patterns that represent strings that would be present if any of the functions in php7 were encoded to base64. I'll probably add structure later by grouping them with their plain text translation.
This file is useful to swap out with patterns_raw.txt to gain additional insights into other strings to search for in base64.
There's enough raw patterns in here to justify organizing the file.
Now that whitespace and comments are supported, I've been dividing it into sections
More critical problems should be near the top as I would rather the script identify a file as a backdoor instead of as a spammer.
I don't know the history behind a lot of these or the implications of the code, so I'm sure I miscategorized many. There are also many that I have not categorized yet.
The pattern files are large and complex enough to justify some whitespace and comments to explain what each entry is.
Added logic to check if the line is empty or if the first character is equal to '#' before using it as a pattern. Simply skips over empty and commented lines.
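A Python sketch of the skip logic described above. Unlike the description, which checks the raw first character, this version strips surrounding whitespace first, so indented comments and whitespace-only lines are also skipped. That is an assumption, and it also means a pattern cannot rely on leading or trailing spaces. The function names are hypothetical.

```python
def load_patterns(text: str) -> list[str]:
    """Return usable patterns, skipping blank lines and '#' comment lines."""
    patterns = []
    for line in text.splitlines():
        entry = line.strip()  # assumption: surrounding whitespace is not significant
        if not entry or entry[0] == "#":
            continue  # empty line or comment: not a pattern
        patterns.append(entry)
    return patterns

def load_pattern_file(path: str) -> list[str]:
    with open(path, encoding="utf-8") as fh:
        return load_patterns(fh.read())
```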
cat php-malware-scanner-master/whitelist.txt | sort -k 2,2 -k 1,1 | less
More of an OCD thing than anything, but might as well sort primarily by file path, secondarily by hash value.
cat whitelist.txt | sort -k 2 | less
No reason this shouldn't be sorted perfectly to keep like files together.
No white list rules changed... just plain sorting.
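For reference, here is roughly what `sort -k 2,2 -k 1,1` does, written in Python. It assumes each whitelist line is `<md5> <path>`, with the checksum first as in the scan output. Two differences from `sort` are worth noting. Python compares by code point, like `LC_ALL=C`, rather than by locale collation. And this sketch keeps a path containing spaces whole, whereas `-k 2,2` would only compare its first word.

```python
def sort_whitelist(lines: list[str]) -> list[str]:
    """Sort '<md5> <path>' lines by path first, then by checksum."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        checksum, path = line.split(None, 1)
        entries.append((path, checksum))
    entries.sort()  # tuples compare path first, checksum second
    return [f"{checksum} {path}" for path, checksum in entries]
```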
'preg_replace' should be shortened to just 'replace', as that will also match str_replace, str_ireplace, ereg_replace, eregi_replace and many others I'm sure. Should increase the number of hits.
'preg_replace' base64 strings: (removed)
cHJlZ19yZXBsYWNl
ByZWdfcmVwbGFjZ
wcmVnX3JlcGxhY2
'replace' base64 strings: (added)
cmVwbGFjZ
JlcGxhY2
yZXBsYWNl
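A quick Python check of the effect of this change, using the fingerprint lists above verbatim. Encoded code that calls str_replace is missed by the old 'preg_replace' strings but caught by the new 'replace' strings. The helper names and sample PHP calls are made up for illustration.

```python
import base64

# Fingerprint lists copied from this commit message.
PREG_REPLACE = ["cHJlZ19yZXBsYWNl", "ByZWdfcmVwbGFjZ", "wcmVnX3JlcGxhY2"]
REPLACE = ["cmVwbGFjZ", "JlcGxhY2", "yZXBsYWNl"]

def matches_any(blob: str, fingerprints: list[str]) -> bool:
    """True if any fingerprint occurs as a substring of the base64 blob."""
    return any(fp in blob for fp in fingerprints)

def compare(plain: bytes) -> tuple[bool, bool]:
    """Encode `plain` and report (old preg_replace hit, new replace hit)."""
    blob = base64.b64encode(plain).decode("ascii")
    return matches_any(blob, PREG_REPLACE), matches_any(blob, REPLACE)
```

`compare(b"str_replace($a, $b, $c);")` returns `(False, True)`, while `compare(b"preg_replace($a, $b, $c);")` returns `(True, True)`.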