Signatures
1 Introduction
CVD (ClamAV Virus Database) is a digitally signed container that includes
signature databases in various text formats. The header of the container is a
512-byte string with colon-separated fields:
ClamAV-VDB:build time:version:number of signatures:functionality level required:MD5 checksum:digital signature:builder name:build time (sec)
The ClamAV project distributes a number of CVD files, including main.cvd and
daily.cvd.
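The header layout above can be split into named fields with a few lines of Python. This is a minimal sketch, not part of ClamAV: the function name and the dictionary keys are invented for illustration, and it assumes that the unused tail of the 512 bytes is padded with whitespace or NUL bytes and that no field contains a colon.

```python
# Key names are this sketch's own choice; they mirror the field order
# given in the header description above.
CVD_FIELDS = [
    "magic", "build_time", "version", "num_signatures",
    "functionality_level", "md5", "digital_signature",
    "builder", "build_time_sec",
]


def parse_cvd_header(raw: bytes) -> dict:
    """Split the first 512 bytes of a CVD file into named string fields."""
    if len(raw) < 512:
        raise ValueError("a CVD header is 512 bytes long")
    # Assumption: trailing padding is whitespace or NUL bytes.
    text = raw[:512].decode("ascii", errors="replace").rstrip(" \0\r\n")
    parts = text.split(":")
    if parts[0] != "ClamAV-VDB":
        raise ValueError("missing ClamAV-VDB magic string")
    return dict(zip(CVD_FIELDS, parts))


# Typical use (the file name is only an example):
# with open("daily.cvd", "rb") as handle:
#     print(parse_cvd_header(handle.read(512)))
```

All values are returned as strings; converting the version or signature count to integers is left to the caller.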
at the debug information from libclamav. You can do this by calling clamscan with
the --debug and --leave-temps flags. The first switch makes clamscan display
all the interesting information from libclamav, and the second one prevents
temporary files from being deleted so they can be analyzed further. The relevant
part of the output is:
LibClamAV debug: PointerToRawData: 0x400 0x400
LibClamAV debug: Section's memory is executable
LibClamAV debug: Section's memory is writeable
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 1
LibClamAV debug: Section name: UPX1
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0x9000 0x9000
LibClamAV debug: VirtualAddress: 0x1f000 0x1f000
LibClamAV debug: SizeOfRawData: 0x8200 0x8200
LibClamAV debug: PointerToRawData: 0x400 0x400
LibClamAV debug: Section's memory is executable
LibClamAV debug: Section's memory is writeable
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 2
LibClamAV debug: Section name: UPX2
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0x1000 0x1000
LibClamAV debug: VirtualAddress: 0x28000 0x28000
LibClamAV debug: SizeOfRawData: 0x200 0x1ff
LibClamAV debug: PointerToRawData: 0x8600 0x8600
LibClamAV debug: Section's memory is writeable
LibClamAV debug: ------------------------------------
LibClamAV debug: EntryPoint offset: 0x8470 (33904)
The section structure displayed above suggests the executable is packed with
UPX.
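If you save the debug output to a file, the section names can be pulled out automatically. The following is a rough sketch: the function names are hypothetical, it only looks at the "Section name:" lines shown above, and a UPX-style name is merely a hint, since section names can be changed freely.

```python
import re


def section_names(debug_log: str) -> list:
    """Return the PE section names reported in libclamav debug output."""
    return re.findall(
        r"^LibClamAV debug: Section name: (\S+)",
        debug_log,
        flags=re.MULTILINE,
    )


def looks_upx_packed(debug_log: str) -> bool:
    """Heuristic only: True if any section name starts with 'UPX'."""
    return any(name.startswith("UPX") for name in section_names(debug_log))
```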
LibClamAV debug: ------------------------------------
LibClamAV debug: EntryPoint offset: 0x8470 (33904)
LibClamAV debug: UPX/FSG/MEW: empty section found - assuming compression
LibClamAV debug: UPX: bad magic - scanning for imports
LibClamAV debug: UPX: PE structure rebuilt from compressed file
LibClamAV debug: UPX: Successfully decompressed with NRV2B
LibClamAV debug: UPX/FSG: Decompressed data saved in /tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede
LibClamAV debug: ***** Scanning decompressed file *****
LibClamAV debug: Recognized MS-EXE/DLL file
LibClamAV debug: Matched signature for file type PE
Indeed, libclamav recognizes the UPX data and saves the decompressed (and
rebuilt) executable into /tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede.
Then it continues by scanning this new file:
LibClamAV debug: SizeOfRawData: 0x2000 0x2000
LibClamAV debug: PointerToRawData: 0xd000 0xd000
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 2
LibClamAV debug: Section name: .data
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0x17000 0x17000
LibClamAV debug: VirtualAddress: 0xf000 0xf000
LibClamAV debug: SizeOfRawData: 0x17000 0x17000
LibClamAV debug: PointerToRawData: 0xf000 0xf000
LibClamAV debug: Section's memory is writeable
LibClamAV debug: ------------------------------------
LibClamAV debug: EntryPoint offset: 0x7b9f (31647)
LibClamAV debug: Bytecode executing hook id 257 (0 hooks)
attachment.exe: OK
[...]
No additional files get created by libclamav. By writing a signature for the
decompressed file you improve the chances that the engine will detect the target
data when it gets compressed with another packer.
This method should be applied to all files for which you want to create
signatures. By analyzing the debug information you can quickly see how the engine
recognizes and preprocesses the data and what additional files get created.
Signatures created for bottom-level temporary files are usually more generic and
should help detect the same malware in different forms.
3 Signature formats
3.1 Hash-based signatures
The easiest way to create signatures for ClamAV is to use file hash checksums;
however, this method can only be used against static malware.
That’s it! The signature is ready for use:
zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe
test.exe: test.exe FOUND
You can change the name (by default sigtool uses the name of the file) and place
it inside a *.hdb file. A single database file can include any number of signatures.
To get them automatically loaded each time clamscan/clamd starts, just copy the
database file(s) into the local virus database directory (e.g. /usr/local/share/clamav).
Hash-based signatures should not be used for text files, HTML and any other
data that gets internally preprocessed before pattern matching. If you really want
to use a hash signature in such a case, run clamscan with the --debug and
--leave-temps flags as described above and create a signature for a preprocessed
file left in /tmp. Please keep in mind that a hash signature will stop matching as
soon as a single byte changes in the target file.
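To see what goes into a hash-based signature, it can help to build one by hand. The sketch below is not a replacement for sigtool: the function name is made up, and it assumes the usual *.hdb layout of MD5 checksum, file size in bytes and malware name separated by colons.

```python
import hashlib


def hdb_signature(data: bytes, name: str) -> str:
    """Build one *.hdb line, assuming the layout MD5:Size:MalwareName."""
    digest = hashlib.md5(data).hexdigest()
    return f"{digest}:{len(data)}:{name}"


# Typical use (file and signature names are only examples):
# with open("test.exe", "rb") as handle:
#     print(hdb_signature(handle.read(), "My.Test.Signature"))
```

Because the checksum covers every byte of the input, changing a single byte produces a completely different line, which is why these signatures only work against static files.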
The easiest way to generate MD5-based section signatures is to extract target PE
sections into separate files and then run sigtool with the option --mdb.
ClamAV 0.98 also added support for SHA1 and SHA256 section-based
signatures. The format is the same as for MD5 PE section-based signatures. ClamAV
differentiates between them based on the length of the hash string in the signature.
For best backwards compatibility, these should be placed inside a *.msb file.
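Telling the hash types apart by length is easy to reproduce. The sketch below only illustrates the idea and is not how libclamav implements it; the function name is hypothetical, and the lengths are simply the hexadecimal digest sizes of the three algorithms.

```python
# Hex digest lengths: MD5 = 16 bytes, SHA1 = 20 bytes, SHA256 = 32 bytes.
HASH_BY_HEX_LENGTH = {32: "MD5", 40: "SHA1", 64: "SHA256"}


def hash_type(hex_digest: str) -> str:
    """Guess the hash algorithm from the length of a hex digest string."""
    digest = hex_digest.strip().lower()
    if not digest or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("not a hexadecimal string")
    try:
        return HASH_BY_HEX_LENGTH[len(digest)]
    except KeyError:
        raise ValueError(f"unexpected digest length {len(digest)}") from None
```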
3.2.2 Wildcards
ClamAV supports the following wildcards for hex-signatures:
• ??
Match any byte.
• a?
Match a high nibble (the four high bits).
IMPORTANT NOTE: Nibble matching is only available in libclamav with
functionality level 17 and higher; therefore, please only use it with .ndb
signatures followed by ":17" (MinEngineFunctionalityLevel, see 3.2.7).
• ?a
Match a low nibble (the four low bits).
• *
Match any number of bytes.
• {n}
Match n bytes.
• {-n}
Match n or fewer bytes.
• {n-}
Match n or more bytes.
• {n-m}
Match between n and m bytes (m > n).
• HEXSIG[x-y]aa or aa[x-y]HEXSIG
Match aa anchored to a hex-signature, see
https://bugzilla.clamav.net/show_bug.cgi?id=776 for discussion and examples.
The range wildcards * and {} virtually separate a hex-signature into two parts,
e.g. aabbcc*bbaacc is treated as two sub-signatures aabbcc and bbaacc with
any number of bytes between them. Each sub-signature must include a block of
two static characters somewhere in its body. There is one exception to this
restriction: a range wildcard of the form {n} with n<128. In this case, ClamAV
uses an optimization and translates {n} to a string consisting of n ?? character
wildcards. Character wildcards do not divide hex signatures into two parts, so the
two static character requirement does not apply.
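To get a feel for what the wildcards above match, you can translate a hex-signature into an ordinary regular expression and try it on sample data. This is only an illustrative sketch with a hypothetical function name: it covers ??, nibbles, * and the {} ranges, ignores alternates, character classes and the [x-y] anchors, and says nothing about how the ClamAV matcher works internally.

```python
import re


def hexsig_to_regex(sig: str):
    """Translate a hex-signature with basic wildcards to a bytes regex."""
    parts = []
    i = 0
    while i < len(sig):
        if sig[i] == "*":                      # any number of bytes
            parts.append(".*")
            i += 1
        elif sig[i] == "{":                    # {n}, {-n}, {n-}, {n-m}
            end = sig.index("}", i)
            body = sig[i + 1:end]
            if "-" in body:
                low, high = body.split("-")
                parts.append(".{%s,%s}" % (low or "0", high))
            else:
                parts.append(".{%s}" % body)
            i = end + 1
        else:                                  # one byte = two characters
            hi, lo = sig[i], sig[i + 1]
            if hi == "?" and lo == "?":        # ?? any byte
                parts.append(".")
            elif lo == "?":                    # a? fixed high nibble
                n = int(hi, 16) << 4
                parts.append("[\\x%02x-\\x%02x]" % (n, n | 0x0F))
            elif hi == "?":                    # ?a fixed low nibble
                n = int(lo, 16)
                members = "".join("\\x%02x" % ((h << 4) | n) for h in range(16))
                parts.append("[" + members + "]")
            else:                              # plain hex byte
                parts.append("\\x%02x" % int(hi + lo, 16))
            i += 2
    return re.compile("".join(parts).encode("ascii"), re.DOTALL)


# aabbcc*bbaacc: two static parts with any number of bytes between them.
pattern = hexsig_to_regex("aabbcc*bbaacc")
print(bool(pattern.search(b"\xaa\xbb\xcc\x00\x01\xbb\xaa\xcc")))
```

Unsupported syntax such as (aa|bb) raises a ValueError instead of being silently ignored.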
3.2.3 Character classes
ClamAV supports the following character classes for hex-signatures:
• (B)
Match word boundary (including file boundaries).
• (L)
Match CR, CRLF or file boundaries.
• (W)
Match a non-alphanumeric character.
Note that using signature modifiers and wildcards causes the alternate to be
classified as a generic alternate. Thus single-byte alternates and multi-byte
fixed length alternates can use signature modifiers and wildcards, but will then be
classified as generic alternates. This means that negation cannot be applied in this
situation and there is a slight performance impact.
ClamAV will scan the entire file looking for HexSignature. All signatures of this
type must be placed inside *.db files.
where TargetType is one of the following numbers specifying the type of the
target file:
• 0 = any file
• 4 = Mail file
• 5 = Graphics
• 6 = ELF
• 8 = Unused
• 9 = Mach-O files
• 10 = PDF files
• 11 = Flash files
• * = any
• n = absolute offset
All the above offsets except * can be turned into floating offsets and represented
as Offset,MaxShift where MaxShift is an unsigned integer. A floating offset
will match every offset between Offset and Offset+MaxShift, e.g. 10,5 will
match all offsets from 10 to 15 and EP+n,y will match all offsets from EP+n to
EP+n+y. Versions of ClamAV older than 0.91 will silently ignore the MaxShift
extension and only use Offset.
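The arithmetic behind floating offsets is simple enough to check by hand. The sketch below uses hypothetical helper names and handles absolute numeric offsets only; anchors such as EP+n would first need the entry point resolved from the file, which is outside the scope of this example.

```python
def floating_offset_range(spec: str) -> tuple:
    """Return (first, last) offsets covered by 'Offset' or 'Offset,MaxShift'."""
    offset, _, max_shift = spec.partition(",")
    first = int(offset)
    return first, first + int(max_shift or 0)


def offset_allowed(spec: str, position: int) -> bool:
    """True if a match starting at 'position' satisfies the offset spec."""
    first, last = floating_offset_range(spec)
    return first <= position <= last


print(floating_offset_range("10,5"))  # (10, 15)
```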
Optional MinFL and MaxFL parameters can restrict the signature to specific engine
releases. All signatures in the extended format must be placed inside *.ndb files.
3.2.7 Logical signatures
Logical signatures allow combining multiple signatures in extended format using
logical operators. They can provide both more detailed and more flexible pattern
matching. Logical signatures are stored inside *.ldb files in the following format:
SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;Subsig2;...
where:
• Intermediates:CL_TYPE_*>CL_TYPE_*: File types of the intermediate
containers which store the scanned file. Specify 1-16 file types separated by
'>' in top-down order (the '>' separator is not needed for a single file type); the
last type should be the immediate container for the malicious content.
CL_TYPE_ANY can be used as a wildcard file type. (expr; 0.99.3)
• IconGroup1: Icon group name 1 from .idb signature Required engine
functionality (range; 0.96)
• IconGroup2: Icon group name 2 from .idb signature Required engine
functionality (range; 0.96)
Examples:
Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;73746566616e;deadbeef
Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;73746566616e
Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;73746566616e;deadbeef
Sig4;Target:1,Engine:18-20;((0|1)&(2|3))&4;EP+123:33c06834f04100f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573(63|64)61706528;S3+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58dcf43987e4f519d629b103375;SL+550:6300680065005c0046006900
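The way a logical expression combines subsignature results can be tried out with a small evaluator. This is a sketch under stated assumptions, not the libclamav implementation: the function name is hypothetical, only &, | and parentheses are handled (count operators such as =2 or >5,2 are rejected), and & is assumed to bind tighter than |, which does not matter for fully parenthesised expressions like the ones above.

```python
def eval_logical(expr: str, matched) -> bool:
    """Evaluate a logical expression against a set of matched subsig indexes."""
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        value = parse_and()
        while pos < len(expr) and expr[pos] == "|":
            pos += 1
            rhs = parse_and()          # always parse, then combine
            value = value or rhs
        return value

    def parse_and() -> bool:
        nonlocal pos
        value = parse_atom()
        while pos < len(expr) and expr[pos] == "&":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        nonlocal pos
        if pos < len(expr) and expr[pos] == "(":
            pos += 1
            value = parse_or()
            if pos >= len(expr) or expr[pos] != ")":
                raise ValueError("unsupported or unbalanced expression")
            pos += 1
            return value
        start = pos
        while pos < len(expr) and expr[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("expected a subsignature index")
        return int(expr[start:pos]) in matched

    result = parse_or()
    if pos != len(expr):
        raise ValueError("unsupported trailing input")
    return result


# Sig1 above: subsignatures 0-3 must all match, plus 4 or 1.
print(eval_logical("(0&1&2&3)&(4|1)", {0, 1, 2, 3}))
```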
• Case-Insensitive [i]
Specifying the i modifier causes ClamAV to match all alphabetic hex bytes
as case-insensitive. All patterns in ClamAV are case-sensitive by default.
• Wide [w]
Specifying the w modifier causes ClamAV to match all hex bytes encoded with
two bytes per character. Note this simply interleaves each character with NULL
characters and does not truly support UTF-16 characters. Wildcards in
'wide' subsignatures are not treated as wide (i.e. there can be an odd number
of intermittent characters). This can be combined with a to search for
patterns in both wide and ascii.
• Fullword [f]
Match subsignature as a fullword (delimited by non-alphanumeric characters).
• Ascii [a]
Match subsignature as ascii characters. This can be combined with w to
search for patterns in both ascii and wide.
Examples:
clamav-nocase-A;Engine:81-255,Target:0;0&1;41414141::i;424242424242::i
-matches ’AAAA’(nocase) and ’BBBBBB’(nocase)
clamav-fullword-A;Engine:81-255,Target:0;0&1;414141;68656c6c6f::f
-matches ’AAA’ and ’hello’(fullword)
clamav-fullword-B;Engine:81-255,Target:0;0&1;414141;68656c6c6f::fi
-matches ’AAA’ and ’hello’(fullword nocase)
clamav-wide-B2;Engine:81-255,Target:0;0&1;414141;68656c6c6f::wa
-matches ’AAA’ and ’hello’(wide ascii)
clamav-wide-C0;Engine:81-255,Target:0;0&1;414141;68656c6c6f::iwfa
-matches ’AAA’ and ’hello’(nocase wide fullword ascii)
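The effect of the w modifier described above can be reproduced by hand, which is useful when you want to check what a wide subsignature will actually look for. The helper below is hypothetical and assumes each pattern byte is followed by a NULL byte, in the same order as the two-byte strings shown elsewhere in this document.

```python
def wide_hex(ascii_hex: str) -> str:
    """Interleave every byte of a hex pattern with a NULL byte."""
    raw = bytes.fromhex(ascii_hex)
    return b"".join(bytes([value, 0]) for value in raw).hex()


# 'hello' as used in the examples above
print(wide_hex("68656c6c6f"))  # 680065006c006c006f00
```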
Example:
test.ldb:
TestMacro;Engine:51-255,Target:0;0&1;616161;${6-7}12$
test.ndb:
D1:0:$12:626262
D2:0:$12:636363
D3:0:$30:626264
– In the example, {min-max} is {6-7} and it is relative to the start of a
616161 match.
• PCRE is the expression representing the regex to execute. PCRE must be
delimited by '/' and any '/' within the expression needs to be escaped.
For backward compatibility, ';' within the expression must be expressed
as '\x3B'. PCRE cannot be empty and the (?UTF*) control sequence is not
allowed. If debug is specified, named capture groups are displayed in a
post-execution report.
• Flags are a series of characters which affect the compilation and execution
of PCRE within the PCRE compiler and the ClamAV engine. This field is
optional.
– s [PCRE_DOTALL]
– m [PCRE_MULTILINE]
– x [PCRE_EXTENDED]
– A [PCRE_ANCHORED]
– E [PCRE_DOLLAR_ENDONLY]
– U [PCRE_UNGREEDY]
Examples:
Find.All.ClamAV;Engine:81-255,Target:0;1;6265676c6164697427736e6f7462797465636f6465;0/clamav/g
Find.ClamAV.OnlyAt.299;Engine:81-255,Target:0;2;7374756c747a67657473;7063726572656765786c6f6c;299:0&1/clamav/
Find.ClamAV.StartAt.300;Engine:81-255,Target:0;3;616c61696e;62756731393238;636c6f736564;300:0&1&2/clamav/r
Find.All.Encompassed.ClamAV;Engine:81-255,Target:0;3;7768796172656e2774;796f757573696e67;79617261;200,300:0&1&2/clamav/ge
Named.CapGroup.Pcre;Engine:81-255,Target:0;3;636f75727479617264;616c62756d;74657272696572;50:0&1&2/variable=(?<nilshell>.{16})end/gr
Firefox.TreeRange.UseAfterFree;Engine:81-255,Target:0,Engine:81-255;0&1&2;2e766965772e73656c656374696f6e;2e696e76616c696461746553656c656374696f6e;0&1/\x2Eview\x2Eselection.*?\x2Etree\s*\x3D\s*null.*?\x2Einvalidate/smi
Firefox.IDB.UseAfterFree;Engine:81-255,Target:0;0&1;4944424b657952616e6765;0/^\x2e(only|lowerBound|upperBound|bound)\x28.*?\x29.*?\x2e(lower|upper|lowerOpen|upperOpen)/smi
Firefox.boundElements;Engine:81-255,Target:0;0&1&2;6576656e742e626f756e64456c656d656e7473;77696e646f772e636c6f7365;0&1/on(load|click)\s*=\s*\x22?window\.close\s*\x28/si
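When reading PCRE subsignatures like the ones above, it can help to split them into their parts. The sketch below is an assumption-laden illustration: the function name is invented, and the layout [Offset:]Trigger/PCRE/[Flags] is inferred from the examples, with the first and last '/' taken as the delimiters.

```python
def split_pcre_subsig(subsig: str) -> dict:
    """Split '[Offset:]Trigger/PCRE/[Flags]' into its components."""
    first = subsig.index("/")
    last = subsig.rindex("/")
    if first == last:
        raise ValueError("PCRE must be delimited by two '/' characters")
    head = subsig[:first]
    regex = subsig[first + 1:last]
    flags = subsig[last + 1:]
    if not regex:
        raise ValueError("PCRE cannot be empty")
    # An optional offset precedes the trigger expression, separated by ':'.
    offset, separator, trigger = head.rpartition(":")
    return {
        "offset": offset if separator else None,
        "trigger": trigger,
        "regex": regex,
        "flags": flags,
    }


print(split_pcre_subsig("200,300:0&1&2/clamav/ge"))
```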
3.4 Icon signatures for PE files
ClamAV 0.96 includes an approximate/fuzzy icon matcher to help detect
malicious executables disguising themselves as innocent looking image files, office
documents and the like.
Icon matching is only triggered via .ldb signatures using the special attribute
tokens IconGroup1 or IconGroup2. These identify two (optional) groups of icons
defined in a .idb database file. The format of the .idb file is:
ICONNAME:GROUP1:GROUP2:ICON_HASH
where:
The ICON_HASH field can be obtained from the debug output of libclamav. For
example:
to date. Suffice to say, this approach never really worked and is generally never
used.
The second block is much more interesting: it is a simple list of key/value
strings, intended for user information and completely ignored by the OS. For
example, if you look at ping.exe you can see the company being "Microsoft
Corporation", the description "TCP/IP Ping command", the internal name "ping.exe"
and so on. Depending on the OS version, some keys may be given peculiar
visibility in the file properties dialog; however, they are internally all the same.
To match a versioninfo key/value pair, the special file offset anchor VI was
introduced. This is similar to the other anchors (like EP and SL) except that,
instead of matching the hex pattern against a single offset, it checks it against each
and every key/value pair in the file. The VI token neither needs nor accepts a +/-
offset like e.g. EP+1. As for the hex signature itself, it's just the UTF-16 dump of
the key and value. Only the ?? and (aa|bb) wildcards are allowed in the signature.
Usually, you don't need to bother figuring it out: each key/value pair together with
the corresponding VI-based signature is printed by clamscan when the --debug
option is given.
For example clamscan --debug freecell.exe produces:
[...]
Recognized MS-EXE/DLL file
in cli_peheader
versioninfo_cb: type: 10, name: 1, lang: 410, rva: 9608
cli_peheader: parsing version info @ rva 9608 (1/1)
VersionInfo (d2de): ’CompanyName’=’Microsoft Corporation’ -
VI:43006f006d00700061006e0079004e0061006d006500000000004d006900
630072006f0073006f0066007400200043006f00720070006f0072006100740
069006f006e000000
VersionInfo (d32a): ’FileDescription’=’Entertainment Pack
FreeCell Game’ - VI:460069006c006500440065007300630072006900700
0740069006f006e000000000045006e007400650072007400610069006e006d
0065006e00740020005000610063006b0020004600720065006500430065006
c006c002000470061006d0065000000
VersionInfo (d396): ’FileVersion’=’5.1.2600.0 (xpclient.010817
-1148)’ - VI:460069006c006500560065007200730069006f006e00000000
0035002e0031002e0032003600300030002e003000200028007800700063006
c00690065006e0074002e003000310030003800310037002d00310031003400
380029000000
VersionInfo (d3fa): ’InternalName’=’freecell’ - VI:49006e007400
650072006e0061006c004e0061006d006500000066007200650065006300650
06c006c000000
VersionInfo (d4ba): ’OriginalFilename’=’freecell’ - VI:4f007200
6900670069006e0061006c00460069006c0065006e0061006d0065000000660
0720065006500630065006c006c000000
VersionInfo (d4f6): ’ProductName’=’Sistema operativo Microsoft
Windows’ - VI:500072006f0064007500630074004e0061006d00650000000
000530069007300740065006d00610020006f00700065007200610074006900
76006f0020004d006900630072006f0073006f0066007400ae0020005700690
06e0064006f0077007300ae000000
VersionInfo (d562): ’ProductVersion’=’5.1.2600.0’ - VI:50007200
6f006400750063007400560065007200730069006f006e00000035002e00310
02e0032003600300030002e0030000000
[...]
Although VI-based signatures are intended for use in logical signatures you can
test them using ordinary .ndb files. For example:
my_test_vi_sig:1:VI:paste_your_hex_sig_here
Final note. If you want to decode a VI-based signature into a human readable
form you can use:
echo hex_string | xxd -r -p | strings -el
For example:
$ echo 460069006c0065004400650073006300720069007000740069006f006e
000000000045006e007400650072007400610069006e006d0065006e007400200
05000610063006b0020004600720065006500430065006c006c00200047006100
6d0065000000 | xxd -r -p | strings -el
FileDescription
Entertainment Pack FreeCell Game
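If xxd or strings are not available, the same decoding can be done in Python. This is a minimal sketch with a hypothetical function name; it assumes the signature contains no wildcards and is a plain little-endian UTF-16 dump, as in the debug output above.

```python
def decode_vi_signature(hex_sig: str) -> list:
    """Decode a wildcard-free VI hex signature into its readable strings."""
    text = bytes.fromhex(hex_sig).decode("utf-16-le")
    # Key and value are separated (and terminated) by NUL characters.
    return [part for part in text.split("\x00") if part]


# Typical use: paste the hex string that follows "VI:" in the debug output.
# print(decode_vi_signature("paste_your_hex_sig_here"))
```

For a signature taken from the output above, the result is a two-element list holding the key and the value.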
where the corresponding fields are:
• Trusted: bit field, specifying whether the cert is trusted. 1 for trusted, 0
for revoked
• CodeSign: bit field, specifying whether this cert can sign code. 1 for true,
0 for false
• CertSign: bit field, specifying whether this cert can sign other certs. 1 for
true, 0 for false
• NotBefore: integer, cert should not be added before this variable. Defaults
to 0 if left empty
VirusName:ContainerType:ContainerSize:FileNameREGEX:FileSizeInContainer:FileSizeReal:IsEncrypted:FilePos:Res1:Res2[:MinFL[:MaxFL]]
• ContainerType: one of CL_TYPE_ZIP, CL_TYPE_RAR, CL_TYPE_ARJ,
CL_TYPE_MSCAB, CL_TYPE_7Z, CL_TYPE_MAIL, CL_TYPE_(POSIX|OLD)_TAR,
CL_TYPE_CPIO_(OLD|ODC|NEWC|CRC) or * to match any of the container
types listed here
• ContainerSize: size of the container file itself (e.g. size of the zip archive)
specified in bytes as absolute value or range x-y
• Compressed size (* to ignore)
• CRC32 (* to ignore)
The database file should have the extension of .zmd or .rmd for zip or rar metadata
respectively.
To whitelist a specific signature from the database, just add its name to a
local file called local.ign2 stored inside the database directory. You can
additionally follow the signature name with the MD5 of the entire database entry
for this signature, e.g.:
Eicar-Test-Signature:bc356bae4c42f19a3de16e333ba3569c
In such a case, the signature will no longer be whitelisted when its entry in the
database gets modified (e.g. the signature gets updated to avoid false alerts).
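A local.ign2 line with the optional checksum can be assembled as follows. This is only a sketch: the function name is hypothetical, and it assumes the MD5 is taken over the signature's database line exactly as passed in. The text above does not say whether a trailing newline is part of the hashed entry, so verify the result against your own database before relying on it.

```python
import hashlib


def ign2_entry(signature_name: str, database_entry: str) -> str:
    """Build 'Name:MD5' for local.ign2 from a signature's database line."""
    # Assumption: the entry is hashed exactly as given, without changes.
    digest = hashlib.md5(database_entry.encode("utf-8")).hexdigest()
    return f"{signature_name}:{digest}"
```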
• IRC for IRC trojans
• always use a -zippwd suffix in the malware name for signatures of type zmd,
• always use a -rarpwd suffix in the malware name for signatures of type rmd,
• only use alphanumeric characters, dash (-), dot (.) and underscore (_) in
malware names; never use space, apostrophe or quote mark.
3.11 Using YARA rules in ClamAV
ClamAV version 0.99 and above can process YARA rules. ClamAV virus database
file names ending with “.yar” or “.yara” are parsed as YARA rule files. The
YARA rule grammar documentation may be found at http://plusvic.github.io/yara/.
There are currently a few limitations on using YARA rules within ClamAV:
• YARA modules are not yet supported by ClamAV. This includes the “import”
keyword and any YARA module-specific keywords.
• YARA rules pre-compiled with the yarac command are not supported.
In addition, there are a few more ClamAV processing modes that may affect the
outcome of YARA rules.
• YARA conditions driven by string matches - All YARA conditions are driven
by string matches in ClamAV. This avoids executing every YARA rule
on every file. Any YARA condition may be augmented with a string match
clause which is always true, such as:
rule CheckFileSize
{
strings:
$abc = "abc"
condition:
($abc or not $abc) and filesize < 200KB
}
This will ensure that the YARA condition always performs the desired
action (checking the file size in this example),
– 0 = cleartext
– 1 = hex
• Password: value used in password attempt
The signatures for password attempts are stored inside .pwdb files.
4 Special files
4.1 HTML
ClamAV contains special HTML normalisation code which helps to detect HTML
exploits. Running sigtool --html-normalise on an HTML file should generate
the following files:
The code automatically decodes JScript.encode parts and character references
(e.g. &#102; for f). You need to create a signature against one of the created files.
To eliminate potential false positive alerts the target type should be set to 3.