Let's cover more ground in the Android realm: this rule matches on Java .class files while making sure that the constant pool of those files is within sane boundaries. Feel free to negate those checks to find weird .class files instead.
```
rule JavaClass {
    meta:
        description = "Java class file with a sane constant pool and the first constant being printable"
        author = "@larsborn"
        date = "2024-02-18"
        reference = "https://en.wikipedia.org/wiki/Java_class_file"
        example_hash = "158a19eb94aa2f3e2f459db69ee10276c73b945dd6c5f8fc223cf2d85e2b5e33"
        DaysofYARA = "24/100"
    condition:
        uint32be(0) == 0xcafebabe
        and uint16be(6) & 0xff >= 43 // major version
        and 3 < uint16be(8) and uint16be(8) <= 3000 // sane constant pool count bounds
        and 3 < uint16be(11) and uint16be(11) <= 300 // sane first constant length
        and for all i in ( 1 .. uint16be(11) ) : ( // first constant printable
            0x20 <= (uint16be(11 + i) & 0xff) and (uint16be(11 + i) & 0xff) < 127
        )
}
```
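If you want to sanity-check those offsets outside of YARA, here is a minimal Python sketch of the same conditions. The function name `looks_like_sane_java_class` is made up for illustration, and the sketch assumes (as the rule implicitly does by reading a length at offset 11) that the first constant pool entry is a `CONSTANT_Utf8` entry whose tag byte sits at offset 10.

```python
import struct


def looks_like_sane_java_class(data: bytes) -> bool:
    """Hypothetical helper mirroring the conditions of the JavaClass rule."""
    if len(data) < 13 or data[:4] != b"\xca\xfe\xba\xbe":
        return False
    # Big-endian: minor version at offset 4, major version at 6, pool count at 8.
    _minor, major, pool_count = struct.unpack_from(">HHH", data, 4)
    if (major & 0xFF) < 43:
        return False
    if not 3 < pool_count <= 3000:
        return False
    # Assumption: the tag byte at offset 10 belongs to a CONSTANT_Utf8 entry,
    # so offsets 11..12 hold its length and the string starts at offset 13.
    (length,) = struct.unpack_from(">H", data, 11)
    if not 3 < length <= 300:
        return False
    first_constant = data[13:13 + length]
    if len(first_constant) < length:
        return False  # truncated file, the rule would not match either
    # Same printable range as the rule: 0x20 <= byte < 127.
    return all(0x20 <= byte < 127 for byte in first_constant)
```

Feeding it a hand-built header (magic, versions, pool count, then a tag byte, a two-byte length and the string) should be enough to confirm that the byte at offset `12 + i` is what the rule's `uint16be(11 + i) & 0xff` looks at.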
VirusTotal has released a (new?) cheat sheet for their Live Hunt YARA service, which requires the use of their custom "vt" YARA module:
https://assets.virustotal.com/reports/livehunt-cheatsheet.pdf
The original VirusTotal Intelligence cheat sheet is available at:
https://storage.googleapis.com/vtpublic/reports/VTI%20Cheatsheet.pdf
Continuing with the Android theme: those file formats seem to make a point of storing their own size in the second DWORD. So here we go, a rule that matches on Android resource files (often named `resources.arsc`).
```
rule AndroidResourceArsc {
    meta:
        description = "Probably an Android resource file (i.e. resources.arsc)"
        author = "@larsborn"
        date = "2024-02-10"
        reference = "https://androguard.readthedocs.io/en/latest/api/androguard.core.bytecodes.html#androguard.core.bytecodes.axml.AXMLParser"
        example_hash = "e81b50d46350e67d4c60e156556e2698a9acbe73b8c2008ca0f8696a3e0e391a"
        DaysofYARA = "22/100"
    condition:
        uint16be(0) == 0x0200 and uint32(4) == filesize
}
```
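For anyone who prefers to poke at the header in a script: a small Python sketch of the same two checks. The name `looks_like_arsc` is hypothetical, and the field layout (16-bit type, 16-bit header size, 32-bit chunk size, all little-endian) is my reading of the chunk header, so treat it as an assumption rather than a full parser.

```python
import struct


def looks_like_arsc(data: bytes) -> bool:
    """Hypothetical helper mirroring the AndroidResourceArsc rule."""
    if len(data) < 8:
        return False
    # Assumed layout: type (uint16), header size (uint16), chunk size (uint32),
    # all little-endian. The rule reads the type big-endian, hence 0x0200 there.
    chunk_type, _header_size, chunk_size = struct.unpack_from("<HHI", data, 0)
    # The second DWORD has to equal the size of the whole file.
    return chunk_type == 0x0002 and chunk_size == len(data)
```

A single trailing byte is already enough to make the size check fail, which is what makes the second DWORD such a cheap but effective filter.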
#100DaysofYARA day 45 - based on today's great find from Talos, and inspired by ESET's great paper on the rich header (+ their sig formats) - a rule on the toolIDs+counts!
https://blog.talosintelligence.com/tinyturla-next-generation/
https://github.com/100DaysofYARA/2024/blob/main/glesnewich/APT_RU_Turla_TinyTurlaNG.yar
#100DaysofYARA catching up with Day 42 and 43 - rules looking for Zardoor! one looking for a weird export, narrowed by build information from the rich header, and another rule looking for weird resource type strings, CODER
https://github.com/100DaysofYARA/2024/blob/main/glesnewich/MAL_Zardoor.yar
#100DaysofYARA Day 38 - whyyyy would you embed an ISO into an LNK?? if you do it right you can run the file as an LNK or mount it as an ISO apparently... but WHY???
https://github.com/100DaysofYARA/2024/blob/main/glesnewich/SUSP_LNK_Embedded_ISO.yar
@glesnewich @captainGeech hey, my two fellow #100DaysofYARA companions on #mastodon. Haven't seen your posts for a few days. Taking a break? I really enjoyed reading your rules, and company is always motivating for me!
@0x1c Great question!
It’s because `$mz at 0` isn’t very efficient. In the background, that condition causes YARA to first search for *every* single instance of "MZ" in the file. And because that is such a short sequence of bytes, there are likely to be a great number of them. Only after YARA has found ALL "MZ" occurrences does it evaluate the `at 0` portion of the rule.
In comparison, `uint16be(0) == 0x4D5A` (and other $string-less conditions) evaluates that part of the condition immediately and is therefore more performant. That really makes a difference when searching across a huge corpus of samples. Hope this explanation helps!
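To make the difference a bit more tangible, here is a rough Python analogy. It is only a conceptual sketch and not how YARA is implemented internally; all three function names are invented for this example.

```python
def all_occurrences(data: bytes, needle: bytes) -> list[int]:
    """Collect every offset of needle, roughly the work a string search implies."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


def starts_with_mz_via_search(data: bytes) -> bool:
    # Analogous to `$mz at 0`: find all matches first, then look at the offset.
    return 0 in all_occurrences(data, b"MZ")


def starts_with_mz_via_read(data: bytes) -> bool:
    # Analogous to `uint16be(0) == 0x4D5A`: a single two-byte read, no searching.
    return len(data) >= 2 and int.from_bytes(data[:2], "big") == 0x4D5A
```

Both functions give the same answer, but the first one has to walk the whole buffer and remember every hit before it can say anything about offset 0.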
Rule that checks the file magic of ZIP files, then inspects the "central directory" to only allow archives with a single file, then retrieves the location of the first (and only) entry in the central directory, and finally checks whether it has the bit set for password protection.
While researching (for the n-th time) how ZIP files work, I realized (again) that there doesn't seem to be a canonical way to find the end of central directory record. Everyone just suggests "hunting for it", starting at the end of the file.
```
rule SingleFileInPasswordProtectedZip {
    meta:
        description = "Inspects ZIP-specific data structures to match on archives containing a single encrypted file"
        author = "@larsborn"
        date = "2024-02-08"
        reference = "https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html"
        example_hash = "8bfc289b12e0900c2e9e9116c54cd7c7f6dad53916ff48620a7d8a6a8ee09564"
        DaysofYARA = "17/100"
    condition:
        uint32be(0) == 0x504b0304 // ZIP magic
        and for any i in ( 0 .. 0x100 ) : ( // hunt for end of directory
            uint32be(filesize - i) == 0x504b0506 // end of central directory magic
            and uint16(filesize - i + 0xa) == 1 // single file
            and uint32be(uint32(filesize - i + 0x10)) == 0x504b0102 // file header magic
            and uint16(uint32(filesize - i + 0x10) + 8) & 1 == 1 // password protection
        )
}
```
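The same walk through the ZIP structures, sketched in Python for anyone who wants to step through it in a debugger. `single_encrypted_entry` and `make_sample_zip` are names made up for this example. Since Python's `zipfile` module cannot write encrypted archives, the helper only flips the flag bit in the central directory; the payload itself stays unencrypted, so it is a test fixture and not a real password-protected archive.

```python
import io
import struct
import zipfile

EOCD_MIN_SIZE = 22  # end of central directory record without a comment


def single_encrypted_entry(data: bytes) -> bool:
    """Hypothetical helper mirroring the rule: one entry, encryption flag set."""
    if data[:4] != b"PK\x03\x04":  # local file header magic
        return False
    # The rule starts hunting at distance 0 from the end; a complete record
    # needs at least 22 bytes, so this sketch starts there instead.
    for i in range(EOCD_MIN_SIZE, 0x100 + 1):
        pos = len(data) - i
        if pos < 0:
            break
        if data[pos:pos + 4] != b"PK\x05\x06":  # end of central directory magic
            continue
        (total_entries,) = struct.unpack_from("<H", data, pos + 0x0A)
        (cd_offset,) = struct.unpack_from("<I", data, pos + 0x10)
        if total_entries != 1 or cd_offset + 10 > len(data):
            continue
        if data[cd_offset:cd_offset + 4] != b"PK\x01\x02":  # central file header
            continue
        (flags,) = struct.unpack_from("<H", data, cd_offset + 8)
        if flags & 1:  # bit 0 of the general purpose flags: encrypted
            return True
    return False


def make_sample_zip(set_encrypted_flag: bool) -> bytes:
    """Hypothetical fixture: one-file ZIP in memory, optionally with bit 0 set."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("a.txt", "hello")
    data = bytearray(buffer.getvalue())
    if set_encrypted_flag:
        # Flip the flag in the central directory header only.
        data[data.rfind(b"PK\x01\x02") + 8] |= 1
    return bytes(data)
```

Like the rule, the sketch only searches the last 0x100 bytes, so archives with a longer trailing comment slip through.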
Many cryptographic algorithms need some initial values. If you just used a randomly generated byte sequence for those, folks might accuse you of not actually generating the sequence at random. Instead, you could have built in some sort of trapdoor or trick that gives you, as the designer of the algorithm, an advantage when attacking it. A nothing-up-my-sleeve number is an "otherwise famous" sequence of bytes, which makes the above-described scenario much less probable: "Using the first 10 digits of π as constants allows an attacker to predict the pseudo-random numbers generated by the algorithm" said nobody ever.
ANYWAY: here's a rule matching on two nothing-up-my-sleeve strings used in the Salsa20 stream cipher.
```
rule salsa20
{
    meta:
        description = "Nothing-up-my-sleeve number used in the Salsa20 stream cipher"
        author = "@larsborn"
        author = "@huettenhain"
        date = "2020-08-23"
        reference = "https://en.wikipedia.org/wiki/Salsa20"
        example_hash_abcbot = "1fc59a86915eca78dbe0f90c7e0ee3fac6f7e5160c26a04330bf3858f7e5c1f2"
        example_hash_egregor = "d893f26330906bedcad2627f41135f0fda65bc4dfe1f4186cd60d4546469b3c3"
        example_hash_netwalker = "de04d2402154f676f757cf1380671f396f3fc9f7dbb683d9461edd2718c4e09d"
        example_hash_revil = "12d8bfa1aeb557c146b98f069f3456cc8392863a2f4ad938722cd7ca1a773b39"
        example_hash_stealth_worker = "f48628472e35ac54f2b0b42583dfa04ae62ae644ba036dad5abf7efc545393c9"
        example_hash_xaynnalc = "b277fb8b666f8b5c179ddac940fad90a3e38b23170931e1226dd1676404dbfec"
        DaysofYARA = "15/100"
    strings:
        $ = "expand 32-byte k"
        $ = "expand 16-byte k"
    condition:
        any of them
}
```
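A small Python sketch of what those two strings look like once the cipher consumes them: each 16-byte constant is split into four little-endian 32-bit words. The names `SIGMA`, `TAU`, `as_words` and `contains_salsa20_constant` are chosen for this example. One caveat that is an assumption on my side and not something the rule covers: a compiler might emit the words as separate immediates, in which case the contiguous string never appears in the binary.

```python
import struct

SIGMA = b"expand 32-byte k"  # constant used with 256-bit keys
TAU = b"expand 16-byte k"  # constant used with 128-bit keys


def as_words(constant: bytes) -> list[int]:
    """Split a 16-byte constant into four little-endian 32-bit words."""
    return list(struct.unpack("<4I", constant))


def contains_salsa20_constant(data: bytes) -> bool:
    # Same idea as `any of them` in the rule: either string anywhere in the data.
    return SIGMA in data or TAU in data


if __name__ == "__main__":
    # Print the words in hex, handy when grepping disassembly for immediates.
    print([hex(word) for word in as_words(SIGMA)])
```

Searching for the individual words, for example `0x61707865` ("expa"), could be a follow-up idea for samples where the plain string is missing.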
That's me all caught up now! I will probably commit in batches for another while, since I can't do daily. https://github.com/100DaysofYARA/2024/pull/113 Made some updates to previous rules looking for Rust & Golang FreeBSD kernel modules, with inspiration from @captainGeech's rules. And also added two rules looking for suspicious drivers. #100DaysofYARA
Had to take a little time out from yara while I was sick. Been trying to catch up & I'm almost there. Here are 19 rules that bring me up to Day 28. Rest to follow. https://github.com/100DaysofYARA/2024/pull/112 Nothing particularly interesting in the rules. Mostly quick detection rules for old & new code families. Mostly spent my time sleeping & doing some RE to make rules to match across samples. Played around with MCRIT too to generate rules across some code families. I THIIIIINK I've tested most of them on VT. YMMV. #100DaysofYARA