Vol. 8 No. 1 (2023): Proceedings of Botconf 2023
Conference proceedings

Yara Studies: A Deep Dive into Scanning Performance

Dominika Regéciová
Gen

Published 2023-04-14

Keywords

  • Pattern matching
  • Performance
  • Regular expressions
  • Yara
  • Malware detection

How to cite

Regéciová, D. (2023). Yara Studies: A Deep Dive into Scanning Performance. Le Journal De La Cybercriminalité Et Des Investigations Numériques, 8(1), 1-6. https://doi.org/10.18464/cybin.v8i1.40


Abstract

You probably know this scenario: you have spent a while analyzing new samples, which was not easy, but you are finally done. You have also written a neat Yara rule to match the samples, and you are ready to send it off and move on to your next task (or lunch). But oopsie: Yara warns that the rule is slowing down scanning. Or a colleague points out a part of the rule they do not like and wants to be sure the rule is effective.

While working with Yara, I have consulted with many analysts about this problem. They knew what they wanted to detect, but Yara did not always help them write their rules effectively. Drawing on my experience with the algorithms used in Yara, we worked together to find solutions that improve scanning speed and limit potential hurdles in future use.

This paper presents five studies. Each one describes a problem, explains why Yara does not like the first solution, and offers tips on what can be improved. No sensitive information is disclosed in this paper: all studies were anonymized so that the general problem stays the same, but no specific malware family is mentioned, and none can be traced.
