What Does Heuristic Really Mean?
“Heuristic” refers to the act or process of finding or discovering. The Oxford English Dictionary defines heuristic as “enabling a person to discover or learn something for themselves” or (in the computing context) “proceeding to a solution by trial and error or by rules that are only loosely defined”. [6] The Merriam-Webster Dictionary defines it as “an aid to learning, discovery, or problem-solving by experimental and especially trial-and-error methods” or (again, in the context of computing) “relating to exploratory problem-solving techniques that utilize self-educating techniques (as the evaluation of feedback) to improve performance.” [7]
Heuristic programming is usually regarded as an application of artificial intelligence and as a tool for problem solving. Heuristic programming, as used in expert systems, builds on rules drawn from experience, and the answers generated by such a system improve as the system “learns” from further experience and augments its knowledge base.
As it is used in the management of malware (and indeed spam and related nuisances), heuristic analysis, though closely related to these elements of trial-and-error and learning by experience, also has a more restricted meaning. Heuristic analysis uses a rule-based approach to diagnosing a potentially offending file (or message, in the case of spam analysis). As the analyzer engine works through its rule base, checking the object against criteria that indicate possible malware, it assigns score points whenever it locates a match. If the total score meets or exceeds a threshold score [8], the file is flagged as suspicious (or potentially malicious, or likely spam) and processed accordingly.
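The scoring process described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's engine: the rule descriptions, byte patterns, weights and the threshold of 50 are all hypothetical, and each rule is reduced to a simple byte-pattern match, whereas real heuristic criteria are usually far more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One heuristic criterion: a byte pattern and the score it contributes."""
    description: str
    pattern: bytes
    weight: int


# Hypothetical rule base: patterns and weights are illustrative, not real detection data.
RULES = [
    Rule("searches for executable files", b"*.exe", 30),
    Rule("calls a file-writing API", b"WriteFile", 20),
    Rule("references an autostart registry key", b"CurrentVersion\\Run", 30),
]

THRESHOLD = 50  # hypothetical threshold score


def heuristic_score(data: bytes, rules: list[Rule] = RULES) -> tuple[int, list[str]]:
    """Walk the rule base, adding each matching rule's weight to the score."""
    score = 0
    matched = []
    for rule in rules:
        if rule.pattern in data:
            score += rule.weight
            matched.append(rule.description)
    return score, matched


def classify(data: bytes, threshold: int = THRESHOLD) -> str:
    """Flag the object if its score meets or exceeds the threshold."""
    score, _ = heuristic_score(data)
    return "suspicious" if score >= threshold else "clean"


if __name__ == "__main__":
    sample = b"...FindFirstFileA(*.exe)...WriteFile..."
    print(heuristic_score(sample))  # (50, ['searches for executable files', 'calls a file-writing API'])
    print(classify(sample))         # suspicious
```

Because the comparison is “meets or exceeds”, a sample scoring exactly 50 is flagged. Raising the threshold or lowering the weights reduces false positives at the cost of missing more genuinely malicious objects.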
In a sense, heuristic anti-malware attempts to apply the processes of human analysis to an object. In the same way that a human malware analyst would try to determine what a given program does and how it does it, heuristic analysis attempts to perform the same intelligent decision-making process, effectively acting as a virtual malware researcher. As the human malware analyst learns more from and about emerging threats, he or she can apply that knowledge to the heuristic analyzer through programming and so improve future detection rates.
Heuristic programming has a dual role in AV performance: speed and detection. In fact, the term heuristic is applied in other areas of science [9] in a very similar sense: aiming to improve performance (especially speed of throughput) through a “good enough” result rather than the most exact one. As the total number of known viruses has increased, so has the need to improve detection speed. Otherwise, the time needed to scan for an ever-increasing number of malicious programs would make the system effectively unusable.
Despite the much-improved performance of some contemporary heuristic engines, there is a danger that the performance impact of heuristic (and even non-heuristic) scanning may be seen as outweighing the advantages of improved detection. There is a common belief that heuristic scanners are generally slower than static scanners, but beyond a certain level of sophistication this ceases to be true.
Even early heuristic scanners using simple pattern detection benefited from optimization techniques that searched only the parts of an object where a given virus could be expected to be found. (A simple example: there’s no point in scanning an entire file for a virus signature if that virus always stores its core code at the beginning or end of an infected file.) Restricting the search in this way reduces scanning overhead and lessens the risk of a false positive.
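The region-limited search described above might look like the following sketch. It assumes, purely for illustration, a virus that always places its code within the first or last few kilobytes of a file; the signature bytes and the 4096-byte window are hypothetical.

```python
def scan_expected_regions(data: bytes, signature: bytes, window: int) -> bool:
    """Look for `signature` only in the first and last `window` bytes of `data`.

    Assumes (hypothetically) that the virus always places its code at the
    start or end of an infected file, so the middle is never searched.
    """
    if window <= 0:
        return False
    head = data[:window]
    tail = data[-window:]  # the whole file if it is shorter than the window
    return signature in head or signature in tail


def scan_whole_file(data: bytes, signature: bytes) -> bool:
    """Naive approach for comparison: search every byte of the object."""
    return signature in data


if __name__ == "__main__":
    SIGNATURE = b"\xde\xad\xbe\xef"  # hypothetical signature bytes
    WINDOW = 4096                    # hypothetical region size

    appended = b"\x00" * 10_000 + SIGNATURE                 # virus code at end of file
    stray = b"\x00" * 5_000 + SIGNATURE + b"\x00" * 5_000   # match in an implausible place

    print(scan_expected_regions(appended, SIGNATURE, WINDOW))  # True
    print(scan_expected_regions(stray, SIGNATURE, WINDOW))     # False
    print(scan_whole_file(stray, SIGNATURE))                   # True
```

The stray match in the middle of the file is reported by the whole-file scan but not by the region-limited one: the restricted scan reads at most two windows of data, and it ignores byte sequences that merely happen to resemble the signature where the virus would never place it.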
The inappropriate detection of a viral signature in a place where the virus would never be found under normal circumstances is not only a side effect of poor detection methodology but also a symptom of poorly designed detection testing. For instance, some testers have attempted to test the capabilities of an AV program by inserting virus code at random locations in a file or other infectible object. Similarly, a particular kind of object such as a file or boot sector can be selectively scanned for only those types of malware that can realistically be expected to be found in that object, a process sometimes described as “filtering”. After all, there’s no reason to look for macro virus code in a boot sector.
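Filtering can be sketched as a lookup from object type to the classes of malware worth checking for in that type. The signature names, patterns and the type-to-class mapping below are hypothetical placeholders; the sketch only shows that a macro virus pattern is never checked against a boot sector, and it depends entirely on the object type having been identified correctly.

```python
from enum import Enum


class ObjectType(Enum):
    BOOT_SECTOR = "boot sector"
    EXECUTABLE = "executable file"
    WORD_DOCUMENT = "Word document"


# Hypothetical signature database, grouped by the class of malware each entry targets.
SIGNATURES: dict[str, list[tuple[str, bytes]]] = {
    "boot": [("Boot.Example.A", b"BOOTSIG")],
    "file_infector": [("Win32.Example.B", b"FILESIG")],
    "macro": [("Macro.Example.C", b"MACROSIG")],
}

# Assumed mapping: which malware classes can realistically occur in each object type.
RELEVANT_CLASSES: dict[ObjectType, tuple[str, ...]] = {
    ObjectType.BOOT_SECTOR: ("boot",),
    ObjectType.EXECUTABLE: ("file_infector",),
    ObjectType.WORD_DOCUMENT: ("macro",),
}


def filtered_scan(data: bytes, obj_type: ObjectType) -> list[str]:
    """Check `data` only against signatures relevant to its (already identified) type."""
    hits = []
    for malware_class in RELEVANT_CLASSES[obj_type]:
        for name, pattern in SIGNATURES[malware_class]:
            if pattern in data:
                hits.append(name)
    return hits


if __name__ == "__main__":
    sector = b"...MACROSIG..."  # macro virus pattern appearing in a boot sector
    print(filtered_scan(sector, ObjectType.BOOT_SECTOR))    # []
    print(filtered_scan(sector, ObjectType.WORD_DOCUMENT))  # ['Macro.Example.C']
```

Skipping irrelevant signature classes cuts the number of comparisons per object, and, like the region-limited search, it avoids reporting a pattern in a kind of object where that malware could not function.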
However, correct identification of a file type is not concrete proof that a file is uncontaminated. For example, Microsoft Word document files containing embedded malicious executables have long been a major attack vector for information theft and industrial espionage. Similarly, malware authors are constantly in search of attacks where an object not normally capable of executing code can be made to do so, for instance by modifying the runtime environment. W32/Perrun, for example, appended itself to .JPG and .TXT files, but could not actually run unless specific changes