Is it possible to describe human intelligence precisely enough for it to be imitated by machines? This question remains a bone of contention among scientists. Researchers trying to create artificial intelligence use different approaches, but many believe that artificial neural networks are the key to success. So far, no device containing artificial intelligence has passed the Turing test. According to the renowned British computer scientist Alan Turing, a machine can be considered intelligent if a user is unable to tell whether they are dealing with a machine or another human being. One possible application area for fully autonomous AI is computer virology and the development of systems for the remote treatment of infected computers.
The main task facing AI scientists today is to create an autonomous device that can learn, make intelligent decisions and modify its own behavior according to external stimuli. The creation of highly specialized systems of this kind is possible, just as it is possible to build more versatile and complex devices based on artificial intelligence, but such systems are always based on human experience and knowledge in the form of patterns, rules and algorithms of behavior.
Why is it so difficult to create autonomous AI? Because a machine does not typically have human characteristics such as consciousness, intuition, the ability to distinguish important things from less important ones and, most importantly, the desire to acquire new knowledge. These characteristics allow a person to solve problems even when they are non-linear. At present, any action performed by artificial intelligence requires human-made algorithms. Nevertheless, scientists keep trying to create real AI and occasionally achieve some success.
Non-automatic processing costs
The process of detecting malware and restoring a computer’s normal operating parameters consists of three main steps. It doesn’t matter who or what performs these steps, human or machine. The first step involves collecting objective data about the computer being examined and the programs running on it. This step is best done with fast, automated tools that operate without human intervention and generate reports suitable for automated processing.
In the second step, the collected data is subjected to detailed analysis. For example, if a report contains information about the detection of a suspicious object, the object should be quarantined and carefully analyzed to determine what the threat is, and then a decision must be made on what to do next.
The third step is the actual procedure for solving the problem, for which a special scripting language can be used. The language contains the commands needed to remove malicious files and restore the computer’s correct operating parameters.
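The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the status labels, the `DELETE` and `QUARANTINE` command names and all function names are hypothetical and are not taken from any real treatment scripting language.

```python
from dataclasses import dataclass


@dataclass
class ScannedObject:
    path: str
    status: str  # hypothetical labels: "clean", "suspicious" or "malicious"


def collect(raw_entries):
    """Step 1: gather objective data (here, a canned list of findings)."""
    return [ScannedObject(path, status) for path, status in raw_entries]


def analyze(objects):
    """Step 2: decide what should happen to each reported object."""
    decisions = []
    for obj in objects:
        if obj.status == "malicious":
            decisions.append(("DELETE", obj.path))
        elif obj.status == "suspicious":
            decisions.append(("QUARANTINE", obj.path))
        # clean objects need no action
    return decisions


def treat(decisions):
    """Step 3: turn the decisions into script commands (made-up syntax)."""
    return [f"{action} {path}" for action, path in decisions]


def build_script(raw_entries):
    """Run all three steps in order."""
    return treat(analyze(collect(raw_entries)))
```

Keeping the steps as separate functions mirrors the point made in the text: each step can be carried out by a human or a machine without changing the others.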
Until a few years ago, steps 2 and 3 were performed only by IT security analysts and experts working in specialized forums, almost without any automation. However, as the number of users falling victim to malware and needing specialized assistance has grown, several problems have emerged:
- If protocols and quarantined files are processed manually, the virus analyst receives huge amounts of ever-changing information that needs to be integrated and fully understood – a process that cannot be fast.
- People have natural mental and physiological limitations. A specialist can get tired and make a mistake; the more complex the task, the more likely a mistake becomes. For example, an overburdened malware expert might overlook a malicious program or remove a harmless application.
- Analyzing quarantined files is a very time-consuming operation because the expert has to analyze the unique characteristics of each sample – i.e. where and how it appeared and what is suspicious about it.
The only solution to these problems is to fully automate the analysis and treatment of malware, but previous attempts using various algorithms have not yielded positive results. The main reason for these failures is the constant evolution of malware: dozens of new malicious programs appear on the Internet every day, using increasingly sophisticated methods of embedding and hiding themselves. Detection algorithms must therefore be extremely complex. Matters are complicated further by the fact that these algorithms become obsolete very quickly, so they must be constantly updated and debugged. Another problem is that the effectiveness of each algorithm is limited by the skills of its creators.
Using expert systems to “catch viruses” appears to be somewhat more effective. However, developers of anti-virus expert systems face problems similar to those mentioned above: the effectiveness of such a system depends on the quality of its rules and knowledge bases. In addition, the knowledge bases need to be constantly updated, which requires investment in human resources.
General principles of Cyber Helper
Despite the difficulties, the experiments conducted in this field have, over time, produced some positive results. One example is the creation of the Cyber Helper system, which is a successful step towards using truly autonomous AI in the fight against malware. Most of Cyber Helper’s autonomous subsystems can synchronize, exchange data and interact with each other as if they were one. Naturally, they contain some “hard” algorithms and rules, like conventional programs, but they mostly work with fuzzy logic and independently define their behavior when solving different tasks.
At the heart of the Cyber Helper system is a tool called AVZ, which was created by the author of this article in 2004. The purpose of AVZ is to automatically collect data from potentially infected computers as well as malware samples and store them in a form suitable for automated processing for use by other subsystems. AVZ creates reports in HTML format designed for human analysis and in XML format – for machine analysis. Since 2008, the main AVZ program has been integrated into antivirus solutions.
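To illustrate why a machine-readable report matters, here is a minimal sketch of how a subsystem might pull suspicious entries out of an XML report. The element and attribute names (`report`, `process`, `signed`, `hidden`) are invented for this example and are an assumption; they do not describe AVZ’s real report format.

```python
import xml.etree.ElementTree as ET

# Hypothetical report; the schema is invented for illustration only.
SAMPLE_REPORT = r"""
<report>
  <process path="C:\Windows\System32\ntos.exe" signed="no" hidden="yes"/>
  <process path="C:\Windows\explorer.exe" signed="yes" hidden="no"/>
</report>
"""


def suspicious_paths(report_xml):
    """Return paths of processes that are unsigned or hidden."""
    root = ET.fromstring(report_xml.strip())
    return [
        proc.get("path")
        for proc in root.iter("process")
        if proc.get("signed") == "no" or proc.get("hidden") == "yes"
    ]
```

A structured format lets other subsystems filter and compare entries directly, whereas an HTML report is laid out for a human reader.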
The system analyzes the received protocol using the vast amount of data available on similar malicious programs and on corrective actions previously taken in similar cases, as well as other factors. In this respect, Cyber Helper resembles an active human brain, which must gather knowledge about its environment in order to process information. For a child to develop fully, they must be constantly aware of what is happening in their world and be able to communicate with other people. Here the machine has an advantage over a human: within a given period of time it can store, retrieve and process much larger amounts of information.
Another similarity between Cyber Helper and a human is that Cyber Helper can start analyzing a protocol independently, almost without any hints, and can constantly learn in an ever-changing environment. When it comes to self-learning, Cyber Helper has to contend with three main problems: mistakes made by human experts, which the machine cannot resolve because it is not intuitive enough; fragmentary and inconsistent information about programs; and repeated refinement of data, together with delays in entering that data into the system. Let us take a closer look at these issues.
Implementation difficulties
Experts processing protocols and quarantined files can make mistakes or perform actions that cannot be logically explained from a machine’s perspective. Here’s a typical example: if a specialist sees in the protocol an unknown file named %System32%\ntos.exe that has the characteristics of malware, they remove it without further analysis or quarantine, relying on their experience and intuition. Therefore, the details of the actions performed by specialists, and the way in which they reached their conclusions, cannot always be translated directly into a form suitable for machine learning.
Treatment information is often incomplete or contradictory. For example, before asking an expert for help, a user might have tried to fix the problem themselves and removed only part of the malicious program, restoring the files of an infected program without cleaning the registry.
The third common problem is this: during protocol analysis, only metadata about the suspicious object is available, and after the quarantined file is analyzed, only initial information about the object. The object is then categorized as either malicious or “clean”. Such information usually becomes available only after repeated refinement and after a considerable delay, from a few minutes to several months. This refinement can take place both externally, in an analytical services laboratory, and inside Cyber Helper’s own subsystems.
Here’s a typical example: an analyzer checks a file, finds nothing dangerous in its behavior and passes this information to Cyber Helper. Some time later, the analyzer is improved and re-examines the same suspicious file, but this time it issues a different verdict. A similar problem can arise with the conclusions drawn by a virus analyst about programs that are hard to classify, such as remote administration tools or utilities that cover a user’s traces: their classification may change when the next version is released. Because of this variability and ambiguity in the parameters of the analyzed programs, every decision made by Cyber Helper is based on more than fifty independent analyses.
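The article does not say how the independent analyses are combined. The sketch below shows one simple possibility, purely as an assumption: each analysis returns a score between -1.0 (clean) and 1.0 (malicious), the scores are averaged, and anything between the two thresholds is treated as undecided. The function name and threshold values are hypothetical.

```python
def combine_verdicts(scores, harmful_at=0.6, clean_at=-0.6):
    """Combine independent analysis scores into one verdict.

    Each score is assumed to lie in [-1.0, 1.0]; the thresholds are
    illustrative defaults, not values used by any real system.
    """
    if not scores:
        return "unknown"
    mean = sum(scores) / len(scores)
    if mean >= harmful_at:
        return "harmful"
    if mean <= clean_at:
        return "clean"
    # Mixed or weak evidence: leave the decision to a human expert.
    return "needs_expert"
```

With this scheme, a single analyzer that later changes its verdict shifts the average only slightly, which is one way that relying on many analyses can reduce the effect of any single revised result.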
Based on the information available, Cyber Helper puts forward a number of hypotheses about which of the objects in the protocol may pose a threat and which can be added to the database of “clean” files. Based on these hypotheses, AVZ automatically writes scripts for quarantining suspicious objects. The script is then sent to the user’s machine for execution. (Step 2 of the overall Cyber Helper algorithm.)
At the script-writing stage, the intelligent system may detect data that is clearly malicious. In this case, the script can include commands to remove known malicious programs or invoke special procedures for undoing damage to the system. Such situations occur quite often because Cyber Helper processes hundreds of requests simultaneously; this is typical when several users have been infected by the same malware and their machines ask for help. After receiving and analyzing the required samples from one such user’s machine, Cyber Helper can provide the other users with treatment scripts, skipping the quarantine stage entirely, which saves the users time and reduces data traffic. Objects received from the user are analyzed under the control of Cyber Helper, and the results of this analysis, whatever they are, are added to its knowledge base. In this way, the intelligent machine can test the hypotheses formulated in step 1 of the general algorithm, confirming or rejecting them.
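The reuse of treatment scripts across users infected by the same malware can be sketched as a lookup keyed by a sample fingerprint. This is a minimal illustration, assuming that a SHA-256 hash of the object is available in the protocol metadata; the class, its methods and the script text are hypothetical.

```python
import hashlib


class TreatmentCache:
    """Remembers treatment scripts for samples that were already analyzed."""

    def __init__(self):
        self._scripts = {}

    @staticmethod
    def fingerprint(sample_bytes):
        """Identify a sample by its SHA-256 digest (an assumption)."""
        return hashlib.sha256(sample_bytes).hexdigest()

    def store(self, sample_bytes, script):
        """Record the script produced after a sample was fully analyzed."""
        self._scripts[self.fingerprint(sample_bytes)] = script

    def next_action(self, fingerprint):
        """Skip quarantine when a script for this fingerprint already exists."""
        script = self._scripts.get(fingerprint)
        if script is not None:
            return ("treat", script)
        return ("quarantine", None)
```

The first user with an unknown sample goes through quarantine; once the script is stored, later requests with the same fingerprint receive it immediately.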
Cyber Helper Technical Subsystems
Cyber Helper’s main subsystems are stand-alone entities that analyze program files in terms of content and behavior. Thanks to them, Cyber Helper can analyze malicious programs and learn from the results of its actions. If the analysis clearly confirms that an object is malicious, it is sent to the anti-virus laboratory with a strong recommendation to add it to the anti-virus database; a treatment script is then created for the user (step 5 of the general algorithm). Note that even after analyzing an object, Cyber Helper is not always able to make a categorical decision about whether it is malicious. In this case, all the initial data and collected results are passed on to an expert for analysis (step 6). The expert then provides the right solution. Cyber Helper does not take part in this decision, but it still analyzes the received data and protocols and generates reports for the expert, thereby doing the lion’s share of the expert’s work.
However, the “policy of non-intervention” of the AI system in the work of the malware analyst is not always applied; we know of dozens of cases in which the intelligent machine has detected errors in human actions by reassessing accumulated experience and the results of its own analysis of the object. In such cases, the machine can first interrupt the analysis and decision-making process and send a warning to the expert, and then go on to block scripts destined for the user that, from the machine’s perspective, could harm the user’s system. The machine exercises similar control over its own activities. While treatment scripts are being developed, they are simultaneously evaluated by another subsystem to eliminate errors. The simplest example of such an error arises when a malicious program has replaced an important system component: on the one hand, the malicious program must be destroyed; on the other, removing it can cause irreparable harm to the system.
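The idea of a second subsystem reviewing treatment scripts can be sketched as a simple check that holds back commands which would delete a critical system file. This is an assumption-laden illustration: the `DELETE` command syntax is made up, and the short list of protected files is only an example, not a real protection list.

```python
# Illustrative list only; a real system would need a far more complete one.
CRITICAL_FILES = {
    r"c:\windows\system32\winlogon.exe",
    r"c:\windows\system32\userinit.exe",
}


def review_script(commands):
    """Split script commands into approved and blocked lists.

    A command is blocked when it would delete a file on the critical
    list, because removing it outright could leave the system unusable.
    """
    approved, blocked = [], []
    for command in commands:
        action, _, target = command.partition(" ")
        if action == "DELETE" and target.lower() in CRITICAL_FILES:
            blocked.append(command)
        else:
            approved.append(command)
    return approved, blocked
```

Blocked commands would then be passed to an expert or replaced by a safer procedure, such as restoring the original component instead of deleting the infected one.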
Summary
Today’s malicious programs operate and spread at an extremely fast pace. Responding to them immediately requires intelligent processing of large amounts of non-standard data. Artificial intelligence is well suited to this task because it can process data much faster than human beings. Cyber Helper is one of the few successful attempts to get closer to the goal of creating autonomous AI. Like an intelligent being, Cyber Helper is capable of self-learning and of independently defining its own actions. Virus analysts and intelligent machines complement each other: by cooperating, they work more efficiently and provide users with greater protection.