WO2011087167A1

WO2011087167A1 - Method for determining problem occurrence in operating system

Info

Publication number: WO2011087167A1
Application number: PCT/KR2010/000243
Authority: WO
Inventors: 임태환
Original assignee: Lim Tae-Hwan
Priority date: 2010-01-15
Filing date: 2010-01-15
Publication date: 2011-07-21

Abstract

Disclosed is a method for determining problem occurrence in an operating system for determining whether a problem occurs in the operating system (business transaction device, software, system, business transaction process, and others) which takes charge of the operation over the IT system management fields or the industrial fields. The method for determining the problem occurrence in the operating system comprises the steps of: the first step of requesting event information which is normally generated in the operating system, if connection with the operating system is normally set up after the connection setup with the operating system has been attempted; the second step of confirming whether the requested event information is received within the event generation time in accordance with each preset time point; the third step of determining problem occurrence and displaying the determined problem occurrence, if the event information is not generated within the event generation time in accordance with each time point; and the fourth step of calculating statistical values (current statistical values) according to each time slot on the day, if unspecified event information is received within the event generation time in accordance with each time point, and determining whether a problem occurs through comparison between the calculated current statistical values and predicted values on the day.

Description

Method for determining failure occurrence in an operating system

The present invention relates to a method for determining failure occurrence in an operating system, that is, for determining whether a failure has occurred in an operating system (business processing device, software, system, business process, etc.) in charge of operations across the IT (information technology) system management field or industrial fields.

In general, the IT system management field is mostly operated using an operating system, such as a computer system running a predetermined operating system. Such an operating system can fail because of an internal software fault, a fault in a business process, or a hardware fault. Because such a failure can severely disrupt operation of the entire system, it is very important to detect it as early as possible.

Likewise, in industrial fields that mass-produce products, and in production facilities such as factories in particular, mass production is possible only while the facilities remain in a normal operating state, and the quality of the products must be kept constant. In such fields it is therefore very important to determine quickly whether a piece of equipment, software, or an operating system has failed, so that products can be produced continuously and their quality maintained.

The conventional method for determining failure occurrence in an operating system, used to detect a malfunction or abnormal condition of an operating device in the IT system management field or in industrial fields generally, determines whether a failure has occurred by receiving a failure event from the operating system.

That is, a failure is determined only when a failure event occurs.

However, such a conventional technique can detect and determine a failure only when a failure event occurs. When the operating system is so severely damaged, or has some other operating error, that it cannot generate the failure event itself, the conventional technique cannot recognize the failure at all.

Accordingly, the present invention has been proposed to solve various problems occurring in the conventional failure occurrence determination method as described above.

The problem to be solved by the present invention is to provide a method for determining failure occurrence in an operating system, which determines whether a failure has occurred in an operating system (business processing device, software, system, business process, etc.) in charge of operations across the IT (information technology) system management field or industrial fields.

Another problem to be solved by the present invention is to provide a method for determining failure occurrence in an operating system that accurately determines whether the operating system has failed even when a failure event cannot be generated because of serious damage to the operating system.

Another problem to be solved by the present invention is to provide a method for determining failure occurrence in an operating system that accurately determines whether the operating system has failed by analyzing information on events that occur normally during operation of the system.

In order to solve the above problems, a preferred embodiment of the "method for determining failure occurrence in an operating system" according to the present invention is a method performed by a failure occurrence determination system that is connected to the operating system and determines whether the operating system has failed, the method comprising:

A first step of attempting to establish a connection with the operating system and, when the connection is established normally, requesting event information that occurs normally in the operating system;

A second step of checking whether the requested event information is received within a preset event occurrence time for each time point;

A third step of determining that a failure has occurred, and displaying the failure, if the event information does not occur within the event occurrence time for each time point; and

A fourth step of, if unspecified event information is received within the event occurrence time for each time point, calculating a statistical value (current statistical value) for each time slot of the day and determining whether a failure has occurred by comparing the calculated current statistical value with the predicted value for the day.

Here, the event information that occurs normally in the first step includes specific event information that occurs at a specific point in time during normal system operation, periodic event information that occurs periodically during normal system operation, and unspecified event information that occurs non-periodically during normal system operation.

In addition, the event for each time point in the second step is an event tied to a specific time point, such as event information occurring at a specific point in time during normal system operation or periodic event information occurring periodically during normal system operation.

In addition, the fourth step comprises:

If unspecified event information is received within the event occurrence time for each time point, setting current statistics by compiling statistics for each time slot of the day;

Calculating a predicted event value for the day from past event statistics;

Calculating a difference by comparing the set current statistics with the predicted event value for the day; and

Comparing the calculated difference with a preset error range, returning to the step of receiving event information if the calculated difference is within the error range, and determining that a failure has occurred and displaying the failure if the calculated difference is outside the error range.

According to the present invention, it is possible to accurately determine whether a failure has occurred in an operating system (business processing device, software, system, business process, etc.) in charge of operations across the IT (information technology) system management field or industrial fields, even when the operating system is so seriously damaged that it cannot generate a failure event.

In addition, there is the advantage that a failure of the operating system can be accurately determined by analyzing information on events that occur normally during operation of the system.

Figure 1 is a block diagram showing a schematic configuration of a failure occurrence determination system to which the present invention is applied.

Figure 2 is a flowchart showing the method for determining failure occurrence in an operating system according to the present invention.

Figure 3 is an explanatory diagram for explaining a method of determining the occurrence of a failure through a specific event occurring at a specific time during normal operation in the present invention.

100... Operating system

200... Failure occurrence determination system

210... Data interface

220... Failure occurrence determination unit

230... Memory

240... Failure status display unit

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings. Where a detailed description of a known function or configuration related to the present invention could unnecessarily obscure the subject matter of the present invention, that description is omitted.

Figure 1 is a schematic configuration diagram of a failure occurrence determination system to which the present invention is applied; the configuration includes an operating system 100 and a failure occurrence determination system 200.

The operating system 100 collectively refers to business processing devices, software, systems, and business processes. It controls and operates a device or system as a whole, generates (normal) events indicating the progress of a job, and, if it detects a fault, generates a failure event, in real time or periodically.

The failure occurrence determination system 200 is connected to the operating system 100. It displays a failure occurrence when it receives a failure event transmitted from the operating system 100, and it also analyzes the event information transmitted from the operating system 100 to determine whether a failure has occurred, displaying the failure if it determines that one has occurred.

The failure occurrence determination system 200 includes: a data interface 210 that establishes a connection with the operating system 100 and then requests event information or receives the transmitted event information; a memory 230 that stores the received event information and statistical data on past event information; a failure occurrence determination unit 220 that, when event information is received through the data interface 210, determines whether a failure has occurred by comparison with the statistical data stored in the memory 230 and, if it determines that a failure has occurred, controls display of the failure; and a failure status display unit 240 that operates in conjunction with the failure occurrence determination unit 220 to indicate a failure of the operating system 100 when one occurs.

The failure occurrence determination system 200 configured as described above is preferably implemented as a single module. Implemented as a module, it can be connected to a specific operating system and used as a device that determines whether the connected operating system has failed.

Figure 2 is a flowchart showing a preferred embodiment of the "method for determining failure occurrence in an operating system" according to the present invention. It shows the process by which the failure occurrence determination unit 220 in the failure occurrence determination system 200 determines, in software, whether a failure has occurred; S denotes a step.

As shown there, the "method for determining failure occurrence in an operating system" according to the present invention comprises: a first step (S101 to S105) of attempting to establish a connection with the operating system 100 and, when the connection is established normally, requesting event information that occurs normally in the operating system 100; a second step (S107 to S109) of checking whether the requested event information is received within a preset event occurrence time for each time point; a third step (S111 to S113) of determining that a failure has occurred, and displaying the failure, if the event information does not occur within the event occurrence time for each time point; and a fourth step (S115 to S123) of, if unspecified event information is received within the event occurrence time for each time point, calculating a statistical value (current statistical value) for each time slot of the day and determining whether a failure has occurred by comparing the calculated current statistical value with the predicted value for the day.

In this method, with the operating system 100 and the failure occurrence determination system 200 physically connected (for example, by a communication line), step S101 attempts to establish a connection with the operating system 100 through communication.

Thereafter, it is checked whether a response to the connection setup is received from the operating system 100 (S103). If a response is received, the connection is judged to be normal. If no response is received, the process returns to step S101 and retries the connection setup with the operating system.
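As an illustration only, the connection setup and retry of steps S101 and S103 might be sketched as follows. The function name, the `try_connect` callable, the retry delay, and the optional attempt limit are assumptions made for this sketch; the description itself only states that the connection setup is retried when no response is received.

```python
import time
from typing import Callable, Optional


def establish_connection(try_connect: Callable[[], bool],
                         retry_delay_s: float = 1.0,
                         max_attempts: Optional[int] = None) -> int:
    """Retry connection setup until the operating system responds.

    Returns the number of attempts made. `max_attempts` is a safeguard
    added for this sketch (hypothetical); None means retry indefinitely.
    """
    attempts = 0
    while True:
        attempts += 1
        # S101: attempt connection setup; S103: check whether a response arrived.
        if try_connect():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise ConnectionError(f"no response after {attempts} attempts")
        # No response: wait briefly, then go back to S101.
        time.sleep(retry_delay_s)
```

Here `try_connect` stands in for whatever transport the data interface 210 actually uses, which the description does not specify.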

After the connection with the operating system 100 has been established normally, event information that occurs normally is requested from the operating system 100 in step S105.

The normally occurring event information requested in step S105 includes specific event information that occurs at a specific point in time during normal system operation, periodic event information that occurs periodically during normal system operation, and unspecified event information that occurs non-periodically during normal system operation.
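The three kinds of event information could be modeled, for example, as below. This is a minimal sketch: the type names, field names, and the `has_deadline` helper are hypothetical and chosen only to mirror the description, which checks specific-time and periodic events against an occurrence time and handles unspecified events statistically.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class EventKind(Enum):
    SPECIFIC_TIME = "specific_time"  # occurs at a specific point in time
    PERIODIC = "periodic"            # occurs at a fixed period
    UNSPECIFIED = "unspecified"      # occurs non-periodically


@dataclass(frozen=True)
class Event:
    event_id: str          # e.g. "ONL 001"
    kind: EventKind
    occurred_at: datetime
    content: str = ""      # free-text event contents


def has_deadline(kind: EventKind) -> bool:
    """True for kinds checked against an event occurrence time (second step);
    unspecified events are evaluated statistically instead (fourth step)."""
    return kind in (EventKind.SPECIFIC_TIME, EventKind.PERIODIC)
```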

In step S107, it is checked whether the event information has been received. If it has not been received, the process moves to step S109 to check whether the event occurrence time for the time point has elapsed. If the check in step S109 shows that the event occurrence time has elapsed, the process moves to step S111, where it is determined that a failure has occurred, and in step S113 the failure status display unit 240 indicates that a failure has occurred.
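The check in steps S107 to S111 can be sketched as a single function. The function name, its parameters, and the three string results are assumptions for illustration; the display of step S113 is left to the caller.

```python
from datetime import datetime, timedelta
from typing import Optional


def check_event_deadline(last_received: Optional[datetime],
                         window_start: datetime,
                         occurrence_time: timedelta,
                         now: datetime) -> str:
    """Return "received", "waiting", or "failure" for one time point."""
    deadline = window_start + occurrence_time
    # S107: was the event received inside the occurrence window?
    if last_received is not None and window_start <= last_received <= deadline:
        return "received"
    # S109: has the event occurrence time elapsed yet?
    if now <= deadline:
        return "waiting"
    # S111: the window closed without the event, so a failure is determined.
    return "failure"
```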

In more detail, the time-based event includes an event occurring at a specific point in time during normal system operation and a periodic event occurring periodically during normal system operation.

Determining whether a failure has occurred through an event that occurs at a specific time during normal operation relies on the fact that, when a specific event does not occur at its specific time, the task that generates the event was not performed; a failure is determined on that basis. This approach can be applied primarily to detecting failures in business programs.

An example of applying the above method when operating an online system is shown in FIG. 3.

Here, the occurrence time means the specific time at which the event occurs; Event_id means a unique identifier for identifying the event that occurred; the event meaning indicates what the event signifies; and the failure name means the name of the failure raised when the event does not occur within the occurrence time. Therefore, when the event ONL 001 occurs between 08:00 and 09:00, it can be known that the online program has started normally. If the event does not occur in that time, the failure occurrence determination system 200 determines that the online program failed to start normally.
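A lookup of this kind might be sketched as follows. Only the ONL 001 entry and its 08:00 to 09:00 window come from the description; the table layout, the meaning and failure-name strings, and the function name are hypothetical stand-ins for the contents of FIG. 3.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ExpectedEvent:
    start: time          # start of the occurrence window
    end: time            # end of the occurrence window
    event_id: str
    meaning: str         # what the event signifies (illustrative wording)
    failure_name: str    # reported when the event is missing (illustrative wording)


SCHEDULE = [
    ExpectedEvent(time(8, 0), time(9, 0), "ONL 001",
                  "online program started", "online program start failure"),
]


def missing_events(schedule: Iterable[ExpectedEvent],
                   received: Iterable[Tuple[str, datetime]],
                   now: datetime) -> List[str]:
    """Return failure names for windows that closed today without their event."""
    received = list(received)
    failures = []
    for exp in schedule:
        if now.time() <= exp.end:
            continue  # window still open (or not yet reached): keep waiting
        seen = any(eid == exp.event_id
                   and ts.date() == now.date()
                   and exp.start <= ts.time() <= exp.end
                   for eid, ts in received)
        if not seen:
            failures.append(exp.failure_name)
    return failures
```

Matching here is by Event_id only; a fuller version could also inspect the event contents when the identifier alone is ambiguous.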

If the Event_id alone is not sufficient to confirm whether the expected event occurred, the contents of the event can be searched to confirm it.

Next, determining whether a failure has occurred through a specific event that occurs periodically during normal operation relies on the fact that, if the event does not occur within its period, the task that generates the event was not performed, so the failure can be judged accurately. This method can be applied to failure detection for equipment control programs.

For example, in a temperature measurement program that measures the temperature at a fixed period (1 minute), if no new event occurs even after 1 minute has elapsed since the last event, it is determined that a failure has occurred. Another method is to check for event occurrence every minute and to determine that a failure has occurred if no event occurred within the last minute.
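The two variants described above can be sketched side by side. The function names are hypothetical, and the strict one-period threshold is a simplification; a practical check would likely allow some tolerance for jitter, which the description does not discuss.

```python
from datetime import datetime, timedelta
from typing import Sequence

PERIOD = timedelta(minutes=1)  # measurement period from the example


def overdue_since_last(last_event: datetime, now: datetime,
                       period: timedelta = PERIOD) -> bool:
    """Variant 1: failure if more than one period has passed since the last event."""
    return now - last_event > period


def none_in_last_period(event_times: Sequence[datetime], now: datetime,
                        period: timedelta = PERIOD) -> bool:
    """Variant 2: at each check, failure if no event fell within the last period."""
    return not any(now - period < t <= now for t in event_times)
```

Variant 1 needs only the timestamp of the most recent event, while variant 2 is evaluated on a fixed polling schedule over the received events.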

On the other hand, in addition to determining failure occurrence by checking events tied to specific time points as described above, it is also possible to determine whether a failure has occurred by checking events that occur at unspecified times. This requires statistical analysis. The volume of events expected for a given time slot, which varies with the day of the week, working days, the time slot, and the beginning or end of the month, is compared with the volume of events occurring at the present time, and when the difference is greater than a set range, a warning of a possible failure or abnormality can be issued.

In more detail, in step S115 the current statistics are set by compiling statistics on the events occurring in each time slot of the day, and in step S117 a predicted value for the events in the corresponding time slot is calculated from the past event statistics stored in advance in the memory 230. That is, the predicted event value is obtained by calculating the number of events for the day and for each time slot from the past event statistics (number of events per day and time slot and its rate of increase or decrease, number of events per day and its rate of increase or decrease, etc.). Here the predicted event value is described as being calculated after the hourly statistics for the day, but it is preferably calculated in advance and stored in the memory, which reduces the time needed to determine whether a failure has occurred.
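Steps S115 and S117 might be sketched as below. The description does not give a prediction formula, so this sketch assumes the simplest one, a per-hour mean over past days, and ignores the increase and decrease rates mentioned above. The function names and the dictionary layout are likewise assumptions.

```python
from collections import Counter
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List


def hourly_counts(timestamps: Iterable[datetime]) -> Dict[int, int]:
    """S115: count the day's events in each hour slot (0 to 23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return {hour: counts.get(hour, 0) for hour in range(24)}


def expected_hourly(history: List[Dict[int, int]]) -> Dict[int, float]:
    """S117: predict each hour's count as the mean over past days.

    `history` holds one hourly-count dict per past day and must be
    non-empty. The plain mean is an assumption of this sketch.
    """
    return {hour: mean(day[hour] for day in history) for hour in range(24)}
```

Because `expected_hourly` depends only on past days, its result can be computed ahead of time and cached, in line with the preference stated above for storing the predicted value in memory in advance.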

Thereafter, in step S119 the current statistical value is compared with the predicted event value, and in step S121 the difference between the current statistical value and the predicted value for the day is calculated. It is then checked whether the calculated difference is within a preset error range. The error range is used because these events occur at unspecified times, so the number of events per day or per time slot can vary. Treating any difference between the current statistics and the day's predicted value as a failure would therefore reduce accuracy. Accordingly, in the present invention an error range is set to make the determination more accurate: a difference in event occurrence within the error range is regarded as normal system operation, and only a difference outside the error range is regarded as a system failure. The error range is expressed as a number of events, and it is preferably set appropriately in view of the characteristics of the system to which the method is applied.
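The comparison of steps S119 to S121 reduces to a threshold on the absolute difference in event counts. The following is a minimal sketch; the function names and the per-hour dictionary layout are assumptions, and a difference exactly equal to the error range is treated here as still within range.

```python
from typing import Dict, List


def is_failure(current_count: float, expected_count: float,
               error_range: float) -> bool:
    """Failure only if the count difference falls outside the error range."""
    difference = abs(current_count - expected_count)
    return difference > error_range


def out_of_range_hours(current: Dict[int, float],
                       expected: Dict[int, float],
                       error_range: float) -> List[int]:
    """Return the hour slots whose event count deviates beyond the error range."""
    return [hour for hour in sorted(expected)
            if is_failure(current.get(hour, 0), expected[hour], error_range)]
```

With an error range of 10 events and a predicted value of 100, a current count of 95 would be treated as normal operation and a count of 80 as a failure.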

The present invention is not limited to the above-described specific preferred embodiments, and various modifications can be made by any person having ordinary skill in the art without departing from the gist of the present invention claimed in the claims. Of course, such changes will fall within the scope of the claims.

The present invention as described above can be applied to operating systems in IT-related or industrial fields to accurately detect failure occurrence. Moreover, because it can determine whether a system has failed as long as a data interface is available, it can be extended to all fields that use an operating system.

Claims

A method for determining failure occurrence in an operating system, performed by a failure occurrence determination system that is connected to an operating system operating equipment, devices, or the like and determines whether the operating system has failed, the method comprising:

A first step of attempting to establish a connection with the operating system and, when the connection is established normally, requesting event information that occurs normally in the operating system;

A second step of checking whether the requested event information is received within a preset event occurrence time for each time point;

A third step of determining that a failure has occurred, and displaying the failure, if the event information does not occur within the event occurrence time for each time point; and

A fourth step of, when unspecified event information is received within the event occurrence time for each time point, calculating a statistical value (current statistical value) for each time slot of the day and determining whether a failure has occurred by comparing the calculated current statistical value with the predicted value for the day,

Wherein the fourth step comprises:

If unspecified event information is received within the event occurrence time for each time point, setting current statistics by compiling statistics for each time slot of the day;

Calculating a predicted event value for the day from past event statistics;

Calculating a difference by comparing the set current statistics with the predicted event value for the day; and

Comparing the calculated difference with a preset error range, returning to the step of receiving event information if the calculated difference is within the error range, and determining that a failure has occurred and indicating the occurrence of the failure if the calculated difference is outside the error range.
The method of claim 1, wherein the event information that occurs normally in the first step includes specific event information that occurs at a specific point in time during normal system operation, periodic event information that occurs periodically during normal system operation, and unspecified event information that occurs non-periodically during normal system operation.
The method for determining failure occurrence in an operating system, wherein the event for each time point in the second step is an event tied to a specific time point, such as event information occurring at a specific point in time during normal system operation or periodic event information occurring periodically during normal system operation.