WO2011087167A1 - Method for determining problem occurrence in operating system - Google Patents


Info

Publication number
WO2011087167A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
operating system
failure
event information
occurrence
Prior art date
Application number
PCT/KR2010/000243
Other languages
French (fr)
Korean (ko)
Inventor
임태환
Original Assignee
Lim Tae-Hwan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lim Tae-Hwan
Priority to PCT/KR2010/000243
Publication of WO2011087167A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs

Definitions

  • The present invention relates to a method for determining failure occurrence in an operating system, that is, for determining whether a failure has occurred in an operating system (business processing device, software, system, business process, etc.) responsible for operations across the IT (information technology) system management field or industrial fields.
  • IT system management is mostly carried out using an operating system, such as a computer system running a particular OS.
  • When such an operating system suffers an internal software failure, a business-process failure, or a hardware failure, operation of the entire system is severely disrupted, so detecting the failure promptly is very important.
  • The conventional method for determining failure occurrence, used to detect malfunctions or abnormal conditions of operating equipment in IT system management and in all industrial fields, determines whether a failure has occurred by receiving a failure event from the operating system.
  • That is, a failure is recognized only when a failure event occurs.
  • The present invention has been proposed to solve the problems of the conventional failure determination method described above.
  • The problem to be solved by the present invention is to provide a method for determining whether a failure has occurred in an operating system (business processing device, software, system, business process, etc.) responsible for operations across the IT (information technology) system management field or industrial fields.
  • Another problem to be solved by the present invention is to provide a method that accurately determines whether the operating system has failed even when a failure event cannot be generated because the operating system is seriously damaged.
  • Yet another problem to be solved by the present invention is to provide a method that accurately determines whether the operating system has failed by analyzing information on events that occur normally during system operation.
  • The normally occurring event information includes specific event information that occurs at a specific point in time during normal system operation, periodic event information that occurs periodically during normal system operation, and unspecified event information that occurs non-periodically during normal system operation.
  • The time-based events of the second step are events tied to a specific point in time, such as events that occur at a specific point in time during normal system operation and periodic events that recur during normal system operation.
  • According to the present invention, whether a failure has occurred can be determined accurately not only when the operating system (business processing device, software, system, business process, etc.) responsible for operations across IT (information technology) system management or industry is able to generate a failure event, but also when serious damage to the operating system makes it impossible to generate one.
  • operating system: business processing device, software, system, business process, etc.
  • IT: information technology
  • FIG. 1 is a block diagram showing a schematic configuration of a failure determination system to which the present invention is applied.
  • FIG. 2 is a flow chart showing a failure occurrence determination method of the operating system according to the present invention.
  • Figure 3 is an explanatory diagram for explaining a method of determining the occurrence of a failure through a specific event occurring at a specific time during normal operation in the present invention.
  • FIG. 1 is a schematic configuration diagram of a failure occurrence determination system to which the present invention is applied; the configuration comprises an operating system 100 and a failure occurrence determination system 200.
  • The operating system 100 is a collective term for business processing devices, software, systems, business processes, and the like. It controls and operates the overall behavior of a device or system, and it generates, in real time or periodically, (normal) events indicating the progress of work and, when it detects a failure, failure events.
  • The failure occurrence determination system 200 is connected to the operating system 100. It indicates a failure when it receives a failure event transmitted from the operating system 100, and it also analyzes the event information transmitted from the operating system 100 to determine whether a failure has occurred, indicating the failure if so.
  • The failure occurrence determination system 200 includes: a data interface 210 that, after establishing a connection with the operating system 100, requests event information or receives transmitted event information; a memory 230 that stores the received event information and statistical data on past event information; a failure occurrence determination unit 220 that, when event information is received through the data interface 210, determines whether a failure has occurred by comparison with the statistical data stored in the memory 230 and, if so, controls display of the failure; and a failure status display unit 240 that works with the failure occurrence determination unit 220 to indicate a failure of the operating system 100.
  • The failure occurrence determination system 200 configured as described above is preferably implemented as a single module. Implemented this way, it can serve as a device that determines whether a connected operating system has failed simply by establishing a connection to that operating system.
  • FIG. 2 is a flowchart showing a preferred embodiment of the "method for determining failure occurrence in an operating system" according to the present invention. It is the process by which the failure occurrence determination unit 220 in the failure occurrence determination system 200 determines, in software, whether a failure has occurred; S denotes a step.
  • In the "method for determining failure occurrence in an operating system", a connection with the operating system 100 is attempted and, once it is established normally, normally occurring event information is requested from the operating system 100.
  • The method is carried out with the operating system 100 and the failure occurrence determination system 200 physically connected (for example, by a communication line); in step S101, a connection with the operating system 100 is attempted through communication.
  • If no response to the connection setup is received from the operating system 100, the connection setup is retried (S103).
  • Once the connection is established, normally occurring event information is requested from the operating system 100 (S105).
  • The normally occurring event information (S105) includes specific event information that occurs at a specific point in time during normal system operation, periodic event information that occurs periodically during normal system operation, and unspecified event information that occurs non-periodically during normal system operation.
  • In step S107, it is checked whether event information has been received; if not, the flow moves to step S109 to check whether the time-based event occurrence time has elapsed.
  • If that time has not elapsed, the flow returns to step S107; if it has elapsed, the flow moves to step S111, where a failure is determined to have occurred, and in step S113 the failure status display unit 240 indicates that a failure has occurred.
  • The time-based events include events that occur at a specific point in time during normal system operation and periodic events that recur during normal system operation.
  • Determining failure through an event that occurs at a specific time during normal operation means that, if the specific event does not occur at the expected time, the task that generates the event was not performed, and a failure is determined on that basis. This kind of check applies mainly to detecting failures in business programs.
  • An example of applying this method to the operation of an online system is shown in FIG. 3.
  • In FIG. 3, the occurrence point is the specific time at which the event occurs.
  • Event_id is a unique number identifying the event that occurred.
  • The event meaning describes what the event signifies.
  • The details can be confirmed by looking up the contents of the event.
  • Determining failure through a specific event that occurs periodically during normal operation means that, if the event does not occur within its period, the task that generates the event was not performed, so a failure can be determined accurately.
  • This method can be applied for fault detection of Chuo equipment control program.
  • For example, in a temperature measurement program that measures temperature at a fixed period (1 minute), if no new event occurs even after 1 minute has elapsed since the last event, a failure is determined to have occurred.
  • Another method is to check for events every minute and determine that a failure has occurred if no event occurred within the last minute.
  • In step S115, the current statistical values are set by compiling hourly statistics of the events occurring on the current day.
  • In step S117, the expected number of events for the corresponding hour is calculated from past event statistics stored in advance in the memory 230. That is, the event forecast is obtained by calculating the number of events for the day and for each hour from past event statistics (events per day and per time slot with their rates of increase or decrease, daily event counts with their rates of increase or decrease, and so on).
  • Here the event forecast is described as being calculated after the hourly statistics for the day, but it is preferable to calculate it beforehand and store it in memory, which shortens the time needed to determine whether a failure has occurred.
  • In step S119, the current statistical value is compared with the event forecast, and in step S121 the difference between the current statistical value and the day's forecast is calculated. It is then checked whether the calculated difference is within a preset error range.
  • The error range is preset because, for events that occur at unspecified times, the number of events per day or per time slot may vary. Judging a failure merely because the current statistics differ from the day's forecast would therefore be inaccurate. Accordingly, the present invention sets an error range for greater accuracy: a difference in event counts within the error range is regarded as normal operation, and only a difference outside the error range is regarded as a system failure.
  • The error range is expressed as a number of events, and it should be set appropriately for the characteristics of the system to which the method is applied.
  • The present invention as described above can accurately detect failures when applied to operating systems in IT-related or industrial fields, and, because it can determine whether a system has failed given only a data interface, it can be extended to any field that uses an operating system.
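
The forecast of step S117 is described only in general terms (past counts per time slot and their rates of increase or decrease). One possible reading, given purely as an illustrative sketch, is to average the counts observed in the same hour on past days and scale the result by a growth rate. The function name `forecast_hourly_count`, its parameters, and the averaging formula are assumptions introduced here and do not come from the patent text.

```python
from statistics import mean
from typing import Sequence


def forecast_hourly_count(past_counts_same_hour: Sequence[int],
                          growth_rate: float = 0.0) -> float:
    """Estimate today's event count for one hour from past days' counts.

    Assumed formula (not specified in the source): the mean of the past
    counts for that hour, scaled by (1 + growth_rate).
    """
    if not past_counts_same_hour:
        raise ValueError("at least one past observation is required")
    return mean(past_counts_same_hour) * (1.0 + growth_rate)


# Example: counts seen between 09:00 and 10:00 on three earlier days.
forecast = forecast_hourly_count([100, 110, 90])
```

Because the text recommends computing the forecast ahead of time and keeping it in memory, a caller would typically evaluate this once per hour slot before the day starts rather than at comparison time.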

Abstract

Disclosed is a method for determining problem occurrence in an operating system for determining whether a problem occurs in the operating system (business transaction device, software, system, business transaction process, and others) which takes charge of the operation over the IT system management fields or the industrial fields. The method for determining the problem occurrence in the operating system comprises the steps of: the first step of requesting event information which is normally generated in the operating system, if connection with the operating system is normally set up after the connection setup with the operating system has been attempted; the second step of confirming whether the requested event information is received within the event generation time in accordance with each preset time point; the third step of determining problem occurrence and displaying the determined problem occurrence, if the event information is not generated within the event generation time in accordance with each time point; and the fourth step of calculating statistical values (current statistical values) according to each time slot on the day, if unspecified event information is received within the event generation time in accordance with each time point, and determining whether a problem occurs through comparison between the calculated current statistical values and predicted values on the day.

Description

Method for determining failure occurrence in an operating system
The present invention relates to a method for determining failure occurrence in an operating system, that is, for determining whether a failure has occurred in an operating system (business processing device, software, system, business process, etc.) responsible for operations across the IT (information technology) system management field or industrial fields.
In general, IT system management is mostly carried out using an operating system, such as a computer system running a particular OS. When such an operating system suffers an internal software failure, a business-process failure, or a hardware failure, operation of the entire system is severely disrupted, so detecting the failure promptly is very important.
Likewise, in industries that mass-produce products, and in production facilities such as factories in particular, mass production is possible only while the facility stays in normal operation, and keeping product quality consistently uniform is very important. In these industries too, it is therefore very important to determine quickly whether a piece of equipment, or software or an OS, has failed, so that production and quality can be sustained.
The conventional method for determining failure occurrence, used to detect malfunctions or abnormal conditions of operating equipment in IT system management and in all industrial fields, determines whether a failure has occurred by receiving a failure event from the operating system.
That is, a failure is recognized only when a failure event occurs.
However, because this conventional technique can detect and determine a failure only when a failure event occurs, it cannot recognize a failure at all when the operating system cannot generate a failure event itself, for example because it is seriously damaged or because of some other operational error. Failures that went unrecognized for this reason have often led to major accidents.
Accordingly, the present invention has been proposed to solve the problems of the conventional failure determination method described above.
The problem to be solved by the present invention is to provide a method for determining whether a failure has occurred in an operating system (business processing device, software, system, business process, etc.) responsible for operations across the IT (information technology) system management field or industrial fields.
Another problem to be solved by the present invention is to provide a method that accurately determines whether the operating system has failed even when a failure event cannot be generated because the operating system is seriously damaged.
Yet another problem to be solved by the present invention is to provide a method that accurately determines whether the operating system has failed by analyzing information on events that occur normally during system operation.
To solve the above problems, a preferred embodiment of the "method for determining failure occurrence in an operating system" according to the present invention is
a method performed in a failure occurrence determination system that is connected to an operating system and determines whether the operating system has failed, the method comprising:
a first step of attempting to establish a connection with the operating system and, when the connection is established normally, requesting normally occurring event information from the operating system;
a second step of checking whether the requested event information is received within a preset time-based event occurrence time;
a third step of determining that a failure has occurred, and indicating it, if the event information does not occur within the time-based event occurrence time; and
a fourth step of, if unspecified event information is received within the time-based event occurrence time, calculating hourly statistical values for the current day (current statistical values) and determining whether a failure has occurred by comparing the calculated current statistical values with the day's forecast.
Here, the normally occurring event information of the first step
includes specific event information that occurs at a specific point in time during normal system operation, periodic event information that occurs periodically during normal system operation, and unspecified event information that occurs non-periodically during normal system operation.
In addition, the time-based events of the second step are events tied to a specific point in time, such as events that occur at a specific point in time during normal system operation and periodic events that recur during normal system operation.
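
The three kinds of normally occurring event, and the time-based subset used by the second step, can be modelled with a small data type. The sketch below is only illustrative: the names `EventKind` and `is_time_based` are hypothetical and do not appear in the patent.

```python
from enum import Enum


class EventKind(Enum):
    """Kinds of normally occurring events named in the text (identifiers are hypothetical)."""
    SPECIFIC = "specific"        # occurs at a specific point in time
    PERIODIC = "periodic"        # recurs at a fixed interval
    UNSPECIFIED = "unspecified"  # occurs non-periodically


def is_time_based(kind: EventKind) -> bool:
    # Only events tied to a known time have a deadline that can be missed;
    # unspecified events are instead judged statistically in the fourth step.
    return kind in (EventKind.SPECIFIC, EventKind.PERIODIC)
```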
In addition, the fourth step comprises:
setting the current statistical values by compiling hourly statistics for the current day when unspecified event information is received within the time-based event occurrence time;
calculating the day's event forecast from past event statistics;
calculating the difference between the current statistical values and the day's event forecast; and
comparing the calculated difference with a preset error range, returning to the step of receiving event information if the difference is within the error range, and determining that a failure has occurred and indicating the failure if the difference is outside the error range.
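
The comparison at the end of the fourth step amounts to a tolerance check on event counts. A minimal sketch follows, assuming the error range is a symmetric number of events and that a difference exactly equal to the range still counts as normal; the function name `is_failure` and both of those choices are assumptions made for illustration.

```python
def is_failure(current_count: int, expected_count: float, tolerance: float) -> bool:
    """Return True when the observed count deviates from the forecast
    by more than the preset error range (expressed in number of events)."""
    difference = abs(current_count - expected_count)
    # Within the error range: treated as normal operation.
    # Outside the error range: treated as a failure.
    return difference > tolerance


# Example: forecast of 100 events for this hour, error range of 10 events.
normal = is_failure(95, 100, 10)     # difference 5 is inside the range
abnormal = is_failure(70, 100, 10)   # difference 30 is outside the range
```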
According to the present invention, whether a failure has occurred can be determined accurately not only when the operating system (business processing device, software, system, business process, etc.) responsible for operations across IT (information technology) system management or industry is able to generate a failure event, but also when serious damage to the operating system makes it impossible to generate one.
A further advantage is that whether the operating system has failed can be determined accurately by analyzing information on events that occur normally during system operation.
FIG. 1 is a block diagram showing a schematic configuration of a failure occurrence determination system to which the present invention is applied.
FIG. 2 is a flowchart showing the method for determining failure occurrence in an operating system according to the present invention.
FIG. 3 is an explanatory diagram illustrating how a failure is determined from a specific event that occurs at a specific time during normal operation.
<Explanation of reference numerals for the main parts of the drawings>
100: operating system
200: failure occurrence determination system
210: data interface
220: failure occurrence determination unit
230: memory
240: failure status display unit
Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings. Detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the subject matter of the present invention.
FIG. 1 is a schematic configuration diagram of a failure occurrence determination system to which the present invention is applied; the configuration comprises an operating system 100 and a failure occurrence determination system 200.
The operating system 100 is a collective term for business processing devices, software, systems, business processes, and the like. It controls and operates the overall behavior of a device or system, and it generates, in real time or periodically, (normal) events indicating the progress of work and, when it detects a failure, failure events.
The failure occurrence determination system 200 is connected to the operating system 100. It indicates a failure when it receives a failure event transmitted from the operating system 100, and it also analyzes the event information transmitted from the operating system 100 to determine whether a failure has occurred, indicating the failure if so.
The failure occurrence determination system 200 includes: a data interface 210 that, after establishing a connection with the operating system 100, requests event information or receives transmitted event information; a memory 230 that stores the received event information and statistical data on past event information; a failure occurrence determination unit 220 that, when event information is received through the data interface 210, determines whether a failure has occurred by comparison with the statistical data stored in the memory 230 and, if so, controls display of the failure; and a failure status display unit 240 that works with the failure occurrence determination unit 220 to indicate a failure of the operating system 100.
The failure occurrence determination system 200 configured as described above is preferably implemented as a single module. Implemented this way, it can serve as a device that determines whether a connected operating system has failed simply by establishing a connection to that operating system.
도 2는 본 발명에 따른 "운영시스템의 장애 발생 판단 방법"의 바람직한 실시 예를 보인 흐름도로서, 상기 장애 발생 판단 시스템(200) 내의 장애 발생 판단부(220)에서 소프트웨어적으로 장애 발생 여부를 판단하는 프로세스로서, S는 단계(Step)를 나타낸다.2 is a flowchart showing a preferred embodiment of a "method for determining a failure occurrence of an operating system" according to the present invention, wherein the failure occurrence determination unit 220 in the failure occurrence determination system 200 determines whether a failure occurs in software. As a process of doing so, S represents a step.
이에 도시된 바와 같이, 본 발명에 따른 "운영시스템의 장애 발생 판단 방법"은, 운영시스템(100)과의 연결 설정을 시도하여 정상적으로 연결이 설정되면, 상기 운영시스템(100)에 정상적으로 발생하는 이벤트 정보를 요청하는 제1단계(S101 ~ S105)와; 상기 요청한 이벤트 정보가 미리 설정된 시점별 이벤트 발생시간 이내에 수신되는지를 확인하는 제2단계(S107 ~ S109)와; 상기 시점별 이벤트 발생시간 이내에 이벤트 정보가 미발생되면, 장애 발생으로 판단하고 이를 표시해주는 제3단계(S111 ~ S113); 및 상기 시점별 이벤트 발생시간 이내에 불특정 이벤트 정보가 수신되면, 당일 시간별로 통계치(현재 통계치)를 산출하고, 그 산출한 현재 통계치와 당일 예상치의 비교를 통해 장애 발생 여부를 판단하는 제4단계(S115 ~ S123)으로 이루어진다.As shown therein, the "method of determining a failure occurrence of an operating system" according to the present invention is an event that normally occurs in the operating system 100 when a connection is established normally by attempting to establish a connection with the operating system 100. First steps S101 to S105 for requesting information; A second step (S107 to S109) of checking whether the requested event information is received within a preset event occurrence time for each time point; If event information is not generated within the event occurrence time of each time point, determining that a failure has occurred and displaying the third step (S111 to S113); And if the unspecified event information is received within the event occurrence time of each time point, calculating a statistical value (current statistical value) for each time of the day, and determining whether or not a failure occurs by comparing the calculated current statistical value with the expected value of the day (S115). S123).
In the "method for determining failure occurrence in an operating system" according to the present invention, with the operating system 100 and the failure occurrence determination system 200 physically connected (for example, by a communication line), the failure occurrence determination system 200 attempts to establish a connection with the operating system 100 through communication, as in step S101.
If a response to the connection request is then received from the operating system 100, the connection is judged to be normal. If no response to the connection request is received from the operating system 100, the process returns to step S101 and retries the connection with the operating system (S103).
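The retry loop of steps S101 to S103 can be sketched as follows. This is only an illustration under stated assumptions: `try_connect`, `retry_delay_s`, and `max_attempts` are hypothetical names, and the specification itself describes neither a delay between attempts nor an upper limit on retries.

```python
import time
from typing import Callable, Optional


def establish_connection(try_connect: Callable[[], bool],
                         retry_delay_s: float = 1.0,
                         max_attempts: Optional[int] = None) -> int:
    """Repeat S101 until S103 sees a response; return the number of attempts.

    try_connect is a hypothetical callable that returns True when the
    operating system answers the connection request. max_attempts=None
    retries forever, which matches the flow described in the text.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if try_connect():          # S103: response received, connection is normal
            return attempts
        time.sleep(retry_delay_s)  # assumed pause before going back to S101
    raise ConnectionError(f"no response after {attempts} attempts")
```

Passing a bounded `max_attempts` is an optional safeguard added for the sketch; leaving it at `None` reproduces the unconditional return to S101.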
Once the connection with the operating system 100 has been established normally, event information that occurs during normal operation is requested from the operating system 100 (S105).
Here, the normally occurring event information (S105) includes specific event information that occurs at a specific time during normal system operation, periodic event information that occurs periodically during normal system operation, and unspecified event information that occurs aperiodically during normal system operation.
In step S107, it is checked whether event information has been received. If no event information has been received, the process moves to step S109 and checks whether the event occurrence time for the time point has elapsed. If it has not elapsed, the process returns to step S107. If it has elapsed, the process moves to step S111 and determines that a failure has occurred, and in step S113 the failure is indicated through the failure status display unit 240.
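A minimal sketch of the S107 to S113 loop is given below. The names `poll_event`, `deadline_s`, `now`, and `poll_interval_s` are assumptions made for illustration; the specification does not say how events are polled or how time is measured.

```python
import time
from typing import Any, Callable, Optional, Tuple


def wait_for_event(poll_event: Callable[[], Optional[Any]],
                   deadline_s: float,
                   now: Callable[[], float] = time.monotonic,
                   poll_interval_s: float = 0.1) -> Tuple[str, Optional[Any]]:
    """Return ("EVENT", event) if an event arrives in time, else ("FAILURE", None)."""
    start = now()
    while True:
        event = poll_event()              # S107: has event information arrived?
        if event is not None:
            return ("EVENT", event)
        if now() - start >= deadline_s:   # S109: event occurrence time elapsed?
            return ("FAILURE", None)      # S111: judged as a failure
        time.sleep(poll_interval_s)       # not elapsed yet: back to S107
```

The caller would pass a `"FAILURE"` result on to whatever plays the role of the failure status display unit 240 (S113). Injecting `now` makes the loop testable with a fake clock.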
More specifically, events tied to a time point are of two kinds: events that occur at a specific time during normal system operation, and periodic events that occur at regular intervals during normal system operation.
Failure determination based on an event that occurs at a specific time during normal operation works as follows: if the specific event does not occur at the specified time, the task that generates that event was not performed, and a failure can be determined on that basis. This approach can be applied mainly to detecting failures in business programs.
FIG. 3 shows an example in which the above method is applied to the operation of an online system.
Here, the occurrence time is the specific time at which the event occurs, Event_id is a unique number used to identify the event that occurred, the event meaning describes what the event represents, and the failure name is the name of the failure reported when the event does not occur within the occurrence time. Thus, if the event ONL 001 occurs between 08:00 and 09:00, the online program is known to have started normally. If the event does not occur in that time, the failure occurrence determination system 200 determines that the normal startup of the online program has failed.
If Event_id alone is not enough to identify exactly what is being checked for non-occurrence, the content of the event can be searched to identify it.
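The time-window check described for FIG. 3 can be sketched as a small schedule lookup. Only the ONL 001 row (08:00 to 09:00, normal startup of the online program) comes from the text; the `ExpectedEvent` structure, the `missed_events` function, and the exact failure-name wording are hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ExpectedEvent:
    start: time          # beginning of the occurrence time window
    end: time            # end of the occurrence time window
    event_id: str        # Event_id identifying the event
    meaning: str         # what the event represents
    failure_name: str    # failure reported if the event is missing


def missed_events(schedule: Iterable[ExpectedEvent],
                  received: Iterable[Tuple[time, str]],
                  now: time) -> List[str]:
    """Return failure names for windows that closed without their event."""
    received = list(received)
    failures = []
    for exp in schedule:
        if now < exp.end:
            continue  # window still open, too early to judge
        seen = any(eid == exp.event_id and exp.start <= t <= exp.end
                   for t, eid in received)
        if not seen:
            failures.append(exp.failure_name)
    return failures


SCHEDULE = [ExpectedEvent(time(8, 0), time(9, 0), "ONL 001",
                          "online program started normally",
                          "online program failed to start normally")]
```

For example, `missed_events(SCHEDULE, [], time(9, 1))` reports the startup failure, while the same call with `[(time(8, 30), "ONL 001")]` reports nothing.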
Next, failure determination based on a specific event that occurs periodically during normal operation works as follows: if the event does not occur within its period, the task that generates the event was not performed, and on that basis it can be accurately determined whether a failure has occurred. This method can be applied mainly to detecting failures in equipment control programs.
For example, consider a temperature measurement program that measures the temperature at a fixed period of one minute. If no new event occurs even after one minute has elapsed since the last event, a failure is determined to have occurred. Alternatively, event occurrence can be checked every minute, and a failure is determined to have occurred if no event occurred within the most recent minute.
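Both variants of the periodic check can be sketched in a few lines. The function names and the use of plain second-based timestamps are assumptions, and whether the boundary at exactly one period counts as a failure is not stated in the text; the sketch treats only a strictly longer gap as overdue.

```python
from typing import Iterable


def overdue_since_last(last_event_ts: float, now_ts: float,
                       period_s: float = 60.0) -> bool:
    """Variant 1: more than one period has passed since the last event."""
    return now_ts - last_event_ts > period_s


def no_event_in_last_period(event_timestamps: Iterable[float], now_ts: float,
                            period_s: float = 60.0) -> bool:
    """Variant 2: run once per period; True if the last period saw no event."""
    return not any(now_ts - period_s < ts <= now_ts for ts in event_timestamps)
```

Variant 1 reacts as soon as the gap exceeds the period, whereas variant 2 only evaluates at its own check times, so it can take up to one extra period to notice a missing event.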
In addition to determining failures by checking events tied to specific time points as described above, failures can also be detected by checking events that occur at unspecified times. This requires statistical analysis: the volume of events historically generated for a given situation and hour (day of the week, working day, time of day, beginning or end of the month, and so on) is compared with the volume of events being generated at present, and if the two differ by more than a set range, a warning of a possible failure or abnormal condition can be issued.
More specifically, in step S115 the current statistics are set from statistics on the events occurring in each hour of the current day, and in step S117 the expected value of events for the corresponding hour of the current day is calculated from past event statistics stored in advance in the memory 230. That is, the expected number of events for the day and for each hour is calculated from past event statistics (number of events and rate of change by day of the week and time of day, daily number of events and rate of change, and so on). Although the expected value is described here as being calculated after the hourly statistics for the current day, it is preferably calculated in advance and stored in the memory, which shortens the time needed to determine whether a failure has occurred.
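One possible way to derive the expected value in step S117 is sketched below. The specification only says that past statistics by day of the week and time of day are used; taking a plain average per (weekday, hour) bucket is an assumption of this sketch, as are the function name and the tuple layout of the history records.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def expected_counts(history: Iterable[Tuple[int, int, int]]
                    ) -> Dict[Tuple[int, int], float]:
    """Average past event counts per (weekday, hour) bucket.

    history holds (weekday, hour, count) records, with weekday 0 = Monday.
    """
    buckets = defaultdict(list)
    for weekday, hour, count in history:
        buckets[(weekday, hour)].append(count)
    # The averaged table can be computed ahead of time and kept in memory,
    # as the text recommends, so that the later comparison is a lookup.
    return {key: mean(counts) for key, counts in buckets.items()}
```

Rates of change, working-day calendars, or month-start and month-end effects mentioned in the text could be folded in by using a richer bucket key; they are left out here to keep the sketch short.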
Then, in step S119 the current statistics are compared with the expected value of events, and in step S121 the difference between the current statistics and the expected value for the current day is calculated. It is then checked whether the calculated difference is within a preset error range. The error range is needed because these events occur at unspecified times, so the number of events may vary from day to day or from hour to hour. If any difference between the current statistics and the expected value for the day were treated unconditionally as a failure, the accuracy of the failure determination would suffer. The present invention therefore sets an error range to make the determination more accurate: a difference in event occurrence that falls within the error range is regarded as normal system operation, and the system is regarded as having failed only when the calculated difference falls outside the error range. The error range is expressed as a number of events and is preferably set appropriately in view of the characteristics of the system to which it is applied.
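The comparison against the error range reduces to a single check, sketched here under the assumption that the range is symmetric around the expected value. The function name and parameters are hypothetical; the tolerance is a number of events, as the text states.

```python
def is_failure(current_count: float, expected_count: float,
               tolerance_events: float) -> bool:
    """Failure only when the difference falls outside the error range."""
    difference = abs(current_count - expected_count)  # S121: compute the difference
    return difference > tolerance_events              # within the range counts as normal
```

With an expected value of 110 events and a tolerance of 20, an hour with 95 events is treated as normal operation, while an hour with 60 events is flagged as a failure.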
The present invention is not limited to the specific preferred embodiment described above. Anyone of ordinary skill in the art to which the invention pertains can of course make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.
The present invention described above can be applied to operating systems in IT-related or industrial fields to accurately detect whether a failure has occurred. Moreover, because it can determine whether a system has failed as long as a data interface to that system is available, it can be extended to any field that uses an operating system.

Claims (3)

  1. A method for determining failure occurrence in an operating system, performed in a failure occurrence determination system that is connected to an operating system operating equipment, devices, or the like and determines whether the operating system has failed, the method comprising:
    a first step of attempting to establish a connection with the operating system and, when the connection is established normally, requesting from the operating system event information that occurs during normal operation;
    a second step of checking whether the requested event information is received within an event occurrence time preset for each time point;
    a third step of determining that a failure has occurred, and displaying the failure, when no event information occurs within the event occurrence time for each time point; and
    a fourth step of, when unspecified event information is received within the event occurrence time for each time point, calculating hourly statistics for the current day (current statistics) and determining whether a failure has occurred by comparing the calculated current statistics with an expected value for the current day,
    wherein the fourth step comprises:
    when unspecified event information is received within the event occurrence time for each time point, setting the current statistics by compiling hourly statistics for the current day;
    calculating an expected value of events for the current day from past event statistics;
    calculating a difference by comparing the set current statistics with the expected value of events for the current day; and
    comparing the calculated difference with a preset error range, returning to the step of receiving event information when the calculated difference is within the error range, and determining that a failure has occurred and displaying the failure when the calculated difference is outside the error range.
  2. The method of claim 1, wherein the normally occurring event information of the first step
    includes specific event information that occurs at a specific time during normal system operation, periodic event information that occurs periodically during normal system operation, and unspecified event information that occurs aperiodically during normal system operation.
  3. The method of claim 1, wherein the events for each time point of the second step are events tied to a specific time, such as event information that occurs at a specific time during normal system operation and periodic event information that occurs periodically during normal system operation.
PCT/KR2010/000243 2010-01-15 2010-01-15 Method for determining problem occurrence in operating system WO2011087167A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/KR2010/000243 WO2011087167A1 (en) 2010-01-15 2010-01-15 Method for determining problem occurrence in operating system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2010/000243 WO2011087167A1 (en) 2010-01-15 2010-01-15 Method for determining problem occurrence in operating system

Publications (1)

Publication Number Publication Date
WO2011087167A1 true WO2011087167A1 (en) 2011-07-21

Family

ID=44304425

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2010/000243 WO2011087167A1 (en) 2010-01-15 2010-01-15 Method for determining problem occurrence in operating system

Country Status (1)

Country Link
WO (1) WO2011087167A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6292905B1 (en) * 1997-05-13 2001-09-18 Micron Technology, Inc. Method for providing a fault tolerant network using distributed server processes to remap clustered network resources to other servers during server failure
US7457991B1 (en) * 2002-12-03 2008-11-25 Unisys Corporation Method for scanning windows event logs on a cellular multi-processor (CMP) server
KR100937098B1 (en) * 2008-07-28 2010-01-15 임태환 Problem occurrence check method from events

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10843233

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION NOT DELIVERED. NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 21.09.2012)

122 Ep: pct application non-entry in european phase

Ref document number: 10843233

Country of ref document: EP

Kind code of ref document: A1