US20100268997A1 - Method and device for monitoring and controlling the operational performance of a computer processor system - Google Patents


Info

Publication number
US20100268997A1
Authority
US
United States
Prior art keywords
computer
processor system
parameters
monitored
limit values
Legal status
Abandoned
Application number
US12/763,943
Inventor
Peter Planki
Karl-Heinz Lettmair
Current Assignee
KRUTH DIETER
SPERLING AXEL
Original Assignee
Peter Planki
Karl-Heinz Lettmair
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=7920941&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=US20100268997(A1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Peter Planki and Karl-Heinz Lettmair
Priority to US12/763,943
Publication of US20100268997A1
Assigned to Dieter Kruth and Axel Sperling (assignment of undivided interest; assignors: Karl-Heinz Lettmair, Peter Planki)
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16 Constructional details or arrangements
    • G06F1/20 Cooling means
    • G06F1/206 Cooling means comprising thermal management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2101 Auditing as a secondary aspect

Definitions

  • Temperature can be monitored not only at the individual components but also, for example, at an air intake channel of the system, outside the system housing, in the room and in adjacent rooms.
  • A change of temperature at the air intake channel may, for example, change the behaviour of the ventilator, as shown in the table.
  • Another parameter essential to the operational behaviour is air humidity, which again may be detected at the element itself, at the air intake channel or outside in the room.
  • Increased air humidity at the air intake channel may first lead to a reduction of system performance or to the ventilator being switched off. Only when the upper limit value is exceeded must the system be shut down in a controlled manner for safety reasons.
  • Percussions occurring inside or outside the system may also be monitored, so that rotating elements such as disk drives can be shut down where justifiable. If the percussions become too severe, however, a controlled shutdown of the system is necessary. Further parameters to be monitored may be the air flow, the content of dust, smoke or aerosols, and chemical pollution of the air. Again, a simple first measure may be to shut down the ventilator. If this does not succeed and an upper limit value is exceeded, the consequence is a system shutdown.

Abstract

In order to monitor and control the operational performance of a computer system or processor system (1), operational parameters of individual components as well as environmental parameters of the computer system or processor system (1) are detected. These parameters are compared with predetermined limit values. If one or more of the detected operational and environmental parameters have exceeded or fallen below the predetermined limit values, an operational event is determined based on the limit values that have been exceeded or fallen below. A reaction is selected from a number of predetermined reaction patterns according to the determined operational event, and a control command that corresponds to this reaction and serves to alter the operational performance is transmitted to the computer being monitored. This enables early detection of faults as well as the initiation of appropriate measures.

Description

  • This is a continuation of U.S. application Ser. No. 10/070,528 filed Dec. 2, 2002 as a United States National Stage of Patent Cooperation Treaty Application No. PCT/EP00/08704 filed Sep. 7, 2007, which claims priority to German Patent Application No. 20 2006 013 779.3 filed Sep. 6, 2000 claiming priority to German Application No. 199 42 430.6 filed Sep. 6, 1999. The above-referenced applications are hereby incorporated herein by reference in their entireties.
  • The present invention relates to a method for monitoring and controlling the operational performance of a computer or processor system and to a device for carrying out this method.
  • The serviceability and operational reliability of components, assembly groups, devices and hence of a computer or processor system as a whole are ensured only within certain tolerance ranges of the physical conditions in their environment. These are chiefly temperature, but also air humidity, air flow, absence of dust, and percussions. Depending on the field of application of the monitored system, brightness fluctuations, chemical pollution or other variables may also matter. If one or more of these values lie outside the predetermined tolerance ranges, the performance of the affected component may be impaired, or the component may fail completely. In the worst case, the failure of a single component may bring down the complete system.
  • Particularly in larger computer or processor systems, such as mainframe computers or multiprocessor systems, continuous and fault-free operation is of great importance, not least because calculations on these machines often run over very long periods, so that a system failure at a given moment may ruin the work of several days. For this reason, temperature monitoring systems are known that measure the temperature at individual components of the system and, upon detecting an inadmissibly high temperature, switch off the respective component or, in the case of a processor, reduce its performance by lowering the clock frequency. In particularly critical cases, a controlled shutdown of the complete system is carried out.
  • The main object of the previously known monitoring systems is to avoid a sudden collapse of the complete system, either by shutting down individual components in advance or by a controlled shutdown of the system. This may prevent data loss, but often leads to a drastic reduction in the performance of the complete system, which in many cases would not be necessary to this extent.
  • It is therefore the object of the present invention to provide a way of monitoring and controlling the operational performance of a computer or processor system in which the influence of a fault on the serviceability of the monitored system is reduced, and in which serviceability is maintained or prolonged in the case of controllable incidents. Active calculation processes as well as their databases and results are to be protected to the greatest possible extent.
  • This object is achieved by the method of claim 1 and the device of claim 4. According to the inventive method, in a first step the operational parameters of individual components of the computer or processor system to be monitored, as well as environmental parameters thereof, are detected. In a second step, the detected operational and environmental parameters are compared with predetermined limit values, which reveals whether one or more of them have exceeded or fallen below these limit values. In a next step, a so-called operational event is determined from the limit values that have been exceeded or fallen below; it indicates how and to what extent the system is affected by these faults. Then a reaction corresponding to this operational event is selected from a number of predetermined reaction patterns, and finally a control command for altering the operational performance in accordance with that reaction is transmitted to the computer or processor system being monitored.
  • Hence, according to the invention, a reaction is initiated depending on the kind and intensity of a fault occurring in the monitored system. This reaction avoids damage to components, assembly groups, devices and consequently to the computer or processor system as a whole, which would have occurred had operation continued unrestricted. If the parameters lie beyond tolerable limit values, a controlled shutdown of the complete system may be initiated. Moreover, individual components or even the complete system can be re-activated or started up again once the fault has been removed or at least reduced.
  • In contrast to the previously known solutions for monitoring computer or processor systems, the inventive method maintains the serviceability of the system with the highest possible efficiency while protecting the active computing processes. This is because the individual components are monitored independently of one another by measuring sensors, so that reaching predetermined limit values does not necessarily require a shutdown of the complete system and hence an interruption of the running programs. On the contrary, where justifiable, the individual components, assembly groups or devices are switched off individually or reduced in their performance, while the system as a whole remains operable. The predetermined reaction patterns thus allow a reaction appropriate to the fault as well as targeted monitoring and selection of the individual components.
  • A further advantage of the present invention is that, in contrast to the previously known monitoring systems, it enables complete monitoring of potential interferences inside and outside the computer or processor system, not merely monitoring of temperature. Thus the effects of excessive air humidity, insufficient air flow, dust or percussions can also be detected and taken into account. Furthermore, the inventive method can be applied independently of the bus system, and hence of the manufacturer, in all kinds of systems, which provides the greatest possible flexibility. This applies both to existing systems and to computer or processor systems yet to be produced.
  • According to an embodiment of the present invention, the detected operational or environmental parameters include not only absolute measured values but also the temporal changes of these values. This makes it possible to take appropriate countermeasures: a very rapid temperature rise of a monitored component, for example, leads to a different reaction than a merely moderate rise. It may further be provided that, in addition to transmitting the control command corresponding to a selected reaction, a corresponding information signal is issued in optical or acoustic form in order to inform the service staff as quickly as possible of the location and cause of the fault. This information signal may also be sent as an SMS message.
  • The device according to the invention for monitoring and controlling the operational performance comprises, on the one hand, first sensors for detecting operational parameters and, on the other hand, second sensors for detecting environmental parameters of the system. It further comprises a monitoring unit that compares the detected operational and environmental parameters with limit values stored in a first memory and detects whether one or more of these limit values have been exceeded or fallen below. Suitable means generate an operational event message on the basis of the limit values exceeded or fallen below and transmit it to a control unit, which selects, from a further memory containing a number of predetermined reaction patterns, a control command corresponding to the operational event message and transmits it to the computer or processor system.
  • In a further embodiment, the inventive device may comprise an acoustic or optical output means for outputting a message corresponding to the operational event message and/or the transmitted control command. Furthermore, a transmitting device may be provided for communicating this message, for example in the form of an SMS message. Independent control of the system is ensured by making the monitoring device part of a computer that is separate from the system being monitored.
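The five method steps (detect, compare with limits, determine the operational event, select a reaction pattern, transmit the control command) can be illustrated with a minimal Python sketch. The parameter names, numeric limits, severity ranks and command strings are hypothetical placeholders, and resolving several simultaneous violations by picking the most severe pattern is an assumption the description does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    parameter: str
    direction: str  # "above" or "below" the permissible range


# Hypothetical limit values (first memory): permissible (low, high) per parameter.
LIMITS = {
    "cpu_temp_c": (5.0, 70.0),
    "inlet_humidity_pct": (20.0, 80.0),
}

# Hypothetical reaction patterns (second memory): violation -> (severity, command).
REACTION_PATTERNS = {
    Violation("cpu_temp_c", "above"): (2, "reduce_cpu_clock"),
    Violation("cpu_temp_c", "below"): (1, "notify_service_staff"),
    Violation("inlet_humidity_pct", "above"): (2, "reduce_system_performance"),
    Violation("inlet_humidity_pct", "below"): (1, "notify_service_staff"),
}


def compare_with_limits(readings):
    """Steps 1-2: compare each detected parameter with its limit values."""
    violations = []
    for name, value in readings.items():
        low, high = LIMITS[name]
        if value > high:
            violations.append(Violation(name, "above"))
        elif value < low:
            violations.append(Violation(name, "below"))
    return violations


def determine_event(violations):
    """Step 3: the operational event is the set of limits that were crossed."""
    return frozenset(violations)


def select_reaction(event):
    """Step 4: choose the most severe predetermined reaction for the event."""
    if not event:
        return "continue"
    _severity, command = max(REACTION_PATTERNS[v] for v in event)
    return command


def monitor_once(readings, transmit):
    """One monitoring cycle; step 5 hands the command to `transmit`."""
    command = select_reaction(determine_event(compare_with_limits(readings)))
    transmit(command)
    return command


sent = []
monitor_once({"cpu_temp_c": 74.0, "inlet_humidity_pct": 45.0}, sent.append)
print(sent)  # ['reduce_cpu_clock']
```

In a real deployment `transmit` would send the command to the monitored system over whatever interface it offers; here it is just a callable so the cycle can be exercised in isolation.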
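Distinguishing a rapid from a moderate rise requires a rate of change alongside the absolute value. A minimal sketch, assuming timestamped samples and a simple two-point difference quotient; the thresholds (in K/s) are hypothetical, and a noisy sensor would in practice call for smoothing over more samples.

```python
def rate_of_change(samples):
    """Slope between the last two (time_s, value) samples, in units per second."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be strictly increasing")
    return (v1 - v0) / (t1 - t0)


def classify_trend(rate, moderate=0.05, rapid=0.5):
    """Map a rise rate (K/s) to a trend class; thresholds are hypothetical."""
    if rate >= rapid:
        return "rapid_rise"
    if rate >= moderate:
        return "moderate_rise"
    return "slow_or_no_rise"


history = [(0.0, 55.0), (10.0, 56.0)]  # 1 K in 10 s = 0.1 K/s
print(classify_trend(rate_of_change(history)))  # moderate_rise
```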
  • The invention is explained in greater detail below with reference to the drawings, in which:
  • FIG. 1 shows an inventive device for monitoring a computer system in a schematic view; and
  • FIGS. 2 to 4 show different examples for explaining the reaction to the temperature rise of a component to be monitored.
  • FIG. 1 shows the monitoring of a mainframe computer 1 by an inventive monitoring device 2. Several first sensors 3 are arranged in the mainframe computer 1; they detect operational parameters of individual components or assembly groups of the mainframe computer and transmit these data via respective lines 4 to the monitoring device 2. The first sensors 3 are, for example, temperature sensors, but also sensors for detecting voltage fluctuations, percussions or other values relevant to operation. In addition to the first sensors, second sensors 5 are provided for detecting parameters in the environment of the mainframe computer 1, for example sensors for detecting chemical pollution of the air, dust or smoke, air humidity or, in certain cases, ionising radiation. These may in particular also be temperature sensors. The measured values detected by the second sensors 5 are likewise transmitted via respective lines 6 to the monitoring device 2.
  • The operational and environmental parameters detected by the first and second sensors 3 and 5 are first processed in a monitoring unit 7 of the monitoring device 2, where the detected values are compared with limit values stored in a first memory 8. It is not necessary to provide only a single limit value for each monitored value. Rather, several limit values, preferably a lower, a mean and an upper limit value, are provided so that the reaction can be matched to the fault that has occurred. When the lower limit value is exceeded, for example, only a slight change in the operational performance of the computer system is necessary, whereas exceeding the upper limit value leads to a shutdown of the respective component or possibly even of the complete system.
  • If one or more of the limit values stored in the first memory 8 are exceeded or fallen below, the monitoring unit 7 detects this and generates a corresponding operational event message, which is then communicated to the control unit 9. This operational event message indicates the kind and extent of the fault. The control unit 9 then selects a control command corresponding to the operational event message from a number of predetermined reaction patterns contained in a second memory 10 and transmits this control command to the mainframe computer 1. The control command contains instructions for altering the operational performance; it may, for example, instruct the system to shut down individual components, put them into sleep mode, or reduce the capacity of the system. A command to shut down the complete system may also be transmitted. The reaction patterns are chosen such that, where justifiable, the mainframe computer 1 and the programs running on it can continue under the new operating conditions defined by the reaction patterns.
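The graded lower, mean and upper limit values amount to a small classification of each reading. The sketch below assumes a rising quantity such as temperature (for a value where falling below is the problem, the comparisons would be mirrored); the limit figures are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    NORMAL = 0  # no limit exceeded
    LOWER = 1   # lower limit exceeded: slight change of operational performance
    MEAN = 2    # mean limit exceeded
    UPPER = 3   # upper limit exceeded: shut down component or complete system


def limit_level(value, lower, mean, upper):
    """Return the highest limit that `value` exceeds (for rising values)."""
    if not lower < mean < upper:
        raise ValueError("expected lower < mean < upper")
    if value > upper:
        return Level.UPPER
    if value > mean:
        return Level.MEAN
    if value > lower:
        return Level.LOWER
    return Level.NORMAL


# Hypothetical processor limits in degrees Celsius.
print(limit_level(72.0, lower=65.0, mean=75.0, upper=85.0).name)  # LOWER
```

Using an `IntEnum` keeps the levels ordered, so "at least the mean limit" can be written as `level >= Level.MEAN`.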
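The second memory 10 can be pictured as a lookup table from operational events to control commands. This sketch encodes a few temperature rows of the reaction-pattern table given later in this description; the key format, the command names, and the fallback of only notifying the service staff when no pattern is stored are assumptions.

```python
# Hypothetical encoding of the reaction patterns held in the second memory 10.
# Key: (measured value, place of measurement, limit exceeded) -> control command.
REACTION_PATTERNS = {
    ("temperature", "component", "GW"): "shut_down_or_sleep_component",
    ("temperature", "air_inlet", "lGW"): "reduce_system_performance",
    ("temperature", "air_inlet", "mGW"): "switch_off_ventilator",
    ("temperature", "air_inlet", "uGW"): "controlled_system_shutdown",
}

# "Outside the computer housing, in the room": same reactions as at the air inlet.
for limit in ("lGW", "mGW", "uGW"):
    REACTION_PATTERNS[("temperature", "room", limit)] = REACTION_PATTERNS[
        ("temperature", "air_inlet", limit)
    ]


def select_command(event):
    """Return the control command for an operational event message."""
    # Fallback when no pattern is stored (an assumption, not from the source).
    return REACTION_PATTERNS.get(event, "notify_service_staff")


print(select_command(("temperature", "room", "mGW")))  # switch_off_ventilator
```

Keeping the patterns as data rather than code mirrors the idea that new reaction patterns can be written into the memory without changing the control unit.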
  • Once the influence of the fault has been successfully removed or at least reduced, a control command transmitted from the monitoring device 2 to the mainframe computer 1 may instead instruct it to start the system up again and to re-activate components that were previously shut down. Whenever the monitoring unit has generated an operational event message or the control unit has transmitted a control command, a corresponding information signal may simultaneously be transmitted via a second output line 14 to a transmission device 15, which can then, for example, send corresponding SMS messages to the service staff. Alternatively, an optical or acoustic output means may be used instead of a transmission device.
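Re-activation and staff notification could be organised as in the following sketch. Linking each shut-down component to the parameter that caused its shutdown, and re-activating only once that parameter is back at or below its lower limit (a hysteresis margin that keeps a component from toggling on and off), are assumptions; `notify` is a hypothetical stand-in for the transmission device 15 or an optical or acoustic output.

```python
class RecoveryTracker:
    """Tracks shut-down components and re-activates them once the fault recedes."""

    def __init__(self, notify):
        self.shut_down = {}  # component -> parameter that caused the shutdown
        self.notify = notify  # e.g. forwards text to an SMS gateway (hypothetical)

    def record_shutdown(self, component, cause):
        self.shut_down[component] = cause
        self.notify(f"{component} shut down ({cause})")

    def check(self, readings, lower_limits):
        """Return re-activation commands for components whose cause has receded."""
        commands = []
        for component, cause in sorted(self.shut_down.items()):
            if readings[cause] <= lower_limits[cause]:
                commands.append(("reactivate", component))
        for _, component in commands:
            del self.shut_down[component]
            self.notify(f"{component} re-activated")
        return commands


messages = []
tracker = RecoveryTracker(messages.append)
tracker.record_shutdown("cpu2", "inlet_temp_c")
print(tracker.check({"inlet_temp_c": 31.0}, {"inlet_temp_c": 28.0}))  # []
print(tracker.check({"inlet_temp_c": 26.5}, {"inlet_temp_c": 28.0}))
```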
  • Preferably, the complete monitoring device 2 is part of a computer that is separate from the monitored mainframe computer 1. The flexibility of the inventive device is ensured by the fact that new limit values and new reaction patterns can be written into the two memories 8 and 10 via input lines 12 and 13. This makes it possible to react at any time to changes in the configuration of the monitored system. It also makes it possible not only to consider individual operational or environmental parameters in isolation, but to evaluate them in combination and react accordingly. For example, a slight temperature increase of a monitored component does not necessarily have to lead to its shutdown if an adjacent component shows a clearly increased temperature, since the cause of the first component's temperature increase very likely lies in the severe overheating of the adjacent component. In such a case, it is initially sufficient to shut down only the severely overheated component.
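Evaluating adjacent components in combination can be sketched as below. The adjacency map is hypothetical input; leaving a slightly warm component unchanged when a neighbour is severely overheated follows the example above, while reducing the performance of a slightly warm component with no hot neighbour borrows the lower-limit reaction described for FIG. 2 and is an assumption about how the rules combine.

```python
def combined_actions(temps, neighbours, lower, upper):
    """Decide per component, evaluating adjacent components in combination.

    temps: component -> temperature; neighbours: component -> adjacent components.
    Components at or below `lower` get no entry.
    """
    severe = {c for c, t in temps.items() if t > upper}
    actions = {}
    for component, t in temps.items():
        if component in severe:
            actions[component] = "shut_down"
        elif t > lower:
            if severe & set(neighbours.get(component, ())):
                # Probably heated by an overheated neighbour: leave it running.
                actions[component] = "no_action"
            else:
                actions[component] = "reduce_performance"
    return actions


temps = {"cpu0": 91.0, "cpu1": 71.0, "cpu2": 72.0}
neighbours = {"cpu0": ["cpu1"], "cpu1": ["cpu0", "cpu2"], "cpu2": ["cpu1"]}
print(combined_actions(temps, neighbours, lower=70.0, upper=85.0))
```

Here only `cpu0` is shut down; `cpu1` sits next to it and is left alone, while `cpu2` has no overheated neighbour and is throttled.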
  • Based on the example of the monitoring of the temperature the functioning of the inventive method is to be described in an exemplary manner in the following. Particularly the temperature monitoring of the individual components is of increasing importance as due to the increase of performance and increase of packing density of the components, demanded by the market and related to the general development, lead to problems in controlling the temperature. FIGS. 2 to 4 show the temperature course of a component be monitored, for example a processor. In the present example three different limit values, a lower, a mean and an upper limit value are defined, causing different reactions when being exceeded or fallen below of. Furthermore, the example shown in FIGS. 2 to 4 not only refers to the absolute temperature value but also to the course of time.
  • In FIG. 2, for example, a moderate temperature increase is detected for the monitored time, during the course of which merely the lower limit value is exceeded. Thus, if the lower limit is exceeded, first only the performance of the monitored processor is reduced, for example by reducing the clock frequency. As an alternative, however, also the performance of a respective refrigerating set may be increased. If these measures are successful, the system may be continued to be operated in this mode until the service staff arrives, who has been informed by a message transmitted simultaneously by means of the respective control command. A shutdown of the component or of the complete system is not necessary in this case.
  • In case of a faster temperature rise, as for example shown in FIG. 3, the afore described measures do not lead to success and in the course of time also the other two limit values are exceeded. When the upper limit value is exceeded, at the latest a shutdown of the monitored processor has become necessary. If, due thereto, the temperature falls below the predetermined limit values again, the complete system may be continued to be operated with shutdown processor until the arrival of the service staff. If, however, the shutdown of the processor does not lead to a temperature decrease either—for example within a predetermined time limit—it is safer to run down the complete system by means of the shutdown procedure, in order to store the already existing data.
  • An abrupt temperature rise, as shown in FIG. 4, indicates an extraordinary fault that in any case demands the immediate shutdown of the complete system. Because the temperature rises so steeply, there is no point in waiting for further limit values to be exceeded; the shutdown must be initiated immediately.
  • The time variation of a monitored parameter may, for example, also be captured by a separate sensor that detects only the variations of the monitored values. Another possibility is to record the points in time at which certain limit values are exceeded or fallen below and to infer the time behaviour from them.
  • According to the invention, a number of measured quantities other than temperature may also be monitored. The respective reaction pattern depends not only on the measured value itself but also on the place of measurement. A number of possible reaction patterns are listed in the following table. GW denotes a single limit value of a monitored parameter; exceeding it leads to a shutdown of the respective component or puts the component into sleep mode. Defining a single limit value makes sense where the respective component should either be fully operating or not operating at all. In other cases several limit values, i.e. a lower, a mean and an upper limit value, are preferably defined so that the reaction can be graded.
  • TABLE
    REACTION PATTERNS
    Measured value / place of measurement | Reaction pattern (exemplary)
    1. Temperature
       a) at the individual component or at a device | GW: shutdown of the individual component or device (sleep mode)
       b) at the air inlet outside the computer housing | IGW: reduce system performance; mGW: switch off ventilator; uGW: controlled system shutdown
       c) in the room | same as b)
       d) external, e.g. adjacent rooms, fire alarm etc. | set according to local conditions
    2. Air humidity
       a) at the individual component or at a device | GW: shutdown of the individual component or device (sleep mode)
       b) at the air inlet outside the computer housing | IGW: reduce system performance; mGW: switch off ventilator; uGW: controlled system shutdown
       c) in the room | same as b)
    3. Percussion (acceleration, frequency)
       a) at the individual component or at a device | GW: shutdown of the individual component or device (sleep mode)
       b) at the computer housing | IGW: shut down rotating devices (e.g. hard disks); uGW: controlled system shutdown
    4. Air flow
       a) at the individual component or at a device | GW: shutdown of the individual component or device (sleep mode)
       b) at the air outlet | IGW: reduce system performance; uGW: controlled system shutdown
    5. Dust, smoke, aerosol (e.g. optoelectronic measurement)
       a) at the air inlet outside the computer housing | IGW: reduce system performance; mGW: switch off ventilator; uGW: controlled system shutdown
       b) in the room | same as a)
    6. Chemical pollution of the air (e.g. electrical conductivity of the air, pH value)
       a) at the individual component or at a device | GW: shutdown of the individual component or device
       b) at the air inlet outside the computer housing | IGW: reduce system performance; mGW: switch off ventilator; uGW: controlled system shutdown
       c) in the room | same as b)
    7. Electromagnetic field
       a) at the individual component or at a device | GW: shutdown of the individual component or device
       b) outside the computer housing, in the room | IGW: reduce system performance; uGW: controlled system shutdown
    8. Voltage oscillation
       a) at the individual component or at a device | GW: shutdown of the individual component or device
       b) at the mains voltage | if no UPS is present: IGW: reduce system performance; uGW: controlled system shutdown
    9. Brightness oscillation (optoelectronic)
       a) at the individual component or at a device | relevant for optoelectronic components: GW: shutdown of the individual component or device
    10. Ionising radiation (X-ray radiation, radioactive radiation)
       a) at the individual component or at a device | GW: shutdown of the individual component or device
       b) outside the computer housing, in the room | IGW: reduce system performance; uGW: controlled system shutdown
    11. Further measurements to be defined | ./. | ./.
    GW = limit value; IGW = lower limit value; mGW = mean limit value; uGW = upper limit value
  • Temperature can be monitored not only at the individual components but also, for example, at an air intake channel of the system, outside the system in the room, and in adjacent rooms. As the table shows, a change of temperature at the air intake channel may, for example, change the behaviour of the ventilator.
  • Another parameter essential to operational behaviour is air humidity, which can likewise be detected at the component itself, at the air intake channel, or outside in the room. Increased air humidity at the air intake channel may initially lead to a reduction of system performance or to the ventilator being switched off. Only when the upper limit value is exceeded must the system be shut down in a controlled manner for safety reasons.
  • Percussions (shocks) occurring inside or outside the system may also be monitored, so that rotating elements such as disk drives can be shut down where justified. If, however, the percussions become too severe, a controlled shutdown of the system is necessary. Further parameters that may be monitored are the air flow, the content of dust, smoke or aerosols, and chemical pollution of the air. Again, a simple first measure may be to switch off the ventilator. If this does not succeed and an upper limit value is exceeded, the system is shut down.
  • Furthermore, the electromagnetic field intensity or voltage oscillations may be monitored. If optoelectronic components are used, brightness oscillations may also be taken into account. Finally, if necessary, the influence of ionising radiation may be taken into account in order to avoid incidents.
  • The object of the inventive method is to offer maximum flexibility while enabling an appropriate reaction to incidents of any kind. This makes it possible to keep the monitored system running at the highest possible performance.
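The graded temperature example of FIGS. 2 to 4 can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `Limits`, `TemperatureMonitor` and `Reaction`, all numeric values (70 °C, 90 °C, 5 K/s, 60 s), and the reading of "no temperature decrease" as "still at or above the upper limit after the timeout" are hypothetical. The mean limit value is left out because the FIG. 2 to 4 discussion assigns it no separate reaction, and the rate of rise is estimated from consecutive samples rather than by a separate rate sensor.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Reaction(Enum):
    NONE = "continue normal operation"
    REDUCE_PERFORMANCE = "reduce clock frequency and notify service staff"
    SHUTDOWN_PROCESSOR = "shut down the monitored processor"
    SHUTDOWN_SYSTEM = "controlled shutdown of the complete system"


@dataclass
class Limits:
    # Hypothetical example values; the description gives no numbers.
    lower: float = 70.0             # deg C, FIG. 2 threshold
    upper: float = 90.0             # deg C, FIG. 3 threshold
    abrupt_rate: float = 5.0        # K/s treated as an "abrupt" rise (FIG. 4)
    recovery_timeout: float = 60.0  # s allowed for cooling after processor shutdown


class TemperatureMonitor:
    """Graded reaction to one temperature sensor, loosely following FIGS. 2 to 4."""

    def __init__(self, limits: Limits) -> None:
        self.limits = limits
        self._last: Optional[Tuple[float, float]] = None  # (time in s, temperature)
        self._processor_off_since: Optional[float] = None

    def update(self, t: float, temp: float) -> Reaction:
        lim = self.limits
        # Time behaviour: rate of change relative to the previous sample.
        rate = 0.0
        if self._last is not None and t > self._last[0]:
            rate = (temp - self._last[1]) / (t - self._last[0])
        self._last = (t, temp)

        # FIG. 4: an abrupt rise means shutting down at once,
        # without waiting for further limit values to be exceeded.
        if rate >= lim.abrupt_rate:
            return Reaction.SHUTDOWN_SYSTEM

        # FIG. 3, second stage: processor already off. If the temperature
        # is still at or above the upper limit after the timeout, run the
        # complete system down; otherwise keep operating without the processor.
        if self._processor_off_since is not None:
            elapsed = t - self._processor_off_since
            if temp >= lim.upper and elapsed >= lim.recovery_timeout:
                return Reaction.SHUTDOWN_SYSTEM
            return Reaction.SHUTDOWN_PROCESSOR

        # FIG. 3, first stage: upper limit exceeded.
        if temp >= lim.upper:
            self._processor_off_since = t
            return Reaction.SHUTDOWN_PROCESSOR

        # FIG. 2: lower limit exceeded, upper limit not reached.
        if temp >= lim.lower:
            return Reaction.REDUCE_PERFORMANCE
        return Reaction.NONE
```

For example, feeding the samples (0 s, 60 °C), (10 s, 75 °C), (20 s, 92 °C) returns `NONE`, `REDUCE_PERFORMANCE` and `SHUTDOWN_PROCESSOR` in turn, whereas a jump from 60 °C to 70 °C within one second returns `SHUTDOWN_SYSTEM` immediately.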

Claims (21)

1. Method for an automated monitoring and controlling the operational performance of a computer or processor system (1) comprising the following steps:
(a) detecting directly at least one parameter for a first individual component of the computer or processor system (1) and at least one parameter for a second individual component of the computer or processor system, wherein at least one of said individual components is ancillary to a processor of the computer or processor system, and wherein said parameters are parameters that relate to failures of said individual components;
(b) comparing the detected parameters with predetermined limit values;
(c) determining whether predetermined limit values are exceeded or fallen below by one or several of said detected parameters;
(d) determining an operational event on the basis of said limit values that have been exceeded or fallen below, or on the basis of a combined evaluation of said limit values;
(e) selecting a reaction corresponding to said determined operational event from a number of predetermined reaction patterns, wherein said number of predetermined reaction patterns includes reactions that control individually each of a plurality of discrete components being monitored to maintain or prolong the serviceability of the monitored system and protect to the greatest possible extent active calculation processes as well as their data bases and results and to avoid damage to the discrete component being controlled by a reaction; and
(f) transmitting a control command to alter the operational performance corresponding to said selected reaction to said computer or processor system (1).
2. Method of claim 1, characterized in that the detected parameters are absolute measured values as well as the temporal change of said measured value.
3. Method of one of the preceding claims, characterized in that besides the transmission of the control command corresponding to the selected reaction also a corresponding information signal is transmitted.
4. A device for an automated monitoring and controlling the operational performance of a computer or processor system (1), comprising:
first sensors (3) for detecting directly at least one parameter for a first individual component of the computer or processor system (1) and at least one parameter for a second individual component of the computer or processor system, wherein at least one of said individual components is ancillary to a processor of the computer or processor system, and wherein said parameters are parameters that relate to failures of said individual components,
a monitoring unit (7) for comparing said detected parameters with limit values stored in a first storage (8) as well as for detecting whether one or several limit values are being exceeded or fallen below,
means for generating a determined operational event message on the basis of said limit values that have been exceeded or fallen below, or on the basis of a combined evaluation of said limit values, and
a control unit (9) for receiving said operational event message as well as for selecting and transmitting a control command corresponding to said operational event message to said computer and processor system (1) from a storage (10) containing a number of predetermined reaction patterns, wherein said number of predetermined reaction patterns includes control commands that control a plurality of the individual components being monitored to maintain or prolong the serviceability of the monitored system and protect to the greatest possible extent active calculation processes as well as their data bases and results and to avoid damage to the discrete component being controlled by a control command.
5. Device of claim 4, characterized in that said detected parameters are absolute measured values as well as the temporal changes of said measured value.
6. Device of claim 4 or 5, characterized in that said device further comprises an optical or acoustic output means for outputting a message corresponding to said operational event message and/or said transmitted control command.
7. Device of claim 4, characterized in that said device comprises a transmission means (15) for transmitting a message corresponding to said operational event message and/or to said transmitted control command.
8. Device of claim 4, characterized in that said device is part of a computer which is separate from the computer or processor system (1) to be monitored.
9. Device of claim 5, characterized in that said device comprises a transmission means (15) for transmitting a message corresponding to said operational event message and/or to said transmitted control command.
10. Device of claim 6, characterized in that said device comprises a transmission means (15) for transmitting a message corresponding to said operational event message and/or to said transmitted control command.
11. Device of claim 5, characterized in that said device is part of a computer which is separate from the computer or processor system (1) to be monitored.
12. Device of claim 6, characterized in that said device is part of a computer which is separate from the computer or processor system (1) to be monitored.
13. Device of claim 7, characterized in that said device is part of a computer which is separate from the computer or processor system (1) to be monitored.
14. Device of claim 9, characterized in that said device is part of a computer which is separate from the computer or processor system (1) to be monitored.
15. Device of claim 10, characterized in that said device is part of a computer which is separate from the computer or processor system (1) to be monitored.
16. Method for an automated monitoring and controlling the operational performance of a computer or processor system (1) comprising the following steps:
(a) detecting directly at least one parameter for a first individual component of the computer or processor system (1) and at least one parameter for a second individual component of the computer or processor system, wherein at least one of said individual components is ancillary to a processor of the computer or processor system, and wherein said parameters are quantitatively measurable parameters, and wherein said parameters are parameters that relate to failures of said individual components;
(b) comparing the detected parameters with predetermined limit values;
(c) determining whether predetermined limit values are exceeded or fallen below by one or several of said detected parameters;
(d) determining an operational event on the basis of a combined evaluation of said limit values that have been exceeded or fallen below;
(e) selecting a reaction corresponding to said determined operational event from a number of predetermined reaction patterns wherein said number of predetermined reaction patterns includes reactions that control individually each of a plurality of discrete components being monitored to maintain or prolong the serviceability of the monitored system and protect to the greatest possible extent active calculation processes as well as their data bases and results and to avoid damage to the discrete component being controlled by a reaction; and
(f) transmitting a control command to alter the operational performance corresponding to said selected reaction to said computer or processor system (1).
17. The device as claimed in claim 4, wherein the device operates separately from the computer or processor system monitored by the device, such that the computer or processor system can be re-activated by the device after the computer or processor system has been shut down.
18. Method of claim 1 wherein at least one of said parameters comprises an operational parameter.
19. Method of claim 18 further comprising the step of detecting at least one environmental parameter of an environmental component.
20. Method of claim 1 wherein at least one of said parameters comprises an environmental parameter.
21. Method for an automated monitoring and controlling the operational performance of a computer or processor system (1) comprising the following steps:
(a) detecting directly at least two parameters for an individual component of the computer or processor system (1), wherein said parameters are parameters that relate to failures of said individual component;
(b) comparing the detected parameters with predetermined limit values;
(c) determining whether predetermined limit values are exceeded or fallen below by one or several of said detected parameters;
(d) determining an operational event on the basis of said limit values that have been exceeded or fallen below;
(e) selecting a reaction corresponding to said determined operational event from a number of predetermined reaction patterns, wherein said number of predetermined reaction patterns includes reactions that control individually each of a plurality of discrete components being monitored to maintain or prolong the serviceability of the monitored system and protect to the greatest possible extent active calculation processes as well as their data bases and results and to avoid damage to the discrete component being controlled by a reaction; and
(f) transmitting a control command to alter the operational performance corresponding to said selected reaction to said computer or processor system (1).
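Steps (a) to (f) of claim 1, together with the reaction-pattern table in the description, can be illustrated as a single monitoring pass. This is a hypothetical sketch only: the names `LimitSet`, `exceeded_level` and `monitor_cycle`, the limit values (30/35/40 °C at the air inlet, 2 g and 5 g at the computer housing), the string commands, and the `send` callback standing in for the transmission of control commands are all assumptions. Step (a) is represented by an already-filled `readings` dictionary, and only limits that are exceeded are handled; limits that are fallen below (e.g. too little air flow) would need a mirrored check. The combined evaluation, in which a controlled system shutdown supersedes all component-level reactions, is likewise an illustrative choice rather than something the claims specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

LEVELS = ("upper", "mean", "lower")  # checked from most to least severe
SYSTEM_SHUTDOWN = "controlled system shutdown"


@dataclass(frozen=True)
class LimitSet:
    # Graded limits; a level is None where the table defines no reaction for it.
    lower: Optional[float] = None
    mean: Optional[float] = None
    upper: Optional[float] = None


# Hypothetical limit values (not given in the patent), keyed by (parameter, place).
LIMITS: Dict[Tuple[str, str], LimitSet] = {
    ("temperature", "air inlet"): LimitSet(lower=30.0, mean=35.0, upper=40.0),  # deg C
    ("percussion", "computer housing"): LimitSet(lower=2.0, upper=5.0),          # g
}

# Predetermined reaction patterns, adapted from table items 1 and 3.
REACTIONS: Dict[Tuple[str, str, str], str] = {
    ("temperature", "air inlet", "lower"): "reduce system performance",
    ("temperature", "air inlet", "mean"): "switch off ventilator",
    ("temperature", "air inlet", "upper"): SYSTEM_SHUTDOWN,
    ("percussion", "computer housing", "lower"): "shut down rotating devices",
    ("percussion", "computer housing", "upper"): SYSTEM_SHUTDOWN,
}


def exceeded_level(value: float, limits: LimitSet) -> Optional[str]:
    """Steps (b) and (c): return the most severe limit level reached, if any."""
    for level in LEVELS:
        limit = getattr(limits, level)
        if limit is not None and value >= limit:
            return level
    return None


def monitor_cycle(readings: Dict[Tuple[str, str], float],
                  send: Callable[[str], None]) -> List[str]:
    """One pass through steps (b) to (f) for readings detected in step (a)."""
    # (d): one operational event per reading that crosses a limit.
    events = []
    for key, value in readings.items():
        level = exceeded_level(value, LIMITS[key])
        if level is not None:
            events.append((*key, level))
    # (e): select the predetermined reaction for each event.
    commands = [REACTIONS[event] for event in events]
    # Combined evaluation (assumption): a system shutdown supersedes
    # all component-level reactions.
    if SYSTEM_SHUTDOWN in commands:
        commands = [SYSTEM_SHUTDOWN]
    # (f): transmit each command once, preserving order.
    commands = list(dict.fromkeys(commands))
    for command in commands:
        send(command)
    return commands
```

With an air-inlet temperature of 36 °C and a housing acceleration of 2.5 g, the pass sends "switch off ventilator" and "shut down rotating devices"; if the temperature instead reaches 41 °C, only "controlled system shutdown" is sent.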
US12/763,943 1999-09-06 2010-04-20 Method and device for monitoring and controlling the operational performance of a computer processor system Abandoned US20100268997A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/763,943 US20100268997A1 (en) 1999-09-06 2010-04-20 Method and device for monitoring and controlling the operational performance of a computer processor system

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
DE19942430A DE19942430A1 (en) 1999-09-06 1999-09-06 Operating environment control device for computers with component-specific monitoring and control
DE19942430.6 1999-09-06
PCT/EP2000/008704 WO2001018632A2 (en) 1999-09-06 2000-09-06 Method and device for monitoring and controlling the operational performance of a computer system or processor system
DE202006013779.3 2000-09-06
US7052802A 2002-12-02 2002-12-02
EPPCT/EP00/08704 2007-09-07
US12/763,943 US20100268997A1 (en) 1999-09-06 2010-04-20 Method and device for monitoring and controlling the operational performance of a computer processor system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US7052802A Continuation 1999-09-06 2002-12-02

Publications (1)

Publication Number Publication Date
US20100268997A1 (en) 2010-10-21

Family

ID=7920941

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/070,528 Expired - Fee Related US7702965B1 (en) 1999-09-06 2000-09-06 Method and device for monitoring and controlling the operational performance of a computer system or processor system
US12/763,943 Abandoned US20100268997A1 (en) 1999-09-06 2010-04-20 Method and device for monitoring and controlling the operational performance of a computer processor system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/070,528 Expired - Fee Related US7702965B1 (en) 1999-09-06 2000-09-06 Method and device for monitoring and controlling the operational performance of a computer system or processor system

Country Status (7)

Country Link
US (2) US7702965B1 (en)
EP (1) EP1410205B1 (en)
AT (1) ATE463009T1 (en)
AU (1) AU1270401A (en)
DE (2) DE19942430A1 (en)
HK (1) HK1067421A1 (en)
WO (1) WO2001018632A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090210097A1 (en) * 2008-02-15 2009-08-20 Sawczak Stephen D Systems and methods for computer equipment management
US20140095912A1 (en) * 2012-09-29 2014-04-03 Linda Hurd Micro-Architectural Energy Monitor Event-Assisted Temperature Sensing
CN107340833A (en) * 2017-05-27 2017-11-10 努比亚技术有限公司 Terminal temperature control method, terminal and computer-readable recording medium
US10565079B2 (en) 2017-09-28 2020-02-18 Intel Corporation Determination of idle power state
US10621849B2 (en) * 2015-09-25 2020-04-14 Intel Corporation Alert system for internet of things (IoT) devices

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19942430A1 (en) * 1999-09-06 2001-03-08 Sperling Axel Operating environment control device for computers with component-specific monitoring and control
DE10157802B4 (en) * 2001-11-27 2016-02-04 E.E.P.D. Electronic Equipment Produktion & Distribution Gmbh A vehicle and method for controlling the operation of electrical or electronic components connected to a communication bus of a vehicle
US20040158627A1 (en) * 2003-02-11 2004-08-12 Thornton Barry W. Computer condition detection system
US8918624B2 (en) * 2008-05-15 2014-12-23 International Business Machines Corporation Scaling and managing work requests on a massively parallel machine
US8225324B2 (en) * 2008-05-15 2012-07-17 International Business Machines Corporation Resource management on a computer system utilizing hardware and environmental factors
US8812469B2 (en) * 2008-05-15 2014-08-19 International Business Machines Corporation Configurable persistent storage on a computer system using a database
US8799694B2 (en) 2011-12-15 2014-08-05 International Business Machines Corporation Adaptive recovery for parallel reactive power throttling
US11573881B1 (en) 2020-06-26 2023-02-07 Amazon Technologies, Inc. Role-based failure response training for distributed systems

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5500940A (en) * 1994-04-25 1996-03-19 Hewlett-Packard Company Method for evaluating failure in an electronic data storage system and preemptive notification thereof, and system with component failure evaluation
US5781434A (en) * 1994-10-19 1998-07-14 Hitachi, Ltd. Control system for communication apparatus
US5878377A (en) * 1997-04-10 1999-03-02 International Business Machines Corporation Environmental and power error handling extension and analysis
US6029119A (en) * 1996-01-16 2000-02-22 Compaq Computer Corporation Thermal management of computers
US6349385B1 (en) * 1998-11-20 2002-02-19 Compaq Computer Corporation Dual power supply fan control—thermistor input or software command from the processor
US6633782B1 (en) * 1999-02-22 2003-10-14 Fisher-Rosemount Systems, Inc. Diagnostic expert in a process control system
US6879931B2 (en) * 2002-10-03 2005-04-12 Hewlett-Packard Development Company, L.P. System and method for protection of equipment during motion anomalies
US6934658B2 (en) * 2003-03-27 2005-08-23 International Business Machines Corporation Computer chip heat responsive method and apparatus
US6937958B2 (en) * 2002-02-19 2005-08-30 Sun Microsystems, Inc. Controller for monitoring temperature
US7123995B1 (en) * 2004-05-03 2006-10-17 Sun Microsystems, Inc. Dynamic circuit operation adjustment based on distributed on-chip temperature sensors
US7702965B1 (en) * 1999-09-06 2010-04-20 Peter Planki Method and device for monitoring and controlling the operational performance of a computer system or processor system

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4823290A (en) 1987-07-21 1989-04-18 Honeywell Bull Inc. Method and apparatus for monitoring the operating environment of a computer system
US5740357A (en) 1990-04-26 1998-04-14 Digital Equipment Corporation Generic fault management of a computer system
WO1992010032A1 (en) * 1990-11-26 1992-06-11 Adaptive Solutions, Inc. Temperature-sensing control system and method for integrated circuits
US5230055A (en) 1991-01-25 1993-07-20 International Business Machines Corporation Battery operated computer operation suspension in response to environmental sensor inputs
DE4309187C1 (en) 1993-03-22 1994-06-16 Siemens Nixdorf Inf Syst Controlling ventilation of data processing system - operating number and type of processing modules to calculate appropriate ventilation fan feed rate
US5566092A (en) 1993-12-30 1996-10-15 Caterpillar Inc. Machine fault diagnostics system and method
US5752011A (en) 1994-06-20 1998-05-12 Thomas; C. Douglas Method and system for controlling a processor's clock frequency in accordance with the processor's temperature
US6000036A (en) 1996-07-17 1999-12-07 International Business Machines Corp. Logical steering to avoid hot spots on integrated circuits
US5870267A (en) 1996-07-25 1999-02-09 Konami Co., Ltd. Semiconductor integrated circuit device with overheating protector and method of protecting semiconductor integrated circuit against overheating
KR100268493B1 (en) 1996-09-23 2000-11-01 윤종용 Airflow apparatus using bi-direction fan in raid subsystem
US5944839A (en) 1997-03-19 1999-08-31 Symantec Corporation System and method for automatically maintaining a computer system
US6088816A (en) * 1997-10-01 2000-07-11 Micron Electronics, Inc. Method of displaying system status

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5500940A (en) * 1994-04-25 1996-03-19 Hewlett-Packard Company Method for evaluating failure in an electronic data storage system and preemptive notification thereof, and system with component failure evaluation
US5781434A (en) * 1994-10-19 1998-07-14 Hitachi, Ltd. Control system for communication apparatus
US6029119A (en) * 1996-01-16 2000-02-22 Compaq Computer Corporation Thermal management of computers
US5878377A (en) * 1997-04-10 1999-03-02 International Business Machines Corporation Environmental and power error handling extension and analysis
US6349385B1 (en) * 1998-11-20 2002-02-19 Compaq Computer Corporation Dual power supply fan control—thermistor input or software command from the processor
US6654894B2 (en) * 1998-11-20 2003-11-25 Hewlett-Packard Development Company, L.P. Dual control of fan speed-input corresponding to power supply temperature or software command from the processor corresponding to processor temperature
US6633782B1 (en) * 1999-02-22 2003-10-14 Fisher-Rosemount Systems, Inc. Diagnostic expert in a process control system
US7702965B1 (en) * 1999-09-06 2010-04-20 Peter Planki Method and device for monitoring and controlling the operational performance of a computer system or processor system
US6937958B2 (en) * 2002-02-19 2005-08-30 Sun Microsystems, Inc. Controller for monitoring temperature
US6879931B2 (en) * 2002-10-03 2005-04-12 Hewlett-Packard Development Company, L.P. System and method for protection of equipment during motion anomalies
US6934658B2 (en) * 2003-03-27 2005-08-23 International Business Machines Corporation Computer chip heat responsive method and apparatus
US7123995B1 (en) * 2004-05-03 2006-10-17 Sun Microsystems, Inc. Dynamic circuit operation adjustment based on distributed on-chip temperature sensors

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090210097A1 (en) * 2008-02-15 2009-08-20 Sawczak Stephen D Systems and methods for computer equipment management
US20090210755A1 (en) * 2008-02-15 2009-08-20 Sawczak Stephen D Systems and methods for computer equipment management
US20090210099A1 (en) * 2008-02-15 2009-08-20 Sawczak Stephen D Systems and methods for computer equipment management
US8175753B2 (en) 2008-02-15 2012-05-08 The Pnc Financial Services Group, Inc. Systems and methods for computer equipment management
US8201028B2 (en) * 2008-02-15 2012-06-12 The Pnc Financial Services Group, Inc. Systems and methods for computer equipment management
US8437881B2 (en) 2008-02-15 2013-05-07 The Pnc Financial Services Group, Inc. Systems and methods for computer equipment management
US20140095912A1 (en) * 2012-09-29 2014-04-03 Linda Hurd Micro-Architectural Energy Monitor Event-Assisted Temperature Sensing
US9804656B2 (en) * 2012-09-29 2017-10-31 Intel Corporation Micro-architectural energy monitor event-assisted temperature sensing
US10621849B2 (en) * 2015-09-25 2020-04-14 Intel Corporation Alert system for internet of things (IoT) devices
US11373505B2 (en) * 2015-09-25 2022-06-28 Intel Corporation Alert system for internet of things (IOT) devices
CN107340833A (en) * 2017-05-27 2017-11-10 努比亚技术有限公司 Terminal temperature control method, terminal and computer-readable recording medium
US10565079B2 (en) 2017-09-28 2020-02-18 Intel Corporation Determination of idle power state

Also Published As

Publication number Publication date
DE50015896D1 (en) 2010-05-12
EP1410205A2 (en) 2004-04-21
DE19942430A1 (en) 2001-03-08
WO2001018632A3 (en) 2001-06-14
US7702965B1 (en) 2010-04-20
HK1067421A1 (en) 2005-04-08
AU1270401A (en) 2001-04-10
WO2001018632A2 (en) 2001-03-15
EP1410205B1 (en) 2010-03-31
ATE463009T1 (en) 2010-04-15

Legal Events

Date Code Title Description
AS Assignment

Owner name: KRUTH, DIETER, GERMANY

Free format text: ASSIGNMENT OF UNDIVIDED INTEREST;ASSIGNORS:PLANKI, PETER;LETTMAIR, KARL-HEINZ;REEL/FRAME:025957/0257

Effective date: 20110121

Owner name: SPERLING, AXEL, GERMANY

Free format text: ASSIGNMENT OF UNDIVIDED INTEREST;ASSIGNORS:PLANKI, PETER;LETTMAIR, KARL-HEINZ;REEL/FRAME:025957/0257

Effective date: 20110121

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION