US20070038896A1 - Call-stack pattern matching for problem resolution within software - Google Patents
- Publication number
- US20070038896A1 (application US 11/203,534)
- Authority
- US
- United States
- Prior art keywords
- call
- fault condition
- stack information
- server
- organization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/366—Software debugging using diagnostics
Definitions
- the present invention relates to fault detection and resolution in software-based systems.
- This technique does have disadvantages.
- One disadvantage is that whether information about the software fault is actually sent depends upon the user's decision to follow through with sending the information. If the user chooses not to send the information, the fault goes unreported. If the problem is widespread within a large organization, and users continually choose not to report the fault, the problem may go unnoticed for a significant period of time. As noted, this can lead to wasted time as well as possible data loss.
- Another disadvantage is that even if a user chooses to send the fault information, it is provided to the operating system manufacturer. Neither the organization that is experiencing the software fault nor the developer of the software application causing the fault may be privy to the fault information. Both parties are likely to remain unaware of the frequency of any recurring software problems. As such, the problem can go unnoticed by system administrators of the organization experiencing the fault resulting in loss of productivity.
- the present invention provides a solution for detecting and diagnosing software faults within an organization and/or across multiple organizations.
- One embodiment of the present invention can include a method of diagnosing a fault condition within software.
- the method can include, responsive to a fault condition within a computing system belonging to an organization, automatically sending call-stack information for the fault condition to a first server within the organization.
- the call-stack information for the fault condition can be compared with call-stack information from prior fault conditions that occurred within the organization to determine whether the call-stack information for the fault condition matches call-stack information from one of the prior fault conditions.
- the method further can include sending the call-stack information to a second server for comparison with call-stack information from prior fault conditions that occurred within at least one different organization if the call-stack information for the fault condition does not match.
- the system can include a computing system belonging to an organization.
- the computing system can execute software configured to detect a fault condition and, responsive to the fault condition, automatically transmit call-stack information corresponding to the fault condition to another computer system within the organization.
- the system also can include a server belonging to the organization.
- the server can be configured to receive the call-stack information and compare the call-stack information for the fault condition with call-stack information corresponding to prior fault conditions originating from computing systems belonging to the organization.
- the server further can be configured to determine whether the call-stack information for the fault condition matches call-stack information from one of the prior fault conditions. If not, the server automatically can transmit the call-stack information for the fault condition to a server that does not belong to the organization for further analysis.
- Another embodiment of the present invention can include a machine readable storage being programmed to cause a machine to perform the various steps described herein.
- FIG. 1 is a schematic diagram illustrating a system configured for fault detection and resolution for a software-based system in accordance with one embodiment of the present invention.
- FIGS. 2A and 2B, taken together, are a flow chart illustrating a method of fault detection and resolution in accordance with another embodiment of the present invention.
- the present invention provides a method, system, and apparatus for fault detection and resolution for use with software-based systems.
- a two tiered approach is presented where fault conditions are first matched with prior software faults that have occurred within an organization.
- the fault condition information is sent to an evaluation system which is not part of the organization.
- the outside evaluation system is associated with the developer and/or entity charged with administering the software system that experienced the fault condition.
- the outside evaluation system can compare the fault condition information with faults that have occurred across one or more different organizations to determine whether a solution exists.
- FIG. 1 is a schematic diagram illustrating a system 100 configured for fault detection and resolution relating to a software-based system in accordance with one embodiment of the present invention.
- System 100 can include a plurality of computer systems 105, 110, and 115 as well as a server 120, communicating through a communication network 125.
- system 100 can include another communication network 130, an additional server 135, and a plurality of analyst computer systems 145, 150, and 155.
- Computer systems 105-115 can represent client and/or server computing machines which are part of, or belong to, an organization 101 such as a business, enterprise, or other entity. Each computer system 105-115 can execute software which has been configured to perform various fault detection and/or diagnostic functions which will be described herein in greater detail.
- system 100 can be implemented using the Lotus Notes®/Domino® software architecture available from International Business Machines, Inc. of Armonk, N.Y. (IBM).
- computer systems 105-115 can be configured as Lotus® Notes® clients and/or as IBM Domino® servers.
- the present invention is not limited to such an implementation as the techniques disclosed herein can be applied to any of a variety of different software systems and/or architectures.
- the fault detection techniques described herein can be applied to clients, servers, or both clients and servers.
- additional clients and/or servers can be included in system 100 .
- though hardware-based systems are depicted in FIG. 1, it should be appreciated that the terms “client” and “server” also can refer to software programs executing within suitable information processing systems.
- Each of computer systems 105-115 can execute application software that is configured to detect any of a variety of different fault conditions. Examples of detectable fault conditions can include, but are not limited to, access violations, memory existing in an inconsistent state, or the like.
- the computer operating system can send a notification, such as an exception, to an application which indicates that a fault condition has occurred.
- the application itself can detect an internal fault condition.
- diagnostic information 140 can include call trace information for the software application that experienced, or was responsible for, the fault condition on the computer system.
- Other examples of diagnostic information 140 can include, but are not limited to, information regarding open databases for crashing threads, operating system level information, and application level information such as that which may be collected by an application such as IBM Lotus® Notes® or Domino®.
- the call stack information can be used for purposes of matching a fault condition with prior fault conditions to be described herein in further detail.
- the computer systems 105-115 further can be programmed to translate the stack trace data to call-stack information. That is, the hexadecimal stack trace information can be translated into a human-readable format which specifies a list of one or more functions of the application that were called, referred to as the call-stack information. In general, the call-stack information is an ordered list of functions that were executed by the software application that experienced the fault condition in the time leading up to the fault condition, or “crash”.
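The translation step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical symbol table (`SYMBOLS`) that maps function start addresses to names. The table contents, the function name `translate_stack_trace`, and the address layout are illustrative assumptions and are not taken from the specification.

```python
from bisect import bisect_right

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1400, "open_database"),
    (0x1900, "read_note"),
    (0x2200, "update_index"),
]


def translate_stack_trace(hex_frames, symbols=SYMBOLS):
    """Map hexadecimal frame addresses to an ordered list of function names."""
    starts = [start for start, _ in symbols]
    call_stack = []
    for frame in hex_frames:
        address = int(frame, 16)
        # Find the last function whose start address is <= the frame address.
        index = bisect_right(starts, address) - 1
        call_stack.append(symbols[index][1] if index >= 0 else "<unknown>")
    return call_stack


call_stack = translate_stack_trace(["0x1010", "0x1450", "0x1a00"])
```

The result is an ordered list of function names, which is the form of call-stack information that the later comparison steps operate on.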
- Communication network 125 can be implemented as, or include, an intranet, a wide area network, a local area network, a virtual private network, a wireless network, the Internet, and/or the like, so long as communication network 125 represents communication pathways within, or belonging to, organization 101, to which computer systems 105-115 and server 120 belong or are connected.
- Server 120 can be configured as a central repository for diagnostic information within organization 101. As such, server 120 can receive and store fault condition information 140 from any of computer systems 105-115 within organization 101. Accordingly, server 120 allows administrators within organization 101 to view all fault condition information which has been collected across the enterprise. As shown in FIG. 1, server 120 can receive diagnostic information 140 from computer system 105 and extract call-stack information from it. The call-stack information can be compared with call-stack information corresponding to prior fault conditions from any computer system within organization 101 to determine whether a match exists.
- Communication network 130 can be similar to communication network 125, with the exception that communication network 130 includes pathways to computing resources outside of, or which are not part of, organization 101, to which computers 105-115 and server 120 belong.
- server 135 can be associated with, or belong to, an entity which is responsible for maintaining and/or developing the software application that experienced the fault condition within organization 101 .
- the server 135 can perform functions similar to those of server 120 in that received call-stack information can be compared with call-stack information from other fault conditions.
- Server 135 can include call-stack information from a plurality of different organizations, making it possible to determine whether the fault condition has occurred in other organizations, i.e., outside of organization 101.
- Server 135 further can route diagnostic information 140 to any of a plurality of analyst computer systems 145-155.
- Server 135 can maintain a record of the diagnostic and/or call-stack information and the particular analyst to which such information was sent.
- server 135 effectively maintains a list of analysts and the particular fault conditions upon which each analyst is working. Accordingly, when call-stack information is received for a given fault condition, that call-stack and/or diagnostic information can be forwarded to an analyst that is already working on a similar, or the same, problem, whether for the same organization or for a different organization.
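The record keeping attributed to server 135 can be sketched as follows. This is an illustration under stated assumptions only: the class `AnalystRouter`, its method names, and the analyst identifiers are hypothetical, and the specification does not say how the association between analysts and fault conditions is stored.

```python
class AnalystRouter:
    """Hypothetical record of which analyst is working on which fault condition."""

    def __init__(self):
        self.assignments = {}  # call-stack signature -> analyst name
        self.sent_log = []     # (signature, analyst) pairs, in the order sent

    def assign(self, signature, analyst):
        """Associate an analyst with a fault condition."""
        self.assignments[signature] = analyst

    def route(self, signature, default_analyst):
        """Forward to the analyst already on this fault, else to a default."""
        analyst = self.assignments.get(signature, default_analyst)
        # Remember the choice so later reports of the same fault reach the same analyst.
        self.assignments.setdefault(signature, analyst)
        self.sent_log.append((signature, analyst))
        return analyst


router = AnalystRouter()
router.assign(("main", "read_note"), "analyst_145")
chosen = router.route(("main", "read_note"), "analyst_150")
```

Because the lookup is keyed on the call-stack signature rather than on the reporting organization, a report from a different organization with the same signature reaches the same analyst.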
- FIGS. 2A and 2B, taken together, are a flow chart illustrating a method 200 of fault detection and resolution in accordance with another embodiment of the present invention. While not limited to such an implementation, in one embodiment, method 200 can be performed using the system described with reference to FIG. 1.
- Method 200 can begin in a state where application software, configured to detect fault conditions and collect diagnostic information, is executing within either a client or a server of an organization.
- the application software can detect a fault condition.
- the fault condition can be detected internally within the application software, or the application software can receive a notification from the operating system of the computer system.
- diagnostic information can be collected and/or saved.
- the diagnostic information can include, but is not limited to, call trace information.
- the call trace information can be translated into call-stack information within the computer system prior to transmission.
- the diagnostic information which now includes the translated call-stack information, can be sent to a centralized server within the same organization as the computer system that experienced the fault condition.
- the centralized server having received the diagnostic information, can extract the call-stack information.
- the call-stack information corresponding to the fault condition can be compared with call-stack information for prior fault conditions that were experienced by computer systems belonging to the organization and which also were forwarded to the centralized server.
- a process can execute on the centralized server where diagnostic information is sent. The process can be notified when new diagnostic information is added to a particular server repository.
- a fault analysis server task can extract the call-stack information from the diagnostic information. The fault analysis server task can use pattern matching technology to determine if the new call-stack information matches any previously received call-stack information.
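The specification does not say which pattern matching technology the fault analysis server task uses. As one possible reading, the sketch below compares a normalized signature built from the last few frames before the fault. The functions `stack_signature` and `find_match`, the `depth` parameter, and the exact-equality rule are all assumptions made for illustration.

```python
def stack_signature(call_stack, depth=5):
    """Reduce a call stack to a comparable signature: the last `depth` frames."""
    return tuple(name.strip().lower() for name in call_stack[-depth:])


def find_match(new_stack, prior_faults, depth=5):
    """Return the id of the first prior fault with an equal signature, else None."""
    signature = stack_signature(new_stack, depth)
    for fault_id, prior_stack in prior_faults.items():
        if stack_signature(prior_stack, depth) == signature:
            return fault_id
    return None


# Hypothetical prior fault conditions held by the centralized server.
prior_faults = {
    "F-1": ["main", "open_database", "read_note"],
    "F-2": ["main", "update_index"],
}
match = find_match(["Main", "open_database", "read_note "], prior_faults)
```

Exact comparison of a truncated signature is the simplest choice. A fuzzier comparison (for example, tolerating differing frames deep in the stack) would be a different technique and is not shown here.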
- In step 230, a determination can be made as to whether a match for the call-stack information was found. If so, the method can proceed to step 235. If not, the method can continue to step 245 in FIG. 2B.
- In step 235, an occurrence count associated with the matched, prior fault condition can be incremented within the centralized server. Incrementing the occurrence count signifies that the prior fault condition has been experienced again within the organization. By maintaining an occurrence count for each different fault condition recorded, the organization can determine those fault conditions that are problematic. It should be appreciated that time information also can be recorded for each fault condition, making it possible to determine the frequency of each fault condition as well.
- Step 240 is optional in nature and can be performed from time to time, or upon request of a system administrator.
- the fault occurrence counts within the centralized server can be analyzed to identify any fault conditions which meet one or more established criteria.
- a system administrator can establish a minimum threshold. If a count for a particular fault condition meets or exceeds the threshold, the administrator can be notified and/or the fault condition having the count which met or exceeded the threshold can be identified as one which is significantly affecting the organization and which requires attention.
- the established criteria can serve to identify fault conditions having occurrence counts within particular ranges, or to identify a fault condition having the highest occurrence count.
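The occurrence counting and threshold checks described above can be sketched as follows. The class `FaultRegistry`, its method names, and the use of plain numbers as timestamps are assumptions made for illustration.

```python
from collections import Counter


class FaultRegistry:
    """Hypothetical per-organization store of fault occurrence counts and times."""

    def __init__(self):
        self.counts = Counter()
        self.timestamps = {}  # fault id -> list of occurrence times

    def record(self, fault_id, timestamp):
        """Increment the occurrence count and keep the time of the occurrence."""
        self.counts[fault_id] += 1
        self.timestamps.setdefault(fault_id, []).append(timestamp)

    def at_or_above(self, threshold):
        """Fault ids whose count meets or exceeds the administrator's threshold."""
        return sorted(f for f, c in self.counts.items() if c >= threshold)

    def most_frequent(self):
        """Fault id with the highest occurrence count, or None if empty."""
        return self.counts.most_common(1)[0][0] if self.counts else None

    def occurrences_since(self, fault_id, since):
        """Number of occurrences at or after `since`, a simple frequency measure."""
        return sum(1 for t in self.timestamps.get(fault_id, []) if t >= since)


registry = FaultRegistry()
for t in (10, 20, 30):
    registry.record("F-1", t)
registry.record("F-2", 25)
flagged = registry.at_or_above(3)
```

Keeping the timestamps alongside the counts is what allows a frequency to be derived later, in addition to the raw count.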
- the centralized server can send the diagnostic information and/or the call-stack information for the detected fault condition to a server that is outside of, or does not belong to, the organization.
- the server to which the diagnostic information is sent can be associated with an entity responsible for developing and/or administering the software application that experienced the fault condition.
- the call-stack information for the current fault condition can be compared with call-stack information for prior fault conditions from one or more different organizations. It should be appreciated that if the diagnostic information is sent, the call-stack information can be extracted from the diagnostic information within the outside server prior to performing step 250. In any case, the comparison allows the call-stack information to be compared with call-stack information for fault conditions which have occurred in a variety of different organizations.
- In step 255, a determination can be made as to whether a match for the call-stack information was found. If so, the method can proceed to step 260. If not, however, the method can continue to step 275, where a new fault condition tracking number can be created within the outside server.
- the fault condition tracking number can correspond to the call-stack information and associated fault condition.
- the outside server can notify the organization that sent the diagnostic information of the status of the fault condition, i.e., whether a solution exists, and continue to monitor for future occurrences of the fault condition as determined from call-stack information until such time that the fault condition is diagnosed and/or resolved. When the fault condition is diagnosed and/or resolved, the organization(s) that experienced and reported the fault condition can be notified of the solution and/or status of the fault condition.
- the method can repeat as may be required.
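The two-tier flow of steps 230 through 275 can be sketched as follows. The classes `OrganizationServer` and `OutsideServer`, the `T-n` tracking number format, and the choice to store an unmatched signature locally before escalating are assumptions made for illustration.

```python
import itertools


class OutsideServer:
    """Hypothetical second-tier server holding faults from many organizations."""

    def __init__(self):
        self.known = {}  # signature -> tracking number
        self._next = itertools.count(1)

    def lookup_or_create(self, signature):
        """Return (tracking number, matched). Create a number if none exists."""
        if signature in self.known:
            return self.known[signature], True
        tracking = f"T-{next(self._next)}"
        self.known[signature] = tracking
        return tracking, False


class OrganizationServer:
    """Hypothetical first-tier server within one organization."""

    def __init__(self, outside):
        self.counts = {}  # signature -> occurrence count
        self.outside = outside

    def report(self, signature):
        if signature in self.counts:  # local match found
            self.counts[signature] += 1  # increment the occurrence count
            return ("local-match", self.counts[signature])
        self.counts[signature] = 1  # no local match, so escalate
        tracking, matched = self.outside.lookup_or_create(signature)
        return ("outside-match" if matched else "new-fault", tracking)


outside = OutsideServer()
org_a = OrganizationServer(outside)
org_b = OrganizationServer(outside)
first = org_a.report(("main", "read_note"))
```

A second organization reporting the same signature receives the tracking number created for the first, which is how the outside server correlates a fault across organizations.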
- In step 260, a determination can be made as to whether a solution exists for the fault condition corresponding to the matched call-stack information. If not, the method can proceed to step 265, where the organization that sent the diagnostic information can be advised that presently no solution exists.
- In step 270, the fault condition can be logged and the organization can be notified when a solution becomes available. After step 270, the method can repeat as may be required.
- the method can proceed to step 280.
- the organization that reported the fault condition optionally can be notified of the existence of a solution.
- the outside server can send a message and/or notification to the centralized server or an administrator within the reporting organization.
- a notification optionally can be sent to an analyst within the outside organization which developed and/or supports the software that experienced the fault condition.
- the analyst that is notified can be one that has been assigned to work on matching, prior fault conditions either with the reporting organization or with another organization that also experienced the same or similar fault condition.
- when an analyst works on a particular fault condition, that analyst can be associated with the fault condition and/or the call-stack information within the outside server.
- the diagnostic information can be forwarded to the associated analyst. This ensures that fault conditions are dealt with by experienced personnel.
- In step 290, if it is determined that the fault condition can be cured by upgrading the application software that experienced the fault to a newer release or version, then such a process can be initiated.
- the server within the outside organization can notify the centralized server within the organization that by upgrading to a newer version or release of the application software, the fault condition will be cured.
- a system administrator within the organization can configure the client and/or servers to automatically download and install such upgrades upon notification from the centralized server within the organization. Accordingly, in cases where a fault condition is experienced in a widespread fashion across an organization, the fix can be automatically distributed to the afflicted computer systems.
- the administrator within the organization after upgrading any afflicted computer systems, can mark the fault condition as being resolved within the centralized server.
- with the fault condition marked as resolved, if call-stack information is subsequently received which corresponds to the resolved fault condition, it can be determined that the fault condition is actually a new fault condition despite having call-stack information which matches the resolved fault condition. Accordingly, the new fault condition can be considered one that is unrelated to the prior, resolved fault condition. Marking a fault condition as being resolved can be performed on a per version basis, or a per software release basis as the case may be.
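One possible reading of per-version resolution marking is sketched below. The class `ResolutionLog`, the version strings, and the rule that a resolved marker is keyed on the pair (signature, version) are assumptions; the specification does not define how the marker is stored or compared.

```python
class ResolutionLog:
    """Hypothetical log of fault conditions marked resolved, keyed per version."""

    def __init__(self):
        self.known = set()     # signatures seen before
        self.resolved = set()  # (signature, version) pairs marked resolved

    def mark_resolved(self, signature, version):
        self.resolved.add((signature, version))

    def classify(self, signature, version):
        """Classify an incoming report as a 'new' fault or a 'recurrence'."""
        if (signature, version) in self.resolved:
            # Matches a resolved fault for this version: treat as a new, unrelated fault.
            return "new"
        if signature in self.known:
            return "recurrence"
        self.known.add(signature)
        return "new"


log = ResolutionLog()
signature = ("main", "read_note")
log.classify(signature, "6.5")
log.mark_resolved(signature, "7.0")
status = log.classify(signature, "7.0")
```

Under this reading, a matching call stack from a version in which the fault was marked resolved is treated as a new fault, while the same call stack from a version not yet marked resolved still counts as a recurrence.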
- Method 200 has been provided to better illustrate various aspects of the present invention. It should be appreciated that one or more steps of method 200 can be optionally performed, or can be performed in a different order than described herein without departing from the spirit of the invention. For example, it should be appreciated that an analyst can be notified of a fault condition whether a solution exists or not. This allows an analyst working on a particular fault condition to be notified any time such a fault condition arises. The analyst is thereby exposed to the varying circumstances in which the fault condition is detected in order to increase the likelihood that a solution will be found. In another example, the outside server can routinely send notifications to the system administrator of the organization or the centralized server as to the status of different fault conditions whether a solution exists, does not exist, or whether similar fault conditions have been logged or not.
- the present invention can be realized in hardware, software, or a combination of hardware and software.
- the present invention can be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited.
- a typical combination of hardware and software can be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
- the present invention also can be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods.
- computer program means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
- computer software can include, but is not limited to, a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
- the terms “a” and “an,” as used herein, are defined as one or more than one.
- the term “plurality,” as used herein, is defined as two or more than two.
- the term “another,” as used herein, is defined as at least a second or more.
- the terms “including” and/or “having,” as used herein, are defined as comprising (i.e., open language).
- the term “coupled,” as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically, i.e. communicatively linked through a communication channel or pathway.
Abstract
A method of diagnosing a fault condition within software can include, responsive to a fault condition within a computing system belonging to an organization, automatically sending call-stack information for the fault condition to a first server within the organization. Within the first server, the call-stack information for the fault condition can be compared with call-stack information from prior fault conditions that occurred within the organization to determine whether the call-stack information for the fault condition matches call-stack information from one of the prior fault conditions. The method further can include sending the call-stack information to a second server for comparison with call-stack information from prior fault conditions that occurred within at least one different organization if the call-stack information for the fault condition does not match.
Description
- 1. Field of the Invention
- The present invention relates to fault detection and resolution in software-based systems.
- 2. Description of the Related Art
- As the complexity of software-based systems increases, so too does the difficulty of identifying the source of faults, referred to as “crashes” or other anomalous behavior, within such systems. Often, when a particular software application is used across an entire organization, the same fault within the software application may be experienced by more than one user. This can lead to a significant amount of wasted time as users cope with “crashing” software applications. The possibility of data loss or corruption also exists. Presently, however, there is no reliable way of correlating software faults across an organization or to diagnose and solve the problem.
- One attempted solution has been to rely upon the computer operating system to collect system and/or application execution information within a user's computer system. Upon detecting a fault condition, the collected information is sent to a specified location. In a typical implementation of this technique, when an application unexpectedly quits, the user is asked whether he or she wishes to send information about the fault condition. The fault information is sent as an electronic message to the manufacturer of the operating system.
- This technique, however, does have disadvantages. One disadvantage is that whether information about the software fault is actually sent depends upon the user's decision to follow through with sending the information. If the user chooses not to send the information, the fault goes unreported. If the problem is widespread within a large organization, and users continually choose not to report the fault, the problem may go unnoticed for a significant period of time. As noted, this can lead to wasted time as well as possible data loss.
- Another disadvantage is that even if a user chooses to send the fault information, it is provided to the operating system manufacturer. Neither the organization that is experiencing the software fault nor the developer of the software application causing the fault may be privy to the fault information. Both parties are likely to remain unaware of the frequency of any recurring software problems. As such, the problem can go unnoticed by system administrators of the organization experiencing the fault resulting in loss of productivity.
- It would be beneficial to provide a mechanism for diagnosing and solving fault conditions within software-based systems which overcomes the limitations described above.
- The present invention provides a solution for detecting and diagnosing software faults within an organization and/or across multiple organizations. One embodiment of the present invention can include a method of diagnosing a fault condition within software. The method can include, responsive to a fault condition within a computing system belonging to an organization, automatically sending call-stack information for the fault condition to a first server within the organization. Within the first server, the call-stack information for the fault condition can be compared with call-stack information from prior fault conditions that occurred within the organization to determine whether the call-stack information for the fault condition matches call-stack information from one of the prior fault conditions. The method further can include sending the call-stack information to a second server for comparison with call-stack information from prior fault conditions that occurred within at least one different organization if the call-stack information for the fault condition does not match.
- Another embodiment of the present invention can include a system for diagnosing a fault condition within software. The system can include a computing system belonging to an organization. The computing system can execute software configured to detect a fault condition and, responsive to the fault condition, automatically transmit call-stack information corresponding to the fault condition to another computer system within the organization. The system also can include a server belonging to the organization. The server can be configured to receive the call-stack information and compare the call-stack information for the fault condition with call-stack information corresponding to prior fault conditions originating from computing systems belonging to the organization. The server further can be configured to determine whether the call-stack information for the fault condition matches call-stack information from one of the prior fault conditions. If not, the server automatically can transmit the call-stack information for the fault condition to a server that does not belong to the organization for further analysis.
- Another embodiment of the present invention can include a machine readable storage being programmed to cause a machine to perform the various steps described herein.
- There are shown in the drawings, embodiments that are presently preferred; it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
- FIG. 1 is a schematic diagram illustrating a system configured for fault detection and resolution for a software-based system in accordance with one embodiment of the present invention.
- FIGS. 2A and 2B, taken together, are a flow chart illustrating a method of fault detection and resolution in accordance with another embodiment of the present invention.
- The present invention provides a method, system, and apparatus for fault detection and resolution for use with software-based systems. In accordance with the inventive arrangements disclosed herein, a two tiered approach is presented where fault conditions are first matched with prior software faults that have occurred within an organization. In the event that the fault condition does not match prior software faults, the fault condition information is sent to an evaluation system which is not part of the organization. Typically, the outside evaluation system is associated with the developer and/or entity charged with administering the software system that experienced the fault condition. In any case, the outside evaluation system can compare the fault condition information with faults that have occurred across one or more different organizations to determine whether a solution exists.
- While the specification concludes with claims defining the features of the invention that are regarded as novel, it is believed that the invention will be better understood from a consideration of the description in conjunction with the drawings, in which like reference numerals have been carried forward. As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which can be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting but rather to provide an understandable description of the invention.
-
FIG. 1 is a schematic diagram illustrating a system 100 configured for fault detection and resolution relating to a software-based system in accordance with one embodiment of the present invention. System 100 can include a plurality of computer systems 105-115 and a server 120, communicating through a communication network 125. In addition, system 100 can include another communication network 130, an additional server 135, and a plurality of analyst computer systems 145-155. - Computer systems 105-115 can represent client and/or server computing machines which are part of, or belong to, an
organization 101 such as a business, enterprise, or other entity. Each computer system 105-115 can execute software which has been configured to perform various fault detection and/or diagnostic functions which will be described herein in greater detail. In one embodiment, system 100 can be implemented using the Lotus Notes®/Domino® software architecture available from International Business Machines, Inc. of Armonk, N.Y. (IBM). In such an embodiment, computer systems 105-115 can be configured as Lotus® Notes® clients and/or as IBM Domino® servers. - Still, the present invention is not limited to such an implementation as the techniques disclosed herein can be applied to any of a variety of different software systems and/or architectures. Moreover, the fault detection techniques described herein can be applied to clients, servers, or both clients and servers. In addition, it should be appreciated that while only a limited number of computer systems are represented, additional clients and/or servers can be included in
system 100. Further, though hardware-based systems are depicted in FIG. 1, it should be appreciated that the terms “client” and “server” also can refer to software programs executing within suitable information processing systems. - Each of computer systems 105-115 can execute application software that is configured to detect any of a variety of different fault conditions. Examples of detectable fault conditions can include, but are not limited to, access violations, memory existing in an inconsistent state, or the like. In some cases, the computer operating system can send a notification, such as an exception, to an application which indicates that a fault condition has occurred. In other cases, the application itself can detect an internal fault condition.
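By way of illustration only, the detection of a fault condition signaled as an exception, together with the capture of the functions active at that moment, might be sketched in Python as follows. The function names `collect_diagnostics` and `run_guarded`, the record fields, and the default version string are hypothetical and are not taken from the embodiments described herein.

```python
import traceback
from datetime import datetime, timezone


def collect_diagnostics(exc: BaseException, app_version: str) -> dict:
    """Build a diagnostic record from a caught exception (illustrative fields)."""
    frames = traceback.extract_tb(exc.__traceback__)
    return {
        "fault_type": type(exc).__name__,
        "message": str(exc),
        "app_version": app_version,
        "detected_at": datetime.now(timezone.utc).isoformat(),
        # Function names, innermost (faulting) function first.
        "call_stack": [frame.name for frame in reversed(frames)],
    }


def run_guarded(task, app_version: str = "1.0.0"):
    """Run a task; return (result, None) on success or (None, diagnostics) on a fault."""
    try:
        return task(), None
    except Exception as exc:  # the notification that a fault condition occurred
        return None, collect_diagnostics(exc, app_version)
```

In this sketch the guarding function itself appears as the outermost entry of the captured list, since the exception propagated through it before being caught.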
- Regardless of the particular manner in which fault conditions are detected, the application software executing on computer systems 105-115 can be configured to store
diagnostic information 140. While different types of diagnostic information 140 can be stored or collected, in one embodiment, diagnostic information 140 can include call trace information for the software application that experienced, or was responsible for, the fault condition on the computer system. Other examples of diagnostic information 140 can include, but are not limited to, information regarding open databases for crashing threads, operating system level information, application level information such as that which may be collected by an application such as IBM Lotus® Notes® or Domino®, etc. The call-stack information, however, can be used for purposes of matching a fault condition with prior fault conditions, as described herein in further detail. - The computer systems 105-115 further can be programmed to translate the stack trace data to call-stack information. That is, the hexadecimal stack trace information can be translated into human-readable format which specifies a list of one or more functions of the application that were called, referred to as the call-stack information. In general, the call-stack information is an ordered list of functions that were executed by the software application that experienced the fault condition in the time leading up to the fault condition, or “crash”.
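As a minimal sketch of such a translation, and not the translation mechanism of any particular product, hexadecimal return addresses can be mapped to function names by a sorted lookup in a symbol table. The symbol table contents, the `IMAGE_END` bound, and the function names below are assumptions made purely for illustration.

```python
from bisect import bisect_right

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1400, "OpenDatabase"),
    (0x1A00, "ReadNote"),
    (0x2000, "CopyBuffer"),
]
IMAGE_END = 0x2800  # assumed first address past the last function
_STARTS = [start for start, _ in SYMBOLS]


def symbolize(address: int) -> str:
    """Return the name of the function whose address range contains `address`."""
    if address < _STARTS[0] or address >= IMAGE_END:
        return f"<unknown 0x{address:x}>"
    # The last symbol starting at or before the address owns it.
    index = bisect_right(_STARTS, address) - 1
    return SYMBOLS[index][1]


def translate_stack_trace(hex_addresses):
    """Translate hexadecimal stack trace entries into ordered call-stack information."""
    return [symbolize(int(text, 16)) for text in hex_addresses]
```

For example, `translate_stack_trace(["0x2010", "0x1a44", "0x1008"])` yields the ordered list `CopyBuffer`, `ReadNote`, `main` under the assumed table.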
- When any of the computer systems 105-115 of the
organization 101 detects a fault condition, the diagnostic information can be transmitted to the server 120 via communication network 125. Communication network 125 can be implemented as, or include, an intranet, a wide area network, a local area network, a virtual private network, a wireless network, the Internet, and/or the like, so long as communication network 125 represents communication pathways within, or belonging to, organization 101 to which computer systems 105-115 and server 120 belong or are connected. -
Server 120 can be configured as a central repository for diagnostic information within organization 101. As such, server 120 can receive and store fault condition information 140 from any of computer systems 105-115 within organization 101. Accordingly, server 120 allows administrators within organization 101 to view all fault condition information which has been collected across the enterprise. As shown in FIG. 1, server 120 can receive diagnostic information 140 from computer system 105 and extract call-stack information from it. The call-stack information can be compared with call-stack information corresponding to prior fault conditions from any computer system within organization 101 to determine whether a match exists. - If no match exists, the call-stack information and/or
diagnostic information 140 can be forwarded outside of organization 101 to server 135 via communication network 130. Communication network 130 can be similar to communication network 125 with the exception that communication network 130 includes pathways to computing resources outside of, or which are not part of, organization 101 to which computers 105-115 and server 120 belong. - In one embodiment,
server 135 can be associated with, or belong to, an entity which is responsible for maintaining and/or developing the software application that experienced the fault condition within organization 101. In general, the server 135 can perform functions similar to those of server 120 in that received call-stack information can be compared with call-stack information from other fault conditions. Server 135, however, can include call-stack information from a plurality of different organizations, making it possible to determine whether the fault condition has occurred in other organizations, i.e., outside of organization 101. -
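The two-tiered lookup performed by server 120 and server 135 might be sketched as follows. This is a simplified illustration under stated assumptions: exact comparison of call-stack tuples stands in for whatever matching is actually used, and `FaultRepository`, `resolve_fault`, and the `send_to_vendor` callback are hypothetical names rather than elements of the described system.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

CallStack = Tuple[str, ...]


class FaultRepository:
    """Stores call-stack information for prior fault conditions (exact match only)."""

    def __init__(self) -> None:
        self._known: Dict[CallStack, str] = {}

    def record(self, stack: Sequence[str], fault_id: str) -> None:
        self._known[tuple(stack)] = fault_id

    def find(self, stack: Sequence[str]) -> Optional[str]:
        return self._known.get(tuple(stack))


def resolve_fault(
    stack: Sequence[str],
    local: FaultRepository,
    send_to_vendor: Callable[[CallStack], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """First tier: the organization's own server. Second tier: the outside server."""
    fault_id = local.find(stack)
    if fault_id is not None:
        return ("organization", fault_id)
    # No match within the organization, so forward the call-stack information.
    return ("vendor", send_to_vendor(tuple(stack)))
```

Passing the outside lookup as a callback keeps the sketch independent of any particular transport between the two servers.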
Server 135 further can route diagnostic information 140 to any of a plurality of analyst computer systems 145-155. Server 135 can maintain a record of the diagnostic and/or call-stack information and the particular analyst to which such information was sent. Thus, server 135 effectively maintains a list of analysts and the particular fault conditions upon which each analyst is working. Accordingly, when call-stack information is received for a given fault condition, that call-stack and/or diagnostic information can be forwarded to an analyst that is already working on a similar, or same, problem, whether for the same organization or for a different organization. -
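One possible way to keep the record of analysts and the fault conditions upon which each is working is sketched below. The class name `AnalystRouter` is hypothetical, and the round-robin choice of an analyst for a previously unseen fault condition is an assumption, since the manner of initial assignment is not specified herein.

```python
from typing import Dict, List, Sequence, Tuple


class AnalystRouter:
    """Routes call-stack information to the analyst already working on that fault."""

    def __init__(self, analysts: Sequence[str]) -> None:
        if not analysts:
            raise ValueError("at least one analyst is required")
        self._analysts: List[str] = list(analysts)
        self._assignments: Dict[Tuple[str, ...], str] = {}
        self._next = 0

    def route(self, call_stack: Sequence[str]) -> str:
        signature = tuple(call_stack)
        analyst = self._assignments.get(signature)
        if analyst is None:
            # Assumed policy: unseen fault conditions are assigned round-robin.
            analyst = self._analysts[self._next % len(self._analysts)]
            self._next += 1
            self._assignments[signature] = analyst
        return analyst

    def workload(self, analyst: str) -> int:
        """Number of distinct fault conditions assigned to the given analyst."""
        return sum(1 for owner in self._assignments.values() if owner == analyst)
```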
FIGS. 2A and 2B, taken together, are a flow chart illustrating a method 200 of fault detection and resolution in accordance with another embodiment of the present invention. While not limited to such an implementation, in one embodiment, method 200 can be performed using the system described with reference to FIG. 1. Method 200 can begin in a state where application software, configured to detect fault conditions and collect diagnostic information, is executing within either a client or a server of an organization. - Accordingly, in
step 205, the application software can detect a fault condition. As noted, the fault condition can be detected internally within the application software, or the application software can receive a notification from the operating system of the computer system. Regardless of how the fault condition is detected, in step 210, diagnostic information can be collected and/or saved. The diagnostic information can include, but is not limited to, call trace information. - In
step 215, the call trace information can be translated into call-stack information within the computer system prior to transmission. In step 220, the diagnostic information, which now includes the translated call-stack information, can be sent to a centralized server within the same organization as the computer system that experienced the fault condition. In step 225, the centralized server, having received the diagnostic information, can extract the call-stack information. The call-stack information corresponding to the fault condition can be compared with call-stack information for prior fault conditions that were experienced by computer systems belonging to the organization and which also were forwarded to the centralized server. - For example, a process can execute on the centralized server where diagnostic information is sent. The process can be notified when new diagnostic information is added to a particular server repository. When diagnostic information is delivered to the server repository, a fault analysis server task can extract the call-stack information from the diagnostic information. The fault analysis server task can use pattern matching technology to determine if the new call-stack information matches any previously received call-stack information.
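The particular pattern matching technology is not specified herein. As one hedged illustration, a fault analysis task could normalize each call stack by stripping instruction offsets and then compare only the innermost frames. The offset format `+0x...`, the default depth of five frames, and the names `normalize` and `find_match` are all assumptions of this sketch.

```python
import re
from typing import Optional, Sequence, Tuple

# Assumed frame format: "FunctionName+0x1c" (function name plus hex offset).
_OFFSET = re.compile(r"\+0x[0-9a-fA-F]+$")


def normalize(frames: Sequence[str], depth: int = 5) -> Tuple[str, ...]:
    """Strip offsets and keep only the innermost `depth` function names."""
    cleaned = [_OFFSET.sub("", frame.strip()) for frame in frames if frame.strip()]
    return tuple(cleaned[:depth])


def find_match(
    new_stack: Sequence[str],
    prior_stacks: Sequence[Sequence[str]],
    depth: int = 5,
) -> Optional[int]:
    """Return the index of the first prior call stack that matches, or None."""
    target = normalize(new_stack, depth)
    for index, prior in enumerate(prior_stacks):
        if normalize(prior, depth) == target:
            return index
    return None
```

Comparing only the innermost frames is a design choice that tolerates differing entry points into the same faulting code path, at the cost of occasionally grouping unrelated faults together.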
- Thus, in
step 230, a determination can be made as to whether a match for the call-stack information was found. If so, the method can proceed to step 235. If not, the method can continue to step 245 in FIG. 2B. Continuing with step 235, an occurrence count associated with the matched, prior fault condition can be incremented within the centralized server. Incrementing the occurrence count signifies that the prior fault condition has been experienced again within the organization. By maintaining an occurrence count for each different fault condition recorded, the organization can determine those fault conditions that are problematic. It should be appreciated that time information also can be recorded for each fault condition, making it possible to determine the frequency of each fault condition as well. - Step 240 is optional in nature and can be performed from time to time, or upon request of a system administrator. In
step 240, the fault occurrence counts within the centralized server can be analyzed to identify any fault conditions which meet one or more established criteria. In one embodiment, for example, a system administrator can establish a minimum threshold. If a count for a particular fault condition meets or exceeds the threshold, the administrator can be notified and/or the fault condition having the count which met or exceeded the threshold can be identified as one which is significantly affecting the organization and which requires attention. In other embodiments, the established criteria can serve to identify fault conditions having occurrence counts within particular ranges, or to identify a fault condition having the highest occurrence count. After step 240, the method can loop back to step 205 to continue as may be required. - Proceeding to step 245 in
FIG. 2B, in the case where the centralized server within the organization did not match the call-stack information, the centralized server can send the diagnostic information and/or the call-stack information for the detected fault condition to a server that is outside of, or does not belong to, the organization. In one embodiment, the server to which the diagnostic information is sent can be associated with an entity responsible for developing and/or administering the software application that experienced the fault condition. - In
step 250, the call-stack information for the current fault condition can be compared with call-stack information for prior fault conditions from one or more different organizations. It should be appreciated that if the diagnostic information is sent, the call-stack information can be extracted from the diagnostic information within the outside server prior to performing step 250. In any case, the comparison allows the call-stack information to be compared with call-stack information for fault conditions which have occurred in a variety of different organizations. - In
step 255, a determination can be made as to whether a match for the call-stack information was found. If so, the method can proceed to step 260. If not, however, the method can continue to step 275, where a new fault condition tracking number can be created within the outside server. The fault condition tracking number can correspond to the call-stack information and associated fault condition. The outside server can notify the organization that sent the diagnostic information of the status of the fault condition, i.e., whether a solution exists, and continue to monitor for future occurrences of the fault condition as determined from call-stack information until such time that the fault condition is diagnosed and/or resolved. When the fault condition is diagnosed and/or resolved, the organization(s) that experienced and reported the fault condition can be notified of the solution and/or status of the fault condition. After step 275, the method can repeat as may be required. - In
step 260, a determination can be made as to whether a solution exists for the fault condition corresponding to the matched call-stack information. If not, the method can proceed to step 265, where the organization that sent the diagnostic information can be advised that presently no solution exists. In step 270, the fault condition can be logged and the organization can be notified when a solution becomes available. After step 270, the method can repeat as may be required. - In the case where a solution does exist for the fault condition, the method can proceed to step 280. In
step 280, the organization that reported the fault condition optionally can be notified of the existence of a solution. In one embodiment, the outside server can send a message and/or notification to the centralized server or an administrator within the reporting organization. - In
step 285, a notification optionally can be sent to an analyst within the outside organization which developed and/or supports the software that experienced the fault condition. The analyst that is notified can be one that has been assigned to work on matching, prior fault conditions either with the reporting organization or with another organization that also experienced the same or similar fault condition. For example, in one embodiment, when an analyst works on a particular fault condition, that analyst can be associated with the fault condition and/or the call-stack information within the outside server. When subsequent reports of the same or a similar fault condition are received, as determined from the call stack information, the diagnostic information can be forwarded to the associated analyst. This ensures that fault conditions are dealt with by experienced personnel. - In
step 290, if it is determined that the fault condition can be cured by upgrading the application software that experienced the fault to a newer release or version, then such a process can be initiated. In one embodiment, the server within the outside organization can notify the centralized server within the organization that by upgrading to a newer version or release of the application software, the fault condition will be cured. A system administrator within the organization can configure the client and/or servers to automatically download and install such upgrades upon notification from the centralized server within the organization. Accordingly, in cases where a fault condition is experienced in a widespread fashion across an organization, the fix can be automatically distributed to the afflicted computer systems. - In
step 295, the administrator within the organization, after upgrading any afflicted computer systems, can mark the fault condition as being resolved within the centralized server. By marking the fault condition resolved, if call-stack information is subsequently received which corresponds to the resolved fault condition, it can be determined that the fault condition is actually a new fault condition despite having call-stack information which matches the resolved fault condition. Accordingly, the new fault condition can be considered one that is unrelated to the prior, resolved fault condition. Marking a fault condition as being resolved can be performed on a per version basis, or a per software release basis as the case may be. -
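The per-release handling of resolved fault conditions described for step 295 could be sketched as shown below. The interpretation adopted here, that a report carrying matching call-stack information from a release in which the fault was marked resolved opens a new, unrelated fault condition while reports from other releases still match the original, is an assumption of this sketch, as are the names `FaultRecord` and `FaultLedger`.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Sequence, Set, Tuple


@dataclass
class FaultRecord:
    fault_id: int
    resolved_releases: Set[str] = field(default_factory=set)


class FaultLedger:
    """Tracks fault conditions by call stack, with resolution marked per release."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, ...], List[FaultRecord]] = {}
        self._ids = count(1)

    def report(self, call_stack: Sequence[str], release: str) -> Tuple[int, bool]:
        """Return (fault_id, is_new) for a reported fault condition."""
        records = self._records.setdefault(tuple(call_stack), [])
        for record in records:
            if release not in record.resolved_releases:
                return record.fault_id, False
        # Every matching fault is resolved for this release: treat as unrelated.
        record = FaultRecord(next(self._ids))
        records.append(record)
        return record.fault_id, True

    def mark_resolved(self, call_stack: Sequence[str], fault_id: int, release: str) -> None:
        for record in self._records.get(tuple(call_stack), []):
            if record.fault_id == fault_id:
                record.resolved_releases.add(release)
```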
Method 200 has been provided to better illustrate various aspects of the present invention. It should be appreciated that one or more steps of method 200 can be optionally performed, or can be performed in a different order than described herein without departing from the spirit of the invention. For example, it should be appreciated that an analyst can be notified of a fault condition whether a solution exists or not. This allows an analyst working on a particular fault condition to be notified any time such a fault condition arises. The analyst is thereby exposed to the varying circumstances in which the fault condition is detected in order to increase the likelihood that a solution will be found. In another example, the outside server can routinely send notifications to the system administrator of the organization or the centralized server as to the status of different fault conditions whether a solution exists, does not exist, or whether similar fault conditions have been logged or not. - The present invention can be realized in hardware, software, or a combination of hardware and software. The present invention can be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software can be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention also can be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods.
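As one example of a software realization of a portion of method 200, the occurrence-count bookkeeping of steps 235 and 240 might be sketched as follows. The class name `OccurrenceTracker`, the use of plain numeric timestamps, and the method names are hypothetical and are offered only as an illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


class OccurrenceTracker:
    """Counts recurrences of each fault condition and applies simple criteria."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._times: Dict[str, List[float]] = {}

    def record(self, fault_id: str, timestamp: float) -> int:
        """Step 235: increment the count and keep the time of the occurrence."""
        self._counts[fault_id] += 1
        self._times.setdefault(fault_id, []).append(timestamp)
        return self._counts[fault_id]

    def at_or_above(self, threshold: int) -> List[str]:
        """Step 240: fault conditions whose count meets or exceeds the threshold."""
        return sorted(f for f, n in self._counts.items() if n >= threshold)

    def most_frequent(self) -> Optional[str]:
        """Step 240 alternative: the fault condition with the highest count."""
        return self._counts.most_common(1)[0][0] if self._counts else None

    def occurrences_since(self, fault_id: str, start: float) -> int:
        """Frequency helper: occurrences recorded at or after `start`."""
        return sum(1 for t in self._times.get(fault_id, []) if t >= start)
```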
- The terms “computer program”, “software”, “application”, variants and/or combinations thereof, in the present context, mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form. For example, computer software can include, but is not limited to, a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
- The terms “a” and “an,” as used herein, are defined as one or more than one. The term “plurality,” as used herein, is defined as two or more than two. The term “another,” as used herein, is defined as at least a second or more. The terms “including” and/or “having,” as used herein, are defined as comprising (i.e., open language). The term “coupled,” as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically, i.e. communicatively linked through a communication channel or pathway.
- This invention can be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.
Claims (20)
1. A method of diagnosing a fault condition within software comprising:
responsive to a fault condition within a computing system belonging to an organization, automatically sending call-stack information for the fault condition to a first server within the organization;
within the first server, comparing the call-stack information for the fault condition with call-stack information from prior fault conditions that occurred within the organization to determine whether the call-stack information for the fault condition matches call-stack information from one of the prior fault conditions; and
if the call-stack information for the fault condition does not match, sending the call-stack information to a second server for comparison with call-stack information from prior fault conditions that occurred within at least one different organization.
2. The method of claim 1, wherein the second server is associated with an entity responsible for maintaining the software that experienced the fault condition, and wherein the second server is not part of the organization.
3. The method of claim 1, wherein the call-stack information of the fault condition matches the call-stack information from a prior fault condition within the first server, said method further comprising:
incrementing an occurrence count for the prior fault condition; and
identifying a fault condition having an occurrence count which corresponds to at least one selected criterion.
4. The method of claim 1, further comprising, prior to said automatically sending step, translating a call trace into a plurality of function names which comprise, at least in part, the call-stack information.
5. The method of claim 1, wherein the call-stack information is sent to the second server, said method further comprising:
matching the call-stack information of the fault condition with call-stack information for a prior fault condition that occurred within a different organization;
identifying an analyst assigned to the matching, prior fault condition; and
automatically routing the call-stack information to the analyst.
6. The method of claim 1, wherein the call-stack information is sent to the second server, said method further comprising notifying the organization that sent the call-stack information for the fault condition whether the fault condition is a known issue.
7. The method of claim 1, wherein the call-stack information is sent to the second server, said method further comprising determining whether the fault condition is corrected by a newer release of the software.
8. The method of claim 1, wherein the call-stack information is sent to the second server, said method further comprising:
determining that the fault condition is corrected by a newer release of the software; and
initiating an automatic update, within the computing system belonging to the organization, of the software to the newer release.
9. The method of claim 1, further comprising:
marking a fault condition resolved for a particular release of the software;
receiving subsequent call-stack information that matches the fault condition that has been marked resolved; and
determining that the call-stack information indicates a fault condition that is unrelated to the fault condition that was marked resolved.
10. A system for diagnosing a fault condition within software, said system comprising:
a computing system belonging to an organization, said computing system executing software configured to detect a fault condition and, responsive to the fault condition, automatically transmit call-stack information corresponding to the fault condition to another computer system within the organization; and
a server belonging to the organization, said server being configured to receive the call-stack information and compare the call-stack information for the fault condition with call-stack information corresponding to prior fault conditions originating from computing systems belonging to the organization, wherein the server is further configured to determine whether the call-stack information for the fault condition matches call-stack information from one of the prior fault conditions, and, if not, automatically transmits the call-stack information for the fault condition to a server that does not belong to the organization for further analysis.
11. The system of claim 10, wherein the computing system translates a call trace into a plurality of function names which comprise, at least in part, the call-stack information prior to automatically transmitting the call-stack information to another computer system within the organization.
12. The system of claim 10, wherein the call-stack information of the fault condition matches call-stack information from a prior fault condition within the server, said server further being configured to increment an occurrence count for the prior fault condition and identify a fault condition having an occurrence count which corresponds to at least one selected criterion.
13. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
responsive to a fault condition within a computing system belonging to an organization, automatically sending call-stack information for the fault condition to a first server within the organization;
within the first server, comparing the call-stack information for the fault condition with call-stack information from prior fault conditions that occurred within the organization to determine whether the call-stack information for the fault condition matches call-stack information from one of the prior fault conditions; and
if the call-stack information for the fault condition does not match, sending the call-stack information to a second server for comparison with fault stack information from prior fault conditions that occurred within at least one different organization.
14. The machine readable storage of claim 13, wherein the second server is associated with an entity responsible for maintaining the software that experienced the fault condition, and wherein the second server is not part of the organization.
15. The machine readable storage of claim 13, wherein the call-stack information of the fault condition matches the call-stack information from a prior fault condition within the first server, said method further comprising:
incrementing an occurrence count for the prior fault condition; and
identifying a fault condition having an occurrence count which corresponds to a selected criterion.
16. The machine readable storage of claim 13, further comprising, prior to said automatically sending step, translating a call trace into a plurality of function names which comprise, at least in part, the call-stack information.
17. The machine readable storage of claim 13, wherein the call-stack information is sent to the second server, said method further comprising:
matching the call-stack information of the fault condition with call-stack information for a prior fault condition that occurred within a different organization;
identifying an analyst assigned to the matching, prior fault condition; and
automatically routing the call-stack information to the analyst.
18. The machine readable storage of claim 13, wherein the call-stack information is sent to the second server, said method further comprising determining whether the fault condition is corrected by a newer release of the software.
19. The machine readable storage of claim 13, wherein the call-stack information is sent to the second server, said method further comprising:
determining that the fault condition is corrected by a newer release of the software; and
initiating an automatic update, within the computing system belonging to the organization, of the software to the newer release.
20. The machine readable storage of claim 13, further comprising:
marking a fault condition resolved for a particular release of the software;
receiving subsequent call-stack information that matches the fault condition that has been marked resolved; and
determining that the call-stack information indicates a fault condition that is unrelated to the fault condition that was marked resolved.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/203,534 US20070038896A1 (en) | 2005-08-12 | 2005-08-12 | Call-stack pattern matching for problem resolution within software |
US12/618,304 US7984334B2 (en) | 2005-08-12 | 2009-11-13 | Call-stack pattern matching for problem resolution within software |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/203,534 US20070038896A1 (en) | 2005-08-12 | 2005-08-12 | Call-stack pattern matching for problem resolution within software |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/618,304 Continuation US7984334B2 (en) | 2005-08-12 | 2009-11-13 | Call-stack pattern matching for problem resolution within software |
Publications (1)
Publication Number | Publication Date |
---|---|
US20070038896A1 (en) | 2007-02-15 |
Family
ID=37743934
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/203,534 Abandoned US20070038896A1 (en) | 2005-08-12 | 2005-08-12 | Call-stack pattern matching for problem resolution within software |
US12/618,304 Expired - Fee Related US7984334B2 (en) | 2005-08-12 | 2009-11-13 | Call-stack pattern matching for problem resolution within software |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/618,304 Expired - Fee Related US7984334B2 (en) | 2005-08-12 | 2009-11-13 | Call-stack pattern matching for problem resolution within software |
Country Status (1)
Country | Link |
---|---|
US (2) | US20070038896A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7971100B2 (en) * | 2008-01-22 | 2011-06-28 | Microsoft Corporation | Failure location detection using types in assembly files |
US8245081B2 (en) * | 2010-02-10 | 2012-08-14 | Vmware, Inc. | Error reporting through observation correlation |
US8949675B2 (en) | 2010-11-30 | 2015-02-03 | Microsoft Corporation | Error report processing using call stack similarity |
US9529658B2 (en) | 2014-02-07 | 2016-12-27 | Oracle International Corporation | Techniques for generating diagnostic identifiers to trace request messages and identifying related diagnostic information |
US9529657B2 (en) * | 2014-02-07 | 2016-12-27 | Oracle International Corporation | Techniques for generating diagnostic identifiers to trace events and identifying related diagnostic information |
JP6871943B2 (en) | 2016-03-28 | 2021-05-19 | オラクル・インターナショナル・コーポレイション | Preformed instructions for mobile cloud services |
US10552242B2 (en) * | 2017-09-18 | 2020-02-04 | Bank Of America Corporation | Runtime failure detection and correction |
Citations (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5446880A (en) * | 1992-08-31 | 1995-08-29 | At&T Corp. | Database communication system that provides automatic format translation and transmission of records when the owner identified for the record is changed |
US5666481A (en) * | 1993-02-26 | 1997-09-09 | Cabletron Systems, Inc. | Method and apparatus for resolving faults in communications networks |
US5768501A (en) * | 1996-05-28 | 1998-06-16 | Cabletron Systems | Method and apparatus for inter-domain alarm correlation |
US5928369A (en) * | 1996-06-28 | 1999-07-27 | Synopsys, Inc. | Automatic support system and method based on user submitted stack trace |
US5983364A (en) * | 1997-05-12 | 1999-11-09 | System Soft Corporation | System and method for diagnosing computer faults |
US6002872A (en) * | 1998-03-31 | 1999-12-14 | International Machines Corporation | Method and apparatus for structured profiling of data processing systems and applications |
US20030005414A1 (en) * | 2001-05-24 | 2003-01-02 | Elliott Scott Clementson | Program execution stack signatures |
US6553507B1 (en) * | 1998-09-30 | 2003-04-22 | Intel Corporation | Just-in-time software updates |
US20030115582A1 (en) * | 2001-12-13 | 2003-06-19 | Robert Hundt | Dynamic registration of dynamically generated code and corresponding unwind information |
US6598090B2 (en) * | 1998-11-03 | 2003-07-22 | International Business Machines Corporation | Centralized control of software for administration of a distributed computing environment |
US20030167454A1 (en) * | 2001-03-30 | 2003-09-04 | Vassil Iordanov | Method of and system for providing metacognitive processing for simulating cognitive tasks |
US20030215068A1 (en) * | 2002-03-22 | 2003-11-20 | Stein Lawrence M. | System and method for seamless audio retrieval and transmittal during wireless application protocol sessions |
US6681344B1 (en) * | 2000-09-14 | 2004-01-20 | Microsoft Corporation | System and method for automatically diagnosing a computer problem |
US6691064B2 (en) * | 2000-12-29 | 2004-02-10 | General Electric Company | Method and system for identifying repeatedly malfunctioning equipment |
US6742141B1 (en) * | 1999-05-10 | 2004-05-25 | Handsfree Networks, Inc. | System for automated problem detection, diagnosis, and resolution in a software driven system |
US6751789B1 (en) * | 1997-12-12 | 2004-06-15 | International Business Machines Corporation | Method and system for periodic trace sampling for real-time generation of segments of call stack trees augmented with call stack position determination |
US6834257B2 (en) * | 2001-09-07 | 2004-12-21 | Siemens Aktiengesellschaft | Method for providing diagnostic messages |
US20040260474A1 (en) * | 2003-06-17 | 2004-12-23 | International Business Machines Corporation | Logging of exception data |
US20050102567A1 (en) * | 2003-10-31 | 2005-05-12 | Mcguire Cynthia A. | Method and architecture for automated fault diagnosis and correction in a computer system |
US7039833B2 (en) * | 2002-10-21 | 2006-05-02 | I2 Technologies Us, Inc. | Stack trace generated code compared with database to find error resolution information |
US7051320B2 (en) * | 2002-08-22 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Diagnostic tool for a plurality of networked computers with incident escalator and relocation of information to another computer |
US7058860B2 (en) * | 2001-06-29 | 2006-06-06 | Hewlett-Packard Development Company, L.P. | System and method of automatic parameter collection and problem solution generation for computer storage devices |
US7069473B2 (en) * | 2001-10-05 | 2006-06-27 | Nec Corporation | Computer recovery method and system for recovering automatically from fault, and fault monitoring apparatus and program used in computer system |
US7127499B1 (en) * | 1998-11-25 | 2006-10-24 | General Electric Company | Medical diagnostic system service method and apparatus |
US7143415B2 (en) * | 2002-08-22 | 2006-11-28 | Hewlett-Packard Development Company, L.P. | Method for using self-help technology to deliver remote enterprise support |
US7155514B1 (en) * | 2002-09-12 | 2006-12-26 | Dorian Software Creations, Inc. | Apparatus for event log management |
US7158965B1 (en) * | 2002-11-26 | 2007-01-02 | Microsoft Corporation | Method and apparatus for providing help content corresponding to the occurrence of an event within a computer |
US7234080B2 (en) * | 2002-10-18 | 2007-06-19 | Computer Associates Think, Inc. | Locating potential sources of memory leaks |
US7257743B2 (en) * | 2000-06-23 | 2007-08-14 | Microsoft Corporation | Method and system for reporting failures of a program module in a corporate environment |
US7299455B2 (en) * | 1995-06-02 | 2007-11-20 | Cisco Technology, Inc. | Remote monitoring of computer programs |
US7503062B2 (en) * | 1999-06-29 | 2009-03-10 | Oracle International Corporation | Method and apparatus for enabling database privileges |
US7543175B2 (en) * | 2004-05-21 | 2009-06-02 | Sap Ag | Method and system for intelligent and adaptive exception handling |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4568021B2 (en) * | 2004-04-05 | 2010-10-27 | 株式会社日立製作所 | Computer system that operates the command multiple number monitoring control system |
- 2005-08-12: US application 11/203,534 (US20070038896A1), status: Abandoned
- 2009-11-13: US application 12/618,304 (US7984334B2), status: Expired - Fee Related
Patent Citations (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5446880A (en) * | 1992-08-31 | 1995-08-29 | At&T Corp. | Database communication system that provides automatic format translation and transmission of records when the owner identified for the record is changed |
US5666481A (en) * | 1993-02-26 | 1997-09-09 | Cabletron Systems, Inc. | Method and apparatus for resolving faults in communications networks |
US7299455B2 (en) * | 1995-06-02 | 2007-11-20 | Cisco Technology, Inc. | Remote monitoring of computer programs |
US20080147853A1 (en) * | 1995-06-02 | 2008-06-19 | Anderson Mark D | Remote monitoring of computer programs |
US5768501A (en) * | 1996-05-28 | 1998-06-16 | Cabletron Systems | Method and apparatus for inter-domain alarm correlation |
US5928369A (en) * | 1996-06-28 | 1999-07-27 | Synopsys, Inc. | Automatic support system and method based on user submitted stack trace |
US5983364A (en) * | 1997-05-12 | 1999-11-09 | System Soft Corporation | System and method for diagnosing computer faults |
US6751789B1 (en) * | 1997-12-12 | 2004-06-15 | International Business Machines Corporation | Method and system for periodic trace sampling for real-time generation of segments of call stack trees augmented with call stack position determination |
US6002872A (en) * | 1998-03-31 | 1999-12-14 | International Machines Corporation | Method and apparatus for structured profiling of data processing systems and applications |
US6553507B1 (en) * | 1998-09-30 | 2003-04-22 | Intel Corporation | Just-in-time software updates |
US6598090B2 (en) * | 1998-11-03 | 2003-07-22 | International Business Machines Corporation | Centralized control of software for administration of a distributed computing environment |
US7127499B1 (en) * | 1998-11-25 | 2006-10-24 | General Electric Company | Medical diagnostic system service method and apparatus |
US6742141B1 (en) * | 1999-05-10 | 2004-05-25 | Handsfree Networks, Inc. | System for automated problem detection, diagnosis, and resolution in a software driven system |
US7503062B2 (en) * | 1999-06-29 | 2009-03-10 | Oracle International Corporation | Method and apparatus for enabling database privileges |
US7257743B2 (en) * | 2000-06-23 | 2007-08-14 | Microsoft Corporation | Method and system for reporting failures of a program module in a corporate environment |
US6681344B1 (en) * | 2000-09-14 | 2004-01-20 | Microsoft Corporation | System and method for automatically diagnosing a computer problem |
US6691064B2 (en) * | 2000-12-29 | 2004-02-10 | General Electric Company | Method and system for identifying repeatedly malfunctioning equipment |
US20030167454A1 (en) * | 2001-03-30 | 2003-09-04 | Vassil Iordanov | Method of and system for providing metacognitive processing for simulating cognitive tasks |
US20030005414A1 (en) * | 2001-05-24 | 2003-01-02 | Elliott Scott Clementson | Program execution stack signatures |
US7058860B2 (en) * | 2001-06-29 | 2006-06-06 | Hewlett-Packard Development Company, L.P. | System and method of automatic parameter collection and problem solution generation for computer storage devices |
US6834257B2 (en) * | 2001-09-07 | 2004-12-21 | Siemens Aktiengesellschaft | Method for providing diagnostic messages |
US7069473B2 (en) * | 2001-10-05 | 2006-06-27 | Nec Corporation | Computer recovery method and system for recovering automatically from fault, and fault monitoring apparatus and program used in computer system |
US20030115582A1 (en) * | 2001-12-13 | 2003-06-19 | Robert Hundt | Dynamic registration of dynamically generated code and corresponding unwind information |
US20030215068A1 (en) * | 2002-03-22 | 2003-11-20 | Stein Lawrence M. | System and method for seamless audio retrieval and transmittal during wireless application protocol sessions |
US7051320B2 (en) * | 2002-08-22 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Diagnostic tool for a plurality of networked computers with incident escalator and relocation of information to another computer |
US7143415B2 (en) * | 2002-08-22 | 2006-11-28 | Hewlett-Packard Development Company, L.P. | Method for using self-help technology to deliver remote enterprise support |
US7155514B1 (en) * | 2002-09-12 | 2006-12-26 | Dorian Software Creations, Inc. | Apparatus for event log management |
US7234080B2 (en) * | 2002-10-18 | 2007-06-19 | Computer Associates Think, Inc. | Locating potential sources of memory leaks |
US7039833B2 (en) * | 2002-10-21 | 2006-05-02 | I2 Technologies Us, Inc. | Stack trace generated code compared with database to find error resolution information |
US7158965B1 (en) * | 2002-11-26 | 2007-01-02 | Microsoft Corporation | Method and apparatus for providing help content corresponding to the occurrence of an event within a computer |
US20040260474A1 (en) * | 2003-06-17 | 2004-12-23 | International Business Machines Corporation | Logging of exception data |
US20050102567A1 (en) * | 2003-10-31 | 2005-05-12 | Mcguire Cynthia A. | Method and architecture for automated fault diagnosis and correction in a computer system |
US7328376B2 (en) * | 2003-10-31 | 2008-02-05 | Sun Microsystems, Inc. | Error reporting to diagnostic engines based on their diagnostic capabilities |
US7543175B2 (en) * | 2004-05-21 | 2009-06-02 | Sap Ag | Method and system for intelligent and adaptive exception handling |
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7613954B2 (en) * | 2004-12-21 | 2009-11-03 | National Instruments Corporation | Test executive with stack corruption detection |
US20060143527A1 (en) * | 2004-12-21 | 2006-06-29 | Grey James A | Test executive with stack corruption detection, stack safety buffers, and increased determinism for uninitialized local variable bugs |
US9064046B1 (en) * | 2006-01-04 | 2015-06-23 | Emc Corporation | Using correlated stack traces to determine faults in client/server software |
US7757126B2 (en) * | 2007-04-20 | 2010-07-13 | Sap Ag | System and method for supporting software |
US20080263404A1 (en) * | 2007-04-20 | 2008-10-23 | Sap Ag | System and Method for Supporting Software |
US20090063387A1 (en) * | 2007-08-31 | 2009-03-05 | International Business Machines Corporation | Apparatus And Method For Problem Determination And Resolution |
US20090193298A1 (en) * | 2008-01-30 | 2009-07-30 | International Business Machines Corporation | System and method of fault detection, diagnosis and prevention for complex computing systems |
US8949671B2 (en) | 2008-01-30 | 2015-02-03 | International Business Machines Corporation | Fault detection, diagnosis, and prevention for complex computing systems |
GB2458201A (en) * | 2008-03-12 | 2009-09-16 | Ibm | Creating a program problem signature data base during program testing to diagnose problems during program use |
US20090013208A1 (en) * | 2008-03-31 | 2009-01-08 | Dimuzio Thomas M | Real time automated exception notification and reporting solution |
US20100095101A1 (en) * | 2008-10-15 | 2010-04-15 | Stefan Georg Derdak | Capturing Context Information in a Currently Occurring Event |
US8566798B2 (en) * | 2008-10-15 | 2013-10-22 | International Business Machines Corporation | Capturing context information in a currently occurring event |
US7681182B1 (en) | 2008-11-06 | 2010-03-16 | International Business Machines Corporation | Including function call graphs (FCG) generated from trace analysis data within a searchable problem determination knowledge base |
US20110030061A1 (en) * | 2009-07-14 | 2011-02-03 | International Business Machines Corporation | Detecting and localizing security vulnerabilities in client-server application |
US8516449B2 (en) * | 2009-07-14 | 2013-08-20 | International Business Machines Corporation | Detecting and localizing security vulnerabilities in client-server application |
US10348809B2 (en) * | 2009-09-10 | 2019-07-09 | Cisco Technology, Inc. | Naming of distributed business transactions |
US20140052857A1 (en) * | 2009-09-10 | 2014-02-20 | AppDynamics, Inc. | Correlation of distributed business transactions |
US20140052624A1 (en) * | 2009-09-10 | 2014-02-20 | AppDynamics, Inc. | Correlation of asynchronous business transactions |
US20140068068A1 (en) * | 2009-09-10 | 2014-03-06 | AppDynamics, Inc. | Performing call stack sampling |
US20140068069A1 (en) * | 2009-09-10 | 2014-03-06 | AppDynamics, Inc. | Conducting a diagnostic session for monitored business transactions |
US20140068003A1 (en) * | 2009-09-10 | 2014-03-06 | AppDynamics, Inc. | Transaction correlation using three way handshake |
US20140068067A1 (en) * | 2009-09-10 | 2014-03-06 | AppDynamics, Inc. | Propagating a diagnostic session for business transactions across multiple servers |
US9167028B1 (en) * | 2009-09-10 | 2015-10-20 | AppDynamics, Inc. | Monitoring distributed web application transactions |
US9077610B2 (en) * | 2009-09-10 | 2015-07-07 | AppDynamics, Inc. | Performing call stack sampling |
US8935395B2 (en) * | 2009-09-10 | 2015-01-13 | AppDynamics Inc. | Correlation of distributed business transactions |
US8938533B1 (en) | 2009-09-10 | 2015-01-20 | AppDynamics Inc. | Automatic capture of diagnostic data based on transaction behavior learning |
US9369356B2 (en) | 2009-09-10 | 2016-06-14 | AppDynamics, Inc. | Conducting a diagnostic session for monitored business transactions |
US9037707B2 (en) * | 2009-09-10 | 2015-05-19 | AppDynamics, Inc. | Propagating a diagnostic session for business transactions across multiple servers |
US10230611B2 (en) * | 2009-09-10 | 2019-03-12 | Cisco Technology, Inc. | Dynamic baseline determination for distributed business transaction |
US9015278B2 (en) * | 2009-09-10 | 2015-04-21 | AppDynamics, Inc. | Transaction correlation using three way handshake |
US9015315B2 (en) | 2009-09-10 | 2015-04-21 | AppDynamics, Inc. | Identification and monitoring of distributed business transactions |
US9015317B2 (en) * | 2009-09-10 | 2015-04-21 | AppDynamics, Inc. | Conducting a diagnostic session for monitored business transactions |
US9015316B2 (en) * | 2009-09-10 | 2015-04-21 | AppDynamics, Inc. | Correlation of asynchronous business transactions |
US20110078298A1 (en) * | 2009-09-30 | 2011-03-31 | Fujitsu Limited | Data collection apparatus and method thereof |
US8769069B2 (en) * | 2009-09-30 | 2014-07-01 | Fujitsu Limited | Data collection apparatus and method thereof |
US20110214109A1 (en) * | 2010-02-26 | 2011-09-01 | Pedersen Soeren Sandmann | Generating stack traces of call stacks that lack frame pointers |
US8732671B2 (en) * | 2010-02-26 | 2014-05-20 | Red Hat, Inc. | Generating stack traces of call stacks that lack frame pointers |
CN102262527A (en) * | 2010-05-31 | 2011-11-30 | 国际商业机器公司 | Method and system for generating web service |
US8973020B2 (en) | 2010-05-31 | 2015-03-03 | International Business Machines Corporation | Generating a web service |
US9311598B1 (en) | 2012-02-02 | 2016-04-12 | AppDynamics, Inc. | Automatic capture of detailed analysis information for web application outliers with very low overhead |
US9009539B1 (en) * | 2014-03-18 | 2015-04-14 | Splunk Inc | Identifying and grouping program run time errors |
US10977561B2 (en) | 2014-10-24 | 2021-04-13 | Google Llc | Methods and systems for processing software traces |
GB2546205A (en) * | 2014-10-24 | 2017-07-12 | Google Inc | Methods and systems for automated tagging based on software execution traces |
US9940579B2 (en) | 2014-10-24 | 2018-04-10 | Google Llc | Methods and systems for automated tagging based on software execution traces |
US11379734B2 (en) | 2014-10-24 | 2022-07-05 | Google Llc | Methods and systems for processing software traces |
WO2016061820A1 (en) * | 2014-10-24 | 2016-04-28 | Google Inc. | Methods and systems for automated tagging based on software execution traces |
GB2546205B (en) * | 2014-10-24 | 2021-07-21 | Google Llc | Methods and systems for automated tagging based on software execution traces |
US11102094B2 (en) | 2015-08-25 | 2021-08-24 | Google Llc | Systems and methods for configuring a resource for network traffic analysis |
US11444856B2 (en) | 2015-08-25 | 2022-09-13 | Google Llc | Systems and methods for configuring a resource for network traffic analysis |
US10853095B2 (en) | 2018-05-31 | 2020-12-01 | Bank Of America Corporation | Integrated mainframe distributed orchestration tool |
US10606613B2 (en) | 2018-05-31 | 2020-03-31 | Bank Of America Corporation | Integrated mainframe distributed orchestration tool |
US10963331B2 (en) | 2018-12-13 | 2021-03-30 | Microsoft Technology Licensing, Llc | Collecting repeated diagnostics data from across users participating in a document collaboration session |
WO2020123261A1 (en) * | 2018-12-13 | 2020-06-18 | Microsoft Technology Licensing, Llc | Collecting repeated diagnostics data from across users participating in a document collaboration session |
US20220327024A1 (en) * | 2021-04-09 | 2022-10-13 | International Business Machines Corporation | Hang Detection and Remediation in a Multi-Threaded Application Process |
US11693739B2 (en) * | 2021-04-09 | 2023-07-04 | International Business Machines Corporation | Hang detection and remediation in a multi-threaded application process |
US11347914B1 (en) * | 2021-06-03 | 2022-05-31 | Cadence Design Systems, Inc. | System and method for automatic performance analysis in an electronic circuit design |
CN113568773A (en) * | 2021-07-26 | 2021-10-29 | 北京达佳互联信息技术有限公司 | Abnormal service classification method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US7984334B2 (en) | 2011-07-19 |
US20100064179A1 (en) | 2010-03-11 |
Similar Documents
Publication | Title |
---|---|
US7984334B2 (en) | Call-stack pattern matching for problem resolution within software |
US8892960B2 (en) | System and method for determining causes of performance problems within middleware systems |
KR102268355B1 (en) | Cloud deployment infrastructure validation engine |
US7506336B1 (en) | System and methods for version compatibility checking |
US8250563B2 (en) | Distributed autonomic solutions repository |
JP6160064B2 (en) | Application determination program, failure detection apparatus, and application determination method |
US7231550B1 (en) | Event protocol and resource naming scheme |
US20080282104A1 (en) | Self Healing Software |
US10489232B1 (en) | Data center diagnostic information |
US20080098109A1 (en) | Incident resolution |
US7779300B2 (en) | Server outage data management |
US20080028264A1 (en) | Detection and mitigation of disk failures |
US11625310B2 (en) | Application regression detection in computing systems |
JP5425720B2 (en) | Virtualization environment monitoring apparatus and monitoring method and program thereof |
US8554908B2 (en) | Device, method, and storage medium for detecting multiplexed relation of applications |
US8677323B2 (en) | Recording medium storing monitoring program, monitoring method, and monitoring system |
Yan et al. | Aegis: Attribution of Control Plane Change Impact across Layers and Components for Cloud Systems |
US7017152B2 (en) | Method of detecting lost objects in a software system |
US7363615B2 (en) | Stack-based callbacks for diagnostic data generation |
JP2009245154A (en) | Computer system, method, and computer program for evaluating symptom |
JP4575020B2 (en) | Failure analysis device |
US11290325B1 (en) | System and method for change reconciliation in information technology systems |
Arefin et al. | Cloudinsight: Shedding light on the cloud |
CN113553243A (en) | Remote error detection method |
US11645137B2 (en) | Exception management in heterogenous computing environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHAMPLIN, JONATHAN D.;THOMAS, JR., ARTHUR H.;REEL/FRAME:016670/0923; Effective date: 20050809 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |