US20010008019A1 - Method and system for transparently failing over application configuration information in a server cluster - Google Patents

Method and system for transparently failing over application configuration information in a server cluster Download PDF

Info

Publication number
US20010008019A1
US20010008019A1 (application US09/061,857; granted as US6360331B2)
Authority
US
United States
Prior art keywords
application
cluster
registry
data
configuration information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/061,857
Other versions
US6360331B2 (en)
Inventor
John D. Vert
Sunita Shrivastava
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US09/061,857
Assigned to MICROSOFT CORPORATION (Assignors: SHRIVASTAVA, SUNITA; VERT, JOHN D.)
Publication of US20010008019A1
Application granted
Publication of US6360331B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC (Assignor: MICROSOFT CORPORATION)
Anticipated expiration
Legal status: Expired - Lifetime

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F 11/1658 - Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G06F 11/1662 - Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit the resynchronized component or unit being a persistent storage device
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 - Arrangements for detecting or preventing errors in the information received
    • H04L 1/22 - Arrangements for detecting or preventing errors in the information received using redundant apparatus to increase reliability
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/202 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F 11/2023 - Failover techniques
    • G06F 11/203 - Failover techniques using migration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/202 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F 11/2038 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component

Definitions

  • the invention relates generally to computer network servers, and more particularly to computer servers arranged in a server cluster.
  • a server cluster is a group of at least two independent servers connected by a network and managed as a single system.
  • the clustering of servers provides a number of benefits over independent servers.
  • One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be terminated and restarted on a surviving server.
  • the failover of an application from one server (i.e., machine) to another may be automatic in response to a software or hardware failure on the first machine, or alternatively may be manually initiated by an administrator.
  • to failover an application in a manner that is transparent to the application and to the client requires that the application's execution environment be recreated on the other machine.
  • This execution environment comprises distinct parts having different characteristics from one another, a first part of which is the application code.
  • the application code changes very rarely, and thus an application's code environment may be replicated either by installing the application on all of the machines which may run in a cluster, or by installing the application on storage that is shared by all machines in the cluster. When an application needs to be restarted, the exact code is thus available to the cluster.
  • the application's data environment is best preserved by having the application store all of its data files on a shared disk, a task that is ordinarily accomplished by inputting appropriate information via the application's user interface. When an application needs to be restarted, the exact data is thus available to the cluster.
  • a third part of the execution environment is the application configuration information, which changes occasionally.
  • Applications that are “cluster-aware” (i.e., designed with the knowledge that they may be run in a clustering environment) store their application configuration information in a cluster registry maintained on a shared disk, thus ensuring reliable failover.
  • the present invention provides a method and system for transparently failing over resource configuration information stored by a resource (such as an application) on a local machine. More particularly, the application configuration information written to a registry of a local machine is made available to other machines of the cluster. The other machines can rapidly obtain this application configuration information and use it to recreate the application's execution environment on another machine in the cluster, ensuring a rapid and transparent failover operation.
  • the present invention transparently fails over a legacy application by tracking and checkpointing changes to application configuration information that is stored locally, such as in a system's local registry.
  • When an application running on the first system makes a change to the application configuration information in a subtree of the registry, the change is detected by a notification mechanism.
  • a snapshot mechanism is notified, takes a snapshot of the subtree's data, and causes it to be written to a storage device shared by systems of the cluster.
  • When the application is failed over to a second system, the snapshot for that application is retrieved from the quorum disk by a restore mechanism and written to the registry of the second system in a corresponding subtree.
  • the application is then run on the second system using the restored application configuration information for that application.
  • FIG. 1 is a block diagram representing a computer system into which the present invention may be incorporated;
  • FIG. 2 is a block diagram representing a server cluster including various cluster machines and a shared quorum device for storing cluster information
  • FIG. 3 is a representation of various components within the clustering service of a machine
  • FIG. 4 is a representation of a local registry maintained on a local machine
  • FIG. 5 is a block diagram generally representing the components for writing local registry information to the quorum device from a local machine in accordance with one aspect of the present invention
  • FIG. 6 is a block diagram generally representing the components for restoring registry information from the quorum device to a registry of a local machine in accordance with one aspect of the present invention.
  • FIGS. 7 - 11 comprise a flow diagram generally representing the steps taken to failover application configuration information in accordance with one aspect of the present invention.
  • FIG. 1 and the following discussion are intended to provide a brief general description of a suitable computing environment in which the invention may be implemented.
  • the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer.
  • program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types.
  • the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers and the like.
  • the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote memory storage devices.
  • an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 20 or the like acting as a node (i.e., system) in a clustering environment.
  • the computer 20 includes a processing unit 21 , a system memory 22 , and a system bus 23 that couples various system components including the system memory to the processing unit 21 .
  • the system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • the system memory includes read-only memory (ROM) 24 and random access memory (RAM) 25 .
  • a basic input/output system 26 (BIOS), containing the basic routines that help to transfer information between elements within the personal computer 20 , such as during start-up, is stored in ROM 24 .
  • the personal computer 20 may further include a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29 , and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD-ROM or other optical media.
  • the hard disk drive 27 , magnetic disk drive 28 , and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32 , a magnetic disk drive interface 33 , and an optical drive interface 34 , respectively.
  • the drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 20 .
  • Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read-only memories (ROMs) and the like may also be used in the exemplary operating environment.
  • a number of program modules may be stored on the hard disk, magnetic disk 29 , optical disk 31 , ROM 24 or RAM 25 , including an operating system 35 (which may be considered as including or operatively connected to a file system), one or more application programs 36 , other program modules 37 and program data 38 .
  • a user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42 .
  • Other input devices may include a microphone, joystick, game pad, satellite dish, scanner or the like.
  • These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or universal serial bus (USB).
  • a monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48 .
  • personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
  • the personal computer 20 operates in a networked environment using logical connections to one or more remote computers 49 .
  • At least one such remote computer 49 is another system of a cluster communicating with the personal computer system 20 over the networked connection.
  • Other remote computers 49 may be another personal computer such as a client computer, a server, a router, a network PC, a peer device or other common network system, and typically includes many or all of the elements described above relative to the personal computer 20 , although only a memory storage device 50 has been illustrated in FIG. 1.
  • the logical connections depicted in FIG. 1 include a local area network (LAN) 51 and a wide area network (WAN) 52 .
  • Other mechanisms suitable for connecting computers to form a cluster include direct connections such as over a serial or parallel cable, as well as wireless connections.
  • When used in a LAN networking environment, as is typical for connecting systems of a cluster, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53.
  • When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet.
  • The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46.
  • program modules depicted relative to the personal computer 20 may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • the preferred system 20 further includes a host adapter 55 or the like which connects the system bus 23 to a SCSI (Small Computer Standard Interface) bus 56 for communicating with at least one persistent memory storage device 57 , also referred to herein as a quorum device.
  • As shown in FIG. 2, the computer system 20 may comprise the system 60 1, while one of the remote computers 49 may be similarly connected to the SCSI bus 56 and comprise the system 60 2, and so on.
  • multiple shared storage devices may be connected to the SCSI bus 56 (or the like) such as for purposes of resilience to disk failure through the use of multiple disks, i.e., software and/or hardware-based redundant arrays of inexpensive or independent disks (RAID).
  • a system administrator runs a cluster installation utility on a system that then becomes a first member of the cluster 58 .
  • a database is created and the initial cluster member information is added thereto.
  • the administrator then configures any devices that are to be managed by the cluster software.
  • a cluster exists having a single member, after which the installation procedure is run on each of the other members of the cluster.
  • the name of the existing cluster is entered and the new system receives a copy of the existing cluster database.
  • a cluster application programming interface (API) 68 is provided.
  • Applications and cluster management administration tools 69 call various interfaces in the API 68 using remote procedure calls (RPC), whether running in the cluster or on an external system.
  • the various interfaces of the API 68 may be considered as being categorized by their association with a particular cluster component, i.e., systems, resources and the cluster itself.
  • FIG. 3 provides a representation of the cluster service components and their general relationships in a single system (e.g., 60 1 ) of a Windows NT cluster.
  • a cluster service 70 controls the cluster operation on a cluster system 58 , and is preferably implemented as a Windows NT service.
  • the cluster service 70 includes a node manager 72 , which manages node configuration information and network configuration information (e.g., the paths between nodes).
  • the node manager 72 operates in conjunction with a membership manager 74 , which runs the protocols that determine what cluster membership is when a change (e.g., regroup) occurs.
  • a communications manager 76 (kernel driver) manages communications with other systems of the cluster 58 via one or more network paths.
  • the communications manager 76 sends periodic messages, called heartbeats, to counterpart components on the other systems of the cluster 58 to provide a mechanism for detecting that the communications path is good and that the other systems are operational.
  • the cluster service 70 is essentially in constant communication with the other systems of the cluster. In a small cluster, communication is fully connected, i.e., all systems of the cluster 58 are in direct communication with all other systems.
  • Systems (e.g., 60 1 - 60 3 of FIG. 2) in the cluster 58 have the same view of cluster membership, and in the event that one system detects a communication failure with another system, the detecting system broadcasts a message to the cluster 58 causing other members to verify their view of the current cluster membership. This is known as a regroup event, during which writes to potentially shared devices are disabled until the membership has stabilized. If a system does not respond, it is removed from the cluster 58 and its active groups are failed over (“pulled”) to one or more active systems. Note that the failure of a cluster service 70 also causes its locally managed resources to fail.
  • the cluster service 70 also includes a configuration database manager 80 which implements the functions that maintain a cluster configuration database on a local device such as a disk and/or memory, and a configuration database 82 (FIG. 2) on the common persistent storage devices, (e.g., storage device 57 ).
  • the database maintains information about the physical and logical entities in the cluster 58 , including the cluster itself, systems, resource types, quorum resource configuration, network configuration, groups, and resources. Note that both persistent and volatile information may be used to track the current and desired state of the cluster.
  • the database manager 80 cooperates with counterpart database managers of systems in the cluster 58 to maintain configuration information consistently across the cluster 58 . As described below, global updates are used to ensure the consistency of the cluster database in each of systems.
  • the configuration database manager 80 also provides an interface to the configuration database 82 for use by the other cluster service 70 components.
  • a logging manager 84 provides a facility that works with the database manager 80 to maintain cluster state information across a situation in which a cluster shuts down and a new cluster is later formed with no members common to the previous cluster, known as a temporal partition.
  • the logging manager 84 operates with a log file, preferably maintained on the quorum device (storage device 57 ), to unroll logged state changes when forming a new cluster following a temporal partition.
  • a failover manager 87 makes resource/group management decisions and initiates appropriate actions, such as startup, restart and failover.
  • the failover manager 87 is responsible for stopping and starting the system's resources, managing resource dependencies, and for initiating failover of groups.
  • a group is a collection of resources organized to allow an administrator to combine resources into larger logical units and manage them as a unit. Usually a group contains all of the elements needed to run a specific application, and for client systems to connect to the service provided by the application.
  • a group may include an application that depends on a network name, which in turn depends on an Internet Protocol (IP) address, all of which are collected in a single group.
  • the dependencies of all resources in the group are maintained in a directed acyclic graph, known as a dependency tree.
  • Group operations performed on a group affect all resources contained within that group.
  • Dependency trees are described in more detail in U.S. patent application Ser. No. 08/963,049 entitled “Method and System for Resource Monitoring of Disparate Resources in a Server Cluster,” assigned to the same assignee as the present invention.
  • the failover manager 87 receives resource and system state information from at least one resource monitor 90 and the node manager 72 , for example, to make decisions about groups.
  • the failover manager 87 is responsible for deciding which systems in the cluster should “own” which groups. Those systems that own individual groups turn control of the resources within the group over to their respective failover managers 87 .
  • An event processor 92 connects the components of the cluster service 70 via an event notification mechanism.
  • the event processor 92 propagates events to and from applications (e.g., 94 and 96 ) and to and from the components within the cluster service 70 , and also performs miscellaneous services such as delivering signal events to cluster-aware applications 94 .
  • the event processor 92 in conjunction with an object manager 98 , also maintains various cluster objects.
  • a global update manager 100 operates to provide a global update service that is used by other components within the Cluster Service 70 .
  • The global update protocol (GLUP) is used by the global update manager 100 to broadcast updates to each node in a cluster.
  • GLUP generally comprises a standard global update message format, state information maintained in each node, and a set of rules that specify how global update should be processed and what steps should be taken when failures occur.
  • According to the GLUP protocol, one node (e.g., 60 1) serves as a “locker” node. The locker node 60 1 ensures that only one global update is in progress at any given time.
  • A node (e.g., 60 2) wishing to send an update to other nodes first sends a request to the locker node 60 1.
  • When any preceding update is complete, the locker node 60 1 gives permission for this “sender” node 60 2 to broadcast its update to the other nodes in the system.
  • the sender node sends the updates, one at a time, to the other nodes in a predetermined GLUP order that is ordinarily based on a unique number assigned to each node.
  • GLUP can be utilized to replicate data to the machines of a cluster, including application configuration information, as described below. A more detailed discussion of the GLUP protocol is described in the publication “Tandem Systems Review” Volume 1, Number 2, June, 1985 pp. 74-84.
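  • The following is a minimal, single-process C sketch of the ordering discipline GLUP imposes, as described above: a locker node serializes global updates, and a sender then delivers an update to every node in a fixed node-ID order. The node structure, function names and configuration string are illustrative assumptions, not part of the patent or of any cluster API.

    /* Single-process simulation of GLUP-style ordered updates (illustrative). */
    #include <stdio.h>
    #include <string.h>

    #define NODE_COUNT 3

    typedef struct {
        int  id;                  /* unique GLUP ordering number             */
        char config[64];          /* this node's copy of the replicated data */
    } Node;

    static Node g_nodes[NODE_COUNT] = { {1, ""}, {2, ""}, {3, ""} };
    static int  g_update_in_progress = 0;    /* state kept by the locker node */

    /* The locker grants permission only if no other global update is active. */
    static int locker_request_update(void)
    {
        if (g_update_in_progress)
            return 0;
        g_update_in_progress = 1;
        return 1;
    }

    static void locker_release_update(void)
    {
        g_update_in_progress = 0;
    }

    /* The sender delivers the update to the nodes one at a time, in GLUP order
     * (ascending unique node number), so every node applies updates identically. */
    static void glup_broadcast(int sender_id, const char *new_config)
    {
        int i;
        if (!locker_request_update()) {
            printf("node %d: another global update is in progress, try later\n",
                   sender_id);
            return;
        }
        for (i = 0; i < NODE_COUNT; i++) {
            strncpy(g_nodes[i].config, new_config, sizeof(g_nodes[i].config) - 1);
            printf("node %d applied update from node %d: %s\n",
                   g_nodes[i].id, sender_id, new_config);
        }
        locker_release_update();
    }

    int main(void)
    {
        glup_broadcast(2, "checkpoint-list=SOFTWARE\\Program2");
        return 0;
    }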
  • a resource monitor 90 runs in one or more processes that may be part of the cluster service 70 , but are shown herein as being separate from the cluster service 70 and communicating therewith via Remote Procedure Calls (RPC) or the like.
  • the resource monitor 90 monitors the health of one or more resources (e.g., 102 1 - 102 5 ) via callbacks thereto.
  • The resources are implemented as one or more Dynamically Linked Libraries (DLLs) loaded into the address space of the Resource Monitor 90.
  • resource DLLs may include physical disk, logical volume (consisting of one or more physical disks), file and print shares, network addresses and names, generic service or application, and Internet Server service DLLs.
  • Certain resources (e.g., provided by a single source) may be run in a single process, while other resources may be run in at least one other process.
  • the resources 102 1 - 102 5 run in the system account and are considered privileged code. Resources 102 1 - 102 5 may be defined to run in separate processes, created by the Cluster Service 70 when creating resources.
  • Resources expose interfaces and properties to the cluster service 70 , and may depend on other resources, with no circular dependencies allowed. If a resource does depend on other resources, the resource is brought online after the resources on which it depends are already online, and is taken offline before those resources. Moreover, each resource has an associated list of systems in the cluster on which this resource may execute. For example, a disk resource may only be hosted on systems that are physically connected to the disk. Also associated with each resource is a local restart policy, defining the desired action in the event that the resource cannot continue on the current system.
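  • As a concrete illustration of the dependency rule just described, the C sketch below brings a resource online only after the resources it depends on are online, and takes it offline before them, using the group example from above (an application depending on a network name, which depends on an IP address). The structures and function names are illustrative only and are not part of the cluster service.

    /* Illustrative dependency-ordered online/offline for resources in a group. */
    #include <stdio.h>

    #define MAX_DEPS 4

    typedef struct Resource {
        const char      *name;
        struct Resource *deps[MAX_DEPS];   /* resources this one depends on */
        int              depCount;
        int              online;
    } Resource;

    static void BringOnline(Resource *r)
    {
        int i;
        if (r->online)
            return;
        for (i = 0; i < r->depCount; i++)  /* dependencies come online first */
            BringOnline(r->deps[i]);
        r->online = 1;
        printf("online:  %s\n", r->name);
    }

    static void TakeOffline(Resource *r)   /* dependents are taken offline first */
    {
        if (!r->online)
            return;
        r->online = 0;
        printf("offline: %s\n", r->name);
    }

    int main(void)
    {
        Resource ip   = { "IP address",   { NULL },  0, 0 };
        Resource name = { "Network name", { &ip },   1, 0 };
        Resource app  = { "Application",  { &name }, 1, 0 };

        BringOnline(&app);    /* IP address, then network name, then application */
        TakeOffline(&app);    /* the application goes offline before its deps    */
        TakeOffline(&name);
        TakeOffline(&ip);
        return 0;
    }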
  • Systems in the cluster need to maintain a consistent view of time.
  • One of the systems, known as the time source and selected by the administrator, includes a resource that implements the time service.
  • The time service, which maintains consistent time within the cluster 58, is implemented as a resource rather than as part of the cluster service 70 itself.
  • Systems in the cluster 58 may be in one of three distinct states: offline, online or paused. These states are visible to other systems in the cluster 58, and thus may be considered the state of the cluster service 70.
  • When offline, a system is not a fully active member of the cluster 58.
  • The system and its cluster service 70 may or may not be running.
  • When online, a system is a fully active member of the cluster 58, and honors cluster database updates, can contribute one or more votes to a quorum algorithm, maintains heartbeats, and can own and run groups.
  • A paused system is also a fully active member of the cluster 58, and thus honors cluster database updates, can contribute votes to a quorum algorithm, and maintains heartbeats. Online and paused are treated as equivalent states by most of the cluster software; however, a system that is in the paused state cannot honor requests to take ownership of groups. The paused state is provided to allow certain maintenance to be performed.
  • After initialization is complete, the external state of the system is offline.
  • The event processor calls the node manager 72 to begin the process of joining or forming a cluster.
  • To join a cluster following the restart of a system, the cluster service 70 is started automatically.
  • the system configures and mounts local, non-shared devices. Cluster-wide devices are left offline while booting, because they may be in use by another node.
  • the system tries to communicate over the network with the last known members of the cluster 58 .
  • If the system discovers any member of the cluster, it performs an authentication sequence wherein the existing cluster system authenticates the newcomer and returns a status of success if authenticated, or fails the request if not.
  • If the newcomer is not authenticated, the request to join the cluster is refused; if successful, the newcomer is sent an updated copy of the shared database.
  • the joining system uses this shared database to find shared resources and to bring them online as needed, and also to find other cluster members.
  • If no member of an existing cluster can be found, a system will attempt to form its own cluster.
  • To form a cluster, the system gains exclusive access to a special resource known as the quorum resource (quorum device or disk) 57.
  • the quorum resource 57 is used as a tie-breaker when booting a cluster and also to protect against more than one node forming its own cluster if communication fails in a multiple node cluster.
  • the quorum resource is often (but not necessarily) a disk that maintains the state of the cluster, which a node arbitrates for and needs possession of before it can form a cluster.
  • the quorum resource 57 preferably maintains a log file that is unrolled to ensure consistency across a temporal partition when forming a new cluster, after another cluster previously existed.
  • The node that has possession of the quorum resource 57 is responsible for logging operations, and thus if application configuration information is replicated, such an operation is logged.
  • The quorum resource 57 offers a method for arbitrating a quorum resource object, typically by challenging (or defending) for an exclusive reservation of a storage device (e.g., 57 of FIG. 2) such as a disk that ordinarily stores log data for the cluster.
  • a method for releasing an exclusive reservation may also be provided.
  • a cluster member When leaving a cluster, a cluster member will send a ClusterExit message to all other members in the cluster, notifying them of its intent to leave the cluster. The exiting cluster member does not wait for any responses and immediately proceeds to shutdown all resources and close all connections managed by the cluster software. Sending a message to the other systems in the cluster when leaving saves the other systems from discovering the absence by a time-out operation.
  • a system can have groups thereon.
  • a group can be “owned” by only one system at a time, and the individual resources within a group are present on the system which currently owns the Group. As a result, at any given instant, different resources within the same group cannot be owned by different systems across the cluster. Groups can be failed over or moved from one system to another as atomic units.
  • Each group has a cluster-wide policy associated therewith comprising an ordered list of owners. A group fails over to systems in the listed order.
  • If a resource fails, the failover manager 87 may choose to restart the resource, or to take the resource offline along with any resources dependent thereon. If the failover manager 87 takes the resource offline, the group is restarted on another system in the cluster, known as pushing the group to another system. A cluster administrator may also manually initiate such a group transfer. Both situations are similar, except that resources are gracefully shut down for a manually initiated failover, while they are forcefully shut down in the failure case.
  • When a system comes back online, the failover manager 87 decides whether to move some groups back to that system, in an action referred to as failback.
  • To automatically fail back, groups require a defined preferred owner. Groups for which the newly online system is the preferred owner are pushed from the current owner to the new system. Protection, in the form of a timing window, is included to control when the failback occurs.
  • Although the present invention primarily provides benefits with legacy applications, as will become apparent below, other types of resources may be failed over to other systems of a cluster. Accordingly, the present invention will be described with respect to the failing over of application configuration information stored in a local registry; however, it is understood that it will operate in an equivalent manner with other types of resources that may store their configuration information locally rather than with the cluster. Thus, as used herein, the terms “application” and “resource” are equivalent when used with respect to the failing over of appropriate configuration information.
  • The present invention provides a method and system for tracking and checkpointing changes to a local system's registry, such that application configuration changes that would otherwise be lost are protected from machine failures.
  • Because the registry checkpointing is transparent to the application, no application changes are required, whereby a legacy application that stores its configuration in the local registry may be reliably used in a failover environment.
  • A local system's registry 104 is essentially a database indexed by a number of keys 106 1 - 106 k hierarchically arranged into trees and subtrees. As shown in FIG. 4, the keys (particularly the low level subtrees) typically have named data associated therewith including strings, binary values and/or DWORDs. As described above, legacy applications store configuration information in the local registry 104, and occasionally make changes thereto. For example, as shown in FIG. 4, an application named “Program2” has configuration information indexed at HKEY_LOCAL_MACHINE\SOFTWARE\Program2, including a string, a binary value and a DWORD.
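  • For illustration, the following sketch shows how a legacy application of the kind described above might write such configuration values under HKEY_LOCAL_MACHINE\SOFTWARE\Program2 using the ordinary Win32 registry APIs; “Program2” and the value names are the hypothetical example of FIG. 4, not an actual product.

    /* A legacy application writing its configuration to the local registry.
     * (Writing under HKEY_LOCAL_MACHINE requires administrative rights.) */
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        HKEY  hKey;
        DWORD disposition;
        DWORD retryCount = 5;                              /* illustrative values */
        const char *dataPath = "D:\\Program2\\Data";
        BYTE  blob[4] = { 0x01, 0x02, 0x03, 0x04 };

        /* Create (or open) the application's configuration subtree. */
        LONG rc = RegCreateKeyExA(HKEY_LOCAL_MACHINE, "SOFTWARE\\Program2",
                                  0, NULL, REG_OPTION_NON_VOLATILE,
                                  KEY_SET_VALUE, NULL, &hKey, &disposition);
        if (rc != ERROR_SUCCESS) {
            fprintf(stderr, "RegCreateKeyEx failed: %ld\n", rc);
            return 1;
        }

        /* A string, a DWORD and a binary value, as in the FIG. 4 example. */
        RegSetValueExA(hKey, "DataPath", 0, REG_SZ,
                       (const BYTE *)dataPath, (DWORD)(strlen(dataPath) + 1));
        RegSetValueExA(hKey, "RetryCount", 0, REG_DWORD,
                       (const BYTE *)&retryCount, sizeof(retryCount));
        RegSetValueExA(hKey, "LicenseBlob", 0, REG_BINARY, blob, sizeof(blob));

        RegCloseKey(hKey);
        return 0;
    }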
  • FIG. 5 represents the general architecture for tracking and checkpointing changes to configuration information on a first system (e.g., 60 1 ), while FIG. 6 represents the general architecture for failing over the information to another system (e.g., 60 2 ) of the cluster 58 .
  • When an application (e.g., 96) is set up to run in the cluster, a list 108 of registry subtrees associated with that application 96 may be generated. As can be appreciated, this may be accomplished by noting the differences to the local registry key structure after the application 96 is installed.
  • This list 108 is preferably stored in the cluster registry 110 of the quorum device 57 under a registry key for that resource, however it may be maintained elsewhere in the cluster (such as replicated in its systems) if desired.
  • a checkpoint manager 112 accesses the list 108 and registers each subtree in the application's list of registry subtrees with a notification mechanism 114 .
  • the notification mechanism 114 watches the registry 104 , and, whenever a change to a registered subtree is detected, informs the checkpoint manager 112 of the change.
  • the checkpoint manager 112 via a snapshot mechanism 116 , takes a snapshot of the listed subtree data and records the snapshot as data 118 1 - 118 m associated with that application 96 (e.g., snapshot data 118 2 ) on the quorum device 57 .
  • the data may be stored as text (i.e., human readable) data.
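  • The sketch below shows one plausible shape of the notification/snapshot path just described, using the Win32 registry APIs directly (RegNotifyChangeKeyValue and RegSaveKey, the latter requiring the backup privilege). The subtree name, file path and the SaveSnapshotToQuorum stub are illustrative assumptions; in the patent the snapshot is handed to the checkpoint manager 112, which writes it to the quorum device 57 via CpSaveData (shown later).

    /* Watch a registered subtree and snapshot it whenever it changes. */
    #include <windows.h>
    #include <stdio.h>

    static void SaveSnapshotToQuorum(const char *snapshotFile)
    {
        /* Stand-in for the CpSaveData path shown later in this document. */
        printf("checkpointing %s to the quorum device\n", snapshotFile);
    }

    int main(void)
    {
        HKEY   hSubtree;
        HANDLE hEvent = CreateEventA(NULL, FALSE, FALSE, NULL);

        if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, "SOFTWARE\\Program2",
                          0, KEY_READ | KEY_NOTIFY, &hSubtree) != ERROR_SUCCESS)
            return 1;

        for (;;) {   /* a real checkpoint manager would also watch a shutdown event */
            /* Post a notification on the subtree and all of its children. */
            RegNotifyChangeKeyValue(hSubtree, TRUE,
                                    REG_NOTIFY_CHANGE_NAME |
                                    REG_NOTIFY_CHANGE_LAST_SET,
                                    hEvent, TRUE /* asynchronous */);

            WaitForSingleObject(hEvent, INFINITE);    /* the notification fired */

            /* Snapshot the subtree to a file; RegSaveKey needs SeBackupPrivilege
             * and fails if the target file already exists, hence the delete. */
            DeleteFileA("C:\\Temp\\Program2.ckp");
            if (RegSaveKeyA(hSubtree, "C:\\Temp\\Program2.ckp", NULL) == ERROR_SUCCESS)
                SaveSnapshotToQuorum("C:\\Temp\\Program2.ckp");
        }
    }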
  • the checkpoint manager 112 is associated with an interface that includes three cluster resource controls which may be sent to a particular application resource's DLL (e.g., 102 3 , FIG. 3) with a ClusterResourceControl function.
  • A first resource control, named CLCTL_ADD_REGISTRY_CHECKPOINT, includes a pointer, lpInBuffer, to a null-terminated Unicode string specifying the name of the registry key (subtree) to be checkpointed. The key name string is preferably shortened relative to HKEY_LOCAL_MACHINE, e.g., the exemplary application's key is simplified to “SOFTWARE\Program2.”
  • When called, this control function adds a subtree to the subtree list 108 that is associated with an application.
  • A second resource control, which essentially performs the opposite function, is named CLCTL_DELETE_REGISTRY_CHECKPOINT, and similarly includes a pointer, lpInBuffer, to a null-terminated Unicode string.
  • This string specifies the name of a registry key that was previously registered with CLCTL_ADD_REGISTRY_CHECKPOINT. When called, the specified subtree pointed to by lpInBuffer will no longer be checkpointed for the specified resource.
  • A third resource control, CLCTL_GET_REGISTRY_CHECKPOINTS, includes a pointer to a buffer named lpOutBuffer; when invoked, it returns a REG_MULTI_SZ list of registry keys that have been added to the specified resource's list 108 with CLCTL_ADD_REGISTRY_CHECKPOINT.
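  • A hedged sketch of how these controls might be issued through the Cluster API's ClusterResourceControl function follows. The CLCTL_* names above follow the patent's text; in the Cluster API as shipped, the corresponding control codes are CLUSCTL_RESOURCE_ADD_REGISTRY_CHECKPOINT, CLUSCTL_RESOURCE_DELETE_REGISTRY_CHECKPOINT and CLUSCTL_RESOURCE_GET_REGISTRY_CHECKPOINTS, which are used below; the resource name “Program2” is an illustrative assumption.

    /* Register a registry subtree for checkpointing and read back the list. */
    #include <windows.h>
    #include <clusapi.h>
    #include <stdio.h>

    #pragma comment(lib, "clusapi.lib")

    int main(void)
    {
        HCLUSTER  hCluster  = OpenCluster(NULL);   /* NULL = the local cluster */
        HRESOURCE hResource = hCluster ? OpenClusterResource(hCluster, L"Program2") : NULL;
        const WCHAR subtree[] = L"SOFTWARE\\Program2";   /* relative to HKLM */
        WCHAR     keyList[1024];
        DWORD     cbReturned = 0;
        DWORD     rc;

        if (hResource == NULL) {
            if (hCluster)
                CloseCluster(hCluster);
            return 1;
        }

        /* Add the subtree to the resource's checkpoint list (list 108). */
        rc = ClusterResourceControl(hResource, NULL,
                                    CLUSCTL_RESOURCE_ADD_REGISTRY_CHECKPOINT,
                                    (LPVOID)subtree, sizeof(subtree),
                                    NULL, 0, NULL);
        printf("add registry checkpoint: %lu\n", rc);

        /* Retrieve the REG_MULTI_SZ list of checkpointed subtrees. */
        rc = ClusterResourceControl(hResource, NULL,
                                    CLUSCTL_RESOURCE_GET_REGISTRY_CHECKPOINTS,
                                    NULL, 0,
                                    keyList, sizeof(keyList), &cbReturned);
        printf("get registry checkpoints: %lu (%lu bytes)\n", rc, cbReturned);

        CloseClusterResource(hResource);
        CloseCluster(hCluster);
        return 0;
    }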
  • each resource may have a list of registry subtrees 108 to checkpoint.
  • the notification mechanism 114 preferably utilizes a WIN32 API named RegNotifyChangeKey( ), via which a registry notification will be posted on each of a resource's subtrees when that resource is online.
  • When any registry data is modified in a subtree with a notification posted thereon, a notification fires and the snapshot mechanism 116 of the checkpoint manager 112 takes a snapshot of the registry subtree (or trees).
  • the snapshot mechanism 116 preferably utilizes the WIN32 API named RegSaveKey( ).
  • the snapshot data is saved to the quorum device 57 , referenced by the resource ID (a globally unique identifier, or GUID) and a unique checkpoint ID, which is an arbitrary DWORD.
  • the interface for saving the data to the quorum device is set forth in the table below:
    DWORD CpSaveData (
        IN PFM_RESOURCE Resource,
        IN DWORD dwCheckpointId,
        IN PVOID lpData,
        IN DWORD lpcbData
        )
  • the CpSaveData function checkpoints arbitrary data for the specified resource.
  • the checkpointed data 118 2 is stored on the quorum device 57 to ensure that it survives temporal partitions, and so that any node in the cluster may save or retrieve the checkpointed data 118 2 .
  • the Resource argument supplies the resource associated with this data, while the dwCheckpointId argument provides a unique checkpoint identifier describing this data.
  • the caller is responsible for ensuring the uniqueness of the checkpoint identifier.
  • Another argument, lpData supplies a pointer to the checkpoint data, while lpcbData provides the length (in bytes) of the checkpoint data pointed to by lpData.
  • the function returns a value of ERROR_SUCCESS if successful, or a Win32 error code otherwise.
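  • The fragment below illustrates how the snapshot produced by RegSaveKey( ) might be handed to CpSaveData for storage on the quorum device 57. CpSaveData and PFM_RESOURCE are internal to the cluster service as described in the patent, so the opaque type, the stub implementation and the file path here are stand-ins for illustration only.

    /* Read a RegSaveKey() snapshot file and checkpoint it via CpSaveData(). */
    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct FM_RESOURCE *PFM_RESOURCE;    /* opaque stand-in type */

    /* Stub matching the patent's prototype; the real implementation inside the
     * cluster service writes the bytes to the quorum device 57. */
    DWORD CpSaveData(IN PFM_RESOURCE Resource, IN DWORD dwCheckpointId,
                     IN PVOID lpData, IN DWORD lpcbData)
    {
        (void)Resource; (void)lpData;
        printf("checkpoint %lu: %lu bytes saved to the quorum device\n",
               dwCheckpointId, lpcbData);
        return ERROR_SUCCESS;
    }

    DWORD CheckpointSubtreeSnapshot(PFM_RESOURCE resource, DWORD checkpointId,
                                    const char *snapshotFile)
    {
        HANDLE hFile = CreateFileA(snapshotFile, GENERIC_READ, FILE_SHARE_READ,
                                   NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        DWORD  size, bytesRead, status;
        BYTE  *buffer;

        if (hFile == INVALID_HANDLE_VALUE)
            return GetLastError();

        size   = GetFileSize(hFile, NULL);
        buffer = (BYTE *)malloc(size);
        if (buffer == NULL) {
            status = ERROR_NOT_ENOUGH_MEMORY;
        } else if (ReadFile(hFile, buffer, size, &bytesRead, NULL)) {
            /* The caller is responsible for the uniqueness of dwCheckpointId. */
            status = CpSaveData(resource, checkpointId, buffer, bytesRead);
        } else {
            status = GetLastError();
        }
        CloseHandle(hFile);
        free(buffer);
        return status;    /* ERROR_SUCCESS or a Win32 error code */
    }

    int main(void)
    {
        /* NULL resource handle and checkpoint id 2 are purely illustrative. */
        return (int)CheckpointSubtreeSnapshot(NULL, 2, "C:\\Temp\\Program2.ckp");
    }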
  • the application configuration information may be restored to any other node of the cluster 58 .
  • the checkpoint manager 120 on the other system 60 2 includes a restore mechanism 122 that essentially reverses the checkpointing operation.
  • As represented in FIG. 6, when a resource 96 is failed over, but before it is brought online on another system (as represented by the dashed box), its checkpointed registry data 118 2 is retrieved and restored into the other system's local registry 124.
  • CpGetData( ) is provided to retrieve the checkpointed data for a specified resource 96 , i.e., the data 118 2 which was saved to the quorum device 57 by CpSaveData( ).
  • the CpGetData( ) function is set forth in the table below:
    DWORD CpGetData (
        IN PFM_RESOURCE Resource,
        IN DWORD dwCheckpointId,
        OUT PVOID *lpData,
        OUT DWORD *lpcbData
        )
  • Resource identifies the resource 96 associated with this data 118 2
  • dwCheckpointId supplies the unique checkpoint ID describing this data.
  • the lpData argument returns a pointer to the checkpoint data
  • lpcbData returns the length (in bytes) of the checkpoint data pointed to by lpData.
  • the caller is responsible for freeing the memory, and as before, the caller is responsible for ensuring the uniqueness of the checkpoint identifier.
  • the CpGetData function returns a value of ERROR_SUCCESS if successful, or a Win32 error code otherwise.
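  • A counterpart sketch for the retrieval side follows: the checkpointed bytes are fetched with CpGetData and written to a local file that RegRestoreKey( ) can consume. As before, PFM_RESOURCE and the stub are stand-ins for the cluster service's internal implementation, and the allocator behind the returned buffer is an assumption (the patent only states that the caller must free it).

    /* Fetch checkpointed registry data and write it to a local snapshot file. */
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct FM_RESOURCE *PFM_RESOURCE;    /* opaque stand-in type */

    /* Stub matching the patent's prototype; the real implementation reads the
     * checkpointed bytes back from the quorum device 57. */
    DWORD CpGetData(IN PFM_RESOURCE Resource, IN DWORD dwCheckpointId,
                    OUT PVOID *lpData, OUT DWORD *lpcbData)
    {
        static const BYTE sample[] = "registry snapshot bytes";  /* placeholder */
        (void)Resource; (void)dwCheckpointId;
        *lpData = LocalAlloc(LMEM_FIXED, sizeof(sample));
        if (*lpData == NULL)
            return ERROR_NOT_ENOUGH_MEMORY;
        memcpy(*lpData, sample, sizeof(sample));
        *lpcbData = sizeof(sample);
        return ERROR_SUCCESS;
    }

    DWORD FetchCheckpointToFile(PFM_RESOURCE resource, DWORD checkpointId,
                                const char *localFile)
    {
        PVOID  data    = NULL;
        DWORD  cbData  = 0;
        DWORD  written = 0;
        DWORD  status  = CpGetData(resource, checkpointId, &data, &cbData);
        HANDLE hFile;

        if (status != ERROR_SUCCESS)
            return status;

        hFile = CreateFileA(localFile, GENERIC_WRITE, 0, NULL,
                            CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE) {
            status = GetLastError();
        } else {
            if (!WriteFile(hFile, data, cbData, &written, NULL))
                status = GetLastError();
            CloseHandle(hFile);
        }
        LocalFree(data);    /* assumed allocator; the caller must free the buffer */
        return status;
    }

    int main(void)
    {
        return (int)FetchCheckpointToFile(NULL, 2, "C:\\Temp\\Program2.ckp");
    }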
  • the restore mechanism 122 utilizes the RegRestoreKey( ) WIN32 API for each checkpointed subtree.
  • Once the checkpointed registry data has been restored, the resource can be brought online, i.e., the failed over application 96 can be run.
  • the application configuration information is also first tracked and checkpointed on the new system, in accordance with the present invention and as described above, i.e., using a notification mechanism 126 .
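  • The restore step on the failover target might look like the sketch below: the restore privilege required by RegRestoreKey( ) is enabled, and the snapshot file retrieved from the quorum device overwrites the corresponding local subtree (REG_FORCE_RESTORE replaces any stale data, consistent with the behavior described for FIG. 9 below). The subtree and file path are the running example, not fixed by the patent.

    /* Restore a checkpointed subtree into the local registry on the new node. */
    #include <windows.h>
    #include <stdio.h>

    static BOOL EnablePrivilege(LPCSTR privilegeName)   /* e.g. SeRestorePrivilege */
    {
        HANDLE           hToken;
        TOKEN_PRIVILEGES tp;

        if (!OpenProcessToken(GetCurrentProcess(),
                              TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
            return FALSE;
        if (!LookupPrivilegeValueA(NULL, privilegeName, &tp.Privileges[0].Luid)) {
            CloseHandle(hToken);
            return FALSE;
        }
        tp.PrivilegeCount           = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        AdjustTokenPrivileges(hToken, FALSE, &tp, 0, NULL, NULL);
        CloseHandle(hToken);
        return GetLastError() == ERROR_SUCCESS;
    }

    int main(void)
    {
        HKEY hSubtree;
        LONG rc;

        /* RegRestoreKey requires SeRestorePrivilege in the calling process. */
        if (!EnablePrivilege("SeRestorePrivilege"))
            return 1;

        if (RegCreateKeyExA(HKEY_LOCAL_MACHINE, "SOFTWARE\\Program2", 0, NULL,
                            REG_OPTION_NON_VOLATILE, KEY_ALL_ACCESS, NULL,
                            &hSubtree, NULL) != ERROR_SUCCESS)
            return 1;

        /* Overwrite the subtree with the snapshot retrieved from the quorum
         * device, so the application sees no stale local configuration data. */
        rc = RegRestoreKeyA(hSubtree, "C:\\Temp\\Program2.ckp", REG_FORCE_RESTORE);
        printf("RegRestoreKey: %ld\n", rc);

        RegCloseKey(hSubtree);
        return 0;
    }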
  • the checkpointing operation is initiated when a request is received, either to initially create an initial registry checkpoint (step 700 ) for an application on the quorum device, or to run the application (step 706 ).
  • When a cluster application (e.g., 96) is created, its associated registry subtrees 108 are known.
  • step 702 obtains the subtree (or subtrees) associated with the application from the list 108 thereof in the cluster registry 110 , and continues to step 800 of FIG. 8.
  • step 708 determines if a checkpoint 118 2 already exists for this particular application on the quorum device 57 , and if so, continues on to update the application's configuration information and then run the application 96 , as described below with reference to FIG. 9. If no checkpoint 118 2 exists for this resource 96 , then step 708 branches to step 702 and then to step 800 of FIG. 8.
  • A first subtree associated with the application is selected and a snapshot is made of the specified registry subtree (using the RegSaveKey( ) WIN32 API) as described above.
  • The registry checkpoint data 118 2 is then saved to the cluster quorum device 57 as also described above (via CpSaveData).
  • Note that the registry subtree may be added to the subtree list 108 associated with the application (using CLCTL_ADD_REGISTRY_CHECKPOINT) at this time, or the list 108 can be generated in advance (step 702). If there is more than one subtree of application configuration information for an application, the process is repeated for each subtree via steps 806 - 808. This ensures that the appropriate application configuration information will be available to other cluster systems if the current system fails, as the registry subtree and its location on the quorum device 57 are now associated with the cluster application.
  • step 810 tests the state of the cluster application 96 . If at step 810 the application 96 is not currently running, nothing further needs to be done, and thus the process ends and waits for the application to be run at some later time. Otherwise the system proceeds to step 1000 of FIG. 10, where the process will register for change notifications and take any remaining steps to run the application (step 1001 ) as described below.
  • Before the failed-over application is run, each registry snapshot associated with the application (e.g., 118 2) is retrieved from the quorum device 57 (via CpGetData) and restored to the current machine's (e.g., 60 2) local registry 124 using the RegRestoreKey( ) API.
  • any previously existing data at that location in the current system's local registry 124 is overwritten with the stored registry snapshot 118 2 , whereby the application 96 will not see any stale data that may have been in the current system's local registry 124 .
  • the checkpoint manager 120 (via the notification mechanism 126 ) registers for registry change notifications associated with the registry subtree, using the WIN32 API named RegNotifyChangeKey( ) as described above. At this time, the application 96 is allowed to run.
  • any subsequent modifications to the specified registry data alert the notification mechanism 126 .
  • the API preferably works asynchronously to report a change to the registry 124 , although for purposes of simplicity, FIG. 10 represents the monitoring for changes (or detecting the end of the application) in a loop (steps 1002 - 1008 ).
  • the checkpoint manager 120 takes a snapshot of the registry subtree that has changed as described above. Then, at step 1006 , the existing registry checkpoint data 118 2 on the quorum device 57 is overwritten with the new snapshot of the registry subtree.
  • If the current system 60 2 does not have possession of the quorum device 57, the communication mechanism of the current system 60 2 transfers this information to the system that has exclusive possession of the quorum device 57, which then writes the data. In this manner, each time that the registry data 118 2 is modified, the appropriate subtree is copied to the quorum device 57, whereby if the application is moved to another node, the configuration information is current on the new node.
  • FIG. 11 represents the steps taken when an application ends. As shown by steps 1102 - 1110 , any registry change notifications associated with that application are removed so as to no longer fire upon a change. This is synchronized in such a way as to ensure that any registry modifications pending during the application shutdown are detected by the notification mechanism 126 and a new snapshot taken. Then, the shutdown of the application is completed at step 1112 .
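  • The following sketch ties the loops of FIGS. 10 and 11 together: a worker waits on either a registry-change notification or an application-shutdown signal, re-checkpoints on every change, and takes one final snapshot on shutdown so that modifications pending while the application is stopping are not lost. SnapshotAndCheckpoint stands in for the RegSaveKey/CpSaveData path shown earlier; handle names, the subtree and the file path are illustrative assumptions.

    /* Change-monitoring loop with a clean-shutdown final snapshot. */
    #include <windows.h>
    #include <stdio.h>

    static void SnapshotAndCheckpoint(HKEY hSubtree)
    {
        DeleteFileA("C:\\Temp\\Program2.ckp");       /* RegSaveKey wants a new file */
        if (RegSaveKeyA(hSubtree, "C:\\Temp\\Program2.ckp", NULL) == ERROR_SUCCESS)
            printf("snapshot overwrites the checkpoint data on the quorum device\n");
    }

    static void MonitorSubtree(HKEY hSubtree, HANDLE hShutdownEvent)
    {
        HANDLE hChangeEvent = CreateEventA(NULL, FALSE, FALSE, NULL);
        HANDLE waitSet[2]   = { hChangeEvent, hShutdownEvent };
        BOOL   running      = TRUE;

        while (running) {
            /* Re-arm the notification on the subtree (steps 1000/1008). */
            RegNotifyChangeKeyValue(hSubtree, TRUE,
                                    REG_NOTIFY_CHANGE_NAME |
                                    REG_NOTIFY_CHANGE_LAST_SET,
                                    hChangeEvent, TRUE);

            switch (WaitForMultipleObjects(2, waitSet, FALSE, INFINITE)) {
            case WAIT_OBJECT_0:          /* registry change: steps 1002-1006 */
                SnapshotAndCheckpoint(hSubtree);
                break;
            case WAIT_OBJECT_0 + 1:      /* application ending: FIG. 11      */
                /* Take a final snapshot so changes pending during shutdown are
                 * captured, then stop watching this subtree (steps 1102-1112). */
                SnapshotAndCheckpoint(hSubtree);
                running = FALSE;
                break;
            }
        }
        CloseHandle(hChangeEvent);
    }

    int main(void)
    {
        HKEY   hSubtree;
        HANDLE hShutdown = CreateEventA(NULL, TRUE, FALSE, NULL);

        if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, "SOFTWARE\\Program2",
                          0, KEY_READ | KEY_NOTIFY, &hSubtree) != ERROR_SUCCESS)
            return 1;
        /* In the cluster service, hShutdown would be signaled when the
         * application resource is being taken offline (FIG. 11). */
        MonitorSubtree(hSubtree, hShutdown);
        RegCloseKey(hSubtree);
        CloseHandle(hShutdown);
        return 0;
    }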
  • Instead of using the shared quorum device 57, the checkpoint manager 112 alternatively may write the information to at least one other non-volatile storage device shared by systems in the cluster. In another alternative, the checkpoint manager 112 may cause the information to be replicated via GLUP or some other communications mechanism to the other systems of the cluster. Note that such a replication operation would be logged on the quorum device 57, so that changes to the configuration information would survive a temporal partition. Moreover, rather than snapshot the entire set of subtrees, it is feasible to alternatively provide a mechanism that transfers only change information, for example if the subtree data is otherwise relatively large.

Abstract

A method and system for transparently failing over a legacy application from a first system to a second system of a server cluster by tracking and checkpointing changes to application configuration information stored in a system's local registry. When an application running on the first system makes a change to the application configuration information in a subtree of the registry, the change is detected and a snapshot of the subtree's data is taken. The snapshot is written to a storage device shared by systems of the cluster, such as a quorum disk. When the application is failed over to a second system, the snapshot for that application is retrieved from the quorum disk and written to the registry of the second system in a corresponding subtree. The application is then run on the second system using the most-recent application configuration information as modified by the other system in the cluster.

Description

    FIELD OF THE INVENTION
  • The invention relates generally to computer network servers, and more particularly to computer servers arranged in a server cluster. [0001]
  • BACKGROUND OF THE INVENTION
  • A server cluster is a group of at least two independent servers connected by a network and managed as a single system. The clustering of servers provides a number of benefits over independent servers. One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be terminated and restarted on a surviving server. [0002]
  • Other benefits include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Dynamic load balancing is also available. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications and the like. [0003]
  • Thus, the failover of an application from one server (i.e., machine) to another may be automatic in response to a software or hardware failure on the first machine, or alternatively may be manually initiated by an administrator. In any event, to failover an application in a manner that is transparent to the application and to the client requires that the application's execution environment be recreated on the other machine. This execution environment comprises distinct parts having different characteristics from one another, a first part of which is the application code. The application code changes very rarely, and thus an application's code environment may be replicated either by installing the application on all of the machines which may run in a cluster, or by installing the application on storage that is shared by all machines in the cluster. When an application needs to be restarted, the exact code is thus available to the cluster. [0004]
  • Another part of the execution environment is the application's data, which changes very regularly. The application's data environment is best preserved by having the application store all of its data files on a shared disk, a task that is ordinarily accomplished by inputting appropriate information via the application's user interface. When an application needs to be restarted, the exact data is thus available to the cluster. [0005]
  • A third part of the execution environment is the application configuration information, which changes occasionally. Applications that are “cluster-aware” (i.e., designed with the knowledge that they may be run in a clustering environment) store their application configuration information in a cluster registry maintained on a shared disk, thus ensuring reliable failover. [0006]
  • However, existing applications that are not cluster-aware (i.e., legacy applications) use their local machine registry to store their application configuration information. For example, Windows NT applications use the WIN32 Registry. As a result, this configuration data is not available to the rest of the cluster. At the same time, it is impractical (and likely very dangerous) to attempt to modify these legacy applications so as to use the cluster registry instead of their local registry. Moreover, it is not feasible to transparently redirect each of the local registries in the various machines to the cluster registry, and costly to replicate copies of each of the local registries to the various machines. Nevertheless, in order to ensure correct and transparent behavior after a failover, the application configuration information needs to be recreated at the machine on which the application is being restarted. [0007]
  • SUMMARY OF THE INVENTION
  • The present invention provides a method and system for transparently failing over resource configuration information stored by a resource (such as an application) on a local machine. More particularly, the application configuration information written to a registry of a local machine is made available to other machines of the cluster. The other machines can rapidly obtain this application configuration information and use it to recreate the application's execution environment on another machine in the cluster, ensuring a rapid and transparent failover operation. [0008]
  • Briefly, the present invention transparently fails over a legacy application by tracking and checkpointing changes to application configuration information that is stored locally, such as in a system's local registry. When an application running on the first system makes a change to the application configuration information in a subtree of the registry, the change is detected by a notification mechanism. A snapshot mechanism is notified, takes a snapshot of the subtree's data, and causes it to be written to a storage device shared by systems of the cluster. When the application is failed over to a second system, the snapshot for that application is retrieved from the quorum disk by a restore mechanism and written to the registry of the second system in a corresponding subtree. The application is then run on the second system using the restored application configuration information for that application. [0009]
  • Other benefits and advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which: [0010]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram representing a computer system into which the present invention may be incorporated; [0011]
  • FIG. 2 is a block diagram representing a server cluster including various cluster machines and a shared quorum device for storing cluster information; [0012]
  • FIG. 3 is a representation of various components within the clustering service of a machine; [0013]
  • FIG. 4 is a representation of a local registry maintained on a local machine; [0014]
  • FIG. 5 is a block diagram generally representing the components for writing local registry information to the quorum device from a local machine in accordance with one aspect of the present invention; [0015]
  • FIG. 6 is a block diagram generally representing the components for restoring registry information from the quorum device to a registry of a local machine in accordance with one aspect of the present invention; and [0016]
  • FIGS. 7-11 comprise a flow diagram generally representing the steps taken to failover application configuration information in accordance with one aspect of the present invention. [0017]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT Exemplary Operating Environment
  • FIG. 1 and the following discussion are intended to provide a brief general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. [0018]
  • With reference to FIG. 1, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 20 or the like acting as a node (i.e., system) in a clustering environment. The computer 20 includes a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read-only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that help to transfer information between elements within the personal computer 20, such as during start-up, is stored in ROM 24. The personal computer 20 may further include a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD-ROM or other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read-only memories (ROMs) and the like may also be used in the exemplary operating environment. [0019]
  • A number of program modules may be stored on the hard disk, [0020] magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35 (which may be considered as including or operatively connected to a file system), one or more application programs 36, other program modules 37 and program data 38. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor 47, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
  • The [0021] personal computer 20 operates in a networked environment using logical connections to one or more remote computers 49. At least one such remote computer 49 is another system of a cluster communicating with the personal computer system 20 over the networked connection. Each of the other remote computers 49 may be another personal computer such as a client computer, a server, a router, a network PC, a peer device or other common network system, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, Intranets and the Internet. Other mechanisms suitable for connecting computers to form a cluster include direct connections such as over a serial or parallel cable, as well as wireless connections. When used in a LAN networking environment, as is typical for connecting systems of a cluster, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • The preferred [0022] system 20 further includes a host adapter 55 or the like which connects the system bus 23 to a SCSI (Small Computer Standard Interface) bus 56 for communicating with at least one persistent memory storage device 57, also referred to herein as a quorum device. Of course, other ways of connecting cluster systems to a storage device, including Fibre Channel, are equivalent. In any event, as shown in FIG. 2, the computer system 20 may comprise the system 60 1, while one of the remote computers 49 may be similarly connected to the SCSI bus 56 and comprise the system 60 2, and so on. Note that multiple shared storage devices may be connected to the SCSI bus 56 (or the like) such as for purposes of resilience to disk failure through the use of multiple disks, i.e., software and/or hardware-based redundant arrays of inexpensive or independent disks (RAID).
  • To create a new cluster, a system administrator runs a cluster installation utility on a system that then becomes a first member of the [0023] cluster 58. For a new cluster 58, a database is created and the initial cluster member information is added thereto. The administrator then configures any devices that are to be managed by the cluster software. At this time, a cluster exists having a single member, after which the installation procedure is run on each of the other members of the cluster. For each added member, the name of the existing cluster is entered and the new system receives a copy of the existing cluster database.
  • As shown in FIG. 3, to accomplish cluster creation and to perform other administration of cluster resources, systems, and the cluster itself, a cluster application programming interface (API) [0024] 68 is provided. Applications and cluster management administration tools 69 call various interfaces in the API 68 using remote procedure calls (RPC), whether running in the cluster or on an external system. The various interfaces of the API 68 may be considered as being categorized by their association with a particular cluster component, i.e., systems, resources and the cluster itself.
  • Cluster Service Components
  • FIG. 3 provides a representation of the cluster service components and their general relationships in a single system (e.g., [0025] 60 1) of a Windows NT cluster. A cluster service 70 controls the cluster operation on a cluster system 58, and is preferably implemented as a Windows NT service. The cluster service 70 includes a node manager 72, which manages node configuration information and network configuration information (e.g., the paths between nodes). The node manager 72 operates in conjunction with a membership manager 74, which runs the protocols that determine what cluster membership is when a change (e.g., regroup) occurs. A communications manager 76 (kernel driver) manages communications with other systems of the cluster 58 via one or more network paths. The communications manager 76 sends periodic messages, called heartbeats, to counterpart components on the other systems of the cluster 58 to provide a mechanism for detecting that the communications path is good and that the other systems are operational. Through the communications manager 76, the cluster service 70 is essentially in constant communication with the other systems of the cluster. In a small cluster, communication is fully connected, i.e., all systems of the cluster 58 are in direct communication with all other systems.
  • Systems (e.g., [0026] 60 1-60 3 of FIG. 2) in the cluster 58 have the same view of cluster membership, and in the event that one system detects a communication failure with another system, the detecting system broadcasts a message to the cluster 58 causing other members to verify their view of the current cluster membership. This is known as a regroup event, during which writes to potentially shared devices are disabled until the membership has stabilized. If a system does not respond, it is removed from the cluster 58 and its active groups are failed over (“pulled”) to one or more active systems. Note that the failure of a cluster service 70 also causes its locally managed resources to fail.
  • The [0027] cluster service 70 also includes a configuration database manager 80 which implements the functions that maintain a cluster configuration database on a local device such as a disk and/or memory, and a configuration database 82 (FIG. 2) on the common persistent storage devices (e.g., storage device 57). The database maintains information about the physical and logical entities in the cluster 58, including the cluster itself, systems, resource types, quorum resource configuration, network configuration, groups, and resources. Note that both persistent and volatile information may be used to track the current and desired state of the cluster. The database manager 80 cooperates with counterpart database managers of systems in the cluster 58 to maintain configuration information consistently across the cluster 58. As described below, global updates are used to ensure the consistency of the cluster database in each of the systems. The configuration database manager 80 also provides an interface to the configuration database 82 for use by the other cluster service 70 components.
  • A [0028] logging manager 84 provides a facility that works with the database manager 80 to maintain cluster state information across a situation in which a cluster shuts down and a new cluster is later formed with no members common to the previous cluster, known as a temporal partition. The logging manager 84 operates with a log file, preferably maintained on the quorum device (storage device 57), to unroll logged state changes when forming a new cluster following a temporal partition.
  • A [0029] failover manager 87 makes resource/group management decisions and initiates appropriate actions, such as startup, restart and failover. The failover manager 87 is responsible for stopping and starting the system's resources, managing resource dependencies, and for initiating failover of groups. A group is a collection of resources organized to allow an administrator to combine resources into larger logical units and manage them as a unit. Usually a group contains all of the elements needed to run a specific application, and for client systems to connect to the service provided by the application. For example, a group may include an application that depends on a network name, which in turn depends on an Internet Protocol (IP) address, all of which are collected in a single group. In a preferred arrangement, the dependencies of all resources in the group are maintained in a directed acyclic graph, known as a dependency tree. Group operations performed on a group affect all resources contained within that group. Dependency trees are described in more detail in U.S. patent application Ser. No. 08/963,049 entitled “Method and System for Resource Monitoring of Disparate Resources in a Server Cluster,” assigned to the same assignee as the present invention.
  • The [0030] failover manager 87 receives resource and system state information from at least one resource monitor 90 and the node manager 72, for example, to make decisions about groups. The failover manager 87 is responsible for deciding which systems in the cluster should “own” which groups. Those systems that own individual groups turn control of the resources within the group over to their respective failover managers 87.
  • An [0031] event processor 92 connects the components of the cluster service 70 via an event notification mechanism. The event processor 92 propagates events to and from applications (e.g., 94 and 96) and to and from the components within the cluster service 70, and also performs miscellaneous services such as delivering signal events to cluster-aware applications 94. The event processor 92, in conjunction with an object manager 98, also maintains various cluster objects. A global update manager 100 operates to provide a global update service that is used by other components within the Cluster Service 70.
  • The global update protocol (GLUP) is used by the [0032] global update manager 100 to broadcast updates to each node in a cluster. GLUP generally comprises a standard global update message format, state information maintained in each node, and a set of rules that specify how a global update should be processed and what steps should be taken when failures occur. In general, according to the GLUP protocol, one node (e.g., 60 1) serves as a "locker" node. The locker node 60 1 ensures that only one global update is in progress at any given time. With GLUP, a node (e.g., 60 2) wishing to send an update to other nodes first sends a request to the locker node 60 1. When any preceding updates are complete, the locker node 60 1 gives permission for this "sender" node 60 2 to broadcast its update to the other nodes in the system. In accordance with GLUP, the sender node sends the updates, one at a time, to the other nodes in a predetermined GLUP order that is ordinarily based on a unique number assigned to each node. GLUP can be utilized to replicate data to the machines of a cluster, including application configuration information, as described below. The GLUP protocol is discussed in more detail in the publication "Tandem Systems Review," Volume 1, Number 2, June 1985, pp. 74-84.
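  • By way of example and not limitation, the following sketch illustrates the sender side of a GLUP broadcast. The helper routines GlupRequestLock, GlupSendUpdate and GlupReleaseLock are hypothetical stand-ins for whatever transport the cluster service provides; they do not appear in the description above, and the sketch assumes only that the node identifiers are already arranged in the predetermined GLUP order.
    #include <windows.h>

    /* Hypothetical transport helpers -- stand-ins, not part of any actual API. */
    DWORD GlupRequestLock(DWORD lockerNode);
    DWORD GlupSendUpdate(DWORD nodeId, const void *data, DWORD cbData);
    DWORD GlupReleaseLock(DWORD lockerNode);

    /* Propagate one update under the GLUP rules: obtain the global lock from
       the locker node, send the update to every node one at a time in the
       predetermined GLUP order, then release the lock. */
    DWORD GlupBroadcast(DWORD lockerNode, const DWORD *nodeIds, DWORD nodeCount,
                        const void *update, DWORD cbUpdate)
    {
        DWORD status = GlupRequestLock(lockerNode);  /* only one update at a time */
        if (status != ERROR_SUCCESS)
            return status;

        for (DWORD i = 0; i < nodeCount; i++) {
            status = GlupSendUpdate(nodeIds[i], update, cbUpdate);
            if (status != ERROR_SUCCESS)
                break;                               /* failure handling per the GLUP rules */
        }

        GlupReleaseLock(lockerNode);
        return status;
    }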
  • A resource monitor [0033] 90 runs in one or more processes that may be part of the cluster service 70, but are shown herein as being separate from the cluster service 70 and communicating therewith via Remote Procedure Calls (RPC) or the like. The resource monitor 90 monitors the health of one or more resources (e.g., 102 1-102 5) via callbacks thereto. The monitoring and general operation of resources is described in more detail in U.S. patent application Ser. No. 08/963,049, hereby incorporated by reference herein in its entirety.
  • The resources (e.g., [0034] 102 1-102 5) are implemented as one or more Dynamically Linked Libraries (DLLs) loaded into the address space of the Resource Monitor 90. For example, resource DLLs may include physical disk, logical volume (consisting of one or more physical disks), file and print shares, network addresses and names, generic service or application, and Internet Server service DLLs. Certain resources (e.g., provided by a single source) may be run in a single process, while other resources may be run in at least one other process. The resources 102 1-102 5 run in the system account and are considered privileged code. Resources 102 1-102 5 may be defined to run in separate processes, created by the Cluster Service 70 when creating resources.
  • Resources expose interfaces and properties to the [0035] cluster service 70, and may depend on other resources, with no circular dependencies allowed. If a resource does depend on other resources, the resource is brought online after the resources on which it depends are already online, and is taken offline before those resources. Moreover, each resource has an associated list of systems in the cluster on which this resource may execute. For example, a disk resource may only be hosted on systems that are physically connected to the disk. Also associated with each resource is a local restart policy, defining the desired action in the event that the resource cannot continue on the current system.
  • Systems in the cluster need to maintain a consistent view of time. One of the systems, known as the time source and selected by the administrator, includes a resource that implements the time service. Note that the time service, which maintains consistent time within the [0036] cluster 58, is implemented as a resource rather than as part of the cluster service 70 itself.
  • From the point of view of other systems in the [0037] cluster 58 and management interfaces, systems in the cluster 58 may be in one of three distinct states: offline, online or paused. These states are visible to other systems in the cluster 58, and thus may be considered the state of the cluster service 70. When offline, a system is not a fully active member of the cluster 58. The system and its cluster service 70 may or may not be running. When online, a system is a fully active member of the cluster 58, and honors cluster database updates, can contribute one or more votes to a quorum algorithm, maintains heartbeats, and can own and run groups. Lastly, a paused system is a fully active member of the cluster 58, and thus honors cluster database updates, can contribute votes to a quorum algorithm, and maintains heartbeats. Online and paused are treated as equivalent states by most of the cluster software; however, a system that is in the paused state cannot honor requests to take ownership of groups. The paused state is provided to allow certain maintenance to be performed.
  • Note that after initialization is complete, the external state of the system is offline. The event processor calls the [0038] node manager 72 to begin the process of joining or forming a cluster. To join a cluster, following the restart of a system, the cluster service 70 is started automatically. The system configures and mounts local, non-shared devices. Cluster-wide devices are left offline while booting, because they may be in use by another node. The system tries to communicate over the network with the last known members of the cluster 58. When the system discovers any member of the cluster, it performs an authentication sequence wherein the existing cluster system authenticates the newcomer and returns a status of success if authenticated, or fails the request if not. For example, if a system is not recognized as a member or its credentials are invalid, then the request to join the cluster is refused. If successful, the newcomer is sent an updated copy of the shared database. The joining system uses this shared database to find shared resources and to bring them online as needed, and also to find other cluster members.
  • If a cluster is not found during the discovery process, a system will attempt to form its own cluster. In general, to form a cluster, the system gains exclusive access to a special resource known as the quorum resource (quorum device or disk) [0039] 57. The quorum resource 57 is used as a tie-breaker when booting a cluster and also to protect against more than one node forming its own cluster if communication fails in a multiple node cluster. The quorum resource is often (but not necessarily) a disk that maintains the state of the cluster, which a node arbitrates for and needs possession of before it can form a cluster. The quorum resource 57 preferably maintains a log file that is unrolled to ensure consistency across a temporal partition when forming a new cluster, after another cluster previously existed. The node that has possession of the quorum resource 57 is responsible for logging operations, and thus if application configuration information is replicated, such an operation is logged. Also, the quorum resource 57 offers a method for arbitrating a quorum resource object, typically by challenging (or defending) for an exclusive reservation of a storage device (e.g., 57 of FIG. 2A) such as a disk that ordinarily stores log data for the cluster. A method for releasing an exclusive reservation may also be provided. The general operation of quorum resources including arbitration and exclusive possession of the quorum resource is described in more detail in U.S. patent application Ser. No. 08/963,050 entitled "Method and System for Quorum Resource Arbitration in a Server Cluster," assigned to the same assignee and hereby incorporated by reference herein in its entirety.
  • When leaving a cluster, a cluster member will send a ClusterExit message to all other members in the cluster, notifying them of its intent to leave the cluster. The exiting cluster member does not wait for any responses and immediately proceeds to shutdown all resources and close all connections managed by the cluster software. Sending a message to the other systems in the cluster when leaving saves the other systems from discovering the absence by a time-out operation. [0040]
  • Once online, a system can have groups thereon. A group can be "owned" by only one system at a time, and the individual resources within a group are present on the system which currently owns the group. As a result, at any given instant, different resources within the same group cannot be owned by different systems across the cluster. Groups can be failed over or moved from one system to another as atomic units. Each group has a cluster-wide policy associated therewith comprising an ordered list of owners. A group fails over to systems in the listed order. [0041]
  • For example, if a resource (e.g., an application) fails, the [0042] failover manager 87 may choose to restart the resource, or to take the resource offline along with any resources dependent thereon. If the failover manager 87 takes the resource offline, the group is restarted on another system in the cluster, known as pushing the group to another system. A cluster administrator may also manually initiate such a group transfer. Both situations are similar, except that resources are gracefully shutdown for a manually initiated failover, while they are forcefully shut down in the failure case.
  • When an entire system in the cluster fails, its groups are pulled from the failed system to another system. This process is similar to pushing a group, but without the shutdown phase on the failed system. To determine what groups were running on the failed system, the systems maintain group information on each node of the cluster in a database to track which systems own which groups. To determine which system should take ownership of which groups, those systems capable of hosting the groups negotiate among themselves for ownership, based on system capabilities, current load, application feedback and/or the group's system preference list. Once negotiation of a group is complete, all members of the cluster update their databases to properly reflect which systems own which groups. [0043]
  • When a previously failed system comes back online, the [0044] failover manager 87 decides whether to move some groups back to that system, in an action referred to as failback. To automatically failback, groups require a defined preferred owner. Groups for which the newly online system is the preferred owner are pushed from the current owner to the new system. Protection, in the form of a timing window, is included to control when the failback occurs.
  • Failing Over Application Configuration Information
  • Although the present invention primarily provides benefits with legacy applications, as will become apparent below, other types of resources may be failed over to other systems of a cluster. Accordingly, the present invention will be described with respect to the failing over of application configuration information stored in a local registry; however, it is understood that it will operate in an equivalent manner with other types of resources that may store their configuration information locally rather than with the cluster. Thus, as used herein, the terms "application" and "resource" are equivalent when used with respect to the failing over of appropriate configuration information. [0045]
  • In accordance with one aspect of the present invention, there is provided a method and system for tracking and checkpointing changes to a local system's registry, such that application configuration changes that would otherwise be lost are protected from machine failures. As will be described below, because the registry checkpointing is transparent to the application, no application changes are required, whereby a legacy application which stores its configuration in the local registry may be reliably used in a failover environment. [0046]
  • As represented in FIG. 4, a local system's [0047] registry 104 is essentially a database indexed by a number of keys 106 1-106 k hierarchically arranged into trees and subtrees. As shown in FIG. 4, the keys (particularly the low level subtrees) typically have named data associated therewith including strings, binary values and/or DWORDs. As described above, legacy applications store configuration information in the local registry 104, and occasionally make changes thereto. For example, as shown in FIG. 4, an application named “Program2” has configuration information indexed at HKEY_LOCAL_MACHINE\SOFTWARE\Program2, including a string, a binary value and a DWORD.
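  • By way of illustration only, a legacy application such as "Program2" typically reads such configuration data with the ordinary WIN32 registry calls. The key path below follows FIG. 4, while the value name "Setting" is merely illustrative and is not part of the description above:
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HKEY  hKey;
        DWORD dwType = 0, dwValue = 0, cbValue = sizeof(dwValue);

        /* Open the application's configuration key shown in FIG. 4. */
        if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, "SOFTWARE\\Program2",
                          0, KEY_QUERY_VALUE, &hKey) != ERROR_SUCCESS) {
            printf("Program2 configuration key not found\n");
            return 1;
        }

        /* "Setting" is an illustrative DWORD value name. */
        if (RegQueryValueExA(hKey, "Setting", NULL, &dwType,
                             (LPBYTE)&dwValue, &cbValue) == ERROR_SUCCESS
            && dwType == REG_DWORD) {
            printf("Program2 Setting = %lu\n", (unsigned long)dwValue);
        }

        RegCloseKey(hKey);
        return 0;
    }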
  • FIG. 5 represents the general architecture for tracking and checkpointing changes to configuration information on a first system (e.g., [0048] 60 1), while FIG. 6 represents the general architecture for failing over the information to another system (e.g., 60 2) of the cluster 58. In general, whenever an application 96 is initially installed on any cluster machine, a list 108 of registry subtrees associated with that application 96 may be generated. As can be appreciated, this may be accomplished by noting the differences in the local registry key structure after the application 96 is installed. This list 108 is preferably stored in the cluster registry 110 of the quorum device 57 under a registry key for that resource; however, it may be maintained elsewhere in the cluster (such as replicated in its systems) if desired. In any event, whenever the application 96 is run, a checkpoint manager 112 accesses the list 108 and registers each subtree in the application's list of registry subtrees with a notification mechanism 114. The notification mechanism 114 watches the registry 104, and, whenever a change to a registered subtree is detected, informs the checkpoint manager 112 of the change. When notified of a change, the checkpoint manager 112, via a snapshot mechanism 116, takes a snapshot of the listed subtree data and records the snapshot as data 118 1-118 m associated with that application 96 (e.g., snapshot data 118 2) on the quorum device 57. The data may be stored as text (i.e., human readable) data.
  • More particularly, to accomplish the checkpointing operation, the [0049] checkpoint manager 112 is associated with an interface that includes three cluster resource controls which may be sent to a particular application resource's DLL (e.g., 102 3, FIG. 3) with a ClusterResourceControl function. A first resource control, CLCTL_ADD_REGISTRY_CHECKPOINT, includes a pointer named lpInBuffer, which points to a null-terminated Unicode string. The string specifies the name of the registry key at the root of the subtree that should be checkpointed for the specified resource. Since local application subtrees are stored under the HKEY_LOCAL_MACHINE key 106 3, the key name string is preferably shortened relative to HKEY_LOCAL_MACHINE, e.g., the exemplary application is simplified to “SOFTWARE\Program2.” Thus, this control function adds a subtree to the subtree list 108 that is associated with an application.
  • A second resource control, which essentially performs the opposite function, is named CLCTL_DELETE_REGISTRY_CHECKPOINT, and similarly includes a pointer, lpInBuffer, to a null-terminated Unicode string. This string specifies the name of a registry key that was previously registered with CLCTL_ADD_REGISTRY_CHECKPOINT. When called, the specified subtree pointed to by lpInBuffer will no longer be checkpointed for the specified resource. Lastly, a control function named CLCTL_GET_REGISTRY_CHECKPOINTS includes a pointer to a buffer named lpOutBuffer; when invoked, this control returns a REG_MULTI_SZ list of registry keys that have been added to the specified resource's [0050] list 108 with CLCTL_ADD_REGISTRY_CHECKPOINT.
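  • A minimal sketch of how a setup program might register a subtree using the first control described above is set forth below. The control name CLCTL_ADD_REGISTRY_CHECKPOINT is the one used in this description and is assumed to be supplied by the cluster service headers; the key name follows the "SOFTWARE\Program2" example above, and ClusterResourceControl is the standard cluster API entry point for sending resource controls.
    #include <windows.h>
    #include <clusapi.h>    /* ClusterResourceControl, HRESOURCE */

    /* CLCTL_ADD_REGISTRY_CHECKPOINT is assumed to be defined by the cluster
       service headers, as described above. */
    DWORD AddRegistryCheckpoint(HRESOURCE hResource)
    {
        /* Key name relative to HKEY_LOCAL_MACHINE, passed as a null-terminated
           Unicode string; the size passed includes the terminating NUL. */
        static const WCHAR keyName[] = L"SOFTWARE\\Program2";
        DWORD cbReturned = 0;

        return ClusterResourceControl(hResource,
                                      NULL,    /* let the owning node handle it */
                                      CLCTL_ADD_REGISTRY_CHECKPOINT,
                                      (LPVOID)keyName, sizeof(keyName),
                                      NULL, 0, &cbReturned);
    }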
  • Using this general-purpose checkpointing facility, each resource may have a list of [0051] registry subtrees 108 to checkpoint. To receive notifications when the application 96 changes its configuration information, the notification mechanism 114 preferably utilizes a WIN32 API named RegNotifyChangeKey( ), via which a registry notification will be posted on each of a resource's subtrees when that resource is online. When any registry data is modified in a subtree with a notification posted thereon, a notification fires and the snapshot mechanism 116 of the checkpoint manager 112 takes a snapshot of the registry subtree (or trees). To accomplish the snapshot, the snapshot mechanism 116 preferably utilizes the WIN32 API named RegSaveKey( ).
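  • The following sketch shows how such a notification may be posted on a single subtree and how a snapshot may be taken when it fires. It assumes the notification API referred to above is RegNotifyChangeKeyValue( ) (its name in the WIN32 headers) and that the calling process holds the backup privilege that RegSaveKey( ) requires; a production checkpoint manager would of course watch all listed subtrees asynchronously rather than block on one.
    #include <windows.h>

    /* Watch one checkpointed subtree and snapshot it to a file when it changes. */
    LONG WatchAndSnapshotOnce(const char *subKey, const char *snapshotFile)
    {
        HKEY   hKey;
        HANDLE hEvent;
        LONG   status;

        status = RegOpenKeyExA(HKEY_LOCAL_MACHINE, subKey, 0,
                               KEY_NOTIFY | KEY_READ, &hKey);
        if (status != ERROR_SUCCESS)
            return status;

        hEvent = CreateEventA(NULL, FALSE, FALSE, NULL);

        /* Post an asynchronous notification covering the whole subtree. */
        RegNotifyChangeKeyValue(hKey, TRUE,
                                REG_NOTIFY_CHANGE_NAME | REG_NOTIFY_CHANGE_LAST_SET,
                                hEvent, TRUE);

        /* Block until the application modifies something under the subtree. */
        WaitForSingleObject(hEvent, INFINITE);

        /* Snapshot the subtree; this requires the backup privilege, and the
           target file must not already exist. */
        status = RegSaveKeyA(hKey, snapshotFile, NULL);

        CloseHandle(hEvent);
        RegCloseKey(hKey);
        return status;
    }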
  • In keeping with the invention, to provide failover support, the snapshot data is saved to the [0052] quorum device 57, referenced by the resource ID (a globally unique identifier, or GUID) and a unique checkpoint ID, which is an arbitrary DWORD. The interface for saving the data to the quorum device is set forth in the table below:
    DWORD
    CpSaveData (
    IN PFM_RESOURCE Resource,
    IN DWORD dwCheckpointId,
    IN PVOID lpData,
    IN DWORD lpcbData
    )
  • The CpSaveData function checkpoints arbitrary data for the specified resource. The [0053] checkpointed data 118 2 is stored on the quorum device 57 to ensure that it survives temporal partitions, and so that any node in the cluster may save or retrieve the checkpointed data 118 2. The Resource argument supplies the resource associated with this data, while the dwCheckpointId argument provides a unique checkpoint identifier describing this data. The caller is responsible for ensuring the uniqueness of the checkpoint identifier. Another argument, lpData, supplies a pointer to the checkpoint data, while lpcbData provides the length (in bytes) of the checkpoint data pointed to by lpData. The function returns a value of ERROR_SUCCESS if successful, or a Win32 error code otherwise.
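  • A sketch of how a snapshot taken as described above might be handed to CpSaveData is given below. PFM_RESOURCE and CpSaveData are the internal interfaces just described; ReadWholeFile is a hypothetical helper that reads the RegSaveKey( ) output file into a LocalAlloc'd buffer, and is not part of the description above.
    /* Hypothetical helper: read an entire file into a LocalAlloc'd buffer. */
    DWORD ReadWholeFile(const char *path, PVOID *lpData, DWORD *lpcbData);

    /* Checkpoint a registry snapshot file for the given resource. The caller
       chooses dwCheckpointId and is responsible for its uniqueness. */
    DWORD CheckpointSnapshotFile(PFM_RESOURCE Resource, DWORD dwCheckpointId,
                                 const char *snapshotFile)
    {
        PVOID data   = NULL;
        DWORD cbData = 0;
        DWORD status = ReadWholeFile(snapshotFile, &data, &cbData);

        if (status != ERROR_SUCCESS)
            return status;

        /* Store the snapshot on the quorum device under (resource, checkpoint). */
        status = CpSaveData(Resource, dwCheckpointId, data, cbData);

        LocalFree(data);
        return status;
    }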
  • In accordance with another aspect of the present invention, once application configuration information is checkpointed (e.g., as the data [0054] 118 2) to the quorum device 57, the application configuration information may be restored to any other node of the cluster 58. Thus, to failover an application to another system 60 2, the checkpoint manager 120 on the other system 60 2 includes a restore mechanism 122 that essentially reverses the checkpointing operation. As represented in FIG. 6, when a resource 96 is failed over, but before it is brought online on another system, (as represented by the dashed box), its checkpointed registry data 118 2 is retrieved and restored into the other system's local registry 124.
  • To this end, another function, CpGetData( ), is provided to retrieve the checkpointed data for a specified [0055] resource 96, i.e., the data 118 2 which was saved to the quorum device 57 by CpSaveData( ). The CpGetData( ) function is set forth in the table below:
    DWORD
    CpGetData (
     IN PFM_RESOURCE Resource,
     IN DWORD dwCheckpointId,
     OUT PVOID *lpData,
     OUT DWORD *lpcbData
     )
  • In the present example with the CpGetData function, Resource identifies the [0056] resource 96 associated with this data 118 2, while dwCheckpointId supplies the unique checkpoint ID describing this data. The lpData argument returns a pointer to the checkpoint data, and lpcbData returns the length (in bytes) of the checkpoint data pointed to by lpData. The caller is responsible for freeing the memory, and as before, the caller is responsible for ensuring the uniqueness of the checkpoint identifier. The CpGetData function returns a value of ERROR_SUCCESS if successful, or a Win32 error code otherwise.
  • To restore the registry, the restore [0057] mechanism 122 utilizes the RegRestoreKey( ) WIN32 API for each checkpointed subtree. Once the other system's registry 124 is restored, the resource can be brought online, i.e., the failed over application 96 can be run. However, because this other system 60 2 may also fail, the application configuration information is also first tracked and checkpointed on the new system, in accordance with the present invention and as described above, i.e., using a notification mechanism 126.
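  • A sketch of the corresponding restore path is set forth below, under similar assumptions: PFM_RESOURCE, CpGetData and the checkpoint identifier are as described above, WriteWholeFile is a hypothetical helper, and the process is assumed to hold the restore privilege that RegRestoreKey( ) requires.
    /* Hypothetical helper: write a buffer out to a temporary file. */
    DWORD WriteWholeFile(const char *path, PVOID lpData, DWORD cbData);

    /* Retrieve a checkpointed subtree from the quorum device and restore it
       into the local registry before the resource is brought online. */
    DWORD RestoreCheckpointedSubtree(PFM_RESOURCE Resource, DWORD dwCheckpointId,
                                     const char *subKey, const char *tempFile)
    {
        PVOID data   = NULL;
        DWORD cbData = 0;
        HKEY  hKey;
        DWORD status = CpGetData(Resource, dwCheckpointId, &data, &cbData);

        if (status != ERROR_SUCCESS)
            return status;

        /* Spill the snapshot to a file so RegRestoreKey( ) can consume it. */
        status = WriteWholeFile(tempFile, data, cbData);
        LocalFree(data);       /* caller frees; a LocalAlloc'd buffer is assumed */
        if (status != ERROR_SUCCESS)
            return status;

        /* Overwrite the local subtree, discarding any stale data. */
        status = RegOpenKeyExA(HKEY_LOCAL_MACHINE, subKey, 0, KEY_ALL_ACCESS, &hKey);
        if (status != ERROR_SUCCESS)
            return status;
        status = RegRestoreKeyA(hKey, tempFile, REG_FORCE_RESTORE);
        RegCloseKey(hKey);
        return status;
    }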
  • Turning to an explanation of the operation of the invention with particular respect to the flow diagrams of FIGS. [0058] 7-11, the checkpointing operation is initiated when a request is received, either to create an initial registry checkpoint (step 700) for an application on the quorum device, or to run the application (step 706). In any event, at this time a cluster application (e.g., 96) and its associated registry subtree list 108 are known. If the request is to create a registry checkpoint (e.g., 118 2), then step 702 obtains the subtree (or subtrees) associated with the application from the list 108 thereof in the cluster registry 110, and continues to step 800 of FIG. 8. Alternatively, if the application 96 is to be run, any initial steps to run the application 96 (e.g., allocating space in memory) may be performed. Then, step 708 determines if a checkpoint 118 2 already exists for this particular application on the quorum device 57, and if so, continues on to update the application's configuration information and then run the application 96, as described below with reference to FIG. 9. If no checkpoint 118 2 exists for this resource 96, then step 708 branches to step 702 and then to step 800 of FIG. 8.
  • At [0059] step 800 of FIG. 8, to create a registry checkpoint 118 2, a first subtree associated with the application is selected and a snapshot is made of the specified registry subtree (using the RegSaveKey( ) WIN32 API) as described above. The registry checkpoint data 118 2 is then saved to the cluster quorum device 57 as also described above (CpSaveData). Note that it is possible to also generate the subtree list 108 associated with the application (using CLCTL_ADD_REGISTRY_CHECKPOINT) at this time, or the list 108 can be generated in advance (step 702). If there is more than one subtree of application configuration information for an application, the process is repeated for each subtree via steps 806-808. This ensures that the appropriate application configuration information will be available to other cluster systems if the current system fails, as the registry subtree and its location on the quorum device 57 are now associated with the cluster application.
  • When the application has been initially checkpointed, step [0060] 810 tests the state of the cluster application 96. If at step 810 the application 96 is not currently running, nothing further needs to be done, and thus the process ends and waits for the application to be run at some later time. Otherwise the system proceeds to step 1000 of FIG. 10, where the process will register for change notifications and take any remaining steps to run the application (step 1001) as described below.
  • The steps of FIG. 9 are executed when a request is received to run an application (e.g., [0061] 96) that has an existing checkpoint 118 2 on the quorum device 57. In general, before the application 96 is started, the checkpointing process enumerates all the registry checkpoints associated with the cluster application 96. To this end, for each checkpoint, via steps 900-908, each registry snapshot associated with the application (e.g., 118 2) is retrieved from its location on the quorum device 57 (using CpGetData), and restored into the current machine's (e.g., 60 2) local registry 124 using the RegRestoreKey( ) API. As a result, any previously existing data at that location in the current system's local registry 124 is overwritten with the stored registry snapshot 118 2, whereby the application 96 will not see any stale data that may have been in the current system's local registry 124.
  • Next, after each checkpoint has been restored into the [0062] local registry 124, the checkpoint manager 120 (via the notification mechanism 126) registers for registry change notifications associated with the registry subtree, using the WIN32 API named RegNotifyChangeKey( ) as described above. At this time, the application 96 is allowed to run.
  • As represented in FIG. 10, any subsequent modifications to the specified registry data alert the [0063] notification mechanism 126. The API preferably works asynchronously to report a change to the registry 124, although for purposes of simplicity, FIG. 10 represents the monitoring for changes (or detecting the end of the application) in a loop (steps 1002-1008). In any event, when a change is detected as represented by step 1002, at step 1004, the checkpoint manager 120 takes a snapshot of the registry subtree that has changed as described above. Then, at step 1006, the existing registry checkpoint data 118 2 on the quorum device 57 is overwritten with the new snapshot of the registry subtree. Note that in a preferred embodiment, the communication mechanism of the current system 60 2 transfers this information to the system that has exclusive possession of the quorum device 57, which then writes the data. In this manner, each time that the registry data 118 2 is modified, the appropriate subtree is copied to the quorum device 57, whereby if the application is moved to another node, the configuration information is current on the new node.
  • FIG. 11 represents the steps taken when an application ends. As shown by steps [0064] 1102-1110, any registry change notifications associated with that application are removed so as to no longer fire upon a change. This is synchronized in such a way as to ensure that any registry modifications pending during the application shutdown are detected by the notification mechanism 126 and a new snapshot taken. Then, the shutdown of the application is completed at step 1112.
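  • By way of a non-limiting sketch of this shutdown path, the following assumes a hypothetical helper TakeSnapshotAndSave that wraps the RegSaveKey( )/CpSaveData( ) sequence sketched above:
    /* Hypothetical helper: snapshot the subtree and overwrite its checkpoint. */
    DWORD TakeSnapshotAndSave(PFM_RESOURCE Resource, DWORD dwCheckpointId);

    /* Remove the registry watch for an application that is shutting down and
       fold any last-moment registry writes into the checkpoint. */
    void OnApplicationShutdown(HKEY hWatchedKey, HANDLE hNotifyEvent,
                               PFM_RESOURCE Resource, DWORD dwCheckpointId)
    {
        /* Closing the watched key cancels its pending change notification. */
        RegCloseKey(hWatchedKey);
        CloseHandle(hNotifyEvent);

        /* Take one final snapshot so a modification racing the shutdown is
           still captured before the resource goes offline. */
        TakeSnapshotAndSave(Resource, dwCheckpointId);
    }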
  • Lastly, as can be appreciated, instead of using the shared [0065] quorum device 57, the checkpoint manager 112 alternatively may write the information to at least one other non-volatile storage device shared by systems in the cluster. In another alternative, the checkpoint manager 112 may cause the information to be replicated via GLUP or some other communications mechanism to the other systems of the cluster. Note that such a replication operation would be logged on the quorum device 57, so that changes to the configuration information would survive a temporal partition. Moreover, rather than snapshot the entire set of subtrees, it is feasible to alternatively provide a mechanism that transfers only change information, for example if the subtree data is otherwise relatively large.
  • As can be seen from the foregoing detailed description, there is provided a method and system for transparently failing over resource configuration information stored by an application on a local machine. The application configuration information written to a registry of a local machine is made available to other machines of the cluster. The other machines can rapidly obtain this application configuration information and use it to recreate the application's execution environment on another machine in the cluster, ensuring a rapid and transparent failover operation. [0066]
  • While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention. [0067]

Claims (21)

What is claimed is:
1. In a server cluster including at least two systems, a method of failing over an application from a first system to a second system of the cluster, comprising the steps of, locally maintaining application configuration information for the application on the first system, running the application on the first system, detecting a change to the application configuration information, and, in response to the change, making data representative of the change available to the second system, and running the application on the second system using the data made available thereto.
2. The method of
claim 1
wherein the application configuration information is maintained in a registry of the first system, and wherein the step of detecting a change to the application configuration information includes the step of monitoring for a change to data in at least one subtree associated with the application in the registry.
3. The method of
claim 2
wherein the step of making data representative of the change available to the second system includes the step of making a copy of the data in each subtree having a change detected thereto.
4. The method of
claim 1
wherein the step of making data representative of the change available to the second system comprises the step of writing the data to a storage device shared by systems of the cluster.
5. The method of
claim 4
wherein the step of making data representative of the change available to the second system further comprises the step of retrieving the data from the storage device and passing the data to the second system.
6. The method of
claim 1
wherein the step of making data representative of the change available to other systems in the cluster comprises the step of storing the data in a quorum device of the cluster.
7. The method of
claim 1
wherein the step of making data representative of the change available to other systems in the cluster comprises the step of communicating the data to at least one other system in the cluster.
8. The method of
claim 1
wherein the first system and the second system each locally maintain application configuration information for the application in a registry, and wherein the step of making data representative of the change available to the second system comprises the steps of, reading subtree data of the registry in the first system, writing the subtree data to a storage device shared by systems in the cluster, and retrieving the subtree data from the storage device to the registry of the second system.
9. The method of
claim 1
wherein the application has a list of subtrees associated therewith, and further comprising the step of registering each of the subtrees in the list with a notification mechanism for detecting changes thereto.
10. The method of
claim 1
further comprising the step of terminating the application on the first system.
11. The method of
claim 1
wherein the cluster includes a third system, and further comprising the steps of locally maintaining application configuration information for another application on the first system, running the other application on the first system, detecting a change to the other application configuration information, and, in response to the change, making data representative of the change available to the third system, and running the other application on the third system using the data made available thereto.
12. In a server cluster including at least two systems, a system for failing over an application from a first system to a second system of the cluster, comprising, a registry on each of the first and second systems for storing application configuration information of the application, a storage device shared by the first and second systems, a notification mechanism in the first system for detecting a change to a subtree in the registry associated with the application and providing a notification in response thereto, a snapshot mechanism in the first system responsive to the notification for reading the registry and saving subtree data to the storage device, and a restore mechanism in the second system for retrieving the subtree data from the storage device and updating the registry of the second system therewith.
13. The system of
claim 12
wherein the application has a list of subtrees associated therewith, and wherein the notification mechanism monitors each of the subtrees in the list for detecting changes thereto.
14. The system of
claim 13
wherein the list of subtrees is stored on the storage device.
15. The system of
claim 13
wherein the list of subtrees is stored in a cluster registry on the storage device.
16. The system of
claim 12
wherein the subtree data includes a name representative of a key in the registry and at least one value.
17. In a server cluster, a method of using application configuration information with an application, comprising the steps of:
locally maintaining application configuration information for the application on a system of the cluster;
determining if a cluster checkpoint of data corresponding to the application configuration information for the application is present on a storage device shared by systems in the cluster; and
if the cluster checkpoint exists,
updating the application configuration information of the local system with the data in the storage device,
running the application with the updated application configuration information, and
updating the cluster checkpoint to correspond to local changes to the application configuration information; and
if the checkpoint does not exist,
creating a cluster checkpoint on the storage device,
running the application with the locally maintained application configuration information, and
updating the cluster checkpoint to correspond to local changes to the application configuration information.
18. The method of
claim 17
wherein the application configuration information is maintained in a local registry of the system, and further comprising the step of detecting a change to the application configuration information by monitoring for a change to data of at least one subtree in the registry.
19. In a server cluster including at least two systems, a method of failing over an application from a first system to a second system of the cluster, comprising the steps of, maintaining application configuration information for the application in a registry of the first system, running the application on the first system, detecting a change to the application configuration information in a subtree of the registry, and, in response to the change, writing data of that subtree as subtree data to a storage device shared by systems of the cluster, terminating the application on the first system, reading the subtree data, modifying a registry of the second system with the subtree data, and running the application on the second system using the application configuration information stored in the registry of the second system.
20. The method of
claim 18
wherein the application has a list of subtrees associated therewith, and further comprising the step of registering each of the subtrees in the list with a notification mechanism for detecting changes thereto.
21. The method of
claim 19
wherein the cluster includes a third system, and further comprising the steps of maintaining other application configuration information for another application in the registry of the first system, running the other application on the first system, detecting a change to the other application configuration information in a subtree of the registry, and, in response to the change, writing data of that subtree as subtree data to a storage device shared by systems of the cluster, terminating the other application on the first system, reading the subtree data, modifying a registry of the third system with the subtree data, and running the other application on the third system using the other application configuration information stored in the registry of the third system.
US09/061,857 1998-04-17 1998-04-17 Method and system for transparently failing over application configuration information in a server cluster Expired - Lifetime US6360331B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/061,857 US6360331B2 (en) 1998-04-17 1998-04-17 Method and system for transparently failing over application configuration information in a server cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/061,857 US6360331B2 (en) 1998-04-17 1998-04-17 Method and system for transparently failing over application configuration information in a server cluster

Publications (2)

Publication Number Publication Date
US20010008019A1 true US20010008019A1 (en) 2001-07-12
US6360331B2 US6360331B2 (en) 2002-03-19

Family

ID=22038599

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/061,857 Expired - Lifetime US6360331B2 (en) 1998-04-17 1998-04-17 Method and system for transparently failing over application configuration information in a server cluster

Country Status (1)

Country Link
US (1) US6360331B2 (en)

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6647473B1 (en) * 2000-02-16 2003-11-11 Microsoft Corporation Kernel-based crash-consistency coordinator
FR2843209A1 (en) * 2002-08-02 2004-02-06 Cimai Technology Software application mirroring method for replication of a software application in different nodes of a computer cluster to provide seamless continuity to client computers in the case of failure of an application server
US6732289B1 (en) * 2000-08-31 2004-05-04 Sun Microsystems, Inc. Fault tolerant data storage system
US20040153558A1 (en) * 2002-10-31 2004-08-05 Mesut Gunduc System and method for providing java based high availability clustering framework
US20040199611A1 (en) * 2002-11-25 2004-10-07 Sven Bernhard Method and system for remote configuration of network devices
US20040230687A1 (en) * 2003-04-28 2004-11-18 Tomonori Nakamura Service management system, and method, communications unit and integrated circuit for use in such system
US6857082B1 (en) * 2000-11-21 2005-02-15 Unisys Corporation Method for providing a transition from one server to another server clustered together
US20050050084A1 (en) * 2003-08-29 2005-03-03 Atm Shafiqul Khalid Dynamic registry partitioning
US20050198102A1 (en) * 2001-09-25 2005-09-08 Sun Microsystems, Inc. Method for dynamic optimization of multiplexed resource partitions
US20060026463A1 (en) * 2004-07-28 2006-02-02 Oracle International Corporation, (A California Corporation) Methods and systems for validating a system environment
US20060037016A1 (en) * 2004-07-28 2006-02-16 Oracle International Corporation Methods and systems for modifying nodes in a cluster environment
WO2006085028A2 (en) * 2005-02-11 2006-08-17 Airbus France Test flight on-board processing system and method
US20060215564A1 (en) * 2005-03-23 2006-09-28 International Business Machines Corporation Root-cause analysis of network performance problems
US20070124347A1 (en) * 2005-11-30 2007-05-31 Oracle International Corporation Database system configured for automatic failover with no data loss
US20070288903A1 (en) * 2004-07-28 2007-12-13 Oracle International Corporation Automated treatment of system and application validation failures
WO2008058230A2 (en) 2006-11-08 2008-05-15 Archivas, Inc. Fast primary cluster recovery
US20080126845A1 (en) * 2005-11-30 2008-05-29 Oracle International Corporation Automatic failover configuration with lightweight observer
US7640454B1 (en) * 2004-06-28 2009-12-29 Symantec Operating Corporation System and method for point-in-time recovery of application resource sets
US20100250867A1 (en) * 2009-03-30 2010-09-30 The Boeing Company Computer architectures using shared storage
US20110214007A1 (en) * 2000-03-16 2011-09-01 Silicon Graphics, Inc. Flexible failover policies in high availability computing systems
US20110213753A1 (en) * 2010-02-26 2011-09-01 Symantec Corporation Systems and Methods for Managing Application Availability
US20110238813A1 (en) * 1999-03-26 2011-09-29 Microsoft Corporation Consistent cluster operational data in a server cluster using a quorum of replicas
US20130159528A1 (en) * 2011-12-15 2013-06-20 Microsoft Corporation Failover based application resource acquisition
US20130268495A1 (en) * 2012-04-09 2013-10-10 Microsoft Corporation Split brain protection in computer clusters
US8732162B2 (en) 2006-02-15 2014-05-20 Sony Computer Entertainment America Llc Systems and methods for server management
US8938639B1 (en) * 2012-02-24 2015-01-20 Symantec Corporation Systems and methods for performing fast failovers
US9098462B1 (en) 2010-09-14 2015-08-04 The Boeing Company Communications via shared memory
US9329953B2 (en) 2010-12-07 2016-05-03 International Business Machines Corporation Reducing application downtime during failover
CN106209450A (en) * 2016-07-08 2016-12-07 深圳前海微众银行股份有限公司 Server failure changing method and application automatization deployment system
US20170078439A1 (en) * 2015-09-15 2017-03-16 International Business Machines Corporation Tie-breaking for high availability clusters
US20170132100A1 (en) * 2015-11-10 2017-05-11 International Business Machines Corporation Smart selection of a storage module to be excluded
US9817739B1 (en) * 2012-10-31 2017-11-14 Veritas Technologies Llc Method to restore a virtual environment based on a state of applications/tiers
US10038702B2 (en) * 2014-12-15 2018-07-31 Sophos Limited Server drift monitoring
US10296425B2 (en) * 2017-04-20 2019-05-21 Bank Of America Corporation Optimizing data processing across server clusters and data centers using checkpoint-based data replication
US20190196718A1 (en) * 2017-12-21 2019-06-27 Apple Inc. Techniques for facilitating processing checkpoints between computing devices
US10721335B2 (en) * 2018-08-01 2020-07-21 Hewlett Packard Enterprise Development Lp Remote procedure call using quorum state store
US20210081280A1 (en) * 2019-09-12 2021-03-18 restorVault Virtual replication of unstructured data
US10970177B2 (en) * 2017-08-18 2021-04-06 Brian J. Bulkowski Methods and systems of managing consistency and availability tradeoffs in a real-time operational DBMS

Families Citing this family (98)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6427163B1 (en) * 1998-07-10 2002-07-30 International Business Machines Corporation Highly scalable and highly available cluster system management scheme
US6401120B1 (en) * 1999-03-26 2002-06-04 Microsoft Corporation Method and system for consistent cluster operational data in a server cluster using a quorum of replicas
US6453426B1 (en) * 1999-03-26 2002-09-17 Microsoft Corporation Separately storing core boot data and cluster configuration data in a server cluster
US6594779B1 (en) * 1999-03-30 2003-07-15 International Business Machines Corporation Method, system and program products for managing the checkpointing/restarting of resources of a computing environment
US6587860B1 (en) * 1999-03-31 2003-07-01 International Business Machines Corporation Apparatus and method for tracking access to data resources in a cluster environment
US7756830B1 (en) 1999-03-31 2010-07-13 International Business Machines Corporation Error detection protocol
US6925513B1 (en) * 1999-05-04 2005-08-02 Apple Computer, Inc. USB device notification
US6871222B1 (en) 1999-05-28 2005-03-22 Oracle International Corporation Quorumless cluster using disk-based messaging
US7020695B1 (en) * 1999-05-28 2006-03-28 Oracle International Corporation Using a cluster-wide shared repository to provide the latest consistent definition of the cluster (avoiding the partition-in time problem)
US7076783B1 (en) 1999-05-28 2006-07-11 Oracle International Corporation Providing figure of merit vote from application executing on a partitioned cluster
US6601190B1 (en) * 1999-10-28 2003-07-29 Hewlett-Packard Development Company, L.P. Automatic capture and reporting of computer configuration data
US6662219B1 (en) 1999-12-15 2003-12-09 Microsoft Corporation System for determining at subgroup of nodes relative weight to represent cluster by obtaining exclusive possession of quorum resource
US6874035B1 (en) * 2000-02-02 2005-03-29 Storage Technology Corporation System and methods for transforming data from a source to target platform using snapshot
US6671688B1 (en) * 2000-02-10 2003-12-30 Novell, Inc. Virtual replication for a computer directory system
US20020087949A1 (en) * 2000-03-03 2002-07-04 Valery Golender System and method for software diagnostics using a combination of visual and dynamic tracing
US6898727B1 (en) * 2000-03-22 2005-05-24 Emc Corporation Method and apparatus for providing host resources for an electronic commerce site
US6934269B1 (en) * 2000-04-24 2005-08-23 Microsoft Corporation System for networked component address and logical network formation and maintenance
WO2001084314A2 (en) * 2000-05-02 2001-11-08 Sun Microsystem, Inc. Method and system for providing cluster replicated checkpoint services
AU2001261141A1 (en) * 2000-05-02 2001-11-12 Sun Microsystems, Inc. Method and system for achieving high availability in a networked computer system
US6725261B1 (en) * 2000-05-31 2004-04-20 International Business Machines Corporation Method, system and program products for automatically configuring clusters of a computing environment
WO2002001410A1 (en) * 2000-06-26 2002-01-03 International Business Machines Corporation Data management application programming interface for a parallel file system
US6990606B2 (en) * 2000-07-28 2006-01-24 International Business Machines Corporation Cascading failover of a data management application for shared disk file systems in loosely coupled node clusters
US8312435B2 (en) 2000-12-26 2012-11-13 Identify Software Ltd. (IL) System and method for conditional tracing of computer programs
US20020107966A1 (en) * 2001-02-06 2002-08-08 Jacques Baudot Method and system for maintaining connections in a network
US6990602B1 (en) * 2001-08-23 2006-01-24 Unisys Corporation Method for diagnosing hardware configuration in a clustered system
US7277952B2 (en) * 2001-09-28 2007-10-02 Microsoft Corporation Distributed system resource protection via arbitration and ownership
DE60106467T2 (en) * 2001-12-14 2006-02-23 Hewlett-Packard Development Co., L.P., Houston Method, system and computer program for installing monitoring agents for objects in an IT network
US7000103B2 (en) * 2001-12-21 2006-02-14 Inventec Corporation Method for updating a system BIOS by reading a BIOS stored in an IDE-interface connected to a hard disk drive
US7206817B2 (en) * 2002-01-18 2007-04-17 Bea Systems, Inc. Systems and methods for application management and deployment
US7228326B2 (en) * 2002-01-18 2007-06-05 Bea Systems, Inc. Systems and methods for application deployment
US20030140100A1 (en) * 2002-01-18 2003-07-24 Sam Pullara System and method for URL response caching and filtering in servlets and application servers
WO2003073209A2 (en) 2002-02-22 2003-09-04 Bea Systems, Inc. System and method for software application scoping
US7406039B2 (en) * 2002-02-26 2008-07-29 Dell Products L.P. System and method for a failover protocol in storage area network controllers
US7228344B2 (en) * 2002-03-13 2007-06-05 Hewlett-Packard Development Company, L.P. High availability enhancement for servers using structured query language (SQL)
US7120834B1 (en) 2002-03-29 2006-10-10 Marvell International Ltd. Fast port failover in a network switch
US7606920B2 (en) * 2002-05-17 2009-10-20 Sony Computer Entertainment America Inc. Method and apparatus for controlling communication ports for an online session of a multi-user application by associating each of the ports with a protocol and designating an active port
US7124171B1 (en) * 2002-05-23 2006-10-17 Emc Corporation In a networked computing cluster storage system and plurality of servers sharing files, in the event of server unavailability, transferring a floating IP network address from first server to second server to access area of data
US7302692B2 (en) * 2002-05-31 2007-11-27 International Business Machines Corporation Locally providing globally consistent information to communications layers
US7010717B2 (en) * 2002-07-29 2006-03-07 Hewlett-Packard Development Company, L.P. Facility creation process for clustered servers
US7124320B1 (en) * 2002-08-06 2006-10-17 Novell, Inc. Cluster failover via distributed configuration repository
FI119407B (en) * 2002-08-28 2008-10-31 Sap Ag A high-quality software-based contact server
JP2004086721A (en) * 2002-08-28 2004-03-18 Nec Corp Data reproducing system, relay system, data transmission/receiving method, and program for reproducing data in storage
US7194445B2 (en) * 2002-09-20 2007-03-20 Lenovo (Singapore) Pte. Ltd. Adaptive problem determination and recovery in a computer system
US7043419B2 (en) * 2002-09-20 2006-05-09 International Business Machines Corporation Method and apparatus for publishing and monitoring entities providing services in a distributed data processing system
US20040060054A1 (en) * 2002-09-20 2004-03-25 International Business Machines Corporation Composition service for autonomic computing
US7216343B2 (en) * 2002-09-20 2007-05-08 International Business Machines Corporation Method and apparatus for automatic updating and testing of software
US20040059704A1 (en) * 2002-09-20 2004-03-25 International Business Machines Corporation Self-managing computing system
US7363534B1 (en) * 2002-09-30 2008-04-22 Cisco Technology, Inc. Method and system for stateful switch-over in a high-availability point to point system
US7386839B1 (en) 2002-11-06 2008-06-10 Valery Golender System and method for troubleshooting software configuration problems using application tracing
US7055052B2 (en) * 2002-11-21 2006-05-30 International Business Machines Corporation Self healing grid architecture for decentralized component-based systems
US8032866B1 (en) 2003-03-27 2011-10-04 Identify Software Ltd. System and method for troubleshooting runtime software problems using application learning
US7739543B1 (en) * 2003-04-23 2010-06-15 Netapp, Inc. System and method for transport-level failover for loosely coupled iSCSI target devices
US7603442B2 (en) * 2003-06-20 2009-10-13 Microsoft Corporation Method and system for maintaining service dependency relationships in a computer system
US7406694B2 (en) * 2003-06-20 2008-07-29 Microsoft Corporation Method and system for tracking kernel resource usage
JP4321705B2 (en) * 2003-07-29 2009-08-26 株式会社日立製作所 Apparatus and storage system for controlling acquisition of snapshot
US20050038882A1 (en) * 2003-08-13 2005-02-17 Huscher Anthony A. Automated eRoom archive tool and method
US7743381B1 (en) * 2003-09-16 2010-06-22 Symantec Operating Corporation Checkpoint service
US7543174B1 (en) * 2003-09-24 2009-06-02 Symantec Operating Corporation Providing high availability for an application by rapidly provisioning a node and failing over to the node
FI20031628A0 (en) * 2003-11-10 2003-11-10 Nokia Corp Computer system, computer unit and method for controlling access to memory between computer units
US20050125557A1 (en) * 2003-12-08 2005-06-09 Dell Products L.P. Transaction transfer during a failover of a cluster controller
US20050132379A1 (en) * 2003-12-11 2005-06-16 Dell Products L.P. Method, system and software for allocating information handling system resources in response to high availability cluster fail-over events
US7234075B2 (en) * 2003-12-30 2007-06-19 Dell Products L.P. Distributed failover aware storage area network backup of application data in an active-N high availability cluster
US20050198652A1 (en) * 2004-01-13 2005-09-08 Huscher Anthony A. Dynamic link library (DLL) for providing server enhancements
US8020034B1 (en) 2004-03-12 2011-09-13 Microsoft Corporation Dependency filter object
US20050210152A1 (en) * 2004-03-17 2005-09-22 Microsoft Corporation Providing availability information using a distributed cache arrangement and updating the caches using peer-to-peer synchronization strategies
US20050267920A1 (en) * 2004-05-13 2005-12-01 Fabrice Helliker System and method for archiving data in a clustered environment
US20050283636A1 (en) * 2004-05-14 2005-12-22 Dell Products L.P. System and method for failure recovery in a cluster network
US9626173B1 (en) 2004-06-08 2017-04-18 Sap Se Non specification supported application deployment descriptors and web application deployment descriptors
US7827539B1 (en) 2004-06-25 2010-11-02 Identify Software Ltd. System and method for automated tuning of program execution tracing
US7827154B1 (en) * 2004-10-05 2010-11-02 Symantec Operating Corporation Application failure diagnosis
US7895591B2 (en) * 2004-10-21 2011-02-22 Oracle International Corp. File deployment system and method
US7320088B1 (en) * 2004-12-28 2008-01-15 Veritas Operating Corporation System and method to automate replication in a clustered environment
US7698390B1 (en) * 2005-03-29 2010-04-13 Oracle America, Inc. Pluggable device specific components and interfaces supported by cluster devices and systems and methods for implementing the same
US7779295B1 (en) * 2005-06-28 2010-08-17 Symantec Operating Corporation Method and apparatus for creating and using persistent images of distributed shared memory segments and in-memory checkpoints
US7548923B2 (en) * 2007-01-05 2009-06-16 Microsoft Corporation Sync configuration and discovery support
US7689862B1 (en) * 2007-01-23 2010-03-30 Emc Corporation Application failover in a cluster environment
US7757116B2 (en) * 2007-04-04 2010-07-13 Vision Solutions, Inc. Method and system for coordinated multiple cluster failover
US8326805B1 (en) * 2007-09-28 2012-12-04 Emc Corporation High-availability file archiving
US8918603B1 (en) 2007-09-28 2014-12-23 Emc Corporation Storage of file archiving metadata
US8060709B1 (en) 2007-09-28 2011-11-15 Emc Corporation Control of storage volumes in file archiving
US7849223B2 (en) * 2007-12-07 2010-12-07 Microsoft Corporation Virtually synchronous Paxos
US9189250B2 (en) * 2008-01-16 2015-11-17 Honeywell International Inc. Method and system for re-invoking displays
US9454444B1 (en) 2009-03-19 2016-09-27 Veritas Technologies Llc Using location tracking of cluster nodes to avoid single points of failure
JP5352299B2 (en) * 2009-03-19 2013-11-27 株式会社日立製作所 High reliability computer system and configuration method thereof
US8458515B1 (en) 2009-11-16 2013-06-04 Symantec Corporation Raid5 recovery in a high availability object based file system
US20110191627A1 (en) * 2010-01-29 2011-08-04 Maarten Koning System And Method for Handling a Failover Event
US8190947B1 (en) * 2010-02-11 2012-05-29 Network Appliance, Inc. Method and system for automatically constructing a replica catalog for maintaining protection relationship information between primary and secondary storage objects in a network storage system
US8874961B2 (en) * 2010-03-22 2014-10-28 Infosys Limited Method and system for automatic failover of distributed query processing using distributed shared memory
US9600315B2 (en) * 2010-10-22 2017-03-21 Netapp, Inc. Seamless takeover of a stateful protocol session in a virtual machine environment
US9059898B2 (en) 2010-12-07 2015-06-16 General Electric Company System and method for tracking configuration changes in enterprise product
US8495323B1 (en) 2010-12-07 2013-07-23 Symantec Corporation Method and system of providing exclusive and secure access to virtual storage objects in a virtual machine cluster
CN102685728B (en) * 2011-03-14 2014-10-08 鸿富锦精密工业(深圳)有限公司 WiMAX (Worldwide Interoperability for Microwave Access) client and parameter setting method thereof
JP5959733B2 (en) * 2013-04-23 2016-08-02 株式会社日立製作所 Storage system and storage system failure management method
CN103401798B (en) * 2013-07-30 2016-12-28 北京京东尚科信息技术有限公司 Multi-node communication method and device
US9632803B2 (en) 2013-12-05 2017-04-25 Red Hat, Inc. Managing configuration states in an application server
US10031933B2 (en) 2014-03-02 2018-07-24 Netapp, Inc. Peer to peer ownership negotiation
EP2937785B1 (en) * 2014-04-25 2016-08-24 Fujitsu Limited A method of recovering application data
US10466984B2 (en) 2017-05-01 2019-11-05 At&T Intellectual Property I, L.P. Identifying and associating computer assets impacted by potential change to a particular computer asset

Family Cites Families (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4736393A (en) 1986-04-16 1988-04-05 American Telephone And Telegraph Co., At&T Information Systems, Inc. Distributed timing control for a distributed digital communication system
US5165018A (en) 1987-01-05 1992-11-17 Motorola, Inc. Self-configuration of nodes in a distributed message-based operating system
US5021949A (en) 1988-02-29 1991-06-04 International Business Machines Corporation Method and apparatus for linking an SNA host to a remote SNA host over a packet switched communications network
US5027269A (en) 1989-04-27 1991-06-25 International Business Machines Corporation Method and apparatus for providing continuous availability of applications in a computer network
US5117352A (en) 1989-10-20 1992-05-26 Digital Equipment Corporation Mechanism for fail-over notification
JPH03164837A (en) 1989-11-22 1991-07-16 Hitachi Ltd Spare switching system for communication control processor
US5128885A (en) 1990-02-23 1992-07-07 International Business Machines Corporation Method for automatic generation of document history log exception reports in a data processing system
US5301337A (en) 1990-04-06 1994-04-05 Bolt Beranek And Newman Inc. Distributed resource management system using hashing operation to direct resource request from different processors to the processor controlling the requested resource
US5416777A (en) 1991-04-10 1995-05-16 California Institute Of Technology High speed polling protocol for multiple node network
US5341372A (en) 1991-04-10 1994-08-23 California Institute Of Technology Protocol for multiple node network
AU1893392A (en) 1991-05-03 1992-12-21 Storage Technology Corporation Knowledge based resource management
JPH0575628A (en) 1991-09-13 1993-03-26 Fuji Xerox Co Ltd Network resource monitor system
US5423037A (en) 1992-03-17 1995-06-06 Teleserve Transaction Technology As Continuously available database server having multiple groups of nodes, each group maintaining a database copy with fragments stored on multiple nodes
JP2721294B2 (en) 1993-01-29 1998-03-04 本田技研工業株式会社 Online monitoring system for computer systems
DE4497149T1 (en) 1993-09-24 1996-10-17 Oracle Corp Method and device for replicating data
GB9320641D0 (en) 1993-10-07 1993-11-24 British Telecomm Networks with distributed restoration
US5435003A (en) 1993-10-07 1995-07-18 British Telecommunications Public Limited Company Restoration in communications networks
US5745669A (en) * 1993-10-21 1998-04-28 Ast Research, Inc. System and method for recovering PC configurations
US5491800A (en) 1993-12-20 1996-02-13 Taligent, Inc. Object-oriented remote procedure call networking system
US5710727A (en) 1994-05-04 1998-01-20 National Instruments Corporation System and method for creating resources in an instrumentation system
US5490270A (en) 1994-06-16 1996-02-06 International Business Machines Corporation Simultaneous updates to the modification time attribute of a shared file in a cluster having a server and client nodes
US5634010A (en) * 1994-10-21 1997-05-27 Modulus Technologies, Inc. Managing and distributing data objects of different types between computers connected to a network
US5757642A (en) 1995-01-20 1998-05-26 Dell Usa L.P. Multi-function server input/output subsystem and method
US5666538A (en) * 1995-06-07 1997-09-09 Ast Research, Inc. Disk power manager for network servers
US5666486A (en) * 1995-06-23 1997-09-09 Data General Corporation Multiprocessor cluster membership manager framework
AU6678096A (en) 1995-07-20 1997-02-18 Novell, Inc. Transaction synchronization in a disconnectable computer and network
JPH09149076A (en) * 1995-09-22 1997-06-06 Canon Inc Data communication equipment and method
US6047323A (en) * 1995-10-19 2000-04-04 Hewlett-Packard Company Creation and migration of distributed streams in clusters of networked computers
US5815649A (en) 1995-10-20 1998-09-29 Stratus Computer, Inc. Distributed fault tolerant digital data storage subsystem for fault tolerant computer system
US5819019A (en) 1995-12-01 1998-10-06 Silicon Graphics, Inc. System/method for recovering network resources in a distributed environment, via registered callbacks
US5982747A (en) 1995-12-28 1999-11-09 Dynarc Inc. Method for managing failures on dynamic synchronous transfer mode dual ring topologies
US5754752A (en) 1996-03-28 1998-05-19 Tandem Computers Incorporated End-to-end session recovery
US5781737A (en) 1996-04-30 1998-07-14 International Business Machines Corporation System for processing requests for notice of events
US5768523A (en) 1996-04-30 1998-06-16 International Business Machines Corporation Program product for processing requests for notice of events
US5768524A (en) 1996-04-30 1998-06-16 International Business Machines Corporation Method for processing requests for notice of events
US5940870A (en) 1996-05-21 1999-08-17 Industrial Technology Research Institute Address translation for shared-memory multiprocessor clustering
US5852724A (en) * 1996-06-18 1998-12-22 Veritas Software Corp. System and method for "N" primary servers to fail over to "1" secondary server
US5832514A (en) 1996-06-26 1998-11-03 Microsoft Corporation System and method for discovery based data recovery in a store and forward replication process
US5805839A (en) 1996-07-02 1998-09-08 Advanced Micro Devices, Inc. Efficient technique for implementing broadcasts on a system of hierarchical buses
US5754877A (en) 1996-07-02 1998-05-19 Sun Microsystems, Inc. Extended symmetrical multiprocessor architecture
US5794253A (en) 1996-07-12 1998-08-11 Microsoft Corporation Time based expiration of data objects in a store and forward replication enterprise
US5787247A (en) 1996-07-12 1998-07-28 Microsoft Corporation Replica administration without data loss in a store and forward replication enterprise
US5919247A (en) * 1996-07-24 1999-07-06 Marimba, Inc. Method for the distribution of code and data updates
US6044367A (en) * 1996-08-02 2000-03-28 Hewlett-Packard Company Distributed I/O store
JP2933021B2 (en) 1996-08-20 1999-08-09 日本電気株式会社 Communication network failure recovery method
US5963960A (en) 1996-10-29 1999-10-05 Oracle Corporation Method and apparatus for queuing updates in a computer system
US5867714A (en) * 1996-10-31 1999-02-02 Ncr Corporation System and method for distributing configuration-dependent software revisions to a computer system
JPH10161916A (en) * 1996-11-28 1998-06-19 Hitachi Ltd Detection of update conflicts accompanying database replication
US5968140A (en) * 1997-01-02 1999-10-19 Intel Corporation System for configuring a device where stored configuration information is asserted at a first time and external operational data is asserted at a second time
US5935230A (en) * 1997-02-07 1999-08-10 Amiga Development, Llc Multiprocessor arrangement including bus arbitration scheme involving plural CPU clusters that address each other as "phantom" CPUs
US6134673A (en) * 1997-05-13 2000-10-17 Micron Electronics, Inc. Method for clustering software applications
US6003075A (en) 1997-07-07 1999-12-14 International Business Machines Corporation Enqueuing a configuration change in a network cluster and restore a prior configuration in a back up storage in reverse sequence ordered
US5968121A (en) * 1997-08-13 1999-10-19 Microsoft Corporation Method and apparatus for representing and applying network topological data
US5991893A (en) 1997-08-29 1999-11-23 Hewlett-Packard Company Virtually reliable shared memory
US6173420B1 (en) * 1997-10-31 2001-01-09 Oracle Corporation Method and apparatus for fail safe configuration
US6195760B1 (en) * 1998-07-20 2001-02-27 Lucent Technologies Inc Method and apparatus for providing failure detection and recovery with predetermined degree of replication for distributed applications in a network

Cited By (92)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110238813A1 (en) * 1999-03-26 2011-09-29 Microsoft Corporation Consistent cluster operational data in a server cluster using a quorum of replicas
US20110238842A1 (en) * 1999-03-26 2011-09-29 Microsoft Corporation Consistent cluster operational data in a server cluster using a quorum of replicas
US8850007B2 (en) 1999-03-26 2014-09-30 Microsoft Corporation Consistent cluster operational data in a server cluster using a quorum of replicas
US8850018B2 (en) * 1999-03-26 2014-09-30 Microsoft Corporation Consistent cluster operational data in a server cluster using a quorum of replicas
US6647473B1 (en) * 2000-02-16 2003-11-11 Microsoft Corporation Kernel-based crash-consistency coordinator
US20140281675A1 (en) * 2000-03-16 2014-09-18 Sony Computer Entertainment America Llc Flexible failover policies in high availability computing systems
US20110214007A1 (en) * 2000-03-16 2011-09-01 Silicon Graphics, Inc. Flexible failover policies in high availability computing systems
US8769132B2 (en) * 2000-03-16 2014-07-01 Sony Computer Entertainment America Llc Flexible failover policies in high availability computing systems
US9405640B2 (en) * 2000-03-16 2016-08-02 Sony Interactive Entertainment America Llc Flexible failover policies in high availability computing systems
US20170031790A1 (en) * 2000-03-16 2017-02-02 Sony Interactive Entertainment America Llc Flexible failover policies in high availability computing systems
US6732289B1 (en) * 2000-08-31 2004-05-04 Sun Microsystems, Inc. Fault tolerant data storage system
US6857082B1 (en) * 2000-11-21 2005-02-15 Unisys Corporation Method for providing a transition from one server to another server clustered together
US7694303B2 (en) * 2001-09-25 2010-04-06 Sun Microsystems, Inc. Method for dynamic optimization of multiplexed resource partitions
US20050198102A1 (en) * 2001-09-25 2005-09-08 Sun Microsystems, Inc. Method for dynamic optimization of multiplexed resource partitions
FR2843209A1 (en) * 2002-08-02 2004-02-06 Cimai Technology Software application mirroring method for replication of a software application in different nodes of a computer cluster to provide seamless continuity to client computers in the case of failure of an application server
US7725763B2 (en) * 2002-08-02 2010-05-25 International Business Machines Corporation Functional continuity by replicating a software application in a multi-computer architecture
US20050251785A1 (en) * 2002-08-02 2005-11-10 Meiosys Functional continuity by replicating a software application in a multi-computer architecture
WO2004015574A3 (en) * 2002-08-02 2004-09-02 Meiosys Functional continuity by replicating a software application in a multi-computer architecture
WO2004015574A2 (en) * 2002-08-02 2004-02-19 Meiosys Functional continuity by replicating a software application in a multi-computer architecture
US20040153558A1 (en) * 2002-10-31 2004-08-05 Mesut Gunduc System and method for providing java based high availability clustering framework
US20040199611A1 (en) * 2002-11-25 2004-10-07 Sven Bernhard Method and system for remote configuration of network devices
US7831734B2 (en) * 2002-11-25 2010-11-09 Sap Ag Method and system for remote configuration of network devices
US20040230670A1 (en) * 2002-11-25 2004-11-18 Markus Schmidt-Karaca Method and system for representing, configuring and deploying distributed applications
US20040230687A1 (en) * 2003-04-28 2004-11-18 Tomonori Nakamura Service management system, and method, communications unit and integrated circuit for use in such system
US20050050084A1 (en) * 2003-08-29 2005-03-03 Atm Shafiqul Khalid Dynamic registry partitioning
US7203696B2 (en) * 2003-08-29 2007-04-10 Microsoft Corporation Dynamic registry partitioning
KR101085643B1 (en) 2003-08-29 2011-11-22 마이크로소프트 코포레이션 Dynamic registry partitioning
US7640454B1 (en) * 2004-06-28 2009-12-29 Symantec Operating Corporation System and method for point-in-time recovery of application resource sets
US7536599B2 (en) 2004-07-28 2009-05-19 Oracle International Corporation Methods and systems for validating a system environment
US20070288903A1 (en) * 2004-07-28 2007-12-13 Oracle International Corporation Automated treatment of system and application validation failures
US7937455B2 (en) 2004-07-28 2011-05-03 Oracle International Corporation Methods and systems for modifying nodes in a cluster environment
US7962788B2 (en) 2004-07-28 2011-06-14 Oracle International Corporation Automated treatment of system and application validation failures
US20060037016A1 (en) * 2004-07-28 2006-02-16 Oracle International Corporation Methods and systems for modifying nodes in a cluster environment
US20060026463A1 (en) * 2004-07-28 2006-02-02 Oracle International Corporation, (A California Corporation) Methods and systems for validating a system environment
US7693986B2 (en) 2005-02-11 2010-04-06 Airbus France Test flight on-board processing system and method
JP2008530672A (en) * 2005-02-11 2008-08-07 エアバス・フランス Test flight onboard processing system and method
WO2006085028A3 (en) * 2005-02-11 2007-01-11 Airbus France Test flight on-board processing system and method
FR2882165A1 (en) * 2005-02-11 2006-08-18 Airbus France Sas SYSTEM AND METHOD FOR ONBOARD FLIGHT TEST PROCESSING
WO2006085028A2 (en) * 2005-02-11 2006-08-17 Airbus France Test flight on-board processing system and method
US7489639B2 (en) * 2005-03-23 2009-02-10 International Business Machines Corporation Root-cause analysis of network performance problems
US20060215564A1 (en) * 2005-03-23 2006-09-28 International Business Machines Corporation Root-cause analysis of network performance problems
US20080126845A1 (en) * 2005-11-30 2008-05-29 Oracle International Corporation Automatic failover configuration with lightweight observer
US8255369B2 (en) * 2005-11-30 2012-08-28 Oracle International Corporation Automatic failover configuration with lightweight observer
US20070124347A1 (en) * 2005-11-30 2007-05-31 Oracle International Corporation Database system configured for automatic failover with no data loss
US7668879B2 (en) * 2005-11-30 2010-02-23 Oracle International Corporation Database system configured for automatic failover with no data loss
US8630985B2 (en) 2005-11-30 2014-01-14 Oracle International Corporation Automatic failover configuration with lightweight observer
US8732162B2 (en) 2006-02-15 2014-05-20 Sony Computer Entertainment America Llc Systems and methods for server management
US9886508B2 (en) 2006-02-15 2018-02-06 Sony Interactive Entertainment America Llc Systems and methods for server management
JP2010509686A (en) * 2006-11-08 2010-03-25 アーカイヴァス インコーポレイテッド Primary cluster fast recovery
EP2092442A4 (en) * 2006-11-08 2010-08-18 Archivas Inc Fast primary cluster recovery
EP2092442A2 (en) * 2006-11-08 2009-08-26 Archivas, Inc. Fast primary cluster recovery
WO2008058230A2 (en) 2006-11-08 2008-05-15 Archivas, Inc. Fast primary cluster recovery
US20100257374A1 (en) * 2009-03-30 2010-10-07 The Boeing Company Computer architectures using shared storage
US9690839B2 (en) 2009-03-30 2017-06-27 The Boeing Company Computer architectures using shared storage
US8601309B2 (en) 2009-03-30 2013-12-03 The Boeing Company Computer architectures using shared storage
US8601308B2 (en) 2009-03-30 2013-12-03 The Boeing Company Computer architectures using shared storage
US20100250867A1 (en) * 2009-03-30 2010-09-30 The Boeing Company Computer architectures using shared storage
US9098562B2 (en) 2009-03-30 2015-08-04 The Boeing Company Computer architectures using shared storage
US20100251010A1 (en) * 2009-03-30 2010-09-30 The Boeing Company Computer architectures using shared storage
US8601307B2 (en) 2009-03-30 2013-12-03 The Boeing Company Computer architectures using shared storage
US8171337B2 (en) * 2009-03-30 2012-05-01 The Boeing Company Computer architectures using shared storage
US8972515B2 (en) 2009-03-30 2015-03-03 The Boeing Company Computer architectures using shared storage
JP2013520746A (en) * 2010-02-26 2013-06-06 シマンテック コーポレーション System and method for failing over non-cluster aware applications in a cluster system
WO2011106067A1 (en) * 2010-02-26 2011-09-01 Symantec Corporation Systems and methods for failing over cluster unaware applications in a clustered system
CN102782656A (en) * 2010-02-26 2012-11-14 赛门铁克公司 Systems and methods for failing over cluster unaware applications in a clustered system
US20110213753A1 (en) * 2010-02-26 2011-09-01 Symantec Corporation Systems and Methods for Managing Application Availability
US8688642B2 (en) * 2010-02-26 2014-04-01 Symantec Corporation Systems and methods for managing application availability
US9098462B1 (en) 2010-09-14 2015-08-04 The Boeing Company Communications via shared memory
US9329952B2 (en) 2010-12-07 2016-05-03 International Business Machines Corporation Reducing application downtime during failover
US9329953B2 (en) 2010-12-07 2016-05-03 International Business Machines Corporation Reducing application downtime during failover
US20130159528A1 (en) * 2011-12-15 2013-06-20 Microsoft Corporation Failover based application resource acquisition
US8938639B1 (en) * 2012-02-24 2015-01-20 Symantec Corporation Systems and methods for performing fast failovers
US9146705B2 (en) * 2012-04-09 2015-09-29 Microsoft Technology Licensing, LLC Split brain protection in computer clusters
US20130268495A1 (en) * 2012-04-09 2013-10-10 Microsoft Corporation Split brain protection in computer clusters
US9817739B1 (en) * 2012-10-31 2017-11-14 Veritas Technologies Llc Method to restore a virtual environment based on a state of applications/tiers
US10447708B2 (en) 2014-12-15 2019-10-15 Sophos Limited Server drift monitoring
US10038702B2 (en) * 2014-12-15 2018-07-31 Sophos Limited Server drift monitoring
US9930140B2 (en) * 2015-09-15 2018-03-27 International Business Machines Corporation Tie-breaking for high availability clusters
US20170078439A1 (en) * 2015-09-15 2017-03-16 International Business Machines Corporation Tie-breaking for high availability clusters
US20170132100A1 (en) * 2015-11-10 2017-05-11 International Business Machines Corporation Smart selection of a storage module to be excluded
US9898378B2 (en) * 2015-11-10 2018-02-20 International Business Machines Corporation Smart selection of a storage module to be excluded
US10657013B2 (en) 2015-11-10 2020-05-19 International Business Machines Corporation Smart selection of a storage module to be excluded
CN106209450A (en) * 2016-07-08 2016-12-07 深圳前海微众银行股份有限公司 Server failover method and automated application deployment system
US10296425B2 (en) * 2017-04-20 2019-05-21 Bank Of America Corporation Optimizing data processing across server clusters and data centers using checkpoint-based data replication
US10970177B2 (en) * 2017-08-18 2021-04-06 Brian J. Bulkowski Methods and systems of managing consistency and availability tradeoffs in a real-time operational DBMS
US20190196718A1 (en) * 2017-12-21 2019-06-27 Apple Inc. Techniques for facilitating processing checkpoints between computing devices
US10871912B2 (en) * 2017-12-21 2020-12-22 Apple Inc. Techniques for facilitating processing checkpoints between computing devices
US11675519B2 (en) 2017-12-21 2023-06-13 Apple Inc. Techniques for facilitating processing checkpoints between computing devices
US10721335B2 (en) * 2018-08-01 2020-07-21 Hewlett Packard Enterprise Development Lp Remote procedure call using quorum state store
US20210081280A1 (en) * 2019-09-12 2021-03-18 restorVault Virtual replication of unstructured data
US11630737B2 (en) * 2019-09-12 2023-04-18 Restorvault, Llc Virtual replication of unstructured data
US11816000B2 (en) 2019-09-12 2023-11-14 restor Vault, LLC Virtual recovery of unstructured data

Also Published As

Publication number Publication date
US6360331B2 (en) 2002-03-19

Similar Documents

Publication Publication Date Title
US6360331B2 (en) Method and system for transparently failing over application configuration information in a server cluster
US6449734B1 (en) Method and system for discarding locally committed transactions to ensure consistency in a server cluster
US6243825B1 (en) Method and system for transparently failing over a computer name in a server cluster
US6453426B1 (en) Separately storing core boot data and cluster configuration data in a server cluster
US6163855A (en) Method and system for replicated and consistent modifications in a server cluster
US6279032B1 (en) Method and system for quorum resource arbitration in a server cluster
US6748429B1 (en) Method to dynamically change cluster or distributed system configuration
US6393485B1 (en) Method and apparatus for managing clustered computer systems
JP4307673B2 (en) Method and apparatus for configuring and managing a multi-cluster computer system
US6178529B1 (en) Method and system for resource monitoring of disparate resources in a server cluster
JP4603755B2 (en) Method and system for coordinating and managing multiple snapshot providers in common
Candea et al. Recursive restartability: Turning the reboot sledgehammer into a scalpel
EP1222540B1 (en) Method and system for consistent cluster operational data in a server cluster using a quorum of replicas
US7676635B2 (en) Recoverable cache preload in clustered computer system based upon monitored preload state of cache
US7610582B2 (en) Managing a computer system with blades
US8335765B2 (en) Provisioning and managing replicated data instances
US7543174B1 (en) Providing high availability for an application by rapidly provisioning a node and failing over to the node
US7849221B2 (en) Online instance deletion in a multi-instance computer system
US20040153558A1 (en) System and method for providing java based high availability clustering framework
JPH08272725A (en) System and method for determining and manipulating the configuration of a server in a distributed object environment
US8316110B1 (en) System and method for clustering standalone server applications and extending cluster functionality
US9106537B1 (en) Method for high availability of services in cloud computing systems
JP2003532190A (en) Method and apparatus for providing a volume snapshot dependency in a computer system
US20040010666A1 (en) Storage services and systems
US8020034B1 (en) Dependency filter object

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VERT, JOHN D.;SHRIVASTAVA, SUNITA;REEL/FRAME:009149/0737

Effective date: 19980406

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034541/0001

Effective date: 20141014