US20040117687A1 - High-availability architecture using high-speed pipes - Google Patents

Publication number
US20040117687A1
Authority
US
United States
Prior art keywords
computer system
data
transferring
availability
pipe
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/692,252
Inventor
Leslie Smith
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2097 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2038 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component

Definitions

  • the present invention relates generally to high availability computer system architectures, and in particular to apparatus, systems, and methods for a high-availability computer system architecture using high-speed pipes.
  • Apparatus, systems, and methods consistent with the present invention utilize available high-speed pipes to transfer information necessary for high-availability between two computer systems.
  • one or more logical pipes are implemented on a physical pipe between two computer systems.
  • the use of the term pipe refers to a communication channel.
  • a physical pipe refers to a physical communication channel.
  • a logical pipe refers to a logical communication channel, and high-availability information refers to data transferred between systems for purposes of implementing a high-availability architecture.
  • the logical pipes are used for data transfer between an active system and a standby system so that the standby system has the information necessary to take over from the active system if the active system fails in some way.
  • the logical pipes that transfer information necessary for implementing high-availability are part of a physical pipe that also carries other types of information used by the active system.
  • the system may also use network interface cards to implement the high-speed pipes.
  • the network interface cards may be implemented using conventional interface cards without departing from the principles of the invention.
  • a NIC using Virtual Interface (VI) Architecture may be used.
  • An apparatus consistent with the present invention comprises a physical pipe for transferring data between an active system and a standby system.
  • the apparatus further comprises a first logical pipe for transferring data over the physical pipe, and a second logical pipe for transferring high-availability data over the physical pipe.
  • Another apparatus consistent with the present invention comprises a physical pipe for transferring data between an active system and a standby system.
  • the apparatus further comprises a network interface card for transferring data and high-availability information over the physical pipe.
  • Yet another apparatus consistent with the present invention includes a physical pipe for transferring data between an active system and a standby system.
  • the apparatus further comprises a first logical pipe for transferring checkpointing data over the physical pipe, and a second logical pipe for transferring total system state data over the physical pipe.
  • a system consistent with the present invention comprises a physical pipe.
  • the system further comprises an active system for transferring data and high-availability information over the physical pipe, and a standby system for receiving the high-availability information from the physical pipe.
  • a method in a high-availability system having an active system and a standby system is provided.
  • the active system sends a message to the standby system to enter a switch-over state.
  • the standby system monitors a transfer complete marker.
  • the method transfers total system state from the active system to the standby system.
  • the method switches the high-availability system from the active system to the standby system upon detecting the transfer complete marker.
  • FIG. 1 is a block diagram showing a high-availability computer system consistent with the present invention
  • FIG. 2 is a block diagram showing a method using a transfer complete marker consistent with the present invention
  • FIG. 3 is a block diagram showing transitions for Graceful Switch Over consistent with the present invention.
  • FIG. 4 is a block diagram illustrating the VI Architectural Model consistent with the present invention.
  • Apparatus, systems, and methods consistent with the improved high-availability architecture disclosed herein use high-speed pipes to exchange information between an active computer system and a standby system.
  • Appendix A, which contains a glossary of the terms and conventions used in describing the invention, is incorporated herein in its entirety as part of this Detailed Description.
  • The High Speed Pipes (HSP) system uses high-speed pipes to transfer information necessary for high-availability between two computer systems. This information exchange permits a standby computer system to take over in case the active system fails.
  • the present system uses logical pipes on existing physical pipes, thereby realizing significant cost savings compared with conventional systems that require dedicated pipes for transferring high-availability information between computer systems.
  • the call processing platform redundancy scheme is based on a 1+1 model and can be expanded to work in an n+1 redundancy model.
  • a database in this document means the protected memory region, hard disk drive files, and any data structures common to both the active and standby systems.
  • FIG. 1 illustrates a high-availability computer system 100 .
  • active system 102 is interconnected with standby system 104 .
  • Active system 102 comprises a disk drive 106 , memory 108 , and CPU 109 .
  • Standby system 104 comprises a disk drive 110 , memory 112 , and CPU 113 .
  • Active system 102 and standby system 104 are interconnected via two interfaces or logical pipes: Interface A 114 and Interface B 116 .
  • Interface A 114 transfers two types of traffic:
  • Interface A 114, one of the logical pipes, is used to transfer “Heart Beat” messages, in other words messages between active node 102 and standby node 104 that make one system aware of the other's existence or health.
  • Interface A 114 is used to transfer “P.mem updates,” or in other words any protected memory updates that occur on the active node are replicated on the standby node.
  • Interface A 114 is used to ensure disk redundancy. For example, any updates or write operations performed on active node disk 106 are replicated on standby node disk 110 .
  • any configuration changes made on active node 102 are replicated on standby node 104 by transferring commands or inputs associated with any configuration changes to the standby node using Interface A.
  • the pipe is configured to work within a client-server type configuration, allowing software to access the pipe in a similar manner to a socket-based TCP/IP connection. All data sent across this pipe will be encapsulated and sent across as a message. Only complete transactions should be sent across this interface at any one time. This prevents the case where a partial transaction has been sent to the standby side when the active node fails, which would leave an inconsistent database on the standby side.
  • the transaction could be built on the inactive side from partial transactions, and then applied as a single transaction once it has been fully built. Due to the symmetrical nature of the system, it can be assumed that if a transaction is completed on the active side, the same transaction will be complete on the standby side; therefore, no rollback and retry functionality will be required for the first phase of this system.
  • GSO: Graceful Switch Over
  • a data transfer mechanism transfers the total system state at a particular time by the active node to the standby node in the least amount of time possible, allowing the standby node to continue where the previously active node stopped.
  • the HSP must exhibit the following characteristics:
  • the receiving side must pend and be notified of completion without OS involvement.
  • DMA: Direct Memory Access
  • RDMA: Remote Direct Memory Access
  • RDMAW: Remote Direct Memory Access Write
  • RDMAR: Remote Direct Memory Access Read
  • the two may be viewed as push (the initiator writes directly into the recipient's memory) for RDMAW and pull (the initiator reads the host's memory and copies the data into its own memory) for RDMAR.
  • All current adaptors support RDMAW; a few also support RDMAR. Even though FIG. 1 depicts RDMAW operation, one skilled in the art will appreciate that RDMAR may also be used.
  • the receiving side must know when the transfer is complete.
  • a small loop is entered on the receiving side where a memory address is monitored.
  • address location 0x7ffff 206 on standby node 204 is initially set to 0x0000 and is monitored.
  • when the value at this location changes to the transfer complete marker, it indicates that the transfer has been completed and thus the standby side may assume the role of the active node.
  • a value of 0xfabe 210 is depicted as the transfer complete marker.
  • this value could be any non-zero value that the active node and the standby node have agreed to treat as the transfer complete marker.
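  • The monitoring loop described above can be sketched as follows. This is a minimal simulation, not the patented implementation: the standby node's memory is modeled as a `bytearray`, `MARKER_OFFSET` is a hypothetical stand-in for address 0x7ffff, the 16-bit marker width is an assumption, and `rdma_write_image` only imitates an RDMAW by copying bytes and writing the marker last.

```python
import struct

MARKER_OFFSET = 16            # hypothetical offset standing in for address 0x7ffff
TRANSFER_COMPLETE = 0xFABE    # marker value depicted in FIG. 2
MARKER_FMT = "<H"             # assumed width: one 16-bit little-endian word


def read_marker(memory: bytearray) -> int:
    """Read the monitored location."""
    return struct.unpack_from(MARKER_FMT, memory, MARKER_OFFSET)[0]


def rdma_write_image(memory: bytearray, image: bytes) -> None:
    """Imitate the RDMAW: copy the image first, then write the marker last."""
    if len(image) > MARKER_OFFSET:
        raise ValueError("image would overwrite the marker location")
    memory[: len(image)] = image
    struct.pack_into(MARKER_FMT, memory, MARKER_OFFSET, TRANSFER_COMPLETE)


def wait_for_marker(memory: bytearray, poll, max_polls: int = 1000) -> bool:
    """Small loop on the receiving side: poll until the marker appears."""
    for _ in range(max_polls):
        if read_marker(memory) == TRANSFER_COMPLETE:
            return True
        poll()  # gives the simulated transfer a chance to make progress
    return False


# Simulation: the marker location starts at 0x0000 and the write lands
# on the third poll.
standby_memory = bytearray(32)
struct.pack_into(MARKER_FMT, standby_memory, MARKER_OFFSET, 0x0000)
calls = []


def simulated_progress() -> None:
    calls.append(1)
    if len(calls) == 3:
        rdma_write_image(standby_memory, b"system-image")


switched = wait_for_marker(standby_memory, simulated_progress)
```

Writing the marker only after the image is one way to make the marker a reliable completion signal; real hardware ordering guarantees would have to be checked against the adaptor in use.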
  • FIG. 3 is a block diagram illustrating an overview of the transitions taking place on both nodes during a GSO.
  • Side A, the active side, is in normal active state 302.
  • Side B, the standby side, is in normal standby state 304.
  • a GSO event is always initiated by the active side ‘A’ by sending a message to the standby side ‘B’ to enter the GSO receive state.
  • side A enters Start GSO 306 state and upon receiving the Start GSO message, side B enters Start GSO 308 state as well.
  • Side ‘A’ then enters a PRE-GSO state, the GSO Interrupt State (State 310 ), and waits for an acknowledgment from side ‘B’ that it is ready to receive the system image.
  • Side ‘B’ stops all activity and enters a small loop looking for a specific memory location to change, the GSO Interrupt State (state 312 ). Side ‘A’ then initiates the RDMAW, and enters a loop, similar to side ‘B,’ to prevent it from restarting until the system image has been transferred (state 314 ). Side ‘B’ sends a done message to side ‘A’ when it detects that the transfer complete marker has changed (state 316 ), thus allowing the side ‘A’ to restart (state 318 ) and become the standby node (state 322 ).
  • Side ‘B’ then executes a return from interrupt or return from exception instruction, for example, IRET (state 320 ), causing the processor to continue from the point where side ‘A’ jumped into the GSO interrupt, thus assuming the role of the active node (state 324 ).
  • Although FIG. 3 depicts the state transitions in a particular order, the order of these state transitions may be changed.
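  • The GSO transitions narrated above can be summarized as an ordered table. The sketch below is illustrative only: the state numbers come from FIG. 3 as described in the text, while the `Role` enum, the `GSO_SEQUENCE` table, and the `run_gso` helper are hypothetical names introduced here, and the ordering shown is just the one narrated above.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


# (side, FIG. 3 state number, description), in the order narrated in the text
GSO_SEQUENCE = [
    ("A", 306, "Start GSO: send Start GSO message to side B"),
    ("B", 308, "Start GSO: message received"),
    ("A", 310, "GSO Interrupt State: wait for acknowledgment from side B"),
    ("B", 312, "GSO Interrupt State: loop on the marker location"),
    ("A", 314, "initiate RDMAW, then loop until the image is transferred"),
    ("B", 316, "marker changed: send done message to side A"),
    ("A", 318, "restart"),
    ("B", 320, "return from interrupt (for example, IRET)"),
    ("A", 322, "normal standby"),
    ("B", 324, "normal active"),
]


def run_gso(roles):
    """Walk the sequence and return (new_roles, trace)."""
    if roles != {"A": Role.ACTIVE, "B": Role.STANDBY}:
        raise ValueError("a GSO is always initiated by the active side 'A'")
    new_roles = dict(roles)
    trace = []
    for side, state, _description in GSO_SEQUENCE:
        trace.append((side, state))
        if state == 322:
            new_roles["A"] = Role.STANDBY   # former active node becomes standby
        elif state == 324:
            new_roles["B"] = Role.ACTIVE    # former standby node becomes active
    return new_roles, trace


roles_after, trace = run_gso({"A": Role.ACTIVE, "B": Role.STANDBY})
```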
  • NIC: Network Interface Card
  • a commercially available NIC from Compaq that fulfills the requirements of the HSP may be used to implement the physical and the logical pipes.
  • the Servernet card has been externalized for the open systems server market, allowing it to be used as the HSP hardware.
  • Some NIC's include a virtual interface architecture, such as the Virtual Interface Architecture (VIA) standard.
  • VIP Virtual Interface Architecture
  • NIC's such as the Servernet card
  • a dual interconnect fabric, denoted X and Y, allows transparent link redundancy to be part of the standard interface.
  • the NIC has native VIA processing in the hardware.
  • a software VIA emulator may be used, allowing software to be written to utilize the VIA interface.
  • VIA NIC's have to provide the ability to do RDMAW operations because the RDMAW is the basic transport mechanism of the VIA interface.
  • One consideration for CPP high availability strategy is that the RDMAW can take place with no OS support, because of the requirement that both the active and standby sides are in a locked interrupt state to prevent the OS state from changing.
  • VIA is a channel architecture. Therefore, one or more logical pipes may exist through one physical pipe.
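  • To illustrate how more than one logical pipe can share one physical pipe, the following sketch tags each message with a pipe identifier and separates the traffic again on the receiving side. The frame layout (one byte of pipe id, two bytes of length) and the names `DATA_PIPE`, `HA_PIPE`, `frame`, and `demultiplex` are assumptions made for this example; they are not taken from the VIA standard or from the text.

```python
import struct
from collections import defaultdict

HEADER = struct.Struct("!BH")   # hypothetical frame header: pipe id, payload length
DATA_PIPE = 1                   # hypothetical id for ordinary data
HA_PIPE = 2                     # hypothetical id for high-availability data


def frame(pipe_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its logical pipe id and length."""
    return HEADER.pack(pipe_id, len(payload)) + payload


def demultiplex(stream: bytes) -> dict:
    """Split the byte stream of the physical pipe back into per-pipe messages."""
    per_pipe = defaultdict(list)
    offset = 0
    while offset < len(stream):
        pipe_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        per_pipe[pipe_id].append(stream[offset:offset + length])
        offset += length
    return dict(per_pipe)


# Both kinds of traffic travel over the same simulated physical pipe.
physical_pipe = (
    frame(DATA_PIPE, b"call data")
    + frame(HA_PIPE, b"heartbeat")
    + frame(DATA_PIPE, b"more data")
)
received = demultiplex(physical_pipe)
```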
  • VI Architectural Model 400 depicts the relationship between VI Consumer 402 and VI Provider 404 .
  • VI Consumer 402 comprises Application 406 , OS Communication Interface 408 , and VI User Agent 410 .
  • OS Communication Interface may consist of sockets, Message Passing Interface (MPI), Cluster, or other communication mechanisms.
  • VI Provider 404 comprises VI Kernel Agent 412 , VI Send/Receive and Completion Module 414 , and VI Network Adapter 416 .
  • VI Consumer 402 runs in the user mode and VI Provider 404 runs in the kernel mode as depicted in FIG. 4.
  • the VI Consumer on the local node always specifies the location of the data.
  • the sending process specifies the memory regions that contain the data to be sent.
  • the receiving process specifies the memory regions where the data will be placed. Given a single connection, there is a one-to-one correspondence between send Descriptors on the transmitting side and receive Descriptors on the receiving side.
  • the VI Consumer at the receiving end pre-posts a Descriptor to the receive queue of a VI send/receive module.
  • the VI Consumer at the sending end can then post the message to the corresponding VI's send queue.
  • the Send/Receive model of data transfer requires that the VI Consumers be notified of Descriptor completion at both ends of the transfer, for synchronization purposes.
  • VI Consumers are responsible for managing flow control on a connection.
  • the VI Consumer on the receiving side must post a Receive Descriptor of sufficient size before the sender's data arrives. If the Receive Descriptor at the head of the queue is not large enough to handle the incoming message, or the Receive Queue is empty, an error will occur.
  • the connection may be broken if it is intended to be reliable.
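  • The pre-posting rule for Receive Descriptors can be modeled with a small queue. This is a simplified sketch, not a VI Provider API: the `ReceiveQueue` class, its `post` and `deliver` methods, and `ReceiveQueueError` are hypothetical names, and a Descriptor is reduced to a plain buffer.

```python
from collections import deque


class ReceiveQueueError(Exception):
    """Raised when an incoming message cannot be placed."""


class ReceiveQueue:
    def __init__(self):
        self._descriptors = deque()

    def post(self, size: int) -> bytearray:
        """Pre-post a receive buffer of the given size."""
        buf = bytearray(size)
        self._descriptors.append(buf)
        return buf

    def deliver(self, message: bytes) -> int:
        """Place an incoming message into the Descriptor at the head of the queue."""
        if not self._descriptors:
            raise ReceiveQueueError("receive queue is empty")
        buf = self._descriptors[0]
        if len(buf) < len(message):
            raise ReceiveQueueError("descriptor at head of queue is too small")
        self._descriptors.popleft()
        buf[: len(message)] = message
        return len(message)


rq = ReceiveQueue()
buffer = rq.post(16)                  # receiver posts before the data arrives
received_len = rq.deliver(b"hello")   # the sender's message consumes the Descriptor
```

A second `deliver` call at this point would raise `ReceiveQueueError`, because no Descriptor remains posted.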
  • the VI Architecture differs from some existing models in that all Send/Receive operations are completed asynchronously.
  • the initiator of the data transfer specifies both the source buffer and the destination buffer of the data transfer.
  • the VI Consumer specifies the source of the data transfer in one of its local registered memory regions, and the destination of the data transfer within a remote memory region that has been registered on the remote system.
  • the source of an RDMA Write can be specified as a gather list of buffers, while the destination must be a single, virtually contiguous region.
  • the RDMA Write operation implies that prior to the data transfer, the VI Consumer at the remote end has informed the initiator of the RDMA Write of the location of the destination buffer, and that the buffer itself is enabled for RDMA Write operations.
  • the remote location of the data is specified by its virtual address and its associated memory handle.
  • the VI Consumer specifies the source of the data transfer at the remote end, and the destination of the data transfer within a locally registered memory region.
  • the VI Consumer on the receiving side must post a Receive Descriptor to receive the Immediate Data, before the sender executes the RDMA Write. If no Descriptor is posted, an error will occur and the connection may be broken.
  • RDMAW does not change OS state during operation, on either the initiator or the receiver.
  • memory transfer may be implemented in a variety of ways. For example, the system could start at lower memory location and increment address as data is transferred, or start at high memory location and decrement address as data is transferred.
  • Apparatus, systems, and methods consistent with the principles of the invention disclosed herein provide a high-availability architecture using high-speed pipes.
  • the high-speed pipes may be implemented using logical pipes over an existing physical pipe.
  • the high-speed pipes may also be implemented using conventional network interface cards.
  • the apparatus disclosed herein should be understood to support the processes performed thereby, and, similarly, the processes disclosed herein should be understood to support the apparatus necessary to perform the steps of the processes. It should be further understood that the apparatus and methods disclosed herein may be implemented entirely in hardware, entirely in software, or a mixture of hardware and software.
  • the apparatus and method consistent with the present invention and disclosed herein are related to apparatus and methods for a high-availability architecture using high-speed pipes.
  • Parts of the architecture may be implemented in whole or in part by one or more sequences of instructions which carry out the apparatus and method described herein.
  • Such instructions may be read by the computer systems or by network interface cards from a computer-readable medium, such as a storage device.
  • Execution of sequences of instructions by the computer system or network interface cards causes performance of process steps consistent with the present invention described herein.
  • Execution of sequences of instructions may also be considered to implement apparatus elements that perform the process steps.
  • Hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention.
  • embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
  • Non-volatile memory media includes, for example, optical or magnetic disks.
  • Volatile memory media includes RAM.
  • Transmission media includes, for example, coaxial cables, copper wire and fiber optics, including the wires. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic storage medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read and use.
  • Various forms of computer readable media may be involved in carrying one or more sequences of instructions for execution to implement the high-availability architecture described herein.
  • the instructions may initially be carried on a magnetic disk or a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to a computer system can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infrared signal.
  • An infra-red detector coupled to appropriate circuitry can receive the data carried in the infra-red signal and place the data on a bus.
  • the bus may carry data to a memory, from which a processor retrieves and executes the instructions.
  • the instructions received by the memory may optionally be stored on a storage device either before or after execution by the processor.

Abstract

Apparatus, system, and methods for a high availability computer system architecture using high-speed pipes are provided. An active computer system and a standby computer system are connected using a physical pipe for transferring data between the active computer system and the standby computer system. A first logical pipe is used for transferring data over the physical pipe, and a second logical pipe is used for transferring high-availability data over the physical pipe. Network-interface cards may be used to implement the high-speed pipes.

Description

  • I. CROSS REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of provisional application “Methods and Apparatus for High-Availability Architecture Using High-Speed Pipes,” filed Jun. 2, 1999, bearing Serial No. 60/137,203, the contents of which are relied upon and incorporated by reference. [0001]
  • II. BACKGROUND OF THE INVENTION
  • A. Field of the Invention [0002]
  • The present invention relates generally to high availability computer system architectures, and in particular to apparatus, systems, and methods for a high-availability computer system architecture using high-speed pipes. [0003]
  • B. Description of the Related Art [0004]
  • Conventional high-availability computer systems use special purpose, dedicated systems for implementing redundancy. For example, some conventional systems utilize two computer systems, one of which is active and the other standby, and special purpose hardware and software that interacts with each computer system to implement high-availability. The special purpose hardware and software communicates with the active computer system to capture status information so that in the event the active system goes down the standby system can start in place of the active system using the information collected by the special purpose hardware and software. [0005]
  • Thus, conventional high-availability computer system architectures require special purpose hardware and software, which raises system costs. The additional costs make these systems very expensive. There is, therefore, a need for a high-availability computer system architecture that solves the problems associated with special purpose hardware and software high-availability systems. [0006]
  • III. SUMMARY OF THE INVENTION
  • Apparatus, systems, and methods consistent with the present invention utilize available high-speed pipes to transfer information necessary for high-availability between two computer systems. In one embodiment, one or more logical pipes are implemented on a physical pipe between two computer systems. The use of the term pipe refers to a communication channel. A physical pipe refers to a physical communication channel. A logical pipe refers to a logical communication channel, and high-availability information refers to data transferred between systems for purposes of implementing a high-availability architecture. The logical pipes are used for data transfer between an active system and a standby system so that the standby system has the information necessary to take over from the active system if the active system fails in some way. In one embodiment, the logical pipes that transfer information necessary for implementing high-availability are part of a physical pipe that also carries other types of information used by the active system. [0007]
  • The system may also use network interface cards to implement the high-speed pipes. The network interface cards (NIC) may be implemented using conventional interface cards without departing from the principles of the invention. For example, a NIC using Virtual Interface (VI) Architecture may be used. [0008]
  • By using logical pipes on existing physical pipes, there are significant cost savings as compared to conventional systems that require dedicated pipes to transfer the high-availability information. Moreover, by using network interface cards, additional cost savings may be realized. Logical pipes and network interface cards may also be used in combination. Because the architecture reduces or eliminates special purpose hardware and software, costs are significantly reduced. [0009]
  • An apparatus consistent with the present invention comprises a physical pipe for transferring data between an active system and a standby system. The apparatus further comprises a first logical pipe for transferring data over the physical pipe, and a second logical pipe for transferring high-availability data over the physical pipe. [0010]
  • Another apparatus consistent with the present invention comprises a physical pipe for transferring data between an active system and a standby system. The apparatus further comprises a network interface card for transferring data and high-availability information over the physical pipe. [0011]
  • Yet another apparatus consistent with the present invention includes a physical pipe for transferring data between an active system and a standby system. The apparatus further comprises a first logical pipe for transferring checkpointing data over the physical pipe, and a second logical pipe for transferring total system state data over the physical pipe. [0012]
  • A system consistent with the present invention comprises a physical pipe. The system further comprises an active system for transferring data and high-availability information over the physical pipe, and a standby system for receiving the high-availability information from the physical pipe. [0013]
  • A method in a high-availability system having an active system and a standby system is provided. According to this method, the active system sends a message to the standby system to enter a switch-over state. The standby system monitors a transfer complete marker. The method transfers total system state from the active system to the standby system. The method switches the high-availability system from the active system to the standby system upon detecting the transfer complete marker. [0014]
  • Such apparatus, systems, and methods overcome the problems of conventional high-availability architectures described above. Additional advantages of the invention are apparent from the description which follows, and may be learned by practice of the invention. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed. [0015]
  • IV. BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the advantages and principles of the invention. In the drawings, [0016]
  • FIG. 1 is a block diagram showing a high-availability computer system consistent with the present invention; [0017]
  • FIG. 2 is a block diagram showing a method using a transfer complete marker consistent with the present invention; [0018]
  • FIG. 3 is a block diagram showing transitions for Graceful Switch Over consistent with the present invention; and [0019]
  • FIG. 4 is a block diagram illustrating the VI Architectural Model consistent with the present invention. [0020]
  • V. DETAILED DESCRIPTION
  • Apparatus, systems, and methods consistent with the improved high-availability architecture disclosed herein use high-speed pipes to exchange information between an active computer system and a standby system. Appendix A, which contains a glossary of the terms and conventions used in describing the invention, is incorporated herein in its entirety as part of this Detailed Description. [0021]
  • HSP System Overview [0022]
  • The High Speed Pipes (HSP) system uses high-speed pipes to transfer information necessary for high-availability between two computer systems. This information exchange permits a standby computer system to take over in case the active system fails. The present system uses logical pipes on existing physical pipes, thereby realizing significant cost savings compared with conventional systems that require dedicated pipes for transferring high-availability information between computer systems. [0023]
  • For legacy reasons, the call processing platform redundancy scheme is based on a 1+1 model and can be expanded to work in an n+1 redundancy model. [0024]
  • The use of the term database in this document means the protected memory region, hard disk drive files, and any data structures common to both the active and standby systems. The use of the term pipe refers to a communication channel. A physical pipe refers to a physical communication channel. A logical pipe refers to a logical communication channel, and high-availability information refers to data transferred between systems for purposes of implementing a high-availability architecture. [0025]
  • Interface A [0026]
  • FIG. 1 illustrates a high-availability computer system 100. As shown in FIG. 1, active system 102 is interconnected with standby system 104. Active system 102 comprises a disk drive 106, memory 108, and CPU 109. Standby system 104 comprises a disk drive 110, memory 112, and CPU 113. Active system 102 and standby system 104 are interconnected via two interfaces or logical pipes: Interface A 114 and Interface B 116. Interface A 114 transfers two types of traffic: [0027]
  • Operational and Management (OA&M) and Health status [0028]
  • Transactions that change the state of the protected memory interface. [0029]
  • Accordingly, as shown in FIG. 1, Interface A 114, one of the logical pipes, is used to transfer “Heart Beat,” or in other words messages between active node 102 and standby node 104 that make one system aware of another's existence or health. In addition, Interface A 114 is used to transfer “P.mem updates,” or in other words any protected memory updates that occur on the active node are replicated on the standby node. Also, as shown in FIG. 1, Interface A 114 is used to ensure disk redundancy. For example, any updates or write operations performed on active node disk 106 are replicated on standby node disk 110. One skilled in the art will appreciate that other operational and management and health status information may also be transferred using Interface A. For example, any configuration changes made on active node 102 are replicated on standby node 104 by transferring commands or inputs associated with any configuration changes to the standby node using Interface A. [0030]
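  • One way to picture the “Heart Beat” traffic is a monitor that records when the last heartbeat arrived and treats the peer as unhealthy once a timeout passes. The sketch below is only an illustration: the `HeartbeatMonitor` class is hypothetical, and the 3-second timeout is an assumed value, since the text does not specify a heartbeat interval.

```python
class HeartbeatMonitor:
    """Tracks the most recent heartbeat received from the peer node."""

    def __init__(self, timeout: float, now: float = 0.0):
        self.timeout = timeout      # assumed value; not specified in the text
        self.last_seen = now

    def record(self, now: float) -> None:
        """Call whenever a heartbeat message arrives over Interface A."""
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        """The peer is considered healthy while heartbeats are recent enough."""
        return (now - self.last_seen) <= self.timeout


# Timestamps are passed in explicitly so the example is deterministic.
monitor = HeartbeatMonitor(timeout=3.0)
monitor.record(now=1.0)
alive_soon_after = monitor.peer_alive(now=3.5)   # 2.5 s since last heartbeat
alive_much_later = monitor.peer_alive(now=4.5)   # 3.5 s since last heartbeat
```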
  • The characteristics of this interface are: [0031]
  • Low latency [0032]
  • Low CPU utilization [0033]
  • Moderate to high bandwidth [0034]
  • Low latency shortens the time window during which a transaction is in flight in the logical pipe, and so reduces the possibility that the active system fails during the transfer of the data and causes an inconsistency in the database on the standby side. [0035]
  • Low CPU utilization is required due to the large number of transactions expected between the two systems during normal operation. The utilization of the main CPU by processes or tasks other than for maintaining high-availability should not be greater than 10% during normal operation. [0036]
  • In many systems, the average traffic rate across this pipe will be relatively low, although some administrative operations will produce significant bursts of traffic. In one embodiment, the pipe is configured to work within a client-server configuration, allowing software to access the pipe in a manner similar to a socket-based TCP/IP connection. All data sent across this pipe will be encapsulated and sent as a message. Only complete transactions should be sent across this interface at any one time; this prevents the case where a partial transaction has been sent to the standby side when the active node fails, which would leave an inconsistent database on the standby side. Alternatively, the transaction could be built on the inactive side from partial transactions and then applied as a single transaction once it has been fully built. Due to the symmetrical nature of the system, it can be assumed that if a transaction is completed on the active side, the same transaction will complete on the standby side; therefore no rollback and retry functionality will be required for the first phase of this system. [0037]
  • Failure of this link will cause inconsistency between the two databases. Therefore a procedure may be used to synchronize the databases without impact to the operation of the system. This synchronization could happen at any time, for example after a hardware failure, a software error, or a human error (e.g., a cable is inadvertently removed). [0038]
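  • The alternative described above, in which the standby side builds a transaction from partial transactions and applies it only once it is complete, might be sketched as follows. This is an illustration under stated assumptions: the class and method names are hypothetical, a Python dict stands in for the standby database, and the `last` flag marking the final fragment is an invented message field.

```python
class StandbyTransactionAssembler:
    """Hypothetical sketch: buffer fragments, apply a transaction only when whole."""

    def __init__(self, database):
        self.database = database   # dict standing in for the standby database
        self.pending = {}          # txn_id -> buffered (key, value) updates

    def receive_fragment(self, txn_id, updates, last):
        """Buffer a fragment; apply the whole transaction when the last one arrives."""
        self.pending.setdefault(txn_id, []).extend(updates)
        if last:
            for key, value in self.pending.pop(txn_id):
                self.database[key] = value

    def discard_incomplete(self):
        """Drop transactions that never completed, e.g. after an active node failure."""
        dropped = list(self.pending)
        self.pending.clear()
        return dropped


standby_db = {}
assembler = StandbyTransactionAssembler(standby_db)
assembler.receive_fragment(1, [("x", 1)], last=False)   # nothing applied yet
assembler.receive_fragment(1, [("y", 2)], last=True)    # both updates applied together
```

  • Because nothing reaches the database until the final fragment arrives, a failure of the active node mid-transaction leaves the standby database in its previous consistent state.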
  • To ensure database consistency, some form of audit facility may be run periodically. It will be assumed that the active database is correct and any differences will be applied to the standby database in the case of inconsistency. [0039]
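  • A periodic audit of the kind described above could, in its simplest form, look like the sketch below. It assumes that both databases can be represented as key/value mappings and treats the active copy as authoritative, as the text states; the function name and the return value are illustrative choices, not part of the specification.

```python
def audit(active_db, standby_db):
    """Make standby_db match active_db and return the keys that were corrected.

    The active database is assumed to be correct, so every difference is
    resolved in its favor.
    """
    corrected = []
    # Add or overwrite entries that are missing or stale on the standby side.
    for key, value in active_db.items():
        if key not in standby_db or standby_db[key] != value:
            standby_db[key] = value
            corrected.append(key)
    # Remove entries that exist only on the standby side.
    for key in list(standby_db):
        if key not in active_db:
            del standby_db[key]
            corrected.append(key)
    return corrected


active = {"a": 1, "b": 2}
standby = {"a": 1, "b": 9, "c": 3}
fixed = audit(active, standby)   # standby now equals active
```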
  • Interface B [0040]
  • The second logical pipe, Interface B 116, between the two systems is used only during a Graceful Switch over (GSO). GSO in this context refers to the ability to transfer control from one processing element to a standby element within a brief period, such as one second, without any impact to the functionality of the system. To facilitate GSO, a data transfer mechanism transfers the total system state at a particular time by the active node to the standby node in the least amount of time possible, allowing the standby node to continue where the previously active node stopped. [0041]
  • Within the CPP system, this data-transfer period is known as the stop and copy point. The requirements of the HSP during the stop and copy phase differ considerably from its requirements during normal operation. [0042]
  • During the stop and copy phase the HSP must exhibit the following characteristics: [0043]
  • Very High bandwidth. [0044]
  • OS-independent data transfer. [0045]
  • Does not change the system state on the active or inactive side. [0046]
  • The receiving side must pend and be notified of completion without OS involvement. [0047]
  • These requirements pose a number of technical challenges. Although many technologies offer very high bandwidth (IEEE 1394, Gigabit Ethernet, etc.), many of them require the use of OS services. The use of a Direct Memory Access (DMA) engine fulfills most of the requirements except the ability to transfer the data between two independent nodes. Remote Direct Memory Access (RDMA) has all the same characteristics as regular DMA engines except that a DMA transaction can be performed across a pair of nodes, thus allowing a block of data to be directly transferred between the memory subsystems of two independent nodes. Two modes of RDMA exist: Remote Direct Memory Access Write (RDMAW) and Remote Direct Memory Access Read (RDMAR). The two may be viewed as push for RDMAW (the initiator writes directly into the recipient's memory) and pull for RDMAR (the initiator reads the host's memory and copies the data into its own memory). All current adaptors support RDMAW; a few also support RDMAR. Even though FIG. 1 depicts RDMAW operation, one skilled in the art will appreciate that RDMAR may also be used. [0048]
  • The receiving side must know when the transfer is complete. In one embodiment, a small loop is entered on the receiving side where a memory address is monitored. When this location changes, the last byte of the transfer has completed, allowing the standby side to return out of the GSO and assume the role of the active node. For example, as shown in FIG. 2, address location 0x7ffff 206 on standby node 204 is set to 0x0000 initially and is monitored. When this location changes to a previously agreed upon value, the transfer complete marker, it indicates that transfer has been completed and thus standby side may assume the role of the active node. In FIG. 2, for example, a value of 0xfabe 210 is depicted as the transfer complete marker. One skilled in the art will appreciate that this value could be any non-zero value that the active node and the standby node have agreed to treat as the transfer complete marker. [0049]
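  • The marker-polling loop described above can be simulated in a few lines. This is only a sketch: a bytearray stands in for standby memory, the RDMAW is replaced by an ordinary function that copies the image and writes the marker last, and the 16-bit little-endian marker layout, the function names, and the bounded poll count are all assumptions made so the example can run. The marker address and value are the ones shown in FIG. 2.

```python
import struct

MARKER_OFFSET = 0x7FFFF       # monitored location from FIG. 2
TRANSFER_COMPLETE = 0xFABE    # example transfer complete marker from FIG. 2


def read_marker(memory):
    """Read the marker as a 16-bit little-endian value (layout is an assumption)."""
    return struct.unpack_from("<H", memory, MARKER_OFFSET)[0]


def simulated_rdma_write(memory, image):
    """Stand-in for the RDMAW: copy the image, then write the marker last."""
    memory[0:len(image)] = image
    struct.pack_into("<H", memory, MARKER_OFFSET, TRANSFER_COMPLETE)


def wait_for_transfer(memory, poll, max_polls):
    """Loop on the marker location; poll() lets the simulation make progress."""
    for _ in range(max_polls):
        if read_marker(memory) == TRANSFER_COMPLETE:
            return True
        poll()
    return read_marker(memory) == TRANSFER_COMPLETE


standby_memory = bytearray(MARKER_OFFSET + 2)   # marker starts as 0x0000
done = wait_for_transfer(
    standby_memory,
    poll=lambda: simulated_rdma_write(standby_memory, b"system image"),
    max_polls=10,
)
```

  • In the architecture described here the loop would run without OS involvement and without a poll budget; the bound exists only so the simulation always terminates.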
  • FIG. 3 is a block diagram illustrating an overview of the transitions taking place on both nodes during a GSO. Initially, Side A, the active side, is in normal active 302 state. Side B, the standby side, is in normal standby 304 state. In one embodiment a GSO event is always initiated by the active side ‘A’ by sending a message to the standby side ‘B’ to enter the GSO receive state. Thus, side A enters Start GSO 306 state and upon receiving the Start GSO message, side B enters Start GSO 308 state as well. Side ‘A’ then enters a PRE-GSO state, the GSO Interrupt State (State 310), and waits for an acknowledgment from side ‘B’ that it is ready to receive the system image. Side ‘B’ stops all activity and enters a small loop looking for a specific memory location to change, the GSO Interrupt State (state 312). Side ‘A’ then initiates the RDMAW, and enters a loop, similar to side ‘B,’ to prevent it from restarting until the system image has been transferred (state 314). Side ‘B’ sends a done message to side ‘A’ when it detects that the transfer complete marker has changed (state 316), thus allowing the side ‘A’ to restart (state 318) and become the standby node (state 322). Side ‘B’ then executes a return from interrupt or return from exception instruction, for example, IRET (state 320), causing the processor to continue from the point where side ‘A’ jumped into the GSO interrupt, thus assuming the role of the active node (state 324). One skilled in the art will appreciate that even though FIG. 3 depicts the state transitions in a particular order, the order of these state transitions may be changed. [0050]
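  • The sequence of FIG. 3 can be written down as an ordered table of steps, which makes the hand-off of roles easier to follow. The sketch below only records the order described above and swaps the roles at the end; the table layout, the function name, and the role labels are illustrative assumptions, and no messages or memory transfers are actually performed.

```python
# (side, FIG. 3 state number, what happens), in the order described in the text.
GSO_STEPS = [
    ("A", 306, "start GSO; send Start GSO message to B"),
    ("B", 308, "enter Start GSO on receipt of the message"),
    ("A", 310, "GSO interrupt state; wait for B's acknowledgment"),
    ("B", 312, "GSO interrupt state; stop activity and poll the marker"),
    ("A", 314, "initiate RDMAW; loop until the image is transferred"),
    ("B", 316, "marker changed; send done message to A"),
    ("A", 318, "restart"),
    ("A", 322, "normal standby"),
    ("B", 320, "return from interrupt (for example, IRET)"),
    ("B", 324, "normal active"),
]


def run_gso(roles):
    """Walk the FIG. 3 steps and return the swapped roles plus a trace."""
    if roles != {"A": "active", "B": "standby"}:
        raise ValueError("in this embodiment the GSO is initiated by active side A")
    trace = [(side, state) for side, state, _ in GSO_STEPS]
    return {"A": "standby", "B": "active"}, trace


new_roles, trace = run_gso({"A": "active", "B": "standby"})
```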
  • Use of a Network Interface Card (NIC) as HSP [0051]
  • A commercially available NIC from Compaq (Servernet) that fulfills the requirements of the HSP may be used to implement the physical and the logical pipes. Recently the Servernet card has been externalized for the open systems server market, allowing it to be used as the HSP hardware. Some NICs implement a virtual interface architecture, such as the Virtual Interface Architecture (VIA) standard. [0052]
  • Conventional NICs, such as the Servernet card, employ a dual interconnect fabric, denoted X and Y, allowing transparent link redundancy to be part of the standard interface. In one embodiment of the invention, the NIC has native VIA processing in the hardware. In another embodiment, a software VIA emulator may be used, allowing software to be written to utilize the VIA interface. [0053]
  • An important feature of all VIA NICs is that they must provide RDMAW operations, because RDMAW is the basic transport mechanism of the VIA interface. One consideration for the CPP high-availability strategy is that the RDMAW can take place with no OS support, because of the requirement that both the active and standby sides are in a locked interrupt state to prevent the OS state from changing. [0054]
  • Software Interface [0055]
  • The following is an overview of the software interface for the two logical HSPs. It should be noted that VIA is a channel architecture. Therefore, one or more logical pipes may exist through one physical pipe. [0056]
  • Send/Receive [0057]
  • The Send/Receive model of the known VI Architecture follows a well-known model of transferring data between two endpoints. As shown in FIG. 4, VI Architectural Model 400 depicts the relationship between VI Consumer 402 and VI Provider 404. VI Consumer 402 comprises Application 406, OS Communication Interface 408, and VI User Agent 410. OS Communication Interface may consist of sockets, Message Passing Interface (MPI), Cluster, or other communication mechanisms. VI Provider 404 comprises VI Kernel Agent 412, VI Send/Receive and Completion Module 414, and VI Network Adapter 416. [0058]
  • In one implementation VI Consumer 402 runs in the user mode and VI Provider 404 runs in the kernel mode as depicted in FIG. 4. [0059]
  • In this model, the VI Consumer on the local node always specifies the location of the data. On the sending side, the sending process specifies the memory regions that contain the data to be sent. On the receiving side, the receiving process specifies the memory regions where the data will be placed. Given a single connection, there is a one-to-one correspondence between send Descriptors on the transmitting side and receive Descriptors on the receiving side. [0060]
  • The VI Consumer at the receiving end pre-posts a Descriptor to the receive queue of a VI send/receive module. The VI Consumer at the sending end can then post the message to the corresponding VI's send queue. The Send/Receive model of data transfer requires that the VI Consumers be notified of Descriptor completion at both ends of the transfer, for synchronization purposes. VI Consumers are responsible for managing flow control on a connection. The VI Consumer on the receiving side must post a Receive Descriptor of sufficient size before the sender's data arrives. If the Receive Descriptor at the head of the queue is not large enough to handle the incoming message, or the Receive Queue is empty, an error will occur. The connection may be broken if it is intended to be reliable. The VI Architecture differs from some existing models in that all Send/Receive operations are completed asynchronously. [0061]
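  • The pre-posting rule described above can be illustrated with a small simulation. It does not call the real VI Provider Library; the class, its methods, and the exception are hypothetical stand-ins that model only the behavior stated in the text: the receiver posts a buffer first, and delivery fails if the receive queue is empty or the buffer at its head is too small. Consuming the undersized descriptor on failure is an assumption of this sketch.

```python
from collections import deque


class ReceiveQueueError(Exception):
    """Raised when an incoming message cannot be matched to a posted receive."""


class SimulatedVi:
    """Hypothetical model of one VI endpoint's receive queue."""

    def __init__(self):
        self.receive_queue = deque()   # pre-posted receive buffers
        self.completed = []            # (buffer, length) for completed receives

    def post_recv(self, size):
        """Receiver side: post a receive descriptor of the given size."""
        self.receive_queue.append(bytearray(size))

    def deliver(self, message):
        """Model the arrival of the sender's message at this endpoint."""
        if not self.receive_queue:
            raise ReceiveQueueError("receive queue is empty")
        buffer = self.receive_queue.popleft()   # descriptor at the head of the queue
        if len(buffer) < len(message):
            raise ReceiveQueueError("receive descriptor is too small")
        buffer[:len(message)] = message
        self.completed.append((buffer, len(message)))


vi = SimulatedVi()
vi.post_recv(8)          # the receiver posts before the data arrives
vi.deliver(b"hello")     # one send matches exactly one posted receive
```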
  • Remote Direct Memory Access (RDMA) [0062]
  • In the RDMA Model, the initiator of the data transfer specifies both the source buffer and the destination buffer of the data transfer. There are two types of RDMA operations, RDMA Write and RDMA Read. [0063]
  • For the RDMA Write operation, the VI Consumer specifies the source of the data transfer in one of its local registered memory regions, and the destination of the data transfer within a remote memory region that has been registered on the remote system. The source of an RDMA Write can be specified as a gather list of buffers, while the destination must be a single, virtually contiguous region. The RDMA Write operation implies that prior to the data transfer, the VI Consumer at the remote end has informed the initiator of the RDMA Write of the location of the destination buffer, and that the buffer itself is enabled for RDMA Write operations. The remote location of the data is specified by its virtual address and its associated memory handle. For the RDMA Read operation, the VI Consumer specifies the source of the data transfer at the remote end, and the destination of the data transfer within a locally registered memory region. The VI Consumer on the receiving side must post a Receive Descriptor to receive the Immediate Data, before the sender executes the RDMA Write. If no Descriptor is posted, an error will occur and the connection may be broken. [0064]
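  • The gather-list rule for RDMA Write, in which several source buffers are written into a single contiguous destination, can be sketched as a plain memory copy. The function below is a simulation only: a bytearray stands in for the registered remote region, an integer offset stands in for the remote virtual address and memory handle, and the function name is an assumption.

```python
def rdma_write(gather_list, remote_memory, remote_offset):
    """Copy each source buffer, in order, into one contiguous remote region.

    Returns the number of bytes written. Raises ValueError if the destination
    region cannot hold the whole transfer.
    """
    total = sum(len(buf) for buf in gather_list)
    if remote_offset < 0 or remote_offset + total > len(remote_memory):
        raise ValueError("destination region is too small")
    position = remote_offset
    for buf in gather_list:
        remote_memory[position:position + len(buf)] = buf
        position += len(buf)
    return total


remote = bytearray(10)
written = rdma_write([b"ab", b"cde"], remote, remote_offset=3)
```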
  • The following, using Servernet as an example, is a list of actions and VI Architecture calls required to support both HSP links required for the CPP high availability pipes.
  • Hardware init [0065]
  • ServernetInit [0066]
  • ServernetReset [0067]
  • Hardware Connection [0068]
  • VipOpenNic [0069]
  • VipCloseNic [0070]
  • Endpoint Creation and Destruction [0071]
  • VipCreateVi [0072]
  • VipDestroyVi [0073]
  • Connection Management [0074]
  • VipConnectWait [0075]
  • VipConnectAccept [0076]
  • VipConnectRequest [0077]
  • VipDisconnect [0078]
  • Data transfer [0079]
  • VipPostSend [0080]
  • VipSendDone [0081]
  • VipSendWait [0082]
  • VipPostRecv [0083]
  • VipRecvDone [0084]
  • VipRecvWait [0085]
  • Querying Information [0086]
  • VipQueryNic [0087]
  • VipSetViAttributes [0088]
  • VipQueryVi [0089]
  • VipQuerySystemManagementInfo [0090]
  • Special requirements for stop and copy functionality. [0091]
  • RDMAW does not change OS state during operation, on either the initiator or the receiver. [0092]
  • During an RDMAW operation, the memory transfer may be implemented in a variety of ways. For example, the system could start at a low memory location and increment the address as data is transferred, or start at a high memory location and decrement the address as data is transferred. [0093]
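  • The two transfer orders mentioned above can be compared with a short sketch. It copies a memory image in fixed-size chunks, either from the lowest address upward or from the highest chunk downward, and records the order in which chunk start addresses were written. The chunk size, the function name, and the use of Python byte buffers are assumptions for illustration; a real RDMAW engine would perform the copy in hardware.

```python
def copy_image(source, destination, chunk, descending=False):
    """Copy source into destination chunk by chunk; return chunk start addresses.

    With descending=False the copy runs from low to high addresses; with
    descending=True it runs from the highest chunk down to address 0.
    """
    starts = range(0, len(source), chunk)
    if descending:
        starts = reversed(starts)
    order = []
    for start in starts:
        end = min(start + chunk, len(source))
        destination[start:end] = source[start:end]
        order.append(start)
    return order


image = bytes(range(10))
low_to_high = copy_image(image, bytearray(10), chunk=4)
high_to_low = copy_image(image, bytearray(10), chunk=4, descending=True)
```

  • Either order produces the same final image; the choice matters mainly for where the last byte lands, which is relevant when a transfer complete marker is monitored at a fixed address.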
  • Apparatus, systems, and methods consistent with the principles of the invention disclosed herein provide a high-availability architecture using high-speed pipes. The high-speed pipes may be implemented using logical pipes over an existing physical pipe. The high-speed pipes may also be implemented using conventional network interface cards. [0094]
  • VI. CONCLUSION
  • It will be apparent to those skilled in the art that various modifications and variations can be made in the high-availability apparatus, system, and methods consistent with the principles of the present invention without departing from the scope or spirit of the invention. Although several embodiments have been described above, other variations are possible within the spirit and scope consistent with the principles of the present invention. [0095]
  • Although the invention has been described in terms of two systems, the principles may be applied to more than two systems. The principles of the invention, as disclosed herein, may be used in any environment requiring high-availability. For example, the principles may be used in financial settings or call-processing systems. [0096]
  • The apparatus disclosed herein should be understood to support the processes performed thereby, and, similarly, the processes disclosed herein should be understood to support the apparatus necessary to perform the steps of the processes. It should be further understood that the apparatus and methods disclosed herein may be implemented entirely in hardware, entirely in software, or a mixture of hardware and software. [0097]
  • The apparatus and method consistent with the present invention and disclosed herein are related to apparatus and methods for a high-availability architecture using high-speed pipes. Parts of the architecture may be implemented in whole or in part by one or more sequences of instructions which carry out the apparatus and method described herein. Such instructions may be read by the computer systems or by network interface cards from a computer-readable medium, such as a storage device. Execution of sequences of instructions by the computer system or network interface cards causes performance of process steps consistent with the present invention described herein. Execution of sequences of instructions may also be considered to implement apparatus elements that perform the process steps. Hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software. [0098]
  • The term “computer-readable medium” as used herein refers to any medium that may store instructions for execution. Such a medium may take many forms, including but not limited to, non-volatile memory media, volatile memory media, and transmission media. Non-volatile memory media includes, for example, optical or magnetic disks. Volatile memory media includes RAM. Transmission media includes, for example, coaxial cables, copper wire and fiber optics, including the wires. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. [0099]
  • Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic storage medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read and use. [0100]
  • Various forms of computer readable media may be involved in carrying one or more sequences of instructions for execution to implement the high-availability architecture described herein. For example, the instructions may initially be carried on a magnetic disk or a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to a computer system can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infrared signal. An infra-red detector coupled to appropriate circuitry can receive the data carried in the infra-red signal and place the data on a bus. The bus may carry data to a memory, from which a processor retrieves and executes the instructions. The instructions received by the memory may optionally be stored on a storage device either before or after execution by the processor. [0101]
  • Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed embodiments. The specification and examples are exemplary only, and the true scope and spirit of the invention is defined by the following claims and their equivalents. [0102]

Claims (24)

We claim:
1. An apparatus for implementing a high-availability computer system architecture, comprising:
a physical pipe for transferring data between an active computer system and a standby computer system;
a first logical pipe for transferring data over the physical pipe; and
a second logical pipe for transferring high-availability data over the physical pipe.
2. The apparatus of claim 1, wherein the data transferred between the active computer system and the standby computer system on the first logical pipe comprises checkpointing data.
3. The apparatus of claim 1, wherein the high-availability data transferred between the active computer system and the standby computer system on the second logical pipe comprises total system state data of the active computer system.
4. The apparatus of claim 1, wherein the second logical pipe uses remote direct memory access write operations for transferring high-availability data.
5. The apparatus of claim 1, wherein the second logical pipe uses remote direct memory access read operations for transferring high-availability data.
6. An apparatus for implementing a high-availability computer system architecture, comprising:
a physical pipe for transferring data between an active computer system and a standby computer system; and
a network interface card for transferring data and high-availability information over the physical pipe.
7. The apparatus of claim 6, wherein the data transferred between the active computer system and the standby computer system on the network interface card comprises checkpointing data.
8. The apparatus of claim 6, wherein the high-availability information transferred between the active computer system and the standby computer system on the network interface card comprises total system state data of the active computer system.
9. The apparatus of claim 6, wherein the second logical pipe uses remote direct memory access write operations for transferring high-availability data.
10. The apparatus of claim 6, wherein the second logical pipe uses remote direct memory access read operations for transferring high-availability data.
11. A system for implementing a high-availability computer system architecture, comprising:
a physical pipe;
an active computer system for transferring data and high-availability information over the physical pipe; and
a standby computer system for receiving the high-availability information from the physical pipe.
12. The system according to claim 11, wherein the active computer system further comprises:
an interface card for transferring the data and high-availability information.
13. The system according to claim 11, wherein the standby computer system further comprises:
an interface card for receiving the high-availability information.
14. A system for implementing a high-availability computer system architecture, comprising:
physical means for transferring data between an active computer system and a standby computer system;
a first logical means for transferring data over the physical means; and
a second logical means for transferring high-availability data over the physical means.
15. The system of claim 14, wherein the data transferred between the active computer system and the standby computer system on the first logical means comprises checkpointing data.
16. The system of claim 14, wherein the high-availability data transferred between the active computer system and the standby computer system on the second logical means comprises total system state data of the active computer system.
17. The system of claim 14, wherein the second logical means uses remote direct memory access read and write operations for transferring high-availability data.
18. An apparatus for implementing a high-availability computer system architecture, comprising:
a physical pipe for transferring data between an active computer system and a standby computer system;
a first logical pipe for transferring checkpointing data over the physical pipe; and
a second logical pipe for transferring total system state data over the physical pipe.
19. The apparatus of claim 18, wherein the second logical pipe uses remote direct memory access write operations for transferring total system state data over the physical pipe.
20. The apparatus of claim 18, wherein the second logical pipe uses remote direct memory access read operations for transferring total system state data over the physical pipe.
21. A method in a high-availability computer system having an active computer system and a standby computer system, the method comprising the steps of:
sending a message to the standby computer system to enter a switch-over state;
monitoring a transfer complete marker;
transferring total system state from the active computer system to the standby computer system; and
switching from the active computer system to the standby computer system upon detecting the transfer complete marker.
22. The method of claim 21, wherein the step of transferring total system state from the active computer system to the standby computer system, further includes the step of:
performing remote direct memory access read and write operations.
23. A computer-readable medium containing instructions for performing a method in a high-availability computer system having an active computer system and a standby computer system, the method comprising the steps of:
sending a message to the standby computer system to enter a switch-over state;
monitoring a transfer complete marker;
transferring total system state from the active computer system to the standby computer system; and
switching from the active computer system to the standby computer system upon detecting the transfer complete marker.
24. The computer-readable medium of claim 23, wherein the step of transferring total system state from the active computer system to the standby computer system, further includes the step of:
performing remote direct memory access read and write operations.
US10/692,252 1999-06-02 2003-10-23 High-availability architecture using high-speed pipes Abandoned US20040117687A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/692,252 US20040117687A1 (en) 1999-06-02 2003-10-23 High-availability architecture using high-speed pipes

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13720399P 1999-06-02 1999-06-02
US09/585,577 US6715099B1 (en) 1999-06-02 2000-06-02 High-availability architecture using high-speed pipes
US10/692,252 US20040117687A1 (en) 1999-06-02 2003-10-23 High-availability architecture using high-speed pipes

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/585,577 Continuation US6715099B1 (en) 1999-06-02 2000-06-02 High-availability architecture using high-speed pipes

Publications (1)

Publication Number Publication Date
US20040117687A1 true US20040117687A1 (en) 2004-06-17

Family

ID=31996568

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/585,577 Expired - Lifetime US6715099B1 (en) 1999-06-02 2000-06-02 High-availability architecture using high-speed pipes
US10/692,252 Abandoned US20040117687A1 (en) 1999-06-02 2003-10-23 High-availability architecture using high-speed pipes

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/585,577 Expired - Lifetime US6715099B1 (en) 1999-06-02 2000-06-02 High-availability architecture using high-speed pipes

Country Status (1)

Country Link
US (2) US6715099B1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7010614B2 (en) * 2000-07-05 2006-03-07 International Business Machines Corporation System for computing cumulative amount of data received by all RDMA to determine when a complete data transfer has arrived at receiving device
US8086894B1 (en) * 2000-09-06 2011-12-27 Cisco Technology, Inc. Managing redundant network components
US7177919B1 (en) * 2000-11-28 2007-02-13 Cisco Technology, Inc. Method and system for controlling tasks on network cards
JP2002208981A (en) * 2001-01-12 2002-07-26 Hitachi Ltd Communication method
CA2432386A1 (en) * 2001-01-31 2002-08-08 International Business Machines Corporation Method and apparatus for transferring interrupts from a peripheral device to a host computer system
US7870258B2 (en) * 2001-08-08 2011-01-11 Microsoft Corporation Seamless fail-over support for virtual interface architecture (VIA) or the like
US7251747B1 (en) * 2001-09-20 2007-07-31 Ncr Corp. Method and system for transferring data using a volatile data transfer mechanism such as a pipe
KR100474704B1 (en) * 2002-04-29 2005-03-08 삼성전자주식회사 Dual processor apparatus capable of burst concurrent writing of data
US7117390B1 (en) * 2002-05-20 2006-10-03 Sandia Corporation Practical, redundant, failure-tolerant, self-reconfiguring embedded system architecture
US20050091334A1 (en) * 2003-09-29 2005-04-28 Weiyi Chen System and method for high performance message passing
US7937616B2 (en) * 2005-06-28 2011-05-03 International Business Machines Corporation Cluster availability management
US7647483B2 (en) * 2007-02-20 2010-01-12 Sony Computer Entertainment Inc. Multi-threaded parallel processor methods and apparatus
US8774225B2 (en) * 2009-02-04 2014-07-08 Nokia Corporation Mapping service components in a broadcast environment
US9864772B2 (en) 2010-09-30 2018-01-09 International Business Machines Corporation Log-shipping data replication with early log record fetching

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5715386A (en) * 1992-09-30 1998-02-03 Lucent Technologies Inc. Apparatus and methods for software rejuvenation
US5987621A (en) * 1997-04-25 1999-11-16 Emc Corporation Hardware and software failover services for a file server
US20010056554A1 (en) * 1997-05-13 2001-12-27 Michael Chrabaszcz System for clustering software applications
US6363497B1 (en) * 1997-05-13 2002-03-26 Micron Technology, Inc. System for clustering software applications
US5951695A (en) * 1997-07-25 1999-09-14 Hewlett-Packard Company Fast database failover
US5974114A (en) * 1997-09-25 1999-10-26 At&T Corp Method and apparatus for fault tolerant call processing
US6298457B1 (en) * 1997-10-17 2001-10-02 International Business Machines Corporation Non-invasive networked-based customer support
US6081851A (en) * 1997-12-15 2000-06-27 Intel Corporation Method and apparatus for programming a remote DMA engine residing on a first bus from a destination residing on a second bus
US6378021B1 (en) * 1998-02-16 2002-04-23 Hitachi, Ltd. Switch control method and apparatus in a system having a plurality of processors
US6374262B1 (en) * 1998-03-25 2002-04-16 Fujitsu Limited Relational database synchronization method and a recording medium storing a program therefore
US6115829A (en) * 1998-04-30 2000-09-05 International Business Machines Corporation Computer system with transparent processor sparing
US6205557B1 (en) * 1998-06-09 2001-03-20 At&T Corp. Redundant call processing
US6427213B1 (en) * 1998-11-16 2002-07-30 Lucent Technologies Inc. Apparatus, method and system for file synchronization for a fault tolerate network
US6263363B1 (en) * 1999-01-28 2001-07-17 Skydesk, Inc. System and method for creating an internet-accessible working replica of a home computer on a host server controllable by a user operating a remote access client computer
US6463342B1 (en) * 2000-04-19 2002-10-08 Ford Motor Company Method for preventing computer down time

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7085226B1 (en) * 1999-10-01 2006-08-01 Lg Electronics Inc. Control apparatus and method for relay node duplexing
US20050283543A1 (en) * 2002-03-12 2005-12-22 Hawkins Peter A Redundant system management controllers
US7337243B2 (en) * 2002-03-12 2008-02-26 Intel Corporation Redundant system management controllers
US7117393B2 (en) 2003-08-26 2006-10-03 Hitachi, Ltd. Failover method in a redundant computer system with storage devices
US20060150005A1 (en) * 2004-12-21 2006-07-06 Nec Corporation Fault tolerant computer system and interrupt control method for the same
US7441150B2 (en) * 2004-12-21 2008-10-21 Nec Corporation Fault tolerant computer system and interrupt control method for the same
US20060250946A1 (en) * 2005-04-19 2006-11-09 Marian Croak Method and apparatus for maintaining active calls during failover of network elements
US8593939B2 (en) * 2005-04-19 2013-11-26 At&T Intellectual Property Ii, L.P. Method and apparatus for maintaining active calls during failover of network elements
US20130326261A1 (en) * 2012-06-04 2013-12-05 Verizon Patent And Licensing Inc. Failover of interrelated services on multiple devices
US8935562B2 (en) * 2012-06-04 2015-01-13 Verizon Patent And Licensing Inc. Failover of interrelated services on multiple devices

Also Published As

Publication number Publication date
US6715099B1 (en) 2004-03-30

Similar Documents

Publication Publication Date Title
US6715099B1 (en) High-availability architecture using high-speed pipes
JP3266481B2 (en) Method and associated apparatus for recovering from a failure in a disk access path of a clustered computing system
US8191078B1 (en) Fault-tolerant messaging system and methods
JP3718471B2 (en) Crash recovery without full remirror
US5878205A (en) Method and system for processing complex recovery using polling signals in a shared medium
AU723208B2 (en) Fault resilient/fault tolerant computing
JP3156083B2 (en) Fault-tolerant computer equipment
US7600087B2 (en) Distributed remote copy system
US7194652B2 (en) High availability synchronization architecture
CN100591031C (en) Methods and apparatus for implementing a high availability fibre channel switch
US7290086B2 (en) Method, apparatus and program storage device for providing asynchronous status messaging in a data storage system
US8375363B2 (en) Mechanism to change firmware in a high availability single processor system
US6718347B1 (en) Method and apparatus for maintaining coherence among copies of a database shared by multiple computers
US7167963B2 (en) Storage system with multiple remote site copying capability
US8583755B2 (en) Method and system for communicating between memory regions
US9948545B2 (en) Apparatus and method for failover of device interconnect using remote memory access with segmented queue
JP2003503796A (en) Intelligent splitter, system, and usage
JP2002041348A (en) Communication pass through shared system resource to provide communication with high availability, network file server and its method
JP2002525748A (en) Protocol for replication server
US7065673B2 (en) Staged startup after failover or reboot
US7987154B2 (en) System, a method and a device for updating a data set through a communication network
JP4498389B2 (en) Multi-node computer system
JPH086910A (en) Cluster type computer system
US8595452B1 (en) System and method for streaming data conversion and replication
EP1001344A2 (en) Apparatus, method and system for file synchronization for a fault tolerant network

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION