US20140006543A1 - Distributed filesystem atomic flush transactions - Google Patents

Distributed filesystem atomic flush transactions

Info

Publication number
US20140006543A1
Authority
US
United States
Prior art keywords
flush
request
file
dds
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/930,560
Inventor
William M Pitts
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US13/930,560 priority Critical patent/US20140006543A1/en
Priority to US13/959,474 priority patent/US20150006649A1/en
Publication of US20140006543A1 publication Critical patent/US20140006543A1/en
Priority to US15/726,116 priority patent/US10440141B2/en
Priority to US16/554,425 priority patent/US11050839B2/en
Priority to US17/361,063 priority patent/US11349949B2/en
Current legal status: Abandoned

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5682 Policies or rules for updating, deleting or replacing the stored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0804 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/167 Interprocessor communication using a common memory, e.g. mailbox
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • DDS: Distributed Data Service
  • the Distributed Data Service (DDS) architecture provides a framework for highly distributed, hierarchical, multi-protocol caching.
  • DDS is a distributed caching layer that spans an enterprise's network, encompassing multiple LANs interconnected with WAN links.
  • the caching layer's constituent parts are DDS modules installed on file servers, client workstations, and intermediate nodes (routers, switches, and computers).
  • DDS employs TCP/IP for inter-site communications and may therefore be incrementally deployed.
  • Non-DDS nodes appear as “just part of the wire”.
  • the DDS caching layer slices through each DDS-configured computer system at the vnode interface layer.
  • File systems (UFS, VxFS, NTFS, EXT4) and other devices such as video sources and shared memory plug into the bottom of the caching layer and provide permanent file storage or a data sourcing/sinking capability.
  • Client systems plug into the top of the caching layer to access “local” data.
  • intermediate DDS nodes: routers, switches, and other computers
  • the DDS layer implements an intelligent integrated data streaming and caching mechanism to make file data appear as “local” as possible.
  • a client process accesses a file
  • the file appears to be local (in terms of file access performance) if it has been accessed before and has not been modified since its last access.
  • DDS pre-fetches file data in advance of the client's request stream. Of course, pre-fetching is only performed for well-behaved clients. Write behind is also implemented by DDS in a manner consistent with the fact that users aren't very tolerant of file systems that lose their data.
  • Data cached within the DDS layer is stored in a protocol neutral format in a manner that requires no “translation” when the client is of the same type (Unix, Windows, ...) as the origin server.
  • the DDS layer maintains “UFS consistency” (a read always returns the most recently written data) on cached images and provides several methods of handling and recovering from network partitioning events. To the maximum extent possible, recovery and reconnection is performed automatically with no requirement for user or administrator intervention.
  • UFS consistency: a read always returns the most recently written data
  • transparent means that when an intermediate network node or link fails during a DDS file access operation, an alternate path to the origin server is discovered and used to complete the operation without the client or server ever even becoming aware of the network failure.
  • One example embodiment includes a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and the origin file system node, and a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, perform the step of receiving a flush request from the upstream site.
  • the flush request includes a request to save flush data contained in the flush request to a stable memory.
  • the instructions also perform the steps of storing the flush data in a shadow extent and dispatching a downstream flush request to a second file service proxy cache node.
  • the instructions further perform the step of receiving a response to the downstream flush request from the second file service proxy cache node.
  • the instructions additionally perform the step of sending, to the upstream site, a flush response that includes a status code indicating the successful completion of the request.
  • Another example embodiment includes a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and the origin file system node, and a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, perform the step of receiving a flush request from the upstream site.
  • the flush request includes a request to save flush data contained in the flush request to a stable memory.
  • the instructions also perform the step of storing the flush data in a shadow extent.
  • the instructions further perform the steps, if the file service proxy cache node is not a server terminator site, of dispatching a downstream flush request to a second file service proxy cache node and receiving a response to the downstream flush request from the second file service proxy cache node.
  • the instructions additionally perform the step of sending, to the upstream site, a flush response that includes a status code indicating the successful completion of the request.
  • Another example embodiment includes a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and the origin file system node, and a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, perform the step of receiving a flush request from the upstream site.
  • the flush request includes a request to save flush data contained in the flush request to a stable memory and a flush level, wherein the flush level indicates how far the data should be flushed.
  • the instructions also perform the step of storing the flush data in a shadow extent.
  • the instructions further perform the steps, if the file service proxy cache node is not a server terminator site, of dispatching a downstream flush request to a second file service proxy cache node and receiving a response to the downstream flush request from the second file service proxy cache node. If the downstream flush request was successfully completed, the instructions additionally perform the step of promoting the shadow extents. The instructions moreover perform the step of sending, to the upstream site, a flush response that includes a status code indicating the successful completion of the request.
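  • The embodiments above describe the flush exchange as a request carrying flush data (and optionally a flush level) and a response carrying a status code. As a rough illustration only, the following Python sketch shows one way such messages could be modeled; the field names (file_id, flush_level, data_complete, all_data_received) are hypothetical and are not taken from the patent's actual wire format.

        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class FlushRequest:
            file_id: str                  # identifies the file being flushed
            flush_data: bytes             # modified data to be saved to stable memory
            flush_level: int = 0          # hypothetical: how far downstream the data must be flushed
            data_complete: bool = False   # set on the last transfer of a multi-transfer flush
            version_number: Optional[int] = None  # version of the cached image being flushed

        @dataclass
        class FlushResponse:
            status: str                      # e.g. "DDS_OK" or an error code
            all_data_received: bool = False  # set once the server terminator has accepted all flush data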
  • FIG. 1 illustrates a DDS virtual file server constructed with a flat network topology
  • FIG. 2 depicts the flat network of FIG. 1 reconfigured to provide high availability access to the eng and sales sub-domains, and another node has been added to create redundant network paths;
  • FIG. 3 depicts the /etc/dds_exports file (also referred to as the site map file) for node 4, which exports the eng sub-domain;
  • FIG. 4 illustrates the domain map file for node 1
  • FIG. 5 illustrates the export filesystem (site tree) constructed by acme-4 by following the directions contained in the site map file;
  • FIG. 6 illustrates the domain tree for node 1
  • FIG. 7 depicts a simple hierarchy of DDS nodes configured with multiple routes from node A to the origin server node (node S);
  • FIG. 8 is a flowchart illustrating an example of a method 800 of recursive ascent failure notifications
  • FIG. 9 is a flowchart illustrating an example of a method of atomic flush transactions
  • FIG. 10 is a flowchart illustrating an example of a method of validating a cached file image when a response received at an upstream site contains the same version number as the one associated with the cached file image;
  • FIG. 11 is a flowchart illustrating an example of a method of flushing modified file data downstream towards the DDS server terminator site
  • FIG. 12 is a flowchart illustrating an example of a method employed by a DDS node to load a path signature when a connection is established.
  • FIG. 13 is a flowchart illustrating an example of a method employed to reconnect an upstream DDS node.
  • a DDS network may be configured into either a flat or hierarchical organization.
  • Hierarchical topologies inherently provide more latitude for constructing networks incorporating redundant paths.
  • When a failure occurs, DDS employs redundant network paths to transparently re-route file access traffic around the failure.
  • FIG. 1 illustrates a DDS virtual file server constructed with a flat network topology.
  • the virtual file server exports the acme domain, consisting of four sub-domains: corp, mrkt, sales and eng.
  • the virtual file server appears to be a single multi-homed (multiple IP addresses) Windows or Linux file server to client workstations.
  • the public network interfaces are DNS registered with the names acme-1, acme-2, acme-3 and acme-4. Workstations may use any of the four public interfaces and may switch to using a different interface at any time.
  • the “view through any portal” is equivalent to the view through any other portal.
  • DDS's distributed consistency mechanism works behind the scenes to ensure this consistency of views.
  • When a network node fails, a flat network topology does not provide the redundant paths necessary to provide transparent uninterrupted access to all file server data.
  • FIG. 2 depicts the flat network of FIG. 1 reconfigured to provide high availability access to the eng and sales sub-domains, and another node has been added to create redundant network paths.
  • Nodes 3 and 4 have shared access to a storage area network (SAN) containing the filesystems exported through the eng and sales sub-domains. These two nodes are configured such that if either node fails, the other node will notice the failure and mount and export the failed node's filesystems.
  • SAN: storage area network
  • Another node, Node 5, has been added to the configuration depicted in FIG. 1.
  • Node 5, named acme-5, does not export any of its own filesystems. It is a domain manager providing the same “portal view” as the other four nodes. However, it also serves as the DDS gateway for remote clients and remote DDS workgroup accelerators.
  • DDS nodes operate independently and cooperatively to create a hierarchical global namespace.
  • a DDS node constructs a site tree as specified by the /etc/dds_exports file.
  • FIG. 3 depicts the /etc/dds_exports file (also referred to as the site map file) for node 4, which exports the eng sub-domain.
  • FIG. 4, illustrating the domain map file for node 1, specifies:
  • the names used in a domain map file may be public names (registered with a network name service such as DNS) or they may be private.
  • n1 through n4 are private names
  • acme-1 through acme-4 are public names.
  • the domain nodes communicate with each other using the domain's private names, and clients of the DDS virtual file server use the domain's public names.
  • a node's private name and public name may resolve to the same IP address or to different IP addresses.
  • the node exports these two trees.
  • the site tree is a single filesystem containing all content exported by this node.
  • the domain tree is a single filesystem containing all content exported by all sub-domain nodes for which this node is a domain manager.
  • the initialization process for a DDS node occurs in two phases:
  • the site map file specifies what is exported, but not what it is called.
  • the domain map file ( FIG. 4 ) specifies that n4 (private name for node 4) supplies the eng sub-domain, but it is the site map file ( FIG. 3 ) that defines what node 4 will export under the name eng. So, the pathname employed by a user on a client workstation to access an acme hardware engineering document would look something like: /dds/acme/eng/hw/the_document.doc.
  • the multi-homed virtual file server acme may now be accessed through any of its four network interfaces.
  • the top-level directory structure of acme's exported domain tree is:
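        (The listing itself is not reproduced in this text; given the four sub-domains named above, the top level would presumably contain one entry per sub-domain:)

        /dds/acme/corp
        /dds/acme/mrkt
        /dds/acme/sales
        /dds/acme/eng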
  • Clients may now direct their requests to any acme portal and expect to receive the same response.
  • a DDS node constructs a global namespace that includes its exported filesystem and the exported filesystems of all of its sub-domains. Following initialization, the DDS node is a file access portal to all files and directories contained within the domain's global namespace. The DDS node may also be bound into a larger domain as a sub-domain of a ‘higher level’ domain. This process may be recursively repeated until there is a single Internet domain that encompasses content from thousands or millions of origin servers distributed throughout the world.
  • DDS nodes may initialize to become Internet domain managers (portals). So, a portal located anywhere in the world may provide access to content distributed about the globe.
  • DDS employs extensive file level hierarchical caching to make data appear to be ‘here’ rather than ‘there’. From a filesystem perspective, a phrase that encapsulates DDS's primary focus is: DDS removes the distinction between local files and remote files.
  • Redundancy can be woven into the DDS fabric to create always-available networks, and redundancy incorporated into file servers can ensure the continuous availability of file data. DDS transparently overcomes network failures whenever redundant paths make it possible to do so.
  • channel refers to the channels at the individual sites bound together by DDS Protocol into a single integrated channel.
  • a single DDS module contains client-side code for requesting file data from an origin file server and server-side code that receives and responds to requests from “upstream” DDS sites.
  • client-side code for requesting file data from an origin file server
  • server-side code that receives and responds to requests from “upstream” DDS sites.
  • downstream is towards the origin server
  • upstream is towards the client.
  • the DDS protocol defines two remote procedures for transporting file data: DDS_LOAD and DDS_FLUSH. These two procedures are briefly described since recovery operations are based upon variations of these procedures.
  • DDS_LOAD: This operation loads data and metadata from a downstream site.
  • the request includes a file identifier and the flags (DDS_CC_SITE_READING, DDS_CC_SITE_WRITING) that inform the downstream site of the types of operations that will be performed upon the file data being loaded. These flags are used by the distributed consistency mechanism to keep track of the type of operations (read vs. write) being performed at upstream sites.
  • a single load or flush request may specify multiple file segments and each segment may be up to 4 gigabytes in length.
  • the response includes flags (DDS_CC_SUSTAIN_DIR_PROJECTION, and DDS_CC_SUSTAIN_FILE_PROJECTION) that indicate whether the returned file data and metadata may be cached or whether it must be discarded immediately after responding to the current client request.
  • flags: DDS_CC_SUSTAIN_DIR_PROJECTION and DDS_CC_SUSTAIN_FILE_PROJECTION
  • the DDS server terminator site projects file data into remote DDS client terminator sites. These projections are sustained in the remote DDS sites while the file is being accessed at those sites unless a concurrent write sharing condition arises.
  • An upstream DDS cache buffer is no different than an internal origin file server buffer. After a write operation modifies a file system buffer (either local or remote), performance is enhanced if the buffer is asynchronously written to the server's disk. However, file modifications are safeguarded when they are synchronously written to disk or some other form of stable storage. Flush levels allow both the client and the server to express their level of paranoia. The more paranoid of the two usually prevails.
  • an upstream DDS site flushes all of a channel's dirty data and metadata downstream as an atomic unit.
  • the downstream site remains “committed” to the upstream site until the last batch of data (flagged with DDS_FLUSH_DATA_COMPLETE) is successfully received. This means that the channel at the downstream site will not service a request from any other upstream site until the flush has completed.
  • each file extent structure contains a pointer to a shadow extent
  • each attribute structure contains a pointer to a set of shadow attributes.
  • each dirty extent (flagged with X_DIRTY) is flagged with X_FLUSHING. If the flush operation does not complete successfully, DDS resets all X_FLUSHING flags. And, of course, when the operation is successful both the X_DIRTY and X_FLUSHING flags are reset.
  • the DDS client-side node processes a multi-transfer flush operation in an atomic manner. Once the channel has been acquired and the first batch has been accepted at the downstream site (as opposed to rejected because of a consistency operation), it will not be released until the last batch has been dispatched. And when released, the channel will either be clean (successful flush) or it will be just as dirty as it ever was.
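  • A minimal Python sketch of the X_DIRTY / X_FLUSHING bookkeeping described above (the extent objects and flag values are hypothetical):

        X_DIRTY    = 0x1   # extent holds modifications not yet flushed downstream
        X_FLUSHING = 0x2   # extent is part of an in-progress flush

        def begin_flush(extents):
            """Mark every dirty extent as participating in the current flush."""
            batch = [e for e in extents if e.flags & X_DIRTY]
            for e in batch:
                e.flags |= X_FLUSHING
            return batch

        def end_flush(batch, ok):
            """On success both flags are reset; on failure only X_FLUSHING is reset."""
            for e in batch:
                if ok:
                    e.flags &= ~(X_DIRTY | X_FLUSHING)
                else:
                    e.flags &= ~X_FLUSHING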
  • In addition to the two remote procedures used to move file data up and down the wire, DDS also defines:
  • This section presents a simplified overview of DDS network communications.
  • DDS nodes communicate using multi-threaded SUNRPC remote procedure calls over TCP/IP connections. For every successful remote procedure call there is a client making the call and a server responding to the call.
  • SUNRPC & TCP/IP have built in mechanisms to reliably transport requests and responses across a network. DDS, layered on top of the SUNRPC and TCP/IP combination, depends upon this protocol stack for reliable message delivery.
  • a component failure manifests itself as a failure to respond to a request.
  • It does not matter whether a router, switch, intermediate DDS node or a server component failed. What does matter is that the client issued a request and did not receive a response.
  • An “industry standard” NFS client would, in this circumstance, keep re-issuing the same request until the server responded and then the client would proceed as normal. (A DDS client is more proactive in this situation, and this is described later.)
  • There is a class of failures referred to as a network partition event, where both DDS client-side nodes and server-side nodes remain operational, but the failed component has isolated the client-side from the server-side.
  • When a network partition event occurs, the client-side and server-side components assume very different roles.
  • the server-side's main priority is to ensure the integrity of all file data and then to continue providing file access services to clients still able to communicate with the server.
  • the client-side's priorities are: a) to safeguard file modifications that have not yet been successfully flushed to the origin server; b) to re-establish communication with the server and immediately flush all files ‘crossing’ the partition; and c) to continue providing file access services if possible.
  • each side plays a different role during a partition event.
  • each role is tempered and shaped by DDS domain policy attributes. These attributes provide instructions for handling file data, processing file access requests, and responding to failures.
  • the following section, Centralized Control over Distributed Operations, describes how policy attributes are employed to exercise centralized control over geographically distributed DDS overlay networks.
  • Whenever a DDS origin server responds to a file access request, the file's policy attributes are fed into the DDS distribution network as a class of metadata associated with the file data throughout the network. Every DDS node faithfully adheres to all policies specified by the file's policy attributes under all circumstances. (Of course, DDS nodes employ standard authentication methods to ensure that secured data is only sent to nodes that can be trusted.)
  • File metadata, including policy attributes, is provided with the same level of consistency as regular file data. Therefore, a metadata read operation (to fetch file attributes and/or policies) at any DDS site will return the most recently written metadata. This means the policies for handling a file or a group of files can be changed instantly throughout the network.
  • a DDS node communicates with other DDS nodes at the network transaction level.
  • a network transaction, usually consisting of a single request-response interaction, uses whatever number of remote procedure calls are required to complete an atomic DDS operation. For example, some DDS_FLUSH operations require multiple request-response interactions to perform a flush as an atomic DDS operation.
  • the server-side DDS channel targeted by the request either “steps completely forward” to a new state or it remains unchanged from its original state.
  • the client-side DDS channel issuing the request either “steps completely forward” to a new state or it remains unchanged from its original state.
  • DDS servers incorporate a duplicate request cache (DRC) that enables the server to receive the same request multiple times and to ensure all responses will be the same as the first response.
  • DRC: duplicate request cache
  • the network transaction feature ensures that each DDS node either “does” or “completely does not” respond to a client request. But it is possible, even likely, that the server can “do” while the client-side “completely does not” because a network failure prevented the delivery of the response.
  • DDS's idempotent operations feature provides a graceful (and transparent!) method for the client-side to catch up with the server-side.
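  • A minimal Python sketch of such a duplicate request cache, assuming requests carry some unique transaction identifier (the identifier and the surrounding handler are hypothetical):

        class DuplicateRequestCache:
            """Remembers responses so a repeated request receives the original response."""
            def __init__(self):
                self._responses = {}

            def lookup(self, transaction_id):
                return self._responses.get(transaction_id)

            def record(self, transaction_id, response):
                self._responses[transaction_id] = response

        def handle_request(drc, transaction_id, request, process):
            cached = drc.lookup(transaction_id)
            if cached is not None:
                return cached             # duplicate request: repeat the first response
            response = process(request)   # otherwise process the request normally
            drc.record(transaction_id, response)
            return response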
  • FIG. 7, depicting a simple hierarchy 700 of DDS nodes configured with multiple routes from node A to the origin server node (node S), assumes the following scenario for illustrative purposes:
  • FIG. 8 is a flowchart illustrating an example of a method 800 of recursive ascent failure notifications.
  • the method 800 can be used in a hierarchy 700 of FIG. 7 .
  • a failure notification (DDS_BAD_LINK) percolates up to a higher level only after the reconnect efforts at the current level failed to re-establish a downstream connection.
  • FIG. 8 shows that the method 800 can include declaring 802 a network failure.
  • the DDS client node whose immediate downstream link has stopped responding is the site that declares 802 a network failure.
  • node F will be the node that detects and declares 802 the network failure.
  • FIG. 8 also shows that the method 800 can include determining 804 if an alternate route is available. Determining 804 if an alternate route is available can include referencing network configuration data stored at the node that is declaring a network failure 802 or searching the directory tree for connections by other nodes to the target node. Additionally or alternatively, determining 804 if an alternate route is available can include communicating with other connected nodes to determine if a path to the target node exists.
  • determining 804 if an alternate route is available can include communicating with other connected nodes to determine if a path to the target node exists.
  • The search for an alternate route does not include a request back to the originating node. I.e., node F of FIG. 7 will not search for an alternate path through node C, which sent the data request to node F.
  • FIG. 8 further shows that the method 800 can include reporting 806 an error code if an alternate route is not available.
  • since node F has no alternate routes to node S, node F will respond to node C's DDS_LOAD request with an error code of DDS_BAD_LINK.
  • node C attempts to reconnect on an alternate path (node E in FIG. 7 ) and if it fails to do so, node C will respond to node A with a status of DDS_BAD_LINK. I.e., node C will complete the same method 800 being utilized by node F.
  • node A will attempt to reconnect on an alternate path and if it fails to do so, it will respond to its client with a status of DDS_BAD_LINK.
  • FIG. 8 additionally shows that the method 800 can include re-establishing 808 the file connection over the alternative route if found. I.e., since node F has no alternate working route to node S, it responds to node C with DDS_BAD_LINK. Node C then attempts to re-establish a file connection 808 to node S through node E.
  • FIG. 8 shows that when the attempt to re-establish a file connection 808 is successful, node C re-sends 810 the same DDS_LOAD request to node E that was originally sent to node F. If node E responds with DDS_OK, node C will respond to node A with DDS_OK.
  • When node C originally dispatched its DDS_LOAD request (which it had generated in response to having received a DDS_LOAD request from node A), node C was willing to wait for node F to respond because node C's fast pings kept reassuring it that node F and the link to it were both operational.
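  • A rough Python sketch of this recursive ascent behavior (the route and response objects and the send call are hypothetical; fast-ping failure detection and retry counts are omitted):

        from dataclasses import dataclass

        @dataclass
        class Response:
            status: str

        def forward_request(node, request):
            """Try the primary route, then alternates; report DDS_BAD_LINK upstream
            only after every alternate path has failed (recursive ascent)."""
            for route in [node.primary_route] + node.alternate_routes:
                if route is request.origin:        # never route back toward the requester
                    continue
                response = route.send(request)     # hypothetical transport call
                if response is not None and response.status != "DDS_BAD_LINK":
                    return response                # same request, alternate path succeeded
            return Response(status="DDS_BAD_LINK") # the upstream node repeats this same logic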
  • a DDS channel “steps forward” one request-response cycle at a time.
  • the external view of a channel's state transitions from its state before a request is processed to its state after the request is successfully processed in an atomic manner.
  • the external view (as opposed to internal view which is what the code processing requests “sees”) of a channel's state will never reflect a partially completed network request. It will reflect the channel's state either before or after a request has been processed, and never anything in-between.
  • a channel may be required to engage with its downstream channel, but, if so, this communication constitutes a separate request-response transaction.
  • DDS flushes are atomic and synchronous from the DDS client terminator site all the way through to the origin filesystem.
  • intermediate nodes may be simultaneously forwarding transfer n while receiving transfer n+1.
  • An upstream node does not consider a flush successful until it receives a DDS_OK response with the ALL_DATA_RECEIVED response flag also set.
  • Flush data is maintained in shadow extents at upstream nodes until the upstream node receives confirmation that the origin filesystem has successfully received all flush data.
  • each transfer flows through intermediate nodes independently of the other transfers. All nodes store the received flush data in shadow extents.
  • the server terminator site receives the last flush transfer (DATA_COMPLETE request flag is set), it writes all received flush data to the origin filesystem.
  • the server terminator site After receiving confirmation that the write was successful, the server terminator site replaces all extents that have shadow extents with their shadows and then dispatches a DDS_OK response with the ALL_DATA_RECEIVED response flag also set back upstream. As the response propagates through each intermediate node, each node also promotes its shadow extents.
  • FIG. 9 is a flowchart illustrating an example of a method 900 of atomic flush transactions.
  • DDS flushes are atomic and synchronous from the client terminator site to the server terminator site.
  • FIG. 9 shows that the method 900 can include receiving 902 a flush request from an upstream node. I.e., modified file data/metadata is received 902 with the intent that the modified file data/metadata is saved to some form of stable memory, as described below.
  • FIG. 9 additionally shows that the method 900 can include storing 904 the received flushed data in shadow extents. I.e., the data received 902 in the flush request from the upstream node is stored in shadow extents.
  • FIG. 9 also shows that the method 900 can include determining 906 if this site is the server terminator site.
  • FIG. 9 shows that the method 900 can include determining 908 if all batches of the flush (multi-transfer flushes have more than one) have been received and the DATA_COMPLETE request flag was set in one of the requests. (Note that it is not uncommon for network requests to be received and/or processed out of sequence with respect to when they were dispatched by the upstream node.)
  • FIG. 9 shows that the method 900 can include determining 910 if synchronous writes are enabled for the file identified in the flush request and when synchronous writes are enabled, writing 912 all received flush data to the origin filesystem.
  • FIG. 9 shows that the method 900 can include determining 914 whether the filesystem write operation completed successfully.
  • FIG. 9 shows that the method 900 can include responding 916 to the flush request with a status that conveys the write error code.
  • FIG. 9 shows that the method 900 can include setting 918 the ALL_DATA_RECEIVED response flag; promoting 920 the flushed data in the shadow extents; and responding 922 to the received flush request with a status code of DDS_OK. Furthermore, when the DATA_COMPLETE request flag is not set, FIG. 9 shows that the method 900 can include simply responding 922 to the received flush request with a status code of DDS_OK.
  • FIG. 9 shows that the method 900 can include forwarding 924 the flush request to a downstream node. I.e., the site forwards 924 the flush request to ensure that the flush request continues until it reaches the server terminator site.
  • FIG. 9 further shows that the method 900 can include receiving 926 a response to the flush request from the downstream node. I.e., because the flush request was forwarded 924, the site waits for confirmation that the downstream site (and all sites to which the downstream site forwarded the request) has responded to the flush request.
  • FIG. 9 further shows that the method 900 can include determining 928 if the response status is DDS_OK. I.e., the site determines 928 that the downstream site (and all sites to which the downstream site forwarded the request) has successfully completed the flush request.
  • FIG. 9 further shows that the method 900 can include decrementing 930 a retry count and, if the count is greater than zero, re-forwarding 924 the flush request along the same route as the previous request. However, if the retry count equals zero, FIG. 9 further shows that the method 900 can include declaring 932 a network failure, which will initiate a search for an alternate route to the server terminator site.
  • FIG. 9 further shows that the method 900 can include determining 934 if the ALL_DATA_RECEIVED response flag is set, indicating that the server terminator site has received and accepted all flush data.
  • FIG. 9 further shows that the method 900 can include promoting 920 the flushed data in the shadow extents. All extents that have shadow extents are replaced with their shadows. I.e., the changes are made permanent and the data is no longer stored within a shadow extent. Any further access of the file will receive the updated file, rather than the pre-update file.
  • FIG. 9 additionally shows that the method 900 can include responding 922 to the received flush request with a code of DDS_OK.
  • each node also promotes its shadow extents if the ALL_DATA_RECEIVED response flag is also set. This ensures that each node is working from the same concurrent data. The response, therefore, propagates upstream until the originating node receives it, thereby becoming aware that the flush request has been completed successfully.
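  • Pulling the steps of method 900 together, a simplified Python sketch (reusing the FlushRequest/FlushResponse structures sketched earlier; the node methods and the DDS_WRITE_ERROR status name are hypothetical, and the retry/reroute path of steps 930/932 is omitted):

        def handle_flush(node, request):
            """Method-900-style flush handling at a single DDS node (sketch)."""
            node.store_in_shadow_extents(request)                        # step 904

            if node.is_server_terminator:                                # step 906
                if node.all_batches_received() and request.data_complete:     # step 908
                    if node.synchronous_writes_enabled(request.file_id):      # step 910
                        if not node.write_to_origin_filesystem():             # steps 912/914
                            return FlushResponse(status="DDS_WRITE_ERROR")    # step 916
                    node.promote_shadow_extents()                              # step 920
                    return FlushResponse(status="DDS_OK",
                                         all_data_received=True)               # steps 918/922
                return FlushResponse(status="DDS_OK")                    # more batches still expected

            downstream = node.forward_downstream(request)                # steps 924/926
            if downstream.status != "DDS_OK":                            # step 928
                return downstream
            if downstream.all_data_received:                             # step 934
                node.promote_shadow_extents()                            # step 920
            return downstream                                            # step 922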
  • DDS's consistency mechanism is woven into the DDS Protocol. Every DDS file access request provides an indication of whether the client intends to modify the returned file data or just read it.
  • DDS implements an internal rule: a DDS client site must first inform its downstream DDS site before it performs a new type of activity (read/write) on a file. So, for example, a client node that has previously fetched a complete file for reading cannot begin writing without first informing the downstream site that it intends to begin writing. This allows the server-side to detect the onset of a concurrent write sharing (CWS) condition and take whatever steps are necessary to maintain cache consistency before the client-side actually performs the write operation.
  • CWS: concurrent write sharing
  • the consistency mechanism ensures that, at any instant, only a single DDS site is modifying a file. Therefore, at the moment when a site with dirty data (modified data) becomes partitioned from its downstream server-side counterpart, all other sites are guaranteed to not have any file modifications.
  • the isolated site holds the most recent file modifications and all portals still connected to the server terminator site provide a consistent view of the file “just before” the latest (and now isolated) file write. The isolated site can flush the file modifications to any DDS site providing connectivity to the source file, including the server terminator site itself.
  • Modern computer systems typically employ internal clocks with nanosecond or microsecond resolutions.
  • the clock used by remote DDS nodes cannot be synchronized with the clock used by the origin filesystem's code to the level of precision required to support standard filesystem operations. So, filesystem timestamps may only be set at the origin server site.
  • DDS file access responses always include the target file's attributes, which are cached and stored in association with the file's data.
  • the file's last modification timestamp, a file attribute element, is used as a version number for cached images.
  • the cached image of a file's last modification timestamp may be referred to as the file's version number. The two are the same.
  • the differing nomenclature relates to how the attribute is interpreted and used at DDS sites.
  • Upstream DDS sites use a file's timestamp (a file attribute), the timestamp arrival time and a delta time to generate and maintain projected timestamps.
  • Projected timestamps which enable DDS upstream sites to operate autonomously (the main point of file caching), are temporary timestamps that are replaced with (upgraded to) genuine filesystem timestamps whenever a server terminator site accesses file data in the origin filesystem.
  • the elements of a projected timestamp are:
  • the last modification timestamp contained in the response becomes the version number
  • the m_arrival time is set to the current time
  • the m_delta time is set to zero.
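  • A small Python sketch of this projected timestamp bookkeeping (the class and the projection formula in projected() are assumptions for illustration; the m_delta update follows the formula given later in this description):

        import time
        from dataclasses import dataclass

        @dataclass
        class ProjectedTimestamp:
            version_number: float   # last modification timestamp from the latest downstream response
            m_arrival: float        # local time when that timestamp arrived
            m_delta: float = 0.0    # local modification activity since arrival

            def on_new_version(self, m_time):
                """A response carried a new version number: resynchronize."""
                self.version_number = m_time
                self.m_arrival = time.time()
                self.m_delta = 0.0

            def on_local_write(self):
                """A write serviced autonomously: m_delta = current time - m_arrival."""
                self.m_delta = time.time() - self.m_arrival

            def projected(self):
                # assumption: the temporary timestamp reported to clients is the
                # version number advanced by the locally accumulated delta
                return self.version_number + self.m_delta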
  • FIG. 10 is a flowchart illustrating an example of a method 1000 of validating a cached file image when a response received at an upstream site contains the same version number as the one associated with the cached file image; and invalidating the cached file image, storing the new version number, setting the m_arrival time to the host system's current time, and resetting the m_delta time to zero when a response contains a different version number than the one associated with the cached file image.
  • FIG. 10 moreover shows that the method 1000 can include dispatching 1002 a DDS file access request to the downstream DDS site. I.e., the node sends a request to read, write, save, etc. a file from a downstream site.
  • FIG. 10 also shows that the method 1000 can include receiving 1004 a response to the DDS downstream request. I.e., the node receives any response from the downstream site, regardless of whether the access was successful or not.
  • FIG. 10 further shows that the method 1000 can include determining 1006 if the DDS request was processed without any errors and the response status is therefore DDS_OK. When there is an error, the request is repeated. However, what is not depicted is that after a few unsuccessful re-attempts the upstream site will declare a network failure and may begin searching for an alternate route to the DDS server terminator site.
  • FIG. 10 further shows that the method 1000 can include determining 1008 whether the received version number is the same as the version number associated with the cached file image.
  • the response to any successful DDS file access request (response status is DDS_OK) conveys the file's attributes, which include the version number (m_time, the last modification timestamp).
  • FIG. 10 also shows that the method 1000 can include continuing to use 1010 the current cached file image (revalidating the current file image) when the received version number is the same as the version number associated with the cached file image. I.e., because this site is using the current version, the current file image is revalidated and continues to be used.
  • FIG. 10 additionally shows that when the received version number is not the same as the version number associated with the cached file image, the method 1000 can include storing 1012 the new version number, setting the m_arrival time to the host system's current time, and resetting the m_delta time to zero.
  • FIG. 10 further shows that the method 1000 can include determining 1014 whether the response is a flush response. And when the response is not a flush response the method 1000 can include discarding 1016 the current cached file image.
  • a DDS flush request flows through to the origin file server in a single atomic operation and the response returns the last modification timestamp as the file's version number. So, a file's version number always changes when a flush response with a status of DDS_OK is received. (When the version number changes on any other response, the file has been modified at some other site and the cache image at this site is therefore not current and must be discarded.)
  • FIG. 10 further shows that the method 1000 can include determining 1018 if the response has the CACHING_ENABLED flag set and caching 1020 any new response data and metadata when the CACHING_ENABLED flag is set.
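  • A condensed Python sketch of this response-handling logic (method 1000), with hypothetical channel and response fields; error retries and the alternate-route search are omitted:

        import time

        def handle_response(channel, response, is_flush_response):
            """Validate or invalidate the cached file image based on the version number."""
            if response.status != "DDS_OK":
                return                                    # retries / rerouting not shown (step 1006)

            if response.version_number == channel.version_number:
                channel.revalidate_cached_image()         # step 1010: keep using the cached image
            else:
                channel.version_number = response.version_number   # step 1012
                channel.m_arrival = time.time()
                channel.m_delta = 0.0
                if not is_flush_response:                 # step 1014
                    channel.discard_cached_image()        # step 1016: image is stale

            if response.caching_enabled:                  # step 1018
                channel.cache(response.data, response.metadata)     # step 1020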
  • Upstream DDS sites provide better (faster) write performance when client file modifications are captured at the site and not immediately flushed downstream. “Collecting” many client file modifications before sending a “batch flush” downstream employs the underlying network infrastructure far more efficiently and provides a more responsive file access service to the client. However, when a DDS client terminator site acknowledges a client write request before the new data has been successfully flushed downstream, the possibility arises that a future network failure could cause the new data to be lost.
  • DDS provides a distributed file service and is therefore always balancing performance against filesystem integrity. Performance is increased when DDS client terminator sites operate autonomously, but the risk of losing file modifications is also increased. Both the administrator and the user can set or adjust policies affecting this balancing act.
  • the most risk-averse policy instructs client terminator sites to immediately forward file modifications on to the server terminator site and to not respond to a client write request until the origin server acknowledges the successful receipt of the new data.
  • This mode, referred to as synchronous writes, traverses the full network path from client application to origin server on every write operation.
  • the file data flowing downstream includes the version number. It will be the same as the version number at the DDS server terminator site and all intermediate DDS nodes unless some previous network partition event prevented consistency control messages from being delivered.
  • a DDS client site detects that it is isolated from its server-side companion (it employs fast pings to detect this quickly), it immediately invokes procedures to re-establish communication with a DDS node still “connected” to the server terminator site. Once the isolated, and now paranoid, node reconnects, it usually begins immediately flushing all dirty file images.
  • Each flush request is tagged with the cached file image's version number. Whenever a DDS node processes a flush request, it compares the version number in the flush request with the version number associated with its cached image of the file identified in the flush request. When there is a mismatch the flush is rejected and sent back to the client with an error code of OUT_OF_SEQUENCE. This error code is then returned to the client application, which will have to resolve this issue.
  • Version numbers will usually match and the incoming file modifications will be accepted. However, when a substantial amount of time passes while the client node is processing file requests using only cached file image data (no communication with the downstream site), the likelihood of a version mismatch increases. Of course, when a DDS site processes file requests independently of its downstream site, it is relying upon DDS's consistency callback mechanism for immediate notification when a concurrent write sharing condition arises and it is relying on fast pings to continually reassure itself that the callback path is operational.
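  • The version check described above can be sketched as follows (Python; reusing the FlushResponse structure sketched earlier, with hypothetical channel fields):

        def accept_flush(channel, request):
            """Reject a flush whose version number does not match this node's cached image."""
            if request.version_number != channel.version_number:
                # the client application will have to resolve the conflict
                return FlushResponse(status="OUT_OF_SEQUENCE")
            channel.store_in_shadow_extents(request)
            return None   # no error: continue normal flush processing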
  • FIG. 11 is a flowchart illustrating an example of a method 1100 of flushing modified file data downstream towards the DDS server terminator site; re-validating the modified file data at each DDS site; and, when all flush data has arrived at the server terminator site, possibly writing all file modifications to the origin filesystem and then fetching the file's last modification timestamp.
  • FIG. 11 shows that the method 1100 can include receiving and servicing 1102 a request from a client application and updating the appropriate delta time, which is m_delta for write requests, a_delta for read requests and c_delta for requests that modify the file's attributes.
  • FIG. 11 also shows that the method 1100 can include determining 1104 if it is time to flush file modifications downstream. If it is not time to flush the file modifications downstream, the site continues to service client file access requests. When synchronous writes are being used for the file identified in the request, it will always be time to flush the modifications downstream. For delayed writes, flushes may be initiated when a delay timer expires. The delay timer may be reset on every write request or every file access request. So, for example, a client terminator site might initiate a batch flush operation 15 seconds after receiving the last of many write requests.
  • FIG. 11 further shows that the method 1100 can include flushing 1106 all file modifications downstream.
  • the file's version number accompanies the flush data.
  • FIG. 11 additionally shows that the method 1100 can include a downstream DDS node receiving 1108 the flush request and comparing the received version number with the version number associated with the downstream node's cached file image.
  • FIG. 11 moreover shows that the method 1100 can include determining 1110 if the version numbers differ. I.e., the version numbers are compared.
  • FIG. 11 also shows that the method 1100 can include the downstream DDS node responding 1112 OUT_OF_SEQUENCE to the upstream node if the version numbers differ.
  • FIG. 11 shows that the method 1100 can include storing 1114 the received flush data in shadow extents.
  • FIG. 11 further shows that the method 1100 can include determining 1116 if the downstream DDS node is the server terminator site. If the downstream DDS node is not the server terminator site then the steps 1106 - 1116 are repeated until the server terminator site is reached.
  • FIG. 11 also shows that when the flush request is processed at the server terminator site, the method 1100 can include determining 1118 that all requests in this flush operation have been received and that one of the requests had the DATA_COMPLETE request flag set.
  • FIG. 11 shows that the method 1100 can include responding 1120 to the upstream node with a status of DDS_OK.
  • FIG. 11 additionally shows that the method 1100 can include determining 1122 whether synchronous write mode is being used for the file identified in the request. When synchronous write mode is not being used, FIG. 11 shows that the method 1100 can include responding 1120 to the upstream node with a status of DDS_OK.
  • FIG. 11 shows that the method 1100 can include writing 1124 all file modifications to the underlying origin filesystem.
  • FIG. 11 further shows that the method 1100 can include determining 1126 whether the write to the origin filesystem was successful. When the write to the origin filesystem is not successful, FIG. 11 shows that the method 1100 can include responding 1128 to the upstream node with a status that indicates the type of error that occurred.
  • FIG. 11 shows that the method 1100 can include setting 1130 the ALL_DATA_RECEIVED response flag and replacing all extents that have shadow extents with their respective shadow extents.
  • FIG. 11 shows that the method 1100 can include fetching the file's last modification timestamp from the origin filesystem.
  • This timestamp, which conveys the time (according to the origin filesystem clock) that the write 1124 was performed, will be interpreted at upstream sites as the file's version number.
  • FIG. 11 finally shows that the method 1100 can include responding 1120 to the upstream node with a status of DDS_OK.
  • When a client system (a workstation, for example) receives a response to a file write request, the response includes the file's attributes, an element of which is the file's last modification timestamp. For synchronous writes, this timestamp will be the correct timestamp generated by the origin filesystem when it received the new file modification. However, when delayed writes are employed for faster performance and the DDS client terminator site is operating autonomously, the DDS instance executing at the client terminator site generates the response's last modification timestamp as follows:
  • m_delta time = current time − m_arrival time.
  • the version number is the origin server's last modification timestamp and only the origin server updates it. Upstream sites use the last modification timestamp as the file's version number. Every file modification performed by the origin server creates a new file version.
  • the timestamp generated by the client terminator is a temporary timestamp that is accurate enough to enable the client application to believe it is accessing the most current version of the file (which, in fact, it is).
  • the timestamp monotonically increases by a reasonable amount on every file modification request and it is periodically resynchronized (whenever dirty file data is flushed downstream) with the origin server's timestamp. This behavior helps to maintain the illusion that the DDS service is provided by a single local filesystem “within” the computer where the client application is executing.
  • Similar procedures may also be used to project other temporary filesystem timestamps such as a file's last access timestamp and last change timestamp.
  • Path signatures include the following structure:
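  • The structure itself is not reproduced in this text; a plausible reading, based on how the signature is described below (each node adds its own signature as the connection is established), is an ordered list of per-node signatures, sketched in Python as:

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class PathSignature:
            # hypothetical layout: signatures of the nodes along the last
            # successful path, ordered from the origin server toward the client
            node_signatures: List[str] = field(default_factory=list)

            def add(self, node_signature):
                self.node_signatures.append(node_signature)

            def contains(self, node_signature):
                return node_signature in self.node_signatures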
  • the downstream site will establish a connection to the upstream site and then include in its request response a path signature constructed by adding its signature to the channel's path signature.
  • a channel's path signature reflects the last successful path used to access file data.
  • When a network failure occurs, DDS initiates procedures to re-establish a connection to a file. A client-side node will successively direct a reconnection request down each of its alternate paths to the origin server, stopping as soon as one of the reconnection attempts is successful. After trying all paths without successfully reconnecting to the file, the client-side node will return the DDS_BAD_LINK error code to the client that was accessing the file when the failure was discovered.
  • a reconnection request, which is a DDS request with the DDS_RECONNECT flag set, contains the channel's path signature.
  • the server-side node receiving and processing the request uses the path signature to recognize the client as a current client and to reconcile its consistency control data structures. So, for example, when a writing client reconnects, the server-side node does not “see” a second writer and declare a concurrent write sharing condition. It “sees” a client that has been modifying a file now attempting to access the file through a new path.
  • a reconnection request propagates downstream until a node recognizes that its signature is contained in the path signature. If this node is more than one level lower in the cache hierarchy, the upstream nodes along the previous path must be informed so that the old path's consistency control structures can be reconciled.
  • a reconnection request may also be issued when there is no network failure.
  • An upstream node deciding to re-balance its downstream traffic may issue a reconnection request at any time.
  • FIG. 12 is a flowchart illustrating an example of a method 1200 employed by a DDS node to load a path signature when a connection is established; and for that node to add its signature to the path signature that it sends upstream whenever an upstream node establishes a file connection.
  • FIG. 12 illustrates that the method 1200 can include receiving 1202 a DDS request. I.e., a DDS request is received from an upstream node.
  • FIG. 12 also shows that the method 1200 can include acquiring 1204 the channel for the file identified in the request. When the channel does not already exist, a new channel is created and assigned to the identified file.
  • FIG. 12 further illustrates that the method 1200 can include determining 1206 if it is possible to service the request without communicating with the downstream node. This would be the case when all required file data is cached and valid at the site, and the channel at this site is already “connected” to a downstream site or this site is operating in disconnected mode.
  • FIG. 12 also shows that the method 1200 can include dispatching 1208 a request to a downstream site and then receiving 1210 a response from the downstream site.
  • FIG. 12 shows that the method 1200 can include determining 1212 if the response contained an indication that the downstream site just established a file connection to this site.
  • FIG. 12 shows that the method 1200 can include copying 1214 the path signature from the response to the channel.
  • FIG. 12 shows that the method 1200 can include servicing 1216 the request.
  • FIG. 12 illustrates that the method 1200 can include determining 1218 if the upstream site that sent the request now being processed is establishing an initial file connection.
  • FIG. 12 shows that the method 1200 can include copying 1220 the path signature from the channel to the response, adding this node's signature to the response's path signature, and then setting the INITIAL_FILE_CONNECTION flag in the response.
  • FIG. 12 shows that the method 1200 can include dispatching 1222 a response back to the upstream DDS client.
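  • A minimal sketch of the FIG. 12 flow, reusing the hypothetical path-signature helpers above. Only the control flow mirrors steps 1202 through 1222; the request, response, and channel layouts and the helper routines (dds_channel_acquire, dds_can_service_locally, dds_dispatch_downstream, dds_channel_set_path_sig, dds_channel_path_sig, dds_service_request, dds_is_initial_connection, dds_dispatch_response) are assumptions rather than the disclosed implementation, and their prototypes are omitted for brevity.

        #define INITIAL_FILE_CONNECTION 0x0001   /* hypothetical flag value */

        struct dds_request  { unsigned flags; uint64_t fh; dds_path_sig_t path_sig; };
        struct dds_response { unsigned flags; dds_path_sig_t path_sig; };

        void dds_handle_upstream_request(struct dds_request *req,
                                         struct dds_response *rsp,
                                         dds_site_sig_t my_sig)
        {
            struct dds_channel *cp = dds_channel_acquire(req->fh);          /* 1202, 1204 */

            if (!dds_can_service_locally(cp, req)) {                        /* 1206 */
                struct dds_response ds_rsp;
                dds_dispatch_downstream(cp, req, &ds_rsp);                  /* 1208, 1210 */
                if (ds_rsp.flags & INITIAL_FILE_CONNECTION)                 /* 1212 */
                    dds_channel_set_path_sig(cp, &ds_rsp.path_sig);         /* 1214 */
            }

            dds_service_request(cp, req, rsp);                              /* 1216 */

            if (dds_is_initial_connection(req)) {                           /* 1218 */
                /* 1220: copy the channel's path signature into the response
                 * and append this node's own signature. */
                dds_path_sig_append(&rsp->path_sig, dds_channel_path_sig(cp), my_sig);
                rsp->flags |= INITIAL_FILE_CONNECTION;
            }
            dds_dispatch_response(rsp);                                     /* 1222 */
        }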
  • FIG. 13 is a flowchart illustrating an example of a method 1300 employed to reconnect an upstream DDS node.
  • FIG. 13 illustrates that the method 1300 can include receiving 1302 a DDS request from an upstream site.
  • FIG. 13 also shows that the method 1300 can include acquiring 1304 the channel for the file identified in the request. When the channel does not already exist, a new channel is created and assigned to the identified file.
  • FIG. 13 further illustrates that the method 1300 can include determining 1306 if the DDS_RECONNECT flag is set in the request. When the DDS_RECONNECT flag is not set, the procedure for reconciling the upstream site structures at this site (and possibly sites above) is bypassed.
  • FIG. 13 shows that when the DDS_RECONNECT flag is set the method 1300 can include determining 1308 if the path signature contained in the request identifies this node as a member of the “old” path. When the node is not a member of the “old” path, FIG. 13 illustrates that the method 1300 can include resetting 1310 the channel's attributes valid (ATTRS_VALID) flag to force this node “go downstream” to fetch valid attributes.
  • FIG. 13 shows that when this node is a member of the “old” path the method 1300 can include sending 1312 a DDS RECALL or INVALIDATE message to the “old” upstream site identified in the path signature.
  • FIG. 13 also shows that the method 1300 can include determining 1314 if a response to the DDS RECALL or INVALIDATE message is received.
  • FIG. 13 shows that when a response is received the method 1300 can include updating 1316 the upstream site structure of the “old” upstream site identified in the path signature.
  • FIG. 13 further illustrates that when a response is not received the method 1300 can include declaring 1318 the “old” upstream site identified in the path signature to be OFFLINE and recording this status by setting the OFFLINE flag in the upstream site structure of the “old” upstream site.
  • FIG. 13 finally shows that the method 1300 can include servicing 1320 the DDS request and then dispatching a response back to the upstream DDS node.
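  • The server-side reconnect path of FIG. 13 can be sketched in the same hypothetical style. The flag values and the upstream-site-structure helpers (dds_channel_clear_flag, dds_uss_lookup, dds_recall_or_invalidate, dds_uss_update, dds_uss_set_flag) are assumptions introduced only to show the ordering of steps 1302 through 1320.

        #define DDS_RECONNECT 0x0002             /* hypothetical flag values */
        #define ATTRS_VALID   0x0010
        #define OFFLINE       0x0020

        void dds_handle_reconnect(struct dds_request *req, struct dds_response *rsp,
                                  dds_site_sig_t my_sig)
        {
            struct dds_channel *cp = dds_channel_acquire(req->fh);            /* 1302, 1304 */

            if (req->flags & DDS_RECONNECT) {                                  /* 1306 */
                if (!dds_path_sig_contains(&req->path_sig, my_sig)) {          /* 1308 */
                    dds_channel_clear_flag(cp, ATTRS_VALID);  /* 1310: force a fetch downstream */
                } else {
                    /* 1312: RECALL or INVALIDATE the image at the "old" upstream site. */
                    struct dds_upstream_site *old = dds_uss_lookup(&req->path_sig);
                    if (dds_recall_or_invalidate(old) == 0)                    /* 1314 */
                        dds_uss_update(old);                                   /* 1316 */
                    else
                        dds_uss_set_flag(old, OFFLINE);                        /* 1318 */
                }
            }
            dds_service_request(cp, req, rsp);                                 /* 1320 */
            dds_dispatch_response(rsp);
        }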
  • the DDS consistency mechanism ensures that all DDS portals are equivalent.
  • The “any port in a storm” feature allows a DDS client node to switch to a new downstream node at any time. So, whenever a client node feels isolated it can elect to find a new partner.
  • the new partner may be at the same hierarchical level as the partitioned site, or it may be at any level closer to the server terminator site.
  • When the new partner is at the same hierarchical level as the partitioned site, it must, in addition to responding to the request, send a message downstream informing that site that an upstream site has switched partners. This allows the downstream site to revoke whatever permissions it had granted the isolated site (isolated from the client attempting to reconnect and possibly also isolated from this site) and grant permissions along this new path.
  • DDS filehandles, patterned after NFS filehandles, are permanent filehandles.
  • the file server generates an opaque file identifier during the processing of a lookup request and passes it back to a client. The client then uses this filehandle as a reference point in future read/write requests.
  • Permanent filehandles facilitate network error recovery operations by reducing to a bare minimum the amount of distributed state required for a disconnected DDS client site to successfully reconnect with the origin server.
  • a DDS client needs nothing more than a filehandle to reconnect to a file. The DDS client does not need to know the directory, the filesystem, or even the file server that contains the file.
  • When a concurrent write sharing condition arises, DDS server nodes must RECALL a modified upstream image projection (if there is one, and there can be only one) or INVALIDATE all upstream image projections supporting accelerated read operations.
  • When an upstream site is partitioned from its downstream site, the upstream site never receives the notification. So, the downstream site never receives confirmation that the upstream site has invalidated its cached image of the file.
  • the downstream site may decide (depending on the currently established policies) to give up on the isolated site and continue servicing other clients. So, the downstream site may declare the upstream site OFFLINE and record that status in the upstream site structure (uss) associated with the upstream site. Then the downstream site, based on the established policy, may do one of the following:
  • When communications are restored, the upstream site will promptly send a SASE. If the downstream site had chosen to wait, it will quickly bounce the SASE back with the same note it had sent earlier.
  • the primary directive for DDS server-side nodes is therefore:
  • DDS server-side nodes act reflexively to ensure filesystem integrity. Then, throughout the partition event, the server-side nodes continue providing file services in conformance with the primary directive. However, these server-side nodes are pledged to filesystem integrity and have no real obligations to any client. DDS server-side nodes will quickly refuse any request that is not consistent with the PD.
  • While the DDS server-side of a network partition is not dedicated to servicing client systems, the other side of the partition is completely focused on providing the best client service possible given the current circumstances.
  • DDS client-side nodes are responsible for ensuring that no delayed write data (file modifications that have not yet been flushed to the origin) is ever lost.
  • the domain policy attributes in effect at each client-side node determine what actions the node performs when a network partition is detected, but a client-side node will generally safeguard its data first and then attempt to reconnect to the server-side and flush all file modifications back to their respective origin servers.
  • DDS client nodes continuously monitor the health and performance of all downstream paths, and for every path the client node is aware of all alternate paths.
  • the statistics maintained for each path include:
  • a DDS client node When a DDS client node issues a request and fails to receive a response within a timeout window (typically 2 or 3 seconds), the node retransmits the request several times before declaring the downstream node OFFLINE.
  • When the downstream site is declared OFFLINE, the DDS client node simultaneously tries all alternate paths and selects the path with the best performance.
  • DDS syncs all modifications to files from the disconnected downstream server to the site's stable memory (hard disk or flash memory). When downstream communications are restored, all file modifications are immediately flushed downstream.
  • DDS either:
  • In addition to performing whatever downstream communications are required to support the file access services the upstream site is providing, the upstream site also pings the downstream site once a second (typically). This enables the upstream site to quickly detect a partition event and, depending on the established policies, possibly stop providing access to cached files affected by the partition event.
  • When the upstream site fails to receive a reply to its ping, it retransmits the request several times before declaring the downstream node OFFLINE.
  • the upstream site will continue to periodically ping the downstream site.
  • Once the downstream site responds to a ping, it is declared ONLINE and both sites may begin re-synchronizing.
  • Site re-synchronization basically consists of re-syncing a series of individual files. This is a straightforward process with one exception: an out-of-sequence write (explained in the following section).
  • the downstream site MUST detect that the modification performed at the upstream site did not use the most current version of foo, and the request must be rejected with error code OUT_OF_SEQUENCE.
  • Server-side nodes assume a rather passive role during a network failure. These nodes must receive file modifications that are being flushed downstream, but that is also what they do when the network is not broken.
  • the main difference during a network failure is that when a server-side node receives a flush with a path signature indicating an upstream site is reconnecting, it executes a recovery routine to transfer read/write permissions from the isolated upstream site to the site that sent the flush.
  • Client-side nodes bear almost the entire burden of recovering from network failures because they have service commitments to active clients that must be maintained.
  • the client-side node “closest to” a failed network component will be the node that detects the failure and declares the network partitioned. The node will then execute a recovery procedure.
  • the recovery procedure will sequentially call upon various recovery routines, which are described in the following section.
  • This routine is continuously executed by a thread dedicated to monitoring all downstream paths and maintaining performance statistics associated with each path.
  • the information gathered by dds_monitor_ds_paths( ) may be referenced to quickly determine the best alternate path.
  • dds_monitor_ds_paths( ) maintains a path_state mini-database of all downstream paths emanating from this node.
  • This routine may also periodically send its current path_state to a node that maintains and presents a global view of network operations.
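  • The contents of the path_state mini-database are not specified; the following sketch assumes a small per-path record of availability and round-trip statistics, refreshed by a monitoring thread roughly once a second. The field names, the table size, and dds_fast_ping() are hypothetical.

        #include <time.h>
        #include <unistd.h>

        #define DDS_MAX_DS_PATHS 8               /* hypothetical limit on downstream paths */

        struct path_state {
            int    online;                       /* did the last ping get answered?      */
            double rtt_ms;                       /* smoothed round-trip time             */
            long   failures;                     /* consecutive unanswered pings         */
            time_t last_checked;
        };

        static struct path_state paths[DDS_MAX_DS_PATHS];

        /* Monitoring thread body: keeps the statistics that dds_reconnect()
         * consults when it must choose the best alternate path. */
        static void *dds_monitor_ds_paths(void *arg)
        {
            (void)arg;
            for (;;) {
                for (int i = 0; i < DDS_MAX_DS_PATHS; i++) {
                    double rtt;
                    if (dds_fast_ping(i, &rtt) == 0) {       /* assumed helper */
                        paths[i].online   = 1;
                        paths[i].failures = 0;
                        paths[i].rtt_ms   = 0.8 * paths[i].rtt_ms + 0.2 * rtt;
                    } else if (++paths[i].failures > 3) {
                        paths[i].online = 0;                 /* declare the path OFFLINE */
                    }
                    paths[i].last_checked = time(NULL);
                }
                sleep(1);                        /* the document cites a roughly one-second ping rate */
            }
            return NULL;
        }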
  • This routine re-establishes a connection to the origin server for the file identified by the filehandle contained within the channel (cp->fh).
  • dds_reconnect( ) is optimized for the speed of re-establishing a downstream connection because DDS nodes really don't like to be disconnected.
  • This routine may reference the path_state mini-database to quickly select the most appropriate alternate path for the reconnection attempt.
  • a dds_reconnect( ) request always contains the issuing site's path signature and it always propagates through to the first node that is common to both the ‘old’ and the ‘new’ paths.
  • a dds_flush( ) request flows from a DDS client terminator site to a server terminator site as a single atomic network transaction.
  • intermediate nodes may be simultaneously forwarding transfer n while receiving transfer n+1.
  • DDS flushes are atomic and synchronous. A flush is not considered successful until the client receives the server's OK response. A shadow copy is kept until a node receives an OK response from its immediate downstream site.
  • each transfer flows through intermediate nodes independently of the other transfers. All nodes store the flush data in shadow extents.
  • When the server terminator site receives the last flush, it moves all shadow extents into the sunlight and dispatches an OK response back upstream. As the response propagates through each intermediate node, each node also sunlights its shadow extents.
  • This routine RECALLS or INVALIDATES the file image at the upstream node ‘from_mx’ and simultaneously transfers whatever permissions node ‘from_mx’ had to node ‘to_mx’. Node ‘from_mx’ may respond to the RECALL/INVALIDATE message or not.
  • This server-side node does not really care. It has marked node ‘from_mx’ OFFLINE (for this file) and will handle all reintegration issues later when node ‘from_mx’ attempts to reconnect this file.
  • This routine RECALLS or INVALIDATES the file image at an upstream node.
  • When a network component fails, client-side nodes detect the failure and drive the transparent recovery procedure, during which the DDS filesystem client remains completely unaware of the failure.
  • the DDS failure recovery procedure operates in the following manner:
  • When an isolated network segment “comes back online”, isolated nodes automatically re-synchronize their images of all files “crossing” the link that had failed.
  • Channels containing modified data have the highest priority. Each is flushed downstream.
  • the server-side will accept or reject each flush on an individual basis. A flush will only be rejected if the file was modified somewhere else while this client-side node was partitioned.
  • the client-side node must have a means of handling a flush rejection. This probably includes a) saving the dirty data that was just rejected, and b) notifying the user that there is a conflict that must be resolved.
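  • As a rough illustration of the flush-rejection handling just described, the sketch below preserves the rejected dirty data and surfaces the conflict. OUT_OF_SEQUENCE's value and the helpers dds_save_conflict_copy() and dds_log_conflict() are assumptions.

        #define OUT_OF_SEQUENCE (-5)             /* illustrative value only */

        int dds_handle_flush_rejection(struct dds_channel *cp, int status)
        {
            if (status != OUT_OF_SEQUENCE)
                return status;

            /* a) Save the dirty data that was just rejected, e.g. by copying the
             *    channel's modified extents to a local conflict area on stable storage. */
            dds_save_conflict_copy(cp);

            /* b) Notify the user that a conflict must be resolved. */
            dds_log_conflict(cp, "file was modified elsewhere during the partition");
            return 0;
        }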

Abstract

Large scale high performance file proxy caching sites may be configured to coalesce many client write operations into one very large assemblage of modified file data. At some point the caching site will flush all modified file data downstream towards the origin file server. In some instances the amount of modified data being flushed may be more than can be transferred in a single network request. When multiple network requests are required, the consistency guarantee provided by many filesystems requires that the file either be updated with the data contained in all of the network requests or not be modified at all. In addition, once the first flush request is processed no other file read or write requests can be serviced until the last flush request has been processed. This document discloses methods for performing atomic multi-request flush operations within a large geographically distributed filesystem environment.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 61/666,597 filed on Jun. 29, 2012, which application is incorporated herein by reference in its entirety.
  • This application is related to co-pending U.S. application Ser. No. ______, filed on Jun. 28, 2013, and entitled, “RECURSIVE ASCENT NETWORK LINK FAILURE NOTIFICATIONS” (Attorney Docket No. 10284.14), which application is incorporated herein by reference in its entirety.
  • This application is related to co-pending U.S. application Ser. No. ______, filed on Jun. 28, 2013, and entitled, “USING PROJECTED TIMESTAMPS TO CONTROL THE SEQUENCING OF FILE MODIFICATIONS IN DISTRIBUTED FILESYSTEMS” (Attorney Docket No. 10284.16), which application is incorporated herein by reference in its entirety.
  • This application is related to co-pending U.S. application Ser. No. ______, filed on Jun. 28, 2013, and entitled, “METHOD OF CREATING PATH SIGNATURES TO FACILITATE THE RECOVERY FROM NETWORK LINK FAILURES” (Attorney Docket No. 10284.17), which application is incorporated herein by reference in its entirety.
  • BACKGROUND OF THE INVENTION
  • The Distributed Data Service (DDS) architecture provides a framework for highly distributed, hierarchical, multi-protocol caching. DDS is a distributed caching layer that spans an enterprise's network, encompassing multiple LANs interconnected with WAN links. The caching layer's constituent parts are DDS modules installed on file servers, client workstations, and intermediate nodes (routers, switches, and computers). DDS employs TCP/IP for inter-site communications and may therefore be incrementally deployed. Non-DDS nodes appear as “just part of the wire”.
  • Conceptually, the DDS caching layer slices through each DDS configured computer system at the vnode interface layer. File systems (UFS, VxFS, NTFS, EXT4) and other devices such as video sources and shared memory plug into the bottom of the caching layer and provide permanent file storage or a data sourcing/sinking capability. Client systems plug into the top of the caching layer to access “local” data. Distributed throughout the network, intermediate DDS nodes (routers, switches, and other computers) provide increased scalability and faster file access.
  • The DDS layer implements an intelligent integrated data streaming and caching mechanism to make file data appear as “local” as possible. When a client process accesses a file, the file appears to be local (in terms of file access performance) if it has been accessed before and has not been modified since its last access. When file data must be fetched from the origin server, DDS pre-fetches file data in advance of the client's request stream. Of course, pre-fetching is only performed for well-behaved clients. Write behind is also implemented by DDS in a manner consistent with the fact that users aren't very tolerant of file systems that lose their data.
  • Data cached within the DDS layer is stored in a protocol neutral format in a manner that requires no “translation” when the client is of the same type (Unix, Windows, . . . ) as the origin server.
  • The DDS layer maintains “UFS consistency” (a read always returns the most recently written data) on cached images and provides several methods of handling and recovering from network partitioning events. To the maximum extent possible, recovery and reconnection is performed automatically with no requirement for user or administrator intervention.
  • This document discloses and explains the methods and procedures employed by DDS to transparently overcome network infrastructure failures. In this context, “transparent” means that when an intermediate network node or link fails during a DDS file access operation, an alternate path to the origin server is discovered and used to complete the operation without the client or server ever even becoming aware of the network failure.
  • BRIEF SUMMARY OF SOME EXAMPLE EMBODIMENTS
  • This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential characteristics of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
  • One example embodiment includes, in a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and the origin file system node, a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, perform the step of receiving a flush request from the upstream site. The flush request includes a request to save flush data contained in the flush request to a stable memory. The instructions also perform the steps of storing the flush data in a shadow extent and dispatching a downstream flush request to a second file service proxy cache node. The instructions further perform the step of receiving a response to the downstream flush request from the second file service proxy cache node. The instructions additionally perform the step of sending to the upstream site a flush response that includes a status code indicating the successful completion of the request.
  • Another example embodiment includes, in a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and the origin file system node, a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, perform the step of receiving a flush request from the upstream site. The flush request includes a request to save flush data contained in the flush request to a stable memory. The instructions also perform the step of storing the flush data in a shadow extent. The instructions further perform the steps, if the file service proxy cache node is not a server terminator site, of dispatching a downstream flush request to a second file service proxy cache node and receiving a response to the downstream flush request from the second file service proxy cache node. The instructions additionally perform the step of sending to the upstream site a flush response that includes a status code indicating the successful completion of the request.
  • Another example embodiment includes, in a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and the origin file system node, a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, perform the step of receiving a flush request from the upstream site. The flush request includes a request to save flush data contained in the flush request to a stable memory and a flush level, wherein the flush level indicates how far the data should be flushed. The instructions also perform the step of storing the flush data in a shadow extent. The instructions further perform the steps, if the file service proxy cache node is not a server terminator site, of dispatching a downstream flush request to a second file service proxy cache node and receiving a response to the downstream flush request from the second file service proxy cache node. If the downstream flush request was successfully completed, the instructions additionally perform the step of promoting the shadow extents. The instructions moreover perform the step of sending to the upstream site a flush response that includes a status code indicating the successful completion of the request.
  • These and other objects and features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • To further clarify various aspects of some example embodiments of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It is appreciated that these drawings depict only illustrated embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
  • FIG. 1 illustrates a DDS virtual file server constructed with a flat network topology;
  • FIG. 2 depicts the flat network of FIG. 1 reconfigured to provide high availability access to the eng and sales sub-domains, and another node has been added to create redundant network paths;
  • FIG. 3 depicts the /etc/dds_exports file (also referred to as the site map file) for node 4, which exports the eng sub-domain;
  • FIG. 4 illustrates the domain map file for node 1;
  • FIG. 5 illustrates the export filesystem (site tree) constructed by acme-4 by following the directions contained in the site map file;
  • FIG. 6 illustrates the domain tree for node 1;
  • FIG. 7 depicts a simple hierarchy of DDS nodes configured with multiple routes from node A to the origin server node (node S);
  • FIG. 8 is a flowchart illustrating an example of a method 800 of recursive ascent failure notifications;
  • FIG. 9 is a flowchart illustrating an example of a method of atomic flush transactions;
  • FIG. 10 is a flowchart illustrating an example of a method of validating a cached file image when a response received at an upstream site contains the same version number as the one associated with the cached file image;
  • FIG. 11 is a flowchart illustrating an example of a method of flushing modified file data downstream towards the DDS server terminator site;
  • FIG. 12 is a flowchart illustrating an example of a method employed by a DDS node to load a path signature when a connection is established; and
  • FIG. 13 is a flowchart illustrating an example of a method employed to reconnect an upstream DDS node.
  • DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS
  • Reference will now be made to the figures wherein like structures will be provided with like reference designations. It is understood that the figures are diagrammatic and schematic representations of some embodiments of the invention, and are not limiting of the present invention.
  • DDS Domain Architecture
  • A DDS network may be configured into either a flat or hierarchical organization. Hierarchical topologies inherently provide more latitude for constructing networks incorporating redundant paths. When network failures do occur, DDS employs redundant network paths to transparently re-route file access traffic around the failure.
  • FIG. 1 illustrates a DDS virtual file server constructed with a flat network topology. The virtual file server exports the acme domain, consisting of four sub-domains: corp, mrkt, sales and eng. The virtual file server appears to be a single multi-homed (multiple ip addresses) Windows or Linux file server to client workstations. The public network interfaces are DNS registered with the names acme-1, acme-2, acme-3 and acme-4. Workstations may use any of the four public interfaces and may switch to using a different interface at any time.
  • When all components of the virtual file server are operating properly, the “view through any portal” is equivalent to the view through any other portal. DDS's distributed consistency mechanism works behind the scenes to ensure this consistency of views. However, when a network node fails, a flat network topology does not provide the redundant paths necessary to provide transparent uninterrupted access to all file server data.
  • FIG. 2 depicts the flat network of FIG. 1 reconfigured to provide high availability access to the eng and sales sub-domains, and another node has been added to create redundant network paths.
  • Nodes 3 and 4 have shared access to a storage area network (SAN) containing the filesystems exported through the eng and sales sub-domains. These two nodes are configured such that if either node fails, the other node will notice the failure and mount and export the failed node's filesystems. This is a standard high availability file server mechanism and products are available from several sources, both supported and freeware.
  • Another node has been added to the configuration depicted in FIG. 1. Node 5, named acme-5, does not export any of its own filesystems. It is a domain manager providing the same “portal view” as the other four nodes. However, it also serves as the DDS gateway for remote clients and remote DDS workgroup accelerators.
  • DDS nodes operate independently and cooperatively to create a hierarchical global namespace. During its initialization phase a DDS node constructs a site tree as specified by the /etc/dds_exports file. FIG. 3 depicts the /etc/dds_exports file (also referred to as the site map file) for node 4, which exports the eng sub-domain.
  • After constructing the site tree, the DDS node constructs a domain tree as specified by the /._dds_./._site_./._control_./._map_. file. FIG. 4, illustrating the domain map file for node 1, specifies:
      • the domain's name is acme,
      • the domain has four domain managers: acme-1, acme-2, acme-3 and acme-4,
      • the domain has four sub-domains:
        • n1 exporting the corp sub-domain,
        • n2 exporting the mrkt sub-domain,
        • n3 exporting the sales sub-domain, and
        • n4 exporting the eng sub-domain.
  • The names used in a domain map file may be public names (registered with a network name service such as DNS) or they may be private. In this example, n1 through n4 are private names and acme-1 through acme-4 are public names. The domain nodes communicate with each other using the domain's private names, and clients of the DDS virtual file server use the domain's public names. A node's private name and public name may resolve to the same IP address or to different IP addresses.
  • Once initialization is complete, the node exports these two trees. The site tree is a single filesystem containing all content exported by this node. The domain tree is a single filesystem containing all content exported by all sub-domain nodes for which this node is a domain manager.
  • A DDS Node May Host:
      • An atomic domain—The node has a /etc/dds_exports file but there is no /._dds_./._site_./._control_./._map_. file. An atomic domain does not contain any sub-domains. The node's site tree is also its domain tree (by way of a symbolic link).
      • A non-atomic domain—The node has a /._dds_./._site_./._control_./._map_. file which specifies the domain's name and the names of all sub-domains. The node might not have a /etc/dds_exports file, in which case the node does not contribute any content to the domain for which it is a domain manager. Or, the node might have a site tree (specified by the /etc/dds_exports file) that may or may not be included in its domain tree.
    DDS Initialization
  • The initialization process for a DDS node occurs in two phases:
      • Phase 1—Site initialization. (Refer to FIG. 3 and FIG. 5) The site map file (/etc/dds_exports) is read to determine the exported filesystems and the policy attributes associated with each exported filesystem. On a per filesystem basis, the policy attributes provide the default policies associated with every file within the filesystem. Default policies can be overridden by directory level and/or file level policy attributes.
      • FIG. 3, depicting the site map file for node 4, specifies that the node will export a filesystem with the branches /export/eng/hw and /export/eng/sw, both with the same policy attributes of “**(rw,sync,wdelay,root_squash)”. FIG. 5 illustrates the export filesystem (site tree) constructed by acme-4 by following the directions contained in the site map file. FIG. 5 also shows portions of acme-4's filesystem not being exported; they are interconnected with dashed lines instead of solid lines.
      • Phase 2—Domain initialization. (Refer to FIGS. 4 and 6) After DDS site initialization completes, DDS checks for the presence of a domain map file (/._dds_./._site_./._control_./._map_.). If present, this file informs the DDS instance that it is a domain manager and it also specifies the name of the domain, the names of all sub-domains and the names of all sub-domain managers.
      • FIG. 4 is the domain map file for the acme domain depicted in FIG. 1 and FIG. 6 illustrates the acme domain constructed by each node configured with that domain map file.
      • Having discovered their domain map files, each DDS node constructs the acme domain tree by requesting from each sub-domain node the root of that node's domain tree. The returned roots are then grafted onto the root of the host node's domain tree.
      • Referring to FIG. 6, the acme domain contains the sub-domains corp, mrkt, sales and eng. The eng sub-domain shows additional detail (the site tree exported by node 4), but the respective site trees of the other nodes are not depicted. The acme domain has four portals (acme-1, acme-2, acme-3, acme-4 hosted respectively on nodes 1 through 4), and each portal has a path to every sub-domain.
  • Note that the site map file specifies what is exported, but not what it is called. The domain map file (FIG. 4) specifies that n4 (private name for node 4) supplies the eng sub-domain, but it is the site map file (FIG. 3) that defines what node 4 will export under the name eng. So, the pathname employed by a user on a client workstation to access an acme hardware engineering document would look something like: /dds/acme/eng/hw/the_document.doc.
  • After each DDS node completes constructing its domain tree it is open for business. The multi-homed virtual file server acme may now be accessed through any of its four network interfaces. The top level directory structure of acme's exported domain tree is:
      • /dds/acme/corp/ . . .
      • /dds/acme/mrkt/ . . .
      • /dds/acme/sales/ . . .
      • /dds/acme/eng/ . . .
  • Clients may now direct their requests to any acme portal and expect to receive the same response.
  • During Phase 2, domain initialization, a DDS node constructs a global namespace that includes its exported filesystem and the exported filesystems of all of its sub-domains. Following initialization, the DDS node is a file access portal to all files and directories contained within the domain's global namespace. The DDS node may also be bound into a larger domain as a sub-domain of a ‘higher level’ domain. This process may be recursively repeated until there is a single Internet domain that encompasses content from thousands or millions of origin servers distributed throughout the world.
  • Using the process described above to construct the multi-homed acme domain, thousands of DDS nodes may initialize to become Internet domain managers (portals). So, a portal located anywhere in the world may provide access to content distributed about the globe.
  • DDS Global File Services
  • DDS employs extensive file level hierarchical caching to make data appear to be ‘here’ rather than ‘there’. From a filesystem perspective, the phrase that best encapsulates DDS's primary focus is: DDS removes the distinction between local files and remote files.
  • The distinctions removed are:
      • Latency and Bandwidth—an image of the file is cached locally and therefore can be accessed at “local” speeds.
      • Consistency—read operations always return the most recently written data.
      • Security—file data flowing and cached within DDS networks is encrypted and the content owner maintains complete control over its content throughout the distribution network. All DDS portals faithfully follow the content owner's instructions (which are attached to the content as policy attributes) with regard to providing access to unencrypted content.
      • Availability—DDS may be used to construct resilient networks and file servers. Redundancy can be woven into the DDS fabric to create always-available networks, and redundancy incorporated into file servers can ensure the continuous availability of file data. DDS transparently overcomes network failures whenever redundant paths make it possible to do so.
      • Protocol—DDS appears to be just another local filesystem, using the same filesystem API as native local filesystems. DDS extends the native filesystem API to provide a remote file access capability that is almost indistinguishable from the access capabilities afforded to local files.
    DDS Terminology
  • The following words and phrases used throughout this document have the following definitions:
      • channel, integrated channel—In the context of a single DDS node, a channel is DDS's main data structure for representing a remote file and all information related to that file. The channel data structure contains a number of smaller data structures either directly or indirectly (by containing a reference to the smaller data structure). The file attributes data structure and the file data extent structures are referenced by the channel using memory address pointers.
  • In the context of a client terminator site communicating across the network with a server terminator site (possibly through several intermediate sites), channel refers to the channels at the individual sites bound together by DDS Protocol into a single integrated channel.
      • NOTE: When an application, executing on a DDS configured origin server, is accessing a file within the origin server, the channel for that file will simultaneously fulfill the roles of both client terminator site and server terminator site. A channel does not always span multiple DDS sites.
      • file data extent—a contiguous memory segment that holds file data. The size of the file data extent is set following a negotiation with a downstream site when the channel is created. The file data extent structure contains a shadow pointer, which is the memory address of a contiguous memory segment (the shadow extent) of the same size as the file data extent. (A minimal structure sketch follows this terminology list.)
      • policy attributes—file attributes attached to a file by a domain manager as the file's data is sent upstream in the response to a file access request. At upstream sites, these attributes, associated with the file in the same manner as the file's “normal” attributes, instruct the upstream site on the procedures required for granting access, performing decryption and all other file handling operations. (Only the upstream sites that have been authenticated by a downstream site will be trusted by the downstream site.)
      • external request—a file access request using a file access protocol other than DDS. NFS, CIFS, UFS, EXT4 and NTFS are examples of external requests. Note that NFS and CIFS are network protocols and the others are “local” protocols (used when DDS is installed on the system generating the request).
      • internal request—a file access request using the DDS protocol. Internal requests are internal to DDS and flow exclusively between DDS sites. Internal requests use the DDS protocol.
      • client terminator site or client terminator—The DDS site that receives an external request from a client.
      • server terminator site or server terminator—The DDS site “closest” to the origin file server. When the origin server is DDS configured the server terminator site is the origin server. In other cases, the server terminator communicates with the origin server using a network protocol such as NFS or CIFS.
      • intermediate site—A DDS site in the chain linking the client terminator site to the server terminator site.
      • client system, client computer or just client—In the context of DDS processing a request, the client is the computer system that dispatched the external request to the DDS client terminator site.
      • upstream site—When two DDS sites are communicating, the site “closest” to the client is the upstream site.
      • downstream site—When two DDS sites are communicating, the site “closest” to the origin server is the downstream site.
      • client-side—With respect to any point along the integrated channel path from server terminator to client terminator, client-side refers to everything on the client side of that point.
      • server-side—With respect to any point along the integrated channel path from server terminator to client terminator, server-side refers to everything on the server side of that point.
      • origin file server or origin server—The file server exporting the filesystem containing the target file.
      • DDS site or DDS node—a DDS configured network node that provides a file proxy cache service.
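  • The definitions above describe a channel that references its attribute and extent structures by memory address, with every file data extent carrying a pointer to a same-sized shadow extent. A minimal C sketch of those relationships follows; the type and field names are assumptions made for illustration, not the actual DDS structures.

        #include <stddef.h>
        #include <stdint.h>

        #define X_DIRTY    0x01                  /* extent flags referenced later in this document */
        #define X_FLUSHING 0x02

        /* A contiguous memory segment holding file data, plus a pointer to its
         * shadow extent of the same (negotiated) size. */
        struct dds_extent {
            uint64_t           offset;           /* file offset covered by this extent */
            size_t             length;           /* negotiated extent size             */
            unsigned           flags;            /* X_DIRTY, X_FLUSHING, ...           */
            void              *data;
            void              *shadow;           /* shadow extent, same size as data   */
            struct dds_extent *next;
        };

        /* Simplified file attributes with a pointer to shadow attributes. */
        struct dds_attrs {
            uint64_t          size;
            uint64_t          mtime;
            struct dds_attrs *shadow;
        };

        /* The channel: DDS's main per-file data structure at one site. */
        struct dds_channel {
            uint8_t            fh[64];           /* opaque, permanent filehandle          */
            unsigned           flags;            /* e.g. ATTRS_VALID                      */
            struct dds_attrs  *attrs;            /* referenced via memory address pointer */
            struct dds_extent *extents;          /* list of cached file data extents      */
        };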
    DDS Operations
  • A single DDS module contains client-side code for requesting file data from an origin file server and server-side code that receives and responds to requests from “upstream” DDS sites. Within the DDS framework, “downstream” is towards the origin server and “upstream” is towards the client.
  • An overview of a typical DDS network operation is:
      • a client computer system issues an NFS file access request targeting a DDS portal,
      • the request is received at the portal (a DDS configured Linux system) and routed to an nfs daemon (the native NFS server code),
      • the NFS server code executes a read system call to read file data,
      • the Linux vfs layer routes the system call into the DDS module (which has registered as a local filesystem),
      • the call's file identifier parameter is used to identify and connect to a channel (DDS's main data structure for representing a file and all information related to that file),
      • the channel is examined to determine if all data required to respond to the system call is cached within the channel; if so, DDS responds to the system call; if not, . . .
      • the channel is examined to determine the origin file server's identity,
      • if the origin server is this node, DDS executes a read system call to fetch whatever additional file data is required from the underlying native filesystem to respond to the request from the NFS server code,
      • if the origin server is some other node, DDS generates and dispatches a DDS_LOAD request targeting a downstream DDS site “closer” to the origin server,
      • the DDS_LOAD request may ripple through multiple DDS intermediate sites (executing essentially the same procedure as outlined above) before arriving at the DDS server terminator site,
      • DDS executes a read system call to fetch whatever additional file data is required from the underlying native filesystem to respond to the request from an upstream DDS site,
      • the response propagates back upstream and eventually arrives at the DDS client terminator site,
      • the file data contained in the response is attached to the channel structure,
      • all data required to respond to the NFS system call is now cached within the channel, so DDS responds to the call from the NFS server code,
      • the NFS server code responds to the request from the client computer system.
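  • A compact sketch of the read-path decision described in the overview above: serve the call from the channel's cached extents when possible, otherwise fetch the missing data either from the local native filesystem (when this node is the origin server) or via a DDS_LOAD request dispatched downstream. The helper names and the dds_read() signature are assumptions.

        #include <stddef.h>
        #include <stdint.h>
        #include <sys/types.h>

        ssize_t dds_read(struct dds_channel *cp, void *buf, size_t len, uint64_t off)
        {
            /* Serve from cache when every requested byte is already valid here. */
            if (dds_range_cached(cp, off, len))
                return dds_copy_from_extents(cp, buf, len, off);

            if (dds_is_origin_server(cp)) {
                /* Server terminator: read the missing data from the underlying
                 * native filesystem with an ordinary read system call. */
                dds_load_from_local_fs(cp, off, len);
            } else {
                /* Otherwise dispatch DDS_LOAD toward a downstream site "closer"
                 * to the origin server; the request may ripple through several
                 * intermediate sites before reaching the server terminator. */
                dds_dispatch_load(cp, off, len);
            }
            return dds_copy_from_extents(cp, buf, len, off);
        }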
    DDS Protocol
  • The DDS protocol defines two remote procedures for transporting file data: DDS_LOAD and DDS_FLUSH. These two procedures are briefly described since recovery operations are based upon variations of these procedures.
  • DDS_LOAD—This operation loads data and metadata from a downstream site. The request includes a file identifier and the flags (DDS_CC_SITE_READING, DDS_CC_SITE_WRITING) that inform the downstream site of the types of operations that will be performed upon the file data being loaded. These flags are used by the distributed consistency mechanism to keep track of the type of operations (read vs. write) being performed at upstream sites.
  • A single load or flush request may specify multiple file segments and each segment may be up to 4 gigabytes in length.
  • The response includes flags (DDS_CC_SUSTAIN_DIR_PROJECTION and DDS_CC_SUSTAIN_FILE_PROJECTION) that indicate whether the returned file data and metadata may be cached or whether it must be discarded immediately after responding to the current client request.
      • DDS_FLUSH—This operation flushes modified file data/metadata to some form of stable memory. A flush level specifies how far the flush propagates. The currently defined levels are:
      • DDS_FLUSH_TO_STABLE_MEMORY—Flush to client terminator's flash memory
      • DDS_FLUSH_TO_DISK—Flush to client terminator's disk
      • DDS_FLUSH_TO_ORIGIN—Flush all the way to the origin server
  • In response to DDS_LOAD requests, the DDS server terminator site projects file data into remote DDS client terminator sites. These projections are sustained in the remote DDS sites while the file is being accessed at those sites unless a concurrent write sharing condition arises.
  • An upstream DDS cache buffer is no different than an internal origin file server buffer. After a write operation modifies a file system buffer (either local or remote), performance is enhanced if the buffer is asynchronously written to the server's disk. However, file modifications are safeguarded when they are synchronously written to disk or some other form of stable storage. Flush levels allow both the client and the server to express their level of paranoia. The more paranoid of the two usually prevails.
  • As disclosed in this document, an upstream DDS site flushes all of a channel's dirty data and metadata downstream as an atomic unit. When the amount of dirty data is more than can be accommodated in a single network operation, the downstream site remains “committed” to the upstream site until the last batch of data (flagged with DDS_FLUSH_DATA_COMPLETE) is successfully received. This means that the channel at the downstream site will not service a request from any other upstream site until the flush has completed.
  • This works fine as long as everything else works fine. But when two sites get partitioned in the midst of a multi-transfer flush operation, the client-side and the server-side will both attempt to overcome the failure. However, at some point the server-side may decide (based on its current policies) to cut off the isolated upstream site and continue providing file access services to its other client systems. In this case, the flush operation is less than atomic. And this is unacceptable because file modifications must be atomic at all times and under all circumstances.
  • Within DDS each file extent structure contains a pointer to a shadow extent, and each attribute structure contains a pointer to a set of shadow attributes. When a multi-transfer flush is processed at a downstream node, all incoming data is routed into these shadow structures. Then, when all dirty data (extents and attributes) has been received at the downstream site, the shadow structures are promoted to reality in an atomic operation and the ‘old’ structures are released. Of course, when the multi-transfer flush does not complete successfully, all shadow extents and the shadow attributes must be discarded.
  • At an upstream node, when DDS flushes a channel, each dirty extent (flagged with X_DIRTY) is flagged with X_FLUSHING. If the flush operation does not complete successfully, DDS resets all X_FLUSHING flags. And, of course, when the operation is successful both the X_DIRTY and X_FLUSHING flags are reset.
  • The DDS client-side node processes a multi-transfer flush operation in an atomic manner. Once the channel has been acquired and the first batch has been accepted at the downstream site (as opposed to rejected because of a consistency operation), it will not be released until the last batch has been dispatched. And when released, the channel will either be clean (successful flush) or it will be just as dirty as it ever was.
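  • The X_DIRTY/X_FLUSHING bookkeeping described above can be sketched as follows, reusing the hypothetical extent structure from the terminology section. dds_send_flush() (which may issue several transfers) and the status value are assumptions; only the flag transitions reflect the text.

        #define DDS_OK 0                         /* illustrative status value */

        int dds_flush_channel(struct dds_channel *cp, int flush_level)
        {
            /* Mark every dirty extent as in-flight before the flush begins. */
            for (struct dds_extent *x = cp->extents; x; x = x->next)
                if (x->flags & X_DIRTY)
                    x->flags |= X_FLUSHING;

            int status = dds_send_flush(cp, flush_level);    /* may span multiple transfers */

            for (struct dds_extent *x = cp->extents; x; x = x->next) {
                if (!(x->flags & X_FLUSHING))
                    continue;
                if (status == DDS_OK)
                    x->flags &= ~(X_DIRTY | X_FLUSHING);     /* clean: flush succeeded   */
                else
                    x->flags &= ~X_FLUSHING;                 /* still dirty: retry later */
            }
            return status;
        }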
  • In addition to the two remote procedures used to move file data up and down the wire, DDS also defines:
      • DDS_CONNECT for establishing a connection to a filesystem (equivalent to an NFS mount operation) or a connection to a directory or a file,
      • DDS_NAME creates, modifies and deletes file/directory names and links,
      • DDS_CTRL provides various capabilities required to actively monitor the health of DDS domain nodes and to support the DDS consistency mechanism. The two DDS_CTRL procedures that support DDS's distributed consistency mechanism are:
        • fast ping—dispatched frequently by a DDS client site to ensure that it is still in communication with a downstream site. The fast ping rate brackets the amount of time that a client site can operate before becoming aware that it is disconnected from its downstream counterpart. This rate is typically set to about one second, but could be much higher when DDS nodes are interconnected with extremely fast links and/or shared memory.
        • A downstream site is fast pinged only when there has been no other communication with the site for the amount of time specified by the fast ping rate. Any successful message exchange with a downstream site serves the purpose of a fast ping.
        • slow ping—issued by a DDS client site as a “self-addressed stamped envelope” (SASE) that the DDS server-side node uses when it wants to deliver a consistency notification message. The server-side node will not respond to this request until it has a notification it wants delivered to the DDS client node. Thus, the name “slow ping”.
        • slow ping is the means by which DDS implements a callback mechanism for consistency control operations.
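  • The fast ping/slow ping pattern can be illustrated with a simple client-side loop: a fast ping is sent only when nothing else has been exchanged within the fast-ping interval, while one slow ping (the SASE) is kept outstanding so the downstream site has an envelope in which to return a consistency notification. All helper names are assumptions; only the pattern is taken from the text.

        #include <unistd.h>

        void *dds_ping_thread(void *arg)
        {
            struct dds_channel *cp = arg;
            for (;;) {
                /* Fast ping: any successful exchange within the interval counts. */
                if (dds_seconds_since_last_exchange(cp) >= 1 &&
                    dds_fast_ping_rpc(cp) != 0)
                    dds_note_downstream_unresponsive(cp);

                /* Slow ping: post a new SASE only if none is outstanding; the
                 * downstream site answers it only when it has a consistency
                 * notification to deliver (the response acts as a callback). */
                if (!dds_slow_ping_outstanding(cp))
                    dds_post_slow_ping(cp);

                sleep(1);                        /* roughly one-second fast ping rate */
            }
            return NULL;
        }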
  • I. DDS Network Operations
  • This section presents a simplified overview of DDS network communications.
  • DDS nodes communicate using multi-threaded SUNRPC remote procedure calls over TCP/IP connections. For every successful remote procedure call there is a client making the call and a server responding to the call. SUNRPC and TCP/IP have built-in mechanisms to reliably transport requests and responses across a network. DDS, layered on top of the SUNRPC and TCP/IP combination, depends upon this protocol stack for reliable message delivery.
  • Every DDS remote procedure call issued eventually returns with an indication of the status of the call.
      • RPC_SUCCESS indicates that both the request and the response were successfully transported across the wire.
      • RPC_TIMEDOUT indicates that a response was not received. This occurs when a network link or node (including a DDS node) has gone offline or failed. When alternate paths make it possible to still access the source file, the upstream DDS node (client-side) re-routes and re-issues the request to transparently overcome the network failure.
      • Other RPC_XXXX error codes should not occur. But when they do occur, some administrative action is probably required.
    High Level View of Network Failures
  • From a workstation or a DDS client node's perspective a component failure manifests itself as a failure to respond to a request. To the DDS client node it does not really matter whether a router, switch, intermediate DDS node or a server component failed. What does matter is that the client issued a request and did not receive a response. An “industry standard” NFS client would, in this circumstance, keep re-issuing the same request until the server responded and then the client would proceed as normal. (A DDS client is more proactive in this situation, and this is described later.)
  • There is a class of failures referred to as a network partition event, where both DDS client-side nodes and server-side nodes remain operational, but the failed component has isolated the client-side from the server-side. When a network partition event occurs, the client-side and server-side components assume very different roles.
  • The server-side's main priority is to ensure the integrity of all file data and then to continue providing file access services to clients still able to communicate with the server.
  • The client-side's priorities are: a) to safeguard file modifications that have not yet been successfully flushed to the origin server; b) to re-establish communication with the server and immediately flush all files ‘crossing’ the partition; and c) to continue providing file access services if possible.
  • So each side plays a different role during a partition event. However, each role is tempered and shaped by DDS domain policy attributes. These attributes provide instructions for handling file data, processing file access requests, and responding to failures. The following section, Centralized Control over Distributed Operations, describes how policy attributes are employed to exercise centralized control over geographically distributed DDS overlay networks.
  • Centralized Control over Distributed Operations
  • Whenever a DDS origin server responds to a file access request, the file's policy attributes are fed into the DDS distribution network as a class of metadata associated with the file data throughout the network. Every DDS node faithfully adheres to all policies specified by the file's policy attributes under all circumstances. (Of course, DDS nodes employ standard authentication methods to ensure that secured data is only sent to nodes that can be trusted.)
  • File metadata, including policy attributes, is provided with the same level of consistency as regular file data. Therefore, a metadata read operation (to fetch file attributes and/or policies) at any DDS site will return the most recently written metadata. This means the policies for handling a file or a group of files can be changed instantly throughout the network.
  • II. Network Failure Recovery
  • DDS Failure Recovery Building Blocks
  • DDS network operations incorporate the following features and characteristics designed to facilitate the transparent recovery from network component failures:
  • 1. Network Transactions
  • A DDS node communicates with other DDS nodes at the network transaction level. A network transaction, usually consisting of a single request-response interaction, uses whatever number of remote procedure calls is required to complete an atomic DDS operation. For example, some DDS_FLUSH operations require multiple request-response interactions to perform a flush as an atomic DDS operation.
  • At the completion of any network transaction, the server-side DDS channel targeted by the request either “steps completely forward” to a new state or it remains unchanged from its original state.
  • At the completion of any network transaction, the client-side DDS channel issuing the request either “steps completely forward” to a new state or it remains unchanged from its original state.
  • 2. Idempotent Operations
  • DDS servers incorporate a duplicate request cache (DRC) that enables the server to receive the same request multiple times and to ensure all responses will be the same as the first response.
  • Note that the network transaction feature ensures that each DDS node either “does” or “completely does not” respond to a client request. But it is possible, even likely, that the server can “do” while the client-side “completely does not” because a network failure prevented the delivery of the response. DDS's idempotent operations feature provides a graceful (and transparent!) method for the client-side to catch up with the server-side.
  • 3. Recursive Ascent Failure Notifications
  • FIG. 7, depicting a simple hierarchy 700 of DDS nodes configured with multiple routes from node A to the origin server node (node S), assumes the following scenario for illustrative purposes:
      • node A has dispatched a DDS_LOAD request to node C, causing
      • node C to dispatch a DDS_LOAD request to node F, causing
      • node F to dispatch a DDS_LOAD request to node S, but
      • the link to node S has failed.
  • FIG. 8 is a flowchart illustrating an example of a method 800 of recursive ascent failure notifications. The method 800 can be used in a hierarchy 700 of FIG. 7. In the method 800, a failure notification (DDS_BAD_LINK) percolates up to a higher level only after the reconnect efforts at the current level failed to re-establish a downstream connection.
  • FIG. 8 shows that the method 800 can include declaring 802 a network failure. I.e., the DDS client node whose immediate downstream link has stopped responding is the site that declares 802 a network failure. For example, referring to FIG. 7, when the link from node F to node S fails while node C is attempting to load data from node S, node F will be the node that detects and declares 802 the network failure.
  • FIG. 8 also shows that the method 800 can include determining 804 if an alternate route is available. Determining 804 if an alternate route is available can include referencing network configuration data stored at the node that is declaring a network failure 802 or searching the directory tree for connections by other nodes to the target node. Additionally or alternatively, determining 804 if an alternate route is available can include communicating with other connected nodes to determine if a path to the target node exists. One of skill in the art will appreciate that such a request does not include a request to the originating node. I.e., node F of FIG. 7 will not search for an alternate path through node C, which sent the data request to node F.
  • FIG. 8 further shows that the method 800 can include reporting 806 an error code if an alternate route is not available. E.g., since node F has no alternate routes to node S, node F will respond to node C's DDS_LOAD request with an error code of DDS_BAD_LINK. Now, node C attempts to reconnect on an alternate path (node E in FIG. 7) and if it fails to do so, node C will respond to node A with a status of DDS_BAD_LINK. I.e., node C will complete the same method 800 being utilized by node F. Finally, node A will attempt to reconnect on an alternate path and if it fails to do so, it will respond to its client with a status of DDS_BAD_LINK.
  • FIG. 8 additionally shows that the method 800 can include re-establishing 808 the file connection over the alternative route if found. I.e., since node F has no alternate working route to node S, it responds to node C with DDS_BAD_LINK. Node C then attempts to re-establish a file connection 808 to node S through node E.
  • Finally, FIG. 8 shows that when the attempt to re-establish a file connection 808 is successful, node C re-sends 810 the same DDS_LOAD request to node E that was originally sent to node F. If node E responds with DDS_OK, node C will respond to node A with DDS_OK.
  • One of skill in the art will appreciate that once node C originally dispatched its DDS_LOAD request (which it had generated in response to having received a DDS_LOAD request from node A), node C was willing to wait for node F to respond because node C's fast pings kept reassuring it that node F and the link to it were both operational.
  • One skilled in the art will appreciate that, for this and other processes and methods disclosed herein, the functions performed in the processes and methods may be implemented in differing order. Furthermore, the outlined steps and operations are only provided as examples, and some of the steps and operations may be optional, combined into fewer steps and operations, or expanded into additional steps and operations without detracting from the essence of the disclosed embodiments.
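  • Purely for illustration, the failure handling of FIG. 8 at a single node might be sketched in C as follows; the status codes and helper routines (send_load, n_routes, routes_to) are hypothetical, and the enumeration of alternate routes is assumed to exclude the upstream node that originated the request.

      /* Sketch of the recursive ascent failure handling of FIG. 8.
       * All names and status values are hypothetical placeholders.          */
      #define DDS_OK        0
      #define DDS_BAD_LINK  1

      /* send_load() would transmit a DDS_LOAD request over one route and
       * return its status; n_routes()/routes_to() would enumerate this
       * node's alternate routes toward the origin server.                   */
      extern int send_load(int route, const char *filehandle);
      extern int n_routes(void);
      extern int routes_to(int index);

      int dds_load_with_failover(const char *filehandle)
      {
          int i;
          for (i = 0; i < n_routes(); i++) {       /* steps 804/808: try alternates */
              int status = send_load(routes_to(i), filehandle);
              if (status == DDS_OK)
                  return DDS_OK;                   /* step 810 succeeded            */
              /* DDS_BAD_LINK from below: that node also exhausted its routes;
               * fall through and try this node's next alternate path.              */
          }
          return DDS_BAD_LINK;                     /* step 806: percolate upward    */
      }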
  • 4. Atomic Flush Transactions
  • For all DDS network requests except DDS_FLUSH, a DDS channel “steps forward” one request-response cycle at a time. At any point in time, the external view of a channel's state transitions from its state before a request is processed to its state after the request is successfully processed in an atomic manner. The external view (as opposed to internal view which is what the code processing requests “sees”) of a channel's state will never reflect a partially completed network request. It will reflect the channel's state either before or after a request has been processed, and never anything in-between. A channel may be required to engage with its downstream channel, but, if so, this communication constitutes a separate request-response transaction.
  • However, to maintain the highest levels of file consistency whereby each DDS request either “happens completely” or “does not happen at all”, DDS flushes are atomic and synchronous from the DDS client terminator site all the way through to the origin filesystem. For large multi-transfer flushes, intermediate nodes may be simultaneously forwarding transfer n while receiving transfer n+1.
  • An upstream node does not consider a flush successful until it receives a DDS_OK response with the ALL_DATA_RECEIVED response flag also set. Flush data is maintained in shadow extents at upstream nodes until the upstream node receives confirmation that the origin filesystem has successfully received all flush data. For large multi-transfer flushes, each transfer flows through intermediate nodes independently of the other transfers. All nodes store the received flush data in shadow extents. When the server terminator site receives the last flush transfer (DATA_COMPLETE request flag is set), it writes all received flush data to the origin filesystem. After receiving confirmation that the write was successful, the server terminator site replaces all extents that have shadow extents with their shadows and then dispatches a DDS_OK response with the ALL_DATA_RECEIVED response flag also set back upstream. As the response propagates through each intermediate node, each node also promotes its shadow extents.
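  • The shadow-extent staging and promotion described above might be sketched in C as follows; the extent layout and helper names are hypothetical and reduce each extent to a single in-memory buffer.

      /* Shadow-extent sketch: flush data is staged in a shadow copy and only
       * "promoted" (made visible) once the whole flush has been acknowledged
       * downstream. Field and function names are illustrative only.          */
      #include <stdlib.h>
      #include <string.h>

      struct extent {
          char *data;            /* currently visible file data for this extent */
          char *shadow;          /* staged flush data, NULL if none             */
          long  length;
      };

      /* Stage incoming flush bytes without disturbing the visible copy. */
      void extent_stage(struct extent *e, const char *buf, long len)
      {
          char *copy = malloc(len);
          if (copy == NULL)
              return;            /* allocation failure handling omitted */
          memcpy(copy, buf, len);
          free(e->shadow);
          e->shadow = copy;
          e->length = len;
      }

      /* Promote: called only after a DDS_OK response with ALL_DATA_RECEIVED. */
      void extent_promote(struct extent *e)
      {
          if (e->shadow) {
              free(e->data);
              e->data = e->shadow;   /* the shadow becomes the visible extent */
              e->shadow = NULL;
          }
      }

      /* Discard the staged copy without promoting it. */
      void extent_discard(struct extent *e)
      {
          free(e->shadow);
          e->shadow = NULL;
      }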
  • FIG. 9 is a flowchart illustrating an example of a method 900 of atomic flush transactions. In the method 900 DDS flushes are atomic and synchronous from the client terminator site to the server terminator site.
  • FIG. 9 shows that the method 900 can include receiving 902 a flush request from an upstream node. I.e., modified file data/metadata is received 902 with the intent that it be saved to some form of stable memory, as described below.
  • FIG. 9 additionally shows that the method 900 can include storing 904 the received flushed data in shadow extents. I.e., the data received 902 in the flush request from the upstream node is stored in shadow extents.
  • FIG. 9 also shows that the method 900 can include determining 906 if this site is the server terminator site.
  • When this site is the server terminator site, FIG. 9 shows that the method 900 can include determining 908 if all batches of the flush (multi-transfer flushes have more than one) have been received and the DATA_COMPLETE request flag was set in one of the requests. (Note that it is not uncommon for network requests to be received and/or processed out of sequence with respect to when they were dispatched by the upstream node.)
  • When all flush data has been received and the DATA_COMPLETE request flag is set, FIG. 9 shows that the method 900 can include determining 910 if synchronous writes are enabled for the file identified in the flush request and when synchronous writes are enabled, writing 912 all received flush data to the origin filesystem.
  • FIG. 9 then shows that the method 900 can include determining 914 whether the filesystem write operation completed successfully. When a write error occurs, FIG. 9 shows that the method 900 can include responding 916 to the flush request with a status that conveys the write error code.
  • When a write error does not occur or synchronous writes are not enabled for the file identified in the flush request, FIG. 9 shows that the method 900 can include setting 918 the ALL_DATA_RECEIVED response flag; promoting 920 the flushed data in the shadow extents; and responding 922 to the received flush request with a status code of DDS_OK. Furthermore, when the DATA_COMPLETE request flag is not set, FIG. 9 shows that the method 900 can include simply responding 922 to the received flush request with a status code of DDS_OK.
  • When this site is not the server terminator site, FIG. 9 shows that the method 900 can include forwarding 924 the flush request to a downstream node. I.e., the site forwards 924 the flush request to ensure that the flush request continues until it reaches the server terminator site.
  • FIG. 9 further shows that the method 900 can include receiving 926 a response to the flush request from the downstream node. I.e., because the flush request was forwarded 924, the site waits for confirmation that the downstream site (and all sites to which the downstream site forwarded the request) has responded to the flush request.
  • FIG. 9 further shows that the method 900 can include determining 928 if the response status is DDS_OK. I.e., the site determines 928 that the downstream site (and all sites to which the downstream site forwarded the request) has successfully completed the flush request.
  • When the response status is not DDS_OK (the flush request was not successfully processed downstream), FIG. 9 further shows that the method 900 can include decrementing 930 a retry count and, if the count is greater than zero, re-forwarding 924 the flush request along the same route as the previous request. However, if the retry count equals zero, FIG. 9 further shows that the method 900 can include declaring 932 a network failure, which will initiate a search for an alternate route to the server terminator site.
  • When the response status is DDS_OK, FIG. 9 further shows that the method 900 can include determining 934 if the ALL_DATA_RECEIVED response flag is set, indicating that the server terminator site has received and accepted all flush data.
  • When the ALL_DATA_RECEIVED response flag is set, FIG. 9 further shows that the method 900 can include promoting 920 the flushed data in the shadow extents. All extents that have shadow extents are replaced with their shadows. I.e., the changes are made permanent and the data is no longer stored within a shadow extent. Any further access of the file will receive the updated file, rather than the pre-update file.
  • FIG. 9 additionally shows that the method 900 can include responding 922 to the received flush request with a status code of DDS_OK. As the response propagates through each DDS node, each node also promotes its shadow extents if the ALL_DATA_RECEIVED response flag is also set. This ensures that each node is working from the same, consistent data. The response, therefore, propagates upstream until the originating node receives it, thereby becoming aware that the flush request has been completed successfully.
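  • A condensed C sketch of the FIG. 9 decision flow at a single DDS node follows; the status codes, flag values, and helper routines are hypothetical stand-ins, and the retry and network-failure handling of steps 930 and 932 is omitted.

      /* Condensed sketch of the FIG. 9 decision flow at one DDS node.
       * All names, flag values, and status codes are hypothetical.          */
      #define DDS_OK              0
      #define DDS_WRITE_ERROR     2      /* placeholder for a write error status */
      #define F_DATA_COMPLETE     0x01   /* request flag                          */
      #define F_ALL_DATA_RECEIVED 0x02   /* response flag                         */

      struct flush_req  { int flags; /* ... flush data ... */ };
      struct flush_resp { int status; int flags; };

      extern int  is_server_terminator(void);
      extern int  all_batches_received(const struct flush_req *);
      extern int  synchronous_writes_enabled(const struct flush_req *);
      extern int  write_origin(const struct flush_req *);       /* 0 on success */
      extern void store_in_shadow_extents(const struct flush_req *);
      extern void promote_shadow_extents(void);
      extern struct flush_resp forward_downstream(const struct flush_req *);

      struct flush_resp handle_flush(const struct flush_req *req)
      {
          struct flush_resp resp = { DDS_OK, 0 };

          store_in_shadow_extents(req);                          /* step 904 */

          if (is_server_terminator()) {                          /* step 906 */
              if ((req->flags & F_DATA_COMPLETE) &&
                  all_batches_received(req)) {                   /* step 908 */
                  if (synchronous_writes_enabled(req) &&         /* step 910 */
                      write_origin(req) != 0) {                  /* steps 912/914 */
                      resp.status = DDS_WRITE_ERROR;             /* step 916 */
                      return resp;
                  }
                  resp.flags |= F_ALL_DATA_RECEIVED;             /* step 918 */
                  promote_shadow_extents();                      /* step 920 */
              }
              return resp;                                       /* step 922 */
          }

          resp = forward_downstream(req);                        /* steps 924/926 */
          if (resp.status == DDS_OK &&                           /* step 928 */
              (resp.flags & F_ALL_DATA_RECEIVED))                /* step 934 */
              promote_shadow_extents();                          /* step 920 */
          return resp;                                           /* step 922 */
      }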
  • 5. Distributed Consistency Mechanism
  • DDS's consistency mechanism is woven into the DDS Protocol. Every DDS file access request provides an indication of whether the client intends to modify the returned file data or just read it.
  • DDS implements an internal rule: a DDS client site must first inform its downstream DDS site before it performs a new type of activity (read/write) on a file. So, for example, a client node that has previously fetched a complete file for reading cannot begin writing without first informing the downstream site that it intends to begin writing. This allows the server-side to detect the onset of a concurrent write sharing (CWS) condition and take whatever steps are necessary to maintain cache consistency before the client-side actually performs the write operation.
  • The consistency mechanism ensures that, at any instant, only a single DDS site is modifying a file. Therefore, at the moment when a site with dirty data (modified data) becomes partitioned from its downstream server-side counterpart, all other sites are guaranteed to not have any file modifications. The isolated site holds the most recent file modifications and all portals still connected to the server terminator site provide a consistent view of the file “just before” the latest (and now isolated) file write. The isolated site can flush the file modifications to any DDS site providing connectivity to the source file, including the server terminator site itself.
  • 6. “Versioned” File Modifications
  • 6.1. Validating a Cached File Image
  • Modern computer systems typically employ internal clocks with nanosecond or microsecond resolutions. When DDS code executes on any computer system other than the origin server, the clock used by remote DDS nodes cannot be synchronized with the clock used by the origin filesystem's code to the level of precision required to support standard filesystem operations. So, filesystem timestamps may only be set at the origin server site.
  • DDS file access responses always include the target file's attributes, which are cached and stored in association with the file's data. The file's last modification timestamp, a file attribute element, is used as a version number for cached images. At DDS sites the cached image of a file's last modification timestamp may be referred to as the file's version number. The two are the same. The differing nomenclature relates to how the attribute is interpreted and used at DDS sites.
  • Upstream DDS sites use a file's timestamp (a file attribute), the timestamp arrival time and a delta time to generate and maintain projected timestamps. Projected timestamps, which enable DDS upstream sites to operate autonomously (the main point of file caching), are temporary timestamps that are replaced with (upgraded to) genuine filesystem timestamps whenever a server terminator site accesses file data in the origin filesystem.
  • The elements of a projected timestamp are:
      • timestamp—a cached image of a file timestamp (for Unix-like systems: atime—time of last access, mtime—time of last modification, ctime—time of last status change),
      • timestamp arrival time—the time at which a particular timestamp is received in a response from a DDS downstream site, and
      • delta time—the difference between the current time and the timestamp arrival time.
  • Filesystems native to Unix-like systems typically maintain three timestamps for each file:
      • a_time—time of last access,
      • m_time—time of last modification, and
      • c_time—time of last status change.
        DDS upstream sites generate projected timestamps for each of these filesystem timestamps and therefore maintain three arrival times and three delta times: a_arrival/a_delta, m_arrival/m_delta and c_arrival/c_delta. The description of projected timestamps in the remainder of this document focuses on the generation of projected timestamps for the time of last modification. However, similar methods are used to generate projected timestamps for a_time and c_time.
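  • By way of illustration, a minimal C sketch of the per-file state an upstream DDS site might maintain for projected timestamps is shown below; the structure, field, and function names are hypothetical, and only the m_time update is shown.

      #include <time.h>

      /* Hypothetical per-file projected-timestamp state at an upstream site.
       * Each filesystem timestamp has a cached value, the local time at which
       * that value arrived in a response, and a delta maintained while the
       * site operates autonomously. Only m_time doubles as the version number. */
      struct projected_timestamps {
          time_t a_time, a_arrival, a_delta;   /* time of last access        */
          time_t m_time, m_arrival, m_delta;   /* time of last modification  */
          time_t c_time, c_arrival, c_delta;   /* time of last status change */
      };

      /* Record a newly received m_time (the version number): the arrival time
       * is set to the host's current time and the delta is reset to zero.    */
      void record_mtime(struct projected_timestamps *p, time_t received_mtime)
      {
          p->m_time    = received_mtime;
          p->m_arrival = time(NULL);
          p->m_delta   = 0;
      }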
  • When a response to a flush request with status DDS_OK is received and processed, the last modification timestamp contained in the response becomes the version number, the m_arrival time is set to the current time, and the m_delta time is set to zero. Whenever a response to any other type of DDS file access request is received, the version number is compared with the cached one. When they differ, the client knows the file has been modified at some other site and its image is no longer valid.
  • FIG. 10 is a flowchart illustrating an example of a method 1000 of validating a cached file image. When a response received at an upstream site contains the same version number as the one associated with the cached file image, the cached file image is revalidated; when a response contains a different version number, the method invalidates the cached file image, stores the new version number, sets the m_arrival time to the host system's current time, and resets the m_delta time to zero.
  • FIG. 10 moreover shows that the method 1000 can include dispatching 1002 a DDS file access request to the downstream DDS site. I.e., the node sends a request to read, write, save, etc. a file from a downstream site.
  • FIG. 10 also shows that the method 1000 can include receiving 1004 a response to the DDS downstream request. I.e., the node receives any response from the downstream site, regardless of whether the access was successful or not.
  • FIG. 10 further shows that the method 1000 can include determining 1006 if the DDS request was processed without any errors and the response status is therefore DDS_OK. When there is an error, the request is repeated. However, what is not depicted is that after a few unsuccessful re-attempts the upstream site will declare a network failure and may begin searching for an alternate route to the DDS server terminator site.
  • FIG. 10 further shows that the method 1000 can include determining 1008 whether the received version number is the same as the version number associated with the cached file image. Note that the response to any successful DDS file access request (response status is DDS_OK) conveys the file's attributes, which include the version number (m_time, the last modification timestamp).
  • FIG. 10 also shows that the method 1000 can include continuing to use 1010 the current cached file image (revalidating the current file image) when the received version number is the same as the version number associated with the cached file image. I.e., because this site is using the current version, the current file image is revalidated and continues to be used.
  • FIG. 10 additionally shows that when the received version number is not the same as the version number associated with the cached file image, the method 1000 can include storing 1012 the new version number, setting the m_arrival time to the host system's current time, and resetting the m_delta time to zero.
  • FIG. 10 further shows that the method 1000 can include determining 1014 whether the response is a flush response. And when the response is not a flush response the method 1000 can include discarding 1016 the current cached file image. A DDS flush request flows through to the origin file server in a single atomic operation and the response returns the last modification timestamp as the file's version number. So, a file's version number always changes when a flush response with a status of DDS_OK is received. (When the version number changes on any other response, the file has been modified at some other site and the cache image at this site is therefore not current and must be discarded.)
  • FIG. 10 further shows that the method 1000 can include determining 1018 if the response has the CACHING_ENABLED flag set and caching 1020 any new response data and metadata when the CACHING_ENABLED flag is set.
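  • The following C sketch condenses the FIG. 10 decision flow; the response fields, flag values, and helper routines (revalidate_image, discard_image, cache_response) are hypothetical, and the error/retry handling of step 1006 is omitted.

      #include <time.h>

      #define F_CACHING_ENABLED 0x04   /* hypothetical response flag value */

      struct dds_response {
          int    flags;
          int    is_flush_response;    /* nonzero for responses to DDS_FLUSH      */
          time_t version;              /* m_time conveyed in the file attributes  */
      };

      struct cached_file {
          time_t version;              /* version number of the cached image      */
          time_t m_arrival;
          time_t m_delta;
      };

      extern void revalidate_image(struct cached_file *);
      extern void discard_image(struct cached_file *);
      extern void cache_response(struct cached_file *, const struct dds_response *);

      /* Process a successful (DDS_OK) response at an upstream site. */
      void process_response(struct cached_file *f, const struct dds_response *r)
      {
          if (r->version == f->version) {            /* step 1008 */
              revalidate_image(f);                   /* step 1010 */
          } else {
              f->version   = r->version;             /* step 1012 */
              f->m_arrival = time(NULL);
              f->m_delta   = 0;
              if (!r->is_flush_response)             /* step 1014 */
                  discard_image(f);                  /* step 1016: modified elsewhere */
          }
          if (r->flags & F_CACHING_ENABLED)          /* step 1018 */
              cache_response(f, r);                  /* step 1020 */
      }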
  • 6.2. Flushing a Modified File Image
  • Upstream DDS sites provide better (faster) write performance when client file modifications are captured at the site and not immediately flushed downstream. “Collecting” many client file modifications before sending a “batch flush” downstream employs the underlying network infrastructure far more efficiently and provides a more responsive file access service to the client. However, when a DDS client terminator site acknowledges a client write request before the new data has been successfully flushed downstream, the possibility arises that a future network failure could cause the new data to be lost.
  • DDS provides a distributed file service and is therefore always balancing performance against filesystem integrity. Performance is increased when DDS client terminator sites operate autonomously, but the risk of losing file modifications is also increased. Both the administrator and the user can set or adjust policies affecting this balancing act.
  • The most risk-averse policy instructs client terminator sites to immediately forward file modifications on to the server terminator site and to not respond to a client write request until the origin server acknowledges the successful receipt of the new data. This mode, referred to as synchronous writes, traverses the full network path from client application to origin server on every write operation.
  • More risky, performance oriented policies allow file modifications to be aggregated and batch flushed. This mode is often referred to as delayed writes. Generally, a timer, controlled by policy, initiates batch flushes in this mode of operation.
  • When modified file data is flushed, the file data flowing downstream includes the version number. It will be the same as the version number at the DDS server terminator site and all intermediate DDS nodes unless some previous network partition event prevented consistency control messages from being delivered.
  • Once a DDS client site detects that it is isolated from its server-side companion (it employs fast pings to detect this quickly), it immediately invokes procedures to re-establish communication with a DDS node still “connected” to the server terminator site. Once the isolated, and now paranoid, node reconnects, it usually begins immediately flushing all dirty file images.
  • Each flush request is tagged with the cached file image's version number. Whenever a DDS node processes a flush request, it compares the version number in the flush request with the version number associated with its cached image of the file identified in the flush request. When there is a mismatch the flush is rejected and sent back to the client with an error code of OUT_OF_SEQUENCE. This error code is then returned to the client application, which will have to resolve this issue.
  • Version numbers will usually match and the incoming file modifications will be accepted. However, when a substantial amount of time passes while the client node is processing file requests using only cached file image data (no communication with the downstream site), the likelihood of a version mismatch increases. Of course, when a DDS site processes file requests independently of its downstream site, it is relying upon DDS's consistency callback mechanism for immediate notification when a concurrent write sharing condition arises and it is relying on fast pings to continually reassure itself that the callback path is operational.
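  • A one-function C sketch of this version check follows; the numeric status values and parameter names are hypothetical.

      #include <time.h>

      #define DDS_OK           0
      #define OUT_OF_SEQUENCE  3   /* hypothetical numeric value */

      /* Performed by every DDS node that processes a flush request: the version
       * number carried in the flush must match the version number associated
       * with this node's cached image of the file; otherwise the flush is
       * rejected and the error is returned to the client application.          */
      int check_flush_version(time_t flush_version, time_t cached_version)
      {
          if (flush_version != cached_version)
              return OUT_OF_SEQUENCE;   /* the file was modified at another site */
          return DDS_OK;
      }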
  • FIG. 11 is a flowchart illustrating an example of a method 1100 of flushing modified file data downstream towards the DDS server terminator site; re-validating the modified file data at each DDS site; and, when all flush data has arrived at the server terminator site, possibly writing all file modifications to the origin filesystem and then fetching the file's last modification timestamp.
  • FIG. 11 shows that the method 1100 can include receiving and servicing 1102 a request from a client application and updating the appropriate delta time, which is m_delta for write requests, a_delta for read requests and c_delta for requests that modify the file's attributes.
  • FIG. 11 also shows that the method 1100 can include determining 1104 if it is time to flush file modifications downstream. If it is not time to flush the file modifications downstream, the site continues to service client file access requests. When synchronous writes are being used for the file identified in the request, it will always be time to flush the modifications downstream. For delayed writes, flushes may be initiated when a delay timer expires. The delay timer may be reset on every write request or every file access request. So, for example, a client terminator site might initiate a batch flush operation 15 seconds after receiving the last of many write requests.
  • FIG. 11 further shows that the method 1100 can include flushing 1106 all file modifications downstream. The file's version number accompanies the flush data.
  • FIG. 11 additionally shows that the method 1100 can include a downstream DDS node receiving 1108 the flush request and comparing the received version number with the version number associated with the downstream node's cached file image.
  • FIG. 11 moreover shows that the method 1100 can include determining 1110 if the version numbers differ. I.e., the version numbers are compared.
  • FIG. 11 also shows that the method 1100 can include the downstream DDS node responding 1112 OUT_OF_SEQUENCE to the upstream node if the version numbers differ.
  • When the version numbers are the same, FIG. 11 then shows that the method 1100 can include storing 1114 the received flush data in shadow extents.
  • FIG. 11 further shows that the method 1100 can include determining 1116 if the downstream DDS node is the server terminator site. If the downstream DDS node is not the server terminator site then the steps 1106-1116 are repeated until the server terminator site is reached.
  • FIG. 11 also shows that when the flush request is processed at the server terminator site, the method 1100 can include determining 1118 that all requests in this flush operation have been received and that one of the requests had the DATA_COMPLETE request flag set.
  • When not all requests in this flush operation have been received with one of the requests having the DATA_COMPLETE request flag set, FIG. 11 shows that the method 1100 can include responding 1120 to the upstream node with a status of DDS_OK.
  • FIG. 11 additionally shows that the method 1100 can include determining 1122 whether synchronous write mode is being used for the file identified in the request. When synchronous write mode is not being used, FIG. 11 shows that the method 1100 can include responding 1120 to the upstream node with a status of DDS_OK.
  • When synchronous write mode is being used, FIG. 11 shows that the method 1100 can include writing 1124 all file modifications to the underlying origin filesystem.
  • FIG. 11 further shows that the method 1100 can include determining 1126 whether the write to the origin filesystem was successful. When the write to the origin filesystem is not successful, FIG. 11 shows that the method 1100 can include responding 1128 to the upstream node with a status that indicates the type of error that occurred.
  • When the write to the origin filesystem is successful, FIG. 11 shows that the method 1100 can include setting 1130 the ALL_DATA_RECEIVED response flag and replacing all extents that have shadow extents with their respective shadow extents.
  • FIG. 11 then shows that the method 1100 can include fetching the file's last modification timestamp from the origin filesystem. This timestamp, which conveys the time (according to the origin filesystem clock) that the write 1124 was performed, will be interpreted at upstream sites as the file's version number.
  • FIG. 11 finally shows that the method 1100 can include responding 1120 to the upstream node with a status of DDS_OK.
  • 6.3. Generating Timestamps at Client Terminator Sites
  • When a client system (a workstation, for example) receives a response to a file write request, the response includes the file's attributes, an element of which is the file's last modification timestamp. For synchronous writes, this timestamp will be the correct timestamp generated by the origin filesystem when it received the new file modification. However, when delayed writes are employed for faster performance and the DDS client terminator site is operating autonomously, the DDS instance executing at the client terminator site generates the response's last modification timestamp as follows:

  • last modification timestamp = version number + m_delta time;
  • where: m_delta time = current time − m_arrival time.
  • The version number is the origin server's last modification timestamp and only the origin server updates it. Upstream sites use the last modification timestamp as the file's version number. Every file modification performed by the origin server creates a new file version.
  • The timestamp generated by the client terminator is a temporary timestamp that is accurate enough to enable the client application to believe it is accessing the most current version of the file (which, in fact, it is). The timestamp monotonically increases by a reasonable amount on every file modification request and it is periodically resynchronized (whenever dirty file data is flushed downstream) with the origin server's timestamp. This behavior helps to maintain the illusion that the DDS service is provided by a single local filesystem “within” the computer where the client application is executing.
  • Similar procedures may also be used to project other temporary filesystem timestamps such as a file's last access timestamp and last change timestamp.
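  • A minimal C sketch of this projection follows; the structure and function names are hypothetical, and the projected value is replaced with the origin filesystem's genuine timestamp the next time a flush response is processed.

      #include <time.h>

      /* Hypothetical version/arrival state for one cached file (m_time only). */
      struct mtime_state {
          time_t version;    /* origin server's last modification timestamp */
          time_t m_arrival;  /* local time when the version number arrived  */
      };

      /* Generate the temporary last-modification timestamp returned to the
       * client when the client terminator site operates autonomously under
       * delayed writes:
       *   last modification timestamp = version number + m_delta time,
       *   where m_delta time = current time - m_arrival time.               */
      time_t project_mtime(const struct mtime_state *s)
      {
          time_t m_delta = time(NULL) - s->m_arrival;
          return s->version + m_delta;
      }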
  • 7. Path Signatures
  • The path signature, included in some DDS responses, defines the current route back to the origin server. Path signatures have the following structure:
      • typedef struct dds_path_signature {
            int  n_hops;                 // number of DDS hops to server site
            long node_signature[16];     // 4 byte node identifier of a DDS node
        } DDS_PATH_SIGNATURE;
  • Whenever an upstream DDS node receives a successful connection response (DDS_OK status and the INITIAL_FILE_CONNECTION response flag is set), the path signature contained within the response is copied into the channel.
  • During the course of processing a request from an upstream site currently not connected to the target file, the downstream site will establish a connection to the upstream site and then include in its request response a path signature constructed by adding its signature to the channel's path signature.
  • At any instant, a channel's path signature reflects the last successful path used to access file data.
  • When a network failure occurs DDS initiates procedures to re-establish a connection to a file. A client-side node will successively direct a reconnection request down each of its alternate paths to the origin server; stopping as soon as one of the reconnection attempts is successful. After trying all paths without successfully reconnecting to the file, the client-side node will return the DDS_BAD_LINK error code to the client that was accessing the file when the failure was discovered.
  • A reconnection request, which is a DDS request with the DDS_RECONNECT flag set, contains the channel's path signature. The server-side node receiving and processing the request uses the path signature to recognize the client as a current client and to reconcile its consistency control data structures. So, for example, when a writing client reconnects, the server-side node does not “see” a second writer and declare a concurrent write sharing condition. It “sees” a client that has been modifying a file now attempting to access the file through a new path.
  • A reconnection request propagates downstream until a node recognizes that its signature is contained in the path signature. If this node is more than one level lower in the cache hierarchy, the upstream nodes along the previous path must be informed so that the old path's consistency control structures can be reconciled.
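  • The membership test described above might be sketched in C as follows, using the DDS_PATH_SIGNATURE layout shown earlier; the function name is hypothetical.

      typedef struct dds_path_signature {   /* as defined above */
          int  n_hops;
          long node_signature[16];
      } DDS_PATH_SIGNATURE;

      /* Returns 1 if this node's 4 byte signature appears in the path signature
       * carried by a reconnection request, i.e. this node was on the "old"
       * path back to the origin server.                                        */
      int node_on_old_path(const DDS_PATH_SIGNATURE *ps, long my_signature)
      {
          int i;
          for (i = 0; i < ps->n_hops && i < 16; i++)
              if (ps->node_signature[i] == my_signature)
                  return 1;
          return 0;
      }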
  • A reconnection request may also be issued when there is no network failure. An upstream node, deciding to re-balance its downstream traffic, may issue a reconnection request at any time.
  • 7.1. Path Signature Upstream Propagation
  • FIG. 12 is a flowchart illustrating an example of a method 1200 employed by a DDS node to load a path signature when a connection is established; and for that node to add its signature to the path signature that it sends upstream whenever an upstream node establishes a file connection.
  • FIG. 12 illustrates that the method 1200 can include receiving 1202 a DDS request. I.e., a DDS request is received from an upstream node.
  • FIG. 12 also shows that the method 1200 can include acquiring 1204 the channel for the file identified in the request. When the channel does not already exist, a new channel is created and assigned to the identified file.
  • FIG. 12 further illustrates that the method 1200 can include determining 1206 if it is possible to service the request without communicating with the downstream node. This would be the case when all required file data is cached and valid at the site, and the channel at this site is already “connected” to a downstream site or this site is operating in disconnected mode.
  • When a downstream communication is required, FIG. 12 also shows that the method 1200 can include dispatching 1208 a request to a downstream site and then receiving 1210 a response from the downstream site.
  • FIG. 12 then shows that the method 1200 can include determining 1212 if the response contained an indication that the downstream site just established a file connection to this site.
  • When the downstream site indicates that it has just established a file connection to this site, FIG. 12 shows that the method 1200 can include copying 1214 the path signature from the response to the channel.
  • When 1206 determines that a downstream communication is not required, FIG. 12 shows that the method 1200 can include servicing 1216 the request.
  • Then FIG. 12 illustrates that the method 1200 can include determining 1218 if the upstream site that sent the request now being processed is establishing an initial file connection.
  • When the upstream site is establishing an initial file connection, FIG. 12 shows that the method 1200 can include copying 1220 the path signature from the channel to the response, adding this node's signature to the response's path signature, and then setting the INITIAL_FILE_CONNECTION flag in the response.
  • Finally, FIG. 12 shows that the method 1200 can include dispatching 1222 a response back to the upstream DDS client.
  • 7.2. Path Signature Based Reconnection
  • FIG. 13 is a flowchart illustrating an example of a method 1300 employed to reconnect an upstream DDS node.
  • FIG. 13 illustrates that the method 1300 can include receiving 1302 a DDS request from an upstream site.
  • FIG. 13 also shows that the method 1300 can include acquiring 1304 the channel for the file identified in the request. When the channel does not already exist, a new channel is created and assigned to the identified file.
  • FIG. 13 further illustrates that the method 1300 can include determining 1306 if the DDS_RECONNECT flag is set in the request. When the DDS_RECONNECT flag is not set, the procedure for reconciling the upstream site structures at this site (and possibly sites above) is bypassed.
  • FIG. 13 then shows that when the DDS_RECONNECT flag is set the method 1300 can include determining 1308 if the path signature contained in the request identifies this node as a member of the “old” path. When the node is not a member of the “old” path, FIG. 13 illustrates that the method 1300 can include resetting 1310 the channel's attributes valid (ATTRS_VALID) flag to force this node to “go downstream” to fetch valid attributes.
  • FIG. 13 shows that when this node is a member of the “old” path the method 1300 can include sending 1312 a DDS RECALL or INVALIDATE message to the “old” upstream site identified in the path signature.
  • FIG. 13 also shows that the method 1300 can include determining 1314 if a response to the DDS RECALL or INVALIDATE message is received.
  • FIG. 13 shows that when a response is received the method 1300 can include updating 1316 the upstream site structure of the “old” upstream site identified in the path signature.
  • FIG. 13 further illustrates that when a response is not received the method 1300 can include declaring 1318 the “old” upstream site identified in the path signature to be OFFLINE and recording this status by setting the OFFLINE flag in the upstream site structure of the “old” upstream site.
  • FIG. 13 finally shows that the method 1300 can include servicing 1320 the DDS request and then dispatching a response back to the upstream DDS node.
  • Note that this procedure may be repeated at multiple intermediate sites until a site on the “old” path is encountered. Resetting the ATTRS_VALID flag (step 1310 of FIG. 13) continually pushes a DDS_RECONNECT request further downstream until a site currently connected to the origin server is encountered. DDS_RECONNECT requests contain the full path signature of the upstream client node requesting the reconnection.
  • 8. Any Port in a Storm
  • The DDS consistency mechanism ensures that all DDS portals are equivalent. The any port in a storm feature allows a DDS client node to switch to a new downstream node at any time. So, whenever a client node feels isolated it can elect to find a new partner.
  • The new partner may be at the same hierarchical level as the partitioned site, or it may be at any level closer to the server terminator site. When the new partner is at the same hierarchical level as the partitioned site, it must, in addition to responding to the request, send a message downstream informing that site that an upstream site has switched partners. This allows the downstream site to revoke whatever permissions it had granted the isolated site (isolated from the client attempting to reconnect and possibly also isolated from this site) and grant permissions along this new path.
  • Note that the any port in a storm feature, which has a downstream orientation, is only possible because of the consistency feature that has an upstream orientation. These two features work together to provide a highly consistent file access service layered on top of inherently unreliable networks such as the Internet.
      • The any port in a storm feature provides the resiliency required for extremely reliable communications.
      • The DDS consistency mechanism is completely dependent on extremely reliable communications.
      • The any port in a storm feature is completely dependent on the DDS consistency mechanism.
  • Neither feature, the DDS consistency mechanism or any port in a storm, can stand alone. But, when intertwined, they provide the solid foundation required for providing very strong consistency guarantees over highly distributed, unreliable networks.
  • 9. Filehandles are Forever
  • DDS filehandles, patterned after NFS filehandles, are permanent filehandles. The file server generates an opaque file identifier during the processing of a lookup request and passes it back to a client. The client then uses this filehandle as a reference point in future read/write requests.
  • There is no timeout on the validity of a DDS (or NFS) filehandle. It is valid forever. The client system may present a filehandle received ten years ago (and not used since) and the file server must connect to the same file or return the error STALE_FILEHANDLE if the file no longer exists.
  • U.S. patent application Ser. No. 12/558,482 (Nomadic File Systems) discloses the construction of globally unique permanent filehandles.
      • However, a remote NFS client accessing the same file would receive a file handle that uniquely identified the file forever. An NFS client may present a file handle that it received ten years ago and hasn't used since back to the file server and that server must either establish a connection with the original file or respond with an error indicating that the file handle is no longer valid. (A file handle is another type of object ID.)
      • A method commonly used by Unix based NFS file servers to create a permanent file id is to concatenate two 32 bit numbers, the inode number and the inode generation number, to create a 64 bit file id. Since each time an inode is assigned to a new file its generation number is incremented, an inode would have to be re-used over 4 billion times before a file id of this type could repeat. These 64 bit file ids are essentially good forever.
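  • A C sketch of the file id construction just described follows; the function name and the ordering of the two 32 bit halves are illustrative choices.

      #include <stdint.h>

      /* One possible 64 bit permanent file id: the inode generation number in
       * the high 32 bits and the inode number in the low 32 bits. (Any fixed
       * concatenation of the two 32 bit values works equally well.)           */
      uint64_t make_file_id(uint32_t inode_number, uint32_t generation)
      {
          return ((uint64_t)generation << 32) | (uint64_t)inode_number;
      }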
  • Permanent filehandles facilitate network error recovery operations by reducing to a bare minimum the amount of distributed state required for a disconnected DDS client site to successfully reconnect with the origin server. A DDS client needs nothing more than a filehandle to reconnect to a file. The DDS client does not need to know what directory, what filesystem or even what file server.
  • Server-Side Operations
  • When a concurrent write sharing condition arises, DDS server nodes must RECALL a modified upstream image projection (if there is one, and there can be only one) or INVALIDATE all upstream image projections supporting accelerated read operations.
  • An INVALIDATE or RECALL operation proceeds as follows:
      • A concurrent write sharing condition (multiple clients active on a file and at least one of them is writing) is detected at the onset of processing a file access request.
      • Notifications are prepared for each upstream site except the one that sent the request that precipitated this CWS condition. Each note contains two elements: a) a file identifier and b) an opcode (RECALL or INVALIDATE).
      • Each notification is dispatched to an upstream site using a self addressed stamped envelope (SASE) that the upstream site had previously sent to the downstream site.
      • The upstream site responds by dispatching a DDS_FLUSH request. At a minimum, this request contains a flag indicating that this request is an acknowledgement to the RECALL/INVALIDATE notification. In the case of a RECALL, the request also contains all file modifications.
      • After all upstream sites have responded, the downstream site proceeds with processing the original request.
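  • The notification sequence above might be sketched in C as follows; the opcodes, the upstream site table, and the helper routines (send_sase_notification, wait_for_flush_ack) are hypothetical.

      #define OP_RECALL      1   /* recall a modified upstream image          */
      #define OP_INVALIDATE  2   /* invalidate read-only upstream images      */

      struct upstream_site { int id; int writing; int offline; };

      /* send_sase_notification() would dispatch one notification using the SASE
       * the upstream site previously supplied; wait_for_flush_ack() would wait
       * for the acknowledging DDS_FLUSH request (which carries the file
       * modifications in the RECALL case). Both return 0 on success.           */
      extern int send_sase_notification(struct upstream_site *s,
                                        const char *file_id, int opcode);
      extern int wait_for_flush_ack(struct upstream_site *s);

      /* On detecting a CWS condition, notify every upstream site except the one
       * whose request precipitated the condition, then wait for the replies.   */
      int handle_cws(struct upstream_site *sites, int n_sites,
                     int requester_id, const char *file_id)
      {
          int i;

          for (i = 0; i < n_sites; i++) {
              struct upstream_site *s = &sites[i];
              if (s->id == requester_id)
                  continue;                          /* skip the requester */
              send_sase_notification(s, file_id,
                                     s->writing ? OP_RECALL : OP_INVALIDATE);
          }

          for (i = 0; i < n_sites; i++) {
              struct upstream_site *s = &sites[i];
              if (s->id == requester_id)
                  continue;
              if (wait_for_flush_ack(s) != 0)
                  s->offline = 1;   /* policy may sideline a site that never responds */
          }
          return 0;   /* the caller may now proceed with the original request */
      }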
  • When an upstream site is partitioned from its downstream site, the upstream site never receives the notification. So, the downstream site never receives confirmation that the upstream site has invalidated its cached image of the file.
  • When this occurs the downstream site may decide (depending on the currently established policies) to give up on the isolated site and continue servicing other clients. So, the downstream site may declare the upstream site OFFLINE and record that status in the upstream site structure (uss) associated with the upstream site. Then the downstream site, based on the established policy, may do one of the following:
      • The downstream site waits for communications to be re-established, receives confirmation from all upstream sites, and finally proceeds with processing the original request.
      • The downstream site proceeds with processing the original request. In this case, the isolated site is sidelined so that the server-side site can continue servicing the other client sites. The sidelined site will be dealt with and re-integrated when communications are re-established.
  • When communications are restored, the upstream site will promptly send a SASE. If the downstream site had chosen to wait it will quickly bounce the SASE back with the same note it had sent earlier.
  • During a network partition event, all DDS nodes on the server-side of the partition have a primary responsibility to protect and ensure the integrity of all DDS filesystem content. DDS must never lose or mangle data once that data has been successfully written to a DDS node.
  • The primary directive for DDS server-side nodes is therefore:
  • Never Take any Action that can Possibly Result in the Loss or Corruption of Filesystem Data.
  • So, during a network partition event, DDS server-side nodes act reflexively to ensure filesystem integrity. Then, throughout the partition event, the server-side nodes continue providing file services in conformance with the primary directive. However, these server-side nodes are pledged to filesystem integrity and have no real obligations to any client. DDS server-side nodes will quickly refuse any request that is not consistent with the PD.
  • Client-Side Operations
  • Although the DDS server-side of a network partition is not dedicated to servicing client systems, the other side of the partition is completely focused on providing the best client service possible given the current circumstances. In particular, DDS client-side nodes are responsible for ensuring that no delayed write data (file modifications that have not yet been flushed to the origin) is ever lost.
  • The domain policy attributes in effect at each client-side node determine what actions the node performs when a network partition is detected, but a client-side node will generally safeguard its data first and then attempt to reconnect to the server-side and flush all file modifications back to their respective origin servers.
  • DDS client nodes continuously monitor the health and performance of all downstream paths, and for every path the client node is aware of all alternate paths. The statistics maintained for each path include:
      • running average of bytes/second,
      • average response time latency, and
      • uptime percentage.
  • When a DDS client node issues a request and fails to receive a response within a timeout window (typically 2 or 3 seconds), the node retransmits the request several times before declaring the downstream node OFFLINE.
  • When the downstream site is declared OFFLINE, the DDS client node simultaneously tries all alternate paths and selects the path with the best performance.
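  • A C sketch of selecting the best alternate path from the statistics listed above follows; the path_state layout and the scoring formula are hypothetical and would in practice be policy driven.

      /* Hypothetical per-path statistics kept by dds_monitor_ds_paths(). */
      struct path_state {
          double bytes_per_second;   /* running average throughput        */
          double latency_ms;         /* average response time latency     */
          double uptime_pct;         /* uptime percentage                 */
          int    online;             /* nonzero if currently responding   */
      };

      /* Return the index of the best-scoring online path, or -1 if none
       * respond. The scoring function here is purely illustrative.        */
      int select_best_path(const struct path_state *paths, int n_paths)
      {
          int i, best = -1;
          double best_score = -1.0;

          for (i = 0; i < n_paths; i++) {
              if (!paths[i].online)
                  continue;
              double score = paths[i].uptime_pct * paths[i].bytes_per_second
                             / (1.0 + paths[i].latency_ms);
              if (score > best_score) {
                  best_score = score;
                  best = i;
              }
          }
          return best;
      }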
  • After all alternates have been tried, and none have been successful, DDS syncs all modifications to files from the disconnected downstream server to the site's stable memory (hard disk or flash memory). When downstream communications are restored, all file modifications are immediately flushed downstream.
  • Once all file modifications have been secured in stable memory, DDS either:
      • continues attempting to establish communication along the original path and all alternate paths; or
      • returns an error indication to the client workstation.
  • In addition to performing whatever downstream communications are required to support the file access services the upstream site is providing, the upstream site also pings the downstream site once a second (typically). This enables the upstream site to quickly detect a partition event and, depending on the established policies, possibly stop providing access to cached files affected by the partition event.
  • When the upstream site fails to receive a reply to its ping, it retransmits the request several times before declaring the downstream node OFFLINE.
  • The upstream site will continue to periodically ping the downstream site. When a response is finally received, the site is declared ONLINE and both sites may begin re-synchronizing.
  • Site re-synchronization basically consists of re-syncing a series of individual files. And this is a straightforward process with one exception: an out of sequence write (which is explained in the following section).
  • Out of Sequence Writes
  • In the case where an upstream site has been modifying the file foo when a partition event occurs, it is possible that:
      • a CWS condition can arise, and
      • the RECALL message is not delivered, and
      • the upstream site modifies its image of foo before
      • the upstream site has detected the partition event via its fast ping polling.
  • At some point, the downstream site:
      • finally gives up on the upstreamer that has not responded to the RECALL notification,
      • marks the upstreamer OFFLINE, and
      • moves on with providing other clients with access to foo.
  • Then another client modifies foo.
  • Finally, the network is fixed and the upstreamer comes back ONLINE. When the upstreamer attempts to flush foo downstream: the downstream site MUST detect that the modification performed at the upstream site did not use the most current version of foo, and the request must be rejected with error code OUT_OF_SEQUENCE.
  • III. Recovery Procedures and Recovery Routines
  • Server-side nodes assume a rather passive role during a network failure. These nodes must receive file modifications that are being flushed downstream, but that is also what they do when the network is not broken. The main difference during a network failure is when a server-side node receives a flush with a path signature indicating an upstream site is reconnecting, it executes a recovery routine to transfer read/write permissions from the isolated upstream site to the site that sent this flush.
  • Client-side nodes bear almost the entire burden of recovering from network failures because they have service commitments to active clients that must be maintained. The client-side node “closest to” a failed network component will be the node that detects the failure and declares the network partitioned. The node will then execute a recovery procedure.
  • The recovery procedure will sequentially call upon various recovery routines, which are described in the following section.
  • Client-Side Recovery Routines
  • int dds_monitor_ds_paths(CHANNEL *)
  • This routine is continuously executed by a thread dedicated to monitoring all downstream paths and maintaining performance statistics associated with each path. When a failure occurs, the information gathered by dds_monitor_ds_paths( ) may be referenced to quickly determine the best alternate path.
  • dds_monitor_ds_paths( ) maintains a path_state mini-database of all downstream paths emanating from this node. This routine may also periodically send its current path_state to a node that maintains and presents a global view of network operations.
  • int dds_reconnect(CHANNEL *cp, int mx)
  • This routine re-establishes a connection to the origin server for the file identified by the filehandle contained within the channel (cp->fh). dds_reconnect( ) is optimized for the speed of re-establishing a downstream connection because DDS nodes really don't like to be disconnected. This routine may reference the path_state mini-database to quickly select the most appropriate alternate path for the reconnection attempt.
  • A dds_reconnect( ) request always contains the issuing site's path signature and it always propagates through to the first node that is common to both the ‘old’ and the ‘new’ paths.
  • int dds_flush(CHANNEL *cp)
  • A dds_flush( ) request flows from a DDS client terminator site to a server terminator site as a single atomic network transaction. For large multi-transfer flushes, intermediate nodes may be simultaneously forwarding transfer n while receiving transfer n+1.
  • DDS flushes are atomic and synchronous. A flush is not considered successful until the client receives the server's OK response. A shadow copy is kept until a node receives an OK response from its immediate downstream site.
  • For large multi-transfer flushes, each transfer flows through intermediate nodes independently of the other transfers. All nodes store the flush data in shadow extents. When the server terminator site receives the last flush, it moves all shadow extents into the sunlight and dispatches an OK response back upstream. As the response propagates through each intermediate node, each node also sunlights its shadow extents.
  • Server-Side Recovery Routines
  • int dds_transfer_permissions(CHANNEL *cp, int to_mx, int from_mx)
  • This routine RECALLS or INVALIDATES the file image at the upstream node ‘from_mx’ and simultaneously transfers whatever permissions node ‘from_mx’ had to node ‘to_mx’. Node ‘from_mx’ may respond to the RECALL/INVALIDATE message or not. This server-side node does not really care. It has marked node ‘from_mx’ OFFLINE (for this file) and will handle all reintegration issues later when node ‘from_mx’ attempts to reconnect this file.
  • int dds_recall(CHANNEL *cp, int mx)
  • This routine RECALLS or INVALIDATES the file image at an upstream node.
  • IV. DDS Failure Recovery Procedure
  • When a network component fails, client-side nodes detect the failure and drive the transparent recovery procedure, during which the DDS filesystem client remains completely unaware of the failure.
  • The DDS failure recovery procedure operates in the following manner:
      • 1. A client-side node dispatches a request downstream and is waiting for a response. If a response is not received within a timeout period, control will be returned to the thread that issued the request with an indication that the downstream site failed to respond.
      • 2. The request thread calls dds_monitor_ds_paths(cp) to determine whether the path to the downstream site is operational. This routine, constantly fast pinging the downstream site, is the authority with respect to determining link status.
      • 3. If the link is still good, the thread will just re-issue the request again. This request will have the same message identifier (xid) as the original. If the downstream site did respond to the previous request, the response to this request is guaranteed to be the same response (thanks to the duplicate request cache).
      • 4. If the link is down, the request thread references the path_state mini-database and selects the ‘best’ alternate path and then calls dds_reconnect(cp, mx) to re-establish a connection to the DDS server-side network. (Or alternatively, this thread may spawn a bunch of threads that would simultaneously attempt to re-establish connections to the DDS server-side network. Then, when several of the attempts are successful, one of the paths would be selected and the others could be disconnected or just allowed to atrophy.)
      • 5. If the reconnect attempt is not successful, the request thread responds to the request (that it had originally received from an upstream client) with an error status of DDS_BAD_LINK. The upstream site will treat this error status the same as a TIMEOUT. So, the request thread at the upstream site will start executing step 2 above.
      • 6. If the reconnect attempt is successful, the request thread immediately flushes all dirty data downstream. This flush operates as previously described: it is atomic and synchronous all the way to the server terminator site.
      • 7. Once all file modifications have been secured, DDS operations switch into “normal” mode. The filesystem client is still accessing the file, but the path has been changed. The client remains unaware of the change.
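  • The seven steps above might be condensed into the following C sketch; the status codes, the retry limit, and the helper routines (issue_request, link_is_up, best_alternate, flush_dirty_data) are hypothetical, while dds_reconnect() corresponds loosely to the client-side routine described earlier.

      #define DDS_OK        0
      #define DDS_BAD_LINK  1

      typedef struct channel CHANNEL;   /* opaque here; defined elsewhere in DDS */

      extern int issue_request(CHANNEL *cp, unsigned long xid);  /* hypothetical */
      extern int link_is_up(CHANNEL *cp);      /* hypothetical; consults fast-ping state */
      extern int best_alternate(CHANNEL *cp);  /* hypothetical; consults path_state data */
      extern int dds_reconnect(CHANNEL *cp, int mx);
      extern int flush_dirty_data(CHANNEL *cp);                  /* hypothetical */

      int recover_and_retry(CHANNEL *cp, unsigned long xid)
      {
          int attempts;

          /* Step 1 has already happened: the original request timed out.     */
          for (attempts = 0; attempts < 3; attempts++) {
              /* Steps 2/3: if the link is still good, re-issue with the same
               * xid; the duplicate request cache guarantees the same reply.   */
              if (link_is_up(cp)) {
                  if (issue_request(cp, xid) == DDS_OK)
                      return DDS_OK;
                  continue;
              }

              /* Step 4: the link is down - select the best alternate path and
               * attempt to reconnect to the DDS server-side network.          */
              if (dds_reconnect(cp, best_alternate(cp)) != DDS_OK)
                  return DDS_BAD_LINK;   /* step 5: upstream treats this like a
                                            timeout and runs its own recovery  */

              /* Step 6: flush all dirty data over the new path.               */
              if (flush_dirty_data(cp) != DDS_OK)
                  return DDS_BAD_LINK;

              /* Step 7: normal operation resumes over the new path.           */
              if (issue_request(cp, xid) == DDS_OK)
                  return DDS_OK;
          }
          return DDS_BAD_LINK;
      }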
    Reintegration of Isolated Nodes
  • When an isolated network segment “comes back online”, isolated nodes automatically re-synchronize their images of all files “crossing” the link that had failed.
  • Channels containing modified data have the highest priority. Each is flushed downstream. The server-side will accept or reject each flush on an individual basis. A flush will only be rejected if the file was modified somewhere else while this client-side node was partitioned. The client-side node must have a means of handling a flush rejection. This probably includes a) saving the dirty data that was just rejected, and b) notifying the user that there is a conflict that must be resolved.
  • The reintegration process is essentially complete after all dirty files have been flushed. Files that were being read before the partition event occurred can now be read again. No special processing is required. Cached file images will be updated on a demand basis if the source file has been modified since the image was fetched.
  • The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (20)

What is claimed is:
1. In a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and the origin file system node, a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, performs the steps:
receiving a flush request from the upstream site, wherein the flush request includes:
a request to save flush data contained in the flush request to a stable memory;
storing the flush data in a shadow extent;
dispatching a downstream flush request to a second file service proxy cache node;
receiving a response to the downstream flush request from the second file service proxy cache node; and
if the response to the downstream flush request includes a status code that indicates the successful completion of the request
sending a flush response that includes the status code that indicates the successful completion of the request to the upstream site.
2. The system of claim 1, wherein the upstream site includes a third file service proxy cache node.
3. The system of claim 1, wherein the second file service proxy cache node includes the origin file system.
4. The system of claim 1, wherein the flush request includes a flush level.
5. The system of claim 4, wherein the flush level specifies the file service proxy cache node's flash memory as the flush destination.
6. The system of claim 4, wherein the flush level specifies the file service proxy cache node's disk as the flush destination.
7. The system of claim 4, wherein the flush level specifies the origin file system as the flush destination.
8. In a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and the origin file system node, a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, performs the steps:
receiving a flush request from the upstream site, wherein the flush request includes:
a request to save flush data contained in the flush request to a stable memory;
storing the flush data in a shadow extent; and
if the file service proxy cache node is not a server terminator site:
dispatching a downstream flush request to a second file service proxy cache node;
receiving a response to the downstream flush request from the second file service proxy cache node; and
if the response to the downstream flush request includes a status code that indicates the successful completion of the request
sending a flush response that includes the status code that indicates the successful completion of the request to the upstream site.
9. The system of claim 8 further comprising:
if the downstream flush request was not successfully completed:
decrementing a retry count;
if the retry count is greater than zero:
dispatching the downstream flush request to the second file service proxy cache node again;
if the retry count equals zero:
declaring a network failure, which will initiate a search for an alternate route to the server terminator site.
10. The system of claim 8, wherein if the downstream flush request was successfully completed further comprises:
if the ALL_DATA_RECEIVED response flag is set, indicating that the server terminator site has received and accepted all flush data:
promoting the flushed data in the shadow extents.
11. The system of claim 8 further comprising:
the file service proxy cache node remaining committed to the node when the amount of flushed data is higher than can be accommodated in a single network operation until the last batch of data is successfully received.
12. The system of claim 11, wherein the request containing the last batch of data also includes a flag that indicates this is the last batch.
13. In a computing system where a data request has been passed between an upstream site and a file service proxy cache node, the file service proxy cache node being a network node located between the upstream site and an origin file system node, a non-transitory computer-readable storage medium including instructions that, when executed by the file service proxy cache node, perform the steps of:
receiving a flush request from the upstream site, wherein the flush request includes:
a request to save flush data contained in the flush request to a stable memory; and
a flush level, wherein the flush level indicates how far the data should be flushed;
storing the flush data in a shadow extent;
if the file service proxy cache node is not a server terminator site:
dispatching a downstream flush request to a second file service proxy cache node, wherein the downstream flush request includes:
the flush data;
receiving a response to the downstream flush request from the second file service proxy cache node; and
if the response to the downstream flush request includes a status code that indicates the successful completion of the request:
sending a flush response that includes a status code that indicates the successful completion of the request to the upstream site.
14. The system of claim 13, wherein, if the downstream flush request was successfully completed, the steps further comprise:
if the ALL_DATA_RECEIVED response flag is set, indicating that the server terminator site has received and accepted all flush data:
promoting the flushed data in the shadow extents.
15. The system of claim 14, wherein promoting the shadow extents includes making the flushed data permanent.
16. The system of claim 13, wherein the response to the downstream flush request includes:
an error code if the downstream flush request was not successfully completed.
17. The system of claim 16, wherein the flush response to the upstream site includes:
an error code if the flush request was not successfully completed.
18. The system of claim 13 further comprising:
if the file service proxy cache node is the server terminator site:
determining if:
all batches of the flush have been received; and
a DATA_COMPLETE request flag was set in one of the requests.
19. The system of claim 18 further comprising:
if all batches of the flush have been received and the DATA_COMPLETE request flag was set in one of the requests:
if synchronous filesystem writes are enabled for the file identified in the flush request:
writing all data contained in shadow extents to the origin filesystem;
if the filesystem write operation did not complete successfully:
responding to the upstream site with a status that conveys the write error code.
if the filesystem write operation did complete successfully:
setting the ALL_DATA_RECEIVED response flag;
promoting the flushed data in the shadow extents; and
responding to the received flush request with a status code that indicates the successful completion of the request.
20. The system of claim 19 further comprising:
if synchronous writes are not enabled for the file identified in the flush request:
setting the ALL_DATA_RECEIVED response flag;
promoting the flushed data in the shadow extents; and
responding to the received flush request with a status code that indicates the successful completion of the request.
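Claims 18-20 describe the behavior at the server terminator site once all flush batches have arrived. The sketch below is a hypothetical rendering under assumed names (terminate_flush, write_to_origin) and an assumed error-reporting convention; it is not the patent's implementation. It checks that every batch has been received and that a DATA_COMPLETE flag was seen, writes the shadow extents through to the origin filesystem when synchronous writes are enabled for the file, and on success sets ALL_DATA_RECEIVED, promotes the shadow extents, and reports successful completion.

    # Hypothetical sketch of the server terminator site handling of claims 18-20.
    STATUS_OK = 0  # assumed status code for successful completion

    def terminate_flush(all_batches_received, data_complete_flag_seen,
                        synchronous_writes_enabled, shadow_extents,
                        promoted_extents, write_to_origin):
        """Returns (status, response_flags) for the received flush request.
        write_to_origin(data) is an assumed callable that raises OSError on failure."""
        response_flags = {}

        # Claim 18: finalize only when all batches of the flush have been received
        # and a DATA_COMPLETE request flag was set in one of the requests.
        if not (all_batches_received and data_complete_flag_seen):
            return STATUS_OK, response_flags   # keep waiting for further batches

        # Claim 19: with synchronous filesystem writes enabled for the file, write
        # the shadow-extent data through to the origin filesystem before replying.
        if synchronous_writes_enabled:
            try:
                write_to_origin(b"".join(shadow_extents))
            except OSError as err:
                # Respond upstream with a status conveying the write error code.
                return err.errno or -1, response_flags

        # Claims 19-20: set ALL_DATA_RECEIVED, promote the flushed data in the
        # shadow extents (making it permanent), and report success.
        response_flags["ALL_DATA_RECEIVED"] = True
        promoted_extents.extend(shadow_extents)
        shadow_extents.clear()
        return STATUS_OK, response_flags

    # Example: a flush whose final batch carried the DATA_COMPLETE flag.
    shadow, promoted = [b"blk0", b"blk1"], []
    status, flags = terminate_flush(True, True, True, shadow, promoted,
                                    write_to_origin=lambda data: None)
    print(status, flags)  # 0 {'ALL_DATA_RECEIVED': True}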
US13/930,560 2012-06-29 2013-06-28 Distributed filesystem atomic flush transactions Abandoned US20140006543A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US13/930,560 US20140006543A1 (en) 2012-06-29 2013-06-28 Distributed filesystem atomic flush transactions
US13/959,474 US20150006649A1 (en) 2012-06-29 2013-08-05 Method of using path signatures to recover from network link failures
US15/726,116 US10440141B2 (en) 2012-06-29 2017-10-05 Distributed filesystem atomic flush transactions
US16/554,425 US11050839B2 (en) 2012-06-29 2019-08-28 Method of creating path signatures to facilitate the recovery from network link failures
US17/361,063 US11349949B2 (en) 2012-06-29 2021-06-28 Method of using path signatures to facilitate the recovery from network link failures

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261666597P 2012-06-29 2012-06-29
US13/930,560 US20140006543A1 (en) 2012-06-29 2013-06-28 Distributed filesystem atomic flush transactions

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/930,594 Continuation US20140006464A1 (en) 2012-06-29 2013-06-28 Using projected timestamps to control the sequencing of file modifications in distributed filesystems

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US13/930,537 Continuation US20140006542A1 (en) 2012-06-29 2013-06-28 Recursive ascent network link failure notifications
US15/726,116 Continuation US10440141B2 (en) 2012-06-29 2017-10-05 Distributed filesystem atomic flush transactions

Publications (1)

Publication Number Publication Date
US20140006543A1 true US20140006543A1 (en) 2014-01-02

Family

ID=49779294

Family Applications (8)

Application Number Title Priority Date Filing Date
US13/930,594 Abandoned US20140006464A1 (en) 2012-06-29 2013-06-28 Using projected timestamps to control the sequencing of file modifications in distributed filesystems
US13/930,630 Abandoned US20140006618A1 (en) 2012-06-29 2013-06-28 Method of creating path signatures to facilitate the recovery from network link failures
US13/930,560 Abandoned US20140006543A1 (en) 2012-06-29 2013-06-28 Distributed filesystem atomic flush transactions
US13/930,537 Abandoned US20140006542A1 (en) 2012-06-29 2013-06-28 Recursive ascent network link failure notifications
US13/959,474 Abandoned US20150006649A1 (en) 2012-06-29 2013-08-05 Method of using path signatures to recover from network link failures
US15/726,116 Expired - Fee Related US10440141B2 (en) 2012-06-29 2017-10-05 Distributed filesystem atomic flush transactions
US16/554,425 Active US11050839B2 (en) 2012-06-29 2019-08-28 Method of creating path signatures to facilitate the recovery from network link failures
US17/361,063 Active US11349949B2 (en) 2012-06-29 2021-06-28 Method of using path signatures to facilitate the recovery from network link failures

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US13/930,594 Abandoned US20140006464A1 (en) 2012-06-29 2013-06-28 Using projected timestamps to control the sequencing of file modifications in distributed filesystems
US13/930,630 Abandoned US20140006618A1 (en) 2012-06-29 2013-06-28 Method of creating path signatures to facilitate the recovery from network link failures

Family Applications After (5)

Application Number Title Priority Date Filing Date
US13/930,537 Abandoned US20140006542A1 (en) 2012-06-29 2013-06-28 Recursive ascent network link failure notifications
US13/959,474 Abandoned US20150006649A1 (en) 2012-06-29 2013-08-05 Method of using path signatures to recover from network link failures
US15/726,116 Expired - Fee Related US10440141B2 (en) 2012-06-29 2017-10-05 Distributed filesystem atomic flush transactions
US16/554,425 Active US11050839B2 (en) 2012-06-29 2019-08-28 Method of creating path signatures to facilitate the recovery from network link failures
US17/361,063 Active US11349949B2 (en) 2012-06-29 2021-06-28 Method of using path signatures to facilitate the recovery from network link failures

Country Status (1)

Country Link
US (8) US20140006464A1 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8527646B2 (en) * 2009-04-14 2013-09-03 Avid Technology Canada Corp. Rendering in a multi-user video editing system
US9407443B2 (en) 2012-06-05 2016-08-02 Lookout, Inc. Component analysis of software applications on computing devices
US10235382B2 (en) * 2013-11-12 2019-03-19 Red Hat, Inc. Transferring objects between different storage devices based on timestamps
US9679033B2 (en) * 2014-03-21 2017-06-13 International Business Machines Corporation Run time insertion and removal of buffer operators
US10554662B2 (en) * 2015-06-30 2020-02-04 Mcafee, Llc Security service for an unmanaged device
US10152481B1 (en) * 2015-12-03 2018-12-11 EMC IP Holding Company LLC Technique to scale out namespace
US10970311B2 (en) * 2015-12-07 2021-04-06 International Business Machines Corporation Scalable snapshot isolation on non-transactional NoSQL
US10394775B2 (en) 2015-12-28 2019-08-27 International Business Machines Corporation Order constraint for transaction processing with snapshot isolation on non-transactional NoSQL servers
CN105677101A (en) * 2016-01-06 2016-06-15 京东方科技集团股份有限公司 OGS touch screen, OGS touch screen manufacturing method and display device
CN106791889B (en) * 2016-12-27 2019-07-09 北京奇艺世纪科技有限公司 A kind of method for processing video frequency and system, distributed objects storage system
WO2018129706A1 (en) * 2017-01-13 2018-07-19 Oracle International Corporation System and method for conditional call path monitoring in a distributed transactional middleware environment
US10218697B2 (en) * 2017-06-09 2019-02-26 Lookout, Inc. Use of device risk evaluation to manage access to services
CN108009910A (en) * 2017-11-10 2018-05-08 深圳前海微众银行股份有限公司 Bank's remittance road monitoring method, system and computer-readable recording medium
US10977218B1 (en) * 2018-05-18 2021-04-13 Amazon Technologies, Inc. Distributed application development
US20190354482A1 (en) * 2018-05-21 2019-11-21 Microsoft Technology Licensing, Llc Time-based mechanism supporting flush operation
US11418603B2 (en) * 2018-08-02 2022-08-16 International Business Machines Corporation Dynamic backoff and retry attempts based on incoming request
US10795824B2 (en) * 2018-11-21 2020-10-06 International Business Machines Corporation Speculative data return concurrent to an exclusive invalidate request
US10887262B1 (en) * 2019-01-25 2021-01-05 Twitter, Inc. Visibility filtering
US11307768B2 (en) 2019-06-07 2022-04-19 Seagate Technology Llc Namespace auto-routing data storage system
CN111211925B (en) * 2019-12-27 2022-11-29 京信网络系统股份有限公司 Alarm information synchronization method and device, computer equipment and storage medium
US11893064B2 (en) * 2020-02-05 2024-02-06 EMC IP Holding Company LLC Reliably maintaining strict consistency in cluster wide state of opened files in a distributed file system cluster exposing a global namespace
CN111506613A (en) * 2020-04-22 2020-08-07 支付宝(杭州)信息技术有限公司 Method, system, device and equipment for querying incidence relation of data record
CN111782614A (en) * 2020-06-23 2020-10-16 北京青云科技股份有限公司 Data access method, device, equipment and storage medium
CN114297172B (en) * 2022-01-04 2022-07-12 北京乐讯科技有限公司 Cloud-native-based distributed file system

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4697232A (en) * 1984-11-30 1987-09-29 Storage Technology Corporation I/O device reconnection in a multiple-CPU, dynamic path allocation environment
US5611049A (en) * 1992-06-03 1997-03-11 Pitts; William M. System for accessing distributed data cache channel at each network node to pass requests and data
US5475834A (en) * 1992-10-26 1995-12-12 International Business Machines Corporation Integration of migration level two and backup tape processing using multiple inventory entries
US5835720A (en) * 1996-05-17 1998-11-10 Sun Microsystems, Inc. IP discovery apparatus and method
US20060129627A1 (en) * 1996-11-22 2006-06-15 Mangosoft Corp. Internet-based shared file service with native PC client access and semantics and distributed version control
US6473401B1 (en) * 1998-04-06 2002-10-29 Iscale, Inc. Self-scaling method for exploiting cached resources across organizational boundaries to enhance user response time and to reduce server and network load
US6366584B1 (en) * 1999-02-06 2002-04-02 Triton Network Systems, Inc. Commercial network based on point to point radios
WO2001014959A2 (en) * 1999-08-16 2001-03-01 Z-Force Corporation System of reusable software parts and methods of use
EP1388073B1 (en) * 2001-03-01 2018-01-10 Akamai Technologies, Inc. Optimal route selection in a content delivery network
US7219125B1 (en) * 2002-02-13 2007-05-15 Cisco Technology, Inc. Method and apparatus for masking version differences in applications using a data object exchange protocol
US7127498B2 (en) * 2002-09-16 2006-10-24 Hewlett-Packard Development Company, L.P. Software application domain and storage domain constraining process and method
US7363540B2 (en) * 2002-10-22 2008-04-22 Microsoft Corporation Transaction-safe FAT file system improvements
US7653059B1 (en) * 2002-12-20 2010-01-26 Symantec Operating Corporation Communication sessions for a computer network
WO2004075494A1 (en) * 2003-02-21 2004-09-02 Nippon Telegraph And Telephone Corporation Device and method for correcting a path trouble in a communication network
US20050010685A1 (en) * 2003-07-11 2005-01-13 Agency For Science, Technology And Research Method and a system for enabling data to be stored in a computer network; a method and a system for storing data in a computer network
US8171562B2 (en) * 2003-08-26 2012-05-01 Oregon Health & Science University System and methods for protecting against denial of service attacks
US7209934B2 (en) * 2003-11-21 2007-04-24 Bellsouth Intellectual Property Corporation Methods, systems and computer program products for monitoring files
US7477621B1 (en) * 2004-09-07 2009-01-13 Marvell International Ltd. System and method for estimating bandwidth requirements of and allocating bandwidth to communication devices operating in a network
US8090880B2 (en) * 2006-11-09 2012-01-03 Microsoft Corporation Data consistency within a federation infrastructure
US7408911B2 (en) * 2004-11-08 2008-08-05 Meshnetworks, Inc. System and method to decrease the route convergence time and find optimal routes in a wireless communication network
US7730258B1 (en) * 2005-11-01 2010-06-01 Netapp, Inc. System and method for managing hard and soft lock state information in a distributed storage system environment
US7950046B2 (en) * 2007-03-30 2011-05-24 Uranus International Limited Method, apparatus, system, medium, and signals for intercepting a multiple-party communication
US8289845B1 (en) * 2007-05-15 2012-10-16 Avaya Inc. Assured path optimization
US8793307B2 (en) * 2009-01-28 2014-07-29 Blue Coat Systems, Inc. Content associative caching method for web applications
US20120089700A1 (en) * 2010-10-10 2012-04-12 Contendo, Inc. Proxy server configured for hierarchical caching and dynamic site acceleration and custom object and associated method
KR20120071953A (en) * 2010-12-23 2012-07-03 한국전자통신연구원 Method for transmitting routing information and routing apparatus in wireless network
US20130148658A1 (en) * 2011-12-09 2013-06-13 Raytheon Bbn Technologies Corp Systems and methods for scalable multicast communication using self-rooted forwarding trees
US9935781B2 (en) * 2012-01-20 2018-04-03 Arris Enterprises Llc Managing a large network using a single point of configuration

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070208757A1 (en) * 2000-12-18 2007-09-06 Kazar Michael L Mechanism for handling file level and block level remote file accesses using the same server
US8417681B1 (en) * 2001-01-11 2013-04-09 F5 Networks, Inc. Aggregated lock management for locking aggregated files in a switched file system
US20030043797A1 (en) * 2001-07-06 2003-03-06 Abbas Rashid Cross-bar switch
US20040054777A1 (en) * 2002-09-16 2004-03-18 Emmanuel Ackaouy Apparatus and method for a proxy cache
US20040103261A1 (en) * 2002-11-25 2004-05-27 Hitachi, Ltd. Virtualization controller and data transfer control method
US20040193858A1 (en) * 2003-03-24 2004-09-30 Infineon Technologies North America Corp. Zero-overhead loop operation in microprocessor having instruction buffer
US20060031602A1 (en) * 2004-08-09 2006-02-09 Ellis Jackson L Scalable architecture for context execution
US20080027920A1 (en) * 2006-07-26 2008-01-31 Microsoft Corporation Data processing over very large databases
US20120150804A1 (en) * 2010-12-08 2012-06-14 International Business Machines Corporation Multiple contexts in a redirect on write file system
US20120166891A1 (en) * 2010-12-22 2012-06-28 Dahlen Eric J Two-level system main memory

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10534709B2 (en) * 2016-08-31 2020-01-14 Sandisk Technologies Llc Flush command consolidation
US11301144B2 (en) 2016-12-28 2022-04-12 Amazon Technologies, Inc. Data storage system
US11444641B2 (en) 2016-12-28 2022-09-13 Amazon Technologies, Inc. Data storage system with enforced fencing
US11467732B2 (en) 2016-12-28 2022-10-11 Amazon Technologies, Inc. Data storage system with multiple durability levels
CN106713470A (en) * 2016-12-29 2017-05-24 北京奇艺世纪科技有限公司 Distributed cache updating method and cache updating system
US10521135B2 (en) * 2017-02-15 2019-12-31 Amazon Technologies, Inc. Data system with data flush mechanism
US11169723B2 (en) 2019-06-28 2021-11-09 Amazon Technologies, Inc. Data storage system with metadata check-pointing
CN112448968A (en) * 2019-08-28 2021-03-05 华为技术有限公司 Method for processing network request, related device and storage system
US11182096B1 (en) 2020-05-18 2021-11-23 Amazon Technologies, Inc. Data storage system with configurable durability
US11853587B2 (en) 2020-05-18 2023-12-26 Amazon Technologies, Inc. Data storage system with configurable durability
US11681443B1 (en) 2020-08-28 2023-06-20 Amazon Technologies, Inc. Durable data storage with snapshot storage space optimization

Also Published As

Publication number Publication date
US20150006649A1 (en) 2015-01-01
US20140006464A1 (en) 2014-01-02
US10440141B2 (en) 2019-10-08
US11050839B2 (en) 2021-06-29
US20140006542A1 (en) 2014-01-02
US20210329092A1 (en) 2021-10-21
US20180048733A1 (en) 2018-02-15
US20140006618A1 (en) 2014-01-02
US20190394297A1 (en) 2019-12-26
US11349949B2 (en) 2022-05-31

Similar Documents

Publication Publication Date Title
US11349949B2 (en) Method of using path signatures to facilitate the recovery from network link failures
US7254636B1 (en) Method and apparatus for transparent distributed network-attached storage with web cache communication protocol/anycast and file handle redundancy
US7500020B1 (en) Coherency of replicas for a distributed file sharing system
US20180011874A1 (en) Peer-to-peer redundant file server system and methods
US8311980B2 (en) Namespace consistency for a wide-area file system
US7653668B1 (en) Fault tolerant multi-stage data replication with relaxed coherency guarantees
US7383463B2 (en) Internet protocol based disaster recovery of a server
US7734820B1 (en) Adaptive caching for a distributed file sharing system
US8180747B2 (en) Load sharing cluster file systems
US8914429B2 (en) Method for creating global distributed namespace
US7822711B1 (en) Conflict resolution for a distributed file sharing system
US7546364B2 (en) Replication of remote copy data for internet protocol (IP) transmission
US9286298B1 (en) Methods for enhancing management of backup data sets and devices thereof
US7433928B1 (en) System pre-allocating data object replicas for a distributed file sharing system
US20060168145A1 (en) Method for creating a secure and reliable content distribution framework
Marwah Enhanced server fault-tolerance techniques for improved user experience

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION