4 High Availability

This chapter explains how to design high availability into the database and database applications.

Topics:

4.1 Failover and Query Replay

This topic describes the following topics:

4.1.1 Transparent Application Failover (TAF)

This section describes what Transparent Application Failover (TAF) is, how to configure TAF, and using TAF callbacks to notify the application of events as they are generated.

Topics:

4.1.1.1 About Transparent Application Failover

Transparent Application Failover (TAF) is a client-side feature of OCI, OCCI, Java Database Connectivity (JDBC) OCI driver, and ODP.NET designed to minimize disruptions to end-user applications that occur when database connectivity fails because of instance or network failure. TAF can be implemented on a variety of system configurations including Oracle Real Application Clusters (Oracle RAC), Oracle Data Guard physical standby databases, and on a single instance system after it restarts (for example, when repairs are made).

TAF enables client applications to automatically (transparently) reconnect to a preconfigured secondary instance, creating a fresh connection, but identical to the connection that was established on the first original instance. That is, the connection properties are the same as that of the earlier connection, regardless of how the connection was lost. In this case, the active transactions roll back. Also, all statements that an application attempts to use after a failure attempt also failover.

See Also:

4.1.1.2 Configuring Transparent Application Failover

TAF can be configured on both the client side and server side with the server side taking precedence if both client and server sides are configured. On the client side, you configure TAF by including the FAILOVER_MODE parameter in the CONNECT_DATA portion of a connect descriptor. On the server side, you configure TAF by modifying the target service with the DBMS_SERVICE.MODIFY_SERVICE packaged procedure.

See the references in Section 4.1.1.1 for more configuration information.

4.1.1.3 Using Transparent Application Failover Callbacks

TAF callbacks are callbacks that are registered in case of failover and called during failover to notify the application of events as they are generated. They are called several times while reestablishing the user's session.

As the application developer you may want to inform the user that failover is in progress because there is a slight delay as failover proceeds. The first call to the callback carries out that function. Also, when failover is successful and the connection is reestablished, you may want to inform the user that failover has happened and then you may want to replay ALTER SESSION commands because these commands are not automatically replayed on the second instance. A subsequent call to the callback performs that function. Also, if failover is unsuccessful, then you want to inform the application that failover cannot occur. A third call to the callback performs this function as well.

Using TAF callbacks makes possible:

  • Notifying users of the status of failover throughout the failover process; when failover is underway, when failover is successful, and when failover is unsuccessful

  • Replaying of ALTER SESSION commands when that is needed

  • Reauthenticating a user handle besides the primary handle for each time a session begins on the new connection. Because each user handle represents a server-side session, the client may want to replay ALTER SESSION commands for that session.

See the references in Section 4.1.1.2 for specific callback registration information for each interface; such as, the structure and parameters of the callback for each interface, and for failover callback examples.

4.1.2 Application Continuity

Beginning with Oracle Database 12c Release 1 (12.1.0.1), Oracle Database supports Application Continuity for Java. This feature supports the concept of commit outcome and recovery of active requests allowing JDBC-based applications in high availability (HA) infrastructures to avoid application errors from system failures by providing an immediate implicit recovery and resubmission of all active requests as soon as failover occurs and connectivity is restored to database services. See Section 4.3, "Application Continuity and Transaction Guard," for more information.

See Also:

Chapter 26, "Ensuring Application Continuity," for more information about Application Continuity

4.2 Fast Application Notification (FAN) and Fast Connection Failover (FCF)

This section describes what Fast Application Notification (FAN) and Fast Connection Failover (FCF) are and how applications can respond to FAN events in a high availability environment and use FCF to relocate connections after a failover.

Topics:

4.2.1 About Fast Application Notification (FAN)

An important component of high availability is a notification mechanism called Fast Application Notification (FAN). FAN notifies other processes about configuration and service level information that includes service status changes, such as UP or DOWN events. Applications can respond to FAN events and take immediate action. FAN UP and DOWN events can apply to instances, services, and nodes.

FAN provides the ability to immediately terminate an active transaction when an instance or server fails. FAN integrated Oracle clients receive the events and respond. Applications can respond either by propagating the error to the user or by resubmitting the transactions and masking the error from the application user. When a DOWN event occurs, FAN integrated clients immediately clean up connections to the terminated database. When an UP event occurs, the FAN integrated clients create new connections to the new primary database instance.

Oracle has integrated FAN with many of the common Oracle client drivers. Therefore, the easiest way to use FAN is to use an integrated Oracle client. You can use the Oracle Connection Manager (CMAN) session pools, OCI, Universal Connection Pool for Java, JDBC simplefan API, and ODP.NET connection pools. The overall goal is to enable applications to consistently obtain connections to the available primary database at anytime.

FAN events are published using Oracle Notification Service and Oracle Streams Advanced Queuing. The publication mechanisms are automatically configured as part of an Oracle RAC installation. Here, an Oracle RAC installation means any installation of Oracle Clusterware with Oracle RAC, Oracle RAC One Node, Oracle Data Guard (fast-start-failover), or Oracle Data Guard single instance with Oracle Clusterware). Beginning with Oracle Database 12c Release 1 (12.1), ONS is the primary notification mechanism for a new client (Oracle Database 12c Release 1 (12.1)) and a new server (Oracle Database 12c Release 1 (12.1)), while the AQ HA notification feature is deprecated and maintained only for backward compatibility when there is an old client (pre-Oracle Database 12c Release 1 (12.1)) or old server (pre-Oracle Database 12c Release 1 (12.1)).

Applications can use FAN programmatically by using the JDBC and Oracle RAC FAN application programming interface (API) or by using callbacks with OCI to subscribe to FAN events and to execute event handling actions upon the receipt of an event.

When you use JDBC or Oracle Database 12c Release 1 (12.1.0.1) OCI or ODP.Net clients, you must create an Oracle Notification Services (ONS) network that is running on the server. When you use Oracle Database 10g or Oracle Database 11g OCI or ODP.NET clients, you must enable Oracle Advanced Queuing (AQ) HA notifications for your services.

See Also:

4.2.2 About Fast Connection Failover (FCF)

In a configuration with a standby database, after you have added Oracle Notification Services (ONS) to your Oracle Restart configurations and enabled Oracle Advanced Queuing (AQ) HA notifications for your services, you can enable clients for Fast Connection Failover (FCF). The clients then receive FAN events and can relocate connections to the current primary database after an Oracle Data Guard failover. Beginning with Oracle Database 12c Release 1 (12.1), ONS is the primary notification mechanism for a new client (Oracle Database 12c Release 1 (12.1)) and a new server (Oracle Database 12c Release 1 (12.1)), while the AQ HA notification feature is deprecated and maintained only for backward compatibility when there is an old client (pre-Oracle Database 12c Release 1 (12.1)) or old server (pre-Oracle Database 12c Release 1 (12.1)).

For databases with no standby database configured, you can still configure the client FAN events. When there is an outage (planned or unplanned), you can configure the client to retry the connection to the database. Because Oracle Restart restarts the failed database, the client can reconnect when the database restarts.

You must enable FAN events to provide FAN integrated clients support for FCF in an Oracle Data Guard or standalone environment with no standby database.

FCF offers a driver-independent way for your Java Database Connectivity (JDBC) application to take advantage of the connection failover facilities offered by Oracle Database. FCF is integrated with implicit connection cache and Oracle RAC to provide high availability event notification.

OCI clients can enable FCF by registering to receive notifications about Oracle Restart high availability FAN events and respond when events occur. This improves the session failover response time in OCI and removes terminated connections from connection and session pools. This feature works on OCI applications, including those that use Transparent Application Failover (TAF), connection pools, or session pools.

See Also:

4.3 Application Continuity and Transaction Guard

In Oracle Database 10g Release 2 (10.2) HA framework, JDBC clients, OCI clients, and ODP.NET clients support fast application notification (FAN) messages. FAN is designed to quickly notify an application of outages at the node, database, instance, service, and public network levels. After being notified of the failure, an application can reestablish the failed connection on a surviving instance. However, for releases before Oracle Database 12c Release 1 (12.1.0.1) there was no ability to reliably determine the outcome of the failed transaction and the ability to recover the active transaction after database connectivity had been restored.

Beginning with Oracle Database 12c Release 1 (12.1.0.1), the DBA can configure server-side settings for the database services used by the applications to support two new features Application Continuity for Java and Transaction Guard. See Section 26.2.2.1, "When Is Application Continuity Transparent?," for information about when Application Continuity for Java is transparent (performed automatically) and when it is not transparent and what infrastructure changes may be needed when it is not transparent.

See Also:

4.3.1 Transaction Guard

Transaction Guard is a protocol and developer API supported for JDBC Type 4 (Oracle Thin), OCI, OCCI, and ODP.Net drivers.

Transaction Guard introduces the concept of at-most-once transaction semantics, also referred to as transaction idempotence. When an application opens a connection to the database using this service, the logical transaction ID (LTXID) is generated at authentication and stored in the session handle at the database and a copy at the client driver. This is a globally unique ID that identifies the database transaction from the application perspective. Applications use the Transaction Guard APIs to obtain a known outcome following a recoverable error.

When there is an outage, an application using Transaction Guard can retrieve the LTXID from the previous failed session's handle and use it to determine the outcome of the transaction that was active prior to the session failure. If the LTXID is determined to be UNCOMMITTED, then the application can return the UNCOMMITTED outcome to the user to decide what action to take, or optionally, the application can replay an uncommitted transaction. If the LTXID is determined to be COMMITTED, then the transaction is committed and the application can return this outcome to the end user and might be able to take a new connection and continue. Transaction Guard also reports whether the last user call not only COMMITTED, but also whether it completed changing needed nontransactional states - see USER_CALL_COMPLETED. See Chapter 25 for more information about Transaction Guard.

4.3.2 Application Continuity for Java

Application Continuity for Java masks planned or unplanned outages (that cause database session unavailability) by attempting to rebuild the database session transactional and nontransactional states, so the outage appears to the user as no more than a delayed execution.

Application Continuity for Java works with the Oracle Database 12c Release 1 (12.1.0.1) server to determine if the database session can be replayed.

When a recoverable error occurs that makes the database session unavailable, an error message is sent back to the application. A driver receives the FAN message (down or interrupt), aborts the dead session, initiates reconnect and reauthenticates, checks out a new session and uses the LTXID of the dead session to determine the last outcome of the session at the outage. If an optional callback has been registered, the JDBC replay driver initializes the connection using this callback to restore initial nontransactional session state (NTSS), then continues replaying the request.

Application Continuity prepares replay by using Transaction Guard to determine the outcome of the last operation submitted by the session that received the error. If the submission committed and completed, the new session returns this result to the application and continues with the nontransactional state established if the SESSION_STATE_CONSISTENCY mode is STATIC, or exits if the SESSION_STATE_CONSISTENCY mode is DYNAMIC. DYNAMIC session state consistency is appropriate for most applications.

If the last submission has replay enabled, the replay driver prepares to replay the submission and replays the saved statements for the request. When replaying, preserved mutable data are restored if permission has been granted. Validation is performed at the server to ensure that the client-visible results are identical to the original submission. When replay is complete, the application proceeds with its application logic returning to runtime mode as if all that occurred was a delay in execution similar to that which happens under load.

In some cases, replay cannot restore the data that the client potentially made decisions upon. The replay then returns the original error to the application and appears like a delayed error.

Application Continuity supports recovering any outage that is due to database unavailability against a copy of a database with the same DBID (forward in time) or within an Active Data Guard farm. This may be Oracle RAC One, Oracle Real Application Clusters, Data Guard, or Active Data Guard.

See Section 26.2.2.1, "When Is Application Continuity Transparent?," for information about when Application Continuity for Java is transparent (performed automatically) and when it is not transparent and what infrastructure changes may be needed when it is not transparent. See Section 26.3, "Potential Side Effects of Application Continuity," for some replay side effects that you may need to mitigate. See Section 26.4, "Restrictions and Other Considerations for Application Continuity," for replay restrictions.

4.4 Service and Load Management for Database Clouds

This section describes the service and load management framework. It also explains how OCI based applications in a service and load management framework environment utilize fast application notification (FAN) messages to respond to outages at the node, database, instance, service, and public network levels. After being notified of the outage, an application can reestablish the failed connection on a surviving instance in this framework.

Topics:

4.4.1 About Service and Load Management for Database Clouds

The database cloud is a self-contained system of databases integrated by the service and load management framework that ensures high performance, availability and optimal utilization of resources. This framework provides effective balancing of processing workload across distributed databases that maintain multiple synchronized replicas both locally and in geographically disparate regional data centers. Replicas may be instances in an Oracle RAC environment, or single instances interconnected using Oracle Data Guard, Oracle Golden Gate, or any combination that supports replication technology. Thus, the service and load management framework provides dynamic load balancing, failover, and centralized service management for these replicated databases.

A global service is a database service provided by multiple databases synchronized through some form of data replication that satisfies quality of service requirements for the service. This allows a client request for a service to be forwarded to any database that provides that service.

A database pool within a database cloud consists of all databases that provide the same global service that belong to the same administrative domain. The database cloud is partitioned into multiple database pools to simplify service management and to provide higher levels of security by allowing each pool to be administered by a different administrator.

A global service manager (GSM) is a software component that provides service-level load balancing and centralized management of services within the database cloud. Load balancing is done at connection and runtime. Other capabilities provided by GSM include creation, configuration, starting, stopping, and relocation of services and maintaining global service properties such as cardinality and region locality. A region is a logical boundary known as a data center that contains database clients and servers that are considered close enough to each other so as to reduce network latency to levels required by applications accessing these data centers.

The GSM can run on a separate host, or can be colocated with a database instance. Every region must have at least one GSM instance running. For high availability, Oracle recommends that you deploy multiple GSM instances in a region. A GSM instance belongs to only one particular region; however, it manages global services in all database pools associated with this region.

From an application developer's perspective, a client program connects to a regional global service manager (GSM) and requests a connection to a global service. The client need not specify which database or instance it requires. GSM forwards the client's request to the optimal instance within a database pool in the database cloud that offers the service.

Beginning with Oracle Database 12c Release 1 (12.1.0.1), the DBA can configure client-side connect strings for database services in a Global Data Services (GDS) framework using an Oracle Net string.

Introduced in Oracle Database 12c Release 1 (12.1.0.1), the logical transaction ID (LTXID) is initially generated at authentication and stored in the session handle and used to identify a database transaction from the application perspective. The logical transaction ID is globally unique and identifies the transaction within a highly available (HA) infrastructure.

Using the HA Framework, a client application (JDBC, OCI, and ODP.NET) supports fast application notification (FAN) messages. FAN is designed to quickly notify an application of outages at the node, database, instance, service, and public network levels. After being notified of the outage, an application can reestablish the failed connection on a surviving instance.

Beginning with Oracle Database 12c Release 1 (12.1.0.1), the DBA can configure server-side settings for the database services used by the applications to support Application Continuity for Java and Transaction Guard.

See Also: