.. include:: ../macros.inc

.. index:: replication, replication; overview

.. _#ha-replication:

.. _replication:

Replicating Messages Between Instances
======================================

AMPS can replicate messages to downstream AMPS instances
once those messages are stored in a transaction log. Replication in AMPS
involves configuring two or more instances to share
some or all of the published messages. With AMPS replication, an
upstream (or source) instance delivers messages to
a downstream (or destination) instance.

Replication is typically used to improve the availability of a
set of AMPS instances by creating a set of servers that hold
the same messages, where each server can take over for the
others in the event of a network or server failure.

Since AMPS replication operates on individual messages,
replication is also an efficient way to split and share message
streams between multiple sites where each downstream site
may only want a subset of the messages from the upstream
instances.

AMPS replication uses a leaderless, "all nodes hot" model. Any instance
in a replication fabric can accept publishes and updates at any time,
and there is no need for a message to be replicated downstream before
it is delivered to subscribers (unless a subscriber explicitly
requests otherwise using the ``fully_durable`` option on a
bookmark replay).

.. tip::

  The only communication between instances of AMPS is
  through replication. AMPS instances do not share
  state through the filesystem or any out-of-band
  communication.

  AMPS high availability and replication do not rely
  on a quorum or a controller instance. Each instance
  of AMPS processes messages independently. Each
  instance of AMPS manages connections and subscriptions
  locally.

AMPS supports two forms of message acknowledgment for replication links:
*synchronous* and *asynchronous*. These settings control when AMPS considers
a message persisted: that is, when AMPS sends ``persisted`` acknowledgments
to publishers, and the point at which AMPS considers a message to be fully
persisted for bookmark subscribers. These settings do not
affect when or how messages are replicated, or when or how messages are
delivered to subscribers (unless a subscriber explicitly requests this
behavior). They also do not affect the speed of
replication, the priority of replication, or the guarantee that
every replicated message is acknowledged by
the downstream instance before it can be removed from the transaction
log.

AMPS replication consists of a message stream (or, more precisely, a
command stream) provided to downstream instances. AMPS replicates
the messages produced as a result of ``publish`` and ``delta_publish``
and replicates ``sow_delete`` commands. AMPS does not
replicate messages produced internally by AMPS, such as the results of
``Views`` or updates sent to a ``ConflatedTopic``. When replicating
queues, AMPS also uses the replication connection to send and receive
administrative commands related to queues, as described in the section
on :ref:`Replicated Queues <#ug-replicated-queues>`.

.. important::

      60East recommends that any server that participates in replication
      *and* that accepts publishes, SOW deletes, or queue acknowledgments
      directly from applications is configured with at least
      one ``sync`` replication destination.

      Without at least one destination that uses ``sync`` acknowledgment,
      an AMPS instance could be a single point of failure, resulting in
      possible message loss if the instance has an unrecoverable failure
      (such as a hardware failure)  between the time that
      the server acknowledges a command to a client, and the time that
      a replication destination receives and persists the command.

Notice that when replicating the results of a ``delta_publish`` command, or
a publish to a topic that provides preprocessing and enrichment, AMPS
replicates the fully processed and merged message -- exactly the information
that is written to the local transaction log and sent to subscribers. For
example, a delta publish that changes a single field is replicated as the
complete merged message rather than as the changed field alone. AMPS does
*not* save the original command to the transaction log or replicate the original
command. Instead, AMPS stores the message produced by the command and replicates that
message.

Replication Basics
-------------------

Before planning an AMPS replication topology, it can be helpful to understand
the basics of how AMPS replication works. This section presents a general
overview of the concepts that are discussed in more detail in the following
sections.

* **Replication is point-to-point**. Each replication connection involves exactly two
  AMPS instances: a source that provides messages and a destination that receives
  messages.
* **Replication is always "push" replication**. In AMPS, the source configures a
  destination, and pushes messages to that destination. (Notice that it is possible
  to configure the source to wait for the destination to connect rather than
  actively making an outgoing connection, but replication is still a "push" from
  the source to the destination once that connection is made). The source must be
  configured to push messages to the destination, and the source guarantees that
  a replicated message is not removed from the transaction log until the
  destination has acknowledged that message.
* **Replication is one-link by default**. By default, an instance of AMPS
  *only* replicates messages that are published to that instance by a client.
  Optionally, an instance of AMPS can be configured to replicate messages
  that arrive over replication. Adding this configuration is typically
  required if there are more than two instances in a replicated set
  of AMPS instances. See :ref:`#ug-ha-passthrough`
  for details.
* **Replication relies on the transaction log**. AMPS replicates the commands
  as preserved in the transaction log. This means, for example, that the
  results of delta publishes are replicated as fully-merged messages,
  since fully-merged messages are stored in the transaction log. Likewise,
  if duplicate messages arrive over different paths, only the first message
  to arrive will be stored in the transaction log, and that message is
  the one that will be replicated.
* **Replication provides a command stream**. In AMPS replication, the server
  replicates the results of ``publish``, ``delta_publish`` and ``sow_delete`` commands
  once those results are written to the transaction log. Each individual command is
  replicated, for low latency and fine-grained control of what is replicated. If a
  command is not in the transaction log (for example, a maintenance action has
  removed the journal that contains that command, or the command is for a topic
  that is not recorded in the transaction log), that command will not be replicated.

  Replication is intended to guarantee that the command stream for a set of topics
  on one instance is present on the other instance, with the ordering of each
  message source preserved. This means that there can be only *one* connection
  from a given upstream instance to a given downstream instance.
* **Replication is customizable by topic, message type, and content**. AMPS can be configured to
  replicate the entire transaction log, or any subset of the transaction log. This makes it
  easy to use replication to populate view servers, test environments, or similar instances
  that require only partial views of the source data.
* **Replication guarantees delivery**. AMPS will not remove a journal file until
  *all* messages in that journal file have been replicated to, and acknowledged by,
  the destination.
* **Replication is composable**. AMPS is capable of building a sophisticated replication
  topology by composing connections. For example, full replication between two servers
  is two point-to-point connections, one in each direction. The basic point-to-point
  nature of connections makes it easy to reason about a single connection, and the
  composable nature of AMPS replication allows you to build replication networks
  that provide data distribution and high availability for applications across
  data centers and around the globe.
* **Replication acknowledgment is configurable**. The acknowledgment mode
  provides different guarantees: *async* acknowledgment provides durability
  guarantees for the local instance, whereas *sync* acknowledgment provides
  durability guarantees for the local instance and the downstream instance.
* **Group** identifies a set of instances that are intended to be fully
  equivalent for the purposes of message contents, application failover,
  and AMPS replication failover. Instances that are not intended to be
  fully equivalent for all of these purposes should be given a different
  ``Group`` name, even if they are in the same data center or geographic location,
  or if they would be treated as equivalent for some, but not all, purposes.

More details on each of these points are provided in this section.

Benefits of Replication
^^^^^^^^^^^^^^^^^^^^^^^^

.. index:: replication; benefits,

Replication can serve two purposes in AMPS:

1. It can increase the fault tolerance of AMPS by creating a spare
   instance to cut over to when the primary instance fails.
2. It can deliver messages to a remote site.

To provide fault tolerance and reliable message delivery to remote
sites, AMPS replication offers a set of guarantees and features, which
are discussed below.

Replication in AMPS supports filtering by both topic and by message
content. This granularity in filtering allows replication sources to
have complete control over what messages are sent to their downstream
replication instances.

Additionally, replication can improve availability of AMPS by creating a
redundant instance of an AMPS server. Using replication, all of the
messages which flow into a primary instance of AMPS can be replicated to
a secondary spare instance. This way, if the primary instance should
become unresponsive for any reason, then the secondary AMPS instance can
be swapped in to begin processing message streams and requests.

.. tip::

   When an AMPS instance is a replication source, that instance
   *guarantees* that messages will not be removed from the
   transaction log until all destinations have acknowledged
   the message.


Configuring Replication
------------------------

.. index:: replication; configuration,

Replication configuration involves the configuration of two or more
instances of AMPS. For testing purposes both instances of AMPS can
reside on the same physical host before deployment into a production
environment. When running both instances on one machine, the performance
characteristics will differ from production, so running both instances
on one machine is more useful for testing configuration correctness than
testing overall performance.

Any instance that is intended to receive messages via replication
must define an incoming replication transport as one of the
``Transports`` for the instance. An instance may have only one
incoming replication transport.

Any instance that is intended to replicate messages to another
instance must specify a ``Replication`` stanza in the
configuration file with at least one ``Destination``. An
instance can have multiple ``Destination`` declarations: each
one defines a single outgoing replication connection.
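
As a minimal sketch of multiple destinations, the fragment below declares
two outgoing connections from a single instance. The instance names
(``amps-b`` and ``amps-c``), the host names, the ports, the ``Group`` value,
and the topic name are placeholders chosen for this illustration, not values
from a real deployment. The elements are the same ones used in the full
example configurations in this section.

.. code-block:: xml

    <Replication>

        <!-- First outgoing connection, to the hypothetical instance amps-b. -->
        <Destination>
            <Topic>
                <MessageType>json</MessageType>
                <Name>^/orders/</Name>
            </Topic>
            <Name>amps-b</Name>
            <Group>Example-Group</Group>
            <SyncType>sync</SyncType>
            <Transport>
                <InetAddr>amps-b.example.com:10005</InetAddr>
                <Type>amps-replication</Type>
            </Transport>
        </Destination>

        <!-- Second outgoing connection, to the hypothetical instance amps-c.
             Each Destination is an independent point-to-point connection. -->
        <Destination>
            <Topic>
                <MessageType>json</MessageType>
                <Name>^/orders/</Name>
            </Topic>
            <Name>amps-c</Name>
            <Group>Example-Group</Group>
            <SyncType>sync</SyncType>
            <Transport>
                <InetAddr>amps-c.example.com:10006</InetAddr>
                <Type>amps-replication</Type>
            </Transport>
        </Destination>
    </Replication>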

In AMPS replication, instances should only be configured as part
of the same ``Group`` if they are fully equivalent. That is, not
only should they contain the same messages, but they should be
considered failover alternatives for applications and other
AMPS servers. If two servers are not intended to be fully replicated
(for example, if there is one-way replication between a production
server and a test server), they should have different ``Group``
values.

.. caution::

  It's important to make sure that when running multiple AMPS
  instances on the same host there are no conflicting ports.
  AMPS will emit an error message and will not start properly if
  it detects that a port specified in the configuration file is
  already in use.

For the purposes of this example, we're going to assume a
simple hot-hot replication case where we have two instances of
AMPS: the first instance is named ``amps-1`` and the second instance is named
``amps-2``. Each instance is configured to replicate data to
the other. That is, all messages published to ``amps-1`` are
replicated to ``amps-2`` and vice versa. This configuration ensures that
a message published to one instance is available on the other instance in
the case of a failover (although, of course, the publishers and subscribers
should also be configured for failover).

.. caution::

      Every instance of AMPS that will participate in replication
      must have a unique ``Name`` among *all* of the instances that
      are part of replication.

      All instances that have the same ``Group`` must be able to
      be treated as equivalent by AMPS replication and AMPS
      clients.

      If two instances of AMPS should be treated differently
      (for example, one instance receives publishes while the other
      is a read-only instance that receives one-way replication),
      those instances should be in different groups.

We will first show the relevant portion of the configuration used in
``amps-1``, and then we will show the relevant configuration for
``amps-2``.

.. tip::

       All topics to be replicated must be recorded in the transaction
       log. The examples below omit the transaction log configuration
       for brevity. Please reference the Transaction Log chapter for
       information on how to configure a transaction log for a topic.

.. code-block:: xml

    <AMPSConfig>
        <Name>amps-1</Name>
        <Group>DataCenter-NYC-1</Group>
        ...

        <Transports>
            <Transport>

                <!-- The amps-replication transport is required.
                     This is a proprietary message format used by
                     AMPS to replicate messages between instances.
                     This AMPS instance will receive replication messages
                     on this transport. The instance can receive
                     messages from any number of upstream instances
                     on this transport.

                     An instance of AMPS may only define one incoming replication
                     transport.  -->

                <Name>amps-replication</Name>
                <Type>amps-replication</Type>
                <InetAddr>10004</InetAddr>
            </Transport>

      <!--
            Transports for application use also need to
            be defined. An amps-replication transport can
            only be used for replication -->

       ... transports for client use here ...

        </Transports>

        ...

        <!-- All replication destinations are defined inside the Replication block. -->

        <Replication>

            <!--
                 Each individual replication destination defines outgoing
                 replication, that is, messages being replicated from
                 this instance of AMPS to another instance of AMPS.
              -->

            <Destination>

                <!-- The replicated topics and their respective message types are defined here. AMPS
                    allows any number of Topic definitions in a Destination. -->
                <Topic>
                    <MessageType>fix</MessageType>

                    <!-- The Name definition specifies the name of the topic or topics to be replicated.
                        The Name option can be either a specific topic name or a regular expression that
                        matches a set of topic names. -->
                    <Name>topic</Name>

                </Topic>
                <Topic>
                  <!-- Replicate any topic that uses the JSON message type
                       and that starts with /orders -->
                  <MessageType>json</MessageType>
                  <Name>^/orders/</Name>
                </Topic>

                <Name>amps-2</Name>

                <!-- Fully synchronize messages, including messages that
                     were not originally published to this instance. -->
                <PassThrough>.*</PassThrough>

                <!-- The group name of the destination instance (or instances). The name specified
                    here must match the Group defined for the remote AMPS instance, or AMPS reports
                    an error and refuses to connect to the remote instance. -->
                <Group>DataCenter-NYC-1</Group>

                <!-- Replication acknowledgment can be either synchronous or
                     asynchronous. This does not affect the speed or priority of
                     the connection, but does control when this instance will
                     acknowledge the message as safely persisted. -->

                <SyncType>sync</SyncType>

                <!-- The Transport definition defines the location to which this AMPS instance will
                    replicate messages. The InetAddr points to the hostname and port of the
                    downstream replication instance. The Type for a replication instance should
                    always be amps-replication. -->
                <Transport>

                    <!-- The address, or list of addresses, for the replication destination. -->
                    <InetAddr>amps-2-server.example.com:10005</InetAddr>
                    <Type>amps-replication</Type>
                </Transport>
            </Destination>
        </Replication>

        ...

    </AMPSConfig>

*Configuration used for amps-1*

For the configuration of ``amps-2``, we will use the following example.
Because this example is similar to the previous one, only the differences
from the ``amps-1`` configuration are called out.

.. _#repl-amps2-config:

.. code-block:: xml

    <AMPSConfig>
        <Name>amps-2</Name>
        <Group>DataCenter-NYC-1</Group>

        ...


        <Transports>

                <!-- The amps-replication transport is required.
                     This is a proprietary message format used by
                     AMPS to replicate messages between instances.
                     This AMPS instance will receive replication messages
                     on this transport. The instance can receive
                     messages from any number of upstream instances
                     on this transport.

                     An instance of AMPS may only define one incoming replication
                     transport.  -->

            <Transport>
                <Name>amps-replication</Name>
                <Type>amps-replication</Type>

                <!-- The port where amps-2 listens for replication messages matches the port where
                    amps-1 is configured to send its replication messages. This AMPS instance will
                    receive replication messages on this transport. The instance can receive
                    messages from any number of upstream instances on this transport. -->
                <InetAddr>10005</InetAddr>
            </Transport>
        </Transports>

        ...

        <Replication>
            <Destination>

                <!-- The Topic definitions for amps-2 match
                     the definitions for amps-1 so that these
                     topics contain the same messages in the
                     transaction log on both instances. -->

                <Topic>
                    <MessageType>fix</MessageType>
                    <Name>topic</Name>
                </Topic>
                <Topic>
                  <MessageType>json</MessageType>
                  <Name>^/orders/</Name>
                </Topic>
                <Name>amps-1</Name>

                <!-- Fully synchronize messages, including messages that
                     were not originally published to this instance. -->
                <PassThrough>.*</PassThrough>

                <Group>DataCenter-NYC-1</Group>

                <SyncType>sync</SyncType>
                <Transport>

                    <!-- The replication destination port for amps-2 is configured to send replication
                        messages to the same port on which amps-1 is configured to listen for them. -->
                    <InetAddr>amps-1-server.example.com:10004</InetAddr>
                    <Type>amps-replication</Type>
                </Transport>
            </Destination>
        </Replication>

        ...

    </AMPSConfig>

*Configuration for amps-2*

These example configurations replicate the topic named ``topic`` of the message type ``fix``
and any topic of the message type ``json`` whose name begins with ``/orders/`` between the
two instances of AMPS. To replicate more topics, add
additional ``Topic`` blocks to the ``Destination`` on each instance.
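
For instance, a third ``Topic`` block could be added inside each
``Destination``. The sketch below assumes a hypothetical set of JSON topics
whose names begin with ``/inventory/``. That topic name is an illustration
only and, like any replicated topic, it would also need to be recorded
in the transaction log.

.. code-block:: xml

    <Destination>
        ...
        <Topic>
            <MessageType>json</MessageType>
            <Name>^/orders/</Name>
        </Topic>

        <!-- Hypothetical additional topic set, shown only as an
             illustration of adding another Topic block. -->
        <Topic>
            <MessageType>json</MessageType>
            <Name>^/inventory/</Name>
        </Topic>
        ...
    </Destination>

In the two-instance example, the same block would be added to the
``Destination`` on both ``amps-1`` and ``amps-2`` so that the topics
contain the same messages on both instances.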

.. _#replication-sync-vs-async:

Sync vs Async Acknowledgment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When publishing to a topic that is recorded in the transaction log, it
is recommended that publishers request a ``persisted`` acknowledgment.
The ``persisted`` acknowledgment message is how AMPS notifies the
publisher that a message received by AMPS is considered to be safely
persisted, as specified in the configuration. (The AMPS client
libraries automatically request this acknowledgment on each ``publish``
command when a publish store is present for the client -- that is, any
time that the client is configured to ensure that the publish is received
by the AMPS server.)

Depending on the replication destination configuration for the AMPS
instance that receives the message, that ``persisted`` acknowledgment
message will be delivered to the publisher at different times in the
replication process.

There are two options: *synchronous* or *asynchronous* acknowledgment.
These two acknowledgment ``SyncType`` options control when
publishers of messages are sent ``persisted`` acknowledgments.

For *synchronous* replication acknowledgments, AMPS will not return a
``persisted`` acknowledgment to the publisher for a message until the
message has been stored to the local transaction log, to the SOW, and
all downstream synchronous replication destinations have acknowledged
the message. The figure below shows the cycle of a message being
published in a replicated instance, and the persisted acknowledgment
message being returned back to the publisher. Notice that, with this
configuration, the publisher will not receive an acknowledgment if the
remote destination is unavailable. 60East recommends that when you use
``sync`` replication, you consider setting a policy for downgrading the
link when a destination is offline, as described in
:ref:`#ug-replication-downgrade`.

.. _#ha-figure-sync-ack:

.. figure:: ../../common/png/HA_sync_ack.png
   :alt: Synchronous acknowledgment -- no acknowledgment to publisher until the downstream instance acknowledges

*Sequence for a Connection Using Sync Acknowledgment*

For destinations configured with *async* replication acknowledgment,
an AMPS instance does not require that the destination acknowledge that the
message is persisted before acknowledging the message to the publisher.
That is, for *async* acknowledgment, the AMPS instance replicating
the message sends the ``persisted`` acknowledgment message back
to the publisher as soon as the message is stored in the local transaction
log and SOW stores.

The acknowledgment type (``SyncType``) has no effect on how an instance of
AMPS replicates the message to other instances of AMPS. The acknowledgment
type *only* affects whether an instance of AMPS must receive an acknowledgment
from that ``Destination`` before it will acknowledge a message as having
been persisted.

The figure below shows the cycle of a message being published with a
``SyncType`` configuration set to *async* acknowledgment.

.. _#ha-figure-async-ack:

.. figure:: ../../common/png/HA_async_ack.png
   :alt: Asynchronous acknowledgment -- acknowledgment to publisher immediately, independent of downstream replication

*Sequence for a Connection Using Async Acknowledgment*
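
As a sketch of how the two acknowledgment types can be combined on one
instance, the fragment below keeps a ``sync`` destination for a failover
peer and adds an ``async`` destination for an instance that only receives
a one-way feed. The instance name ``amps-reporting``, its ``Group`` value,
its host name, and its port are assumptions made for this illustration.

.. code-block:: xml

    <Replication>

        <!-- Failover peer: this instance waits for amps-2 to acknowledge
             a message before sending the persisted acknowledgment. -->
        <Destination>
            <Topic>
                <MessageType>json</MessageType>
                <Name>^/orders/</Name>
            </Topic>
            <Name>amps-2</Name>
            <Group>DataCenter-NYC-1</Group>
            <SyncType>sync</SyncType>
            <Transport>
                <InetAddr>amps-2-server.example.com:10005</InetAddr>
                <Type>amps-replication</Type>
            </Transport>
        </Destination>

        <!-- Hypothetical one-way feed: this instance does not wait for
             amps-reporting before sending the persisted acknowledgment. -->
        <Destination>
            <Topic>
                <MessageType>json</MessageType>
                <Name>^/orders/</Name>
            </Topic>
            <Name>amps-reporting</Name>
            <Group>Reporting-NYC-1</Group>
            <SyncType>async</SyncType>
            <Transport>
                <InetAddr>amps-reporting.example.com:10006</InetAddr>
                <Type>amps-replication</Type>
            </Transport>
        </Destination>
    </Replication>

Because the hypothetical ``amps-reporting`` instance is not a failover
alternative for ``amps-2``, it is given a different ``Group``. Messages are
replicated to both destinations in the same way. Only the timing of the
``persisted`` acknowledgment differs.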

By default, replication destinations do not affect when a message is delivered
to a subscription. Optionally, a subscriber can request the ``fully_durable``
option on a bookmark subscription (that is, a replay from the transaction log).
When the ``fully_durable`` option is specified, AMPS does not deliver a message
to that subscriber until all replication destinations configured for ``sync``
acknowledgment have acknowledged the message.

.. caution::

   Every instance of AMPS that accepts publish commands or SOW delete commands,
   or that allows consumption of messages from queues, should specify at least
   one destination that uses ``sync`` acknowledgment.  If a publisher or
   queue consumer may fail over between two (or more) instances of AMPS, those
   instances should specify ``sync`` acknowledgment between them to
   prevent a situation where a message could be lost if an instance fails
   immediately after acknowledging a message to a publisher.

A destination configured for ``sync`` acknowledgment can be
downgraded to ``async`` acknowledgment while AMPS is running.
This can be useful in cases where a server is offline for an
extended period of time due to hardware failure or persistent
network issues.

.. _#ug-replication-downgrade:

Downgrading Acknowledgments for a Destination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The AMPS administrative console provides the ability to downgrade a
replication link from *synchronous* to *asynchronous* acknowledgment.
This feature is useful to relieve memory or storage pressure on publishers
when a downstream AMPS instance is unstable or unresponsive, or has
such excessive latency that it should be
considered to be offline.

Downgrading a replication link to *asynchronous* means that AMPS no
longer waits for the downstream instance to confirm that it has
committed a message to its own transaction log or SOW store before
sending a ``persisted`` acknowledgment to the publisher.
For messages that are already awaiting acknowledgment, AMPS immediately
considers the downstream instance to have acknowledged them. This means
that if AMPS was waiting for acknowledgment from that instance to deliver
a ``persisted`` acknowledgment, AMPS sends the ``persisted``
acknowledgment as soon as the link is downgraded.

The result of a link being downgraded is:

* The number of messages that the publisher must retain is reduced, *but*
* It is unsafe for a publisher to fail over to the downgraded destination
* It is unsafe for a bookmark subscriber to fail over to the downgraded destination

Automatic downgrade is most suitable for a situation where an instance
should be considered offline or unavailable.

.. caution::

   Downgrading a destination means that this instance will not
   wait for that destination to acknowledge a message
   before acknowledging that message to publishers or
   upstream instances. It does not affect any other
   behavior of the instance.

   A publisher or queue consumer must not fail over
   from this instance to a destination that has been
   downgraded to ``async`` acknowledgment. This can
   cause message loss, since the upstream instance may
   have acknowledged a message that the downstream
   instance has not yet processed.

   A bookmark subscriber must not fail over from
   this instance to a destination that has been
   downgraded to ``async`` acknowledgment. This can
   cause replay gaps, since that destination is
   no longer considered when determining whether
   a message is persisted.

AMPS can be configured to automatically downgrade a replication link to
``async`` acknowledgment if the remote side of the link cannot keep up with
persisting messages or becomes unresponsive. This option prevents
unreliable links from holding up publishers, but increases the chances
of a single instance failure resulting in message loss, as described
above. AMPS can also be configured to automatically upgrade a
replication link that has previously been downgraded. If an instance
is configured to downgrade acknowledgment, it should also
be configured to upgrade acknowledgment.

Automatic downgrade is implemented as an AMPS action. To configure
automatic downgrade, add the appropriate action to the configuration
file as shown below:

.. code-block:: xml

    <AMPSConfig>
        ...
        <Actions>
            <Action>
                <On>
                    <Module>amps-action-on-schedule</Module>
                    <Options>

                        <!-- This option determines how often AMPS checks whether destinations have fallen
                        behind. In this example, AMPS checks destinations every 15 seconds. In most
                        cases, 60East recommends setting this to half of the Age setting. -->
                        <Every>15s</Every>
                    </Options>
                </On>
                <Do>
                    <Module>amps-action-do-downgrade-replication</Module>
                    <Options>

                        <!-- The maximum amount of time for a destination to fall behind. If AMPS has been
                         waiting for an acknowledgment from the destination for longer than the
                         Age, AMPS downgrades the destination. In this example, AMPS downgrades any
                         destination for which an acknowledgment has taken longer than 300 seconds. -->
                        <Age>300s</Age>
                    </Options>
                </Do>
                <Do>
                    <Module>amps-action-do-upgrade-replication</Module>
                    <Options>

                       <!-- The threshold for upgrading the replication link back to sync
                            acknowledgment. If the destination is behind by less than this
                            amount, and was previously downgraded to async acknowledgment,
                            AMPS will upgrade to sync acknowledgment.

                            -->

                        <Age>10s</Age>
                    </Options>
                </Do>
            </Action>
        </Actions>
       ...
    </AMPSConfig>

In this configuration file, AMPS checks every 15 seconds to see whether a
destination has fallen behind by more than 300 seconds. A destination
that has fallen that far behind should
no longer be considered reliable, so AMPS downgrades the destination
to ``async`` acknowledgment. That destination will no longer
be considered when acknowledging messages to publishers,
and publishers should not consider that destination to be
an option for failover until the destination catches up.
When a previously downgraded destination is behind by less than
10 seconds, AMPS upgrades it back to ``sync`` acknowledgment.


.. tip::

   All publishers using a publish store should be able to hold
   a number of messages equal to the number of messages published,
   at peak message volume, for a time period equal to the
   periodicity of the downgrade check plus the threshold for downgrade.
   With the configuration above, that period is 15 + 300 = 315 seconds,
   so a publisher that publishes at a peak rate of 10,000 messages per
   second should, at a minimum, be able to allocate a publish store
   that holds 3,150,000 messages.
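
The sizing rule in the tip above can be written as a simple product. The
symbols are introduced here for illustration only: :math:`R` is the peak
publish rate in messages per second, :math:`T_{check}` is the ``Every``
value of the schedule, and :math:`T_{age}` is the ``Age`` threshold of the
downgrade action. The worked numbers use assumed values (a 5 second check,
a 60 second threshold, and 2,000 messages per second) rather than the
configuration shown above.

.. math::

   N_{min} = R \times (T_{check} + T_{age}) = 2000 \times (5 + 60) = 130000

Under those assumptions, the publish store would need room for at least
130,000 messages. This is a lower bound, so a deployment would normally
leave headroom above it.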

In some cases, it is important that an instance maintain a minimum
number of destinations that use ``sync`` acknowledgment. For those
cases, an instance-level ``Tuning`` parameter is available that
will prevent the action from downgrading a connection if doing so
would reduce the number of destinations that use ``sync`` acknowledgment
below the configured limit. This parameter does not guarantee whether
a specific destination will continue using ``sync`` acknowledgment. This
parameter only limits whether AMPS will downgrade a destination that
meets downgrade criteria. AMPS will not upgrade a destination that
has previously been downgraded if a connection is lost, even if
this means that the number of currently connected destinations that
use ``sync`` acknowledgment is less than the configured minimum.
See the *AMPS Configuration Guide* section on instance-level
parameters for details.

Replication Configuration Validation
---------------------------------------

Replication configuration validation helps to ensure that
any configuration that could result in message loss or inconsistent
message contents between two instances of AMPS is explicitly
designed into the replication topology and not the result
of accidental misconfiguration.

Replication can involve coordinating configuration among a large number
of AMPS instances. It can sometimes be difficult to ensure that all
of the instances are configured correctly, and to ensure that a
configuration change for one instance is also made at the replication
destinations. For example, if a high-availability pair replicates the
topics ORDERS, INVENTORY, and CUSTOMERS to a remote disaster recovery
site, but the disaster recovery site only replicates ORDERS and
INVENTORY back to the high-availability pair, disaster recovery may not
occur as planned. Likewise, if only one member of the HA pair replicates
ORDERS to the other member of the pair, the two instances will contain
different messages, which could cause problems for the system.

Starting in the 5.0 release, AMPS automatic replication configuration
validation makes it easier to keep configuration items consistent across
a replication fabric.

Replication configuration validation happens when a replication
connection is made between two instances. The validation compares
the configuration of those two instances. By default, any difference
in configuration that could result in message loss, different
behavior between the source instance and the destination instance,
or different replication guarantees between the source instance
and the destination instance is reported as an error.

.. important::

  When replication validation reports an error, the reason for the
  error is logged to the event log on the instance that detects
  the problem, and the connection is closed.

Automatic configuration validation is enabled for all elements of the
replication configuration by default. You can turn off validation for
specific elements of the configuration, as described below.

AMPS replication uses a leaderless, "all nodes hot" model. This means
that no single AMPS instance has a view of the entire replication
fabric, and a single AMPS instance will always assume that there are
instances in the replication fabric that it is not aware of. The
replication validation rules are designed with this assumption. The
advantage of this assumption is that if instances are added to the
replication fabric, it is typically only necessary to change
configuration on the instances that they directly communicate with
for replication to function as expected. The tradeoff, however, is
that it is sometimes necessary to configure an instance as though
it were part of a larger fabric (or exclude a validation rule)
even in a case where the instance is part of a much simpler
replication design.

Each ``Topic`` in a replication ``Destination`` can configure a unique
set of validation checks. By default, all of the checks apply to all
topics in the ``Destination``.
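
As a sketch of per-topic validation settings, the fragment below configures
two topics in one ``Destination``. The topic names are hypothetical. The
first topic keeps every default check. The second topic excludes only the
``replicate`` check, as might be done when the downstream instance does not
replicate that topic back to this instance.

.. code-block:: xml

    <Destination>
        ...
        <!-- All validation checks apply to this topic (the default). -->
        <Topic>
            <MessageType>json</MessageType>
            <Name>Orders</Name>
        </Topic>

        <!-- Only this topic skips the replicate check. -->
        <Topic>
            <MessageType>json</MessageType>
            <Name>Orders-VIEW</Name>
            <ExcludeValidation>replicate</ExcludeValidation>
        </Topic>
        ...
    </Destination>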

When troubleshooting a configuration validation error, it is important
to look at the AMPS logs on *both* sides of the connection. Typically, the
AMPS instance that detects the error will log complete information on the
part of validation that failed and the changes required for the connection
to succeed, while the other side of the connection will simply note that the
connection failed validation. This means that if a validation error is reported
on one instance, but details are not present, the other side of the connection
detected the error and will have logged relevant details.

.. warning::

   Excluding a validation check directs AMPS to make a
   replication connection that could result in inconsistent
   state or data loss. Use caution when excluding
   validation checks. See the table below for
   details on each validation check.

The table below lists aspects of replication that AMPS validates. By default, replication
validation treats the downstream instance as though it is intended to be a full
highly-available failover partner for any topic that is replicated. For situations
where that is not the case, many validation rules can be excluded. For example, if
the downstream instance is a view server that does not accept publishes and,
therefore, is not configured to replicate a particular topic back to this instance,
the ``replicate`` validation check might need to be excluded.

AMPS performs the following validation checks:

.. include:: ../../common/chapters/ha/rep_validation.inc

*Replication Configuration Validation*

Notice that, by default, *all* of these checks are applied.

The sample below shows how to exclude validation checks for a replication
destination. In this sample, the ``Topic`` does not require the remote destination
to replicate back to this instance, and does not require that the remote
destination enforce the same configuration checks for any downstream
replication of this topic.

.. code-block:: xml

    <Destination>
        ...
        <Topic>
            <MessageType>json</MessageType>
            <Name>MyStuff-VIEW</Name>
            <ExcludeValidation>replicate,cascade</ExcludeValidation>
        </Topic>
        ...
    </Destination>

Replication Resynchronization
------------------------------

When a replication connection is established between AMPS
instances, the upstream instance publishes any messages that it
contains in its transaction log that the downstream instance
may not have previously received. This process is called
"replication resync". During resync, the upstream instance
replays from the transaction log, replicating messages that
match the ``Topic`` (and, optionally, ``Filter``) specification(s)
for the downstream ``Destination``.

When a replication connection is established between AMPS servers
that are both version 5.3.3.0 or higher, the servers exchange
information about the messages present in the transaction log to
determine the earliest message in the transaction log on the
upstream instance that is not present on the downstream instance.
Replication resynchronization will begin at that point. This
approach to finding the resynchronization point applies
whether or not these two instances have had a replication
connection before.

For replication between older versions of AMPS, replication
resynchronization begins at the last point in the upstream
instance's transaction log that the downstream instance has
received. For those versions of AMPS, messages from other
instances are not considered when determining the
resynchronization point.

.. index:: replication; compression


Replication Compression
------------------------

AMPS can compress the replication connection. In typical use,
replication compression can greatly reduce the bandwidth required
between AMPS instances. The precise amount of compression that AMPS
can achieve depends on the content of the replicated messages.

Compression is configured at the replication source. No additional
configuration is necessary on the instance that receives the replicated
messages: AMPS automatically recognizes when an incoming replication
connection uses compression, so compression does not need to be enabled
in the transport configuration of the receiving instance.

See the *AMPS Configuration Guide* for enabling compression and choosing
a compression algorithm.

Destination Server Failover
---------------------------

Your replication plan may include replication to a server that is part
of a highly-available group.

There are two common approaches to destination server failover:

1. **Wide IP** - AMPS replication works transparently with wide IP and
   many installations use wide IP for destination server failover. The
   advantage of this approach is that it requires no additional
   configuration in AMPS and redundant servers can be added or removed
   from the wide IP group without reconfiguring the instances that
   replicate to the group. A disadvantage to this approach is that failover
   can require several seconds and messages are not replicated during the
   time that it takes for failover to occur.

2. **AMPS Failover** - AMPS allows you to specify multiple downstream
   servers in the ``InetAddr`` element of a destination. In this case, AMPS
   treats the defined list of servers as a list of equivalent servers, listed
   in order of priority.

When multiple addresses are specified for a destination, each time AMPS
needs to make a connection to a destination and there is no incoming
connection from a server in that destination, AMPS starts at the
beginning of the list and attempts to connect to each address in the
list. If AMPS is unable to connect to any address in the list, AMPS
waits for a timeout period, then begins again with the first server on
the list. Each time AMPS reaches the end of the list without
establishing a connection, AMPS increases the timeout period. If an incoming
connection from one of the servers on the list exists, AMPS will use that
connection for outgoing replication. If multiple incoming connections
from servers in the list exist, AMPS will choose one of the incoming
connections to use for outgoing traffic.

This capability allows you to easily set up replication to a
highly-available group. If the server you are replicating to fails over,
AMPS uses the prioritized list of servers to re-establish a connection.
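
A destination for a highly-available group might be sketched as follows.
The destination name, host names, port, and topic are hypothetical. The
sketch also assumes that each equivalent server is listed in its own
``InetAddr`` element, in priority order. See the *AMPS Configuration Guide*
for the exact syntax.

.. code-block:: xml

    <Replication>
        <Destination>
            <Name>DR-Group</Name>
            <Transport>
                <Type>amps-replication</Type>
                <!-- Tried first each time AMPS needs to connect. -->
                <InetAddr>dr-primary.example.com:10004</InetAddr>
                <!-- Tried next if the first address cannot be reached. -->
                <InetAddr>dr-secondary.example.com:10004</InetAddr>
            </Transport>
            <Topic>
                <MessageType>json</MessageType>
                <Name>Orders</Name>
            </Topic>
            ...
        </Destination>
    </Replication>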

Two-Way Replication
-------------------

Two-way replication, sometimes called *Back Replication*, describes a
replication scenario in which two instances of AMPS replicate to each
other. This example calls the instances ``AMPS-A`` and ``AMPS-B``.

In a two-way replication configuration, messages that are published to ``AMPS-A``
are replicated to ``AMPS-B``. Likewise, messages which are published to
``AMPS-B`` are replicated to ``AMPS-A``. This replication scheme is used
when both instances of AMPS need to be in sync with each other to handle
a failover scenario with no loss of messages between them. This way, if
``AMPS-A`` should fail at any point, applications can immediately fail over
to the ``AMPS-B`` instance, allowing message flow to resume with as little
downtime as possible.

To enable two-way replication, each instance of AMPS defines
a replication ``Transport`` to receive incoming replication
messages. Each instance also defines a replication ``Destination``
to deliver messages to the other instance.

Notice that these servers are intended to function as failover partners.
Since a publisher may fail over between the two instances, the
``Destination`` on each instance that replicates to the other instance
is configured to use ``sync`` message acknowledgment. This
ensures that a publisher does not consider a message to be persisted
until all of the failover partners have received and persisted the
message.
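
A minimal sketch of the replication portions of the two configurations
follows. The host names, port numbers, and topic are hypothetical, and
elements unrelated to replication (including the transaction log that
replication requires) are elided.

.. code-block:: xml

    <!-- AMPS-A: accepts replication on port 10004 and replicates to AMPS-B. -->
    <AMPSConfig>
        <Name>AMPS-A</Name>
        ...
        <Transports>
            <Transport>
                <Name>amps-replication</Name>
                <Type>amps-replication</Type>
                <InetAddr>10004</InetAddr>
            </Transport>
            ...
        </Transports>
        <Replication>
            <Destination>
                <Name>AMPS-B</Name>
                <Transport>
                    <Type>amps-replication</Type>
                    <InetAddr>amps-b.example.com:10004</InetAddr>
                </Transport>
                <Topic>
                    <MessageType>json</MessageType>
                    <Name>Orders</Name>
                </Topic>
                <!-- Failover partners use sync acknowledgment. -->
                <SyncType>sync</SyncType>
            </Destination>
        </Replication>
    </AMPSConfig>

    <!-- AMPS-B: the mirror image, replicating the same topic back to AMPS-A. -->
    <AMPSConfig>
        <Name>AMPS-B</Name>
        ...
        <Transports>
            <Transport>
                <Name>amps-replication</Name>
                <Type>amps-replication</Type>
                <InetAddr>10004</InetAddr>
            </Transport>
            ...
        </Transports>
        <Replication>
            <Destination>
                <Name>AMPS-A</Name>
                <Transport>
                    <Type>amps-replication</Type>
                    <InetAddr>amps-a.example.com:10004</InetAddr>
                </Transport>
                <Topic>
                    <MessageType>json</MessageType>
                    <Name>Orders</Name>
                </Topic>
                <SyncType>sync</SyncType>
            </Destination>
        </Replication>
    </AMPSConfig>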


.. warning::

  When configuring a set of instances for failover, it is important that
  the instances use ``sync`` message acknowledgment among the set of
  instances that a given client will consider for failover. It should never be
  possible for a publisher to fail over from one instance to another instance
  if the replication link between those instances is configured for ``async``
  acknowledgments.

.. index:: replication; single connection between two servers

Starting with the 5.0 release, when AMPS detects back replication
between a pair of instances, AMPS prefers to use a single network
connection between the servers, replicating messages in both directions
(that is, to both destinations) over that connection. This can
be particularly useful when messages must be replicated in both
directions but only one server can initiate a connection: for example,
when one of the servers is in a DMZ and cannot make a connection to a
server within the company.

AMPS also allows you to specify a replication destination with no
``InetAddr`` provided. In this case, the instance replicates once the
destination establishes a connection, but does not initiate a
connection itself.

When both instances specify an ``InetAddr``, AMPS may temporarily
create two connections between the instances while replication is being
established. After detecting that two connections are active, AMPS
closes one of the network connections, and both instances use the
remaining connection to publish messages to the other instance.

Notice that using a single network connection is simply an optimization
to use the available sockets more efficiently. It does not change the way
messages are replicated or the replication protocol, nor does it change the
requirement that all messages in a journal are replicated to all destinations
before a journal can be removed.
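
As a sketch of the DMZ case, the instance in the DMZ could define its
destination without an ``InetAddr`` and wait for the internal instance to
connect. The destination name and topic are hypothetical, and the sketch
assumes that the ``Transport`` element is still provided with its ``Type``.
The instance in the DMZ still needs a replication ``Transport`` in its
``Transports`` section to accept the incoming connection.

.. code-block:: xml

    <Replication>
        <Destination>
            <Name>AMPS-Internal</Name>
            <Transport>
                <Type>amps-replication</Type>
                <!-- No InetAddr: this instance never initiates the
                     connection. It replicates over the connection
                     that the other instance opens. -->
            </Transport>
            <Topic>
                <MessageType>json</MessageType>
                <Name>Orders</Name>
            </Topic>
            ...
        </Destination>
    </Replication>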

.. _#ug-ha-passthrough:

PassThrough Replication
-----------------------

PassThrough Replication is a term used to describe the ability of an
AMPS instance to pass along replicated messages to another AMPS
instance. This allows you to easily keep multiple failover or DR
destinations in sync from a single AMPS instance. Unless passthrough
replication is configured, an AMPS instance only replicates messages
directly published to that instance from an application. By default,
an instance *does not* re-replicate messages received over replication.

PassThrough replication uses the name of the originating AMPS group to
indicate that messages that arrive at this instance of AMPS directly
from that group are to be replicated to the specified destination.
PassThrough replication supports regular expressions to specify groups,
and allows multiple server groups per destination. Notice that if the
destination instance does not specify a ``Group`` in its instance config,
the group name is the ``Name`` of the instance.

To ensure that an instance replicates a full copy of its transaction
log downstream (which is typically the intended result), include
a ``PassThrough`` configuration item that matches any group name.
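
For example, a regular expression that matches any group name could be
written as in the sketch below. The rest of the ``Destination`` is elided.

.. code-block:: xml

    <Destination>
        ...
        <!-- Pass through replicated messages that arrive from any group. -->
        <PassThrough>.*</PassThrough>
    </Destination>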

With care, some topologies can use a ``PassThrough`` configuration
that only replicates messages directly published to the instance
and a subset of messages received over replication.
This can result in significant bandwidth reduction in some
topologies, but must be configured with care to ensure that
messages do not fail to reach all of the instances, since
in this case AMPS is configured to replicate only a *part* of
the transaction log of the local instance.

.. code-block:: xml

    <Replication>
        <Destination>
            <Name>AMPS2-HKG</Name>
            <!-- No group specified: this destination is for
                 a server at the same site, and is responsible for
                 populating the specific replication partner. -->
            <Transport>
                <Name>amps-replication</Name>
                <Type>amps-replication</Type>
                <InetAddr>secondaryhost:10010</InetAddr>
            </Transport>
            <Topic>
                <Name>/rep_topic</Name>
                <MessageType>fix</MessageType>
            </Topic>
            <Topic>
                <Name>/rep_topic2</Name>
                <MessageType>fix</MessageType>
            </Topic>
            <SyncType>sync</SyncType>

            <!-- Specify which messages received via replication will be replicated
                 to this destination (provided that the Topic and MessageType also match).

                 This destination will receive messages that arrive via replication from
                 AMPS instances with a group name that does not contain HKG. Replicated
                 messages from an instance that has a group name that matches HKG will not be
                 sent to this destination.

                 Regardless of the PassThrough configuration, all messages published directly
                 to this instance by an AMPS client will be replicated to this destination
                 if the Topic and MessageType match.
            -->
            <PassThrough>^((?!HKG).)*$</PassThrough>
        </Destination>
    </Replication>

When a message is eligible for passthrough replication, topic and
content filters in the replication destination still apply. The
passthrough directive simply means that the message is eligible for
replication from this instance if it comes from an instance in the
specified group.

AMPS protects against loops in passthrough replication by tracking the
instance names or group names that a message has passed through. AMPS
does not allow a message to travel through the same instance and group name
more than once.

.. caution::

   When using passthrough, AMPS will not send a message to an
   instance that the message has already traveled through
   to protect against replication loops.

If an instance replicates a queue (distributed queue) or a group local queue, it
*must also* provide passthrough for any incoming replication group that replicates that
topic (even if the incoming replication connection is from the same group that this
instance belongs to). The reason for this is simple: AMPS must ensure that messages
for a replicated queue, including acknowledgments and transfer messages, are able
to reach every instance that hosts the queue if possible, even if a network connection
fails or an instance goes offline. Therefore, this instance must pass through messages
received from other instances that affect the queue.

Guarantees on Ordering
----------------------

For each publisher, on a single topic, AMPS guarantees that messages
are delivered to subscribers in the same order that the original
publisher published them. This guarantee holds regardless of how many
publishers or how many subscribers are connected to AMPS at any one
time.

For each instance, AMPS guarantees that messages are delivered in the
order in which the instance received them, regardless of whether a
message is received directly from a publisher or indirectly via
replication. The message order for the instance is recorded in the
transaction log, and is guaranteed to remain consistent across server
restarts.

These guarantees mean that subscribers will not spend unnecessary CPU
cycles checking timestamps or other message content to verify which
message is the most recent, or reordering messages during playback. This
frees up subscriber resources to do more important work.

AMPS preserves an absolute order across topics for a single subscription
for all topics *except* views, queues, and conflated topics.
Applications often rely on this behavior to correlate the times at which
messages to different topics were processed by AMPS. See
:ref:`#pub-sub-ordering` for more information.

Replication Security
--------------------

AMPS allows authentication and entitlement to be configured for
replication connections. For the instance that receives connections,
configure ``Authentication`` and ``Entitlement`` in the definition of the
replication transport, as shown below:

.. code-block:: xml

    <Transports>
        <Transport>
            <Name>amps-replication</Name>
            <Type>amps-replication</Type>
            <InetAddr>10005</InetAddr>

            <!-- Specifies the entitlement module to use to check permissions for incoming
                connections. The module specified must be defined in the Modules section of the
                config file, or be one of the default modules provided by AMPS. This snippet
                uses the default module provided by AMPS for example purposes. -->
            <Entitlement>
                <Module>amps-default-entitlement-module</Module>
            </Entitlement>

            <!-- Specifies the authentication module to use to verify identity for incoming
                connections. The module specified must be defined in the Modules section of the
                config file, or be one of the default modules provided by AMPS. This snippet
                uses the default module provided by AMPS for example purposes. -->
            <Authentication>
                <Module>amps-default-authentication-module</Module>
            </Authentication>
        </Transport>
        ...
    </Transports>

For incoming connections, configuration is the same as for other types
of transports.

For connections from AMPS to replication destinations, you can configure
an ``Authenticator`` module for the destination transport. ``Authenticator``
modules provide credentials for outgoing connections from AMPS. For
authentication protocols that require a challenge and response, the
``Authenticator`` module handles the responses for the instance requesting
access.

.. code-block:: xml

    <Replication>
        <Destination>
            <Topic>
                <MessageType>fix</MessageType>
                <Name>topic</Name>
            </Topic>
            <Name>amps-1</Name>
            <SyncType>async</SyncType>
            <Transport>
                <InetAddr>amps-1-server.example.com:10004</InetAddr>
                <Type>amps-replication</Type>

                <!-- Specifies the authenticator module to use to provide credentials for the
                    outgoing connection. The module specified must be defined in the Modules section
                    of the config file, or be one of the default modules provided by AMPS. This
                    snippet uses the default module provided by AMPS for example purposes. -->

                <Authenticator>
                    <Module>amps-default-authenticator-module</Module>
                </Authenticator>
            </Transport>
        </Destination>
    </Replication>


Understanding Replication Message Routing
------------------------------------------

An instance of AMPS will replicate a message to a given ``Destination`` when
all of the following conditions are met (and there is an active replication
connection to the ``Destination``):

1. The message must be recorded in the transaction log for this instance (that is,
   the topic that the message is published to must be recorded in the transaction
   log with the appropriate message type).

2. The AMPS instance must be configured to replicate messages with that
   topic and message type to the ``Destination``. (If a content filter
   is included in the replication configuration, the message must
   also match the content filter.)

3. The message must *either* have been published directly to this instance *or*,
   if the message was received via replication, the ``Destination`` must specify
   a ``PassThrough`` rule that matches the group name of the AMPS instance that
   this instance received the message from.

4. The message must not have previously passed through the ``Destination``
   being replicated to; replication loops are not permitted.

Each instance that receives a message evaluates the conditions above for each
``Destination``. The same process is followed when replaying messages from
the transaction log while resynchronizing the downstream replication
destination.

To verify that a given set of configurations replicates the appropriate
messages, start at each instance that will receive publishes and trace the
possible replication paths, applying these rules. If, at any point, applying
these rules shows that a message does not reach an instance
of AMPS that is intended to receive the message, revise the configuration
(typically, by adjusting the topics replicated to a ``Destination`` or the
``PassThrough`` configuration for a ``Destination``) until
messages reach the intended instances regardless of route.
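
When tracing a path, it can help to map each condition to the configuration
element that controls it. The annotated fragment below is a sketch: the
destination name, host, topic, filter expression, and group pattern are
hypothetical, and placing ``Filter`` inside ``Topic`` is an assumption to
confirm in the *AMPS Configuration Guide*.

.. code-block:: xml

    <Destination>
        <Name>AMPS-LON</Name>
        <Transport>
            <Type>amps-replication</Type>
            <InetAddr>amps-lon.example.com:10004</InetAddr>
        </Transport>

        <!-- Conditions 1 and 2: the topic and message type must be
             recorded in this instance's transaction log and must be
             listed here for this destination. -->
        <Topic>
            <MessageType>json</MessageType>
            <Name>Orders</Name>
            <!-- Condition 2: if a content filter is present, the
                 message must also match it. -->
            <Filter>/region = 'EMEA'</Filter>
        </Topic>
        <SyncType>async</SyncType>

        <!-- Condition 3: a message received via replication is sent on
             only if it arrived from an instance whose group name matches
             this pattern. Messages published directly to this instance
             do not need to match. -->
        <PassThrough>^NYC$</PassThrough>

        <!-- Condition 4 has no configuration element: AMPS itself
             refuses to send a message back to a destination that the
             message has already passed through. -->
    </Destination>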


Maximum Downstream Destinations
--------------------------------

AMPS supports up to 64 downstream destinations
that use synchronous acknowledgment. There is no
explicit limit on the number of downstream
destinations that use asynchronous (immediate)
acknowledgment.

.. include:: ./queue_replication.inc

.. include:: ./bootstrap.inc

.. include:: ./replication_best_practices.inc


