m4_comment([$Id: build.so,v 10.11 2003/10/18 19:16:21 bostic Exp $])

m4_ref_title(Distributed Transactions,
    Building a Global Transaction Manager,
    [Transaction Manager], xa/intro, xa/xa_intro)

m4_p([dnl
Managing distributed transactions and using the two-phase commit
protocol of m4_db from an application requires that the application
provide the functionality of a global transaction manager (GTM). The GTM
is responsible for the following:])

m4_bulletbegin
m4_bullet([dnl
Communicating with the multiple environments (potentially on separate
systems).])
m4_bullet([Managing the global transaction ID name space.])
m4_bullet([Maintaining state information about each distributed transaction.])
m4_bullet([Recovering from failures of individual environments.])
m4_bullet([dnl
Recovering the global transaction state after failure of the global
transaction manager.])
m4_bulletend

m4_section([Communicating with multiple m4_db environments])
m4_p([dnl
Two-phase commit is required if an application wants to
transaction-protect m4_db calls across multiple environments. If the
environments reside on the same machine, the application can communicate
with each environment through its own address space with no additional
complexity. If the environments reside on separate machines, the
application can either use the m4_db RPC server to manage the remote
environments or use its own messaging capability, translating messages
on the remote machine into calls into the m4_db library (including the
recovery calls). For some applications, it might be sufficient to use
Tcl's remote invocation to remote copies of the tclsh utility into which
the m4_db library has been dynamically loaded.])

m4_section([Managing the Global Transaction ID (GID) name space])
m4_p([dnl
A global transaction is a transaction that spans multiple environments.
Each global transaction must have a unique transaction ID. This unique
ID is the global transaction ID (GID).
In m4_db, global transaction IDs must be represented within the confines
of an array of size m4_ref(DB_XIDDATASIZE) (currently 128 bytes). It is
the responsibility of the global transaction manager to assign GIDs,
guarantee their uniqueness, and manage the mapping of local transactions
to GIDs. That is, for each GID, the GTM should know which local
transaction managers participated. The m4_db logging system or a m4_db
table could be used to record this information.])

m4_section([Maintaining state for each distributed transaction])
m4_p([dnl
In addition to knowing which local environments participate in each
global transaction, the GTM must also know the state of each active
global transaction. As soon as a transaction becomes distributed (that
is, a second environment participates), the GTM must record the
existence of the global transaction and all participants (whether this
record must reside on stable storage depends on the exact configuration
of the system). As new environments participate, the GTM must keep this
information up to date.])

m4_p([dnl
When the GTM is ready to begin commit processing, it should issue
m4_ref(txn_prepare) calls to each participating environment, indicating
the GID of the global transaction. Once all the participants have
successfully prepared, the GTM must record that the global transaction
will be committed. This record should go to stable storage. Once the
record has been written to stable storage, the GTM can send
m4_ref(txn_commit) requests to each participating environment. Once all
environments have successfully completed the commit, the GTM can either
record the successful commit or simply "forget" the global transaction.])

m4_p([dnl
If nested transactions are used (that is, the m4_arg(parent) parameter
is specified to m4_ref(txn_begin)), no m4_ref(txn_prepare) call should
be made on behalf of any child transaction. Only the ultimate parent
should ever issue a m4_ref(txn_prepare).
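])

m4_p([dnl
The commit processing described in this section, including the abort
path taken when a participant fails to prepare, can be sketched in C as
follows. This is a minimal illustration under stated assumptions, not
part of the m4_db API: the participant record, the gtm_log_fn hook, and
the gtm_resolve and gtm_demo functions are hypothetical names. In a real
GTM the three callbacks would wrap m4_ref(txn_prepare),
m4_ref(txn_commit) and m4_ref(txn_abort), and the logging hook would
write to whatever stable storage the GTM uses.])

```c
#include <assert.h>
#include <stddef.h>

/* States the GTM records for each global transaction. */
enum gtm_state {
	GTM_PRE_PREPARE, GTM_PREPARING, GTM_COMMITTING, GTM_ABORTING, GTM_DONE
};

/*
 * Hypothetical participant record, one per environment.  Each callback
 * returns 0 on success; real callbacks would wrap the prepare, commit
 * and abort calls (directly or through messages to a remote machine).
 */
struct participant {
	int (*prepare_fn)(void *txn, const unsigned char *gid);
	int (*commit_fn)(void *txn);
	int (*abort_fn)(void *txn);
	void *txn;			/* Local transaction handle. */
	struct participant *next;	/* Next participating environment. */
};

/* Hypothetical hook: force a state record to stable storage, 0 on success. */
typedef int (*gtm_log_fn)(const unsigned char *gid, enum gtm_state state);

/*
 * Drive one global transaction through two-phase commit.  Returns 1 if it
 * committed, 0 if it aborted, and -1 if a log write, commit or abort
 * failed and the transaction must be resolved later by recovery.
 */
int
gtm_resolve(struct participant *parts, const unsigned char *gid,
    gtm_log_fn log_state)
{
	struct participant *p;
	int all_prepared = 1, failed = 0;

	assert(parts != NULL && log_state != NULL);

	/* Phase one: ask every participant to prepare. */
	if (log_state(gid, GTM_PREPARING) != 0)
		return (-1);
	for (p = parts; p != NULL; p = p->next)
		if (p->prepare_fn(p->txn, gid) != 0) {
			all_prepared = 0;
			break;
		}

	/* Record the decision on stable storage before acting on it. */
	if (log_state(gid, all_prepared ? GTM_COMMITTING : GTM_ABORTING) != 0)
		return (-1);

	/* Phase two: commit or abort in every participating environment. */
	for (p = parts; p != NULL; p = p->next)
		if ((all_prepared ?
		    p->commit_fn(p->txn) : p->abort_fn(p->txn)) != 0)
			failed = 1;
	if (failed)
		return (-1);

	(void)log_state(gid, GTM_DONE);	/* Or simply forget the transaction. */
	return (all_prepared);
}

/* Stub callbacks for the demonstration only. */
static int
demo_prepare(void *txn, const unsigned char *gid)
{
	(void)gid;
	return (*(int *)txn);		/* txn points at the canned result. */
}

static int
demo_finish(void *txn)
{
	(void)txn;
	return (0);
}

static int
demo_log(const unsigned char *gid, enum gtm_state state)
{
	(void)gid;
	(void)state;
	return (0);
}

/* Two participants; the second fails to prepare if second_fails != 0. */
int
gtm_demo(int second_fails)
{
	int r1 = 0, r2 = second_fails;
	struct participant second =
	    { demo_prepare, demo_finish, demo_finish, &r2, NULL };
	struct participant first =
	    { demo_prepare, demo_finish, demo_finish, &r1, &second };

	return (gtm_resolve(&first, (const unsigned char *)"demo-gid",
	    demo_log));
}
```

m4_p([dnl
In this sketch, calling gtm_demo with an argument of 0 takes the commit
path and returns 1, while a nonzero argument makes the second
participant fail to prepare, so the global transaction is aborted and 0
is returned. The decision is logged before any commit or abort request
is sent, so that a restarted GTM can determine the fate of a
transaction it finds in the prepared state.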
])

m4_p([dnl
Should any participant fail to prepare, the GTM must abort the global
transaction. The fact that the transaction is going to be aborted should
be written to stable storage. Once it has been written, the GTM can
issue m4_ref(txn_abort) requests to each environment. When all aborts
have returned successfully, the GTM can either record the successful
abort or "forget" the global transaction.])

m4_p([dnl
In summary, for each transaction, the GTM must maintain the following:])

m4_bulletbegin
m4_bullet([A list of participating environments])
m4_bullet([dnl
The current state of each transaction (pre-prepare, preparing,
committing, aborting, done)])
m4_bulletend

m4_section([Recovering from the failure of a single environment])
m4_p([dnl
If a single environment fails, there is no need to bring down or recover
other environments (the only exception to this is if all environments
are managed in the same application address space and there is a risk
that the failure of the environment corrupted other environments).
Instead, once the failing environment comes back up, it should be
recovered (that is, conventional recovery should be run, either via the
m4_ref(db_recover) utility or by specifying the m4_ref(DB_RECOVER) flag
to m4_ref(dbenv_open)). If the m4_ref(db_recover) utility is used, then
the -e option must be specified. In this case, the application will
almost certainly want to specify environment parameters via a DB_CONFIG
file in the environment's home directory, so that m4_ref(db_recover) can
create an appropriately configured environment. If the
m4_ref(db_recover) utility is not used, then m4_ref(DB_PRIVATE) should
not be specified, unless all processing, including recovery, calls to
m4_ref(txn_recover), and calls to finish prepared but not yet completed
transactions, takes place using the same database environment handle.
The GTM should then issue a m4_ref(txn_recover) call to the environment.
This call will return a list of prepared but not yet committed or
aborted transactions.
For each transaction, the GTM should look up the GID in its local store
to determine if the transaction should commit or abort.])

m4_p([dnl
If the GTM is running in a system with multiple GTMs, it is possible
that some of the transactions returned via m4_ref(txn_recover) do not
belong to the current GTM. The GTM should detect this and call
m4_ref(txn_discard) on each such transaction handle. Furthermore, it is
important to note that the environment does not retain information about
which GTM has issued m4_ref(txn_recover) operations. Therefore, each GTM
should issue all its m4_ref(txn_recover) calls before another GTM issues
its calls. If the calls are interleaved, each GTM may not get a complete
and consistent set of transactions. The simplest way to enforce this is
for each GTM to make sure it can receive all its outstanding
transactions in a single m4_ref(txn_recover) call. The maximum number of
possible outstanding transactions is roughly the maximum number of
active transactions in the environment (a value that can be obtained
using the m4_refT(txn_stat) or the m4_ref(db_stat) utility). To simplify
this procedure, the caller should allocate an array large enough to be
certain to hold the list of transactions (for example, allocate an array
able to hold three times the maximum number of transactions). If that is
not possible, callers should check that the array was not completely
filled in when m4_ref(txn_recover) returns. If the array was completely
filled in, each transaction should be explicitly discarded, and the call
repeated with a larger array.])

m4_p([dnl
The newly recovered environment will forbid any new transactions from
being started until the prepared but not yet committed/aborted
transactions have been resolved.
In the multiple GTM case, this means that all GTMs must recover before
any GTM can begin issuing new transactions.])

m4_p([dnl
Because m4_db flushes both commit and abort records to disk for
two-phase transactions, once the global transaction has either committed
or aborted, no action will be necessary in any environment. If local
environments are running with the m4_ref(DB_TXN_WRITE_NOSYNC) or
m4_ref(DB_TXN_NOSYNC) options (that is, are not writing and/or flushing
the log synchronously at commit time), then it is possible that a commit
or abort operation may not have been written in the environment. In this
case, the GTM must always have a record of completed transactions to
determine if prepared transactions should be committed or aborted.])

m4_section([Recovering from GTM failure])
m4_p([dnl
If the GTM fails, it must first recover its local state. Assuming the
GTM uses m4_db tables to maintain state, it should run
m4_ref(db_recover) (or the m4_ref(DB_RECOVER) option to
m4_ref(dbenv_open)) upon startup. Once the GTM is back up and running,
it needs to review all its outstanding global transactions, that is, all
transactions that are recorded but not yet committed or aborted.])

m4_p([dnl
Any global transactions that have not yet reached the prepare phase
should be aborted. If these transactions were on remote systems, the
remote systems should eventually time them out and abort them. If these
transactions are on the local system, we assume they crashed and were
aborted as part of GTM startup.])

m4_p([dnl
The GTM must then identify all environments on which the
m4_refT(txn_recover) must be called. This includes all environments that
participated in any transaction that is in the preparing, aborting, or
committing state. For each environment, the GTM should issue a
m4_ref(txn_recover) call. Once each environment has responded, the GTM
can determine the fate of each transaction.
The correct behavior depends on the state of the global transaction, as
described in the list below.])

m4_tagbegin
m4_tag(preparing, [dnl
If all participating environments return the transaction in the prepared
but not yet committed/aborted state, then the GTM should commit the
transaction. If any participating environment fails to return it, then
the GTM should issue an abort to all environments that did return it.])
m4_tag(committing, [dnl
The GTM should send a commit to any environment that returned this
transaction in its list of prepared but not yet committed/aborted
transactions.])
m4_tag(aborting, [dnl
The GTM should send an abort to any environment that returned this
transaction in its list of prepared but not yet committed/aborted
transactions.])
m4_tagend

m4_page_footer