m4_comment([$Id: page.so,v 10.19 2002/06/01 23:42:12 bostic Exp $]) m4_ref_title(Locking Subsystem, Locking granularity, [page-level @locking, locking granularity], lock/deaddbg, lock/notxn) m4_p([dnl With the exception of the Queue access method, the m4_db access methods do page-level locking. The size of pages in a database may be set when the database is created by calling the m4_refT(dbh_set_pagesize). If not specified by the application, m4_db selects a page size that will provide the best I/O performance by setting the page size equal to the block size of the underlying file system. Selecting a smaller page size can result in increased concurrency for some applications.]) m4_p([dnl In the Btree access method, m4_db uses a technique called lock coupling to improve concurrency. The traversal of a Btree requires reading a page, searching that page to determine which page to search next, and then repeating this process on the next page. Once a page has been searched, it will never be accessed again for this operation, unless a page split is required. To improve concurrency in the tree, once the next page to read/search has been determined, that page is locked and then the original page lock is released atomically (that is, without relinquishing control of the lock manager). When page splits become necessary, write locks are reacquired.]) m4_p([dnl Because the Recno access method is built upon Btree, it also uses lock coupling for read operations. However, because the Recno access method must maintain a count of records on its internal pages, it cannot lock-couple during write operations. Instead, it retains write locks on all internal pages during every update operation. For this reason, it is not possible to have high concurrency in the Recno access method in the presence of write operations.]) m4_p([dnl The Queue access method uses only short-term page locks. That is, a page lock is released prior to requesting another page lock. Record locks are used for transaction isolation. The provides a high degree of concurrency for write operations. A metadata page is used to keep track of the head and tail of the queue. This page is never locked during other locking or I/O operations.]) m4_p([dnl The Hash access method does not have such traversal issues, but it must always refer to its metadata while computing a hash function because it implements dynamic hashing. This metadata is stored on a special page in the hash database. This page must therefore be read-locked on every operation. Fortunately, it needs to be write-locked only when new pages are allocated to the file, which happens in three cases:]) m4_bulletbegin m4_bullet([a hash bucket becomes full and needs to split]) m4_bullet([a key or data item is too large to fit on a normal page]) m4_bullet([dnl the number of duplicate items for a fixed key becomes so large that they are moved to an auxiliary page]) m4_bulletend m4_p([dnl In this case, the access method must obtain a write lock on the metadata page, thus requiring that all readers be blocked from entering the tree until the update completes.]) m4_p([dnl Finally, when traversing duplicate data items for a key, the lock on the key value also acts as a lock on all duplicates of that key. Therefore, two conflicting threads of control cannot access the same duplicate set simultaneously.]) m4_ignore([dnl As the Hash access method implements dynamic hashing, it must always refer to its metadata while computing a hash function. This metadata is stored on a special page in the hash database. The hash access uses a similar lock coupling technique to coordinate access between its metadata page and the remaining buckets in the database. The metadata page is read-locked in order to determine the next bucket to traverse. That bucket is then locked, and the metadata page lock is atomically released. If it is later necessary to write lock the metadata page, the Hash method uses an algorithm similar to that used for Btree splits. It releases its bucket lock, obtains a write lock on the metadata page and then reacquires the lock on the appropriate bucket (which may have changed while the lock was released). The conditions under which this scenario occurs are: 1) a hash bucket becomes full and needs to split, 2) a key or data item is too large to fit on a normal page, and 3) the number of duplicate items for a fixed key becomes sufficiently large that they are moved to an auxiliary page. This algorithm provides much better concurrency and fewer deadlocks than previous algorithms, which required that the metadata page remain locked for the duration of every operation.]) m4_page_footer