|
Description  |
|
|
Field of the Invention
This invention relates generally to simultaneous database transaction
processing and query processing, and, more particularly, to functionally
separating a database transaction entity and a query entity that access
the same data by providing a method and apparatus called an intelligent
page store that provides two access paths, one for the transaction entity
and one for the query entity, and an implicit versioning mechanism, that
provide the transaction entity with the most recent data and provide the
query entity with a recent but consistent version of the data, while
physically maintaining a single copy of most of the data.
Description of the Prior Art
In recent years the demand for database transaction processing capacity in
large installations has been growing significantly. At the same time, a
large fraction of new database applications have been in a relational
database environment, which is also an ideal environment for supporting ad
hoc queries on the database. This has given rise to a concomitant growth
in the use of ad hoc unstructured queries-a trend which is expected to
accelerate. Consequently, there is a growing requirement for
simultaneously supporting both high volume transaction processing and
unstructured queries against the same database. Therefore, a principal
objective of this invention is to design an architecture that effectively
supports both high volume transactions and complex queries, with minimal
interference between the two, while sharing a single copy of most of the
data.
Typically enterprises create and maintain their databases through a high
volume of relatively simple transactions. Each transaction represents a
well-understood business operation (creating a new customer record, noting
an account payment or transfer). Increasingly enterprises are becoming
interested in running more ad hoc unstructured queries against their
online data. This is stimulated by the feasibility of writing these more
complex queries in SQL. Typical applications might be: testing new market
opportunities, decision support, detecting historical trends,
profitability analysis etc.. These unstructured queries are characterized
by:
they are unplanned and not frequently used-performance tuning for each
query is not practical
they do not modify the operational business data
they can execute against somewhat old database data without loss of value
they may require large amounts of data scanning and processing-hence have
long execution times compared with the standard transactions.
In Chan, A., Fox, S., Lin W. T., Nori, A. and Ries, D., "The Implementation
of an Integrated Concurrency Control and Recovery Scheme", Proc. ACM
SIGMOD Conf., 1982, pp. 184-191, a versioning scheme is described. In this
scheme, different versions of pages are chained, and again each version is
identified by the ID transaction that created it. Each query has
associated with it a copy of a Completed Transaction List (CTL) that is in
effect at the time of its initiation. Query access is by chasing down the
chain of physical pages till a version in the queries CTL is detected.
First, this requires information of completed transactions to be available
to the query processor, again preventing transaction and query processing
to be functionally separated Second, chasing down pages may require
several I/Os. Third, in this scheme pages must be forced to disk by
committing transactions. Finally, the scheme supports only page level
locking for transactions. The scheme is generalized to a distributed
environment in Chan, A., and Gray, R., "Implementing Distributed Read-Only
Transactions", IEEE Trans. Software Engrg., Vol. SE-11, No. 2, February
1985, by using a complex scheme in which CTLs are sent between sites and
are merged to create new versions of CTLs.
In Robinson, J., Thomasian, A. and Yu, P., "Elimination of Lock Contention
in Relational Databases Accessed by Read-Only Queries and On-Line Update
Transactions", IBM Technical Disclosure Bulletin, Vol. 31, No. 1, pp.
180-185, June 1988, an explicit page versioning method for queries and
transactions that both access data by requesting locks at a common
concurrency controller is described. The scheme requires knowledge of
which pages are locked by transactions and queries, and when a lock
contention is detected, a version is created for the query to access. For
queries this is done by keeping status arrays of queries in progress and
checking these arrays when a transaction makes a lock request that results
in a conflict. The scheme also requires that committed updated pages by
transactions be immediately accessible by queries. Essentially, this
scheme requires that queries and transactions be run under a single DBMS
(common concurrency control manager and buffer manager), so that locks
made by queries and transactions are known to each other. The disclosure
does not describe how garbage collection is is done to remove versioned
pages that are no longer required.)
The general difference from the prior art outlines above and this invention
is that in the prior art, queries and transactions are not mutually
functionally separated. That is, in the prior art there is a single
concurrency control entity that ensures consistent access among
transactions and between transactions and queries. This precludes
independent implementation and optimization of query and transaction
processing. There is another extreme in the prior art that separates the
data accessed by queries and transactions by making a complete copy of the
database.
SUMMARY OF THE INVENTION
In accordance with a preferred but nonetheless illustrative embodiment
demonstrating objects and features of the present invention there is
provided a novel method and apparatus for simultaneous database
transaction processing and query processing wherein an intelligent page
store containing shared disk storage is provided. The intelligent page
store provides two access paths to the shared data, one by a transaction
entity and one by a query entity. In the intelligent page store an
implicit versioning mechanism allows simultaneous access by the
transaction entity and the query entity to the shared disk storage, where
the transaction entity is presented the current data and wherein the query
entity is presented a recent and consistent version of the data.
Furthermore, a single copy of all but recently updated pages is maintained
by the intelligent page store, and the query and transaction entities
operate independently of each other.
As relational database queries become more complex, parallel intra-query
processing, which exploits a large number of processors cooperating on the
same query, has become important as a means of improving query response
times, and providing incremental growth. On the other hand, transaction
processing is, for the most part, not amenable to intra-transaction
parallelism, but requires the support of a large number of concurrent
transactions with sub-second response times. Reducing data contention by
shortening lock hold times becomes critical as the transaction rate
increases. This favors large processors with shared buffers. Therefore, a
principal objective of this invention is to provide a logical database
with two paths for accessing data: one for database transactions, and
another for adhoc queries. This allows the transaction and query
processing systems to be independently optimized, while providing access
to the same data. For instance, update and transaction traffic can exploit
the performance of large processors in tightly coupled shared memory
configurations, while complex queries against the same data can be handled
by parallel database software on loosely coupled micro-processors.
In the environment that supports transactions and queries with the above
characteristics, further objectives of this invention are as follows:
Disks and disk controllers are a significant component of total cost when a
large database is present. Thus the disk space for combined transaction
and query processing should be minimized. Thus, an objective of this
invention is that online data should be shared by transactions and
queries.
Complex queries will sometimes have execution times, and lock holds which
are significantly longer than the response time of the "structured
transaction" for which throughput is important. Therefore, yet another
objective of this invention is that the complex queries should see a
consistent view of the database data without withholding locks from the
transaction traffic.
It is still another objective that the transaction processing database
software and the query processing DB software should be effectively
decoupled (no access to each others buffers in memory or exchange of lock
information). In general this allows software for transaction and query
processing to be independently optimized.
These, and other, objects, advantages, and features of the invention will
be more apparent from the following description and the appended drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram describing the relevant system structure for
prior art transaction and query systems.
FIG. 2 is a schematic diagram of the system structure for the preferred
embodiment of the invention: an intelligent page store for concurrent and
consistent access to data by transactions and queries.
FIG. 3 is a state-time diagram showing when database snapshots are created,
what it means for them to be consistent and what it means to maintain a
query version.
FIG. 4 is a state-time diagram showing how the number of copies of each
page in the page store varies over time as a result of write requests from
the transaction processor and new query version creation.
FIG. 5 defines the algorithms and logic used by the intelligent page store
in the preferred embodiment to implement implicit versioning.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Prior art transaction and query systems
To simplify the description of our invention, we summarize the essential
features of prior art systems for transaction and query processing with a
schematic in FIG. 1.
In this figure, box 1 shows a set of Transactions X1, X2, . . . Xm and
Queries Q1, Q2, . . . Qn executing against the database. In our
environment many transactions may need to be executing concurrently
against the data to provide adequate throughput. However the view of data
seen by these programs is that they execute serially, atomically and
without interference.
A Transaction Processor 3 provides the concurrency control, locking, data
access checking, index management, buffering and data protection needed to
satisfy the view of data expected by Transactions 1. In most prior
systems, update transactions and read only queries are not separated out
and handled independently. The Transaction Processor is typically tuned to
support the required throughput of Transactions. Some concurrent queries
are also supported on this Transaction Processor but difficulties arise
when a high throughput of update transactions is required in the presence
of concurrent much longer running read-only queries.
Interaction between Transactions 1 and the Transaction Processor 3 consists
of data access requests to read and write database records and information
on when to commit or abort transactions. This is illustrated as Interface
2.
The word processor in the Transaction Processor is not meant to imply any
particular system organization, machine packaging or physical unit
boundaries. The Transaction Processor function could execute on a single
physical processor, multiprocessor network of clustered processors or as a
component sharing a processing system with other functions. We freely use
processor in this way from now on.
An important component of the Transaction Processor is the Buffer 4. This
is a pool of fast access storage (such as electronic memory) managed by
the Transaction Processor. To read or modify database data, pages must be
read into the Buffer from pages of Transaction Database Data 8 stored on a
non-volatile medium (such as magnetic disks). Updates are made to pages in
the Buffer which must eventually be written back to the non-volatile
Database Data storage.
Transaction processing also makes use of a Transaction Database Log 6 to
protect the database data from transaction aborts and system failures. The
Transaction Database Log must be stored on non-volatile storage (such as
magnetic disks). Standard terminology and algorithms describing what the
transaction processor should write into the log and how it should be
coordinated with buffer management are described in
C. Mohan, D. Haderle, B Lindsay, H Pirahesh, P. Schwarz, "ARIES: A
Transaction method supporting fine granularity, locking and partial
rollbacks using write ahead locking", IBM Research Report RJ 6649 1/23/89,
and R. A. Cruz, "Data Recovery in IBM Database 2", IBM Systems Journal,
Vol 23 no 2 1984.
The Transaction Database Log 6 and Transaction Database Data 8 must both be
stored on non-volatile storage. This is usually provided using magnetic
disks but any comparable storage medium would suffice. The Transaction
Processor issues streams of page read and write requests for the log via
Interface 5 and for the Database Data via Interface 7. The content of
these page read and write commands is defined by the Transaction
Processor. The basic requirement to provide the non-volatile storage
services required by Interfaces 5 and 7 is that any page once written can
have its contents exactly retrieved by a read request at any later time.
System structure for intelligent page store method and apparatus
Referring now to FIG. 2, there is shown a block diagram of the system
structure for our invention: an intelligent page store scheme to allow
concurrent processing of queries and transactions.
It consists of a Transaction Processor 3, a Query Processor 17, and an
Intelligent Page Store 10. A fundamental difference between this and the
prior art transaction and query systems is that update Transactions are
separated out from Queries and served independently by a Transaction
Processor and a Query Processor. The Intelligent Page Store provides the
Transaction and Query Processors with access to shared physical pages of
the database in a way which supports important performance requirements
(concurrent transaction and query processing) at minimal cost in the
amount of non-volatile storage required.
The set of Transactions 9 supported by the Transaction Processor contains
only update transactions and read-only transactions which require access
to current data from the database. This is represented by the subset
X1,X2, . . . Xm of the Transactions And Queries 1 handled by the
Transaction Processor 3 in FIG. 1. Transactions see the database data with
the same view as in prior art-as atomic, non-interfering, serialized
actions. Thus is illustrated by the fact that they interact with the
Transaction Processor 3 via the same Interface 2 of record access requests
and commit/abort information.
The function of the Transaction Processor 3 and its Buffer 4 in this figure
are the same as in FIG. 1. Together they provide, buffering locking,
concurrency control, index management services to support the view of data
expected by Transactions X1,X2, . . . Xm. The Transaction Processor
continues to write its recovery log with a stream of page read and write
requests via Interface 5 just as it would if the recovery log were stored
directly on non-volatile storage attached to it. It also reads and writes
database data pages via Interface 7 exactly as in prior art.
The Query Processor 17 supports read-only queries which can accept data
which is not completely current provided that it is consistent. The set of
Queries 15 is shown with example queries Q1,Q2, . . . Qn which were in
prior art served by the Transaction Processor. Queries Q1,Q2, . . . Qn
have explicitly been identified and separated from Transactions X1,X2, . .
. Xm so that they can be processed independently. Queries received via
this separate interface are processed with the assumption that they are
read-only and do not need completely current data. The separate Interface,
a stream of record read accesses, known to contain only requests from
queries is shown as Interface 16.
The Query Processor 17 must provide, access checking, index management,
buffering etc. for queries. It is possible for the same software and
hardware processing to be used for Query Processor 17 as was used for
Transaction Processor 3 in prior art where it also served queries. However
since most existing database systems are driven by the requirement to
support adequate transaction throughput, having query processing handled
by a separate entity allows retuning or redesign of the software and
hardware processing to support read only queries alone. Since the
requirements for concurrency control, locking and data protection in a
query only system are substantially relaxed relative to general purpose
transaction processing, considerable performance and cost performance
gains become possible.
The Query Processor 17 will include some internal page buffering, but since
the buffer management scheme may differ from that used in Buffer 4,
buffering is not explicitly identified as a subcomponent. Interface 18
enables the Query Processor 17 to request pages of database data via a
stream of page read requests. The data supplied presents a consistent view
of the database data from some recent time.
The Intelligent Page Store 10 is a new concept which makes the separated
processing of transactions and queries feasible without great cost in
non-volatile storage (independent copies of all the database data for use
by transaction and query processing respectively).
The Intelligent Page Store contains a Processing part 11 and a Non-Volatile
Storage part 12. The Processing in the Intelligent Page Store consists of
a Version Manager 13, which handles the page read and write requests from
the Transaction and Query Processors via Interfaces 5,7,18. The
Non-Volatile Storage in the Intelligent Page Store acts as a repository
for the Transaction Database Log 6 and the Transaction Database Data 8.
This Transaction Database Data is exactly that shown as 8 in FIG. 1, i.e.
it is the backing storage for the current copy of every page which the
Transaction Processor 6 has written to Non-Volatile Storage. The
Intelligent Page Store provides additional Non-Volatile Storage for pages
of Query Version Data 14. These allow a consistent query view of the
database to be presented to the Query Processor via requests in Interface
18.
The function of the Version Manager 13 is to control access to shared
logical pages of data by the Transaction Processor and Query Processor
while preventing them from affecting each other's performance
significantly, and minimizing total requirements for physical non-volatile
storage. As a result the total non-volatile storage needed to save the
Transaction Database Data 8 and the Query Version Data 14 is considerably
less than twice the amount which would have been needed for Transaction
Database Data in the prior art.
The Version Manager 13, responds to page read requests from the Query
Processor so as to present a recent consistent version of database data
via Interface 18, and at the same time to appear as simple non-volatile
storage in response to requests from the Transaction Processor via
Interfaces 5 and 7.
In order to prevent long running complex queries from locking out
transaction updates on data read by the queries while preserving
consistent access, queries see a consistent query snap-shot of the data;
transaction updates are made to a logically separate transaction version.
However, in order to share most of the same pages between the transaction
and query versions of the data all corresponding logical pages of the
transaction and query version data are supported from a single page of
physical storage for pages which have been updated by a transaction since
the query version was created by a process called database snapshot. We
refer to this method of using shared physical pages to support independent
transaction and query views as implicit versioning. The method is
described in more detail using FIG. 3 and FIG. 4.
The Intelligent Page Store also includes a mechanism for determining when a
new time step should be taken. This can be based on an internal algorithm
or an external prompt from users, the Transaction Processor or the Query
Processor.
To implement implicit versioning, the Version Manager 13 is responsible for
routing page accesses from both transaction and query processing to the
correct physical pages, for maintaining and managing versions of pages,
for initiating and processing the creation of new query snapshots and for
recovering and reusing physical storage from old page versions.
The Transaction Database Log 6 saved on Non-Volatile Storage 12 in the
Intelligent Page Store 10, is exactly the information which the
Transaction Processor 3 needs to save on non-volatile storage to protect
database data against transaction aborts and system failures. The Version
Manager 13 in the Intelligent Page Store responds to read and write
requests in the log Interface 5 saving the information in the Transaction
Database Log 6 on the Intelligent Page Store Non-Volatile Storage and
returning it to subsequent read requests. The advantage of having the
Database Log in the Intelligent Page Store is that the log information can
be used to create consistent versions of database data without disturbing
the Transaction Processor or reducing its throughput. Since a log is
maintained only for update transactions and the Query Processor needs no
access to transaction log information, the amount of non-volatile storage
required to store the Database Log in the Page Store is the same as if the
log were directly attached to the Transaction Processor as in prior art.
Similarly the Version Manager responds to Transaction Database Data page
read and write requests in Interface 7 by saving page images in the
Intelligent Page Store Non-Volatile Storage for Transaction Database Data
8 and returning them on subsequent read requests. This enables the Version
Manager to create an additional physical copy of pages in Query Version
Data 14 to support a consistent view of the database for queries only if
some transaction has modified the page since the query version was created
by the database snapshot processing. Since non-volatile storage is a
significant component of the cost of database installations, avoiding
unnecessary separate physical copies of pages for the Transaction and
Query Processors is important. Implicit versioning, described in FIG. 3,
and FIG. 4 includes an efficient scheme for determining when a copy of a
database data page must be made to meet the requirement of presenting a
consistent view to queries and the appearance of a non-volatile medium to
transaction processing. This enables queries and transactions to execute
concurrently without unnecessary replication of the database pages and
hence at minimal cost.
The Non-Volatile Storage 12 in the Intelligent Page Store 10 can be
implemented with any standard non-volatile medium (such as magnetic disks)
for storing the Transaction Database Log 6, Transaction Database Data 8
and Query Version Data 14.
Implicit Versioning: management of query versions
FIG. 3 is a state diagram defining implicit versioning by showing the
relationship between database states, query snapshots, transactions and
queries as time progresses. The diagram is not to scale in the sense in
the sense that queries and query versions may have lifetimes which may be
100 times or 1000 times longer than typical transaction lifetimes.
We describe implicit versioning for the case where transaction access to
current data and one query snapshot are supported. A straightforward
extension of the scheme can reduce disruption when new snapshots are
created by allowing additional query versions at the cost of additional
non-volatile storage.
FIG. 3 is a state-time diagram with time advancing from left to right.
Events directly above each other are simultaneous.
Referring to FIG. 3, the top section shows Lifetimes 19,20,21,22,23,24,25
of sample update transactions X1,X2, . . . X7 respectively executing on
the Transaction Processor. The left end of each of these boxes shows when
an update transaction begins execution and starts holding locks and
database resources. The right hand end shows when it ends and commits its
updates in the database. Notice that transactions are overlapped (execute
concurrently) and are committed in an order which is not necessarily their
order of starting.
The next section of FIG. 3 shows states of the database of the database as
it evolves in time as a result of update transactions X1,X2, . . . X7
being applied. The Initial Database State 26 represents the state of the
database at some starting time. Subsequent states of the database
27,28,29,30,31,32,33 show the changed states after transactions X1,X2, . .
. X7 commit in order. Notice that the transaction database state advances
in small granular steps each containing the updates of one transaction.
The order in which transactions commit X1 before X2, X2 before X3, etc
will be exactly reflected in the Transaction Database Log 6 generated by
the Transaction Processor 3. In these state diagrams, all updates from
preceding transactions are included and no updated from following
transactions are included. For example state 30 "database state after
transaction X4" includes the updates from X1, X2, X3 and X4 but no updates
from X5, X6 or X7. We use the term consistent to define this property of a
database state.
The next section of FIG. 3 including 34, 35, 36, 37 shows Query Versions
being created by taking a snapshot of database state and maintained for
use by queries. In action 34 the Version Manager takes a snapshot of the
Initial Database State 26 to create Query Version V0 whose lifetime is
shown by 36. The length of 36 shows exactly the lifetime during which
query version V0 is available for use by queries running on the Query
Processor. Action 35 shows the Version Manager at some time after
transaction X3 has committed, taking a snapshot of the consistent database
state 29 and using it to create Query Version V1. The lifetime during
which Query Version V1 is available to queries is shown by the exact
length of 37.
The next section of FIG. 3 shows sample query lifetimes: the period during
which Q1 and Q2 are being processed are shown active Lifetimes 39 and 39
respectively. Throughout this time Q1 and Q2 must have access to the data
of "Query Version V0" 36. Queries Q3 and Q4 have Lifetimes 40, 41
respectively; these queries must have access to the data of "Query Version
V1" 37.
The implicit versioning scheme implemented by the Version Manager 11 has to
deal with the fact, that the transaction processor 3 will write out pages
of data via Interface 7 in a way which optimizes the use of its Buffer 4.
In particular pages will be written out including uncommitted data and
without any guarantee that the set of pages written out represent a
consistent state of the transaction database in the sense of database
States 26, 27, 28, 29, 30, 31, 32, 33. The Intelligent Page Store, has to
receive these pages and return them on subsequent read requests without
disturbing the view of data seen by longer running queries in the Query
Processor; these must see the Query Versions 36, 37. In implicit
versioning, this is achieved by advancing the version of data seen by the
queries in discrete time steps. At each time stem a new query snapshot is
created and made visible to queries. Each snapshot is a consistent view of
committed database data at some recent time. Between time steps, the
Intelligent Page Store presents a constant view of the data to queries.
When the Transaction Processor writes out updated pages to the Intelligent
Page Store, the updated pages are saved but not made visible to the
queries until the next time step.
Implicit Versioning: management of page copies
Implicit versioning also determines accurately when an additional copy of a
database data page is required to support the current transactional and
query views. This enables non-volatile storage requirements to be
minimized. The management of page copies by implicit versioning is
described by a time state diagram in FIG. 4. Time advances from left to
right in this diagram. Events directly above each other are simultaneous.
The initial state of the database data i.e. the combined Transaction
Database Data 8 and the Query Version Data 14, in the Intelligent Page
Store 10 is shown as 47. This is actually a composite state represented by
a set of logical pages P1, P2, . . . P7 which store a set of page values.
We represent the values stored in these pages by letters "a", "b", "c",
"d", "e", "f", "g" respectively. This database state is assumed to be a
consistent state of the transaction database and the actual state of
transaction data, as would occur if transaction processing had just
restarted after being quiesced. State 26 in in FIG. 3 illustrated this
case. We assume that a snapshot of this consistent state has been taken as
shown in Action 34 and a Query Version created corresponding to it as in
"Query Version VO" 36.
A key point is that in this State 47 no page is stored twice; a single copy
of each of pages P1, P2, . . . P7 is adequate to support correct
transaction and query views of the database data. This set of physical
pages is acting as Transaction Database Data 8; no additional physical
storage is required for Query Version Data 14 at this time.
A sequence of page read and write request Actions 42, 43, 44, 45 from the
Transaction Processor via Interface 7 affect the page data stored in the
Intelligent Page Store. Action 46 shows the effect of subsequently
creating a new query version by taking a snapshot of the database. The
sequence of states of data storage in resulting from these operations is
shown as States 48, 49, 50, 51, 52.
Action 42 is a request from the Transaction Processor to read the value of
page P5. It receives "e" the current value in the page store. The values
stored for pages P1, P2, . . . P7 have not been changed and no copies have
been made.
The next Action 43 is a request from the Transaction Processor to write the
value "x" into page P3. State 49 shows that a copy of page P3 is made so
that both the old value "c" and the new value "x" can be saved. At this
point queries will see the old value "c" residing in Query Version Data
14, whereas Transaction Processor requests to non-volatile storage will
see the new value "x" in a copy of the page in Transaction Database Data.
The next Action 44 is a request from the Transaction Processor to read page
P3. This will receive "x" the value of the page written there previously
at the request of the Transaction Processor. State 50, the state of the
page store after Action 44, shows that the state of the page store is not
changed as a result of this read operation.
Action 45 is a subsequent write to page P3 from the Transaction Processor
requests that value "y" be written. State 51 shows that the new value "y"
is written over the old transaction value "x" but that no new copy of the
page is made since the value needed by queries in this time step namely
"c" is already saved in a copy.
When a new query version is created by taking a snapshot of the database in
Action 46, the old query value "c" for page P3 can be discarded and the
non-volatile storage used for this copy recovered for reuse as is
illustrated by State 52. Assuming that the transaction which was
responsible for writing the value "y" into page P3 has committed before
the "snapshot" is made, | | |