Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to the field of parallel processing in an electronic
mail environment.
2. Background Art
Electronic mail messaging provides the ability to communicate information
throughout an enterprise (e.g., send and receive messages and files
between enterprise users). Electronic mail users can send, for example,
mail messages, scheduling messages, directory information, and files.
Electronic mail systems provide the ability to perform mail operations. For
example, electronic mail operations include the ability to send and
receive messages (i.e., mail or calendar scheduling messages, directory
information, and/or files). Messages received by a user can be, for
example, read and/or forwarded to another mail user. Further, a user can
send a reply message to the sending user. Other operations may be provided
to manage messages and files.
Messages in an electronic mail system can be grouped, or queued, based on
some like characteristic (e.g., the type of further processing required).
For example, a submission queue can contain messages targeted for a
particular location. A rerouting queue can be used to store messages that
need to be routed to another location. A notification queue can contain a
list of messages that have been placed in a user's incoming mail box, and
for which users are to receive notification. A dead message queue can be
used to identify messages that are not deliverable or returnable to the
sender. A garbage collection queue can be used to contain messages that
can be removed from a system. Remote queues contain messages bound for
remote locations. Gateway queues contain messages destined for foreign
messaging environments.
As the number of mail users increases, the number of messages to be
processed by a mail system typically increases. Conversely, as the number
of mail users decreases, the number of messages decreases. If, for
example, messaging increases and processing capability to handle messaging
remains constant, the number of messages in the message queues such as the
ones discussed above can increase. Prior art systems provide the ability
to serially process messages, or queue entries. However, these systems do
not provide the ability to scale processing (up or down) to accommodate a
change in messaging activity.
SUMMARY OF THE INVENTION
The present invention provides the ability to scale an electronic mail
system. The present invention provides the ability to process mail entries
in parallel to accommodate increased messaging activity. Further, the
present invention provides the ability to down scale processing capability
to accommodate decreases in messaging activity.
The present invention provides the ability to scale a queue such that a
queue can be generic and have one or more processes manage a portion of
messages in the queue. Instead of assigning a message to a particular
process, a message can be assigned to a queue. Further, multiple processes
can be assigned to process a queue. Thus, as more activity causes the
number of entries in a queue to increase, additional processes can be
assigned to process the queue's entries. Similarly, as activity decreases
and the number of queue entries decreases, the processing capability
assigned to a queue can be decreased.
Each process can identify the next entry to be processed, and then process
the entry. Entries previously processed can be marked such that processes
that subsequently access the entry are aware that the entry has been
processed. Any order for entry selection can be used. For example, queue
entries can be placed in the queue in the order in which they are
received. Further, priorities can be assigned to queue entries. Thus, for
example, each process can select queue entries on a First In First Out
(FIFO) basis. Further, the FIFO selection can be varied based on the
priorities assigned to the queue entries.
Any method can be used to identify queue entries previously or currently
being processed by one process. In the preferred embodiment, messaging and
process information are stored in a relational database system that
provides the ability to perform locking at the record level. Such a
relational database management system (RDBMS) is provided by Oracle
Corporation. Messaging and process information are stored in relations, or
tables, in the RDBMS.
A process can be used to perform multiple tasks or activities. Each process
can be configured to perform one or more of these activities. Further,
processes can be configured to run during a certain time period. Thus, for
example, multiple processes can be configured to perform garbage
collection. A garbage collector process can be further configured to, for
example, clean up mail messages or scheduler messages, or clean up
replication or directory registration information. Further, a garbage
collector can be run at night to perform garbage collection on mail
messages. Another garbage collector can be run during the daytime to
perform garbage collection tasks.
The number and type of processes can be determined or altered by an
electronic mail system administrator. The present invention can retain
information related to the processes. A parent process, the guardian
process, can initiate or terminate other processes. A guardian process can
access process information to determine what number and type of processes
to initiate. Further, the guardian process can examine the system
information at an interval of time to determine what processes are
running. Based on the system information and the process information, the
guardian can identify any need to initiate, restart, or stop one or more
processes. Further, the guardian process can pass process identification
and other process information to an initiated process to assist the
process in determining how to proceed.
Using an RDBMS with record locking capability, queue entries can be stored
in a database with each queue entry being a row in a database relation, or
table. As each entry is selected for processing, the row in the table that
corresponds to the queue entry can be locked. Each process can examine a
snapshot of the queue and attempt to access the next queue entry. If the
entry is not locked, the entry can be selected for processing. If the
entry is locked, the entry cannot be selected by a subsequent process.
One or more tables can be used to retain message information. For example,
an instance table can contain an entry for each instance of a message and
retain queue information. This table can be examined by the processes to
identify the next message to be processed.
Additional tables can be used to retain process information. For example, a
process table can contain a class designation, instance identifier, flags,
timestamps (e.g., last wake time and last sleep time), and a process state
(e.g., run or not run). Another table can be used to define general
information for each class of processes. For example, fields in the table
can be used to assign names to the executables in each class.
A process parameters table contains parameter information for a process
instance or for a class of processes. A process can be configured for
periods of dormancy between work cycles (i.e., performing configured
tasks). A process time table is used to determine the periods in which a
process is to remain dormant. For example, the table can contain
information regarding the time of day that a process is to run.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGS. 1A-1E illustrate mail system queues and processes.
FIGS. 2A-2B illustrate a process flow for a guardian process.
FIG. 3 illustrates a guardianInit process flow.
FIG. 4 illustrates a terminateProcess process flow.
FIG. 5 illustrates a startServer process flow.
FIG. 6 illustrates a findChild process flow.
FIG. 7 illustrates a findProcess process flow.
FIG. 8 illustrates a spawnProcess process flow.
FIGS. 9A-9D illustrate a postman process flow.
FIG. 10 provides an example of a checkState process flow.
FIG. 11 illustrates a performServerAction process flow for a Postman
process.
FIG. 12 illustrates a local message delivery process flow including
locking.
FIG. 13A illustrates a process table.
FIG. 13B provides an example of a class table structure.
FIG. 13C provides an example of a process parameters table.
FIG. 14A provides an example of a process tokens table.
FIG. 14B provides an illustration of a process time table.
FIG. 15 provides an example of an instance queue table.
DETAILED DESCRIPTION OF THE INVENTION
A method and apparatus for processing electronic mail in parallel is
described. In the following description, numerous specific details (e.g.,
specific table entries) are set forth in order to provide a more thorough
description of the present invention. It will be apparent, however, to one
skilled in the art, that the present invention may be practiced without
these specific details. In other instances, well-known features have not
been described in detail so as not to obscure the invention.
Electronic mail systems store mail items while they wait to be processed by
the system. In the present invention, queues can be used to store mail
items awaiting processing by the mail system. For example, a mail message
sent by one mail user to another may be stored in multiple queues on its
journey from the sender to the recipient. The message is maintained in a
queue awaiting whatever processing is needed. For example, a message being
sent across a gateway to a user on another mail system may be stored in a
remote queue to await forwarding to the other system. Upon its arrival at
the remote node, it can be placed in a rerouting queue awaiting
transmittal to the appropriate queue on the remote node.
The amount of traffic in a mail system can vary. As the mail activity
varies, the number of items stored in a system queue can vary. For
example, when mail activity increases while the ability to process the
increased mail items remains stable, the number of mail items waiting to
be processed can increase. The present invention provides the ability to
extend the processing capability of an electronic mail system to handle
such increases in activity. That is, the present invention provides the
ability to process mail entries in parallel to accommodate increased
messaging activity.
Conversely, when a decrease in mail activity occurs and processing
capability remains stable, some processing capability can become idle. The
present invention provides the ability to scale back processing capability
to accommodate the reduced mail activity.
The present invention provides the ability to scale a queue such that a
queue can be generic and have one or more processes process a portion of
the messages in the queue. Instead of assigning a message to a particular
process, a message can be assigned to a queue. Further, multiple processes
can be assigned to process a queue. FIG. 1A illustrates a mail system
queue 102 that contains mail entries 112A. Mail entries 112A are assigned
to queue 102. Server A 104 and Server B 106 have been configured to
process mail entries in queue 102.
Server A 104 and Server B 106 select one or more of entries 112A in queue
102 to process. In the present invention, any selection technique can be
used to select the next entry or entries to be processed by a process. For
example, queue entries can be selected from the queue in the order in
which they are received into the queue using a First In First Out (FIFO)
method. Further, priorities can be assigned to queue entries. Thus, the
selection can be made based on priorities assigned to queue entries.
Thus, using a selection technique, each process processing queue entries
can identify the next entry or entries to be processed. Once an entry has
been processed, it can be marked to prevent another process from
processing the entry. Any method can be used to identify queue entries
previously or currently being processed by one process without departing
from the scope of the present invention.
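As a minimal sketch of one possible selection technique (the description
permits any), the following Python fragment picks entries by priority and
then by arrival order, and marks each entry once it has been processed. The
names QueueEntry, next_entry, and drain, and the convention that a larger
number means a higher priority, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueEntry:
    msg_id: int
    arrival: int            # position in arrival order (lower = earlier)
    priority: int = 0       # assumption: larger value = more urgent
    processed: bool = False


def next_entry(entries: List[QueueEntry]) -> Optional[QueueEntry]:
    """Return the unprocessed entry with the highest priority, FIFO within a priority."""
    candidates = [e for e in entries if not e.processed]
    if not candidates:
        return None
    return min(candidates, key=lambda e: (-e.priority, e.arrival))


def drain(entries: List[QueueEntry]) -> List[int]:
    """Process every entry once and return the message ids in processing order."""
    order = []
    while (entry := next_entry(entries)) is not None:
        entry.processed = True   # mark so that no later pass selects it again
        order.append(entry.msg_id)
    return order
```

With equal priorities the order reduces to plain FIFO; assigning a higher
priority to an entry moves it ahead of earlier arrivals.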
In the preferred embodiment, messaging and process information are stored
in a relational database system that provides the ability to perform
locking at the record level. Such a relational database management system
(RDBMS) is provided by Oracle Corporation. Using an RDBMS, messaging
(e.g., queue entries) can be stored in relations, or tables, in the RDBMS.
When a process selects a mail item (i.e., queue entry) for processing, the
record that represents the item is locked. If another process attempts to
select the same mail item from the queue, a locking exception is
generated. Thus, subsequent processes can identify the queue entries
handled by another process. Other methods for identifying items previously
processed can be used without departing from the scope of the present
invention.
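The locking behavior described above can be sketched as follows. This is an
in-memory stand-in, not the RDBMS mechanism itself: each row is given its
own lock, and an attempt to lock a row that is already held raises an
exception instead of waiting. The names RowLockedError, QueueTable, and
claim_next are hypothetical.

```python
import threading
from typing import Dict, List, Optional


class RowLockedError(Exception):
    """Raised when a row is already locked by another process (hypothetical name)."""


class QueueTable:
    """In-memory stand-in for a table that supports one lock per row."""

    def __init__(self, msg_ids: List[int]) -> None:
        self._locks: Dict[int, threading.Lock] = {m: threading.Lock() for m in msg_ids}

    def snapshot(self) -> List[int]:
        """Return the queue entries as seen at this moment, in insertion order."""
        return list(self._locks)

    def lock_row(self, msg_id: int) -> None:
        # Non-blocking acquire: fail immediately rather than wait for the holder.
        if not self._locks[msg_id].acquire(blocking=False):
            raise RowLockedError(msg_id)


def claim_next(table: QueueTable) -> Optional[int]:
    """Walk a snapshot of the queue and claim the first entry that is not locked."""
    for msg_id in table.snapshot():
        try:
            table.lock_row(msg_id)
            return msg_id
        except RowLockedError:
            continue   # another process holds this entry; try the next one
    return None
```

In an RDBMS, comparable fail-fast behavior is commonly obtained with a
statement such as SELECT ... FOR UPDATE NOWAIT, and the lock is released
when the transaction ends; the sketch omits lock release for brevity.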
As mail system activity increases, there is an increase in the number of
mail items that must be processed by the mail system. If processing
capability remains stable, the number of mail entries in a queue can
increase such as is illustrated in FIG. 1B. Queue 102 now contains entries
112B for processing by Servers A and B.
To handle the increase in queue entries, additional processes can be added
as illustrated in FIG. 1C. In addition to servers A and B, servers C and D
have been configured to process entries 112B. Assuming a stable level of
system activity, the additional processing capability can result in a
reduction in queue entries as illustrated in FIG. 1D. A system
administrator, upon viewing the situation illustrated in FIG. 1D, can
determine that some of the processing capability assigned to queue 102 is
not needed and can be removed. FIG. 1E illustrates queue 102 and a
reduction in the processing capability illustrated in FIG. 1D. That is,
Servers B-D in FIG. 1D have been eliminated and one server (i.e., server
A) remains to handle the mail entries 112C.
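The scaling illustrated in FIGS. 1A-1E can be approximated with threads
standing in for server processes. In this sketch, which assumes a simple
thread-safe queue rather than database tables, the same function drains a
queue with any number of workers, so capacity is changed by altering a
single argument. The function name process_queue and the "server-N" labels
are illustrative assumptions.

```python
import queue
import threading
from typing import Dict, List


def process_queue(entries: List[int], worker_count: int) -> Dict[str, List[int]]:
    """Drain `entries` with `worker_count` workers; return what each one handled."""
    pending: "queue.Queue[int]" = queue.Queue()
    for entry in entries:
        pending.put(entry)

    handled: Dict[str, List[int]] = {f"server-{i}": [] for i in range(worker_count)}

    def serve(name: str) -> None:
        while True:
            try:
                entry = pending.get_nowait()   # each entry goes to exactly one worker
            except queue.Empty:
                return                         # queue drained; this worker stops
            handled[name].append(entry)

    workers = [threading.Thread(target=serve, args=(name,)) for name in handled]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return handled
```

Calling process_queue(entries, 4) corresponds to the four-server
configuration of FIG. 1C, and process_queue(entries, 1) to the single
server of FIG. 1E; in both cases every entry is handled exactly once.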
PROCESSES
Different types of processes can be used in the present invention to
perform mail system tasks. The following are examples of processes and
some of the tasks that can be performed in a mail system. Additional
process types and tasks can be used with the present invention without
departing from its scope. Examples of types of processes include: postman,
scheduler, replicator, monitor, statistics, garbage collector, and
guardian. A process can be used to perform multiple or different tasks or
activities. Further, processes can be configured to run during a certain
time period. The number and characteristics of processes can be determined
or altered by an electronic mail system administrator based on such factors
as system activity levels.
A postman process, for example, delivers local mail items (e.g., scheduling
and mail) and remote mail items, handles triggered mail items (e.g., return
receipts and auto-forward), and sends notification of new messages locally.
A scheduler process can be used to handle scheduling requests. A
replicator process can be used to synchronize directory information. A
monitor process can be used to check message flow, database space usage,
and process status. A garbage collector process can remove unneeded mail
items (e.g., unowned messages) and reclaim the space used for these items.
A guardian process can act as the parent process for the other processes.
The parent process can start first and then start the other processes. It
can verify that the proper number of each process type is running.
Multiple processes, for example, can be configured to perform garbage
collection. A garbage collector process can be further configured to, for
example, clean up mail or scheduler messages, or clean up replication or
directory registration information. One of the garbage collector processes
can run at night to perform garbage collection on mail messages. Another
garbage collector can be run during the daytime to perform garbage
collection tasks.
DATA TABLES
System information can be stored such that it can be referenced
intermittently during processing, and at system startup. In the preferred
embodiment, this information is stored in a relational database system
such as the relational database management system (RDBMS) provided by
Oracle Corporation. Information stored in RDBMS tables includes messaging
and processing information. The specific details used to describe the type
of information associated with mail and processes are only for the sake of
illustration. Additional or different information can be used without
departing from the scope of the invention.
Process Information
A process is assigned a record in a process table. This record is used by
the guardian process as a request for invocation. FIG. 13A illustrates a
process table. ClassId 1304 contains an identification of a process class
(e.g., postman). InstanceId 1306 contains a unique value within a
particular process class. It differentiates among different instances of a
particular class of process.
Flags field 1308 can contain any number of flags to further define a
process instance. For example, flags 1308 can be used to particularize the
tasks to be performed by a process instance. Thus, multiple instances of a
process class can handle some subset of the total tasks defined for the
class.
The flags field for an instance of the postman process, for example, can be
used to indicate that the postman instance performs local delivery, remote
delivery, gateway processing, or notification. To illustrate further, the
flags field for an instance of a garbage collector process can be used to
indicate that the process cleans up registration records, or performs
scheduler, directory, or mail garbage collection.
A process instance can become dormant during execution. For example, during
its active state, a process can perform its tasks. After performing its
tasks, the process can lie dormant, or passive, for a period of time before
becoming active again and performing its defined tasks. LastWakeTime 1310
is used to identify the time at which a process awoke from a dormant
period. LastSleepTime 1312 is used to identify the time at which the
process last entered into a dormant period. ProcessState 1314 indicates
the state of a process (e.g., whether or not the process should be run).
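A process table record of the kind shown in FIG. 13A could be modeled as
follows. This is a sketch under stated assumptions: the description names
the postman tasks but not how the flags are encoded, so the bit values, the
string encoding of the process state, and the names PostmanTask and
ProcessRecord are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntFlag
from typing import Optional


class PostmanTask(IntFlag):
    # Bit values are assumptions; the source names the tasks but not an encoding.
    LOCAL_DELIVERY = 1
    REMOTE_DELIVERY = 2
    GATEWAY = 4
    NOTIFICATION = 8


@dataclass
class ProcessRecord:
    class_id: str                                # classId 1304, e.g. "postman"
    instance_id: int                             # instanceId 1306, unique within the class
    flags: PostmanTask                           # flags 1308
    last_wake_time: Optional[datetime] = None    # lastWakeTime 1310
    last_sleep_time: Optional[datetime] = None   # lastSleepTime 1312
    process_state: str = "run"                   # processState 1314 (assumed encoding)

    def handles(self, task: PostmanTask) -> bool:
        """Return True if this instance is configured to perform `task`."""
        return bool(self.flags & task)
```

Two postman instances can then split the class's tasks, for example one
with LOCAL_DELIVERY | NOTIFICATION and another with
REMOTE_DELIVERY | GATEWAY.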
Class information is stored in the class table. The guardian process can,
for example, use the information contained in this table to determine
names for an executable in each class. FIG. 13B provides an example of a
class table structure. ClassId 1324 has the same meaning as in the process
table. LoginName 1326 and password 1332 are used to authenticate the login
to, for example, the RDBMS.
DisplayName 1328 is used to identify a process class, for example, in a
configuration or management panel or report. DomainId 1330 can be used for
gateways (i.e., a link between systems with different protocols) and for
user-defined applications as defined by ApplicationId field 1334. ExecName
1336 identifies the name of an executable module (i.e., a module capable
of execution in the system). Subsystem field 1338 can be used to group
together a variety of individual processes into a single module (e.g.,
mail or scheduler).
A guardian process is responsible for invoking a process and passing to the
initiated process its process class and instanceId value. A guardian
process can access process information to determine what number and type
of processes to initiate.
Further, the guardian process can examine the system information at an
interval of time to determine what processes are running. Based on the
system information and the process information, the guardian can identify
any need to initiate, restart, or stop one or more processes. Further, the
guardian process can pass process identification and other process
information (i.e., parameters) to an initiated process to assist the
process in determining how to proceed.
Parameters specific to each individual server can be defined in a process
parameters table. Further, generic process parameters can be stored in the
parameters table. Once a process is initiated, it is responsible for
fetching any parameters in the parameters table. Further, each process can
determine the frequency at which to refresh the values for its parameters.
FIG. 13C provides an example of a process parameters table.
ClassId 1354 has the same meaning as previously described. InstanceId 1356
identifies a process instance as previously described, or identifies that
the record contains generic, class parameters. That is, a null or zero
value for instanceId 1356 indicates that the corresponding record contains
class level parameters. These generic parameters can be overridden by
specific parameters (i.e., parameters specific to a process instance).
Parameter 1358 identifies a particular parameter. The valueNum 1360,
valueChar 1362, and valueDate 1364 fields contain the actual parameter
values (i.e., of type number, character, and date, respectively).
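The override rule for class-level and instance-level parameters can be
sketched as follows. For brevity the three typed value columns are
collapsed into a single value, and the parameter names used in the usage
note (batch_size, retry_limit) are invented; neither simplification comes
from the description.

```python
from typing import Any, Dict, List, Optional, Tuple

# (classId, instanceId, parameter, value); instanceId None or 0 marks a class-level row.
ParamRow = Tuple[str, Optional[int], str, Any]


def resolve_parameters(rows: List[ParamRow], class_id: str, instance_id: int) -> Dict[str, Any]:
    """Merge class-level defaults with instance-specific overrides."""
    resolved: Dict[str, Any] = {}
    # First pass: generic, class-level parameters.
    for cls, inst, name, value in rows:
        if cls == class_id and not inst:
            resolved[name] = value
    # Second pass: rows for this instance replace the generic values.
    for cls, inst, name, value in rows:
        if cls == class_id and inst == instance_id:
            resolved[name] = value
    return resolved
```

Given a class-level row ("postman", None, "batch_size", 50) and an instance
row ("postman", 2, "batch_size", 10), instance 2 resolves batch_size to 10
while every other postman instance resolves it to 50.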
Each parameter for a process is paired with an identifying token. Tokens
are stored in the process tokens table. Tokens can be described, for
example, by the mail administrator. FIG. 14A provides an example of a
process tokens table. ClassId 1404 has the same description as previously
described. ParameterId 1406 identifies a particular parameter.
ParameterType 1408 identifies parameter types (e.g., number, character and
date). Name 1410 can be used to identify the token in a display. The
description field 1412 can be used to provide a description or commentary
for a token.
Each process has associated record(s) in the process time table. The
process time table is used to manage the wake and sleep times for a
process instance. Process time table records can be used by an
instantiated process to determine its actual requested Active and Passive
(i.e., sleeping) times. FIG. 14B provides an illustration of a process
time table.
The classId field 1434 and instanceId field 1436 are the same as the
similarly-named fields in the previously described tables. StartTime 1438
contains the value that identifies when a process begins its current
state. The duration field 1440 indicates the length of time that a process
is to remain in a state (e.g., active or dormant). The process will
compare the startTime and duration values and the current time to
determine whether or not it is to change states.
The flags field 1442 is used to specify the desired state during this
designated time. For example, a flags value may indicate active to specify
that the associated process is to be active at this time, or it may be
used to indicate passive to specify that the process is meant to be
dormant during this time. Processes can be tuned with this parameter
having a different value during different times of the day. The sleepTime
field 1444 indicates the delay (e.g., in minutes) between cycles. The
state field 1446 indicates the state of the process (e.g., active or
passive). The runIndex 1448 indicates a run state that is examined for
changes.
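The comparison of startTime, duration, and the current time can be sketched
as below. The string encoding of the flags field, the fallback to a passive
state when no window applies, and the names TimeRow and desired_state are
assumptions; the description does not specify them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class TimeRow:
    start_time: datetime    # startTime 1438
    duration: timedelta     # duration 1440
    flags: str              # flags 1442: "active" or "passive" (assumed encoding)


def desired_state(rows: List[TimeRow], now: datetime, default: str = "passive") -> str:
    """Return the state requested by the first time window that contains `now`."""
    for row in rows:
        if row.start_time <= now < row.start_time + row.duration:
            return row.flags
    return default   # assumption: no matching window means the process stays dormant
```

A garbage collector meant to run at night could, for example, have a single
row starting at 22:00 with a six-hour duration and an "active" flag; a call
made at 23:30 returns "active" and a call made at noon returns "passive".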
Mail Objects
As previously indicated, mail objects (e.g., messages) can be retained in
tables in an RDBMS. Tables can indicate one or more queues to which a mail
object belongs. A message can be contained in more than one queue. For
example, a message sent to both local and remote users can be contained in
multiple queues (e.g., a local delivery queue and remote delivery queue).
Further, information associated with mail objects can be stored in tables
such as an instance table. An instance table entry contains information
associated with a message instance. FIG. 15 provides an example of an
instance table.
Each object is identified by an identifier that is unique at each node. The
msgId 1504 provides this unique identification. Using a unique message
identifier, for example, provides the ability to relate additional mail
object information in other tables with a given mail object. FolderId 1506
provides ownership and location information. For example, a user's inbox
value is stored in the folderId field value for new, unread or read
messages. Or a gateway outbox value is stored in the folderId field for a
message awaiting submission to a gateway.
A priority field 1508 identifies a mail object's priority. As previously
indicated, the priority can be used to determine the order in which mail
objects are processed. The flags field 1510 provides additional
information associated with a mail object. For example, a flag can indicate
whether or not the owner of a message is a blind carbon copy recipient. The
retentionDate and
receivedDate fields (i.e., 1512 and 1514, respectively) provide time stamp
information that can be used, for example, in garbage collection or as the
entry time of a message in a queue. Status 1516 indicates the state of a
mail object (e.g., new or unread).
The queue field 1518 defines the queue in which the associated mail object
instance resides. This field can be examined by a process to determine the
mail objects to be processed in a particular queue. For example, a postman
process that is configured to perform a notification task may examine the
instance table to identify objects in the notification queue that are to
be processed.
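One way a process might examine the queue field is with a query such as the
one below. The sketch uses SQLite from the Python standard library purely
as a convenient stand-in; SQLite does not provide the record-level locking
of the preferred embodiment, so only the selection step is shown. The
queue names, the sample rows, and the ordering (higher priority first, then
earlier receivedDate) are assumptions.

```python
import sqlite3
from typing import List


def entries_in_queue(conn: sqlite3.Connection, queue_name: str) -> List[int]:
    """Return msgIds in one queue, highest priority first, oldest first within a priority."""
    rows = conn.execute(
        "SELECT msgId FROM instance WHERE queue = ? "
        "ORDER BY priority DESC, receivedDate ASC",
        (queue_name,),
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_table() -> sqlite3.Connection:
    """Build a small in-memory instance table populated with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instance (msgId INTEGER, priority INTEGER, "
        "receivedDate TEXT, queue TEXT)"
    )
    conn.executemany(
        "INSERT INTO instance VALUES (?, ?, ?, ?)",
        [
            (1, 0, "2024-01-01T09:00", "notification"),
            (2, 5, "2024-01-01T09:05", "notification"),
            (3, 0, "2024-01-01T08:00", "notification"),
            (4, 0, "2024-01-01T07:00", "remote"),
        ],
    )
    return conn
```

A postman instance configured for notification would call
entries_in_queue(conn, "notification") and ignore rows that belong to other
queues.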
GUARDIAN
A guardian process determines the number and type of processes to initiate
based on configuration information supplied by the mail system
administrator. In the preferred embodiment, this information is stored in
relations in an RDBMS as previously described. However, any method of
retaining configuration information can be used with the present
invention.
Further, the guardian process retains a snapshot of current processes, and
can obtain a new snapshot. Based on a comparison of the two snapshots and
the configuration information, the guardian can determine whether or not
to initiate, restart, or stop one or more processes. Further, the guardian
process can pass process identification and other process information to a
process. A guardian process can act as the parent process for other
processes. It spawns or terminates a process after it verifies the proper
number of each process type.
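The guardian's decision can be sketched as a comparison between the
configured number of processes per class and the processes currently
running. This simplifies the two-snapshot comparison described above to a
count per class; the function name plan_changes and the class names in the
usage note are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def plan_changes(configured: Dict[str, int], running: List[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Compare desired counts per class with the running processes.

    Returns (to_start, to_stop), each mapping a class name to a count.
    """
    current = Counter(running)
    to_start: Dict[str, int] = {}
    to_stop: Dict[str, int] = {}
    for cls in set(configured) | set(current):
        diff = configured.get(cls, 0) - current.get(cls, 0)
        if diff > 0:
            to_start[cls] = diff      # too few instances running
        elif diff < 0:
            to_stop[cls] = -diff      # surplus instances running
    return to_start, to_stop
```

For a configuration of two postman processes and one scheduler, with one
postman and one garbage collector running, the plan is to start one postman
and one scheduler and to stop the garbage collector.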
FIGS. 2A-2B illustrate a process flow for a guardian process. At decision
block 202 (i.e., "any signal from a child process?"), if there is no signal
from a child process, processing continues at block 204 to block any
restart signals and to get any previously generated restart or terminate
signals that have not been processed. Processing continues at decision
block 206. If a signal is received from a child process, processing
continues at decision block 206.
At decision block 206 (i.e., "terminate signal?"), if the signal is a
terminate signal, processing continues at block 208 to mark all processes
spawned by the guardian as obsolete (i.e., terminable). Processing
continues at processing block 210. If, at decision block 206, the signal
is not a terminate signal, processing continues at block 210 to invoke
GuardianInit to, for example, generate a new process snapshot. At block
212, terminateProcess is invoked to kill the appropriate processes. At
block 214, startProcess is invoked to start the appropriate processes.
At decision block 216 (i.e., "any child processes still running?"), if
there are no spawned processes running, processing ends at block 218. If
spawned processes are running, processing continues at block 220 to
unblock the restart signal. At block 222, guardian waits for a signal.
Signals can be generated by a child or as a result of system administrator
input. When guardian receives a signal, processing continues at block 224.
At block 224, findChild is invoked to identify the processId associated
with the signal generator. At decision block 226 (i.e., "child found?"),
if the signal generator is unknown, processing continues at block 216.
If, at decision block 226, the signal generator is identified, processing
continues at block 228. At block 228, the respawn variable is set to
include the run and restart alternatives. At decision block 230 (i.e.,
"child terminated and configured to run if it terminates?"), if a
terminated process is configured to be restarted upon termination,
processing continues at block 232 to reset respawn to indicate "stateRun"
and processing continues at decision block 234. If not, processing
continues at decision block 234.
At decision block 234 (i.e., "process state for child process &
respawn=respawn"), if the child process (i.e., the signal generator) is to
be respawned based on its state and the value of the respawn variable,
processing
continues at block 236 to invoke spawnChild. Processing continues at
decision block 216. If it is determined that the child process is not
intended to be respawned, processing continues at decision block 216.
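One reading of blocks 228 through 234 is a bit-mask test, sketched below.
The bit values STATE_RUN and STATE_RESTART and the function name
should_respawn are assumptions, as is the interpretation that the test at
block 234 requires every bit in the respawn mask to be set in the child's
process state.

```python
STATE_RUN = 0x1       # assumed bit value
STATE_RESTART = 0x2   # assumed bit value


def should_respawn(process_state: int, terminated: bool, run_on_termination: bool) -> bool:
    """Mirror blocks 228-234: build the respawn mask, then test the child's state."""
    respawn = STATE_RUN | STATE_RESTART            # block 228
    if terminated and run_on_termination:          # decision block 230
        respawn = STATE_RUN                        # block 232
    return (process_state & respawn) == respawn    # decision block 234
```

Under this reading, a terminated child that is configured to run again
needs only the run bit to be respawned, whereas any other signaling child
is respawned only when both the run and restart bits are set.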
GuardianInit
GuardianInit is invoked in the guardian process flow to, for example, fetch
a new process snapshot from the RDBMS. FIG. 3 illustrates a guardianInit
process flow. At block 302, an RDBMS connection is established. At block
304, the number of processes in the process table is determined. This
count can be used, for example, for memory allocation purposes. As
illustrated in block 306, the count is used to allocate any additional
memory for the process information data structures stored in memory and
accessed by the guardian process.
At block 308, a new process snapshot is fetched from the process table. At
block 310, the restart bit in the processState field of the process table
is turned off. At block 312, a node state variable is set to "shut down."
At decision block 314 (i.e., "at least one process in table with state
=`run`?"), if the snapshot contains at least one process that is to be
run, processing continues at block 316 to set the node state variable to
"operational," and processing continues at block 318. If not, processing
continues at block 318. At block 318, the state of the node is set to the
node state variable. Processing returns at block 320.
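The node state determination in blocks 312 through 318 can be sketched as
follows, assuming that process states are available as the strings "run"
and "not run"; that encoding and the function name node_state are
hypothetical.

```python
from typing import Iterable


def node_state(process_states: Iterable[str]) -> str:
    """Default to "shut down"; one runnable process makes the node operational."""
    state = "shut down"                            # block 312
    if any(s == "run" for s in process_states):    # decision block 314
        state = "operational"                      # block 316
    return state                                   # block 318 records this as the node state
```

An empty process table therefore leaves the node in the "shut down" state.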
TerminateProcess
Process termination can, for example, occur when it is determined that a
surplus of processing capability exists for a given queue. For example, a
mail system administrator monitoring system activity may determine that a
queue that is being managed, or handled, by two Postman processes, can be
managed by one Postman process. The system administrator can generate a
signal for the guardian