Claims
We claim:
1. A monitor system for use in a distributed computing environment having a
plurality of workstations, including a local workstation and a remote
workstation, interconnected by a network, said monitor system for
monitoring said remote workstation from said local workstation, said
monitor system comprising:
communication means at each workstation of said plurality of workstations
for communicating via said network with other workstations of said
plurality of workstations;
a diagnostic routine at said remote workstation for monitoring a
configuration characteristic of said remote workstation and for providing
based thereon a report file, said diagnostic routine being responsive to
an execute command sent from said local workstation to said remote
workstation via said communication means, said communication means being
operable to initiate from said local workstation the execution of said
diagnostic routine and to transfer said report file from said remote
workstation to said local workstation;
monitor means at said local workstation for automatically and periodically,
in accordance with a diagnostics schedule, generating said execute command
to be sent to said remote workstation via said communication means; and
analysis means at said local workstation for analyzing the report file
transferred from said remote workstation to said local workstation.
2. The monitor system of claim 1, further comprising:
secondary monitor means at another workstation of said plurality of
workstations of said distributed computing environment, for alternatively
generating said execute command and serving as said local workstation when
said local workstation is disabled; and
a control-point arbitration routine at each of said local workstation and
said another workstation for determining when to employ said another
workstation in place of said local workstation for providing the
monitoring.
3. The monitor system of claim 1, wherein said distributed computing
environment comprises a local area network, said local workstation
comprising other than a local area network server station, and wherein
said communication means comprises a peer-to-peer communication protocol.
4. The monitor system of claim 3, wherein said monitor means includes a
database containing the diagnostic routine, and wherein said monitor
system further comprises means for transferring the diagnostic routine
from said local workstation to said remote workstation via said
communication means using a remote file transfer procedure.
5. The monitor system of claim 4, further comprising means for remotely
erasing said diagnostic routine from said remote workstation via said
communication means, said means for remotely erasing residing at said
local workstation and being responsive to said monitor means.
6. The monitor system of claim 3, wherein said monitor means includes a
database containing said diagnostic routine, and wherein said monitor
system further comprises means for verifying accuracy of said diagnostic
routine disposed at said remote workstation using said diagnostic routine
disposed at said local workstation.
7. The monitor system of claim 1, wherein said diagnostic routine comprises
a hardware diagnostic routine or a software diagnostic routine and wherein
said monitor system includes a rules database containing said diagnostics
schedule for execution of said diagnostic routine at said remote
workstation.
8. The monitor system of claim 7, wherein said plurality of workstations of
said distributed computing environment includes multiple remote
workstations interconnected to said local workstation by said network,
said monitor system monitoring from said local workstation each of said
multiple remote workstations, and wherein said rules database contains a
list of said multiple remote workstations and a time schedule for
monitoring each of said multiple remote workstations in said list.
9. The monitor system of claim 7, wherein said analysis means includes
means for referencing said rules database and for providing an alert
message if a predefined condition contained therein exists, said analysis
means including means for transferring said alert message to an alert
workstation for display on an associated display device, said alert
workstation comprising a workstation of said plurality of workstations.
10. The monitor system of claim 8, further comprising a report database
containing report files for said multiple remote workstations in said
list, said report database further comprising means for tracking an
interval of time that a particular remote workstation of the multiple
remote workstations is off-line from the network.
11. In a distributed computing environment containing a plurality of
workstations, including a local workstation and a remote workstation,
interconnected by a network, an automated method for monitoring from the
local workstation a configuration characteristic of the remote
workstation, said method comprising the steps of:
(a) generating an execute command at the local workstation and periodically,
in accordance with a monitor schedule, transferring said execute command
to the remote workstation;
(b) responsive to said execute command of said step (a), automatically
executing at the remote workstation a diagnostic routine to monitor said
configuration characteristic of said remote workstation, and providing
based thereon a representative report file;
(c) transferring the representative report file from the remote workstation
to the local workstation; and
(d) analyzing at the local workstation the representative report file
transferred from the remote workstation.
12. The method of claim 11, wherein said generating step (a) includes
referencing a predefined rules database for said monitor schedule and
generating said execute command based upon said monitor schedule.
13. The method of claim 12, wherein said execute command transferring of
said step (a) employs a peer-to-peer communication protocol in
transferring said execute command from said local workstation to said
remote workstation.
14. The method of claim 11, wherein said local workstation includes a
repository database containing said diagnostic routine, and wherein said
method further comprises transferring the diagnostic routine from the
local workstation to the remote workstation using a remote file transfer
procedure prior to said generating step (a).
15. The method of claim 14, further comprising the step of remotely erasing
the diagnostic routine from the remote workstation after executing the
diagnostic routine at the remote workstation.
16. The method of claim 11, wherein the local workstation includes a
repository database containing the diagnostic routine, and wherein said
method further comprises prior to said step (b) comparing said diagnostic
routine at the local workstation with the diagnostic routine at the remote
workstation, and replacing the diagnostic routine at the remote
workstation with the diagnostic routine at the local workstation if a
difference is detected from said comparing.
17. The method of claim 16, wherein said local workstation includes a rules
database, and said method further comprises providing an alert message
whenever a predefined condition contained in said rules database is met.
18. The method of claim 11, wherein said plurality of workstations of said
distributed computing environment contains multiple remote workstations,
and wherein said automated method monitors from the local workstation at
least one configuration characteristic of each remote workstation by
repeating said steps (a)-(d) for each workstation of the multiple remote
workstations.
19. The method of claim 11, further comprising the step of identifying when
the remote workstation is off-line from the network, and tracking a time
interval that the remote workstation is off-line.
Description
TECHNICAL FIELD
The present invention relates to information processing systems and, more
particularly, to system utilities for information processing systems.
Specifically, the present invention relates to a system and method for
remotely inventorying and monitoring hardware, software, etc., within a
collection of interconnected information processing systems.
BACKGROUND ART
Computer solutions are increasingly implemented in the form of distributed
computer processors connected by a communications network. An example of
such a network is a series of interconnected workstations such as the IBM
Personal System/2 (IBM PS/2) computer or the IBM RISC System/6000
workstation (Personal System/2, PS/2, and RISC System/6000 are trademarks
of the IBM Corporation). These workstations may be connected by a local
area network (LAN) operating as a token ring, Ethernet, or other form of
network.
In attempting to manage and maintain a set of commonly shared workstations
in a local area network used for distributed computing, problems such as
system outages, reduced availability or degraded performance can often
occur. Frequently, the origin of such problems may be an inappropriate
configuration change to the hardware or software on a workstation in the
LAN. Although such problems can be analyzed and recovered from by manually
correcting the workstation configuration, investigating each problem and
determining its cause is often a time-consuming process since it is
typically not easy to identify the workstation where the inappropriate
configuration change was made. Additionally, even after identifying the
affected workstation, when the inappropriate change was made or by whom
often remain unknown. Thus, although the workstation configuration could
be corrected to resolve the immediate problem, the same user might return
later to the same workstation or to another workstation and make the same
mistake again, thereby repeatedly disabling or impairing the distributed
computing system.
Limiting access to workstations is typically not a viable option in the
work environment. In some cases, users have perfectly valid reasons for
needing to change the hardware or software configuration on a workstation,
and know how to perform the procedures correctly. In other cases, however,
a workstation configuration could be updated incorrectly for one of a
variety of reasons, including: the user did not know how to correctly
update the workstation; the user understood how to update the workstation,
but made an inadvertent mistake such as a typographical error; the user
"borrowed" a piece of critical equipment for use in another workstation;
the configuration was corrupted by defective hardware or software; or
deliberate user mischief may have occurred, whether frivolous or
malicious.
A wide array of diagnostic tools exists for inventorying and monitoring
hardware and software. However, most of these diagnostic tools must be run
manually and locally on each individual workstation. Further, report files
are generated and typically saved on each individual workstation, rather
than in a consolidated database.
On a system level, solutions typically use "passive" monitoring techniques
in which a server workstation listens for error signals sent by other
stations in the network. When such a technique is employed, it relies on
the other stations to accurately report errors as they occur. Several
disadvantages to this approach are apparent. Specifically, because each
station is separately programmed to report error conditions, it is
difficult to administer monitor changes since each individual station is
affected if new types of error monitoring need to be added to the
diagnostic system. In addition, conditions which are not necessarily
errors, but which may indicate a potential hazard, may pass undetected.
This is because such systems typically report failures only as they occur
instead of periodically running selected diagnostic routines. Finally, it
may be possible for a user to tamper with an individual workstation and
thereby prevent the workstation from reporting an error to the server
workstation, while simultaneously proceeding with other deliberate
mischief, all the while going undetected.
A few of the existing diagnostic tools can be activated remotely, and can
save information to a consolidated database. However, these solutions
typically suffer from a number of drawbacks, including: dependence upon a
centralized LAN server, which can make the tool unusable in case of
failure of a critical workstation or communications link; inability to run
the diagnostic tool automatically (i.e., unattended) at a specific time
interval; inability to save information on previously reported
configuration data; lack of an early warning system to draw attention to
potential system problems; lack of tuning parameters or rule databases to
adjust the behavior of the diagnostic tools, such as which conditions to
report or to ignore; vulnerability to attempts by a malicious user to
deceive the tool into reporting no problem, while tampering with a remote
workstation; and excessive "false positive" reports where the tool does
not tolerate momentary outages at a remote workstation.
The peer-to-peer system and method for remote inventorying and monitoring
presented herein address the deficiencies of the above-discussed existing
art in the distributed processing environment.
DISCLOSURE OF INVENTION
Briefly described, the present invention comprises in one aspect a monitor
system for use in a distributed computing environment wherein a plurality
of workstations, including a local workstation and a remote workstation,
are interconnected by a network. The monitor system, which allows for
monitoring of the remote workstation from the local workstation, includes
communication means at each workstation of the plurality of workstations
for communicating via the network with the other workstations in the
distributed computing environment. A diagnostic routine is disposed at the
remote workstation for monitoring a configuration characteristic of the
remote workstation and for providing based thereon a report file. The
diagnostic routine is responsive to an execute command sent from the local
workstation to the remote workstation via the communication means. The
communication means is operable to initiate from the local workstation
execution of the diagnostic routine and to transfer the report file from
the remote workstation to the local workstation. The local workstation
includes a monitor routine for generating the execute command to be sent
to the remote workstation via the communication means and an analysis
routine for analyzing the resultant report file transferred thereto.
In another aspect, an automated monitor method is presented for a
distributed computing environment that contains a plurality of
workstations, including a local workstation and a remote workstation,
interconnected by a network. The automated method monitors from the local
workstation at least one configuration characteristic of the remote
workstation. The method includes the steps of: generating an execute
command at the local workstation and transferring the execute command to
the remote workstation; in response to the generating step, automatically
executing at the remote workstation a diagnostic routine to monitor a
configuration characteristic of the remote workstation and for providing
based thereon a representative report file; transferring the
representative report file from the remote workstation to the local
workstation; and analyzing at the local workstation the representative
report file transferred from the remote workstation.
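By way of illustration only, the method steps recited above can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the class RemoteWorkstation, the function monitor_once, and the file name REPORT.OUT are hypothetical names introduced here, and the network transfer is replaced by an in-memory stand-in so that the sketch runs on a single machine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RemoteWorkstation:
    """Hypothetical in-memory stand-in for a remote workstation."""
    name: str
    diagnostic: Callable[[], str]  # diagnostic routine disposed at the remote
    files: Dict[str, str] = field(default_factory=dict)

    def execute(self, report_name: str) -> None:
        # Responds to the execute command: runs the diagnostic routine
        # and stores its output as a report file.
        self.files[report_name] = self.diagnostic()

    def fetch(self, report_name: str) -> str:
        # Stands in for transferring the report file to the local workstation.
        return self.files[report_name]


def monitor_once(remote: RemoteWorkstation,
                 analyze: Callable[[str], List[str]],
                 report_name: str = "REPORT.OUT") -> List[str]:
    """Run one monitoring pass from the local workstation."""
    remote.execute(report_name)         # send execute command, run diagnostic
    report = remote.fetch(report_name)  # transfer report file back
    return analyze(report)              # analyze at the local workstation
```

In such a sketch, the analysis routine is passed in as a function so that different analyses can be applied to the same report file without changing the monitoring pass itself.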
To summarize, the automated system and method for remote workstation
inventorying and monitoring presented herein can eliminate a significant
amount of the manual effort otherwise required to run diagnostic tools and
search through multiple workstations to identify an underlying cause of a
system problem. With this novel system and method, the availability and
performance of a local area network can be improved. In addition,
preventive procedures such as virus scans can be more readily performed,
thereby facilitating early problem detection.
Indirectly, the system and method can assist all users of the distributed
computing system to work more efficiently since system interruptions can
be reduced, thus ensuring a normal, stable computing environment. In cases
where a system problem may have been caused by an inappropriate hardware
or software configuration change, the present invention facilitates quick
identification of recent configuration changes on the workstations comprising
the distributed computing environment. The system and method can be
readily configured by one of ordinary skill in the art to implement one or
more commercially available diagnostic routines such as those mentioned
hereinbelow.
BRIEF DESCRIPTION OF DRAWINGS
These and other objects, advantages and features of the present invention
will be more readily understood from the following detailed description of
certain preferred embodiments of the invention, when considered in
conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of a distributed computing system incorporating
the present invention;
FIG. 2 is a flowchart of one embodiment of a monitor routine in accordance
with the present invention;
FIG. 3 is a flowchart of one embodiment of the run diagnostics process step
of FIG. 2;
FIG. 4 is a flowchart of one embodiment of the install program process step
of FIG. 3;
FIG. 5 is a flowchart of one embodiment of the erase program process step
of FIG. 3; and
FIG. 6 is a flowchart of one embodiment of the analyze results process step
of FIG. 2.
BEST MODE FOR CARRYING OUT THE INVENTION
This invention comprises a system and method for inventorying and
monitoring one or more workstations in a distributed computing system,
automatically and remotely. The information obtained can be used to more
quickly and accurately identify the cause of a problem in the distributed
computing system. The particular diagnostic routine employed can be
assembled by combining portions of existing, commercially available
programs. Such programs are typically used independently to perform one or
more of the following individual tasks:
analyze the hardware configuration on a workstation, and generate a report
file which lists installed features and adapter cards;
analyze the software configuration on a workstation, and generate a report
file which lists installed software components and their version levels;
analyze the communications routing map of a workstation, and verify that
links are defined correctly and are operational or else report detected
errors;
check the file systems and databases on a workstation, to verify that they
are fully operational or else report detected errors;
examine a workstation for previously cataloged computer viruses and report
if a virus is found; and
compute a checksum of the executable programs which make up the diagnostic
software to detect and report attempts to replace the software.
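By way of illustration only, the last of the tasks listed above (computing a checksum of the diagnostic software) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names file_crc32 and changed_programs are hypothetical, the reference checksums are assumed to have been recorded earlier from a trusted copy, and a CRC-32 is used merely as one example of a checksum.

```python
import zlib
from pathlib import Path
from typing import Dict, List


def file_crc32(path) -> int:
    """Compute the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def changed_programs(expected: Dict[str, int], directory) -> List[str]:
    """Return the names of programs that are missing or whose checksum
    differs from the recorded reference value."""
    changed = []
    for name, reference in sorted(expected.items()):
        program = Path(directory) / name
        if not program.is_file() or file_crc32(program) != reference:
            changed.append(name)
    return changed
```

A CRC detects accidental corruption and casual replacement of a program; where deliberate tampering is a concern, a cryptographic hash could be substituted in the same structure.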
As explained further below, a monitor control program is used in accordance
with the present invention in combination with programs written to perform
remote file transfer and remote command execution by using peer-to-peer
communication protocols such as the IBM APPC (LU 6.2) communications
protocol between any two arbitrary workstations in the local area network.
Use of a predefined "LAN server" is not required, which gives the
solution presented herein additional flexibility and reliability in case
of total failure of one or more workstations or communication links. Only
basic point-to-point communication is needed. Further, another suitable
communications protocol could be employed in place of the IBM APPC (LU
6.2) protocol.
The controlling monitor program refers to its own rules database to
determine how frequently to scan a group of workstations; what types of
diagnostic analysis should be performed at each workstation; what types of
early warnings should be reported to a LAN administrator; etc. Using the
remote file transfer and remote command execution programs described
below, the monitoring workstation can save the report files for each
workstation's configuration in a report database, along with the results
of previous scans. After each scan of a workstation grouping, diagnostics
can be performed to compare the most recently received data with data
received on prior scans. Any configuration changes can be noted and listed
in a summary report file, as well as possibly triggering a rule in an
early warning system.
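By way of illustration only, the comparison of a most recent scan with a prior scan can be sketched as follows. This is a minimal sketch, assuming that each report file is plain text with one configuration item per line; the function name configuration_changes and the shape of its result are hypothetical and are not taken from the disclosure.

```python
import difflib
from typing import Dict, List


def configuration_changes(previous: str, latest: str) -> Dict[str, List[str]]:
    """Compare two report files line by line and collect the lines that
    were removed from, or added to, the configuration."""
    old_lines = previous.splitlines()
    new_lines = latest.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes: Dict[str, List[str]] = {"removed": [], "added": []}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not reportable
        changes["removed"].extend(old_lines[i1:i2])
        changes["added"].extend(new_lines[j1:j2])
    return changes
```

A non-empty result could then be written to the summary report file and checked against the rules of the early warning system.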
A detailed explanation of a system and method in accordance with the
present invention is presented below with reference to FIGS. 1-6.
A distributed computing environment to employ a system and method in
accordance with the present invention is shown conceptually in FIG. 1. The
computing environment comprises a plurality of computing units or
workstations 100, 110, 120 & 130, such as personal computers,
interconnected via a local area network 150. Any number of such processors
could be attached depending only on the physical limitations of the
network. On each workstation 100, 110, 120 & 130, there is a basic
communications routine 105, 115, 125 & 135, respectively, as described in
a co-pending, commonly assigned U.S. patent application entitled "System
and Method for Remote Software Configuration and Distribution", Ser. No.
07/923,126, which is hereby incorporated herein by reference and described
further below. On at least one designated workstation capable of
initiating remote inventorying and monitoring in accordance with the
present invention, herein either workstation 100 or workstation 120, there
exists the following additional components:
one or more instances of diagnostic software (101, 121),
one or more instances of analysis software (102, 122),
a monitor routine (104, 124),
a control-point arbitration routine (103, 123),
a rules database (106, 126),
a report database (107, 127), and
a summary file (108, 128).
At a designated workstation 110 in the local area network, there exists an
alert processing routine 112 and an associated display device 114 for
alerting a system administrator of a potential problem. Although only one
is shown, it is anticipated that in the local area network there will be a
large number of workstations 130 which comprise the client workstations
that are the subject of the monitoring system and method presented herein.
As shown in FIG. 1, there can be multiple workstations 100 & 120
designated for the monitoring role. Further, although only one workstation
110 is shown, multiple workstations could be designated for receipt of
alerts.
A wide variety of commercially available diagnostic software exists that is
capable of running unattended, without user interaction, and that writes its
results into an output file. By way of example, the following utilities
are commercially available:
______________________________________
Utility                                   Function
______________________________________
CheckIt-Pro (TouchStone Software Corp.)   check hardware configuration
SYSLEVEL.EXE (OS/2)                       check software configuration
APPNV.EXE (OS/2 COMM MGR)                 check communications map
CHKDSK.EXE (OS/2)                         check file systems
IBM ANTI-VIRUS Software                   check for viruses
CRC.EXE (from CompuServe)                 check integrity of diagnostic routines
______________________________________
Each instance of an analysis routine should read its input files and
compare them to find reportable conditions such as configuration changes,
warnings, and errors. Using existing programs such as DIFF.EXE (OS/2),
which compares files for differences, and GREP.EXE (OS/2), which scans
files for a given search key, and programmable environments such as REXX
(implemented, e.g., on IBM's OS/2) or other shell scripting languages, one
of ordinary skill in the art can construct a desired analysis routine for
a specific monitoring requirement.
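By way of illustration only, a simple analysis routine of the search-key kind can be sketched as follows. This is a minimal sketch, assuming a plain-text report file; the function name scan_report and the example search keys are hypothetical, and the sketch stands in for, rather than reproduces, the behavior of a utility such as GREP.EXE.

```python
from typing import Iterable, List, Tuple


def scan_report(report_text: str,
                search_keys: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Scan a report for search keys, ignoring case, and return
    (line number, matching key, line text) for each reportable line."""
    keys = list(search_keys)
    hits = []
    for number, line in enumerate(report_text.splitlines(), start=1):
        lowered = line.lower()
        for key in keys:
            if key.lower() in lowered:
                hits.append((number, key, line.strip()))
                break  # report each line once, under the first matching key
    return hits
```

Returning the line number along with the matching text allows a summary report to point back to the exact location in the original report file.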
Continuing with FIG. 1, control-point arbitration routines 103, 123 can
operate as described in another commonly assigned, co-pending application
entitled "Network Management Method Using Redundant Distributed Control
Processors," Ser. No. 07/771,063, which is hereby
incorporated herein by reference. If two or more monitoring workstations,
such as stations 100 and 120, are running at the same time, these
control-point arbitration routines will control which workstation runs the
monitoring and which workstation acts as a standby in the event of
failure. The control-point arbitration routines can also assist with the
synchronization of databases such as those used for rules and report data.
Rules databases 106, 126 are manually configured by the system
administrator to direct the work of the monitor program by listing the
tasks to be performed, remote workstation(s) to be checked, time interval
or schedule between scans, etc. By way of example, the following is a
rules database in textual form:
__________________________________________________________________________
#Sample Rules Database
#1. Rule for a workstation
#
workstation-name = ps258
LAN-address = 199.33.162.4
maximum-tolerance-for-no-response = 1 hour
when-no-response = summary report to filename
#
#2. Rule for a workstation group
#
workstation-group = dept34
group-list = (ps205, ps206, ps207, ps208)
#
#3. Rule for a step
#
step-name = swcheck
diagnostic-program = SYSLEVEL.EXE
remove-after-use = no
output-file = SYSLEVEL.OUT
analysis-program = DIFF.EXE
#
#4. Rule for a simple task
#
task-name = swonly
task-list = (swcheck)
#
#5. Rule for a complex task
#
task-name = checkup
task-list = (swcheck, hwcheck, viruscheck)
#
#6. Rules for when & where to run tas