|
Description  |
|
|
FIELD OF THE INVENTION
This invention relates to a method of determining the topology of a network
of objects, such as the physical topology of a network of data
communications devices.
BACKGROUND TO THE INVENTION
Operators of many data communications networks are typically ignorant of
the exact topology of the networks. The operators need to know the exact
topology in order to properly manage the networks, for example, for the
accurate diagnosis and correction of faults.
Network managers that do know the very recent topology of their network do
so by one of two methods: an administrative method and an approximate AI
(artificial intelligence) method.
Administrative methods require an entirely up to date record of the
installation, removal, change in location and connectivity of every
network device. Every such change in topology must be logged. These
updates are periodically applied to a data base which the network
operators use to display or examine the network topology. However, in most
such systems the actual topology information made available to the
operators is usually that of the previous day or previous days, because of
the time lag in entering the updates. This method has the advantage that a
network device discovery program need not be run to find out what devices
exist in the network. This method has a disadvantage that it is almost
impossible to keep the data base from which the topology is derived both
free of error and entirely current.
The approximate AI methods use routing/bridging information available in
various types of devices, for example, data routers typically contain
routing tables. This routing information carries a mixture of direct
information about directly connected devices and indirect information. The
AI methods attempt to combine the information from all the devices in the
network. This method requires that a network device discovery program be
run to find out what devices exist in the network, or that such a list of
devices be provided to the program. These approximate AI methods require
massive amounts of detailed and very accurate knowledge about the internal
tables and operations of all data communications devices in the network.
These requirements make the AI methods complex, difficult to support and
expensive. In addition, devices that do not provide connectivity
information, such as ethernet or token ring concentrators must still be
configured into the network topology by the administrative method.
One major problem with the Al methods is that inaccurate or incomplete
information can cause their logic to deduce incorrect conclusions. The
probabilistic methods described here are far less vulnerable to such
problems.
SUMMARY OF THE INVENTION
The present invention exploits the fact that traffic flowing from a first
device to a second device can be measured both as the output from the
first device and as the input to the second device. The volume of traffic
is counted periodically as it leaves the first device and as it arrives at
the second device. With the two devices being in communication, the two
sequences of measurements of the traffic volumes will tend to be very
similar. The sequences of measurements of traffic leaving or arriving at
other devices have been found in general, to tend to be different because
of the random (and fractal) nature of traffic. Therefore, the devices
which have the most similar sequences have been found to be likely to be
interconnected. Devices can be discovered to be connected in pairs, in
broadcast networks or in other topologies. This method is therefore
extremely general. Various measures of similarity can be used to determine
the communication path coupling. However the chi squared statistical
probability has been shown to be robust and stable. Similarity can be
established when the traffic is measured in different units, at different
periodic frequencies, at periodic frequencies that vary and even in
different measures (e.g. bytes as opposed to packets).
In accordance with an embodiment of the invention, a method of determining
the existence of a communication link between a pair of devices is
comprised of measuring traffic output from one device of the pair of the
devices, measuring the traffic received by another device of the pair of
devices, and declaring the existence of the communication link in the
event the traffic is approximately the same.
Preferably the traffic parameter measured is its volume, although the
invention is not restricted thereto.
In accordance with another embodiment of the invention, a method of
determining network topologies is comprised of monitoring traffic received
by devices connected in the network and traffic emitted out of the
devices, correlating traffic out of the devices with traffic into the
devices, indicating a network communication path between a pair of the
devices in the event that the correlation of traffic out of one of the
pair of the devices and into another of the pair of the devices is in
excess of a predetermined threshold.
An embodiment of the present invention has been successfully tested on a
series of operational networks. It was also successfully tested on a large
data communications network deliberately designed and constructed to cause
all other known methods to fail to correctly discover its topology.
BRIEF INTRODUCTION TO THE DRAWINGS
A better understanding of the invention will be obtained by reference to
the detailed description below, in conjunction with the following
drawings, in which:
FIG. 1 is a block diagram of a structure on which the invention can be
carried out,
FIG. 2 is a block diagram of a part of a network topology, used to
illustrate operation of the invention,
FIG. 3 is a flow chart of the invention in broad form, and
FIG. 4 is a flow chart of an embodiment of the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The invention will be described by reference to its theory of operation,
and then by practical example. However, first, a description of a
representative network with apparatus which can be used to implement the
invention will be described.
With reference to FIG. 1, a data communication network 1 can be comprised
of devices such as various subnetworks, comprised of e.g. routers, serial
lines, multiplexers, Ethernet.TM. local area networks (LANs), bridges,
hubs, gateways, fiber rings, multibridges, fastpaths, mainframes, file
servers and workstations, although the network is not limited to these
elements. Such a network can be local, confined to a region, span a
continent, or span the world. For the purposes of this description,
illustrative devices are included in the network, and can communicate with
each other via the network. Each of the devices contain a traffic counter
3, for counting the number of packets S it received and the number of
packets it transmitted, since reset of the traffic counter. Each device
can be interrogated to provide both its address and with its address a
count, in the traffic counter, of the number of packets. A network of
devices such as the above is not novel.
A processor comprised of CPU 4, memory 5 and display 6 are also connected
to the network, and can communicate with each of the devices 2 (A, B, C
and D) connected to the network.
FIG. 2 illustrates communication paths between each of the four devices 2,
which paths are unknown to the system operator. The output o of device A
transmits to the input i of device D, the output o of device D transmits
to the input i of device C, the output o of device C transmits to the
input i of device B, and the output o of device B transmits to the input i
of device A. Each of the devices is also connected to the network 1, while
any of the communication paths between the devices 2 may also be connected
to the network 1 (not shown). However, the CPU can be in communication
with each of the devices by other communication paths. In the examples
described later the inventive method of discovering the communication
paths, i.e. the topology of the part of the network between these devices
will be used.
As a preliminary step, the existence and identity of each of the presumed
devices that exist in the network is determined. Determination of the
existence and identity of these devices is not novel, and is described for
example in U.S. Pat. No. 5,185,860 issued Feb. 9th, 1993 and entitled
AUTOMATIC DISCOVERY OF NETWORK ELEMENTS and which is assigned to
Hewlett-Packard Company.
The invention will first be described in theoretical, and then practical
terms with respect to the example network described above.
Each device in the network must have some activity whose rate can be
measured. The particular activity measured in a device must remain the
same for the duration of the sequence of measurements. The activities
measured in different devices need not be the same but the various
activities measured should be related. The relationships between the rates
of the different activities in devices should be linear or defined by one
of a set of known functions (although a variation of this requirement will
be described later). An example of activities that are so related are
percentage CPU utilization in a data packet switch and its packet
throughput. It should be noted that the functions that relate different
activity measures need not be exact.
The units (e.g. cms/sec or inches/min) in which an activity are measured
can vary from device to device but must remain constant for the duration
of the sequence of measurements.
This method of discovery does not depend on particular relationships
between the intervals between collection of activity measurements and the
rates of activity, except that should the activity rates be so low that
few intervals record any activity, more measurements may need to be
recorded to reach a certain accuracy of topological discovery.
This method of discovery does not depend on particular relationships
between the intervals between collection of activity measurements and the
transit time between devices except that should the intervals between
measurements be much smaller than the transit time between devices, more
measurements may need to be recorded to reach a certain accuracy of
topological discovery.
The activity of the devices in the network should be measured in sequences.
There are four aspects to such measurements: how to measure the activity,
who or what measures activity, when to measure the activity and lastly
transmitting the measurements to this method for determining network
topology.
Measurements made be made in four ways:
a: directly from observations made inside the device:
b: directly from observations made of the device from outside:
c: computed from observations made inside the device:
d: computed from observations made of the device from outside.
Examples of these are as follows:
a: CPU utilization in a computer:
b: number of frames transmitted on a communications line, counted in a data
router connected to this line:
c: number of packets transmitted per active virtual circuit in an data
router:
d: temperature of an device computed from spectral observations.
All such activity which is measured should be construed in this
specification as "traffic".
The activity can be then be expressed as any function or combination of
functions of the four classes of observations.
For example, let the activity of an device be directly measured as the
number of operations of a certain type that it has carried out since it
was started. The computed measurement could be the difference between the
number of such operations now and the number of such operations at the
time of the previous measurement.
Measurements may be made by the device itself, by another network device,
by a device external to the network or by a combination of devices
internal and external to the network. Measurement devices are not
restricted to electronic or mechanical means. Any mixture of measuring
methods may be used. Different devices may be measured by different
measuring methods from each other and such measuring methods may change
with time for devices.
Activity can be measured at regular periodic intervals or at irregular
intervals. Different devices in the network can have their activities
measured in either way. Individual devices can use a mixture of methods.
Sufficient temporal data must be collected or recorded at the time of each
measurement of activity on each device to allow the time at which each
measurement was made to be determined, either absolutely or with respect
to some relative standard.
The accuracy with which the time needs to be recorded to achieve a certain
level of performance of this method will vary from network to network.
The measurements of activity may be transmitted directly or indirectly from
devices 2 to CPU 4 for processing to determine the network topology. The
measurements may be made, stored and then retrieved, or may be transmitted
directly, or transmitted by some mixture of these methods. The
transmission of the measurements may use the inband or outband
communications facilities of the network (should they exist for the
network) or any other means of communication. These options permit the
operation of the invention for topological discovery in realtime or later.
The network itself can be used to transmit the measurements and should this
transmission affect activity as measured, then the operation of the
invention can itself, on a network with very low .activity, generate
relatively significant activity. This can be exploited to improve the
speed of discovery, to operate the method effectively during very inactive
or quiet periods and for other advantages.
In its simplest form each device in the network is selected in turn. Let
device `a` have been selected. The sequence of measurements for this
device `a` is compared with the sequence of measurements for every other
device. The device with the sequence of measurements most similar to that
of `a` is considered to be connected to `a`.
There are several methods for restricting or indicating probably correct
connections, as follows. These can generally be used in any combination.
(a) A proposed connection with a corresponding similarity measure with less
than a chosen value can be rejected.
(b) Proposed connections are preferred to be displayed or indicated with
some direct or indirect notification of the associated probability (e.g.
green if more probable than a cutoff, yellow if less probable).
(c) The maximum similarity for any known to be correct connection after a
given sequence length or time period can be recorded. Putative connections
with similarity less than this empirical level should be considered
invalid and should not be included in the proposed network topology.
(d) Some devices will be connected in a broadcast or other manner, such
that they are apparently or actually connected to more than one other
device. Should this be considered a possibility for the network in
question, the following extra sequence should be used once the suggested
pair connections have been determined:
Let device `a` be assessed as being connected to device `b`. Should the
similarity measure between device `a` and a further device `c` be probably
the same as the similarity measure between device `a` and device `b`, then
device `a` should be considered as being connected to both device `b` and
device `c`. This search for extra connections could be unrestricted (e.g.
allowing all devices in the network to be connected together) or
restricted by a number (e.g. allowing no more than 48 devices ever to be
connected together).
Once the measurements for a pair of devices have been made (either they are
complete or at least 1 measurement has been made on each device), the two
sequences of activity of the two devices can be compared. The two
sequences of measurements may need to be time aligned, functionally mapped
and normalized before having their similarity computed.
The following definitions are used below, in this specification:
A: a measure of the quantity of activity that has passed since the previous
measure was reported by this device. A(j,1) is the first measurement made
for device j.
Activity: some operation or combination of operations in or including an
device. The rate of such operations must be measurable.
Activity sequence: a series of measurements of activity rates made at
recorded variable intervals or at fixed periodic intervals for a device.
Class: a device may belong to one or more classes (e.g. bridges, routers)
Discovery: the determination of what devices exist in the network, but not
how they are connected.
g.sub.s (x): a functional transform of the value of the measure of activity
x. The subscript s indicates which from a possible set of transform
functions is being used.
G: the total number of different transform functions in the set g.sub.s.
L: the number of measurements in two sequences that are to be compared.
N: there are N devices in the network.
Physical or Logical Device: an device can be physical or logical. The
network consists partially or entirely of devices that can be located in
the network. Each device that can be located must have some measurable
activity and this activity should be related to some measurable activity
of the device or devices connected to this device.
S(a,b): the similarity of device b compared to device a.
Sequence length: the number of measurements of activity made in a given
activity sequence.
Similarity: an arithmetic measure of likelihood that two activity sequences
have been measured from devices that are connected together (see S).
Likelihood increases as the similarity measure increases.
Sum: Sum(j) is the sum of the activity measurements in a sequence for the
device (j).
T: a transformed measure of the volume of activity that has passed since
the previous measure was reported by this device. T(j,i) is the i'th
measurement made for device j, transformed by the function chosen from the
set g.
T*: T*(j,i) is the normalized i'th measurement made for device j such that
over L measurements, the sum of T*(j,i)=the sum of T(k,i) for same
reference device k.
Topology: how the devices in the network are connected.
x: x(j,i) is the value of the i'th time aligned activity measurement for
device j.
y: y(j,i) is the value of the ilth activity measurement for device j.
Device: an input or output communications port of a physical or logical
device. Each device that can be located must be able to measure and report
some measure of the traffic or activity at this port, or to have such a
measurement made on it and reported (eg: by an external agent).
Device index: the letter j indicates which device (1 . . . N) is being
referred to.
Device suffix: the suffix i indicates the input side (traffic arriving at
this device). The suffix o indicates the output side (traffic leaving this
device).
Discovery machine: the machine, possibly connected to the network, that is
running the method.
j: the letter j indicates which device (1 . . . N) is being referred to.
+x+:x is the name of a device. For example, +b+ described the device b.
fom: a figure of merit that describes similarity.
Q: the probability of similarity.
V*(a,i): the variance of the normalised T*(a,i)
SNMP: Simple Network Management Protocol.
NMC: Network Management Centre.
Ariadne: an embodiment of the invention is termed Ariadne.
D(a,b): a difference measure between the mean traffic from device a and the
mean traffic from device b.
port: a device may have more than one communications interface, each such
interface on a device is termed a `port`.
MIB: Management information base. A set of monitored values or specified
values of variables for a device. This is held in the device or by a
software agent acting for this device, or in some other manner.
Polling: sending an SNMP request to a specified device to return a measure
(defined in the request) from the MIB in that device. Alternatively the
information can be collected or sent periodically or intermittently in
some other manner.
Traffic sequence: a series of measurements of traffic rates or volumes made
at recorded variable intervals or at fixed period intervals for a device
(input or output).
The following describes how sequences of measurements made at possible
varying periodic intervals and at possibly different times for two
different devices can be time aligned. This alignment, necessary only if
the activity measures vary with time, can greatly improve the accuracy of
determining which devices are connected to each other, given a certain
number of measurements. It can correspondingly greatly reduce the number
of measurements needed to reach a certain level of accuracy in determining
which devices are connected to each other. The method is carried out by
CPU 4, using memory 5.
The measurements from the sequence for device b (ie:y(b,i)) are
interpolated and, if necessary, extrapolated, to align them with the times
of the measurements in the sequence for device a (i.e.: y(a,i)). This
interpolation can be done using linear, polynomial or other methods: e.g.:
natural cubic splines, for example as described in W. H. Press, S. A.
Teukolsky, B. P. Flannery, W. T. Vetterring: "Numerical Recipes in Pascal.
The Art of Scientific Computing": Cambridge University Press, 1992, and C.
E. Froberg: "Numerical Mathematics: Theory and Computer Applications":
Benjamin Cummings, 1985. The interpolation will be more accurate if the
form of the function used for the interpolation more closely follows the
underlying time variation of the activity in device +b+.
However interpolation can very largely be avoided by the following method.
Let M(a) be the mean value of the traffic in the first X sampling periods
for device a. Sort the list M(a) (e.g. using Heapsort which is NlogN in
computational complexity). Now arrange that the devices be polled in the
sequence given by the sorted list M(a). Since devices with very similar
mean values of traffic will be polled with very small relative offsets in
time, the degree of interpolation is very radically reduced.
Should the measurements in +b+ be started after those in +a+, the
measurements in the +b+ sequence generally cannot be safely extrapolated
backwards a time greater than the average time between measurements in the
+b+ sequence. Similarly, should the measurements in +b+ stop before those
in +a+, the measurements in the +b+ sequence generally cannot be safely
extrapolated forward a time greater than the average time between
measurements in the +b+ sequence. In some cases extrapolation beyond one
or other end may reduce the accuracy of the method. In other cases
extrapolation beyond one or other end may improve the accuracy of the
method.
L (the number of measurements to be used in comparing the two sequences) is
the number of measurements in the sequence of device +a+ that have
corresponding interpolated or extrapolated time aligned measurements in
the sequence for device +b+. The aligned data is copied into the arrays
x(b,1 . . . L) and x(a,1 . . . L) for devices `b` and `a` respectively.
Comparison between two activity sequences is only done once the
measurements in each sequence have been first transformed and then
normalized. The transform process permits different types of measure of
activity to be compared even though they are not linearly related. The
normalization process permits linear related measures of activity to be
compared, regardless of the units they are measured in.
The transform function for the sequence from device +a+ is chosen from the
set g. The transform function for the sequence from device +b+ is chosen
from the set g. For each possible combination of such functions, the
resulting sequences are then normalized as described below and then are
compared as will be described below. Since there are G functions in the
set g, this means that G.sup.2 such comparisons will be carried out.
For a chosen function g.sub.s from the set g:
T(j,i)=g.sub.s (x(j,i))
The set g will generally contain the linear direct transform function:
g.sub.1 (x)=x
Other functions may be added to this set g should they be suspected or
known to exist as relationships between different activity measures. For
example, should activity measure y be known to vary as the log(x) for the
same device, the following two functions would be added to the set g.
g.sub.2 (x)=log(x)
g.sub.3 (x)=exp(x)
The sum of all the traffic measurements T(b,1 . . . L) in the sequence for
device +b+ is adjusted to equal the sum of all the traffic measurements
T(a,1 . . . L) in the sequence for device +a+. This corresponds to
normalizing the sequence T(b,i) with respect to T(a,i). This automatically
compensates for differences in units of measure. It also automatically
compensates for linear functional differences between the activities that
may be measured on device +a+ and device +b+.
In detail, for i=1 . . . L:
T*(b,i)=T(b,i) Sum(a)/Sum(b)
T*(a,i)=T(a,i)
The similarity between T*(a,i) and T*(b,i) for the range of i=1 . . . L is
determined as follows. In other words, the probability that the two
observed sets of data are drawn from the same distribution function is
determined. The similarity can be established by a wide variety of
similarity measures. Any statistical measure or test of similarity between
two single measurements, between a time series of measurements or of the
distribution of values in two sets of measurements could be used. The
robustness and effectiveness of particular similarity measures will vary
with the network topology, the patterns of activity in the network and on
the forms of the measures. An incomplete list of such measures is least
squares, chi-squared test, Student's t-test of means, F-test on variance,
Kolmogorov-Smirnov test, entropy measures, regression analysis and the
many nonparametric statistical methods such as the Wilcoxon rank sum test.
Various forms of such measures are described in H. O. Lancaster: "The
Chi-Squared Distribution", Wiley, 1969, R. L. Scheaffer, J. T. McClave:
"Statistics for Engineers", Duxbury, 1982, and R. von Mises: "Mathematical
Theory of Probability and Statistics", Academic Press, 1964.
One of the most widely used and accepted forms of such similarity
comparison is the chi-squared method, and is suitable for discovering the
topology of many types of networks. So, by way of example using the
chi-squared measure:
To compute S(a,b)=chi-squared probability that the sequence for +b+
(T*(b,i), i=1 . . . L) is drawn from the same distribution as the sequence
as +a+ (T*(a,i), i=1 . . . L).
let:
Q=.SIGMA.[(T*(a,i)-T*(b,i)).sup.2 /(T*(a,i)+T*(b,i))]for i=1 . . . L 1
and let all L measurements in both T*(a,i) and T*(b,i) (for i=1 . . . L) be
nonzero; then we have L-1 degrees of freedom (because the two sequences
were sum normalized): giving, for this example:
S(a,b)=incomplete gamma function (Q. L-1)
(or the chi-squared probability function)
It should be noted that the similarity measure has been defined to increase
as the likelihood of the two devices being connected increases. This means
that a similarity measure such as least squares would be mapped by, for
example:
S(a,b)=.SIGMA.(T*(a,i)-T*(b,i)).sup.2
The incomplete gamma function used for chi-squared probability calculation
is described in, for example, H. O. Lancaster: "The Chi-Squared
Distribution", Wiley, 1969.
It should be noted that we are comparing two effectively binned data sets
so the denominator in equation 1 approximates the variance of the
difference of two normal quantities.
The method described above requires every device to be compared to every
other device twice, using the full sequence measured so far. This means
the computational complexity (for N devices, with L measurements for each
but assuming G=1) is:
complexity is proportional to: N.sup.2 L.
In practice some measurements of T*(a,i) or T*(b,i) may not be available or
considered corrupt. Let L* be the number of valid measures of T*(a,i) and
T*(b,i) that a and b share in the sequence i=1 . . . L. Then the
assessment of the probability will use (L*-1) degrees of freedom instead
of (L-1) degrees of freedom.
The following variations in design can improve the efficiency of the
method. The improvements will depend on the network, the devices in it,
the activities measured and their distributions with respect to time. The
variations can be used in a great variety of combinations.
(a) Curtail Search Once a Reasonable Fit has been Found.
Once a connection to device +a+ has been found that has a probability
greater than the cutoff, do not consider any other devices. This applies
to non-broadcast type connections.
(b) Do not Consider Devices Already Connected.
Devices that already have an acceptable connection found should not be
considered in further searches against other devices. This applies to
non-broadcast type connections.
(c) Curtail Comparison of Sequences Before L is Reached.
During the determination of the similarity of +a+ to +b+ should it already
be certain that the final estimate of this similarity be less than a
cutoff, discontinue this determination. This cutoff would either be the
best similarity already found for this device `a`, or the minimum. Not all
similarity measures are amenable to this curtailment.
(d) Examine Similar Devices First.
The order in which devices are compared to devices +a+ can be set so that
those devices with some attribute or attributes most similar to +a+ are
checked first. For example, in a TCP/IP data communications network one
might first consider devices which had IP addresses most similar to device
`a`.
(e) Restrict Search by Class.
In many networks devices can only connect to a subset of other devices,
based on the two classes of the devices. Therefore, should such class
exclusion or inclusion logic be available and should the classes of some
or all devices be known, the search for possible connections can be
restricted to those devices that may connect, excluding those that may
not.
The classes to which devices can connect can, for some devices (e.g.: data
communications routers), be extracted from the device itself.
(f) Use Fewer Measurements.
Should the method be operated with only a subset of the measurements,
complexity is reduced. Should an acceptable connection be found to an
device, it need not be considered with a larger number of measurements.
This subset of the sequence of measurements can be made such that the
subset is not sequential in the list of measurements, nor need its start
or end coincide with that of the original full set of measurements.
(g) Use Fewer Measurements to Start ith.
The variation of (f) could be used to create a short list of possible
connections to each device using a few measurements. Only devices on this
list will even be considered as candidates for connection to this device
using a large subset or the full set.
(h) Discovering the Network in Parts.
The network topology may be known to exist in portions. These portions may
each only have one or a few connections between them. The devices in each
portion can be assigned a particular class and devices only within the
same portion class considered for connection to each other. Each portion
of the network could then be connected to others by connections discovered
in a separate pass or discovered in another way (e.g. administratively) or
by other information. This variation in the method reduces the
computational complexity by reducing the effective N (number of devices)
to be compared to each other.
(i) Discovering the Network in Parts in parallel.
The method can be run simultaneously or serially on more than one system.
Each system can be responsible for discovering part of the network. The
parts could then be assembled together.
(i) Using a Multiprocessor System.
The method can be operated in parallel. Each of a number of processors
could be assigned a portion of the similarity calculations (e.g.:
processor A is given devices 1-10 to be compared to all other devices,
processor B is given devices 11-20 to be compared to all other devices and
so on).
(k) Using the Devices to Perform the Calculation for Themselves.
The devices themselves, should they be capable of such processing, could be
given the activity sequences of all devices or a subset of the devices.
Each device then assesses for itself the devices to which it is connected.
It would, as appropriate, report this to one or more sites for collection
of the network topology.
The subset of devices for which an device might restrict its search could
be generally those within a given class. Such a class might be defined by
being within a certain time of flight, or being with a certain subset of
labels.
The traffic sequences need not be time aligned and normalized other than by
the device itself (e.g.: it could take a copy of the activity measurements
as they are transmitted, perhaps restricting its collection of such
measurements to devices within a certain class).
(l) When L is the same for all sequences, the incomplete gamma function
need not be evaluated for comparisons of all devices B with respect to
each device A. Since the incomplete gamma function is monotonically
related to the value of Q (given fixed L), the device B with the lowest
value of Q will necessarily have the highest associated chi-squared
probability. Therefore the incomplete gamma function need only be computed
for the best fitting device to each device A.
(m) Should a probability cutoff be applied, such that a sufficiently
improbable connection will not be considered viable, this probability
cutoff can be reexpressed in terms of Q for each possible value of L.
this, coupled to method (1), further reduces the number of evaluations of
the incomplete gamma function.
Appropriate probability cutoffs for each L* can be precomputed once to give
appropriate Q cutoffs for each L*.
(n) The incomplete gamma function (Q,L*-1) is constant when Q=L*1.
Therefore a cutoff of probability independent of L* can be made by
rejecting all comparisons for which (Q/(L*-1))>1.
(o) Let Z=(Q/(L*-1)).
This ratio Z provides a useful approximate measure such that, for large
enough and close enough *(a,b) and L*(a,c):
if Z(a,b)<Z(a,c) then it is more probable that a is connected to b than
a is to c.
This technique allows for an approximate method that never evaluates the
incomplete gamma function, by selecting for consideration only sequences
which are both long enough (have enough data points) and are complete
enough (have enough valid data points).
(p) Summary of Computational Improvements.
The impact of the variations above can reduce the complexity enormously.
For example, in data communications networks the use of variations (a),
(b), (c) and (g) in combination has been observed to reduce the complexity
to be approximately linear in N (the number of network devices) and to be
invariant with L (the total number of measurements made on each device).
This was true both in a very broadcast oriented network and in a very
pair-wise connected network.
The application of the method to a particular problem of discovering the
topology of a particular class of data communications networks will now be
described. The mapping of the general theory onto this particular
application is performed primarily by replacing the general concepts of
devices and activity by devices and traffic respectively. However, this
particular data communication network is assumed to collect measurements
using polling.
There are three main steps to this embodiment of the invention: discovering
the devices in the network, collecting sequences of measurements of the
traffic from the devices and comparing these sequences to determine which
devices are connected together. This can be carried out by CPU 4 with
memory 5.
A particular class of data communications networks have the following
characteristics:
a: its measurements are requested by polling using inband signalling,
b: its measurements are returned using inband signalling,
c: polling is performed preferably every 60 seconds,
d: a single machine (e.g. CPU 4 with memory 5) operates the method for
determining the topology. This machine also performs the polling of the
devices 2 and receives the polling replies from the devices, and
e: all devices of interest in the network can have their traffic measured.
The existence and network addresses can be determined by the administrative
method described above, or by automated methods, such as described in U.S.
Pat. No. 5,185,860, referred to above.
In a successful prototype of the invention a time indication from 0 . . .
59 was randomly allocated to each device in the network. This time defined
how many seconds after the beginning of each minute the discovery machine
should wait before sending a device its request for the total traffic
measured so far. Of course, these requests are interleaved so that in a
large network many requests should be sent out each second. All devices
will therefore get a request every minute and this request (for a device)
will be sent out very nearly at one minute intervals. The reason the times
should be randomly allocated is to smooth out the load on the network,
since inband signalling was used.
Each device 2 on receipt of a poll should extract the value of the variable
requested from the traffic counter 3 (the total traffic since reset,
measured in packets) and should send this back preferably in an SNMP
format packet to the discovery machine. On receipt, the address of the
device 2, the time of arrival of this information is stored along with the
value of the counter, indexed for this device. The new value of the
counter is subtracted from the previous one in order to compute the total
traffic measured in the last minute, not the total since that device was
reset. In this way a sequence of traffic measurements for all the devices
in parallel is built up and stored in memory 5.
Before two traffic sequences (for device +a+ and device +b+) can be
compared, they are time aligned, functionally mapped and then normalized
as described earlier. The measurements from the second sequence (b) are
interpolated to align them with the times of the measurements in the first
sequence (a). Since the only function for mapping considered in this
example is the direct linear mapping, no functional mapping is performed
on any measurements.
For normalization, let the shorter of the two sequences have length L. The
sum of all the traffic measurements 1 . . . L in the sequence for device
+b+ is adjusted to equal the sum of all the traffic measurements 1 . . . L
in the sequence for device +a+. This corresponds to normalizing the
sequence T(b,i) with respect to T(a,i).
The chi-square probability comparison of the sequences computes the
similarity. S(a,b)=chi-squared probability that the traffic sequence for
+b+ (T*(b,i), i=1 . . . L) is drawn from the same distribution as the
traffic sequence for +a+ (T(a,i), i=1 . . . L).
The device +x+ with the highest value of S(a,x) is the one most probably
connected to +a+.
A probability cutoff (threshold) of a minimum value of F can be applied. If
the highest value of S(a,x) is less than this cutoff, that means that
device +a+ has no device considered to be connected to it after a certain
number of polls. A suitable such cutoff, for a network with N devices,
might be 0.01/N, given perhaps more than 10-15 measurements of traffic on
each device.
As indicated above, a number of the devices in the network may be connected
in broadcast mode: i.e. they may be apparently or actually connected to
more than one other device. The logic described above can therefore be
applied. For example, any device +a+ can be considered to be connected to
all devices z for which S(a,z) is greater than some cutoff.
A variety of similarity measures from the possible list described earlier
were experimentally tested. These tests were carried out on a simulated
network of 2000 devices and also on data collected from a real network,
which had over 1500 devices. The first was connected pairwise, and the
second network had a mixture of broadcast and pairwise connections.
The measure of similarity which required fewest average measurements to
produce the correct topologies was:
S(a,b)=.SIGMA.[(T*(a,i)-T*(b,i)).sup.2 /(T*(a,i).sup.2)]V*(a,i)/Li=1 . . .
L
This similarity measure was better than the chi-squared probability, likely
| | |