Description
FIELD OF THE INVENTION
The present invention relates generally to merging of a prepared image with
a video signal.
BACKGROUND OF THE INVENTION
Sports arenas typically include a game area where the game occurs, a
seating area where the spectators sit and a wall of some kind separating
the two areas. Typically, the wall is at least partially covered with
advertisements from the companies which sponsor the game. When the game is
filmed, the advertisements on the wall are filmed as part of the sports
arena. The advertisements cannot be presented to the public at large
unless they are filmed by the television cameras.
Systems are known which merge predefined advertisements onto surfaces in a
video of a sports arena. One system has an operator define a target
surface in the arena. The system then locks on the target surface and
merges a predetermined advertisement with the portion of the video stream
corresponding to the surface. When the camera ceases to look at the
surface, the system loses the target surface and the operator has to
indicate again which surface is to be utilized.
The above-described system operates in real-time. Other systems are known
which perform essentially the same operation but not in real-time.
Other systems for merging data onto a video sequence are known. These
include inserting an image between video scenes, superposition of image
data at a fixed location of the television frame (such as of television
station logos) and even electronic insertion of image data as a
"replacement" of a specific targeted billboard. The latter is performed
using techniques such as color keying.
U.S. Pat. No. 5,264,933 describes an apparatus and method of altering video
images to enable the addition of advertising images to be part of the
image originally displayed. The operator selects where in the captured
image the advertising image is to be implanted. The system of U.S. Pat.
No. 5,264,933 can also implant images, in selected main broadcasting
areas, in response to audio signals, such as typical expressions of
commentators.
PCT Application PCT/FR91/00296 describes a procedure and device for
modifying a zone in successive images. The images show a non-deformable
target zone which has register marks nearby. The system searches for the
register marks and uses them to determine the location of the zone. A
previously prepared image can then be superimposed on the zone. The
register marks are any easily identifiable marks (such as crosses or other
"graphemes") within or near the target zone. The system of PCT/FR91/00296
produces the captured image at many resolutions and utilizes the many
resolutions in its identification process.
SUMMARY OF THE PRESENT INVENTION
It is an object of the present invention to provide a system and method
which mix images, such as an advertisement, with a video stream of action
occurring within a relatively unchanging space. Such a space may be a
playing field or court, a stage or a room and the location is typically
selected prior to the action (e.g. game or show). The images are
"implanted" onto a selected surface of the background space, where the
term "implanted" herein means that the images are mixed onto the part of
the video stream showing the selected surface.
Specifically, the present invention utilizes a priori information regarding
the background space to change the video stream so as to include the image
at some location within the background space. The system and method
operate no matter which perspective view of the background space is
presented in the video stream.
In accordance with a preferred embodiment of the present invention, the
system preferably includes a video frame grabber and an image implantation
system. The frame grabber grabs a single frame of the video signal at one
time. The image implantation system typically implants the advertising
image into the frame onto a predefined portion of a preselected one of the
surfaces of the background space if the portion is shown in the frame. To
determine the location of the portion to receive the implantation, the
image implantation system includes a unit for receiving a) a flat model of
the fixed surfaces of the background space and b) an image mask indicating
the portion of the flat model onto which the image is to be mixed. Via the
model, the image implantation system identifies if and where the portion
is shown in the frame.
Moreover, in accordance with a preferred embodiment of the present
invention, the system also includes a design workstation on which the
image and an image mask which indicates the preselected surface can be
designed.
Further, the identification preferably involves a) reviewing the frame and
extracting features of the fixed surfaces therefrom and b) determining a
perspective transformation between the model and the extracted features.
Still further, the reviewing and extracting includes creating a background
mask and a foreground mask. The background mask indicates the locations of
features of interest and of background elements in the frame and is utilized
to extract desired features. The foreground mask is formed of the
foreground elements of the frame which must remain unchanged.
Additionally, in accordance with a preferred embodiment of the present
invention, the implantation includes the steps of a) transforming the
image, an image mask and, optionally, a blending mask, with the
perspective transformation, and b) mixing the transformed image, image
mask and optional blending mask with the frame and with the foreground
mask. The foreground mask, as mentioned hereinabove, indicates locations
of foreground data not to be covered by the transformed image.
Further, the system preferably includes a lookup table for converting
between the multiplicity of colors in the frame to one of: colors of
features of interest, colors of background elements and a color indicating
foreground elements. The lookup table is preferably created by having a
user indicate the relevant colors. If the relevant colors no longer
indicate the features of interest and the background elements (typically
due to lighting changes), the user can indicate new colors which do
indicate the desired elements and the lookup table is then corrected.
Still further, in accordance with a preferred embodiment of the present
invention, the lookup table is utilized to create the background and
foreground masks of the frame indicating the locations of features of
interest, of background elements and of foreground elements in the frame.
In accordance with an exemplary embodiment of the present invention, the
features are lines. In one embodiment, they are extracted with a Hough
transform. In another embodiment, they are extracted by determining the
angles of line segments. Pixels of interest are selected and a
neighborhood opened. The neighborhood is subdivided and the sector having
the greatest activity is selected. The selected sector is then extended
and divided. The process is repeated as necessary.
Moreover, in accordance with a preferred embodiment of the present
invention, the system projects the extracted features onto an asymptotic
function to determine which of the features are perspective versions of
parallel lines.
Further, in accordance with the exemplary embodiment of the present
invention, the background space is a sports arena having lines marked on
it. The system has a model of the sports arena and, preferably, has a list
of the rectangles in the model and the locations of their corner points.
The system preferably performs the following operations:
a) selects two vertical and two horizontal lines from the extracted
features and determines their intersection points;
b) generates a transformation matrix from the corner points of each
rectangle of the model to the feature intersection points;
c) transforms the model with each transformation matrix;
d) utilizing the background elements of the background mask, matches each
transformed model with the frame; and
e) selects the transformation matrix which matches the features of the
frame best.
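Step b) above can be sketched as follows. This is a minimal illustration only, assuming that the transformation is a 3x3 planar perspective matrix fixed by four point correspondences; the function names `solve_linear`, `perspective_matrix` and `apply_matrix` and the sample coordinates are hypothetical and are not taken from the described system.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_matrix(model_pts, frame_pts):
    """3x3 matrix mapping four model corner points onto four frame points."""
    a, b = [], []
    for (x, y), (u, v) in zip(model_pts, frame_pts):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_matrix(m, point):
    """Transform a model point into frame coordinates."""
    x, y = point
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


# Hypothetical model rectangle and feature intersection points.
model = [(0, 0), (1, 0), (1, 1), (0, 1)]
frame = [(10, 10), (110, 20), (90, 80), (20, 70)]
matrix = perspective_matrix(model, frame)
```

Steps c) through e) would then apply `apply_matrix` to the whole model for each candidate matrix and keep the matrix whose transformed model agrees best with the background mask.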
Moreover, in accordance with the exemplary embodiment of the present
invention, camera parameters can be utilized to reduce the number of lines
in the frame needed to identify the sports field. For this embodiment, the
following actions occur:
a) receiving or extracting the coordinates of a set of cameras;
b) representing a current transformation matrix as a product of coordinate,
tilt, turn and zoom matrices and then determining the values for the tilt,
turn and zoom;
c) identifying the camera having the calculated values for tilt, turn and
zoom and storing the information; and
d) repeating the steps of receiving, representing and identifying whenever
there is a new cut in the video.
Any frame in the video stream can now be treated as either being similar to
the previous frame or as part of a new cut taken by an identified camera.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully from
the following detailed description taken in conjunction with the drawings
in which:
FIG. 1 is a block diagram illustration of a system for implanting images
into a video stream, constructed and operative in accordance with a
preferred embodiment of the present invention;
FIG. 2 is a schematic illustration of a tennis game used as an example for
explaining the operation of the system of FIG. 1;
FIG. 3 is an illustration of a model of a tennis court, useful in
understanding the operation of the system of FIG. 1;
FIG. 4A is an illustration of an image to be implanted;
FIG. 4B is an illustration of an image region mask for the image of FIG. 4A
and the model of FIG. 3;
FIG. 4C is an illustration of a blending mask for the image of FIG. 4A and
the model of FIG. 3;
FIG. 5 is a block diagram illustration of elements of an image implantation
unit forming part of the system of FIG. 1;
FIG. 6 is an illustration of an exemplary video frame into which the image
of FIG. 4A is to be implanted;
FIG. 7 is an illustration of a background mask generated from the video
frame of FIG. 6;
FIG. 8 is a block diagram illustration of the operations of a feature
identification unit forming part of the image implantation unit of FIG. 5;
FIG. 9A is a flow chart illustration of a method of feature extraction;
FIG. 9B is an illustration of a portion of the background mask, useful in
understanding the method of FIG. 9A;
FIG. 9C is an illustration of a histogram of subsectors of the background
mask of FIG. 9B, useful in understanding the method of FIG. 9A;
FIG. 10 is a block diagram illustration of the operations of a perspective
identification unit forming part of the image implantation unit of FIG. 5;
FIG. 11A is an illustration of the meeting points of extracted features
from FIG. 7;
FIG. 11B is an illustration of perspective parallel lines meeting at
different points due to calculation inaccuracies;
FIGS. 12A and 12B are illustrations of gnomonic projections, useful in
understanding the operations of the perspective identification unit of
FIG. 10;
FIG. 12C is a graphical illustration of an exemplary function useful for
the gnomonic projections of FIGS. 12A and 12B;
FIG. 13 is a detailed block diagram illustration of the operations
illustrated in FIG. 10;
FIGS. 14A and 14B are useful in understanding the operations of FIG. 13;
FIG. 15 is an illustration of the use of transformation matrices;
FIG. 16 is an illustration useful in understanding the matching process
between quadrilaterals and the geometric model, useful in understanding
the operations of FIG. 13;
FIG. 17 is a block diagram illustration of the operations of transformer
and mixing units of the image implantation unit of FIG. 5;
FIG. 18 is a block diagram illustration of a correction method for updating
a lookup table used in the image implantation unit of FIG. 5;
FIG. 19 is a schematic illustration of camera parameters;
FIG. 20 is a flow chart illustration of transformation matrix operations
when the camera parameters of FIG. 19 are known or calculable;
FIG. 21 is an illustration of a table useful in the process shown in FIG.
20; and
FIG. 22 is a flow-chart illustration of a method of operation when the
camera parameters are known or calculable.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Reference is now made to FIG. 1 which illustrates a system for mixing
images, such as advertisements, with a video stream of action occurring
within a relatively unchanging background space. The images are implanted
onto a selected surface of the background space. The system will be
described in the context of a video of a tennis game, illustrated in FIG.
2 to which reference is also made. It will be understood that the present
invention is operative for all situations in which the surfaces in which
action occurs are known a priori and are identifiable.
The system of the present invention typically comprises a video frame
grabber 10 for converting an input video sequence (such as of a tennis
game) into video frames, a design workstation 12 for designing images
(such as advertisements) to be implanted onto a selected surface (such as
on the tennis court) seen within the video frame, an image implantation
system 14 for merging the video frame with the designed image, a control
computer system 16 for controlling the action of and providing operator
input to the image implantation system 14 and a broadcast monitor 18.
The control computer system 16 typically comprises a central processing
unit (CPU) 20, a keyboard 22, a mouse 24, a disk 26, a removable media
drive such as a floppy 27, and a monitor 28. The monitor 28 is typically
driven by a graphics adaptor forming part of the CPU 20. The design
workstation 12 typically also includes a removable media drive such as
floppy 27.
The control computer system 16 and the image implantation system 14
typically communicate via a system bus 29. The design workstation and the
control computer system 16 typically communicate via removable media.
The video sequence can be received from any source, such as a videotape, a
remote transmitting station via satellite, microwave or any other type of
video communication, etc. If the sequence is provided from a satellite,
the system has no control over the video rate. Therefore, the image
implantation system 14 must perform its operations within the video rate,
typically 30 ms between frames, of the satellite video stream. If the
sequence comes from a videotape, the system can control the video rate and
operate at any desired speed.
The video sequence is originally produced at the site of the game. As can
be seen in FIG. 2, for tennis games, there are typically two television
cameras 30 viewing the action on the tennis court 32. The locations of the
television cameras 30 typically are fixed.
The court 32 is divided into two halves by a net 34. Each half has a
plurality of areas 36, typically painted a first shade of green, divided
by a plurality of lines 38, typically painted white. The outer court area
40 is typically painted a second shade of green.
In reality, the lines 38 are parallel and perpendicular lines. Since the
cameras 30 zoom in on the action from an angle, rather than from above,
the images of the action which they receive are perspective views. Thus,
in the video output of the cameras 30, the parallel lines 38 appear as
though they converge at infinity. The angle of perspective of the video
output changes as the angles of the cameras 30 and the amount of zoom change.
The present invention will implant an image 42, such as the word "IMAGE",
at a desired location on a selected background surface, for all
perspective angles and amounts of zoom. For tennis courts, the possible
locations are any rectangles within one half of the tennis court 32
defined by four lines 38. As shown in FIG. 2, the image 42 will not
interfere with the action of players 44; it will appear as though the
image 42 was painted on the surface of the court.
Since, in reality, the shape of court 32 and the location of lines 38
within the court 32 do not change, if the image implantation system has a
model of the playing space, including the location in which the image is
to be implanted, and can identify at least the viewing angle and amount of
zoom, it can merge the image into the video sequence so that it will
appear as though the image was implanted at the desired location. To do
this, the image implantation system additionally needs to know the colors
of the court as seen by the cameras. These colors can change as the
lighting (daylight or artificial) changes.
Reference is now additionally made to FIG. 3 which illustrates a geometric
model 50 of the tennis court and to FIGS. 4A, 4B and 4C which illustrate
data which an implantation designer prepares.
The implantation designer works at the design workstation 12, such as the
BLAZE workstation manufactured by Scitex Corporation Ltd. of Herzlia,
Israel, and typically has the geometric model 50 of the tennis court 32,
typically as a top view. The model 50 is typically a scaled version of the
court 32, indicating the elements of it which are to be identified by the
implantation system 14, such as the lines 38. Other playing fields may
include circles or other well-defined curves. Other identifiable elements
include intersections 54 of the lines 38.
The implantation designer designs the image 42 (illustrated in FIG. 4A) to
be implanted and determines where on the model 50 to place it. A number of
possible locations 52 are shown in FIG. 3. The designer then prepares an
image location mask 56 (FIG. 4B) to identify where within the model 50 the
image 42 is to be placed. The mask 56 is light at the location in model 50
where the image 42 is to be placed and dark everywhere else.
Since the image 42 may be of bright colors, it may be desired not to
implant the image itself but a softened version of it, so as not to
significantly disturb the action on the court 32. Therefore, the
implantation designer may optionally prepare a blending mask 58 (FIG. 4C)
indicating how the image 42 is to be blended with the color of the court
32 at the location of implantation as indicated by location mask 56. The
blending mask 58 can be any suitable mask such as are known in the art. In
FIG. 4C, mask 58 is shown to have four areas 59, each indicating the
inclusion of a different amount of court color, where the outer area 59
typically incorporates much more of the court color than the inner areas.
Reference is now made back to FIGS. 1 and 2. The implantation data, formed
of the geometric model 50, the image 42, the image location mask 56 and
the optional blending mask 58, are typically prepared before the relevant
tennis match and are provided to the image implantation system 14,
typically via removable media, for implantation into the input video
sequence when the match occurs.
Most video sequences of live televised games begin with an initializing
sequence operative to enable local station operators to synchronize their
systems to the input sequence. This is also typically true for taped video
data.
In the present invention, the initializing video data is grabbed by the
frame grabber 10 and is provided first to the control computer system 16. A
station operator selects a frame which has a clear view of the game field
and uses it to provide calibration information, as described hereinbelow.
The calibration information is utilized by the image implantation system
14 to identify the court 32 and its features (such as lines 38). In the
embodiment described hereinbelow, the calibration information includes the
colors of the features of interest in the background, such as the field
lines, the playing field (court 32) and the ground outside the playing
field (outer court area 40). The remaining colors which may be received
are defined as foreground colors. Other playing fields may require fewer
or more features to define them and thus, fewer or more colors.
The station operator, utilizing the mouse 24 and keyboard 22, interactively
defines the calibration colors. This can be achieved in a number of ways,
one of which will be described herein. A four color layer is superimposed
over the frame currently displayed on control monitor 28. Initially, the
four color layer is comprised of one color only, a transparent color.
Thus, the current frame is initially visible.
The operator indicates pixels describing one of the three features, lines
38, inner playing field 36 and outer playing field 40. When he selects a
pixel, those pixels in the superimposed layer which correspond to pixels
in the current frame having the selected color are colored a single
translated color, thereby covering their corresponding pixels of the
current frame. The selected color is stored. The process is repeated for
all three areas. All colors not selected are assigned a fourth translated
color.
If the operator approves the resultant four color layer, a lookup table
(LUT) between the colors selected from the current frame and the
translated colors is produced.
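The interactive calibration described above can be sketched as follows. This is a minimal illustration only, assuming that colors are compared for exact equality and that the frame is a small grid of color names; the constants and the functions `select_color` and `finish_layer` are hypothetical.

```python
TRANSPARENT, LINE, INNER, OUTER, FOREGROUND = 0, 1, 2, 3, 4


def select_color(frame, layer, lut, row, col, translated):
    """Store the color under (row, col) and paint every matching pixel."""
    color = frame[row][col]
    lut[color] = translated
    for r, frame_row in enumerate(frame):
        for c, pixel in enumerate(frame_row):
            if pixel == color:
                layer[r][c] = translated


def finish_layer(layer):
    """Assign the fourth translated color to every unselected pixel."""
    for row in layer:
        for c, value in enumerate(row):
            if value == TRANSPARENT:
                row[c] = FOREGROUND


# Hypothetical 2x3 frame; the layer starts fully transparent.
frame = [["white", "green", "red"],
         ["green", "olive", "white"]]
layer = [[TRANSPARENT] * 3 for _ in frame]
lut = {}
select_color(frame, layer, lut, 0, 0, LINE)    # operator picks a line pixel
select_color(frame, layer, lut, 0, 1, INNER)   # inner playing field
select_color(frame, layer, lut, 1, 1, OUTER)   # outer playing field
finish_layer(layer)
```

A practical system would probably accept a range of colors around each selected pixel rather than a single exact value; that refinement is omitted here.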
If desired, the control computer system 16 can store the pixels which the
operator selected for later use in a LUT correction cycle, described
hereinbelow with reference to FIG. 18.
The control computer system 16 provides the frame data, consisting of the
LUT and the pixels utilized to produce the LUT, to the image implantation
system 14. System 14 utilizes the above described frame data to identify
the desired features in each frame of the input video signal.
Reference is now made to FIG. 5 which illustrates the general elements of
the image implantation system 14. Reference is also made to FIGS. 6 and 7
which are useful in understanding the operation of the system 14.
The system 14 typically comprises a feature identification unit 60 (FIG. 5)
for identifying which features of the court 32 are present in each input
video frame and a perspective identification unit 62 for identifying the
viewing angle and zoom of an active camera 30 and for determining an
appropriate perspective transformation between the model 50 and the input
video frame. The system 14 also comprises a transformer 64 for
transforming the implantation data from the model plane to the image
viewing plane and a mixer 66 for mixing the perspective implantation data
with the current video frame, thereby to implant the image 42 onto the
court 32.
As described in more detail hereinbelow, the feature identification unit 60
utilizes the LUT to create a background mask of the input frame indicating
which parts of the frame have possible background features of interest and
which parts are foreground and therefore, are not to be changed in later
operations. FIGS. 6 and 7 respectively provide an exemplary input frame 68
and its corresponding background mask 70.
The input frame 68 of FIG. 6 has two players 44 on the court 32. The
background mask 70 of FIG. 7 shows the areas of the four colors. The areas
marked 1-4 are the areas of line color, inner court color, outer court
color and remaining colors, respectively. It is noted that the areas of
the players 44 are marked with the background color 4 and cover over other
important areas, such as those of the white lines 1.
From the background mask 70, unit 60 (FIG. 5) extracts the features of the
playing field. For tennis courts, the features of interest are the lines
38. The perspective identification unit 62 compares the extracted features
with those of the model 50 and produces therefrom a transformation matrix.
Using the transformation matrix, the transformer 64 converts the image
implantation data (i.e. image 42 to be implanted, the image location mask
56 and the blending mask 58) to the perspective of the input video frame.
Finally, using the transformed image location mask 56 and the background
mask 70, the mixer 66 implants the perspective version of image 42 into
the desired background parts of the input video frame. Thus, if the
players walk on the part of the court 32 where the image 42 is implanted,
they will appear to walk "over" the implanted image. If desired, the
transformed blending mask 58 can be utilized to blend the image 42 with
the colors of the field on which the image 42 is implanted.
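The mixing rule described above can be sketched per pixel as follows. This is a minimal illustration only, assuming single-channel pixel values, masks already transformed to the frame perspective, and a blending mask holding the fraction of the implanted image to use at each pixel; the function name `mix` and the constant `FOREGROUND` are hypothetical.

```python
FOREGROUND = 4  # assumed background-mask value for foreground pixels


def mix(frame, background_mask, image, location_mask, blend=None):
    """Implant image pixels only where the location mask is set and the
    background mask does not mark the pixel as foreground."""
    out = [row[:] for row in frame]
    for r, row in enumerate(frame):
        for c, pixel in enumerate(row):
            if not location_mask[r][c]:
                continue  # outside the implantation area
            if background_mask[r][c] == FOREGROUND:
                continue  # a player or other foreground element stays on top
            alpha = 1.0 if blend is None else blend[r][c]
            out[r][c] = round(alpha * image[r][c] + (1 - alpha) * pixel)
    return out


frame = [[10, 10, 10]]
background = [[2, 4, 2]]       # the middle pixel is foreground
image = [[200, 200, 200]]
location = [[1, 1, 0]]         # the last pixel is outside the implant area
plain = mix(frame, background, image, location)
soft = mix(frame, background, image, location, blend=[[0.5, 0.5, 0.5]])
```

Because foreground pixels are skipped, a player standing on the implantation area keeps his original pixels and so appears to stand over the implanted image.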
Reference is now made to FIG. 8 which details the operations of the feature
identification unit 60. In step 72, unit 60 uses the LUT to convert the
input video frame from a many colored frame to the four color picture
called the background mask 70. Specifically, for the tennis court 32, the
LUT provides a first value to pixels having colors of the lines 38, a second
value to pixels having colors of the inner court 36, a third value to
pixels having colors of the outer court 40 and a fourth value (indicating
foreground pixels) to the remaining pixels. This is shown in FIG. 7. The
LUT can be implemented in any suitable one of the many methods known in
the art.
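Step 72 can be sketched as a per-pixel table lookup. This is a minimal illustration only, assuming the LUT is a dictionary keyed by exact color and that any color missing from it is foreground; the function name `make_background_mask` and the sample colors are hypothetical.

```python
LINE, INNER, OUTER, FOREGROUND = 1, 2, 3, 4


def make_background_mask(frame, lut):
    """Convert a many-colored frame into the four-valued background mask."""
    return [[lut.get(pixel, FOREGROUND) for pixel in row] for row in frame]


# Hypothetical LUT produced during calibration, keyed by (r, g, b).
lut = {(255, 255, 255): LINE, (0, 128, 0): INNER, (0, 90, 0): OUTER}
frame = [[(0, 90, 0), (255, 255, 255), (0, 128, 0)],
         [(0, 90, 0), (200, 30, 30), (0, 128, 0)]]
mask = make_background_mask(frame, lut)
```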
The background mask 70 not only defines which pixels belong to the
background of interest, it also includes in it the features of interest,
such as lines 38. Thus, in step 74, the feature identification unit 60
processes background mask 70 to extract the features of interest.
Typically though not necessarily, the LUT is designed to provide the
features with a single color value.
For the example of a tennis match, the extraction involves reviewing those
pixels of the background mask 70 having the first value and extracting
straight segments therefrom. For example, step 74 can be implemented with
a Hough transform operating on the background mask 70. Hough transforms
are described on pages 121-126 of the book Digital Picture Processing,
Second Edition, Vol. 2 by Azriel Rosenfeld and Avinash C. Kak, Academic
Press, 1982, which book is incorporated herein by reference.
The result is an array of line parameters, each describing one straight
segment in the background mask 70. The line parameters for each segment
include the coefficients of the line equations describing it as well as a
weight value indicating the number of pixels included within the segment.
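A Hough-style extraction producing such an array can be sketched as follows. This is a minimal illustration only, not the method of the cited book; it assumes lines are written as a*x + b*y = c with (a, b) = (cos theta, sin theta), and that the weight is the number of voting pixels. The function name `hough_lines` and its parameters are hypothetical.

```python
import math


def hough_lines(mask, line_value=1, n_theta=180, min_votes=5):
    """Return (a, b, c, weight) tuples for lines a*x + b*y = c."""
    votes = {}
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value != line_value:
                continue
            # Each line-colored pixel votes for every line through it.
            for t in range(n_theta):
                theta = math.pi * t / n_theta
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(t, rho)] = votes.get((t, rho), 0) + 1
    lines = []
    for (t, rho), weight in votes.items():
        if weight >= min_votes:
            theta = math.pi * t / n_theta
            lines.append((math.cos(theta), math.sin(theta), rho, weight))
    return sorted(lines, key=lambda line: line[3], reverse=True)


# Hypothetical 8x8 mask with one horizontal line on row 3.
mask = [[0] * 8 for _ in range(8)]
mask[3] = [1] * 8
lines = hough_lines(mask)
```

Neighboring accumulator cells can report the same physical line more than once, so a practical implementation would also merge near-duplicate peaks.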
An alternative method of extraction is illustrated in FIGS. 9A, 9B and 9C
to which reference is now briefly made. As shown generally in FIG. 9A, the
method begins at a first pixel 69 (FIG. 9B) of the background mask 70
having the color of interest (in this example, white) and looks in its
neighborhood 75 to determine where there are more white pixels (marked by
shading). To do so, it divides the neighborhood 75 into subsectors 71-74
of a predetermined size and computes a histogram of the distribution of white
pixels in each subsector. FIG. 9C illustrates the histogram for the
sectors 71-74 of FIG. 9B. The one with a strong maximum (subsector 73)
is selected as the next sector for searching.
In the next step, a new neighborhood 78 is defined which consists of the
selected subsector 73 and an extension thereof. The entire neighborhood 78
is twice as long as the neighborhood 75. This new neighborhood 78 is
subdivided into four subsectors 76 and the process repeated.
This process continues until one of the following criteria is met:
1. the sub-sector is narrow enough to be deemed as a straight line;
2. no strong maximum is obtained in the histogram.
If condition 1 is obtained, the coefficients of the straight line are
stored and the pixels forming the straight line are then "colored" to have
the "remaining color" and so eliminated from the search.
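The subdivide-and-extend search can be sketched as follows. This is a loose illustration only: it assumes the subsectors are angular wedges around the starting pixel, that the neighborhood is doubled in radius at each pass, and that a "strong maximum" means the best wedge holds at least twice as many pixels as any other. The names `sector_histogram` and `trace_direction` and all thresholds are hypothetical.

```python
import math


def sector_histogram(mask, seed, radius, start, end, value=1, n=4):
    """Count pixels of the given value in n angular subsectors of [start, end)."""
    sy, sx = seed
    counts = [0] * n
    width = (end - start) / n
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v != value or (y, x) == seed:
                continue
            dy, dx = y - sy, x - sx
            if math.hypot(dx, dy) > radius:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            if start <= angle < end:
                counts[min(int((angle - start) / width), n - 1)] += 1
    return counts


def trace_direction(mask, seed, radius=4, min_width=math.radians(6),
                    dominance=2.0):
    """Return the direction (radians) of a straight run from seed, or None."""
    start, end = 0.0, 2 * math.pi
    while end - start > min_width:
        counts = sector_histogram(mask, seed, radius, start, end)
        best = max(range(len(counts)), key=counts.__getitem__)
        others = max(c for i, c in enumerate(counts) if i != best)
        if counts[best] == 0 or counts[best] < dominance * others:
            return None  # criterion 2: no strong maximum
        width = (end - start) / len(counts)
        start, end = start + best * width, start + (best + 1) * width
        radius *= 2      # extend the selected subsector
    return (start + end) / 2  # criterion 1: sector narrow enough


# Hypothetical mask holding a line of slope 1/2 starting at (0, 0).
mask = [[0] * 10 for _ in range(5)]
for i in range(5):
    mask[i][2 * i] = 1
angle = trace_direction(mask, (0, 0))

# A uniformly filled mask has no dominant subsector.
blob = [[1] * 5 for _ in range(5)]
no_line = trace_direction(blob, (2, 2))
```

The returned angle is only as precise as the final wedge width; the line coefficients would be fitted from the pixels inside that wedge before they are recolored.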
The feature extraction process produces an array of possible features which
includes the true features as well as stray lines.
Reference is now made to FIG. 10 which illustrates, in general, the
operations of the perspective identification unit 62 of FIG. 5. Reference
is also made to FIGS. 11A and 11B which are useful in understanding the
operation of unit 62 in general, to FIG. 13 which details the operations
of unit 62 for the example of the tennis court 32 and to FIGS. 12A, 12B,
12C, 14A and 14B which are useful in understanding the operations detailed
in FIG. 13.
Using a priori information, unit 62, in step 80, processes the array of
possible features and determines which ones are most likely to be the
features of interest. In step 82, unit 62 selects a minimum set of
features from the resultant true features and attempts to match them to
features of the model 50. The process is repeated as often as necessary
until a match is found. In step 84, the matched features are utilized to
generate a transformation matrix M transforming the model to the features
in the input video frame.
In the example of the tennis court 32, step 80 utilizes the fact that the
lines 38 of model 50 are parallel in two directions (vertical and
horizontal) and that in perspective views (such as in the input video
frame), lines which are parallel in reality meet at a finite point. This
is illustrated in FIG. 11A in which all the extracted line segments,
represented by solid lines, are extended by dashed lines. The perspective
lines which correspond to parallel lines in reality (e.g. pseudo parallel
lines 90) intersect at a point 91 far from the outer edges 92 of the
frame. All other intersections, labeled 94, occur within the edges 92 or
close to its borders.
However, as illustrated in FIG. 11B, because of digitization errors, it
might be determined that the extensions of three pseudo parallel lines do
not meet at a single point. In fact, they might meet at three widely
separated points 96.
Applicants have realized that, since perspective parallel lines do meet at
infinity, the projection of the extracted lines onto an asymptotic
function will cause the intersection points to occur close together.
Therefore, in accordance with a preferred embodiment of the present
invention, the extracted line segments are projected onto a
two-dimensional asymptotic function. One such projection is known as a
"Gnomonic Projection" and is described on pages 258, 259 and 275 of the
book Robot Vision by Berthold Klaus Paul Horn, The MIT Press, Cambridge,
Mass., 1986, which pages are incorporated herein by reference. Examples of
gnomonic projections are illustrated in FIGS. 12A and 12B.
In the gnomonic projection, a point 100 on an XY plane 102 is projected
onto a point 100' on a hemisphere 104. A line 106 in the XY plane is
projected onto a great arc 106' of the hemisphere 104 (i.e. an arc of a
great circle of a sphere). The origin is represented by the south pole 109
and infinity is represented by the equator 108. Thus, any cluster 110
(FIG. 12B) of points near the equator 108 represents the intersection of
pseudo parallel lines and thus, the lines which have points going through
a cluster 110 are parallel lines.
FIG. 12B illustrates a plurality of great arcs, labeled 120a-120f,
corresponding to some arbitrary extracted line segments (not shown). The
three arcs 120a-120c have intersection points 122 which form a cluster
110a near the equator 108. Great arcs 120d-120f also intersect near the
equator, but at cluster 110b. All of the great arcs intersect each other,
but their other intersections are at locations closer to the south pole
109 than to the equator 108.
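The projection and the intersection of great arcs can be sketched with unit vectors. This is a minimal illustration only, assuming a unit sphere whose center lies one unit above the plane origin, lines written as a*x + b*y = c, and a Z coordinate measured as height above the south pole (0 at the pole, 1 at the equator). The function names are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def great_circle(line):
    """Unit normal of the great circle onto which a*x + b*y = c projects."""
    a, b, c = line
    return normalize((a, b, -c))


def arc_intersection(line1, line2):
    """Point where the two great arcs meet, as a unit direction vector."""
    p = normalize(cross(great_circle(line1), great_circle(line2)))
    return p if p[2] >= 0 else tuple(-c for c in p)


def height(point):
    """Z as height above the south pole: 0 at the pole, 1 at the equator."""
    return 1.0 - point[2]


at_origin = height(arc_intersection((1, 0, 0), (0, 1, 0)))   # x=0 and y=0
at_one = height(arc_intersection((1, 0, 1), (0, 1, 0)))      # x=1 and y=0
parallel = height(arc_intersection((0, 1, 0), (0, 1, 5)))    # y=0 and y=5
```

Truly parallel lines land exactly on the equator, while lines that are only nearly parallel land close to it. Image coordinates would need to be scaled to a suitable unit first, since raw pixel coordinates in the hundreds push almost every intersection toward the equator.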
In step 130 (FIG. 13), the gnomonic projection is utilized to produce an
array of great arcs from the array of straight line segments produced from
the feature extraction (step 74, FIG. 8).
In step 132, the area around equator 108 is searched to find all of the
intersection points 122. A value $V_k$ is given to each intersection
point. The value $V_k$ is a function of the weights $W_i$ of the line
segments which intersect and the Z coordinate of the intersection point
122. An example of a function $V_k$ is provided in equation 1:
$$V_k = W_{\text{line 1}} \cdot W_{\text{line 2}} \cdot f(Z_{\text{intersection point}}) \qquad (1)$$
where $f(Z_{\text{intersection point}})$ is any function having a curve similar to
curve 134 of FIG. 12C wherein most points receive a low value and only
those points approaching the equator 108 (Z=1) receive values close to 1.
For example, $f(Z_{\text{intersection point}})$ might be $Z^5$.
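Equation 1 with the example choice of the fifth power of Z can be evaluated as follows. This is a minimal illustration only; the function names `f` and `intersection_value` and the sample weights are hypothetical.

```python
def f(z, power=5):
    """Example weighting: near 0 for most Z, close to 1 only near Z = 1."""
    return z ** power


def intersection_value(weight_1, weight_2, z):
    """Equation 1: product of the two line weights and f(Z)."""
    return weight_1 * weight_2 * f(z)


near_equator = intersection_value(10, 20, 1.0)
mid_sphere = intersection_value(10, 20, 0.5)
```

With these sample weights, an intersection on the equator scores 200 while one at Z = 0.5 scores 6.25, which shows how strongly the function favors intersections of near-parallel lines.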
In step 136, a small neighborhood around each intersection point 122 is
searched for other intersection points. If any are found, the present
intersection point and the ones found are stored as a cluster 110 (FIG.
12B). A cluster 110 is also defined as one whose value of
$f(Z_{\text{intersection point}})$ is above a predefined threshold. Thus, a
cluster 110 can include only one intersection point. In FIG. 12B there are
three clusters 110a-110c, one of which, cluster 110c, includes only one
intersection point 122.
Once all of the points have been searched, a location of each cluster 110
is determined by finding the "center of gravity" of the points in the
cluster. The weight of the cluster 110 is the sum of the values $V_k$ of
the points in the cluster.
In step 138, the two clusters with the highest weights are selected. For
the example of FIG. 12B, clusters 110a and 110b are selected.
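Steps 136 and 138 can be sketched as follows. This is a minimal illustration only, assuming each intersection point is supplied as a (position, value, f_z) tuple, that the "center of gravity" is the unweighted mean position, and that the neighborhood is a fixed Euclidean radius. The names `find_clusters` and `two_heaviest` and the sample numbers are hypothetical.

```python
import math


def find_clusters(points, radius=0.1, threshold=0.5):
    """Group nearby intersection points; return (center, weight) pairs."""
    remaining = list(points)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        near = [p for p in remaining if math.dist(seed[0], p[0]) <= radius]
        remaining = [p for p in remaining
                     if math.dist(seed[0], p[0]) > radius]
        members = [seed] + near
        # A lone point counts as a cluster only if its f(Z) is high enough.
        if len(members) == 1 and seed[2] <= threshold:
            continue
        center = tuple(sum(p[0][k] for p in members) / len(members)
                       for k in range(3))
        weight = sum(p[1] for p in members)
        clusters.append((center, weight))
    return clusters


def two_heaviest(clusters):
    """Step 138: keep the two clusters with the highest weights."""
    return sorted(clusters, key=lambda c: c[1], reverse=True)[:2]


points = [((1.0, 0.0, 0.0), 5.0, 0.9),
          ((0.99, 0.05, 0.0), 4.0, 0.9),   # close to the first point
          ((0.0, 1.0, 0.0), 3.0, 0.95),    # alone, but above the threshold
          ((0.5, 0.5, 0.7), 1.0, 0.1)]     # alone and below the threshold
selected = two_heaviest(find_clusters(points))
```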
In step 140, one cluster is assumed to represent "vertical" lines and the
other to represent "horizontal" lines. Also, in step 140, the straight
segments corresponding to the lines of the two selected clusters are
marked "vertical" or "horizontal", respectively.
In step 142, the "vertical" and "horizontal" lines are reviewed and the two
heaviest vertical and two heaviest horizontal lines are selected, where
"heaviest" is determined by the values of $W_i$. The selected lines,
labeled 146, are shown in FIG. 14A for the lines of FIG. 11A. In step 144
the intersection points, labeled A, B, C and D, of the four selected lines
are determined and stored. As shown in FIG. 14A, the selected lines may
intersect out of the frame.
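Steps 142 and 144 can be sketched as follows. This is a minimal illustration only, assuming each line is stored as (a, b, c, weight) with the equation a*x + b*y = c; the function names `intersect` and `corner_points` and the sample lines are hypothetical.

```python
def intersect(line_1, line_2):
    """Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2 (Cramer's rule)."""
    a1, b1, c1 = line_1[:3]
    a2, b2, c2 = line_2[:3]
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel in the frame")
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def corner_points(vertical, horizontal):
    """Pick the two heaviest lines of each kind and intersect them."""
    by_weight = lambda line: line[3]
    v = sorted(vertical, key=by_weight, reverse=True)[:2]
    h = sorted(horizontal, key=by_weight, reverse=True)[:2]
    return [intersect(v_line, h_line) for h_line in h for v_line in v]


vertical = [(1, 0, 0, 50), (1, 0, 10, 40), (1, 0, 5, 10)]     # x = 0, 10, 5
horizontal = [(0, 1, 0, 30), (0, 1, 20, 60), (0, 1, 7, 5)]    # y = 0, 20, 7
corners = corner_points(vertical, horizontal)
```

Nothing restricts the resulting points to the frame boundaries, which is consistent with the observation that the selected lines may intersect outside the frame.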
Steps 130-144 are the operations needed to identify the true features in
the video frame (step 80 of FIG. 10). The output of step 144 is the set of
features which are to be matched to the model. The remaining steps match
the features to the model and determine the transformation (steps 82 and
84 of FIG. 10) as an integrated set of operations.
A standard tennis court has five vertical lines and four horizontal lines.
Since it is not possible to differentiate between the two halves of the
court, only three horizontal lines are important. The number of different
quadrilaterals that can be formed from a selection of two horizontal lines
out of three (three possible combinations) and two verticals out of five
(10 possible combinations) is thirty. The thirty quadrilaterals may be in
four different orientations for a total of 120 rectangles.
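The count of candidate rectangles can be checked by enumeration. This is a minimal illustration only; the line indices and the representation of an orientation as an integer from 0 to 3 are hypothetical.

```python
from itertools import combinations


def candidate_rectangles(n_horizontal=3, n_vertical=5, n_orientations=4):
    """Enumerate (horizontal pair, vertical pair, orientation) candidates."""
    quads = [(h, v)
             for h in combinations(range(n_horizontal), 2)
             for v in combinations(range(n_vertical), 2)]
    return quads, [(h, v, o) for h, v in quads for o in range(n_orientations)]


quads, candidates = candidate_rectangles()
```

Three horizontal pairs times ten vertical pairs gives the thirty quadrilaterals, and four orientations of each gives the 120 candidates examined in the matching steps.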
In step 150, one of the 120 rectangles in the geometric model 50 is
selected by selecting its four corners, labeled A', B', C' and D' (FIG.
14B). As can be seen, this is not the correct match.
In step 152, the matrix M, which transforms from the four points A', B', C'
and D' of the model (FIG. 14B) to the four points