Claims
We claim:
1. A system for computer vision based upon an image sensor that maps an
image to memory cells in association with a computer comprising:
an object orientation and position patch for being attached to an object to
be observed comprising a planar substantially nonreflective surface, at
least three substantially coplanar and non-collinear reflective locations
(a, b, and c) positioned upon the patch and a reflector (d) having the
reflective properties of the surface of a regular curved surface
intersecting the planar surface,
an image sensor comprising means for generating a two-dimensional digital
representation of the image stored in an array of memory cells arranged in
a plurality of rows and columns, each cell having an associated row and
column address and being accessible for reading,
a lens for focusing an image of the object including the orientation and
position patch upon the image sensor, said lens defining an optical axis
extending through the center of the lens and intersecting the image sensor
near the center thereof,
a light source fixed relative to the sensor and lens and being directed
outwardly therefrom;
a computer having a processor and main memory, and means for reading the
row and column addresses of the memory cells corresponding to the images
(a', b', and c') of reflective locations a, b, and c and an image (dls) of
the light source reflected from the reflector d, and
said computer having a task stored in main memory for detecting and
quantifying a change in orientation and position of the object, said task
detecting changes in the position along the optical axis and in directions
perpendicular thereto by computing changes in the row and column addresses
of the memory cells corresponding to the images a', b', and c', and said
task detecting change in rotation about axes perpendicular to the optical
axis by computing differences in the row and column addresses of the
memory cells corresponding to the image dls and the images a', b', and/or
c'.
2. The system for computer vision according to claim 1 wherein the task
stored in main memory for detecting and quantifying a change in
orientation and position detects changes in the position along the optical
axis by computing the average of the changes in distance between the
addresses of images a', b', and c' and interpreting a decrease in the
average as a movement away from the lens.
3. The system for computer vision according to claim 1 wherein the task
stored in main memory for detecting and quantifying a change in
orientation and position detects changes in the positions perpendicular to
the optical axis by computing the average of the changes in the row or
column addresses of the images a', b', and c'.
4. The system for computer vision according to claim 1 wherein the task
stored in main memory for detecting and quantifying a change in
orientation and position detects changes in the rotation of the object
about axes perpendicular to the optical axis by computing differences in
the row or column addresses of at least one of the images a', b', c' and
the row or column address of image dls.
5. The system for computer vision according to claim 1 wherein the task
stored in main memory for detecting and quantifying a change in
orientation and position detects changes in the rotation about an axis
parallel to the optical axis by observing the change in row or column
addresses of at least one of the images a', b' and c' and interpreting the
direction of movements of said images as tangent to concentric circles
about the axis of rotation.
6. The system for computer vision according to claims 1 to 5 wherein the
image sensor comprises a solid-state image sensor having a two-dimensional
array of light sensitive memory cells.
7. The system according to claim 1 having multiple light sources fixed
relative to the sensor and lens and means for switching only one source on
at one time.
8. The system according to claim 1 having multiple image sensors fixed
relative to the light source and means for switching only one sensor on at
one time.
9. A body pointer based upon an image sensor used in association with a
computer comprising
an orientation and position patch for being attached to the portion of the
body used for pointing comprising a planar substantially nonreflective
surface, at least three substantially coplanar and nonaligned reflective
locations (a, b, and c) positioned upon the patch and a reflector (d)
having the reflective properties of the surface of a regular curved
surface mounted upon the planar surface,
an image sensor comprising means for generating a two-dimensional
representation of the image stored in an array of memory cells arranged in
a plurality of rows and columns, each cell having an associated row and
column address and being accessible for reading,
a lens for focusing an image of the orientation and position patch upon the
image sensor, said lens defining an optical axis extending through the
center of the lens and intersecting the image sensor near the center
thereof,
a light source fixed relative to the sensor and lens and being directed
outwardly therefrom,
a computer having a processor and main memory,
means for transferring to the computer the row and column addresses of the
memory cells corresponding to the images (a', b', and c') of reflective
locations a, b, and c and an image (dls) of the light source reflected
from the surface reflector d to the computer,
said computer having a first task stored in main memory for calibrating the
body pointer for an initial orientation and position of the pointer by
recording the row and column addresses of the images a', b', c' and dls,
said computer having a second task stored in main memory for detecting and
quantifying a change in orientation and position of the patch, said task
detecting changes in the position along the optical axis and in directions
perpendicular thereto by computing changes in the row and column addresses
of the image a', b', and c', and the task detecting the change in rotation
about axes perpendicular to the optical axis by computing differences in
the addresses of the image dls and images a', b', and/or c'.
10. The body pointer according to claim 9 wherein the task stored in main
memory for detecting and quantifying a change in orientation and position
detects changes in the position along the optical axis by computing the
average of the changes in distance between the addresses of images a', b',
and c' and interpreting a decrease in the average as a movement away from
the lens.
11. The body pointer according to claim 9 wherein the task stored in main
memory for detecting and quantifying a change in orientation and position
detects changes in the position perpendicular to the optical axis by
computing the average of the changes in the row or column addresses of the
images a', b', and c'.
12. The body pointer according to claim 9 wherein the task stored in main
memory for detecting and quantifying a change in orientation and position
detects changes in the rotation of the patch about axes perpendicular to
the optical axis by computing differences in the row or column addresses
of at least one of images a', b', c' and image dls.
13. The body pointer according to claim 12 wherein the task stored in main
memory for detecting and quantifying a change in orientation and position
detects changes in the rotation of the patch about an axis parallel to the
optical axis by observing the change in row or column addresses of at
least one of images a', b', and c' and interpreting the directions of
movements of said images as tangent to concentric circles about the axis
of rotation.
14. The body pointer according to claims 7 to wherein the image sensor
comprises a solid-state image sensor having a two-dimensional array of
light sensitive memory cells.
15. The body pointer according to claim 9 having multiple light sources
fixed relative to the sensor and lens and means for switching only one
source on at one time.
16. The body pointer according to claim 9 having multiple image sensors
fixed relative to the light source and means for switching only one sensor
on at one time.
17. An eyetracker based upon an image sensor used in association with a
computer comprising
a display bearing indicia which may be selected with the user's eyes,
a head orientation and position patch for being attached to the face near
an eye comprising a planar substantially nonreflective surface, at least
three substantially coplanar and nonaligned reflective locations (a, b,
and c) positioned upon the patch and a reflector (d) having the reflective
properties of a regular curved surface mounted upon the planar surface,
an image sensor comprising means for generating a two-dimensional
representation of the image stored in an array of memory cells arranged in
a plurality of rows and columns, each cell having an associated row and
column address and being accessible for reading,
a lens for focusing an image of the eye and the orientation and position
patch upon the image sensor, said lens defining an optical axis extending
through the center of the lens and intersecting the image sensor near the
center thereof,
a light source fixed relative to the sensor and lens and being directed
outwardly therefrom,
a computer having a processor and main memory,
means for transferring the row and column addresses of the memory cells
corresponding to the images (a', b', and c') of reflective locations a, b,
and c and an image (dls) of the light source reflected from the spherical
surface reflector d and the row and column addresses of the image (cls) of
the light source reflected from the cornea of the eye to the computer,
said computer having a first task stored in main memory for calibrating the
eyetracker for an initial head orientation and position by recording the
row and column addresses of the images a', b', c', dls and cls when the
user gazes sequentially at at least two of the indicia upon the display while
the head is substantially immobile,
said computer having a second task stored in main memory for detecting and
quantifying a change in orientation and position of the head, said task
detecting changes in the position along the optical axis and in directions
perpendicular thereto by computing changes in the row and column addresses
of the images a', b', and c', and the task detecting the change in
rotation about the axes perpendicular to the optical axis by computing
differences in the addresses of the image dls and the images a', b', and
c', and
said computer having a third task stored in main memory for correlating the
row and column address of the corneal reflection image cls with indicia
upon the display taking into consideration the instantaneous head position
and orientation as determined by reference to the initial head orientation
and position determined by the first task and the change in head position
and orientation determined by the second task.
18. The eyetracker according to claim 17 wherein the task stored in main
memory for detecting and quantifying a change in orientation and position
detects changes in the position along the optical axis by computing the
average of the changes in addresses between the images a', b', and c' upon
the image sensor and interpreting a decrease in the average as a movement
away from the lens.
19. The eyetracker according to claim 17 wherein the task stored in main
memory for detecting and quantifying a change in orientation and position
detects changes in the positions perpendicular to the optical axis by
computing the average of the changes in the row or column addresses of the
images a', b', and c'.
20. The eyetracker according to claim 17 wherein the task stored in main
memory for detecting and quantifying a change in orientation and position
detects changes in the rotation of the head about axes perpendicular to
the optical axis by computing differences in the row or column addresses
of at least one of images a', b', c' and image dls.
21. The eyetracker according to claim 17 wherein the task stored in main
memory for detecting and quantifying a change in orientation and position
detects changes in the rotation of the head about an axis parallel to the
optical axis by observing the change in row or column addresses of at
least one of images a', b', and c' and interpreting the directions of
movements of said images as tangent to concentric circles about the axis
of rotation.
22. The eyetracker according to claims 17 to 21 wherein the image sensor
comprises a solid-state image sensor having a two-dimensional array of
light sensitive memory cells.
23. The eyetracker according to claim 17 having multiple light sources
fixed relative to the sensor and lens and means for switching only one
source on at one time.
24. The eyetracker according to claim 17 having multiple image sensors
fixed relative to the light source and means for switching only one image
sensor on at one time.
Description
BACKGROUND OF THE INVENTION
Computer vision systems are systems that analyze camera generated images
and enable useful conclusions to be reached from that analysis. Such
systems often digitize an analog image to produce a table of
digital data. Thereafter, a digital computer processes the digital data,
for example, to locate edges, recognize shapes, or to verify spatial
relationships. Eyetrackers are a splendid example of computer vision
systems. An eyetracker is a device that measures the movements of the eye
upon which the camera is trained. Specifically, an eyetracker detects the
direction an eye is gazing. An eyetracker communication system allows a
physically handicapped person to use eye gaze and movements to communicate.
Examples of computer vision systems particularly useful as an eyetracker
are set forth in U.S. Pat. No. 4,648,052 entitled "Eyetracker
Communication System" and allowed U.S. patent application Ser. No.
06/897,497 entitled "Computer Vision System Based Upon Solid-State Image
Sensor." Both of these patent documents disclose an eyetracker wherein the
direction of eye gaze is determined by comparison of the location of the
center of the image of the corneal reflection and the location of the
center of the image of the pupil. While many of the concepts and
techniques disclosed in those patent documents are used in the various
embodiments of the invention disclosed herein, a drawback in those systems
has been overcome. A simple technique is provided for determining the
orientation of an object the image of which has been generated by the
computer vision system. This technique has application to body pointing in
general and robot pickup systems. It has resulted in a substantially
improved eyetracker in which it is no longer necessary to locate the image
of the pupil.
An eyetracker (eye gaze sensing system) or almost any other computer vision
system is improved if the field of view and depth of view of the image are
enlarged. In the case of an eyetracker, the larger the field of view, the
easier it is to position the device such that the user's eye can be
viewed. Maximizing the depth of the field of view is particularly
important since it allows the user to be at various positions along the
optical axis. One technique for improving the depth of the field is to use
an automatic focusing lens, for example, a lens that will mechanically
move in order to focus the image of objects at various depths along the
optical axis. A mechanical focus is often undesirable. It adds to the
expense of the system, adds to the physical size, decreases reliability
due to mechanical wear and tear, increases the audible noise generated by
the system and slows the eye gaze sensing process during the time when
focusing is taking place because the image is blurred at that time. An
alternative to an automatic mechanical focusing lens system is to use a
fixed focus lens system; that is, a lens focusing an image of an object at
only one depth. The depth of field inherent in the fixed focus system is a
function of the f number of the lens. The higher the f number, the smaller
the diameter of the lens. Also, the higher the f number, the less the
capacity to gather light and the larger the depth of field. To maximize
the depth of field, the f number must be maximized but this results in
less light being gathered by the lens and focused on the image. Hence, to
maximize the depth of field by increasing the f number, it is essential to
increase the brightness of the image features to be analyzed.
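The trade-off described above can be illustrated with a minimal sketch. It assumes standard thin-lens relations that are not stated in this document: aperture diameter D = f/N, image illuminance proportional to 1/N^2, and the common approximation DOF = 2*N*c*u^2/f^2, which holds only when the subject distance u is large compared with the focal length f and well inside the hyperfocal distance. The function names, the blur circle c, and the numeric values are illustrative assumptions.

```python
def aperture_diameter(focal_length, f_number):
    """Aperture diameter D = f / N (definition of the f number)."""
    return focal_length / f_number


def relative_light(f_number, reference_f_number):
    """Light gathered relative to a reference f number; scales as 1 / N^2."""
    return (reference_f_number / f_number) ** 2


def approx_depth_of_field(subject_distance, f_number, focal_length, blur_circle):
    """Approximate depth of field, 2 * N * c * u^2 / f^2 (assumed approximation).

    All lengths must use the same unit. blur_circle is the largest blur
    diameter still treated as "in focus" (an assumed parameter).
    """
    return 2.0 * f_number * blur_circle * subject_distance ** 2 / focal_length ** 2


if __name__ == "__main__":
    # Hypothetical 25 mm lens viewing a subject 500 mm away.
    for n in (2.0, 4.0, 8.0):
        print(
            n,
            aperture_diameter(25.0, n),
            relative_light(n, 2.0),
            approx_depth_of_field(500.0, n, 25.0, 0.02),
        )
```

Under these assumptions, doubling the f number doubles the approximate depth of field but cuts the gathered light to one quarter, which is why brighter image features are needed.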
Previous eye gaze sensing techniques have used the position of the corneal
reflection relative to the center of the pupil to determine eye gaze
location. Because the pupil is very dark, lenses with a low f number have
had to be used resulting in an eye gaze system with a small depth of
field. To alleviate this problem, the applicants have developed an eye
gaze sensing technique that utilizes only bright reflections. The image of
the pupil is ignored, thus enabling the use of a lens with a high f number
resulting in a large depth of field.
A feature of the applicants' computer vision system is a simple technique
for determining the three-dimensional (3D) position and orientation of a
plane in the field of view. In the case of an eyetracker according to this
invention, it is used for determining change in the user's head position.
One prior technique used for determining the position and orientation of an
arbitrary plane surface of the object is to view the object with two
cameras. Corresponding features of the two images are then matched and a
triangulation function is used to determine the distance of the feature
points from the cameras. See IEEE PAMI article "Error Analysis in Stereo
Determination of 3-D Point Positions", Nov. 1987 and the extensive
bibliography of stereo camera literature. Three feature points on a plane
serve to uniquely specify the orientation of the plane surface.
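As a rough illustration of the two-camera approach, the sketch below assumes an idealized, rectified pair of parallel cameras, for which depth follows the textbook relation Z = f*B/d (focal length in pixels, baseline B, disparity d in pixels). It then shows how three non-collinear feature points fix the orientation of a plane through the cross product of two edge vectors. The function names and example numbers are assumptions and are not taken from the cited article.

```python
def stereo_depth(focal_length_px, baseline, disparity_px):
    """Depth of a matched feature for rectified parallel cameras: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline / disparity_px


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three non-collinear 3D points."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # Cross product u x v is perpendicular to both edges of the triangle.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = sum(c * c for c in n) ** 0.5
    if length == 0:
        raise ValueError("points are collinear; the plane is not determined")
    return tuple(c / length for c in n)


if __name__ == "__main__":
    print(stereo_depth(800.0, 0.1, 20.0))
    print(plane_normal((0, 0, 0), (1, 0, 0), (0, 1, 0)))
```

The collinearity check reflects why the feature points must not lie on one line: a zero-length cross product leaves the plane orientation undetermined.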
Two cameras increase the cost of the system. Therefore, it is desirable to
determine the 3D position and orientation of a surface given only one
view. This is possible if certain a priori information about the image
geometry and behavior of the object to be viewed is available. The image
processing literature presents algorithms for extracting 3D information
from single perspective views. See, for example, IEEE PAMI article "New
Methods for Matching 3-D Objects With Single Perspective Views", May 1987
and the article bibliography. These algorithms are generally concerned
with looking at real world scenes and are very computationally intensive.
Some of the algorithms use geometric models generated from a CAD data base
in order to recognize and locate the corresponding parts in 3D with the
machine vision system. These also tend to be very computationally
intensive.
According to this invention, a unique reflective patch is attached to the
surface whose orientation is to be determined. This results in a simple
and inexpensive way to detect 3D orientation and position. In addition,
the use of the reflective patch as described herein would reduce the
computational complexity of algorithms using two views.
SUMMARY OF THE INVENTION
Briefly, according to this invention, there is provided a system for
computer vision based upon an image sensor that maps an image to memory
cells in association with a computer. A unique object orientation and
position patch is attached to an object to be observed. The position patch
comprises a planar substantially nonreflective surface, at least three
substantially coplanar and non-collinear reflective locations (a, b, and
c) positioned upon the patch and a reflector (d) having the reflective
properties of a regular curved surface. The simplest example of a regular
curved surface is a spherical surface. The image sensor comprises means
for generating a two-dimensional digital representation of the image
stored in an array of memory cells arranged in a plurality of rows and
columns. The cells each have an associated row and column address and the
information at each cell is available for reading. A lens focuses an image
of the object including the orientation and position patch upon the image
sensor. The lens defines an optical axis extending through the center of
the lens and intersecting the image sensor near the center thereof. A
light source is fixed relative to the sensor and lens and is directed
outwardly therefrom. The system further comprises a computer having a
processor and main memory and the capability for reading the row and
column addresses of the memory cells corresponding to the images (a', b',
and c') of regions a, b, and c and an image (dls) of the light source
reflected from the reflector d having the reflective properties of a
regular curved surface. The computer has a task stored in main memory for
detecting and quantifying a change in orientation and position of the
object and, more particularly, a change in orientation and position of the
surface upon which the orientation and position patch is secured. The task
detects changes in the position along the optical axis and in directions
perpendicular thereto by computing changes in the row and column addresses
of images a', b', and c'. The task detects the change in rotation about
axes perpendicular to the optical axis by computing differences in the row
and column addresses of the image dls and the images a', b', and/or c'.
Preferably, the system for computer vision has a task stored in main memory
for detecting and quantifying changes in the position of the object and
position patch along the optical axis by computing the average of the
changes in distance between the addresses of images a', b', and c' as
represented by the differences in row and column addresses upon the image
sensor and interpreting a decrease in the average as a movement away from
the lens. Preferably, according to this invention, the task stored in main
memory detects changes in the positions perpendicular to the optical axis
by computing the average of the changes in row or column
addresses of the images a', b', and c'. Preferably, according to this
invention, the task stored in main memory detects changes in the rotation
of the object about axes perpendicular to the optical axis by computing
differences in the row or column addresses of at least one of the images
a', b', c', and the row or column address of the image dls. Preferably,
the system for computer vision has a task stored in main memory for
detecting and quantifying a change in the rotation about an axis parallel
to the optical axis by observing the change in row or column addresses of
at least one of the images a', b', c' and interpreting the directions of
movements of said images as tangent to concentric circles about the axis
of rotation.
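The computations summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: image points are (row, column) address pairs, the reference pose is whatever was recorded first, all results are in address units (no conversion to physical distance or angle is attempted), and the names pose_change, centroid, and mean_spacing are hypothetical. Rotation about an axis parallel to the optical axis is estimated here from the bearing of image a' about the centroid, which is one possible reading of the concentric-circle interpretation.

```python
import math
from itertools import combinations


def centroid(points):
    """Average (row, column) address of a list of image points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def mean_spacing(points):
    """Average distance, in address units, between every pair of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def pose_change(ref_spots, ref_dls, cur_spots, cur_dls):
    """Compare current images (a', b', c', dls) with a reference set."""
    ref_c, cur_c = centroid(ref_spots), centroid(cur_spots)

    # Along the optical axis: a smaller average spacing means farther away.
    spacing_change = mean_spacing(cur_spots) - mean_spacing(ref_spots)

    # Perpendicular to the optical axis: average change in row and column.
    shift = (cur_c[0] - ref_c[0], cur_c[1] - ref_c[1])

    # Rotation about axes perpendicular to the optical axis: change in the
    # address difference between dls and the spot images.
    ref_off = (ref_dls[0] - ref_c[0], ref_dls[1] - ref_c[1])
    cur_off = (cur_dls[0] - cur_c[0], cur_dls[1] - cur_c[1])
    tilt = (cur_off[0] - ref_off[0], cur_off[1] - ref_off[1])

    # Rotation about an axis parallel to the optical axis (assumed method):
    # change in bearing of image a' about the centroid of the spot images.
    def bearing(spots, c):
        return math.atan2(spots[0][0] - c[0], spots[0][1] - c[1])

    roll = bearing(cur_spots, cur_c) - bearing(ref_spots, ref_c)
    roll = math.atan2(math.sin(roll), math.cos(roll))  # wrap into [-pi, pi]

    return {
        "spacing_change": spacing_change,
        "moved_away": spacing_change < 0,
        "shift": shift,
        "tilt": tilt,
        "roll": roll,
    }


if __name__ == "__main__":
    ref = [(10.0, 10.0), (10.0, 30.0), (30.0, 10.0)]
    cur = [(15.0, 15.0), (15.0, 25.0), (25.0, 15.0)]
    print(pose_change(ref, (20.0, 20.0), cur, (20.0, 20.0)))
```

Averaging over all three spot images, rather than tracking a single one, is what gives the redundancy against noise in any one address.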
According to a preferred embodiment of this invention, the system for
computer vision comprises a solid-state image sensor having a
two-dimensional array of light sensitive memory cells upon which the lens
focuses the image. The computer vision system described so far may
comprise the heart of any number of specific purpose computer vision
systems, for example, a body pointing system, a robot pickup system, or an
eyetracker. A body pointing system comprises a system that recognizes the
orientation of a body within the field of view of the computer vision
system and may, for example, comprise a system for detecting head
orientation, finger orientation, or the orientation of an inanimate part
upon an assembly line. A robot pickup system is a system for picking up
parts comprising a robot arm and a computer vision system as disclosed
herein. The computer vision system is used to determine the orientation of
the part to be picked up and the robot arm is then controlled to approach
the part from a selected direction.
An eyetracker according to this invention comprises a computer vision
system as already described. Additionally, it comprises a display bearing
indicia which may be selected with the user's eyes. The display is
typically mounted to surround the image sensor. The unique head
orientation and position patch is attached to the face of the user near an
eye. The computer has a first task stored in main memory for calibrating
the eyetracker for an initial head orientation and position. The user
looks sequentially toward at least two of the indicia upon the display
while the head is substantially immobile. At each position the row and
column addresses of the images a', b', c', dls and cls are recorded. The
image cls is the reflection of the light source from the cornea of the eye
of the user to the image sensor. The computer has a second task stored in
main memory for detecting and quantifying a change in head orientation and
position as aforedescribed for the basic computer vision system. The
computer has a third task stored in main memory for correlating the row
and column addresses of the corneal reflection image cls with indicia upon
the display taking into consideration the head position and orientation as
determined by the first task and the change in head position and
orientation determined by the second task.
DESCRIPTION OF THE DRAWINGS
Further features and other objects and advantages will become clear from
the following detailed description made with reference to the drawings in
which:
FIG. 1 is a front view of a "keyboard" used in an eyetracker embodiment of
this invention,
FIG. 2 is a front view of an orientation and position patch according to
one embodiment of this invention,
FIG. 3 is a perspective view of a user using an eyetracker according to
this invention,
FIG. 4 is a diagram for explaining the coordinate system used to explain
the use of the orientation and position patch,
FIG. 5 is a diagram for understanding the elements of the image of the
orientation and position patch upon the image sensor,
FIGS. 6, 7 and 8 are schematic diagrams for explaining the hardware that
captures the information on the image sensor for transfer to the computer,
and
FIG. 9 is a flow diagram for explaining the software used for interpreting the
image upon the image sensor.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring now to FIG. 1, there is shown the front view of the board of an
eyetracker communication system. The entire system has been implemented to
fit an 8 by 12 by 3 inch case 10 and to weigh less than five pounds. The
front of the case 11, referred to as the "board" or "keyboard", has
mounted therein a lens 12 and a plurality of infrared lights 13 near the
lens. The lights are directed outward from the keyboard. An image sensor,
for example, a CCD solid-state image sensor, is positioned behind the lens. A
set-up light 14 is mounted at the top edge of the keyboard to direct a
narrow visible beam of light perpendicular to the keyboard. The keyboard
can then be easily positioned relative to the user by locating the beam
spot above an eye on the forehead of the user.
The keyboard is provided with eight eye gaze positions (15a to 15h) with
indicia of up to eight characters or functions located at each position.
An acknowledge light may also be located at each position. In one mode of
operation, a character or function may be established by the double gaze
technique. In other words, the user gazes first at one of the positions
containing the desired character or function. Then, the user gazes at the
position on the keyboard which corresponds to the position of the indicia
desired relative to the first position. For example, to select the
character "A" the user would first gaze at the position 15a in the upper
left-hand corner, and then gaze again at position 15b. As a further
example, to select the character "G" the user would first gaze at the
position 15a in the upper left-hand corner and then at the position 15h in
the lower right-hand corner. The acknowledge lights simply indicate the
location of the gaze. When the gaze has been held long enough to make a
selection a tone is emitted. This length of time, called "dwell time", can
be varied by the user to be short (1/2 second) or long (4 seconds).
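The dwell-time and double gaze logic described above can be sketched as follows. This is an illustrative reading only. It assumes gaze samples arrive as (time in seconds, position) pairs in time order, that a held gaze produces a single selection until the gaze moves, and that the mapping from a pair of gazes to a character is supplied as a lookup table. The table below holds only the two examples given in the text; the full keyboard layout is not reproduced here, and all function and variable names are hypothetical.

```python
def dwell_selections(samples, dwell_time):
    """Yield a position once the gaze has rested on it for dwell_time seconds.

    samples: iterable of (timestamp_seconds, position); position is None
    when no gaze position is detected.
    """
    current, start, fired = None, None, False
    for t, pos in samples:
        if pos != current:
            # Gaze moved to a new position: restart the dwell timer.
            current, start, fired = pos, t, False
        elif pos is not None and not fired and t - start >= dwell_time:
            fired = True  # one selection per continuous gaze (assumption)
            yield pos


def double_gaze(selections, table):
    """Pair consecutive selections and look each pair up in the table."""
    it = iter(selections)
    for first in it:
        second = next(it, None)
        if second is None:
            break  # incomplete pair; wait for the second gaze
        yield table.get((first, second))


# Only the two pairs stated in the text; everything else is unknown here.
EXAMPLE_TABLE = {("15a", "15b"): "A", ("15a", "15h"): "G"}

if __name__ == "__main__":
    samples = [
        (0.0, "15a"), (0.25, "15a"), (0.5, "15a"),
        (0.75, "15b"), (1.0, "15b"), (1.25, "15b"), (1.5, "15b"),
    ]
    picks = list(dwell_selections(samples, dwell_time=0.5))
    print(picks, list(double_gaze(picks, EXAMPLE_TABLE)))
```

Passing dwell_time as a parameter mirrors the user-adjustable range of 1/2 second to 4 seconds mentioned in the text.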
A forty character display 16 is provided along the bottom of the keyboard
to display characters that have been selected and words and sentences as
they are assembled. The words or sentences may be dispatched to a printer,
for example, with a carriage return. The possibilities are unlimited. The
eyetyper may actually have associated word processing software for the
case where the user desires to prepare a substantial amount of text. In
the more usual and essential case of making possible the most basic of
communication, the sentence may be dispatched to a speech synthesizer to
actually speak the message.
In another mode of operation, the vision system is preprogrammed with short
messages associated with each eye gaze position. These may be selected by
a single gaze.
The image of the eye formed by the lens 12 upon the image sensor is
analyzed to determine the direction of gaze. The specific means of
detecting which of the eight eye gaze positions is being looked upon is a
variation of the corneal reflection technique. Basically, the system
detects the position of the corneal reflection relative to the position of
the head. The position of the corneal reflection can be correlated to the
position being gazed upon.
In order to determine the 3D orientation and position of a user's head
(generalizing to a plane surface) with a single view and with an
inexpensive 8 bit microprocessor, a reflective patch 20 is attached to the
user's face with double stick tape. The physical design of the patch is
shown in FIG. 2. There are four spots (0.1" squares) 21, 22, 23 and 24 of
retroreflective material (3M type 8610, for example) positioned in the
corners of the patch. In the center of the patch is a 0.5" diameter
spherical plastic bubble 25 whose radius of curvature is 7/16". When the
user is in front of the eyetyper, light from the infrared light emitting
diodes beneath the camera lens is reflected from the front surface of the
plastic bubble back to the camera lens. A small, bright reflection from
the surface of the spherical plastic bubble 25 results from the fact that
the bubble acts as a spherical reflector. As the patch is rotated or
translated left or right the reflection moves left or right. As the patch
is rotated or translated up or down the reflection moves up or down. The
rotation effect is much larger than the translation effect. It should be
emphasized that the reflective patch is a passive reflector as opposed to
the active transponders/transducers used in other multi degree of freedom
orientation sensing systems. In the general case, the technique being
described requires only three coplanar spots. Any one of reflectors 21, 22,
23, 24 could be eliminated. The spherical bubble could be replaced by a flat
Fresnel lens, a conventional lens, a curved mirror (concave or convex), or
reflective material physically elevated above the plane. With current
optical technology, the curved optical elements may be replaced with their
flat holographic equivalent. Obviously multiple reflective devices can be
attached to any object to extend the range of motion an object may undergo
and still be sensed from a given viewpoint. A spherical reflector, as
described in the preferred embodiment, can minimize the computational
complexity of head orientation compensation for eye gaze sensing purposes.
Other curved surfaces may be used. Preferably, the curved surface is
symmetrical with respect to two perpendicular planes the intersection of
which defines a line perpendicular to the patch. The specific physical
separation between the reflective spots and, therefore, the size of the
device can be varied for different application specific reasons (e.g.
different lens-camera arrangements). Rotational ambiguities as a result of
severe rotations of the reflective device can be dealt with by having each
of the reflective spots uniquely identifiable (e.g. different sizes,
shapes, colors, etc.). In the general case, the coplanar spots need not be
discrete. It is only required that a planar region whose center can be
distinguished and whose perimeter is distinct be imaged. The preferred
embodiment uses four retroreflective coplanar spots and a plastic bubble
to generate a fifth elevated point. This is done to simplify fabrication
of the reflective patch, to ease the image processing requirements, and to
provide redundancy in the orientation computations for noise immunity
purposes.
FIG. 3 shows the relation of the board 10, the user and the patch 20 when
the computer vision system is being used as an eyetyper. The patch is worn
by the user just below one eye. Double stick tape may be used to hold the
patch in place.
Referring to FIG. 4, the coordinate system used throughout this discussion
has its origin at the focal point F of the lens that forms an image of the
eye and patch. The image plane is parallel to the xy-plane at distance f
(the focal length) from the origin along the z axis. So, as the user sits
in front of the eyetyper and moves left and right he moves along the x
axis; as the user moves up and down he moves along the y axis; as the user
moves in and out he moves along the z axis.
The head rotating left and right (rotation about the y axis) is referred to
as x rotation because this rotation direction predominantly affects the
horizontal eye gaze selection location (x location). Rotation up and down
(about the x axis) will be referred to as y rotation because this rotation
direction predominantly affects the vertical eye gaze selection (y
location). Rotation about the z axis will be referred to as z rotation.
It is important to recognize, for eye gaze sensing purposes, that the human
physiology makes it much easier for a person to rotate their head left and
right (x rotation) than to move in any other direction. This makes head x
rotation changes the major effect that must be compensated for when
attempting to determine the location of a person's gaze.
For the computation of head orientation from a single view of the
reflective patch, the 5 spots as shown in FIG. 5 will be referred to as
tlr (top left region), trr (top right region), blr (bottom left region),
brr (bottom right region) and dr (dome region). The x position of a spot
is referred to by an ".x" following the designation of the region, for
example, "tlr.x". Likewise, "tlr.y" refers to the y position of
region tlr. Also shown on FIG. 5 is the corneal reflection, "cr", which is
not an image caused by the patch and will be explained hereafter. The
image processing hardware and software extracts the 5 regions from a
single camera image and computes the x and y positions of the center of
each spot.
Determining the x and y translation of the patch is very straightforward.
One can simply compute the change in the average of the x and y centers of
the four corner regions.
##EQU1##
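As an illustration only, the averaging described above might be sketched as follows. The names Spot, patch_center and translation are hypothetical and do not appear in the disclosure, and the pixel values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    x: float  # x center of the region, in pixels
    y: float  # y center of the region, in pixels


def patch_center(tlr, trr, blr, brr):
    """Average the x and y centers of the four corner regions."""
    cx = (tlr.x + trr.x + blr.x + brr.x) / 4.0
    cy = (tlr.y + trr.y + blr.y + brr.y) / 4.0
    return cx, cy


def translation(prev, curr):
    """Change in the averaged center between two frames (curr minus prev)."""
    px, py = patch_center(*prev)
    cx, cy = patch_center(*curr)
    return cx - px, cy - py


# Made-up spot centers, ordered (tlr, trr, blr, brr), for two frames.
prev = (Spot(100, 200), Spot(140, 200), Spot(100, 160), Spot(140, 160))
curr = (Spot(103, 198), Spot(143, 198), Spot(103, 158), Spot(143, 158))
dx, dy = translation(prev, curr)
print(dx, dy)
```

Averaging all four corners, rather than tracking a single spot, means that noise in any one spot center is divided by four in the translation estimate.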
If there are no rotations, the z position (distance of the user from the
camera lens) can be computed by using one of four differences: y
difference between the y centers of the trr and brr regions; y difference
of tlr and blr regions; x difference of blr and brr regions; and x
difference of tlr and trr regions.
z position=(tlr.y-blr.y) or (trr.y-brr.y) or (tlr.x-trr.x) or (blr.x-brr.x)
As the user moves in and out these differences will all expand and contract
proportionately. Because 4 coplanar points are used instead of the minimum
required 3, there is redundancy: the two x distances should be identical
and the two y distances should be identical. This can be used to increase
immunity to noise in the imaging system.
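One possible way to exploit this redundancy, offered as a sketch rather than as the method of the preferred embodiment, is to average each pair of differences. The function names are hypothetical. The conversion from a span to a relative distance assumes a simple pinhole model in which image size varies inversely with distance, and it assumes a reference span recorded at a known position; neither assumption is stated in this form in the text.

```python
from collections import namedtuple

Spot = namedtuple("Spot", ["x", "y"])


def span_measures(tlr, trr, blr, brr):
    """Average the two x differences and the two y differences.

    Absolute values are used so the result does not depend on which
    direction the image axes increase (an assumption of this sketch).
    """
    x_span = (abs(tlr.x - trr.x) + abs(blr.x - brr.x)) / 2.0
    y_span = (abs(tlr.y - blr.y) + abs(trr.y - brr.y)) / 2.0
    return x_span, y_span


def relative_distance(span, reference_span):
    """Pinhole assumption: image size scales as 1/z, so z/z_ref = ref/span."""
    return reference_span / span


# Made-up spot centers in pixels.
x_span, y_span = span_measures(
    Spot(100, 200), Spot(140, 200), Spot(100, 160), Spot(140, 160)
)
# With x rotation present, the text suggests relying on the y span instead.
print(relative_distance(y_span, reference_span=50.0))
```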
If there is x rotation then one should use the y differences as a measure
of z position. If there is y rotation then one should use the x
differences as a measure of z position. For many rotations it is possible
to measure them and compensate for the inaccuracies they introduce into
the z position measure. In the preferred embodiment, if there are large x
and y rotations then it is difficult to extract the z position
information. In cases of large rotations of the orientation and position
patch or of the cornea of the eye in the case of an eyetracker, the
reflected image of the light source may move off of the image sensor. To
compensate for this loss of the image of the light source, one or more
secondary geometrically displaced light sources (13a and 13b in FIG. 1)
may be turned on under computer control. If the image from one of the
secondary light sources falls upon the image sensor, large x and y
rotations may be compensated. Another manner of dealing with such large
rotations would be to provide multiple spaced image sensors.
In the general case, where there may be a rotationally invariant reflective
perimeter defining the primary plane (an annulus around the non-coplanar
reflector), this restriction is not as severe. Fortunately, in the
application domain of the preferred embodiment, large rotations are
infrequent because the human head usually does not move significantly.
Since the reflection off of the plastic bubble moves left and right as the
patch is rotated, it can be used to measure x rotation (head rotating left
and right). Taking the ratio of (brr.x-dr.x) and (dr.x-blr.x) or the ratio
of (trr.x-dr.x) and (dr.x-tlr.x) can give a z position invariant measure
of x rotation since the differences will expand and contract in proportion
as the z position changes.
##EQU2##
These differences must be computed relative to the origin of the patch,
which may be taken at blr, in order to make the measure invariant to z
rotation. X translation also affects the position of the reflection off of
the plastic bubble but the rotation effect greatly dominates the
translation effect. Since it is very easy to know the translational
position of the patch, and the imaging geometry is known (position of
light source that causes the reflection), it is very easy to compensate
for the translational effects on the measure of x rotation.
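The ratio described above, and its invariance to z position, can be illustrated with a short sketch. The function name x_rotation_ratio and the pixel values are hypothetical. The sketch does not re-express the coordinates relative to the patch origin and does not apply the translation compensation mentioned above, so it applies only to a patch with no z rotation.

```python
def x_rotation_ratio(left_x, right_x, dome_x):
    """Ratio of (right.x - dr.x) to (dr.x - left.x), e.g. brr, blr and dr."""
    return (right_x - dome_x) / (dome_x - left_x)


# Made-up x centers: bottom-left spot, bottom-right spot, dome reflection.
near = x_rotation_ratio(left_x=100.0, right_x=140.0, dome_x=125.0)

# The same pose imaged at twice the size (user closer to the lens):
# both differences double, so the ratio is unchanged.
closer = x_rotation_ratio(left_x=200.0, right_x=280.0, dome_x=250.0)

# Dome reflection centered between the spots gives a ratio of 1.
centered = x_rotation_ratio(left_x=100.0, right_x=140.0, dome_x=120.0)

print(near, closer, centered)
```

The y rotation measure has the same form, using the y centers and the top and bottom spots in place of the left and right ones.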
The y rotation measure (head rotating up and down, generally, about the
axis defined by a line going straight from one ear to the other through
the head) is a direct corollary of the x rotation measure. Since the
reflection off of the plastic bubble moves up and down as the patch is
rotated, it can be used to measure y rotation. Taking the ratio of
(dr.y-trr.y) and (brr.y-dr.y) or the ratio of (dr.y-tlr.y) and
(blr.y-dr.y) can give a z position invariant measure of y rotation since
the differences will expand and contract in tandem as the z position
changes.
##EQU3##
These differences must be computed relative to the origin of the patch
(bottom left region) in order to make the measure invariant to z rotation.
Y translation also affects the position of the reflection off of the
plastic bubble but the rotation effect greatly dominates the translation
effect. Since it is very easy to know the translational position of the
patch, and the imaging geometry is known (position of light source that
causes the reflection), it is very easy to compensate for the
translational effects on the measure of y rotation.
The z rotation or head "tilt" can be measured by using x differences or y
differences. These differences must be z scaled, i.e., they must be
normalized to a common z position reference since a change in z position
will make the differences larger or smaller. This z position normalization
can be performed by multiplying the x differences by the amount that they
expand or contract relative to the "normal" reference and multiplying the
y differences by the amount that they expand or contract relative to the
"normal" reference.
##EQU4##
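The z scaling step might be sketched as follows. This is an assumption-laden illustration and not a reconstruction of the equation above: it takes the x difference between tlr and blr as the tilt measure, and it estimates the expansion or contraction from the length of the left edge of the patch compared with a reference length recorded at the "normal" z position. The names z_scale, tilt_measure and reference_side are hypothetical.

```python
import math


def z_scale(tlr, blr, reference_side):
    """Factor that rescales image differences to the reference z position.

    Assumption of this sketch: the length of the tlr-blr edge is used to
    judge how much the image has expanded or contracted.
    """
    side = math.hypot(tlr[0] - blr[0], tlr[1] - blr[1])
    return reference_side / side


def tilt_measure(tlr, blr, reference_side):
    """z-scaled x difference between the top-left and bottom-left spots."""
    return (tlr[0] - blr[0]) * z_scale(tlr, blr, reference_side)


# Made-up (x, y) spot centers in pixels; reference edge length is 40.
upright = tilt_measure((100.0, 200.0), (100.0, 160.0), reference_side=40.0)
tilted = tilt_measure((124.0, 192.0), (100.0, 160.0), reference_side=40.0)
# Same tilt imaged at twice the size: the scaled measure is unchanged.
tilted_closer = tilt_measure((148.0, 224.0), (100.0, 160.0), reference_side=40.0)
print(upright, tilted, tilted_closer)
```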
Most modern, automated, eye gaze sensing methods include some type of image
sensor upon which an eye is focussed, a light source that illuminates the
eye, and a computer which processes the information from the image sensor
in order to determine where the eye is gazing. Referring to FIG. 6, the
system disclosed herein has been constructed and uses a 16 mm camera lens
30, a 298,000 pixel solid-state CCD (charge coupled device) image sensor
31, infrared light emitting diodes, and an inexpensive 8 bit
microprocessor. The analog video output of the CCD camera is digitized by
an analog-to-digital converter 32 whose output is connected to a digital
comparator 33. The comparator generates a signal called "A>B" when the
digital value representing an individual pixel is greater than the value
stored in an 8 bit register called the threshold register 34. This single
level thresholding effectively converts a multiple gray level analog image
into a binary image, i.e., an image with only 2 levels (black or white).
This threshold register is programmable by the software and is adjusted
dynamically in order to obtain the best separation of the reflective patch
and corneal reflection from the user's face. An alternative to dynamic
thresholding of cornea and patch is to vary the integration times of
image sensor regions (frame, line, or pixel based). In the preferred
embodiment, dynamic single level thresholding results in the device being
much more robust to widely varying lighting conditions. While the
preferred embodiment uses binary thresholding for image economy, gray scale
image processing may be used to extract the key features from suboptimal
or noisy images.
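In the disclosed system the thresholding is performed in hardware by the comparator 33. Purely as a software illustration of the same "A>B" test, and of finding the center of a bright region in the resulting binary image, a minimal sketch might look like this. The function names and the tiny test frame are hypothetical, and the dynamic adjustment of the threshold is not shown.

```python
def threshold_image(pixels, threshold):
    """Return 1 where a pixel value exceeds the threshold (A>B), else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in pixels]


def centroid(binary):
    """Mean x (column) and y (row) of the white pixels, or None if none."""
    count = sum_x = sum_y = 0
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None
    return sum_x / count, sum_y / count


# Made-up 8 bit frame: a bright 2x2 reflection on a dark background.
frame = [
    [10, 12, 11, 9],
    [10, 200, 210, 9],
    [11, 205, 220, 10],
    [9, 10, 12, 11],
]
binary = threshold_image(frame, threshold=128)
print(binary)
print(centroid(binary))
```

A frame containing several spots would additionally need the white pixels to be grouped into separate regions before a center is computed for each; that step is omitted here.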
The CCD camera also generates digital horizontal (HS) and vertical (VS)
synchronization pulses that conform to the RS-170 video standard. A 10 MHz
element rate clock (ERC) or pixel clock is also generated. This serves to
synchronize the | | |