Claims
We claim:
1. Apparatus for generating and displaying user created animated objects
having synchronized visual and audio characteristics, said apparatus
comprising:
a program-controlled microprocessor;
first means coupled to said microprocessor and responsive to user input
text for segmenting said user input text in accordance with predefined
algorithms to generate a set of vector signals, each said vector signal
associated with a sound segment of said user input text;
second means coupled to said microprocessor and to said first means and
responsive to said user input text and said set of vector signals for
generating a sequence of phonetic codes, each of said phonetic codes
associated with a corresponding vector signal and its associated sound
segment, each of said phonetic codes identifying a predefined visual image
associated with said phonetic code;
display means coupled to said first and second means responsive to said set
of vector signals for displaying a dendrogram defined by said set of
vector signals and representative of an acoustical relationship among said
sound segments of said user input text, each of said phonetic codes
displayed on said dendrogram disposed in relative relationship with said
associated sound segment; and
controller means coupled to said first and second means, to said display
means and to said microprocessor and having editing means for generating a
set of instructions synchronizing said sequence of visual images with said
associated sound segments corresponding to said user input text and for
editing said set of instructions, thereby defining an animated object having
synchronized visual and audio characteristics.
2. Apparatus as in claim 1 further comprising audio means coupled to said
microprocessor and to said controller means, said audio means responsive
to said set of instructions for producing sounds associated with said
phonetic codes, said display means responsive to said set of instructions
for displaying said sequence of visual images of said animated object
synchronized with said sound.
3. An apparatus for generating and displaying a user created animated
object having synchronized visual and audio characteristics, the apparatus
comprising:
a programmed computer including storage means for storing signals
representing sound, a display device, an input means, a real-time random
access speech synchronization animation means for controlling the animated
object, editing means and speech segmentation generation means for
displaying speech synchronization;
the input means for providing signals to the speech segmentation generation
means, the signals representing text;
the speech segmentation generation means including first generation means,
second generation means and third generation means;
the first generation means for generating acoustic information representing
the text, the acoustic information having component parts;
the second generation means for generating visual information associated
with the acoustic information, the visual information defining facial
expressions associated with the acoustic information;
the third generation means for generating timing information, the timing
information capable of being manipulated for synchronization between the
visual information and the acoustic information; and
the display device for providing an aligned display of the component parts
of the acoustic information with the visual information and the timing
information, the aligned display representative of a synchronization
between the component parts and the visual and timing information
associated therewith;
the editing means for editing the aligned display to adjust the timing
values to alter the synchronization; and
the programmed computer for synchronized integration of the acoustic,
visual and timing information with the visual and audio characteristics to
provide the animated object, the audio characteristics being retrieved
from the storage means.
4. The apparatus of claim 3 wherein the acoustic information is a digital
acoustic wave representation, the visual information is in a form of
phonetic codes, and the timing information is timing values.
5. The apparatus of claim 4 wherein the aligned display illustrates a
spatial relationship between the phonetic codes and the component parts of
the digital acoustic wave representation.
6. The apparatus of claim 5 wherein the animation means uses the first
generation means to automatically generate the acoustic wave
representation for display.
7. The apparatus of claim 6 wherein the timing values are RECITE command
timing values.
8. The apparatus of claim 7 wherein the RECITE command timing values and
the phonetic codes are automatically computed and displayed with the
animation means.
9. The apparatus of claim 3 wherein the animation means includes a
real-time random access interface driver, scripting language, and
animation and vivification engine language.
10. In a system having a programmed computer including a display device, an
input device and a real-time random access speech synchronization
animation means for controlling the animated object, a method for
generating and displaying a user created animated object having
synchronized visual and audio characteristics, the method comprising the
steps of:
providing speech segmentation means for displaying speech synchronization;
providing signals representing text with the input device to the speech
segmentation generation means;
generating acoustic information representing the text, the acoustic
information having component parts;
generating visual information associated with the acoustic information, the
visual information defining facial expressions associated with the
acoustic information;
generating timing information capable of being manipulated for
synchronization between the visual information and the acoustic
information;
providing on the display device an aligned display of the component parts
of the acoustic information with the visual information and the timing
information, the aligned display representative of a synchronization
between the component parts and the visual and timing information
associated therewith;
editing the aligned display to adjust the timing values to manipulate the
synchronization; and
synchronized integration of the acoustic, visual and timing information
with the visual and audio characteristics to provide the animated object.
11. The method of claim 10 wherein the acoustic information is a digital
acoustic wave representation, the visual information is phonemes, and the
timing information is timing values.
12. The method of claim 11 wherein the aligned display illustrates a
spatial relationship between the phonemes and the component parts of the
digital acoustic wave representation.
13. The method of claim 12 further comprising the step of automatically
generating the digital acoustic wave representation for display on the
display device with the animation means.
14. The method of claim 13 wherein the timing values are RECITE command
timing values.
15. The method of claim 14 further comprising the steps of automatically
computing and displaying the RECITE command timing values and the phonemes
with the animation means.
16. The method of claim 10 wherein the animation means includes a real-time
random access interface driver, scripting language, and animation and
vivification engine language.
Description
A portion of the disclosure of this patent document contains material which
is subject to copyright protection. The copyright owner has no objection
to the facsimile reproduction by anyone of the patent disclosure, as it
appears in the Patent and Trademark Office patent files or records, but
otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
The present invention relates generally to computerized animation methods
and, more specifically, to a method and apparatus for the creation and control
of random access sound-synchronized talking synthetic actors and animated
characters.
It is well-known in the prior art to provide video entertainment or
teaching tools employing time synchronized sequences of pre-recorded video
and audio. The prior art is best exemplified by tracing the history of the
motion picture and entertainment industry from the development of the
"talkies" to the recent development of viewer interactive movies.
In the late nineteenth century, the first practical motion pictures,
comprising pre-recorded sequential frames projected onto a screen at 20 to
30 frames per second to give the effect of motion, were developed. In the
1920's techniques to synchronize a pre-recorded audio sequence or sound
track with the motion picture were developed. In the 1930's animation
techniques were developed to produce hand drawn cartoon animations
including animated figures having lip movements synchronized with an
accompanying pre-recorded soundtrack. With the advent of computers, more
and more effort has been channeled towards the development of computer
generated video and speech including electronic devices to synthesize
human speech and speech recognition systems.
In a paper entitled "KARMA: A system for Storyboard Animation" authored by
F. Gracer and M. W. Blasgen, IBM Research Report RC 3052, dated Sep. 21,
1970, an interactive computer graphics program which automatically
produces the intermediate frames between a beginning and ending frame is
disclosed. The intermediate frames are calculated using linear
interpolation techniques and then produced on a plotter. In a paper
entitled "Method for Computer Animation of Lip Movements", IBM Technical
Disclosure Bulletin, Vol. 14, No. 10, March 1972, pages 3039-3040, J. D.
Bagley and F. Gracer disclosed a technique for computer generated lip
animation for use in a computer animation system. A speech-processing
system converts a lexical presentation of a script into a string of
phonemes and matches it with an input stream of corresponding live speech
to produce timing data. A computer animation system, such as that
described hereinabove, given the visual data for each speech sound,
generates intermediate frames to provide a smooth transition from one
visual image to the next to produce smooth animation. Finally the timing
data is utilized to correlate the phonetic string with the visual images
to produce accurately timed sequences of visually correlated speech
events.
Recent developments in the motion picture and entertainment industry relate
to active viewer participation as exemplified by video arcade games and
branching movies. U.S. Pat. Nos. 4,305,131; 4,333,152; 4,445,187 and
4,569,026 relate to remote-controlled video disc devices providing
branching movies in which the viewer may actively influence the course of
a movie or video game story. U.S. Pat. No. 4,569,026 entitled "TV Movies
That Talk Back" issued on Feb. 4, 1986 to Robert M. Best discloses a video
game entertainment system by which one or more human viewers may vocally
or manually influence the course of a video game story or movie and
conduct a simulated two-way voice conversation with characters in the game
or movie. The system comprises a special-purpose microcomputer coupled to
a conventional television receiver and a random-access videodisc reader
which includes automatic track seeking and tracking means. One or more
hand-held input devices each including a microphone and visual display are
also coupled to the microcomputer. The microcomputer controls retrieval of
information from the videodisc and processes viewers' commands input
either vocally or manually through the input devices and provides audio
and video data to the television receiver for display. At frequent branch
points in the game, a host of predetermined choices and responses are
presented to the viewer. The viewer may respond using representative code
words either vocally or manually or a combination of both. In response to
the viewer's choice, the microprocessor manipulates pre-recorded video and
audio sequences to present a selected scene or course of action and menu.
In a paper entitled "Soft Machine: A Personable Interface", "Graphics
Interface '84", John Lewis and Patrick Purcell disclose a system which
simulates spoken conversation between a user and an electronic
conversational partner. An animated person-likeness "speaks" with a speech
synthesizer and "listens" with a speech recognition device. The audio
output of the speech synthesizer is simultaneously coupled to a speaker
and to a separate real-time formant-tracking speech processor computer to
be analyzed to provide timing data for lip synchronization and limited
expression and head movements. A set of pre-recorded visual images
depicting lip, eye and head positions are properly sequenced so that the
animated person-likeness "speaks" or "listens". The output of the speech
recognition device is matched against pre-recorded patterns until a match
is found. Once a match is found, one of several pre-recorded responses is
either spoken or executed by the animated person-likeness.
Both J. D. Bagley et al. and John Lewis et al. require a separate
formant-tracking speech processor computer to analyze the audio signal and
provide real-time data to determine which visual image or images should be
presented to the user. The requirement for this additional computer adds
cost and complexity to the system and introduces an additional source of
error. Further, neither Bagley et al. nor Lewis et al. addresses techniques
and processes for constructing authoring systems.
SUMMARY OF THE INVENTION
The present invention provides a method and apparatus for a random access
animation user interface environment referred to as interFACE, which
enables a user to create and control animated lip-synchronized images or
objects utilizing a personal computer, and incorporate them into their own
programs and products. The present invention may be utilized as a general
purpose learning tool, interface device between a user and a computer, in
video games, in motion pictures and in commercial applications such as
advertising, information kiosks and telecommunications. Utilizing a
real-time random-access interface driver (RAVE) together with a
descriptive authoring language called RAVEL (real-time random-access
animation and vivification engine language), synthesized actors,
hereinafter referred to as "synactors", representing real or imaginary
persons and animated characters, objects or scenes can be created and
programmed to perform actions including speech which are not sequentially
pre-stored records of previously enacted events. Animation and sound
synchronization are produced automatically and in real-time.
The communications patterns--the sounds and visual images of a real or
imaginary person or of an animated character associated with those
sounds--are input to the system and decomposed into constituent parts to
produce fragmentary images and sounds. Alternatively, or in conjunction
with this, well-known speech synthesis methods may also be employed to
provide the audio. That set of communications characteristics is then
utilized to define a digital model of the motions and sounds of a
particular synactor or animated character. A synactor that represents the
particular person or animated character is defined by a RAVEL table
containing the coded instructions for dynamically accessing and combining
the video and audio characteristics to produce real-time sound and video
coordinated presentations of the language patterns and other behavior
characteristics associated with that person or animated character. The
synactor can then perform actions and read or say words or sentences which
were not prerecorded actions of the person or character that the synactor
models. Utilizing these techniques, a synactor may be defined to portray a
famous person or other character, a member of one's family or a friend or
even oneself.
The preferred embodiment provides interFACE, a general-purpose system for
random access and frame-by-frame display of synactor images organized and
synchronized with sound. Utilizing the interFACE
system, animation and sound synchronization of a synactor is produced
automatically and in real time. In the preferred embodiment, each synactor
is made up of as many as 120 images, between 8 and 32 of which are devoted
to speaking and 8 to animated expressions.
The speaking images correspond to distinct speech articulations and are
sufficient to create realistic synthetic speaking synactors. The remaining
images allow the synactor to display life-like expressions. Smiles, frowns
and head turns can all be incorporated into the synactor's appearance.
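By way of illustration only, the image budget described above might be captured in a data structure such as the following Python sketch. The class name, field names and validation rules are hypothetical and do not reproduce the synactor model record of FIG. 10; only the limits of 120 images in total and 8 to 32 speaking images are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List

# Limits stated for the preferred embodiment.
MAX_IMAGES = 120
MIN_SPEAKING = 8
MAX_SPEAKING = 32


@dataclass
class SynactorImages:
    """Hypothetical image set for one synactor (not the FIG. 10 record layout)."""

    name: str
    speaking: List[str] = field(default_factory=list)     # one image per speech articulation
    expressions: List[str] = field(default_factory=list)  # smiles, frowns, head turns, etc.

    def total(self) -> int:
        return len(self.speaking) + len(self.expressions)

    def validate(self) -> None:
        """Raise ValueError if the image counts fall outside the stated limits."""
        n = len(self.speaking)
        if not MIN_SPEAKING <= n <= MAX_SPEAKING:
            raise ValueError(
                f"{self.name}: {n} speaking images, expected {MIN_SPEAKING}-{MAX_SPEAKING}"
            )
        if self.total() > MAX_IMAGES:
            raise ValueError(f"{self.name}: {self.total()} images exceeds {MAX_IMAGES}")


demo = SynactorImages(
    "Demo",
    speaking=[f"mouth_{i}" for i in range(8)],
    expressions=["smile", "frown", "head_turn"],
)
demo.validate()  # passes: 8 speaking images, 11 images in total
```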
The interFACE system can use synthetic speech, digitized recordings, or
both to provide the speech for the synactors. Speech synthesizers can
provide an unlimited vocabulary while using very little memory. To make a
synactor speak, the text to be spoken is typed on a
computer keyboard or otherwise input to the system. The input text is
first broken down into its phonetic components. Then the sound
corresponding to each component is generated through a speaker as an image
of the synactor corresponding to that component is simultaneously
presented on the display device. Digitized recording provides digital data
representing actual recorded sounds which can be utilized in a computer
system. Utilizing a "synchronization lab" defined by the interFACE system,
a synactor can speak with any digitized sound or voice that is desired.
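The text-driven path described above, in which input text is broken into phonetic components and each component selects both a sound and a synactor image, can be sketched as follows. This is a minimal illustration under stated assumptions: the two-word pronunciation lexicon, the ARPAbet-style phoneme symbols and the image numbers are hypothetical placeholders, and a practical system would use a full text-to-phoneme converter and the synactor's own phoneme tables instead of these toy dictionaries.

```python
from typing import List, Tuple

# Hypothetical pronunciation lexicon (ARPAbet-style symbols chosen for illustration).
LEXICON = {
    "hello": ["HH", "EH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical mapping from phoneme to speaking-image number (mouth shape).
PHONEME_TO_IMAGE = {"HH": 0, "EH": 1, "L": 2, "OW": 3, "W": 4, "ER": 5, "D": 2}
REST_IMAGE = 6  # assumed "mouth at rest" image for unmapped phonemes


def text_to_phonemes(text: str) -> List[str]:
    """Break input text into phonetic components; words not in the lexicon are skipped."""
    phonemes: List[str] = []
    for word in text.lower().split():
        phonemes.extend(LEXICON.get(word.strip(".,!?;:"), []))
    return phonemes


def phonemes_to_images(phonemes: List[str]) -> List[Tuple[str, int]]:
    """Pair each phonetic component with the image to display while its sound plays."""
    return [(p, PHONEME_TO_IMAGE.get(p, REST_IMAGE)) for p in phonemes]


frames = phonemes_to_images(text_to_phonemes("Hello, world!"))
# frames == [("HH", 0), ("EH", 1), ("L", 2), ("OW", 3), ("W", 4), ("ER", 5), ("L", 2), ("D", 2)]
```

Letting several phonemes share one image (here "L" and "D") reflects the fact that only 8 to 32 speaking images are available to cover the full set of speech sounds.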
The preferred embodiment allows both experienced and novice users to
understand and operate the interFACE system for creating, editing and
working with the synactors.
The Dressing Room is where synactors are created and edited and is where
users--and synactors--spend most of their time. The Dressing Room
comprises menus and tools which allow the user to navigate between
synactor images by pressing or clicking with a mouse or other input
device. Within the Dressing Room, the image of the synactor is placed in a
screen area referred to as the synactor Easel. Utilizing "paint tools" or
"face clip art", the user can create and edit a synactor. With a paint
tool, a synactor may be drawn from scratch or, with clip art, a synactor
may be created by copying and "pasting" eyes, ears, noses and even mouths
selected from prestored sets of the different features. In addition to
providing fundamental paint tools, the Dressing Room provides
import/export capability, synactor resize/conversion commands and a variety
of animation tools. The tools incorporated within the Dressing Room are
simple enough to allow a user to easily generate simple cartoons and yet
powerful enough to create complex animation.
Once a synactor has been created or built in the Dressing Room, the
synactor is transferred to a Stage screen where audio/lip synchronization
and animation of the synactor can be observed. The Stage screen includes a
text field in which a user can enter text and observe the synactor speak
the entered text. If the synactor needs additional work, the user can
return it to the Dressing Room for touchup. If the user is satisfied with
the synactor, it can then be saved to memory for future use.
In the interFACE system, the synactor file is manipulated like a document
in any application. Opening (transferring a synactor file to the Dressing
Room), editing and deleting synactors from memory are accomplished from
menus and other control tools. Sound resources comprising digitized sounds
are also copied and deleted from windows. The digitized sound resources
are synchronized with the image of the synactor in a mode referred to as
the interFACE speech synchronization lab (Speech Sync Lab). The Speech
Sync Lab examines the sound and automatically creates a phonetic string
which is used to create the animation and sound synchronization of the
synactor. The Speech Sync Lab provides several complementary methods that
allow a user, either automatically or manually, to generate, edit and
optimize a RECITE command. The RECITE command identifies for the RAVE
driver both the sound resource to use and the phonetic string including
associated timing values which produces the desired animation of the
associated synactor. The Speech Sync Lab also provides for testing and
refinement of the animation. If the resulting synchronization is not
correct, the user can modify the RECITE command manually.
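The information carried by a RECITE command, namely a sound resource plus a phonetic string with timing values, can be modeled as a simple data structure. The Python sketch below is hypothetical: it does not reproduce the actual RECITE syntax, and it assumes that each timing value is a duration in 1/60-second ticks (matching the 60 Hz timing interrupt described for FIG. 1). It shows how such a command could drive frame selection and how a manual edit of one timing value shifts the synchronization of every later segment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedCode:
    code: str   # phonetic code that selects a speaking image
    ticks: int  # assumed duration in 1/60-second ticks


@dataclass
class ReciteCommand:
    """Hypothetical model of a RECITE command (not the actual RAVE syntax)."""

    sound_resource: str
    codes: List[TimedCode]

    def total_ticks(self) -> int:
        return sum(c.ticks for c in self.codes)

    def code_at(self, tick: int) -> Optional[str]:
        """Phonetic code whose image should be on screen at a given tick."""
        if tick < 0:
            return None
        elapsed = 0
        for c in self.codes:
            elapsed += c.ticks
            if tick < elapsed:
                return c.code
        return None  # past the end of the sound

    def adjust(self, index: int, delta: int) -> None:
        """Manual edit: lengthen or shorten one segment, keeping at least one tick."""
        seg = self.codes[index]
        seg.ticks = max(1, seg.ticks + delta)


cmd = ReciteCommand(
    "hello_sound",
    [TimedCode("HH", 5), TimedCode("EH", 8), TimedCode("L", 6), TimedCode("OW", 12)],
)
before = cmd.code_at(12)  # "EH": ticks 5-12 belong to the second segment
cmd.adjust(1, -3)         # shorten "EH" by three ticks
after = cmd.code_at(12)   # "L": later segments now start three ticks earlier
```

Storing durations rather than absolute start times keeps each edit to a single value, at the cost of moving every later segment; storing absolute times would confine an edit to one boundary instead.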
The functions and screens described above are coordinated and accessed via
menus and a "navigation palette". The navigation palette, or window,
includes four screen buttons that let a user navigate easily among the
various interFACE screens and the online help system.
The RAVE driver is responsible for the animation and sound synchronization
of the synactors. RAVEL defines and describes the synactor while the RAVE
scripting language is an active language which controls the synactor after
it has been created by a user. RAVE scripting language commands enable a
programmer to control the RAVE driver for use with an application program
created by a programmer utilizing a desired programming system. Utilizing
facilities provided in the programming system to call external functions,
the programmer invokes the RAVE system and passes RAVE scripting language
commands as parameters to it. A RAVE script command controller interprets
these commands to provide control of the synactor.
When a synactor has been created, it is controlled in an application
program by scripts through the RAVE scripting language. All of the
onscreen animation is controlled by scripts in the host system through the
RAVE scripting language. Various subroutines referred to as external
commands ("XCMD") or external functions ("XFCN") are utilized to perform
functions not available in the host language, for example, creating
synactors from the Dressing Room. The RAVE XCMD processes information
passed between the RAVE scripts and the RAVE driver. Separate commands are
utilized to enable users to open, close, move, hide or show the synactor
and to cause the synactor to speak. An application program may have these
commands built in, selected among or generated by the RAVE driver itself
at runtime.
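The division of labor described above, in which a host script passes text commands as parameters and a script command controller interprets them, can be illustrated with a small dispatcher. The command words and argument formats below are hypothetical stand-ins for the open, close, move, hide, show and speak operations named above; they are not the RAVE scripting language syntax, and the speak handler only records the text rather than driving sound and animation.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ScriptCommandController:
    """Hypothetical interpreter for text commands passed from a host script."""

    def __init__(self) -> None:
        self.synactor: Optional[str] = None
        self.visible = False
        self.position: Tuple[int, int] = (0, 0)
        self.spoken: List[str] = []  # text that would be handed to the speech/animation path
        self._handlers: Dict[str, Callable[[str], None]] = {
            "open": self._open,
            "close": self._close,
            "show": self._show,
            "hide": self._hide,
            "move": self._move,
            "speak": self._speak,
        }

    def execute(self, command: str) -> None:
        """Split a command into its verb and argument text, then dispatch it."""
        verb, _, arg = command.strip().partition(" ")
        handler = self._handlers.get(verb.lower())
        if handler is None:
            raise ValueError(f"unknown command: {verb!r}")
        handler(arg.strip())

    def _require_open(self) -> None:
        if self.synactor is None:
            raise RuntimeError("no synactor is open")

    def _open(self, name: str) -> None:
        self.synactor = name
        self.visible = True

    def _close(self, _: str) -> None:
        self.synactor = None
        self.visible = False

    def _show(self, _: str) -> None:
        self._require_open()
        self.visible = True

    def _hide(self, _: str) -> None:
        self._require_open()
        self.visible = False

    def _move(self, arg: str) -> None:
        self._require_open()
        x, y = (int(v) for v in arg.split())
        self.position = (x, y)

    def _speak(self, text: str) -> None:
        self._require_open()
        self.spoken.append(text)


controller = ScriptCommandController()
for line in ["open Demo", "move 120 80", "hide", "speak Hello there"]:
    controller.execute(line)
```

A lookup table of handlers keeps the set of accepted commands explicit, so an application that needs only a subset of them could register just those entries.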
The interFACE system of the present invention provides a user with the
capability to quickly and efficiently create advanced animated talking
agents (synactors) to provide an interface between users and computers.
BRIEF DESCRIPTION OF THE DRAWING
A fuller understanding of the present invention and of its features and
advantages will become apparent from the following detailed description
taken in conjunction with the accompanying drawing which forms a part of
the specification and in which:
FIG. 1 is a block diagram of a system which displays computer generated
visual images with real time synchronized computer generated speech
according to the principles of the present invention;
FIG. 2 is a conceptual block diagram illustrating the interFACE
synchronized animation system as implemented in the system shown in FIG.
1;
FIG. 3 is a functional block diagram illustrating the major data flows and
processes for the system shown in FIG. 2;
FIG. 4 is a functional block diagram illustrating a hierarchical overview
of the interFACE screens;
FIG. 5 is a diagram illustrating the navigation panel as displayed on a
screen in the system shown in FIG. 4;
FIG. 6 is a presentation of the Dressing Room screen of the system shown in
FIG. 4;
FIG. 7a is a presentation illustrating the Stage screen of the system shown
in FIG. 4;
FIG. 7b is an illustration of the Stage Menu;
FIG. 8a is a presentation illustrating the Speech Sync Lab screen of the
system shown in FIG. 4;
FIG. 8b is an illustration of the Speech Sync Lab menu;
FIGS. 9a-9k are detailed screen presentations illustrating various menus
and screen windows of the system shown in FIG. 4;
FIG. 10 is a diagram illustrating the data structure of a synactor model
record according to the principles of the present invention;
FIG. 11 illustrates selected speaking images correlated with speech samples
for animation according to the principles of the present invention;
FIG. 12 is a diagram illustrating a digital representation of a sound
sample;
FIGS. 13a-13d are dendrogram representations of acoustic segmentation for
selected sound samples;
FIGS. 14 and 14.2-14.5 are a source code program listing for a standard
synactor in accordance with the present invention;
FIGS. 15 and 15.2-15.4 are a source code program listing for an extended
synactor in accordance with the present invention;
FIGS. 16 and 16.2-16.7 are a source code program listing for a
coarticulated synactor in accordance with the present invention;
FIGS. 17 and 17.2-17.8 illustrate a voice reconciliation phoneme table and
generic phoneme tables in accordance with the present invention;
FIG. 18 is a listing of microprocessor instructions for a CD-ROM in
accordance with the present invention; and
FIGS. 19 and 19.2-19.3 illustrate a script flow for CD synchronization in
accordance with the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to FIG. 1, in one preferred embodiment of the present
invention, a special purpose microcomputer comprises a program-controlled
microprocessor 10 (a Motorola MC68000 is suitable for this purpose),
random-access memory (RAM) 20, read-only memory (ROM) 11, disc drive 13,
video and audio input devices 7 and 9 and user input devices such as
keyboard 15 or other input devices 17 and output devices such as video
display 19, audio output device 25 and CD-ROM drive 4 and its associated
speaker 3. RAM 20 is divided into at least four blocks which are shared by
the microprocessor 10 and the various input and output devices.
The video output device 19 may be any visual output device such as a
conventional television set or the CRT monitor for a personal computer.
The video output 19 and video generation 18 circuitry are controlled by
the microprocessor 10 and share display RAM buffer space 22 to store and
access memory-mapped video. The video generation circuits also provide a
60 Hz timing signal interrupt to the microprocessor 10.
Also sharing the audio RAM buffer space 23 with the microprocessor 10 is
the audio generation circuitry 26 which drives the audio output device 25.
Audio output device 25 may be a speaker or some other type of audio
transducer, such as a vibrator for conveying sound to the hearing impaired.
Disc controller 12 shares the disc RAM 21 with the microprocessor 10 and
controls reading from and writing to a suitable non-volatile mass
storage medium, such as floppy disc drive 13, for long-term storing of
synactors that have been created using the interFACE system and to allow
transfer of synactor resources between machines.
Input controller 16 for the keyboard 15 and other input devices 17 is
coupled to microprocessor 10 and also shares disc RAM 21 with the disc
controller 12. This purpose may be served by a Synertek SY6522 Versatile
Interface Adaptor. Input controller 16 also coordinates certain tasks
among the various controllers and other microprocessor support circuitry
(not shown). A pointing input device 17 such as a mouse or light pen is
the preferred input device because it allows maximum interaction by the
user. Keyboard 15 is an optional input device in the preferred embodiment,
but in other embodiments may function as the pointing device, or be
utilized by an instructor or programmer to create or modify instructional
programs or set other adjustable parameters of the system. Other pointing
and control input devices such as a joystick, a fingertip (in the case
of a touch screen) or an eye-motion sensor are also suitable. An audio
digitizer 6, such as a MacRecorder available from Farallon Computing,
Inc., may also be coupled to the input controller 16 to provide
pre-digitized sound samples to the microprocessor.
RAM 24 is the working memory for the microprocessor 10. The RAM 24 contains
the system and applications programs and other information required by the
microprocessor 10. Microprocessor 10 also accesses ROM 11 which is the
system's permanent read-only memory. ROM 11 contains the operational
routines and subroutines required by the microprocessor 10 operating
system, such as the routines to facilitate disc and other device I/O,
graphics primitives and real time task management, etc. These routines may
be additionally supported by extensions and patches in RAM 24 and on disc.
Controller 5 is a serial communications controller such as a Zilog Z8530
SCC chip. Digitized samples of video and audio may be input into the
system in this manner to provide characteristics for the animated
characters and sound resources for synthesized speech. Digitizer 8
comprises an audio digitizer and a video digitizer coupled to the video
and audio inputs 7 and 9, respectively. Standard microphones, video
cameras and VCRs will serve as input devices. These input devices are
optional since digitized video and audio samples may be input into the
system by keyboard 15 or disc drive 13 or may be resident in ROM 11.
Controller 5 may also control a CD-ROM drive 4 and its associated
independent speaker 3.
Referring now also to FIG. 2, a conceptual block diagram of the editing
(authoring) and application system for animated synthesized actors,
hereinafter referred to as synactors, according to the principles of the
present invention is shown. The animation system of the present invention,
hereinafter referred to as "interFACE", is a general purpose system which
provides a user with the capability to create and/or edit synactors and
corresponding speech scripts and to display on a frame-by-frame basis the
synactors thus created. The interFACE system provides animation and sound
synchronization automatically and in real time. To accomplish this, the
interFACE system interfaces with a real time random access driver
(hereinafter referred to as "RAVE") together with a descriptive authoring
language (hereinafter referred to as "RAVEL") which is implemented by the
system shown in FIG. 1.
Prototype models for the types of synactors to be edited by the authoring
system are input via various input devices 31. The prototype models may
comprise raw video data which is converted to digital data in video
digitizer 33 or other program data which is compiled by a RAVEL compiler
37. The prototype synactors are saved in individual synactor files
identified by the name of the corresponding synactor. The synactor files
are stored in memory 39 for access by the interFACE system as required.
Memory 39 may be a disk storage or other suitable peripheral storage
device.
To create a new synactor or to edit an existing prototype synactor, the
interFACE system is configured as shown by the blocks included in the
create/edit block 30. The author system shell 41 allows the user to access
any synactor file via RAM 20 and display the synactor and its images in a
number of screen windows which will be described in detail hereinbelow.
Utilizing the various tools provided, and the script command controller
43, the user is able to create a specific synactor and/or create and test
speech and behavior scripts for use in a specific application program. The
new synactor thus created may be saved in the original file or in a new
file identified by a name for the new synactor. The synactor is saved as a
part of a file called a resource. The microprocessor 10 provides
coordination of the processes and control of the input/output (I/O)
functions for the system.
When a synactor is used as an interactive agent between a user and an
applications program, for example, the interFACE system is configured as
shown by the use block 40. User input to the applications controller 45
will call the desired synactor resource from a file in memory 39 via RAM
20. The script command controller 43 interprets scripts from the
application controller 45 and, using the RAVE driver, provides the
appropriate instructions to the display and the microprocessor 10.
As during the create (and test) process, the microprocessor 10 provides
control and coordination of the processes and I/O functions for the
interFACE system, in this case via the RAVE driver.
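The role of the script command controller 43 in the use configuration, interpreting scripts received from the application controller 45 and passing instructions onward, may be illustrated by the following sketch. The script syntax (one command per line, such as "say" or "show" followed by an argument), the class ScriptCommandController and the tuple format of the emitted instructions are all hypothetical; the instruction list merely stands in for whatever the RAVE driver actually receives.

```python
class ScriptCommandController:
    """Hypothetical interpreter: turns script lines into driver instructions."""

    def __init__(self, synactor_name):
        self.synactor_name = synactor_name
        self.instructions = []  # instructions that would be passed to the driver

    def run(self, script):
        for raw in script.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            verb, _, argument = line.partition(" ")
            handler = getattr(self, "cmd_" + verb.lower(), None)
            if handler is None:
                raise ValueError(f"unknown script command: {verb!r}")
            handler(argument.strip())
        return self.instructions

    def cmd_say(self, text):
        # Speak text; the driver would synchronize mouth images with the sound.
        self.instructions.append(("speak", self.synactor_name, text))

    def cmd_show(self, expression):
        # Display a named expression image without sound.
        self.instructions.append(("display", self.synactor_name, expression))


controller = ScriptCommandController("prototype")
result = controller.run("show smile\nsay Hello there")
```

Dispatching on the command word keeps the interpreter open to new commands: each added command needs only one new handler method.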
Referring now to FIG. 3, a functional block diagram illustrating the major
data flows, processes and events required to provide speech and the
associated synchronized visual animation is shown. A detailed description
of the processes and events that take place in the RAVE system is given in
U.S. Pat. No. 4,884,972, and in U.S. patent applications Ser. No.
07/384,243 entitled "Authoring and Use Systems for Sound Synchronized
Animation" filed on Jul. 21, 1989, now U.S. Pat. No. 5,111,409 issued May
5, 1992, and Ser. No. 07/497,937 entitled "Voice Animation System" filed
on Mar. 23, 1990, now U.S. Pat. No. 5,278,943 issued Jan. 11, 1994, all of
which are assigned to the instant assignee and are incorporated by
reference as if fully set forth herein; their description of the RAVE
processes is not repeated in full here.
The interFACE system comprises the author system shell 41, the application
controller 45, the script command processor 49 and associated user input
devices 47, which may include one or more of the input devices shown in
FIG. 1. The interFACE system interfaces with the RAVE system at the script
command processor 49. In response to a user input, the application controller 45
or the author system shell 41 calls on the microprocessor 10 to fetch from
a file in memory 39 a synactor resource containing the audio and visual
characteristics of a particular synactor. As required by user input, the
microprocessor will initiate the RAVE sound and animation processes.
Although both the author system shell 41 and the application controller 45
access the script command processor 49, the normal mode of operation would
be for a user to utilize the author system shell 41 to create/edit a
synactor and, at a subsequent time, utilize the application controller 45
to call up a synactor for use (that is, speech and visual display) either
alone or coordinated with a particular application program.
The interFACE software system is a "front end" program that interfaces a
host computer system as illustrated in FIG. 1 to the RAVE system to enable
a user to create and edit synactors. The system comprises a number of
modes each with an associated screen display, plus a navigation window or
"palette" for selecting and changing system modes, a set of menus with
commands (some of which vary with the mode currently selected), and
additional screen displays (alternately referred to as "cards" or
"windows") which are displayed in associated modes. The screen displays
have activatable areas referred to as buttons that respond to user actions
to initiate preprogrammed operations or to call up other windows, tools or
subroutines. The buttons may be actuated by clicking on them with a mouse or
by other suitable methods, for example by touching them on a touch screen. The screen
displays also may include editable text areas, referred to as "fields".
The interFACE system comprises a number of such displays which the user
moves between by activating window items or "pressing" buttons (that is,
by clicking on a button via a mouse) to create, edit and work with
synactors. A preferred embodiment of the interFACE system in accordance
with the principles of the present invention is described in the
"Installation Guide" and in the "User's Guide" for "interFACE, the
Interactive Facial Animation Construction Environment" which are also
incorporated by reference as if fully set forth herein.
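The mode structure described above (a navigation palette that selects a system mode, each mode with its own screen display, and buttons that run preprogrammed operations when pressed) may be sketched as follows. The classes Screen and ModeNavigator, the button names and the string results are hypothetical; the screen names "Stage" and "Speech Sync" are borrowed from the window list in Table II purely as labels, and the sketch does not model their actual contents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Screen:
    """Hypothetical screen display ("card" or "window") with named buttons."""
    name: str
    buttons: Dict[str, Callable[[], str]] = field(default_factory=dict)


class ModeNavigator:
    """Sketch of the navigation palette: one screen per system mode."""

    def __init__(self, screens):
        self.screens = {screen.name: screen for screen in screens}
        self.current = screens[0].name  # start in the first mode listed

    def select_mode(self, name):
        # Changing the mode swaps in that mode's screen display.
        if name not in self.screens:
            raise KeyError(f"no such mode: {name}")
        self.current = name

    def press(self, button):
        # "Pressing" a button (e.g. a mouse click) runs its preprogrammed operation.
        return self.screens[self.current].buttons[button]()


nav = ModeNavigator([
    Screen("Stage", {"Play": lambda: "playing"}),
    Screen("Speech Sync", {"Sync": lambda: "synced"}),
])
first = nav.press("Play")
nav.select_mode("Speech Sync")
second = nav.press("Sync")
```

Only the buttons of the current mode's screen are reachable, which mirrors the way the available commands change with the mode selected.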
Referring now to FIGS. 4-9, various diagrams illustrating an overview of
the interFACE system menus, windows, and screens are shown. Tables I and
II list the various menus and windows utilized by the interFACE system.
TABLE I
______________________________________
interFACE MENUS
______________________________________
1 100 Apple
2 101 File
3 110 Edit
4 102 Go
5 106 Actors
6 107 Sounds
7 104 Dressing Room
8 108 Speech Sync
9 114 Stage
10 122 Clip Art
______________________________________
TABLE II
______________________________________
interFACE Windows and Tables
______________________________________
1 100 Menu Screen
2 173 Untitled
3 155 Text
4 156 Phonetic
5 182 Go
6 107 Speech Sync
7 168 MacRecorder
8 103 ActorNav
9 169 Stage
10 140 dBox
11 152 Tool Palette
12 104 Line Palette
13 102 Pattern Palette
14 145 Color Palette
15 144 Brush Palette
16 153 Voice Palette
17 183 Express Palette
18 163 actorPref
19 165 Print
20 178 Scrap
21 179 Convert
22 184 Resize
23 186 wd2Print
24 175 Nav
25 187 layer
26 189 CDRecite
27 190 GUI sync
28 191 Dend