WikiPatents - Community Patent Review
Advanced tools for speech synchronized animation    
United States Patent 5613056
Inventor(s): Gasper; Elon (Bellevue, WA); Matthews, III; Joseph H. (Bothell, WA); Wesley; Richard (Seattle, WA)
Abstract: A random access animation user interface environment referred to as interFACE enabling a user to create and control animated lip-synchronized images or objects utilizing a personal computer for use in the user's programs and products. A real-time random-access interface driver (RAVE) together with a descriptive authoring language (RAVEL) is used to provide synthesized actors ("synactors"). The synactors may represent real or imaginary persons or animated characters, objects or scenes. The synactors may be created and programmed to perform actions including speech which are not sequentially pre-stored records of previously enacted events. Furthermore, animation and sound synchronization may be produced automatically and in real-time. Sounds and visual images of a real or imaginary person or animated character associated with those sounds are input to a system and may be decomposed into constituent parts to produce fragmentary images and sounds. A set of characteristics is utilized to define a digital model of the motions and sounds of a particular synactor. The general purpose system is provided for random access and display of synactor images on a frame-by-frame basis, which is organized and synchronized with sound. Both synthetic speech and digitized recording may provide the speech for synactors.
Inventor     Gasper; Elon (Bellevue, WA); Matthews, III; Joseph H. (Bothell, WA); Wesley; Richard (Seattle, WA)
Owner/Assignee     Bright Star Technology, Inc. (Bellevue, WA)
Publication Date     March 18, 1997
Application Number     08/457,269
Filing Date     May 31, 1995
US Classification     345/473 704/270.1 704/276 715/500.1
Int'l Classification     G06T 013/00
Examiner     Zimmerman; Mark K.
Attorney/Law Firm     LaRiviere, Grubman & Payne
Parent Case     This application is a division application of application Ser. No. 08/065,704, filed May 20, 1993, which is a continuation of application Ser. No. 07/657,714, filed Feb. 19, 1991, abandoned. A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by any one of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
USPTO Field of Search     395/152 395/2.44 395/2.69 395/2.74 395/2.85 395/154
U.S. References

Patent No.   Inventor     U.S. Class   Date
5214758      Ohba         345/473      May, 1993
5101364      Davenport    715/723      Mar, 1992
5025394      Parke        345/475      Jun, 1991
4884972      Gasper       434/185      Dec, 1989
4841575      Welsh        704/260      Jun, 1989
4260229      Bloomstein   352/50       Apr, 1981
5111409      Gasper       715/500.1    Dec, 1969
Claims


We claim:

1. An apparatus for providing an animated object with facial images synchronized to sound, the facial images corresponding to the sound; the sound defined by user inputted text, the apparatus comprising:

a programmed computer including memory, real-time random access animation and vivification engine drivers, a real-time controller and at least one microprocessor;

at least one input device coupled to the computer for inputting text to the computer;

means of the computer for instructing the computer: to retrieve prerecorded digitized sound corresponding to the user inputted text, to determine a time value for the digitized sound, to store the time value in the memory, to convert the user inputted text to a phonetic string, to use the phonetic string to convert the digitized sound into a list of phocodes, to provide an associated timing value for each of the phocodes, to map each of the phocodes with its associated timing value to form pairs of data, to store each of the pairs of data in tabular form in the memory, to add each associated timing value to provide a sum total, to compare the sum total obtained by adding each associated timing value to the time value for the digitized sound, and to proportionally adjust as necessary each associated timing value to make the sum total obtained by adding each associated timing value approximately equal the time value for the digitized sound;

audio output devices coupled to the computer for receiving the digitized sound and for providing the sound;

the computer for providing a sequence of the facial images corresponding to the pairs of data as adjusted as necessary; and

a display device coupled to the computer for displaying the sequence of the facial images;

the real-time controller of the computer for controlling displaying of the sequence of the facial images to the sound to provide the animated object with the facial images synchronized with the sound, the sound defined by the user inputted text.

2. For a programmed computer having memory and real-time random access animation and vivification engine drivers, a method for synchronizing synactor facial images to corresponding sound comprising the steps of:

inputting text to be pronounced;

retrieving digitized sound corresponding to the text;

determining playing length of the digitized sound;

converting the text into a corresponding phonetic representation;

converting the digitized sound into phocodes;

providing an associated timing value for each of the phocodes;

mapping each of the phocodes with the associated timing value to a look-up table for forming pairs of data in the look-up table;

determining a sum for all of the associated timing values;

comparing the sum to the playing length; and

adjusting the associated timing values proportionally to the playing length.

3. A method as in claim 2 further comprising the steps of:

obtaining determined ones of the phocodes from the look-up table;

obtaining the associated timing value for each of the determined ones of the phocodes;

obtaining phonetic codes corresponding to the determined ones of the phocodes; and

creating phonetic code/time value pairs from the phonetic codes and the associated timing value for each of the phonetic codes.

4. A method as in claim 3 further comprising the steps of:

creating a RECITE command with the phonetic code/time value pairs;

providing editing means for editing the RECITE command; and

editing the RECITE command with editing means.

5. For a programmed computer having memory and real-time random access animation and vivification engine drivers, a method for synchronizing synactor facial images to corresponding sound comprising the steps of:

inputting text to be pronounced;

retrieving digitized sound corresponding to the text;

determining playing length of the digitized sound;

storing the playing length of the digitized sound in memory;

converting the text into a corresponding phonetic representation;

converting the digitized sound into phocodes;

providing an associated timing value for each of the phocodes;

mapping each of the phocodes with the associated timing value to a look-up table, the phocodes and the associated timing values forming pairs of data in the look-up table;

determining a sum for all of the associated timing values;

comparing the sum to the playing length; and

rounding to whole numbers the associated timing values for adjusting the associated timing values proportionally to the playing length.

6. A method as in claim 5 further comprising the steps of:

obtaining determined ones of the phocodes from the look-up table;

obtaining the associated timing value for each of the determined ones of the phocodes;

obtaining phonetic codes corresponding to the determined ones of the phocodes; and

creating phonetic code/time value pairs from the phonetic codes and the associated timing value for each of the phonetic codes.

7. A method as in claim 6 further comprising the steps of:

creating a RECITE command with the phonetic code/time value pairs;

providing editing means for editing the RECITE command; and

editing the RECITE command with editing means.
Description


BACKGROUND OF THE INVENTION

The present invention relates generally to computerized animation methods and, more specifically to a method and apparatus for creation and control of random access sound-synchronized talking synthetic actors and animated characters.

It is well-known in the prior art to provide video entertainment or teaching tools employing time synchronized sequences of pre-recorded video and audio. The prior art is best exemplified by tracing the history of the motion picture and entertainment industry from the development of the "talkies" to the recent development of viewer interactive movies.

In the late nineteenth century the first practical motion pictures comprising pre-recorded sequential frames projected onto a screen at 20 to 30 frames per second to give the effect of motion were developed. In the 1920's techniques to synchronize a pre-recorded audio sequence or sound track with the motion picture were developed. In the 1930's animation techniques were developed to produce hand drawn cartoon animations including animated figures having lip movements synchronized with an accompanying pre-recorded soundtrack. With the advent of computers, more and more effort has been channeled towards the development of computer generated video and speech including electronic devices to synthesize human speech and speech recognition systems.

In a paper entitled "KARMA: A system for Storyboard Animation" authored by F. Gracer and M. W. Blasgen, IBM Research Report RC 3052, dated Sep. 21, 1970, an interactive computer graphics program which automatically produces the intermediate frames between a beginning and ending frame is disclosed. The intermediate frames are calculated using linear interpolation techniques and then produced on a plotter. In a paper entitled "Method for Computer Animation of Lip Movements", IBM Technical Disclosure Bulletin, Vol. 14, No. 10, March 1972, pages 3039, 3040, J. D. Bagley and F. Gracer disclosed a technique for computer generated lip animation for use in a computer animation system. A speech-processing system converts a lexical presentation of a script into a string of phonemes and matches it with an input stream of corresponding live speech to produce timing data. A computer animation system, such as that described hereinabove, given the visual data for each speech sound, generates intermediate frames to provide a smooth transition from one visual image to the next to produce smooth animation. Finally the timing data is utilized to correlate the phonetic string with the visual images to produce accurately timed sequences of visually correlated speech events.
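
The linear interpolation of intermediate frames mentioned above can be sketched as follows. This is only an illustration, assuming each key frame is a list of 2-D points in matching order; the function name `inbetween` and the point representation are hypothetical and are not taken from the cited papers.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def inbetween(start: List[Point], end: List[Point], n_frames: int) -> List[List[Point]]:
    """Return n_frames intermediate frames between two key frames.

    The key frames themselves are not included in the result.
    """
    if len(start) != len(end):
        raise ValueError("key frames must have the same number of points")
    frames = []
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)  # fraction of the way from start to end
        frames.append([
            (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
            for (x0, y0), (x1, y1) in zip(start, end)
        ])
    return frames
```

Each point moves along a straight line at a constant rate, which is the simplest way to produce a smooth transition between two drawings.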

Recent developments in the motion picture and entertainment industry relate to active viewer participation as exemplified by video arcade games and branching movies. U.S. Pat. Nos. 4,305,131; 4,333,152; 4,445,187 and 4,569,026 relate to remote-controlled video disc devices providing branching movies in which the viewer may actively influence the course of a movie or video game story. U.S. Pat. No. 4,569,026 entitled "TV Movies That Talk Back" issued on Feb. 4, 1986 to Robert M. Best discloses a video game entertainment system by which one or more human viewers may vocally or manually influence the course of a video game story or movie and conduct a simulated two-way voice conversation with characters in the game or movie. The system comprises a special-purpose microcomputer coupled to a conventional television receiver and a random-access videodisc reader which includes automatic track seeking and tracking means. One or more hand-held input devices each including a microphone and visual display are also coupled to the microcomputer. The microcomputer controls retrieval of information from the videodisc and processes viewers' commands input either vocally or manually through the input devices and provides audio and video data to the television receiver for display. At frequent branch points in the game, a host of predetermined choices and responses are presented to the viewer. The viewer may respond using representative code words either vocally or manually or a combination of both. In response to the viewer's choice, the microprocessor manipulates pre-recorded video and audio sequences to present a selected scene or course of action and menu.

In a paper entitled "Soft Machine: A Personable Interface", "Graphics Interface '84", John Lewis and Patrick Purcell disclose a system which simulates spoken conversation between a user and an electronic conversational partner. An animated person-likeness "speaks" with a speech synthesizer and "listens" with a speech recognition device. The audio output of the speech synthesizer is simultaneously coupled to a speaker and to a separate real-time format-tracking speech processor computer to be analyzed to provide timing data for lip synchronization and limited expression and head movements. A set of pre-recorded visual images depicting lip, eye and head positions are properly sequenced so that the animated person-likeness "speaks" or "listens". The output of the speech recognition device is matched against pre-recorded patterns until a match is found. Once a match is found, one of several pre-recorded responses is either spoken or executed by the animated person-likeness.

Both J. D. Bagley et al and John Lewis et al require a separate format-tracking speech processor computer to analyze the audio signal to provide real-time data to determine which visual image or images should be presented to the user. The requirement for this additional computer adds cost and complexity to the system and introduces an additional source of error. Further, neither Bagley et al nor Lewis et al addresses techniques and processes for constructing authoring systems.

SUMMARY OF THE INVENTION

The present invention provides a method and apparatus for a random access animation user interface environment referred to as interFACE, which enables a user to create and control animated lip-synchronized images or objects utilizing a personal computer, and incorporate them into their own programs and products. The present invention may be utilized as a general purpose learning tool, interface device between a user and a computer, in video games, in motion pictures and in commercial applications such as advertising, information kiosks and telecommunications. Utilizing a real-time random-access interface driver (RAVE) together with a descriptive authoring language called RAVEL (real-time random-access animation and vivification engine language), synthesized actors, hereinafter referred to as "synactors", representing real or imaginary persons and animated characters, objects or scenes can be created and programmed to perform actions including speech which are not sequentially pre-stored records of previously enacted events. Animation and sound synchronization are produced automatically and in real-time.

The communications patterns--the sounds and visual images of a real or imaginary person or of an animated character associated with those sounds--are input to the system and decomposed into constituent parts to produce fragmentary images and sounds. Alternatively, or in conjunction with this, well known speech synthesis methods may also be employed to provide the audio. That set of communications characteristics is then utilized to define a digital model of the motions and sounds of a particular synactor or animated character. A synactor that represents the particular person or animated character is defined by a RAVEL table containing the coded instructions for dynamically accessing and combining the video and audio characteristics to produce real-time sound and video coordinated presentations of the language patterns and other behavior characteristics associated with that person or animated character. The synactor can then perform actions and read or say words or sentences which were not prerecorded actions of the person or character that the synactor models. Utilizing these techniques, a synactor may be defined to portray a famous person or other character, a member of one's family or a friend or even oneself.

In the preferred embodiment, interFACE, a general purpose system for random access and display of synactor images on a frame-by-frame basis that is organized and synchronized with sound is provided. Utilizing the interFACE system, animation and sound synchronization of a synactor is produced automatically and in real time. In the preferred embodiment, each synactor is made up of as many as 120 images, between 8 and 32 of which are devoted to speaking and 8 to animated expressions.

The speaking images correspond to distinct speech articulations and are sufficient to create realistic synthetic speaking synactors. The remaining images allow the synactor to display life-like expressions. Smiles, frowns and head turns can all be incorporated into the synactor's appearance.

The interFACE system provides the capability to use both synthetic speech and/or digitized recording to provide the speech for the synactors. Speech synthesizers can provide unlimited vocabulary while utilizing very little memory. To make a synactor speak, the text to be spoken is typed on a computer keyboard or otherwise input to the system. The input text is first broken down into its phonetic components. Then the sound corresponding to each component is generated through a speaker as an image of the synactor corresponding to that component is simultaneously presented on the display device. Digitized recording provides digital data representing actual recorded sounds which can be utilized in a computer system. Utilizing a "synchronization lab" defined by the interFACE system, a synactor can speak with any digitized sound or voice that is desired. The preferred embodiment allows both experienced and novice users to understand and operate the interFACE system for creating, editing and working with the synactors.
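
The pairing of each phonetic component with a synactor image, as described above, might look like the following sketch. The phonetic codes, the image indices and the names `IMAGE_FOR_CODE`, `REST_IMAGE` and `image_sequence` are all hypothetical; the actual tables used by the interFACE system are not given in this passage, and the conversion of text into phonetic components is assumed to have happened already.

```python
from typing import Dict, List, Tuple

# Hypothetical table: phonetic code -> index of a speaking image.
IMAGE_FOR_CODE: Dict[str, int] = {
    "M": 1,   # lips closed
    "AA": 2,  # mouth open
    "OW": 3,  # lips rounded
}
REST_IMAGE = 0  # shown for any code the table does not cover


def image_sequence(codes: List[str]) -> List[Tuple[str, int]]:
    """Pair each phonetic code with the image to display while it sounds."""
    return [(code, IMAGE_FOR_CODE.get(code, REST_IMAGE)) for code in codes]
```

Falling back to a rest image for unknown codes is a design choice of this sketch, made so that playback never stops on an unexpected code.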

The Dressing Room is where synactors are created and edited and is where users--and synactors--spend most of their time. The Dressing Room comprises menus and tools which allow the user to navigate between synactor images by pressing or clicking with a mouse or other input device. Within the Dressing Room, the image of the synactor is placed in a screen area referred to as the synactor Easel. Utilizing "paint tools" or "face clip art", the user can create and edit a synactor. With a paint tool, a synactor may be drawn from scratch or, with clip art, a synactor may be created by copying and "pasting" eyes, ears, noses and even mouths selected from prestored sets of the different features. In addition to providing fundamental paint tools, the dressing room provides import/export ability, synactor resize/conversion commands and a variety of animation tools. The tools incorporated within the dressing room are simple enough to allow a user to easily generate simple cartoons and yet powerful enough to create complex animation.

Once a synactor has been created or built in the dressing room, the synactor is transferred to a Stage screen where audio/lip synchronization and animation of the synactor can be observed. The stage screen includes a text field wherein a user can enter text and observe the synactor speak the entered text. If the synactor thus created needs additional work, the user can return the synactor to the dressing room for touchup. If the user is satisfied with the synactor, the synactor can then be saved to memory for future use.

In the interFACE system, the synactor file is manipulated like a document in any application. Opening (transferring a synactor file to the dressing room), editing and deleting synactors from memory are accomplished from menus and other control tools. Sound resources comprising digitized sounds are also copied and deleted from windows. The digitized sound resources are synchronized with the image of the synactor in a mode referred to as the interFACE speech synchronization lab (Speech Sync Lab). The Speech Sync Lab examines the sound and automatically creates a phonetic string which is used to create the animation and sound synchronization of the synactor. The Speech Sync Lab provides several complementary methods which allow a user, either automatically or manually, to generate, edit and optimize a RECITE command. The RECITE command identifies for the RAVE driver both the sound resource to use and the phonetic string including associated timing values which produces the desired animation of the associated synactor. The Speech Sync Lab also provides for testing and refinement of the animation. If the resulting synchronization is not correct, the user can modify the RECITE command manually.
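
The claims describe reconciling the timing values with the playing length of the digitized sound: the timing values are summed, the sum is compared to the playing length, and each value is adjusted proportionally and rounded to a whole number. A minimal sketch of that step follows. The function name `fit_timings`, the example codes and the choice to round cumulative end times (so that rounding error does not accumulate) are assumptions of this sketch, not details stated in the patent text.

```python
from typing import List, Tuple


def fit_timings(pairs: List[Tuple[str, int]], playing_length: int) -> List[Tuple[str, int]]:
    """Scale timing values so that their sum equals playing_length.

    pairs          -- (code, timing value) entries in playback order
    playing_length -- length of the digitized sound, in the same time unit
    """
    total = sum(t for _, t in pairs)
    if total == 0 or total == playing_length:
        return list(pairs)  # nothing to adjust

    scale = playing_length / total
    adjusted = []
    exact_end = 0.0  # unrounded end time of the current entry
    whole_end = 0    # rounded end time already handed out
    for code, t in pairs:
        exact_end += t * scale
        end = round(exact_end)
        adjusted.append((code, end - whole_end))
        whole_end = end
    return adjusted
```

For example, `fit_timings([("HH", 5), ("EH", 10), ("L", 5), ("OW", 20)], 60)` stretches values that sum to 40 so that they sum to 60 while keeping their relative proportions.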

The above described functions and screens are coordinated together and accessed via menus and a "navigation palette". The navigational palette or window includes four screen buttons providing a user easy navigation through the various screens of the interFACE system features and online help system.

The RAVE driver is responsible for the animation and sound synchronization of the synactors. RAVEL defines and describes the synactor while the RAVE scripting language is an active language which controls the synactor after it has been created by a user. RAVE scripting language commands enable a programmer to control the RAVE driver for use with an application program created by a programmer utilizing a desired programming system. Utilizing facilities provided in the programming system to call external functions, the programmer invokes the RAVE system and passes RAVE scripting language commands as parameters to it. A RAVE script command controller interprets these commands to provide control of the synactor.

When a synactor has been created, it is controlled in an application program by scripts through the RAVE scripting language. All of the onscreen animation is controlled by scripts in the host system through the RAVE scripting language. Various subroutines referred to as external commands ("XCMD") or external functions ("XFCN") are utilized to perform functions not available in the host language, for example creating synactors from the dressing room. The RAVE XCMD processes information between the RAVE scripts and the RAVE driver. Separate commands are utilized to enable users to open, close, move, hide or show the synactor and to cause the synactor to speak. An application program may have these commands built in, selected among or generated by the RAVE driver itself at runtime.
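
The role of a script command controller can be illustrated with a small dispatcher for the kinds of commands listed above (open, close, move, hide, show and speak). The command spellings, argument formats and the names `SynactorState` and `run_command` are hypothetical; the actual RAVE scripting language syntax is not given in this passage.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SynactorState:
    name: str
    is_open: bool = False
    visible: bool = False
    position: Tuple[int, int] = (0, 0)
    spoken: List[str] = field(default_factory=list)


def run_command(actor: SynactorState, command: str) -> None:
    """Apply one script line such as 'MOVE 40 60' or 'SPEAK hello there'."""
    verb, _, rest = command.strip().partition(" ")
    verb = verb.upper()
    if verb == "OPEN":
        actor.is_open = True
    elif verb == "CLOSE":
        actor.is_open = False
        actor.visible = False
    elif verb == "SHOW":
        actor.visible = True
    elif verb == "HIDE":
        actor.visible = False
    elif verb == "MOVE":
        x, y = rest.split()
        actor.position = (int(x), int(y))
    elif verb == "SPEAK":
        actor.spoken.append(rest)  # a real driver would animate and play sound
    else:
        raise ValueError(f"unknown command: {verb}")
```

A host program would pass such command strings to the controller one at a time, which mirrors the description of commands being passed as parameters to the RAVE system.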

The interFACE system of the present invention provides a user with the capability to quickly and efficiently create advanced animated talking agents (synactors) to provide an interface between users and computers.

BRIEF DESCRIPTION OF THE DRAWING

A fuller understanding of the present invention and of its features and advantages will become apparent from the following detailed description taken in conjunction with the accompanying drawing which forms a part of the specification and in which:

FIG. 1 is a block diagram of a system which displays computer generated visual images with real time synchronized computer generated speech according to the principles of the present invention;

FIG. 2 is a conceptual block diagram illustrating the interFACE synchronized animation system as implemented in the system shown in FIG. 1;

FIG. 3 is a functional block diagram illustrating the major data flows and processes for the system shown in FIG. 2;

FIG. 4 is a functional block diagram illustrating a hierarchical overview of the interFACE screens;

FIG. 5 is a diagram illustrating the navigation panel as displayed on a screen in the system shown in FIG. 4;

FIG. 6 is a presentation of the Dressing Room screen of the system shown in FIG. 4;

FIG. 7a is a presentation illustrating the Stage screen of the system shown in FIG. 4;

FIG. 7b is an illustration of the Stage Menu;

FIG. 8a is a presentation illustrating the Speech Sync Lab screen of the system shown in FIG. 4;

FIG. 8b is an illustration of the Speech Sync Lab menu;

FIGS. 9a-9k are detailed screen presentations illustrating various menus and screen windows of the system shown in FIG. 4;

FIG. 10 is a diagram illustrating the data structure of a synactor model record according to the principles of the present invention;

FIG. 11 illustrates selected speaking images correlated with speech samples for animation according to the principles of the present invention;

FIG. 12 is a diagram illustrating a digital representation of a sound sample;

FIGS. 13a-13d are dendrogramic representations of acoustic segmentation for selected sound samples;

FIGS. 14 and 14.2-14.5 are a source code program listing for a standard synactor in accordance with the present invention;

FIGS. 15 and 15.2-15.4 are a source code program listing for an extended synactor in accordance with the present invention;

FIGS. 16 and 16.2-16.7 are a source code program listing for a coarticulated synactor in accordance with the present invention;

FIGS. 17 and 17.2-17.8 illustrate voice reconciliation phoneme table and generic phoneme tables in accordance with the present invention;

FIG. 18 is a listing of microprocessor instructions for a CD-ROM in accordance with the present invention; and

FIGS. 19 and 19.2-19.3 are a script flow for CD synchronization in accordance with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring now to FIG. 1, in one preferred embodiment of the present invention, a special purpose microcomputer comprises a program-controlled microprocessor 10 (a Motorola MC68000 is suitable for this purpose), random-access memory (RAM) 20, read-only memory (ROM) 11, disc drive 13, video and audio input devices 7 and 9 and user input devices such as keyboard 15 or other input devices 17 and output devices such as video display 19, audio output device 25 and CD-ROM drive 4 and its associated speaker 3. RAM 20 is divided into at least four blocks which are shared by the microprocessor 10 and the various input and output devices.

The video output device 19 may be any visual output device such as a conventional television set or the CRT monitor for a personal computer. The video output 19 and video generation 18 circuitry are controlled by the microprocessor 10 and share display RAM buffer space 22 to store and access memory mapped video. The video generation circuits also provide a 60 Hz timing signal interrupt to the microprocessor 10.
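
A periodic timing interrupt of this kind gives software a tick count against which image changes can be scheduled. The sketch below assumes, purely for illustration, that each image has a duration expressed in 1/60 second ticks; the names `TICKS_PER_SECOND` and `frame_at_tick` and the schedule format are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

TICKS_PER_SECOND = 60  # rate of the timing interrupt described above


def frame_at_tick(schedule: List[Tuple[int, int]], tick: int) -> int:
    """Return the image index due at the given tick.

    schedule -- (image index, duration in ticks) entries in display order
    """
    if not schedule:
        raise ValueError("schedule must not be empty")
    # Tick at which each entry ends, e.g. durations 6, 12, 6 -> 6, 18, 24.
    ends = list(accumulate(d for _, d in schedule))
    i = bisect_right(ends, tick)
    if i >= len(schedule):
        i = len(schedule) - 1  # hold the last image once the schedule is over
    return schedule[i][0]
```

An interrupt handler could call `frame_at_tick` with the current tick count and redraw only when the returned index changes.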

Also sharing the audio RAM buffer space 23 with the microprocessor 10 is the audio generation circuitry 26 which drives the audio output device 25. Audio output device 25 may be a speaker or some other type of audio transducer such as a vibrator to transmit to the hearing impaired.

Disc controller 12 shares the disc RAM 21 with the microprocessor 10 and controls reading from and writing to a suitable non-volatile mass storage medium, such as floppy disc drive 13, for long-term storing of synactors that have been created using the interFACE system and to allow transfer of synactor resources between machines.

Input controller 16 for the keyboard 15 and other input devices 17 is coupled to microprocessor 10 and also shares disc RAM 21 with the disc controller 12. This purpose may be served by a Synertek SY6522 Versatile Interface Adaptor. Input controller 16 also coordinates certain tasks among the various controllers and other microprocessor support circuitry (not shown). A pointing input device 17 such as a mouse or light pen is the preferred input device because it allows maximum interaction by the user. Keyboard 15 is an optional input device in the preferred embodiment, but in other embodiments may function as the pointing device, or be utilized by an instructor or programmer to create or modify instructional programs or set other adjustable parameters of the system. Other pointing and control input devices such as a joy stick, a finger tip (in the case of a touch screen) or an eye-motion sensor are also suitable. An audio digitizer 6 such as a MacRecorder, available from Farallon Computing, Inc., to provide pre-digitized sound samples to the microprocessor may also be coupled to the input controller 16.

RAM 24 is the working memory for the microprocessor 10. The RAM 24 contains the system and applications programs and other information required by the microprocessor 10. Microprocessor 10 also accesses ROM 11 which is the system's permanent read-only memory. ROM 11 contains the operational routines and subroutines required by the microprocessor 10 operating system, such as the routines to facilitate disc and other device I/O, graphics primitives and real time task management, etc. These routines may be additionally supported by extensions and patches in RAM 24 and on disc.

Controller 5 is a serial communications controller such as a Zilog Z8530 SCC chip. Digitized samples of video and audio may be input into the system in this manner to provide characteristics for the animated characters and sound resources for synthesized speech. Digitizer 8 comprises an audio digitizer and a video digitizer coupled to the video and audio inputs 7 and 9, respectively. Standard microphones, video cameras and VCRs will serve as input devices. These input devices are optional since digitized video and audio samples may be input into the system by keyboard 15 or disc drive 13 or may be resident in ROM 11. Controller 5 may also control a CD-ROM drive 4 and its associated independent speaker 3.

Referring now also to FIG. 2, a conceptual block diagram of the animated synthesized actor, hereinafter referred to as synactor, editing or authoring and application system according to the principles of the present invention is shown. The animation system of the present invention, hereinafter referred to as "interFACE", is a general purpose system which provides a user with the capability to create and/or edit synactors and corresponding speech scripts and to display on a frame-by-frame basis the synactors thus created. The interFACE system provides animation and sound synchronization automatically and in real time. To accomplish this, the interFACE system interfaces with a real time random access driver (hereinafter referred to as "RAVE") together with a descriptive authoring language (hereinafter referred to as "RAVEL") which is implemented by the system shown in FIG. 1.

Prototype models for the types of synactors to be edited by the authoring system are input via various input devices 31. The prototype models may comprise raw video data which is converted to digital data in video digitizer 33 or other program data which is compiled by a RAVEL compiler 37. The prototype synactors are saved in individual synactor files identified by the name of the corresponding synactor. The synactor files are stored in memory 39 for access by the interFACE system as required. Memory 39 may be a disk storage or other suitable peripheral storage device.

To create a new synactor or to edit an existing prototype synactor, the interFACE system is configured as shown by the blocks included in the create/edit block 30. The author system shell 41 allows the user to access any synactor file via RAM 20 and display the synactor and its images in a number of screen windows which will be described in detail hereinbelow. Utilizing the various tools provided, and the script command controller 43, the user is able to create a specific synactor and/or create and test speech and behavior scripts for use in a specific application program. The new synactor thus created may be saved in the original file or in a new file identified by a name for the new synactor. The synactor is saved as a part of a file called a resource. The microprocessor 10 provides coordination of the processes and control of the input/output (I/O) functions for the system.

When a synactor is used, for example as an interactive agent between a user and an applications program, the interFACE system is configured as shown by the use block 40. User input to the application controller 45 will call the desired synactor resource from a file in memory 39 via RAM 20. The script command controller 43 interprets scripts from the application controller 45 and, using the RAVE driver, provides the appropriate instructions to the display and the microprocessor 10. Similarly, as during the create (and test) process, the microprocessor 10 provides control and coordination of the processes and I/O functions for the interFACE system via a RAVER driver.
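The flow just described (the application controller supplies a script, the script command controller interprets it, and the driver receives the resulting instructions) may be pictured with the following minimal sketch. It is illustrative only and assumes a great deal: the command words ACTOR, SAY and SHOW, the class names, and the driver methods `speak` and `show` are all hypothetical and are not RAVEL syntax or the RAVE driver interface, neither of which is set out in this passage.

```python
from dataclasses import dataclass, field


@dataclass
class SynactorResource:
    """Hypothetical stand-in for a synactor resource (audio and visual characteristics)."""
    name: str
    images: dict = field(default_factory=dict)


class RecordingDriver:
    """Hypothetical stand-in for the driver; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def speak(self, actor, text):
        self.calls.append(("speak", actor.name, text))

    def show(self, actor, image_key):
        self.calls.append(("show", actor.name, image_key))


class ScriptCommandController:
    """Interprets script lines one at a time and forwards them to the driver."""

    def __init__(self, store, driver):
        self.store = store    # synactor resources keyed by synactor name
        self.driver = driver
        self.actor = None     # synactor currently selected by the script

    def run(self, script):
        for line in script.splitlines():
            line = line.strip()
            if not line:
                continue
            verb, _, arg = line.partition(" ")
            if verb == "ACTOR":      # assumed command: select a synactor by name
                self.actor = self.store[arg]
            elif verb == "SAY":      # assumed command: speak the given text
                self.driver.speak(self._current(), arg)
            elif verb == "SHOW":     # assumed command: display a named image
                self.driver.show(self._current(), arg)
            else:
                raise ValueError(f"unknown script command: {verb!r}")

    def _current(self):
        if self.actor is None:
            raise RuntimeError("no synactor selected")
        return self.actor


store = {"Guide": SynactorResource("Guide")}
driver = RecordingDriver()
ScriptCommandController(store, driver).run("ACTOR Guide\nSAY Hello there\nSHOW smile")
print(driver.calls)
```

Keeping the interpreter separate from the driver mirrors the division of labor in the text: the same script command controller can serve either the author system shell or the application controller.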

Referring now to FIG. 3, a functional block diagram illustrating the major data flows, processes and events required to provide speech and the associated synchronized visual animation is shown. A detailed description of the processes and events that take place in the RAVE system is given in U.S. Pat. No. 4,884,972, and in co-pending U.S. patent applications Ser. No. 07/384,243 entitled "Authoring and Use Systems for Sound Synchronized Animation" filed on Jul. 21, 1989, now U.S. Pat. No. 5,111,409 issued May 5, 1992, and Ser. No. 07/497,937 entitled "Voice Animation System" filed on Mar. 23, 1990, now U.S. Pat. No. 5,278,943 issued Jan. 11, 1994, all of which are assigned to the instant assignee, and which are all incorporated by reference as if fully set forth herein and will not be repeated in full here.

The interFACE system comprises the author system shell 41, the application controller 45, the script command processor 49 and associated user input devices 47, which may include one or more input devices as shown in FIG. 1, and which is interfaced with the RAVE system at the script command processor 49. In response to a user input, the application controller 45 or the author system shell 41 calls on the microprocessor 10 to fetch from a file in memory 39 a synactor resource containing the audio and visual characteristics of a particular synactor. As required by user input, the microprocessor will initiate the RAVE sound and animation processes. Although both the author system shell 41 and the application controller 45 access the script command processor 49, the normal mode of operation would be for a user to utilize the author system shell 41 to create/edit a synactor and, at a subsequent time, utilize the application controller 45 to call up a synactor for use (that is, speech and visual display) either alone or coordinated with a particular application program.

The interFACE software system is a "front end" program that interfaces a host computer system as illustrated in FIG. 1 to the RAVE system to enable a user to create and edit synactors. The system comprises a number of modes each with an associated screen display, plus a navigation window or "palette" for selecting and changing system modes, a set of menus with commands (some of which vary with the mode currently selected), and additional screen displays (alternately referred to as "cards" or "windows") which are displayed in associated modes. The screen displays have activatable areas referred to as buttons that respond to user actions to initiate preprogrammed operations or to call up other windows, tools or subroutines. The buttons may be actuated by clicking a mouse on them or by other suitable methods, using a touch-screen for example. The screen displays also may include editable text areas, referred to as "fields". The interFACE system comprises a number of such displays which the user moves between by activating window items or "pressing" buttons (that is, by clicking on a button via a mouse) to create, edit and work with synactors. A preferred embodiment of the interFACE system in accordance with the principles of the present invention is described in the "Installation Guide" and in the "User's Guide" for "interFACE, the Interactive Facial Animation Construction Environment" which are also incorporated by reference as if fully set forth herein.

Referring now to FIGS. 4-9, various diagrams illustrating an overview of the interFACE system menus, windows, and screens are shown. Tables I and II list the various menus and windows utilized by the interFACE system.

TABLE I
______________________________________
interFACE MENUS
______________________________________
 1    100    Apple
 2    101    File
 3    110    Edit
 4    102    Go
 5    106    Actors
 6    107    Sounds
 7    104    Dressing Room
 8    108    Speech Sync
 9    114    Stage
10    122    Clip Art
______________________________________

TABLE II
______________________________________
interFACE Windows and Tables
______________________________________
 1    100    Menu Screen
 2    173    Untitled
 3    155    Text
 4    156    Phonetic
 5    182    Go
 6    107    Speech Sync
 7    168    MacRecorder
 8    103    ActorNav
 9    169    Stage
10    140    dBox
11    152    Tool Palette
12    104    Line Palette
13    102    Pattern Palette
14    145    Color Palette
15    144    Brush Palette
16    153    Voice Palette
17    183    Express Palette
18    163    actorPref
19    165    Print
20    178    Scrap
21    179    Convert
22    184    Resize
23    186    wd2Print
24    175    Nav
25    187    layer
26    189    CDRecite
27    190    GUI sync
28    191    Dendrogram
______________________________________

With particular reference to FIG. 4, a system block diagram illustrating the various operational modes of the interFACE system is shown. The interFACE system comprises four basic screens or modes: the Dressing Room 59, the Speech Sync Lab 61, the Stage 63 and the Help mode 65. Each mode is represented by a screen button on a navigation window 57. When a user initiates the interFACE system, a startup screen 55 is displayed by the host system. The startup screen (not shown) comprises one card and informs a user that he or she has initiated and is running the interFACE system. The startup screen also provides the user with bibliographic information and instructions to begin use of the interFACE system. After a short pause to load the program, the RAVE and RAVER drivers are loaded and called to perform predesignated system checks. The RAVE driver is a portion of the interFACE system that handles much of the programmatic functions and processes for synactor handling. The RAVER driver contains a number of programmatic functions related primarily to synactor editing. It is only used in the authoring system. The segregation of these functions reduces the memory requirements of the use system 40, which includes only the RAVE driver.
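The segregation of driver functions described above may be illustrated by a short sketch in which the authoring configuration loads both drivers and the use configuration loads only the RAVE driver. The configuration names "author" and "use", the `load_drivers` function and the `loader` callable are assumptions made for illustration; the actual loading mechanism is not described here.

```python
# Hypothetical table of which drivers each configuration loads.
DRIVERS = {
    "author": ("RAVE", "RAVER"),  # create/edit configuration: synactor handling plus editing
    "use": ("RAVE",),             # use configuration: synactor handling only
}


def load_drivers(configuration, loader):
    """Load the drivers for a configuration.

    `loader` is any callable that maps a driver name to a driver object.
    """
    if configuration not in DRIVERS:
        raise ValueError(f"unknown configuration: {configuration!r}")
    return {name: loader(name) for name in DRIVERS[configuration]}


# A stand-in loader that just labels what it was asked to load.
loaded = load_drivers("use", lambda name: f"<{name} driver>")
print(sorted(loaded))
```

Because the editing functions live only in the second driver, a use-only installation never pays the memory cost of loading them.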

After the initial system checks have been completed, the interFACE system displays the navigation window or palette 57, unless other preferences were previously set up by the user. The navigation palette 57 displays four buttons 571, 573, 575, 577 for accessing the interFACE system modes. Access may also be accomplished from various modes within the system utilizing menu commands. Normally, the navigation palette 57 is positioned on the current screen display immediately below the menu bar 579 to conserve screen space and avoid blocking of other interFACE windows. The user, however, may position the navigation palette 57 anywhere on the screen display. The navigation palette 57 is provided with "handles" (not shown) to allow it to be repositioned on the display. To move the navigation palette 57 the cursor is moved to either handle with a mouse (or other means). With the mouse button depressed, the navigation palette is then moved (dragged) horizontally or vertically on the display to the desired position.
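The repositioning of the navigation palette by its handles may be sketched as follows. This is a minimal illustration under stated assumptions: the handles are not shown in the drawings, so placing them at the left and right edges of the palette, the handle width of 8 pixels, and all class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Palette:
    x: int
    y: int
    width: int
    height: int
    handle_width: int = 8  # assumed size; the text does not give one

    def on_handle(self, px, py):
        """Return True when the point lies on the assumed left or right handle."""
        inside = (self.x <= px < self.x + self.width
                  and self.y <= py < self.y + self.height)
        if not inside:
            return False
        offset = px - self.x
        return offset < self.handle_width or offset >= self.width - self.handle_width


class DragTracker:
    """Moves the palette with the cursor while the mouse button is held on a handle."""

    def __init__(self, palette):
        self.palette = palette
        self.grab = None  # cursor offset from the palette origin while dragging

    def mouse_down(self, px, py):
        if self.palette.on_handle(px, py):
            self.grab = (px - self.palette.x, py - self.palette.y)

    def mouse_move(self, px, py):
        if self.grab is not None:
            self.palette.x = px - self.grab[0]
            self.palette.y = py - self.grab[1]

    def mouse_up(self):
        self.grab = None


palette = Palette(x=0, y=20, width=200, height=30)
tracker = DragTracker(palette)
tracker.mouse_down(3, 25)     # press on the left handle
tracker.mouse_move(103, 65)   # drag 100 right and 40 down
tracker.mouse_up()
tracker.mouse_move(0, 0)      # ignored: the button is no longer depressed
print(palette.x, palette.y)
```

Recording the grab offset at mouse-down keeps the palette from jumping so that its corner sits under the cursor when the drag begins.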

The four navigation buttons allow the user to go to the Dressing Room 59, go to the Stage 63, go to the Speech Sync Lab 61 or to go to the interFACE context sensitive Help mode 65. With the exception of the help button 577, the navigation buttons take the user to different screens within the interFACE system. The help button 577 initiates the context sensitive Help mode 65 which provides assistance when a user positions a "?" cursor over a desired item displayed on the screen currently in use.
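The distinction drawn above, in which three buttons change the current screen while the help button leaves the screen unchanged and enables context sensitive help, may be sketched as follows. The class names, the button labels used as keys, and the help text are hypothetical; how the Help mode is exited is not stated in this passage and is therefore not modeled.

```python
from enum import Enum


class Screen(Enum):
    DRESSING_ROOM = "Dressing Room"
    SPEECH_SYNC_LAB = "Speech Sync Lab"
    STAGE = "Stage"


class Navigator:
    def __init__(self, start, help_text):
        self.screen = start
        self.help_active = False
        self.help_text = help_text  # (screen, item) -> help string; hypothetical content

    def press(self, button):
        """Handle a press on one of the four navigation buttons."""
        if button == "Help":
            self.help_active = True        # the "?" cursor appears; the screen is unchanged
        else:
            self.screen = Screen(button)   # raises ValueError for an unknown label

    def point_at(self, item):
        """Return help for the item under the "?" cursor, or None when help is off."""
        if not self.help_active:
            return None
        return self.help_text.get((self.screen, item))


nav = Navigator(
    Screen.STAGE,
    {(Screen.DRESSING_ROOM, "easel"): "The easel displays the synactor image."},
)
nav.press("Dressing Room")
nav.press("Help")
print(nav.screen.value, "|", nav.point_at("easel"))
```

Keying the help text on both the screen and the item is one simple way to make the assistance depend on the screen currently in use.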

Referring now to FIG. 6, the Dressing Room 59 is used to create new synactors or to edit existing synactors. A user may go to the Dressing Room from any mode within the interFACE system.

Generally, the Dressing Room can be selected from the navigation palette 57 when in any mode within the interFACE system. The Dressing Room may also be selected from the Go menu (as shown in FIG. 9d). The Dressing Room can also be selected as the opening startup section for interFACE by selecting Dressing Room on the interFACE Preferences window (as shown in FIG. 9f).

When the Dressing Room 59 is initiated, two windows 67, 71 automatically appear on the display. The dressing room window 67 provides the working space for creation and editing of synactors. The control panel window 71 allows the user to navigate between the various images of the current synactor and displays information related to the image 711 currently displayed in the control panel window 71.

The title bar 671 at the top of the window 67 displays the name of the selected or current synactor 85. If a new synactor is being created/edited, the title 673 of the window will be "Untitled". The dressing room window 67 can be moved to any location on the screen 59. In addition, the size of the dressing room window 67 may be changed as desired. The synactor image 85 is displayed on the easel 83 in the dressing room window 67.

Referring now to FIG. 7a, the Stage screen 63 provides a display for examining and testing the lip-synchronization of newly construc