WikiPatents - Community Patent Review
Client server animation system for managing interactive user interface characters    
United States Patent 5983190
Link to this page: http://www.wikipatents.com/5983190.html
Inventor(s): Trower, II; Tandy W. (Woodinville, WA), Weinberg; Mark Jeffrey (Carnation, WA), Merrill; John Wickens Lamb (Redmond, WA)
Abstract: A client server animation system provides services to enable clients to play animation and lip-synched speech output for an interactive user interface character. Through the programming interface of the server, clients can specify both speech and cursor device input that an instance of an interactive user interface character will respond to when the clients are active. Clients can also request playback of animation and lip-synched speech output through this interface. Services can be invoked from application programs as well as web scripts embedded in web pages downloaded from the Internet.



Owner/Assignee: Microsoft Corporation (Redmond, WA)
Publication Date: November 9, 1999
Application Number: 08/858,648
Filing Date: May 19, 1997
US Classification: 704/276, 704/275
Examiner: Dorvil; Richemond
Attorney/Law Firm: Klarquist Sparkman Campbell Leigh & Whinston, LLP
USPTO Field of Search: 704/220, 704/276, 704/275, 704/231, 704/251, 704/258, 704/270, 704/272, 704/200
   

References

U.S. References
5764241, Elliott et al., Jun 1998
5715416, Baker, Feb 1998
5668997, Lynch-Fresner et al., Sep 1997
5630017, Gasper et al., May 1997
5613056, Gasper et al., Mar 1997
5430835, Williams et al., Jul 1995
5425139, Williams et al., Jun 1995
5377997, Wilden et al., Jan 1995
5287446, Williams et al., Feb 1994
5278943, Gasper et al., Jan 1994
5111409, Gasper et al., May 1992
4884972, Gasper, Dec 1989
Claims


We claim:

1. A method for generating an interactive, animated character in the user interface of a computer using a client-server architecture, the method comprising:

in response to a request from a client, creating an instance of a character and displaying the character in the user interface;

in the server, receiving from a client a set of client-specified user input commands that the character will respond to, the set comprising cursor input from a cursor control device;

in the server, monitoring for the specified user input commands;

in the server, when one of the user input commands is detected, sending a notification to the client;

in the server, receiving from the client a request that is conditioned upon the notification from the server; and

in response to the request from the client, playing back a client-specified sequence of animation output to animate the character in the user interface.

2. The method of claim 1 wherein the set of client-specified user input commands further comprises speech input received through a speech recognition engine, the method further comprising:

in response to a request from the client, playing back a client-specified sequence of animation to animate the character in the user interface and generating speech output lip-synched to animation representing a mouth of the character.

3. The method of claim 1 further including:

queuing requests to animate the character from the client when the character is currently playing back an animation;

immediately returning control to the client making the request to animate the character after determining that the character is busy; and

deferring processing of the request until the current animation is complete.

4. The method of claim 1 further including: arbitrating requests to control the character from more than one client.

5. The method of claim 1 wherein the step of creating an instance of a character further comprises registering the client with the server, the method further comprising:

registering a second client with the server;

in the server, receiving from the second client a second set of client-specified user input commands that the character will respond to, the second set comprising cursor input from the cursor control device;

keeping track of clients that have registered with the server;

arbitrating requests to control the character from more than one client; and

terminating the character when no clients are currently registered with the server.

6. The method of claim 1 wherein the step of creating an instance of the character comprises:

starting execution of the server in response to a request from the client;

in the server, registering a notification interface for the client in response to a request from the client; and

in the server, receiving from the client a request telling the server which character to create.

7. The method of claim 1 including:

synchronizing execution of the client with the execution of the character by allowing the client to post notification requests with the server in a first in first out queue used to store animation requests while the character is currently being animated, and sending a notification from the server to the client when the first notification request is at the top of the queue.

8. The method of claim 7 wherein the notification requests are embedded with text that is synthesized into speech output by the server so that the client can synchronize itself to individual words in the speech output.

9. A computer readable medium on which is stored software for performing the method of claim 1.

10. A client-server animation system for generating interactive animated characters, the system comprising:

an animation server for receiving requests from clients to create a character on the user interface, for controlling playback of a sequence of frames of animation and lip synched speech output from the character on the user interface in response to requests from the clients, for receiving an identification of cursor device and speech input commands, and for notifying the clients when the server determines that the cursor device input and the speech input commands have been provided by a user;

a speech recognition engine in communication with an audio input device for receiving speech input from the user and for analyzing the speech input to identify the speech input commands; and in communication with the server for sending notification messages to the server when the speech input commands are detected; and

a speech synthesis engine in communication with an audio output device for generating speech output, and in communication with the server for receiving requests to generate audio output corresponding to a text string provided by the clients via the server, and for notifying the server when a tag is detected in the text string so that the server can synchronize display of text in the text string with the speech output.

11. The animation system of claim 10 wherein the server includes a queue for queuing requests from clients to play specified sequences of animation of the character; and wherein the server keeps track of which of the clients is currently active and processes the requests in the queue corresponding to an active client.

12. The animation system of claim 10 wherein the server includes a mouth animation module for receiving notifications from the speech synthesis engine synchronized with speech output of phonemes, and wherein the mouth animation module is operable to play a frame of animation of a mouth of the character that corresponds to a current phoneme such that animation of the mouth is synchronized with the speech output.

13. The animation system of claim 10 wherein the animation server includes a parser for parsing speech input commands provided by the clients and passing parsed speech input commands to the speech recognition engine.

14. The animation system of claim 10 wherein the animation server includes a regionizer for scanning an animation frame and computing a non-rectangular bounding region for a non-transparent portion of the animation frame in real time as the sequence of constructed animation frames is played in the user interface on the display monitor; and wherein the animation system includes a region window controller for receiving the non-rectangular bounding region from the regionizer, for creating a region window on a display screen independent of any other window on the display screen and having a screen boundary in the user interface defined by the non-rectangular bounding region, and for clipping the constructed animation frame to the non-rectangular bounding region.

15. The animation system of claim 10 including a web browser for retrieving a web page from secondary storage of a local computer or from a remote computer, for parsing the web page to identify an embedded agent object tag, and for starting the server in response to detecting the embedded agent object tag; wherein the server is responsive to a first script command embedded in the web page to play a first sequence of frames of animation and lip synched speech output from the character on the user interface, and wherein the server is responsive to a second script command for receiving an identification of a speech input command and for sending notification to a local client representing the web script when the server detects the speech input command.

16. The system of claim 15 wherein the animation system includes a runtime compiler in communication with the web browser for compiling and executing a script program including the first and second script commands.

17. A method for generating an interactive, animated character in the user interface of a computer using a client-server architecture, the method comprising:

in response to a request from a client, creating an instance of a character and displaying the character in the user interface;

in the server, receiving from a client a set of client-specified user input commands that the character will respond to, the set comprising a speech input command;

in the server, monitoring for the specified user input commands;

in the server, sending a notification to the client when one of the user input commands is detected;

in the server, receiving from the client a request that is conditioned upon the notification from the server; and

in response to the request from the client, playing back a client-specified sequence of animation and speech output to animate the character in the user interface.

18. The method of claim 17 wherein the client is a script embedded in a web page, wherein the script includes a first script command specifying text of the speech input command, wherein the server sends a notification to the client when the server detects that an end user has spoken the speech input command; and wherein the script includes a second script command requesting lip synched speech output from the server.

19. The method of claim 17 further including:

parsing a web page to identify an embedded script; and

compiling the script to create the client.

20. The method of claim 19 further including:

in the server, processing requests to animate the character and play lip-synched output from the web script client.

21. A computer readable medium on which is stored software for performing the method of claim 17.
Description


TECHNICAL FIELD

The invention relates to user interface design in computers and more specifically relates to animated user interfaces.

BACKGROUND

One way to make the user interface of a computer more user friendly is to incorporate natural aspects of human dialog into the user interface design. User interfaces that attempt to simulate social interaction are referred to as social interfaces.

An example of this type of interface is the user interface of a program called Bob from Microsoft Corporation. Bob uses a social interface with animated characters that assist the user by providing helpful tips as the user navigates through the user interface. The Bob program exposes a number of user interface services to application programs including an actor service, a speech balloon service, a tracking service and a tip service.

The actor service plays animated characters in response to an animation request from an application. This service allows applications to play animated characters to get the user's attention and help the user navigate through the user interface. To make the character appear as if it is conversing with the user, the application can use the speech balloon service to display text messages in a graphical object that looks like a cartoon speech balloon. Applications can use the speech balloon service to display a special kind of text message called a "tip" that gives the user information about how to operate the program. In the Bob user interface environment, the application program is responsible for monitoring for user input events that trigger tips. In response to detecting an event, the application passes it to the tracking service, which determines whether a tip should be displayed. One function of the tracking service is to avoid bothering the user by displaying too many tips. To prevent this, the tracking service counts the number of occurrences of an event and prevents the display of a tip after a given number of occurrences. The tracking service tells the tip service whether to initiate the display of a tip. When a tip is to be displayed, the tip service provides information about the tip to the application so that it can display an appropriate text message in a speech balloon.

While the Bob program does provide a number of helpful user interface features, it has a number of limitations. One of the significant limitations is that the animated characters must be displayed within the window of a single host application. Specifically, the animation must be displayed within the window of a host Bob application program where the background image of the window is known. This is a significant limitation because the animation is confined within the window of a single application program.

Another important limitation of the animated characters in the Bob program is that they have no speech input or output capability. Speech input and output capability makes a user interface much more engaging to the user.

Speech synthesis and recognition software is commercially available. Microsoft Corporation has defined an application programming interface (API) called SAPI (Speech Application Programming Interface), and a number of companies have created implementations of this interface. The purpose of SAPI is to provide speech services that application developers can incorporate into their programs by invoking functions in SAPI.

Despite the availability of speech services provided in SAPI compliant speech engines, there are a number of difficult design issues in developing interactive user interface characters that support speech input and output. One difficulty is determining how the interactive animation services will be exposed to application programs. In many applications with interactive animation, such as games for example, the application must provide and control its own user interface. This increases the complexity of the application program and prevents sharing of animation and input/output services among application programs.

A related difficulty with interactive animation is determining how to incorporate it into Internet applications. The content of a web page preferably should be small in size so that it is easy to download, it should be secure, and it should be portable. These design issues make it difficult to develop interactive animation for Web pages on the Internet.

SUMMARY OF THE INVENTION

The invention provides a client-server animation system used to display interactive, animated user interface characters with speech input and output capability. One aspect of the invention is an animation server that makes a number of animation and speech input and output services available to clients (e.g., application programs, Web page scripts, etc.). Another aspect of the invention is the way in which the clients can specify input commands including both speech and cursor device input for the character, and can request the server to play animation and speech output to animate the character. The animated output can combine both speech and animation such that the mouth of a user interface character is lip-synched to the speech output. The animation server exposes these services through an application programming interface accessible to applications written in conventional programming languages such as C and C++, and through a high level interface accessible through script languages. This high level interface enables programmers to embed interactive animation with speech input and output capability in Web pages.

One implementation of the animation system comprises an animation server, a speech synthesis engine, and a speech recognition engine. The speech synthesis engine converts text to digital audio output in response to requests from the animation server. The speech recognition engine analyzes digitized audio input to identify words or phrases selected by the animation server.

The animation server exposes its animation and speech input/output services to clients through a programming interface. The server's interface includes methods such as Play(name of animation) or Speak(text string) that enable the clients to make requests to animate a user interface character. The server constructs each frame of animation and controls the display of the animation in the user interface. To support lip-synched speech output, the server includes a mouth animation module that receives notification from the speech synthesis engine when it is about to output a phoneme. In response to this notification, the mouth animation module maps a frame of animation representing the character's mouth position to the phoneme that is about to be played back.
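As a rough illustration of the mouth animation module described above, the following Python sketch selects a mouth frame each time a phoneme notification arrives. The phoneme labels, the frame names, and the `MouthAnimator` class are hypothetical and are not taken from the patent or from any speech engine interface.

```python
# Hypothetical sketch: the phoneme table and all names are illustrative
# assumptions, not part of the patent or of any real speech API.

# Assumed mapping from phoneme labels to mouth-position frame names.
PHONEME_TO_MOUTH_FRAME = {
    "AA": "mouth_open_wide",
    "IY": "mouth_wide_narrow",
    "UW": "mouth_rounded",
    "M": "mouth_closed",
    "F": "mouth_teeth_on_lip",
}
DEFAULT_FRAME = "mouth_rest"


class MouthAnimator:
    """Records which mouth frame is shown as phoneme notifications arrive."""

    def __init__(self):
        self.shown_frames = []

    def on_phoneme(self, phoneme):
        # Called when the (assumed) speech engine is about to output a phoneme.
        frame = PHONEME_TO_MOUTH_FRAME.get(phoneme, DEFAULT_FRAME)
        self.shown_frames.append(frame)
        return frame


animator = MouthAnimator()
for phoneme in ["M", "AA", "M", "ZZ"]:
    animator.on_phoneme(phoneme)
```

Unknown phonemes fall back to a rest frame in this sketch; a real implementation might instead hold the previous mouth position.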

Clients specify the speech or cursor input that a character will respond to through a command method in the server's interface. The server monitors input from the operating system (cursor device input) and the speech recognition engine (speech input) for this input. When it detects input from the user that a client has requested notification of, it sends a notification to that client. This feature enables the client to tell the server how to animate the user interface character in response to specific types of input. The server enables multiple clients to control a single user interface character by allowing one client to be active at a time. The end user and clients can make themselves active.
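The command and notification flow just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `Client` and `AnimationServer` classes and their method names are hypothetical, and the choice to notify only the currently active client is an assumption based on the statement that one client is active at a time.

```python
# Hypothetical sketch; class and method names are assumptions for illustration.

class Client:
    def __init__(self, name):
        self.name = name
        self.notifications = []

    def on_command(self, command):
        # Notification callback invoked by the server.
        self.notifications.append(command)


class AnimationServer:
    def __init__(self):
        self.commands = {}   # client -> set of input commands it registered
        self.active = None   # only one client is active at a time

    def add_command(self, client, command):
        self.commands.setdefault(client, set()).add(command)

    def activate(self, client):
        self.active = client

    def handle_input(self, command):
        """Notify the active client if it registered for this input."""
        client = self.active
        if client is not None and command in self.commands.get(client, set()):
            client.on_command(command)
            return True
        return False


server = AnimationServer()
editor, browser = Client("editor"), Client("browser")
server.add_command(editor, "open file")
server.add_command(browser, "go back")
server.activate(editor)
delivered = server.handle_input("open file")  # editor is active and registered
ignored = server.handle_input("go back")      # browser is not the active client
```

On receiving the notification, a client would typically respond with a request such as Play or Speak; that step is omitted here to keep the sketch short.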

The animation system outlined above has a number of advantages. It enables one or more clients to create an engaging user interface character that actually converses with the user and responds to specific input specified by the client. Clients do not have to have complex code to create animation and make an interactive interface character because the server exposes services in a high level interface. This is advantageous for web pages because a web page can include an interactive character simply by adding a reference to the agent server and high level script commands that specify input for the character and request playback of animation and lip-synched speech to animate the character.

Further features and advantages of the invention will become apparent from the following detailed description and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a general block diagram of a computer that serves as an operating environment for the invention.

FIG. 2 is a screen shot illustrating an example of an animated character located on top of the user interface in a windowing environment.

FIG. 3 is a diagram illustrating the architecture of an animation system in one implementation of the invention.

FIG. 4 is a flow diagram illustrating how the animation server in FIG. 3 plays an animation.

FIG. 5 illustrates an example of the animation file structure.

FIG. 6 is a flow diagram illustrating a method used to retrieve image data to construct a current frame of animation.

FIG. 7 is a flow diagram illustrating the process for obtaining the bounding region of an arbitrary shaped animation.

FIG. 8 is a diagram illustrating an example of a COM server and its relationship with an instance of object data.

FIG. 9 is a conceptual diagram illustrating the relationship between a COM object and a user of the object (such as a client program).

FIG. 10 illustrates the relationship among the different types of objects supported in the animation server.

FIG. 11 is a diagram of a web browsing environment illustrating how interactive, animated user interface characters can be activated from Web pages.

DETAILED DESCRIPTION

Computer Overview

FIG. 1 is a general block diagram of a computer system that serves as an operating environment for the invention. The computer system 20 includes as its basic elements a computer 22, one or more input devices 28, including a cursor control device, and one or more output devices 30, including a display monitor. The computer 22 has at least one high speed processing unit (CPU) 24 and a memory system 26. The input and output devices, memory system and CPU are interconnected and communicate through at least one bus structure 32.

The CPU 24 has a conventional design and includes an ALU 34 for performing computations, a collection of registers 36 for temporary storage of data and instructions, and a control unit 38 for controlling operation of the system 20. The CPU 24 may be a processor having any of a variety of architectures including Alpha from Digital, MIPS from MIPS Technology, NEC, IDT, Siemens, and others, x86 from Intel and others, including Cyrix, AMD, and Nexgen, and the PowerPC from IBM and Motorola.

The memory system 26 generally includes high-speed main memory 40 in the form of a medium such as random access memory (RAM) and read only memory (ROM) semiconductor devices, and secondary storage 42 in the form of long term storage mediums such as floppy disks, hard disks, tape, CD-ROM, flash memory, etc. and other devices that store data using electrical, magnetic, optical or other recording media. The main memory 40 also can include video display memory for displaying images through a display device. The memory 26 can comprise a variety of alternative components having a variety of storage capacities.

The input and output devices 28, 30 are conventional peripheral devices coupled to or installed within the computer. The input device 28 can comprise a keyboard, a cursor control device such as a mouse or trackball, a physical transducer (e.g., a microphone), etc. The output device 30 shown in FIG. 1 generally represents a variety of conventional output devices typically provided with a computer system such as a display monitor, a printer, a transducer (e.g., a speaker), etc. Since the invention relates to computer generated animation and speech input and output services, the computer must have some form of display monitor for displaying this animation, a microphone and analog to digital converter circuitry for converting sound to digitized audio, and speakers and digital to analog converter circuitry for converting digitized audio output to analog sound waves.

For some devices, the input and output devices actually reside within a single peripheral. Examples of these devices include a network adapter card and a modem, which operate as input and output devices.

It should be understood that FIG. 1 is a block diagram illustrating the basic elements of a computer system; the figure is not intended to illustrate a specific architecture for a computer system 20. For example, no particular bus structure is shown because various bus structures known in the field of computer design may be used to interconnect the elements of the computer system in a number of ways, as desired. CPU 24 may be comprised of a discrete ALU 34, registers 36 and control unit 38 or may be a single device in which one or more of these parts of the CPU are integrated together, such as in a microprocessor. Moreover, the number and arrangement of the elements of the computer system may be varied from what is shown and described in ways known in the computer industry.

Animation System Overview

FIG. 2 is a screen shot illustrating an example of an animated character located on top of the user interface in a windowing environment. This screen shot illustrates one example of how an implementation of the invention creates arbitrary shaped animation that is not confined to the window of a hosting application. The animated character 60 can move anywhere in the user interface. In this windowing environment, the user interface, referred to as the "desktop," includes the shell 62 of the operating system as well as a couple of windows 64, 66 associated with currently running application programs. Specifically, this example includes an Internet browser application running in one window 64 and a word processor application 66 running in a second window on the desktop of the Windows® 95 Operating System.

The animated character moves on top of the desktop and each of the windows of the executing applications. As the character moves about the screen, the animation system computes the bounding region of the non-transparent portion of the animation and generates a new window with a shape to match this bounding region. This gives the appearance that the character is independent from the user interface and each of the other windows.

To generate an animation like this, the animation system performs the following steps:

1) loads the bitmap(s) for the current frame of animation;

2) constructs a frame of animation from these bitmaps (optional, depending on whether the frame is already constructed at authoring time);

3) computes the bounding region of the constructed frame in real time;

4) sets a window region to the bounding region of the frame; and

5) draws the frame into the region window.

The bounding region defines the non-transparent portions of a frame of animation. A frame in an animation is represented as a rectangular area that encloses an arbitrary shaped animation. The pixels that are located within this rectangular area but do not form part of the arbitrary-shaped animation are transparent in the sense that they will not occlude or alter the color of the corresponding pixels in the background bitmap (such as the desktop in the Windows® Operating System) when combined with it. The pixels located in the arbitrary animation are non-transparent and are drawn to the display screen so that the animation is visible in the foreground.

The bounding region defines the area occupied by non-transparent pixels within the frame, whether they are a contiguous group of pixels or disjoint groups of contiguous pixels. For example, if the animation were in the shape of a red doughnut with a transparent center, the bounding region would define the red pixels of the doughnut as groups of contiguous pixels that comprise the doughnut, excluding the transparent center. If the animation comprised a football and goalposts, the bounding region would define the football as one or more groups of contiguous pixels and the goalposts as one or more groups of contiguous pixels. The bounding region is capable of defining non-rectangular shaped animation including one or more transparent holes and including more than one disjoint group of pixels.
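One simple way to picture such a bounding region is as a list of horizontal runs of non-transparent pixels. The Python sketch below computes these runs for the doughnut example. The frame representation (rows of pixel values with 0 meaning transparent) and the run format are assumptions chosen for illustration, not the representation used by the patent.

```python
# Illustrative sketch only: a frame is a list of rows, and TRANSPARENT marks
# pixels that are not part of the animation (an assumed representation).
TRANSPARENT = 0


def bounding_region(frame):
    """Return (row, start_col, end_col_exclusive) runs of non-transparent pixels."""
    runs = []
    for y, row in enumerate(frame):
        start = None
        for x, pixel in enumerate(row):
            if pixel != TRANSPARENT and start is None:
                start = x                      # a run begins
            elif pixel == TRANSPARENT and start is not None:
                runs.append((y, start, x))     # the run ends just before x
                start = None
        if start is not None:
            runs.append((y, start, len(row)))  # the run reaches the row's end
    return runs


# A tiny "doughnut": a ring of 1s around a transparent center.
doughnut = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
]
region = bounding_region(doughnut)
```

The middle row yields two separate runs, so the transparent hole is excluded from the region, and disjoint shapes would simply contribute additional runs.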

Once computed, the bounding region can be used to set a region window, a non-rectangular window capable of clipping input and output to the non-transparent pixels defined by the bounding region. Region windows can be implemented as a module of the operating system or as a module outside of the operating system. Preferably, the software module implementing region windows should have access to input events from the keyboard and cursor positioning device and to the other programs using the display screen so that it can clip input and output to the bounding region for each frame. The Windows® Operating System supports the clipping of input and output to region windows as explained further below.
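Clipping input to a region window amounts to a hit test against the bounding region. The following Python sketch shows the idea, assuming the region is stored as (row, start column, exclusive end column) runs in window coordinates. The function names and the run format are hypothetical, and an operating system would perform this test internally rather than in client code.

```python
# Hypothetical sketch: the region is assumed to be a list of
# (row, start_col, end_col_exclusive) runs in window coordinates.

def hit_test(region, x, y):
    """Return True if point (x, y) falls on a non-transparent pixel."""
    return any(row == y and start <= x < end for row, start, end in region)


def route_click(region, x, y):
    # Clicks inside the region go to the character; all others fall through
    # to whatever lies underneath (e.g. the desktop or another window).
    return "character" if hit_test(region, x, y) else "pass-through"


# Runs describing a small ring with a transparent center at (2, 1).
ring = [(0, 1, 4), (1, 1, 2), (1, 3, 4), (2, 1, 4)]
on_ring = route_click(ring, 1, 1)
in_hole = route_click(ring, 2, 1)
```

A click in the hole of the ring passes through to the window beneath it, which is what makes the character appear independent of any rectangular window.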

The method outlined above for drawing non-rectangular animation can be implemented in a variety of different types of computer systems. Below we describe an implementation of the invention in a client-server animation system. However, the basic principles of the invention can be applied to different software architectures as well.

FIG. 3 is a general block diagram illustrating the architecture of a client server animation system. The animation system includes an animation server 100, which controls the playback of animation, and one or more clients 102-106, which request animation services from the server. During playback of the animation, the server relies on graphic support software in the underlying operating system 120 to create windows, post messages for windows, and paint windows.

In this specific implementation, the operating system creates and clips input to non-rectangular windows ("region windows"). To show this in FIG. 3, part of the operating system is labeled, "region window controller" (see item 122). This is the part of the operating system that manages region windows. The region window controller 122 creates a region window having a boundary matching the boundary of the current frame of animation. When the system wants to update the shape of a region window, the regionizer specifies the bounding region of the current frame to the operating system. The operating system monitors input and notifies the server of input events relating to the animation.

The services related to the playback of animation are implemented in four modules: 1) the sequencer 108; 2) the loader 110; 3) the regionizer 112; and 4) the mouth animation module 114. The sequencer module 108 is responsible for determining which bitmap to display at any given time along with its position relative to some fixed point on the display.
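As a rough illustration of the sequencer's frame selection, the following C sketch picks the frame to display for a given elapsed playback time. It assumes that each frame carries a display duration in milliseconds, which is an assumption made for this example; the frame_at function is a hypothetical name and does not describe the actual sequencer interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the frame that should be visible after elapsed_ms
 * milliseconds of playback, given per-frame durations (an assumed format).
 * Once the animation has run past its last frame, the last frame is held.
 * Returns -1 if there are no frames.
 */
int frame_at(const unsigned *durations_ms, size_t count, unsigned elapsed_ms)
{
    assert(durations_ms != NULL || count == 0);
    if (count == 0) return -1;

    unsigned end = 0;  /* time at which the current frame stops showing */
    for (size_t i = 0; i < count; ++i) {
        end += durations_ms[i];
        if (elapsed_ms < end) return (int)i;
    }
    return (int)(count - 1);
}
```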

The loader module 110 is responsible for reading the frame's bitmap from some input source (either a computer disk file or a computer network via a modem or network adapter) into memory. In cases where the bitmap is compressed, the loader module is also responsible for decompressing the bitmap into its native format. There are a variety of known still image compression formats, and the decompression method therefore depends on the format of the compressed bitmap.

The regionizer module 112 is responsible for generating the bounding region of the frame, setting it as the clipping region of the frame's hosting region window and then drawing the frame into the region. In slower computers, it is not feasible to generate the bounding region as frames are constructed and played back. Therefore, in this implementation the regionizer also supports the loading of bounding region information in cases where it is precomputed and stored along with the frame data in the animation file.

The mouth animation module 114 is responsible for coordinating speech output with the animation representing a user interface character's mouth. The mouth animation module receives a message from a speech synthesis engine 116 whenever a specific phoneme is about to be spoken. When the mouth animation module receives this message, it performs a mapping of the specified phoneme to image data stored in an animation mouth data file that corresponds to the phoneme. It is responsible for loading, decompressing, and controlling the playback of the animation representing the character's mouth.
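The phoneme-to-image mapping can be pictured as a simple lookup table. The following C sketch is illustrative only: the phoneme identifiers, the mouth shape names, and the mouth_frame_for function are all hypothetical, since the actual phoneme set is defined by the speech synthesis engine and the actual mouth images by the animation mouth data file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phoneme identifiers; a real engine defines its own set. */
typedef enum {
    PH_SILENCE, PH_AA, PH_EE, PH_OO, PH_MBP, PH_FV, PH_COUNT
} Phoneme;

/* Hypothetical mouth images, standing in for entries in a mouth data file. */
typedef enum {
    MOUTH_CLOSED, MOUTH_OPEN_WIDE, MOUTH_SMILE, MOUTH_ROUND, MOUTH_TEETH_ON_LIP
} MouthFrame;

/* One mouth image per phoneme. */
static const MouthFrame mouth_for_phoneme[] = {
    [PH_SILENCE] = MOUTH_CLOSED,
    [PH_AA]      = MOUTH_OPEN_WIDE,
    [PH_EE]      = MOUTH_SMILE,
    [PH_OO]      = MOUTH_ROUND,
    [PH_MBP]     = MOUTH_CLOSED,
    [PH_FV]      = MOUTH_TEETH_ON_LIP,
};

/* Map a phoneme notification to the mouth image to display next. */
MouthFrame mouth_frame_for(int phoneme)
{
    /* Catches a table that was not extended after appending a phoneme. */
    assert(sizeof mouth_for_phoneme / sizeof mouth_for_phoneme[0] == PH_COUNT);

    /* Unknown phonemes fall back to a closed mouth. */
    if (phoneme < 0 || phoneme >= PH_COUNT) return MOUTH_CLOSED;
    return mouth_for_phoneme[phoneme];
}
```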

The speech synthesis engine 116 is responsible for generating speech output from text. In this implementation, the speech synthesis engine 116 is a SAPI compliant text to speech generator from Centigram Communications Corp., San Jose, Calif. Other SAPI compliant text to speech generators can be used as well. For example, Lernout and Hauspie of Belgium also makes a SAPI compliant text to speech generator.

The speech recognition engine 118 is responsible for analyzing digitized audio input to identify significant words or phrases selected by the animation server. The animation server defines these words or phrases by defining a grammar of acceptable phrases. The client specifies this grammar by specifying sequences of words that it wants the system to detect in a text string format. The server also supports a command language that includes boolean operators and allows alternative words. This command language enables the client to specify a word or phrase along with a number of possible alternative or option words to look for in the speech input. The syntax of the command language is described in more detail below.

The speech recognition engine used in this implementation is a SAPI compliant speech recognition engine made by Microsoft Corporation. A suitable alternative speech recognition engine is available from Lernout and Hauspie of Belgium.

The operating system in this implementation is the Windows.RTM. 95 operating system from Microsoft Corporation. The application programming interface for the operating system includes two functions used to create and control region windows. These functions are:

1) SetWindowRgn; and

2) GetWindowRgn

SetWindowRgn

The SetWindowRgn function sets the window region of a rectangular host window. The window region is an arbitrary shaped region on the display screen defined by an array of rectangles. These rectangles describe the rectangular regions of pixels in the host window that the window region covers.

The window region determines the area within the host window where the operating system permits drawing. The operating system does not display any portion of a window that lies outside of the window region.

______________________________________
int SetWindowRgn(
    HWND hWnd,    // handle to window whose window region is to be set
    HRGN hRgn,    // handle to region
    BOOL bRedraw  // window redraw flag
);
______________________________________

Parameters

hWnd

Handle to the window whose window region is to be set.

hRgn

Handle to a region. The function sets the window region of the window to this region. If hRgn is NULL, the function sets the window region to NULL.

bRedraw

Boolean value that specifies whether the operating system redraws the window after setting the window region. If bRedraw is TRUE, the operating system does so; otherwise, it does not.

Typically, the program using region windows will set bRedraw to TRUE if the window is visible.

Return Values

If the function succeeds, the return value is nonzero.

If the function fails, the return value is zero.

Remarks

If the bRedraw parameter is TRUE, the system sends the WM_WINDOWPOSCHANGING and WM_WINDOWPOSCHANGED messages to the window.

The coordinates of a window's window region are relative to the upper-left corner of the window, not the client area of the window. After a successful call to SetWindowRgn, the operating system owns the region specified by the region handle hRgn. The operating system does not make a copy of the region. Thus, the program using region windows should not make any further function calls with this region handle. In particular, it should not close this region handle.

GetWindowRgn

The GetWindowRgn function obtains a copy of the window region of a window. The window region of a window is set by calling the SetWindowRgn function.

______________________________________
int GetWindowRgn(
    HWND hWnd,  // handle to window whose window region is to be obtained
    HRGN hRgn   // handle to region that receives a copy of the window region
);
______________________________________

Parameters

hWnd

Handle to the window whose window region is to be obtained.

hRgn

Handle to a region. This region receives a copy of the window region.

Return Values

The return value specifies the type of the region that the function obtains. It can be one of the following values:

______________________________________
Value            Meaning
______________________________________
NULLREGION       The region is empty.
SIMPLEREGION     The region is a single rectangle.
COMPLEXREGION    The region is more than one rectangle.
ERROR            An error occurred; the region is unaffected.
______________________________________

Comments

The coordinates of a window's window region are relative to the upper-left corner of the window, not the client area of the window.

The region window controller shown in FIG. 3 corresponds to the software in the operating system that supports the creation of region windows and the handling of messages that correspond to region windows.
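To make the clipping of input concrete, the following C sketch tests whether a cursor position falls inside a window region expressed as an array of rectangles. It is a simplified illustration and not the operating system's implementation; the Rect type and the region_contains function are hypothetical names, and coordinates are assumed to be relative to the upper-left corner of the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rectangle type: left/top inclusive, right/bottom exclusive. */
typedef struct { int left, top, right, bottom; } Rect;

/*
 * Return true if the point (x, y) lies inside any rectangle of the region.
 * A point that misses every rectangle is over a transparent area, so an
 * input event there would not be delivered to the animation's window.
 */
bool region_contains(const Rect *rects, size_t count, int x, int y)
{
    assert(rects != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        const Rect *r = &rects[i];
        if (x >= r->left && x < r->right && y >= r->top && y < r->bottom) {
            return true;
        }
    }
    return false;
}
```

With a region describing a 3x3 doughnut, a click on the outer ring is inside the region, while a click on the transparent center pixel is not.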

In this implementation, the speech recognition engine and the speech synthesis engine communicate with an audio input and output device such as a sound card according to the SAPI specification from Microsoft. In compliance with SAPI, these engines interact with an audio device through software representations of the audio device referred to as multimedia audio objects, audio sources (which provide input to the speech recognition engine) and audio destinations (which mediate output from the speech synthesis engine). The structure and operation of this software representation are described in detail in the SAPI specification available from Microsoft.

In the next two sections, we describe two alternative implementations of the animation system shown in FIG. 3. Both implementations generate arbitrary shaped animation and can compute the arbitrary shaped region occupied by non-transparent pixels of a frame in real time. However, the manner in which each system computes and stores this region data varies. Specifically, since it is not computationally efficient to re-compute the region data for every frame, these systems use varying methods for caching region data. The advantages of each approach are summarized following the description of the second implementation.
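The general idea of caching region data can be sketched as follows. This C example is a generic per-frame cache and is not intended to represent either of the two implementations described in the following sections; the RegionCache type, the region_cache_get function, the fixed capacity limits, and the demo_compute stand-in are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAMES 64  /* assumed upper bound on frames per animation */
#define MAX_RECTS  32  /* assumed upper bound on rectangles per region */

typedef struct { int left, top, right, bottom; } Rect;

typedef struct {
    bool   valid;             /* true once the region has been computed */
    size_t count;             /* number of rectangles stored in rects */
    Rect   rects[MAX_RECTS];
} CachedRegion;

typedef struct {
    CachedRegion entries[MAX_FRAMES];
    size_t       computations;  /* how many times compute was invoked */
} RegionCache;

/* Callback that computes the region for one frame; returns the rect count. */
typedef size_t (*ComputeRegionFn)(int frame, Rect *out, size_t max_rects);

/* Return the cached region for a frame, computing it on first use only. */
const CachedRegion *region_cache_get(RegionCache *cache, int frame,
                                     ComputeRegionFn compute)
{
    assert(cache != NULL && compute != NULL);
    assert(frame >= 0 && frame < MAX_FRAMES);

    CachedRegion *e = &cache->entries[frame];
    if (!e->valid) {
        size_t n = compute(frame, e->rects, MAX_RECTS);
        e->count = n < MAX_RECTS ? n : MAX_RECTS;
        e->valid = true;
        cache->computations++;
    }
    return e;
}

/* Stand-in for a real regionizer: one rectangle, frame + 1 pixels wide. */
size_t demo_compute(int frame, Rect *out, size_t max_rects)
{
    if (max_rects > 0) out[0] = (Rect){ 0, 0, frame + 1, 1 };
    return 1;
}
```

A zero-initialized RegionCache starts with every entry invalid, so requesting the same frame twice invokes the compute callback only once.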

First Implementation of the Animation System

FIG. 4 is a flow diagram illustrating how the animation server plays an animation. First, the animation data file is opened via the computer's operating system as shown in step 150. The animation data file includes an animation header block and a series of bitmaps that make up each of the frames in the animation. Once the operating system has opened the file, the loader module 110 reads the animation header block to get all