WikiPatents - Community Patent Review
System and method for associating multimedia objects for use in a video conferencing system    
United States Patent     5896128
Link to this page     http://www.wikipatents.com/5896128.html
Inventor(s)     Boyer; David Gray (Tinton Falls, NJ)
Abstract     A video conferencing system and method that uses a central multimedia bridge to combine multimedia signals from a plurality of conference participants into a single composite signal for each participant. The system gives each conference participant the ability to customize their individual display of other participants, including keying in and out selected portions of the display and overlapping displayed images, and the ability to identify individual images in a composed video stream by click and drag operations or the like. The system uses a chain of video composing modules that can be extended as necessary to combine video signal streams from any number of conference participants in real time. Multimedia association software is provided for associating different media types to enhance display and manipulation capabilities for multimedia uses. The system also allows each user to dynamically change who can receive the information they provide to the conference.
Inventor     Boyer; David Gray (Tinton Falls, NJ)
Owner/Assignee     Bell Communications Research, Inc. (Morristown, NJ)
Publication Date     April 20, 1999
Application Number     08/434,081
Filing Date     May 3, 1995
US Classification     715/716 348/14.09 348/14.11
Int'l Classification     H04N 007/15
Examiner     Flynn; Nathan
Attorney/Law Firm     Joseph, Hey; David A. Giordano; Yeadon; Loria B.
USPTO Field of Search     348/15 348/16 348/17 348/18 348/578 348/584 348/598 348/722 348/585 348/586 348/587 348/588 348/591 348/12 348/13 348/14
Claims

What is claimed is:

1. A video conferencing system wherein each individual participant can compose the video images to be displayed to that participant distinct from the video images displayed to other participants, said system comprising

means for receiving a plurality of video signal streams from a plurality of participant stations, each video signal stream comprising a plurality of video instances, wherein each video instance is a distinct image element of the video picture represented by the video signal stream;

means for combining said plurality of video signal streams into a plurality of composite video streams, each composite video stream containing selected portions of two or more of said video signal streams;

means for outputting each of said composite video streams to a respective participant station;

means controlled by software for associating a plurality of instances from different video signal streams into a distinct group of video instances; and

means controlled by software for manipulating said distinct group of video instances as if it were a video signal stream.

2. A video conferencing system wherein each individual participant can compose the video images to be displayed to the participant distinct from the video images displayed to other participants, said system comprising

means for receiving a plurality of video signal streams from a plurality of participant stations, each video signal stream comprising a plurality of video instances, wherein each video instance is a distinct image element of the video pictures represented by the video signal stream;

means for combining said plurality of video signal streams into a plurality of composite video streams, each composite video stream containing selected portions of two or more of said video signal streams;

means for outputting each of said composite video streams to a respective participant station;

means for receiving a plurality of audio signal streams from said plurality of participant stations, each audio signal stream comprising a plurality of audio instances, wherein each audio instance is a distinct sound element;

means for combining said plurality of audio signal streams into a plurality of composite audio streams;

means for outputting each of said composite audio streams to a respective participant station; and

means controlled by software for associating video instances of a respective video signal stream with audio instances of a respective audio signal stream into a distinct group of associated audio and video instances; and

means controlled by software for manipulating said distinct group of audio and video instances as if it were a signal stream.

3. A method for controlling the presentation of a media signal stream comprising the steps of

providing a plurality of media signal streams, each of said streams comprising a plurality of media instances, wherein each media instance is a distinct portion of the total information represented by said media stream;

associating a plurality of instances from different media signal streams into a distinct group of media instances; and

manipulating said distinct group of media instances as if it were a media signal stream.

4. A method for enabling a viewer to control the presentation to the viewer from a plurality of discrete sources in a multi-point teleconferencing service, said method comprising the steps of

combining the images from the sources into composite streams;

grouping together a subset of the images from a plurality of composite streams;

manipulating the grouped together images as if it were a single stream; and

displaying the manipulated images to the viewer.

5. The video conferencing system of claim 1, wherein said associating means includes means for scaling the group of video instances as a group.

6. The video conferencing system of claim 1, wherein said associating means includes means for chroma keying the group of video instances as a group, whereby a color or luminance range of the group can be removed.

7. The video conferencing system of claim 1, wherein said associating means includes means for mirroring the group of video instances as a group.

8. The video conferencing system of claim 1, wherein said associating means includes means for changing the priority of the group of video instances as a group, whereby a stacking order of the associated group can be changed with respect to video instances not associated with the group.

9. A video conferencing system comprising:

means for receiving a plurality of video signal streams from a plurality of user stations, each video signal stream comprising one or more video instances;

means for combining said plurality of video signal streams into a plurality of composite video streams, each composite video stream containing selected portions of two or more of said video signal streams;

means for outputting each of said composite video streams to a respective user station; and

means for associating a plurality of instances from different video signal streams into a group of video instances that can be manipulated as a group;

said associating means including means for windowing the group of video instances as a group, whereby portions of the associated group within a defined window can be removed.

10. The video conferencing system of claim 1, further comprising:

means for receiving a plurality of audio signal streams from said plurality of user stations, said audio signal streams each comprising an audio instance;

means for combining said audio instances into a plurality of composite audio streams; and

means for outputting said composite audio signal streams to respective user stations.

11. The video conferencing system of claim 10, wherein said associating means includes means for associating said group of video instances with audio instances of respective audio signal streams corresponding to the group of video instances.

12. A video conferencing system, comprising:

means for receiving a plurality of video signal streams from a plurality of user stations, each video signal stream comprising one or more video instances;

means for combining said plurality of video signal streams into a plurality of composite video streams, each composite video stream containing selected portions of two or more of said video signal streams;

means for outputting each of said composite video streams to a respective user station;

means for receiving a plurality of audio signal streams from said plurality of user stations, said audio signal streams each comprising an audio instance;

means for combining said audio instances into a plurality of composite audio streams;

means for outputting said composite audio signal streams to respective user stations; and

means for associating a plurality of instances from different video signal streams into a group of video instances that can be manipulated as a group;

said associating means including means for associating said group of video instances with audio instances of respective audio signal streams corresponding to the group of video instances; and

means for associating a volume of the audio instances associated with said group of video instances with a size of the group, whereby the volume of the audio instances increases or decreases with a change in the size of the group.

13. A video conferencing system comprising:

means for receiving a plurality of video signal streams from a plurality of user stations, each video signal stream comprising one or more video instances;

means for combining said plurality of video signal streams into a plurality of composite video streams, each composite video stream containing selected portions of two or more of said video signal streams;

means for outputting each of said composite video streams to a respective user station;

means for receiving a plurality of audio signal streams from said plurality of user stations, each of said audio signal streams comprising one or more audio instances;

means for combining said plurality of audio signal streams into a plurality of composite audio streams;

means for outputting each of said composite audio streams to said user stations; and

means for associating video instances of a respective video signal stream with audio instances of a respective audio signal stream, wherein said associating means includes means for associating a volume of at least one selected audio instance with a size of at least one selected video instance, whereby the volume of the selected audio instance increases or decreases with a change in the size of the selected video instance.

14. The method of controlling the presentation of a media signal stream in accordance with claim 3 wherein said media instances comprise video instances, said method further comprising the step of displaying said video instances on a video display device.

15. The method of controlling the presentation of a media signal stream in accordance with claim 14 wherein said media instances comprise audio instances in addition to said video instances.
Description

RELATED APPLICATIONS

Reference is made to copending M. E. Lukacs application Ser. No. 08/432,242, M. E. Lukacs application Ser. No. 08/434,083, and the D. G. Boyer, M. E. Lukacs, and P. E. Fleisher application, all filed on even date with this application and which disclose and claim related inventions.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to multimedia object association. More specifically, the invention relates to a system and method for associating multimedia objects for enhancing display and manipulation capabilities for multimedia uses, such as, for example, real-time video conferencing.

2. Description of Related Art

Video teleconferencing occurs when people in different locations send voice and video data to each other in order to simulate having all of the participants present in a single room. Each person in a multi-point conference wants to see all or most of the other participants. Accordingly, the various video streams are presented to each participant in a spatially separate manner, either on separate screens or in separate areas of a single video display. Each of the video conferencing terminals sends a locally generated video image to each of the other participant terminals and receives a video image from each of the other participants. In the prior art, this meant that for a three-way conference, six video streams must be transmitted; for a five-way conference, twenty video streams must be transmitted; for an eight participant conference, fifty-six video streams must be transmitted, and so on. Generally, if N people are holding a televideo conference, then N × (N-1) transmission channels must be used. Accordingly, the relatively large number of channels used for a video teleconference involving multiple participants becomes prohibitive with the prior art systems.
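The channel counts quoted above follow directly from the formula. As a rough illustration only, the Python sketch below reproduces them. The function names are hypothetical, and the star-topology count (one stream up and one stream down per terminal) is an assumption added for comparison, not a figure given in the text.

```python
def mesh_streams(n: int) -> int:
    """Streams in a full mesh: each of n terminals sends to the other n - 1."""
    return n * (n - 1)


def star_streams(n: int) -> int:
    """Assumed count with a central bridge: one stream up, one down per terminal."""
    return 2 * n


for n in (3, 5, 8):
    print(f"{n} participants: mesh={mesh_streams(n)}, star={star_streams(n)}")
```

For N = 3, 5, and 8 the mesh counts are 6, 20, and 56, matching the figures in the text, while the assumed star count grows only linearly with N.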

Furthermore, participants must have a sufficient number of input channels, decoders, and translators (if transmitting different video formats) to receive and display multiple images from different participants. Accordingly, the required number of channels, decoders, and/or translators also becomes prohibitive.

With the prior art systems, video conferencing participants were unable to customize their video display by keying in or out portions of the displayed image, by making the various images of participants overlap in a natural-looking manner, or by placing and sizing images as they like. The participants were also unable to associate video images with other multimedia objects to enhance the variety of conferencing functions that can be enjoyed.

It is an object of the present invention to provide a flexible real-time video conferencing system for use by a plurality of users in which the required transmission bandwidth to each user is minimized.

It is a further object of the present invention to provide a video conferencing system in which each participant receives just one video (and audio) stream of the bandwidth, encoding and video standard that they desire from a central multimedia bridge.

It is a further object of the present invention to provide a video conferencing service that gives each participant the ability to compose video images of other participants into a fully customized display.

It is a further object of the present invention to provide an infinitely expandable priority driven video composing unit to combine any number of video signals into a single prioritized video stream.

It is a further object of the present invention to provide a method of associating images of a video display in a hierarchal fashion, and of associating multimedia objects together to enhance video conferencing and other multimedia applications.

It is a further object of the present invention to allow each user to dynamically change who can receive the information they provide to the conference.

It is a further object of the present invention to provide users with the ability to identify individual images in a composed video stream by click and drag operations or the like.

Additional objects, advantages and novel features of the invention will be set forth in the description which follows, and will become apparent to those skilled in the art upon reading this description or practicing the invention. The objects and advantages of the invention may be realized and attained by the appended claims.

SUMMARY OF THE INVENTION

The present invention is a multi-point multimedia teleconferencing service with customer presentation controls for each participant. An advanced multimedia bridge provides feature rich customer-controlled media (mainly, video and audio) mixing capabilities for each participant. The multimedia bridge is a shared network resource that need not be owned by the users or co-located with them but can be rented on a time slice basis. A "star" network topology is used to connect each user to the server(s). Also available at the central bridging location are coders and decoders of different types, so that customers with different types and brands of equipment will be able to communicate with each other. Central combining eliminates the need for multiple input channels and multiple decoders on each participant's desktop.

Each user receives just one video stream of the bandwidth, encoding and video standard that they desire. All of the transcodings and standards conversions are accomplished at the multimedia bridge. The advanced multimedia bridge gives a user the ability to compose a visual space for himself/herself that is different from the displays of the other conference participants. Because of this "personal" control feature, the present invention will be referred to as a personal presence system (PPS).

The software of the present invention controls and manages the multimedia bridge, sets up and coordinates the conference, and provides easy-to-use human interfaces. Each participant in a multimedia conference using the present invention may arrange the various video images into a display in a way that is pleasing to them, and rearrange them at any time during the session.

To arrange their display, the conference participants can move and scale the video images and overlap them in a prioritized manner similar to a windowed workstation display. A user can select any of the images that appear on their video display for an operation on that image. The user's pointing device (e.g., mouse) can be used to move or resize an image, in an analogous way to the "click and drag" operations supported by PC Window environments. The present invention brings this unprecedented capability to the video work space. Additionally, various elements of each image, such as a person or a chart, can be "keyed" in or out of the image so that the elements desired can be assembled in a more natural manner, unrestricted by rectangular boundaries.

The present invention also provides a presentation control capability that allows users to "associate" multimedia streams with each other thereby enabling the creation of composite or object groups. The multimedia association feature can be used to provide joint reception and synchronization of audio and video, or the delivery of picture slides synchronized with a recorded audio. A multimedia provider can use this feature to synchronize information from different servers to deal with information storage capacity limitations or with the copyright constraints on certain information.

A user can associate different video images in order to compose a video scene. By associating the images being sent by an array of cameras, a panoramic view can be generated and panning of the panoramic view can be supported. Association of different incoming images also enables a teleconferencing user to select for viewing a subset of the other conferees and provides a convenient way to access different conferees' images by simply panning left or right on the combined video scene.

In addition, a user can associate audio and video instances together so that when the size of the video instance changes, the volume of the audio instance changes, and when the location of the video instance changes, the stereo pan volume of the audio instance changes.
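One way to picture this audio/video association is as a function that derives the audio settings from the video geometry. The sketch below is illustrative only: the class names, the normalized coordinate ranges, and the linear size-to-volume and position-to-pan mappings are all assumptions, since the text does not specify the actual mapping.

```python
from dataclasses import dataclass


@dataclass
class VideoInstance:
    x: float      # horizontal centre: 0.0 = left edge, 1.0 = right edge (assumed range)
    scale: float  # displayed size relative to a reference size of 1.0


@dataclass
class AudioInstance:
    volume: float = 1.0  # linear gain
    pan: float = 0.0     # -1.0 = full left, +1.0 = full right


def sync_audio(video: VideoInstance, audio: AudioInstance,
               base_volume: float = 1.0) -> AudioInstance:
    """Update the associated audio instance from the video instance's geometry."""
    # Assumed linear mapping: a larger image gives a proportionally louder voice.
    audio.volume = base_volume * max(video.scale, 0.0)
    # Assumed linear mapping from screen position to stereo pan, clamped to [-1, 1].
    audio.pan = max(-1.0, min(1.0, 2.0 * video.x - 1.0))
    return audio


speaker = sync_audio(VideoInstance(x=0.5, scale=2.0), AudioInstance())
print(speaker)
```

Calling sync_audio again after every move or resize operation would keep the audio instance in step with its associated video instance.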

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is better understood by reading the following Detailed Description of the Preferred Embodiments with reference to the accompanying drawing figures, in which like reference numerals refer to like elements throughout, and in which:

FIG. 1 is a schematic overview of the main components of the present invention;

FIG. 2 is a pictorial diagram of a video conferencing session using the present invention;

FIG. 3 is a pictorial view of a user station associated with the present invention;

FIG. 4 is an illustration of a sample video display during a video conferencing session using the present invention;

FIG. 5 is a schematic diagram of an advanced multimedia bridge used in the present invention;

FIG. 6 is a schematic diagram of the video portion of the multimedia bridge of FIG. 5;

FIG. 7 is a schematic diagram of a video composer unit within the video bridge portion of FIG. 6;

FIG. 8 is a schematic diagram of a video composing module within the video composer chain of FIG. 7;

FIG. 9 is a building block diagram of the software components used in the present invention;

FIG. 10 is an object model diagram of the Client program shown in FIG. 9;

FIG. 11 is an object model diagram of the Service Session program shown in FIG. 9;

FIG. 12 is an object model diagram of a Bridge manager program used in conjunction with the Resource Agent program shown in FIG. 9;

FIG. 13 is a flow chart of a process for establishing a session with the multimedia bridge of the present invention;

FIG. 14 is a pictorial diagram of a video image association using the present invention;

FIG. 15 is an object model diagram of a Multimedia Object Association software architecture used with the present invention;

FIG. 16 is an object model diagram showing an example of multimedia object association using video instance group objects;

FIG. 17 is an object model diagram showing an example of multimedia object association with video and audio instances associated together; and

FIG. 18 is a pictorial diagram illustrating a process of keying out a portion of a video display using the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In describing preferred embodiments of the present invention illustrated in the drawings, specific terminology is employed for the sake of clarity. However, the invention is not intended to be limited to the specific terminology so selected, and it is to be understood that each specific element includes all technical equivalents which operate in a similar manner to accomplish a similar purpose.

Referring to FIG. 1, a real-time video conferencing system 30 includes an advanced multimedia bridge (AMB) 32 and a plurality of user stations 34-37 which are connected to the AMB 32. The connections between the user stations 34-37 and the AMB 32 can be any one of a variety of conventional electrical/data connections such as telephone modem links, broadband ISDN, etc. Each of the user stations 34-37 transmits and receives video, audio, and/or other data to and from the AMB 32. The AMB 32 is configured to interface with a variety of conventional communication links between the user stations 34-37 and the AMB 32 and is configured to send data to and receive data from each of the user stations 34-37.

FIG. 2 shows a video conferencing session using the present invention. Each of the user stations 34-37 may contain one or more users having a video terminal for viewing the teleconference, audio input and output capabilities, and/or one or more video cameras. Data from the video cameras and audio data from the users is transmitted from each of the user stations 34-37 to the AMB 32. The AMB 32 combines and manipulates the data in a manner described in more detail hereinafter and provides a return signal to each of the users at the user stations 34-37.

Referring to FIG. 3, the user station 34 of FIG. 1 is shown in greater detail. The user station 34 is illustrated as having a single user 42, a single camera 44, and a single display station 46. The camera 44 and the display station 46 are electrically connected to the communication channel that connects the user station 34 to the AMB 32. The display station 46 has a conventional screen 48 that presents images received from video signals of other user stations 35-37 in a manner described in more detail hereinafter. If the user station includes a television and set-top-box, the user 42 can control the display of the screen 48 with a remote control device 49. If the user station has a PC or workstation, the user can control the video display with a mouse.

Although the user station 34 is shown as having one user 42, one camera 44 and one display station 46, it is possible for other user stations 35-37 to have more than one user and/or more than one camera. Moreover, it is possible to use a variety of terminal devices, including stand-alone PCs, network workstations, and even conventional television monitors with the control software (described below) located at a different location. The end user application would run in a set-top-box or a control PC. The specific configuration of the user station 34 shown in FIG. 3 is for illustrative purposes only.

Referring to FIG. 4, the screen 48 of FIG. 3 is shown in more detail. The screen 48 includes a pop-up window 52 showing other participants 54-58 of the video conference. The separate video images from each of the participants 54-58 could be provided to the AMB 32 by separate video signals from other ones of the user stations 35-37. Alternatively, it is possible for some of the participants 54-56 to be in the same room and hence captured by a single video image signal. This would occur if the participants 54-56 are in fact sitting together at a single user station in the manner shown in the window 52. However, it is also possible that each of the participants 54-56 is captured by a separate video camera. As will be discussed in more detail hereinafter, the AMB 32 can combine the images from the various participants 54-58 in a manner shown in the pop-up window 52 to present the user with a single visual display of the participants of the teleconference, thus creating the illusion that the participants are sitting together at the teleconference.

Referring to FIG. 5, a schematic diagram illustrates the overall hardware architecture of the AMB 32. The AMB 32 includes network interfaces 72, 78 for handling incoming and outgoing signals from the user stations 34-37. A demultiplexer 73 separates the incoming signals into data, audio, video, and control signals, respectively, and routes the signals to respective data, audio and video bridges, and a control unit 76. The control unit 76 controls the functions of each of the data, audio and video bridges based on control signals and instructions received from the user stations 34-37. A multiplexer unit 77 multiplexes the outgoing signals from each of the bridges and the control unit 76 and sends them through the network interface 78 back to the user stations 34-37.

Referring to FIG. 6, a schematic diagram illustrates the video portion (AVB) 32a of the AMB 32. The AVB 32a receives control signals C1, C2, . . . CN from each of the N users. The AVB 32a also receives video input signals VIN1, VIN2, . . . VINK from each of the K cameras located at the user stations 34-37. Note that, as discussed above, the number of cameras does not necessarily equal the number of users. The AVB 32a outputs video signals VOUT1, VOUT2, . . . VOUTN to the N users. In a manner discussed in more detail hereinafter, each of the video output signals is controlled by the control inputs from each of the users. For example, the video output signal VOUT1 could represent the video image shown in the pop-up window 52 of FIG. 4. The user viewing the pop-up window 52 can control the contents and presentation of the video signal VOUT1 by providing control signals C1 to the AVB 32a, in a manner discussed in more detail hereinafter.

The video input signals from the cameras are provided to the video interface and normalization unit 72a. The video interface unit 72a handles, in a conventional manner, the various communication formats provided by the connections between the AMB 32 and the user stations 34-37. The unit 72a also normalizes the color components of the input video signals so that each picture element ("pel" or "pixel") for each of the video input signals has comparable red, green and blue components. The output signals of the video interface and normalization unit 72a are normalized input video signals.

A video composing unit (VCU) 74 receives the normalized input video signals from the cameras and combines the signals. Also input to the VCU 74 are control signals provided by a control unit 76 which processes the user control signals C1, C2 . . . CN, to control the contents and presentation of the output of the VCU 74. Operation of the VCU 74 and the control unit 76 is described in more detail hereinafter. The output of the VCU 74 is a plurality of normalized video signals, each of which contains a video image similar to the one shown in the pop-up window 52 of FIG. 4.

The video interface and denormalization unit 78a receives the outputs from the VCU 74 and provides output signals, VOUT1, VOUT2, . . . VOUTN, to each of the N users. The video interface and denormalization unit 78a denormalizes the video signals it receives to provide an appropriate video output format according to each user's desires.

Referring to FIG. 7, a schematic diagram illustrates the VCU 74 in detail. In order to simplify the discussion of FIG. 7, the control inputs and control circuitry of the VCU 74 are not shown in the schematic of FIG. 7.

The VCU 74 is comprised of a plurality of video composing chains (VCCs) 92-94. There is one VCC for each output: VOUT1, VOUT2, . . . VOUTN. That is, for a system to support N users, the VCU 74 must have at least N VCCs 92-94.

The VCCs 92-94 are comprised of a plurality of video composing module (VCM) units 96-107. The VCC 92 includes the VCMs 96-99, the VCC 93 includes the VCMs 100-103, and the VCC 94 comprises the VCMs 104-107.

Each of the VCMs 96-107 is identical to each of the other VCMs 96-107. Each of the VCMs 96-107 has an A input and a B input, each of which receives a separate video signal. Each of the VCMs 96-107 superimposes the video signal from the B input onto the video signal of the A input, in a manner described in more detail hereinafter. The output is the result of superimposing the B signal on the A signal.

The inputs to the VCCs 92-94 are provided by switches 112-114, respectively. The inputs to the switches are the video input signals from the cameras VIN1, VIN2, . . . VINK. Control signals (not shown in FIG. 7) operate the switches 112-114 so as to provide particular ones of the video input signals to particular inputs of the VCMs 96-107 of the VCCs 92-94. The control signals to the switches 112-114 vary according to the control inputs provided by the users. For example, if the user that is receiving the VOUT1 signal desires to see a particular subset of the video input signals, the user provides the appropriate control signals to the AVB 32a. Control logic (not shown in FIG. 7) actuates the switch 112 so that the switch provides the requested video input signals to the VCMs 96-99 of the VCC 92 that supplies VOUT1.

For the VCU 74 shown in FIG. 7, each of the VCCs 92-94 is illustrated as having four VCMs (96-99, 100-103, and 104-107, respectively). Accordingly, each of the VCCs 92-94 is capable of combining five separate video images. This can be illustrated by examining the VCC 92, wherein the VCM 96 receives two of the video inputs and combines those inputs to provide an output. The output of the VCM 96 is provided as the A input to the VCM 97, which receives another video signal at the B input thereof and combines that signal with the A input. The output of the VCM 97 is provided as the A input to the VCM 98, which receives a new video signal at the B input thereof, combines those signals, and provides an output to the A input of the VCM 99. The VCM 99 receives the combined signal at the A input thereof and a new video signal at the B input thereof, combines the signals, and provides the output VOUT1. It is possible to construct video composing chains having any number of video composing modules other than that shown in FIG. 7. The maximum number of images that can be superimposed is always 1 greater than the number of VCMs in the VCC.
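
The chaining just described can be sketched in software as a left fold over the input images. The following Python fragment is only an illustrative model, not the disclosed hardware: the names vcm and compose_chain, the representation of a pel as an (rgb, priority) tuple, and the rule that the A input wins a tie are all assumptions made for the example, and the per-VCM priority processing of the B input is omitted.

```python
from functools import reduce


def vcm(a_raster, b_raster):
    """Model one VCM: at each pel position keep the higher-priority pel.

    Each pel is an (rgb, priority) tuple. Ties go to the A input, which is
    an assumption; the text does not say how ties are resolved.
    """
    return [b if b[1] > a[1] else a for a, b in zip(a_raster, b_raster)]


def compose_chain(rasters, num_vcms=4):
    """Model one VCC: a chain of num_vcms VCMs combines up to num_vcms + 1 images."""
    if not 1 <= len(rasters) <= num_vcms + 1:
        raise ValueError("a chain of %d VCMs combines at most %d images"
                         % (num_vcms, num_vcms + 1))
    # The first VCM combines rasters[0] (A) with rasters[1] (B); every later
    # VCM takes the running result as A and the next raster as B.
    return reduce(vcm, rasters)


# Five two-pel images, the maximum for a four-VCM chain.
images = [
    [("a", 1), ("a", 9)],
    [("b", 2), ("b", 0)],
    [("c", 3), ("c", 0)],
    [("d", 4), ("d", 0)],
    [("e", 5), ("e", 0)],
]
vout1 = compose_chain(images)
```

In this sketch the first output pel comes from image "e" and the second from image "a", since those carry the highest priority at their respective positions.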

Although FIG. 7 shows the VCCs 92-94 each with four VCMs 96-99, 100-103, 104-107, respectively, hardwired together, it is possible to configure the VCU 74 so that the connections between the VCMs are themselves switched. In that way, it would be possible for a user to request a particular number of VCMs from a pool of available VCMs which would then be wired together by the switches in a customized VCC. The particular switch arrangements used can be conventional, and the implementation of such switch arrangements is within the ordinary skill in the art.

The video composing chains described in FIG. 7 are shown as residing in a central network bridge. It should be understood that these parts of the invention might also be used within some user stations or similar terminal equipment for some of the same purposes as described herein, and therefore that these parts of the invention are not limited to use in a central facility.

Referring to FIG. 8, a schematic diagram illustrates in detail one of the VCMs 96 of FIG. 7. As discussed above, the VCMs 96-107 of FIG. 7 are essentially identical and differ only in terms of the inputs provided thereto.

The VCM 96 merges the video data from the A inputs with the video data from the B inputs. For each pel position in the output raster, one pel of data from either the A input or the B input is transferred to the output. The choice of which of the inputs is transferred to the output depends upon the priority assigned to each pel in each of the A and B input video streams.

For the A inputs of the VCM 96 shown in FIG. 8, each pel of the video is shown as having 24-bits each (8-bits each for red, green and blue) and as having 8-bits for the priority. Accordingly, each pel of the A input is represented as a 32-bit value. Similarly, for the B inputs, each pel is represented by a 24-bit video signal (8-bits each for red, green and blue) and an 8-bit priority. Accordingly, just as with the A inputs, each pel of the B inputs is represented by a 32-bit value.
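
As a small illustration of the 32-bit pel representation, the Python sketch below packs and unpacks the four 8-bit fields. The field order within the word (priority in the most significant byte, then red, green, blue) and the function names pack_pel and unpack_pel are assumptions chosen for the example; the description above does not fix a bit layout.

```python
def pack_pel(red, green, blue, priority):
    """Pack 8-bit red, green, blue and priority fields into one 32-bit value."""
    for value in (red, green, blue, priority):
        if not 0 <= value <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    # Assumed layout, most to least significant byte: priority, red, green, blue.
    return (priority << 24) | (red << 16) | (green << 8) | blue


def unpack_pel(word):
    """Split a 32-bit pel value back into (red, green, blue, priority)."""
    return (
        (word >> 16) & 0xFF,  # red
        (word >> 8) & 0xFF,   # green
        word & 0xFF,          # blue
        (word >> 24) & 0xFF,  # priority
    )


word = pack_pel(0x12, 0x34, 0x56, 0x78)
```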

The bit values discussed herein and shown in the drawings are used for purposes of illustration only and should not be taken as limiting the scope of the invention. All of the disclosed bit values for the inputs and outputs to the VCM 96 can be varied without changing the invention. For example, the video inputs and outputs could be 18- or 30-bits, the priority/key inputs and outputs could be 6- or 10-bits, and so forth.

The A video inputs are provided directly to a priority driven multiplexer 122. The B video inputs, on the other hand, are first provided to a 512K × 32-bit frame memory 124 which stores the video data and the priority data for the B input video signal. Between the B priority input and the frame memory is a flexible system of priority masking and generation, described in detail below, which alters the original priority value of the B input. The frame memory 124 can be used to synchronize, offset, mirror, and scale the B video input with respect to the A video input.

The output of the frame memory 124 is provided to the priority driven multiplexer 122. Accordingly, the priority driven multiplexer 122 compares the priority for each pel of the A input with the priority for the corresponding pel of the B input from the frame memory 124 and outputs the pel having the higher priority associated therewith. The priority driven multiplexer 122 also outputs the priority of whichever pel, from the A input or the B input, has the higher priority.
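
The per-pel selection performed by the priority driven multiplexer 122 can be modeled in a few lines of Python. This is a sketch under stated assumptions: pels are modeled as (rgb, priority) tuples, the names priority_mux and mux_raster are hypothetical, and the A input is assumed to win when the two priorities are equal, a case the description does not address.

```python
def priority_mux(a_pel, b_pel):
    """Return the pel, together with its priority, that has the higher priority.

    Each pel is an (rgb, priority) tuple. On equal priority the A pel is
    returned; that tie rule is an assumption made for this sketch.
    """
    return b_pel if b_pel[1] > a_pel[1] else a_pel


def mux_raster(a_raster, b_raster):
    """Apply the multiplexer at every pel position of two aligned rasters."""
    return [priority_mux(a, b) for a, b in zip(a_raster, b_raster)]


a_line = [((10, 10, 10), 5), ((10, 10, 10), 5), ((10, 10, 10), 5)]
b_line = [((200, 0, 0), 1), ((200, 0, 0), 5), ((200, 0, 0), 9)]
out_line = mux_raster(a_line, b_line)
```

Because the output carries the winning priority along with the winning color, the result can be fed directly to the A input of a following stage.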

An input address generator 126 receives the H, V, and clock signals for the B video input. The input address generator 126 stores the 24-bit video portion of each pel of the B input in the frame memory 124 without making any significant modification to the B video input data. That is, the input address generator 126 stores the 24-bit video portion of each pel for the B video input without providing any offset, resizing, or any other image modifications to the B video input. Accordingly, the video portion of the B inputs stored in the frame memory 124 is essentially identical to that provided to the VCM 96.

The 8-bit priority portion of the B video inputs is provided to a B priority mask and selector 128. A priority generator 130 also provides inputs to the B priority mask and selector 128. Operation of the priority generator 130 is described in more detail hereinafter. The B priority mask and selector 128 selects certain bits from the output of the priority generator 130 and the input priority value and provides that output to a priority look-up table (P-LUT) 132. The P-LUT 132 is a 256 × 8 RAM (or any other compatible size) that maps the 8-bit input thereto into an 8-bit priority value which is stored, on a per pel basis, in the frame memory 124. Values for the priority look-up table 132 are provided to the VCM 96 in the manner discussed in more detail hereinafter.

The sizes of the P-LUT 132 and frame memory 124 can be varied for different maximum video raster formats, such as HDTV, and for different numbers of priority stacking levels, such as 256 (P-LUT = 256 × 8) or 64 (P-LUT = 64 × 6), without changing the invention.

The priority generator 130 generates a priority value for each of the pels of the B video input stored in the frame memory 124. One or more pel value keyer sections 134 provide a priority value for each of the pels according to the value of the 24-bit video signal. That is, the pel value keyer 134 alters the priority of each pel according to the input color and brightness of that pel.

The pel value keyer 134 shown has 3 sections labeled A, B, and C. Each section outputs 1-bit of the priority wherein the bit output equals a digital "1" if a pel falls into the specified color range and equals a digital "0" if the pel falls outside of the specified color range. For example, the pel value keyer-A has 6 values T1-T6 which are loaded with constant values in a manner described in more detail hereinafter. The pel value keyer A examines each pel from the input B video image and determines if the red portion of the pel is between the values of T1 and T2, the green portion is between the values of T3 and T4, and the blue value is between the values of T5 and T6. If all of these conditions hold, that is, if the pel has red, green and blue values that are all between T1 and T2, T3 and T4, and T5 and T6, respectively, then the pel value keyer-A outputs a "1". Otherwise, the pel value keyer-A outputs a "0". The operations of the pel value keyer-B and the pel value keyer-C are similar. In that way, each of the pel value keyers of the pel value keyer unit 134 can separately and independently provide a bit of the priority according to the color value of the input B video pel.
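
The comparison performed by one pel value keyer part can be sketched as follows. The function name pel_value_key is hypothetical, and treating the range limits as inclusive is an assumption, since the description says only that each component must be "between" its two thresholds.

```python
def pel_value_key(rgb, thresholds):
    """Return 1 if every color component lies within its threshold pair, else 0.

    rgb is (red, green, blue); thresholds is (T1, T2, T3, T4, T5, T6), where
    T1-T2 bound red, T3-T4 bound green and T5-T6 bound blue. Bounds are
    treated as inclusive, which is an assumption of this sketch.
    """
    red, green, blue = rgb
    t1, t2, t3, t4, t5, t6 = thresholds
    in_range = t1 <= red <= t2 and t3 <= green <= t4 and t5 <= blue <= t6
    return 1 if in_range else 0


# Example thresholds selecting strongly blue pels (values are illustrative).
blue_key = (0, 60, 0, 60, 180, 255)
bit_for_blue = pel_value_key((20, 30, 220), blue_key)
bit_for_skin = pel_value_key((200, 150, 120), blue_key)
```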

The pel value keyer 134 can be implemented in a conventional manner using digital comparator hardware. For some purposes it may be more useful for the three video channels to carry information in formats other than RGB (red, green, blue), such as conventional YIQ or YUV formats. Such alternate encodings are also usable by the pel value keyer and do not alter its operation other than by altering the color space and the required thresholds.

The priority generator 130 also contains one or more window generation sections 136. Each window generation section 136 consists of a window generation A part, a window generation B part, and a window generation C part. Each window generation part processes the H, V, and clock (CLK) portions of the signal from the B video input and outputs a digital "1" bit or a digital "0" bit depending on the horizontal and vertical location of each of the pels of the B video input. For example, the window generation A part can have 4 separate values for H1, H2, V1 and V2. If the input value indicated by the H input for the B input video signal is between H1 and H2, and the input value indicated by the V input is between V1 and V2, then the window generation A part of the window generation section 136 outputs a digital "1" bit. Otherwise, the window generation A part outputs a digital "0" bit. The window generation A part, window generation B part, and window generation C part operate independently of each other. The window generation section 136 can be implemented in a conventional manner using digital comparator hardware.
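
One window generation part can be modeled the same way as a pair of range comparisons on the pel position. In this sketch the function name window_bit is hypothetical, h and v stand for the horizontal and vertical pel counts derived from the H, V and CLK signals, and the limits are again assumed to be inclusive.

```python
def window_bit(h, v, h1, h2, v1, v2):
    """Return 1 if pel position (h, v) lies inside the window, else 0.

    The window spans h1..h2 horizontally and v1..v2 vertically. Inclusive
    limits are an assumption; the text says only "between".
    """
    return 1 if (h1 <= h <= h2 and v1 <= v <= v2) else 0


# A window covering columns 100-299 and rows 50-149 (illustrative values).
inside = window_bit(150, 75, 100, 299, 50, 149)
left_of_window = window_bit(99, 75, 100, 299, 50, 149)
below_window = window_bit(150, 150, 100, 299, 50, 149)
```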

Several window generators 136 and pel-value keyers 134, each producing 1-bit, can in combination define distinct priorities for several objects of various colors in different parts of the picture. The individual output bits are treated as an 8-bit word. This word is defined as a numerical value and used to address the P-LUT 132. Depending upon the contents of the memory of the P-LUT 132 any input can be transformed into any numerical priority output at the full video pel clock rate. This transformation is necessary because the multiplexer 122 passes only the highest priority input at each pel position.

The priority generator 130 needs only to assign different numeric priority values to different windows or objects within the B input video raster. The P-LUT 132 then allows the customer to control the ordering of those priorities. For example, when the customer makes a request by a graphical interaction at the user station 34-37 to raise a particular object or window in his composed scene, the human interface program and hardware control programs convert that request into a reassignment of the numerical priorities attached to that area of the image, raising the priority of the requested object, or lowering the priorities of occluding objects.
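
The role of the P-LUT 132 in reordering priorities can be illustrated with a small software model. Everything named here is hypothetical: bits_to_address assembles the individual 1-bit outputs into a table address (least significant bit first, an assumed ordering), and swap_priorities shows just one possible way a control program might raise an object, by exchanging the table entries of two objects.

```python
def make_identity_lut(size=256):
    """Build a look-up table in which every address maps to itself."""
    return list(range(size))


def bits_to_address(bits):
    """Treat a sequence of 1-bit values as a binary word, bits[0] being the LSB."""
    address = 0
    for position, bit in enumerate(bits):
        address |= (bit & 1) << position
    return address


def swap_priorities(lut, address_a, address_b):
    """Return a new table with the priorities of two addresses exchanged."""
    new_lut = list(lut)
    new_lut[address_a], new_lut[address_b] = lut[address_b], lut[address_a]
    return new_lut


lut = make_identity_lut()
# Object X produces address 1 and object Y produces address 2, so with the
# identity table Y (priority 2) occludes X (priority 1).
x_address = bits_to_address([1, 0, 0, 0, 0, 0, 0, 0])
y_address = bits_to_address([0, 1, 0, 0, 0, 0, 0, 0])
raised = swap_priorities(lut, x_address, y_address)
```

After the swap, X maps to priority 2 and Y to priority 1, so X now appears in front of Y without any change to the keyer or window settings.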

The priority generator 130 is illustrated in FIG. 8 as having a pel value keyer section 134 with three independent pel value keyer parts and a window generation section 136 with three separate and independent window generation parts. The number of window generators and pel value keyers can be varied without changing the invention. Further, the number of separate parts used for each of the sections 134, 136 is a design choice based on a variety of functional factors including the number of bits used for the priority, the number of desired independent parts, and other criteria familiar to one of ordinary skill in the art. Accordingly, the invention can be practiced with one or more pel value keyer sections 134 having a number of parts other than three and one or more window generation sections 136 having a number of independent window generation parts other than three.

The 6-bit output of the priority generator 130 is provided to the priority mask and selector 128, which is also provided with the 8-bit input priority signal from the B video input. Conventional control registers (not shown) determine which 8 of the 14 input bits provided to the priority mask and selector 128 will be provided to the priority look-up table 132. Although the output of the priority mask and selector 128 is shown as an 8-bit output, and similarly the input to the priority look-up table 132 is shown as an 8-bit input, the invention can be practiced with any number of bits output for the priority mask and selector 128 and input for the priority look-up table 132. The number of bits selected is a design choice based on a variety of functional factors known to one of ordinary skill in the art, including the number of desired distinct priorities and the amount of priority control desired.
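
The selection of 8 bits out of the 14 available can be sketched as a simple bit-routing function. The arrangement below is an assumption made for illustration only: the 8 input priority bits are placed at positions 0-7 of a 14-bit word, the 6 generator bits at positions 8-13, and the control register contents are modeled as a list naming the source position for each output bit. The name select_bits is hypothetical.

```python
def select_bits(input_priority, generator_bits, selected):
    """Route 8 chosen bits of the 14 input bits to the 8-bit output.

    input_priority: 8-bit priority from the B video input (bits 0-7).
    generator_bits: 6-bit output of the priority generator (bits 8-13).
    selected: list of 8 source positions (0-13); selected[i] feeds output bit i.
    This bit numbering is an assumption of the sketch.
    """
    if len(selected) != 8 or any(not 0 <= s <= 13 for s in selected):
        raise ValueError("selected must name 8 source positions in 0..13")
    word14 = ((generator_bits & 0x3F) << 8) | (input_priority & 0xFF)
    output = 0
    for out_position, source_position in enumerate(selected):
        output |= ((word14 >> source_position) & 1) << out_position
    return output


# Pass the input priority through unchanged.
passthrough = select_bits(0b10100101, 0b110011, [0, 1, 2, 3, 4, 5, 6, 7])
# Use all six generator bits plus the two low bits of the input priority.
mixed = select_bits(0b10100101, 0b110011, [8, 9, 10, 11, 12, 13, 0, 1])
```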

As discussed above, the priority look-up table 132 is a 256 × 8 RAM which maps the 8-bits provided by the priority mask and selector 128 into an 8-bit value which is provided to the frame memory 124. Accordingly, the priority associated with each pel stored in the frame memory 124 is provided by the priority look-up table 132.

The priority mask and selector 128, priority generator 130 and priority look-up table 132 operate together to provide the priority for each pel of the B video input. As discussed in more detail hereinafter, the priority of the B video inputs can thus be altered in order to provide a variety of effects. For example, if the B video input is provided in a window that has been clipped, the window generation section 136 can be set accordingly so that pels that are outside the clipped window are given a low priority while pels that are inside the clipped window are given a relatively high priority. Similarly, the pel value keyer section 134 can be used to mask out one or more colors so that, for example, a video image of a teleconference participant showing the participant in front of a blue background can be provided as the B video input and the pel value keyer section 134 can be set to mask out the blue background by providing a relatively low priority to pels having a color corresponding to the blue background and a relatively high priority to other pels of the B video input image.
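
The blue-background example can be traced end to end in a short software model. This is a simplified sketch, not the disclosed circuit: it uses a single keyer bit as the table address, a two-entry table in place of the 256-entry P-LUT, and illustrative threshold and priority values. The names blue_key_bit, b_pel_priority and compose_pel are hypothetical.

```python
# Illustrative thresholds (T1..T6) describing the blue background color.
BLUE_THRESHOLDS = (0, 60, 0, 60, 180, 255)

# Simplified table: address 0 (not background) gets a high priority,
# address 1 (background color detected) gets the lowest priority.
KEY_LUT = [200, 0]


def blue_key_bit(rgb):
    """Return 1 when the pel color falls inside the blue background range."""
    red, green, blue = rgb
    t1, t2, t3, t4, t5, t6 = BLUE_THRESHOLDS
    return 1 if (t1 <= red <= t2 and t3 <= green <= t4 and t5 <= blue <= t6) else 0


def b_pel_priority(rgb):
    """Derive the stored priority of a B pel from its color via the table."""
    return KEY_LUT[blue_key_bit(rgb)]


def compose_pel(a_pel, b_rgb):
    """Combine an A pel (rgb, priority) with a keyed B pel; A wins ties (assumed)."""
    b_priority = b_pel_priority(b_rgb)
    return (b_rgb, b_priority) if b_priority > a_pel[1] else a_pel


background_scene = ((10, 10, 10), 100)   # A input pel with priority 100
participant = compose_pel(background_scene, (200, 150, 120))
keyed_out = compose_pel(background_scene, (20, 30, 220))
```

The participant's pel replaces the A pel, while the blue pel is given priority 0 and the A pel shows through. Note that the keyer outputs a "1" for the background, so the table is what inverts that bit into a low priority.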

A read address generator 140 reads the B input data from the frame memory 124 and provides the data to the priority driven multiplexer 122. In order to compensate for different video standards being used for the A input and the B input, the read address generator 140 reads the data at a rate corresponding to the rate of data provided via the A video input. That is, the read address generator 140 synchronizes the inputs to the priority driven multiplexer 122 so that the pels from the frame memory 124 arrive simultaneously with corresponding pels from the A video input to the priority driven multiplexer 122.

The read address generator 140 also handles offsets between the A input and B input and any scaling and/or mirroring of the B video input. The requested amount of X and Y offset, amount of magnification or reduction, and any flipping are all provided to the VCM 96 in a manner described in more detail hereinafter.

The read address generator 140 handles offsets by providing the pel data from the frame memory 124 at a specified vertical and horizontal offset from the data from the A video input. For example, if the B video image is to be shifted horizontally 5 pels from the A video input, then the read address generator 140 would wait 5 pels after the left edge of the A video input to provide the left edge of the B video input. Magnification/reduction of the B video image and flipping the B video image are handled in a similar manner. Note that providing an offset to a video image, magnifying or reducing a video image, and flipping a video image are all known to one of ordinary skill in the art and will not be described in more detail herein.
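
Offset and mirroring can be viewed as an address calculation: for each pel position of the A raster, determine which stored B pel, if any, should be read. The sketch below is one possible formulation, assuming a row-major frame memory and a hypothetical function name b_read_address; scaling is omitted, and the actual address generator may be organized differently.

```python
def b_read_address(x, y, offset_x, offset_y, width, height, mirror=False):
    """Return the frame-memory address of the B pel shown at A position (x, y).

    The B image is width x height pels, stored row by row (an assumption),
    and is displayed offset_x pels to the right of and offset_y pels below
    the A raster origin. Returns None where the B image does not cover (x, y).
    """
    b_x = x - offset_x
    b_y = y - offset_y
    if not (0 <= b_x < width and 0 <= b_y < height):
        return None
    if mirror:
        # Horizontal flip: read each row from its right-hand end.
        b_x = width - 1 - b_x
    return b_y * width + b_x


# B image of 10 x 4 pels shifted 5 pels to the right of the A raster.
first_b_pel = b_read_address(5, 0, 5, 0, 10, 4)
before_left_edge = b_read_address(4, 0, 5, 0, 10, 4)
second_row = b_read_address(6, 1, 5, 0, 10, 4)
mirrored = b_read_address(5, 0, 5, 0, 10, 4, mirror=True)
```

With a horizontal shift of 5 pels, the first B pel is read when the A raster reaches column 5, matching the example in the text.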

A computer control interface 142 connects the VCM 96 to an external control device such as the control unit 76 shown in FIGS. 5 and 6. The computer control interface 142 has an address input and a data input. The address input is shown as a 16-bit value and the data input is shown in FIG. 8 as an 8-bit value. However, it will be appreciated by one of ordinary skill in the art that the number of bits for the address and the data inputs can be modified and are a design selection that depends on a variety of functional factors familiar to one of ordinary skill in the art.

The address input is used to select different VCMs and various registers within each VCM 96 and to load the priority look-up table 132. Different address inputs load different ones of these elements. The data input is the data that is provided to the various registers and the look-up table 132. Accordingly, a user wishing to provide values to the priority look-up table 132 would simply provide the appropriate address for each of the 256 locations in the priority look-up table 132 illustrated herein and would provide the data that is to be loaded into the look-up table 132. Similarly, the pel value keyer section 134 and/or the window generation section 136 can be loaded via the computer control interface 142 by providing the appropriate address for eac