WikiPatents - Community Patent Review
Video database indexing and method of presenting video database index to a user    
United States Patent 5485611
Link to this page: http://www.wikipatents.com/5485611.html
Inventor(s): Astle; Brian (Phoenix, AZ)
Abstract: A computer-implemented method for generating a video database index for indexing a video database comprising a plurality of video frames, the video database index comprising a plurality of index frames, wherein each video frame within the video database has a unique location within the video database. According to a preferred embodiment of the invention, the video frames of the video database are transmitted to a processor. A processor generates the index frames of the video database index in accordance with the amount of change occurring in images depicted by the video frames of the video database. Each of the index frames represents a unique video sequence of a plurality of video sequences that constitute the video database. Each video sequence comprises a sequence of video frames of the video database. Also, each video sequence has a unique location within the video database.
   














Inventor     Astle; Brian (Phoenix, AZ)
Owner/Assignee     Intel Corporation (Santa Clara, CA)
Publication Date     January 16, 1996
Application Number     08/366,807
Filing Date     December 30, 1994
Examiner     Kulik; Paul V.
Attorney/Law Firm     Murray; William H. Kinsella; N. Stephen

U.S. References

Patent No.   Inventor   U.S. Class   Date
5440401      Parulski   386/124      Aug. 1995
5365384      Choi       360/72.2     Nov. 1994
5287230      Kamide     360/60       Feb. 1994
5164865      Shaw       360/72.2     Nov. 1992
5157511      Kawai      386/68       Oct. 1992
5083860      Miyatake                Jan. 1992
4649380      Penna      345/670      Mar. 1987
Claims
 


What is claimed is:

1. A computer-implemented method for generating a video database index for indexing a video database comprising a plurality of video frames, the video database index comprising a plurality of index frames, wherein each video frame within the video database has a unique location within the video database, the method comprising the steps of:

(a) transmitting the video frames of the video database to a processor; and

(b) generating with the processor the index frames of the video database index in accordance with the amount of change occurring in images depicted by the video frames of the video database;

wherein:

each of the index frames represents a unique video sequence of a plurality of video sequences that constitute the video database;

each video sequence comprises a sequence of video frames of the video database; and

each video sequence has a unique location within the video database.

2. The method of claim 1, wherein each index frame of the video database index further comprises information corresponding to the location of the respective video sequence represented by each index frame.

3. The method of claim 1, wherein some video sequences of the plurality of video sequences may comprise a plurality of video subsequences.

4. The method of claim 1, wherein each video sequence of the plurality of video sequences represents a scene, wherein the scene represented by a given video sequence of the sequential plurality of video sequences differs by a predetermined threshold amount from scenes represented by video sequences adjacent to the given video sequence.

5. The method of claim 1, wherein step (b) comprises the steps of:

(1) detecting a scene cut between sequences of video frames of the video database;

(2) selecting a first reference frame of the video database index after a detected scene cut;

(3) determining the difference between subsequent video frames and the first reference frame;

(4) selecting a second reference frame when a predetermined threshold difference is determined between the second reference frame and the first reference frame and defining a defined video sequence comprising the plurality of video frames from the first reference frame to the second reference frame;

(5) generating an index frame representative of the defined video sequence; and

(6) selecting a subsequent first reference frame to generate index frames for subsequent video sequences.

6. The method of claim 5, wherein:

step (b)(1) comprises the step of detecting a second predetermined threshold difference or scene fade between sequences of video frames of the video database; and

step (b)(2) further comprises the step of detecting transition frames and excluding the transition frames from being selected as a first reference frame.

7. The method of claim 6, wherein step (b)(3) comprises the step of determining the motion-compensated difference between subsequent video frames and the first reference frame.

8. The method of claim 6, wherein step (b)(3) comprises the step of determining the difference between subsequent video frames and the first reference frame, wherein motion, panning, and zooms are taken into account.

9. The method of claim 6, wherein the index frame comprises one of the video frames of the video sequence comprising the plurality of video frames from the first reference frame to the second reference frame.

10. The method of claim 9, wherein the index frame comprises a difference-centered video frame of the video sequence comprising the plurality of video frames from the first reference frame to the second reference frame.

11. The method of claim 1, further comprising the step of:

(c) detecting scene cuts between sequences of video frames of the video database; and

(d) defining a video shot as comprising a plurality of sequential video frames between two consecutive scene cuts;

wherein each video sequence of the plurality of video sequences comprises a video shot.

12. The method of claim 11, wherein each index frame of the video database index comprises a difference-centered video frame of a unique video shot of the video database.

13. The method of claim 1, further comprising the step of:

(c) storing each index frame of the video database index in a mass storage device.

14. The method of claim 13, further comprising the step of:

(d) displaying a predetermined number of index frames of the video database index in parallel on a monitor; and

(e) updating at a predetermined frequency the predetermined number of index frames displayed in parallel on the monitor.

15. The method of claim 14, wherein:

the predetermined number and predetermined frequency may be varied by a user; and

each index frame of the video database index further comprises information corresponding to the location of the respective video sequence represented by each index frame; and

further comprising the step of:

(f) displaying the location of a video sequence corresponding to an index frame selected by the user.

16. The method of claim 1, further comprising the step of:

(c) displaying a predetermined number of index frames of the video database index in parallel on a monitor; and

(d) updating at a predetermined frequency the predetermined number of index frames displayed in parallel on the monitor.

17. The method of claim 16, wherein:

the predetermined number and predetermined frequency may be varied by a user; and

each index frame of the video database index further comprises information corresponding to the location of the respective video sequence represented by each index frame; and

further comprising the step of:

(e) displaying the location of a video sequence corresponding to an index frame selected by the user.

18. A method of presenting a video database index to a user, the video database index comprising a plurality of index frames and corresponding to a video database, wherein each index frame of the video database index represents a video sequence of a plurality of video sequences of the video database, the method comprising the steps of:

(a) displaying a predetermined number of index frames of the video database index in parallel on a monitor; and

(b) updating at a predetermined frequency the predetermined number of index frames displayed in parallel on the monitor.

19. The method of claim 18, wherein:

the predetermined number and predetermined frequency may be varied by a user; and

further comprising the step of:

(c) displaying on the monitor the location of a video sequence corresponding to an index frame selected by the user.

20. An apparatus for generating a video database index for indexing a video database comprising a plurality of video frames, the video database index comprising a plurality of index frames, wherein each video frame within the video database has a unique location within the video database, the apparatus comprising:

(a) a processor and means for transmitting the video frames of the video database to the processor; and

(b) means for generating with the processor the index frames of the video database index in accordance with the amount of change occurring in images depicted by the video frames of the video database;

wherein:

each of the index frames represents a unique video sequence of a plurality of video sequences that constitute the video database;

each video sequence comprises a sequence of video frames of the video database; and

each video sequence has a unique location within the video database.

21. The apparatus of claim 20, wherein each index frame of the video database index further comprises information corresponding to the location of the respective video sequence represented by each index frame.

22. The apparatus of claim 20, wherein each video sequence of the plurality of video sequences represents a scene, wherein the scene represented by a given video sequence of the sequential plurality of video sequences differs by a predetermined threshold amount from scenes represented by video sequences adjacent to the given video sequence.

23. The apparatus of claim 20, wherein means (b) comprises:

(1) means for detecting a scene cut between sequences of video frames of the video database;

(2) means for selecting a first reference frame of the video database index after a detected scene cut;

(3) means for determining the difference between subsequent video frames and the first reference frame;

(4) means for selecting a second reference frame when a predetermined threshold difference is determined between the second reference frame and the first reference frame and defining a defined video sequence comprising the plurality of video frames from the first reference frame to the second reference frame;

(5) means for generating an index frame representative of the defined video sequence; and

(6) means for selecting a subsequent first reference frame to generate index frames for subsequent video sequences.

24. The apparatus of claim 23, wherein:

means (b)(1) comprises means for detecting a second predetermined threshold difference or scene fade between sequences of video frames of the video database; and

means (b)(2) further comprises means for detecting transition frames and excluding the transition frames from being selected as a first reference frame.

25. The apparatus of claim 24, wherein means (b)(3) comprises means for determining the motion-compensated difference between subsequent video frames and the first reference frame.

26. The apparatus of claim 24, wherein means (b)(3) comprises means for determining the difference between subsequent video frames and the first reference frame, wherein motion, panning, and zooms are taken into account.

27. The apparatus of claim 24, wherein the index frame comprises one of the video frames of the video sequence comprising the plurality of video frames from the first reference frame to the second reference frame.

28. The apparatus of claim 27, wherein the index frame comprises a difference-centered video frame of the video sequence comprising the plurality of video frames from the first reference frame to the second reference frame.

29. The apparatus of claim 20, further comprising:

(c) means for detecting scene cuts between sequences of video frames of the video database; and

(d) means for defining a video shot as comprising a plurality of sequential video frames between two consecutive scene cuts;

wherein each video sequence of the plurality of video sequences comprises a video shot.

30. The apparatus of claim 29, wherein each index frame of the video database index comprises a difference-centered video frame of a unique video shot of the video database.

31. The apparatus of claim 20, further comprising:

(c) a mass storage device for storing each index frame of the video database index.

32. The apparatus of claim 31, further comprising:

(d) a monitor and means for displaying a predetermined number of index frames of the video database index in parallel on the monitor; and

(e) means for updating at a predetermined frequency the predetermined number of index frames displayed in parallel on the monitor.

33. The apparatus of claim 32, wherein:

the predetermined number and predetermined frequency may be varied by a user; and

each index frame of the video database index further comprises information corresponding to the location of the respective video sequence represented by each index frame; and

further comprising:

(f) means for displaying on the monitor the location of a video sequence corresponding to an index frame selected by the user.

34. The apparatus of claim 20, further comprising:

(c) a monitor and means for displaying a predetermined number of index frames of the video database index in parallel on the monitor; and

(d) means for updating at a predetermined frequency the predetermined number of index frames displayed in parallel on the monitor.

35. The apparatus of claim 34, wherein:

the predetermined number and predetermined frequency may be varied by a user; and

each index frame of the video database index further comprises information corresponding to the location of the respective video sequence represented by each index frame; and

further comprising:

(e) means for displaying on the monitor the location of a video sequence corresponding to an index frame selected by the user.
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to video processing and, in particular, to computer-implemented processes and apparatuses for indexing video databases.

2. Description of the Related Art

Video cassettes, also called video tapes, are frequently utilized as storage media to store video data. For example, a videocassette may be utilized to store two hours of home video movies. A typical VHS video cassette may also be utilized to store six hours of video footage. Often consumers store many hours of home video per year as a family video diary. Video data stored in this manner typically consists of sequences of images or video frames that constitute one or more motion pictures, when displayed on a monitor. In addition to video cassettes, video databases may also be stored on other storage media such as CD-ROM or hard disk drives. Also, a video database may be stored in analog or digital format in the storage medium.

As an example of a video database, a consumer's collection of home video movies stored on one or more video cassettes can be considered to constitute a video database of the consumer. As another example, a surveillance camera in a convenience store or bank that records a new video frame every few seconds can also generate a video database that is stored on a storage medium such as a video cassette.

In terminology typically associated with video, a "video shot" is a sequence of video frames that occurs between two scene cuts or other transitions such as fades or cross-fades. Thus, a video shot is a sequence of continuously-filmed or produced video frames generated by a video camera. For example, if a video camera is turned on to film a new event, and switched off one minute later, the video frames recorded on the video cassette during this one-minute time interval constitute a video shot. Such video shots may include pans, tilts, zooms, and other effects. Transitions between video shots may be by way of abrupt scene cuts, or by fades, wipes, dissolves, and the like.

Video databases typically contain a plurality of video shots, and the number of video frames and video shots stored in a given database can be extremely large. It is often difficult to access the particular contents of such video databases, however. It is easy to forget what has been recorded, and even if a user is searching for a known event, image, or video frame, the user often has forgotten the date of the event, the video cassette on which the event is recorded, or the location of the event on a particular video cassette. Because humans view video sequences sequentially and because of the large number of video frames and shots stored in video databases, it is difficult to locate specific events or video frames within a video database by searching manually for the event.

For example, a consumer who has produced several hours of home video movies may desire to locate a particular event, such as a hot-air balloon show previously filmed by the consumer, to show to his friends or to edit for an edited home video collection. The hot-air balloon event may occupy several minutes of footage somewhere on one of a plurality of video cassettes, and may comprise dozens of sequential video shots taken during the filming of the event. To find this scene, the consumer may need to resort to tedious sequential, hunt-and-peck searching of various video cassettes that constitute the video database, until the desired location is found. Even running the video cassette at high speed, such searching may be impractical. Further, there is a limit to how fast a video cassette may be played while the user can still recognize the scenes displayed. As another example, a user of the above-described surveillance video database may need to visually scan the surveillance video in order to find a particular video frame or frames that depict suspicious activity, such as a shoplifting incident.

Users of video databases can sometimes prepare indexes of the video database to make locating particular scenes more efficient. For example, each time a consumer shoots a new video scene or films a new event, he can write down a short description of the event along with a correlation to the appropriate location on the cassette. Alternatively, an index may be prepared by the consumer while viewing the video footage stored within the database. The location on the particular video cassette may be denoted by a time index or a counter index, for example. Thus, the user-prepared index may indicate that the hot-air balloon event appears on video cassette number 12, starting at time 1:17:23 (in hours:minutes:seconds format) from the beginning of the video cassette, and/or at counter number 2351 from the beginning of the tape.

However, preparation of such indexes is inconvenient and may not be sufficiently detailed to allow the consumer to locate a specific scene, image, or event. Also, if an index is not prepared during the filming of the home video, many hours of non-indexed video cassettes can accumulate, making the manual preparation of an index extremely inconvenient and infeasible. Further, some video databases are generated automatically, such as the surveillance example described above, so that the content of the database is not known until it is viewed. In this case an index cannot be prepared other than by viewing the entire video database and recording a description of its contents.

There is thus a need for a method and apparatus for video database indexing that avoids these problems.

It is accordingly an object of this invention to overcome the disadvantages and drawbacks of the known art and to provide a computer-implemented method and apparatus for indexing video databases.

It is also an object of this invention to provide for more efficient locating of particular scenes or images within video databases.

It is a further related object of this invention to allow more efficient scanning of the contents of a video database.

Further objects and advantages of this invention will become apparent from the detailed description of a preferred embodiment which follows.

SUMMARY

The previously mentioned needs and objectives are fulfilled with the present invention. There is provided herein a computer-implemented method for generating a video database index for indexing a video database comprising a plurality of video frames, the video database index comprising a plurality of index frames, wherein each video frame within the video database has a unique location within the video database. According to a preferred embodiment of the invention, the video frames of the video database are transmitted to a processor. A processor generates the index frames of the video database index in accordance with the amount of change occurring in images depicted by the video frames of the video database. Each of the index frames represents a unique video sequence of a plurality of video sequences that constitute the video database. Each video sequence comprises a sequence of video frames of the video database. Also, each video sequence has a unique location within the video database.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects, and advantages of the present invention will become more fully apparent from the following description, appended claims, and accompanying drawings in which:

FIG. 1 is a computer-based video processing system for encoding video signals, according to a preferred embodiment of the present invention;

FIG. 2 depicts a diagram of a video shot with alternative representative index frames in accordance with the present invention;

FIG. 3 depicts a diagram of a video shot with multiple video sequences having representative index frames in accordance with the present invention; and

FIG. 4 depicts a monitor displaying index frames of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention exploits the fact that human visual systems are very efficient at pattern and object recognition. Thus, a limited number of representative index frames can be extracted or generated from the video database based on the amount of change in the video database, to form a video database index, which may be presented at a later time to a user, or in real-time in alternative preferred embodiments. The video database index allows the user to do a very efficient visual search of index frames displayed in parallel streams to locate portions of the video database of interest.

Video Processing System Hardware

Referring now to FIG. 1, there is shown a computer-based video processing system 100 for processing and encoding video image signals (i.e. video frames), according to a preferred embodiment of the present invention. The purpose of video processing system 100 is thus to perform a prescan of the video database, extract representative frames, and present them to the user. Video processing system 100 receives video signals representing video frames in a video database, processes these video signals, and selects or generates certain video frames to be stored as index frames in a mass storage device, depending upon the amount of change in scenes depicted in the video database. These index frames form a video database index, i.e., a video database comprising index frames representative of the prescanned video database. In the present invention, such a video database index comprises a set of index frames much smaller in number than the video database, wherein each significantly different scene within the video database is represented by an index frame and, ideally, by only one index frame. Since the video database index will contain a much smaller number of video frames than a typical database, the index may be searched easily and quickly to find the desired event, as described more particularly below with reference to FIG. 4.

In video processing system 100 of FIG. 1, analog-to-digital (A/D) converter 102 of video processing system 100 receives analog video image signals representative of video frames from a video database source such as VCR 135. A/D converter 102 decodes (i.e., separates the signal into constituent components) and digitizes each frame of the analog video image signals into digital video frame component signals (e.g., in a preferred embodiment, Y, U, and V component signals). It will be understood by those skilled in the art that the video database source may be any suitable source of digital or analog video signals, e.g. a digital VCR.

Capture processor 104 receives, captures, and stores the digitized component signals as subsampled video frames in memory device 112 via bus 108. As those skilled in the art will appreciate, the digitized component signals may also be stored without subsampling in alternative preferred embodiments of the present invention for higher image quality. Each subsampled video frame is represented by a set of two-dimensional component planes or pixel bitmaps, one for each component of the digitized video frame signals. In a preferred embodiment, capture processor 104 captures video image signals in a YUV4:1:1 format, in which every (4×4) block of pixels of the Y (luminance) component plane corresponds to a single pixel in the U (chrominance) component plane and a single pixel in the V (chrominance) component plane.
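The subsampling relationship described above can be illustrated with a minimal Python sketch. It is not the capture processor's actual method: the text does not say how a (4×4) block is reduced to one chrominance sample, so averaging is assumed here, and the names `subsample_plane` and `plane_sizes` are hypothetical.

```python
def subsample_plane(plane, block=4):
    """Reduce a 2-D plane so each block x block tile becomes one sample.

    Averaging is an assumption; the text only states the 4x4-to-1 ratio.
    """
    height, width = len(plane), len(plane[0])
    if height % block or width % block:
        raise ValueError("plane dimensions must be multiples of the block size")
    out = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            total = sum(
                plane[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total // (block * block))  # integer mean of the tile
        out.append(row)
    return out


def plane_sizes(width, height, block=4):
    """Return the (Y, U, V) sample counts for one subsampled frame."""
    chroma = (width // block) * (height // block)
    return width * height, chroma, chroma


# Hypothetical 8x8 chrominance plane reduced to 2x2.
u_full = [[(x + y) % 256 for x in range(8)] for y in range(8)]
u_sub = subsample_plane(u_full)
```

For a hypothetical 160 by 120 frame, `plane_sizes(160, 120)` gives 19200 Y samples and 1200 samples each for U and V, so the two chrominance planes together add only one eighth to the size of the luminance plane.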

Pixel processor 106 accesses captured bitmaps representative of video frames from memory device 112 via bus 108 and selects index frames from the video frames processed, as more particularly described below. The selected index frames are typically encoded for compression purposes before being stored in a mass storage device such as mass storage device 120. Pixel processor 106 thus generates encoded index frames that represent different scenes, each different scene being comprised of a plurality of captured video frames. Depending upon the particular encoding method implemented, pixel processor 106 applies a sequence of compression techniques to reduce the amount of data used to represent the information in each index frame. The encoded index frame may then be stored to memory device 112 via bus 108 for transmission to host processor 116 via bus 108, bus interface 110, and system bus 114 for storage in mass storage device 120. Mass storage device 120 in this manner may comprise a video database index which may be utilized as an index by a user, as described below, to locate specific images or events within the video database. Those skilled in the art will appreciate that system bus 114 and bus 108 may be merged into the same system bus 114. It will further be understood that host processor 116 may in alternative preferred embodiments perform the functions of pixel processor 106 described herein.

In this embodiment, host processor 116 may also transmit encoded video frames to transmitter 118 for real-time transmission to a remote receiver (not shown) for video conferencing purposes.

Host processor 116 may be utilized to decode encoded index frames that were previously encoded and stored in mass storage device 120, so that the index frames may be displayed and viewed by a user. Host processor 116 receives encoded index frames via system bus 114 that were stored in mass storage device 120. Host processor 116 temporarily stores the encoded index frames in host memory 126.

Host processor 116 decodes the encoded index frames and scales the decoded index frames for display, as more particularly described below. Decoding the encoded index frames involves undoing the compression processing implemented by pixel processor 106. Scaling the decoded index frames involves upsampling the U and V component signals to generate full-sampled Y, U, and V component signals in which there is a one-to-one-to-one correspondence between Y, U, and V pixels in the scaled component planes. Scaling may also involve scaling the component signals to a display size and/or resolution different from the image signals as originally captured. Host processor 116 then stores the scaled decoded index frames to host memory 126 for eventual transmission to digital-to-analog (D/A) converter 122 via system bus 114. D/A converter 122 converts the digital scaled decoded index frames to analog image signals for display on monitor 124. As described in further detail below with reference to FIG. 2, host processor 116 may also merge a plurality of decoded index frames, for example nine index frames, into a 3×3 tiled format within a single video frame for display on monitor 124.
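The upsampling and tiling steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the host processor's actual routine: sample replication (nearest neighbour) is assumed for upsampling, the frames passed to the tiler are assumed to be already scaled to a common size, and the names `upsample_plane` and `tile_frames` are hypothetical.

```python
def upsample_plane(plane, factor=4):
    """Replicate each sample factor x factor times (nearest neighbour).

    Replication is an assumption; any interpolation could be substituted.
    """
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def tile_frames(frames, columns=3):
    """Merge equally sized 2-D frames into one frame in row-major order."""
    if not frames or len(frames) % columns:
        raise ValueError("frame count must be a positive multiple of columns")
    tiled = []
    for start in range(0, len(frames), columns):
        group = frames[start:start + columns]
        for y in range(len(group[0])):
            merged = []
            for frame in group:
                merged.extend(frame[y])  # place this tile's row side by side
            tiled.append(merged)
    return tiled


# Nine hypothetical 2x2 index frames, each filled with its own number.
tiles = [[[i, i], [i, i]] for i in range(9)]
mosaic = tile_frames(tiles)  # 6x6 frame holding a 3x3 grid of tiles
```

With a 4×4 subsampled chrominance plane, `upsample_plane(plane, 4)` restores one U or V sample per Y pixel, which is the one-to-one-to-one correspondence the text refers to.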

Video processing system 100 is preferably a general microprocessor-based personal computer (PC) system with a special purpose video-processing plug-in board. In particular, A/D converter 102 may be any suitable means for decoding and digitizing analog video image signals. Capture processor 104 may be any processor suitable for capturing digitized video image component signals as subsampled frames. Pixel processor 106 may be any suitable means for encoding and processing subsampled video frames, where the means is capable of implementing functions such as a forward discrete cosine transform. Memory device 112 may be any suitable computer memory device and is preferably a dynamic random access memory (DRAM) device. Bus 108 may be any suitable digital signal transfer device and is preferably an Industry Standard Architecture (ISA) bus or Extended ISA (EISA) bus or a Peripheral Component Interconnect (PCI) bus. Bus interface 110 may be any suitable means for interfacing between bus 108 and system bus 114. In a preferred embodiment, A/D converter 102, capture processor 104, pixel processor 106, bus 108, bus interface 110, and memory device 112 are contained in a single plug-in board, such as an Intel® ActionMedia®-II board, capable of being added to a general microprocessor-based personal computer (PC) system.

Host processor 116 may be any suitable means for controlling the operations of the special-purpose video processing board and is preferably an Intel® general purpose microprocessor such as an Intel® i386™, i486™, or Pentium™ processor. Host memory 126 may be any suitable memory device used in conjunction with host processor 116 and is preferably a combination of random access memory (RAM) and read-only memory (ROM). System bus 114 may be any suitable digital signal transfer device and is preferably a PCI bus. Alternatively, system bus 114 may be an Industry Standard Architecture (ISA) bus or Extended ISA (EISA) bus. Mass storage device 120 may be any suitable means for storing digital signals and is preferably a computer hard disk drive or CD-ROM device. Mass storage device 120 may also be a digital or analog VCR that records video signals. Transmitter 118 may be any suitable means for transmitting digital signals to a remote receiver and preferably transmits digital signals over PSTN lines. Those skilled in the art will understand that encoded video frames may be transmitted using any suitable means of transmission such as telephone line (PSTN or ISDN), RF antenna, local area network, or remote area network.

D/A converter 122 may be any suitable device for converting digital image signals to analog image signals and is preferably implemented through a personal computer (PC)-based display system such as a VGA or SVGA system. Monitor 124 may be any means for displaying analog image signals and is preferably a VGA monitor. VCR 135 may be any VCR suitable for transmitting analog video signals representative of images stored on a video cassette to analog-to-digital converter 102.

Selecting Index Frames

Index frames are selected in the current invention from the plurality of video frames constituting a video database based on the amount of change in scenes depicted in the video database, so that significantly different scenes, images, or video shots within the video database are represented by an index frame. Thus, each index frame represents a unique scene corresponding to a particular sequence of video frames, or "video sequence." As will be understood by those skilled in the art, an index frame is "representative" of its corresponding video sequence in that a user can recognize the represented video sequence by viewing the index frame. Thus, if the user is familiar with the particular scene represented, he may be reminded of the scene when he sees the index frame. If the user has forgotten or is unfamiliar with the scene, the user can get some idea of the visual contents of the images and features in the represented video sequence when he sees the corresponding index frame. These index frames constitute a video database index stored in a mass storage device, and may be viewed in an efficient manner by a user of the index to locate a particular scene, as described in further detail below with reference to FIG. 4. Those skilled in the art will appreciate that some video sequences may consist of a single video frame, although video sequences typically consist of a plurality of video frames.

Referring again to the hot-air balloon event discussed above as an example, the video database may contain several video shots (i.e., sequences of video frames between scene cuts) that were filmed during the hot-air balloon event. These video shots may in turn each comprise several video sequences, i.e. significantly different scenes. For example, during a single video shot the camera may pan away from hot-air balloons in the sky and towards people in a crowd on the ground watching the balloons. Because the images within the video shot change significantly during a pan, or when objects enter or leave the field of view, such a video shot can contain several video sequences, each of which is significantly different from the immediately preceding video sequence. A goal of the present invention is thus to determine video sequences within the video database that adequately represent different scenes within the video database, and to represent each video sequence with an index frame. The video database index will therefore adequately represent most or all of the different scenes, images, and objects depicted in the video database, but may be searched much more easily and quickly than can be the video database.

Continuing with this example, there should be an index frame stored in the index that is representative of each significantly different video sequence within each video shot of the set of video frames corresponding to the hot-air balloon event. Instead of tediously searching the entire video database to find the hot-air balloon event or various sub-events or particular images within the hot-air balloon event, a user can view the vastly smaller number of index frames stored within the video database index until an index frame representative of the hot-air balloon event or sub-event is observed. The index frame that is found can then be utilized to indicate the exact location of the corresponding video scenes in the video database. Because the number of index frames within the video database index is much smaller than the number of video frames within the video database, searching the video database index is much more practical and feasible than performing a manual search of the video database itself. Further, as described in detail below with reference to FIG. 4, the video database index may be designed to be more efficiently searched by a person than can be a standard video database.

As explained above, a hot-air balloon event that occupies several minutes' worth of events in a video database will very likely comprise several significantly different scenes or video sequences that should each be represented by an index frame. For example, as different balloons enter and exit the filmed scene (or as the camera pans), new index frames are periodically extracted, thus defining different video sequences, each represented by an index frame. Similarly, when the camera operator stops filming and resumes filming later, thereby creating a scene cut between two different video shots, the scene represented in the second video shot will usually differ substantially from the images in the previous shot. In this case, at least one new index frame should be extracted from the plurality of video frames constituting the second video shot because index frames representing the previous video shot will not adequately represent the different video frames of the second video shot.

Various techniques may be utilized to determine which video frames will be selected as index frames, i.e. to segment the video database into represented video sequences. For example, scene cuts may be detected, and thus video shots can be defined as occurring between such scene cuts. In one embodiment of the present invention, a single index frame can be selected per video shot to constitute the video database index. In this case each video shot is defined as a single video sequence.

Referring now to FIG. 2, there is depicted a diagram of a video shot 202 with alternative representative index frames in accordance with the present invention. These alternative index frames are shown shaded in FIG. 2. As will be understood, because video shot 202 will be represented by a single index frame, video shot 202 is coextensive with a video sequence 202 which is represented by an index frame. Video shot 202 may be defined as the sequence of video frames 202.sub.1 -202.sub.100 between two detected scene cuts 203, 204. Scene cut 204 is detected, for example, by determining an abrupt difference between video frames 202.sub.100 and 206.sub.1, or by detecting a fade-in and fade-out or other typical scene transition, as more particularly described below. Video shot 202 thus comprises a continuous sequence of 100 video frames 202.sub.1 -202.sub.100, any of which may be chosen as the index frame that will represent video shot 202. For example, the first video frame 202.sub.1, the last video frame 202.sub.100, or the middle video frame 202.sub.50, may be selected as the index frame to represent video shot 202. When a user scans an index containing the selected index frame and sees the index frame, the user will very likely recognize at least some of the events within video shot 202 and thus realize that he has located an index frame corresponding to the events portrayed in video shot 202.
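The one-index-frame-per-shot selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: frames are identified by zero-based positions, a cut position `c` is taken to mean that a scene cut lies between frame `c - 1` and frame `c`, and the names `shots_from_cuts` and `pick_index_frame` are hypothetical.

```python
# Illustrative sketch only; all names and conventions here are assumptions.

def shots_from_cuts(frame_count, cut_positions):
    """Split frame positions 0..frame_count-1 into video shots.

    A cut position c means a scene cut lies between frame c-1 and frame c.
    Returns a list of (first, last) inclusive position pairs, one per shot.
    """
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    inner = sorted(c for c in set(cut_positions) if 0 < c < frame_count)
    bounds = [0] + inner + [frame_count]
    return [(bounds[k], bounds[k + 1] - 1) for k in range(len(bounds) - 1)]


def pick_index_frame(shot, policy="middle"):
    """Choose the first, last, or time-centered frame of a shot."""
    first, last = shot
    if policy == "first":
        return first
    if policy == "last":
        return last
    if policy == "middle":
        return first + (last - first) // 2
    raise ValueError("unknown policy: " + policy)
```

With a 100-frame shot occupying positions 0 through 99, the `"middle"` policy returns position 49, which corresponds to the fiftieth frame when frames are counted from one as in FIG. 2.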

Because transitions, fades, and the like often occur near scene cuts, a video frame centered between the endpoints of a video shot, such as frame 202.sub.50, may be more representative of the average content of the video shot than the video frames near the scene cuts. Alternatively, an index frame 202.sub.x between frames 202.sub.1 and 202.sub.100 may be chosen such that the difference between frame 202.sub.x and frame 202.sub.1 is approximately equal to the difference between frame 202.sub.x and frame 202.sub.100. Frame 202.sub.x will then represent a video frame that is "difference-centered" between the beginning and end of video shot 202, rather than "time-centered" as frame 202.sub.50 is.
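A difference-centered choice can be sketched in Python as below. The sketch assumes frames are lists of rows of integer pixel values and uses a sum of absolute pixel differences as the difference measure. The names `l1_difference` and `difference_centered_index` are hypothetical, and breaking ties in favor of the earliest frame is an arbitrary choice made for this example.

```python
# Illustrative sketch only; names and the tie-breaking rule are assumptions.

def l1_difference(frame_a, frame_b):
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )


def difference_centered_index(shot_frames):
    """Return the position of the frame whose difference from the first frame
    is closest to its difference from the last frame of the shot."""
    if not shot_frames:
        raise ValueError("shot must contain at least one frame")
    first, last = shot_frames[0], shot_frames[-1]
    best_index, best_imbalance = 0, None
    for x, frame in enumerate(shot_frames):
        imbalance = abs(l1_difference(frame, first) - l1_difference(frame, last))
        if best_imbalance is None or imbalance < best_imbalance:
            best_index, best_imbalance = x, imbalance
    return best_index
```

When most of the change in a shot happens late, the difference-centered frame falls later than the time-centered one, which is the distinction the paragraph above draws.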

As will be appreciated by those skilled in the art, various differencing methods may be used to determine such differences. The basis of such a difference measurement, which is often utilized for block matching and frame differencing in motion estimation data compression techniques, is often a calculation known as the L1 Norm, which has the following form:

$$L_1 = \sum_{i=1}^{n} \sum_{j=1}^{m} \left| a_{ij} - b_{ij} \right|$$

where:

$a_{ij}$ is a pixel in the ith row and jth column of the first video frame;

$b_{ij}$ is a pixel in the ith row and jth column of the second video frame;

$n$ is the number of rows in a video frame; and

$m$ is the number of columns in a video frame.

It will be appreciated by those skilled in the art that the lower the difference indicated by the L1 Norm calculation, the more similar are the two video frames being compared. It will also be appreciated that calculations other than the L1 Norm may be utilized to perform difference measurements between two video frames. For example, the L2 Norm has the following form:

$$L_2 = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} \left( a_{ij} - b_{ij} \right)^2}$$

It will also be understood that a very large difference between two consecutive video frames, i.e. a difference above a predetermined threshold, indicates a discontinuity or very abrupt change between the two video frames being compared. Thus, as those skilled in the art will appreciate, the L1 or L2 Norms may also be used to implement a scene cut detection technique as described above.
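As a rough illustration of the difference measures and the threshold-based scene cut detection just described, the following Python sketch may help. It assumes frames are lists of rows of integer pixel values and takes the L2 Norm in its usual Euclidean form, which is an assumption about the intended formula. The function names and the threshold value used in any call are hypothetical, not values from the text.

```python
import math

# Illustrative sketch only; names and thresholds are assumptions.

def l1_norm(frame_a, frame_b):
    """Sum over all pixels of |a_ij - b_ij|."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )


def l2_norm(frame_a, frame_b):
    """Square root of the sum over all pixels of (a_ij - b_ij)^2."""
    return math.sqrt(sum(
        (a - b) ** 2
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    ))


def detect_scene_cuts(frames, threshold, difference=l1_norm):
    """Return every position k where the difference between consecutive
    frames k-1 and k exceeds the threshold, i.e. a candidate scene cut."""
    return [
        k for k in range(1, len(frames))
        if difference(frames[k - 1], frames[k]) > threshold
    ]
```

The returned positions follow the convention that a cut at position `k` lies between frame `k - 1` and frame `k`. A lower norm value means the two frames are more similar, so only abrupt changes exceed a suitably high threshold.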

In the above-described preferred embodiment of the present invention, only one video frame is selected as an index frame to represent each video shot in the video database index. However, those skilled in the art will appreciate that more sophisticated techniques may be used to select index frames that better represent the contents of a video database. For example, more than one index frame per shot may be utilized to more accurately represent various scenes occurring within a single video shot. If a new object enters the field of view during a shot, or if the shot includes a pan sequence which brings completely new scenery or objects into the field of view, the video shot will contain a plurality of video sequences, each of which is significantly different than previous video sequences within the video shot. In this case, using a single index frame to represent the entire video shot will not be as representative of the multiple scenes as would be several index frames, each corresponding to one of the different video sequences.

Further, if the video shot contains fades and similar special effects, it may be desirable to detect and ignore such fades since they do not add new features to the video shot that a user would wish to locate, although they may mathematically differ greatly from other video frames in the same video shot and otherwise tend to cause an index frame to be extracted. For example, a single video shot of a relatively static image or scene may fade from black at the beginning of the shot and fade to black at the end of the shot. Where a single index frame (e.g. the time-centered or difference-centered video frame) will suffice to represent the entire video shot other than the faded portions, index frames do not need to be selected to represent the dark portions of the fades. Fades may be detected, as will be appreciated by those skilled in the art, by detecting overall contrast decreases or increases, while the underlying image features remain relatively static. Dissolves or cross-fades may also be detected with similar techniques. Fades and related effects may be detected and compensated for. For instance, any video frames occurring during a fade may be ignored for purposes of generating index frames, with an override for frame differences above a predetermined threshold. Alternatively, a certain number of video frames after and before scene cuts may be ignored for purposes of determining changing video sequences within the video shot, and for purposes of selecting index frames.
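One simplified way to exclude faded frames and frames near scene cuts from index-frame selection is sketched below. This is a heuristic stand-in rather than the detection method of the text: it measures contrast as the spread between the brightest and darkest pixel, treats a frame as faded when its contrast falls below a fraction of the shot's peak contrast, and does not check that image features stay static. The names `contrast` and `candidate_frames` and the parameters `guard` and `fade_ratio` are hypothetical.

```python
# Illustrative sketch only; the contrast measure, the fade test, and all
# names and default values are assumptions made for this example.

def contrast(frame):
    """Spread between the brightest and darkest pixel of a frame."""
    values = [value for row in frame for value in row]
    return max(values) - min(values)


def candidate_frames(shot_frames, guard=0, fade_ratio=0.5):
    """Return positions within a shot that remain eligible as index frames.

    Frames within `guard` positions of either scene cut are skipped, as are
    frames whose contrast is below `fade_ratio` times the shot's peak contrast.
    """
    if not shot_frames:
        raise ValueError("shot must contain at least one frame")
    peak = max(contrast(frame) for frame in shot_frames)
    eligible = []
    for k, frame in enumerate(shot_frames):
        in_guard = k < guard or k >= len(shot_frames) - guard
        faded = peak > 0 and contrast(frame) < fade_ratio * peak
        if not in_guard and not faded:
            eligible.append(k)
    return eligible
```

An index frame, whether time-centered or difference-centered, could then be chosen from the returned positions only, so that the dark portions of a fade are never selected.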

Referring now to FIG. 3, there is depicted a diagram of a video shot 302 with multiple video sequences having representative index frames in accordance with a further preferred embodiment of the present invention. These index frames are depicted as shaded frames in FIG. 3. FIG. 3 depicts video shot 302 defined between scene cuts 303 and 304. Video shot 302 itself contains several consecutive video sequences 305, 306, and 307, each of which depicts scenes which differ in some significant, determinable manner from the adjacent video sequences. Thus, in an alternative preferred embodiment of the present invention, multiple index frames may be selected to represent a single video shot such as video shot 302, where a single index frame cannot adequately represent the entire video shot because of many different scenes therein.

For example, once a scene cut 303 is detected, thereby marking the beginning of a new video shot (which ends at scene cut 304), pixel processor 106 may select a first index frame 305.sub.a after the beginning of video shot 302 to represent the first video sequence 305. In this embodiment, no video frames will be considered for a small, predetermined number of video frames from the beginning of video shot 302, to ignore transitions. For example, the first 90 frames may be ignored in case a noisy transition, fade, or cross fade is occurring between video shots, or because of scene cut 303. Alternatively, fades and other transitions may be detected and video frames occurring during the fade ignored for purposes of determining index frames.

After taking into account such fades and related effects that might occur at the beginning of a video shot, pixel processor 106 chooses a first reference frame 305.sub.b near the beginning of the video shot, and compares this first reference frame 305.sub.b to each subsequent video frame until a video frame 306.sub.b is reached wherein the difference between these two reference frames exceeds a predetermined threshold. For example, where pixels vary from 0 to 255, given an average change of 4 units per pixel for video frames of 500×200 pixels (100,000 pixels), the L1 Norm would produce a difference of about 4*100,000=400,000 between the two reference frames. Video frame 305.sub.c immediately preceding video frame 306.sub.b may be considered to be the second reference frame, because video frame 306.sub.b will be the first reference frame for the next video sequence 306. Threshold point 320 can therefore be defined as occurring after second reference frame 305.sub.c as illustrated in FIG. 3. Threshold point 320 defines v