|
|
|
| United States Patent | 5903892 |
| Link to this page | http://www.wikipatents.com/5903892.html |
| Inventor(s) | Hoffert; Eric M. (San Francisco, CA);
Cremin; Karl (Mountain View, CA);
Ali; Adnan (London, CA);
Smoot; Stephen R. (San Francisco, CA) |
| Abstract | A method and apparatus for searching for multimedia files in a distributed
database and for displaying results of the search based on the context and
content of the multimedia files. |
| Title | Indexing of media content on a network |
| Publication Date | May 11, 1999 |
| Filing Date | April 30, 1997 |
| Parent Case |
BACKGROUND OF THE INVENTION
Related Applications
This application claims benefit of the following co-pending U.S.
Provisional Applications:
1) Method and Apparatus for Processing Context and Content of Multimedia
Files When Creating Searchable Indices of Multimedia Content on Large,
Distributed Networks; Ser. No.: 60/018,312; Filed: May 24, 1996, now
abandoned;
2) Method and Apparatus for Display of Results of a Search Queries for
Multimedia Files; Ser. No.: 60/018,311; Filed: May 24, 1996, now
abandoned;
3) Method for Increasing Overall Performance of Obtaining Search Results
When Searching on a Large, Distributed Database By Prioritizing Database
Segments to be Searched; Ser. No.: 60/018,238, Filed: May 24, 1996, now
abandoned;
4) Method for Processing Audio Files to Compute Estimates of Music-Speech
Content and Volume Levels to Enable Enhanced Searching of Multimedia
Databases; Ser. No.: 60/021,452; Filed: Jul. 10, 1996, now abandoned;
5) Method for Searching for Copyrighted Works on Large, Distributed
Networks; Ser. No.: 60/021,515; Filed: Jul. 10, 1996, now abandoned;
6) Method for Processing Video Files to Compute Estimates of Motion
Content, Brightness, Contrast and Color to Enable Enhanced Searching of
Multimedia Databases; Ser. No.: 60/021,517; Filed: Jul. 10, 1996, now
abandoned;
7) Method and Apparatus for Displaying Results of Search Queries for
Multimedia Files; Ser. No.: 60/021,466; Filed: Jul. 10, 1996, now
abandoned;
8) A Method for Indexing Stored Streaming Multimedia Content When Creating
Searchable Indices of Multimedia Content on Large, Distributed Networks;
Ser. No.: 60/023,634; Filed: Aug. 19, 1996, now abandoned;
9) An Algorithm for Exploiting Lexical Proximity When Performing Searches
of Multimedia Content on Large, Distributed Networks; Ser. No.:
60/023,633; Filed: Aug. 9, 1996, now abandoned;
10) A Method for Synthesizing Descriptive Summaries of Media Content When
Creating Searchable Indices of Multimedia Content on Large, Distributed
Networks; Ser. No.: 60/023,836; Filed: Aug. 12, 1996, now abandoned. |
Description  |
FIELD OF THE INVENTION
The present invention relates to the field of networking, specifically to
the field of searching for and retrieval of information on a network.
Description of the Related Art
Wouldn't it be nice to be able to log onto your local internet service
provider, access the worldwide web, and search for some simple
information, like "Please find me action movies with John Wayne which are
in color?" or "Please find me audio files of Madonna talking?", or "I
would like black and white photos of the Kennedy assassination". Or, how
about even "Please find me an action movie starring Michael Douglas and
show me a preview of portions of the movie where he is speaking loudly".
Perhaps, instead of searching the entire worldwide web, a company may want
to implement this searching capability on its intranet.
Unfortunately, text based search algorithms cannot answer such queries.
Yet, text based search tools are the predominant search tools available on
the internet today. Even if text based search algorithms are enhanced to
examine files for file type and, therefore, be able to detect whether a
file is an audio, video or other multimedia file, little if any information
is available about the content of the file beyond its file type.
Still further, what if the search returns a number of files? Which one is
right? Can the user tell from looking at the title of the document or some
brief text contained in the document as is done by many present day search
engines? In the case of relatively small text files, downloading one or
two or three "wrong" files, when searching for the right file, is not a
major problem. However, when downloading relatively large multimedia
files, it may be problematic to download the files without having a degree
of assurance that the correct file has been found.
SUMMARY OF THE INVENTION
It is desirable to provide a search engine which is capable of searching
the internet, or other large distributed network, for multimedia
information. It is also desirable that the search engine provide for
analysis of the content of files found in the search and for display of
previews of the information.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates an overall diagram of a media search and retrieval
system as may implement the present inventions.
FIGS. 2A-C illustrate a flow diagram of a method of media crawling and
indexing as may utilize the present inventions.
FIG. 3A illustrates an overall diagram showing analysis of digital audio
files.
FIGS. 3B, 3C and 3D illustrate waveforms.
FIGS. 3E-H illustrate a flow diagram of a method of analyzing content of
digital audio files.
FIG. 4A illustrates a user interface showing search results.
FIG. 4B illustrates components of a preview.
FIGS. 4C-4E illustrate a flow diagram of a method of providing for previews.
For ease of reference, it might be pointed out that reference numerals in
all of the accompanying drawings typically are in the form "drawing
number" followed by two digits, xx; for example, reference numerals on
FIG. 1 may be numbered 1xx; on FIG. 3, reference numerals may be numbered
3xx. In certain cases, a reference numeral may be introduced on one
drawing and the same reference numeral may be utilized on other drawings
to refer to the same item.
DETAILED DESCRIPTION OF THE EMBODIMENTS
What is described herein is a method and apparatus for searching for,
indexing and retrieving information in a large, distributed network.
1.0 Overview
FIG. 1 provides an overview of a system implementing various aspects of the
present invention. As was stated above, it is desirable to provide a
system which will allow searching of media files on a distributed network
such as the internet or, alternatively, on intranets. It would be
desirable if such a system were capable of crawling the network, indexing
media files, examining and analyzing the media files' content, and
presenting summaries to users of the system of the content of the media
files to assist the user in selection of a desired media file.
The embodiment described herein may be broken down into 3 key components:
(1) crawling and indexing of the network to discover multimedia files and
to index them 100; (2) examining the media files for content (101-105);
and (3) building previews which allow a user to easily identify media
objects of interest 106. Each of these phases of the embodiment provides,
as will be appreciated, for unique methods and apparatus for allowing
advanced media queries.
2.0 Media Crawling and Indexing
FIGS. 2A-2C provide a description of a method for crawling and indexing a
network to identify and index media files. Hypertext markup language
(HTML) in the network is crawled to locate media files, block 201. Lexical
information (i.e., textual descriptions) is located describing the media
files, block 202 and a media index is generated, block 203. The media
index is then weighted, block 204 and data is stored for each media
object, block 205. Each of these steps will be described in greater detail
below.
2.1 Crawl HTML to locate media files
The method of the described embodiment for crawling HTML to locate media
files is illustrated in greater detail by FIG. 2B. Generally, a process as
used by the present invention may be described as follows:
The crawler starts with a seed of multimedia specific URL sites to begin
its search. Each seed site is handled by a separate thread for use in a
multithreaded environment. Each thread parses HTML pages (using a
tokenizer with lexical analysis) and follows outgoing links from a page to
search for new references to media files. Outgoing links from an HTML page
are either absolute or relative references. Relative references are
concatenated with the base URL to generate an absolute pathname. Each new
page which is parsed is searched for media file references. When a new
site is found by the crawler, there is a check against the internal
database to ensure that the site has not already been visited (within a
small period of time); this guarantees that the crawler only indexes
unique sites within its database, and does not index the same site
repeatedly. A hash table scheme is used to guarantee that only unique new
URLs are added to the database. The URL of a link is mapped into a single
bit in a storage area which can contain up to approximately ten million
URLs. If any URL link which is found hashes to a bit position which is
already set, then the URL is not added to the list of URLs for processing.
As the crawler
crawls the web, those pages which contain media references receive a
higher priority for processing than those pages which do not reference
media. As a result, pages linked to media specific pages will be visited
by the crawler first in an attempt to index media related pages more
quickly than through conventional crawler techniques.
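By way of illustration only, the single-bit hash scheme described above might be sketched as follows. The class name UrlBitTable, the method name add_if_new and the use of SHA-256 as the hash function are assumptions made for this sketch; the described embodiment does not name a particular hash function.

```python
import hashlib


class UrlBitTable:
    """One bit per hash slot; a set bit means a URL hashing there was seen."""

    def __init__(self, num_bits: int = 10_000_000) -> None:
        # Roughly ten million slots, per the description (about 1.25 MB).
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _slot(self, url: str) -> int:
        # Assumption: any stable hash would do; SHA-256 is used here.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def add_if_new(self, url: str) -> bool:
        """Return True and mark the slot if it was clear, else return False."""
        slot = self._slot(url)
        byte_index, mask = slot // 8, 1 << (slot % 8)
        if self.bits[byte_index] & mask:
            return False  # slot already set: URL is not queued
        self.bits[byte_index] |= mask
        return True
```

Because two distinct URLs may hash to the same bit, a table of this kind will occasionally skip a URL which has not in fact been visited; the scheme accepts that small loss of coverage in exchange for a fixed, compact storage area.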
When entering a new site, the crawler scans for a robot exclusion protocol
file. If the file is present, it indicates those directories which should
not be scanned for information. The crawler will not index material which
is disallowed by the optional robot exclusion file. On a per directory
basis, there is proposed to be stored a media description file (termed for
purposes of this application the mediaX file). The general format of this
file for the described embodiment is provided at the end of the
Specification. This file contains a series of records of textual
information for each media file within the current directory. As will be
discussed in greater detail below, the crawler scans for the media
description file in each directory at a web site, and adds the text based
information stored there into the index being created by the crawler. The
mediaX file allows for storage of information such as additional keywords,
abstract and classification data. Since the mediaX file is stored directly
within the directory where the media file resides, it ensures an implicit
authentication process whereby the content provider can enhance the
searchable aspects of the multimedia information and can do so in a secure
manner.
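The robot exclusion check mentioned at the start of the preceding paragraph might be sketched as follows, using the robotparser module of the Python standard library. The function name allowed_urls and the agent name "media-crawler" are hypothetical and are used only for this example.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, candidate_urls, agent="media-crawler"):
    """Return only those URLs which the exclusion file permits for the agent."""
    parser = RobotFileParser()
    # Parse the text of an already-downloaded robot exclusion file.
    parser.parse(robots_txt.splitlines())
    return [url for url in candidate_urls if parser.can_fetch(agent, url)]
```

In this sketch the exclusion file is passed in as text, so that fetching the file and interpreting it remain separate steps.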
The crawler can be constrained to operate completely within a single parent
URL. In this case, the user inputs a single URL corresponding to a single
web site. The crawler will then only follow outgoing links which are
relative to the base URL for the site. All absolute links will not be
followed. By following only those links which are relative to the base
URL, only those web pages which are within a single web site will be
visited, resulting in a search and indexing pass of a single web site.
This allows for the crawling and indexing of a single media-rich web site.
Once a single web site has had an index created, then users may submit
queries to find content located only at the web site of interest. This
scheme will work for what is commonly referred to as "Intranet" sites,
where a media-rich web site is located behind a corporate firewall, or for
commercial web sites containing large multimedia datasets.
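A minimal sketch of the single-site constraint described above is given below, assuming that a link counts as relative when it carries neither a scheme nor a host. The function name links_to_follow is hypothetical. The sketch also shows relative references being combined with the base URL to form absolute pathnames.

```python
from urllib.parse import urljoin, urlparse


def links_to_follow(base_url, hrefs):
    """Resolve relative links against base_url and drop absolute links."""
    followed = []
    for href in hrefs:
        parts = urlparse(href)
        if parts.scheme or parts.netloc:
            continue  # absolute reference: not followed in single-site mode
        followed.append(urljoin(base_url, href))
    return followed
```

Under this assumption an absolute link which happens to point back into the same site is also skipped, which matches the rule that all absolute links are not followed.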
2.1.1 Scan page for predetermined HTML tag types
Each HTML page is scanned for predetermined types of HTML tags, block 211.
In this embodiment, the following tags are scanned for:
tables (single row and multi-row)
lists (ordered and unordered)
headings
JavaScript
client side image maps
server side image maps
header separators
2.1.2 Determine if there is a media URL
It is next determined whether there is a media uniform resource locator
(URL), block 212. If there is
a media URL, then the media URL is located and stored. However, in the
described embodiment, certain media URLs may be excluded. For example, an
embodiment may choose not to index URLs having certain keywords in the
URL, certain prefixes, certain suffixes or particular selected URLs.
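One possible form of such an exclusion test is sketched below. The function name is_excluded and its four rule parameters are assumptions made for illustration, and rule values are assumed to be given in lower case.

```python
def is_excluded(url, keywords=(), prefixes=(), suffixes=(), exact=()):
    """Return True if the URL matches any configured exclusion rule."""
    lowered = url.lower()
    return (
        lowered in exact                          # particular selected URLs
        or lowered.startswith(tuple(prefixes))    # certain prefixes
        or lowered.endswith(tuple(suffixes))      # certain suffixes
        or any(word in lowered for word in keywords)  # certain keywords
    )
```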
2.1.3 Locating relevant lexical information
Next, relevant lexical information (text) is selected for each URL. Often a
web page which references a media file provides significant description of
the media file as textual information on the web page. When indexing a
media file, the present invention has recognized that it would be useful
to utilize this textual information. However, certain web pages may
reference only a single media file, while other web pages may reference a
plurality of media files. In addition, certain lexical information on the
web page may be more relevant than other information to categorizing the
media for later searching.
It has been observed that relevant textual information may be directly
surrounding the media reference on a web page, or it may be far from the
media reference. However, it has been found that more often than not, the
relevant text is very close (in lexical distance) to the media reference.
Therefore, the following general rules are applied when associating
lexical information with a media file:
1) if the media file reference is found within a table, store the text
within the table element as associated with the media file;
2) if the media file reference is found within a list, store the text
within the list element as associated with the media file;
3) store the text in the heading as associated with the media file. In
addition, in some embodiments, the text within higher level headings may
also be stored.
4) if there is javascript, store the text associated with the javascript
tag;
5) for client and server side image maps, if there is no relevant text,
store only the URL. In addition, the image maps may be parsed to obtain
all unique URLs and these may also be stored.
In some embodiments, a special tag may be stored within the indexed text
where the media reference occurs in the web page. When queries are posed
to the full-text database of the stored HTML pages which reference media,
the distance of the keyword text from the media reference tag can be used
to determine if there is a relevant match. The standard distance from
media reference to matching keyword utilized is ten words in each
direction outwards from the media reference. The word distance metric is
called "lexical proximity". For standard web pages where text surrounding
media is generally relevant this is an appropriate value.
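The lexical proximity test described above might be sketched as follows, assuming the indexed text has already been split into word tokens and that the special tag appears as the token "<MEDIA_REF>". Both the marker string and the function name keyword_near_media are hypothetical.

```python
MEDIA_TAG = "<MEDIA_REF>"  # hypothetical marker stored where the media reference occurs


def keyword_near_media(tokens, keyword, proximity=10):
    """Return True if keyword lies within `proximity` words of any media tag."""
    keyword = keyword.lower()
    for index, token in enumerate(tokens):
        if token != MEDIA_TAG:
            continue
        start = max(0, index - proximity)
        # Words before the tag plus words after it, excluding the tag itself.
        window = tokens[start:index] + tokens[index + 1:index + 1 + proximity]
        if any(word.lower() == keyword for word in window):
            return True
    return False
```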
If the results of a search using lexical proximity are not satisfactory to
a user, the user needs a mechanism by which to broaden or narrow the
search, based on the relevance which is found by the default lexical
proximity. Users can employ an expand and narrow search button to change
the default lexical proximity. The expand function will produce more and
more search results for a given query, as the lexical proximity value is
increased. A typical expand function will increase the lexical proximity
value by a factor of two each time it is selected. When the expand
function is used, more text will be examined which is located near the
media reference to see if there is a keyword match. Expanding the search
repeatedly will decrease precision and increase recall.
The narrow search button will do the reverse, by decreasing the lexical
proximity value more and more. A typical narrow function will decrease the
lexical proximity value by a factor of two each time it is selected. The
narrow search button will reduce the number of search results, and home in
on that text information which directly surrounds the media reference.
Narrowing the search will increase precision and decrease
recall. The relevance of all resulting matches should be quite high, on
average, as a search is narrowed using this method.
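The expand and narrow functions might be sketched as below. The function names and the lower bound of one word on the narrowed value are assumptions of this sketch; the description states only that the value changes by a factor of two on each selection.

```python
def expand(proximity):
    """Double the lexical proximity value (more recall, less precision)."""
    return proximity * 2


def narrow(proximity):
    """Halve the lexical proximity value, never going below one word."""
    return max(1, proximity // 2)
```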
When a database is limited in depth of entries, and is generated with a
fixed lexical proximity value, a search query may often produce a search
result list with zero hits. In order to increase the number of search
results for the case of zero hits with fixed lexical proximity, a method
is employed which will iterate on the lexical proximity value until a set
of ten search results are returned. The algorithm is as follows:
perform the search query
look at the number of returned hits
if the number of returned hits is less than ten, then
perform a new search with the lexical proximity value doubled
continue the above process until ten search results are returned
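The iteration above might be sketched as follows. The callable run_query is a hypothetical stand-in for the search engine, and the upper bound max_proximity is an assumption added so that the loop ends even when fewer than ten matches exist anywhere in the database.

```python
def search_until_enough(run_query, query, proximity=10, wanted=10,
                        max_proximity=10_240):
    """Double the proximity until `wanted` hits are returned or the cap is hit."""
    hits = run_query(query, proximity)
    while len(hits) < wanted and proximity < max_proximity:
        proximity *= 2  # widen the window around each media reference
        hits = run_query(query, proximity)
    return hits, proximity
```

The function returns the final proximity value together with the hits, so that a caller can report how far the search had to be widened.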
Users should be able to specify the usage of lexical proximity to enhance
the indexing of their search material. For example, if the web page author
knows that all words which are ten words in front of the media reference
are valid and relevant, then the author should specify a lexical proximity
value which is only negative ten (i.e., look only in the reverse direction
from the media URL by ten words). If the web page author knows that all
words which are ten words after the media reference are valid and
relevant, then the author should specify a lexical proximity value which
is only positive ten. Finally, if the web author knows that both ten words
ahead, and ten words behind the media reference are relevant, then the
lexical proximity value should be set to positive/negative ten. Similarly,
if the web author knows that the entire page contains relevant text for a
single media file, then the lexical proximity value should be set to
include all text on a page as relevant.
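The directional settings just described might be expressed as separate counts of words before and after the media reference, as in the sketch below. The function name proximity_window, the parameter names before and after, and the use of None to mean the whole page in that direction are assumptions made for illustration.

```python
def proximity_window(tokens, ref_index, before=10, after=10):
    """Return the words treated as relevant to the reference at ref_index.

    before=10, after=0  corresponds to a value of negative ten only;
    before=0,  after=10 corresponds to a value of positive ten only;
    before=None, after=None treats all text on the page as relevant.
    """
    start = 0 if before is None else max(0, ref_index - before)
    if after is None:
        end = len(tokens)
    else:
        end = min(len(tokens), ref_index + 1 + after)
    # The media reference token itself is excluded from the window.
    return tokens[start:ref_index] + tokens[ref_index + 1:end]
```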
In addition to the above-described processes for locating relevant lexical
information, in the described embodiment, certain information is generally
stored for all media URLs. In particular, the following information is
stored:
the name of the media file
URL of the media file
text string which is associated with the media file anchor reference
title of the HTML document containing the media file
keywords associated with the HTML document
URL for the HTML document containing the media file reference
keywords embedded in the media file
textual annotations in the media file
script dialogue, closed captioning and lyric data in the media file
auxiliary data in the media file (copyright, author, producer, etc.)
auxiliary data located within the media reference in the HTML document
auxiliary data located in an associated media description file
2.1.4 Streaming files
Media content of files may be stored as downloadable files or as streaming
files. Downloadable content is indexed by connecting to an HTTP server,
downloading the media file, and then analyzing the file for the purposes
of building a media rich index.
In the case of streaming multimedia content, block 214, an HTTP server
stores, not the content itself, but instead a reference to the media file.
Therefore, the process of indexing such a file is not as straightforward
as for a downloadable file which is stored on the HTTP server and may be
downloaded from the server.
In the case of streaming media files certain information is gathered, block
215, as will be described with reference to FIG. 2C.
Below is described a method for indexing streaming files to index audio
content and to index video content:
download the media file reference corresponding to the correct streaming
media type
for each URL listed in the media file reference, perform the following
operation:
connect directly to the media file on the media server where it resides,
block 221
commence streaming of the media on the appropriate TCP socket, block 222
query the streaming media to obtain appropriate content attributes and
header data, block 223
add all relevant content attributes and header information into the media
rich index, block 224 (header information to be queried and indexed
includes title, author, copyright; in the case of a video media file,
additional information indexed may also include duration, video
resolution, frame rate, etc.)
determine if streaming text or synchronized multimedia information is
included, block 225.
if it is, then stream the entire media clip, and index all text within the
synchronized media track of the media file
if possible, store the time code for each block of text which occurs with
the streaming media
This method can be applied to any streaming technology, including both
streaming sound and video. The media data which is indexed includes
information which is resident in the file header (e.g., title, author,
copyright), and which can be computed or analyzed based on information in
the media file (e.g., sound volume level, video color and brightness).
The latter category of information includes content attributes which can be
computed while the media is streaming, or after the media has completed
streaming from a server. It should be noted that once the streaming media
has been queried and received results back from the server, the streaming
process can conclude as the indexing is complete.
2.2 Generate and weight a media index
As the network is crawled, a media index is generated by storing the
information which has been discussed above in an index format. The media
index is weighted to provide for increased accuracy in the searching
capabilities. In the described embodiment, the weighting scheme is applied
using a weight factor for each of the following text items:
__________________________________________________________________________________
ITEM                                                              WEIGHTING FACTOR
__________________________________________________________________________________
URL of the media file                                             10
Keywords embedded in the media file                               10
Textual annotations in the media file                             10
Script dialogue, lyrics, and closed captioning in the media file  10
Text strings associated with the media file anchor reference      9
Text surrounding the media file reference                         7
Title of the HTML document containing the media file              6
Keywords and meta-tags associated with the HTML document          6
URL for the HTML document containing the media file reference     5
__________________________________________________________________________________
In other embodiments, alternative weighting factors may be utilized without
departure from the present invention.
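One way in which the weighting factors above could be applied to a query term is sketched below. The description does not state how the factors are combined, so the rule used here (summing the factor of every field in which the term appears) is an assumption, as are the field names and the function name weighted_score.

```python
# Weighting factors from the table above; the field names are hypothetical.
FIELD_WEIGHTS = {
    "media_url": 10,
    "embedded_keywords": 10,
    "annotations": 10,
    "dialogue_lyrics_captions": 10,
    "anchor_text": 9,
    "surrounding_text": 7,
    "html_title": 6,
    "html_keywords": 6,
    "html_url": 5,
}


def weighted_score(record, term):
    """Sum the weights of all fields of `record` which contain `term`."""
    term = term.lower()
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if term in record.get(field, "").lower()
    )
```

Under this assumed rule, a term found in both the media URL and the HTML title scores 16, so matches close to the media file itself outrank matches found only in the enclosing document.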
2.3 Store data for each media object
Finally, data is stored for each media object. In the described embodiment,
the following data is stored:
Relevant text
HTML document title
HTML meta tags
Media specific text (e.g., closed captioning, annotations, etc.)
Media URL
Anchor text
Content previews (discussed below)
Content attributes (such as brightness, color or B/W, contrast, speech v.
music and volume level. In addition, sampling rate, frame rate, number of
tracks, data rate, size may be stored).
Of course, in alternative embodiments a subset or superset of these fields
may be used.
3.0 Content analysis
As was briefly mentioned above, it is desirable to not only search the
lexical content surrounding a media file, but also to search the content
of the media file itself in order to provide a more meaningful database of
information to search.
As was shown in FIG. 1, the present invention is generally concerned with
indexing two types of media files (i) audio 102 and (ii) video 103.
3.1 Video Content
The present invention discloses an algorithm used to predict the likelihood
that a given video file contains a low, medium or high degree of motion.
In the described embodiment, the likelihood is computed as a single scalar
value, which maps into one of N buckets of classification. The value
associated with the motion likelihood is called the "motion" metric. A
method for determining and classifying the brightness, contrast and color
of the same video signal is also described. The combination of the motion
metric along with brightness, contrast and color estimates enhances the
ability of users to locate a specific piece of digital video.
Once a motion estimate and brightness, contrast and color estimate exist
for all video files located in an index of multimedia content, it is
possible for users to execute search queries such as:
"find me all action packed videos"
"find me all dramas and talk shows"
If the digital video information is indexed in a database together with
auxiliary text-based information, then it is possible to execute queries
such as:
"find me all action packed videos of James Bond from 1967"
"find me all talk shows with Bill Clinton and Larry King from 1993"
Combining motion with other associated video file parameters, users can
execute queries such as:
"find me all slow moving, black and white movies made by Martin Scorsese"
"find me all dark action movies filmed in Zimbabwe"
The described method for estimating motion content and brightness, contrast
and color can be used together with the described algorithm for searching
the worldwide Internet in order to index and intelligently tag digital
multimedia content. The described method allows for powerful searching
based on information signals stored inside the content within very large
multimedia databases. Once an index of multimedia information exists which
includes a motion metric and brightness, contrast and color estimate,
users can perform field based sorting of multimedia databases. For
example, a user could execute the query: find me all video, from slow
moving to fast, by Steven Spielberg, and the database engine would return
a list of search results, ordered from slowest to fastest within the
requested motion range. In addition, if the digital video file is
associated with a digital audio sequence, then an analysis of the digital
audio can occur. An analysis of digital audio could determine if the audio
is either music or speech. It can also determine if the speaker is male or
female, and other information. This type of information could then be used
to allow a user query such as:
"find me all fast video clips which contain loud music";
"find me all action packed movies starring Sylvester Stallone and show me a
preview of a portion of the movie where Stallone is talking".
This type of powerful searching of content will become increasingly
important, as vast quantities of multimedia information become digitized
and moved onto digital networks which are accessible to large numbers of
consumer and business users.
The described method, in its preferred embodiment, is relatively fast to
compute. Historically, most systems for analyzing video signals have
operated in the frequency domain. Frequency domain processing, although
potentially more accurate than image based analysis, has the disadvantage
of being compute intensive, making it difficult to scan and index a
network for multimedia information in a rapid manner.
The described approach of low-cost computation applied to an analysis of
motion and brightness, contrast and color has been found to be useful for
rapid indexing of large quantities of digital video information when
building searchable multimedia databases. Coupled with low-cost
computation is the fact that most video files on large distributed
networks (such as the Internet) are generally of limited duration. Hence
the algorithms described herein can typically be applied to short duration
video files in such a way that they can be represented as a single scalar
value. This simplifies presentation to the user.
In addition to the image space method described here, an algorithm is
presented which works on digital video (such as MPEG) which has already
been transformed into a frequency domain representation. In this case, the
processing can be done solely by analyzing the frequency domain and motion
vector data, without needing to perform the computation moving the images
into frequency space.
3.1.1 Degree of Motion Algorithm Details (Image Space)
In order to determine if a given video file contains low, medium or high
amounts of motion, it is disclosed to derive a single valued scalar which
represents the video data file to a reasonable degree of accuracy. The
scalar value, called the motion metric, is an estimate of the type of
content found in the video file. The method described here is appropriate
for those video files which may be in a variety of different coding
formats (such as Vector Quantization, Block Truncation Coding, Intraframe
DCT coded), and need to be analyzed in a uniform uncompressed
representation. In fact, it is disclosed to decode the video into a
uniform representation, since it may be coded in either an intraframe or
an interframe coded format. If the video has been coded as intraframe,
then the method described here is a scheme for determining the average frame
difference for a pixel in a sequence of video. Likewise, for interframe
coded sequences, the same metric is determined. This is desirable, even
though the interframe coded video has some information about frame to
frame differences. The reason that the interframe coded video is
uncompressed and then analyzed, is that different coding schemes produce
different types of interframe patterns which may be non uniform. The
disclosed invention is based on three discoveries:
time periods can be compressed into buckets which average visual change
activity
the averaged rate of change of image activity gives an indication of
overall change
an indication of overall change rate is correlated with types of video
content
The indication of overall change has been found to be highly correlated
with the type of video information stored in a video file. It has been
found through empirical examination that
slow moving video is typically comprised of small frame differences
moderate motion video is typically comprised of medium frame differences
fast moving video is typically comprised of large frame differences and
that,
video content such as talking heads and talk shows are comprised of slow
moving video
video content such as newscasts and commercials are comprised of moderate
speed video
video content such as sports and action films are comprised of fast moving
video
The disclosed method operates generally by accessing a multimedia file and
evaluating the video data to determine the visual change activity. The
algorithm to compute the motion metric operates as follows:
A. Motion Estimator
if the number of samples N exceeds a threshold T, then repeat the Motion
Estimator algorithm below for a set of time periods P=N/T. The value Z
computed for each period P is then listed in a table of values.
as an optional preprocessing step, employ an adaptive noise reduction
algorithm to remove noise. Apply either a flat field (mean), or stray
pixel (median) filter to reduce mild and severe noise respectively.
if the video file contains RGB samples, then run the algorithm and average
the results into a single scalar value to represent the entire sequence
B. Motion Estimator
determine a fixed sampling grid in time consisting of X video frames
if video samples are compressed, then decompress the samples
decompress all video samples into a uniform decoded representation
adjust RGB for contrast (low/med/high)
compute the RGB frame differences for each frame X with its nearest
neighbor
sum up all RGB frame differences for each pixel in each frame X
compute the average RGB frame difference for each pixel for each frame X
sum and then average RGB frame differences for all pixels in all frames in
a sequence.
the resulting value is the motion metric Z. The motion metric Z is
normalized by taking Z-NORMAL=Z*(REF-VAL/MAX-DIFFERENCE) where
MAX-DIFFERENCE is the maximum difference for all frames.
map the value Z into one of four categories
low degree of motion
moderate degree of motion
high degree of motion
very high degree of motion
Using a typical RGB range of 0-255, the categories for the scalar Z map to:
0-20, motion content, low
20-40, motion content, moderate
40-60, motion content, high
60 and above, motion content, very high
A specific example, using actual values, is as follows:
number of video frames X=1000
sample size is 8 bits per pixel, 24 bits for RGB
average frame difference per frame is 15
the sequence is characterized as low motion
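The frame differencing and category mapping above might be sketched as follows for decoded RGB frames. The sketch omits the optional noise reduction, contrast adjustment and normalization steps, represents each frame as a flat list of (r, g, b) tuples, and assigns a value falling exactly on a boundary (20, 40 or 60) to the higher category; all of these are assumptions, as are the function names.

```python
def motion_metric(frames):
    """Average absolute RGB difference per pixel between neighboring frames."""
    total = 0.0
    count = 0
    for previous, current in zip(frames, frames[1:]):
        for (r0, g0, b0), (r1, g1, b1) in zip(previous, current):
            # Mean of the three channel differences for this pixel.
            total += (abs(r1 - r0) + abs(g1 - g0) + abs(b1 - b0)) / 3.0
            count += 1
    return total / count if count else 0.0


def motion_category(z):
    """Map the motion metric Z (RGB range 0-255) to a category label."""
    if z < 20:
        return "low"
    if z < 40:
        return "moderate"
    if z < 60:
        return "high"
    return "very high"
```

With the figures of the specific example, an average frame difference of 15 maps to the low motion category.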
Note that when the number of video frames exceeds the threshold T, then the
percentage of each type of motion metric category is displayed. For
example, for a video sequence which is one hour long, which may consist of
different periods of low, moderate and high motion, the resulting
characterization of the video file would appear as follows:
40%, motion content low
10%, motion content moderate
50%, motion content high
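A characterization of this kind might be produced from the per-period categories as sketched below, assuming each time period has already been assigned a category label. The function name category_percentages is hypothetical.

```python
from collections import Counter


def category_percentages(period_labels):
    """Return the percentage of time periods falling in each category."""
    total = len(period_labels)
    if total == 0:
        return {}
    counts = Counter(period_labels)
    return {label: 100.0 * n / total for label, n in counts.items()}
```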
Once the degree of motion has been computed, it is stored in the index of a
multimedia database. This facilitates user queries and searches based on
the degree of motion for a sequence, including the ability to provide
field based sorting of video clips based on motion estimates.
3.1.2 Degree of Motion Algorithm Details (Frequency Domain)
The method described above is appropriate for those video files which may
be in a variety of different coding formats (such as Vector Quantization,
Block Truncation Coding, Intraframe DCT coded), and need to be analyzed in
a uniform uncompressed representation. The coded representation is decoded
and then an analysis is applied in the image space domain on the
uncompressed pixel samples. However, some coding formats (such as MPEG)
already exist in the frequency domain and can provide useful information
regarding motion, without a need to decode the digital video sequence and
perform frame differencing averages. In the case of a coding scheme such
as MPEG, the data in its native form already contains estimates of motion
implicitly (indeed, the representation itself is called motion
estimation). The method described here uses the motion estimation data to
derive an estimate of motion for a full sequence of video in a
computationally efficient manner.
In order to determine if a given video file contains low, medium or high
amounts of motion, it is necessary to derive a single valued scalar which
represents the video data file to a reasonable degree of accuracy. The
scalar value, called the motion metric, is an estimate of the type of
content found in the video file. The idea, when applied to MPEG coded
sequences, is based on four key principles:
the MPEG coded data contains both motion vectors and motion vector lengths
the number of non-zero motion vectors is a measure of how many image blocks
are moving
the length of motion vectors is a measure of how far image blocks are
moving
averaging the number and length of motion vectors per frame indicates
degrees of motion
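The averaging named in the last principle might be sketched as follows, assuming the motion vectors have already been extracted from the coded stream as one list of (dx, dy) pairs per frame. The function name motion_vector_stats is hypothetical, and the two averages are returned separately because the manner of combining them into a single scalar is not stated at this point.

```python
import math


def motion_vector_stats(frames):
    """Return (average non-zero vector count, average vector length) per frame."""
    if not frames:
        return 0.0, 0.0
    counts = []
    lengths = []
    for vectors in frames:
        nonzero = [(dx, dy) for dx, dy in vectors if dx or dy]
        counts.append(len(nonzero))
        if nonzero:
            mean_len = sum(math.hypot(dx, dy) for dx, dy in nonzero) / len(nonzero)
        else:
            mean_len = 0.0  # no moving blocks in this frame
        lengths.append(mean_len)
    return sum(counts) / len(frames), sum(lengths) / len(frames)
```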
The indication of overall motion has been found to be correlated with the
type of video information stored in a video file. It has been found
through empirical examination that
slow moving video is comp