Generation, archiving, and retrieval of digital images with evoked suggestion-set captions and natural language interface    
United States Patent 5493677
Inventor(s)     Balogh; Aristotle (Bowie, MD); Blejer; Hatte (Alexandria, VA); Chen; Eugene (Arlington, VA); Flank; Sharon (Washington, DC); Iannacone; Carmen (Fairfax, VA); Maloney; John (Upper Marlboro, MD); Martin; Patrick (Arlington, VA); Rothey; James (Fairfax, VA); Schmid; Gary (Arlington, VA); Dozier; Linda T. (Goleta, CA); Lorton; Michael (San Francisco, CA)
Abstract     Digitized images are associated with English language captions and other data, collectively known as the metadata associated with the images. A natural language processing database removes ambiguities from the metadata, and the images and the metadata are stored in databases. A user formulates a search query, and natural language processing is used to determine matches between the query and the stored metadata. Images corresponding to the matches are then viewed, and desired images are selected for licensing. The license terms for selected images are displayed, and a subset of the selected images are ordered as desired by the user.
Owner/Assignee     Systems Research & Applications Corporation (Arlington, VA)
Publication Date     February 20, 1996
Application Number     08/255,379
Filing Date     June 8, 1994
US Classification     707/104.1
Int'l Classification     G06F 017/30
Examiner     Black; Thomas G.
Assistant Examiner     Lintz; Paul R.
Attorney/Law Firm     Meyer; Stuart P.
USPTO Field of Search     395/600 395/147 364/419.01 364/419.02 364/419.07 364/419.08
U.S. References

Patent No.   Inventor      U.S. Class   Date
5386556      Hedin         707/4        Jan 1995
5265014      Haddock       704/9        Nov 1993
5263159      Mitsui        707/5        Nov 1993
5237503      Bedecarrax    704/10       Aug 1993
5197005      Shwartz       707/2        Mar 1993
5109439      Froessl       382/305      Apr 1992
4849898      Adi           707/5        Jul 1989
4833610      Zamora        707/5        May 1989
4829423      Tennant       704/8        May 1989
4695975      Bedrij        715/500.1    Sep 1987
Claims
 


We claim:

1. A system for archiving and retrieving images, the system comprising:

an ingestion center including (i) a data entry device for accepting as input an image and metadata, the metadata including bibliographic data associated with the image, a caption associated with the image and a set of suggestions evoked by the image; (ii) a natural language processing database including a plurality of terms; and (iii) a disambiguation processor operatively connected to the data entry device and to the natural language processing database, adapted to permit user selection of characteristics of portions of the metadata responsive to the plurality of terms;

an image center including (i) an upload processor operatively connected to the ingestion center and adapted to receive as input the image and the metadata; (ii) a database operatively connected to the upload processor for storing the image and metadata with other images and other metadata; and (iii) a browser operatively connected to the database for viewing a selective subset of the image and the other images responsive to correspondence of a query request with the metadata and other metadata.

2. A system as in claim 1, wherein the image center further includes a client workstation operatively connected to the browser and adapted to allow a user to enter the query request.

3. A system as in claim 1, wherein the image center further includes a purchase processor operatively connected to the database and adapted to accept a request to purchase a selected one of the image and other images.

4. A system as in claim 1, wherein the image center further includes a delivery processor operatively connected to the database and adapted to accept a request to deliver a selected one of the image and other images.

5. A system as in claim 1, wherein the ingestion center further includes a watermarking processor adapted to modify the image applied to the data entry processor to allow display of the image in a first manner and to prevent display of the image in a second manner.

6. A system as in claim 1, wherein the browser is adapted to accept as input an exemplar query request for a new subset of images corresponding to an identified one of the selected subset of images.

7. A computer-implemented process for archiving and retrieving images, the process comprising:

a) associating metadata with an image, the metadata including bibliographic data associated with the image, a caption associated with the image, and a set of suggestions evoked by the image;

b) removing ambiguities from the metadata;

c) storing the image and the metadata in a database with other images and other metadata;

d) selecting a subset of the image and the other images responsive to correspondence of a query request with the metadata and other metadata.

8. A computer-implemented process as set forth in claim 7, wherein the removing ambiguities includes determining, for a portion of the caption having a plurality of senses, which one of the senses corresponds to the portion of the caption.

9. A computer-implemented process as in claim 8, further comprising defining, responsive to lack of correspondence between any one of the senses and the portion of the caption, a new sense corresponding to the portion of the caption.

10. A computer-implemented process as in claim 7, further comprising selecting, subsequent to (d), a new subset of images responsive to an exemplar query request for images corresponding to an identified one of the selected subset.

11. A computer-implemented process as set forth in claim 7, wherein removing ambiguities involves highlighting portions of the metadata that are recognized as having multiple senses, providing a list of possible senses for the portion, and allowing user input of a new sense for the portion.

12. A computer-implemented process as set forth in claim 7, wherein removing ambiguities includes grouping portions of the metadata into multiword phrases responsive to user selection of the portions.

13. A computer-implemented process as set forth in claim 7, wherein selecting includes comparing a first order of components of the search request with a second order of portions of metadata.

14. A computer-implemented process as set forth in claim 7, further comprising requesting delivery, after (d), of one of the subset of images.
Description
 


FIELD OF THE INVENTION

The present invention relates generally to image processing and specifically to archiving and retrieving of images, such as digitized photographs, works of art, or graphic illustrations, by use of a natural language such as English.

DESCRIPTION OF RELATED ART

Numerous schemes have been used in the past for archiving images and selecting images for retrieval from such archives. Before computers became widely available, simple index cards were often used to keep track of stock photographs, and personnel within photo agencies often relied on their own experience to retrieve photographs that corresponded to a potential customer's request.

Such methods of archiving and retrieving stock photographs provided imperfect results, and were difficult, time-consuming and expensive to implement. As image libraries grew, the shortcomings of conventional archiving and retrieval techniques became yet more pronounced.

The advent of photo Compact Disc ("CD") technology allowed certain advances to be made in this field. With CDs, a customer may purchase rights to use a large number of pictures that may be stored on a single disc and selectively browsed using a CD-ROM drive. However, the number of images available on a CD is still somewhat limited, and most CD-based photo portfolios require a relatively large up-front payment for all of the images on the CD, regardless of how many the user may be interested in. Finally, image quality on CD-based photo portfolios is not always production quality.

Some on-line systems have recently become available that include photo CD technology, such as the KODAK PICTURE EXCHANGE ("KPX") and the COMSTOCK BULLETIN BOARD SERVICE. Such services typically include relatively large libraries of images, and permit conventional keyword search techniques. However, none of the known systems provide an easy to use, natural language search capability, nor do they allow for automating the process of pricing, ordering, and delivering selected images.

It would be desirable to allow users to select images from a library based on conceptual characteristics of such images, to obtain immediate pricing information regarding selected images, and to order and obtain production-quality versions of such images directly.

DISCLOSURE OF INVENTION

In accordance with the present invention, images are archived and retrieved by associating metadata with an image, the metadata including bibliographic data, a caption, and a set of suggestions evoked by the image, removing ambiguities from the metadata, storing the image and metadata in a database with other images and metadata, and selecting certain images from the database that have metadata corresponding to a user's search request.

In one aspect of the invention, a natural language processing technique is used in connection with the selection of images based on the user's search request.

In another aspect of the invention, an image is watermarked so as to allow the image to be viewed for selection on a computer monitor, but not to be printed in a usable format or downloaded for digital publishing.

In yet another aspect of the invention, the user may order and obtain delivery of selected images directly over a computer connection.

In still another aspect of the invention, ambiguities in the metadata are removed by highlighting portions of the metadata that are recognized as having multiple senses, providing a list of possible senses for those portions, and allowing the user to select the appropriate sense.

In accordance with the present invention, apparatus (100) for image archive and retrieval includes an ingestion center (110), an image center (120), and user workstations (130-132).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a physical block diagram of apparatus for image archiving and retrieval in accordance with the present invention.

FIG. 2 is a functional block diagram of the ingestion center portion of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 3 is a functional block diagram of the image center portion of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 4 is a flow diagram illustrating disambiguation processing, in accordance with the present invention.

FIG. 5 is a flow diagram illustrating watermarking, in accordance with the present invention.

FIG. 6 illustrates an index card screen used in conjunction with the ingestion center portion of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 7 illustrates an interpreter screen used in conjunction with the ingestion center portion of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 8 is a flow diagram of upload processing, in accordance with the present invention.

FIG. 9 is a flow diagram of index server upload processing, in accordance with the present invention.

FIG. 10 is a flow diagram of search engine processing in accordance with the present invention.

FIG. 11 illustrates communications layers of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 12 illustrates a match list screen used in conjunction with the image center portion of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 13 illustrates an image browser screen used in conjunction with the image center portion of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 14 illustrates an information screen used in conjunction with the image center portion of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 15 illustrates a lightbox screen used in conjunction with the image center portion of the apparatus of FIG. 1, in accordance with the present invention.

FIG. 16 illustrates a pricing screen used in conjunction with the image center portion of the apparatus of FIG. 1, in accordance with the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Referring now to FIG. 1, there is shown a system 100 for archiving and retrieving images in accordance with the present invention. Briefly, the system 100 is comprised of three major functional blocks. Ingestion center 110 is used to enter images and associated characterizing data (described below as "metadata") into system 100. Image center 120 is used to store the image data and metadata, and to process queries for images based on the metadata. User workstations 130-132 are used to allow remote users to enter queries for images, to view the images sent by image center 120, to obtain pricing information on such images, to order such images, and to obtain delivery of such images.

More specifically, ingestion center 110 includes a data entry processor 112, disambiguation processor 114, and natural language processing ("NLP") database 116. Image and basic bibliographic information provided by stock photography agents are converted into digital format if not provided in that form by the agents, and conventionally input to ingestion center 110 using data entry processor 112. Typically, the basic bibliographic information provided by the agents includes the artist's name, source, copyright holder, location, artist's project name or series, dates, release information, and any notes relating to the photograph supplied by the artist. The data entry processor 112 permits input of the image data and this textual data to ingestion center 110, and also allows an operator known as a "captioner" to verify the quality of both the image data and the bibliographic data, to write a short caption, or description, of the salient features of the image, and to select certain attributes of the image. The caption may be a set of regular English language sentences, as opposed to merely a listing of unconnected keywords. The attributes may include, for example, the type of image (photograph, computer-generated graphic, video clip or other multimedia object, background pattern, portrait, abstract, aerial, or special effect), predominant hue, and image orientation (landscape or portrait). The captioner also provides as part of the metadata a "suggests" text field describing the emotional suggestions evoked by the image. If not already provided by the photo agency with the bibliographic data, the captioner may obtain and add to the bibliographic data information concerning the prices and other terms under which such image may be licensed. Collectively, the bibliographic data, the caption, the attributes and the suggests field are known as the "metadata" associated with the image.

A disambiguation processor 114 takes as input the metadata of the image and identifies for the captioner any portions of the metadata that are capable of multiple interpretations, based on information previously stored in NLP database 116. The captioner may then select which interpretation is desired, or may enter a new interpretation. The disambiguation processor 114 also serves to standardize the form of the caption, so that all captions use conjunction and disjunction in a standard way, and so that all captions are written in the same anaphoric reference style. Furthermore, if the style of captions is standardized, the location of information within a caption may even provide useful information. If, for example, the most important descriptive information is consistently placed in the first sentence of a caption, that information can be weighted more heavily in making decisions about the relative "closeness" of a query to the caption.
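The positional weighting described above can be illustrated with a minimal sketch. The source only says that first-sentence information "can be weighted more heavily"; the weight values, the word-overlap scoring, and the names `tokenize` and `closeness` are assumptions made for illustration, not the patented method.

```python
import re

FIRST_SENTENCE_WEIGHT = 2.0  # assumed value; the text only says "weighted more heavily"
OTHER_SENTENCE_WEIGHT = 1.0  # assumed value


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def closeness(query, caption):
    """Score a caption against a query, counting first-sentence matches more."""
    query_terms = set(tokenize(query))
    # Naive sentence split on terminal punctuation; adequate for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", caption) if s.strip()]
    score = 0.0
    for index, sentence in enumerate(sentences):
        weight = FIRST_SENTENCE_WEIGHT if index == 0 else OTHER_SENTENCE_WEIGHT
        matched = query_terms & set(tokenize(sentence))
        score += weight * len(matched)
    return score
```

With these assumed weights, a caption that mentions the query term in its first sentence scores higher than one that mentions it only later, which is the effect a standardized caption style makes possible.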

In a preferred embodiment, data entry processor 112, disambiguation processor 114, and NLP database 116 are implemented using conventional client/server computer systems, with client workstations being personal computers such as the APPLE MACINTOSH or IBM-compatible personal computers and servers being conventional computers such as the SUN SPARCSTATION.

Memory map B-trees are used to implement NLP database 116, as described in R. Sedgewick, ALGORITHMS IN C++, Reading, Mass.: Addison-Wesley (1992), the teachings of which are incorporated herein by reference. NLP database 116 contains information about words, their senses, and how those senses are linked together. Word senses are represented as unique numbers. An "expansions" portion of NLP database 116 represents each link as a database record incorporating (i) the word sense, represented by a unique number, (ii) a word sense to which the sense in (i) is linked, represented by another unique number, and (iii) the type of link between the senses in (i) and (ii), e.g., "synonym", "antonym", "a kind of", "a part of."
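As a rough illustration of the three-part "expansions" record, the sketch below models each link as a tuple of two sense numbers and a link type. It deliberately swaps in a plain in-memory dictionary for the memory-mapped B-trees named in the text, and the class names, method names, and sense numbers are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Expansion(NamedTuple):
    sense_id: int         # (i) the word sense
    linked_sense_id: int  # (ii) the sense it is linked to
    link_type: str        # (iii) e.g. "synonym", "antonym", "a kind of", "a part of"


class ExpansionTable:
    """In-memory stand-in for the "expansions" portion of the database."""

    def __init__(self):
        self._by_sense = defaultdict(list)

    def add(self, sense_id, linked_sense_id, link_type):
        """Store one link record and return it."""
        record = Expansion(sense_id, linked_sense_id, link_type)
        self._by_sense[sense_id].append(record)
        return record

    def links_from(self, sense_id, link_type=None):
        """Return records starting at sense_id, optionally filtered by type."""
        records = self._by_sense.get(sense_id, [])
        if link_type is None:
            return list(records)
        return [r for r in records if r.link_type == link_type]


# Hypothetical sense numbers, chosen only for the example.
CRANE_BIRD, CRANE_MACHINE, HERON, BIRD, HEAVY_EQUIPMENT = 101, 102, 103, 200, 300

table = ExpansionTable()
table.add(CRANE_BIRD, BIRD, "a kind of")
table.add(HERON, BIRD, "a kind of")
table.add(CRANE_MACHINE, HEAVY_EQUIPMENT, "a kind of")
```

Keying the records by the source sense number keeps lookups cheap in this sketch; an on-disk B-tree keyed the same way would serve the same purpose for a database too large to hold in memory.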

The primary components of image center 120 include upload processor 122, database processor 124, browse processor 126, and order processor 128. Upload processor 122 is used to take the image and metadata information from ingestion center 110 and store it in a form usable by database processor 124. Database processor 124 stores images and metadata for use by browse processor 126 and order processor 128, and also archives image data for long-term storage. In a preferred embodiment, database processor 124 stores images in three separate resolutions for "thumbnail", quarter-sized, and full-sized display as desired, and uses VHS-type videotape as the medium for archiving full-resolution images. Browse processor 126 permits a user to see both image data and metadata for selected images, and order processor 128 permits the user to learn pricing information for such images, to order such images, and to obtain delivery of such images. In a preferred embodiment, processors 122-128 are implemented using conventional client/server architecture as described above in connection with the components of ingestion center 110.

User workstations 130-132 permit users of system 100 to access the image center 120 for the purpose of posting image queries, for reviewing the results of such queries, for selecting images to order, for completing ordering transactions, and for receiving print-quality images. For purposes of illustration, three user workstations 130-132 are shown but it should be recognized that any number of such workstations may be used in accordance with the invention. In a preferred embodiment, workstations 130-132 are implemented using conventional personal computers such as described above in connection with the client computers of ingestion center 110, programmed to provide the functionality described herein.

The components of system 100 illustrated in FIG. 1 are further described below in connection with the other figures.

Referring now also to FIG. 2, there is shown a functional block diagram of ingestion center 110. In operation, a digitized picture 250 and bibliographic data 252 (shown in simplified form in FIG. 2) are applied to an image/bibliographic input and quality control service 202. Service 202, implemented primarily by data entry processor 112, permits input of the image and agency-supplied bibliographic data into system 100. Service 202 also displays the image and data so that a captioner may perform quality control to ensure that the image is right-side up and in focus, that the colors are correct, and that the bibliographic data 252 was scanned or otherwise input correctly and matches the image 250.

After processing by service 202 is complete, the image and bibliographic data are applied to a caption/suggests field entry service 204. This service 204 permits a captioner to enter the caption and suggests field information as described in connection with FIG. 1. In a preferred embodiment, service 204 is implemented using disambiguation processor 114, but other processors, e.g., data entry processor 112, could also provide this functionality.

The data are next applied to a disambiguation of caption and suggests field tool 206. This tool 206 provides processing, described in greater detail in connection with FIG. 4, that checks the spelling of words in the bibliographic data, allows for supplementation of information in the bibliographic data (e.g., to provide more complete location information), "tags" words in the caption and suggests field as being particular parts of speech, checks the spelling of words in the caption and suggests field, links logically connected adjacent words in the captions and suggests field as "multiwords" (e.g., "United States" and "home run"), and removes ambiguities from the caption and the suggests field by allowing the captioner to select a word sense that most closely matches the concept or intended meaning of any particular word in the context. For instance, the word "crane" has both noun and verb meanings, and the noun meaning is also ambiguous between a "heavy equipment" sense and a "bird" sense. The captioner is presented with a list of possible senses and asked to indicate which sense is intended. From that point on, the word is marked with the intended sense so that requests for images related to the other senses of the word do not pull up that image.

The disambiguation tool 206 uses a semantic net of word senses, including a hierarchy of synonyms and related words. This net acts as a thesaurus to link related words in both the disambiguation tool 206 and in the image center 120 so that even if the end user does not request "crane", but instead requests "heron", images captioned using the bird sense of crane may be retrieved.
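One way to picture how sense tagging and the semantic net work together is the sketch below. The text does not say how "heron" and "crane" are linked, so treating them as species under a shared genus is an assumption, as are all identifiers, dictionary contents, and the function names `related_senses` and `search`.

```python
# Hypothetical sense identifiers; real ones would come from the NLP database.
CRANE_BIRD = "crane/bird"
CRANE_MACHINE = "crane/heavy-equipment"
HERON = "heron"
WADING_BIRD = "wading bird"  # illustrative genus, not an authoritative taxonomy

# "a kind of" links from species to genus (assumed contents).
KIND_OF = {
    CRANE_BIRD: WADING_BIRD,
    HERON: WADING_BIRD,
    CRANE_MACHINE: "heavy equipment",
}

# Word -> candidate senses, as a query term would be looked up.
WORD_SENSES = {
    "crane": [CRANE_BIRD, CRANE_MACHINE],
    "heron": [HERON],
}

# Image id -> senses chosen by the captioner during disambiguation.
IMAGE_SENSES = {
    "img-001": {CRANE_BIRD},
    "img-002": {CRANE_MACHINE},
}


def related_senses(sense):
    """Return the sense plus every sense sharing its "a kind of" parent."""
    parent = KIND_OF.get(sense)
    related = {sense}
    if parent is not None:
        related.update(s for s, p in KIND_OF.items() if p == parent)
    return related


def search(word):
    """Return ids of images tagged with a sense related to any sense of word."""
    wanted = set()
    for sense in WORD_SENSES.get(word, []):
        wanted |= related_senses(sense)
    return sorted(i for i, senses in IMAGE_SENSES.items() if senses & wanted)
```

In this sketch a query for "heron" reaches the image captioned with the bird sense of "crane" but not the construction-equipment image, because the captioner's sense choice is stored with the image rather than the bare word.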

The disambiguation tool 206 permits a captioner to add new words and new senses to the semantic net. In a preferred embodiment, disambiguation tool 206 is implemented using disambiguation processor 114 and NLP database 116. Referring now also to FIG. 4, greater detail is provided concerning the data flow of disambiguation tool 206.

The data with ambiguities removed is next applied to caption/suggests field quality control service 208, wherein the captioner is provided with an opportunity to again check the accuracy of the caption and suggests field information. In a preferred embodiment, caption/suggests field quality control service 208 is also implemented using disambiguation processor 114.

Referring now to FIG. 6, there is shown an index card screen 600 by which data entry for the quality control services 202, 208 and caption/suggests field entry service 204 may be accomplished. Index card screen 600 displays image 250 in a picture display area 602, allows the captioner to review and modify bibliographic data in bibliographic data area 610, allows the captioner to add or review a caption in caption area 604, allows the captioner to add or revise suggests field information in suggests field area 606, and allows the captioner to add or revise photographer's notes in photographer's notes area 608. Screen 600 also provides an area 612 in which the captioner may specify the image characteristics (e.g., predominant hue, type of image). Furthermore, screen 600 provides a set of navigation buttons 618 by which the captioner may move among images, reject an image that is of faulty quality, or change default values for the data. In a preferred embodiment, screen 600 is implemented to operate in conjunction with the conventional windowing environment of a UNIX workstation such as a SUN SPARCSTATION or an IBM model RS6000 workstation. In a preferred embodiment, a separate screen similar to screen 600 is used for data entry and correction of pricing and delivery information for images, but it should be recognized that this information could be provided on screen 600 if desired.

Referring now to FIG. 7, there is shown an interpreter screen 700 that is presented to the captioner when the disambiguation tool 206 is invoked by the captioner's selection of a "next" choice from navigation buttons 618 on screen 600. Interpreter screen 700 includes an area 702 for display of tokens, or words, in a caption, an area 704 for display of the various sense choices known in NLP database 116 for a selected word in display area 702, an area 706 for more detailed interpretation of a selected one of the sense choices that is displayed in area 704, and an area 712 for display of the part of speech of the selected token. A user commands area 710 displays button choices that the captioner may invoke to add or subtract information, to form or break multiwords, and to add new sense definitions. A navigation commands area 708 displays button choices that the captioner may invoke to finalize disambiguation selections, to ignore tokens that are flagged as being potentially ambiguous, or to finish or cancel a session. In operation, the caption displayed in the tokens area may include a number of words that the disambiguation tool determines to be ambiguous, and may also mark selected groups of words as multiword candidates.

Referring now also to FIG. 4, the process of disambiguation corresponding to the screens in FIGS. 6 and 7 begins by invoking 401 the disambiguation tool 206. A conventional spell-checker is then invoked 402 to correct any spelling errors that appear in the caption. Next, a check 403 is made to determine whether the caption contains any likely multiwords. If so, the multiwords are marked 404 by the captioner underlining them on screen 600. In one embodiment, the ingestion center 110 automatically provides suggestions for multiwords and provides the captioner an opportunity to modify those selections as desired. For example, a caption may include the term "blue collar", and it is up to the captioner to determine whether this term merely describes the color of a shirt collar and should therefore be considered as two separate words, or whether it relates to manual labor, in which case it should be considered as a multiword. The system may at times present a large number of choices for multiwords. For example, if the caption includes "Mt. Rushmore National Park, South Dakota", a number of possible multiwords may be presented, ranging from no multiwords, to a combination of the multiwords "Mt. Rushmore", "National Park", and "South Dakota", to a large single multiword containing the entire phrase. The proper selection is left to the captioner's discretion, and should be made in a manner that will be most helpful in searching for images. In the above example, one likely selection would be of the multiwords "Mt. Rushmore National Park" and "South Dakota".
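The automatic suggestion of multiword candidates could be sketched as a lookup of adjacent-word spans against a list of known multiwords. The lexicon contents and the function name `suggest_multiwords` are assumptions; the text does not describe how the ingestion center generates its suggestions, and the final choice remains with the captioner.

```python
# Assumed lexicon of known multiwords, stored as lowercase word tuples.
KNOWN_MULTIWORDS = {
    ("blue", "collar"),
    ("home", "run"),
    ("national", "park"),
    ("south", "dakota"),
    ("united", "states"),
}
MAX_LEN = max(len(m) for m in KNOWN_MULTIWORDS)


def suggest_multiwords(tokens):
    """Return (start, end, phrase) spans whose words form a known multiword."""
    lowered = [t.lower() for t in tokens]
    suggestions = []
    for start in range(len(lowered)):
        for length in range(2, MAX_LEN + 1):
            end = start + length
            if end > len(lowered):
                break
            if tuple(lowered[start:end]) in KNOWN_MULTIWORDS:
                suggestions.append((start, end, " ".join(tokens[start:end])))
    return suggestions
```

The function only proposes spans. Whether "blue collar" is a multiword or a literal description of a shirt is a judgment the sketch leaves to the captioner, as the text requires.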

Processing then invokes 405 a disambiguation/part of speech tagger and allows the captioner to select a word for processing. A check 406 is made to determine whether the selected word is recognized, i.e., is known by the NLP database 116. If not, an unknown word handler is invoked 412 so that the disambiguation tool 206 enters a learning mode, and the captioner is prompted 413 to enter the word type, i.e., a proper name, a location, or other. In a preferred embodiment, unknown words of a caption are immediately displayed using a different color, e.g., red, from recognized words to ensure that the captioner provides such unrecognized words with special attention. In typical operation of the preferred embodiment, all words in a caption and in a suggests field are selected by the captioner for disambiguation, but it should be recognized that operation in which only some words are so processed is also possible.

If the captioner indicates that the type of the unknown word is a name, the word is stored 415 as a name in the NLP database 116, and processing returns to 405 for disambiguation of subsequent words in the caption. If the word type is a location, the word is stored 414 as a location in the NLP database 116, and processing returns to 405 for disambiguation of subsequent words in the caption. If the captioner indicated any other word type, the captioner is prompted to identify 416 the part of speech of the word (e.g., noun, verb, adjective, adverb, date, keyword, helping word) and to associate 417 the word with a word that is known by the NLP database 116. Keywords are acronyms, company names, newly-defined terms in common usage, slang, and words that do not fall into the other categories. Examples of keywords might be names of musical groups such as "Peter, Paul & Mary" and of cultural movements such as "New Age" or "grunge". Helping (or "function") words are determiners such as prepositions, conjunctions and possessive pronouns when used in a manner that would not assist in image searching. For instance, a caption that reads "A boy runs past a house" should have the word "past" marked as a helping word.
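The branching for unknown words (steps 413 through 417) might be organized as in the following sketch. The `NlpDatabase` class is a toy stand-in whose structure is assumed, and `handle_unknown_word` and its parameters are hypothetical names; the step numbers in the comments refer to FIG. 4 as described above.

```python
from dataclasses import dataclass, field

PARTS_OF_SPEECH = {
    "noun", "verb", "adjective", "adverb", "date", "keyword", "helping word",
}


@dataclass
class NlpDatabase:
    """Toy stand-in for NLP database 116 (structure is an assumption)."""
    names: set = field(default_factory=set)
    locations: set = field(default_factory=set)
    # word -> (part of speech, associated known word or None)
    words: dict = field(default_factory=dict)

    def is_known(self, word):
        """Report whether the word is stored in any of the three collections."""
        return word in self.names or word in self.locations or word in self.words


def handle_unknown_word(db, word, word_type, part_of_speech=None, associated_word=None):
    """Store an unknown word according to the captioner's answers."""
    if word_type == "name":
        db.names.add(word)                          # step 415
    elif word_type == "location":
        db.locations.add(word)                      # step 414
    else:
        if part_of_speech not in PARTS_OF_SPEECH:   # step 416
            raise ValueError(f"unsupported part of speech: {part_of_speech!r}")
        if not db.is_known(associated_word):        # step 417
            raise ValueError(f"associated word is not known: {associated_word!r}")
        db.words[word] = (part_of_speech, associated_word)
```

For example, calling `handle_unknown_word(db, "boat people", "other", "noun", "refugees")` stores the multiword as a noun associated with "refugees", provided "refugees" is already in the database.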

In a preferred embodiment, three types of associations are provided for unknown words. The first, a "kind of" association, links genus terms with species terms. For instance, Halloween is a "kind of" holiday. The second type of association is a synonym association. For instance, the words "trauma" and "shock" may be linked in this manner. The third type of association is a "sister term" association, and is used to link two species within the same genus. For instance, the multiword terms "tank top" and "tee shirt" would be linked as sister terms. This information is used to determine the unknown word's placement in NLP database 116. For instance, if NLP database 116 already recognizes "tank top" as a kind of shirt, linking "tee shirt" as a sister term for "tank top" establishes "tee shirt" as also being a kind of shirt. It should be recognized that additional or other types of associations may be provided.
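The three association types may be sketched as follows. The class SemanticNet and its dictionary-based layout are assumptions made for illustration and do not describe the actual structure of NLP database 116; in particular, representing synonyms as shared sets is a choice of this sketch only.

```python
class SemanticNet:
    """Hypothetical structure recording the three association types."""

    def __init__(self):
        self.parent = {}     # species term -> genus term ("kind of")
        self.synonyms = {}   # term -> set of terms sharing one synonym group

    def add_kind_of(self, species, genus):
        """Record that `species` is a kind of `genus`."""
        self.parent[species] = genus

    def add_synonym(self, new_term, known_term):
        """Place `new_term` in the same synonym group as `known_term`."""
        group = self.synonyms.setdefault(known_term, {known_term})
        group.add(new_term)
        self.synonyms[new_term] = group

    def add_sister(self, new_term, known_term):
        """Give `new_term` the same genus as `known_term`.

        Raises KeyError if no genus is recorded for `known_term`.
        """
        self.parent[new_term] = self.parent[known_term]


net = SemanticNet()
net.add_kind_of("tank top", "shirt")
net.add_sister("tee shirt", "tank top")   # "tee shirt" is now also a kind of shirt
net.add_synonym("trauma", "shock")
```

The sister-term call shows how placement follows from the association: the new term inherits the genus already recorded for the known term.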

The captioner may use the information stored in NLP database 116 as a dictionary or thesaurus by free associating other possible words with the unknown word in interpretation area 706. Once the captioner has typed in a proposed associated word that is recognized as being in NLP database 116, the disambiguation tool will allow that proposed associated word to be the sense with which the unknown word is tagged. For instance, if the word "biker" appears in a caption and is unknown in NLP database 116, the captioner may try free associating the term "bicyclist". If that term is unknown as well, the captioner may try the term "cyclist." If cyclist is in the NLP database 116, the captioner can choose that "biker" be tagged with the sense "cyclist" in that caption. Image searching can also be enhanced by free associating proper names or dates with other terms in NLP database 116. For example, the captioner may associate the proper name "Abraham Lincoln" with the noun "president". Similarly, nouns may be associated with verbs, for instance "explosion" with "combust". As a more complete example, the captioner may mark the words "boat people" as a multiword, indicate that the part of speech is a noun, and associate this multiword with the known term "refugees".
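The free-association procedure may be sketched as below. In operation the captioner proposes terms interactively; this sketch models those proposals as an ordered list. The function name tag_by_free_association and the use of a plain set for the known terms are hypothetical.

```python
def tag_by_free_association(known_terms, unknown_word, candidates):
    """Tag `unknown_word` with the first proposed term that is already known.

    Returns a (word, sense) pair, or None if no proposed term is known.
    """
    for candidate in candidates:
        if candidate in known_terms:
            return (unknown_word, candidate)
    return None


known_terms = {"cyclist", "president", "refugees"}
# "bicyclist" is unknown, so the next proposal, "cyclist", supplies the sense.
tag = tag_by_free_association(known_terms, "biker", ["bicyclist", "cyclist"])
```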

All of the new information provided by the captioner while disambiguation tool 206 is in a learning mode is recorded in a log for future use in disambiguation and, once uploaded to image center 120, for use in image searching.

If check 406 indicates that the word is recognized in the NLP database 116, then a check 407 is made to determine whether the part of speech assumed by disambiguation tool 206 for the word is correct. This check 407 is accomplished by prompting the captioner to indicate whether there is an error in the assumed part of speech, which is conventionally determined by word order and statistical information concerning usage of each word. If there is an error, the captioner indicates 408 the correct part of speech. If the part of speech is determined 409 to be that of a function (or "helping") word, the word is ignored 410 for purposes of disambiguation and processing returns to 405. A function word, as opposed to a content word, is a word that only links together or supports words that describe things, actions, and properties. For example, content words would include "house", "walk" or "crooked", while function words would include "the", "and", "could", and "if". If the word is not a function word, the captioner is prompted 411 to indicate the correct sense of the word and thereby mark that instance of the word with the desired sense. This prompting 411 takes place even if the NLP database 116 is currently aware of only one sense of the word, in order to give the captioner an opportunity to add a new sense for that word to the NLP database 116.
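Checks 407 through 411 for a recognized word may be sketched as follows. The function process_recognized_word and its arguments are hypothetical; the captioner's responses (a corrected part of speech, a chosen sense) are modeled as arguments rather than interactive prompts, and the known senses of the word are modeled as a simple list.

```python
def process_recognized_word(word, assumed_pos, senses, corrected_pos=None, chosen_sense=None):
    """Return (word, part of speech, sense), or None if the word is a function word."""
    # Checks 407-408: the captioner may override the assumed part of speech.
    pos = corrected_pos if corrected_pos is not None else assumed_pos

    # Checks 409-410: function words are ignored for purposes of disambiguation.
    if pos == "function":
        return None

    # Step 411: a sense is always requested, even when only one is known,
    # so that the captioner may add a new sense for the word.
    if chosen_sense is None:
        raise ValueError("a sense must be chosen for a content word")
    if chosen_sense not in senses:
        senses.append(chosen_sense)
    return (word, pos, chosen_sense)


crane_senses = ["bird"]
tagged = process_recognized_word("crane", "noun", crane_senses, chosen_sense="machine")
```

After the call above, crane_senses holds both "bird" and "machine", reflecting the addition of a new sense during step 411.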

Disambiguation tool 206 is implemented in a preferred embodiment in a conventional manner using disambiguation processor 114 and NLP database 116. Further information on known techniques of natural language processing for text-only retrieval systems is found, for example, in T. Strzalkowski and B. Vauthey, Information Retrieval Using Robust Natural Language Processing, PROCEEDINGS OF THE 30TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 28 Jun.-2 Jul. 1992, Newark, Del., pp. 104-111; P. Nelson, Site Report for the Text REtrieval Conference, TREC: The First Text REtrieval Conference (TREC-1), D. K. Harman, ed., Computer Systems Laboratory, National Institute of Standards and Technology, Gaithersburg, Md., NIST Special Publication 500-207 (hereinafter, "TREC"), pp. 287-296 (1993); D. Evans, et al., CLARIT TREC Design, Experiments, and Results, TREC, pp. 251-286 (1993); T. Strzalkowski, Natural Language Processing in Large-Scale Text Retrieval Tasks, TREC, pp. 173-187 (1993); S. Abney, Parsing by Chunks, PRINCIPLE-BASED PARSING: COMPUTATION AND PSYCHOLINGUISTICS, Berwick et al., eds., Dordrecht: Kluwer Academic Publishers, pp. 257-78 (1991), the teachings of which are incorporated herein by reference.

In a preferred embodiment, each word sense is represented using a unique identifying number. An ambiguous word, such as "crane", may have several senses, and disambiguation refers to selecting the correct sense in a particular context or, in other words, discarding those senses of the word that are not appropriate in the current context. It should be recognized that this disambiguation may be performed either manually, i.e., with the captioner selecting a proper sense for each word, or may be performed automatically, e.g., with a system that uses statistical information to select the most likely sense in a given context.
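Automatic selection of a sense may be sketched as below. This sketch substitutes a simple count of overlapping context words for the statistical information mentioned above and is not the method of any particular disambiguation system. The sense identifiers 101 and 102 and the cue words attached to each are invented for illustration.

```python
# Each sense has a unique identifying number; the cue words stand in for
# statistical information about the contexts in which that sense occurs.
SENSES = {
    "crane": {
        101: {"bird", "wetland", "fly"},          # the bird
        102: {"construction", "lift", "steel"},   # the lifting machine
    },
}


def pick_sense(word, context, senses=SENSES):
    """Return the sense number whose cue words overlap `context` the most.

    Ties, including the case of no overlap at all, go to the lowest number.
    """
    candidates = senses[word]
    context_words = set(context)
    return max(sorted(candidates), key=lambda sid: len(candidates[sid] & context_words))


sense_id = pick_sense("crane", ["a", "crane", "lifts", "steel", "beams"])
```

Manual disambiguation corresponds to the captioner supplying the sense number directly in place of the call to pick_sense.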

The output of ingestion center 110 includes image 250 and metadata 262 and may, as desired, be in the form of a data stream on a bus connecting ingestion center 110 to image center 120, or may be written onto storage media such as magnetic or optical disks or tapes.

Referring now to FIG. 3, there is shown a functional block diagram of image center 120. Image 250 and metadata 262 are applied to uploading, archiving, watermarking and indexing service 302 for initial processing. Service 302 transfers full-resolution images, e.g., 250, for long-term storage onto a conventional medium such as magnetic tape; generates browse-resolution images, watermarks such images and stores them for browsing service 308; stores metadata and any additions to the semantic net resulting from disambiguation for index querying service 306; and stores licensing and pricing information for use by purchase and delivery service 310 to permit on-line delivery of a full-resolution image 350. In a preferred embodiment, separate databases within database processor 124 are used to provide such storage, but it should be recognized that any conventional storage scheme could be used for storage of the browse-resolution images, the metadata, the semantic net information, and the licensing and pricing information.
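The routing performed by service 302 may be sketched as follows. The class ImageCenterStores, the function upload, the representation of an image as a list of pixel rows, the downsampling by a fixed step, and the "watermarked" flag are all placeholders assumed for illustration; no actual watermarking or tape storage is performed.

```python
from dataclasses import dataclass, field


@dataclass
class ImageCenterStores:
    """Hypothetical separate stores, one per kind of information."""
    archive: dict = field(default_factory=dict)        # full-resolution images
    browse: dict = field(default_factory=dict)         # watermarked browse-resolution images
    index: dict = field(default_factory=dict)          # metadata for index querying service 306
    semantic_net: list = field(default_factory=list)   # additions resulting from disambiguation
    licensing: dict = field(default_factory=dict)      # licensing and pricing information


def upload(stores, image_id, pixels, metadata, net_additions, license_info, step=4):
    """Route one image and its associated data to the appropriate stores."""
    stores.archive[image_id] = pixels
    # Placeholder browse image: keep every `step`-th pixel of every `step`-th row.
    browse_pixels = [row[::step] for row in pixels[::step]]
    # Placeholder watermark: a flag only.
    stores.browse[image_id] = {"pixels": browse_pixels, "watermarked": True}
    stores.index[image_id] = metadata
    stores.semantic_net.extend(net_additions)
    stores.licensing[image_id] = license_info
```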

Still referring to FIG. 3, a user workstation, e.g., 130, communicates with image center 120 by connection to credentials verification service 304. Service 304 verifies a user's credentials by checking an input user identification number, organization identification number, user name, and password. Users are assigned a permission level to indicate whether they are authorized only to search for images or to both search for and purchase rights to images. Service 304 also maintains audit trails of system usage, such as connect time and login attempts, both for billing purposes and for tracing attempted unauthorized use of system 100. In a preferred embodiment, credentials verification service 304 is implemented partially on user workstation 130 and partially on image center 120, specifically database processor 124. It should be recognized, however, that other equivalent implementations could be used to achieve the function of credentials verification service 304. In an alternative e