WikiPatents - Community Patent Review
Machine translation and telecommunications system    
United States Patent 5497319
Link to this page: http://www.wikipatents.com/5497319.html
Inventor(s): Chong; Leighton K. (New York, NY); Kamprath; Christine K. (Austin, TX)
Abstract: A machine translation and telecommunications system automatically translates input text in a source language to output text in a target language using a dictionary database (22) containing core language dictionaries for general words, a plurality of sublanguage dictionaries for specialized words of different domains or user groups, and a plurality of user dictionaries for individualized words used by different users. The system includes a receiving interface (11) for receiving input from a sender, in the form of electronic text, facsimile (graphics) input, or page image data, and an output module (30) for sending translated output text to any designated recipient(s). The input text is accompanied by a cover page or header (50) identifying the sender, one or more recipients, their addresses, the source/target languages of the text, any sublanguage(s) applicable to the input text, and any formatting requirements for the output text. The system uses the cover page or header data to select the core language, sublanguage, and/or user dictionaries to be used for translation processing, to format the translated output text, and to send the output to the recipient(s) at the designated address(es). The dictionary database (22) can cumulate and evolve over time by adding new words as scratch entries to the user dictionaries and, through the use of dictionary maintenance utilities, by updating and/or moving the scratch entries to higher-level subdomain, domain, or even core dictionaries as their usage gains currency.



Title Information
Inventor     Chong; Leighton K. (New York, NY); Kamprath; Christine K. (Austin, TX)
Owner/Assignee     Trans-link International Corp. (Honolulu, HI)
Publication Date     March 5, 1996
Application Number     08/312,440
Filing Date     September 26, 1994
US Classification     704/2 704/10 707/1
Int'l Classification     G06F 017/28
Examiner     Huntley; David M.
Attorney/Law Firm     Ostrager, Chong & Flaherty
Parent Case     This application is a continuation of Ser. No. 07/920,456 filed Aug. 12, 1992, now abandoned, which is a continuation-in-part of U.S. patent application Ser. No. 636,400, entitled "Automatic Text Translation and Routing System", filed on Dec. 31, 1990, now issued as U.S. Pat. No. 5,175,684.
USPTO Field of Search     364/419.02 364/419.03 364/419.11 379/90 379/100 358/400 395/600

References

U.S. References

Patent No.   Inventor       U.S. Class    Date
5283887      Zachery        715/513       Feb. 1994
5274801      Gordon         707/3         Dec. 1993
5197005      Shwartz        707/2         Mar. 1993
5175684      Chong          704/3         Dec. 1992
5157384      Greanias       345/156       Oct. 1992
5079701      Kuga           (not listed)  Jan. 1992
5077804      Richard        (not listed)  Dec. 1991
4996707      O'Malley       379/100.13    Feb. 1991
4980829      Okajima        704/5         Dec. 1990
4916730      Hashimoto      379/70        Apr. 1990
4882681      Brotz          704/3         Nov. 1989
4866755      Hashimoto      379/80        Sep. 1989
4805207      McNutt         379/88.25     Feb. 1989
4383307      Gibson, III    715/533       May 1983
4352012      Verderber      235/487       Sep. 1982
Claims
 


We claim:

1. A machine translation system comprising:

(a) a machine translation module for performing machine translation of an input text in a source language to an output text in a target language using a dictionary database containing entries for words of the target language corresponding to words of the source language;

(b) a dictionary database including a core language dictionary containing entries for generic words of a source/target language pair, a plurality of sublanguage dictionaries each containing entries for specialized words of a respective one of a plurality of sublanguages handled by the machine translation system for the source/target language pair, and a plurality of user dictionaries each containing entries for individualized words used by a respective one of a plurality of users of the machine translation system for the source/target language pair;

(c) a receiving interface for receiving a particular input text sent by a particular user and accompanying control inputs indicative of a preferred sublanguage applicable to the particular input text out of a plurality of possible sublanguages and indicative of a preferred user dictionary out of the plurality of user dictionaries; and

(d) a dictionary control module responsive to the control inputs received by the receiving interface for causing the machine translation module to use the core language dictionary, and for selecting a preferred sublanguage dictionary and a preferred user dictionary for use by the machine translation module for translation of the input text,

wherein said receiving interface further includes a programmed module for allowing users of the machine translation system to input scratch entries for individualized words into their respective user dictionaries, and

wherein said dictionary control module further includes an automated dictionary maintenance utility for automatically assessing whether a word entry has been entered as a scratch word entry in common to a given number or grouping of users through their respective user dictionaries and for thereupon automatically entering said word entry into a higher-level sublanguage dictionary for a sublanguage encompassing those users.

2. A machine translation system according to claim 1 wherein the dictionary database contains a plurality of core language dictionaries corresponding respectively to a plurality of source/target languages, the receiving interface includes means coupled to the dictionary control module for receiving a source/target control input indicative of a selected core language dictionary applicable to the input text, and the dictionary control module includes means responsive to the source/target control input for causing the machine translation module to use the selected core language dictionary for translation of the input text.

3. A machine translation system according to claim 1, wherein said programmed module of said receiving interface includes means for prompting and receiving user input defining linguistic features of a scratch word entry input to the respective user dictionary, and the dictionary control module includes user maintenance utilities enabling the scratch word entry defined with said linguistic features to be added into the corresponding user dictionary.

4. A machine translation system according to claim 3, wherein said dictionary control module includes user maintenance utilities enabling the linguistic features of a scratch word entry to be defined by reference to a similar word entry already defined with linguistic features in another user dictionary.

5. A machine translation system according to claim 1, wherein the dictionary database has a nested form of organization such that user dictionaries are at a lowest level and nested within higher-level sublanguage dictionaries, and sublanguage dictionaries are nested within a core language dictionary at a highest level, and the dictionary control module provides access by the machine translation module to a selected user dictionary, a selected sublanguage dictionary, and the core dictionary in order from the lowest to the highest level.

6. A machine translation system according to claim 5, wherein the dictionary control module includes DMO assistance utilities for receiving input from a dictionary maintenance operator and allowing the DMO to update the word entries of the dictionaries of the dictionary database, and the dictionary control module includes dictionary maintenance utilities for enabling the DMO to input commands for moving a word entry appearing in common in the user dictionaries of a domain or group of users into a higher-level sublanguage dictionary for that domain or group of users.
Description
 


TECHNICAL FIELD

This invention relates to a system for automatic (machine) translation of text and, more particularly, to a telecommunications-based system for automatically translating and sending text from a sender to a recipient in another language.

BACKGROUND ART

After several decades of development, the field of automatic (machine) translation of text from a source language to a target language with a minimum of human intervention has developed to a rudimentary level where machine translation systems with limited vocabularies or limited language environments can produce a basic level of acceptably translated text. Some current systems can produce translations for unconstrained input in a selected language pair, i.e., from a chosen source language to a chosen target language, that is perhaps 50% acceptable to a native writer in the target language (using an arbitrary scale measure). When the translation system is constrained to a particular vocabulary or syntax style of a limited area of application (referred to as a "sublanguage"), the results that can now be achieved may approach a level 90% acceptable to a native writer. The wide difference in results is attributable to the difficulty of producing accurate translation when the system must encompass a wide variability in vocabulary use, syntax, and expression, as compared to the limited vocabularies and translation equivalents of a chosen sublanguage.

One example of a machine translation system limited to a specific sublanguage application is the TAUM-METEO system developed by the University of Montreal for translating weather reports issued by the Canadian Environment Department from English into French. TAUM-METEO uses the transfer method of translation, which consists basically of the three steps of: (1) analyzing the sequence and morphological forms of input words of the source language and determining their phrase and sentence structure, (2) transferring (directly translating) the input text into sentences of equivalent words of the target language using dictionary look-up and a developed set of transfer rules for word and/or phrase selections; then (3) synthesizing an acceptable output text in the target language using developed rules for target language syntax and grammar. TAUM-METEO was designed to operate for English-to-French translation in the narrow sublanguage of meteorology (1,500 dictionary entries, with several hundred place names; text having no tensed verbs). It can obtain high levels of translation accuracy of 80% to 90% by avoiding the need for any significant level of morphological analysis of input words, by analyzing input texts for domain-specific word markers which narrow the range of choices for output word selection and syntax structure, and by using ad hoc transfer rules for output word and phrase selections.
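The three steps of the transfer method described above can be pictured with a toy sketch. Everything in it is invented for illustration: the tiny lexicon, the English-to-French word pairs, and the single reordering rule are assumptions, and they are not the actual dictionaries or transfer rules of TAUM-METEO.

```python
# Toy illustration of the transfer method: analysis, transfer, synthesis.
# LEXICON, TRANSFER_DICT, and the reordering rule are hypothetical examples.

LEXICON = {"heavy": "ADJ", "light": "ADJ", "rain": "NOUN",
           "snow": "NOUN", "tomorrow": "ADV"}
TRANSFER_DICT = {"heavy": "forte", "light": "faible", "rain": "pluie",
                 "snow": "neige", "tomorrow": "demain"}


def analyze(text):
    """Step 1: split the source text into words and tag each with a category."""
    return [(word, LEXICON.get(word, "UNK")) for word in text.lower().split()]


def transfer(tagged):
    """Step 2: replace each source word by dictionary look-up.

    Words missing from the dictionary pass through unchanged.
    """
    return [(TRANSFER_DICT.get(word, word), tag) for word, tag in tagged]


def synthesize(tagged):
    """Step 3: apply one target-language rule (adjective follows its noun)."""
    words = list(tagged)
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in words)


def translate(text):
    """Run the three steps in sequence."""
    return synthesize(transfer(analyze(text)))
```

With these assumed tables, `translate("Heavy rain tomorrow")` yields `pluie forte demain`. A real transfer system would carry far richer morphological and syntactic information between the steps.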

Another example of a sublanguage translation system is the METAL system developed by the Linguistics Research Center at the University of Texas at Austin for large-volume translations from German into English of texts in the field of telecommunications. The METAL system also uses the transfer method, but adds a fourth step called "integration" between the analysis and transfer steps. The integration step attempts to reduce the variability of output word selection and syntax by performing tests on the constituent words of the input text strings and constraining their application based upon developed grammar and phrase structure rules. Transfer dictionaries typically consist of roughly 10,000 word pairs. In terms of translation quality, the METAL system is reported to have achieved between 45% and 85% correct translations.

A strategy competing with the transfer approach is the "interlingua" approach which attempts to decompile input texts of a source language into an intermediate language which represents their "meaning" or semantic content, and then convert the semantic structures into equivalent output sentences of a target language by using a knowledge base of contextual, lexical, and syntactic rules. Historically, transfer systems lacking a comprehensive knowledge base and limited to translation of sentences in isolation have had the central problem of obtaining accurate word and phrase selections in the face of ambiguities presented by homonyms, polysemic phrases, and anaphoric references. The interlingua approach is favored because its representation of text meaning within a context larger than single sentences can, in theory, greatly reduce ambiguity in the analysis of input texts. Also, once the input text has been decompiled into a semantic structure, it can theoretically be translated into multiple target languages using the linguistic and semantic rules developed for each target language. In practice, however, the interlingua approach has proven difficult to implement because it requires the development of a universal symbolic language for representing "meaning" and comprehensive knowledge bases for making the conversions from source to intermediate and then to target languages. Examples of interlingua systems include the Distributed Language Translation (DLT) project undertaken in Utrecht, the Netherlands, and the Knowledge-Based Machine Translation (KBMT) system of the Center for Machine Translation at Carnegie-Mellon University.

Other machine translation systems have been developed or are under development using modifications or hybrids of the transfer and interlingua approaches. For example, some systems use human pre-editing and/or post-editing to reduce text ambiguity and improve the correctness of word and phrase selections. Other systems attempt to combine a basic transfer approach with knowledge bases and artificial intelligence techniques for machine editing and enhancement. Another approach is to combine decompilation to a syntactically-based intermediate structure with transfer to equivalent output phrases and sentences. For a further discussion of current developments in machine translation, reference is made to "Machine Translation: Theoretical and Methodological Issues", edited by Sergei Nirenburg, published by Cambridge University Press, 1987, and "Proceedings of The Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language", published by Linguistics Research Center, University of Texas at Austin, June 1990.

It is expected that machine translation (MT) systems will develop in time to provide higher levels of translation accuracy and utility. However, current MT techniques using a basic transfer approach can produce acceptable translation accuracy in a selected sublanguage, yet they are not in widespread use. One reason for the limited use of MT systems is that most current systems are designed for a single, specific application, environment and language pair context. The requirements of that context motivate the design and development of the grammar, dictionary structure, and parsing algorithms. Thus, the utility of the system becomes confined to that particular context. This approach greatly limits the range of applications and the audience of users which can be productively served by such application- and language-specific MT systems.

SUMMARY OF INVENTION

It is therefore a principal object of the present invention to provide a system for performing machine translation for different source languages, target languages, and sublanguages, and automatically sending the translated text via telecommunications links to one or more recipients in different languages and/or in different locations. The system should be capable of providing acceptable levels of translation accuracy and be readily upgradable to higher levels of accuracy and utility. It is a further object that such a system be capable of operation with a minimum of human intervention, yet have interactive utilities for obtaining and adding new word entries to its dictionary database. It is also desired that such a system be capable of building and organizing a large-scale dictionary database containing core language dictionaries, plural sublanguage dictionaries, and individual user dictionaries in a manner which cumulates and evolves over time.

In accordance with a principal aspect of the present invention, a machine translation and telecommunications system comprises:

(a) a machine translation module for performing machine translation from input text of a source language to output text of a target language;

(b) a receiving interface for receiving input via a first telecommunications link, said input including an input text to be translated accompanied by a control portion having at least a first predefined field therein for designating an address of a recipient to which translated output text is to be sent;

(c) a recognition module coupled to said receiving interface for electronically scanning the control portion and recognizing the address of the recipient designated in the first predefined field of the control portion; and

(d) an output module including a sending interface for sending translated output text generated by said machine translation module to the address of the recipient recognized by said recognition module via a second telecommunications link.
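As a rough sketch of how elements (a) through (d) fit together, the flow can be modelled as a small pipeline in which the translation step and the sending step are passed in as functions. The `Message` class, the `recipient` key, and the function names are hypothetical and only serve this illustration; the source does not prescribe any data layout.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    control: dict  # control portion; "recipient" stands in for the first predefined field
    text: str      # input text to be translated


def recognize_address(control: dict) -> str:
    """Recognition module: read the recipient address from the control portion."""
    address = control.get("recipient", "").strip()
    if not address:
        raise ValueError("control portion has no recipient address")
    return address


def run_pipeline(message: Message,
                 translate: Callable[[str], str],
                 send: Callable[[str, str], None]) -> str:
    """Receive, recognize the address, translate, and send the output."""
    address = recognize_address(message.control)
    output_text = translate(message.text)   # machine translation module
    send(address, output_text)              # output module / sending interface
    return address
```

Because `translate` is an ordinary parameter, a different translation engine can be substituted without touching the receiving, recognition, or sending code, which is one way to picture the modularity the text emphasizes.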

In a more specific aspect of the invention relating to sublanguage selection, a machine translation system comprises:

(a) a receiving interface for receiving an input text and a sublanguage control input indicative of a selected sublanguage applicable to the input text from among a plurality of possible sublanguages;

(b) a machine translation module capable of performing machine translation of an input text in a source language to an output text in a target language using a dictionary database containing entries for words of the target language corresponding to words of the source language;

(c) a dictionary database including a core language dictionary containing entries for generic words of the source and target languages, and a plurality of sublanguage dictionaries each containing entries for specialized words of a sublanguage;

(d) a dictionary control module responsive to the sublanguage control input for selecting a sublanguage dictionary of the dictionary database applicable to the input text, and for causing the machine translation module to use the selected sublanguage dictionary in performing translation of the input text; and

(e) an output module for outputting translated text in the target language generated by the machine translation module.

In the present invention, the sublanguage control input causes a selected sublanguage dictionary deemed applicable to the input text to be used in order to perform more accurate translation of the input text. The dictionary database includes core and sublanguage dictionaries for different source/target languages and sublanguages. The machine translation system with this capability for multiple core languages and sublanguages is employed in a telecommunications system which automatically translates and transmits text from a sender to one or more recipients in other languages. A cover page or header accompanying the input text is used to designate the selected source/target languages, the applicable sublanguages, and the address(es) (electronic, fax, or mail) of the recipient(s).

In a preferred embodiment, the receiving interface receives input text as electronic (machine-readable) text over a communications line, or as page image data via a fax/modem board or page scanner. The receiving interface is operated in a computer server along with a recognition module for converting any page image data to electronic text. The recognition module scans and recognizes designations of the cover page or header accompanying the input text for determining the selections of the source/target languages and sublanguage(s) applicable to the input text. In the case of electronic text, the cover page and the input text may be introduced by means of a disk file, by downloading an electronic file, or by online user-system interaction. An optional interaction mode prompts the user for information concerning the user's identity, sublanguage preferences, etc., in order to facilitate generation of a suitable cover page. Inferencing algorithms may be used to assess the user and cover page information and determine the applicable sublanguage dictionary(ies).

The output module may have a page formatting program for composing the translated output text into a desired page format appropriate to a particular recipient or target language. It may also have a footnoting function for providing footnotes of ambiguous phrases of the input text in their original source language and/or with alternate translations in the target language. The output module includes a sending interface coupled to a fax/modem board for facsimile transmission, or a printer for printing output pages, or a telecommunications interface for sending output electronic text to a recipient's electronic address. The modularity of the receiving interface, dictionary database, dictionary control module, and output module from the machine translation module assures that, as machine translation improvements are developed, the machine translation module may be upgraded or replaced without rendering the other portions of the system dysfunctional or obsolete.

As another aspect of the invention related to a machine dictionary database, a machine translation system comprises:

(a) a machine translation module for performing machine translation of input text in a source language to output text in a target language using a dictionary database containing entries for words of the target language corresponding to words of the source language;

(b) a dictionary database including a core language dictionary containing entries for generic words of the source/target languages, a plurality of sublanguage dictionaries each containing entries for specialized words of a sublanguage used by a group of users, and a plurality of user dictionaries each containing entries for individualized words of a user; and

(c) a dictionary control module responsive to control inputs to the machine translation system for causing the machine translation module to use the core language dictionary, any applicable sublanguage dictionary, and any applicable user dictionary for performing translation of an input text attributed to a user of the system.
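The layered use of user, sublanguage, and core dictionaries can be sketched with Python's standard `collections.ChainMap`, which searches its mappings in the order given. The dictionary contents and the English-to-German word pairs below are invented for illustration, and consulting the user dictionary first is an assumption consistent with the lowest-to-highest access order recited in claim 5.

```python
from collections import ChainMap

# Hypothetical entries; real dictionaries would hold full linguistic records.
core = {"line": "Leitung", "report": "Bericht"}   # generic words
telecom = {"line": "Anschluss"}                   # sublanguage-specific words
user_smith = {"report": "Meldung"}                # one user's individualized words


def build_lookup(user_dict, sublanguage_dict, core_dict):
    """Return a view that consults the user, then sublanguage, then core dictionary."""
    return ChainMap(user_dict, sublanguage_dict, core_dict)


def translate_word(word, lookup):
    """Look a word up through the layers; unknown words pass through unchanged."""
    return lookup.get(word, word)
```

With these assumed entries, a lookup built from `user_smith`, `telecom`, and `core` resolves `line` from the sublanguage layer and `report` from the user layer, while a word found in no layer is returned as-is.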

In the invention, a large-scale dictionary database is maintained which has dictionaries containing word entries specified linguistically at different hierarchical levels of usage. At the lowest (user) level, a particular user can enter temporary or "scratch" word entries into a respective user dictionary. The machine translation system uses the particular user's dictionary to perform machine translation of text which may contain idiosyncratic or new words or phrases particularly used by that user. The dictionary control module includes dictionary maintenance utilities which allow such scratch entries to be entered by users into their user dictionaries, and which assist a dictionary maintenance operator (DMO) to review the scratch entries so that they can be confirmed as valid dictionary entries for machine translation. The dictionary maintenance utilities include automated programmed procedures for assessing whether word entries appearing in lower-level dictionaries should be moved into higher-level dictionaries.
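One possible form of the automated procedure for moving entries upward is sketched below. The criterion used here, an identical entry appearing in at least `min_users` user dictionaries, is an assumption chosen for the example; the text leaves the exact test open, and in practice a dictionary maintenance operator would review entries before they are confirmed.

```python
from collections import Counter


def promote_common_entries(user_dicts, sublanguage_dict, min_users=2):
    """Move entries shared by at least min_users users into the sublanguage dictionary.

    user_dicts maps a user name to that user's {source_word: target_word} entries.
    Returns the sorted list of promoted source words. min_users is a hypothetical
    threshold, not a value taken from the source.
    """
    counts = Counter()
    for entries in user_dicts.values():
        counts.update(entries.items())      # count identical (source, target) pairs

    promoted = []
    for (source, target), n in counts.items():
        if n >= min_users and source not in sublanguage_dict:
            sublanguage_dict[source] = target
            promoted.append(source)

    # Remove the now-redundant copies from the user dictionaries.
    for entries in user_dicts.values():
        for source in promoted:
            if entries.get(source) == sublanguage_dict[source]:
                del entries[source]
    return sorted(promoted)
```

Entries that differ between users (same source word, different target word) are left in place, since they may reflect genuinely individual usage.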

Other objects, features, and advantages of the present invention will become apparent from the following detailed description of the preferred embodiments of the invention, as considered with reference to the following drawings:

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic diagram of a machine translation and telecommunications system in accordance with the invention.

FIG. 1A is a schematic diagram of a computer server which includes a receiving interface, recognition module, and dictionary control module, and is coupled to a machine translation module and an output module.

FIG. 1B is a schematic diagram of a machine translation module which includes a translation processing module and a dictionary database, and its linkage to the computer server and the output module.

FIG. 1C is a schematic diagram of the output module, including a page formatting module and a sending interface.

FIG. 2 is an illustration of a cover page for designating core language pair, sublanguage(s), and recipient information, and accompanying text pages.

FIG. 3 is an illustration of input ideographic text and output English text as performed by the machine translation system using page formatting functions.

FIG. 4 is a schematic diagram of the dictionary control module, including dictionary selection and maintenance submodules, the latter containing an (interactive) user maintenance module and a dictionary maintenance module.

FIG. 5 is a schematic representation of an interactive input editor for interactions with users of the system.

FIG. 6 is a schematic diagram illustrating dictionary maintenance utilities for collapsing and promotion of entries from subordinate to superordinate dictionaries.

FIG. 7A illustrates, as a function of the dictionary maintenance utilities, the creation of scratch word entry from an identical word entry.

FIG. 7B illustrates the use of utilities with an interactive input editor to scan various levels of the dictionary hierarchy for word entries on which to base scratch word entries.

FIG. 7C illustrates a typical content of an identical word entry from which a scratch word entry is created.

FIG. 7D illustrates the creation of a "copy-cat" word entry from a synonymous word entry.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Referring to FIG. 1, a preferred form of the machine translation and telecommunications system in accordance with the present invention comprises a computer server 10, a machine translation module 20, and an output module 30. (These and further-described components of the system will be denoted with capital letters for clarity of reference.) The Computer Server 10 receives electronic text input accompanied by a cover page or header from any of a plurality of input sources, designated generally as a telecommunications link A. The Computer Server 10 has a function for recognizing control data in the cover page or header designating core language and sublanguage selections applicable to the input text to be translated. It also recognizes output addresses and page formatting data to be used by the Output Module 30 for transmitting the translated text to the designated recipient(s) via any of a plurality of output devices, designated generally as a telecommunications link B. Due to the modularity of the system, the Machine Translation Module 20 may be updated by operator maintenance or upgraded or replaced without rendering the other functions of the system dysfunctional or obsolete.

The Machine Translation Module 20 is capable of performing machine translation from input text in a source language to output text in a target language. In the examples of a machine translation (MT) system described herein, reference is made to an MT system of the transfer type which relies upon the use of a machine-readable dictionary for lookup of source/target word entries. The principles of the present invention may also be applied to an MT system of the interlingua type. Transfer-type MT systems are more widely accepted for near-term usage than interlingua systems, and they rely more heavily on linguistic knowledge incorporated into machine dictionaries designed for source/target language pairs. The operation of transfer-type MT systems is well understood by those skilled in the machine translation field, and is not described further herein.

Input Data Reception and Extraction

FIG. 1A shows the Computer Server 10 having a Receiving Interface 11 linked to the telecommunications link A, a Recognition Module 12, and a Dictionary Control Module 13. The Receiving Interface 11 may include an interactive mode program (to be described further herein) whereby a user can provide cover page or header designations, update or create User ID files pertinent to translation parameters associated with that user's communications, or create specialized user dictionary entries during interactive text entry sessions. The Recognition Module 12 includes a character recognition (often referred to as "OCR") program which recognizes and converts page image data into machine-readable text, and which recognizes cover page designations or user designations referencing cover page data stored in the User ID files. The Dictionary Control Module 13 includes a Dictionary Selection Module, which assesses the control data it receives from the Recognition Module 12 and designates the appropriate core language and sublanguage dictionary(ies) to be used by the Machine Translation Module 20. It also includes a Dictionary Maintenance Module, which allows a dictionary maintenance operator (DMO) to create and update dictionary entries in the Dictionary Database 22.

Using the control data from a cover page or header accompanying the input text, the Computer Server 10 allows the system to automatically recognize a sender's designations of the source language of the input text, the target language(s) of the output text, any particular sublanguage(s) used in a specialized domain, user group, or correspondence type, any preferred page format for the output text, and the address(es) of one or more recipients to whom the output is to be sent. Thus, the system can automatically access designated core and sublanguage dictionaries maintained in the Dictionary Database 22 for different source/target languages and sublanguages, and can format and transmit the translated text to recipient(s) in respective target language(s) via telecommunications link B, without the need for any substantial human intervention.
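
By way of illustration only, the dictionary selection step described above might be sketched as follows. The class name ControlData, the function select_dictionaries, the "user/", "sub/", and "core/" naming scheme, and the most-specific-first lookup order are all assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ControlData:
    """Hypothetical container for designations read from a cover page or header."""
    sender_id: str
    source_lang: str
    target_langs: List[str]
    sublanguages: List[str] = field(default_factory=list)


def select_dictionaries(ctrl: ControlData) -> Dict[str, List[str]]:
    """Return, for each target language, an ordered list of dictionary names.

    Assumed order: user dictionary, then sublanguage dictionaries, then core.
    """
    plan: Dict[str, List[str]] = {}
    for tgt in ctrl.target_langs:
        pair = f"{ctrl.source_lang}-{tgt}"
        names = [f"user/{ctrl.sender_id}/{pair}"]
        names += [f"sub/{s}/{pair}" for s in ctrl.sublanguages]
        names.append(f"core/{pair}")
        plan[tgt] = names
    return plan
```

One dictionary list is produced per target language, so a single input text addressed to recipients in several languages yields several independent translation jobs.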

The Computer Server 10 interfaces with a plurality of receiving devices. For example, input data can be received as a facsimile transmission via a fax/modem board plugged into the I/O bus for the server system. Such fax/modem boards are widely available and their operation in a server system is well understood by those skilled in this field. Input may also be received from a conventional facsimile machine coupled to a telephone line which prints facsimile pages converted from signals transmitted on the telephone line. A conventional page scanner with a sheet feeder can be used to scan in facsimile or printed pages as page image data for input to the Computer Server. The page image data is then converted to machine-readable form by the OCR program. Input may also be received through a telecommunications program or network interface as electronic text or text files (such as ASCII text), in which case conversion by the OCR program is not required.
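
The routing of input between the OCR path and the direct text path may be illustrated by the following minimal sketch. The enumeration InputKind, the function names, and the ocr callable are hypothetical placeholders; no particular OCR product or interface is implied.

```python
from enum import Enum
from typing import Callable


class InputKind(Enum):
    FAX_IMAGE = "fax_image"              # received via a fax/modem board
    SCANNED_PAGE = "scanned_page"        # produced by a page scanner
    ELECTRONIC_TEXT = "electronic_text"  # e.g. an ASCII text file


def needs_ocr(kind: InputKind) -> bool:
    """Page image data must be converted; electronic text is already readable."""
    return kind in (InputKind.FAX_IMAGE, InputKind.SCANNED_PAGE)


def to_machine_text(kind: InputKind, payload: bytes,
                    ocr: Callable[[bytes], str]) -> str:
    """Return machine-readable text, calling the supplied OCR routine if needed."""
    if needs_ocr(kind):
        return ocr(payload)
    # Assumes ASCII text, as in the example given in the description.
    return payload.decode("ascii")
```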

The OCR program may be resident as an application program in the Computer Server 10 along with the interface programs for handling the reception of input data. OCR programs are widely available, and their operation is well known in this field. For example, an OCR program for recognizing Japanese kana and ideographic characters is offered by Catena Corp., Tokyo, Japan. An example of an OCR program for alphanumeric characters is WordScan™ offered by Calera Recognition Systems, Santa Clara, Calif. The Computer Server 10 is preferably a high-speed, multi-tasking PC computer or workstation.

Referring to FIG. 2, the Computer Server 10 receives input data which is divided into two parts: a cover page or header 50 and input text 60. In the example shown, a cover page is used in conjunction with other pages of input text in a page-oriented system. In the case of transmission of an electronic text file or a text message, a preceding header or identifier for the communication is used. The cover page 50 has a number of fields for designating selections of source/target language(s), sublanguage(s), page format, and recipient(s) for the text. The cover page 50 is organized with data fields in a predefined format so that the Recognition Module 12 of the Computer Server 10 can readily recognize the control data in those fields.
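
For the electronic text case, the separation of a header from the input text might be sketched as below. The "Name: value" line format, the blank-line separator, and the field names used in the usage note are assumptions made for illustration; the specification does not define a header syntax.

```python
from typing import Dict, List, Tuple


def split_header(message: str) -> Tuple[Dict[str, List[str]], str]:
    """Split a message into header fields and body text.

    Assumes a header of "Name: value" lines ended by the first blank line.
    Repeated field names (e.g. several recipients) are collected in a list.
    """
    head, _, body = message.partition("\n\n")
    fields: Dict[str, List[str]] = {}
    for line in head.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields, body
```

A message beginning with lines such as "Source-Language: en" and "Target-Language: ja", followed by a blank line, would yield those designations as control data and the remainder as input text.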

For example, the cover page 50 may be laid out and formatted with field boundaries and markings on the printed page for optically scanning with a high level of reliability. Line dividers 51 and large type-size headers 52 may be used to mark the sender, source/target language(s), sublanguage (communication type or subject matter), page format, and recipient address fields. Boxes 53, which can be marked or blackened in, allow the designated selections to be determined without error. The names of the sender and recipients, their respective companies, addresses, and telephone and/or facsimile transmission numbers are determined by character recognition once the respective fields 51, 52 have been distinguished. Any page length of input text 60 can follow the cover page 50. Alternatively, information ordinarily supplied by a cover page or header may be stored in the User ID files and supplied automatically as a memorized script in response to user selection.
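
One simple way to decide whether a box 53 has been marked is to measure the proportion of dark pixels inside its known boundaries. The following sketch assumes a binarized page image held as rows of 0 (light) and 1 (dark) values, box coordinates known from the predefined layout, and a threshold of 0.5; these are illustrative assumptions and not details given in the specification.

```python
from typing import List, Sequence


def box_is_marked(page: Sequence[Sequence[int]], top: int, left: int,
                  height: int, width: int, threshold: float = 0.5) -> bool:
    """Return True if the fraction of dark pixels in the box meets the threshold."""
    region: List[Sequence[int]] = [
        row[left:left + width] for row in page[top:top + height]
    ]
    total = sum(len(row) for row in region)
    if total == 0:
        raise ValueError("box lies outside the page image")
    dark = sum(sum(row) for row in region)
    return dark / total >= threshold
```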

It is the task of the Recognition Module 12 to extract data pertinent to dictionary selection from the fields of the cover page or header. In batch mode, this data is predetermined: it is either filled into the cover page fields by the user with each specific translation transaction, or supplied by a reference to the User Identification (ID) files resident in the Recognition Module 12.

In the Interactive Mode for specifying the cover page or header through the Receiving Interface 11, the user may first be presented with predetermined sets of fill-in data and then prompted for alternative values, or provided with a variety of alternatives from which to choose, based upon data already stored in the User ID files, or based upon inferences drawn from the data as it is entered by the user. For example, a User A may specify Recipient Z by name only, and then be presented with additional data, such as Recipient Z's address, title, or affiliation, already stored in the User ID files for verification or correction. Alternatively, Recipient Z may never have been addressed by User A in the past but may be a user categorized in Domain L, which is a domain of which User A is also a member, thus triggering the inference that the sublanguage dictionary of Domain L may be presented to User A as an option for use.
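
The inference described in the Domain L example can be illustrated as a set intersection over domain memberships. In this sketch, the function name suggest_sublanguages and the representation of memberships as a mapping from user name to a set of domain names are assumptions; how the User ID files actually record domain membership is not specified.

```python
from typing import Dict, List, Set


def suggest_sublanguages(sender: str, recipient: str,
                         memberships: Dict[str, Set[str]]) -> List[str]:
    """Return domains shared by sender and recipient, as options to offer.

    The result is a list of suggestions for the user to confirm, not a
    final selection.
    """
    shared = memberships.get(sender, set()) & memberships.get(recipient, set())
    return sorted(shared)
```

With User A recorded in Domains L and M and Recipient Z recorded in Domain L, the function returns Domain L alone, which would then be presented to User A as an option.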

The user may be prompted in Interactive Mode to verify or choose among field values which aid in selecting one or more sublanguage dictionaries for a given translation, including correspondence types, subject domains, social indicators, etc. By automating the filling-in of cover page information, the system employs its computerized capabilities for the user while controlling and monitoring the completeness and cohesiveness of the data supplied.

The cover page may designate a plurality of recipients in a plurality of address locations and target languages, each of which may have particular formatting requirements for the output. For automated assistance, each prospective recipient can be referenced by an identifying code indexed to data stored in the User ID files. For example, a travel agent may have a regular set of clients in a variety of locations and languages, with access to a variety of communication modes, to whom he or she regularly sends advertising material. One client may require Japanese translation formatted as "right-to-left" vertical lines of ideographic characters, to be printed and sent as ordinary mail. Another may require faxed translation into German. Still another may have E-mail capability and require a printed copy as well. These combinations of addressees and requirements can be predefined and stored in the User ID files. The data for the cover page fields for each of these addressees may be indexed to mnemonic codes, such as the addressee's alphabetic name, and are retrieved from the User ID files by the Recognition Module.
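
The retrieval of predefined addressee requirements by mnemonic code might be sketched as follows. The record layout, the field names, the codes, and the sample entries are hypothetical and merely mirror the travel agent example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RecipientProfile:
    """Hypothetical record of one addressee's stored cover page data."""
    name: str
    target_lang: str
    delivery: Tuple[str, ...]      # e.g. ("fax",) or ("email", "print")
    page_format: str = "default"


# Illustrative entries only, as might be stored in one user's ID file.
USER_ID_FILE: Dict[str, RecipientProfile] = {
    "TANAKA": RecipientProfile("Tanaka", "ja", ("mail",), "vertical-rtl"),
    "SCHMIDT": RecipientProfile("Schmidt", "de", ("fax",)),
    "DUPONT": RecipientProfile("Dupont", "fr", ("email", "print")),
}


def expand_recipients(codes: List[str],
                      user_file: Dict[str, RecipientProfile]) -> List[RecipientProfile]:
    """Look up each mnemonic code; reject the request if any code is unknown."""
    missing = [c for c in codes if c not in user_file]
    if missing:
        raise KeyError(f"unknown recipient code(s): {', '.join(missing)}")
    return [user_file[c] for c in codes]
```

Rejecting the whole request on an unknown code is a design choice of this sketch; a system could instead prompt the user for the missing data in Interactive Mode.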

The User ID files may be established at the time of subscription by a user to a machine translation service, and updated from time to time thereafter. Using the Interactive Mode, the user may be prompted to supply his or her name, sex, title, company, address, group affiliations, source language, etc., as well as data relevant to prospective recipients or groups of recipients to be stored in the User ID files for filling in cover pages automatically. Sublanguage selections appropriate to the user may be identified or queried by comparing the requirements of the user with those of other users subscribing to the service.

The user may be prompted to provide samples of typical texts expected to be submitted for translation, as well as individualized or key words for a thesaurus of terms. Automatic ut