A speech processing system for generating speaker-specific text output. To automatically generate a transcript of a trial, hearing, or meeting, the system uses microphones dedicated to specific speakers, along with one or more computers running speech recognition software assigned to each microphone. The system tracks the occurrences of speech and assembles a transcript of the participants' spoken words, including each speaker's identity and a text version of the spoken words in the order in which they were spoken.
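The transcript assembly described above amounts to merging per-microphone recognition results by time. The sketch below is a minimal illustration and does not reproduce the patented system. It assumes each microphone is mapped to one speaker identity and that the recognizer reports a start time for each recognized segment. The names `Utterance` and `assemble_transcript` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # identity tied to the dedicated microphone (assumed mapping)
    start: float   # start time in seconds (assumed to be reported by the recognizer)
    text: str      # words recognized for this segment


def assemble_transcript(per_mic: dict[str, list[tuple[float, str]]]) -> list[str]:
    """Merge per-microphone recognition results into one speaker-labeled
    transcript, ordered by when each utterance began (hypothetical helper)."""
    utterances = [
        Utterance(speaker, start, text)
        for speaker, segments in per_mic.items()
        for start, text in segments
    ]
    # Sorting by start time restores the order in which the words were spoken.
    utterances.sort(key=lambda u: u.start)
    return [f"{u.speaker}: {u.text}" for u in utterances]
```

For example, a judge's microphone reporting segments at 0.0 s and 9.5 s and a witness's microphone reporting one at 3.2 s produce three transcript lines that alternate judge, witness, judge. Because each microphone identifies its speaker, the code needs no speaker-recognition step.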
A voice discriminating tag, which makes the reception voice and the transmission voice distinguishable, is added to the voice input into a cell phone. In addition, a volume discriminating tag is added based on the detection result of a volume detector. A CPU converts the voice into character data and image data on the basis of the voice discriminating tag and the volume discriminating tag, referring to various files stored in file devices. The converted data is output to a display panel, on which the character data corresponding to the transmission voice and the character data corresponding to the reception voice are shown in time series in different colors. The character data is shown in a typeface corresponding to the volume level.
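The tag-driven display can be illustrated as a lookup from the two tags to display attributes. This is a minimal sketch that covers only the character-data path, not the image data. The specific colors, the decibel thresholds in `volume_tag`, and the style names are hypothetical placeholders, because the abstract says only that the two directions differ in color and that the typeface follows the volume level.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    TRANSMISSION = "transmission"  # the user's own (sent) voice
    RECEPTION = "reception"        # the far-end (received) voice


# Hypothetical display choices: one color per direction tag.
COLORS = {Direction.TRANSMISSION: "blue", Direction.RECEPTION: "green"}
# Hypothetical typeface per volume tag.
STYLES = {"quiet": "small", "normal": "regular", "loud": "bold-large"}


def volume_tag(level_db: float) -> str:
    """Classify a measured level (dB, assumed negative full-scale) into a volume tag."""
    if level_db < -30.0:
        return "quiet"
    if level_db < -10.0:
        return "normal"
    return "loud"


@dataclass
class DisplayLine:
    time: float
    text: str
    color: str
    style: str


def render(segments: list[tuple[float, Direction, float, str]]) -> list[DisplayLine]:
    """Turn (time, direction, level_db, text) segments into display lines,
    colored by direction, styled by volume, and ordered in time series."""
    lines = [
        DisplayLine(t, text, COLORS[d], STYLES[volume_tag(db)])
        for t, d, db, text in segments
    ]
    return sorted(lines, key=lambda line: line.time)
```

Keeping both tags as separate inputs reflects the abstract, in which the direction tag and the volume tag are attached independently and combined only at display time.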
According to one embodiment of the invention, a method for conducting a conference call between two or more participants is provided. The method includes receiving an indication of a request for text from a participant. The method also includes converting, in response to the indication, any speech of the other participants in the conference call into text. The method further includes sending the text to a device associated with the participant who requested the text. The device is operable to display the text.
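The request-then-convert flow can be sketched as a small routing class. This is an illustration under stated assumptions, not the claimed implementation. The `transcribe` callable stands in for an unspecified speech-to-text engine, and the `outbox` dictionary stands in for delivery to each participant's device. `ConferenceCall`, `request_text`, and `on_speech` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class ConferenceCall:
    """Participants who request text receive transcripts of every *other*
    participant's speech (hypothetical sketch)."""

    def __init__(self, participants: list[str], transcribe: Callable[[str], str]):
        self.participants = set(participants)
        self.transcribe = transcribe            # assumed speech-to-text callable
        self.text_requesters: set[str] = set()
        self.outbox = defaultdict(list)         # stands in for sending to devices

    def request_text(self, participant: str) -> None:
        # The "indication of a request for text" from a participant.
        if participant not in self.participants:
            raise ValueError(f"unknown participant: {participant}")
        self.text_requesters.add(participant)

    def on_speech(self, speaker: str, audio: str) -> None:
        # A speaker's own words are never sent back to that speaker.
        recipients = self.text_requesters - {speaker}
        if not recipients:
            return  # nobody else asked for text, so skip conversion
        text = self.transcribe(audio)
        for recipient in sorted(recipients):
            self.outbox[recipient].append(f"{speaker}: {text}")
```

Converting speech only when at least one other participant has requested text mirrors the "in response to the indication" wording and avoids transcribing a call that no one is reading.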
A method for automatically organizing digitized photographic images into events based on spoken annotations comprises the steps of: providing natural-language text based on spoken annotations corresponding to at least some of the photographic images; extracting predetermined information from the natural-language text that characterizes the annotations of the images; segmenting the images into events by examining each annotation for the presence of certain categories of information which are indicative of a boundary between events; and identifying each event by assembling the categories of information into event descriptions. The invention further comprises the step of summarizing each event by selecting and arranging the event descriptions in a suitable manner, such as in a photographic album.
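The segmentation step can be illustrated with a deliberately simple extractor. This sketch is not the patented natural-language method. It assumes just two boundary-indicating categories, a place introduced by "at" or "in" and a month-day date, and it uses regular expressions where a real system would use language processing. A new event starts when an annotation names a place or date that differs from the current event's, and each event's description is assembled from the categories found. The names `extract` and `segment` are hypothetical.

```python
from __future__ import annotations

import re

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"

# Hypothetical extractors for two boundary-indicating categories.
PATTERNS = {
    "place": re.compile(r"\b(?:[Aa]t|[Ii]n)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"),
    "date": re.compile(r"\b(?:" + MONTHS + r")\b(?:\s+\d{1,2}\b)?"),
}


def extract(annotation: str) -> dict[str, str]:
    """Pull the first place and date mention out of one annotation."""
    found = {}
    for category, pattern in PATTERNS.items():
        match = pattern.search(annotation)
        if match:
            # Use the capture group when the pattern has one, else the whole match.
            found[category] = match.group(match.lastindex or 0)
    return found


def segment(annotations: list[str]) -> list[dict]:
    """Group consecutive images into events; a changed place or date marks a boundary."""
    events: list[dict] = []
    current = None
    for index, text in enumerate(annotations):
        info = extract(text)
        boundary = current is None or any(
            key in info and key in current["info"] and info[key] != current["info"][key]
            for key in PATTERNS
        )
        if boundary:
            current = {"images": [], "info": {}}
            events.append(current)
        current["images"].append(index)
        # Categories first seen mid-event fill in that event's description.
        current["info"].update(info)
    for event in events:
        parts = [event["info"][key] for key in ("place", "date") if key in event["info"]]
        event["description"] = ", ".join(parts) or "untitled event"
    return events
```

Annotations with no place or date, such as "Sunset over the valley", stay with the current event. This matches the idea that only certain categories of information signal a boundary.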
In an electronic device having a processor, a visual display, an apparatus for entering data into the device, and an electronic memory, a method for reliably determining whether confidential handwritten or voice data entered into the device for processing has been recognized as the character the user intended to enter, comprising the steps of: receiving a data entry; translating the entry into digital signals and combining the digital signals to create a character; saving the character in the electronic memory; commencing a timing sequence; displaying the character on the visual display; awaiting the expiration of the timing sequence and then replacing the character with a mask character on the visual display; if the character was incorrect, allowing the user to re-enter it, and if it was correct, accepting another character; and, when all characters have been entered, sending the saved characters to the processor for processing.
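The display-then-mask sequence can be sketched as a small state holder with an injectable clock, which keeps the timing testable. This is a minimal illustration, not the claimed device logic. The one-second `delay`, the `*` mask character, and the names `MaskedEntry`, `enter`, `reject_last`, `display`, and `submit` are hypothetical. Recognition of the handwritten or spoken input is assumed to have already produced each character.

```python
from __future__ import annotations

import time
from typing import Callable

MASK = "*"  # hypothetical mask character


class MaskedEntry:
    """Show each recognized character briefly, then replace it with a mask."""

    def __init__(self, delay: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.delay = delay                  # assumed length of the timing sequence
        self.clock = clock
        self.saved: list[str] = []          # characters kept in memory
        self.shown_at: float | None = None  # when the newest character was displayed

    def enter(self, char: str) -> None:
        # Save the recognized character and start the timing sequence.
        self.saved.append(char)
        self.shown_at = self.clock()

    def reject_last(self) -> None:
        # The user saw a misrecognition: drop it so it can be re-entered.
        if self.saved:
            self.saved.pop()
            self.shown_at = None

    def display(self) -> str:
        # Earlier characters stay masked; the newest is visible until the timer expires.
        if not self.saved:
            return ""
        masked = MASK * (len(self.saved) - 1)
        expired = self.shown_at is None or self.clock() - self.shown_at >= self.delay
        return masked + (MASK if expired else self.saved[-1])

    def submit(self, processor: Callable[[str], object]) -> object:
        # When all characters are in, pass the saved characters to the processor.
        return processor("".join(self.saved))
```

Injecting `clock` instead of calling `time.sleep` lets a caller poll `display()` from a UI loop. It also lets a test advance time by hand, as in the fake-clock sequence where entering "7", rejecting a misread "4", and entering "2" yields "72".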