WO1998009228A1 - Natural-language speech control - Google Patents

Natural-language speech control

Info

Publication number
WO1998009228A1
Authority
WO
WIPO (PCT)
Prior art keywords
semantic
natural
language
command
computer
Prior art date
Application number
PCT/US1997/015388
Other languages
French (fr)
Other versions
WO1998009228A9 (en)
Inventor
Hassan Alam
Original Assignee
Bcl Computers, Inc.
Priority date
Filing date
Publication date
Application filed by Bcl Computers, Inc. filed Critical Bcl Computers, Inc.
Priority to JP51200598A (JP2002515147A)
Priority to CA002263743A (CA2263743A1)
Priority to NZ333716A (NZ333716A)
Priority to EP97940741A (EP0919032A4)
Priority to AU42446/97A (AU723274B2)
Publication of WO1998009228A1
Publication of WO1998009228A9

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation

Definitions

  • the present invention relates generally to the technical field of digital computer speech recognition and, more particularly, to recognizing and executing commands spoken in natural-language.
  • Voice control of computers allows speech with a high level of abstraction and complexity in the command. For instance, in giving directions we might simply say "turn left at the light". Presently, this type of command is possible only when communicating with other humans. Communicating with computers or equipment requires a series of commands at a much lower level of abstraction. For instance, the previous instruction would at a minimum need to be expanded as: Go Straight, Find Light, Turn Left, Go Straight.
  • each of the natural-language commands set forth above needs a set of sub-instructions.
  • real-time voice control of equipment has, thus far, remained an elusive goal.
  • an ability to issue spoken natural-language commands permits communicating with equipment ranging from computers to aircraft at a higher level of abstraction than is presently possible.
  • a natural-language voice interface will allow applications such as voice control of vehicles, and voice control of computer applications.
  • Statistical approaches look at word patterns and word co-occurrence and attempt to parse natural-language sentences based on the likelihood of such patterns.
  • Statistical approaches use a variety of methods including neural networks and word distribution. As with any other statistical pattern matching approach, this approach is ultimately limited by an upper limit on error rate which cannot be easily exceeded. Also, it is very difficult to handle wide varieties of linguistic phenomena such as scrambling, NP-movement, binding between question words and empty categories, etc., through statistical natural-language processing.
  • the GB-based approach also handles NP-movement, which is exemplified by a passive sentence such as 'Football was played.' that has the deeper structure '[ [] [was played [football]] ]'. In parsing this natural-language sentence the noun phrase (NP) 'football' moves from its original position after the verb 'was played' to the front of the sentence, because otherwise the sentence would have no subject. Binding between question words and empty categories is exemplified by a question such as 'Whom will he invite?' The GB approach finds that this sentence has the deep structure [he will [invite [whom]] ].
  • An object of the present invention is to provide a voice-based command system that can translate commands spoken in natural-language into commands accepted by a computer program.
  • An object of the present invention is to provide a voice-based command system that can translate commands spoken in natural-language into commands accepted by different computer programs.
  • Another object of the present invention is to provide a natural-language-syntactic-parser that resolves ambiguities in a voice command.
  • Another object of the present invention is to provide a command interpreter that handles incomplete commands gracefully by interpreting the command as far as possible, and by retaining information from the command for subsequent clarification.
  • the present invention is a natural-language speech control method that produces a command for controlling the operation of a digital computer from words spoken in a natural-language.
  • the method includes the step of processing an audio signal that represents the spoken words to generate textual digital-computer-data.
  • the textual digital-computer-data contains representations of the words in the command spoken in a natural-language.
  • the textual digital-computer-data is then processed by a natural-language-syntactic-parser to produce a parsed sentence.
  • the parsed sentence consists of a string of words with each word being associated with a part of speech in the parsed sentence.
  • the string of words is then preferably processed by a semantic compiler to generate the command that controls the operation of the digital computer.
  • the preferred embodiment of the present invention uses a GB-based natural-language-syntactic-parser which reveals implied syntactic structure in English language sentences.
  • the GB-based natural-language-syntactic-parser can resolve ambiguous syntactic structures better than alternative methods of natural-language processing.
  • Using a generalized principles-and-parameters GB-based natural-language-syntactic-parser for the natural-language speech control method provides a customizable and portable parser that can be tailored to different operating environments with minor modification. With generalized principles-and-parameters, a GB-based approach can describe a large syntax and vocabulary relatively easily, and hence provides greater robustness than other approaches to natural-language processing.
  • FIG. 1 is a flow diagram illustrating the overall approach to processing spoken, natural-language computer commands with a natural-language speech control system in accordance with the present invention
  • FIG. 2 is a flow diagram, similar to that depicted in FIG. 1, that illustrates the presently preferred embodiment of the natural-language speech control system;
  • FIG. 3 depicts a logical form of a parsing of a sentence produced by the presently preferred GB-based principles-and-parameters syntactic parser employed in the natural-language speech control system depicted in FIG. 2;
  • FIG. 4 is a flow diagram illustrating how a sentence is parsed by the presently preferred GB-based principles-and-parameters syntactic parser employed in the natural-language speech control system depicted in FIG. 2;
  • FIG. 5 is a block diagram depicting an alternative embodiment of a semantic compiler that converts parsed computer commands into machine code executable as a command to a digital computer program
  • FIG. 6 is a block diagram depicting a preferred embodiment of a semantic compiler that converts parsed computer commands into machine code executable as a command to a digital computer program.
  • FIG. 1 depicts a natural-language speech control system in accordance with the present invention referred to by the general reference character 20.
  • the natural-language speech control system 20 first processes a spoken command received as an audio signal with a robust automatic speech recognition computer program 22.
  • the speech recognition computer program 22 produces textual digital-computer-data in the form of an ASCII text stream 24 that contains a text of the spoken words as recognized by the speech recognition computer program 22.
  • the text stream 24 is then processed by a syntactic-parser 26 which converts the text stream 24, representing the spoken words, into a parsed sentence having a logical form 28.
  • the logical form 28 associates a part of speech in the parsed sentence with each word in a string of words.
  • the logical form 28 is processed by a semantic compiler 32 to generate a command in the form of a machine code 34 that is then processed by a computer program executed by a computer 36 to control its operation.
  • the speech recognition computer program 22, syntactic-parser 26 and semantic compiler 32 will generally be computer programs that are executed by the computer 36.
  • the text stream 24 and logical form 28 data in general will be stored, either temporarily or permanently, within the computer 36.
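The three-stage flow of FIG. 1 (recognizer 22 → parser 26 → semantic compiler 32) can be sketched as a simple function pipeline. Every function body below is an illustrative stub standing in for the real stage; the stub behavior and the sample sentence are assumptions, not the patent's implementation.

```python
# Sketch of the FIG. 1 pipeline: each stage is a stub for illustration.

def recognize_speech(audio_signal: bytes) -> str:
    """Stage 22: audio signal -> ASCII text stream (stubbed)."""
    return "copy all word files to the backup directory"

def parse_sentence(text: str) -> list[tuple[str, str]]:
    """Stage 26: text stream -> parsed sentence, each word paired with a
    part of speech. A real GB-based parser produces a full logical form;
    this stub tags only the leading verb and lumps the rest as NP."""
    words = text.split()
    return [(words[0], "V")] + [(w, "NP") for w in words[1:]]

def compile_semantics(parsed: list[tuple[str, str]]) -> str:
    """Stage 32: logical form -> 'machine code' (a DOS-style verb here)."""
    verb = next(w for w, pos in parsed if pos == "V")
    return verb.upper()  # placeholder rendering of the command

command = compile_semantics(parse_sentence(recognize_speech(b"")))
```

The point of the shape is that each stage consumes exactly the data structure the previous stage emits, matching the text-stream/logical-form hand-offs in FIG. 1.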
  • FIG. 2 is a flow diagram that depicts a presently preferred implementation of the natural-language speech control system 20.
  • the preferred implementation of the natural-language speech control system 20 includes an error message facility 42.
  • the error message facility 42 permits the natural-language speech control system 20 to inform the speaker of difficulties that the natural-language speech control system 20 encounters in attempting to process a spoken computer command.
  • the error message facility 42 informs the speaker about the processing difficulty either audibly or visibly.
  • the machine code 34 produced by the semantic compiler 32 is an MS DOS command.
  • the computer 36 executes the MS DOS command to produce a result 44 specified by the spoken command.
  • the speech recognition computer program 22 processes the audio signal that represents spoken words to generate a string of words forming the text stream 24.
  • a number of companies have developed computer programs for transcribing voice into text. Several companies offering such computer programs are listed below.
  • BBN, a wholly owned subsidiary of GTE, has a Unix-based speech recognizer called Hark.
  • SRI Corp's STAR Lab has a group developing a wideband, continuous speech recognizer called DECIPHER.
  • AT&T's Advanced Speech Products Group offers a speech recognizer named WATSON.
  • Each word-vector corresponds to one spoken word in the sentence or computer command.
  • Each word-vector includes at least one, but probably several, two-tuples consisting of a word recognized by the speech recognition computer program 22 together with a number which represents a probability estimated by the speech recognition computer program 22 that the audio signal actually contains the corresponding spoken word.
  • Exhaustive processing of a spoken command by the syntactic-parser 26 requires that several strings of words be included in the text stream 24.
  • Each string of words included in the text stream 24 for such exhaustive processing is assembled by concatenating successive words selected from successive word-vectors.
  • the several strings of words in the text stream 24 to be processed by the syntactic-parser 26 are not identical because in every string at least one word differs from that in all other strings of words included in the text stream 24.
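The word-vector scheme described above can be sketched by taking the Cartesian product of the word hypotheses: each candidate string takes one word from each successive word-vector, so every pair of candidates differs in at least one word. The example vectors, words, and probabilities below are invented for illustration.

```python
from itertools import product

# Each word-vector holds (word, probability) two-tuples for one spoken word,
# as estimated by the speech recognition program 22.
word_vectors = [
    [("copy", 0.9), ("coffee", 0.1)],
    [("file", 0.8), ("vile", 0.2)],
]

# Concatenate one choice from each successive word-vector to form a
# candidate string; score it with the product of the word probabilities.
candidates = []
for combo in product(*word_vectors):
    words = [w for w, _ in combo]
    score = 1.0
    for _, p in combo:
        score *= p
    candidates.append((" ".join(words), score))

candidates.sort(key=lambda c: -c[1])  # most likely string first
```

Exhaustive processing by the parser would then try the candidates in this order until one parses.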
  • the syntactic-parser 26 incorporated into the preferred embodiment of the natural-language speech control system 20 is based on a principles-and-parameters (P-and-P) syntactic parser, Principar.
  • Principar has been developed by and is available from Prof. DeKang Lin at the University of Manitoba in Canada.
  • P-and-P parsing is based on Noam Chomsky's GB-based theory of natural-language syntax.
  • Principar's significant advantage over other natural-language-syntactic-parsers is that with relatively few rules, it can perform deep parses of complex sentences.
  • the power of the P-and-P framework can be illustrated by considering how it can easily parse both Japanese and English language sentences.
  • In English, the word order in a sentence is typically subject-verb-object, as in 'He loves reading'. But in Japanese, the order is typically subject-object-verb.
  • A GB-based parser employs a principle which states that 'sentences contain subjects, objects and verbs', and the GB-based parser's parameter for 'word-order' of sentences is subject-verb-object for English and subject-object-verb for Japanese.
  • Together, the GB-based parser's principles and parameters describe a grammar for simple sentences in both English and Japanese. This is the essence of the P-and-P framework.
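The single-principle, per-language-parameter idea can be shown with a toy (this is not Principar, and the Japanese words are only an illustrative gloss): one principle says a sentence has subject, verb and object; a word-order parameter says how a given language arranges them.

```python
# Toy principles-and-parameters sketch: one principle ("sentences contain
# S, V and O"), parameterized by per-language word order.

WORD_ORDER = {
    "english": ("S", "V", "O"),   # subject-verb-object
    "japanese": ("S", "O", "V"),  # subject-object-verb
}

def assign_roles(words: list[str], language: str) -> dict[str, str]:
    """Map a three-word sentence onto S/V/O roles using the language's
    word-order parameter."""
    order = WORD_ORDER[language]
    return {role: word for role, word in zip(order, words)}

english = assign_roles(["he", "loves", "reading"], "english")
japanese = assign_roles(["kare-wa", "dokusho-ga", "daisuki"], "japanese")
```

One grammar rule thus covers both languages; only the parameter table differs.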
  • the syntactic-parser 26 depicted in FIG. 2 uses the following principles.
  • Case Theory: Case theory requires that every overt noun phrase (NP) be assigned an abstract case, such as nominative case for subjects, accusative case for direct objects, dative case for indirect objects, etc.
  • X-bar Theory: X-bar theory describes how the syntactic structure of a sentence is formed by successively smaller units called phrases. This theory determines the word-order in sentences.
  • Movement Theory: The rule Move-α specifies that any sentence element can be moved from its base position in the underlying D-structure to anywhere else in the surface structure. Whether a particular movement is allowed depends on other constraints of the grammar; for example, the result of a movement must satisfy the grammar's other principles.
  • Binding Theory: This theory describes the structural relationship between an empty element left behind by a moved NP and the moved NP itself.
  • commands to computers can be understood as verb phrases that are a sub-set of complete English sentences.
  • the sentences have an implied second person singular pronoun subject and the verb is active voice present tense. For instance, to resume work on a previous project, one might issue to a computer the following natural-language command.
  • A parsing of the preceding sentence by Principar for the actual words appears in FIG. 3.
  • the GB-based parse presented in FIG. 3 allows the computer to map a verb (V) into a computer command action, with the noun phrase (NP) as the object, and the adjective phrase (AP) as properties of the object.
  • Limiting GB-based syntactic parsing to only active-voice, second-person verb-phrase parsing permits implementing an efficient semantic compiler 32 that allows operating a computer with computer-transcribed voice commands. Since only a sub-set of English is used for computer commands, parameters can be set to limit the number of parses generated by the syntactic-parser 26.
  • the case principle may be set to only accusative case for verb-complements, oblique case for prepositional complements and genitive case for possessive nouns or pronouns.
  • a nominative case principle is unnecessary since the computer commands lack an express subject for the main clause.
  • Such tuning of the principles to be applied by the syntactic-parser 26 significantly reduces the number of unnecessary parses produced by the GB-based P-and-P syntactic-parser Principar.
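The case-principle tuning just described can be sketched as a pruning table: only the cases that verb-first commands can exhibit are licensed, so parses assigning any other case are discarded early. The parse representation below is an invented toy; the case/position pairs follow the text.

```python
# Restrict the case principle to the cases a verb-first command allows,
# pruning spurious parses (no nominative case, since commands have no
# express subject).

ALLOWED_CASE = {
    "verb-complement": "accusative",
    "prepositional-complement": "oblique",
    "possessive": "genitive",
}

def case_ok(np: dict) -> bool:
    """A noun phrase passes if its position licenses its assigned case."""
    return ALLOWED_CASE.get(np["position"]) == np["case"]

parses = [
    [{"position": "verb-complement", "case": "accusative"}],
    [{"position": "verb-complement", "case": "nominative"}],  # pruned
]
surviving = [p for p in parses if all(case_ok(np) for np in p)]
```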
  • moving the natural-language speech control system 20 between computer applications or between computer platforms involves simply changing the lexicon and the parameters. Due to the modular framework of the grammar implemented by the syntactic-parser 26, with minor changes in parameter settings more complicated sentences, such as queries and implicit commands, may be parsed.
  • the syntactic-parser 26 includes a set of individual principle-based parsers 52, P1 through Pn, a dynamic principle-ordering system 54, principle parameters specifiers 56, and a lexicon specifying system 58.
  • the heart of the syntactic-parser 26 is the set of individual principle-based parsers 52.
  • Each of the principle-based parsers 52 implements an individual principle such as those listed and described above.
  • Each principle is abstract and is described in a manner different from the others (i.e., the principles are heterogeneous). For instance, the X-bar theory for English states that the verb must precede the object, while the θ-theory states that every verb must discharge its θ-roles.
  • the various principle-based parsers 52 each implemented as a separate computer program module, formalize the preceding principles. Each principle-based parser 52 applies its principle to the input text and the legal parses which it receives from the preceding principle-based parser 52. The principle-based parser 52 then generates a set of legal parses according to the principle which it formalizes. Because the principle-based parsers 52 process an input sentence sequentially, the syntactic-parser 26 employs a set of data structures common to all the principle-based parsers 52 that allows the input text and the legal parses to be passed from one principle-based parser 52 to the next. Moreover, the syntactic-parser 26 includes a principle-ordering system 54 that controls a sequence in which individual principles, such as those summarized above, are applied in parsing a text.
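The sequential filter pipeline described above can be sketched as a chain of functions, each receiving the parses legal so far and passing on only those legal under its own principle. The two predicate implementations and the parse data structure are simplified stand-ins, not the real Case or X-bar principles.

```python
# Each principle-based parser filters the candidate parses produced by
# the previous one; the ordering list plays the role of the
# principle-ordering system 54.

def case_filter(parses):
    # keep parses in which every noun phrase carries a case assignment
    return [p for p in parses if all(n.get("case") for n in p["nps"])]

def xbar_filter(parses):
    # keep parses whose verb precedes its object (the English parameter)
    return [p for p in parses if p["verb_index"] < p["object_index"]]

PRINCIPLE_ORDER = [case_filter, xbar_filter]

def parse(candidates):
    for principle in PRINCIPLE_ORDER:
        candidates = principle(candidates)
    return candidates  # surviving parses are the legal ones

candidates = [
    {"nps": [{"case": "accusative"}], "verb_index": 0, "object_index": 2},
    {"nps": [{"case": None}], "verb_index": 0, "object_index": 2},
    {"nps": [{"case": "accusative"}], "verb_index": 2, "object_index": 0},
]
legal = parse(candidates)
```

Because every filter consumes and produces the same list-of-parses structure, reordering `PRINCIPLE_ORDER` changes efficiency but not the final result, which is the motivation for the dynamic ordering system.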
  • each of the principle-based parsers 52 receives parameter values from the principle parameters specifiers 56.
  • For instance, with the X-bar theory, a verb precedes the object in English, while in Japanese the object precedes the verb. Consequently, the grammar for each principle formalized in the principle-based parsers 52 needs to be dynamically generated, based on parameter values provided by the principle parameters specifiers 56.
  • Principar's lexicon specifying system 58 contains over 90,000 entries extracted from standard dictionaries.
  • the structure of the lexicon specifying system 58 is a word-entry followed by functions representing parts-of-speech categories and other features.
  • Principar's lexicon must be extended by adding recently adopted, platform-specific computer acronyms.
  • Parsing the text stream 24 into the logical form depicted in FIG. 3 permits the semantic compiler 32 to use a conventional LR grammar in generating the machine code 34 from the logical form 28. Parsing the text stream 24 into this canonical form is possible because commands are restricted to imperative sentences: second-person, active-voice sentences that begin with a verb. Limiting the natural-language commands in this way only insignificantly restricts the ability to issue voice commands.
  • the canonical logical form of a command can be parsed into the machine code 34 by the semantic compiler 32 using a conventional lexical analyzer named LEX and a conventional compiler-compiler named YACC.
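LEX and YACC are C tools; as a runnable illustration of the same idea, the sketch below hand-rolls a tiny lexer and verb-first "grammar" in Python, reducing the canonical form to a DOS-style command. The verb table and its DOS renderings are invented examples, not the patent's actual grammar.

```python
# Toy lexer + verb-first grammar reducing canonical form to machine code.
# The verb table is an invented illustration.

VERB_TO_DOS = {"copy": "COPY", "delete": "DEL", "show": "TYPE"}

def compile_command(canonical: str) -> str:
    tokens = canonical.split()          # the "lexer": whitespace tokens
    verb, args = tokens[0], tokens[1:]  # imperative form: verb comes first
    if verb not in VERB_TO_DOS:
        raise ValueError(f"unknown verb: {verb}")
    return " ".join([VERB_TO_DOS[verb], *args])

machine_code = compile_command("copy *.doc John")
```

A real LR grammar would of course recognize nested phrase structure rather than a flat token list; this sketch only shows the lexer/parser division of labor.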
  • the preferred semantic compiler 32 has an ability to detect some semantic errors, and then send a message back to a speaker via the error message facility 42 about the specific nature of the error.
  • An example of a semantic error would be a request for an action that is not possible with the object. For instance, an attempt to copy a directory to a file would result in an object-type mismatch, and therefore cause an error.
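The type-mismatch check described above can be sketched as a table of which source/target object types each action accepts; a miss produces a message for the error message facility 42. The type table contents are an illustrative assumption.

```python
# Semantic-error sketch: an action is legal only for listed
# (source_type, target_type) pairs.

ACTION_TYPES = {"copy": {("file", "file"), ("file", "directory")}}

def check_semantics(action: str, source_type: str, target_type: str):
    """Return None if the action/object types are compatible, else an
    error string to hand to the error message facility."""
    if (source_type, target_type) in ACTION_TYPES.get(action, set()):
        return None
    return f"cannot {action} a {source_type} to a {target_type}"

error = check_semantics("copy", "directory", "file")  # the text's example
ok = check_semantics("copy", "file", "directory")
```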
  • An alternative to the conventional LR grammar described above for generating the machine code 34 would be for the semantic compiler 32 to take parse trees, expressed in the canonical form in the logical form 28, as input and then map them into appropriate computer commands. This would be done by a command-interpreter computer program 62 by reference to mapping tables 64 which map verbs to different actions.
  • the conventional LR grammar, or the combined command-interpreter computer program 62 and mapping tables 64 permit the semantic compiler 32 to prepare operating system commands 72, word processing commands 74, spreadsheet commands 76, and/or database commands 78 from the parse trees in the logical form 28.
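The mapping-table alternative can be sketched as one interpreter consulting a per-domain verb table, so the same parse tree yields an operating-system command or an application command depending on the target. All table contents and the parse-tree shape below are hypothetical.

```python
# Command interpreter (62) + mapping tables (64): one table per
# application domain translates verbs to that domain's actions.

MAPPING_TABLES = {
    "os":          {"copy": "COPY", "remove": "DEL"},
    "spreadsheet": {"copy": "Edit.Copy", "total": "Sum"},
}

def interpret(parse_tree: dict, domain: str) -> str:
    """Map a verb-first parse tree to a command in the given domain."""
    table = MAPPING_TABLES[domain]
    action = table[parse_tree["verb"]]
    return " ".join([action, *parse_tree["objects"]])

tree = {"verb": "copy", "objects": ["*.doc", "John"]}
os_cmd = interpret(tree, "os")
ss_cmd = interpret(tree, "spreadsheet")
```

Porting to a new application then means adding a table, not rewriting the interpreter, which is the portability argument the text makes.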
  • the command-interpreter computer program 62 needs to have different functionality depending on the application domain to which the command is addressed. If a computer command is directed to DOS or a Unix command shell, the operating system can directly execute the machine code 34. But the word processing commands 74, the spreadsheet commands 76, or the database commands 78 must be piped through the operating system to that specific application. To facilitate this kind of command, the natural-language speech control system 20 must run in the background piping the machine code 34 to the current application.
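The routing just described, executing shell-level commands directly while piping application commands onward, might be dispatched as below. The shell branch uses the portable `echo` only so the sketch stays runnable; how a real system would pipe into a word processor is left as a labeled placeholder.

```python
import subprocess

# Dispatch machine code either directly to the operating system shell or
# (as a placeholder) toward the current application.

def dispatch(machine_code: str, domain: str) -> str:
    if domain == "shell":
        # the operating system can execute the command directly
        result = subprocess.run(machine_code, shell=True,
                                capture_output=True, text=True)
        return result.stdout.strip()
    # otherwise the command would be piped to the running application;
    # here we only label it for illustration
    return f"[piped to {domain}] {machine_code}"

shell_out = dispatch("echo hello", "shell")
wp_out = dispatch("Bold Selection", "word-processor")
```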
  • the speech recognition computer program 22 and the syntactic-parser 26 are the same regardless of the computer program that will execute the command.
  • the semantic compiler 32 includes a set of semantic modules 84 used for generating commands that control different computer programs. Among the semantic modules 84 is a set that prepares commands for controlling operating system functions. Other optional semantic modules 84 generate commands for controlling operation of different application computer programs, such as the word processing commands 74, spreadsheet commands 76 and database commands 78 illustrated in FIG. 5.
  • the semantic compiler 32 includes a set of semantic modules 84 for configuration, and for loading each specific application computer program.
  • While the text stream 24 represents a spoken command with an ASCII text stream, any digital computer representation of textual digital-computer-data may be used for expressing such data in the text stream 24.
  • While the semantic compiler 32 employs a canonical logical form to represent parsed computer commands, any other representation of the parsed computer commands that provides the same informational content may be used in the semantic compiler 32 for expressing parsed commands.

Abstract

A natural-language speech control method (20) produces a command (34) for controlling the operation of a digital computer (36) from words spoken in a natural-language. An audio signal that represents the spoken words is processed to generate textual digital-computer-data (24). The textual digital-computer-data (24) is then processed by a natural-language-syntactic-parser (26) to produce a parsed sentence in a logical form of the command (28). The parsed sentence is then processed by a semantic compiler (32) to generate the command (34) that controls the operation of the digital computer (36). The command (34) is expressed as a natural-language sentence that has an implied second-person singular pronoun subject and an active-voice, present-tense verb. The preferred method uses a principles-and-parameters (P-and-P) Government-and-Binding-based (GB-based) natural-language-syntactic-parser (26) for resolving ambiguous syntactic structures.

Description

NATURAL-LANGUAGE SPEECH CONTROL
Technical Field
The present invention relates generally to the technical field of digital computer speech recognition and, more particularly, to recognizing and executing commands spoken in natural-language.
Background Art
Currently, humans communicate with a computer primarily tactilely via keyboard or pointing device with commands that must strictly conform to computer program syntax. However, speech is the most natural method for humans to express commands. To improve speed, usability and user acceptance of computers there exists a well recognized need for a voice-based command system that responds appropriately to only a general description of tasks to be performed by the computer. Some systems have been demonstrated which permit speaking conventional computer commands. For example, an MS DOS command for copying all Word for Windows files in one directory into another directory named "John" might be spoken as follows.
Copy *.doc John
However, to be truly effective a voice-based command system needs not only to translate a spoken command into a sequence of words, but also to interpret natural-language sentences such as that set forth below as a coherent command recognizable to and executable by the computer.
Copy all word files to John's directory.
Because natural-language allows a computer user to prescribe a set of smaller tasks with a single sentence, an ability to handle high-level, abstract commands is key to an effective voice-based command system. The ability to handle high-level, abstract commands makes a voice-based command interface easy to use and potentially faster than keyboard or pointing device based computer control. Moreover, under certain circumstances a voice-based command system is essential for controlling a computer's operation such as for physically handicapped individuals, and for normal individuals while performing tasks which occupy both of their hands.
Voice control of computers allows speech with a high level of abstraction and complexity in the command. For instance, in giving directions we might simply say "turn left at the light". Presently, this type of command is possible only when communicating with other humans. Communicating with computers or equipment requires a series of commands at a much lower level of abstraction. For instance, the previous instruction would at a minimum need to be expanded as follows.
Go Straight
Find Light
Turn Left
Go Straight
Similarly, for a jet aircraft landing on the deck of an aircraft carrier, the command "abort landing" would at a minimum translate to the following set of commands.
Afterburner On
Steady Course
Retract Flaps
Retract Speed Brakes
Retract Landing Gear
Issuing the preceding sequence of commands by voice requires too much time, and would therefore probably result in a crash. To be effective, the pilot needs to be able to control the aircraft with one high-level command, in this case "abort landing", and the computer must execute all the commands needed to accomplish this task.
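The expansion from one high-level spoken command to its ordered sub-instructions can be held in a simple table; the entries below are the text's own two examples, and the table-lookup mechanism is only an illustrative sketch.

```python
# Table mapping a high-level command to its ordered sub-instructions,
# using the document's own examples.

EXPANSIONS = {
    "turn left at the light": [
        "Go Straight", "Find Light", "Turn Left", "Go Straight",
    ],
    "abort landing": [
        "Afterburner On", "Steady Course", "Retract Flaps",
        "Retract Speed Brakes", "Retract Landing Gear",
    ],
}

def expand(command: str) -> list[str]:
    """Return the low-level sub-instructions for a high-level command."""
    return EXPANSIONS[command.lower()]
```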
In actual practice, each of the natural-language commands set forth above needs a set of sub-instructions. Thus, despite the present ability of computer technology to transcribe speech into words, real-time voice control of equipment has, thus far, remained an elusive goal. Conversely, an ability to issue spoken natural-language commands permits communicating with equipment ranging from computers to aircraft at a higher level of abstraction than is presently possible. A natural-language voice interface will allow applications such as voice control of vehicles, and voice control of computer applications.
There are three basic approaches to natural-language syntactic processing: simple grammar, statistical, and Government-and-Binding-based (GB-based). Of these three approaches, simple grammars are used for simple, uncomplicated syntax. Examples of grammars for such a syntax include early work such as the psychiatrist program 'Eliza'. However, writing a full grammar for any significant portion of a natural-language is very complicated. For specialized domains, the grammar-based approach is abandoned for a statistical one as described by Carl G. de Marcken, Parsing the LOB Corpus, Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, June, 1990. Statistical approaches look at word patterns and word co-occurrence and attempt to parse natural-language sentences based on the likelihood of such patterns. Statistical approaches use a variety of methods including neural networks and word distribution. As with any other statistical pattern matching approach, this approach is ultimately limited by an upper limit on error rate which cannot be easily exceeded. Also, it is very difficult to handle wide varieties of linguistic phenomena such as scrambling, NP-movement, binding between question words and empty categories, etc., through statistical natural-language processing.
Approaches to natural-language processing based on Noam Chomsky's Government and Binding theories, as described in Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press, offer a possibility of a more robust approach to natural-language parsing by developing computational methods based on a linguistic theory of a universal language. Head-driven Phrase Structure Grammar (HPSG) is a major off-shoot of GB theory and a number of such parsers are being developed. The GB-based approach can find syntactic structure in scrambled sentences such as 'I play football' and 'football, I play'. The GB-based approach also handles NP-movement, which is exemplified by a passive sentence such as 'Football was played.' that has the deeper structure '[ [] [was played [football]] ]'. In parsing this natural-language sentence the noun phrase (NP) 'football' moves from its original position after the verb 'was played' to the front of the sentence, because otherwise the sentence would have no subject. Binding between question words and empty categories is exemplified by a question such as 'Whom will he invite?' The GB approach finds that this sentence has the deep structure [he will [invite [whom]] ]. The question word 'whom' binds the empty trace that it leaves when it moves to the front of the sentence. The principle-based-parsing technique described by Robert C. Berwick, in Principles of Principle-Based Parsing, Principle-Based Parsing: Computational and Psycholinguistics, Kluwer Academic Publishers, pp. 1-37 (1991), and by Sandiway Fong in The Computational Implementation of Principle-Based Parsers, Principle-Based Parsing: Computational and
Psycholinguistics, Kluwer Academic Publishers, pp. 65-83 (1991), offers a possibility of a more robust approach. Principle-based-parsing uses a few principles for filtering sentences. A sequence of principle based filters eliminates illegal parses and the remaining parse is the legal one. A primary difficulty with this method is that it generates too many parses which makes the GB-based approach computationally slow. Methods for improving performance of GB-based parsing include:
1. appropriately sequencing the principle-based filters to reduce over-generation, as described by Sandiway Fong; or
2. 'co-routining' by interleaving the actual parsing mechanism with the principle filters as described by Bonnie Jean Dorr in Principle-Based Parsing for Machine Translation, Principle-Based Parsing: Computational and Psycholinguistics, Kluwer Academic Publishers, pp. 153-183 (1991).
Disclosure of Invention
An object of the present invention is to provide a voice-based command system that can translate commands spoken in natural-language into commands accepted by a computer program. An object of the present invention is to provide a voice-based command system that can translate commands spoken in natural-language into commands accepted by different computer programs.
Another object of the present invention is to provide a natural-language-syntactic-parser that resolves ambiguities in a voice command.
Another object of the present invention is to provide a command interpreter that handles incomplete commands gracefully by interpreting the command as far as possible, and by retaining information from the command for subsequent clarification.
Another object of the present invention is to provide a voice-based command system that is efficient in any operating environment, and that is portable with minor modifications to other operating environments.

Briefly, the present invention is a natural-language speech control method that produces a command for controlling the operation of a digital computer from words spoken in a natural language. The method includes the step of processing an audio signal that represents the spoken words to generate textual digital-computer-data. The textual digital-computer-data contains representations of the words in the command spoken in a natural language. The textual digital-computer-data is then processed by a natural-language-syntactic-parser to produce a parsed sentence. The parsed sentence consists of a string of words, with each word being associated with a part of speech in the parsed sentence. The string of words is then preferably processed by a semantic compiler to generate the command that controls the operation of the digital computer.

The preferred embodiment of the present invention uses a GB-based natural-language-syntactic-parser that reveals implied syntactic structure in English-language sentences. Hence the GB-based natural-language-syntactic-parser can resolve ambiguous syntactic structures better than alternative methods of natural-language processing. Using a generalized principles-and-parameters GB-based natural-language-syntactic-parser for the natural-language speech control method provides a customizable and portable parser that can be tailored to different operating environments with minor modification. With generalized principles and parameters, a GB-based approach can describe a large syntax and vocabulary relatively easily, and hence provides greater robustness than other approaches to natural-language processing.
These and other features, objects and advantages will be understood or apparent to those of ordinary skill in the art from the following detailed description of the preferred embodiment as illustrated in the various drawing figures.
Brief Description of Drawings
FIG. 1 is a flow diagram illustrating the overall approach to processing spoken, natural-language computer commands with a natural-language speech control system in accordance with the present invention;

FIG. 2 is a flow diagram, similar to that depicted in FIG. 1, that illustrates the presently preferred embodiment of the natural-language speech control system;
FIG. 3 depicts a logical form of a parsing of a sentence produced by the presently preferred GB-based principles-and-parameters syntactic parser employed in the natural-language speech control system depicted in FIG. 2;
FIG. 4 is a flow diagram illustrating how a sentence is parsed by the presently preferred GB-based principles-and-parameters syntactic parser employed in the natural-language speech control system depicted in FIG. 2;
FIG. 5 is a block diagram depicting an alternative embodiment of a semantic compiler that converts parsed computer commands into machine code executable as a command to a digital computer program; and
FIG. 6 is a block diagram depicting a preferred embodiment of a semantic compiler that converts parsed computer commands into machine code executable as a command to a digital computer program.
Best Mode for Carrying Out the Invention
FIG. 1 depicts a natural-language speech control system in accordance with the present invention, referred to by the general reference character 20. As illustrated in FIG. 1, the natural-language speech control system 20 first processes a spoken command, received as an audio signal, with a robust automatic speech recognition computer program 22. The speech recognition computer program 22 produces textual digital-computer-data in the form of an ASCII text stream 24 that contains a text of the spoken words as recognized by the speech recognition computer program 22. The text stream 24 is then processed by a syntactic-parser 26 which converts the text stream 24, representing the spoken words, into a parsed sentence having a logical form 28. The logical form 28 associates a part of speech in the parsed sentence with each word in a string of words. The logical form 28 is processed by a semantic compiler 32 to generate a command in the form of a machine code 34 that is then processed by a computer program executed by a computer 36 to control its operation.
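The stages of FIG. 1 can be sketched as a chain of functions, each consuming the previous stage's output. This is purely an illustrative sketch: the function names, the trivial imperative analysis, and the verb-to-action table are hypothetical stand-ins, not the disclosed implementation.

```python
# Illustrative sketch of the FIG. 1 pipeline. All names are hypothetical.

def recognize_speech(audio_signal):
    """Stand-in for the speech recognition program 22: audio -> ASCII text."""
    # A real recognizer would decode the audio; here we assume it is done.
    return audio_signal["transcript"]

def parse_syntax(text):
    """Stand-in for the syntactic parser 26: text -> logical form 28."""
    words = text.rstrip(".").split()
    # A trivial imperative analysis: first word is the verb, the rest the object NP.
    return {"V": words[0], "NP": words[1:]}

def compile_semantics(logical_form):
    """Stand-in for the semantic compiler 32: logical form -> machine code 34."""
    verb_map = {"edit": "EDIT", "delete": "DEL"}  # hypothetical mapping table
    return verb_map[logical_form["V"]] + " " + " ".join(logical_form["NP"])

def speech_to_command(audio_signal):
    return compile_semantics(parse_syntax(recognize_speech(audio_signal)))

command = speech_to_command({"transcript": "edit report.txt."})
print(command)  # EDIT report.txt
```

The point of the sketch is the staged data flow (audio signal, text stream, logical form, machine code), not any individual stage's logic.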
As is readily apparent to those skilled in the art, the speech recognition computer program 22, syntactic-parser 26 and semantic compiler 32 will generally be computer programs that are executed by the computer 36. Similarly, the text stream 24 and logical form 28 data in general will be stored, either temporarily or permanently, within the computer 36.
FIG. 2 is a flow diagram that depicts a presently preferred implementation of the natural-language speech control system 20. As depicted in FIG. 2, the preferred implementation of the natural-language speech control system 20 includes an error message facility 42. The error message facility 42 permits the natural-language speech control system 20 to inform the speaker of difficulties that the natural-language speech control system 20 encounters in attempting to process a spoken computer command. The error message facility 42 informs the speaker about the processing difficulty either audibly or visibly. In the specific implementation of the natural-language speech control system 20 depicted in FIG. 2, the machine code 34 produced by the semantic compiler 32 is an MS DOS command. The computer 36 executes the MS DOS command to produce a result 44 specified by the spoken command.
Speech Recognition Computer Program 22

The speech recognition computer program 22 processes the audio signal that represents spoken words to generate a string of words forming the text stream 24. A number of companies have developed computer programs for transcribing voice into text. Several companies offering such computer programs are listed below.
1. BBN, a wholly owned subsidiary of GTE, has a Unix-based speech recognizer called Hark;
2. Dragon Systems markets Dragon Dictate;
3. IBM markets VoiceType Dictation;
4. Kurzweil Applied Intelligence;
5. Microsoft Research's Speech Technology Group is developing a speech recognition engine named Whisper;
6. PureSpeech;
7. SRI Corp.'s STAR Lab has a group developing a wideband, continuous speech recognizer called DECIPHER; and
8. AT&T's Advanced Speech Products Group offers a speech recognizer named WATSON.
Most of the systems identified above work with discrete speech, in which a speaker must pause between words. Also, these systems require some level of speaker training to attain high-accuracy speech recognition. Ideally, a continuous speech recognizer that employs a Hidden Markov Model is preferred. Of the systems listed above, Dragon Systems' speech recognizer seems to be the most robust, has been used by the United States Armed Forces in Bosnia, and is presently preferred for the natural-language speech control system 20. The Dragon Systems speech recognizer runs on an IBM PC-compatible computer operating under the Microsoft Windows graphical user interface. Initial tests have demonstrated a very high degree of accuracy with a large number of speakers with unconstrained language and a variety of accents.

In general, for a single sentence or command the speech recognition computer program 22 can generate a plurality of word-vectors. Each word-vector corresponds to one spoken word in the sentence or computer command. Each word-vector includes at least one, but probably several, two-tuples, each consisting of a word recognized by the speech recognition computer program 22 together with a number which represents a probability, estimated by the speech recognition computer program 22, that the audio signal actually contains the corresponding spoken word. Exhaustive processing of a spoken command by the syntactic-parser 26 requires that several strings of words be included in the text stream 24. Each string of words included in the text stream 24 for such exhaustive processing is assembled by concatenating successive words selected from successive word-vectors. The several strings of words in the text stream 24 to be processed by the syntactic-parser 26 are not identical, because in every string at least one word differs from that in all other strings of words included in the text stream 24.

Syntactic-Parser Computer Program 26
The syntactic-parser 26 incorporated into the preferred embodiment of the natural-language speech control system 20 is based on a principles-and-parameters (P-and-P) syntactic parser, Principar. Principar has been developed by, and is available from, Prof. DeKang Lin at the University of Manitoba in Canada. P-and-P parsing is based on Noam Chomsky's GB-based theory of natural-language syntax. Principar's significant advantage over other natural-language-syntactic-parsers is that with relatively few rules it can perform deep parses of complex sentences.
The power of the P-and-P framework can be illustrated by considering how it can easily parse both Japanese and English language sentences. In English, the word order in a sentence is typically subject-verb-object, as in 'He loves reading'. But in Japanese, the order is typically subject-object-verb. Now if a GB-based parser employs a principle which states that 'sentences contain subjects, objects and verbs', and the GB-based parser's parameter for the 'word-order' of sentences is subject-verb-object for English and subject-object-verb for Japanese, then the GB-based parser's principles and parameters describe a grammar for simple sentences in both English and Japanese. This is the essence of the P-and-P framework.
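The P-and-P idea described above — one shared principle, a per-language word-order parameter — can be sketched in a few lines. The representation here is a deliberately minimal toy (role tags rather than real phrase structure) and is not how Principar encodes its grammar.

```python
# Toy principles-and-parameters sketch: the principle "a sentence contains a
# subject, an object and a verb" is shared, while the word-order parameter
# differs per language. Representation is hypothetical.

WORD_ORDER = {"english": ("S", "V", "O"), "japanese": ("S", "O", "V")}

def conforms(tagged_words, language):
    """tagged_words: sequence of role tags, e.g. ["S", "V", "O"]."""
    return tuple(tagged_words) == WORD_ORDER[language]

# 'He loves reading' -> subject-verb-object
print(conforms(["S", "V", "O"], "english"))   # True
print(conforms(["S", "V", "O"], "japanese"))  # False
```

Adding a language then means adding a parameter value, not writing a new grammar, which is the portability claim the P-and-P framework rests on.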
To describe the complex interactions of different sentence elements, the syntactic-parser 26 depicted in FIG. 2 uses the following principles.
1. Case Theory: Case theory requires that every overt noun phrase (NP) be assigned an abstract case, such as nominative case for subjects, accusative case for direct objects, dative case for indirect objects, etc.
2. X-bar Theory: X-bar theory describes how the syntactic structure of a sentence is formed by successively smaller units called phrases. This theory determines the word order in sentences.
3. Movement Theory: The rule Move-α specifies that any sentence element can be moved from its base position in the underlying D-structure to anywhere else in the surface structure. Whether a particular movement is allowed depends on other constraints of the grammar. For example, the result of a movement must satisfy the X-bar schema.
4. Bounding Theory: This theory prevents the results of movement from extending too far in the sentence.
5. Binding Theory: This theory describes the structural relationship between an empty element left behind by a moved NP and the moved NP itself.
6. Θ-Theory: This theory deals with the assignment of semantic roles to the NPs in a sentence.
The preceding principles, and some other more complex ones that are described by Robert C. Berwick in Principles of Principle-Based Parsing, Principle-Based Parsing: Computational and Psycholinguistics, Kluwer Academic Publishers, pp. 1-37 (1991), are used for parsing English with Principar.
With a GB-based approach to natural-language parsing, commands to computers can be understood as verb phrases that are a sub-set of complete English sentences. The sentences have an implied second person singular pronoun subject and the verb is active voice present tense. For instance, to resume work on a previous project, one might issue to a computer the following natural-language command.
'Edit the first document on nlp-based command interpreters.'
Possible word vectors that the speech recognition computer program 22 might produce for the preceding sentence are set forth below.
edit: (edit, 0.90), (a-dot, 0.50)
the: (the, 0.70), (da, 0.60), (their, 0.40), (there, 0.40), (them, 0.20)
first: (first, 0.80), (force, 0.40), (fast, 0.30), (force, 0.30), (hearse, 0.15), (curse, 0.15), (purse, 0.05)
document: (document, 0.80), (dock-meant, 0.40)
on: (on, 0.75), (hun, 0.40), (an, 0.35)
nlp: (nlp, 0.10)
based: (based, 0.75), (baste, 0.50), (paste, 0.35)
command: (command, 0.90), (come-and, 0.55)
interpreters: (interpreters, 0.85), (inter-porter, 0.40)
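Candidate strings for the syntactic-parser can be assembled by choosing one recognized word from each word-vector, as described earlier. The sketch below does this exhaustively with `itertools.product`; ranking candidates by the product of their per-word probabilities is an assumption of this sketch, not something the method specifies.

```python
from itertools import product

# Each word-vector is a list of (word, probability) two-tuples, in the style
# of the table above (truncated to three vectors for brevity).
word_vectors = [
    [("edit", 0.90), ("a-dot", 0.50)],
    [("the", 0.70), ("da", 0.60)],
    [("first", 0.80), ("force", 0.40)],
]

def candidate_strings(vectors, top_n=3):
    """Return the top_n candidate strings, scored by probability product."""
    candidates = []
    for choice in product(*vectors):          # one word chosen per vector
        words = [word for word, _ in choice]
        score = 1.0
        for _, prob in choice:
            score *= prob
        candidates.append((" ".join(words), score))
    candidates.sort(key=lambda c: -c[1])      # most probable string first
    return candidates[:top_n]

for text, score in candidate_strings(word_vectors):
    print(f"{score:.3f}  {text}")
```

Every pair of emitted strings differs in at least one word, matching the requirement that the several strings in the text stream 24 not be identical.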
A parsing of the preceding sentence by Principar for the actual words appears in FIG. 3. The GB-based parse presented in FIG. 3 allows the computer to map a verb (V) into a computer command action, with the noun phrase (NP) as the object, and the adjective phrase (AP) as properties of the object. Limiting GB-based syntactic parsing to only active-voice, second-person verb-phrase parsing permits implementing an efficient semantic compiler 32 that allows operating a computer with computer-transcribed voice commands. Since only a sub-set of English is used for computer commands, parameters can be set to limit the number of parses generated by the syntactic-parser 26. For example, the case principle may be set to only accusative case for verb complements, oblique case for prepositional complements, and genitive case for possessive nouns or pronouns. A nominative case principle is unnecessary since the computer commands lack an express subject for the main clause. Such tuning of the principles to be applied by the syntactic-parser 26 significantly reduces the number of unnecessary parses produced by the GB-based P-and-P syntactic-parser Principar. By using a GB-based P-and-P syntactic-parser, moving the natural-language speech control system 20 between computer applications or between computer platforms involves simply changing the lexicon and the parameters. Due to the modular framework of the grammar implemented by the syntactic-parser 26, with minor changes in parameter settings more complicated sentences, such as the following queries and implicit commands, may be parsed.
'Which files have been modified after July 4th?'
'How many words are there in this document?'
'I would like to delete all files in this directory.'
As illustrated in FIG. 4, the syntactic-parser 26 includes a set of individual principle-based parsers 52, P1 through Pn, a dynamic principle-ordering system 54, principle parameters specifiers 56, and a lexicon specifying system 58. The heart of the syntactic-parser 26 is the set of individual principle-based parsers 52. Each of the principle-based parsers 52 implements an individual principle such as those listed and described above. Each principle is abstract and is described in a manner different from the others (i.e., the principles are heterogeneous). For instance, the X-bar theory for English states that the verb must precede the object, while the θ-theory states that every verb must discharge its θ-roles. The various principle-based parsers 52, each implemented as a separate computer program module, formalize the preceding principles. Each principle-based parser 52 applies its principle to the input text and the legal parses which it receives from the preceding principle-based parser 52. The principle-based parser 52 then generates a set of legal parses according to the principle which it formalizes. Because the principle-based parsers 52 process an input sentence sequentially, the syntactic-parser 26 employs a set of data structures common to all the principle-based parsers 52 that allows the input text and the legal parses to be passed from one principle-based parser 52 to the next. Moreover, the syntactic-parser 26 includes a principle-ordering system 54 that controls the sequence in which individual principles, such as those summarized above, are applied in parsing a text.
To parse more than one language, each of the principle-based parsers 52 receives parameter values from the principle parameters specifiers 56. For instance, with the X-bar theory, a verb precedes the object in English, while in Japanese the object precedes the verb. Consequently, the grammar for each principle formalized in the principle-based parsers 52 needs to be dynamically generated, based on parameter values provided by the principle parameters specifiers 56.
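The sequential filtering described above can be sketched as a pipeline of predicates: each principle-based parser receives the parses surviving the previous one and removes illegal parses, with the ordering supplied externally. The filters below are toy predicates standing in for Case theory, X-bar theory, and the rest; real principle parsers operate on phrase-structure representations, not flags.

```python
# Sketch of sequential principle-based filtering. Each "parser" is reduced
# to a filter over candidate parses; the flags are hypothetical.

def case_filter(parses):
    """Stand-in for the Case-theory principle parser."""
    return [p for p in parses if p.get("case_ok")]

def xbar_filter(parses):
    """Stand-in for the X-bar-theory principle parser."""
    return [p for p in parses if p.get("xbar_ok")]

def run_principles(parses, principle_order):
    """Apply each principle in the order chosen by the ordering system."""
    for principle in principle_order:
        parses = principle(parses)
    return parses

candidates = [
    {"id": 1, "case_ok": True,  "xbar_ok": True},
    {"id": 2, "case_ok": True,  "xbar_ok": False},
    {"id": 3, "case_ok": False, "xbar_ok": True},
]
legal = run_principles(candidates, [case_filter, xbar_filter])
print([p["id"] for p in legal])  # [1]
```

Because every filter only shrinks the candidate set, placing the most restrictive principles first (the ordering question studied by Fong) reduces the work the later filters must do.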
Principar's lexicon specifying system 58 contains over 90,000 entries extracted from standard dictionaries. The structure of the lexicon specifying system 58 is a word entry followed by functions representing parts-of-speech categories and other features. To properly parse computer commands, Principar's lexicon must be extended by adding recently adopted, platform-specific computer acronyms.
Semantic Compiler Computer Program 32
Parsing the text stream 24 into the logical form depicted in FIG. 3 permits the semantic compiler 32 to use a conventional LR grammar in generating the machine code 34 from the logical form 28. Parsing the text stream 24 into the canonical form is possible because commands are restricted to imperative sentences: second-person, active-voice sentences that begin with a verb. Limiting the natural-language commands in this way only insignificantly restricts the ability to issue voice commands. The canonical logical form of a command can be parsed into the machine code 34 by the semantic compiler 32 using a conventional lexical analyzer named LEX and a conventional compiler-compiler named YACC.
As indicated in FIG. 2, the preferred semantic compiler 32 has an ability to detect some semantic errors, and then send a message back to a speaker via the error message facility 42 about the specific nature of the error. An example of a semantic error would be if an action was requested that was not possible with the object. For instance, an attempt to copy a directory to a file would result in an object-type mismatch, and therefore cause an error. An alternative to the conventional LR grammar described above for generating the machine code 34 would be for the semantic compiler 32 to take parse trees expressed in the canonical form in the logical form 28 as input, and then map them into appropriate computer commands. This would be done by a command-interpreter computer program 62 by reference to mapping tables 64 which map verbs to different actions.
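The mapping-table alternative can be sketched as a lookup from verbs to actions with a semantic check before emitting a command. The table contents, the type check, and the exception-based error reporting below are all illustrative assumptions; the disclosed system routes such errors through the error message facility 42, not Python exceptions.

```python
# Hypothetical command-interpreter sketch: a mapping table from verbs to
# actions, plus a semantic check that the action is possible for the object
# type (e.g. copying a directory onto a file is rejected).

MAPPING_TABLE = {"edit": "EDIT", "copy": "COPY", "delete": "DEL"}

def interpret(verb, obj, obj_type, dest_type=None):
    if verb not in MAPPING_TABLE:
        raise ValueError(f"no action mapped for verb '{verb}'")
    if verb == "copy" and obj_type == "directory" and dest_type == "file":
        # Object-type mismatch: the kind of semantic error the compiler
        # would report back through the error message facility.
        raise TypeError("cannot copy a directory to a file")
    return f"{MAPPING_TABLE[verb]} {obj}"

print(interpret("edit", "report.txt", "file"))  # EDIT report.txt
try:
    interpret("copy", "docs", "directory", dest_type="file")
except TypeError as err:
    print(f"error: {err}")
```

Swapping in a different mapping table per target application is what lets the same interpreter front operating-system, word-processing, spreadsheet, and database commands.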
Different computer programs perform the same abstract natural-language commands for similar operations. However, each computer program requires different types of commands that need to be handled uniquely. The conventional LR grammar, or the combined command-interpreter computer program 62 and mapping tables 64, permit the semantic compiler 32 to prepare operating system commands 72, word processing commands 74, spreadsheet commands 76, and/or database commands 78 from the parse trees in the logical form 28. Note that the command-interpreter computer program 62 needs to have different functionality depending on the application domain to which the command is addressed. If a computer command is directed to DOS or a Unix command shell, the operating system can directly execute the machine code 34. But the word processing commands 74, the spreadsheet commands 76, or the database commands 78 must be piped through the operating system to that specific application. To facilitate this kind of command, the natural-language speech control system 20 must run in the background piping the machine code 34 to the current application.
Industrial Applicability

In adapting the natural-language speech control system 20 for preparing commands for execution by a variety of computer programs, the speech recognition computer program 22 and the syntactic-parser 26 are the same regardless of the computer program that will execute the command. However, as depicted in FIG. 6, the semantic compiler 32 includes a set of semantic modules 84 used for generating commands that control different computer programs. Among these semantic modules 84 is a set that prepares commands for controlling operating system functions. Other optional semantic modules 84 generate commands for controlling operation of different application computer programs, such as the word processing commands 74, spreadsheet commands 76 and database commands 78 illustrated in FIG. 5. In addition, the semantic compiler 32 includes a set of semantic modules 84 for configuration, and for loading each specific application computer program.
Although the present invention has been described in terms of the presently preferred embodiment, it is to be understood that such disclosure is purely illustrative and is not to be interpreted as limiting. While preferably the text stream 24 represents a spoken command with an ASCII text stream, as is readily apparent to those skilled in the art any digital computer representation of textual digital-computer-data may be used for expressing such data in the text stream 24. Similarly, while preferably the semantic compiler 32 employs a canonical logical form to represent computer commands parsed by the semantic compiler 32, any other representation of the parsed computer commands that provides the same informational content may be used in the semantic compiler 32 for expressing parsed commands. Consequently, without departing from the spirit and scope of the invention, various alterations, modifications, and/or alternative applications of the invention will, no doubt, be suggested to those skilled in the art after having read the preceding disclosure. Accordingly, it is intended that the following claims be interpreted as encompassing all alterations, modifications, or alternative applications as fall within the true spirit and scope of the invention.

Claims

What is claimed is:
1. A universal voice-command-interpretation method for producing from spoken words a command that is adapted for controlling operation of a digital computer, the method comprising the steps of: receiving an audio signal that represents the spoken words; processing the received audio signal to generate therefrom textual digital-computer-data that contains representations of individual spoken words; processing the textual digital-computer-data with a natural-language-syntactic-parser to produce a parsed sentence that consists of a string of words with each word being associated with a part of speech in the parsed sentence; and generating the command from the parsed sentence.
2. The method of claim 1 wherein the parsed sentence has a syntax of an implied second person singular pronoun subject and an active voice present tense verb.
3. The method of claim 1 wherein processing of the audio signal to generate therefrom the textual digital-computer-data produces a plurality of word-vectors.
4. The method of claim 3 wherein each word-vector includes at least one two-tuple consisting of a word together with a number which represents a probability that the audio signal actually contains that spoken word.
5. The method of claim 3 wherein the textual digital-computer-data processed by the natural-language-syntactic-parser consists of a string of words, each successive word being selected from successive word-vectors in the plurality of word-vectors.
6. The method of claim 5 wherein in producing the command the natural-language-syntactic-parser processes at least two non-identical strings of words in which at least one word is different.
7. The method of claim 1 wherein the natural-language-syntactic-parser is a government-and-binding-based (GB-based) natural-language-syntactic-parser.
8. The method of claim 7 wherein the GB-based natural-language-syntactic-parser is a principles-and-parameters (P-and-P) syntactic parser.
9. The method of claim 1 wherein the command is generated from the parsed sentence by a semantic compiler.
10. The method of claim 9 wherein the semantic compiler uses an LR grammar in generating the command.
11. The method of claim 10 wherein the semantic compiler upon detecting a semantic error dispatches a message that describes the semantic error.
12. The method of claim 11 wherein the message describing the semantic error that is dispatched by the semantic compiler is presented audibly to a speaker.
13. The method of claim 11 wherein the message describing the semantic error that is dispatched by the semantic compiler is presented visibly to a speaker.
14. The method of claim 10 wherein the semantic compiler includes a plurality of semantic modules that respectively generate commands for controlling operation of different computer programs.
15. The method of claim 14 wherein the semantic compiler includes at least one semantic module that generates operating system commands.
16. The method of claim 14 wherein the semantic compiler includes at least one semantic module that generates application program commands.
17. The method of claim 14 wherein the semantic compiler includes at least one semantic module that generates configuration commands.
18. The method of claim 14 wherein the semantic compiler includes at least one semantic module that generates program loading commands.
19. The method of claim 1 further comprising the step of transmitting the command to the digital computer.
PCT/US1997/015388 1996-08-29 1997-08-28 Natural-language speech control WO1998009228A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP51200598A JP2002515147A (en) 1996-08-29 1997-08-28 Natural language voice control
CA002263743A CA2263743A1 (en) 1996-08-29 1997-08-28 Natural-language speech control
NZ333716A NZ333716A (en) 1996-08-29 1997-08-28 Producing from spoken words a command for controlling operation of a digital computer
EP97940741A EP0919032A4 (en) 1996-08-29 1997-08-28 Natural-language speech control
AU42446/97A AU723274B2 (en) 1996-08-29 1997-08-28 Natural-language speech control

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US2514596P 1996-08-29 1996-08-29
US60/025,145 1997-08-27
US08/919,138 1997-08-27

Publications (2)

Publication Number Publication Date
WO1998009228A1 true WO1998009228A1 (en) 1998-03-05
WO1998009228A9 WO1998009228A9 (en) 1998-07-09

Family

ID=21824297

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/015388 WO1998009228A1 (en) 1996-08-29 1997-08-28 Natural-language speech control

Country Status (2)

Country Link
AU (1) AU723274B2 (en)
WO (1) WO1998009228A1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2358937A (en) * 1999-09-03 2001-08-08 Ibm Hierarchical translation of a natural language statement to a formal command in a computer system
GB2379786A (en) * 2001-09-18 2003-03-19 20 20 Speech Ltd Speech processing apparatus
JP2003241790A (en) * 2002-02-13 2003-08-29 Internatl Business Mach Corp <Ibm> Speech command processing system, computer device, speech command processing method, and program
FR2840441A1 (en) * 2002-06-04 2003-12-05 Marc Mertz Portable aid for reduced mobility persons has battery powered industrial standard central processor with bus peripheral connection and voice recognition command
EP1518221A1 (en) * 2002-06-28 2005-03-30 T-Mobile Deutschland GmbH Method for natural voice recognition based on a generative transformation/phrase structure grammar
EP1647971A2 (en) * 2004-10-12 2006-04-19 AT&T Corp. Apparatus and method for spoken language understanding by using semantic role labeling
US7085709B2 (en) 2001-10-30 2006-08-01 Comverse, Inc. Method and system for pronoun disambiguation
US7363269B2 (en) 2001-01-03 2008-04-22 Ebs Group Limited Conversational dealing system
WO2012152985A1 (en) * 2011-05-11 2012-11-15 Nokia Corporation Method and apparatus for summarizing communications
WO2013137503A1 (en) * 2012-03-16 2013-09-19 엘지전자 주식회사 Unlock method using natural language processing and terminal for performing same
US8768687B1 (en) 2013-04-29 2014-07-01 Google Inc. Machine translation of indirect speech
WO2015009586A3 (en) * 2013-07-15 2015-06-18 Microsoft Corporation Performing an operation relative to tabular data based upon voice input
EP3550449A1 (en) * 2018-04-02 2019-10-09 Pegatron Corporation Search method and electronic device using the method
WO2021022318A1 (en) * 2019-08-08 2021-02-11 Interbid Pty Ltd Electronic auction system and process
US10984195B2 (en) 2017-06-23 2021-04-20 General Electric Company Methods and systems for using implied properties to make a controlled-english modelling language more natural

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5060155A (en) * 1989-02-01 1991-10-22 Bso/Buro Voor Systeemontwikkeling B.V. Method and system for the representation of multiple analyses in dependency grammar and parser for generating such representation
US5146406A (en) * 1989-08-16 1992-09-08 International Business Machines Corporation Computer method for identifying predicate-argument structures in natural language text
US5418717A (en) * 1990-08-27 1995-05-23 Su; Keh-Yih Multiple score language processing system
US5457768A (en) * 1991-08-13 1995-10-10 Kabushiki Kaisha Toshiba Speech recognition apparatus using syntactic and semantic analysis
US5555169A (en) * 1992-05-20 1996-09-10 Hitachi, Ltd. Computer system and method for converting a conversational statement to computer command language

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP0919032A4 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2358937B (en) * 1999-09-03 2004-03-17 Ibm Method and system for natural language understanding
GB2358937A (en) * 1999-09-03 2001-08-08 Ibm Hierarchical translation of a natural language statement to a formal command in a computer system
US7363269B2 (en) 2001-01-03 2008-04-22 Ebs Group Limited Conversational dealing system
GB2379786A (en) * 2001-09-18 2003-03-19 20 20 Speech Ltd Speech processing apparatus
US7085709B2 (en) 2001-10-30 2006-08-01 Comverse, Inc. Method and system for pronoun disambiguation
JP2003241790A (en) * 2002-02-13 2003-08-29 Internatl Business Mach Corp <Ibm> Speech command processing system, computer device, speech command processing method, and program
US7299187B2 (en) 2002-02-13 2007-11-20 International Business Machines Corporation Voice command processing system and computer therefor, and voice command processing method
FR2840441A1 (en) * 2002-06-04 2003-12-05 Marc Mertz Portable aid for reduced mobility persons has battery powered industrial standard central processor with bus peripheral connection and voice recognition command
EP1518221A1 (en) * 2002-06-28 2005-03-30 T-Mobile Deutschland GmbH Method for natural voice recognition based on a generative transformation/phrase structure grammar
EP1647971A3 (en) * 2004-10-12 2008-06-18 AT&T Corp. Apparatus and method for spoken language understanding by using semantic role labeling
EP1647971A2 (en) * 2004-10-12 2006-04-19 AT&T Corp. Apparatus and method for spoken language understanding by using semantic role labeling
US7742911B2 (en) 2004-10-12 2010-06-22 At&T Intellectual Property Ii, L.P. Apparatus and method for spoken language understanding by using semantic role labeling
WO2012152985A1 (en) * 2011-05-11 2012-11-15 Nokia Corporation Method and apparatus for summarizing communications
US9223859B2 (en) 2011-05-11 2015-12-29 Here Global B.V. Method and apparatus for summarizing communications
WO2013137503A1 (en) * 2012-03-16 2013-09-19 엘지전자 주식회사 Unlock method using natural language processing and terminal for performing same
US8768687B1 (en) 2013-04-29 2014-07-01 Google Inc. Machine translation of indirect speech
WO2015009586A3 (en) * 2013-07-15 2015-06-18 Microsoft Corporation Performing an operation relative to tabular data based upon voice input
CN105408890A (en) * 2013-07-15 2016-03-16 微软技术许可有限责任公司 Performing an operation relative to tabular data based upon voice input
US10956433B2 (en) 2013-07-15 2021-03-23 Microsoft Technology Licensing, Llc Performing an operation relative to tabular data based upon voice input
US10984195B2 (en) 2017-06-23 2021-04-20 General Electric Company Methods and systems for using implied properties to make a controlled-english modelling language more natural
EP3550449A1 (en) * 2018-04-02 2019-10-09 Pegatron Corporation Search method and electronic device using the method
WO2021022318A1 (en) * 2019-08-08 2021-02-11 Interbid Pty Ltd Electronic auction system and process

Also Published As

Publication number Publication date
AU4244697A (en) 1998-03-19
AU723274B2 (en) 2000-08-24

Similar Documents

Publication Publication Date Title
US6556973B1 (en) Conversion between data representation formats
Bates Models of natural language understanding
Glass et al. Multilingual spoken-language understanding in the MIT Voyager system
Derouault et al. Natural language modeling for phoneme-to-text transcription
Allen Natural language processing
US6442524B1 (en) Analyzing inflectional morphology in a spoken language translation system
US6282507B1 (en) Method and apparatus for interactive source language expression recognition and alternative hypothesis presentation and selection
US6243669B1 (en) Method and apparatus for providing syntactic analysis and data structure for translation knowledge in example-based language translation
AU723274B2 (en) Natural-language speech control
US6356865B1 (en) Method and apparatus for performing spoken language translation
WO2000045290A1 (en) A method and apparatus for adaptive speech recognition hypothesis construction and selection in a spoken language translation system
WO1998009228A9 (en) Natural-language speech control
WO2000045374A1 (en) A method and portable apparatus for performing spoken language translation
Hasegawa-Johnson et al. Grapheme-to-phoneme transduction for cross-language ASR
Ostrogonac et al. Morphology-based vs unsupervised word clustering for training language models for Serbian
EP0919032A1 (en) Natural-language speech control
Bose Natural Language Processing: Current state and future directions
KR19980038185A (en) Natural Language Interface Agent and Its Meaning Analysis Method
Hockey et al. Comparison of grammar-based and statistical language models trained on the same data
Ball et al. Spoken language processing in the Persona conversational assistant
Rayner et al. Using corpora to develop limited-domain speech translation systems
Ferreiros et al. Increasing robustness, reliability and ergonomics in speech interfaces for aerial control systems
Hoge et al. Syllable-based acoustic-phonetic decoding and word hypotheses generation in fluently spoken speech
Ferri et al. A complete linguistic analysis for an Italian text-to-speech system
Hema Natural Language Processing

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA CN JP NZ RU

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
COP Corrected version of pamphlet

Free format text: PAGES 1/2-2/2, DRAWINGS, REPLACED BY NEW PAGES 1/3-3/3; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

WWE Wipo information: entry into national phase

Ref document number: 333716

Country of ref document: NZ

ENP Entry into the national phase

Ref country code: JP

Ref document number: 1998 512005

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1997940741

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2263743

Country of ref document: CA

Ref country code: CA

Ref document number: 2263743

Kind code of ref document: A

Format of ref document f/p: F

WWP Wipo information: published in national office

Ref document number: 1997940741

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1997940741

Country of ref document: EP