US20090287680A1 - Multi-modal query refinement - Google Patents

Multi-modal query refinement

Info

Publication number
US20090287680A1
Authority
US
United States
Prior art keywords
query
search
results
list
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/200,584
Inventor
Timothy Seung Yoon Paek
Bo Thiesson
Yun-Cheng Ju
Bongshin Lee
Christopher A. Meek
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US12/200,584
Assigned to MICROSOFT CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JU, YUN-CHENG; LEE, BONGSHIN; MEEK, CHRISTOPHER A.; PAEK, TIMOTHY SEUNG YOON; THIESSON, BO
Publication of US20090287680A1
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • the Internet continues to make available ever-increasing amounts of information which can be stored in databases and accessed therefrom.
  • users are becoming more mobile, and hence, more reliant upon information accessible via the Internet. Accordingly, users often search network sources such as the Internet from their mobile device.
  • search query is constructed that can be submitted to a search engine.
  • search engine matches this search query to actual search results.
  • search queries were constructed merely of keywords that were matched to a list of results based upon factors such as relevance, popularity, preference, etc.
  • the Internet and the World Wide Web continue to evolve rapidly with respect to both volume of information and number of users. As a whole, the Web provides a global space for accumulation, exchange and dissemination of information. As mobile devices become more and more commonplace to access the Web, the number of users continues to increase.
  • a user knows the name of a site, server or URL (uniform resource locator) to the site or server that is desired for access.
  • the user can access the site by simply typing the URL in an address bar of a browser to connect to the site.
  • the user does not know the URL and therefore has to ‘search’ the Web for relevant sources and/or URLs.
  • search engines are regularly employed.
  • a search engine can be employed to facilitate locating and accessing sites based upon alphanumeric keywords and/or Boolean operators.
  • these keywords are text- or speech-based queries, although speech is not always reliable.
  • a search engine is a tool that facilitates web navigation based upon textual (or speech-to-text) entry of a search query usually comprising one or more keywords.
  • the search engine retrieves a list of websites, typically ranked based upon relevance to the query. To enable this functionality, the search engine must generate and maintain a supporting infrastructure.
  • Upon textual entry of one or more keywords as a search query, the search engine retrieves indexed information that matches the query from an indexed database, generates a snippet of text associated with each of the matching sites and displays the results to the user. The user can thereafter scroll through a plurality of returned sites to attempt to determine if the sites are related to the interests of the user.
  • this can be an extremely time-consuming and frustrating process as search engines can return a substantial number of sites. More often than not, the user is forced to narrow the search iteratively by altering and/or adding keywords and Boolean operators to obtain the identity of websites including relevant information, again by typing (or speaking) the revised query.
  • search engines typically analyze content of alphanumeric search queries in order to return results.
  • search engines merely parse alphanumeric queries into ‘keywords’ and subsequently perform searches based upon a defined number of instances of each of the keywords in a reference.
  • the innovation disclosed and claimed herein, in one aspect thereof, comprises search systems (and corresponding methodologies) that can couple speech, text and touch for search interfaces and engines.
  • the multi-modal functionality can be used to refine search results thereby enhancing search functionality with minimal textual input.
  • the innovation can combine speech, text, and touch to enhance usability and efficiency of search mechanisms. Accordingly, it can be possible to locate more meaningful and comprehensive results as a function of a search query.
  • the innovation discloses a multi-modal search interface that tightly couples speech, text and touch by utilizing regular expression queries with ‘wildcards,’ where parts of the query can be input via different modalities, e.g., different modalities such as speech, text, and touch can be used at any point in the query construction process.
  • the innovation can represent uncertainty in a spoken recognized result as wildcards in a regular expression query.
  • the innovation allows users to express their own uncertainty about parts of their utterance using expressions such as “something” or “whatchamacallit” which then get translated into wildcards.
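
A minimal Python sketch (illustrative only; the filler-word set and function names are assumptions, not the patent's implementation) of how a spoken uncertainty marker could be turned into a wildcard and compiled into a regular expression:

```python
import re

# Hypothetical set of spoken uncertainty markers named in the text.
UNCERTAINTY_WORDS = {"something", "whatchamacallit"}

def utterance_to_wildcard_query(utterance):
    """'saks something avenue' -> 'saks * avenue'"""
    return " ".join("*" if w in UNCERTAINTY_WORDS else w
                    for w in utterance.lower().split())

def wildcard_query_to_regex(query):
    """Compile a wildcard query; '*' matches zero, one, or more characters."""
    pattern = ".*".join(re.escape(part) for part in query.split("*"))
    return re.compile(pattern, re.IGNORECASE)

query = utterance_to_wildcard_query("saks something avenue")
print(bool(wildcard_query_to_regex(query).search("Saks Fifth Avenue")))  # True
```
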
  • the innovation can be incorporated or retrofitted into existing search engines and/or interfaces. Additionally, the features, functionality and benefits of the innovation can be incorporated into mobile search applications which have strategic importance given the increasing usage of mobile devices as a primary computing device. As described above, mobile devices are not always configured or equipped with full-function keyboards, thus, the multi-modal functionality of the innovation can be employed to greatly enhance comprehensiveness of search.
  • machine learning and reasoning employs a probabilistic and/or statistical-based analysis to prognose or infer an action that a user desires to be automatically performed.
  • FIG. 1 illustrates an example block diagram of a multi-modal search refinement system in accordance with an aspect of the innovation.
  • FIG. 2 illustrates an example flow diagram of procedures that facilitate query refinement in accordance with an aspect of the innovation.
  • FIG. 3 illustrates an example query administration component in accordance with an aspect of the innovation.
  • FIG. 4 illustrates an example query refinement component in accordance with an aspect of the innovation.
  • FIG. 5 illustrates an example screenshot that illustrates that the innovation can tightly couple touch and text for multi-modal refinement.
  • FIGS. 6 a-e illustrate an example word palette that helps a user compose and refine a search phrase from an n-best list.
  • FIGS. 7 a-c illustrate example text hints that help the speech recognizer efficiently identify the query.
  • FIGS. 8 a-e illustrate example screenshots that show that words can be excluded and restored from retrieved results by touch.
  • FIGS. 9 a-e illustrate that a user can specify uncertain information using the word “something” in accordance with aspects.
  • FIG. 10 illustrates example recovery rates for using multi-modal refinement with a word palette in accordance with an aspect.
  • FIG. 11 illustrates example recovery rates for text hints of increasing number of characters in accordance with aspects.
  • FIG. 12 illustrates a block diagram of a computer operable to execute the disclosed architecture.
  • FIG. 13 illustrates a schematic block diagram of an exemplary computing environment in accordance with the subject innovation.
  • a component can be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a server and the server can be a component.
  • One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.
  • the terms “infer” and “inference” refer generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources.
  • While certain ways of displaying information to users are shown and described with respect to certain figures as screenshots, those skilled in the relevant art will recognize that various other alternatives can be employed.
  • the terms “screen,” “web page,” and “page” are generally used interchangeably herein.
  • the pages or screens are stored and/or transmitted as display descriptions, as graphical user interfaces, or by other methods of depicting information on a screen (whether personal computer, PDA, mobile telephone, or other suitable device, for example) where the layout and information or content to be displayed on the page is stored in memory, database, or another storage facility.
  • the innovation disclosed and claimed herein, in aspects thereof, describes a method (and system) of presenting search query suggestions for a regular expression query with wildcards whereby not only are the best candidate phrase matches displayed, but each word in the displayed phrases is treated as a substitution choice for the words and/or wildcards in the query.
  • the list of query suggestion results essentially acts as a kind of “word palette” with which users can select (e.g., via touch or d-pad) words to compose and/or refine queries.
  • users can drag and drop words into a query from the query suggestion list.
  • Still other aspects can employ a “refinement by exclusion” technique, where, if the user does not select any of the phrases in the query suggestion list, as can be implemented in an example “None of the above” choice, every word that was not selected can be treated as a word to exclude in retrieving more matches from the index.
  • Another aspect implements the refinement techniques described above by retrieving entries from a k-best suffix array with constraints, as sketched below.
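
By way of a hedged illustration, the word-palette selection and “refinement by exclusion” behavior described above might look like the following minimal sketch (all names hypothetical; actual retrieval runs against the backend index described later):

```python
def refine(listings, selected, excluded):
    """Keep listings containing every selected word and no excluded word."""
    sel = {w.lower() for w in selected}
    exc = {w.lower() for w in excluded}
    out = []
    for listing in listings:
        words = set(listing.lower().split())
        if sel <= words and not (exc & words):
            out.append(listing)
    return out

suggestions = ["home office design", "home depot", "ballet home decor"]
palette = {w for s in suggestions for w in s.split()}   # the "word palette"

# User keeps "home"; choosing "None of the above" excludes every other
# palette word from the next round of matches.
print(refine(["home depot", "home hardware", "home office design"],
             selected=["home"], excluded=palette - {"home"}))
# -> ['home hardware']
```
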
  • FIG. 1 illustrates an example block diagram of a system 100 that employs a multi-modal search refinement component 102 to refine search queries by selecting and/or excluding terms thereby rendering meaningful search results.
  • multi-modal can refer to most any combination of text, voice, touch, gesture, etc. While examples described herein are directed to a specific multi-modal example that employs text, voice and touch only, it is to be understood that other examples exist that employ a subset of these identified modalities. As well, it is to be understood that other examples exist that employ disparate modalities in combination with or separate from those described herein. For instance, other examples can employ gesture input, image/pattern recognition, among others to refine a search query.
  • the multi-modal search refinement component 102 can include a query administration component 104 and a search engine component 106; together, these sub-components (104, 106) can be referred to as a backend search system.
  • these subcomponents enable a user to refine a set of search results by way of multiple modalities, for example, text, voice, touch, etc.
  • a set of search results can be employed as a word palette thereby enabling users to refine, improve, filter or focus the results as desired.
  • the query administration component 104 can employ multiple input modes to construct a search query, e.g., a wildcard query.
  • the search engine component 106 can include backend components capable of matching query suggestions.
  • the innovation presents a multi-modal search refinement component 102 , a mobile search interface that not only can facilitate touch and text refinement whenever speech fails, but also allows users to assist the recognizer via text hints. For instance, a text hint can be used together with speech to better refine search queries.
  • the innovation can also take advantage of most any partial knowledge users may have about a business listing by letting them express their uncertainty in a simple, intuitive way.
  • leveraging multi-modal refinement resulted in a 28% relative reduction in error rate.
  • Providing text hints along with the spoken utterance resulted in even greater relative reduction, with dramatic gains in recovery for each additional character.
  • the multi-modal search refinement system 102 can generate a series of user interfaces (UI) which assist in refinement of search queries.
  • the multi-modal UIs tightly couple speech with touch and text (as well as gestures, etc.) in at least two directions; users can not only use touch and text to refine their queries whenever speech fails, but can also use speech whenever text entry becomes burdensome.
  • the innovation can facilitate this tight coupling by transforming a typical n-best list, or a list of phrase alternates from the recognizer, into a palette of words with which users can compose and refine queries.
  • the innovation can also take advantage of any partial knowledge users may have about the words of the business listing. For example, a user may only remember that the listing starts with an “s” and also contains the word “avenue”. Likewise, the user may only remember “Saks something”, where the word “something” is used to express uncertainty about what words follow. While the word “something” is used in the aforementioned example, it is to be appreciated that most any desired word or indicator can be used without departing from the spirit/scope of the innovation and claims appended hereto.
  • the innovation can represent this uncertainty as wildcards in an enhanced regular expression search of the listings, which exploits the popularity of the listings.
  • FIG. 2 illustrates a methodology of refining a search query in accordance with an aspect of the innovation. While, for purposes of simplicity of explanation, the one or more methodologies shown herein, e.g., in the form of a flow chart, are shown and described as a series of acts, it is to be understood and appreciated that the subject innovation is not limited by the order of acts, as some acts may, in accordance with the innovation, occur in a different order and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all illustrated acts may be required to implement a methodology in accordance with the innovation.
  • search query suggestion results are received. Further, the query suggestion results can be categorized, ordered or otherwise organized in most any manner using most any ranking algorithm or methodology.
  • the search results are representative of a ‘bag of words’ or a ‘word palette.’
  • the words within the search results themselves are search terms that can be used to further organize, sort, filter or otherwise refine the results.
  • a search term can be selected at 204 —it is to be understood that this act can include selection of multiple terms from the list of results.
  • the selected search term(s) can be used for inclusion within a refined query.
  • a word or set of words can be selected for exclusion such that the refinement will exclude any results that employ the excluded word or set of words.
  • the selection for inclusion or exclusion can employ most any suitable mechanism.
  • the selection can be effected by way of a navigation device, touch screen, speech identification or the like.
  • the selection (e.g., refinement criteria) can be placed upon or maintained in a separate location such as a “scratchpad.”
  • the “scratchpad” can be the textbox in which the user may have entered text (e.g., if s/he utilized text), or it could be some other area suitable for the input modality.
  • the search term (or selected word(s)) can be supplemented, for example, by way of speech.
  • the selected word(s) can be combined with spoken words or phrases which define or further refine a search query.
  • a decision is made at 210 to determine if additional refinement is desired. If so, the methodology returns to 204 as shown. If not, a refined set of search query suggestion results are received at 212 in view of the refined query. This methodology is repeatable until the user is satisfied with the refinement of the search query and subsequent results. The recursive characteristics of the methodology are illustrated by decision block 214 .
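
A compact, self-contained sketch of this loop, with a scripted stand-in for the user's selections (hypothetical names; the real backend and UI are described in the paragraphs that follow):

```python
LISTINGS = ["first mutual bank", "source mutual bank", "home depot"]

def search(query_words):
    """Stand-in backend: listings containing every query word."""
    return [l for l in LISTINGS
            if all(w in l.split() for w in query_words)]

def refine_loop(rounds):
    """Each round is the list of words the user selects from the current
    results (acts 204-208); looping continues until no rounds remain
    (decision blocks 210/214)."""
    results = search([])            # initial suggestions (act 202)
    for selected in rounds:
        results = search(selected)  # refined results (act 212)
    return results

print(refine_loop([["mutual"], ["first", "mutual"]]))
# -> ['first mutual bank']
```
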
  • the query administration component 104 can include an analysis component 302 and a query refinement component 304 . Together, these sub-components ( 302 , 304 ) enable a user to use multi-modal mechanisms to refine search query suggestion results.
  • the initial search results can be utilized as a word palette whereby users can refine the results (e.g., by selecting words for inclusion and/or exclusion).
  • the analysis component 302 is capable of providing instructions to the query refinement component 304 with regard to streamlining query suggestion results.
  • the results can be streamlined by way of analysis of input characteristics as well as including and/or excluding terms.
  • the words (or portions thereof) can be supplemented with verbally spoken cues. These spoken cues can essentially employ the text as word hints thereby refining the search query results as desired.
  • the query refinement component 304 can include a selection component 402 , an exclusion component 404 and a query update component 406 . Together, these sub-components enable search query refinement. As described above, search query suggestion results are parsed into a word palette where each of the words or segments of words can be used to comprehensively refine the results. In other words, rather than resubmitting a revised query to a search engine, the query refinement component 304 enables a user to employ the actual words included in an original set of results to drill down or further refine a set of results.
  • the selection component 402 enables a user to choose one or more words from a set (e.g., word palette) that represents the words from within a set of search results.
  • words can be selected using navigation devices such as a mouse, trackball, touchpad or the like. Additionally, navigational keys, verbal identification, gestures, etc. can be employed to select words. Once the words are selected, these words can be used to identify words for inclusion within a refined set of results.
  • the exclusion component 404 can be used to designate a specific word, or group of words, to be excluded from a refined set of results. In other words, once a word is designated as excluded, a refined set of results can be located thereby comprehensively adjusting the user's initial query. It will be understood that this functionality can be incorporated with a specialized search engine/mechanism. Alternatively, the features, functions and benefits of the innovation can be incorporated into, or used in connection with, conventional search engine/mechanisms.
  • the query update component 406 employs information and instructions received from the selection component 402 and/or the exclusion component 404 to refine the query resultant set.
  • the query update component 406 can also receive multi-modal instructions directly from an entity or user.
  • selected (e.g., for inclusion) and/or excluded terms can be complemented by other text entries, speech entries, etc. thereby assisting a user in efficiently refining search results to obtain a set of meaningful results.
  • This disclosure is focused on three phases. First, it describes the system 100 architecture and contrasts it with the typical architecture of conventional voice search applications. The specification also details the backend infrastructure deployed onto a device for fast and efficient retrieval of the search query suggestion results. Second, the disclosure presents an example UI, highlighting its tightly coupled multi-modal refinement capabilities and support of partial knowledge with several user scenarios. Third, the system is evaluated by conducting simulation experiments examining the effectiveness of multi-modal refinement in recovering from speech errors on utterances collected from a previously deployed mobile voice search product.
  • the innovation is capable of tightly coupling multiple modalities, for example, text, speech and touch.
  • the UI enables a search for two words beginning with the letters of ‘b’ and ‘n.’
  • the innovation infers wildcard suffixes for each of the two letters and returns results that contain matching content.
  • this functionality can also be used to refine initial search results. While specific examples are shown and described, it is to be understood that these examples are provided to add perspective to the innovation and not intended to limit the innovation in any manner. Rather, it is to be understood that additional (and countless) examples exist which are to be included within the scope of the innovation and claims appended hereto.
  • a user can be presented with an n-best list of results in response to a search query for ‘b’ and ‘n.’
  • results can be effectively understood as based upon an interpretation of ‘b*’ and ‘n*,’ wherein the asterisks represent wildcards, and a wildcard matches zero, one, or more arbitrary characters.
  • the innovation leverages the use of the n-best list as a word palette such that results can be easily refined by way of selecting, excluding, parsing, or supplementing words (or portions thereof).
  • the n-best list is essentially treated as a sort of word palette from which users can select those words that the recognizer heard correctly, though they may appear in a different phrase. For example, suppose a user says “home depot,” but because of background noise, the phrase does not occur in the n-best list. Suppose, however, that the phrase “home office design” does. With typical (or conventional) voice search applications, the user would have to start over.
  • the user can simply select the word “home” and invoke the backend which finds the most popular listings that contain the word.
  • the system can measure popularity by the frequency with which a business listing appears in the automated directory assistance (ADA) call logs.
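
Under the assumption that each call-log entry records one requested listing, popularity could be tallied as simply as:

```python
from collections import Counter

# Hypothetical ADA call-log sample; popularity = frequency of the listing.
call_log = ["home depot", "pizza hut", "home depot", "saks fifth avenue",
            "home depot", "pizza hut"]
popularity = Counter(call_log)
print(popularity.most_common(2))
# -> [('home depot', 3), ('pizza hut', 2)]
```
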
  • regular expressions can be used. Because much of the effectiveness of the innovation's interface rests on its ability to retrieve listings using a wildcard query—or a regular expression query containing wildcards—a discussion follows that describes implementation of a RegEx engine followed by further details about wildcard queries constructed in the RegEx generator.
  • Similar to traditional suffix arrays, k-best suffix arrays arrange all suffixes of the listings into an array. While traditional suffix arrays arrange the suffixes in lexicographical order only, the k-best suffix arrays arrange the suffixes according to two alternating orders—a lexicographical ordering and an ordering based on a figure of merit such as popularity, preference (determined or inferred), etc.
  • Because the k-best suffix array is sorted by both lexicographic order and popularity, it is a convenient structure for finding the most popular matches for a substring, especially when there are many matches.
  • the k most popular matches can be found in time close to O(log N) for most practical situations, and with a worst case guarantee of O(sqrt N), where N is the number of characters in the listings.
  • a standard suffix array permits finding all matches to a substring in O(log N) time, but does not impose any popularity ordering on the matches. To find the most popular matches, the user would have to traverse them all.
  • the standard suffix array may be sufficiently fast when searching for the k-best matches to a large substring since there will not be many matches to traverse in this case.
  • the situation is, however, completely different for a short substring such as, for example, ‘a’.
  • a user would have to traverse all dictionary entries containing an ‘a’, which is not much better than traversing all suffixes in the listings—in O(N) time.
  • it is possible to continue a search in a k-best suffix array from the position it was previously stopped.
  • k-best suffix matching will therefore allow looking up the k-best (e.g., most popular) matches for an arbitrary wildcard query, such as, for instance, ‘f* m* ban*’.
  • the approach proceeds as in the k-best suffix matching above for the largest substring without a wildcard (‘ban’).
  • the innovation now evaluates the full wildcard query against the full listing entry for the suffix, and continues the search until k valid expansions to the wildcard query are found.
  • the k-best suffix array can also be used to exclude words in the same way by continuing the search until expansions without the excluded words are found.
  • the query refinement is an iterative process, which gradually eliminates the wildcards in the text string. Whenever the largest substring in the wildcard query does not change between iterations, there is an opportunity to further improve the computational efficiency of the expansion algorithm. In this case, the k-best suffix matching can just be continued from the point where the previous iteration ended.
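
To make the retrieval problem concrete, the following simplified sketch uses a plain suffix array with a post-hoc popularity sort. Note that this scans the whole match range, which is exactly the behavior the patent's k-best structure is designed to avoid; the interleaved lexicographic/popularity ordering is not reproduced here, and all data is hypothetical:

```python
import bisect

LISTINGS = ["home depot", "home office design", "saks fifth avenue"]
POPULARITY = {0: 950, 1: 120, 2: 400}   # hypothetical call-log counts

# Plain suffix array over all listings: (suffix, listing id), sorted
# lexicographically.
SUFFIXES = sorted((listing[i:], lid)
                  for lid, listing in enumerate(LISTINGS)
                  for i in range(len(listing)))

def k_best(substring, k, excluded=()):
    # Binary-search the lexicographic range of suffixes starting with
    # `substring`, then rank the matching listings by popularity.
    lo = bisect.bisect_left(SUFFIXES, (substring,))
    hi = bisect.bisect_left(SUFFIXES, (substring + "\uffff",))
    hits = {lid for _, lid in SUFFIXES[lo:hi]
            if not any(word in LISTINGS[lid].split() for word in excluded)}
    return [LISTINGS[lid]
            for lid in sorted(hits, key=lambda i: -POPULARITY[i])][:k]

print(k_best("ho", 2))                        # -> ['home depot', 'home office design']
print(k_best("ho", 2, excluded=("depot",)))   # -> ['home office design']
```
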
  • the innovation can implement an IR engine based on an improved term frequency—inverse document frequency (TFIDF) algorithm.
  • the IR engine can treat queries and listings as bags of words. This is advantageous when users either incorrectly remember the order of words in a listing, or add additional words that do not actually appear in the listing. This is not the case for the RegEx engine where order and the presence of suffixes in the query matter.
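
A minimal bag-of-words TF-IDF sketch (a generic formulation, not the patent's improved variant) showing why word order does not matter for the IR engine:

```python
import math
from collections import Counter

LISTINGS = ["home depot", "home office design", "saks fifth avenue"]
N = len(LISTINGS)
DF = Counter(w for l in LISTINGS for w in set(l.split()))  # document frequency

def score(query, listing):
    """Bag-of-words TF-IDF: word order is ignored entirely."""
    tf = Counter(listing.split())
    return sum(tf[w] * math.log(N / DF[w])
               for w in query.split() if w in DF)

# "depot home" still ranks "home depot" first, despite the swapped order,
# unlike the order-sensitive RegEx engine.
print(max(LISTINGS, key=lambda l: score("depot home", l)))  # -> 'home depot'
```
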
  • the word is sent as a query to a RegEx generator which transforms it into a wildcard query.
  • the generator can simply insert wildcards before spaces, as well as at the end of the entire query. For example, for the query “home”, the generator could produce the regular expression “home*”.
  • the query refinement component 304 can generate a wildcard using minimal edit distance (with equal edit operation costs) to align the phrases at the word level. Once words are aligned, minimal edit distance is again applied to align the characters. Whenever there is disagreement between any aligned words or characters, a wildcard can be substituted in its place. For example, for an n-best list containing the phrases “home depot” and “home office design,” the RegEx generator would produce “home* de*”.
  • After an initial query is formulated, the RegEx generator applies heuristics to clean up the regular expression (e.g., no word would have more than one wildcard) before it is used to retrieve k-best matches from the RegEx engine.
  • the RegEx generator (or query refinement component 304 ) is invoked in this form whenever speech is utilized, such as for leveraging partial knowledge, as will be discussed below.
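
The alignment-based wildcard generation might be approximated as follows; this is a greatly simplified stand-in in which difflib's similarity ratio replaces the patent's minimal-edit-distance alignment at both the word and character level:

```python
from difflib import SequenceMatcher

def common_prefix(a, b):
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def wildcard_from_alternates(phrase, alternate):
    """Pair each word with its most similar word in the alternate phrase,
    keep the agreeing prefix, and append '*' to every word (the generator
    also inserts wildcards before spaces and at the end of the query)."""
    out = []
    for word in phrase.split():
        partner = max(alternate.split(),
                      key=lambda w: SequenceMatcher(None, word, w).ratio())
        out.append((common_prefix(word, partner) or word) + "*")
    return " ".join(out)

print(wildcard_from_alternates("home depot", "home office design"))
# -> 'home* de*'
```
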
  • the innovation displays an n-best list to the user, presenting a UI that appears, at least at first blush, similar to most any other voice search application.
  • users may select words or phrases (or portions of words or phrases) from a list of choices, provided that the desired content exists among these choices. Because re-speaking does not generally increase the likelihood that the utterance will be recognized correctly, and furthermore, because mobile usage poses distinct challenges not encountered in desktop settings, the interface also endows users with a larger arsenal of recovery strategies.
  • the following user scenarios are highlighted to demonstrate at least two concepts: first, tight coupling of speech with touch and text, so that whenever one of the three modalities fails or becomes burdensome, users may switch to another modality in a complementary way. Second, the scenarios illustrate the ability to leverage any partial knowledge a user may have about constituent words of their intended query.
  • FIG. 5 illustrates how users can leverage the word palette discussed in the previous section for multi-modal refinement.
  • a user utters “first mutual bank.” ( FIG. 6 a ).
  • the system returns an n-best list that unfortunately does not include the intended utterance. It will be appreciated that a number of factors can contribute to the incorrect interpretation, for example, background noise, inadequacy of the voice recognition application/functionality, lack of clarity in spoken words/phrases, etc.
  • the n-best list does include parts of the utterance in the choice “2. source mutual bank.”
  • the user can now select the word “mutual” ( FIG. 6 b ) and then “bank” ( FIG. 6 c ) which gets added to the query textbox in the order selected.
  • the textbox functions as a scratch pad upon which users can add and edit words until they click the search button (or other trigger) on the top left-hand side ( FIG. 6 d ) to refine the query.
  • the query in the textbox is submitted to the backend which retrieves a new set of results containing both exact matches of the query from the RegEx engine as well as approximate matches from the IR engine.
  • This new result list of query suggestions can appear or otherwise be presented in the same manner as the initial n-best list except that words matching the query are highlighted in red (or other identifying or highlight manner). Given that the intended query is now among the list of choices, the user simply selects the choice and is finished ( FIG. 6 e ).
  • FIGS. 7 a - c illustrate how text hints can be leveraged.
  • the user starts typing “m” for the intended query “mill creek family practice,” but because the query is too long to type, the user utters the intended query after pressing the ‘Refine’ soft key button at the bottom of the screen ( FIG. 7 b ). After the query returns from the backend, all choices in the list now start with an “m” and indeed include the user utterance ( FIG. 7 c ).
  • the innovation can achieve this functionality by first converting the text hint in the textbox into a wildcard query and then using that to filter the n-best list from the speech recognition as well as to retrieve additional matches from the RegEx engine. In principle, the innovation acknowledges that the query should be used to bias the recognition of the utterance in the speech engine itself.
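
A small sketch of how a text hint might be compiled into a prefix filter over the n-best list (hypothetical names; prefix semantics assumed from the “m” example above):

```python
import re

def hint_to_regex(hint):
    """Each typed word becomes a required prefix (an implied trailing '*')."""
    return re.compile(r"\s+".join(re.escape(w) + r"\S*"
                                  for w in hint.lower().split()))

n_best = ["mill creek family practice", "bill creek family",
          "mall of america"]
rx = hint_to_regex("m")
print([phrase for phrase in n_best if rx.match(phrase)])
# -> ['mill creek family practice', 'mall of america']
```
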
  • the system not only returns more results to the user, this time containing the correct query ( FIG. 8 c ), but also creates a new tab to hold all the excluded words ( FIG. 8 d ). If the user has accidentally excluded a word, they can peruse the Excluded Words tab ( FIG. 8 d ) and select the word to remove it from the tab, which will then bring up a new set of choices.
  • the user can select the word and the UI will fill in whatever query part matches it. For example, suppose the user was looking for “pacific networks.” If the user selects the word “pacific” in “3. pacific northwest ballet,” the query in the textbox will change from “p n” to “pacific n,” ( FIG. 8 e ), at which point the user may choose to invoke the backend.
  • the innovation will automatically convert the “something” into a wildcard and retrieve exact matches from the RegEx Engine along with approximate matches from the IR Engine ( FIG. 9 b ). Now, the query appears among the choices and the user simply selects the appropriate choice and is finished ( FIG. 9 c ).
  • the innovation adjusts the statistical language model to allow for transitions to the word “something” before and after every word in the training sentences as a bigram.
  • Business listings that actually contain the word “something” were few and far between, and were appropriately tagged so as not to generate a wildcard during inverse text normalization of the recognized result.
  • the innovation can also transform the training sentences into one character prefixes so that it could support partial knowledge queries such as, “b something angus” for “b* angus” ( FIGS. 9 d - e ).
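
A toy illustration of this prefix handling, assuming (based on the “b something angus” to “b* angus” example) that “something” attaches a wildcard to the preceding token:

```python
def something_to_wildcard(utterance):
    """'b something angus' -> 'b* angus'; a leading 'something' becomes
    a bare wildcard."""
    out = []
    for word in utterance.split():
        if word == "something":
            if out:
                out[-1] += "*"
            else:
                out.append("*")
        else:
            out.append(word)
    return " ".join(out)

print(something_to_wildcard("b something angus"))   # -> 'b* angus'
print(something_to_wildcard("saks something"))      # -> 'saks*'
```
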
  • the innovation interface can be referred to as “taming” speech recognition errors with a multi-modal interface.
  • While the innovation was designed with mobile voice search in mind, in certain situations it may make sense to exploit richer gestures other than simply selecting via touch or d-pad. For example, users could use gestures to separate words that they want in their query from those they wish to exclude. It is to be understood that the features, functions and benefits of the innovation can be applied to most any search scenario, including desktop-based search, without departing from the spirit and/or scope of the innovation and claims appended hereto.
  • 2317 local-area utterances were collected which had been transcribed by a professional transcription service and filtered to remove noise-only and yes-no utterances.
  • the utterances were systematically collected to cover all days of the week as well as times in the day.
  • the utterances were submitted to a speech server which utilized the same acoustic and language models as Live Search Mobile.
  • the transcription appeared in the top position of the n-best list 72.0% of the time, and somewhere in the n-best list 80.0% of the time, where again n was limited to 8 (the number of readable choices displayable on a standard pocket PC (personal computer) form factor).
  • the first set of experiments conducted examined how much recovery rate could be gained by treating the n-best list as a word palette and allowing users to select words.
  • Although the interface itself enables users to also edit their queries by inserting or deleting characters, this was not permitted for the experiments.
  • Words in the n-best list that matched the transcription were always selected in the proper word order. For example, if the transcription was “black angus restaurant,” “black” was selected from the n-best list first before selecting either “angus” or “restaurant.”
  • as many words from the transcription as could be found in the word palette were selected. For instance, although just “black” could have been submitted as a query in the previous example, because “angus” could also be found in the n-best list, it was included.
  • words in the n-best list alone covered the full transcription 4.31% of the time. Note that full coverage constitutes recovery from the error since the transcription can be completely built up word by word from the n-best list. In 24.6% of the cases, only part of the transcription could be covered by the n-best list (not shown in FIG. 10), in which case another query would need to be submitted to get a new list. If the n-best list is supplemented with matches from the backend, the recovery rate improves to 14.4%, a factor of 3.4 over using the n-best list alone as a word palette.
  • the recovery rate jumps to 25.2%. If the n-best list, padded with supplementary matches, is used to submit a query, the recovery rate is 28.5% (which is also the relative error reduction with respect to the entire data set). This is 5.9 times the recovery rate of using just the n-best list as a word palette.
  • FIG. 11 shows these listings as a default list when starting, which includes the general category “pizza” as well as popular stores such as “wal-mart” and “home depot.” Below is a discussion of the implication of such a high baseline.
  • After generating a wildcard query for the text hint from the transcription, the innovation used it to filter the n-best list obtained from submitting the transcription utterance to the speech server. If there were not enough list items after the filtering, the list was supplemented as described in the Supplement generator description above.
  • A multi-modal interface system that can be used for voice search applications (e.g., mobile voice search) is presented.
  • This innovation not only facilitates touch and text refinement whenever speech fails (or accuracy is compromised), but also allows users to assist the recognizer via text hints.
  • the innovation can also take advantage of any partial knowledge users may have about their queries by letting them express their uncertainty through “something” expressions. Also discussed was an example overall architecture and details of how the innovation could quickly retrieve exact and approximate matches to the listings from the backend.
  • the innovation found that leveraging multi-modal refinement using the word palette resulted in a 28% relative reduction in error rate.
  • providing text hints along with a spoken utterance resulted in dramatic gains in recovery rate, though this should be qualified by stating that users in the test data tended to ask for popular listings which we could retrieve quickly.
  • voice search applications encourage users to “just say what you want” in order to obtain useful mobile content such as automated directory assistance (ADA).
  • when users only remember part of what they are looking for, they are forced to guess, even though what they know may be sufficient to retrieve the desired information.
  • the disclosure highlights the enhanced user experience that uncertain expressions afford and delineates how to perform language modeling and information retrieval.
  • the innovation evaluates an approach by assessing its impact on overall ADA performance and by discussing the results of an experiment in which users generated both uncertain expressions as well as guesses for directory listings. Uncertain expressions reduced relative error rate by 31.8% compared to guessing.
  • Referring now to FIG. 12, there is illustrated a block diagram of a computer operable to execute the disclosed architecture.
  • FIG. 12 and the following discussion are intended to provide a brief, general description of a suitable computing environment 1200 in which the various aspects of the innovation can be implemented. While the innovation has been described above in the general context of computer-executable instructions that may run on one or more computers, those skilled in the art will recognize that the innovation also can be implemented in combination with other program modules and/or as a combination of hardware and software.
  • program modules include routines, programs, components, data structures, etc., that perform particular tasks or implement particular abstract data types.
  • inventive methods can be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which can be operatively coupled to one or more associated devices.
  • the illustrated aspects of the innovation may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network.
  • program modules can be located in both local and remote memory storage devices.
  • Computer-readable media can be any available media that can be accessed by the computer and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer-readable media can comprise computer storage media and communication media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
  • Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
  • the exemplary environment 1200 for implementing various aspects of the innovation includes a computer 1202 , the computer 1202 including a processing unit 1204 , a system memory 1206 and a system bus 1208 .
  • the system bus 1208 couples system components including, but not limited to, the system memory 1206 to the processing unit 1204 .
  • the processing unit 1204 can be any of various commercially available processors. Dual microprocessors and other multi-processor architectures may also be employed as the processing unit 1204 .
  • the system bus 1208 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures.
  • the system memory 1206 includes read-only memory (ROM) 1210 and random access memory (RAM) 1212 .
  • a basic input/output system (BIOS) is stored in a non-volatile memory 1210 such as ROM, EPROM, EEPROM, which BIOS contains the basic routines that help to transfer information between elements within the computer 1202 , such as during start-up.
  • the RAM 1212 can also include a high-speed RAM such as static RAM for caching data.
  • the computer 1202 further includes an internal hard disk drive (HDD) 1214 (e.g., EIDE, SATA), which internal hard disk drive 1214 may also be configured for external use in a suitable chassis (not shown), a magnetic floppy disk drive (FDD) 1216 , (e.g., to read from or write to a removable diskette 1218 ) and an optical disk drive 1220 , (e.g., reading a CD-ROM disk 1222 or, to read from or write to other high capacity optical media such as the DVD).
  • the hard disk drive 1214 , magnetic disk drive 1216 and optical disk drive 1220 can be connected to the system bus 1208 by a hard disk drive interface 1224 , a magnetic disk drive interface 1226 and an optical drive interface 1228 , respectively.
  • the interface 1224 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies. Other external drive connection technologies are within contemplation of the subject innovation.
  • the drives and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, and so forth.
  • the drives and media accommodate the storage of any data in a suitable digital format.
  • While the description of computer-readable media above refers to an HDD, a removable magnetic diskette, and removable optical media such as a CD or DVD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and further, that any such media may contain computer-executable instructions for performing the methods of the innovation.
  • a number of program modules can be stored in the drives and RAM 1212 , including an operating system 1230 , one or more application programs 1232 , other program modules 1234 and program data 1236 . All or portions of the operating system, applications, modules, and/or data can also be cached in the RAM 1212 . It is appreciated that the innovation can be implemented with various commercially available operating systems or combinations of operating systems.
  • a user can enter commands and information into the computer 1202 through one or more wired/wireless input devices, e.g., a keyboard 1238 and a pointing device, such as a mouse 1240 .
  • Other input devices may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, touch screen, or the like.
  • These and other input devices are often connected to the processing unit 1204 through an input device interface 1242 that is coupled to the system bus 1208 , but can be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, etc.
  • a monitor 1244 or other type of display device is also connected to the system bus 1208 via an interface, such as a video adapter 1246 .
  • a computer typically includes other peripheral output devices (not shown), such as speakers, printers, etc.
  • the computer 1202 may operate in a networked environment using logical connections via wired and/or wireless communications to one or more remote computers, such as a remote computer(s) 1248 .
  • the remote computer(s) 1248 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 1202 , although, for purposes of brevity, only a memory/storage device 1250 is illustrated.
  • the logical connections depicted include wired/wireless connectivity to a local area network (LAN) 1252 and/or larger networks, e.g., a wide area network (WAN) 1254 .
  • LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, e.g., the Internet.
  • the computer 1202 When used in a LAN networking environment, the computer 1202 is connected to the local network 1252 through a wired and/or wireless communication network interface or adapter 1256 .
  • the adapter 1256 may facilitate wired or wireless communication to the LAN 1252 , which may also include a wireless access point disposed thereon for communicating with the wireless adapter 1256 .
  • the computer 1202 can include a modem 1258 , or is connected to a communications server on the WAN 1254 , or has other means for establishing communications over the WAN 1254 , such as by way of the Internet.
  • the modem 1258 which can be internal or external and a wired or wireless device, is connected to the system bus 1208 via the serial port interface 1242 .
  • program modules depicted relative to the computer 1202 can be stored in the remote memory/storage device 1250 . It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
  • the computer 1202 is operable to communicate with any wireless devices or entities operatively disposed in wireless communication, e.g., a printer, scanner, desktop and/or portable computer, portable data assistant, communications satellite, any piece of equipment or location associated with a wirelessly detectable tag (e.g., a kiosk, news stand, restroom), and telephone.
  • the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices.
  • Wi-Fi, or Wireless Fidelity, is a wireless technology similar to that used in a cell phone that enables such devices, e.g., computers, to send and receive data indoors and out, anywhere within the range of a base station.
  • Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, fast wireless connectivity.
  • a Wi-Fi network can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet).
  • Wi-Fi networks operate in the unlicensed 2.4 and 5 GHz radio bands, at an 11 Mbps (802.11b) or 54 Mbps (802.11a) data rate, for example, or with products that contain both bands (dual band), so the networks can provide real-world performance similar to the basic 10BaseT wired Ethernet networks used in many offices.
  • the system 1300 includes one or more client(s) 1302 .
  • the client(s) 1302 can be hardware and/or software (e.g., threads, processes, computing devices).
  • the client(s) 1302 can house cookie(s) and/or associated contextual information by employing the innovation, for example.
  • the system 1300 also includes one or more server(s) 1304 .
  • the server(s) 1304 can also be hardware and/or software (e.g., threads, processes, computing devices).
  • the servers 1304 can house threads to perform transformations by employing the innovation, for example.
  • One possible communication between a client 1302 and a server 1304 can be in the form of a data packet adapted to be transmitted between two or more computer processes.
  • the data packet may include a cookie and/or associated contextual information, for example.
  • the system 1300 includes a communication framework 1306 (e.g., a global communication network such as the Internet) that can be employed to facilitate communications between the client(s) 1302 and the server(s) 1304 .
  • Communications can be facilitated via a wired (including optical fiber) and/or wireless technology.
  • the client(s) 1302 are operatively connected to one or more client data store(s) 1308 that can be employed to store information local to the client(s) 1302 (e.g., cookie(s) and/or associated contextual information).
  • the server(s) 1304 are operatively connected to one or more server data store(s) 1310 that can be employed to store information local to the servers 1304 .

Abstract

A multi-modal search query refinement system (and corresponding methodology) is provided. In accordance with the innovation, query suggestion results represent a word palette which can be used to select strings for inclusion or exclusion from a refined set of results. The system employs text, speech, touch and gesture input to refine a set of search query results. Wildcards can be employed in the search either prompted by the user or inferred by the system. Additionally, partial knowledge supplemented by speech can be employed to refine search results.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of U.S. Provisional Patent application Ser. No. 61/053,214 entitled “MULTI-MODALITY SEARCH INTERFACE” and filed May 14, 2008. This application is related to pending U.S. patent application Ser. No. ______ entitled “MULTI-MODAL QUERY GENERATION” filed on ______ and to pending U.S. patent application Ser. No. ______ entitled “MULTI-MODAL SEARCH WILDCARDS” filed on ______. The entireties of the above-noted applications are incorporated by reference herein.
  • BACKGROUND
  • The Internet continues to make available ever-increasing amounts of information which can be stored in databases and accessed therefrom. With the proliferation of mobile and portable terminals (e.g., cellular telephones, personal data assistants (PDAs), smartphones and other devices), users are becoming more mobile, and hence, more reliant upon information accessible via the Internet. Accordingly, users often search network sources such as the Internet from their mobile device.
  • There are essentially two phases in an Internet search. First, a search query is constructed that can be submitted to a search engine. Second, the search engine matches this search query to actual search results. Conventionally, these search queries were constructed merely of keywords that were matched to a list of results based upon factors such as relevance, popularity, preference, etc.
  • The Internet and the World Wide Web continue to evolve rapidly with respect to both volume of information and number of users. As a whole, the Web provides a global space for accumulation, exchange and dissemination of information. As mobile devices become more and more commonplace to access the Web, the number of users continues to increase.
  • In some instances, a user knows the name of a site, server or URL (uniform resource locator) to the site or server that is desired for access. In such situations, the user can access the site by simply typing the URL in an address bar of a browser to connect to the site. Oftentimes, the user does not know the URL and therefore has to ‘search’ the Web for relevant sources and/or URLs. To maximize likelihood of locating relevant information amongst an abundance of data, Internet or web search engines are regularly employed.
  • Traditionally, to locate a site or corresponding URL of interest, the user can employ a search engine to facilitate locating and accessing sites based upon alphanumeric keywords and/or Boolean operators. In aspects, these keywords are text- or speech-based queries, although, speech is not always reliable. Essentially, a search engine is a tool that facilitates web navigation based upon textual (or speech-to-text) entry of a search query usually comprising one or more keywords. Upon receipt of a search query, the search engine retrieves a list of websites, typically ranked based upon relevance to the query. To enable this functionality, the search engine must generate and maintain a supporting infrastructure.
  • Upon textual entry of one or more keywords as a search query, the search engine retrieves indexed information that matches the query from an indexed database, generates a snippet of text associated with each of the matching sites and displays the results to the user. The user can thereafter scroll through a plurality of returned sites to attempt to determine if the sites are related to the interests of the user. However, this can be an extremely time-consuming and frustrating process as search engines can return a substantial number of sites. More often than not, the user is forced to narrow the search iteratively by altering and/or adding keywords and Boolean operators to obtain the identity of websites including relevant information, again by typing (or speaking) the revised query.
  • Conventional computer-based search, in general, is extremely text-centric (pure text or speech-to-text) in that search engines typically analyze content of alphanumeric search queries in order to return results. These traditional search engines merely parse alphanumeric queries into ‘keywords’ and subsequently perform searches based upon a defined number of instances of each of the keywords in a reference.
  • Currently, users of mobile devices, such as smartphones, often attempt to access or ‘surf’ the Internet using keyboards or keypads such as a standard numeric phone keypad, a soft or miniature QWERTY keyboard, etc. Unfortunately, these input mechanisms are not always efficient for entering the text needed to search the Internet. As described above, conventional mobile devices are limited to text input to establish search queries, for example, Internet search queries. Text input can be a very inefficient way to search, particularly over long periods of time and/or for very long queries.
  • SUMMARY
  • The following presents a simplified summary of the innovation in order to provide a basic understanding of some aspects of the innovation. This summary is not an extensive overview of the innovation. It is not intended to identify key/critical elements of the innovation or to delineate the scope of the innovation. Its sole purpose is to present some concepts of the innovation in a simplified form as a prelude to the more detailed description that is presented later.
  • The innovation disclosed and claimed herein, in one aspect thereof, comprises search systems (and corresponding methodologies) that can couple speech, text and touch for search interfaces and engines. In particular aspects, the multi-modal functionality can be used to refine search results thereby enhancing search functionality with minimal textual input. In other words, rather than being completely dependent upon conventional textual input, the innovation can combine speech, text, and touch to enhance usability and efficiency of search mechanisms. Accordingly, it can be possible to locate more meaningful and comprehensive results as a function of a search query.
  • In aspects, the innovation discloses a multi-modal search interface that tightly couples speech, text and touch by utilizing regular expression queries with ‘wildcards,’ where parts of the query can be input via different modalities, e.g., different modalities such as speech, text, and touch can be used at any point in the query construction process. In other aspects, the innovation can represent uncertainty in a spoken recognized result as wildcards in a regular expression query. In yet other aspects, the innovation allows users to express their own uncertainty about parts of their utterance using expressions such as “something” or “whatchamacallit” which then gets translated into wildcards.
  • In still other aspects, the innovation can be incorporated or retrofitted into existing search engines and/or interfaces. Additionally, the features, functionality and benefits of the innovation can be incorporated into mobile search applications which have strategic importance given the increasing usage of mobile devices as a primary computing device. As described above, mobile devices are not always configured or equipped with full-function keyboards, thus, the multi-modal functionality of the innovation can be employed to greatly enhance comprehensiveness of search.
  • In yet another aspect thereof, machine learning and reasoning is provided that employs a probabilistic and/or statistical-based analysis to prognose or infer an action that a user desires to be automatically performed.
  • To the accomplishment of the foregoing and related ends, certain illustrative aspects of the innovation are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the innovation can be employed and the subject innovation is intended to include all such aspects and their equivalents. Other advantages and novel features of the innovation will become apparent from the following detailed description of the innovation when considered in conjunction with the drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates an example block diagram of a multi-modal search refinement system in accordance with an aspect of the innovation.
  • FIG. 2 illustrates an example flow diagram of procedures that facilitate query refinement in accordance with an aspect of the innovation.
  • FIG. 3 illustrates an example query administration component in accordance with an aspect of the innovation.
  • FIG. 4 illustrates an example query refinement component in accordance with an aspect of the innovation.
  • FIG. 5 illustrates an example screenshot that illustrates that the innovation can tightly couple touch and text for multi-modal refinement.
  • FIGS. 6 a-e illustrate an example word palette that helps a user compose and refine a search phrase from an n-best list.
  • FIGS. 7 a-c illustrate example text hints that help the speech recognizer to efficiently identify the query.
  • FIGS. 8 a-e illustrate example screenshots that show that words can be excluded and restored from retrieved results by touch.
  • FIGS. 9 a-e illustrate that a user can specify uncertain information using the word “something” in accordance with aspects.
  • FIG. 10 illustrates example recovery rates for using multi-modal refinement with a word palette in accordance with an aspect.
  • FIG. 11 illustrates example recovery rates for text hints of increasing number of characters in accordance with aspects.
  • FIG. 12 illustrates a block diagram of a computer operable to execute the disclosed architecture.
  • FIG. 13 illustrates a schematic block diagram of an exemplary computing environment in accordance with the subject innovation.
  • DETAILED DESCRIPTION
  • The innovation is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject innovation. It may be evident, however, that the innovation can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the innovation.
  • As used in this application, the terms “component” and “system” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.
  • As used herein, the terms “infer” and “inference” refer generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources.
  • While certain ways of displaying information to users are shown and described with respect to certain figures as screenshots, those skilled in the relevant art will recognize that various other alternatives can be employed. The terms “screen,” “web page,” and “page” are generally used interchangeably herein. The pages or screens are stored and/or transmitted as display descriptions, as graphical user interfaces, or by other methods of depicting information on a screen (whether personal computer, PDA, mobile telephone, or other suitable device, for example) where the layout and information or content to be displayed on the page is stored in memory, database, or another storage facility.
  • The innovation disclosed and claimed herein, in aspects thereof, describes a method (and system) of presenting search query suggestions for a regular expression query with wildcards whereby not only are the best candidate phrase matches displayed, but each word in the displayed phrases is treated as a substitution choice for the words and/or wildcards in the query. In other words, the list of query suggestion results essentially acts as a kind of “word palette” with which users can select (e.g., via touch or d-pad) words to compose and/or refine queries. In aspects, users can drag and drop words into a query from the query suggestion list.
  • Still other aspects can employ a “refinement by exclusion” technique where, if the user does not select any of the phrases in the query suggestion list (as can be implemented in an example “None of the above” choice), every word that was not selected can be treated as a word to exclude in retrieving more matches from the index, as sketched below. Another aspect employs the refinement technique above based on retrieving entries from a k-best suffix-array with constraints. These and other aspects will be described in greater detail in connection with the figures that follow.
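  • By way of illustration only, the following is a minimal sketch of the word palette and “None of the above” exclusion concepts described above. The function names and the example n-best list are hypothetical; the innovation is not limited to this form.

```python
def build_word_palette(n_best):
    """Collect the unique words across all suggested phrases, in order."""
    palette, seen = [], set()
    for phrase in n_best:
        for word in phrase.split():
            if word not in seen:
                seen.add(word)
                palette.append(word)
    return palette

def exclusions_after_none_of_the_above(n_best, selected):
    """Every palette word the user did not select is excluded from the
    next retrieval pass (the "refinement by exclusion" technique)."""
    chosen = set(selected)
    return [w for w in build_word_palette(n_best) if w not in chosen]

n_best = ["home office design", "home decorators", "hope depot"]
print(build_word_palette(n_best))
# -> ['home', 'office', 'design', 'decorators', 'hope', 'depot']
print(exclusions_after_none_of_the_above(n_best, selected=["home"]))
# -> ['office', 'design', 'decorators', 'hope', 'depot']
```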
  • Referring initially to the drawings, FIG. 1 illustrates an example block diagram of a system 100 that employs a multi-modal search refinement component 102 to refine search queries by selecting and/or excluding terms thereby rendering meaningful search results. It is to be understood that, as used herein, ‘multi-modal’ can refer to most any combination of text, voice, touch, gesture, etc. While examples described herein are directed to a specific multi-modal example that employs text, voice and touch only, it is to be understood that other examples exist that employ a subset of these identified modalities. As well, it is to be understood that other examples exist that employ disparate modalities in combination with or separate from those described herein. For instance, other examples can employ gesture input, image/pattern recognition, among others to refine a search query.
  • As shown, the multi-modal search refinement component 102 can include a query administration component 104 and a search engine component 106; together, these sub-components (104, 106) can be referred to as a backend search system. Essentially, these sub-components (104, 106) enable a user to refine a set of search results by way of multiple modalities, for example, text, voice, touch, etc. As described herein, a set of search results can be employed as a word palette thereby enabling users to refine, improve, filter or focus the results as desired. Features, functions and benefits of the innovation will be described in greater detail below. As will be described in greater detail infra, the query administration component 104 can employ multiple input modes to construct a search query, e.g., a wildcard query. The search engine component 106 can include backend components capable of matching query suggestions.
  • Internet usage, especially via mobile devices, continues to grow as users seek anytime, anywhere access to information. Because users frequently search for businesses, directory assistance has been the focus of conventional voice search applications utilizing speech as the primary input modality. Unfortunately, mobile usage scenarios often contain noise which degrades performance of speech recognition functionalities. Thus, the innovation presents a multi-modal search refinement component 102, a mobile search interface that not only can facilitate touch and text refinement whenever speech fails, but also allows users to assist the recognizer via text hints. For instance, a text hint can be used together with speech to better refine search queries.
  • The innovation can also take advantage of most any partial knowledge users may have about a business listing by letting them express their uncertainty in a simple, intuitive way. In simulation experiments conducted on actual voice search data, leveraging multi-modal refinement resulted in a 28% relative reduction in error rate. Providing text hints along with the spoken utterance resulted in an even greater relative reduction, with dramatic gains in recovery for each additional character.
  • As can be appreciated, according to market research, mobile devices are believed to be poised to rival desktop and laptop PCs as the dominant Internet platform, providing users with anytime, anywhere access to information. One common request for information is the telephone number or address of local businesses. Because perusing a large index of business listings can be a cumbersome affair using existing mobile text- and touch-based input mechanisms, directory assistance has been the focus of voice search applications, which utilize speech as the primary input modality. Unfortunately, mobile environments pose problems for speech recognition, even for native speakers. First, mobile settings often contain non-stationary noise which cannot be easily cancelled or filtered. Second, speakers tend to adapt to surrounding noise in acoustically unhelpful ways. Under such adverse conditions, task completion for voice search is less than stellar, especially in the absence of an effective correction user interface (UI) for dealing with speech recognition errors.
  • In light of the challenges of mobile voice search, the multi-modal search refinement system 102 can generate a series of user interfaces (UIs) which assist in refinement of search queries. As will be described with reference to the figures that follow, the multi-modal UIs tightly couple speech with touch and text (as well as gestures, etc.) in at least two directions; users can not only use touch and text to refine their queries whenever speech fails, but they can also use speech whenever text entry becomes burdensome. Essentially, the innovation can facilitate this tight coupling by transforming a typical n-best list, or a list of phrase alternates from the recognizer, into a palette of words with which users can compose and refine queries.
  • The innovation can also take advantage of any partial knowledge users may have about the words of the business listing. For example, a user may only remember that the listing starts with an “s” and also contains the word “avenue”. Likewise, the user may only remember “Saks something”, where the word “something” is used to express uncertainty about what words follow. While the word “something” is used in the aforementioned example, it is to be appreciated that most any desired word or indicator can be used without departing from the spirit/scope of the innovation and claims appended hereto. The innovation can represent this uncertainty as wildcards in an enhanced regular expression search of the listings, which exploits the popularity of the listings.
  • FIG. 2 illustrates a methodology of refining a search query in accordance with an aspect of the innovation. While, for purposes of simplicity of explanation, the one or more methodologies shown herein, e.g., in the form of a flow chart, are shown and described as a series of acts, it is to be understood and appreciated that the subject innovation is not limited by the order of acts, as some acts may, in accordance with the innovation, occur in a different order and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all illustrated acts may be required to implement a methodology in accordance with the innovation.
  • At 202, search query suggestion results are received. Further, the query suggestion results can be categorized, ordered or otherwise organized in most any manner using most any ranking algorithm or methodology. In accordance with the innovation, the search results are representative of a ‘bag of words’ or a ‘word palette.’ In other words, the words within the search results themselves are search terms that can be used to further organize, sort, filter or otherwise refine the results.
  • In one example, a search term can be selected at 204—it is to be understood that this act can include selection of multiple terms from the list of results. In one aspect, the selected search term(s) can be used for inclusion within a refined query. In yet another example, at 206, a word or set of words can be selected for exclusion such that the refinement will exclude any results that employ the excluded word or set of words.
  • The selection for inclusion or exclusion can employ most any suitable mechanism. For instance, the selection can be effected by way of a navigation device, touch screen, speech identification or the like. Additionally, it is to be understood that the selection (e.g., refinement criteria) can be placed upon or maintained in a separate location such as a “scratchpad.” In one example, the “scratchpad” can be the textbox in which the user may have entered text (e.g., if s/he utilized text), or it could be some other area suitable for the input modality. These and other conceivable examples are to be included within the scope of this disclosure and claims appended hereto.
  • At 208, the search term (or selected word(s)) can be supplemented, for example, by way of speech. Here, the selected word(s) can be combined with spoken words or phrases which define or further refine a search query. A decision is made at 210 to determine if additional refinement is desired. If so, the methodology returns to 204 as shown. If not, a refined set of search query suggestion results are received at 212 in view of the refined query. This methodology is repeatable until the user is satisfied with the refinement of the search query and subsequent results. The recursive characteristics of the methodology are illustrated by decision block 214.
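  • The following is a compact, hypothetical sketch of the refinement loop of FIG. 2, with the backend stubbed out as a simple list filter purely for illustration; the acts 202-214 described above are noted in the comments. The listing data and function name are assumptions.

```python
# Hypothetical backend stub: real systems retrieve from an indexed listing
# database; here a plain list filter stands in purely for illustration.
LISTINGS = ["first mutual bank", "first national bank", "source mutual bank"]

def suggest(include, exclude):
    """Return listings containing all included words and no excluded words."""
    return [l for l in LISTINGS
            if all(w in l.split() for w in include)
            and not any(w in l.split() for w in exclude)]

include, exclude = set(), set()
include |= {"mutual", "bank"}     # act 204: words selected for inclusion
exclude |= {"source"}             # act 206: word selected for exclusion
include |= {"first"}              # act 208: query supplemented via speech
print(suggest(include, exclude))  # act 212: refined suggestion results
# -> ['first mutual bank']
```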
  • Referring now to FIG. 3, an example block diagram of a query administration component 104 is shown. Generally, the query administration component 104 can include an analysis component 302 and a query refinement component 304. Together, these sub-components (302, 304) enable a user to use multi-modal mechanisms to refine search query suggestion results. In other words, as described above, the initial search results can be utilized as a word palette whereby users can refine the results (e.g., by selecting words for inclusion and/or exclusion).
  • The analysis component 302 is capable of providing instructions to the query refinement component 304 with regard to streamlining query suggestion results. In other words, the results can be streamlined by way of analysis of input characteristics as well as including and/or excluding terms. In other aspects, the words (or portions thereof) can be supplemented with verbally spoken cues. These spoken cues can essentially employ the text as word hints thereby refining the search query results as desired.
  • As shown in the example of FIG. 4, the query refinement component 304 can include a selection component 402, an exclusion component 404 and a query update component 406. Together, these sub-components enable search query refinement. As described above, search query suggestion results are parsed into a word palette where each of the words or segments of words can be used to comprehensively refine the results. In other words, rather than resubmitting a revised query to a search engine, the query refinement component 304 enables a user to employ the actual words included in an original set of results to drill down or further refine a set of results.
  • The selection component 402 enables a user to choose one or more words from a set (e.g., word palette) that represents the words from within a set of search results. In aspects, words can be selected using navigation devices such as a mouse, trackball, touchpad or the like. Additionally, navigational keys, verbal identification, gestures, etc. can be employed to select words. Once the words are selected, these words can be used to identify words for inclusion within a refined set of results.
  • Alternatively, the exclusion component 404 can be used to designate a specific word, or group of words, to be excluded from a refined set of results. In other words, once a word is designated as excluded, a refined set of results can be located thereby comprehensively adjusting the user's initial query. It will be understood that this functionality can be incorporated with a specialized search engine/mechanism. Alternatively, the features, functions and benefits of the innovation can be incorporated into, or used in connection with, conventional search engine/mechanisms.
  • The query update component 406 employs information and instructions received from the selection component 402 and/or the exclusion component 404 to refine the query resultant set. In addition to information received from the selection and/or exclusion components (402, 404), the query update component 406 can also receive multi-modal instructions directly from an entity or user. In aspects, selected (e.g., for inclusion) and/or excluded terms can be complemented by other text entries, speech entries, etc. thereby assisting a user in efficiently refining search results to obtain a set of meaningful results.
  • This disclosure is focused on three phases. First, the system 100 architecture is described and contrasted with the typical architecture of conventional voice search applications. The specification also details the backend infrastructure deployed onto a device for fast and efficient retrieval of the search query suggestion results. Second, the disclosure presents an example UI, highlighting its tightly coupled multi-modal refinement capabilities and support of partial knowledge with several user scenarios. Third, the system is evaluated by conducting simulation experiments examining the effectiveness of multi-modal refinement in recovering from speech errors on utterances collected from a previously deployed mobile voice search product.
  • Referring first to the example UI illustrated in FIG. 5, as shown, the innovation is capable of tightly coupling multiple modalities, for example, text, speech and touch. As shown, the UI enables a search for two words beginning with the letters ‘b’ and ‘n.’ In other words, the innovation infers wildcard suffixes for each of the two letters and returns results that contain matching content. As will be described below, this functionality can also be used to refine initial search results. While specific examples are shown and described, it is to be understood that these examples are provided to add perspective to the innovation and are not intended to limit the innovation in any manner. Rather, it is to be understood that additional (and countless) examples exist which are to be included within the scope of the innovation and claims appended hereto.
  • As shown in FIG. 5, a user can be presented with an n-best list of results in response to a search query for ‘b’ and ‘n.’ These results can be effectively understood as based upon an interpretation of ‘b*’ and ‘n*,’ wherein the asterisks represent wildcards, and a wildcard matches zero, one, or more arbitrary characters.
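  • For concreteness, a minimal sketch of this wildcard interpretation follows, assuming only the semantics stated above (an asterisk matches zero, one, or more arbitrary characters); anchoring and word-boundary details of the actual engine are omitted, and the listing data is hypothetical.

```python
import re

def wildcard_to_regex(query):
    """Translate a wildcard query such as 'b* n*' into a compiled regular
    expression; '*' matches zero, one, or more arbitrary characters."""
    return re.compile(".*".join(re.escape(part) for part in query.split("*")))

listings = ["bank of the north", "barnes and noble", "burger night", "pizza"]
pattern = wildcard_to_regex("b* n*")
print([l for l in listings if pattern.search(l)])
# -> ['bank of the north', 'barnes and noble', 'burger night']
```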
  • In contrast to typical search systems, the innovation leverages the use of the n-best list as a word palette such that results can be easily refined by way of selecting, excluding, parsing, or supplementing words (or portions thereof). The n-best list is essentially treated as a sort of word palette from which users can select those words that the recognizer heard correctly, though they may appear in a different phrase. For example, suppose a user says “home depot,” but because of background noise, the phrase does not occur in the n-best list. Suppose, however, that the phrase “home office design” does. With typical (or conventional) voice search applications, the user would have to start over.
  • In accordance with the innovation, the user can simply select the word “home” and invoke the backend, which finds the most popular listings that contain the word. In one aspect, the system can measure popularity by the frequency with which a business listing appears in automated directory assistance (ADA) call logs. In order to retrieve the most popular listings that contain a particular word or substring, regular expressions can be used. Because much of the effectiveness of the innovation's interface rests on its ability to retrieve listings using a wildcard query—or a regular expression query containing wildcards—a discussion follows that describes implementation of a RegEx engine followed by further details about wildcard queries constructed in the RegEx generator.
  • Turning now to a discussion of a RegEx engine, one example index data structure used for regular expression matching is based on k-best suffix arrays. Similar to traditional suffix arrays, k-best suffix arrays arrange all suffixes of the listings into an array. While traditional suffix arrays arrange the suffixes in lexicographical order only, k-best suffix arrays arrange the suffixes according to two alternating orders: a lexicographical ordering and an ordering based on a figure of merit such as popularity or preference (determined or inferred).
  • Because the k-best suffix array is sorted by both lexicographic order and popularity, it is a convenient structure for finding the most popular matches for a substring, especially when there are many matches. In an aspect, the k most popular matches can be found in time close to O(log N) for most practical situations, and with a worst case guarantee of O(sqrt N), where N is the number of characters in the listings. In contrast, a standard suffix array permits finding all matches to a substring in O(log N) time, but does not impose any popularity ordering on the matches. To find the most popular matches, the user would have to traverse them all.
  • Consider a simple example which explains why this subtle difference is important to the application. The standard suffix array may be sufficiently fast when searching for the k-best matches to a large substring since there will not be many matches to traverse in this case. The situation is, however, completely different for a short substring such as, for example, ‘a’. In this case, a user would have to traverse all dictionary entries containing an ‘a’, which is not much better than traversing all suffixes in the listings—in O(N) time. With a clever implementation, it is possible to continue a search in a k-best suffix array from the position at which it was previously stopped. A simple variation of k-best suffix matching will therefore allow looking up the k-best (e.g., most popular) matches for an arbitrary wildcard query, such as, for instance, ‘f* m* ban*’. The approach proceeds as the k-best suffix matching above for the largest substring without a wildcard (‘ban’). At each match, the innovation then evaluates the full wildcard query against the full listing entry for the suffix, and continues the search until k valid expansions to the wildcard query are found.
  • The k-best suffix array can also be used to exclude words in the same way by continuing the search until expansions without the excluded words are found. The query refinement is an iterative process, which gradually eliminates the wildcards in the text string. Whenever the largest substring in the wildcard query does not change between iterations, there is an opportunity to further improve the computational efficiency of the expansion algorithm. In this case, the k-best suffix matching can just be continued from the point where the previous iteration ended.
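  • The following simplified sketch illustrates the retrieval semantics described above: the largest literal substring of the wildcard query anchors a search over a lexicographically sorted suffix array, each hit is checked against the full wildcard query, excluded words are honored, and the k most popular survivors are returned. A true k-best suffix array interleaves the lexicographic and popularity orderings to achieve the stated time bounds; this sketch applies the popularity ordering after the fact and therefore demonstrates only the behavior, not the complexity. All names and data are illustrative.

```python
import bisect, heapq, re

def build_suffix_array(listings):
    """All suffixes of all listings, sorted lexicographically."""
    entries = [(text[j:], i)
               for i, text in enumerate(listings)
               for j in range(len(text))]
    entries.sort()
    return entries

def k_best_matches(entries, listings, popularity, query, k, excluded=()):
    # The largest literal substring of the wildcard query anchors the search.
    anchor = max(query.split("*"), key=len).strip()
    full = re.compile(".*".join(map(re.escape, query.split("*"))))
    lo = bisect.bisect_left(entries, (anchor,))
    hits = set()
    for suffix, i in entries[lo:]:
        if not suffix.startswith(anchor):
            break  # left the contiguous block of suffixes matching the anchor
        text = listings[i]
        # Evaluate the full wildcard query against the whole listing entry,
        # honoring any excluded words ("refinement by exclusion").
        if full.search(text) and not any(w in text.split() for w in excluded):
            hits.add(i)
    return heapq.nlargest(k, hits, key=lambda i: popularity[i])

listings = ["first mutual bank", "farmers market", "f m bandwidth shop"]
popularity = [90, 60, 5]
sa = build_suffix_array(listings)
best = k_best_matches(sa, listings, popularity, "f* m* ban*", k=2)
print([listings[i] for i in best])
# -> ['first mutual bank', 'f m bandwidth shop']
```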
  • With an efficient k-best suffix array matching algorithm in hand for the RegEx engine, it can be deployed, for example, directly onto a mobile device in order to avoid the latencies associated with sending information back and forth along a wireless data channel. It will be appreciated that speech recognition for ADA already takes several seconds to return an n-best list; it is desirable to provide short latencies for wildcard queries. While many of the examples described herein are directed to ADA, it is to be understood that other aspects exist, for example, general Internet search, without departing from the spirit and/or scope of the innovation and claims appended hereto.
  • Turning now to a discussion of an IR (information retrieval) or search engine: besides wildcard queries, which provide exact matches to the listings, it is useful to also retrieve approximate matches to the listings. For at least this purpose, the innovation can implement an IR engine based on an improved term frequency-inverse document frequency (TFIDF) algorithm. What is important to note about the IR engine is that it can treat queries and listings as bags of words. This is advantageous when users either incorrectly remember the order of words in a listing, or add words that do not actually appear in the listing. This is not the case for the RegEx engine, where order and the presence of suffixes in the query matter.
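  • A minimal bag-of-words TFIDF sketch follows; the patented IR engine uses an improved variant, so this only illustrates why word order does not affect approximate retrieval. The particular smoothing is an assumption for illustration.

```python
import math
from collections import Counter

def tfidf_scores(query, listings):
    """Score each listing against the query as unordered bags of words."""
    docs = [Counter(listing.split()) for listing in listings]
    n = len(docs)
    def idf(word):
        df = sum(1 for d in docs if word in d)
        return math.log((n + 1) / (df + 1)) + 1  # smoothed IDF (assumed form)
    q = Counter(query.split())
    return [sum(q[w] * d[w] * idf(w) ** 2 for w in q) for d in docs]

listings = ["black angus restaurant", "angus black restaurant", "pizza hut"]
print(tfidf_scores("restaurant black angus", listings))
# the two word-order variants score identically; 'pizza hut' scores 0.0
```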
  • Referring now to the RegEx generator, returning to the example in which a user selects the word “home” for “home depot” from the word palette, once the user invokes the backend, the word is sent as a query to a RegEx generator which transforms it into a wildcard query. For single phrases, the generator can simply insert wildcards before spaces, as well as to the end of the entire query. For example, for the query “home”, the generator could produce the regular expression “home*”.
  • For a list of phrases, such as an n-best list from a speech recognizer, the query refinement component 304 can generate a wildcard query using minimal edit distance (with equal edit operation costs) to align the phrases at the word level. Once words are aligned, minimal edit distance is again applied to align the characters. Whenever there is disagreement between any aligned words or characters, a wildcard can be substituted in its place. For example, for an n-best list containing the phrases “home depot” and “home office design,” the RegEx generator would produce “home* de*”. After an initial query is formulated, the RegEx generator applies heuristics to clean up the regular expression (e.g., no word would have more than one wildcard) before it is used to retrieve k-best matches from the RegEx engine. The RegEx generator (or query refinement component 304) is invoked in this form whenever speech is utilized, such as for leveraging partial knowledge, as will be discussed below.
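  • The following sketch reconstructs the example above under stated assumptions: word-level alignment is computed by dynamic programming in which substituting two words costs their character-level edit distance and skipping a word costs its length, so that “home depot” aligned with “home office design” yields “home* de*”. These cost choices, and the dropping of unmatched words, are assumptions chosen so the document's example reproduces; the actual generator's tie-breaking and cleanup heuristics are not specified here. (For a single phrase, the generator simply appends a wildcard to each word, e.g., “home depot” becomes “home* depot*”.)

```python
def edit_distance(a, b):
    """Plain Levenshtein distance with equal operation costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def common_prefix(a, b):
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def align_words(ws1, ws2):
    """Word-level alignment: substituting two words costs their character
    edit distance; skipping a word costs its length (assumed gap cost)."""
    n, m = len(ws1), len(ws2)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + len(ws1[i - 1])
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + len(ws2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + edit_distance(ws1[i - 1], ws2[j - 1]),
                cost[i - 1][j] + len(ws1[i - 1]),
                cost[i][j - 1] + len(ws2[j - 1]))
    pairs, i, j = [], n, m  # traceback, keeping only aligned word pairs
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + edit_distance(ws1[i - 1], ws2[j - 1]):
            pairs.append((ws1[i - 1], ws2[j - 1]))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + len(ws1[i - 1]):
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def wildcard_query(phrase1, phrase2):
    """Agreement keeps the word; disagreement keeps the common prefix.
    Every kept token receives a trailing wildcard."""
    tokens = []
    for w1, w2 in align_words(phrase1.split(), phrase2.split()):
        tokens.append((w1 if w1 == w2 else common_prefix(w1, w2)) + "*")
    return " ".join(tokens)

print(wildcard_query("home depot", "home office design"))  # -> 'home* de*'
```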
  • As discussed above, the innovation displays an n-best list to the user, presenting a UI that appears, at least at first blush, similar to most any other voice search application. However, in accordance with the innovation, users may select words or phrases (or portions of words or phrases) from the list of choices, provided the desired content exists among those choices. Because re-speaking does not generally increase the likelihood that the utterance will be recognized correctly, and furthermore, because mobile usage poses distinct challenges not encountered in desktop settings, the interface also endows users with a larger arsenal of recovery strategies.
  • The following user scenarios are highlighted to demonstrate at least two concepts: first, tight coupling of speech with touch and text, so that whenever one of the three modalities fails or becomes burdensome, users may switch to another modality in a complementary way. Second, the scenarios illustrate the ability to leverage any partial knowledge a user may have about constituent words of their intended query.
  • Turning to a discussion of refinement using the word palette, FIGS. 6 a-e illustrate how users can leverage the word palette discussed in the previous section for multi-modal refinement. Suppose a user utters “first mutual bank” (FIG. 6 a). The system returns an n-best list that unfortunately does not include the intended utterance. It will be appreciated that a number of factors can contribute to the incorrect interpretation, for example, background noise, inadequacy of the voice recognition application/functionality, lack of clarity in spoken words/phrases, etc.
  • As shown, it is possible that the n-best list does include parts of the utterance in the choice “2. source mutual bank.” As such, the user can now select the word “mutual” (FIG. 6 b) and then “bank” (FIG. 6 c), which get added to the query textbox in the order selected. The textbox functions as a scratch pad upon which users can add and edit words until they click the search button (or other trigger) on the top left-hand side (FIG. 6 d) to refine the query. At this point, the query in the textbox is submitted to the backend which retrieves a new set of results containing both exact matches of the query from the RegEx engine as well as approximate matches from the IR engine. This new result list of query suggestions can appear or otherwise be presented in the same manner as the initial n-best list except that words matching the query are highlighted in red (or in another identifying or highlighting manner). Given that the intended query is now among the list of choices, the user simply selects the choice and is finished (FIG. 6 e).
  • As stated supra, and illustrated in FIGS. 7 a-c, the innovation supports refinement with text hints. Just as users can resort to touch and text when speech fails, they can also resort to speech whenever typing becomes burdensome, or when they feel they have provided enough text hints for the recognizer to identify their query. FIGS. 7 a-c illustrate how text hints can be leveraged.
  • Here, as shown in FIG. 7 a, the user starts typing “m” for the intended query “mill creek family practice,” but because the query is too long to type, the user utters the intended query after pressing the ‘Refine’ soft key button at the bottom of the screen (FIG. 7 b). After the query returns from the backend, all choices in the list now start with an “m” and indeed include the user utterance (FIG. 7 c).
  • The innovation can achieve this functionality by first converting the text hint in the textbox into a wildcard query and then using that query to filter the n-best list from the speech recognizer as well as to retrieve additional matches from the RegEx engine. In principle, the text hint could also be used to bias the recognition of the utterance within the speech engine itself.
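  • A small sketch of this text-hint filtering follows, under the assumption that each typed token is treated as a word prefix; the function names and n-best list are illustrative.

```python
import re

def hint_to_pattern(hint):
    """Each typed token becomes a word prefix: 'm' -> 'm*' -> r'm.*'."""
    wild = " ".join(tok + "*" for tok in hint.split())
    return re.compile(".*".join(map(re.escape, wild.split("*"))))

def filter_n_best(hint, n_best):
    """Keep only recognizer hypotheses consistent with the text hint."""
    pattern = hint_to_pattern(hint)
    return [phrase for phrase in n_best if pattern.match(phrase)]

n_best = ["bill creek family practice",
          "mill creek family practice",
          "will crease fame lee practice"]
print(filter_n_best("m", n_best))
# -> ['mill creek family practice']
```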
  • Turning to a discussion of refinement by exclusion, in certain situations, users may invoke the backend when they think there is sufficient information to retrieve their desired query in one pass, but find that their query does not show up among the choices, perhaps due to lack of popularity. Typically, users would have to provide more information and try again. Contrary to conventional approaches, the innovation supports this but also adds the ability to exclude words from retrieved results (see the RegEx discussion above for details). For example, in FIG. 8 a, the user is looking for “pure networks” so he types “p n” and invokes the backend. The most popular exact matches to the regular expression “p* n*” are retrieved and displayed to the user. Seeing that none of the choices contain the intended query, or even part of the query, the user selects “None of the above” (FIG. 8 b). This tells the backend to retrieve more results for the query “p* n*”, but to exclude all listings that contain words from the previous query suggestions, such as “princess” and “northwest.”
  • As such, the system not only returns more results to the user, this time containing the correct query (FIG. 8 c), but also creates a new tab to hold all the excluded words (FIG. 8 d). If the user has accidentally excluded a word, they can peruse the Excluded Words tab (FIG. 8 d) and select the word to remove it from the tab, which will then bring up a new set of choices.
  • Note that if the user observes that part of the intended query shows up among the choices, similar to the word palette scenario, the user can select the word and the UI will fill in whatever query part matches it. For example, suppose the user was looking for “pacific networks.” If the user selects the word “pacific” in “3. pacific northwest ballet,” the query in the textbox will change from “p n” to “pacific n,” (FIG. 8 e), at which point the user may choose to invoke the backend.
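  • This fill-in behavior can be sketched as follows, assuming query tokens act as implicit prefix wildcards; the helper name is hypothetical.

```python
def fill_in(query, selected_word):
    """Replace the first query token that prefixes the selected palette
    word with the full word (e.g., 'p n' + 'pacific' -> 'pacific n')."""
    tokens = query.split()
    for i, tok in enumerate(tokens):
        if selected_word.startswith(tok.rstrip("*")) and tok != selected_word:
            tokens[i] = selected_word
            break
    return " ".join(tokens)

print(fill_in("p n", "pacific"))  # -> 'pacific n'
```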
  • With reference to leveraging partial knowledge, sometimes users may not remember exactly the name of the listing they are looking for, but only parts of it. For example, they may remember that the listing starts with “pacific” and some word thereafter starts with an “n.” The previous user scenario shows that the innovation UI supports this kind of search with text. In this scenario, the innovation demonstrates that the interface also enables this kind of search with speech. In FIG. 9, the user is looking for “black angus restaurant” but only remembers that the first word is “black.” Here, the user can simply say, “black ‘something’ restaurant” (FIG. 9 a). Because there is no “black something restaurant” in the listings, the innovation will automatically convert the “something” into a wildcard and retrieve exact matches from the RegEx Engine along with approximate matches from the IR Engine (FIG. 9 b). Now, the query appears among the choices and the user simply selects the appropriate choice and is finished (FIG. 9 c).
  • In order to support the recognition of “something” expressions of uncertainty, the innovation adjusts the statistical language model to allow for transitions to the word “something” before and after every word in the training sentences as a bigram. Business listings that actually contain the word “something” were few and far between, and were appropriately tagged so as not to generate a wildcard during inverse text normalization of the recognized result. The innovation can also transform the training sentences into one-character prefixes so that it can support partial knowledge queries such as “b something angus” for “b* angus” (FIGS. 9 d-e).
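  • One plausible normalization of “something” expressions into wildcards is sketched below; the exact wildcard placement used during inverse text normalization is an assumption here, chosen so that “b something angus” yields “b* angus” as in the example above.

```python
def utterance_to_wildcard_query(utterance):
    """Fold a spoken 'something' into a wildcard on the preceding token;
    a leading 'something' becomes a bare wildcard (assumed convention)."""
    tokens = []
    for word in utterance.split():
        if word == "something":
            if tokens and not tokens[-1].endswith("*"):
                tokens[-1] += "*"   # "b something" -> "b*"
            else:
                tokens.append("*")  # leading "something" -> bare wildcard
        else:
            tokens.append(word)
    return " ".join(tokens)

print(utterance_to_wildcard_query("b something angus"))
# -> 'b* angus'
print(utterance_to_wildcard_query("black something restaurant"))
# -> 'black* restaurant'
```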
  • The innovation interface can be referred to as “taming” speech recognition errors with a multi-modal interface. Although the innovation was designed with mobile voice search in mind, in certain situations, it may make sense to exploit richer gestures other than simply selecting via touch or d-pad. For example, users could use gestures to separate words that they want in their query from those they wish to exclude. It is to be understood that the features, functions and benefits of the innovation can be applied to most any search scenario, including desktop based search, without departing from the spirit and/or scope of the innovation and claims appended hereto.
  • The following research results are included to provide perspective as to the usefulness of the innovation—these research results are not intended to limit the innovation in any manner. Apart from switching modalities, a fair amount of research has been devoted to simultaneous multi-modal disambiguation. In accordance with the innovation, text hints could be construed as a way of fusing speech and text, though technically, the text could bias the internal processing of the speech recognizer.
  • In order to assess the effectiveness of the subject innovation in recovering from speech recognition errors, simulation experiments were conducted on utterances collected from a deployed mobile voice search product, namely Microsoft Live Search Mobile, which provides not only ADA but also maps, driving directions, movie times and local gas prices. Besides capturing the difficult acoustic conditions inherent in mobile environments, the collected utterances also represent a random sampling of speaker accents, speaker adaptation to surrounding noise, and even the variable recording quality of different mobile devices.
  • 2317 local-area utterances were collected which had been transcribed by a professional transcription service and filtered to remove noise-only and yes-no utterances. The utterances were systematically collected to cover all days of the week as well as times in the day. For all of the simulation experiments, the utterances were submitted to a speech server which utilized the same acoustic and language models as Live Search Mobile. Of the 2317 utterances, the transcription appeared in the top position of the n-best list 72.0% of the time, and somewhere in the n-best list 80.0% of the time, where again n was limited to 8 (the number of readable choices displayable on a standard pocket PC (personal computer) form factor). As summarized in Table 1 below, in 20% of the utterances, the transcription did not appear at all in the n-best list. These failure cases constituted an opportunity for recovering from error, given that the innovation performs the same as the existing product for the other 80% of the cases.
  • TABLE 1
    A breakdown of the simulation test data.

    Case                            Frequency    Percentage
    Top 1 High Conf (Bull's Eye)    545          24%
    Top 1 Med + Low Conf            1125         48%
    Top N                           183          8%
    All Wrong                       464          20%
    Total                           2317         100%
  • With regard to refinement with the word palette, looking at just the failure cases, the first set of experiments examined how much recovery rate could be gained by treating the n-best list as a word palette and allowing users to select words. Although the interface itself enables users to also edit their queries by inserting or deleting characters, this was not permitted for the experiments. Words in the n-best list that matched the transcription were always selected in the proper word order. For example, if the transcription was “black angus restaurant,” “black” was selected from the n-best list first before selecting either “angus” or “restaurant.” Furthermore, as many words from the transcription as could be found in the word palette were selected. For instance, although just “black” could have been submitted as a query in the previous example, because “angus” could also be found in the n-best list, it was included.
  • As shown in FIG. 10, words in the n-best list alone (without supplementary matches from the backend) covered the full transcription 4.31% of the time. Note that full coverage constitutes recovery from the error since the transcription can be completely built up word by word from the n-best list. In 24.6% of the cases, only part of the transcription could be covered by the n-best list (not shown in FIG. 10), in which case another query would need to be submitted to get a new list. If the n-best list is supplemented with matches from the backend, the recovery rate improves to 14.4%, a factor of 3.4 over using the n-best list alone as a word palette. For the transcriptions that were only partially covered by the n-best list, if the backend is queried using whatever words could be found, the recovery rate jumps to 25.2%. If the n-best list, padded with supplementary matches, is used to submit a query, the recovery rate is 28.5% (which is also the relative error reduction with respect to the entire data set). This is 5.9 times the recovery rate of using just the n-best list as a word palette.
  • Turning to text hints with speech, before examining the effect of providing text hints on the recovery rate, it was first considered, as a baseline, how well the innovation could recover from an error by just retrieving the top 8 most popular listings from the index. This is shown in FIG. 11 in the 0-character column. Surprisingly, guessing the top 8 most popular listings resulted in a recovery rate of 14.4%. FIG. 7 a shows these listings as a default list when starting, which includes the general category “pizza” as well as popular stores such as “wal-mart” and “home depot.” The implication of such a high baseline is discussed below.
  • In applying text hints for the experiment, a simple left-to-right assignment of prefixes was used for generating a wildcard query, which proceeded as follows: given m characters to assign, a character is assigned to the prefix of each word in the transcription. If there are still characters left to assign, the assignment loops back to the beginning word of the transcription and continues. For example, for a 3-character text hint, if the transcription contained three words such as “black angus restaurant”, prefix characters would be assigned for all three words; namely, “b* a* r*”. If, on the other hand, the transcription was “home depot,” the assignment would loop back to the first word so that the generated wildcard query would be “ho* d*”. After generating a wildcard query for the text hint from the transcription, it was used to filter the n-best list obtained from submitting the transcription utterance to the speech server. If there were enough list items after the filtering, the list was supplemented as described in the Supplement generator description above.
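  • The assignment just described is direct enough to sketch verbatim; the helper name is illustrative.

```python
def text_hint_query(transcription, m):
    """Deal out m hint characters one per word, left to right, wrapping
    back to the first word; each accumulated prefix gets a wildcard."""
    words = transcription.split()
    counts = [0] * len(words)
    for k in range(m):
        counts[k % len(words)] += 1
    return " ".join(w[:c] + "*" for w, c in zip(words, counts) if c > 0)

print(text_hint_query("black angus restaurant", 3))  # -> 'b* a* r*'
print(text_hint_query("home depot", 3))              # -> 'ho* d*'
```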
  • When a 1-character text hint is used along with the spoken utterance, as shown in FIG. 11, the recovery rate jumps to almost 50%, with 16.8% of the transcriptions appearing in the top position of the result list. That is 3.4 times better than guessing the listing using no text hints. As more and more characters are used in the text hint, the recovery rate climbs to as high as 92.7% for 3 characters.
  • It will be understood that users in the test data oftentimes asked for popular listings. As such, because the backend utilizes popularity to retrieve k-best matches, the correct answer was frequently obtained. This may be because users have found that low-popularity listings do not get recognized as well as high-popularity listings, so they do not even bother with those. Or it may be that popular listings are precisely popular because they get asked for frequently. In any case, by providing users with a multi-modal interface that facilitates a richer set of recovery strategies, they may be encouraged to try unpopular queries as well as popular ones.
  • In this disclosure, a multi-modal interface system that can be used for voice search applications (e.g., mobile voice search) is presented. This innovation not only facilitates touch and text refinement whenever speech fails (or accuracy is compromised), but also allows users to assist the recognizer via text hints. The innovation can also take advantage of any partial knowledge users may have about their queries by letting them express their uncertainty through “something” expressions. Also discussed was an example overall architecture and details of how the innovation can quickly retrieve exact and approximate matches to the listings from the backend. Finally, in evaluating the innovation via simulation experiments conducted on real mobile voice search data, it was found that leveraging multi-modal refinement using the word palette resulted in a 28% relative reduction in error rate. Furthermore, providing text hints along with a spoken utterance resulted in dramatic gains in recovery rate, though this should be qualified by noting that users in the test data tended to ask for popular listings, which could be retrieved quickly.
  • As described supra, in mobile device aspects, voice search applications encourage users to “just say what you want” in order to obtain useful mobile content such as automated directory assistance (ADA). Unfortunately, when users only remember part of what they are looking for, they are forced to guess, even though what they know may be sufficient to retrieve the desired information. In this disclosure, it is proposed to expand the capabilities of voice search to allow users to explicitly express their uncertainties as part of their queries, and as such, to provide partial knowledge. Applied to ADA, the disclosure highlights the enhanced user experience that uncertain expressions afford and delineates how to perform language modeling and information retrieval. As described in detail above, the approach is evaluated by assessing its impact on overall ADA performance and by discussing the results of an experiment in which users generated both uncertain expressions and guesses for directory listings. Uncertain expressions reduced relative error rate by 31.8% compared to guessing.
  • Referring now to FIG. 12, there is illustrated a block diagram of a computer operable to execute the disclosed architecture. In order to provide additional context for various aspects of the subject innovation, FIG. 12 and the following discussion are intended to provide a brief, general description of a suitable computing environment 1200 in which the various aspects of the innovation can be implemented. While the innovation has been described above in the general context of computer-executable instructions that may run on one or more computers, those skilled in the art will recognize that the innovation also can be implemented in combination with other program modules and/or as a combination of hardware and software.
  • Generally, program modules include routines, programs, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the inventive methods can be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which can be operatively coupled to one or more associated devices.
  • The illustrated aspects of the innovation may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules can be located in both local and remote memory storage devices.
  • A computer typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media can comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
  • Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
  • With reference again to FIG. 12, the exemplary environment 1200 for implementing various aspects of the innovation includes a computer 1202, the computer 1202 including a processing unit 1204, a system memory 1206 and a system bus 1208. The system bus 1208 couples system components including, but not limited to, the system memory 1206 to the processing unit 1204. The processing unit 1204 can be any of various commercially available processors. Dual microprocessors and other multi-processor architectures may also be employed as the processing unit 1204.
  • The system bus 1208 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. The system memory 1206 includes read-only memory (ROM) 1210 and random access memory (RAM) 1212. A basic input/output system (BIOS) is stored in a non-volatile memory 1210 such as ROM, EPROM, EEPROM, which BIOS contains the basic routines that help to transfer information between elements within the computer 1202, such as during start-up. The RAM 1212 can also include a high-speed RAM such as static RAM for caching data.
  • The computer 1202 further includes an internal hard disk drive (HDD) 1214 (e.g., EIDE, SATA), which internal hard disk drive 1214 may also be configured for external use in a suitable chassis (not shown), a magnetic floppy disk drive (FDD) 1216 (e.g., to read from or write to a removable diskette 1218) and an optical disk drive 1220 (e.g., to read a CD-ROM disk 1222 or to read from or write to other high-capacity optical media such as a DVD). The hard disk drive 1214, magnetic disk drive 1216 and optical disk drive 1220 can be connected to the system bus 1208 by a hard disk drive interface 1224, a magnetic disk drive interface 1226 and an optical drive interface 1228, respectively. The interface 1224 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies. Other external drive connection technologies are within contemplation of the subject innovation.
  • The drives and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For the computer 1202, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to a HDD, a removable magnetic diskette, and a removable optical media such as a CD or DVD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and further, that any such media may contain computer-executable instructions for performing the methods of the innovation.
  • A number of program modules can be stored in the drives and RAM 1212, including an operating system 1230, one or more application programs 1232, other program modules 1234 and program data 1236. All or portions of the operating system, applications, modules, and/or data can also be cached in the RAM 1212. It is appreciated that the innovation can be implemented with various commercially available operating systems or combinations of operating systems.
  • A user can enter commands and information into the computer 1202 through one or more wired/wireless input devices, e.g., a keyboard 1238 and a pointing device, such as a mouse 1240. Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, touch screen, or the like. These and other input devices are often connected to the processing unit 1204 through an input device interface 1242 that is coupled to the system bus 1208, but can be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, etc.
  • A monitor 1244 or other type of display device is also connected to the system bus 1208 via an interface, such as a video adapter 1246. In addition to the monitor 1244, a computer typically includes other peripheral output devices (not shown), such as speakers, printers, etc.
  • The computer 1202 may operate in a networked environment using logical connections via wired and/or wireless communications to one or more remote computers, such as a remote computer(s) 1248. The remote computer(s) 1248 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 1202, although, for purposes of brevity, only a memory/storage device 1250 is illustrated. The logical connections depicted include wired/wireless connectivity to a local area network (LAN) 1252 and/or larger networks, e.g., a wide area network (WAN) 1254. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, e.g., the Internet.
  • When used in a LAN networking environment, the computer 1202 is connected to the local network 1252 through a wired and/or wireless communication network interface or adapter 1256. The adapter 1256 may facilitate wired or wireless communication to the LAN 1252, which may also include a wireless access point disposed thereon for communicating with the wireless adapter 1256.
  • When used in a WAN networking environment, the computer 1202 can include a modem 1258, be connected to a communications server on the WAN 1254, or have other means for establishing communications over the WAN 1254, such as by way of the Internet. The modem 1258, which can be internal or external and a wired or wireless device, is connected to the system bus 1208 via the input device interface 1242. In a networked environment, program modules depicted relative to the computer 1202, or portions thereof, can be stored in the remote memory/storage device 1250. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
  • The computer 1202 is operable to communicate with any wireless devices or entities operatively disposed in wireless communication, e.g., a printer, scanner, desktop and/or portable computer, portable data assistant, communications satellite, any piece of equipment or location associated with a wirelessly detectable tag (e.g., a kiosk, news stand, restroom), and telephone. This includes at least Wi-Fi and Bluetooth™ wireless technologies. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices.
  • Wi-Fi, or Wireless Fidelity, allows connection to the Internet from a couch at home, a bed in a hotel room, or a conference room at work, without wires. Wi-Fi is a wireless technology similar to that used in a cell phone that enables such devices, e.g., computers, to send and receive data indoors and out, anywhere within the range of a base station. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks operate in the unlicensed 2.4 and 5 GHz radio bands, at an 11 Mbps (802.11b) or 54 Mbps (802.11a) data rate, for example, or with products that contain both bands (dual band), so the networks can provide real-world performance similar to the basic 10BaseT wired Ethernet networks used in many offices.
  • Referring now to FIG. 13, there is illustrated a schematic block diagram of an exemplary computing environment 1300 in accordance with the subject innovation. The system 1300 includes one or more client(s) 1302. The client(s) 1302 can be hardware and/or software (e.g., threads, processes, computing devices). The client(s) 1302 can house cookie(s) and/or associated contextual information by employing the innovation, for example.
  • The system 1300 also includes one or more server(s) 1304. The server(s) 1304 can also be hardware and/or software (e.g., threads, processes, computing devices). The servers 1304 can house threads to perform transformations by employing the innovation, for example. One possible communication between a client 1302 and a server 1304 can be in the form of a data packet adapted to be transmitted between two or more computer processes. The data packet may include a cookie and/or associated contextual information, for example. The system 1300 includes a communication framework 1306 (e.g., a global communication network such as the Internet) that can be employed to facilitate communications between the client(s) 1302 and the server(s) 1304.
  • Communications can be facilitated via a wired (including optical fiber) and/or wireless technology. The client(s) 1302 are operatively connected to one or more client data store(s) 1308 that can be employed to store information local to the client(s) 1302 (e.g., cookie(s) and/or associated contextual information). Similarly, the server(s) 1304 are operatively connected to one or more server data store(s) 1310 that can be employed to store information local to the servers 1304.
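  • As a rough illustrative sketch of the packet-based client/server exchange described above (not part of the disclosed system; all names below are hypothetical), a data packet carrying a cookie and associated contextual information across the communication framework 1306 could be modeled as follows:

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical model of the data packet described above: a cookie plus
# associated contextual information exchanged between client and server.
@dataclass
class DataPacket:
    cookie: str                                   # session identifier housed by the client
    context: dict = field(default_factory=dict)   # associated contextual information

    def serialize(self) -> bytes:
        """Encode the packet for transmission over the communication framework."""
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def deserialize(raw: bytes) -> "DataPacket":
        """Decode a packet received from the other endpoint."""
        return DataPacket(**json.loads(raw.decode("utf-8")))

# Example exchange: the client sends its cookie and context; the server reads them.
packet = DataPacket(cookie="session-42", context={"locale": "en-US", "query": "pizza"})
wire = packet.serialize()                 # bytes sent through the communication framework
received = DataPacket.deserialize(wire)   # server-side reconstruction
assert received.cookie == "session-42"
```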
  • What has been described above includes examples of the innovation. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the subject innovation, but one of ordinary skill in the art may recognize that many further combinations and permutations of the innovation are possible. Accordingly, the innovation is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.

Claims (20)

1. A system that facilitates multi-modal search query refinement, comprising:
a query administration component that employs a plurality of modalities to refine a list of query suggestion results into a regular expression query; and
a search query suggestion engine component that evaluates the regular expression query and renders a list of refined query suggestion results as a function of the evaluation.
2. The system of claim 1, wherein the regular expression query includes at least one wildcard.
3. The system of claim 1, wherein the list of query suggestion results includes a list of best candidate matches that represent a word palette that enables individual string selection, where the string can be a word or part of a word.
4. The system of claim 1, further comprising:
an analysis component that evaluates the regular expression query; and
a query generation component that renders a list of refined query suggestion results.
5. The system of claim 1, further comprising a selection component that enables a user to choose at least one word from the list of query suggestion results, wherein the selection facilitates refinement of the regular expression query where refinement can include any deletion, substitution, or addition of characters to the original query, and wherein the refinement is maintained in a separate area as a “scratchpad.”
6. The system of claim 5, wherein the selection is accomplished by way of a drag/drop procedure.
7. The system of claim 5, wherein at least a portion of the selection identifies an exclusion used as a parameter to establish the list of refined query suggestion results.
8. The system of claim 1, wherein a k-best suffix-array with exclusion constraints is used to generate the list of refined query suggestion results.
9. The system of claim 1, wherein the plurality of modalities includes at least two of text, touch, speech and gesture.
10. The system of claim 1, wherein the list of refined query suggestion results includes an n-best list or alternates list from a speech recognizer as well as a list of supplementary results that includes at least one of an ‘exact’ match via a wildcard expression or an ‘approximate’ match via information retrieval algorithms.
11. The system of claim 10, wherein at least part of the n-best list obtained from the speech recognizer is submitted as a query to an information retrieval algorithm that is indifferent to the order of words in the regular expression query.
12. The system of claim 1, wherein the query administration component employs user generated text to constrain speech recognition upon generating the regular expression query.
13. The system of claim 1, further comprising an artificial intelligence (AI) component that employs at least one of a probabilistic and a statistical-based analysis that infers an action that a user desires to be automatically performed.
14. A computer-implemented method of search refinement, comprising:
receiving a selection related to a plurality of words in a set of query suggestion results, wherein the selection defines a refinement of an original query;
establishing a regular expression query based upon a subset of the selection; and
rendering a plurality of refined query suggestion results based upon the regular expression query.
15. The computer-implemented method of claim 14, wherein the selection is effectuated by at least one of text, speech, touch or gesture and wherein the refinement is maintained upon a “scratchpad.”
16. The computer-implemented method of claim 14, further comprising excluding a subset of words in the set of results, wherein the excluded words define a parameter for the plurality of results.
17. The computer-implemented method of claim 14, further comprising:
receiving an input that supplements the selection;
converting a portion of the input into a wildcard; and
retrieving a subset of the plurality of refined query suggestion results based upon the wildcard.
18. A computer-executable system of refining search queries, comprising:
means for rendering query suggestion results as a plurality of selectable words;
means for choosing a subset of the plurality of selectable words; and
means for refining the query suggestion results based at least in part upon the chosen subset of selectable words.
19. The computer-executable system of claim 18, further comprising:
means for designating at least a portion of the chosen subset of selectable words as exclusions, wherein the exclusions are employed to retrieve the refined query suggestion results.
20. The computer-executable system of claim 19, wherein the means for choosing comprises a drag/drop procedure for selecting each of the subset of the plurality of selectable words.
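To make the claimed refinement loop concrete, the following Python sketch gives one simplified reading of it: strings selected from the word palette (claim 3) are compiled into a regular expression query in which fragments become wildcards (claims 1, 2 and 17), excluded words constrain the result list (claims 7 and 16), and a k-best cut of the matching suggestions is returned (claim 8). All names and data are hypothetical illustrations rather than the patented implementation, and the linear scan stands in for the k-best suffix-array with exclusion constraints that claim 8 recites:

```python
import re

def build_regex_query(selected_strings, wildcard="*"):
    r"""Compile scratchpad selections into a regular expression query.

    Each selected string may be a whole word or a fragment marked with a
    trailing wildcard, e.g. ["pizza", "del*"] -> r"pizza\s+del\w*".
    Illustrative sketch only; not the patent's actual implementation.
    """
    parts = []
    for s in selected_strings:
        if s.endswith(wildcard):
            parts.append(re.escape(s[:-len(wildcard)]) + r"\w*")
        else:
            parts.append(re.escape(s))
    return r"\s+".join(parts)

def refine_suggestions(suggestions, selected_strings, excluded_words=(), k=5):
    """Return a k-best list of refined query suggestion results.

    Keeps suggestions that match the regular expression query and drops any
    containing an excluded word. A real system would consult a k-best
    suffix-array with exclusion constraints instead of this linear scan.
    """
    pattern = re.compile(build_regex_query(selected_strings), re.IGNORECASE)
    refined = [
        s for s in suggestions
        if pattern.search(s)
        and not any(x.lower() in s.lower().split() for x in excluded_words)
    ]
    return refined[:k]

# Hypothetical word palette drawn from a speech recognizer's n-best list.
suggestions = [
    "pizza delivery near me",
    "pizza delicatessen downtown",
    "pizza dough recipe",
]
print(refine_suggestions(suggestions, ["pizza", "del*"], excluded_words=["downtown"]))
# -> ['pizza delivery near me']
```

Running the example refines the palette to a single suggestion: the fragment “del*” behaves as a wildcard, and the exclusion “downtown” removes the delicatessen result, mirroring how selections and exclusions jointly parameterize the refined list.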
US12/200,584 2008-05-14 2008-08-28 Multi-modal query refinement Abandoned US20090287680A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/200,584 US20090287680A1 (en) 2008-05-14 2008-08-28 Multi-modal query refinement

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US5321408P 2008-05-14 2008-05-14
US12/200,584 US20090287680A1 (en) 2008-05-14 2008-08-28 Multi-modal query refinement

Publications (1)

Publication Number Publication Date
US20090287680A1 true US20090287680A1 (en) 2009-11-19

Family

ID=41317081

Family Applications (3)

Application Number Title Priority Date Filing Date
US12/200,584 Abandoned US20090287680A1 (en) 2008-05-14 2008-08-28 Multi-modal query refinement
US12/200,648 Abandoned US20090287626A1 (en) 2008-05-14 2008-08-28 Multi-modal query generation
US12/200,625 Active 2030-01-29 US8090738B2 (en) 2008-05-14 2008-08-28 Multi-modal search wildcards

Family Applications After (2)

Application Number Title Priority Date Filing Date
US12/200,648 Abandoned US20090287626A1 (en) 2008-05-14 2008-08-28 Multi-modal query generation
US12/200,625 Active 2030-01-29 US8090738B2 (en) 2008-05-14 2008-08-28 Multi-modal search wildcards

Country Status (1)

Country Link
US (3) US20090287680A1 (en)

Cited By (71)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100191520A1 (en) * 2009-01-23 2010-07-29 Harman Becker Automotive Systems Gmbh Text and speech recognition system using navigation information
US20100223547A1 (en) * 2009-02-27 2010-09-02 Research In Motion Limited System and method for improved address entry
US20110055189A1 (en) * 2009-08-31 2011-03-03 Effrat Jonathan J Framework for selecting and presenting answer boxes relevant to user input as query suggestions
US20110145214A1 (en) * 2009-12-16 2011-06-16 Motorola, Inc. Voice web search
US20110161347A1 (en) * 2009-12-30 2011-06-30 At&T Intellectual Property I, L.P. System and method for an n-best list interface
US20110201387A1 (en) * 2010-02-12 2011-08-18 Microsoft Corporation Real-time typing assistance
US20120054177A1 (en) * 2010-08-31 2012-03-01 Microsoft Corporation Sketch-based image search
US8249876B1 (en) * 2012-01-03 2012-08-21 Google Inc. Method for providing alternative interpretations of a voice input to a user
US20120290969A1 (en) * 2011-05-11 2012-11-15 Abb Technology Ag Multi-stage method and apparatus for interactively locating device data of an automation system
US20130036111A2 (en) * 2011-02-11 2013-02-07 Siemens Aktiengesellschaft Methods and devices for data retrieval
US8504437B1 (en) 2009-11-04 2013-08-06 Google Inc. Dynamically selecting and presenting content relevant to user input
US8515935B1 (en) 2007-05-31 2013-08-20 Google Inc. Identifying related queries
US20130226892A1 (en) * 2012-02-29 2013-08-29 Fluential, Llc Multimodal natural language interface for faceted search
US20130290291A1 (en) * 2011-01-14 2013-10-31 Apple Inc. Tokenized Search Suggestions
US8577913B1 (en) 2011-05-27 2013-11-05 Google Inc. Generating midstring query refinements
US20130297304A1 (en) * 2012-05-02 2013-11-07 Electronics And Telecommunications Research Institute Apparatus and method for speech recognition
US20130326353A1 (en) * 2012-06-02 2013-12-05 Tara Chand Singhal System and method for context driven voice interface in handheld wireless mobile devices
US20130332876A1 (en) * 2011-03-20 2013-12-12 William J. Johnson System and Method for Summoning User Interface Objects
US8630851B1 (en) 2011-06-29 2014-01-14 Amazon Technologies, Inc. Assisted shopping
US20140046922A1 (en) * 2012-08-08 2014-02-13 Microsoft Corporation Search user interface using outward physical expressions
US8676828B1 (en) * 2009-11-04 2014-03-18 Google Inc. Selecting and presenting content relevant to user input
US8682906B1 (en) * 2013-01-23 2014-03-25 Splunk Inc. Real time display of data field values based on manual editing of regular expressions
US8751499B1 (en) 2013-01-22 2014-06-10 Splunk Inc. Variable representative sampling under resource constraints
US8751963B1 (en) 2013-01-23 2014-06-10 Splunk Inc. Real time indication of previously extracted data fields for regular expressions
US20140181135A1 (en) * 2010-08-19 2014-06-26 Google Inc. Predictive query completion and predictive search results
US20140207758A1 (en) * 2013-01-24 2014-07-24 Huawei Technologies Co., Ltd. Thread Object-Based Search Method and Apparatus
US8849785B1 (en) * 2010-01-15 2014-09-30 Google Inc. Search query reformulation using result term occurrence count
US8849791B1 (en) * 2011-06-29 2014-09-30 Amazon Technologies, Inc. Assisted shopping
US20140337131A1 (en) * 2011-09-23 2014-11-13 Amazon Technologies, Inc. Keyword determinations from voice data
US8909642B2 (en) 2013-01-23 2014-12-09 Splunk Inc. Automatic generation of a field-extraction rule based on selections in a sample event
US20150081678A1 (en) * 2009-12-15 2015-03-19 At&T Intellectual Property I, L.P. System and method for speech-based incremental search
US20150082237A1 (en) * 2012-04-27 2015-03-19 Sharp Kabushiki Kaisha Mobile information terminal
US20150154214A1 (en) * 2011-10-05 2015-06-04 Google Inc. Referent based search suggestions
US9147125B2 (en) 2013-05-03 2015-09-29 Microsoft Technology Licensing, Llc Hand-drawn sketch recognition
US9152929B2 (en) * 2013-01-23 2015-10-06 Splunk Inc. Real time display of statistics and values for selected regular expressions
US9183323B1 (en) 2008-06-27 2015-11-10 Google Inc. Suggesting alternative query phrases in query results
US20160092428A1 (en) * 2014-09-30 2016-03-31 Microsoft Technology Licensing, Llc Dynamic Presentation of Suggested Content
US9305108B2 (en) 2011-10-05 2016-04-05 Google Inc. Semantic selection and purpose facilitation
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US20160171122A1 (en) * 2014-12-10 2016-06-16 Ford Global Technologies, Llc Multimodal search response
US9406078B2 (en) 2007-02-06 2016-08-02 Voicebox Technologies Corporation System and method for delivering targeted advertisements and/or providing natural language processing based on advertisements
US9411830B2 (en) 2011-11-24 2016-08-09 Microsoft Technology Licensing, Llc Interactive multi-modal image search
US9443519B1 (en) 2015-09-09 2016-09-13 Google Inc. Reducing latency caused by switching input modalities
CN106021402A (en) * 2016-05-13 2016-10-12 河南师范大学 Multi-modal multi-class Boosting frame construction method and device for cross-modal retrieval
US9570070B2 (en) 2009-02-20 2017-02-14 Voicebox Technologies Corporation System and method for processing multi-modal device interactions in a natural language voice services environment
US9620113B2 (en) 2007-12-11 2017-04-11 Voicebox Technologies Corporation System and method for providing a natural language voice user interface
US9626703B2 (en) * 2014-09-16 2017-04-18 Voicebox Technologies Corporation Voice commerce
US20170139887A1 (en) 2012-09-07 2017-05-18 Splunk, Inc. Advanced field extractor with modification of an extracted field
US9672287B2 (en) 2013-12-26 2017-06-06 Thomson Licensing Method and apparatus for gesture-based searching
US9679079B2 (en) 2012-07-19 2017-06-13 Yandex Europe Ag Search query suggestions based in part on a prior search and searches based on such suggestions
US9747896B2 (en) 2014-10-15 2017-08-29 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US20170263250A1 (en) * 2016-03-08 2017-09-14 Toyota Jidosha Kabushiki Kaisha Voice processing system and voice processing method
US9881222B2 (en) 2014-09-30 2018-01-30 Microsoft Technology Licensing, Llc Optimizing a visual perspective of media
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
US9965521B1 (en) * 2014-02-05 2018-05-08 Google Llc Determining a transition probability from one or more past activity indications to one or more subsequent activity indications
US9990433B2 (en) 2014-05-23 2018-06-05 Samsung Electronics Co., Ltd. Method for searching and device thereof
US10013152B2 (en) 2011-10-05 2018-07-03 Google Llc Content selection disambiguation
US20190108276A1 (en) * 2017-10-10 2019-04-11 NEGENTROPICS Mesterséges Intelligencia Kutató és Fejlesztõ Kft Methods and system for semantic search in large databases
US10297249B2 (en) 2006-10-16 2019-05-21 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10318537B2 (en) 2013-01-22 2019-06-11 Splunk Inc. Advanced field extractor
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10380228B2 (en) 2017-02-10 2019-08-13 Microsoft Technology Licensing, Llc Output generation based on semantic expressions
US10394946B2 (en) 2012-09-07 2019-08-27 Splunk Inc. Refining extraction rules based on selected text within events
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10552410B2 (en) 2017-11-14 2020-02-04 Mindbridge Analytics Inc. Method and system for presenting a user selectable interface in response to a natural language request
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10896284B2 (en) 2012-07-18 2021-01-19 Microsoft Technology Licensing, Llc Transforming data to create layouts
US10956503B2 (en) * 2016-09-20 2021-03-23 Salesforce.Com, Inc. Suggesting query items based on frequent item sets
US11314826B2 (en) 2014-05-23 2022-04-26 Samsung Electronics Co., Ltd. Method for searching and device thereof
US11651149B1 (en) 2012-09-07 2023-05-16 Splunk Inc. Event selection via graphical user interface control
US11710194B2 (en) * 2016-04-29 2023-07-25 Liveperson, Inc. Systems, media, and methods for automated response to queries made by interactive electronic chat

Families Citing this family (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7516190B2 (en) * 2000-02-04 2009-04-07 Parus Holdings, Inc. Personal voice-based information retrieval system
US20090248476A1 (en) * 2008-03-27 2009-10-01 Mitel Networks Corporation Method, system and apparatus for controlling an application
US20090287680A1 (en) * 2008-05-14 2009-11-19 Microsoft Corporation Multi-modal query refinement
US20110106814A1 (en) * 2008-10-14 2011-05-05 Yohei Okato Search device, search index creating device, and search system
EP2418589A4 (en) * 2009-04-06 2012-09-12 Mitsubishi Electric Corp Retrieval device
US11416214B2 (en) 2009-12-23 2022-08-16 Google Llc Multi-modal input on an electronic device
EP4318463A3 (en) 2009-12-23 2024-02-28 Google LLC Multi-modal input on an electronic device
US20120278308A1 (en) * 2009-12-30 2012-11-01 Google Inc. Custom search query suggestion tools
US8650210B1 (en) 2010-02-09 2014-02-11 Google Inc. Identifying non-search actions based on a search query
US9348417B2 (en) 2010-11-01 2016-05-24 Microsoft Technology Licensing, Llc Multimodal input system
US8352245B1 (en) 2010-12-30 2013-01-08 Google Inc. Adjusting language models
KR101828273B1 (en) * 2011-01-04 2018-02-14 삼성전자주식회사 Apparatus and method for voice command recognition based on combination of dialog models
US8296142B2 (en) 2011-01-21 2012-10-23 Google Inc. Speech recognition using dock context
US10409851B2 (en) 2011-01-31 2019-09-10 Microsoft Technology Licensing, Llc Gesture-based search
US10444979B2 (en) 2011-01-31 2019-10-15 Microsoft Technology Licensing, Llc Gesture-based search
US8527483B2 (en) * 2011-02-04 2013-09-03 Mikko VÄÄNÄNEN Method and means for browsing by walking
US8688667B1 (en) * 2011-02-08 2014-04-01 Google Inc. Providing intent sensitive search results
US20120209590A1 (en) * 2011-02-16 2012-08-16 International Business Machines Corporation Translated sentence quality estimation
KR101852821B1 (en) 2011-09-08 2018-04-27 엘지전자 주식회사 Mobile terminal and method for controlling the same
US9129606B2 (en) 2011-09-23 2015-09-08 Microsoft Technology Licensing, Llc User query history expansion for improving language model adaptation
US20130091266A1 (en) 2011-10-05 2013-04-11 Ajit Bhave System for organizing and fast searching of massive amounts of data
US9081829B2 (en) * 2011-10-05 2015-07-14 Cumulus Systems Incorporated System for organizing and fast searching of massive amounts of data
US9081834B2 (en) * 2011-10-05 2015-07-14 Cumulus Systems Incorporated Process for gathering and special data structure for storing performance metric data
US8788273B2 (en) 2012-02-15 2014-07-22 Robbie Donald EDGAR Method for quick scroll search using speech recognition
US10984337B2 (en) 2012-02-29 2021-04-20 Microsoft Technology Licensing, Llc Context-based search query formation
US9064492B2 (en) * 2012-07-09 2015-06-23 Nuance Communications, Inc. Detecting potential significant errors in speech recognition results
US20140019462A1 (en) * 2012-07-15 2014-01-16 Microsoft Corporation Contextual query adjustments using natural action input
US9483518B2 (en) * 2012-12-18 2016-11-01 Microsoft Technology Licensing, Llc Queryless search based on context
US10223411B2 (en) * 2013-03-06 2019-03-05 Nuance Communications, Inc. Task assistant utilizing context for improved interaction
US10783139B2 (en) * 2013-03-06 2020-09-22 Nuance Communications, Inc. Task assistant
US10795528B2 (en) 2013-03-06 2020-10-06 Nuance Communications, Inc. Task assistant having multiple visual displays
US9842592B2 (en) 2014-02-12 2017-12-12 Google Inc. Language models using non-linguistic context
US9412365B2 (en) 2014-03-24 2016-08-09 Google Inc. Enhanced maximum entropy models
US10311115B2 (en) 2014-05-15 2019-06-04 Huawei Technologies Co., Ltd. Object search method and apparatus
TWI798912B (en) * 2014-05-23 2023-04-11 南韓商三星電子股份有限公司 Search method, electronic device and non-transitory computer-readable recording medium
US9953646B2 (en) 2014-09-02 2018-04-24 Belleau Technologies Method and system for dynamic speech recognition and tracking of prewritten script
GB201418402D0 (en) * 2014-10-16 2014-12-03 Touchtype Ltd Text prediction integration
US10276158B2 (en) 2014-10-31 2019-04-30 At&T Intellectual Property I, L.P. System and method for initiating multi-modal speech recognition using a long-touch gesture
US10134394B2 (en) 2015-03-20 2018-11-20 Google Llc Speech recognition using log-linear model
US9978367B2 (en) 2016-03-16 2018-05-22 Google Llc Determining dialog states for language models
WO2017180153A1 (en) * 2016-04-15 2017-10-19 Entit Software Llc Removing wildcard tokens from a set of wildcard tokens for a search query
US10832664B2 (en) 2016-08-19 2020-11-10 Google Llc Automated speech recognition using language models that selectively use domain-specific model components
CN106446122B (en) * 2016-09-19 2020-03-10 华为技术有限公司 Information retrieval method and device and computing equipment
US10311860B2 (en) 2017-02-14 2019-06-04 Google Llc Language model biasing system
JP2021144065A (en) * 2018-06-12 2021-09-24 ソニーグループ株式会社 Information processing device and information processing method
CN111159472B (en) 2018-11-08 2024-03-12 微软技术许可有限责任公司 Multimodal chat technique
CN113204669B (en) * 2021-06-08 2022-12-06 以特心坊(深圳)科技有限公司 Short video search recommendation method, system and storage medium based on voice recognition
CN113656546A (en) * 2021-08-17 2021-11-16 百度在线网络技术(北京)有限公司 Multimodal search method, apparatus, device, storage medium, and program product
WO2023074916A1 (en) * 2021-10-29 2023-05-04 Tesnology Inc. Data transaction management with database on edge device
CN117033724A (en) * 2023-08-24 2023-11-10 青海昇云信息科技有限公司 Multi-mode data retrieval method based on semantic association

Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020123876A1 (en) * 2000-12-30 2002-09-05 Shuvranshu Pokhariyal Specifying arbitrary words in rule-based grammars
US6564213B1 (en) * 2000-04-18 2003-05-13 Amazon.Com, Inc. Search query autocompletion
US20040054541A1 (en) * 2002-09-16 2004-03-18 David Kryze System and method of media file access and retrieval using speech recognition
US20050283364A1 (en) * 1998-12-04 2005-12-22 Michael Longe Multimodal disambiguation of speech recognition
US7027987B1 (en) * 2001-02-07 2006-04-11 Google Inc. Voice interface for a search engine
US7096218B2 (en) * 2002-01-14 2006-08-22 International Business Machines Corporation Search refinement graphical user interface
US20060190436A1 (en) * 2005-02-23 2006-08-24 Microsoft Corporation Dynamic client interaction for search
US20060190256A1 (en) * 1998-12-04 2006-08-24 James Stephanick Method and apparatus utilizing voice input to resolve ambiguous manually entered text input
US20060293890A1 (en) * 2005-06-28 2006-12-28 Avaya Technology Corp. Speech recognition assisted autocompletion of composite characters
US20070022005A1 (en) * 2005-07-21 2007-01-25 Hanna Nader G Method for requesting, displaying, and facilitating placement of an advertisement in a computer network
US20070050191A1 (en) * 2005-08-29 2007-03-01 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
US20070061335A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Multimodal search query processing
US20070061336A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Presentation of sponsored content based on mobile transaction event
US20070067345A1 (en) * 2005-09-21 2007-03-22 Microsoft Corporation Generating search requests from multimodal queries
US20070162422A1 (en) * 2005-12-30 2007-07-12 George Djabarov Dynamic search box for web browser
US20070164782A1 (en) * 2006-01-17 2007-07-19 Microsoft Corporation Multi-word word wheeling
US7277029B2 (en) * 2005-06-23 2007-10-02 Microsoft Corporation Using language models to expand wildcards
US20070239670A1 (en) * 2004-10-20 2007-10-11 International Business Machines Corporation Optimization-based data content determination
US20070299824A1 (en) * 2006-06-27 2007-12-27 International Business Machines Corporation Hybrid approach for query recommendation in conversation systems
US20080086311A1 (en) * 2006-04-11 2008-04-10 Conwell William Y Speech Recognition, and Related Systems
US20080162471A1 (en) * 2005-01-24 2008-07-03 Bernard David E Multimodal natural language query system for processing and analyzing voice and proximity-based queries
US20090006343A1 (en) * 2007-06-28 2009-01-01 Microsoft Corporation Machine assisted query formulation
US20090019002A1 (en) * 2007-07-13 2009-01-15 Medio Systems, Inc. Personalized query completion suggestion
US20090287626A1 (en) * 2008-05-14 2009-11-19 Microsoft Corporation Multi-modal query generation
US20100125457A1 (en) * 2008-11-19 2010-05-20 At&T Intellectual Property I, L.P. System and method for discriminative pronunciation modeling for voice search
US7778837B2 (en) * 2006-05-01 2010-08-17 Microsoft Corporation Demographic based classification for local word wheeling/web search
US7797303B2 (en) * 2006-02-15 2010-09-14 Xerox Corporation Natural language processing for developing queries

Patent Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050283364A1 (en) * 1998-12-04 2005-12-22 Michael Longe Multimodal disambiguation of speech recognition
US20060190256A1 (en) * 1998-12-04 2006-08-24 James Stephanick Method and apparatus utilizing voice input to resolve ambiguous manually entered text input
US6564213B1 (en) * 2000-04-18 2003-05-13 Amazon.Com, Inc. Search query autocompletion
US20020123876A1 (en) * 2000-12-30 2002-09-05 Shuvranshu Pokhariyal Specifying arbitrary words in rule-based grammars
US7027987B1 (en) * 2001-02-07 2006-04-11 Google Inc. Voice interface for a search engine
US7096218B2 (en) * 2002-01-14 2006-08-22 International Business Machines Corporation Search refinement graphical user interface
US20040054541A1 (en) * 2002-09-16 2004-03-18 David Kryze System and method of media file access and retrieval using speech recognition
US20070239670A1 (en) * 2004-10-20 2007-10-11 International Business Machines Corporation Optimization-based data content determination
US20080162471A1 (en) * 2005-01-24 2008-07-03 Bernard David E Multimodal natural language query system for processing and analyzing voice and proximity-based queries
US20060190436A1 (en) * 2005-02-23 2006-08-24 Microsoft Corporation Dynamic client interaction for search
US7461059B2 (en) * 2005-02-23 2008-12-02 Microsoft Corporation Dynamically updated search results based upon continuously-evolving search query that is based at least in part upon phrase suggestion, search engine uses previous result sets performing additional search tasks
US7277029B2 (en) * 2005-06-23 2007-10-02 Microsoft Corporation Using language models to expand wildcards
US20060293890A1 (en) * 2005-06-28 2006-12-28 Avaya Technology Corp. Speech recognition assisted autocompletion of composite characters
US20070022005A1 (en) * 2005-07-21 2007-01-25 Hanna Nader G Method for requesting, displaying, and facilitating placement of an advertisement in a computer network
US20070050191A1 (en) * 2005-08-29 2007-03-01 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
US20070061336A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Presentation of sponsored content based on mobile transaction event
US20070061335A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Multimodal search query processing
US20070067345A1 (en) * 2005-09-21 2007-03-22 Microsoft Corporation Generating search requests from multimodal queries
US20070162422A1 (en) * 2005-12-30 2007-07-12 George Djabarov Dynamic search box for web browser
US20070164782A1 (en) * 2006-01-17 2007-07-19 Microsoft Corporation Multi-word word wheeling
US7797303B2 (en) * 2006-02-15 2010-09-14 Xerox Corporation Natural language processing for developing queries
US20080086311A1 (en) * 2006-04-11 2008-04-10 Conwell William Y Speech Recognition, and Related Systems
US7778837B2 (en) * 2006-05-01 2010-08-17 Microsoft Corporation Demographic based classification for local word wheeling/web search
US20070299824A1 (en) * 2006-06-27 2007-12-27 International Business Machines Corporation Hybrid approach for query recommendation in conversation systems
US20080215555A1 (en) * 2006-06-27 2008-09-04 International Business Machines Corporation Hybrid Approach for Query Recommendation in Conversation Systems
US20090006343A1 (en) * 2007-06-28 2009-01-01 Microsoft Corporation Machine assisted query formulation
US20090019002A1 (en) * 2007-07-13 2009-01-15 Medio Systems, Inc. Personalized query completion suggestion
US20090287626A1 (en) * 2008-05-14 2009-11-19 Microsoft Corporation Multi-modal query generation
US20090287681A1 (en) * 2008-05-14 2009-11-19 Microsoft Corporation Multi-modal search wildcards
US20100125457A1 (en) * 2008-11-19 2010-05-20 At&T Intellectual Property I, L.P. System and method for discriminative pronunciation modeling for voice search

Cited By (162)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222626B2 (en) 2006-10-16 2022-01-11 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10755699B2 (en) 2006-10-16 2020-08-25 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10510341B1 (en) 2006-10-16 2019-12-17 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10297249B2 (en) 2006-10-16 2019-05-21 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10515628B2 (en) 2006-10-16 2019-12-24 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US11080758B2 (en) 2007-02-06 2021-08-03 Vb Assets, Llc System and method for delivering targeted advertisements and/or providing natural language processing based on advertisements
US9406078B2 (en) 2007-02-06 2016-08-02 Voicebox Technologies Corporation System and method for delivering targeted advertisements and/or providing natural language processing based on advertisements
US10134060B2 (en) 2007-02-06 2018-11-20 Vb Assets, Llc System and method for delivering targeted advertisements and/or providing natural language processing based on advertisements
US8732153B1 (en) 2007-05-31 2014-05-20 Google Inc. Identifying related queries
US8515935B1 (en) 2007-05-31 2013-08-20 Google Inc. Identifying related queries
US9620113B2 (en) 2007-12-11 2017-04-11 Voicebox Technologies Corporation System and method for providing a natural language voice user interface
US10347248B2 (en) 2007-12-11 2019-07-09 Voicebox Technologies Corporation System and method for providing in-vehicle services via a natural language voice user interface
US10553216B2 (en) 2008-05-27 2020-02-04 Oracle International Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9711143B2 (en) 2008-05-27 2017-07-18 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US10089984B2 (en) 2008-05-27 2018-10-02 Vb Assets, Llc System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9183323B1 (en) 2008-06-27 2015-11-10 Google Inc. Suggesting alternative query phrases in query results
US8340958B2 (en) * 2009-01-23 2012-12-25 Harman Becker Automotive Systems Gmbh Text and speech recognition system using navigation information
US20100191520A1 (en) * 2009-01-23 2010-07-29 Harman Becker Automotive Systems Gmbh Text and speech recognition system using navigation information
US9953649B2 (en) 2009-02-20 2018-04-24 Voicebox Technologies Corporation System and method for processing multi-modal device interactions in a natural language voice services environment
US9570070B2 (en) 2009-02-20 2017-02-14 Voicebox Technologies Corporation System and method for processing multi-modal device interactions in a natural language voice services environment
US10553213B2 (en) 2009-02-20 2020-02-04 Oracle International Corporation System and method for processing multi-modal device interactions in a natural language voice services environment
US10176162B2 (en) * 2009-02-27 2019-01-08 Blackberry Limited System and method for improved address entry
US20100223547A1 (en) * 2009-02-27 2010-09-02 Research In Motion Limited System and method for improved address entry
US9396268B2 (en) 2009-08-31 2016-07-19 Google Inc. Framework for selecting and presenting answer boxes relevant to user input as query suggestions
US20110055189A1 (en) * 2009-08-31 2011-03-03 Effrat Jonathan J Framework for selecting and presenting answer boxes relevant to user input as query suggestions
US8538982B2 (en) 2009-08-31 2013-09-17 Google Inc. Framework for selecting and presenting answer boxes relevant to user input as query suggestions
US9110995B2 (en) 2009-08-31 2015-08-18 Google Inc. Framework for selecting and presenting answer boxes relevant to user input as query suggestions
US9361381B2 (en) 2009-11-04 2016-06-07 Google Inc. Selecting and presenting content relevant to user input
US10089393B2 (en) 2009-11-04 2018-10-02 Google Llc Selecting and presenting content relevant to user input
US8504437B1 (en) 2009-11-04 2013-08-06 Google Inc. Dynamically selecting and presenting content relevant to user input
US8676828B1 (en) * 2009-11-04 2014-03-18 Google Inc. Selecting and presenting content relevant to user input
US9396252B2 (en) * 2009-12-15 2016-07-19 At&T Intellectual Property I, L.P. System and method for speech-based incremental search
US20150081678A1 (en) * 2009-12-15 2015-03-19 At&T Intellectual Property I, L.P. System and method for speech-based incremental search
US9081868B2 (en) * 2009-12-16 2015-07-14 Google Technology Holdings LLC Voice web search
US20110145214A1 (en) * 2009-12-16 2011-06-16 Motorola, Inc. Voice web search
US8914401B2 (en) * 2009-12-30 2014-12-16 At&T Intellectual Property I, L.P. System and method for an N-best list interface
US20110161347A1 (en) * 2009-12-30 2011-06-30 At&T Intellectual Property I, L.P. System and method for an n-best list interface
US9110993B1 (en) 2010-01-15 2015-08-18 Google Inc. Search query reformulation using result term occurrence count
US8849785B1 (en) * 2010-01-15 2014-09-30 Google Inc. Search query reformulation using result term occurrence count
US9165257B2 (en) 2010-02-12 2015-10-20 Microsoft Technology Licensing, Llc Typing assistance for editing
US10156981B2 (en) 2010-02-12 2018-12-18 Microsoft Technology Licensing, Llc User-centric soft keyboard predictive technologies
US20110201387A1 (en) * 2010-02-12 2011-08-18 Microsoft Corporation Real-time typing assistance
US20110202876A1 (en) * 2010-02-12 2011-08-18 Microsoft Corporation User-centric soft keyboard predictive technologies
US9613015B2 (en) 2010-02-12 2017-04-04 Microsoft Technology Licensing, Llc User-centric soft keyboard predictive technologies
US8782556B2 (en) 2010-02-12 2014-07-15 Microsoft Corporation User-centric soft keyboard predictive technologies
US10126936B2 (en) 2010-02-12 2018-11-13 Microsoft Technology Licensing, Llc Typing assistance for editing
US11620318B2 (en) 2010-08-19 2023-04-04 Google Llc Predictive query completion and predictive search results
US20140181135A1 (en) * 2010-08-19 2014-06-26 Google Inc. Predictive query completion and predictive search results
US9953076B2 (en) 2010-08-19 2018-04-24 Google Llc Predictive query completion and predictive search results
US20120054177A1 (en) * 2010-08-31 2012-03-01 Microsoft Corporation Sketch-based image search
US9449026B2 (en) * 2010-08-31 2016-09-20 Microsoft Technology Licensing, Llc Sketch-based image search
US8983999B2 (en) * 2011-01-14 2015-03-17 Apple Inc. Tokenized search suggestions
US9607101B2 (en) 2011-01-14 2017-03-28 Apple Inc. Tokenized search suggestions
US20130290291A1 (en) * 2011-01-14 2013-10-31 Apple Inc. Tokenized Search Suggestions
US20130036111A2 (en) * 2011-02-11 2013-02-07 Siemens Aktiengesellschaft Methods and devices for data retrieval
US9575994B2 (en) * 2011-02-11 2017-02-21 Siemens Aktiengesellschaft Methods and devices for data retrieval
US9134880B2 (en) * 2011-03-20 2015-09-15 William J. Johnson System and method for summoning user interface objects
US20130332876A1 (en) * 2011-03-20 2013-12-12 William J. Johnson System and Method for Summoning User Interface Objects
US20120290969A1 (en) * 2011-05-11 2012-11-15 Abb Technology Ag Multi-stage method and apparatus for interactively locating device data of an automation system
US8577913B1 (en) 2011-05-27 2013-11-05 Google Inc. Generating midstring query refinements
US8630851B1 (en) 2011-06-29 2014-01-14 Amazon Technologies, Inc. Assisted shopping
US8977554B1 (en) 2011-06-29 2015-03-10 Amazon Technologies, Inc. Assisted shopping server
US10296953B2 (en) 2011-06-29 2019-05-21 Amazon Technologies, Inc. Assisted shopping
US8849791B1 (en) * 2011-06-29 2014-09-30 Amazon Technologies, Inc. Assisted shopping
US9454779B2 (en) 2011-06-29 2016-09-27 Amazon Technologies, Inc. Assisted shopping
US20140337131A1 (en) * 2011-09-23 2014-11-13 Amazon Technologies, Inc. Keyword determinations from voice data
US10692506B2 (en) 2011-09-23 2020-06-23 Amazon Technologies, Inc. Keyword determinations from conversational data
US11580993B2 (en) 2011-09-23 2023-02-14 Amazon Technologies, Inc. Keyword determinations from conversational data
US9111294B2 (en) * 2011-09-23 2015-08-18 Amazon Technologies, Inc. Keyword determinations from voice data
US10373620B2 (en) 2011-09-23 2019-08-06 Amazon Technologies, Inc. Keyword determinations from conversational data
US9679570B1 (en) 2011-09-23 2017-06-13 Amazon Technologies, Inc. Keyword determinations from voice data
US20150154214A1 (en) * 2011-10-05 2015-06-04 Google Inc. Referent based search suggestions
US9652556B2 (en) 2011-10-05 2017-05-16 Google Inc. Search suggestions based on viewport content
US9501583B2 (en) 2011-10-05 2016-11-22 Google Inc. Referent based search suggestions
US9305108B2 (en) 2011-10-05 2016-04-05 Google Inc. Semantic selection and purpose facilitation
US9779179B2 (en) * 2011-10-05 2017-10-03 Google Inc. Referent based search suggestions
US10013152B2 (en) 2011-10-05 2018-07-03 Google Llc Content selection disambiguation
US9594474B2 (en) 2011-10-05 2017-03-14 Google Inc. Semantic selection and purpose facilitation
US9411830B2 (en) 2011-11-24 2016-08-09 Microsoft Technology Licensing, Llc Interactive multi-modal image search
US8249876B1 (en) * 2012-01-03 2012-08-21 Google Inc. Method for providing alternative interpretations of a voice input to a user
US20130226892A1 (en) * 2012-02-29 2013-08-29 Fluential, Llc Multimodal natural language interface for faceted search
US20150082237A1 (en) * 2012-04-27 2015-03-19 Sharp Kabushiki Kaisha Mobile information terminal
US10019991B2 (en) * 2012-05-02 2018-07-10 Electronics And Telecommunications Research Institute Apparatus and method for speech recognition
US20130297304A1 (en) * 2012-05-02 2013-11-07 Electronics And Telecommunications Research Institute Apparatus and method for speech recognition
US20130326353A1 (en) * 2012-06-02 2013-12-05 Tara Chand Singhal System and method for context driven voice interface in handheld wireless mobile devices
US9684395B2 (en) * 2012-06-02 2017-06-20 Tara Chand Singhal System and method for context driven voice interface in handheld wireless mobile devices
US10896284B2 (en) 2012-07-18 2021-01-19 Microsoft Technology Licensing, Llc Transforming data to create layouts
US9679079B2 (en) 2012-07-19 2017-06-13 Yandex Europe Ag Search query suggestions based in part on a prior search and searches based on such suggestions
US20140046922A1 (en) * 2012-08-08 2014-02-13 Microsoft Corporation Search user interface using outward physical expressions
US11423216B2 (en) 2012-09-07 2022-08-23 Splunk Inc. Providing extraction results for a particular field
US20170139887A1 (en) 2012-09-07 2017-05-18 Splunk, Inc. Advanced field extractor with modification of an extracted field
US10783318B2 (en) 2012-09-07 2020-09-22 Splunk, Inc. Facilitating modification of an extracted field
US11651149B1 (en) 2012-09-07 2023-05-16 Splunk Inc. Event selection via graphical user interface control
US10394946B2 (en) 2012-09-07 2019-08-27 Splunk Inc. Refining extraction rules based on selected text within events
US11042697B2 (en) 2012-09-07 2021-06-22 Splunk Inc. Determining an extraction rule from positive and negative examples
US10783324B2 (en) 2012-09-07 2020-09-22 Splunk Inc. Wizard for configuring a field extraction rule
US8751499B1 (en) 2013-01-22 2014-06-10 Splunk Inc. Variable representative sampling under resource constraints
US11709850B1 (en) 2013-01-22 2023-07-25 Splunk Inc. Using a timestamp selector to select a time information and a type of time information
US11232124B2 (en) 2013-01-22 2022-01-25 Splunk Inc. Selection of a representative data subset of a set of unstructured data
US9582557B2 (en) 2013-01-22 2017-02-28 Splunk Inc. Sampling events for rule creation with process selection
US11775548B1 (en) 2013-01-22 2023-10-03 Splunk Inc. Selection of representative data subsets from groups of events
US9031955B2 (en) 2013-01-22 2015-05-12 Splunk Inc. Sampling of events to use for developing a field-extraction rule for a field to use in event searching
US10318537B2 (en) 2013-01-22 2019-06-11 Splunk Inc. Advanced field extractor
US11106691B2 (en) 2013-01-22 2021-08-31 Splunk Inc. Automated extraction rule generation using a timestamp selector
US10585910B1 (en) 2013-01-22 2020-03-10 Splunk Inc. Managing selection of a representative data subset according to user-specified parameters with clustering
US11119728B2 (en) 2013-01-23 2021-09-14 Splunk Inc. Displaying event records with emphasized fields
US10282463B2 (en) 2013-01-23 2019-05-07 Splunk Inc. Displaying a number of events that have a particular value for a field in a set of events
US8682906B1 (en) * 2013-01-23 2014-03-25 Splunk Inc. Real time display of data field values based on manual editing of regular expressions
US11100150B2 (en) 2013-01-23 2021-08-24 Splunk Inc. Determining rules based on text
US10579648B2 (en) 2013-01-23 2020-03-03 Splunk Inc. Determining events associated with a value
US10802797B2 (en) 2013-01-23 2020-10-13 Splunk Inc. Providing an extraction rule associated with a selected portion of an event
US11210325B2 (en) 2013-01-23 2021-12-28 Splunk Inc. Automatic rule modification
US9152929B2 (en) * 2013-01-23 2015-10-06 Splunk Inc. Real time display of statistics and values for selected regular expressions
US11514086B2 (en) 2013-01-23 2022-11-29 Splunk Inc. Generating statistics associated with unique field values
US10019226B2 (en) 2013-01-23 2018-07-10 Splunk Inc. Real time indication of previously extracted data fields for regular expressions
US10769178B2 (en) 2013-01-23 2020-09-08 Splunk Inc. Displaying a proportion of events that have a particular value for a field in a set of events
US11556577B2 (en) 2013-01-23 2023-01-17 Splunk Inc. Filtering event records based on selected extracted value
US8751963B1 (en) 2013-01-23 2014-06-10 Splunk Inc. Real time indication of previously extracted data fields for regular expressions
US20170255695A1 (en) 2013-01-23 2017-09-07 Splunk, Inc. Determining Rules Based on Text
US8909642B2 (en) 2013-01-23 2014-12-09 Splunk Inc. Automatic generation of a field-extraction rule based on selections in a sample event
US11782678B1 (en) 2013-01-23 2023-10-10 Splunk Inc. Graphical user interface for extraction rules
US10585919B2 (en) 2013-01-23 2020-03-10 Splunk Inc. Determining events having a value
US20140207758A1 (en) * 2013-01-24 2014-07-24 Huawei Technologies Co., Ltd. Thread Object-Based Search Method and Apparatus
US9870516B2 (en) 2013-05-03 2018-01-16 Microsoft Technology Licensing, Llc Hand-drawn sketch recognition
US9147125B2 (en) 2013-05-03 2015-09-29 Microsoft Technology Licensing, Llc Hand-drawn sketch recognition
US9672287B2 (en) 2013-12-26 2017-06-06 Thomson Licensing Method and apparatus for gesture-based searching
US10838538B2 (en) 2013-12-26 2020-11-17 Interdigital Madison Patent Holdings, Sas Method and apparatus for gesture-based searching
US9965521B1 (en) * 2014-02-05 2018-05-08 Google Llc Determining a transition probability from one or more past activity indications to one or more subsequent activity indications
US11734370B2 (en) 2014-05-23 2023-08-22 Samsung Electronics Co., Ltd. Method for searching and device thereof
US11157577B2 (en) 2014-05-23 2021-10-26 Samsung Electronics Co., Ltd. Method for searching and device thereof
US11080350B2 (en) 2014-05-23 2021-08-03 Samsung Electronics Co., Ltd. Method for searching and device thereof
US11314826B2 (en) 2014-05-23 2022-04-26 Samsung Electronics Co., Ltd. Method for searching and device thereof
US10223466B2 (en) 2014-05-23 2019-03-05 Samsung Electronics Co., Ltd. Method for searching and device thereof
US9990433B2 (en) 2014-05-23 2018-06-05 Samsung Electronics Co., Ltd. Method for searching and device thereof
US9626703B2 (en) * 2014-09-16 2017-04-18 Voicebox Technologies Corporation Voice commerce
US10430863B2 (en) 2014-09-16 2019-10-01 Vb Assets, Llc Voice commerce
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
US11087385B2 (en) 2014-09-16 2021-08-10 Vb Assets, Llc Voice commerce
US10216725B2 (en) 2014-09-16 2019-02-26 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
US20160092428A1 (en) * 2014-09-30 2016-03-31 Microsoft Technology Licensing, Llc Dynamic Presentation of Suggested Content
US10282069B2 (en) * 2014-09-30 2019-05-07 Microsoft Technology Licensing, Llc Dynamic presentation of suggested content
US9881222B2 (en) 2014-09-30 2018-01-30 Microsoft Technology Licensing, Llc Optimizing a visual perspective of media
US10229673B2 (en) 2014-10-15 2019-03-12 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US9747896B2 (en) 2014-10-15 2017-08-29 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US20160171122A1 (en) * 2014-12-10 2016-06-16 Ford Global Technologies, Llc Multimodal search response
US9779733B2 (en) 2015-09-09 2017-10-03 Google Inc. Reducing latency caused by switching input modalities
US10134397B2 (en) 2015-09-09 2018-11-20 Google Llc Reducing latency caused by switching input modalities
US9443519B1 (en) 2015-09-09 2016-09-13 Google Inc. Reducing latency caused by switching input modalities
US20170263250A1 (en) * 2016-03-08 2017-09-14 Toyota Jidosha Kabushiki Kaisha Voice processing system and voice processing method
US10629197B2 (en) * 2016-03-08 2020-04-21 Toyota Jidosha Kabushiki Kaisha Voice processing system and voice processing method for predicting and executing an ask-again request corresponding to a received request
US11710194B2 (en) * 2016-04-29 2023-07-25 Liveperson, Inc. Systems, media, and methods for automated response to queries made by interactive electronic chat
CN106021402A (en) * 2016-05-13 2016-10-12 河南师范大学 Multi-modal multi-class Boosting frame construction method and device for cross-modal retrieval
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10956503B2 (en) * 2016-09-20 2021-03-23 Salesforce.Com, Inc. Suggesting query items based on frequent item sets
US10380228B2 (en) 2017-02-10 2019-08-13 Microsoft Technology Licensing, Llc Output generation based on semantic expressions
US20220261427A1 (en) * 2017-10-10 2022-08-18 Negentropics Mesterseges Intelligencia Kutato Es F Methods and system for semantic search in large databases
US20190108276A1 (en) * 2017-10-10 2019-04-11 NEGENTROPICS Mesterséges Intelligencia Kutató és Fejlesztő Kft Methods and system for semantic search in large databases
US11704310B2 (en) 2017-11-14 2023-07-18 Mindbridge Analytics Inc. Method and system for presenting a user selectable interface in response to a natural language request
US10552410B2 (en) 2017-11-14 2020-02-04 Mindbridge Analytics Inc. Method and system for presenting a user selectable interface in response to a natural language request

Also Published As

Publication number Publication date
US20090287681A1 (en) 2009-11-19
US20090287626A1 (en) 2009-11-19
US8090738B2 (en) 2012-01-03

Similar Documents

Publication Publication Date Title
US20090287680A1 (en) Multi-modal query refinement
US9256683B2 (en) Dynamic client interaction for search
US9330661B2 (en) Accuracy improvement of spoken queries transcription using co-occurrence information
US10713571B2 (en) Displaying quality of question being asked a question answering system
US20190295550A1 (en) Predicting and learning carrier phrases for speech input
JP3720068B2 (en) Question posting method and apparatus
US8812534B2 (en) Machine assisted query formulation
US8825694B2 (en) Mobile device retrieval and navigation
US9043199B1 (en) Manner of pronunciation-influenced search results
US7272558B1 (en) Speech recognition training method for audio and video file indexing on a search engine
US7729913B1 (en) Generation and selection of voice recognition grammars for conducting database searches
US8260809B2 (en) Voice-based search processing
US7742922B2 (en) Speech interface for search engines
JP2018077858A (en) System and method for conversation-based information search
US20100153112A1 (en) Progressively refining a speech-based search
JP2011526383A (en) Proposal of resource locator from input string
JP2017509049A (en) Coherent question answers in search results
JP2015511746A5 (en)
US20180246896A1 (en) Corpus Specific Generative Query Completion Assistant
US10102199B2 (en) Corpus specific natural language query completion assistant
KR100795930B1 (en) Method and system for recommending query based search index
JP2009163358A (en) Information processor, information processing method, program, and voice chat system
Paek et al. Search Vox: Leveraging multimodal refinement and partial knowledge for mobile voice search
Mishra et al. Speech-driven query retrieval for question-answering
KR101099917B1 (en) Method and system for recommending query based search index

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PAEK, TIMOTHY SEUNG YOON;THIESSON, BO;JU, YUN-CHENG;AND OTHERS;REEL/FRAME:021458/0550;SIGNING DATES FROM 20080825 TO 20080826

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034766/0509

Effective date: 20141014