WO2001057850A2 - Robust voice and device browser system including unified bundle of telephone and network services

Publication number
WO2001057850A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
information
service
web site
response
Application number
PCT/US2001/003742
Other languages
French (fr)
Other versions
WO2001057850A3 (en
Inventor
Alex Kurganov
Harold E. Poel
Valery Zhukoff
Original Assignee
Webley Systems, Inc.
Application filed by Webley Systems, Inc.
Priority to AU2001234833A (AU2001234833A1)
Publication of WO2001057850A2
Publication of WO2001057850A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Definitions

  • the present invention relates to a robust and highly reliable system that allows users to browse web sites and retrieve information by using conversational voice commands. Additionally, the present invention allows users to control and monitor other systems and devices that are connected to the Internet or any other network by using voice commands. Additionally, the invention relates to a personalized system for accessing information from the Internet or other information sources using speech commands.
  • the present invention relates to a method for providing integrated Internet and telecommunications services from a common provider that enables subscribers to inexpensively send, receive, and transfer telephone calls, email messages, voice mail messages, paging messages, and facsimile messages.
  • the first option is to use a desktop or a laptop computer connected to a telephone line via a modem or connected to a network with Internet access.
  • the second option is to use a Personal Digital Assistant (PDA) that has the capability of connecting to the Internet either through a modem or a wireless connection.
  • PDA Personal Digital Assistant
  • the third option is to use one of the newly designed web-phones or web-pagers that are now being offered on the market.
  • desktop computers are very large and bulky and are difficult to transport. Laptop computers solve this inconvenience, but many are still quite heavy and are inconvenient to carry. Further, laptop computers cannot be carried and used everywhere a user travels. For instance, if a user wishes to obtain information from a remote location where no electricity or communication lines are installed, it would be nearly impossible to use a laptop computer. Oftentimes, information is needed on an immediate basis where a computer is not accessible. Furthermore, the use of laptop or desktop computers to access the Internet requires either a direct or a dial-up connection to an Internet Service Provider (ISP). Oftentimes, such connections are not available when a user desires to connect to the Internet to acquire information.
  • ISP Internet Service Provider
  • The second option for remotely accessing web sites is the use of PDAs.
  • PDAs with the ability to connect to the Internet and access web sites are not readily available. As a result, these PDAs tend to be very expensive. Furthermore, users are usually required to pay a special service fee to enable the web browsing feature of the PDA.
  • a further disadvantage of these PDAs is that web sites must be specifically designed to allow these devices to access information on the web site. Therefore, a limited number of web sites are available that are accessible by these web-enabled PDAs.
  • a user attempting to find information using a telephone expects immediate responses to his search requests.
  • a system that introduces too much delay between the time a user makes a request and the time of response will not be tolerated by users and will lose its usefulness. Therefore, it is important that a voice browsing system that uses telephonic communications selects web sites that provide rapid responses since speed is an important factor for maintaining the system's desirability and usability. Therefore, a need exists for a system that accesses web sites based upon their speed of operation.
  • Wireline telephone services, cellular telephone service, facsimile messages, email messages, voice mail messages, pager services, and Internet access services are just some of the important methods and services widely used for business and personal communications.
  • people require the ability to send and receive messages, access information, conduct business transactions, organize daily schedules, and stay in touch with homes and offices from almost anywhere, at any time, in an easy to use and economical manner.
  • portable electronic devices such as cellular telephones, pagers, and Personal Digital Assistants (PDAs).
  • PDAs Personal Digital Assistants
  • the explosive growth of Internet and related networking services demonstrates the importance of such systems to personal communications and the ability to quickly and easily access information.
  • These networks currently host a variety of services such as contact lists, scheduling and date book information, electronic mail, conferencing, electronic commerce, games, software libraries and electronic newspapers and magazines.
  • the problem of accessing and processing all of the available information from communication systems, networks and services is particularly acute for mobile business professionals.
  • the mobile professional, whether working out of the home or while on the road, may have a cellular telephone, a facsimile machine, a pager, intranet mail, Internet mail, and voice mail services. Success for this professional depends in large part on the ability to easily, quickly and inexpensively access, sort, and respond to the messages delivered to each of these communication devices and on the ability to obtain necessary information to conduct business within proliferating networks and services.
  • An additional object of an embodiment of the present invention is to provide a system and method that allows the searching and retrieving of publicly available information by controlling a web browsing server using naturally spoken voice commands.
  • the ranking order is automatically adjusted if the system detects that a given web site is not functioning, is too slow, or has been modified in such a way that the requested information cannot be retrieved any longer.
  • An additional object of an embodiment of the present invention is to provide a system and method for using voice commands to control and monitor devices connected to a network.
  • One object of a preferred embodiment of the present invention is to allow users to customize a voice browsing system.
  • a further object of a preferred embodiment is to allow users to customize the information retrieved from the Internet or other computer networks and accessed by speech commands over telephones.
  • a further object of an embodiment of the present invention is to provide a system and method for interfacing a plurality of different communication services enabling each service to transfer data and calls to another service.
  • Another object of an embodiment of the present invention is to provide a reduced cost system and method for transferring data and calls to multiple locations.
  • the present invention relates to a system for acquiring information from sources on a network, such as the Internet.
  • a voice browsing system maintains a database containing a list of information sources, such as web sites, connected to a network. Each of the information sources is assigned a rank number which is listed in the database.
  • a network interface system accesses the information source with the highest rank number in order to retrieve information requested by the user.
  • a preferred embodiment of the present invention allows users to access and browse web sites when they do not have access to computers with Internet access. This is accomplished by providing a voice browsing system and method that allows users to browse web sites using conversational voice commands spoken into any type of voice enabled device (i.e., any type of wireline or wireless telephone, IP phone, wireless PDA, or other wireless device). These spoken commands are then converted into data messages by a speech recognition software engine running on a user interface system. These data messages are then sent to and processed by a network interface system. This network interface system then generates the proper requests that are transmitted to the desired web site over the Internet. Responses sent from the web site are received and processed by the network interface system and then converted into an audio message via a speech synthesis engine or a pre-recorded audio concatenation application and finally transmitted to the user's voice enabled device.
  • a preferred embodiment of the voice browser system and method uses a web site polling and ranking methodology that allows the system to detect changes in web sites and adapt to those changes in real-time. This enables the voice browser system of a preferred embodiment to deliver highly reliable information to users over any voice enabled device.
  • This ranking system also enables the present invention to provide rapid responses to user requests. Long delays before receiving responses to requests are not tolerated by users of voice-based systems, such as telephones. When a user speaks into a telephone, an almost immediate response is expected. This expectation does not exist for non-voice communications, such as email transmissions or accessing a web site using a personal computer. In such situations, a reasonable amount of transmission delay is acceptable.
  • the ranking system implemented by a preferred embodiment of the present invention ensures users will always receive the fastest possible response to their request.
  • a second embodiment of the present invention allows users to control and monitor the operation of a variety of household devices connected to a network using speech commands spoken into a voice enabled device.
  • a third embodiment of the present invention enables a user to create a user-defined record in the database that identifies an information source, such as a web site, containing information of interest to the user.
  • This record identifies the location of the information source and also contains a recognition grammar assigned by the user.
  • Upon receiving a speech command from the user that is described with the assigned recognition grammar, a network interface system accesses the information source and retrieves the information requested by the user.
  • a customized, voice-activated information access system is provided.
  • a user creates a descriptor file defining specific information found on a web site the user would like to access in the future.
  • the user assigns a pronounceable name or identifier to the selected content and this pronounceable name is saved in a user-defined database record as a recognition grammar along with the URL of the selected web site.
  • a telephone call is placed to a media server.
  • the user provides speech commands to the media server which include the recognition grammar assigned to the desired search.
  • the media server retrieves the user-defined record from a database and passes the information to a web browsing server which retrieves the information from the associated web site.
  • the retrieved information is then transmitted to the user using a speech synthesis software engine.
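  • As an illustration only, the lookup of a user-defined record by its spoken name could be sketched as follows. The names `UserDefinedRecord`, `UserRecordStore`, and `descriptor_file`, as well as the example URL, are hypothetical and do not come from the specification; the sketch also assumes the recognition result arrives as plain text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UserDefinedRecord:
    """Hypothetical user-defined record: a spoken name plus the content location."""
    grammar: str          # pronounceable name assigned by the user
    url: str              # URL of the selected web site
    descriptor_file: str  # hypothetical name of the user's descriptor file


def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so lookups tolerate casing and spacing."""
    return " ".join(phrase.lower().split())


class UserRecordStore:
    """In-memory stand-in for the database of user-defined records."""

    def __init__(self) -> None:
        self._records: Dict[str, UserDefinedRecord] = {}

    def add(self, record: UserDefinedRecord) -> None:
        self._records[normalize(record.grammar)] = record

    def lookup(self, recognized_phrase: str) -> Optional[UserDefinedRecord]:
        """Return the record whose grammar matches the recognized phrase, if any."""
        return self._records.get(normalize(recognized_phrase))
```

  • In this sketch the media server would call `lookup` with the recognized phrase and pass the returned record to the web browsing server.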
  • a fourth embodiment of the present invention provides a unified communications system that provides a variety of different communication services from a single service provider. These services include local telephone service, long distance, cellular telephone service, Internet access, voice mail, email, facsimile service, and paging services. Each of the different communication services is linked together by a system controller operated by a single service provider.
  • the unified system allows users to easily and economically transfer information received by one of the communication services to a second communication service.
  • the system implements speech recognition technology, thereby allowing users to control all of the communication services using speech commands.
  • the communications system provides a single operating menu that allows users to control and access all of the features and services provided by the system. This operating menu may be accessed using speech commands, touch-tone commands, or via a computer.
  • FIG. 1 is a depiction of the voice browsing system of the first embodiment of the present invention.
  • FIG. 2 is a block diagram of a database record used by the first preferred embodiment of the present invention.
  • FIG. 3 is a block diagram of a media server used by the preferred embodiment.
  • FIG. 4 is a block diagram of a web browsing server used by the preferred embodiment.
  • FIG. 5 is a depiction of the device browsing system of the second embodiment of the present invention.
  • FIG. 6 depicts a personal information selection system used with a third preferred embodiment of the present invention.
  • FIG. 7 depicts a web page displayed by the clipping client of the third preferred embodiment.
  • FIG. 8 is a block diagram of a user-defined database record used by the third preferred embodiment of the present invention.
  • FIG. 9 is a block diagram showing methods available for users to access the communications system of the fourth preferred embodiment.
  • FIG. 10 is a block diagram of a system controller used with the fourth preferred embodiment.
  • FIG. 11 is a block diagram of the various services that may be provided by a single service provider according to the fourth preferred embodiment.
  • a first embodiment of the present invention is a system and method for allowing users to browse information sources, such as web sites, by using naturally spoken, conversational voice commands spoken into a voice enabled device. Users are not required to learn a special language or command set in order to communicate with the voice browsing system of the present invention. Common and ordinary commands and phrases are all that is required for a user to operate the voice browsing system.
  • the voice browsing system recognizes naturally spoken voice commands and is speaker-independent; it does not have to be trained to recognize the voice patterns of each individual user. Such speech recognition systems use phonemes to recognize spoken words and not predefined voice patterns.
  • the first embodiment allows users to select from various categories of information and to search those categories for desired data by using conversational voice commands.
  • the voice browsing system of the first preferred embodiment includes a user interface system referred to as a media server.
  • the media server contains a speech recognition software engine. This speech recognition engine recognizes natural, conversational voice commands spoken by the user and converts them into data messages based on the available recognition grammar. These data messages are then sent to a network interface system.
  • the network interface system is referred to as a web browsing server.
  • the web browsing server accesses the appropriate information source, such as a web site, to gather information requested by the user. Responses received from the information sources are then transferred to the media server where a speech synthesis engine converts the responses into audio messages that are transmitted to the user.
  • a database 2 designed by Webley Systems Incorporated is connected to one or more web browsing servers 4 as well as to one or more media servers 8.
  • the database 2 contains a separate set of records for each web site accessible by the system.
  • An example of a web site record is shown in FIG. 2.
  • Each web site record 20 contains the rank number of the web site 22, the associated Uniform Resource Locator (URL) 24, and a command that enables the appropriate "extraction agent" 26 that is required to generate proper requests sent to the web site and to format data received from it.
  • the database record 20 also contains the timestamp 28 indicating the last time the web site was accessed.
  • the extraction agent is described in more detail below.
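  • The record layout described above can be pictured with a small data structure. This is a sketch under stated assumptions, not the layout of the actual database: the class name `WebSiteRecord`, the field names, and the `mark_accessed` helper are hypothetical, and only the four numbered fields plus the category field come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class WebSiteRecord:
    """Hypothetical in-memory form of a web site record 20."""
    rank: int                 # rank number 22
    url: str                  # Uniform Resource Locator 24
    extraction_command: str   # command 26 that enables the extraction agent
    last_accessed: datetime   # timestamp 28: last time the site was accessed
    category: str = ""        # category field, e.g. "weather"

    def mark_accessed(self, when: Optional[datetime] = None) -> None:
        """Update the timestamp after the site is accessed."""
        self.last_accessed = when if when is not None else datetime.now(timezone.utc)
```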
  • the database 2 categorizes each database record 20 according to the type of information provided by each web site.
  • a first category of database records 20 may correspond to web sites that provide "weather" information.
  • the database 2 may also contain a second category of records 20 for web sites that provide "stock" information.
  • These categories may be further divided into subcategories.
  • the "weather" category may contain subcategories depending upon the type of weather information available to a user, such as "current weather" or "extended forecast".
  • a list of web site records may be stored that provide weather information for multiple days.
  • the use of subcategories may allow the web browsing feature to provide more accurate, relevant, and up-to-date information to the user by accessing the most relevant web site.
  • the number of records contained in each category or subcategory is not limited. In the preferred embodiment, three web site records are provided for each category.
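  • A minimal sketch of how a record might be chosen from a category or subcategory is given below. It assumes records are plain dictionaries with hypothetical keys (`category`, `subcategory`, `rank`, `url`) and that a larger rank number marks the preferred site; neither assumption is confirmed by the text.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]


def select_record(records: List[Record], category: str,
                  subcategory: Optional[str] = None) -> Optional[Record]:
    """Return the highest-ranked record in a category (and optional subcategory)."""
    candidates = [
        r for r in records
        if r["category"] == category
        and (subcategory is None or r.get("subcategory") == subcategory)
    ]
    if not candidates:
        return None
    # Assumption: a larger rank number means a better (preferred) site.
    return max(candidates, key=lambda r: r["rank"])
```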
  • Table 1 below depicts two database records 20 that are used with the preferred embodiment. These records also contain a field indicating the "category" of the record, which is "weather" in each of these examples.
  • the database also contains a listing of pre-recorded audio files used to create concatenated phrases and sentences. Further, database 2 may contain customer profile information, system activity reports, and any other data or software servers necessary for the testing or administration of the voice browsing system.
  • the media servers 8 function as user interface systems.
  • the media servers 8 contain a speech recognition engine 30, a speech synthesis engine 32, an Interactive Voice Response (IVR) application 34, a call processing system 36, and telephony and voice hardware 38 required to communicate with the Public Switched Telephone Network (PSTN) 18.
  • PSTN Public Switched Telephone Network
  • each media server is based upon Intel's Dual Pentium III 730 MHz microprocessor system.
  • the speech recognition function is performed by a speech recognition engine 30 that converts voice commands received from the user's voice enabled device 14 (i.e., any type of wireline or wireless telephone, Internet Protocol (IP) phones, or other special wireless units) into data messages.
  • voice commands and audio messages are transmitted using the PSTN 18 and data is transmitted using the TCP/IP communications protocol.
  • TCP/IP Transmission Control Protocol/Internet Protocol
  • Other possible transmission protocols would include SIP/VoIP (Session Initiation Protocol/Voice over IP), Asynchronous Transfer Mode (ATM) and Frame Relay.
  • SIP/VoIP Session Initiation Protocol/Voice over IP
  • ATM Asynchronous Transfer Mode
  • A preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com). The Nuance engine capacity is measured in recognition units based on CPU type as defined in the vendor specification.
  • the natural speech recognition grammars (i.e., what a user can say that will be recognized by the speech recognition engine) were developed by Webley Systems. Table 2 below provides a partial source code listing of the recognition grammars used by the speech recognition engine of the preferred embodiment for obtaining weather information.
  • the media server 8 uses recognition results generated by the speech recognition engine 30 to retrieve a web site record 20 stored in the database 2 that can provide the information requested by the user.
  • the media server 8 processes the recognition result data, identifying keywords that are used to search the web site records 20 contained in the database 2. For instance, if the user's request was "What is the weather in Chicago?", the keywords "weather" and "Chicago" would be recognized.
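  • The keyword step in the Chicago example can be sketched as below. This is a simplified stand-in that assumes the recognition result is available as plain text; it does not model the Nuance engine or the actual recognition grammars, and the keyword table and city list are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical lookup tables; a real system would derive these from its grammars.
CATEGORY_KEYWORDS = {"weather": "weather", "stock": "stock", "stocks": "stock"}
KNOWN_CITIES = {"chicago", "boston", "new york"}


def extract_keywords(utterance: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (category, city) keywords found in a recognized utterance."""
    words = re.findall(r"[a-z]+", utterance.lower())
    category = next(
        (CATEGORY_KEYWORDS[w] for w in words if w in CATEGORY_KEYWORDS), None
    )
    normalized = " ".join(words)
    # Try longer city names first so multi-word names are matched whole.
    city = next(
        (c for c in sorted(KNOWN_CITIES, key=len, reverse=True)
         if re.search(r"\b" + re.escape(c) + r"\b", normalized)),
        None,
    )
    return category, city
```

  • The category keyword would then drive the search of the web site records, and the city would be passed along as a request parameter.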
  • a web site record 20 with the highest rank number from the "weather" category within the database 2 would then be selected and transmitted to the web browsing server 4 along with an identifier indicating that
  • the media servers 8 also contain a speech synthesis engine 32 that converts the data retrieved by the web browsing servers 4 into audio messages that are transmitted to the user's voice enabled device 14.
  • a preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com).
  • the web browsing servers 4 provide access to any computer network such as the Internet 12. These servers are also capable of accessing databases stored on Local Area Networks (LANs) or Wide Area Networks (WANs).
  • the web browsing servers receive responses from web sites and extract the data requested by the user. This task is also known as "content extraction.”
  • the web browsing servers 4 also perform the task of periodically polling or "pinging" various web sites and modifying the ranking numbers of these web sites depending upon their response and speed. This polling feature is further discussed below.
  • the web browsing server 4 is comprised of a content extraction agent 40, a content fetcher 42, a polling and ranking agent 44, and the content descriptor files 46. Each of these is a software component and will be discussed below.
  • Upon receiving a web site record 20 from the database 2 in response to a user request, the web browsing server 4 invokes the "content extraction agent" command 26 contained in the record 20.
  • the content extraction agent 40 allows the web browsing server 4 to properly format requests and read responses provided by the web site 16 identified in the URL field 24 of the web site record 20.
  • Each content extraction agent command 26 invokes the content extraction agent and identifies a content description file associated with the web page identified by the URL 24. This content description file directs the extraction agent where to extract data from the accessed web page and how to format a response to the user utilizing that data. For example, the content description for a web page providing weather information would indicate where to insert the "city" name or ZIP code in order to retrieve Chicago weather information. Additionally, the content description file for each supported URL indicates the location on the web page where the response information is provided. The extraction agent 40 uses this information to properly extract from the web page the information requested by the user.
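  • The two roles of a content description file (where to insert the request parameter, and where on the page the answer sits) can be sketched as follows. This is not the Perl extraction agent of the preferred embodiment: the `ContentDescriptor` class, the `{location}` placeholder, the example URL, and the use of named regular-expression groups are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import quote_plus


@dataclass
class ContentDescriptor:
    """Hypothetical descriptor: a request template and an extraction pattern."""
    url_template: str  # marks where the city name or ZIP code is inserted
    pattern: str       # regex with named groups locating the response data


def build_request_url(descriptor: ContentDescriptor, location: str) -> str:
    """Insert the URL-encoded location into the request template."""
    return descriptor.url_template.format(location=quote_plus(location))


def extract_fields(descriptor: ContentDescriptor, page_text: str) -> Optional[Dict[str, str]]:
    """Return the named fields found on the page, or None if the format changed."""
    match = re.search(descriptor.pattern, page_text, re.DOTALL)
    return match.groupdict() if match else None
```

  • Returning `None` when the pattern no longer matches gives the caller a simple signal that the page layout has changed.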
  • Table 3 below contains source code for a content extraction agent 40 used by the preferred embodiment.
  • # check parameters
    die "Usage: $0 service [params]\n" if $#ARGV < 1;
    #print STDERR @ARGV;
  • Table 4 below contains source code of the content fetcher 42 used with the content extraction agent 40 to retrieve information from a web site.
  • service_name [service_parameters] , i.e. stock sft or weather
  • $debug = 1; use URI::URL; use LWP::UserAgent; use HTTP::Request::Common; use Vail::NarList; use Sybase::CTlib; use HTTP::Cookies;
  • %Content = map { ( $Param{Output}->[ $_ ], $values[ $_ ]
  • Table 5 below contains the content descriptor file source code for obtaining weather information from the web site www.cnn.com that is used by the extraction agent 40 of the preferred embodiment.
  • Regular_expression Author   (.+) Four Day Forecast (\S+)
  • Post-filter _wind "S" South
  • Post-filter _wind "W" West
  • Post-filter _wind/mph/miles per hour/
  • Temperature is _current_temperature_F Fahrenheit, _current_temperature_C Celsius.
  • Humidity is _humidity. Wind from the _wind.
  • Table 6 below contains the content descriptor file source code for obtaining weather information from the web site www.lycos.com that is used by the extraction agent 40 of the preferred embodiment.
  • Post-filter _wind/kph ! /kilometers per hour/
  • the current weather in _location is _current_weather.
  • the current temperature is _current_temperature_F Fahrenheit
  • Humidity is _humidity.
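  • The descriptor fragments above suggest two steps after extraction: post-filters rewrite a field for speech, and a template with `_name` placeholders produces the sentence to be spoken. The sketch below illustrates that idea under assumptions: it treats every post-filter as having the `_field/search/replacement/` form seen in the `mph` line, and the function names are hypothetical rather than taken from the listings.

```python
import re
from typing import Dict, List, Tuple


def parse_post_filter(spec: str) -> Tuple[str, str, str]:
    """Parse '_wind/mph/miles per hour/' into (field, search, replacement)."""
    field, search, replacement = spec.rstrip("/").split("/")
    return field.lstrip("_"), search, replacement


def apply_post_filters(fields: Dict[str, str], filters: List[str]) -> Dict[str, str]:
    """Apply each filter, in order, to the field it names (whole-word matches only)."""
    out = dict(fields)
    for spec in filters:
        field, search, replacement = parse_post_filter(spec)
        if field in out:
            pattern = r"\b" + re.escape(search) + r"\b"
            out[field] = re.sub(pattern, lambda _m: replacement, out[field])
    return out


def render(template: str, fields: Dict[str, str]) -> str:
    """Replace '_name' placeholders in the template with field values."""
    if not fields:
        return template
    # Longest names first so a short name never clobbers a longer placeholder.
    names = sorted(fields, key=len, reverse=True)
    pattern = "_(" + "|".join(re.escape(n) for n in names) + ")"
    return re.sub(pattern, lambda m: fields[m.group(1)], template)
```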
  • Once the web browsing server 4 accesses the web site specified in the URL 24 and retrieves the requested information, the information is forwarded to the media server 8.
  • the media server uses the speech synthesis engine 32 to create an audio message that is then transmitted to the user's voice enabled device 14.
  • each web browsing server 4 is based upon Intel's Dual Pentium III 730 MHz microprocessor system. Referring to FIG. 1, the operation of the robust voice browser system will be described.
  • a user establishes a connection between his voice enabled device 14 and a media server 8. This may be done using the Public Switched Telephone Network (PSTN) 18 by calling a telephone number associated with the voice browsing system 19.
  • IVR interactive voice response
  • the IVR application plays audio messages to the user presenting a list of options, such as "stock quotes", "flight status", "yellow pages", "weather", and "news". These options are based upon the available web site categories and may be modified as desired.
  • the user selects the desired option by speaking the name of the option into the voice enabled device 14. As an example, if a user wishes to obtain restaurant information, he may speak into his telephone the phrase "yellow pages".
  • the IVR application would then ask the user what he would like to find and the user may respond by stating "restaurants".
  • the user may then be provided with further options related to searching for the desired restaurant. For instance, the user may be provided with the following restaurant options: "Mexican Restaurants", "Italian Restaurants", or "American Restaurants".
  • the user then speaks into the telephone 14 the restaurant type of interest.
  • the IVR application running on the media server 8 may also request additional information limiting the geographic scope of the restaurants to be reported to the user. For instance, the IVR application may ask the user to identify the zip code of the area where the restaurant should be located.
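  • The menu dialog described in the preceding paragraphs can be pictured as a walk through a nested menu. The sketch below is a simplified illustration, not the IVR application itself: the `MENU` structure, the prompt wording, and the function name are hypothetical, and recognized speech is assumed to arrive as text.

```python
from typing import List, Optional, Tuple

# Hypothetical menu tree; None marks an option with no further sub-menu.
MENU = {
    "stock quotes": None,
    "flight status": None,
    "yellow pages": {
        "restaurants": {
            "mexican restaurants": None,
            "italian restaurants": None,
            "american restaurants": None,
        },
    },
    "weather": None,
    "news": None,
}


def run_dialog(menu: dict, utterances: List[str]) -> Tuple[List[str], List[str]]:
    """Walk the menu with successive recognized utterances.

    Returns the list of accepted choices and the prompts that were played.
    """
    path: List[str] = []
    prompts: List[str] = []
    node: Optional[dict] = menu
    for spoken in utterances:
        if node is None:  # reached a leaf; nothing more to ask
            break
        prompts.append("Please choose: " + ", ".join(node))
        choice = spoken.strip().lower()
        if choice not in node:
            prompts.append("Sorry, I did not understand.")
            continue
        path.append(choice)
        node = node[choice]
    return path, prompts
```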
  • the media server 8 uses the speech recognition engine 30 to interpret the speech commands received from the user. Based upon these commands, the media server 8 retrieves the appropriate web site record 20 from the database 2. This record and any additional data, which may include other necessary parameters needed to perform the user's request, are transmitted to a web browsing server 4.
  • a firewall 6 may be provided that separates the web browsing server 4 from the database 2 and media server 8.
  • the firewall provides protection to the media server and database by preventing unauthorized access in the event the firewall for web browsing server 10 fails or is compromised. Any type of firewall protection technique commonly known to one skilled in the art could be used, including packet filter, proxy server, application gateway, or circuit-level gateway techniques.
  • the web browsing server 4 then uses the web site record and any additional data and executes the extraction agent 40 and relevant content descriptor file 46 to retrieve the requested information.
  • the information received from the responding web site 16 is then processed by the web browsing server 4 according to the content descriptor file 46 retrieved by the extraction agent.
  • This processed response is then transmitted to the media server 8 for conversion into audio messages, either by using the speech synthesis software 32 or by selecting from a database of prerecorded voice responses contained within the database 2.
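  • One way to combine the two audio options is to prefer prerecorded phrases and fall back to synthesis for everything else. The sketch below shows a greedy longest-phrase match; this strategy, the function name `plan_audio`, and the file names are assumptions for illustration and are not stated in the specification.

```python
from typing import Dict, List, Tuple


def plan_audio(text: str, prerecorded: Dict[str, str],
               max_phrase_words: int = 4) -> List[Tuple[str, str]]:
    """Split text into ("file", name) and ("tts", words) segments.

    Prerecorded phrases are matched greedily, longest first; any words
    without a recording are grouped and left for the speech synthesizer.
    """
    words = text.lower().split()
    plan: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        for n in range(min(max_phrase_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in prerecorded:
                plan.append(("file", prerecorded[phrase]))
                i += n
                break
        else:
            # No recording covers this word: merge it into a TTS segment.
            if plan and plan[-1][0] == "tts":
                plan[-1] = ("tts", plan[-1][1] + " " + words[i])
            else:
                plan.append(("tts", words[i]))
            i += 1
    return plan
```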
  • each web site record contains a rank number 22 as shown in FIG. 2.
  • the web site ranking method and system of the present invention provides robustness to the voice browser system and enables it to adapt to changes that may occur as web sites evolve. For instance, the information required by a web site 16 to perform a search or the format of the reported response data may change. Without the ability to adequately monitor and detect these changes, a search requested by a user may provide an incomplete response, no response, or an error. Such useless responses may result from incomplete data being provided to the web site 16 or the web browsing server 4 being unable to recognize the response data messages received from the searched web site 16.
  • This polling mechanism continually polls or "pings" each of the sites listed in the database 2.
  • a web browsing server 4 sends brief requests to each web site listed in database 2.
  • the web browsing server 4 monitors the response received from each web site and determines whether it is a complete response and whether the response is in the expected format specified by the content descriptor file 46 used by the extraction agent 40.
  • The polled web sites that provide complete responses in the format expected by the extraction agent 40 have their ranking established based on their "response time". That is, web sites with faster response times will be assigned higher rankings than those with slower response times. If the web browsing server 4 receives no response from the polled web site or if the response received is not in the expected format, then the rank of that web site is lowered.
  • A warning message or alarm may be generated for the system administrator indicating that the specified web site has been modified or is not responsive and requires further review.
  • Since the web browsing servers 4 access web sites based upon their ranking number, only those web sites that produce useful and error-free responses will be used by the voice browser system to gather information requested by the user. Further, since the ranking numbers are also based upon the speed of a web site in providing responses, only the most time efficient sites are accessed. This system assures that users will get complete, timely, and relevant responses to their requests. Without this feature, users may be provided with information that is not relevant to their request or may not get any information at all. The constant polling and re-ranking of the web sites used within each category allows the voice browser of the present invention to operate efficiently. Finally, it allows the voice browser system of the present invention to dynamically adapt to changes in the rapidly evolving web sites that exist on the Internet.
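  • The polling and ranking behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the preferred embodiment: the `PollResult` structure, the function names, and the choice to place all unusable sites below all usable ones are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PollResult:
    """Outcome of one poll of a listed web site (hypothetical structure)."""
    url: str
    response_time: Optional[float]  # seconds; None means no response arrived
    format_ok: bool                 # True if the response matched the expected format


def is_usable(result: PollResult) -> bool:
    """A site is usable only if it answered and the answer was in the expected format."""
    return result.response_time is not None and result.format_ok


def rank_sites(results: List[PollResult]) -> Dict[str, int]:
    """Return {url: rank}, where rank 1 is the highest ranking.

    Usable sites are ordered by response time, fastest first. Sites that did
    not respond, or responded in an unexpected format, are ranked below them
    (an assumption about how "lowered" is applied).
    """
    usable = sorted((r for r in results if is_usable(r)),
                    key=lambda r: r.response_time)
    unusable = [r for r in results if not is_usable(r)]
    return {r.url: rank for rank, r in enumerate(usable + unusable, start=1)}


def sites_needing_review(results: List[PollResult]) -> List[str]:
    """URLs for which a warning could be raised to the system administrator."""
    return [r.url for r in results if not is_usable(r)]
```

  • In this sketch a fast site that returns an unexpected format still ranks below every usable site, which mirrors the statement that only sites producing useful and error-free responses are used.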
  • The web sites accessible by the voice browser of the preferred embodiment may use any type of mark-up language, including Extensible Markup Language (XML), Wireless Markup Language (WML), Handheld Device Markup Language (HDML), Hyper Text Markup Language (HTML), or any variation of these languages.
  • A second embodiment of the present invention is depicted in FIG. 5.
  • This embodiment provides a system and method for controlling a variety of devices 50 connected to a network 52 by using conversational speech commands spoken into a voice enabled device 54 (i.e., wireline or wireless telephones, Internet Protocol (IP) phones, or other special wireless units).
  • The networked devices may include various household devices. For instance, voice commands may be used to control household security systems, VCRs, TVs, outdoor or indoor lighting, sprinklers, or heating and air conditioning systems.
  • Each of these devices 50 is connected to a network 52.
  • These devices 50 may contain embedded microprocessors or may be connected to other computer equipment that allow the device 50 to communicate with network 52.
  • the devices 50 appear as "web sites" connected to the network 52.
  • This allows a network interface system, such as a device browsing server 56, a database 57, and a user interface system, such as a media server 58, to operate similarly to the web browsing server 4, database 2, and media server 8 described in the first preferred embodiment above.
  • a network 52 interfaces with one or more network interface systems, which are shown as device browsing servers 56 in FIG. 5.
  • The device browsing servers perform many of the same functions and operate in much the same way as the web browsing servers 4 discussed above in the first preferred embodiment.
  • the device browsing servers 56 are also connected to a database 57.
  • Database 57 lists all devices that are connected to the network 52. For each device, the database 57 contains a record similar to that shown in FIG. 2. Each record will contain at least a device identifier, which may be in the form of a URL, and a command to
  • Database 57 may also include any other data or software necessary to test and administer the device browsing system.
  • a device descriptor file contains a listing of the options and functions available for each of the devices 50 connected on the network 52. Furthermore, the device descriptor file contains the information necessary to properly communicate with the networked devices 50. Such information would include, for example, communication protocols, message formatting requirements, and required operating parameters.
  • the device browsing server 56 receives messages from the various networked devices 50, appropriately formats those messages and transmits them to one or more media servers 58 which are part of the device browsing system.
  • the user's voice enabled devices 54 can access the device browsing system by calling into a media server 58 via the Public Switched Telephone Network (PSTN) 59.
  • the device browsing server is based upon Intel's Dual Pentium III 730 MHz microprocessor system.
  • the media servers 58 act as user interface systems and perform the functions of natural speech recognition, speech synthesis, data processing, and call handling.
  • the media server 58 operates similarly to the media server 8 depicted in FIG. 3.
  • When data is received from the device browsing server 56, the media server 58 will convert the data into audio messages via a speech synthesis engine. These messages are then transmitted to the user's voice enabled device 54. Speech commands received from the user's voice enabled device 54 are converted into data messages via a speech recognition engine running on the media server 58.
  • a preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com).
  • a preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com).
  • the media servers 58 of the preferred embodiment are based on Intel's Dual Pentium III 730 MHz microprocessor system. A specific example for using the system and method of this embodiment of the invention will now be given.
  • A user may call into a media server 58 by dialing a telephone number associated with an established device browsing system. Once the user is connected, the IVR application of the media server 58 will provide the user with a list of available systems that may be monitored or controlled based upon information contained in database 57.
  • The user may be provided with the option to select "Home Systems" or "Office Systems". The user may then speak the command "access home systems".
  • The media server 58 would then access the database 57 and provide the user with a listing of the home subsystems or devices 50 available on the network 52 for the user to monitor and control. For instance, the user may be given a listing of subsystems such as "Outdoor Lighting System", "Indoor Lighting System", "Security System", or "Heating and Air Conditioning System". The user may then select the indoor lighting subsystem by speaking the command "Indoor Lighting System".
  • The IVR application would then provide the user with a set of options related to the indoor lighting system.
  • The media server 58 may then provide a listing such as "Dining Room", "Living Room", "Kitchen", or "Bedroom". After selecting the desired room, the IVR application would provide the user with the options to hear the "status" of the lighting in that room or to "turn on", "turn off", or "dim" the lighting in the desired room. The user gives these commands by speaking the desired command into the user's voice enabled device 54. The media server 58 receives this command and translates it into a data message. This data message is then forwarded to the device browsing server 56, which routes the message to the appropriate device 50.
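  • The translation of a recognized spoken command into a data message for a networked device might look like the following. This is only a sketch: the descriptor table, the `device.example` URL, the JSON message layout, and the function name are assumptions made for illustration and do not come from the specification.

```python
import json

# Hypothetical device descriptors: which commands each device accepts and
# where it can be reached on the network. The URL is a placeholder.
DEVICE_DESCRIPTORS = {
    "indoor lighting system/kitchen": {
        "url": "http://device.example/lighting/kitchen",
        "commands": ["status", "turn on", "turn off", "dim"],
    },
}


def build_device_message(subsystem, room, command, descriptors):
    """Turn a recognized (subsystem, room, command) triple into a data message.

    Raises KeyError for an unknown device and ValueError for a command the
    device descriptor does not list.
    """
    key = f"{subsystem}/{room}".lower()
    if key not in descriptors:
        raise KeyError(f"no device registered for {key!r}")
    descriptor = descriptors[key]
    normalized = command.lower()
    if normalized not in descriptor["commands"]:
        raise ValueError(f"{normalized!r} is not supported by {key!r}")
    # The message format is an assumption; a real device would dictate it.
    return json.dumps({"target": descriptor["url"], "command": normalized},
                      sort_keys=True)
```

  • Here a spoken "Dim" for the kitchen in the "Indoor Lighting System" would become a small JSON message addressed to the kitchen lighting device, which a device browsing server could then route.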
  • the device browsing system 51 of this embodiment of the present invention also provides the same robustness and reliability features described in the first embodiment.
  • the device browsing system 51 has the ability to detect whether new devices have been added to the system or whether current devices are out-of-service. This robustness is achieved by periodically polling or "pinging" all devices 50 listed in database 57.
  • the device browsing server 56 periodically polls each device 50 and monitors the response. If the device browsing server 56 receives a recognized and expected response from the polled device, then the device is categorized as being recognized and in-service. However, if the device browsing server 56 does not receive a response from the polled device 50 or receives an unexpected response, then the device 50 is marked as being either new or out-of-service. A warning message or a report may then be generated for the user indicating that a new device has been detected or that an existing device is experiencing trouble.
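  • The device polling logic described above can be illustrated with a short sketch. The status names, the exact-match comparison against an expected response, and the `poll` callable are assumptions; the specification does not state how a response is judged to be "recognized and expected".

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class DeviceStatus(Enum):
    IN_SERVICE = "recognized and in-service"
    NEEDS_ATTENTION = "new or out-of-service"


def classify(response: Optional[str], expected: str) -> DeviceStatus:
    """Classify one poll result; None stands for 'no response received'."""
    if response is not None and response == expected:
        return DeviceStatus.IN_SERVICE
    return DeviceStatus.NEEDS_ATTENTION


def poll_all(
    devices: Dict[str, str],
    poll: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, DeviceStatus], List[str]]:
    """Poll every listed device once.

    devices maps a device identifier to its expected response.
    poll is a caller-supplied function that contacts the device and returns
    its response, or None if the device did not answer.
    Returns the per-device status and the warnings to report to the user.
    """
    statuses: Dict[str, DeviceStatus] = {}
    warnings: List[str] = []
    for device_id, expected in devices.items():
        status = classify(poll(device_id), expected)
        statuses[device_id] = status
        if status is DeviceStatus.NEEDS_ATTENTION:
            warnings.append(f"Device {device_id} is new or out-of-service")
    return statuses, warnings
```

  • Passing the network access in as the `poll` argument keeps the classification logic independent of any particular device protocol.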
  • this embodiment allows users to remotely monitor and control any devices that are connected to a network, such as devices within a home or office. Furthermore, no special telecommunications equipment is required for users to remotely access the device browser system. Users may use any type of voice enabled device (i.e., wireline or wireless telephones, IP phones, or other wireless units) available to them. Furthermore, a user may perform these functions from anywhere without having to subscribe to additional services. Therefore, no additional expenses are incurred by the user.
  • the third preferred embodiment of the present invention enables a user to associate information of interest found on a specific information source, such as a web site, with a pronounceable name or identification word. This pronounceable name/identification word forms a recognition grammar in the preferred embodiment.
  • When the user wishes to retrieve the selected information, he may use a telephone or other voice enabled device to access a voice browser system. The user then speaks the pronounceable name or command described within the recognition grammar associated with the desired information. The voice browsing system then accesses the associated information source and returns the requested information to the user using a voice synthesizer.
  • a user 60 uses a computer 62 to access a network, such as a WAN, LAN, or the Internet, containing various information sources.
  • The user 60 accesses the Internet 12 and begins searching for web sites 16, which are information sources that contain information of interest to the user.
  • Once the user 60 identifies a web site 16 containing information the user would like to access using only a voice enabled device, such as a telephone, and the voice browsing system 19, the user initiates a "clipping client" engine 64 on his computer 62.
  • the clipping client 64 allows a user 60 to create a set of instructions for use by the voice browsing system 19 in order to report personalized information back to the user upon request.
  • the instruction set is created by "clipping" information from the identified web site.
  • a user 60 may be interested in weather for a specific city, such as Chicago.
  • the user 60 identifies a web site from which he would like to obtain the latest Chicago weather information.
  • the clipping client 64 is then activated by the user 60.
  • the clipping client 64 displays the selected web site in the same manner as a conventional web browser such as Microsoft's® Internet Explorer.
  • FIG. 7 depicts a sample of a web page 70 displayed by the clipping client 64.
  • The user 60 begins creation of the instruction set for retrieving information from the identified web site by selecting the uniform resource locator (URL) address 72 for the web site. In the preferred embodiment, this selection is done by highlighting and copying the URL address 72.
  • The user selects the information from the displayed web page that he would like to have retrieved when a request is made. Referring to FIG. 7, the user would select the information regarding the weather conditions in Chicago 74.
  • The web page 70 may also contain additional information such as advertisements 76 or links to other web sites 78 which are not of interest to the user.
  • The clipping client 64 allows the user to select only that portion of the web page containing information of interest to the user. Therefore, unless the advertisements 76 and links 78 displayed on the web page are of interest to the user, he would not select this information. Based on the web page information 74 selected
  • Table 7 below is an example of a content descriptor file created by the clipping client of the preferred embodiment. This content descriptor file relates to obtaining weather information from the web site www.cnn.com.
  • [Table 7: content descriptor file. Only two "Pre-filter" entries survive, and their patterns are not recoverable.]
  • the clipping client 64 prompts the user to enter an identification word or phrase that will be associated with the identified web site and information. For example, the user could associate the phrase "Chicago weather" with the selected URL 72 and related weather information 74.
  • the identification word or phrase is stored as a personal recognition grammar that can now be recognized by a speech recognition engine of the voice browsing system 19 which will be discussed below.
  • the personal recognition grammar, URL address 72, and a command for executing a content extraction agent are stored within a database used by the voice browser system 19 which has been discussed above in relation to the first preferred embodiment.
  • the database 2 of the voice browsing system 19 contains a section that stores the personal recognition grammars and related web site information generated by the clipping client 64.
  • An example of a user-defined web site record is shown in FIG. 8.
  • Each user-defined web site record 80 contains the recognition grammar 82 assigned by the user, the associated Uniform Resource Locator (URL) 84, and a command that enables the "content extraction agent" 86 and retrieves the appropriate content descriptor file 86 required to generate proper requests to the web site and to properly format received data.
  • The content extraction agent has been described above in relation to the first preferred embodiment and Tables 3 and 4.
  • the web site record 80 also contains the timestamp 88 indicating the last time the web site was accessed.
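  • A user-defined web site record of this kind could be modeled as shown below. The field names, the example URL, the command string, and the case-insensitive matching rule are all hypothetical; FIG. 8 defines only which items the record holds, not how they are stored or matched.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class UserDefinedSiteRecord:
    """One user-defined record: grammar, URL, extraction command, timestamp."""
    recognition_grammar: str           # phrase the user speaks, e.g. "Chicago weather"
    url: str                           # address of the clipped web page
    extraction_command: str            # command that invokes the content extraction agent
    last_accessed: Optional[datetime] = None  # last time the web site was accessed


def _normalize(phrase: str) -> str:
    """Lower-case the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def find_record(records: Iterable[UserDefinedSiteRecord],
                spoken_phrase: str) -> Optional[UserDefinedSiteRecord]:
    """Return the first record whose grammar matches the recognized phrase."""
    wanted = _normalize(spoken_phrase)
    for record in records:
        if _normalize(record.recognition_grammar) == wanted:
            return record
    return None
```

  • A real speech recognition engine would match against the grammar itself; the string comparison here only stands in for that step.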
  • When a user accesses the voice browsing system 19, he will be asked whether he would like to use his "user-defined searches." If the user answers affirmatively, the media servers 8 will retrieve from the database 2 the personal recognition grammars 82 defined by the user while using the clipping client 64.
  • Upon receiving a user-defined web site record 80 from the database 2 in response to a user request, the web browsing server 4 invokes the "content extraction agent" command 86 contained in the record 80.
  • the content extraction agent 40 retrieves the content descriptor file 46 associated with the user-defined record 80.
  • the content descriptor file 46 directs the extraction agent where to extract data from the accessed web page and how to format a response to the user utilizing that data.
  • the content descriptor file 46 for each supported URL indicates the location on the web page where the response information is provided. The extraction agent 40 uses this information to properly extract from the web page the information requested by the user.
  • The content extraction agent 40 can also parse the content of a web page in which the user-desired information has changed location or format. This is possible because most hypertext documents include named objects like tables, buttons, and forms that contain textual content of interest to a user. When changes to a web page occur, a named object may be moved within a document, but it still exists. Therefore, the content extraction agent 40 simply searches for the relevant name of the desired object. In this way, the information requested by the user may still be found and reported regardless of changes that have occurred.
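  • The idea of locating content by the name of its enclosing object, rather than by its position on the page, can be sketched with Python's standard `html.parser` module. This is an illustration only and is not the extraction agent of Table 3; the class and function names are hypothetical, and matching on the `id` or `name` attribute is an assumption about what "named object" means.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NamedObjectExtractor(HTMLParser):
    """Collect the text inside the first element whose id or name matches."""

    def __init__(self, object_name):
        super().__init__()
        self.object_name = object_name
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
            return
        attr_map = dict(attrs)
        if self.object_name in (attr_map.get("id"), attr_map.get("name")):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_named_object(html, object_name):
    """Return the text content of the named object, or '' if it is absent."""
    parser = NamedObjectExtractor(object_name)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

  • Because the lookup depends only on the object's name, the same call returns the same text whether the named table appears before or after the advertisements on the page. The sketch assumes reasonably well-formed markup.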
  • Table 3 above contains source code for a content extraction agent 40 which may be used by the third preferred embodiment.
  • Table 4 above contains source code of the content fetcher 42 which may be used with the content extraction agent 40 to retrieve information from a web site.
  • Referring to FIG. 1, the operation of the personal voice-based information retrieval system will be described.
  • a user establishes a connection between his voice enabled device 14 and a media server 8 of the voice browsing system 19. This may be done using the Public Switched Telephone Network (PSTN) 18 by calling a telephone number associated with the voice browsing system 19.
  • the media server 8 initiates an interactive voice response (IVR) application.
  • The IVR application plays an audio message to the user presenting a list of options, which includes "perform a user-defined search."
  • The user selects the option to perform a user-defined search by speaking the name of the option into the voice enabled device 14.
  • The media server 8 then accesses the database 2 and retrieves the personal recognition grammars 82. Using the speech synthesis software engine 32, the media server 8 then asks the user, "Which of the following user-defined searches would you like to perform?" and reads to the user the identification name, provided by the recognition grammar 82, of each user-defined search. The user selects the desired search by speaking the appropriate speech command or pronounceable name described within the recognition grammar 82.
  • These speech recognition grammars 82 define the speech commands or pronounceable names spoken by a user in order to perform a user-defined search. If the user has a multitude of user-defined searches, he may speak the command described within the recognition grammar 82 of the desired search at any time without waiting for the media server 8 to list all available user-defined searches. This feature is commonly referred to as a
  • the media server 8 uses the speech recognition engine 30 to interpret the speech commands received from the user. Based upon these commands, the media server 8 retrieves the appropriate user-defined web site record 80 from the database 2. This record is then transmitted to a web browsing server 4.
  • a firewall 6 may be provided that separates the web browsing server 4 from the database 2 and media server 8. The firewall provides protection to the media server and database by preventing unauthorized access in the event the firewall 10 for the web browsing server fails or is compromised. Any type of firewall protection technique commonly known to one skilled in the art could be used, including packet filter, proxy server, application gateway, or circuit-level gateway techniques.
  • The web browsing server 4 accesses the web site 16 specified by the URL 84 in the user-defined web site record 80 and retrieves the user-defined information from that site using the content extraction agent and the content descriptor file specified in the content extraction agent command 86. Since the web browsing server 4 uses the URL and retrieves new information from the Internet each time a request is made, the requested information is always up to date.
  • the content information received from the responding web site 16 is then processed by the web browsing server 4 according to the associated content descriptor file.
  • This processed response is then transmitted to the media server 8 for conversion into audio messages, either by using the speech synthesis engine 32 or by selecting among a database of prerecorded voice responses contained within the database 2.
  • This message is then transmitted to the user's voice enabled device 14.
  • the web sites accessible by the personal information retrieval system and voice browser of the preferred embodiments may use any type of mark-up language, including Extensible Markup Language (XML), Wireless Markup Language (WML), Handheld Device Markup Language (HDML), Hyper Text Markup Language (HTML), or any variation of these languages.
  • a single communications system offers a plurality of communication services to users from a single provider. These services are required by users in order to effectively communicate with others and manage personal, as well as business, information.
  • the system provided by the fourth preferred embodiment offers to users the following services for home and business uses: local telephone service, cellular telephone service, long distance service, Internet access service, and a variety of messaging services that include voice mail, facsimile, electronic mail ("email"), and paging.
  • The system provides users with the option to obtain multiple email and voice mail accounts. Further, the system allows users to select either dial-up Internet access service or broadband Internet access service, which includes Digital Subscriber Line (DSL) service and cable modem service.
  • a communications system allowing a user to subscribe to each of these services from a single provider enables each service to be integrated together and operate seamlessly to the user.
  • This integration allows each service to easily and efficiently communicate with and transfer data to other services as will be described below.
  • A user who acquires these services from many different companies is also subject to incompatibility problems resulting from the varying products used by the different companies.
  • the system provided by the fourth preferred embodiment eliminates these incompatibility problems since all services are provided by the same company. This ability to eliminate incompatibility problems can improve user interest in the "collectively bundled" group of services made available by the provider. Further, significant cost economies can be realized by a service provider with the ability to offer a "collectively bundled" group of services. These cost economies can improve customer satisfaction with the service provider.
  • Providing a bundle of services enables the service provider to lower the unit costs for each service since a user will be subscribing to several services provided by the same company.
  • Several economic advantages will also be realized by the service provider.
  • Much of the infrastructure (i.e., hardware and software) required for a service provider to provide the various services can be used for multiple services. Therefore, as the number of different services offered by a service provider increases, the expenditures that need to be made for capital improvements decrease.
  • the provider will be able to market a wide variety of services to users, each of which can be offered at a lower per unit cost than competitors. Further, the service provider will be able to actually increase revenues since it has the ability to provide multiple services. Referring to FIG.
  • the communications system 90 of the fourth preferred embodiment may be accessed by users via a voice enabled device 92 (i.e., any type of wireline or wireless telephone, Internet Protocol (IP) phones, or other special wireless units) or a computer 96 using a modem or other communication connection (i.e., Digital Subscriber Line connection, cable modem, LAN, or WAN).
  • a user accesses the communications system 90 via a telephone by calling a toll-free number.
  • FIG. 10 depicts a control system 100 used with the communications system 90 of the fourth preferred embodiment.
  • the control system 100 functions as a central system that monitors and controls the features, services, and functions of the communications system 90.
  • When a user accesses the control system 100, he is presented with an operating menu that enables the user to control all of the services and features of the communications system 90.
  • the control system 100 provides three methods for a user to access the operating menu and handle his communications.
  • the control system contains a speech recognition software engine 102.
  • This speech recognition engine 102 uses phonemes to recognize speech commands. Therefore, the speech recognition engine can recognize naturally spoken speech commands and is speaker-independent; it does not have to be trained to recognize the speech patterns of each individual user.
  • a preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com).
  • The natural speech recognition grammars (i.e., what a user can say that will be recognized by the speech recognition engine) were developed by Webley Systems.
  • the control system 100 also contains a speech synthesis software engine 104 that converts text messages into audio messages that may be transmitted to a user.
  • A preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com).
  • the control system 100 also contains a call processing system 106 and telephony and voice hardware 108 required to interface the communications system 90 with the Public Switched Telephone Network (PSTN) 94.
  • A user may also access the communication system 90 using the touch-tone signals provided by the user's voice enabled device 92, such as a telephone.
  • the third option a user has for accessing the communication system 90 is via a computer 96.
  • A user may access and control all of the available features and services via the operating menu provided by the control system 100.
  • the user's computer 96 may be connected directly with the communication system 90 using a modem and the Public Switched Telephone Network (PSTN) 94, or via the Internet 97.
  • the communications system 90 comprises a control system 100 that is a central system connected to a variety of different communication services in a ring-type manner similar to a token ring.
  • the services connected to the control system 100 in the preferred embodiment include a local telephone access service 110, a cellular telephone service 112, a voice mail service 114, a facsimile service 116, an email service 118, an Internet access service 120, a long distance service 122, and a pager service 124.
  • These services may also be sub-divided into "business" related services or "home" related services. That is, the local telephone service may be subdivided into "home local telephone access" and "business local telephone access".
  • The local telephone access service 110 is typically provided as a landline service (i.e., it is a non-wireless service).
  • The local telephone access service 110 may also be provided using voice-over-IP technology. This allows telephone calls to be established using a data network such as the Internet.
  • the control system 100 can communicate with all of the services 110, 112, 114, 116,
  • The control system 100 provides an interface between all of the services 110, 112, 114, 116, 118, 120, 122, and 124. By communicating through the control system 100, all of the services are able to communicate with each other. This enables a call being handled by the cellular telephone service 112 to be transferred to the voice mail service 114. Further, this arrangement allows a call being handled by the local telephone access service 110 to be transferred to a user's cellular telephone, via the cellular telephone service 112, or to the voice mail service 114. This example demonstrates a unique advantage of the preferred embodiment.
  • a single voice mail service 114 can be used to record messages from callers that have called either a user's cellular telephone or wireline telephone (the telephone associated with the local telephone access service). Therefore, a user who subscribes to these three services will be provided with the advantage of being able to retrieve all voice mail messages from one location.
  • the control system 100 can transfer calls, email messages, facsimile messages, or voice mail messages between any of the services that are part of the communications system 90.
  • a user can therefore instruct the control system 100 to forward any received email messages to a local facsimile machine specified within the facsimile service 116.
  • facsimile messages received for a user may be subsequently forwarded to a user's email address maintained by the email service 118.
  • The ability of the communications system 90 to provide users with a variety of options for receiving and routing messages and calls is a beneficial feature for users. These options provide users with an added level of control over how they manage information and communicate with others.
  • the communications system 90 of this embodiment also enables users to transfer telephone calls to other locations for a reduced fee.
  • the service provider may easily monitor the number of call transfers attempted by the user.
  • the service provider may allow a user to transfer a received call to one location free of charge. Any additional transfers would be subject to a fee. For instance, a user may specify that all incoming calls to the communications system 90 for a user should be transferred to a business telephone. This transfer would be done free of charge. However, if the user decides to transfer the call to an additional device, such as a cellular telephone, the user would be charged for this additional transfer.
  • This method for transferring telephone calls would present a substantial reduction in costs for cellular telephone users who subscribe to the cellular service 112 of the communications system 90.
  • Most cellular service providers charge per minute fees for any call received on a cellular telephone.
  • all calls that are initially transferred to this cellular telephone from the control system 100 would be free. No per minute usage fees would be charged. This would present a dramatic cost reduction for most users who previously possessed cellular service from an outside provider.
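  • The transfer pricing described above, in which the first transfer of a received call is free and each additional transfer is charged, reduces to simple arithmetic. The function name and the fee amount used in any call are hypothetical; the specification gives no fee values.

```python
from decimal import Decimal


def transfer_charges(num_transfers, fee_per_extra_transfer, free_transfers=1):
    """Return the total fee for the transfers made on one received call.

    The first `free_transfers` transfers cost nothing; every transfer after
    that costs `fee_per_extra_transfer` (a Decimal, to avoid float rounding).
    """
    if num_transfers < 0:
        raise ValueError("num_transfers must not be negative")
    extra = max(0, num_transfers - free_transfers)
    return extra * fee_per_extra_transfer
```

  • For example, with an assumed fee of 0.25 per extra transfer, a call transferred once to a business telephone incurs no charge, while a call transferred three times incurs two fees.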
  • This feature allows users to instruct the communications system 90 to forward received calls to a second telephone number if the user does not answer the call at the first designated telephone number. Users can program the communications system 90 to "follow" the user by sequentially transferring a received call to different locations, services, or communications devices until the user is contacted.
  • the user can create a list of predetermined contact numbers used by the system 90 in trying to locate the user. This list may include telephone numbers for office, home, cellular telephone, pager and other designated locations. The user may also indicate the order in which the system 90 should call each of the numbers in trying to locate the user.
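  • The sequential call forwarding described above amounts to trying each contact number in the order the user specified until one is answered. The sketch below assumes a caller-supplied `try_call` function that places the call and reports whether it was answered; that function, the names used, and the telephone numbers in any example are hypothetical.

```python
from typing import Callable, List, Optional


def follow_me(contact_numbers: List[str],
              try_call: Callable[[str], bool]) -> Optional[str]:
    """Try each number in the user's preferred order.

    Returns the first number at which the call was answered, or None if the
    user could not be reached at any listed number. What happens after None
    (for example, sending the caller to voice mail) is left to the caller.
    """
    for number in contact_numbers:
        if try_call(number):
            return number
    return None
```

  • The order of `contact_numbers` carries the user's stated preference, so reordering the list is all that is needed to change where the system looks first.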
  • the "Follow Me" feature also logs the originating telephone number used by the user when accessing the communications system 90 to retrieve stored messages or make a telephone call. A user can instruct the system 90 to subsequently use this number to re-contact the user when an incoming call is received for that subscriber. For example, a user may be traveling and have the communications system 90 forward all telephone calls to the hotel room where the user is staying. Further, since the communications system is accessible via the telephone, the user is able to obtain and send messages from the hotel telephone.
  • the system advises the user of the telephone number of the calling party, and/or the caller's identity.
  • the system may recognize the caller's identity by comparing the telephone number of the caller with a user's contact list which is stored within a database 126 connected with the control system 100. If the user is already on the telephone when a new call is received, the system will whisper the pending inbound call information to the user, allowing the user the option to take the call, thereby putting the user's current call on hold, or direct the pending inbound call into the user's voice mail system provided by the voice mail service 114.
  • the integrated nature of the services provided by the fourth preferred embodiment allows a user to either retrieve email messages using an Internet connection to the communication system 90 or retrieve them by having them read to the user over a telephone using the speech synthesis engine 104.
  • the integrated features of the communications system enable users to immediately respond to email messages by another email message, a voice mail message, or placing a call to the originator of the email message.
  • a speech-to-text feature enables users to create email messages using only a telephone. Additionally, the speech recognition feature discussed above allows users to edit, forward, save, or delete email messages using speech commands and a telephone.
  • the speech synthesis engine 104 may also be used by users to review by telephone facsimile messages received by the facsimile service 116. Further, the speech-to-text feature may be used to create facsimile messages by telephone. The user may then issue speech commands, which are recognized by the speech recognition engine 102, to send, edit, forward, or delete the existing or newly created facsimile messages.
  • the communication system 90 of the fourth embodiment also contains a notification feature that enables users to be notified of messages received by the system.
  • a user can program the communications system 90 to notify the user via a pager, using the paging service 124, when an incoming message has been received. This notification can further indicate whether the incoming message is a voice mail, email, or a facsimile message.
  • the communications system 90 of the preferred embodiment enables users to maintain a contact list stored in a database 126 accessible by the control system 100.
  • This contact list enables users to broadcast email, voice mail, or facsimile messages to groups of contacts.
  • This contact list can be accessed by the user at any time using either an Internet connection or telephone connection with the communications system 90.
  • the speech recognition engine 102 described above enables users to access and edit the contact list and send messages to contacts by using simple speech commands.
  • the "collectively bundled" communication system also allows users to retrieve, on demand or at predetermined intervals, selected information from the Internet.
  • a user may establish predefined Internet searches using the Internet access service 106 of the communications system 90. The user can then specify that the Internet access service 106 perform the search using an Internet search engine (e.g., www.yahoo.com). The search can be performed upon receiving a speech command from the user or it may periodically be executed based upon a schedule set by the user.
  • the control system 100 would then notify the user of the search results using the method specified by the user. For example, the user may select to be notified of the search results by email, voice mail, or facsimile. Additionally, the speech synthesis engine 104 may be used to read the search results to the user over a telephone connection.
  • a user can access the communication system 90 via a computer 96.
  • the system 90 allows a user to access and play voice mail messages (using a downloadable audio player, such as RealPlayer, obtainable from www.real.com), read and send email and facsimile messages, and manage the user's contact list using a computer connection established through the Internet access service 120 or through a direct dial-up connection using the local telephone access service 110.

Abstract

Known networks (12, 18) can be utilized to provide a host of services to a user. Web browsing (4) allows internet access (12) to web sites (16) of the user's choice. Telephone (14) access is also possible through a public switched telephone network (18). Automated servers (8) may route information according to user input and a database (2) which contains user related information.

Description

ROBUST VOICE AND DEVICE BROWSER SYSTEM INCLUDING UNIFIED BUNDLE OF TELEPHONE AND NETWORK SERVICES
FIELD OF THE INVENTION
The present invention relates to a robust and highly reliable system that allows users to browse web sites and retrieve information by using conversational voice commands. Additionally, the present invention allows users to control and monitor other systems and devices that are connected to the Internet or any other network by using voice commands. Additionally, the invention relates to a personalized system for accessing information from the Internet or other information sources using speech commands.
Further, the present invention relates to a method for providing integrated Internet and telecommunications services from a common provider that enables subscribers to inexpensively send, receive, and transfer telephone calls, email messages, voice mail messages, paging messages, and facsimile messages.
BACKGROUND OF THE INVENTION
Currently, three options exist for a user who wishes to gather information from a web site accessible over the Internet. The first option is to use a desktop or a laptop computer connected to a telephone line via a modem or connected to a network with Internet access. The second option is to use a Personal Digital Assistant (PDA) that has the capability of connecting to the Internet either through a modem or a wireless connection. The third option is to use one of the newly designed web-phones or web-pagers that are now being offered on the market. Although each of these options provides a user with access to the Internet to browse web sites, each has its own drawbacks.
Desktop computers are very large and bulky and are difficult to transport. Laptop computers solve this inconvenience, but many are still quite heavy and are inconvenient to carry. Further, laptop computers cannot be carried and used everywhere a user travels. For instance, if a user wishes to obtain information from a remote location where no electricity or communication lines are installed, it would be nearly impossible to use a laptop computer. Oftentimes, information is needed on an immediate basis where a computer is not accessible. Furthermore, the use of laptop or desktop computers to access the Internet requires either a direct or a dial-up connection to an Internet Service Provider (ISP). Oftentimes, such connections are not available when a user desires to connect to the Internet to acquire information.
The second option for remotely accessing web sites is the use of PDAs. These devices also have their own set of drawbacks. First, PDAs with the ability to connect to the Internet and access web sites are not readily available. As a result, these PDAs tend to be very expensive. Furthermore, users are usually required to pay a special service fee to enable the web browsing feature of the PDA. A further disadvantage of these PDAs is that web sites must be specifically designed to allow these devices to access information on the web site. Therefore, a limited number of web sites are available that are accessible by these web-enabled PDAs. Finally, it is very common today for users to carry cell phones; however, users must also carry a separate PDA if they require the ability to gather information from various web sites. Users are therefore subjected to added expenses since they must pay for both cellular telephone service and also for the web-enabling service for the PDA. This results in a very expensive alternative for the consumer.
The third alternative mentioned above is the use of web-phones or web-pagers. These devices suffer many of the same drawbacks as PDAs. First, these devices are expensive to purchase. Further, the number of web sites accessible to these devices is limited since web sites must be specifically designed to allow access by these devices.
Furthermore, users are often required to pay an additional fee in order to gain wireless web access. Again, this service is expensive. Another drawback of these web-phones or web-pagers is that as technology develops, the methods used by the various web sites to allow access by these devices may change. These changes may require users to purchase new web-phones or web-pagers or have the current device serviced in order to upgrade the firmware or operating system stored within the device. At the least, this would be inconvenient to users and may actually be quite expensive.
Therefore, a need exists for a system that allows users to easily access and browse the Internet from any location. Such a system would only require users to have access to any type of telephone and would not require users to subscribe to multiple services.
In the rapidly changing area of Internet applications, web sites change frequently. The design of the web site may change, the information required by the web site in order to perform searches may change, and the method of reporting search results may change. Web browsing applications that submit search requests and interpret responses from these web sites based upon expected formats will produce errors and useless responses when such changes occur. Therefore, a need exists for a system that can detect modifications to web sites and adapt to such changes in order to quickly and accurately provide the information requested by a user through a voice enabled device, such as a telephone.
When users access web sites using devices such as personal computers, delays in receiving responses are tolerated and are even expected; however, such delays are not expected when a user communicates with a telephone. Users expect communications over a telephone to occur immediately with a minimal amount of delay time. A user attempting to find information using a telephone expects immediate responses to his search requests. A system that introduces too much delay between the time a user makes a request and the time of response will not be tolerated by users and will lose its usefulness. Therefore, it is important that a voice browsing system that uses telephonic communications selects web sites that provide rapid responses since speed is an important factor for maintaining the system's desirability and usability. Therefore, a need exists for a system that accesses web sites based upon their speed of operation.
Popular methods of information access and retrieval using the Internet or other computer networks can be time-consuming and complicated. A user must frequently wade through vast amounts of information provided by an information source or web site in order to obtain a small amount of relevant information. This can be time-consuming, frustrating, and, depending on the access method, costly. A user is required to continuously identify reliable sources of information and, if these information sources are used frequently, repeatedly access these sources.
Current methods of accessing information stored on computer networks, such as Wide Area Networks (WANs), Local Area Networks (LANs) or the Internet, require a user to have access to a computer. While computers are becoming increasingly smaller and easier to transport, using a computer to access information is still more difficult than simply using a telephone. Since speech recognition systems allow a user to convert his voice into a computer-usable message, telephone access to digital information is becoming more and more feasible. Voice recognition technology is growing in its ability to allow users to use a wide vocabulary. Further, such technology is quite accurate when a single, known user only needs to use a small vocabulary.
Therefore, a need exists for an information access and retrieval system and method that allows users to access frequently needed information from information sources on networks by using a telephone and simple speech commands.
People use a multitude of different services to communicate with one another. Wireline telephone services, cellular telephone service, facsimile messages, email messages, voice mail messages, pager services, and Internet access services are just some of the important methods and services widely used for business and personal communications. For many different purposes, people require the ability to send and receive messages, access information, conduct business transactions, organize daily schedules, and stay in touch with homes and offices from almost anywhere, at any time, in an easy to use and economical manner. Continued demand for products and services that address these communication needs is evidenced by the increasing number of portable electronic devices, such as cellular telephones, pagers, and Personal Digital Assistants (PDAs). Furthermore, the explosive growth of Internet and related networking services demonstrates the importance of such systems to personal communications and the ability to quickly and easily access information. These networks currently host a variety of services such as contact lists, scheduling and date book information, electronic mail, conferencing, electronic commerce, games, software libraries and electronic newspapers and magazines.
Despite the proliferation of communication devices and the development of the Internet and other business networks, significant barriers remain to fulfilling a user's needs for a simple and economical system allowing access to not only personal information, but also professional and public information.
The hardware designs and software technologies that enable today's communication are complex and expensive. Users must often maintain multiple different devices (e.g., a cell phone, a fax machine, and a computer) in order to utilize the available communication opportunities. Further, a user is often required to pay for multiple service subscriptions, such as cellular service, Internet service, and wireline telephone service. Therefore, it can become quite costly for a user to take full advantage of the communication services required by the user.
Additionally, interfacing all of the available communication services with one another is a very complex task. Each service provider organizes data differently, uses different hardware, and charges a variety of different fees depending upon the activities of the user.
The problem of accessing and processing all of the available information from communication systems, networks and services is particularly acute for mobile business professionals. The mobile professional, whether working out of the home or while on the road, may have a cellular telephone, a facsimile machine, a pager, intranet mail, Internet mail, and voice mail services. Success for this professional depends in large part on the ability to easily, quickly and inexpensively access, sort, and respond to the messages delivered to each of these communication devices and on the ability to obtain necessary information to conduct business within proliferating networks and services.
Therefore, a need exists for a system and method for providing an integrated communications system that bundles all services required by users.
SUMMARY OF THE INVENTION
It is an object of an embodiment of the present invention to allow users to gather information from web sites by using voice enabled devices, such as wireline or wireless telephones.
An additional object of an embodiment of the present invention is to provide a system and method that allows the searching and retrieving of publicly available information by controlling a web browsing server using naturally spoken voice commands.
It is an object of another embodiment of the present invention to provide a robust voice browsing system that can obtain the same information from several web sites based upon a ranking order. The ranking order is automatically adjusted if the system detects that a given web site is not functioning, is too slow, or has been modified in such a way that the requested information cannot be retrieved any longer.
A still further object of an embodiment of the present invention is to allow users to gather information from web sites from any location where a telephonic connection can be made. Another object of an embodiment of the present invention is to allow users to browse web sites on the Internet using conversational voice commands spoken into wireless or wireline telephones or other voice enabled devices.
An additional object of an embodiment of the present invention is to provide a system and method for using voice commands to control and monitor devices connected to a network.
One object of a preferred embodiment of the present invention is to allow users to customize a voice browsing system.
A further object of a preferred embodiment is to allow users to customize the information retrieved from the Internet or other computer networks and accessed by speech commands over telephones.
It is an object of an embodiment of the present invention to provide a system and method for providing a bundle of communication services from a single provider. A further object of an embodiment of the present invention is to provide a system and method for interfacing a plurality of different communication services enabling each service to transfer data and calls to another service.
Another object of an embodiment of the present invention is to provide a reduced cost system and method for transferring data and calls to multiple locations.
The present invention relates to a system for acquiring information from sources on a network, such as the Internet. A voice browsing system maintains a database containing a list of information sources, such as web sites, connected to a network. Each of the information sources is assigned a rank number which is listed in the database. In response to a speech command received from a user, a network interface system accesses the information source with the highest rank number in order to retrieve information requested by the user.
A preferred embodiment of the present invention allows users to access and browse web sites when they do not have access to computers with Internet access. This is accomplished by providing a voice browsing system and method that allows users to browse web sites using conversational voice commands spoken into any type of voice enabled device (i.e., any type of wireline or wireless telephone, IP phone, wireless PDA, or other wireless device). These spoken commands are then converted into data messages by a speech recognition software engine running on a user interface system. These data messages are then sent to and processed by a network interface system. This network interface system then generates the proper requests that are transmitted to the desired web site over the Internet. Responses sent from the web site are received and processed by the network interface system and then converted into an audio message via a speech synthesis engine or a pre-recorded audio concatenation application and finally transmitted to the user's voice enabled device.
A preferred embodiment of the voice browser system and method uses a web site polling and ranking methodology that allows the system to detect changes in web sites and adapt to those changes in real-time. This enables the voice browser system of a preferred embodiment to deliver highly reliable information to users over any voice enabled device. This ranking system also enables the present invention to provide rapid responses to user requests. Long delays before receiving responses to requests are not tolerated by users of voice-based systems, such as telephones. When a user speaks into a telephone, an almost immediate response is expected. This expectation does not exist for non-voice communications, such as email transmissions or accessing a web site using a personal computer. In such situations, a reasonable amount of transmission delay is acceptable. The ranking system implemented by a preferred embodiment of the present invention ensures users will always receive the fastest possible response to their request.
A second embodiment of the present invention allows users to control and monitor the operation of a variety of household devices connected to a network using speech commands spoken into a voice enabled device.
A third embodiment of the present invention enables a user to create a user-defined record in the database that identifies an information source, such as a web site, containing information of interest to the user. This record identifies the location of the information source and also contains a recognition grammar assigned by the user. Upon receiving a speech command from the user that is described with the assigned recognition grammar, a network interface system accesses the information source and retrieves the information requested by the user.
In accordance with the third embodiment of the present invention, a customized, voice-activated information access system is provided. A user creates a descriptor file defining specific information found on a web site the user would like to access in the future. The user then assigns a pronounceable name or identifier to the selected content and this pronounceable name is saved in a user-defined database record as a recognition grammar along with the URL of the selected web site. In this third embodiment, when a user wishes to retrieve the previously defined web-based information, a telephone call is placed to a media server. The user provides speech commands to the media server which include the recognition grammar assigned to the desired search. Based upon the recognition grammar, the media server retrieves the user-defined record from a database and passes the information to a web browsing server which retrieves the information from the associated web site. The retrieved information is then transmitted to the user using a speech synthesis software engine.
A fourth embodiment of the present invention provides a unified communications system that provides a variety of different communication services from a single service provider. These services include local telephone service, long distance, cellular telephone service, Internet access, voice mail, email, facsimile service, and paging services. Each of the different communication services is linked together by a system controller operated by a single service provider. The unified system allows users to easily and economically transfer information received by one of the communication services to a second communication service. The system implements speech recognition technology, thereby allowing users to control all of the communication services using speech commands. Further, the communications system provides a single operating menu that allows users to control and access all of the features and services provided by the system. This operating menu may be accessed using speech commands, touch-tone commands, or via a computer.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a depiction of the voice browsing system of the first embodiment of the present invention;
FIG. 2 is a block diagram of a database record used by the first preferred embodiment of the present invention;
FIG. 3 is a block diagram of a media server used by the preferred embodiment;
FIG. 4 is a block diagram of a web browsing server used by the preferred embodiment;
FIG. 5 is a depiction of the device browsing system of the second embodiment of the present invention;
FIG. 6 depicts a personal information selection system used with a third preferred embodiment of the present invention;
FIG. 7 depicts a web page displayed by the clipping client of the third preferred embodiment;
FIG. 8 is a block diagram of a user-defined database record used by the third preferred embodiment of the present invention;
FIG. 9 is a block diagram showing methods available for users to access the communications system of the fourth preferred embodiment;
FIG. 10 is a block diagram of a system controller used with the fourth preferred embodiment; and
FIG. 11 is a block diagram of the various services that may be provided by a single service provider according to the fourth preferred embodiment.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
A first embodiment of the present invention is a system and method for allowing users to browse information sources, such as web sites, by using naturally spoken, conversational voice commands spoken into a voice enabled device. Users are not required to learn a special language or command set in order to communicate with the voice browsing system of the present invention. Common and ordinary commands and phrases are all that is required for a user to operate the voice browsing system. The voice browsing system recognizes naturally spoken voice commands and is speaker-independent; it does not have to be trained to recognize the voice patterns of each individual user. Such speech recognition systems use phonemes to recognize spoken words and not predefined voice patterns. The first embodiment allows users to select from various categories of information and to search those categories for desired data by using conversational voice commands. The voice browsing system of the first preferred embodiment includes a user interface system referred to as a media server. The media server contains a speech recognition software engine. This speech recognition engine is used to recognize natural, conversational voice commands spoken by the user and convert them into data messages based on the available recognition grammar. These data messages are then sent to a network interface system. In the first preferred embodiment, the network interface system is referred to as a web browsing server. The web browsing server then accesses the appropriate information source, such as a web site, to gather information requested by the user. Responses received from the information sources are then transferred to the media server where a speech synthesis engine converts the responses into audio messages that are transmitted to the user. A more detailed description of this embodiment will now be provided.
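The flow described above (recognition on the media server, retrieval by the web browsing server, synthesis back on the media server) may be pictured with the following minimal Python sketch. It is an illustration only and is not taken from the specification: the names DataMessage, handle_call, and the three stand-in stage functions are hypothetical, and the stand-ins simply pass text around so the example runs without telephony or network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataMessage:
    """Hypothetical shape of the data message produced from a voice command."""
    category: str
    argument: str


def handle_call(
    audio: bytes,
    recognize: Callable[[bytes], DataMessage],  # media server: speech recognition
    browse: Callable[[DataMessage], str],       # web browsing server: retrieve data
    synthesize: Callable[[str], bytes],         # media server: speech synthesis
) -> bytes:
    """Run one spoken request through the three stages and return reply audio."""
    message = recognize(audio)
    response_text = browse(message)
    return synthesize(response_text)


# Stand-in stages (assumptions) so the sketch is runnable on its own.
def fake_recognize(audio: bytes) -> DataMessage:
    return DataMessage(category="weather", argument=audio.decode())


def fake_browse(message: DataMessage) -> str:
    return f"{message.category} report for {message.argument}"


def fake_synthesize(text: str) -> bytes:
    return text.encode()


reply = handle_call(b"Chicago", fake_recognize, fake_browse, fake_synthesize)
```

Passing the stages in as callables mirrors the separation between the user interface system and the network interface system; either side could be replaced without changing the other.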
Referring to FIG. 1, a database 2 designed by Webley Systems Incorporated is connected to one or more web browsing servers 4 as well as to one or more media servers 8. The database 2 contains a separate set of records for each web site accessible by the system. An example of a web site record is shown in FIG. 2. Each web site record 20 contains the rank number of the web site 22, the associated Uniform Resource Locator (URL) 24, and a command that enables the appropriate "extraction agent" 26 that is required in order to generate proper requests sent to and to format data received from the web site. The database record 20 also contains the timestamp 28 indicating the last time the web site was accessed. The extraction agent is described in more detail below. The database 2 categorizes each database record 20 according to the type of information provided by each web site. For instance, a first category of database records 20 may correspond to web sites that provide "weather" information. The database 2 may also contain a second category of records 20 for web sites that provide "stock" information. These categories may be further divided into subcategories. For instance, the "weather" category may contain subcategories depending upon the type of weather information available to a user, such as "current weather" or "extended forecast". Within the "extended forecast" subcategory, a list of web site records may be stored that provide weather information for multiple days. The use of subcategories may allow the web browsing feature to provide more accurate, relevant, and up-to-date information to the user by accessing the most relevant web site. The number of records contained in each category or subcategory is not limited. In the preferred embodiment, three web site records are provided for each category.
Table 1 below depicts two database records 20 that are used with the preferred embodiment. These records also contain a field indicating the "category" of the record, which is "weather" in each of these examples.
TABLE 1

category: weather
URL: URL=http://cgi.cnn.com/cgi-bin/weather/redirect?zip=__zip
rank: 1
command: web_dispatch.pl weather_cnn
browsingServer: wportal1
browsingServerBackup: wportal2
dateTime: Dec 21 2000 2:15PM

category: weather
URL: URL=http://weather.lycos.com/wcfiveday.asp?city=zip
rank: 2
command: web_dispatch.pl weather_lycos
browsingServer: wportal1
browsingServerBackup: wportal2
dateTime: Dec 21 2000 1:45PM
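As an illustration of how records such as those in Table 1 might be held in memory, the following Python sketch models a web site record and groups records by category in rank order. The class name WebSiteRecord, the helper parse_timestamp, and the in-memory grouping are assumptions made for this example; the specification does not state how database 2 stores or indexes its records.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WebSiteRecord:
    """Hypothetical in-memory form of a web site record 20."""
    category: str
    url: str
    rank: int
    command: str             # command that enables the extraction agent
    last_accessed: datetime  # timestamp of the last access


def parse_timestamp(text: str) -> datetime:
    # Parses the "Dec 21 2000 2:15PM" style used in Table 1.
    return datetime.strptime(text, "%b %d %Y %I:%M%p")


records = [
    WebSiteRecord("weather",
                  "http://weather.lycos.com/wcfiveday.asp?city=zip",
                  2, "web_dispatch.pl weather_lycos",
                  parse_timestamp("Dec 21 2000 1:45PM")),
    WebSiteRecord("weather",
                  "http://cgi.cnn.com/cgi-bin/weather/redirect?zip=__zip",
                  1, "web_dispatch.pl weather_cnn",
                  parse_timestamp("Dec 21 2000 2:15PM")),
]

# Group records by category and keep each group ordered by rank number.
by_category = defaultdict(list)
for record in records:
    by_category[record.category].append(record)
for sites in by_category.values():
    sites.sort(key=lambda r: r.rank)
```

Keeping each category sorted by rank means the first entry is always the record to try first, with the remaining entries available as fallbacks.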
The database also contains a listing of pre-recorded audio files used to create concatenated phrases and sentences. Further, database 2 may contain customer profile information, system activity reports, and any other data or software servers necessary for the testing or administration of the voice browsing system.
The operation of the media servers 8 will now be discussed in relation to FIG. 3. The media servers 8 function as user interface systems. In the preferred embodiment, the media servers 8 contain a speech recognition engine 30, a speech synthesis engine 32, an Interactive Voice Response (IVR) application 34, a call processing system 36, and telephony and voice hardware 38 required to communicate with the Public Switched Telephone Network (PSTN) 18. In the preferred embodiment, each media server is based upon Intel's Dual Pentium III 730 MHz microprocessor system. The speech recognition function is performed by a speech recognition engine 30 that converts voice commands received from the user's voice enabled device 14 (i.e., any type of wireline or wireless telephone, Internet Protocol (IP) phones, or other special wireless units) into data messages. In the preferred embodiment, voice commands and audio messages are transmitted using the PSTN 18 and data is transmitted using the TCP/IP communications protocol. However, one skilled in the art would recognize that other transmission protocols may be used for either voice or data. Other possible transmission protocols would include SIP/VoIP (Session Initiation Protocol/Voice over IP), Asynchronous Transfer Mode (ATM) and Frame Relay. A preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com). The Nuance engine capacity is measured in recognition units based on CPU type as defined in the vendor specification. The natural speech recognition grammars (i.e., what a user can say that will be recognized by the speech recognition engine) were developed by Webley Systems. Table 2 below provides a partial source code listing of the recognition grammars used by the speech recognition engine of the preferred embodiment for obtaining weather information.
TABLE 2
?WHAT_IS ?the weather ?[info information report conditions]
?( (?like in)
   [ UScities:n {<param1 $n.zip> <param2 $n.city> <param3 $n.state>}
     ( (area code) AREA_CODE:n ) {<param1 $n>}
     ( AREA_CODE:n (area code) ) {<param1 $n>}
     ( (zip ?code) ZIP_CODE:n ) {<param1 $n>}
     ( ZIP_CODE:n (zip ?code) ) {<param1 $n>}
   ] )
) {<menu 194>}
The media server 8 uses recognition results generated by the speech recognition engine 30 to retrieve a web site record 20 stored in the database 2 that can provide the information requested by the user. The media server 8 processes the recognition result data, identifying keywords that are used to search the web site records 20 contained in the database 2. For instance, if the user's request was "What is the weather in Chicago?", the keywords "weather" and "Chicago" would be recognized. A web site record 20 with the highest rank number from the "weather" category within the database 2 would then be selected and transmitted to the web browsing server 4 along with an identifier indicating that Chicago weather is being requested.
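By way of illustration only, the record selection described above might be sketched as follows. The sketch is written in Python rather than the Perl of the preferred embodiment, and the class name, function name, field names and example URLs are assumptions introduced for this example; they do not appear in the listings of the preferred embodiment.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WebSiteRecord:
    """Hypothetical stand-in for a web site record 20."""
    category: str
    rank: int            # rank number 22; 1 is the highest rank
    url: str             # URL field 24
    agent_command: str   # content extraction agent command 26


def select_record(records: Iterable[WebSiteRecord],
                  category: str) -> Optional[WebSiteRecord]:
    """Return the highest-ranked record in a category, or None if none exists."""
    candidates = [r for r in records if r.category == category]
    return min(candidates, key=lambda r: r.rank) if candidates else None


# Example database contents (invented for illustration).
records = [
    WebSiteRecord("weather", 2, "http://weather-b.example/", "webget weather_b"),
    WebSiteRecord("weather", 1, "http://weather-a.example/", "webget weather_a"),
    WebSiteRecord("news", 1, "http://news.example/", "news"),
]

# Keywords recognized in "What is the weather in Chicago?"
category, location = "weather", "Chicago"
record = select_record(records, category)

# The record and the identifier would be sent to the web browsing server 4.
request = (record, {"city": location})
```

In this sketch a lower rank number denotes a higher rank, consistent with the ranking discussion of the preferred embodiment.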
The media servers 8 also contain a speech synthesis engine 32 that converts the data retrieved by the web browsing servers 4 into audio messages that are transmitted to the user's voice enabled device 14. A preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com).
A further description of the web browsing server 4 will be provided in relation to FIG. 4. The web browsing servers 4 provide access to any computer network such as the Internet 12. These servers are also capable of accessing databases stored on Local Area Networks (LANs) or Wide Area Networks (WANs). The web browsing servers receive responses from web sites and extract the data requested by the user. This task is also known as "content extraction." The web browsing servers 4 also perform the task of periodically polling or "pinging" various web sites and modifying the ranking numbers of these web sites depending upon their response and speed. This polling feature is further discussed below. The web browsing server 4 is comprised of a content extraction agent 40, a content fetcher 42, a polling and ranking agent 44, and the content descriptor files 46. Each of these is a software component and will be discussed below.
Upon receiving a web site record 20 from the database 2 in response to a user request, the web browsing server 4 invokes the "content extraction agent" command 26 contained in the record 20. The content extraction agent 40 allows the web browsing server 4 to properly format requests and read responses provided by the web site 16 identified in the URL field 24 of the web site record 20. Each content extraction agent command 26 invokes the content extraction agent and identifies a content description file associated with the web page identified by the URL 24. This content description file directs the extraction agent where to extract data from the accessed web page and how to format a response to the user utilizing that data. For example, the content description file for a web page providing weather information would indicate where to insert the "city" name or ZIP code in order to retrieve Chicago weather information. Additionally, the content description file for each supported URL indicates the location on the web page where the response information is provided. The extraction agent 40 uses this information to properly extract from the web page the information requested by the user.
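By way of illustration only, the three roles of a content description file described above (placing the request parameter in the URL, locating the response data on the page, and formatting the reply) might be sketched as follows. The sketch is in Python, and the function names, the example URL, the sample page text and the field names are assumptions made for this example rather than part of the preferred embodiment.

```python
import re
from typing import Dict, List, Optional


def build_url(url_template: str, inputs: List[str], values: List[str]) -> str:
    """Replace each input placeholder in the URL template with its value."""
    for placeholder, value in zip(inputs, values):
        url_template = url_template.replace(placeholder, value)
    return url_template


def extract(page_text: str, pattern: str,
            output_names: List[str]) -> Optional[Dict[str, str]]:
    """Apply one regular expression and name its capture groups in order."""
    match = re.search(pattern, page_text)
    if match is None:
        return None  # page format not recognized
    return dict(zip(output_names, match.groups()))


def render(template: str, fields: Dict[str, str]) -> str:
    """Fill a response template; longer names first to avoid partial overlap."""
    for name in sorted(fields, key=len, reverse=True):
        template = template.replace(name, fields[name])
    return template


url = build_url("http://weather.example/lookup?zip=_zip", ["_zip"], ["60605"])
fields = extract("Forecast for Chicago Temp: 71 F",
                 r"Forecast for (.+) Temp: (\d+) F",
                 ["_location", "_temp_F"])
reply = render("Weather in _location: _temp_F Fahrenheit.", fields)
```

A return value of None from the extraction step corresponds to the case in which the web page no longer matches the format expected by the content description file.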
Table 3 below contains source code for a content extraction agent 40 used by the preferred embodiment.
TABLE 3
#!/usr/local/www/bin/sybperl5
#$Header: /usr/local/cvsroot/webley/agents/service/web_dispatch.pl,v 1.6
# Dispatches all web requests
#http://wcorp.itn.net/cgi/flstat?carrier=ua&flight_no=155&mon_abbr=jul&date=6--stamp=-C>hLN~PdbuuE*itn/ord, itn/cb/sprint_hd
#http://cgi.cnnfn.com/flightview/rlm?airline=amt&number=300
require "config_tmp.pl";

# check parameters
die "Usage: $0 service [params]\n" if $#ARGV < 1;
#print STDERR @ARGV;

# get parameters
my ( $service, @param ) = @ARGV;

# check service: maps each service name to the script (and arguments) that serves it
my %Services = (
    weather_cnn          => 'webget.pl weather_cnn',
    weather_lycos        => 'webget.pl weather_lycos',
    weather_weather      => 'webget.pl weather_weather',
    weather_snap         => 'webget.pl weather_snap',
    weather_infospace    => 'webget.pl weather_infospace',
    stockQuote_yahoo     => 'webget.pl stock',
    flightStatus_itn     => 'webget.pl flight_delay',
    yellowPages_yahoo    => 'yp_data.pl',
    newsHeaders_newsreal => 'news.pl',
    newsArticle_newsreal => 'news.pl',
);

# test param: known-good parameters used to tell a bad request from a broken service
my $date = `date`;
chop( $date );
my ( $short_date ) = $date =~ /\s+(\w{3}\s+\d{1,2})\s+/;
my %Test = (
    weather_cnn          => '60053',
    weather_lycos        => '60053',
    weather_weather      => '60053',
    weather_snap         => '60053',
    weather_infospace    => '60053',
    stockQuote_yahoo     => 'msft',
    flightStatus_itn     => 'ua 155 ' . $short_date,
    yellowPages_yahoo    => 'tires 60015',
    newsHeaders_newsreal => '1',
    newsArticle_newsreal => '1 1',
);
die "$date: $0: error: no such service: $service (check this script)\n"
    unless $Services{ $service };

# prepare absolute path to run other scripts
my ( $path, $script ) = $0 =~ m|^(.*/)([^/]*)|;

# store the service to compare against datatable
my $service_stored = $service;

# run service
while ( !( $response = `$path$Services{$service} @param` ) ) {
    # response failed
    # check with test parameters
    $response = `$path$Services{$service} $Test{$service}`;
    # print "test: $path$Services{$service} $Test{$service}";
    if ( $response ) {
        $service = &switch_service( $service );
        # print "Wrong parameter values were supplied: $service - @param\n";
        # die "$date: $0: error: wrong parameters: $service - @param\n";
    }
    else {
        # change priority and notify
        $service = &increase_attempt( $service );
    }
}

# output the response
print $response;

# Demote a failing route, count the failed attempt, and return the next route.
sub increase_attempt {
    my ( $service ) = @_;
    my ( $service_name ) = split( /_/, $service );
    print STDERR "$date: $0: attn: changing priority for service: $service\n";
    # update priority
    &db_query( "update mcServiceRoute "
        . "set priority = ( select max( priority ) from mcServiceRoute "
        . "where service = '$service_name' ) + 1, "
        . "date = getdate(), "
        . "attempt = attempt + 1 "
        . "where route = '$script $service'" );
    # print "$route===\n";
    # find new route
    my $route = ${ &db_query( "select route from mcServiceRoute "
        . "where service = '$service_name' and attempt < 5 order by priority " )
    }[ 0 ]{ route };
    &db_query( "update mcServiceRoute "
        . "set attempt = 0 "
        . "where route = '$script $service'" )
        if ( $route eq "$script $service" or $route eq "$script $service_stored" );
    ( $service_name, $service ) = split( /\s+/, $route );
    die "$date: $0: error: no route for the service: $service (add more)\n"
        unless $service;
    return $service;
}

# Demote the current route without counting a failure, and return the next route.
sub switch_service {
    my ( $service ) = @_;
    my ( $service_name ) = split( /_/, $service );
    print STDERR "$date: $0: attn: changing priority for service: $service\n";
    # update priority
    &db_query( "update mcServiceRoute "
        . "set priority = ( select max( priority ) from mcServiceRoute "
        . "where service = '$service_name' ) + 1, "
        . "date = getdate() "
        . "where route = '$script $service'" );
    # print "$route===\n";
    # find new route
    my $route = ${ &db_query( "select route from mcServiceRoute "
        . "where service = '$service_name' "
        . " and attempt < 5 "
        . "order by priority " )
    }[ 0 ]{ route };
    die "$date: $0: error: there is the only service: $route (add more)\n"
        if ( $route eq "$script $service" or $route eq "$script $service_stored" );
    ( $service_name, $service ) = split( /\s+/, $route );
    die "$date: $0: error: no route for the service: $service (add more)\n"
        unless $service;
    return $service;
}
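By way of illustration only, the retry strategy of Table 3 (try the requested service, probe it with known-good test parameters when it returns nothing, then demote it and move to the next route) might be sketched as follows. The sketch is in Python and keeps the routing state in memory instead of in the mcServiceRoute table; the function name, the run callback and the in-memory bookkeeping are assumptions made for this example and simplify the behavior of the listing.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MAX_ATTEMPTS = 5  # mirrors the "attempt < 5" condition in Table 3


def dispatch(service: str,
             params: Sequence[str],
             routes: List[str],
             test_params: Dict[str, Sequence[str]],
             run: Callable[[str, Sequence[str]], str]) -> Tuple[str, str]:
    """Return (service_used, response), falling back across routes.

    run(service, params) is a hypothetical callback that returns the
    response text, or an empty string when the service produced nothing.
    """
    order = list(routes)                     # priority order, best first
    attempts = dict.fromkeys(routes, 0)      # failed attempts per route
    for _ in range(len(routes) * MAX_ATTEMPTS):
        response = run(service, params)
        if response:
            return service, response
        # Probe with known-good parameters. If the probe also fails, the
        # service itself is suspect and the failure is counted against it.
        if not run(service, test_params[service]):
            attempts[service] += 1
        # Either way, demote the route and pick the best remaining one.
        order.remove(service)
        order.append(service)
        candidates = [s for s in order if attempts[s] < MAX_ATTEMPTS]
        if not candidates:
            break
        service = candidates[0]
    raise LookupError("no route for the service")
```

A caller would supply a run callback that executes the relevant agent script and returns its output.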
Table 4 below contains source code of the content fetcher 42 used with the content extraction agent 40 to retrieve information from a web site.
TABLE 4
#!/usr/local/www/bin/sybperl5
#-T
# -w
# $Header: /usr/local/cvsroot/webley/agents/service/webget.pl,v 1.4
# Agent to get info from the web.
# Parameters: service_name [service_parameters], i.e. stock msft or weather 60645
# Configuration stored in files service_name.ini
# if this file is absent the configuration is received from mcServices table
# This script provides autoupdate to datatable if the .ini file is newer.
$debug = 1;
use URI::URL;
use LWP::UserAgent;
use HTTP::Request::Common;
use Vail::VarList;
use Sybase::CTlib;
use HTTP::Cookies;
#print "Sybase::CTlib $DB_USR, $DB_PWD, $DB_SRV;";
open( STDERR, ">>$0.log" ) if $debug;
#open( STDERR, ">&STDOUT" );
$log = `date`;
#$response = `./url.pl "http://cgi.cnn.com/cgi-bin/weather/redirect?zip=60605"`;
#$response = `pwd`;
#print STDERR "pwd = $response\n";
#$response = `ls`;
#print STDERR "ls = $response\n";
chop( $log );
$log .= "pwd=" . `pwd`;
chop( $log );
#$debug2 = 1;
my $service = shift;
$log .= " $service: " . join( ':', @ARGV ) . "\n";
print STDERR $log if $debug;
#$response = `./url.pl "http://cgi.cnn.com/cgi-bin/weather/redirect?zip=60605"`;
my @ini = &read_ini( $service );
chop( @ini );
# try each section of the descriptor in turn until one succeeds
my $section = "";
do { $section = &process_section( $section ) } while $section;
#$response = `./url.pl "http://cgi.cnn.com/cgi-bin/weather/redirect?zip=60605"`;
exit;

#######################################################
# Read the descriptor from service_name.ini, keeping the mcServices
# table in step with the file; fall back to the table if the file is absent.
sub read_ini {
    my ( $service ) = @_;
    my @ini = ();
    # first, try to read file
    $0 =~ m|^(.*/)[^/]*|;
    $service = $1 . $service;
    if ( open( INI, "$service.ini" ) ) {
        @ini = ( <INI> );
        return @ini unless ( $DB_SRV );
        # update datatable
        my $file_time = time - int( ( -M "$service.ini" ) * 24 * 3600 );
        # print "time $file_time\n";
        my $dbh = new Sybase::CTlib $DB_USR, $DB_PWD, $DB_SRV;
        unless ( $dbh ) {
            print STDERR "webget.pl: Cannot connect to dataserver $DB_SRV:$DB_USR:$DB_PWD\n";
            return @ini;
        }
        my @row_refs = $dbh->ct_sql( "select lastUpdate from mcServices where service = '$service'", undef, 1 );
        if ( $dbh->{RC} == CS_FAIL ) {
            print STDERR "webget.pl: DB select from mcServices failed\n";
            return @ini;
        }
        unless ( @row_refs ) {
            # have to insert
            my ( @ini_escaped ) = map {
                ( my $x = $_ ) =~ s/'/''/g;
                $x;
            } @ini;
            $dbh->ct_sql( "insert mcServices values ( '$service', '@ini_escaped', $file_time )" );
            if ( $dbh->{RC} == CS_FAIL ) {
                print STDERR "webget.pl: DB insert to mcServices failed\n";
            }
            return @ini;
        }
        # print "time $file_time:" . $row_refs[ 0 ]->{'lastUpdate'} . "\n";
        if ( $file_time > $row_refs[ 0 ]->{'lastUpdate'} ) {
            # have to update
            my ( @ini_escaped ) = map {
                ( my $x = $_ ) =~ s/'/''/g;
                $x;
            } @ini;
            $dbh->ct_sql( "update mcServices set config = '@ini_escaped', lastUpdate = $file_time where service = '$service'" );
            if ( $dbh->{RC} == CS_FAIL ) {
                print STDERR "webget.pl: DB update to mcServices failed\n";
            }
        }
        return @ini;
    }
    else {
        print STDERR "$0: WARNING: $service.ini n/a in " . `pwd` . " Try to read DB\n";
    }
    # then try to read datatable
    die "webget.pl: Unable to find service $service\n" unless ( $DB_SRV );
    my $dbh = new Sybase::CTlib $DB_USR, $DB_PWD, $DB_SRV;
    die "webget.pl: Cannot connect to dataserver $DB_SRV:$DB_USR:$DB_PWD\n" unless ( $dbh );
    my @row_refs = $dbh->ct_sql( "select config from mcServices where service = '$service'", undef, 1 );
    die "webget.pl: DB select from mcServices failed\n" if $dbh->{RC} == CS_FAIL;
    die "webget.pl: Unable to find service $service\n" unless ( @row_refs );
    $row_refs[ 0 ]->{'config'} =~ s/\n /\n\r/g;
    @ini = split( /\r/, $row_refs[ 0 ]->{'config'} );
    return @ini;
}

#######################################################
# Process the first section after $prev_section. Returns 0 on success or when
# no section is left, and the section name when its regular expression failed.
sub process_section {
    my ( $prev_section ) = @_;
    my ( $section, $output, $content );
    my %Param;
    my %Content;
    # print "################################\n";
    foreach ( @ini ) {
        # print;
        # chop;
        s/\s+$//;
        s/^\s+//;
        # get section name
        if ( /^\[(.*)\]/ ) {
            # print "$_: $section: $prev_section\n";
            last if $section;
            next if $1 eq "print";
            # next if $prev_section ne "" and $prev_section ne $1;
            if ( $prev_section eq $1 ) {
                $prev_section = "";
                next;
            }
            $section = $1;
        }
        # get parameters
        push( @{ $Param{$1} }, $2 ) if $section and /([^=]+)=(.*)/;
    }
    # print "++++++++++++++++++++++++++++++++++\n";
    return 0 unless $section;
    # print "section $section\n";
    # substitute parameters with values
    map { $Param{URL}->[ 0 ] =~ s/$Param{Input}->[$_]/$ARGV[$_]/g }
        0 .. $#{ $Param{Input} };
    # get page content
    ( $Content{'TIME'}, $content ) = &get_url_content( ${ $Param{URL} }[ 0 ] );
    # filter it
    map {
        if ( /\"([^\"]+)\"([^\"]*)\"/ or /\/([^\/]+)\/([^\/]*)\// ) {
            my $out = $2;
            $content =~ s/$1/$out/g;
        }
    } @{ $Param{"Pre-filter"} };
    #print STDERR $content;
    # do main regular expression
    unless ( @values = $content =~ /${$Param{Regular_expression}}[0]/ ) {
        &die_hard( ${ $Param{Regular_expression} }[ 0 ], $content );
        return $section;
    }
    %Content = map { ( $Param{Output}->[ $_ ], $values[ $_ ] ) }
        0 .. $#{ $Param{Output} };
    # filter it
    map {
        if ( /([^\"]+)\"([^\"]+)\"([^\"]*)\"/ or /([^\/]+)\/([^\/]+)\/([^\/]*)\// ) {
            my $out = $3;
            $Content{$1} =~ s/$2/$out/g;
        }
    } @{ $Param{"Post-filter"} };
    # calculate it
    map {
        if ( /([^=]+)=(.*)/ ) {
            my $eval = $2;
            map { $eval =~ s/$_/$Content{$_}/g } keys %Content;
            $Content{$1} = eval( $eval );
        }
    } @{ $Param{Calculate} };
    # read section [print]
    foreach $i ( 0 .. $#ini ) {
        next unless $ini[ $i ] =~ /^\[print\]/;
        foreach ( $i + 1 .. $#ini ) {
            last if $ini[ $_ ] =~ /^\[.+\]/;
            $output .= $ini[ $_ ] . "\n";
        }
        last;
    }
    # prepare output
    map { $output =~ s/$_/$Content{$_}/g } keys %Content;
    print $output;
    return 0;
}

#######################################################
sub get_url_content {
    my ( $url ) = @_;
    print STDERR $url if $debug;
    # $response = `./url.pl '$url'`;
    $response = `./url.pl '$url'`;
    return ( $time - time, $response );
    my $ua = LWP::UserAgent->new;
    $ua->agent( 'Mozilla/4.0 [en] (X11; I; FreeBSD 2.2.8-STABLE i386)' );
    # $ua->proxy( ['http', 'https'], 'http://proxy.webley:3128/' );
    # $ua->no_proxy( 'webley', 'vail' );
    my $cookie = HTTP::Cookies->new;
    $ua->cookie_jar( $cookie );
    $url = url $url;
    print "$url\n" if $debug2;
    my $time = time;
    my $res = $ua->request( GET $url );
    print "Response: " . ( time - $time ) . "sec\n" if $debug2;
    return ( $time - time, $res->content );
}

#######################################################
# Report which part of a failed regular expression stopped matching by
# stripping trailing groups until the remainder matches the content.
sub die_hard {
    my ( $re, $content ) = @_;
    my ( $re_end, $pattern );
    while ( $content !~ /$re/ ) {
        if ( $re =~ s/(\([^\(\)]+\)[^\(\)]*$)// ) {
            $re_end = $1 . $re_end;
        }
        else {
            $re_end = $re;
            last;
        }
    }
    $content =~ /$re/;
    print STDERR "The regular expression did not match:\n $re\n
 Possible misuse: $re_end:\n Matched: $&\n
 Mismatched: $'\n " if $debug;
    if ( $debug ) {
        print STDERR " Content:\n $content\n" unless $';
    }
}
#######################################################
Table 5 below contains the content descriptor file source code for obtaining weather information from the web site www.cnn.com that is used by the extraction agent 40 of the preferred embodiment.
TABLE 5
[cnn]
Input=_zip
URL=http://cgi.cnn.com/cgi-bin/weather/redirect?zip=_zip
Pre-filter="\n" "
Pre-filter="<[^<>]+>""
Pre-filter=/\s+/ /
Pre-filter="[\(\)\|]"!"
Output=_location
Output=first_day_name
Output=first_day_weather
Output=first_day_high_F
Output=first_day_high_C
Output=first_day_low_F
Output=first_day_low_C
Output=second_day_name
Output=second_day_weather
Output=second_day_high_F
Output=second_day_high_C
Output=second_day_low_F
Output=second_day_low_C
Output=third_day_name
Output=third_day_weather
Output=third_day_high_F
Output=third_day_high_C
Output=third_day_low_F
Output=third_day_low_C
Output=fourth_day_name
Output=fourth_day_weather
Output=fourth_day_high_F
Output=fourth_day_high_C
Output=fourth_day_low_F
Output=fourth_day_low_C
Output=undef
Output=_current_time
Output=_current_month
Output=_current_day
Output=_current_weather
Output=_current_temperature_F
Output=_current_temperature_C
Output=_humidity
Output=_wind
Output=_pressure
Output=_sunrise
Output=_sunset
Regular_expression=Author &nbsp; (.+) Four Day Forecast (\S+) (\S+) HIGH (\S+) F (\S+) C LOW (\S+) F (\S+) C (\S+) (\S+) HIGH (\S+) F (\S+) C LOW (\S+) F (\S+) C (\S+) (\S+) HIGH (\S+) F (\S+) C LOW (\S+) F (\S+) C (\S+) (\S+) HIGH (\S+) F (\S+) C LOW (\S+) F (\S+) C (.+) Current Conditions (.+) !local!, (\S+) (\S+) (.+) Temp: (\S+) F, (\S+) C Rel. Humidity: (\S+) Wind: (.+) Pressure: (.+) Sunrise: (.+) Sunset: (.+) Related Links
Post-filter=_current_weather"p/"partly "
Post-filter=_current_weather"l/"little "
Post-filter=_current_weather"m/"mostly "
Post-filter=_current_weather"t-/"thunder "
Post-filter=_wind"N"North "
Post-filter=_wind"E"East "
Post-filter=_wind"S"South "
Post-filter=_wind"W"West "
Post-filter=_wind/mph/miles per hour/
Post-filter=_wind/kph!/kilometers per hour/
Post-filter=_wind"\s+!", "
[print]
Current weather in _location is _current_weather.
Temperature is _current_temperature_F Fahrenheit, _current_temperature_C Celsius.
Humidity is _humidity. Wind from the _wind.
Table 6 below contains the content descriptor file source code for obtaining weather information from the web site www.lycos.com that is used by the extraction agent 40 of the preferred embodiment.
TABLE 6
[lycos]
Input=zip
Input=_city
URL=http://weather.lycos.com/wcfiveday.asp?city=zip
Pre-filter="\n" "
Pre-filter="</TD>"td"
Pre-filter="<!.*?->""
Pre-filter="<br>"_br_"
Pre-filter=/alt="/>alt=/
Pre-filter="<[^<>]+>""
Pre-filter="&nbsp;" "
Pre-filter=/\s+/ /
Output=_location
Output=_current_weather
Output=_current_temperature_F
Output=_humidity
Output=_winddir
Output=_windspeed
Output=_windmeasure
Output=_pressure
Output=first_day_name
Output=second_day_name
Output=third_day_name
Output=fourth_day_name
Output=fifth_day_name
Output=first_day_weather
Output=second_day_weather
Output=third_day_weather
Output=fourth_day_weather
Output=fifth_day_weather
Output=first_day_high_F
Output=first_day_low_F
Output=second_day_high_F
Output=second_day_low_F
Output=third_day_high_F
Output=third_day_low_F
Output=fourth_day_high_F
Output=fourth_day_low_F
Output=fifth_day_high_F
Output=fifth_day_low_F
Output=_windkmh
Regular_expression=Guide My Lycos (.+) Click image to enlarge alt=([^"]+)" (?:.+) Temp: (\d+) (?:.+)F _br_ Humidity: (\S+) (?:.+) Wind: (.+?) br
Output=_current_temperature_C
Post-filter=_location"_br_""
Post-filter=_current_weather"p/"partly "
Post-filter=_current_weather"m/"mostly "
Post-filter=_current_weather"t-/"thunder "
Post-filter=_winddir"@" at "
Post-filter=_winddir/mph/miles per hour/
Post-filter=_wind/kph!/kilometers per hour/
Calculate=_current_temperature_C=int((_current_temperature_F-32)*5/9)
Calculate=_windkmh=int(_windspeed*1.6)
[print]
The current weather in _location is _current_weather.
The current temperature is _current_temperature_F Fahrenheit _current_temperature_C Celsius.
Humidity is _humidity.
Winds _winddir.
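By way of illustration only, the effect of the Post-filter and Calculate entries in the descriptor files above might be sketched as follows. The sketch is in Python, and the function names and sample values are assumptions made for this example; the two formulas are those given in the Calculate entries of Table 6, and Python's int truncates toward zero in the same way as Perl's int.

```python
import re
from typing import List, Tuple

# Abbreviation expansions taken from the Post-filter entries for _current_weather.
POST_FILTERS: List[Tuple[str, str]] = [
    ("p/", "partly "),
    ("m/", "mostly "),
    ("t-/", "thunder "),
]


def expand_abbreviations(text: str,
                         filters: List[Tuple[str, str]] = POST_FILTERS) -> str:
    """Apply each pattern/replacement pair in order, as a Post-filter would."""
    for pattern, replacement in filters:
        text = re.sub(pattern, replacement, text)
    return text


def fahrenheit_to_celsius(temp_f: float) -> int:
    """Calculate=_current_temperature_C=int((_current_temperature_F-32)*5/9)"""
    return int((temp_f - 32) * 5 / 9)


def mph_to_kmh(speed_mph: float) -> int:
    """Calculate=_windkmh=int(_windspeed*1.6)"""
    return int(speed_mph * 1.6)


weather = expand_abbreviations("p/cloudy")
temp_c = fahrenheit_to_celsius(71)
wind_kmh = mph_to_kmh(10)
```

Because the conversion truncates rather than rounds, 71 degrees Fahrenheit is reported as 21 degrees Celsius in this sketch.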
Once the web browsing server 4 accesses the web site specified in the URL 24 and retrieves the requested information, the information is forwarded to the media server 8. The media server uses the speech synthesis engine 32 to create an audio message that is then transmitted to the user's voice enabled device 14. In the preferred embodiment, each web browsing server 4 is based upon Intel's Dual Pentium III 730 MHz microprocessor system.
Referring to FIG. 1, the operation of the robust voice browser system will be described. A user establishes a connection between his voice enabled device 14 and a media server 8. This may be done using the Public Switched Telephone Network (PSTN) 18 by calling a telephone number associated with the voice browsing system 19. Once the connection is established, the media server 8 initiates an interactive voice response (IVR) application 34. The IVR application plays audio messages to the user presenting a list of options, such as "stock quotes", "flight status", "yellow pages", "weather", and "news". These options are based upon the available web site categories and may be modified as desired. The user selects the desired option by speaking the name of the option into the voice enabled device 14. As an example, if a user wishes to obtain restaurant information, he may speak into his telephone the phrase "yellow pages". The IVR application would then ask the user what he would like to find and the user may respond by stating "restaurants". The user may then be provided with further options related to searching for the desired restaurant. For instance, the user may be provided with the following restaurant options: "Mexican Restaurants", "Italian Restaurants", or "American Restaurants". The user then speaks into the telephone 14 the restaurant type of interest. The IVR application running on the media server 8 may also request additional information limiting the geographic scope of the restaurants to be reported to the user. For instance, the IVR application may ask the user to identify the zip code of the area where the restaurant should be located. The media server 8 uses the speech recognition engine 30 to interpret the speech commands received from the user. Based upon these commands, the media server 8 retrieves the appropriate web site record 20 from the database 2. This record and any additional data, which may include other necessary parameters needed to perform the user's request, are transmitted to a web browsing server 4. A firewall 6 may be provided that separates the web browsing server 4 from the database 2 and media server 8. The firewall provides protection to the media server and database by preventing unauthorized access in the event the firewall for web browsing server 10 fails or is compromised. Any type of firewall protection technique commonly known to one skilled in the art could be used, including packet filter, proxy server, application gateway, or circuit-level gateway techniques. The web browsing server 4 then uses the web site record and any additional data and executes the extraction agent 40 and relevant content descriptor file 46 to retrieve the requested information.
The information received from the responding web site 16 is then processed by the web browsing server 4 according to the content descriptor file 46 retrieved by the extraction agent. This processed response is then transmitted to the media server 8 for conversion into audio messages, either by using the speech synthesis software 32 or by selecting among prerecorded voice responses contained within the database 2.
As mentioned above, each web site record contains a rank number 22 as shown in FIG. 2. For each category searchable by a user, the database 2 may list several web sites, each with a different rank number 22. As an example, three different web sites may be listed as searchable under the category of "restaurants". Each of those web sites will be assigned a rank number such as 1, 2, or 3. The site with the highest rank (i.e., rank = 1) will be the first web site accessed by a web browsing server 4. If the information requested by the user cannot be found at this first web site, then the web browsing server 4 will search the second ranked web site and so forth down the line until the requested information is retrieved or no more web sites are left to check.
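By way of illustration only, the rank-ordered search described above might be sketched as follows. The sketch is in Python, and the function name and the fetch callback are assumptions made for this example rather than part of the preferred embodiment.

```python
from typing import Callable, Optional, Sequence, Tuple


def fetch_by_rank(sites: Sequence[Tuple[int, str]],
                  fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try each (rank, url) pair in rank order; rank 1 is tried first.

    fetch(url) is a hypothetical callback returning the requested
    information, or None when the site could not provide it.
    """
    for _rank, url in sorted(sites):
        result = fetch(url)
        if result:
            return result
    return None  # no more web sites are left to check
```

A return value of None corresponds to the case in which every listed web site in the category has been tried without success.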
The web site ranking method and system of the present invention provides robustness to the voice browser system and enables it to adapt to changes that may occur as web sites evolve. For instance, the information required by a web site 16 to perform a search or the format of the reported response data may change. Without the ability to adequately monitor and detect these changes, a search requested by a user may provide an incomplete response, no response, or an error. Such useless responses may result from incomplete data being provided to the web site 16 or the web browsing server 4 being unable to recognize the response data messages received from the searched web site 16.
The robustness and reliability of the voice browsing system of the present invention is further improved by the addition of a polling mechanism. This polling mechanism continually polls or "pings" each of the sites listed in the database 2. During this polling function, a web browsing server 4 sends brief requests to each web site listed in database 2. The web browsing server 4 monitors the response received from each web site and determines whether it is a complete response and whether the response is in the expected format specified by the content descriptor file 46 used by the extraction agent 40. The polled web sites that provide complete responses in the format expected by the extraction agent 40 have their ranking established based on their "response time". That is, web sites with faster response times will be assigned higher rankings than those with slower response times. If the web browsing server 4 receives no response from the polled web site or if the response received is not in the expected format, then the rank of that web site is lowered. Additionally, a warning message or alarm may be generated for the system administrator indicating that the specified web site has been modified or is not responsive and requires further review.
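By way of illustration only, one way to derive rank numbers from a round of polling might be sketched as follows. The sketch is in Python; the function name and the representation of a poll result (a response time in seconds, or None for a missing or unrecognized response) are assumptions made for this example, and placing all failed sites after all responsive sites is one possible reading of "the rank of that web site is lowered".

```python
from typing import Dict, List, Optional, Tuple


def rerank(poll_results: Dict[str, Optional[float]]) -> Tuple[Dict[str, int], List[str]]:
    """Return (rank numbers by site, sites needing administrator review).

    Responsive sites are ranked by response time, fastest first (rank 1).
    Sites with no response or an unexpected format are ranked last.
    """
    responsive = sorted(
        (url for url, seconds in poll_results.items() if seconds is not None),
        key=lambda url: poll_results[url],
    )
    failed = sorted(url for url, seconds in poll_results.items() if seconds is None)
    ranking = {url: rank for rank, url in enumerate(responsive + failed, start=1)}
    return ranking, failed


ranking, alerts = rerank({"site_a": 0.8, "site_b": None, "site_c": 0.2})
```

The second return value lists the sites for which a warning message or alarm would be generated.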
Since the web browsing servers 4 access web sites based upon their ranking number, only those web sites that produce useful and error-free responses will be used by the voice browser system to gather information requested by the user. Further, since the ranking numbers are also based upon the speed of a web site in providing responses, only the most time efficient sites are accessed. This system assures that users will get complete, timely, and relevant responses to their requests. Without this feature, users may be provided with information that is not relevant to their request or may not get any information at all. The constant polling and re-ranking of the web sites used within each category allows the voice browser of the present invention to operate efficiently. Finally, it allows the voice browser system of the present invention to dynamically adapt to changes in the rapidly evolving web sites that exist on the Internet.
It should be noted that the web sites accessible by the voice browser of the preferred embodiment may use any type of mark-up language, including Extensible Markup Language (XML), Wireless Markup Language (WML), Handheld Device Markup Language (HDML), Hyper Text Markup Language (HTML), or any variation of these languages.
A second embodiment of the present invention is depicted in FIG. 5. This embodiment provides a system and method for controlling a variety of devices 50 connected to a network 52 by using conversational speech commands spoken into a voice enabled device 54 (i.e., wireline or wireless telephones, Internet Protocol (IP) phones, or other special wireless units). The networked devices may include various household devices. For instance, voice commands may be used to control household security systems, VCRs, TVs, outdoor or indoor lighting, sprinklers, or heating and air conditioning systems.
Each of these devices 50 is connected to a network 52. These devices 50 may contain embedded microprocessors or may be connected to other computer equipment that allow the device 50 to communicate with network 52. In the preferred embodiment, the devices 50 appear as "web sites" connected to the network 52. This allows a network interface system, such as a device browsing server 56, a database 57, and a user interface system, such as a media server 58, to operate similar to the web browsing server 4, database 2 and media server 8 described in the first preferred embodiment above. A network 52 interfaces with one or more network interface systems, which are shown as device browsing servers 56 in FIG. 5. The device browsing servers perform many of the same functions and operate in much the same way as the web browsing servers 4 discussed above in the first preferred embodiment. The device browsing servers 56 are also connected to a database 57.
Database 57 lists all devices that are connected to the network 52. For each device 50, the database 57 contains a record similar to that shown in FIG. 2. Each record will contain at least a device identifier, which may be in the form of a URL, and a command to the "content extraction agent" contained in the device browsing server 56. Database 57 may also include any other data or software necessary to test and administer the device browsing system.
The content extraction agent operates similarly to that described in the first embodiment. A device descriptor file contains a listing of the options and functions available for each of the devices 50 connected on the network 52. Furthermore, the device descriptor file contains the information necessary to properly communicate with the networked devices 50. Such information would include, for example, communication protocols, message formatting requirements, and required operating parameters.
The device browsing server 56 receives messages from the various networked devices 50, appropriately formats those messages and transmits them to one or more media servers 58 which are part of the device browsing system. The user's voice enabled devices 54 can access the device browsing system by calling into a media server 58 via the Public Switched Telephone Network (PSTN) 59. In the preferred embodiment, the device browsing server is based upon Intel's Dual Pentium III 730 MHz microprocessor system. The media servers 58 act as user interface systems and perform the functions of natural speech recognition, speech synthesis, data processing, and call handling. The media server 58 operates similarly to the media server 8 depicted in FIG. 3. When data is received from the device browsing server 56, the media server 58 will convert the data, via a speech synthesis engine, into audio messages that are then transmitted to the voice enabled device 54 of the user. Speech commands received from the voice enabled device 54 of the user are converted into data messages via a speech recognition engine running on the media server 58. A preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com). A preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com). The media servers 58 of the preferred embodiment are based on Intel's Dual Pentium III 730 MHz microprocessor system. A specific example for using the system and method of this embodiment of the invention will now be given.
First, a user may call into a media server 58 by dialing a telephone number associated with an established device browsing system. Once the user is connected, the IUR application of the media server 58 will provide the user with a list of available systems that may be monitored or controlled based upon information contained in database 57.
For example, the user may be provided with the option to select "Home Systems" or "Office Systems". The user may then speak the command "access home systems". The media server 58 would then access the database 57 and provide the user with a listing of the home subsystems or devices 50 available on the network 52 for the user to monitor and control. For instance, the user may be given a listing of subsystems such as "Outdoor Lighting System", "Indoor Lighting System", "Security System", or "Heating and Air Conditioning System". The user may then select the indoor lighting subsystem by speaking the command "Indoor Lighting System". The IUR application would then provide the user with a set of options related to the indoor lighting system. For instance the media server 58 may then provide a listing such as "Dining Room", "Living Room", "Kitchen", or "Bedroom". After selecting the desired room, the IUR application would provide the user with the options to hear the "status" of the lighting in that room or to "turn on", "turn off, or "dim" the lighting in the desired room. These commands are provided by the user by speaking the desired command into the users voice enabled device 54. The media server 58 receives this command and translates it into a data message. This data message is then forwarded to the device browsing server 56 which routes the message to the appropriate device 50. The device browsing system 51 of this embodiment of the present invention also provides the same robustness and reliability features described in the first embodiment. The device browsing system 51 has the ability to detect whether new devices have been added to the system or whether current devices are out-of-service. This robustness is achieved by periodically polling or "pinging" all devices 50 listed in database 57. The device browsing server 56 periodically polls each device 50 and monitors the response. 
If the device browsing server 56 receives a recognized and expected response from the polled device, then the device is categorized as being recognized and in-service. However, if the device browsing server 56 does not receive a response from the polled device 50 or receives an unexpected response, then the device 50 is marked as being either new or out-of-service. A warning message or a report may then be generated for the user indicating that a new device has been detected or that an existing device is experiencing trouble.
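The polling rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `devices` mapping (standing in for database 57), the `ping` callable, and the reply strings are assumptions introduced here and do not come from the specification.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Status(Enum):
    IN_SERVICE = "recognized and in-service"
    NEEDS_ATTENTION = "new or out-of-service"


def classify(response: Optional[str], expected: str) -> Status:
    """Apply the rule in the text: only a recognized, expected reply is in-service."""
    if response is not None and response == expected:
        return Status.IN_SERVICE
    # No reply at all, or an unexpected reply.
    return Status.NEEDS_ATTENTION


def poll_all(devices: Dict[str, str],
             ping: Callable[[str], Optional[str]]) -> Dict[str, Status]:
    """Poll every listed device once.

    `devices` maps a device name to the reply expected from it (hypothetical
    stand-in for database 57). `ping` is an assumed transport function that
    returns the reply text, or None when the device does not answer.
    """
    return {name: classify(ping(name), expected)
            for name, expected in devices.items()}


def warnings(report: Dict[str, Status]) -> List[str]:
    """Build the warning messages that would be reported to the user."""
    return [f"Device '{name}' is {status.value}"
            for name, status in sorted(report.items())
            if status is not Status.IN_SERVICE]
```

In a deployment, `poll_all` would presumably be run on a timer; the interval is not stated in the text and is left to the caller here.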
Therefore, this embodiment allows users to remotely monitor and control any devices that are connected to a network, such as devices within a home or office. Furthermore, no special telecommunications equipment is required for users to remotely access the device browser system. Users may use any type of voice enabled device (i.e., wireline or wireless telephones, IP phones, or other wireless units) available to them. Furthermore, a user may perform these functions from anywhere without having to subscribe to additional services. Therefore, no additional expenses are incurred by the user.
The third preferred embodiment of the present invention enables a user to associate information of interest found on a specific information source, such as a web site, with a pronounceable name or identification word. This pronounceable name/identification word forms a recognition grammar in the preferred embodiment. When the user wishes to retrieve the selected information, he may use a telephone or other voice enabled device to access a voice browser system. The user then speaks the pronounceable name or command described within the recognition grammar associated with the desired information. The voice browsing system then accesses the associated information source and returns to the user, using a voice synthesizer, the requested information.
Referring to FIG. 6, a user 60 uses a computer 62 to access a network, such as a WAN, LAN, or the Internet, containing various information sources. In the preferred embodiment, the user 60 accesses the Internet 12 and begins searching for web sites 16, which are information sources that contain information of interest to the user. When the user 60 identifies a web site 16 containing information the user would like to access using only a voice enabled device, such as a telephone, and the voice browsing system 19, the user initiates a "clipping client" engine 64 on his computer 62.
The clipping client 64 allows a user 60 to create a set of instructions for use by the voice browsing system 19 in order to report personalized information back to the user upon request. The instruction set is created by "clipping" information from the identified web site. A user 60 may be interested in weather for a specific city, such as Chicago. The user 60 identifies a web site from which he would like to obtain the latest Chicago weather information. The clipping client 64 is then activated by the user 60.
The clipping client 64 displays the selected web site in the same manner as a conventional web browser such as Microsoft's® Internet Explorer. FIG. 7 depicts a sample of a web page 70 displayed by the clipping client 64. The user 60 begins creation of the instruction set for retrieving information from the identified web site by selecting the uniform resource locator (URL) address 72 for the web site. In the preferred embodiment, this selection is done by highlighting and copying the URL address 72. Next, the user selects the information from the displayed web page that he would like to have retrieved when a request is made. Referring to FIG. 7, the user would select the information regarding the weather conditions in Chicago 74. The web page 70 may also contain additional information such as advertisements 76 or links to other web sites 78 which are not of interest to the user. The clipping client 64 allows the user to select only that portion of the web page containing information of interest to the user. Therefore, unless the advertisements 76 and links 78 displayed on the web page are of interest to the user, he would not select this information. Based on the web page information 74 selected by the user, the clipping client 64 creates a content descriptor file containing a description of the content of the selected web page. This content descriptor file indicates where the information selected by the user is located on the web page. In the preferred embodiment, the content descriptor file is stored within the web browsing server 4 shown in FIG. 1. The web browsing server 4 has been discussed above in relation to the first preferred embodiment.
Table 7 below is an example of a content descriptor file created by the clipping client of the preferred embodiment. This content descriptor file relates to obtaining weather information from the web site www.cnn.com.
TABLE 7
table name: portalServices
column: service
content: weather
column: config
content:
[cnn]
Input=_zip
URL=http://cgi.cnn.com/cgi-bin/weather/redirect?zip=_zip
Pre-filter="\n" "
Pre-filter="<[^<>]+>" "
Pre-filter=/\s+/ /
Pre-filter="[\(\)\|]"!"
Output=_location
Output=first_day_name
Output=first_day_weather
Output=first_day_high_F
Output=first_day_high_C
Output=first_day_low_F
Output=first_day_low_C
Output=second_day_name
Output=second_day_weather
Output=second_day_high_F
Output=second_day_high_C
Output=second_day_low_F
Output=second_day_low_C
Output=third_day_name
Output=third_day_weather
Output=third_day_high_F
Output=third_day_high_C
Output=third_day_low_F
Output=third_day_low_C
Output=fourth_day_name
Output=fourth_day_weather
Output=fourth_day_high_F
Output=fourth_day_high_C
Output=fourth_day_low_F
Output=fourth_day_low_C
Output=undef
Output=_current_time
Output=_current_month
Output=_current_day
Output=_current_weather
Output=_current_temperature_F
Output=_current_temperature_C
Output=_humidity
Output=_wind
Output=_pressure
Output=_sunrise
Output=_sunset
Regular_expression=WEB SERVICES: (.+) Forecast FOUR-DAY FORECAST (\S+) (\S+) HIGH (\S+) F (\S+) C LOW (\S+) F (\S+) C (\S+) (\S+) HIGH (\S+) F (\S+) C LOW (\S+) F (\S+) C (\S+) (\S+) HIGH (\S+) F (\S+) C LOW (\S+) F (\S+) C (\S+) (\S+) HIGH (\S+) F (\S+) C LOW (\S+) F (\S+) C WEATHER MAPS RADAR (.+) Forecast CURRENT CONDITIONS (.+) !local!, (\S+) (\S+) (.+) Temp: (\S+) F, (\S+) C Rel. Humidity: (\S+) Wind: (.+) Pressure: (.+) Sunrise: (.+) Sunset: (.+)
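One plausible way a descriptor like Table 7 could be applied is sketched below in Python. The sketch assumes that each Pre-filter entry is a pattern/replacement pair applied in order, and that the capture groups of the regular expression are assigned to the Output names in sequence, with `undef` marking a group to discard. Both readings are assumptions, and the output list, expression, and sample page are shortened hypothetical stand-ins for the full table.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed reading of the Pre-filter lines: (pattern, replacement) pairs, applied in order.
PRE_FILTERS: List[Tuple[str, str]] = [
    (r"\n", " "),        # join lines
    (r"<[^<>]+>", " "),  # drop markup tags
    (r"\s+", " "),       # collapse runs of whitespace
    (r"[()|]", "!"),     # replace parentheses and bars with "!"
]

# Shortened, hypothetical output list and expression (Table 7 defines many more fields).
OUTPUTS = ["_location", "first_day_name", "first_day_weather",
           "first_day_high_F", "undef"]
EXPRESSION = (r"WEB SERVICES: (.+) Forecast FOUR-DAY FORECAST "
              r"(\S+) (\S+) HIGH (\S+) F(.*)")


def extract(page: str) -> Optional[Dict[str, str]]:
    """Flatten the page with the pre-filters, then name the captured groups."""
    text = page
    for pattern, replacement in PRE_FILTERS:
        text = re.sub(pattern, replacement, text)
    match = re.search(EXPRESSION, text)
    if match is None:
        return None  # page layout no longer matches the descriptor
    fields = dict(zip(OUTPUTS, match.groups()))
    fields.pop("undef", None)  # "undef" marks a captured group that is not reported
    return fields
```

The pre-filters reduce the page to a single line of plain text, which is why the expression can match literal words such as "FOUR-DAY FORECAST" without any knowledge of the markup around them.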
Finally, the clipping client 64 prompts the user to enter an identification word or phrase that will be associated with the identified web site and information. For example, the user could associate the phrase "Chicago weather" with the selected URL 72 and related weather information 74. The identification word or phrase is stored as a personal recognition grammar that can now be recognized by a speech recognition engine of the voice browsing system 19 which will be discussed below. The personal recognition grammar, URL address 72, and a command for executing a content extraction agent are stored within a database used by the voice browser system 19 which has been discussed above in relation to the first preferred embodiment.
The database 2 of the voice browsing system 19 contains a section that stores the personal recognition grammars and related web site information generated by the clipping client 64. A separate record exists for each web site defined by the user. An example of a user-defined web site record is shown in FIG. 8. Each user-defined web site record 80 contains the recognition grammar 82 assigned by the user, the associated Uniform Resource Locator (URL) 84, and a command that enables the "content extraction agent" 86 and retrieves the appropriate content descriptor file 86 required to generate proper requests to the web site and to properly format received data. The content extraction agent has been described above in relation to the first preferred embodiment and Tables 3 and 4. The web site record 80 also contains the timestamp 88 indicating the last time the web site was accessed.
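As a rough illustration of the record layout of FIG. 8, the Python sketch below models a user-defined web site record and a lookup keyed by the spoken phrase. The field names, the normalization step, and the sample URL and command string are hypothetical; the specification names only the four elements 82, 84, 86, and 88.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class WebSiteRecord:
    grammar: str             # recognition grammar 82, e.g. "Chicago weather"
    url: str                 # URL 84
    extraction_command: str  # content extraction agent command 86 (format assumed)
    last_accessed: datetime  # timestamp 88


def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so recognized text matches the stored key."""
    return " ".join(phrase.lower().split())


def build_index(records: Iterable[WebSiteRecord]) -> Dict[str, WebSiteRecord]:
    """Index the user's records by their normalized recognition grammar."""
    return {normalize(r.grammar): r for r in records}


def lookup(index: Dict[str, WebSiteRecord], spoken: str,
           now: datetime) -> Optional[WebSiteRecord]:
    """Return the record for a recognized phrase and refresh its timestamp."""
    record = index.get(normalize(spoken))
    if record is not None:
        record.last_accessed = now
    return record
```

A real speech recognition engine would match audio against the grammar rather than compare strings; the string key here only stands in for the recognizer's result.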
In the third preferred embodiment, when a user accesses the voice browsing system 19, he will be asked whether he would like to use his "user-defined searches." If the user answers affirmatively, the media servers 8 will retrieve from the database 2 the personal recognition grammars 82 defined by the user while using the clipping client 64.
Upon receiving a user-defined web site record 80 from the database 2 in response to a user request, the web browsing server 4 invokes the "content extraction agent" command 86 contained in the record 80. The content extraction agent 40 retrieves the content descriptor file 46 associated with the user-defined record 80. As mentioned, the content descriptor file 46 directs the extraction agent where to extract data from the accessed web page and how to format a response to the user utilizing that data. Additionally, the content descriptor file 46 for each supported URL indicates the location on the web page where the response information is provided. The extraction agent 40 uses this information to properly extract from the web page the information requested by the user.
The content extraction agent 40 can also parse the content of a web page in which the user-desired information has changed location or format. This is possible because most hypertext documents include named objects, such as tables, buttons, and forms, that contain textual content of interest to a user. When changes to a web page occur, a named object may be moved within a document, but it still exists. Therefore, the content extraction agent 40 simply searches for the name of the desired object. In this way, the information requested by the user may still be found and reported regardless of changes that have occurred.
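The idea of locating content by object name rather than by position can be illustrated with Python's standard `html.parser` module. This sketch is not the source code of Table 3; it assumes that the "name" of an object is its `id` or `name` attribute and that non-void elements carry explicit closing tags.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class NamedObjectFinder(HTMLParser):
    """Collect the text inside the first element whose id or name matches."""

    def __init__(self, target: str) -> None:
        super().__init__()
        self.target = target
        self.depth = 0            # greater than 0 while inside the target element
        self.done = False         # set once the target element has been closed
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1       # nested element inside the target
        else:
            attr = dict(attrs)
            if self.target in (attr.get("id"), attr.get("name")):
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def find_named_text(html: str, name: str) -> Optional[str]:
    """Return the text of the named object wherever it sits in the document."""
    finder = NamedObjectFinder(name)
    finder.feed(html)
    return " ".join(finder.chunks) if finder.chunks else None
```

Because the search keys on the object name, the same call returns the same text after the table has been moved elsewhere in the page, which is the behavior the paragraph above describes.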
As mentioned, Table 3 above contains source code for a content extraction agent 40 which may be used by the third preferred embodiment. Further, Table 4 above contains source code of the content fetcher 42 which may be used with the content extraction agent 40 to retrieve information from a web site.
Referring to FIG. 1, the operation of the personal voice-based information retrieval system will be described. A user establishes a connection between his voice enabled device 14 and a media server 8 of the voice browsing system 19. This may be done using the Public Switched Telephone Network (PSTN) 18 by calling a telephone number associated with the voice browsing system 19. Once the connection is established, the media server 8 initiates an interactive voice response (IVR) application. The IVR application plays an audio message to the user presenting a list of options, which includes "perform a user-defined search." The user selects the option to perform a user-defined search by speaking the name of the option into the voice enabled device 14.
The media server 8 then accesses the database 2 and retrieves the personal recognition grammars 82. Using the speech synthesis software engine 32, the media server 8 then asks the user, "Which of the following user-defined searches would you like to perform" and reads to the user the identification name, provided by the recognition grammar 82, of each user-defined search. The user selects the desired search by speaking the appropriate speech command or pronounceable name described within the recognition grammar 82. These speech recognition grammars 82 define the speech commands or pronounceable names spoken by a user in order to perform a user-defined search. If the user has a multitude of user-defined searches, he may speak the command described within the recognition grammar 82 of the desired search at any time without waiting for the media server 8 to list all available user-defined searches. This feature is commonly referred to as a "barge-in" feature.
The media server 8 uses the speech recognition engine 30 to interpret the speech commands received from the user. Based upon these commands, the media server 8 retrieves the appropriate user-defined web site record 80 from the database 2. This record is then transmitted to a web browsing server 4. A firewall 6 may be provided that separates the web browsing server 4 from the database 2 and media server 8. The firewall provides protection to the media server and database by preventing unauthorized access in the event the firewall 10 for the web browsing server fails or is compromised. Any type of firewall protection technique commonly known to one skilled in the art could be used, including packet filter, proxy server, application gateway, or circuit-level gateway techniques.
The web browsing server 4 accesses the web site 16 specified by the URL 84 in the user-defined web site record 80 and retrieves the user-defined information from that site using the content extraction agent and the content descriptor file specified in the content extraction agent command 86. Since the web browsing server 4 uses the URL and retrieves new information from the Internet each time a request is made, the requested information is always up to date.
The content information received from the responding web site 16 is then processed by the web browsing server 4 according to the associated content descriptor file. This processed response is then transmitted to the media server 8 for conversion into audio messages, either by using the speech synthesis engine 32 or by selecting among prerecorded voice responses contained within the database 2. This message is then transmitted to the user's voice enabled device 14.
As mentioned above, the web sites accessible by the personal information retrieval system and voice browser of the preferred embodiments may use any type of mark-up language, including Extensible Markup Language (XML), Wireless Markup Language (WML), Handheld Device Markup Language (HDML), Hyper Text Markup Language (HTML), or any variation of these languages.
Turning to the fourth preferred embodiment of the present invention, a single communications system is provided that offers a plurality of communication services to users from a single provider. These services are required by users in order to effectively communicate with others and manage personal, as well as business, information. The system provided by the fourth preferred embodiment offers to users the following services for home and business uses: local telephone service, cellular telephone service, long distance service, Internet access service, and a variety of messaging services that include voice mail, facsimile, electronic mail ("email"), and paging. The system provides users with the option to obtain multiple email and voice mail accounts. Further, the system allows users to select either dial-up Internet access service or broadband Internet access service, which includes Digital Subscriber Line service (DSL) and cable modem service.
Although a user may subscribe to these services from individual providers, a communications system allowing a user to subscribe to each of these services from a single provider enables each service to be integrated together and operate seamlessly to the user. This integration allows each service to easily and efficiently communicate with and transfer data to other services as will be described below. A user who acquires these services from many different companies is also subject to incompatibility problems resulting from the varying products used by the different companies. The system provided by the fourth preferred embodiment eliminates these incompatibility problems since all services are provided by the same company. This ability to eliminate incompatibility problems can improve user interest in the "collectively bundled" group of services made available by the provider. Further, significant cost economies can be realized by a service provider with the ability to offer a "collectively bundled" group of services. These cost economies can improve customer satisfaction with the service provider.
Providing a bundle of services, as described by the preferred embodiment, enables the service provider to lower the unit costs for each service since a user will be subscribing to several services provided by the same company. Several economic advantages will also be realized by the service provider. First, much of the infrastructure (i.e., hardware and software) required for a service provider to provide the various services can be used for multiple services. Therefore, as the number of different services offered by a service provider increases, fewer additional expenditures need to be made for capital improvements. Second, the provider will be able to market a wide variety of services to users, each of which can be offered at a lower per unit cost than competitors. Further, the service provider will be able to actually increase revenues since it has the ability to provide multiple services.
Referring to FIG. 9, the communications system 90 of the fourth preferred embodiment may be accessed by users via a voice enabled device 92 (i.e., any type of wireline or wireless telephone, Internet Protocol (IP) phones, or other special wireless units) or a computer 96 using a modem or other communication connection (i.e., Digital Subscriber Line connection, cable modem, LAN, or WAN). A user accesses the communications system 90 via a telephone by calling a toll-free number.
FIG. 10 depicts a control system 100 used with the communications system 90 of the fourth preferred embodiment. The control system 100 functions as a central system that monitors and controls the features, services, and functions of the communications system 90. When a user accesses the control system 100, he is presented with an operating menu that enables the user to control all of the services and features of the communications system 90. The control system 100 provides three methods for a user to access the operating menu and handle his communications. First, the control system contains a speech recognition software engine 102. This speech recognition engine 102 uses phonemes to recognize speech commands. Therefore, the speech recognition engine can recognize naturally spoken speech commands and is speaker-independent; it does not have to be trained to recognize the speech patterns of each individual user. A preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com). The natural speech recognition grammars (i.e., what a user can say that will be recognized by the speech recognition engine) were developed by Webley Systems.
The control system 100 also contains a speech synthesis software engine 104 that converts text messages into audio messages that may be transmitted to a user. A preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com). The control system 100 also contains a call processing system 106 and telephony and voice hardware 108 required to interface the communications system 90 with the Public Switched Telephone Network (PSTN) 94. A user may also access the communication system 90 using the touch-tone signals provided by the user's voice enabled device 92, such as a telephone. The third option a user has for accessing the communication system 90 is via a computer 96. Regardless of the method used to access the communications system 90, a user may access and control all of the available features and services via the operating menu provided by the control system 100. The user's computer 96 may be connected directly with the communication system 90 using a modem and the Public Switched Telephone Network (PSTN) 94, or via the Internet 97.
When an outside caller 98 dials a telephone number associated with the communications system 90, the caller 98 is asked by the control system 100 to state the name of the user the caller is attempting to reach. The speech recognition engine 102 will recognize the particular user name, and the caller is then transferred to the user. The transferring features of the preferred embodiment will be discussed in more detail below.
Referring to FIG. 11, the configuration of the communications system 90 of the fourth preferred embodiment will now be described. The communications system 90 comprises a control system 100 that is a central system connected to a variety of different communication services in a ring-type manner similar to a token ring. The services connected to the control system 100 in the preferred embodiment include a local telephone access service 110, a cellular telephone service 112, a voice mail service 114, a facsimile service 116, an email service 118, an Internet access service 120, a long distance service 122, and a pager service 124. These services may also be sub-divided into "business" related services or "home" related services. That is, the local telephone service may be subdivided into "home local telephone access" and "business local telephone access". It is contemplated that this sub-division of services between home and business can be done for any or all services and one skilled in the art would understand how to do so. Further, the local telephone access service 110 is typically provided as a landline service (i.e., it is a non-wireless service). The local telephone access service 110 may also be provided using voice-over-IP technology. This allows telephone calls to be established using a data network such as the Internet.
The control system 100 can communicate with all of the services 110, 112, 114, 116, 118, 120, 122, and 124. Therefore, the control system 100 provides an interface between all of the services 110, 112, 114, 116, 118, 120, 122, and 124. By communicating through the control system 100, all of the services are able to communicate with each other. This enables a call being handled by the cellular telephone service 112 to be transferred to the voice mail service 114. Further, this arrangement allows a call being handled by the local telephone access service 110 to be transferred to a user's cellular telephone, via the cellular telephone service 112, or to the voice mail service 114. This example demonstrates a unique advantage of the preferred embodiment. A single voice mail service 114 can be used to record messages from callers that have called either a user's cellular telephone or wireline telephone (the telephone associated with the local telephone access service). Therefore, a user who subscribes to these three services will be provided with the advantage of being able to retrieve all voice mail messages from one location.
As mentioned, this embodiment of the present invention allows calls to easily be transferred between services. This feature is particularly valuable for busy professionals. The control system 100 can transfer calls, email messages, facsimile messages, or voice mail messages between any of the services that are part of the communications system 90. A user can therefore instruct the control system 100 to forward any received email messages to a local facsimile machine specified within the facsimile service 116. Alternatively, facsimile messages received for a user may be subsequently forwarded to a user's email address maintained by the email service 118. The ability of the communications system 90 to provide users with a variety of options for receiving and routing messages and calls is a beneficial feature for users. These options provide users with an added level of control over how they manage information and communicate with others.
The communications system 90 of this embodiment also enables users to transfer telephone calls to other locations for a reduced fee. The service provider may easily monitor the number of call transfers attempted by the user. The service provider may allow a user to transfer a received call to one location free of charge. Any additional transfers would be subject to a fee. For instance, a user may specify that all incoming calls to the communications system 90 for a user should be transferred to a business telephone. This transfer would be done free of charge. However, if the user decides to transfer the call to an additional device, such as a cellular telephone, the user would be charged for this additional transfer.
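The charging rule in the preceding paragraph (one free transfer per received call, with a fee for each additional transfer) can be written as a small Python function. The fee amount and the parameter names are hypothetical; the specification states no price.

```python
from decimal import Decimal


def transfer_charge(transfer_count: int,
                    fee_per_extra: Decimal,
                    free_transfers: int = 1) -> Decimal:
    """Charge for the transfers of one received call.

    The first `free_transfers` transfers cost nothing; each additional
    transfer is billed at `fee_per_extra` (an assumed flat fee).
    """
    if transfer_count < 0:
        raise ValueError("transfer_count must not be negative")
    extra = max(0, transfer_count - free_transfers)
    return fee_per_extra * extra
```

With an assumed fee of 0.25, a call forwarded only to the business telephone costs nothing, while a second transfer to a cellular telephone is billed once at that fee.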
This method for transferring telephone calls would present a substantial reduction in costs for cellular telephone users who subscribe to the cellular service 112 of the communications system 90. Most cellular service providers charge per minute fees for any call received on a cellular telephone. However, if a user obtains his cellular service from a company providing "collectively bundled" services according to the fourth preferred embodiment, then all calls that are initially transferred to this cellular telephone from the control system 100 would be free. No per minute usage fees would be charged. This would present a dramatic cost reduction for most users who previously possessed cellular service from an outside provider.
The ability of this embodiment of the invention to transfer received telephone calls to any of the services shown in FIG. 11 allows a service provider to provide a "Follow Me" feature. This feature allows users to instruct the communications system 90 to forward received calls to a second telephone number if the user does not answer the call at the first designated telephone number. Users can program the communications system 90 to "follow" the user by sequentially transferring a received call to different locations, services, or communications devices until the user is contacted. The user can create a list of predetermined contact numbers used by the system 90 in trying to locate the user. This list may include telephone numbers for office, home, cellular telephone, pager and other designated locations. The user may also indicate the order in which the system 90 should call each of the numbers in trying to locate the user. The "Follow Me" feature also logs the originating telephone number used by the user when accessing the communications system 90 to retrieve stored messages or make a telephone call. A user can instruct the system 90 to subsequently use this number to re-contact the user when an incoming call is received for that subscriber. For example, a user may be traveling and have the communications system 90 forward all telephone calls to the hotel room where the user is staying. Further, since the communications system is accessible via the telephone, the user is able to obtain and send messages from the hotel telephone.
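A minimal Python sketch of the sequential "Follow Me" search is given below. The `try_number` callable, which stands in for placing a call and reporting whether the user answered, is an assumption, as is the choice to try the logged originating number first when the user has asked the system to use it.

```python
from typing import Callable, Iterable, Optional


def follow_me(contact_numbers: Iterable[str],
              try_number: Callable[[str], bool],
              last_origin: Optional[str] = None) -> Optional[str]:
    """Ring each number in the user's chosen order until one is answered.

    `try_number` is a hypothetical hook that rings a number and returns True
    if the user picks up. When `last_origin` is given, it is tried first
    (an assumed policy) and is not rung a second time.
    """
    ordered = list(contact_numbers)
    if last_origin is not None:
        ordered = [last_origin] + [n for n in ordered if n != last_origin]
    for number in ordered:
        if try_number(number):
            return number
    return None  # nobody answered; the caller could then be sent to voice mail
```

Returning `None` leaves the fallback to the caller of the function, since the text describes several possible destinations for an unanswered call.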
When a user receives a call that is transferred from the communications system 90, the system advises the user of the telephone number of the calling party, and/or the caller's identity. The system may recognize the caller's identity by comparing the telephone number of the caller with a user's contact list which is stored within a database 126 connected with the control system 100. If the user is already on the telephone when a new call is received, the system will whisper the pending inbound call information to the user, allowing the user the option to take the call, thereby putting the user's current call on hold, or direct the pending inbound call into the user's voice mail system provided by the voice mail service 114.
The integrated nature of the services provided by the fourth preferred embodiment allows a user to either retrieve email messages using an Internet connection to the communication system 90 or retrieve them by having them read to the user over a telephone using the speech synthesis engine 104. The integrated features of the communications system enable users to immediately respond to email messages with another email message, a voice mail message, or a call placed to the originator of the email message. A speech-to-text feature enables users to create email messages using only a telephone. Additionally, the speech recognition feature discussed above allows users to edit, forward, save, or delete email messages using speech commands and a telephone.
The speech synthesis engine 104 may also be used by users to review by telephone facsimile messages received by the facsimile service 116. Further, the speech-to-text feature may be used to create facsimile messages by telephone. The user may then issue speech commands, which are recognized by the speech recognition engine 102, to send, edit, forward, or delete the existing or newly created facsimile messages.
The communication system 90 of the fourth embodiment also contains a "notification" feature that enables users to be notified of messages received by the system. For example, a user can program the communications system 90 to notify the user via a pager, using the paging service 124, when an incoming message has been received. This notification can further indicate whether the incoming message is a voice mail, email, or a facsimile message.
As previously mentioned, the communications system 90 of the preferred embodiment enables users to maintain a contact list stored in a database 126 accessible by the control system 100. This contact list enables users to broadcast email, voice mail, or facsimile messages to groups of contacts. This contact list can be accessed by the user at any time using either an Internet connection or telephone connection with the communications system 90. The speech recognition engine 102 described above enables users to access and edit the contact list and send messages to contacts by using simple speech commands.
The "collectively bundled" communication system also allows users to retrieve, on demand or at predetermined intervals, selected information from the Internet. A user may establish predefined Internet searches using the Internet access service 120 of the communications system 90. The user can then specify that the Internet access service 120 perform the search using an Internet search engine (e.g., www.yahoo.com). The search can be performed upon receiving a speech command from the user or it may periodically be executed based upon a schedule set by the user. The control system 100 would then notify the user of the search results using the method specified by the user. For example, the user may elect to be notified of the search results by email, voice mail, or facsimile. Additionally, the speech synthesis engine 104 may be used to read the search results to the user over a telephone connection.
As mentioned, a user can access the communication system 90 via a computer 96. The system 90 allows a user to access and play voice mail messages (using a downloadable audio player, such as RealPlayer, obtainable from www.real.com), read and send email and facsimile messages, and manage the user's contact list using a computer connection established through the Internet access service 120 or through a direct dial-up connection using the local telephone access service 110.
The descriptions of the preferred embodiments described above are set forth for illustrative purposes and are not intended to limit the present invention in any manner.
Equivalent approaches are intended to be included within the scope of the present invention.
While the present invention has been described with reference to the particular embodiments illustrated, those skilled in the art will recognize that many changes and variations may be made thereto without departing from the spirit and scope of the present invention. These embodiments and obvious variations thereof are contemplated as falling within the scope and spirit of the claimed invention.

Claims

We claim:
1. A system for acquiring information from sources on a network, comprising: a database containing a list of information sources connected to a network; a rank number assigned to each of said information sources listed in said database; and a network interface system connected with said database, said network interface system accesses one of said information sources having the highest rank number in response to a search command from a user.
2. The system of claim 1 wherein said network interface system further comprises a polling mechanism that periodically polls each of said information sources listed in said database and modifies said rank number based upon a response received from a polled information source.
3. The system of claim 2 wherein said polling mechanism decreases said rank number of said polled information source if no response is received from said polled information source.
4. The system of claim 2 wherein said polling mechanism decreases said rank number of said polled information source if an unexpected response is received from said polled information source.
5. The system of claim 2 wherein said polling mechanism decreases said rank number of said polled information source if a response time of said polled information source is longer than a second response time of a second polled information source.
6. The system of claim 1 further comprising: a user interface system, interconnected with said database and said network interface system, that receives said speech command.
7. The system of claim 6 wherein said user interface system comprises a media server.
8. The system of claim 1 wherein said information sources are web sites.
9. A system for acquiring information from sources on a network, comprising: a user interface system that receives a speech command from a user and converts said speech command into a data message; a database containing a list of information sources connected to a network; a rank number assigned to each of said information sources; a network interface system that receives said data message and accesses one of said information sources having the highest said rank number; and an audio response transmitted to said user from said user interface system, said audio response comprised of information received from one of said information sources accessed by said network interface system.
10. The user interface system of claim 9 further comprising: a speech recognition engine that recognizes naturally spoken speech commands.
11. The speech recognition engine of claim 10 wherein said speech recognition engine recognizes speech commands from said user based on analyzing phonemes.
12. A system for monitoring changes in the design of a web site, comprising: a database containing a list of web sites to be monitored and also containing a content descriptor of each of said web sites; an internet access system for connecting with each of said web sites; a polling mechanism that periodically polls each of said web sites and attempts to read response data provided from a polled web site based upon said content descriptor for said polled web site; and a warning message generated by said polling mechanism if said response data provided by said polled web site cannot be read by said polling mechanism.
13. A system for ranking web sites, comprising: a database containing a list of web sites and also containing a content descriptor of each of said web sites; a rank number assigned to each of said web sites; an internet access system for connecting with each of said web sites; a polling mechanism that periodically polls each of said web sites and attempts to read response data provided from a polled web site based upon said content descriptor for said polled web site; and said polling mechanism modifies said rank number based upon said response data received from said polled web site.
14. The system of claim 13 wherein said polling mechanism decreases said rank number of said polled web site if no response is received from said polled web site.
15. The system of claim 13 wherein said polling mechanism decreases said rank number of said polled web site if an unexpected response is received from said polled web site.
16. The system of claim 13 wherein said polling mechanism decreases said rank number of said polled web site if a response time of said polled web site is longer than a second response time of a second polled web site.
17. A voice browsing system for gathering information from web sites on the Internet, comprising: a media server for receiving a speech command from a user and converting said speech command into a data message; a database containing a list of web sites; a rank number assigned to each of said web sites; a web browsing server that receives said data message from said media server and accesses one of said web sites having the highest said rank number; response data received by said web browsing server from one of said web sites; an audio message generated by said media server representing said response data and being transmitted to said user; and a polling mechanism that periodically polls each of said web sites and modifies said rank number of a polled web site based upon said response data received from said polled web site.
18. A method for acquiring information from sources on a network, comprising the steps of: receiving a command from a user; maintaining a database containing a list of information sources connected to a network; assigning a rank number to each of said information sources; accessing said information source having the highest rank number in response to said command from said user; reporting a response from said information source to said user; and periodically polling each information source listed in said database and modifying said rank number of each information source based upon said response received from said information source.
19. A method for using voice commands to browse Internet web sites, comprising the steps of: maintaining a database containing a list of web sites; assigning a rank number to each of said web sites and storing said rank number in said database; receiving a voice command from a user and converting said command into a data message; providing an internet interface system for receiving said data message and accessing one of said web sites having the highest said rank number; receiving at said internet interface system response data from said web site with the highest rank number; converting said response data into an audio message that is transmitted to said user; and periodically polling each web site listed in said database and modifying said rank number of each of said web sites based upon said response data received from each one of said web sites.
20. A method for acquiring information from sources on a network, comprising the steps of: providing a database containing a list of information sources connected to a network; assigning a rank number to each of said information sources listed in said database; and connecting a network access system with said database; and accessing with said network access system one of said information sources having the highest rank number in response to a search command from a user.
21. The method of claim 20 comprising the further step of: polling each of said information sources listed in said database and modifying said rank number based upon a response received from a polled information source.
22. The method of claim 21 wherein said polling step comprises the further step of: decreasing said rank number of said polled information source if no response is received from said polled information source.
23. The method of claim 21 wherein said polling step comprises the further step of: decreasing said rank number of said polled information source if an unexpected response is received from said polled information source.
24. The method of claim 21 wherein said polling step comprises the further step of: decreasing said rank number of said polled information source if a response time of said polled information source is longer than a second response time of a second polled information source.
25. The method of claim 20 wherein said information sources are web sites.
26. A method for monitoring changes in the design of a web site, comprising the steps of: providing a list of web sites to be monitored; providing a content descriptor of each of said web sites; polling each of said web sites and attempting to read response data provided from a polled web site based upon said content descriptor for said polled web site; and generating a warning message if said response data provided by said polled web site cannot be read.
27. A system for remotely controlling household devices, comprising: a user interface system that receives a speech command from a user; a plurality of household devices connected to a network; and a network interface system connected with said user interface system and said network, said network interface system accesses one of said household devices in response to said speech command in order to control the operation of said one of said household devices.
28. The system of claim 27 further comprising: a polling mechanism that periodically polls each of said household devices and attempts to read response data provided from a polled household device; and a warning message generated by said polling mechanism if said response data provided by said polled household device cannot be read by said polling mechanism.
29. The system of claim 27 wherein said plurality of household devices include security systems, lighting systems, heating and air conditioning systems, TVs, or VCRs.
30. A method for remotely controlling household devices, comprising the steps of: receiving at a user interface system a speech command from a user; providing a plurality of household devices connected to a network; and accessing by a network interface system one of said household devices in response to said speech command in order to control the operation of said one of said household devices.
31. The method of claim 30 comprising the further steps of: polling each of said household devices and attempting to read response data received from a polled household device; and generating a warning message if said response data provided by said polled household device cannot be read.
32. A system for retrieving information from a network, comprising: a user-defined record listed within a database, said record comprising location information about an information source; a recognition grammar assigned by a user and associated with said user-defined record; and a network interface system connected with said database, said network interface system accesses said information source to retrieve information in response to a speech command described in said recognition grammar.
33. The system of claim 32 further comprising: a user interface system, interconnected with said database and said network interface system, that receives said speech command described in said recognition grammar and retrieves said user-defined record associated with said recognition grammar.
34. The system of claim 33 wherein said user interface system comprises a media server.
35. The system of claim 32 wherein said network interface system comprises a web browsing server.
36. The system of claim 33 wherein said user interface system further comprises: a speech recognition engine for recognizing said speech command; and a speech synthesis engine for converting a response from said information source into an audio message for transmission to said user.
37. The system of claim 32 wherein said information source is a web site.
38. The system of claim 32 wherein said location information within said user-defined record is a web site address.
39. A system for retrieving information from a network, comprising: an instruction set created by a user for retrieving information from an information source; a database containing said instruction set; a user interface system that receives a speech command from said user and retrieves said instruction set; and a network interface system that accesses said information source and retrieves said information specified by said instruction set.
40. The system of claim 39 further comprising: an audio response transmitted to said user from said user interface system, said audio response comprised of said information retrieved from said information source.
41. The system of claim 39 wherein said instruction set comprises: a recognition grammar assigned by said user to identify said instruction set; a location identifier for said information source; and a content descriptor of said information source, said content descriptor identifying the location of said information to be retrieved for said user.
42. The system of claim 39 further comprising a clipping client for creating said instruction set.
43. The system of claim 41 further comprising a clipping client for creating said instruction set.
44. The system of claim 39 wherein said information source is a web site.
45. A system for retrieving user-defined information from a web site, comprising: an instruction set created by a user, said instruction set identifying information to be retrieved from a web site; a recognition grammar assigned to said instruction set by said user; a database containing said instruction set and said recognition grammar; a user interface system that retrieves said instruction set from said database in response to a speech command from said user, said speech command described in said recognition grammar; a network interface system connected with said user interface system, said network interface system accesses said web site and retrieves said information identified by said instruction set.
46. The system of claim 45 further comprising: an audio response transmitted to said user from said user interface system, said audio response comprised of said information retrieved from said web site.
47. The system of claim 45 wherein said instruction set comprises: a location identifier for said web site; and a content descriptor of said web site, said content descriptor identifying the location of said information to be retrieved for said user.
48. The system of claim 45 further comprising a clipping client for creating said instruction set.
49. The system of claim 47 further comprising a clipping client for creating said instruction set.
50. A method for retrieving information from a network, comprising the steps of: providing a user-defined record within a database, said record identifying information to be retrieved from an information source; assigning a recognition grammar to said user-defined record; and receiving a speech command described in said recognition grammar and accessing said information source with a network interface system to retrieve said information identified in said user-defined record.
51. The method of claim 50 comprising the further steps of: converting said information retrieved from said information source into an audio message; and transmitting said audio message to said user.
52. The method of claim 50 wherein said information source is a web site.
53. A method for retrieving information from a network, comprising the steps of: creating an instruction set for retrieving information from an information source; storing said instruction set in a database; receiving a speech command from a user and retrieving said instruction set; and accessing said information source and retrieving said information specified by said instruction set.
54. The method of claim 53 comprising the further steps of: generating an audio response comprised of said information retrieved from said information source; and transmitting said audio response to said user.
55. The method of claim 53 wherein said step of creating an instruction set comprises the further steps of: providing a recognition grammar assigned by said user to identify said instruction set; providing a location identifier for said information source; and creating a content descriptor of said information source, said content descriptor identifying the location of said information to be retrieved for said user.
56. The method of claim 53 wherein said step of creating an instruction set is completed by a clipping client.
57. The method of claim 53 wherein said information source is a web site.
58. A method for retrieving user-defined information from a web site, comprising the steps of: creating an instruction set identifying information to be retrieved from a web site; assigning a recognition grammar to said instruction set; storing said instruction set and said recognition grammar in a database; retrieving said instruction set from said database in response to a speech command from said user described in said recognition grammar; and accessing said web site and retrieving said information identified by said instruction set.
59. The method of claim 58 comprising the further step of: generating an audio response comprised of said information retrieved from said web site; and transmitting to said user said audio response.
60. The method of claim 58 wherein said step of creating an instruction set comprises the further steps of: providing a location identifier for said web site; and creating a content descriptor of said web site, said content descriptor identifying the location of said information to be retrieved for said user.
61. The method of claim 58 wherein said step of creating an instruction set is completed by a clipping client.
62. A method for providing a unified communications system comprising the steps of: providing a plurality of communication services to a user from a single service provider; interfacing each of said plurality of communication services to one another through a central system; and, transferring information received by one of said plurality of communication services to a second one of said communication services using said central system.
63. The method of claim 62 wherein said plurality of communication services comprises local telephone access, cellular telephone access, and messaging services.
64. The method of claim 63 wherein said messaging services comprises voice mail and email services.
65. The method of claim 62 wherein said communication services comprises Internet access, email, local telephone access, and facsimile services.
66. The method of claim 62 wherein said communication services comprises local telephone service, cellular telephone service, Internet access service, email service, voice mail service, pager service, facsimile service, and long distance service.
67. The method of claim 62 comprising the further steps of: receiving speech commands from a user; and transferring information received by one of said plurality of communication services to a second one of said communication service based upon said speech commands.
68. A method for providing a variety of communication services comprising the steps of: providing a plurality of communication services to a user from a single service provider; interfacing each of said plurality of communication services with a central system; and transmitting information from one of said plurality of communication services to a second one of said plurality of communication services using said central system.
69. A method for receiving messages from a variety of communication services comprising the steps of: providing a plurality of communication services to a user from a single service provider; interfacing each of said plurality of communication services with a central system; receiving a message from one or more of said communication services; and transmitting a notification message from said central system to said user indicating that said message has been received.
70. The method of claim 69 wherein said notification message indicates the message type.
71. A method for providing a unified communications system comprising the steps of: providing telephone, Internet, and messaging services to a user from a single service provider; interfacing each of said services with a control system; receiving a message from said telephone or said Internet communication services and storing said message in said messaging services; and retrieving said message using either said telephone or said Internet services.
72. The method of claim 71 wherein said messaging services comprises a voice mail service.
73. The method of claim 71 wherein said messaging services comprises a facsimile service.
74. The method of claim 71 wherein said messaging services comprises an email service.
75. The method of claim 71 comprising the further step of: transmitting a notification message from said control system to said user indicating the type of message that has been received.
76. A method for transferring telephone calls to different locations comprising the steps of: providing landline telephone service to a user from a service provider; providing cellular telephone service to said user also from said service provider; interfacing said landline telephone service and said cellular telephone service with a control system; receiving by said control system a telephone call destined for said user; transferring said telephone call to said landline telephone service; and transferring said outside call to said cellular telephone service if said user fails to answer said outside call from said landline telephone service.
77. A method for providing a unified communications system comprising the steps of: providing telephone, Internet, and messaging services to a user from a single service provider; interfacing each of said services with a single control system; providing an operating menu to said user enabling said user to control all of said services interfaced with said control system; and recognizing speech commands from said user, said speech commands used to control said services according to said operating menu.
78. The method of claim 77 wherein said messaging services comprise a voice mail service, an email service, a pager service, or a facsimile service.
79. The method of claim 77 wherein said telephone services comprise cellular telephone service, local telephone access service, long distance service, or voice-over-IP service.
80. The method of claim 77 wherein said step of recognizing speech commands comprises the further step of: performing a phoneme-based analysis of said speech command.
81. A method for providing a unified communications system comprising the steps of: providing telephone, Internet, and messaging services to a user from a single service provider; interfacing each of said services with a control system; providing an operating menu to said user enabling said user to control all of said services interfaced with said control system; receiving a message from said telephone or said Internet communication services and transferring said message to said messaging services; receiving a speech command from said user for retrieving said message according to said operating menu.
82. The method of claim 81 wherein said messaging services comprises a voice mail service.
83. The method of claim 81 wherein said messaging services comprises a facsimile service.
84. The method of claim 81 wherein said messaging services comprises an email service.
85. The method of claim 81 comprising the further step of: transmitting a notification message from said control system to said user indicating the type of message that has been received.
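The rank-and-poll mechanism recited in claims 1 to 5 and 13 to 16, together with the warning message of claims 12 and 26, can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `Source`, `poll_and_rerank`, and `best_source`, the fixed `penalty` of one rank point, and the choice to compare each response time against the fastest responder in the same polling round are all assumptions; the claims say only that the rank decreases when a response is missing, unexpected, or slower than that of a second polled source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Source:
    """One database entry: an information source and its rank number."""
    name: str
    rank: int


# A poll returns (payload, response_time_s); payload is None when nothing arrives.
PollFn = Callable[[Source], Tuple[Optional[str], float]]


def poll_and_rerank(sources: List[Source],
                    poll: PollFn,
                    is_expected: Callable[[Source, str], bool],
                    penalty: int = 1) -> List[str]:
    """Poll every source once, lower ranks in place, and return warning messages."""
    warnings: List[str] = []
    timings: Dict[str, float] = {}
    for src in sources:
        payload, elapsed = poll(src)
        if payload is None:
            src.rank -= penalty            # no response received
            warnings.append(f"{src.name}: no response")
        elif not is_expected(src, payload):
            src.rank -= penalty            # response could not be read as expected
            warnings.append(f"{src.name}: unexpected response")
        else:
            timings[src.name] = elapsed
    if timings:
        fastest = min(timings.values())
        for src in sources:
            # Assumption: "slower than a second source" means slower than the fastest one.
            if src.name in timings and timings[src.name] > fastest:
                src.rank -= penalty
    return warnings


def best_source(sources: List[Source]) -> Source:
    """Return the source with the highest rank number, as accessed on a search command."""
    return max(sources, key=lambda s: s.rank)
```

Here `is_expected` plays the role of the content descriptor check: it decides whether the polled response can still be read in the expected form, and a failure both lowers the rank and produces a warning.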
PCT/US2001/003742 2000-02-04 2001-02-05 Robust voice and device browser system including unified bundle of telephone and network services WO2001057850A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001234833A AU2001234833A1 (en) 2000-02-04 2001-02-05 Robust voice and device browser system including unified bundle of telephone and network services

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US18055800P 2000-02-04 2000-02-04
US18034300P 2000-02-04 2000-02-04
US18034400P 2000-02-04 2000-02-04
US18034500P 2000-02-04 2000-02-04
US60/180,345 2000-02-04
US60/180,344 2000-02-04
US60/180,558 2000-02-04
US60/180,343 2000-02-04
US23306800P 2000-09-15 2000-09-15
US60/233,068 2000-09-15

Publications (2)

Publication Number Publication Date
WO2001057850A2 (en) 2001-08-09
WO2001057850A3 (en) 2002-05-02

Family

ID=27539032

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/003742 WO2001057850A2 (en) 2000-02-04 2001-02-05 Robust voice and device browser system including unified bundle of telephone and network services

Country Status (2)

Country Link
AU (1) AU2001234833A1 (en)
WO (1) WO2001057850A2 (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4776016A (en) * 1985-11-21 1988-10-04 Position Orientation Systems, Inc. Voice control system
US5086385A (en) * 1989-01-31 1992-02-04 Custom Command Systems Expandable home automation system
US5497373A (en) * 1994-03-22 1996-03-05 Ericsson Messaging Systems Inc. Multi-media interface
US5890123A (en) * 1995-06-05 1999-03-30 Lucent Technologies, Inc. System and method for voice controlled video screen display
US5873080A (en) * 1996-09-20 1999-02-16 International Business Machines Corporation Using multiple search engines to search multimedia data
US5915001A (en) * 1996-11-14 1999-06-22 Vois Corporation System and method for providing and using universally accessible voice and speech data files
US5867494A (en) * 1996-11-18 1999-02-02 Mci Communication Corporation System, method and article of manufacture with integrated video conferencing billing in a communication system architecture
US5867495A (en) * 1996-11-18 1999-02-02 Mci Communications Corporations System, method and article of manufacture for communications utilizing calling, plans in a hybrid network
US5999525A (en) * 1996-11-18 1999-12-07 Mci Communications Corporation Method for video telephony over a hybrid network
US6115742A (en) * 1996-12-11 2000-09-05 At&T Corporation Method and apparatus for secure and auditable metering over a communications network
US5974413A (en) * 1997-07-03 1999-10-26 Activeword Systems, Inc. Semantic user interface
US6081518A (en) * 1999-06-02 2000-06-27 Anderson Consulting System, method and article of manufacture for cross-location registration in a communication system architecture

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005109826A1 (en) * 2004-05-04 2005-11-17 Qualcomm Incorporated Method and apparatus for ranking of media services and program packages
KR100900008B1 (en) * 2004-05-04 2009-05-29 콸콤 인코포레이티드 Method and apparatus for ranking of media services and program packages
US7830833B2 (en) 2004-05-04 2010-11-09 Qualcomm Incorporated Method and apparatus for ranking of media services and program packages
WO2016174585A1 (en) * 2015-04-27 2016-11-03 Toonimo Inc. Content adapted multimedia guidance
US10564991B2 (en) 2015-04-27 2020-02-18 Toonimo Inc. Content adapted multimedia guidance

Also Published As

Publication number Publication date
AU2001234833A1 (en) 2001-08-14
WO2001057850A3 (en) 2002-05-02

Similar Documents

Publication Publication Date Title
US10096320B1 (en) Acquiring information from sources responsive to naturally-spoken-speech commands provided by a voice-enabled device
US10320981B2 (en) Personal voice-based information retrieval system
US7412260B2 (en) Routing call failures in a location-based services system
US6728731B2 (en) Method and apparatus for accessing targeted, personalized voice/audio web content through wireless devices
US6885734B1 (en) System and method for the creation and automatic deployment of personalized, dynamic and interactive inbound and outbound voice services, with real-time interactive voice database queries
US7522711B1 (en) Delivery of audio driving directions via a telephone interface
US7082397B2 (en) System for and method of creating and browsing a voice web
US7286990B1 (en) Universal interface for voice activated access to multiple information providers
US20020091524A1 (en) Method and system for voice browsing web sites
WO2001057850A2 (en) Robust voice and device browser system including unified bundle of telephone and network services
EP1714224A1 (en) Method and system of bookmarking and retrieving electronic documents
WO2001076212A1 (en) Universal interface for voice activated access to multiple information providers

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP