EP1248251A2

EP1248251A2 - Method and device for automatically converting text messages to speech messages

Info

Publication number: EP1248251A2
Application number: EP02003909A
Authority: EP
Inventors: Volker Luegger
Original assignee: Siemens AG
Current assignee: Unify GmbH and Co KG
Priority date: 2001-04-06
Filing date: 2002-02-21
Publication date: 2002-10-09
Also published as: DE10117367A1; US20020169610A1; EP1248251A3; DE10117367B4

Abstract

In the conversion method, entered text information data (5) are converted into synthetic speech information data (6) using a speech profile (3) obtained by analyzing sample speech data entered by a user, so that the spoken text approximates the user's speaking voice. Independent claims are also included for: (a) a system for automatic conversion of text information into speech information; (b) a mobile telephone with a system for automatic conversion of text information into speech information.

Description

The present invention relates to a method and a system that acoustically outputs arbitrary written, machine-readable text messages, for example e-mails or fax messages, via a suitable acoustic playback system, for example a mobile telephone, on the basis of a previously generated voice profile.

It is known from the prior art to output the contents of e-mails, fax messages or other texts in a multimedia environment by means of fixed, predefined, synthetically generated voices. To make communication in a multimedia environment (the literature often speaks of a "Unified Message System" in this context) appear as natural as possible, it is of interest to output the text message in question in the voice of its author.

DE 198 41 683 A1 discloses a device and a method for digital speech processing. The words that can be converted into speech output are listed in a table (dictionary) together with information on their pronunciation (phonetic entries, phonetic equivalents). From the phonetic entries of the individual words, a translator generates a voice message file, which can be displayed and edited in an editor (editing device) in the form of a phonetic transcription. For editing, parameters (modifiers) are added or changed. The parameters of different speaker types (man, woman, child, etc.) are each combined in a speech profile (speaker model) and provided as standard models. By adjusting the parameters, the user shapes (edits) the "voice" of the subsequent synthetic speech output until it reaches the desired quality.

A disadvantage of the known method is that the generated speech output, although modelled on natural voices, usually still sounds artificial or foreign and is not familiar to the listener.

The object of the present invention is therefore to achieve speech reproduction of machine-readable texts with synthetically generated voices in such a way that the listener is not disconcerted on hearing the generated voice.

This object is achieved according to the invention by the features of the independent claims. The dependent claims develop the central idea in an advantageous manner.

According to the invention, it is proposed that, for the automatic conversion of a user's text messages into voice messages, speech sample data of the user are analyzed and a speech profile is created on the basis of this analysis. On the basis of the speech profile thus created, any text message data can be output in an approximation of the user's voice, that is, in a readily recognizable form. In particular, the sender can be recognized by the voice if the text message data are assigned to the corresponding voices.

The speech profile can be created, for example, by comparing a written reference text with the same reference text as produced by a speaker's acoustic articulation.

The invention further claims a system for converting text messages into voice messages. This system has a speech analyzer which, on the basis of an analysis of speech sample data, generates a speech profile for the speech sample data entered. The system also contains a speech generator which, on the basis of the speech profile, converts any text message into synthetic speech sample data.

Further advantages, features and properties of the present invention are explained in more detail below using an exemplary embodiment with reference to the accompanying drawing.

The figure schematically shows a technique for the automatic conversion of text messages into voice messages.

The figure schematically shows a method and a system for the automatic conversion of text messages into voice messages. A text 1 spoken by any person is analyzed by an analyzer 2 in a step S1. As a rule, this is done by recording the acoustic signals in analog form and converting them into digital speech files by means of an A/D converter. Using suitable software, a speech profile 3 of this person can be generated in a step S3 on the basis of the analysis of the digital speech files. The spoken text 1 can be any free text, or a reference text 8 which, in a step S2 within the analysis, is compared with the written form of the reference text 8.
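
Steps S1 and S3 can be illustrated with a minimal Python sketch. The description does not state which acoustic features make up the speech profile 3, so the two features used here (a zero-crossing pitch estimate and the mean signal energy), the `SpeechProfile` fields and the `analyze` function are all assumptions chosen for illustration. A synthetic 200 Hz tone stands in for the output of the A/D converter.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SpeechProfile:
    """Hypothetical speech profile (3); the real feature set is not specified."""
    speaker: str
    pitch_hz: float  # rough fundamental-frequency estimate
    energy: float    # mean squared amplitude


def analyze(speaker: str, samples: Sequence[float], sample_rate: int) -> SpeechProfile:
    """Assumed stand-in for steps S1 and S3: analyze samples, derive a profile."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # A periodic signal changes sign twice per period, so half the number
    # of sign changes per second gives a crude pitch estimate.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = len(samples) / sample_rate
    pitch_hz = crossings / (2 * duration_s)
    energy = sum(s * s for s in samples) / len(samples)
    return SpeechProfile(speaker, pitch_hz, energy)


# One second of a 200 Hz tone; the half-sample offset keeps the samples
# away from the exact zero crossings of the sine.
RATE = 8000
tone = [math.sin(2 * math.pi * 200 * (n + 0.5) / RATE) for n in range(RATE)]
profile = analyze("author", tone, RATE)
```

A real analyzer 2 would work on recorded speech and extract far richer features; the sketch only shows the data flow from digitized samples to a profile object.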

On the basis of the speech profile 3, any text message 5 can subsequently be translated by a speech generator 4 into synthetic voice message data 6 (step S5 and step S6). The text message 5 can then be output acoustically in a step S7 in accordance with the speech profile 3 that has been created.
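
The data flow of steps S5 and S6 can be sketched as follows. This is not a speech synthesizer: the hypothetical `SpeechGenerator` below only turns a text message into a list of per-word synthesis instructions that carry the profile's parameters. The class names, the `Segment` type and the profile fields `pitch_hz` and `words_per_minute` are assumptions made for the example and do not come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpeechProfile:
    """Hypothetical speech profile (3) with two illustrative parameters."""
    speaker: str
    pitch_hz: float
    words_per_minute: float


@dataclass(frozen=True)
class Segment:
    """One unit of the assumed synthetic voice message data (6)."""
    word: str
    pitch_hz: float
    duration_s: float


class SpeechGenerator:
    """Assumed stand-in for the speech generator (4), configured by a profile."""

    def __init__(self, profile: SpeechProfile) -> None:
        self.profile = profile

    def convert(self, text: str) -> List[Segment]:
        # Every word inherits the speaker's pitch and speaking rate.
        seconds_per_word = 60.0 / self.profile.words_per_minute
        return [
            Segment(word, self.profile.pitch_hz, seconds_per_word)
            for word in text.split()
        ]


generator = SpeechGenerator(SpeechProfile("author", 120.0, 150.0))
voice_message = generator.convert("Meeting moved to three")
```

Keeping the profile as a constructor argument mirrors the description, in which the generator is set up once per speaker and then applied to any text.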

In this way, using a speech sample 1 of a speaker and the speech profile 3 derived from it, a speech generator 4 for synthetically generated speech can be set up so that any texts 5 can be output acoustically in the voice of this speaker. The resulting speech output in a natural and, above all, familiar voice prevents the listener from being disconcerted on hearing the speech. It is of course also conceivable for speech samples of different persons, and thus several speech profiles, to be available to the speech generator. This makes it possible to choose between different speakers.
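
Holding several speech profiles and choosing between speakers could look like the following sketch. The `ProfileRegistry` class, the use of a sender address as lookup key and the fallback to a default voice are assumptions for illustration; the description only states that several profiles may be available.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SpeechProfile:
    """Hypothetical speech profile (3)."""
    speaker: str
    pitch_hz: float


class ProfileRegistry:
    """Assumed store of speech profiles, keyed by sender address."""

    def __init__(self, default: SpeechProfile) -> None:
        self._default = default
        self._profiles: Dict[str, SpeechProfile] = {}

    def register(self, sender: str, profile: SpeechProfile) -> None:
        # Addresses are compared case-insensitively.
        self._profiles[sender.lower()] = profile

    def select(self, sender: str) -> SpeechProfile:
        # Fall back to a standard voice when no sample has been analyzed yet.
        return self._profiles.get(sender.lower(), self._default)


registry = ProfileRegistry(default=SpeechProfile("standard", 150.0))
registry.register("Alice@example.com", SpeechProfile("alice", 210.0))
known = registry.select("alice@example.com")
unknown = registry.select("bob@example.com")
```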

This is of great value particularly within multimedia environments, namely when the link between synthetically generated speech and the speaker's documents can be created automatically. The listener can then recognize the sender of the message by the voice, which makes for pleasant communication using modern technical means. It is also extremely advantageous that the profile for the speech output can be generated automatically from any speech sample within the multimedia environment.

Normally, a unified messaging system (Unified Message System) manages different documents of the same author, such as voice messages (answering machine), e-mails, fax messages, etc. To output e-mails within this system, for example on a mobile telephone, the e-mail text is translated into speech according to the invention. Advantageously, using a voice message 1 of the same author received in the same system and the voice profile 3 generated from it, the e-mail message can be output in this author's voice. Given a speech sample of other persons, for example celebrities, the documents could also be reproduced in their voices.

In the example described above, an author thus sends a recipient an e-mail message. As the destination address, the author enters the recipient's telephone number. The Unified Message System in use determines that a telephone line rather than an e-mail account has been selected as the recipient and therefore converts the entered text into a voice message. For this purpose it uses a speech profile previously created from a speech sample of this author. The voice of the synthetically generated speech output is thereby approximated to the author's natural voice closely enough that the recipient recognizes the synthetic voice as the familiar voice of the sender. The Unified Message System then sets up a connection to the recipient's telephone line and outputs the voice message in the author's voice.
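
The routing decision in this example can be sketched as below. How the Unified Message System actually distinguishes a telephone number from an e-mail address is not described, so the regular expression, the `deliver` function and the dictionary it returns are assumptions for illustration only; profiles are represented by plain strings to keep the sketch short.

```python
import re
from typing import Dict

# Assumed pattern: optional "+", a digit, then at least three more
# digits or common separators (space, hyphen, slash).
PHONE_PATTERN = re.compile(r"^\+?[0-9][0-9 \-/]{3,}$")


def is_phone_number(destination: str) -> bool:
    """Return True if the destination looks like a telephone number."""
    return bool(PHONE_PATTERN.match(destination.strip()))


def deliver(author: str, destination: str, text: str,
            profiles: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical routing: text stays text unless the target is a phone."""
    if not is_phone_number(destination):
        return {"mode": "text", "to": destination, "body": text}
    # Use the author's previously created profile, else a standard voice.
    voice = profiles.get(author, "standard-voice")
    return {"mode": "voice", "to": destination, "voice": voice, "body": text}


profiles = {"alice@example.com": "alice-profile"}
to_phone = deliver("alice@example.com", "+49 89 1234567", "Running late", profiles)
to_mail = deliver("alice@example.com", "bob@example.com", "Running late", profiles)
```

In a complete system the `voice` mode would hand the text and the selected profile to the speech generator and then set up the call to the recipient's telephone line.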

Claims

Method for automatically converting text messages (5) into voice messages (6), with the following steps:

creating (S3) a speech profile (3), and

converting (4) input text message data (5) into synthetic voice message data (6) on the basis of the speech profile (3),

characterized in that the speech profile (3) is created, following an analysis (S1) of speech sample data (1) of a user, on the basis of the analysis (S1) carried out, in order to output the text approximately in the user's voice.

Method according to claim 1,
characterized in that the speech profile (3) is created on the basis of a comparison (S2) of reference text data (8) with reference speech sample data (1), the reference speech sample data (1) being generated by a speaker's acoustic reproduction of the reference text data (8).

System for converting text messages (5) into voice messages (6),

with a speech analyzer (2), which generates a speech profile (3) for entered speech sample data (1) on the basis of an analysis (S1) of speech sample data (1), and

with a speech generator (4), which converts any text message (5) into synthetic speech sample data (6) based on the speech profile (3).

System according to claim 3,
characterized in that the speech generator (4) is designed to generate the speech profile (3) on the basis of a comparison of a written reference text (8) with the form (1) of this reference text (8) spoken by a user.

System according to claim 3 or 4,
characterized in that in multimedia environments the speech portion of voice messages (1) is automatically analyzed (S1) and used for acoustic reproduction (7) of text messages (5).

Mobile phone, comprising a system according to claim 3, 4 or 5,
characterized in that the text messages (5) are documents in a multimedia environment, for example e-mail texts, which are output acoustically on the mobile phone as speech in accordance with the previously generated speech profile (3).