WO1999017235A1 - Method and apparatus for storing and retrieving labeled interval data for multimedia recordings

Info

Publication number
WO1999017235A1
Authority
WO
WIPO (PCT)
Prior art keywords
interval
intervals
interval data
labeled
data
Application number
PCT/US1998/020446
Other languages
French (fr)
Inventor
Christopher J. Macey
David M. Weimer
Pierre David Wellner
Original Assignee
AT&T Corp.
Application filed by AT&T Corp.
Priority to JP52049099A (published as JP2001511991A)
Priority to CA002271745A (published as CA2271745A1)
Publication of WO1999017235A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/56: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/567: Multimedia conference systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/42221: Conversation recording systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M7/00: Arrangements for interconnection between switching centres
    • H04M7/12: Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal

Definitions

  • Teleconference system 200 provides playback of recorded conferences using conference playback documents.
  • The system uses stored labeled interval data associated with the conference.
  • Fig. 3 illustrates a conference playback document 300 in accordance with one embodiment of the present invention.
  • Conference playback document 300 is implemented as a Java applet through Java user interface 85 of Fig. 1. It uses a visual structuring of the recording as a series of color-coded intervals (e.g., intervals 305 and 310) plotted on a horizontal time axis in an area referred to as a time-line window 315.
  • Fig. 4 illustrates in detail how overlapping intervals are displayed. As shown in Fig. 4, by plotting each interval type one at a time, starting with taller bars, the document displays overlapping intervals on the same line. Referring again to Fig. 3, intervals that are not associated with an individual person (e.g., hyperlinks 330, speech segments, etc.) are plotted separately above the participants. Time-line window 315 provides a snapshot of every participant's activity and can be used to navigate through the recording.
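The "taller bars first" rule can be sketched as a draw-order function. This is a minimal illustration, assuming each interval type has one fixed bar height; the `Bar` type, its field names, and the type names in the usage are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bar:
    type_name: str   # hypothetical interval type, e.g. "talking" or "pause"
    start_ms: int    # offset from the start of the recording
    end_ms: int
    height_px: int   # hypothetical bar height assigned to this interval type


def draw_order(bars: List[Bar]) -> List[Bar]:
    """Return bars grouped by type, tallest type first, each group in time order.

    Painting in this order puts shorter bars on top of taller ones, so
    overlapping intervals on one participant's line remain visible.
    """
    heights = {}
    for bar in bars:
        heights.setdefault(bar.type_name, bar.height_px)
    # sorted() is stable, so types with equal heights keep first-seen order.
    type_order = sorted(heights, key=lambda name: -heights[name])
    ordered = []
    for name in type_order:
        group = [b for b in bars if b.type_name == name]
        ordered.extend(sorted(group, key=lambda b: b.start_ms))
    return ordered
```

A pause bar drawn after a taller "talking" bar that spans it stays visible as a short mark inside the talking band.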
  • A toolbar 350 below the time-line is used to begin playing the audio and to adjust the skimming parameters.
  • A separate phone connection is not necessary because the audio conference recording can be "streamed" in conjunction with conference playback document 300.
  • Toolbar 350 provides five buttons to control the player: "goto beginning 351", "jump back 352", "stop 353", "play 354", and "jump forward 355". It also contains a slider 356 for adjusting the playing speed (0.7x, 1.0x, 1.3x, 1.7x, and 2.0x), a zoom menu 357 for selecting the zoom factor (none, 20 min., 10 min., and 5 min.), and an on/off pause button 358 for pause removal.
  • A vertical red needle 360 moves across the time-line as the recording plays.
  • Every participant's name tag is colored to reflect that person's state at that time in the meeting.
  • The visual structures make some details of the call immediately obvious. For example, the number and span of the light-colored bars can identify the most and least dominant talkers, and the initial long, uninterrupted talking bands show who gave the formal presentations.
  • The zooming feature allows the user to narrow the duration displayed in the time-line window.
  • A numbered scroll bar allows the user to register the zoomed-in portion with the full duration and to scroll using mouse clicks or the arrow keys on the keyboard. Scrolling is independent of player needle 360, so the user can glance at other regions without disrupting listening.
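Zooming and scrolling reduce to a linear mapping between recording offsets and pixels in the time-line window. The sketch below assumes a window of fixed pixel width and treats the scroll position (`view_start_ms`) as state kept separate from the needle position, matching the independence described above; all function and parameter names are hypothetical.

```python
from typing import Optional

# Zoom choices from zoom menu 357; None means "show the full duration".
ZOOM_SPANS_MS = {"none": None, "20 min.": 20 * 60_000, "10 min.": 10 * 60_000, "5 min.": 5 * 60_000}


def view_span_ms(zoom: str, duration_ms: int) -> int:
    """Duration shown in the time-line window for a given zoom setting."""
    span = ZOOM_SPANS_MS[zoom]
    return duration_ms if span is None else min(span, duration_ms)


def time_to_x(t_ms: int, view_start_ms: int, span_ms: int, width_px: int) -> Optional[int]:
    """Map a recording offset to an x pixel, or None if it is scrolled out of view."""
    if not view_start_ms <= t_ms <= view_start_ms + span_ms:
        return None
    return round((t_ms - view_start_ms) * width_px / span_ms)


def x_to_time(x_px: int, view_start_ms: int, span_ms: int, width_px: int) -> int:
    """Inverse mapping, e.g. to move the player needle when the time-line is clicked."""
    return view_start_ms + round(x_px * span_ms / width_px)
```

For a one-hour recording zoomed to 10 minutes and scrolled to the 30-minute mark in a 600-pixel window, the 35-minute point lands at x = 300.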
  • Player needle 360 can be moved by clicking on the time-line, or by pressing a jump forward/backward button. When this happens, the skim server plays a short non-speech audio cue and begins to play at the new location.
  • Clicking the time-line near the top is used to select hyperlinks (e.g., link 330) rather than to move the needle.
  • A dialog displays all the links in the recording. This dialog can be used to visit a link, edit a link, or create a link both into and out of the time-line.
  • One embodiment of the present invention supports the following types of links: annotations, audio, documents, images, and general URL. All links are implemented using URLs except annotations, which store textual content as interval data.
  • Each type of link is displayed on the time-line with a representative icon.
  • Hyperlinks into and out of the time-line are stored as intervals, and contain both a beginning and ending time offset.
  • A link can refer to a particular point or region of the time-line, allowing a rich set of skimming alternatives. For example, following a link can cause play to begin at a certain point, end at a certain point, or sequence through selected regions. This means that following a link can have multiple effects, including moving the player needle and changing the document page.
  • One embodiment of the present invention is a teleconference recorder and player.
  • An interval database stores labeled interval data associated with the conference.
  • The labeled interval data allows the recorded conference to be searched and retrieved, and facilitates its playback.
  • Although the embodiments disclosed are implemented over the Internet, the present invention can be implemented using a private network or any other known or future data communication method.

Abstract

A teleconference system (200) is disclosed for digitally recording and playing a conference telephone call that includes a plurality of intervals. The teleconference system includes a skim server (55) that detects a first set of the plurality of intervals and a conference bridge (100) that detects a second set of the plurality of intervals during the conference call. An interval database server (65) generates labeled interval data for all detected intervals and stores the labeled interval data in a database. The labeled interval data includes an interval data element that defines each interval. After the conference call is recorded, the labeled interval data can be searched and retrieved based on assorted criteria. Portions of the recorded conference call associated with the retrieved labeled interval data can also be retrieved and played back. This facilitates easy retrieval and playback of desired portions of a recorded conference call. Further, during playback of the conference call, a user interface (85) is generated. The user interface displays the stored labeled interval data. A user can easily select or skip to desired portions of the conference call by selecting portions of the user interface.

Description

METHOD AND APPARATUS FOR STORING AND RETRIEVING LABELED INTERVAL DATA FOR MULTIMEDIA RECORDINGS
FIELD OF THE INVENTION
The present invention is directed to storage and retrieval of multimedia data. More particularly, the invention is directed to storage and retrieval of labeled interval data in a database.
BACKGROUND OF THE INVENTION
Unlike written communications, speech communications are rarely recorded, let alone stored, even though storage of digital speech may be readily achieved. It is presently feasible to store gigabytes and even terabytes of digitally recorded speech or other types of multimedia information (e.g., video). Other than for archival purposes, however, there is no practical reason for storing such data without a mechanism by which a user can identify and retrieve only those portions of the stored data that may be of interest.
The difficulty of searching and retrieving digital speech records stored in a database stems from the traditional approaches to querying a database to locate particular records. Most database queries are logical queries based on the presence or absence of specified characteristics in the records being searched. Boolean logic and fuzzy logic have been used to increase the utility of database queries, but these techniques merely extend the fundamental basis of most database queries: whether one or more terms, indices, or other identifying characteristics are present (or absent) in the records being searched.
Unless digital speech records are converted into text by speech-to-text conversion or transcription, or are otherwise parsed, they cannot be located or identified using traditional database query techniques, because it is not practical to determine whether a word (or phrase) appears in a selected portion of recorded speech. Therefore, review of non-transcribed digital speech records is frequently limited to listening to the recorded speech until the item or items of interest are heard. Unfortunately, this frequently requires listening to a considerable amount of extraneous or irrelevant speech, which can be extremely time-consuming without providing any significant elucidation. Moreover, digital speech records frequently contain lengthy pauses, and if a record involves more than two speakers, it is frequently difficult, if not impossible, to identify the speakers, further exacerbating the problem of identifying a specific segment in recorded digital speech.
Even when a digital speech record is divided into separate digital recordings, and each recording is individually accessible and identified, the digitally recorded data is of limited use. For example, if ten conference calls were recorded on a digital storage medium, a user might be able to locate a particular conference call on a particular date, if the user were fortunate enough to know that the information he or she sought was in that specific call. Even so, the user would still have to listen to the entire recording of the conference call. For a user seeking to identify a specific comment made by a specific participant, listening to the entire conference call is extremely inefficient. Moreover, if the user does not know the specific date and time of the conference call in which the person spoke, the user might have to listen to several recordings before finding the desired information. Clearly, once more than a minimal number of recordings are stored, it becomes impractical for a user to locate desired information merely by listening to the conference call recordings.
Based on the foregoing, there is a need for a method and apparatus for readily identifying, locating, and retrieving stored digital speech and other digital multimedia records.
SUMMARY OF THE INVENTION
One embodiment of the present invention is a teleconference system for digitally recording and playing a conference telephone call that includes a plurality of intervals. The teleconference system includes a skim server that detects a first set of the plurality of intervals and a conference bridge that detects a second set of the plurality of intervals during the conference call. An interval database server generates labeled interval data for all detected intervals and stores the labeled interval data in a database. The labeled interval data includes an interval data element that defines each interval. After the conference call is recorded, the labeled interval data can be searched and retrieved based on assorted criteria. Portions of the recorded conference call associated with the retrieved labeled interval data can also be retrieved and played back. This facilitates easy retrieval and playback of desired portions of a recorded conference call. Further, during playback of the conference call, a user interface is generated. The user interface displays the stored labeled interval data. A user can easily select or skip to desired portions of the conference call by selecting portions of the user interface.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates a teleconference system in accordance with one embodiment of the present invention.
Fig. 2 illustrates the format of an interval data element that forms the labeled interval data associated with a recorded conference.
Fig. 3 illustrates a conference playback document in accordance with one embodiment of the present invention.
Fig. 4 illustrates in detail how overlapping intervals are displayed.
DETAILED DESCRIPTION
In one embodiment of the present invention, intervals within recorded digital speech or other multimedia data are specifically identified and labeled. The labeled interval data provides a mechanism by which a user can specifically identify an interval within digitally recorded multimedia and, having identified that interval, retrieve it and other intervals sharing desired characteristics.
Fig. 1 illustrates a teleconference system in accordance with one embodiment of the present invention. Teleconference system 200 records and stores a teleconference call and associated labeled interval data. Teleconference system 200 further allows a recorded teleconference to be played back using the stored labeled interval data. The main components of teleconference system 200 are a conference recorder 110, a skim server 55, an interval database ("IDB") server 65, and a Java user interface 85.
In teleconference system 200, a plurality of telephones 31, 32, and 33 are interconnected through the public switched telephone network ("PSTN") 40. One or more individuals may participate in a teleconference through each telephone 31-33. The participants may be identified by the telephone they are calling from or, alternatively, by voice recognition or other forms of identification during the teleconference. A teleconference may be initiated by a conference host accessing a WebRoom interface on a WebRooms server 50. A WebRoom interface provides a mechanism by which participants may be actively added to and/or removed from a teleconference. In one embodiment, the WebRoom interface for all teleconference participants is implemented as a Common Gateway Interface ("CGI") program 60 on a Hypertext Transfer Protocol Web server ("Httpd") 70 that provides interactive control of the teleconference through HyperText Markup Language ("HTML") documents. The HTML documents are accessible as conference pages 80 through a Web browser 90 such as Netscape® Navigator or Internet Explorer®.
At record time, the conference host uses WebRooms server 50 to dial a conference scribe, which acts as an additional participant in the teleconference. At the same time, conference recorder 110 tells IDB server 65 to create a new collection point, referred to as a "depot", for storing all data related to this particular recording, and tells skim server 55 to begin recording an audio file using, for example, a Dialogic board 57 from Dialogic Corp., or its equivalent. A depot in teleconference system 200 can be a structured query language ("SQL") database 35 coupled to an Open DataBase Connectivity ("ODBC") interface 36. While the conference is running, conference bridge 100 detects call control events (e.g., which participant is talking, new participants being added, etc.) and sends these events through WebRooms server 50 and conference recorder 110 into the new depot (i.e., SQL database 35). Meanwhile, skim server 55 detects pauses in speech and adds these events to the depot as well. The events detected by both conference bridge 100 and skim server 55 are referred to as "intervals".
When playing back a recorded conference on teleconference system 200, the user brings up Java user interface 85 to select a recording accessed via IDB server 65. User interface 85 retrieves the labeled interval data for the recording and uses it to display a visual time-line of events. The user enters a phone number that is passed to skim server 55 so that it can call the user's telephone for conference playback through Dialogic board 57. As the audio plays on the user's phone, Java user interface 85 continuously updates the graphical display and controls how the recording is played using skim server 55. In one embodiment of the present invention, all clients, such as Java user interface 85 and conference recorder 110, communicate with skim server 55 and IDB server 65 through a CORBA application programming interface. CORBA was chosen because it provides a simple interface between programs written in different languages running on different platforms. In one embodiment, both servers 50 and 55 and conference recorder 110 are written in C++ and run on Sun Solaris platforms.
Skim server 55 performs the following functions:
1. Records audio from the telephone line to a file.
2. Detects speech events while recording and posts them to the database.
3. Plays from the file to the telephone line:
- from any point in the recording,
- at variable speeds, and
- with or without pauses removed.
In one embodiment, skim server 55 is based on the same type of hardware as standard voice mail servers, and it performs many of the same functions. One difference between skim server 55 and a more traditional voice mail server is that it processes speech events and posts them to IDB server 65, and also that it provides fine control over what parts of the audio file are played and what parts are skipped.
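The "fine control over what parts of the audio file are played and what parts are skipped" can be illustrated by computing a play list from stored pause intervals. This is a minimal sketch, assuming pauses are available as (start, end) millisecond offsets; the function names are hypothetical, and speed-up is modeled only as a division of listening time (pitch-preserving time scaling is not shown).

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms) within the audio file


def playback_segments(duration_ms: int, pauses: List[Segment],
                      start_ms: int = 0, remove_pauses: bool = True) -> List[Segment]:
    """Return the parts of the recording to play, from start_ms to the end."""
    segments = []
    cursor = start_ms
    if remove_pauses:
        for p_start, p_end in sorted(pauses):
            if p_end <= cursor:
                continue  # pause lies entirely before the play position
            if p_start >= duration_ms:
                break  # remaining pauses lie past the end of the recording
            if p_start > cursor:
                segments.append((cursor, p_start))
            cursor = max(cursor, p_end)  # skip over the pause
    if cursor < duration_ms:
        segments.append((cursor, duration_ms))
    return segments


def listening_time_ms(segments: List[Segment], speed: float = 1.0) -> float:
    """Wall-clock listening time when the segments are played at the given speed."""
    return sum(end - start for start, end in segments) / speed
```

With a 10-second recording containing pauses at 2.0-3.0 s and 6.0-8.0 s, pause removal leaves 7 seconds of audio, or 3.5 seconds at 2.0x.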
One function of IDB server 65 is to store and retrieve labeled interval data associated with a recorded conference. This is data that describes properties of specific intervals within the speech, such as who is talking, pauses in speech, telephone call control data, etc. This can be further extended to applications that require intervals that mark video scene changes, or that relate automatic speech recognition output to a recording. The labeled interval data can be created, stored, and retrieved by a number of different applications. Some of it is automatically derived from raw speech data, some is a side effect of user activity, and some may be entered manually at record time or at playtime.
Fig. 2 illustrates the format of an interval data element 130 that forms the labeled interval data associated with a recorded conference. Every interval during the recorded conference is associated with an interval data element 130. In one embodiment, each interval data element 130 includes the following:
1. Recording ID or Depot 122: Refers to the recording that is associated with the interval and the collection point where the recording is stored.
2. Start time 123: Applications need both absolute time and time relative to recording start time. Relative time is more compact, and it is easy to convert to absolute as long as an absolute start time is stored with the recording.
3. Duration or end time 124.
4. Type: A code to identify the meaning of this interval. Is it a pause in speech, a scene change, etc.?
5. Type-specific data values 126: Depending on the type, this data could be a string of text, a number, a URL, etc.
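The five fields of interval data element 130 map naturally onto a record type. The sketch below is an illustration only: it assumes the end-time variant of field 3 (not duration), uses `None` for an interval that has not ended yet, and invents the type codes and field names; it also shows the relative-to-absolute time conversion described for start time 123.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Union


@dataclass
class IntervalDataElement:
    depot: str                             # 1. recording ID / depot (122)
    start_ms: int                          # 2. start time (123), relative to recording start
    end_ms: Optional[int]                  # 3. end time (124); None while still open
    type_code: str                         # 4. type, e.g. "pause" or "speaker" (hypothetical codes)
    value: Union[str, float, None] = None  # 5. type-specific data value (126)

    @property
    def duration_ms(self) -> Optional[int]:
        """Duration derived from the end time, or None for an open interval."""
        return None if self.end_ms is None else self.end_ms - self.start_ms

    def absolute_start(self, recording_start: datetime) -> datetime:
        """Convert the compact relative start time to absolute time."""
        return recording_start + timedelta(milliseconds=self.start_ms)
```

Storing only relative offsets keeps each element compact; the absolute start time is stored once per recording, as the text notes.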
It must be possible to store, retrieve, and manipulate labeled interval data many intervals at a time, not just one at a time. Some applications deal with large collections of intervals that share everything except start time and end time (e.g., all times when a specific person was speaking).
Applications must be able to store interval data in the database at any time: before recording has begun, during recording, and afterward. For example, for a teleconference it may be necessary to record caller-ID and ringing events before the call, record who is speaking during the call, and make annotations about the call afterward. Some applications need to display incomplete interval data while a recording is in progress (e.g., to catch up to a live conference), so it should be possible to post an interval that has started but not yet ended, and to post its end time later. It should also be possible to adjust interval data, for example to realign it with other data. All applications that post events to IDB server 65 must specify precise millisecond offsets for the start and end times of each interval. All offsets are measured from an absolute start time for the recording. Posting intervals from different machines in real time requires that all posting clients have synchronized clocks, so standard Network Time Protocol ("NTP") software is run on all of these machines.
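The posting rules above (open intervals closed later, adjustable intervals, millisecond offsets from an absolute recording start) can be sketched with an in-memory store. `IntervalStore` and its methods are hypothetical stand-ins for the IDB server's interface, which the text does not specify; the sketch also assumes that events posted before recording begins simply receive negative offsets.

```python
import itertools
from typing import Any, Dict, List


class IntervalStore:
    """Hypothetical in-memory stand-in for a depot, keyed by interval ID."""

    def __init__(self, recording_start_epoch_ms: int):
        # Absolute start time of the recording, read from an NTP-synchronized clock.
        self.recording_start_epoch_ms = recording_start_epoch_ms
        self._ids = itertools.count(1)
        self.intervals: Dict[int, Dict[str, Any]] = {}

    def offset(self, epoch_ms: int) -> int:
        """Millisecond offset from recording start (negative before recording begins)."""
        return epoch_ms - self.recording_start_epoch_ms

    def post_start(self, type_code: str, value: Any, epoch_ms: int) -> int:
        """Post an interval that has started but not yet ended."""
        interval_id = next(self._ids)
        self.intervals[interval_id] = {
            "type": type_code, "value": value,
            "start": self.offset(epoch_ms), "end": None,
        }
        return interval_id

    def post_end(self, interval_id: int, epoch_ms: int) -> None:
        """Close a previously posted open interval."""
        self.intervals[interval_id]["end"] = self.offset(epoch_ms)

    def adjust(self, interval_id: int, delta_ms: int) -> None:
        """Shift an interval, e.g. to realign it with other data."""
        interval = self.intervals[interval_id]
        interval["start"] += delta_ms
        if interval["end"] is not None:
            interval["end"] += delta_ms

    def open_intervals(self) -> List[int]:
        """IDs of intervals still in progress, e.g. for a live catch-up display."""
        return [i for i, iv in self.intervals.items() if iv["end"] is None]
```

Because every client converts its own wall-clock reading to an offset, the offsets from different machines are only comparable if their clocks agree, which is why the clients run NTP.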
Browse, search, and playback applications need to query and display subsets of interval data. Examples of queries that can be supported by the present invention include:
• All interval data for a specific recording, sorted by time and type.
• All intervals of a specific type with specific values, or values within a particular range.
• Intervals within an absolute or relative time range.
• Intervals of a specific duration.
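The four example queries map naturally onto simple filters. This is a minimal sketch over an in-memory list, assuming a hypothetical `Interval` record with relative millisecond times and an optional numeric value; a real interval database would run these as indexed queries. An absolute time range can be handled by first subtracting the recording's absolute start time.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Interval:
    recording_id: str
    kind: str
    start_ms: int                  # relative to the recording start
    end_ms: int
    value: Optional[float] = None  # numeric type-specific value, if any


def for_recording(ivs: Iterable[Interval], recording_id: str) -> List[Interval]:
    """All interval data for one recording, sorted by time and then by type."""
    return sorted((iv for iv in ivs if iv.recording_id == recording_id),
                  key=lambda iv: (iv.start_ms, iv.kind))


def of_type_in_range(ivs: Iterable[Interval], kind: str,
                     lo: float, hi: float) -> List[Interval]:
    """Intervals of one type whose numeric value lies in [lo, hi]."""
    return [iv for iv in ivs
            if iv.kind == kind and iv.value is not None and lo <= iv.value <= hi]


def overlapping(ivs: Iterable[Interval], t0_ms: int, t1_ms: int) -> List[Interval]:
    """Intervals that overlap the relative time range [t0_ms, t1_ms)."""
    return [iv for iv in ivs if iv.start_ms < t1_ms and iv.end_ms > t0_ms]


def with_duration(ivs: Iterable[Interval], min_ms: int, max_ms: int) -> List[Interval]:
    """Intervals whose duration lies in [min_ms, max_ms]."""
    return [iv for iv in ivs if min_ms <= iv.end_ms - iv.start_ms <= max_ms]
```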
The present invention provides for logical/set operations. For example, assume a user wants to see and/or hear only the parts of a recording when person A or person B was talking, and wants to leave all the pauses out. This can be expressed by making three queries: intervals when A was speaking (set A), intervals when B was speaking (set B), and pause intervals (set P). The desired set can be expressed as "A union B less P", or, if these sets are thought of as long bit masks, as the logical operation (A | B) & (~P).
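Instead of literal bit masks, the same "A union B less P" result can be computed directly on sorted lists of half-open (start, end) spans. This is a minimal sketch, assuming each query has already returned plain millisecond spans; the function names are illustrative, not part of the described system.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # half-open [start_ms, end_ms)


def normalize(spans: Iterable[Span]) -> List[Span]:
    """Sort spans and merge any that overlap or touch."""
    out: List[Span] = []
    for s, e in sorted(spans):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def union(a: Iterable[Span], b: Iterable[Span]) -> List[Span]:
    """A | B: time covered by either set."""
    return normalize(list(a) + list(b))


def difference(a: Iterable[Span], b: Iterable[Span]) -> List[Span]:
    """A & ~B: parts of A not covered by B."""
    a_n, b_n = normalize(a), normalize(b)
    out: List[Span] = []
    j = 0
    for s, e in a_n:
        cur = s
        # Skip B spans that end before this A span starts.
        while j < len(b_n) and b_n[j][1] <= cur:
            j += 1
        k = j
        # Cut out every B span that starts inside this A span.
        while k < len(b_n) and b_n[k][0] < e:
            bs, be = b_n[k]
            if bs > cur:
                out.append((cur, bs))
            cur = max(cur, be)
            k += 1
        if cur < e:
            out.append((cur, e))
    return out
```

With A = [(0, 10), (20, 30)], B = [(5, 15)] and P = [(8, 12), (25, 27)], `difference(union(A, B), P)` keeps [(0, 8), (12, 15), (20, 25), (27, 30)]. Working on merged spans keeps the cost proportional to the number of intervals rather than to the recording length at millisecond resolution.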
Some types of intervals may not have clear start and end times. Instead of a binary on/off state at each time increment, some data has an associated probability curve over time because the exact times of the events are not certain. Output from automatic speech recognition (e.g., phoneme lattices) can include several overlapping hypotheses about what words are being said at any given moment. In one embodiment of the present invention, IDB server 65 provides support for "fuzzy" intervals. In another embodiment, IDB server 65 uses binary intervals along with a probability value in the type-specific numeric data field to achieve an effect similar to that of fuzzy intervals, but without fuzzy logical operations.
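The second embodiment (binary intervals carrying a probability value) can be sketched as follows. The `ScoredInterval` record, the thresholding in `confident`, and the "most probable hypothesis wins" rule in `best_hypothesis_at` are illustrative assumptions, not operations the text specifies.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ScoredInterval:
    label: str          # e.g. one word hypothesis from a recognizer
    start_ms: int
    end_ms: int
    probability: float  # kept in the type-specific numeric field


def confident(ivs: Iterable[ScoredInterval], min_p: float) -> List[ScoredInterval]:
    """Binarize by keeping only intervals at or above a probability threshold."""
    return [iv for iv in ivs if iv.probability >= min_p]


def best_hypothesis_at(ivs: Iterable[ScoredInterval], t_ms: int) -> Optional[ScoredInterval]:
    """Among overlapping hypotheses that cover t_ms, return the most probable."""
    covering = [iv for iv in ivs if iv.start_ms <= t_ms < iv.end_ms]
    return max(covering, key=lambda iv: iv.probability, default=None)
```

After thresholding, the surviving intervals are ordinary binary intervals, so the usual set operations apply without any fuzzy logic.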
Transcriptions can be stored as interval data, perhaps one sentence per interval or one word per interval, depending on how fine a mapping is desired between words and time. The transcriptions may be produced from closed-caption text, higher-quality off-line transcriptions, or a lower-quality automatic speech recognition system.
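With one word per interval, the mapping between words and time works in both directions. A minimal sketch, assuming word intervals do not overlap and are given as hypothetical (start_ms, end_ms, word) tuples; `Transcript` is an illustrative name. It uses Python's standard `bisect` module for the time-to-word lookup.

```python
import bisect
from typing import List, Optional, Tuple


class Transcript:
    """Word-per-interval transcript with time <-> word lookups."""

    def __init__(self, words: List[Tuple[int, int, str]]) -> None:
        self._words = sorted(words)
        self._starts = [start for start, _, _ in self._words]

    def word_at(self, t_ms: int) -> Optional[str]:
        """Word being spoken at t_ms, or None during silence."""
        i = bisect.bisect_right(self._starts, t_ms) - 1
        if i >= 0:
            start, end, word = self._words[i]
            if start <= t_ms < end:
                return word
        return None

    def find(self, word: str) -> List[int]:
        """Start offsets where a word is spoken, e.g. to seek playback there."""
        return [start for start, _, w in self._words if w.lower() == word.lower()]
```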
Teleconference system 200 provides playback of recorded conferences using conference playback documents. The system utilizes stored labeled interval data associated with the conference. Fig. 3 illustrates a conference playback document 300 in accordance with one embodiment of the present invention. Conference playback document 300 is implemented as a Java applet through Java user interface 85 of Fig. 1. It uses a visual structuring of the recording as a series of color-coded intervals (e.g., intervals 305 and 310) plotted on a horizontal time axis in an area referred to as a time-line window 315. Each participant in a call (e.g., participants 316-320) is allocated a separate time-line for graphically depicting all labeled intervals that are associated with that person (e.g., dialing, connected, muted, talking, etc.).
Fig. 4 illustrates in detail how overlapping intervals are displayed. As shown in Fig. 4, by plotting one interval type at a time, starting with the taller bars, the document displays overlapping intervals on the same line. Referring again to Fig. 3, intervals that are not associated with an individual person (e.g., hyperlinks 330, speech segments, etc.) are plotted separately above the participants. Time-line window 315 provides a snapshot of every participant's activity, and can be used to navigate through the recording.
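The "taller bars first" rule amounts to a painter's-algorithm ordering: shorter bars drawn later remain visible on top of taller ones. A minimal sketch, assuming hypothetical pixel heights per interval type (the patent gives no values); it only computes the draw order rather than rendering anything.

```python
from typing import Dict, List, Tuple

# Hypothetical bar heights in pixels; the patent does not give actual values.
BAR_HEIGHT: Dict[str, int] = {"connected": 12, "talking": 8, "muted": 4}


def draw_order(intervals: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int, int]]:
    """Order (kind, start_ms, end_ms) intervals for painting on one time-line.

    Types are painted one at a time, tallest first, so a shorter bar painted
    later stays visible on top of a taller bar that overlaps it in time.
    Intervals of a kind not listed in BAR_HEIGHT are skipped.
    """
    commands = []
    for kind in sorted(BAR_HEIGHT, key=BAR_HEIGHT.get, reverse=True):
        for k, start, end in intervals:
            if k == kind:
                commands.append((kind, start, end, BAR_HEIGHT[kind]))
    return commands
```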
In one embodiment, once users have established a phone connection to the recorded conference player, they can use a toolbar 350 below the time-line to begin playing the audio and adjust the skimming parameters. In another embodiment, a separate phone connection is not necessary because the audio conference recording can be "streamed" in conjunction with conference playback document 300.
Toolbar 350 provides five buttons to control the player: "goto beginning 351", "jump back 352", "stop 353", "play 354", and "jump forward 355". It also contains a slider 356 for adjusting the playing speed (0.7x, 1.0x, 1.3x, 1.7x, and 2.0x), a zoom menu 357 for selecting the zoom factor (none, 20 min., 10 min., and 5 min.), and an on/off pause button 358 for pause removal.
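Pause removal and the speed slider combine simply: skip the pause intervals, then divide the remaining audio by the playing speed. A minimal sketch, assuming pause intervals are (start_ms, end_ms) spans lying within the recording; the function names are hypothetical and the arithmetic ignores the skim server's actual time-scaling details.

```python
from typing import List, Tuple

Span = Tuple[int, int]

# Playing speeds offered by slider 356.
SPEEDS = (0.7, 1.0, 1.3, 1.7, 2.0)


def play_segments(total_ms: int, pauses: List[Span], remove_pauses: bool) -> List[Span]:
    """Spans of the recording to play; with pause removal on, skip pause intervals."""
    if not remove_pauses:
        return [(0, total_ms)]
    segments: List[Span] = []
    cursor = 0
    for start, end in sorted(pauses):
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_ms:
        segments.append((cursor, total_ms))
    return segments


def listening_time_ms(segments: List[Span], speed: float) -> float:
    """Wall-clock time needed to play the segments at the chosen speed."""
    if speed not in SPEEDS:
        raise ValueError(f"unsupported speed {speed}")
    return sum(end - start for start, end in segments) / speed
```

For a one-minute recording with 15 seconds of pauses, removing the pauses and playing at 2.0x takes 22.5 seconds.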
As the recorded conference audio plays, a vertical red needle 360 moves across the time-line. When needle 360 moves, every participant's name tag is colored to reflect that person's state at that time in the meeting.
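Coloring each name tag amounts to asking, for every participant, which interval covers the needle's current time. A minimal sketch, assuming each participant's time-line is a list of hypothetical (kind, start_ms, end_ms) tuples and that a fixed precedence (also an assumption) resolves overlaps such as "talking" inside "connected".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical precedence used when several of a participant's intervals
# cover the same instant (e.g. "talking" inside "connected").
PRIORITY = ["talking", "muted", "connected", "dialing"]


def state_at(timeline: Dict[str, List[Tuple[str, int, int]]],
             t_ms: int) -> Dict[str, Optional[str]]:
    """Map each participant to the state shown on the name tag at needle time t_ms."""
    states: Dict[str, Optional[str]] = {}
    for person, intervals in timeline.items():
        active = {kind for kind, start, end in intervals if start <= t_ms < end}
        states[person] = next((k for k in PRIORITY if k in active), None)
    return states
```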
Fig. 3 shows a one-hour conference with the entire duration visible (zoom = none). In this view, the visual structures help make some details of the call immediately obvious. For example, the number and span of the light-colored bars can identify the most/least dominant talkers. The initial long uninterrupted talking bands show who gave the formal presentations.
Finally, the point where the question and answer session began is visible roughly halfway into the call, where many short talking intervals are scattered among many participants. More detailed information must be found either by listening to the audio or by searching through linked annotations, images, and other documents.
The zooming feature allows the user to narrow the duration displayed in the time-line window. A numbered scroll bar allows the user to register the zoomed-in portion with the full duration, and to scroll using mouse clicks or the arrow keys on the keyboard. Scrolling is independent of player location needle 360, so the user can glance at other regions without disrupting listening. Player needle 360 can be moved by clicking on the time-line, or by pressing a jump forward/backward button. When this happens, the skim server plays a short non-speech audio cue and begins to play at the new location.
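The zoom factor and scroll position together define which slice of the recording the time-line window shows; a click is mapped back to a recording time, and the needle is mapped forward to a pixel. A minimal sketch, assuming a hypothetical `TimelineView` with millisecond times and a fixed pixel width; the scroll offset is kept separate from the needle position, as the text describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimelineView:
    total_ms: int                  # duration of the whole recording
    width_px: int                  # width of the time-line window in pixels
    zoom_ms: Optional[int] = None  # None = "none"; else e.g. 20, 10, or 5 minutes
    scroll_ms: int = 0             # recording time at the left edge of the window

    @property
    def window_ms(self) -> int:
        """Duration currently visible in the time-line window."""
        if self.zoom_ms is None:
            return self.total_ms
        return min(self.zoom_ms, self.total_ms)

    def scroll_to(self, left_ms: int) -> None:
        """Scroll the visible window; this does not move the player needle."""
        self.scroll_ms = max(0, min(left_ms, self.total_ms - self.window_ms))

    def x_to_ms(self, x_px: float) -> float:
        """Recording time under a click at horizontal pixel x_px."""
        return self.scroll_ms + x_px * self.window_ms / self.width_px

    def ms_to_x(self, t_ms: float) -> float:
        """Horizontal pixel at which time t_ms (e.g. the needle) is drawn."""
        return (t_ms - self.scroll_ms) * self.width_px / self.window_ms
```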
Clicking the time-line near the top is used to select hyperlinks (e.g., link 330) rather than to move the needle. When a link is selected, or a "links" button 340 is pressed, a dialog displays all the links in the recording. This dialog can be used to visit a link, edit a link, or create links both into and out of the time-line. One embodiment of the present invention supports the following types of links: annotations, audio, documents, images, and general URLs. All links are implemented using URLs except annotations, which store textual content as interval data.
Each type of link is displayed on the time-line with a representative icon.
Hyperlinks into and out of the time-line are stored as intervals, and contain both a beginning and ending time offset. Thus a link can refer to a particular point or region of the time-line, allowing a rich set of skimming alternatives. For example, following a link can cause play to begin at a certain point, end at a certain point, or sequence through selected regions. This means that following a link can have multiple effects, including moving the player needle and changing the document page.
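Because a link carries start and end offsets, following it can be reduced to a short playback plan. A minimal sketch, assuming a hypothetical `Link` record whose regions are (start_ms, end_ms) pairs, with start == end marking a single point; the rule that a point link plays to the end of the recording is an illustrative assumption, and page changes are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]


@dataclass
class Link:
    target: str          # hypothetical: a URL, or annotation text stored as interval data
    regions: List[Span]  # (start_ms, end_ms) offsets; start == end marks a single point


def follow(link: Link, total_ms: int) -> List[Span]:
    """Segments the player would play, in order, when the link is followed.

    A point link starts playback there and runs to the end of the recording;
    region links play only their regions, in sequence.
    """
    plan: List[Span] = []
    for start, end in link.regions:
        plan.append((start, total_ms) if start == end else (start, end))
    return plan
```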
As disclosed, one embodiment of the present invention is a teleconference recorder and player. When a conference is recorded, an interval database stores labeled interval data associated with the conference. The labeled interval data allows searching and retrieving of the recorded conference, and facilitates playback of the recorded conference.
Several embodiments of the present invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
For example, although the embodiments disclosed are implemented over the Internet, the present invention can be implemented using a private network, or using any other known or future data communication methods.

Claims

WHAT IS CLAIMED IS:
1. A system for recording and playing multimedia data that includes a plurality of intervals, said system comprising: a skim server that detects a first set of the plurality of intervals; an interval database server coupled to said skim server, said interval database server generating labeled interval data for the first set of the plurality of intervals detected by said skim server; and a database coupled to said interval database server and storing said labeled interval data; wherein said labeled interval data comprises an interval data element for each of the detected plurality of intervals.
2. The system of claim 1, further comprising: a conference bridge coupled to said interval database server that detects a second set of the plurality of intervals; wherein said interval database server further generates labeled interval data for the second set of the plurality of intervals detected by said skim server.
3. The system of claim 2, wherein said first set of the plurality of intervals comprise pauses in speech.
4. The system of claim 2, wherein said second set of the plurality of intervals comprise call control events.
5. The system of claim 1, wherein the multimedia data comprises a conference telephone call.
6. The system of claim 1, wherein said interval data element comprises: a type of the detected interval; a start time of the detected interval; and a duration of the detected interval.
7. The system of claim 6, wherein said interval data element further comprises: a recording identification of the detected interval; and a type-specific data value of the detected interval.
8. The system of claim 1, wherein said interval database server comprises: means for searching said stored labeled interval data.
9. The system of claim 8, wherein said interval database server further comprises: means for retrieving said stored labeled interval data and associated multimedia data.
10. The system of claim 5, further comprising: a user interface generated during playback of the conference call, wherein said user interface displays the stored labeled interval data.
11. A method for recording and playing multimedia data that includes a plurality of intervals, said method comprising: detecting the plurality of intervals; generating labeled interval data for the plurality of intervals; and storing the labeled interval data in a database; wherein said labeled interval data comprises an interval data element associated with each of the plurality of intervals.
12. The method of claim 11, wherein said interval data element comprises: a type of the associated interval; a start time of the associated interval; and a duration of the associated interval.
13. The method of claim 12, wherein said interval data element further comprises: a recording identification of the associated interval; and a type-specific data value of the associated interval.
14. The method of claim 11, further comprising: storing the multimedia data in the database.
15. The method of claim 14, further comprising: querying said database based on one or more labeled interval data parameters; and retrieving at least one interval data element and associated multimedia data from the database.
16. The method of claim 11, wherein the multimedia data comprises a conference telephone call.
17. The method of claim 16, further comprising: generating a user interface that displays the labeled interval data; and playing the conference call based on selections of the user interface.
18. A method of recording and playing a teleconference telephone call, said method comprising: detecting a plurality of intervals during the telephone call; generating labeled interval data for each of said plurality of intervals; and storing said labeled interval data in a database.
19. The method of claim 18, wherein said labeled interval data comprises a plurality of interval data elements, said method further comprising: querying said database and retrieving one or more of the stored interval data elements; and playing a portion of the teleconference telephone call that is associated with each of said retrieved interval data elements.
20. The method of claim 18, wherein said detected intervals comprise: an identity of a speaker; pauses in speech; and telephone call control.
PCT/US1998/020446 1997-10-01 1998-09-30 Method and apparatus for storing and retrieving labeled interval data for multimedia recordings WO1999017235A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP52049099A JP2001511991A (en) 1997-10-01 1998-09-30 Method and apparatus for storing and retrieving label interval data for multimedia records
CA002271745A CA2271745A1 (en) 1997-10-01 1998-09-30 Method and apparatus for storing and retrieving labeled interval data for multimedia recordings

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US6061997P 1997-10-01 1997-10-01
US60/060,619 1997-10-01

Publications (1)

Publication Number Publication Date
WO1999017235A1 true WO1999017235A1 (en) 1999-04-08

Family

ID=22030673

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/020446 WO1999017235A1 (en) 1997-10-01 1998-09-30 Method and apparatus for storing and retrieving labeled interval data for multimedia recordings

Country Status (3)

Country Link
JP (1) JP2001511991A (en)
CA (1) CA2271745A1 (en)
WO (1) WO1999017235A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003295834A1 (en) * 2002-11-25 2004-06-18 Telesector Resources Group, Inc. Methods and systems for conference call buffering
SG10201602840WA (en) 2011-10-10 2016-05-30 Talko Inc Communication system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02134785A (en) * 1988-11-15 1990-05-23 Sony Corp Voice signal recording device
JPH052540A (en) * 1991-06-24 1993-01-08 Fujitsu Ltd Electronic conference system having minutes forming function
EP0660249A1 (en) * 1993-12-27 1995-06-28 AT&T Corp. Table of contents indexing system
US5559875A (en) * 1995-07-31 1996-09-24 Latitude Communications Method and apparatus for recording and retrieval of audio conferences
US5619555A (en) * 1995-07-28 1997-04-08 Latitude Communications Graphical computer interface for an audio conferencing system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BERKLEY D A ET AL: "MULTIMEDIA RESEARCH PLATFORMS", AT & T TECHNICAL JOURNAL, vol. 74, no. 5, 1 September 1995 (1995-09-01), pages 34 - 44, XP000531007 *
PATENT ABSTRACTS OF JAPAN vol. 014, no. 366 (P - 1089) 8 August 1990 (1990-08-08) *
PATENT ABSTRACTS OF JAPAN vol. 017, no. 262 (P - 1541) 24 May 1993 (1993-05-24) *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1195043A1 (en) * 1999-07-16 2002-04-10 Matra Nortel Communications Sound retrieval system with spatial effect, and telephone terminal incorporating same
US6889039B2 (en) 2000-02-11 2005-05-03 Nokia Mobile Phones Limited Memory management terminal and method for handling acoustic samples
GB2359155A (en) * 2000-02-11 2001-08-15 Nokia Mobile Phones Ltd Memory management of acoustic samples eg voice memos
WO2002065745A1 (en) * 2001-02-15 2002-08-22 Sivashunmugam Columbus Context association for multimedia using mark-up intelligence
WO2002082793A1 (en) * 2001-04-05 2002-10-17 Timeslice Communications Limited Improvements relating to voice recordal methods and systems
US7290207B2 (en) 2002-07-03 2007-10-30 Bbn Technologies Corp. Systems and methods for providing multimedia information management
US7292977B2 (en) 2002-10-17 2007-11-06 Bbnt Solutions Llc Systems and methods for providing online fast speaker adaptation in speech recognition
US7003286B2 (en) 2002-10-23 2006-02-21 International Business Machines Corporation System and method for conference call line drop recovery
WO2004095839A1 (en) * 2003-04-17 2004-11-04 Siemens Communications, Inc. System and method for real time playback of conferencing streams
WO2005006728A1 (en) * 2003-07-02 2005-01-20 Bbnt Solutions Llc Speech recognition system for managing telemeetings
WO2005067296A1 (en) * 2003-12-29 2005-07-21 Koninklijke Philips Electronics N.V. Method and system for generating specific segments of a program
US7308476B2 (en) 2004-05-11 2007-12-11 International Business Machines Corporation Method and system for participant automatic re-invite and updating during conferencing
EP1811759A1 (en) * 2006-01-23 2007-07-25 Hewlett-Packard Development Company, L.P. Conference call recording system with user defined tagging
US20080072159A1 (en) * 2006-09-14 2008-03-20 Tandberg Telecom As Method and device for dynamic streaming archiving configuration
US8260854B2 (en) 2006-09-14 2012-09-04 Cisco Technology, Inc. Method and device for dynamic streaming archiving configuration
US8838179B2 (en) 2009-09-25 2014-09-16 Blackberry Limited Method and apparatus for managing multimedia communication recordings
EP2302867A1 (en) * 2009-09-25 2011-03-30 Research In Motion Limited Method and apparatus for managing multimedia communication recordings
US9479735B2 (en) 2011-08-19 2016-10-25 Telefonaktiebolaget Lm Ericsson (Publ) Technique for video conferencing
EP2745509A1 (en) * 2011-08-19 2014-06-25 Telefonaktiebolaget LM Ericsson (PUBL) Technique for video conferencing
US9591263B2 (en) 2011-08-19 2017-03-07 Telefonaktiebolaget Lm Ericsson (Publ) Technique for video conferencing
EP2745509B1 (en) * 2011-08-19 2021-06-30 Telefonaktiebolaget LM Ericsson (publ) Technique for video conferencing
US10511718B2 (en) 2015-06-16 2019-12-17 Dolby Laboratories Licensing Corporation Post-teleconference playback using non-destructive audio transport
US11115541B2 (en) 2015-06-16 2021-09-07 Dolby Laboratories Licensing Corporation Post-teleconference playback using non-destructive audio transport
US10471348B2 (en) 2015-07-24 2019-11-12 Activision Publishing, Inc. System and method for creating and sharing customized video game weapon configurations in multiplayer video games via one or more social networks
US10835818B2 (en) 2015-07-24 2020-11-17 Activision Publishing, Inc. Systems and methods for customizing weapons and sharing customized weapons via social networks
CN113259740A (en) * 2021-05-19 2021-08-13 北京字跳网络技术有限公司 Multimedia processing method, device, equipment and medium

Also Published As

Publication number Publication date
CA2271745A1 (en) 1999-04-08
JP2001511991A (en) 2001-08-14

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP MX

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

WWE Wipo information: entry into national phase

Ref document number: 1998951990

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2271745

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: PA/a/1999/005025

Country of ref document: MX

ENP Entry into the national phase

Ref country code: JP

Ref document number: 1999 520490

Kind code of ref document: A

Format of ref document f/p: F

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWW Wipo information: withdrawn in national office

Ref document number: 1998951990

Country of ref document: EP