US20040003028A1 - Automatic display of web content to smaller display devices: improved summarization and navigation - Google Patents


Info

Publication number
US20040003028A1
Authority
US
United States
Prior art keywords
subdocuments
document
text
link
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/142,393
Inventor
David Emmett
Ahmad Rahman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIMBERLINE VENTURE PARTNERS LP
Original Assignee
TIMBERLINE VENTURE PARTNERS LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIMBERLINE VENTURE PARTNERS LP
Priority to US10/142,393
Assigned to UNISITE SOFTWARE, INC. Asset purchase agreement. Assignors: TIMBERLINE VENTURE PARTNERS, L.P.
Assigned to STIRLING BRIDGE, INC. Asset purchase agreement. Assignors: UNISITE SOFTWARE, INC.
Assigned to TIMBERLINE VENTURE PARTNERS, L.P. Assignment of assignors interest (see document for details). Assignors: WIREDPOCKET
Publication of US20040003028A1
Priority to US11/510,467 (patent US8983949B2)
Current legal status: Abandoned

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/957: Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577: Optimising the visualization of content, e.g. distillation of HTML documents

Definitions

  • the present invention relates to a system and method for modifying a document format.
  • Handheld devices, including Personal Digital Assistants (PDAs) and cellular telephones, offer connectivity to the Internet and permit access to documents available over the Internet.
  • WAP (Wireless Application Protocol) is a standard for providing cellular phones, PDAs, pagers and other handheld devices with secure access to web pages.
  • WAP features the Wireless Markup Language (WML), which generally serves as a medium for translating web-based HTML content into a format that accommodates small form factor displays and key sets found on conventional handheld devices.
  • WML also allows handheld device manufacturers to include microbrowsers in their products that accept WML input from a WAP-based system across vast regions of the world.
  • the first such method can be termed “fixed mapping.”
  • Fixed mapping typically involves rewriting an existing document, such as an HTML-based web page, to conform to a specific standard, such as WAP, J-PHONE, or i-Mode, or to a small display device.
  • a web server must then maintain the rewritten web site as a separate site with its own URL in addition to the original document.
  • a web site operator must manually trim, edit, and condense the new content by rewriting the new content into a format that will accommodate the interface parameters of handheld devices. This method is limited in that considerable time and expense are typically required to maintain the two web sites in parallel. Further, the manual editing of the rewritten web site can be time-consuming, burdensome, and expensive.
  • Transcoding typically involves the use of software that takes the entire content of a web site as input and converts the entire content into the format of a specific handheld wireless standard. The entire content, as formatted according to that standard, is then transmitted to the handheld device. This conversion may be performed “on-the-fly” (i.e., automatically in real time) or may be performed manually.
  • Transcoding has the advantage of reducing the investment to reach wireless markets since it leverages existing web sites. From a user standpoint, transcoding is desirable in that it preserves all the text-based information from the originating site. For large volumes of text, however, using this approach may overwhelm the handheld device user with large volumes of text to be viewed on a small form factor display. Further, the unorganized transcoded content makes changes or modifications to the wirelessly enabled web site more difficult for the web site operator.
  • a method of ranking entries in a table of contents for display at a client device includes transmitting a first document from an application server over a network, such as the Internet, to the client device.
  • the first document includes text and at least one link.
  • the application server then receives a request for a second document associated with the link from the client device.
  • the application server divides the second document into subdocuments and assigns a label to each of a plurality of the subdocuments.
  • the application server also performs a comparison of the text of the first document with the text of each of the plurality of subdocuments to generate a document-document value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the document-document values.
  • the application server performs a comparison of the text of the link with the text of each of the subdocuments to generate a link-text value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the link-text values.
  • the application server performs a comparison of the text of the link with the label assigned to each of the plurality of subdocuments to generate a link-label value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the link-label values.
  • the application server generates, for each of the plurality of subdocuments, a size value indicative of the amount of text in that subdocument. After generating the size values, the application server ranks the plurality of subdocuments based, at least in part, on the size values.
  • subdocuments likely to be relevant to the first document, the selected link, or both are listed at or near the top of a table of contents to facilitate user selection of the same.
  • users may easily follow a text that spans multiple documents because the table of contents of a requested page lists the subdocuments containing continuing portions of the text at or near the top.
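  • As an illustration only, the four values described above (document-document, link-text, link-label, and size) can be combined into one score per subdocument. The sketch below is not the claimed method: the word-overlap (Jaccard) measure, the equal weighting, and the names `association` and `rank_subdocuments` are all assumptions, since the text does not say how the degree of association is computed.

```python
import re


def words(text):
    """Return the set of lowercased alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def association(a, b):
    """Jaccard overlap of the word sets of a and b, in [0, 1] (assumed measure)."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def rank_subdocuments(first_doc_text, link_text, subdocs):
    """Rank (label, text) subdocuments; return labels, most relevant first."""
    longest = max((len(text) for _, text in subdocs), default=0) or 1
    scored = []
    for label, text in subdocs:
        doc_doc = association(first_doc_text, text)   # document-document value
        link_txt = association(link_text, text)       # link-text value
        link_lbl = association(link_text, label)      # link-label value
        size = len(text) / longest                    # size value, scaled to [0, 1]
        # Equal weighting is an arbitrary choice made for this sketch.
        scored.append((doc_doc + link_txt + link_lbl + size, label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored]
```

  • With this scoring, a subdocument whose text and label share words with the selected link and with the previous page rises to the top of the table of contents, which is the behavior the summary describes.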
  • FIG. 1 is a block diagram of a document delivery system in accordance with one embodiment of the present invention.
  • FIG. 2 is a block diagram of the formatter of FIG. 1 in accordance with one embodiment of the present invention.
  • FIG. 3 is a block diagram of the mapper of FIG. 2 in accordance with one embodiment of the present invention.
  • FIG. 4 illustrates a tree data structure in accordance with one embodiment of the present invention.
  • FIG. 5 is a block diagram of the control module of FIG. 2 in accordance with one embodiment of the present invention.
  • FIG. 6 is a flowchart illustrating a method in accordance with one embodiment of the present invention.
  • FIG. 7 is a flowchart illustrating a method in accordance with another embodiment of the present invention.
  • FIG. 8 illustrates a progression of material displayed at the display of the client device of FIG. 1 in accordance with an embodiment of the present invention.
  • FIG. 1 illustrates a document delivery system 100 in accordance with one embodiment of the present invention.
  • the document delivery system 100 permits a client 102 to access content of documents (not shown) stored at server 104 , server 106 , or other servers 108 over a network 110 , such as the Internet, and over a network 111 , such as an intranet.
  • the client 102 comprises a handheld device, such as a PDA (Personal Digital Assistant), a mobile telephone, or the like, having a small form factor display 112 .
  • the client 102 also includes a web browser 114 .
  • the web browser 114 may comprise a microbrowser designed for small display screens on web-enabled cellular telephones, PDAs and other handheld devices, including wireless handheld devices.
  • the client 102 may exchange data with the network 110 in a wireless fashion via a wireless station 120 and a gateway 122 in accordance with WAP (Wireless Application Protocol), i-Mode, or other suitable protocol or service.
  • the client 102 may exchange data with the network 110 via a wired connection (not shown).
  • the client 102 may also exchange data with the network 111 in a wireless fashion via a wireless station 121 and a gateway 123 in accordance with WAP (Wireless Application Protocol), i-Mode, or other suitable protocol or service for delivery of web content to small display devices.
  • the client 102 may exchange data with the network 111 via a wired connection (not shown).
  • the gateways 122 , 123 are network devices that connect a wireless network with a wired network, such as the networks 110 , 111 . Access between the client 102 and application server 124 may also pass through one or more other firewalls (not shown), other gateway devices (not shown), or the like.
  • the client 102 transmits requests for documents stored on one or more of the servers 104 , 106 , 108 to the application server 124 .
  • the request for content may comprise an HTTP request or other suitable type of request.
  • the application server 124 may alternatively receive the request for a document from the client 102 from any network (e.g., 110 , 111 ).
  • the application server 124 functions as a proxy server and receives requests for documents from client devices, such as the client 102 , over the networks 110 , 111 and provides associated content in response to such requests by transmitting the associated content over at least one of the networks 110 , 111 .
  • In response to a request for a document from the client 102 , the application server 124 requests the document identified by the request from one or more of the servers 104 , 106 , 108 . Upon receipt of the document identified by the request, the application server 124 modifies the format of that document using a formatter 126 .
  • the document identified by the request is an HTML or XML web page, although other document types, such as PDF (Portable Document Format), may also be requested.
  • the application server 124 then transmits at least a portion of the reformatted content of the document identified by the request to the client 102 in a format compatible with the browser 114 for display at the display 112 of the client 102 .
  • the formatter 126 includes a database (see FIG. 5) that may be configured from a client admin computer 140 via a database modifier 128 .
  • the database modifier 128 may comprise a JavaScript module that permits a user at the client admin computer to visually modify a data structure of a document into a desired format. The modification may be performed by, for example, adding labels, re-ordering, moving, deleting, or otherwise changing portions of the data structure. The database modifier 128 then stores the changed, or modified, version of the data structure in the database.
  • the client admin computer 140 includes a web browser 142 , such as Internet Explorer™ by Microsoft Corporation or other suitable web browser for permitting a user at the client admin computer 140 to view pages at the database modifier 128 hosted at the application server 124 .
  • the pages at the database modifier 128 of the application server 124 permit user configuration of the FIG. 5 database, as discussed in more detail below.
  • the formatter 126 receives the document identified by the request from one of the servers 104 , 106 , 108 , divides the document into multiple blocks, and assigns labels to individual blocks. The formatter 126 then generates a list containing the content of the various blocks. If a data structure associated with the document is stored in the database, the formatter 126 then uses the data structure to generate output files from the generated list of content.
  • the output file may contain a Table of Contents (TOC) page and subdocuments.
  • the TOC page lists labels associated with the subdocuments and may contain links to the subdocuments.
  • the formatter 126 then transmits the TOC page, a headline, an image, or other content specified by a database at the application server 124 to the client 102 over at least one of the networks 110 , 111 . Details of the operation of the formatter 126 are discussed in more detail below.
  • FIG. 2 illustrates details of the formatter 126 of FIG. 1 according to one embodiment of the invention.
  • the formatter 126 includes a mapper 202 , and a control module 206 , which may comprise software written in C++ or other suitable programming language.
  • the mapper 202 receives the requested document and reformats the document as a list of document content 204 .
  • the control module 206 then generates an output file using the list of document content 204 . Additional details regarding the mapper 202 , the list of document content 204 , and the control module 206 are discussed below.
  • FIG. 3 illustrates details of the mapper 202 of FIG. 2 according to one embodiment of the invention.
  • the mapper 202 includes a number of software modules stored in a computer readable medium.
  • the mapper 202 includes a network interface 302 , a parser 304 , a label engine 306 , a data structure converter 308 , and a ranking engine 310 .
  • the network interface 302 receives the document requested from the network.
  • the document requested may comprise a web page, such as an HTML document, an XML document, or the like.
  • FIG. 4 illustrates an example tree data structure 400 , which may comprise a structural representation of a document, such as an HTML web page.
  • the tree data structure 400 includes a root node 402 associated with the document.
  • the parser 304 (FIG. 3) divides the document into multiple blocks and represents each block of the document as a table node 404 in the tree data structure 400 .
  • Each table node 404 has at least one row node 406 as a child node.
  • Individual row nodes 406 each have at least one column node 408 as a child node.
  • the column nodes 408 may then have additional table nodes as children.
  • the tree data structure 400 may be recursive.
  • the document is divided into blocks, which may be defined by the structure of the document.
  • the primary content for each of the blocks, or tables, is stored in the column nodes 408 and the remaining structure of the various blocks is represented in the other portions of the tree data structure 400 .
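  • The tree data structure 400 can be pictured with a few small record types. This is a minimal sketch under stated assumptions: the class and field names below are hypothetical and only mirror the root, table, row, and column nodes described above, with column nodes holding the content and optionally nesting further tables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnNode:
    """Holds the primary content of a block; may nest further tables."""
    content: str = ""
    tables: List["TableNode"] = field(default_factory=list)


@dataclass
class RowNode:
    """A row has at least one column node as a child."""
    columns: List[ColumnNode] = field(default_factory=list)


@dataclass
class TableNode:
    """One block of the document; label and classification are filled in later."""
    rows: List[RowNode] = field(default_factory=list)
    label: str = ""
    classification: str = ""


@dataclass
class RootNode:
    """Root node associated with the whole document."""
    tables: List[TableNode] = field(default_factory=list)


def count_column_nodes(tables):
    """Count column nodes recursively, following nested tables."""
    total = 0
    for table in tables:
        for row in table.rows:
            for column in row.columns:
                total += 1 + count_column_nodes(column.tables)
    return total
```

  • Because a column node may contain tables of its own, any walk over the structure (such as `count_column_nodes`) is naturally recursive.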
  • the label engine 306 then assigns labels to individual blocks and may assign a classification to each block according to the contents of the block.
  • the label engine 306 assigns a classification to each block based on the block contents. For example, if the document is a web page, the web page may include links, text, forms, and pictures, as well as other classes of content.
  • the label engine 306 optionally analyzes individual blocks and assigns a classification to the block indicating the type, or class, of content in the block. Hence, a block that contains primarily links may be assigned a “navigation” classification, a block that contains primarily text may be assigned a “story” classification, a block that contains primarily pictures may be assigned an “image” classification, and a block that contains form information like an address may be assigned a “form” classification.
  • the label engine 306 inserts a classifier associated with the assigned classification for each block into the table node of each block.
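  • One way to read “contains primarily” is as a majority vote over the kinds of items in a block. The sketch below assumes that reading; the item kinds, the `CLASS_FOR_KIND` mapping, and the function name `classify_block` are hypothetical, and the source does not specify how the label engine 306 decides which class dominates.

```python
from collections import Counter

# Assumed mapping from the kind of an item to the classification it votes for.
CLASS_FOR_KIND = {
    "link": "navigation",
    "text": "story",
    "image": "image",
    "form": "form",
}


def classify_block(item_kinds):
    """Return the classification of the most common item kind, or None if empty.

    item_kinds is a list such as ["link", "link", "text"].
    Unknown kinds are ignored.
    """
    counts = Counter(kind for kind in item_kinds if kind in CLASS_FOR_KIND)
    if not counts:
        return None
    dominant_kind, _ = counts.most_common(1)[0]
    return CLASS_FOR_KIND[dominant_kind]
```

  • The returned string plays the role of the classifier that is inserted into the table node of the block.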
  • After classifying the blocks, the label engine 306 optionally merges, or combines, column nodes of each block that have the same classification. For example, if a given block has multiple column nodes having the classification of “story,” the label engine 306 may merge, or combine, the content of these column nodes. Likewise, if a given block has multiple column nodes having the classification of “navigation,” the label engine 306 may merge, or combine, the content of these column nodes.
  • the label engine 306 may merge, or combine, column nodes in accordance with predetermined merging rules stored at the label engine 306 .
  • An example merging rule is that a large “story” node is not merged with another large “story” node.
  • Another example merging rule is that a small “story” node may get merged with a “navigation” node.
  • a large story which is likely to be substantial enough to be viewed in isolation, will not be combined with another large story.
  • a small story would not be isolated. Rather, the user experience may be improved by merging other nodes, such as a small “navigation” node or a small “story” node.
  • the specifics of these merging rules may vary and may be customized according to particular applications.
  • the classifying and merging are optional according to some embodiments of the invention.
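  • The example merging rules can be sketched as a single pass over adjacent nodes. The sketch makes several assumptions that the source leaves open: “large” means at least `LARGE_STORY_WORDS` words (a hypothetical threshold), only adjacent nodes are merged, and a large story is never merged with anything, which is stricter than the rule stated above.

```python
LARGE_STORY_WORDS = 50  # hypothetical threshold, not taken from the source


def is_large_story(node):
    """A story node with at least LARGE_STORY_WORDS words counts as large."""
    return node["cls"] == "story" and len(node["text"].split()) >= LARGE_STORY_WORDS


def can_merge(a, b):
    """Small story and navigation nodes may merge; large stories stay alone."""
    mergeable = {"story", "navigation"}
    if a["cls"] not in mergeable or b["cls"] not in mergeable:
        return False
    return not (is_large_story(a) or is_large_story(b))


def merge_nodes(nodes):
    """Merge each node into its predecessor when the rules allow it."""
    merged = []
    for node in nodes:
        if merged and can_merge(merged[-1], node):
            prev = merged[-1]
            # The merged node is treated as a story if either part was a story.
            cls = "story" if "story" in (prev["cls"], node["cls"]) else "navigation"
            merged[-1] = {"cls": cls, "text": prev["text"] + " " + node["text"]}
        else:
            merged.append(dict(node))
    return merged
```

  • Under these assumptions a short teaser followed by a block of links becomes one subdocument, while two long stories remain separate entries.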
  • the label engine 306 also assigns a label to each block according to the block contents. In one embodiment, the label engine 306 uses the first several words of text of a block including text as the label for that block. In another embodiment, the label engine 306 assigns a label to a block based on the classification of the block. The label engine 306 then adds the assigned label to the table node of the associated block.
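  • Both labeling embodiments fit in a few lines. In this sketch the number of words kept (`max_words`), the fallback text, and the function name `make_label` are assumptions; the source says only “the first several words” or a label based on the classification.

```python
def make_label(text, classification="", max_words=4):
    """Label a block with its first few words, else with its classification."""
    block_words = text.split()
    if block_words:
        return " ".join(block_words[:max_words])
    # No text in the block: fall back to the classification, if any.
    return classification.capitalize() or "Untitled"
```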
  • a data structure converter 308 of the mapper 202 next “flattens” the tree data structure by converting the tree data structure into a linear, one-dimensional list containing the content of the column nodes 408 .
  • the table nodes 404 and the row nodes 406 are not included in the one-dimensional list.
  • Individual entries in the one-dimensional list include the content of an associated column node 408 .
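  • Flattening amounts to a depth-first walk that keeps only column content. The sketch below assumes a plain dictionary encoding of the tree (a table is `{"rows": [...]}`, a row is a list of columns, and a column is `{"content": ..., "tables": [...]}`); that encoding and the name `flatten` are illustrative choices, not the format used by the data structure converter 308.

```python
def flatten(tables):
    """Return column contents in document order, dropping table and row nodes."""
    entries = []
    for table in tables:
        for row in table["rows"]:
            for column in row:
                if column.get("content"):
                    entries.append(column["content"])
                # Column nodes may hold nested tables, so recurse into them.
                entries.extend(flatten(column.get("tables", [])))
    return entries
```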
  • a ranking engine 310 then ranks the entries in the one-dimensional list according to the content of the individual entries.
  • the ranking engine 310 analyzes characteristics of each entry and assigns a “weight” value to each entry.
  • the weight assigned to each entry may be based on a variety of parameters. These parameters may include, for example, the size of the font used in the entry, whether the text in the entry is boldface, the color of the text, whether the text is flashing, whether the text is underlined, and the position of the item in the document.
  • the ranking engine 310 may also generate a size value indicative of an amount of text in each of the plurality of subdocuments.
  • the size value may be larger for subdocuments comprising large amounts of text and smaller for subdocuments comprising smaller amounts of text. Ranking the entries in the table of contents, at least in part, according to the size value tends to make entries associated with larger amounts of text appear higher on the list of entries in the table of contents (i.e., as more important or more relevant).
  • the weight assigned to each entry may also depend on the content of the link leading to the document, the text of the previous document, the text of the subdocument associated with the entry, the text of the label associated with the entry, or a combination of these. Additional details regarding this embodiment are described below with reference to FIGS. 7 and 8. Based on parameters such as these, the ranking engine 310 assigns a weight to individual entries in the one-dimensional list and then re-orders the one-dimensional list according to the weighted rankings.
  • the ranking engine 310 reorders the list in order of decreasing weight value, such that the first entry in the re-ordered list is the entry having the largest weight value and the last entry in the list is the entry having the smallest weight value.
  • the re-ordered list is then stored as the list of document content 204 (FIG. 2).
  • entries having large or bold text may be ranked before entries having smaller or plain text.
  • entries having a graphic may be ranked higher than entries having primarily links.
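  • A weight of this kind can be sketched as a sum of per-parameter contributions followed by a sort in decreasing order. Every coefficient below is invented for illustration, as are the entry field names and the function names; the source lists the parameters (font size, boldface, underlining, position, amount of text) but not how they are combined.

```python
def weight(entry):
    """Compute an illustrative weight for one list entry (a dict)."""
    value = float(entry.get("font_size", 12))             # larger fonts weigh more
    if entry.get("bold"):
        value += 5                                         # boldface bonus (assumed)
    if entry.get("underlined"):
        value += 2                                         # underline bonus (assumed)
    value -= entry.get("position", 0)                      # earlier items weigh more
    value += len(entry.get("text", "").split()) / 10       # size value from word count
    return value


def rank_entries(entries):
    """Return the entries ordered by decreasing weight."""
    return sorted(entries, key=weight, reverse=True)
```

  • The first element of the returned list corresponds to the first entry of the list of document content 204, that is, the entry with the largest weight value.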
  • FIG. 5 illustrates details of the control module 206 of FIG. 2 in accordance with one embodiment of the present invention.
  • the control module 206 receives the list of document content 204 and creates a new document structure according to a navigation rules database 502 and the list of document content 204 .
  • the navigation rules database 502 contains a tree data structure for one or more documents.
  • contents of the navigation rules database 502 may be modified by accessing the formatter 126 (FIG. 1) from a client computer, such as the client admin computer 140 (FIG. 1).
  • the database modifier 128 may modify the contents of the navigation rules database 502 described above.
  • the client admin computer 140 includes browser 142 and permits a user to access the database modifier 128 and to modify the contents of the navigation rules database 502 .
  • a user at the client admin computer 140 directs the browser 142 to the database modifier 128 .
  • the database modifier 128 presents the user with a GUI (Graphical User Interface) via the browser 142 that permits the user to view a default tree data structure, as constructed by the mapper 202 , for a given document, such as an HTML or XML web page document.
  • the default tree structure may be the structure of the document at issue as determined by parsing the document.
  • the user may then delete entries in the tree data structure.
  • the user may alternatively move tree data structure entries from one location to another within the tree data structure. Further, the user may change the label or classification assigned to given nodes within the tree data structure.
  • the control module 206 stores the modified tree data structure as an entry in the navigation rules database 502 associated with the document.
  • the control module 206 also includes a URL (Uniform Resource Locator) checker 504 .
  • the URL checker 504 receives the list of document content 204 from the mapper 202 and determines whether the navigation rules database 502 includes a tree data structure associated with the list of document content 204 . In one embodiment, the URL checker determines whether the URL associated with the list of document content 204 matches a URL associated with an entry in the navigation rules database 502 . If such a match exists, an output file generator 506 retrieves the tree data structure in the navigation rules database 502 associated with the list of document content 204 . The output file generator 506 then creates one or more output files 508 based on the retrieved tree data structure using the content of the list of document content 204 .
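  • The URL check can be sketched as a dictionary lookup keyed by a normalized URL. The normalization steps (lowercasing the host, dropping the fragment and a trailing slash) and the names `normalize_url` and `find_rules` are assumptions made for this sketch; the source says only that the URLs must match.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def find_rules(url, navigation_rules):
    """Return the stored entry for url, or None when no entry matches.

    navigation_rules maps normalized URLs to stored tree data structures.
    """
    return navigation_rules.get(normalize_url(url))
```

  • A result of `None` corresponds to the case in which no entry exists for the document, so the list of document content is used as is.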
  • the output files 508 include a table of contents (TOC) page that lists the labels of the document.
  • the output files 508 also contain one or more subdocuments. Individual subdocuments are associated with individual entries in the TOC.
  • One or more of the labels, or entries, of the TOC may include links to associated subdocuments.
  • the output file generator 506 generates output files 508 that include a TOC page that lists the labels of the document.
  • the formatter 126 then transmits the TOC page over at least one of the networks 110 , 111 to the client 102 .
  • Upon receipt of the TOC page, the client 102 displays the TOC page at the display 112 of the client 102 .
  • the user may then select a link associated with one of the entries of the TOC, which requests an associated subdocument from the output files 508 .
  • the formatter transmits the requested subdocument to the client 102 over at least one of the networks 110 , 111 for display at the display 112 of the client 102 .
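  • The relationship between the TOC page and its subdocuments can be sketched by generating both from a list of labeled entries. The sketch emits plain HTML and uses the file names `toc.html` and `subN.html`; both are assumptions, since an actual deployment would emit whatever markup the client's microbrowser accepts (for example WML).

```python
from html import escape


def build_output_files(entries):
    """Build a TOC page plus one subdocument per (label, content) entry.

    Returns a dict mapping file names to markup strings.
    """
    files = {}
    items = []
    for index, (label, content) in enumerate(entries, start=1):
        name = f"sub{index}.html"
        files[name] = f"<html><body><p>{escape(content)}</p></body></html>"
        # Each TOC entry is a label that links to its subdocument.
        items.append(f'<li><a href="{name}">{escape(label)}</a></li>')
    files["toc.html"] = "<html><body><ul>" + "".join(items) + "</ul></body></html>"
    return files
```

  • Only `toc.html` would be sent first; a subdocument is sent when the user selects the corresponding link.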
  • FIG. 6 illustrates a flowchart 600 , which depicts a method according to one embodiment of the present invention.
  • the method commences at block 602 where application server 124 receives a request for a document from the client 102 (FIG. 1), the requested document residing on at least one of the servers 104 , 106 , 108 .
  • the request for the document may be directed to the application server 124 directly.
  • the request for the document may instead be sent to one of the servers 104 , 106 , 108 , which, in turn, redirects the request to the application server 124 .
  • the request for the document may comprise an HTTP request or other suitable request.
  • the requested document may comprise a document in HTML, XML, PDF, or other suitable format.
  • the application server 124 retrieves the requested document from one or more of the servers 104 , 106 , 108 on which the document resides. This retrieval may be accomplished by the application server 124 transmitting an HTTP request to the server 104 , 106 , 108 at which the requested document is stored. For example, if the requested document resides at the server 104 , the application server 124 requests the document from the server 104 over the network 110 and receives the requested document over the network 110 .
  • the formatter 126 of the application server 124 extracts a structure of the retrieved document.
  • a parser 304 (FIG. 3) parses the retrieved document and generates a tree data structure representing the structure of the retrieved document. An example of such a tree data structure is illustrated in FIG. 4 and is described above.
  • the formatter 126 next analyzes the content of the nodes and assigns one of a set of predefined classifiers to each of the nodes based on the content of the nodes, pursuant to block 608 .
  • For example, the label engine 306 of the formatter 126 may assign a “story” classifier to a node that contains primarily text.
  • the classifier may comprise a text string or other identifier added to the node.
  • the label engine 306 of the formatter 126 assigns labels to individual nodes of the tree data structure that include document content.
  • the label engine 306 may assign a label based on the content of the node, the assigned classification of the node, or both.
  • the label engine 306 uses the first several words of nodes having text content as the label for the associated node.
  • the label may indicate the content of the node being labeled.
  • the label engine 306 merges nodes having content according to their classification. For example, if a pair of nodes having content both have the classification “navigation,” then the label engine 306 merges the content of these nodes to form a single node that includes the content of the merged nodes. Block 612 may alternatively, or additionally, be performed after block 616 . In one embodiment, the merging is performed before and after ranking.
  • the data structure converter 308 of the mapper 202 converts the tree data structure to a list.
  • the data structure converter 308 extracts the nodes of the tree data structure that include content and generates a list comprising the nodes of the tree data structure that include content, without the other associated nodes, such as table and row nodes, which do not include content.
  • the ranking engine 310 (FIG. 3) of the mapper 202 reorders the entries of the list generated at block 614 .
  • the ranking engine 310 assigns a weight value to each of the entries in the list according to certain parameters of the content of the entries, the classification of the list entry, or a combination thereof.
  • the ranking engine 310 reorders the list according to the weight value of the list entries. For example, the ranking engine 310 may order the list entries in order of decreasing weight value.
  • the ranking engine 310 then stores the re-ordered list as the list of document content 204 (FIG. 2).
  • the control module 206 determines whether the navigation rules database 502 includes an entry associated with the list of document content 204 , pursuant to block 618 .
  • the URL checker 504 of the control module 206 determines whether a URL associated with the list of document content 204 matches a URL associated with an entry in the navigation rules database 502 .
  • the URL checker 504 determines that the navigation rules database 502 contains an entry associated with the list of document content if such a match exists and execution proceeds to block 620 , else execution proceeds to block 622 .
  • the output file generator 506 creates a new data tree structure using the list of document content 204 and the associated entry of the navigation rules database 502 .
  • the entry of the navigation rules database 502 may specify labels to be assigned to the various nodes, the location of the various nodes within the new data tree structure, and whether certain nodes are included in the new data tree structure.
  • the output file generator 506 then creates a new data tree structure according to the entry in the navigation rules database 502 and inserts the associated content from the list of document content 204 to form a new data tree, which may be stored as the output files 508 .
  • the output file generator 506 stores the new data tree structure as the output files 508 if the navigation rules database 502 contains an entry associated with the list of document content 204 . Otherwise, the output file generator 506 stores the list of document content as the output files 508 or processes the list of document content from memory. Moreover, the output file generator 506 may generate device-specific output.
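  • The branch between blocks 620 and 622 can be sketched as one function that either applies a stored rules entry or passes the list through unchanged. The shape of the rules entry (a list of dictionaries with `source_label`, an optional new `label`, and an optional `include` flag) is hypothetical; the source says only that an entry may specify labels, node locations, and whether nodes are included.

```python
def apply_rules(content_list, rules_entry):
    """Re-label, re-order, and filter (label, content) pairs per a rules entry.

    When rules_entry is None, the list of document content is used as is.
    """
    if rules_entry is None:
        return list(content_list)
    by_label = {label: content for label, content in content_list}
    output = []
    for rule in rules_entry:               # the order of rules sets the new order
        if not rule.get("include", True):  # excluded nodes are dropped
            continue
        source = rule["source_label"]
        if source in by_label:
            output.append((rule.get("label", source), by_label[source]))
    return output
```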
  • the output files 508 include a table of contents (TOC) page that lists the labels of the nodes having content and subdocuments that include the content of blocks associated with the labels. Each of the subdocuments is associated with one of the links so that a user at the client 102 may request a subdocument by selecting the link associated therewith.
  • the formatter 126 transmits the TOC page to the client 102.
  • FIGS. 7 and 8 illustrate details of one embodiment of the operation of the ranking engine 310 described above and illustrated in FIG. 3. Since, according to some embodiments, each document is analyzed individually and independently, when a body of text is followed from one document to another, tracking the body of text is a consideration for the ease of reading the body of text and navigating a set of documents. Indeed it is common for a story to begin on a first document and extend to a second document. Hence, in some applications, it may be desirable to facilitate identification of the continuing portion of the story within the second document, which may be divided into multiple subdocuments.
  • the display 112 of the client device 102 displays a subdocument 802 containing text 806 and one or more links 804.
  • the link 804 is a selectable connection (e.g., a hyperlink) from a word, a set of words, or other information object, to another.
  • One implementation of the link 804 is a highlighted set of words, or text, that can be selected by a user, such as with a mouse or by touch-screen control, resulting in the immediate delivery and view of another file.
  • the highlighted text may be referred to as an anchor.
  • FIG. 7 is a flowchart illustrating a method in accordance with one embodiment of the present invention.
  • FIG. 8 illustrates an example sequence of material displayed at the display 112 (see FIG. 1).
  • user selection of the link 804 causes the client 102 to transmit a request for an associated file, such as a document, from the application server 124.
  • when a document is thus requested, the application server 124 generates a table of contents page 810, including a list of labels, with each label being associated with a subdocument.
  • the label associated with the selected link 804 be at or near the top of the list of labels in the table of contents page 810 to facilitate navigation and to permit the user to easily locate the label associated with the selected link.
  • the label 812 of the table of contents page 810 be associated with the selected link to permit the user to quickly and easily identify the subdocument associated with the selected link 804.
  • the user may then select the label 812, which comprises a link to the subdocument 820 containing the text 822.
  • the user at a client 102 views a subdocument 802 at a display 112 of the client 102.
  • the subdocument 802 includes text 806 and one or more links 804.
  • the user selects one of the links 804 of the subdocument 802.
  • the user selection of the link 804 pursuant to block 701 causes the client 102 to transmit a request for a document associated with the link 804 selected by the user.
  • the application server 124 receives the request for document from the client 102 (FIG. 1), the requested document residing on at least one of the servers 104, 106, 108.
  • the request for document may be directed to the application server 124 directly or to one of the servers 104, 106, 108, which, in turn, redirects the request for document to the application server 124.
  • the application server 124 retrieves the requested document from one or more of the servers 104, 106, 108 on which the document resides. This retrieval may be accomplished as described above.
  • the formatter 126 of the application server 124 extracts a structure of the retrieved document as described above.
  • the formatter 126 next analyzes the content of the nodes and assigns one of a set of predefined classifiers to each of the nodes based on the content of the nodes, pursuant to block 708 as discussed above.
  • the label engine 306 of the formatter 126 assigns labels to individual nodes of the tree data structure that include document content as discussed above.
  • the label engine 306 merges nodes having content according to their classification and, at block 714 , the data structure converter 308 of the mapper 202 converts the tree data structure to a list, as discussed above.
  • the ranking engine 310 compares the text 806 of the previous subdocument 802 to each of the subdocuments of the requested document using conventional document, or text, matching techniques to determine the extent to which the previous subdocument is associated with each of the subdocuments of the requested document.
  • the ranking engine 310 may employ an n-dimensional vector matching technique for comparing the text of the previous subdocument 802 to each of the subdocuments of the requested document.
  • Modern Information Retrieval by R. Baeza-Yates, et al, published by Addison-Wesley Pub Co; 1999, ISBN: 020139829X, discloses related techniques and is incorporated herein by reference.
  • In comparing the text 806 of the previous subdocument 802 to each of the subdocuments of the requested document, the ranking engine 310 generates a document/document value for each of the subdocuments of the requested document.
  • the document/document value indicates the degree to which there is an association between the text 806 of the previous subdocument 802 and each of the subdocuments of the requested document. For example, if the text 806 of the subdocument 802 included terms such as “XYZ,” “merger,” “corporate,” “shareholders” and the like, the ranking engine 310 would assign a higher degree of association, and thus either a higher or lower document/document value, to subdocuments in the requested page that include the same or similar terms.
  • the ranking engine 310 compares the text of the selected link 804 to each of the subdocuments of the requested document. For example, if the selected link 804 comprised the text “XYZ merger,” the ranking engine 310 would determine the degree to which the text “XYZ merger” is present in each of the subdocuments of the requested document.
  • the ranking engine 310 generates a link/document value for each of the subdocuments of the requested document. The link/document value indicates the degree to which the text of the selected link is present in each of the subdocuments of the requested document.
  • the ranking engine 310 compares the text of the selected link 804 to each of the labels of the requested document. For example, if the selected link 804 comprised the text “XYZ merger” the ranking engine 310 would determine the degree to which the text “XYZ merger” is present in each of the labels assigned to the requested document and would generate a link/label value for each of the subdocuments of the requested document. The link/label value indicates the degree to which the text of the selected link and the subdocuments of the requested document are related.
  • the ranking engine 310 may also use additional factors in reordering the list entries. For example, the ranking engine may generate a size value indicative of an amount of text in each of the plurality of subdocuments. Pursuant to this embodiment, the size value may be larger for subdocuments comprising large amounts of text and the size value may be smaller for subdocuments comprising smaller amounts of text. Ranking the entries in the table of contents, at least in part, according to the size value tends to make entries associated with larger amounts of text appear higher on the list of entries in the table of contents.
  • the ranking engine 310 reorders the list entries according to the document/document value, the link/document value, the link/label value or a combination of these values.
  • the ranking engine 310 assigns a weight to each of the document/document, link/document, and link/label values and then combines the weighted values to determine the reordering of the list entries.
  • the ranking engine 310 reorders the list entries according to one or more of the document/document, link/document, and link/label values and other factors, including, for example, amount of content in the subdocument, the size of the font used in the subdocument, whether the text in the subdocument is boldface, the color of the text in the subdocument, whether the text of the subdocument is flashing, and the position of the item in the document.
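  • The weighted combination of values described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes a bag-of-words cosine similarity as the vector matching technique, and the function names (`vectorize`, `cosine`, `rank_toc`) and the numeric weights in `WEIGHTS` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical weights for the four values; the source does not specify any.
WEIGHTS = {"doc_doc": 0.4, "link_doc": 0.3, "link_label": 0.2, "size": 0.1}


def vectorize(text: str) -> Counter:
    """Turn text into a term-frequency vector (one dimension per term)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_toc(prev_text: str, link_text: str,
             subdocs: List[Tuple[str, str]]) -> List[str]:
    """Return subdocument labels ordered by decreasing combined score.

    subdocs is a list of (label, text) pairs for the requested document.
    """
    prev_v, link_v = vectorize(prev_text), vectorize(link_text)
    max_words = max((len(text.split()) for _, text in subdocs), default=0) or 1
    scored = []
    for label, text in subdocs:
        text_v = vectorize(text)
        score = (
            WEIGHTS["doc_doc"] * cosine(prev_v, text_v)               # document/document
            + WEIGHTS["link_doc"] * cosine(link_v, text_v)            # link/document
            + WEIGHTS["link_label"] * cosine(link_v, vectorize(label))  # link/label
            + WEIGHTS["size"] * len(text.split()) / max_words         # size value
        )
        scored.append((score, label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored]


previous = ("XYZ announced a merger today. Shareholders of XYZ will vote "
            "on the corporate merger.")
subdocuments = [
    ("Weather", "Sunny skies are expected across the region this weekend "
                "with mild temperatures"),
    ("XYZ merger details", "The XYZ merger requires approval from "
                           "shareholders and corporate regulators"),
    ("Sports", "The home team won again"),
]
ordered = rank_toc(previous, "XYZ merger", subdocuments)
```

  • In this sketch, the entry whose label and text share the most terms with the selected link and the previous subdocument is listed first, which corresponds to placing the continuing portion of a story at or near the top of the table of contents.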

Abstract

A system and method are disclosed for modifying a document format. In one embodiment, a structure of a first document is extracted to form a first data structure, including multiple subdocuments, each subdocument having a label assigned thereto. A table of contents listing the labels of the subdocuments is then generated. The various labels are then ordered according to the amount of text of the associated subdocument, a comparison of the text of a previous link and the text of the associated subdocument, a comparison of the text of the previous document and the text of the associated subdocument, a comparison of the text of the previous link and the label of the associated subdocument, or a combination of these.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit and priority of U.S. Provisional Patent Application entitled “Automatic Display of Web Content to Smaller Display Devices: Improved Summarization and Navigation” filed May 10, 2001 and of U.S. patent application Ser. No. 10/076,786 entitled “System and Method for Modifying a Document Format” filed Feb. 14, 2002, the disclosures of which are hereby incorporated by reference in their respective entireties. [0001]
  • TECHNICAL FIELD
  • The present invention relates to a system and method for modifying a document format. [0002]
  • BACKGROUND
  • Handheld devices, including Personal Digital Assistants (PDAs) and cellular telephones, offer connectivity to the Internet and permit access to documents available over the Internet. Wireless Application Protocol (WAP) is a standard for providing cellular phones, PDAs, pagers and other handheld devices with secure access to web pages. WAP features the Wireless Markup Language (WML), which generally serves as a medium for translating web-based HTML content into a format that accommodates small form factor displays and key sets found on conventional handheld devices. WML also allows handheld device manufacturers to include microbrowsers in their products that accept WML input from a WAP-based system across vast regions of the world. [0003]
  • The proliferation of wireless PDAs has also created a popular means for handheld Internet access. However, presenting IP-based content, and other content developed for display on large form factor devices (e.g., PC monitors), on small form factor screens of handheld devices has, in the past, been problematic. Two primary methods of presenting such content to handheld devices have been employed. [0004]
  • The first such method can be termed “fixed mapping.” Fixed mapping typically involves rewriting an existing document, such as an HTML-based web page, to conform to a specific standard, such as WAP, J-PHONE, or i-Mode, or to a small display device. A web server must then maintain the rewritten web site as a separate site with its own URL in addition to the original document. As new content is added to the original document, a web site operator must manually trim, edit, and condense the new content by rewriting the new content into a format that will accommodate the interface parameters of handheld devices. This method is limited in that considerable time and expense are typically required to maintain the two web sites in parallel. Further, the manual editing of the rewritten web site can be time-consuming, burdensome, and expensive. [0005]
  • The second method may be termed “transcoding.” Transcoding typically involves the use of software that takes the entire content of a web site as input and converts the entire content into a format of a specific handheld wireless standard for transmission to handheld devices. The entire content, as formatted according to a handheld wireless standard, is then transmitted to the handheld device. This conversion may be performed “on-the-fly” (i.e., automatically in real time) or may be performed manually. [0006]
  • Transcoding has the advantage of reducing the investment to reach wireless markets since it leverages existing web sites. From a user standpoint, transcoding is desirable in that it preserves all the text-based information from the originating site. For large volumes of text, however, using this approach may overwhelm the handheld device user with large volumes of text to be viewed on a small form factor display. Further, the unorganized transcoded content makes changes or modifications to the wirelessly enabled web site more difficult for the web site operator. [0007]
  • In addition, many wireless handheld devices have limited bandwidth. Thus, downloading an entire web page designed for viewing on a large form factor device at data rates common to handheld wireless devices may require large download times. These large download times may be burdensome to the user who must wait while the entire web page downloads, even though the user may only desire to view a portion of the web page. Further, these large download times may be expensive for users who pay for wireless service based on the amount of time or the number of packets downloaded. For example, service plans are time-based or packet-based. These service plans charge on either the time connected or number of packets received, respectively. Thus, large downloads under these service plans will be more expensive than smaller downloads. [0008]
  • Additional background details are disclosed in U.S. Pat. No. 6,336,124, the disclosure of which is hereby incorporated by reference. [0009]
  • SUMMARY
  • Accordingly, a need exists to provide a system and method for presenting content developed for display on large form factor devices (e.g., PC monitors) on small form factor screens of handheld devices. In particular, a need exists for a system and method for permitting a handheld device user to easily navigate material available over a network, such as an Internet web site. [0010]
  • Pursuant to one embodiment of the present invention, a method of ranking entries in a table of contents for display at a client device includes transmitting a first document from an application server over a network, such as the Internet, to the client device. The first document includes text and at least one link. The application server then receives a request for a second document associated with the link from the client device. Next, the application server divides the second document into subdocuments and assigns a label to each of a plurality of the subdocuments. The application server also performs a comparison of the text of the first document with the text of each of the plurality of subdocuments to generate a document-document value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the document-document values. [0011]
  • In another embodiment, the application server performs a comparison of the text of the link with the text of each of the subdocuments to generate a link-text value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the link-text values. [0012]
  • In yet another embodiment, the application server performs a comparison of the text of the link with the label assigned to each of the plurality of subdocuments to generate a link-label value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments. After performing this comparison, the application server ranks the plurality of subdocuments based, at least in part, on the link-label values. [0013]
  • In still another embodiment, the application server generates, for each of the plurality of subdocuments, a size value indicative of the amount of text in that subdocument. After generating a size value for each of the plurality of subdocuments, the application server ranks the plurality of subdocuments based, at least in part, on the size value. [0014]
  • In this manner, subdocuments likely to be relevant to the first document, the selected link, or both, are listed at or near the top of a table of contents to facilitate user selection of the same. Hence, users may easily follow a text that spans multiple documents, because the table of contents of a requested page lists the subdocuments containing continuing portions of the text at or near the top of the table of contents. [0015]
  • Additional details regarding the present system and method may be understood by reference to the following detailed description when read in conjunction with the accompanying drawings. [0016]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a document delivery system in accordance with one embodiment of the present invention. [0017]
  • FIG. 2 is a block diagram of the formatter of FIG. 1 in accordance with one embodiment of the present invention. [0018]
  • FIG. 3 is a block diagram of the mapper of FIG. 2 in accordance with one embodiment of the present invention. [0019]
  • FIG. 4 illustrates a tree data structure in accordance with one embodiment of the present invention. [0020]
  • FIG. 5 is a block diagram of the control module of FIG. 2 in accordance with one embodiment of the present invention. [0021]
  • FIG. 6 is a flowchart illustrating a method in accordance with one embodiment of the present invention. [0022]
  • FIG. 7 is a flowchart illustrating a method in accordance with another embodiment of the present invention. [0023]
  • FIG. 8 illustrates a progression of material displayed at the display of the client device of FIG. 1 in accordance with an embodiment of the present invention. [0024]
  • Common reference numerals are used throughout the drawings and detailed description to indicate like elements. [0025]
  • DETAILED DESCRIPTION
  • FIG. 1 illustrates a document delivery system 100 in accordance with one embodiment of the present invention. The document delivery system 100 permits a client 102 to access content of documents (not shown) stored at server 104, server 106, or other servers 108 over a network 110, such as the Internet, and over a network 111, such as an intranet. [0026]
  • In one embodiment, the client 102 comprises a handheld device, such as a PDA (Personal Digital Assistant), a mobile telephone, or the like, having a small form factor display 112. The client 102 also includes a web browser 114. The web browser 114 may comprise a microbrowser designed for small display screens on web-enabled cellular telephones, PDAs and other handheld devices, including wireless handheld devices. [0027]
  • The client 102 may exchange data with the network 110 in a wireless fashion via a wireless station 120 and a gateway 122 in accordance with WAP (Wireless Application Protocol), i-Mode, or other suitable protocol or service. Optionally, the client 102 may exchange data with the network 110 via a wired connection (not shown). [0028]
  • The client 102 may also exchange data with the network 111 in a wireless fashion via a wireless station 121 and a gateway 123 in accordance with WAP (Wireless Application Protocol), i-Mode, or other suitable protocol or service for delivery of web content to small display devices. Optionally, the client 102 may exchange data with the network 111 via a wired connection (not shown). [0029]
  • In one embodiment, the gateways 122, 123 are network devices that connect a wireless network with a wired network, such as the networks 110, 111. Access between the client 102 and application server 124 may also pass through one or more other firewalls (not shown), other gateway devices (not shown), or the like. [0030]
  • Pursuant to one embodiment, the client 102 transmits requests for documents stored on one or more of the servers 104, 106, 108 to the application server 124. The request for content may comprise an HTTP request or other suitable type of request. Moreover, the application server 124 may alternatively receive the request for a document from the client 102 from any network (e.g., 110, 111). The application server 124, among other functionality, functions as a proxy server and receives requests for documents from client devices, such as the client 102, over the networks 110, 111 and provides associated content in response to such requests by transmitting the associated content over at least one of the networks 110, 111. [0031]
  • In response to a request for a document from the client 102, the application server 124 requests the document identified by the request from one or more of the servers 104, 106, 108. Upon receipt of the document identified by the request, the application server 124 modifies the format of the document identified by the request for content using a formatter 126. [0032]
  • In one embodiment, the document identified by the request is an HTML or XML web page, although other document types, such as PDF (Portable Document Format), may also be requested. The application server 124 then transmits at least a portion of the reformatted content of the document identified by the request to the client 102 in a format compatible with the browser 114 for display at the display 112 of the client 102. [0033]
  • The formatter 126 includes a database (see FIG. 5) that may be configured from a client admin computer 140 via a database modifier 128. The database modifier 128 may comprise a JavaScript module that permits a user at the client admin computer to visually modify a data structure of a document into a desired format. The modification may be performed by, for example, adding labels, re-ordering, moving, deleting, or otherwise changing portions of the data structure. The database modifier 128 then stores the changed, or modified, version of the data structure in the database. [0034]
  • In particular, the client admin computer 140 includes a web browser 142, such as Internet Explorer™ by Microsoft Corporation or other suitable web browser for permitting a user at the client admin computer 140 to view pages at the database modifier 128 hosted at the application server 124. The pages at the database modifier 128 of the application server 124 permit user configuration of the FIG. 5 database, as discussed in more detail below. [0035]
  • In general, the formatter 126 receives the document identified by the request from one of the servers 104, 106, 108, divides the document into multiple blocks, and assigns labels to individual blocks. The formatter 126 then generates a list containing the content of the various blocks. If a data structure associated with the document is stored in the database, the formatter 126 then uses the data structure to generate output files from the generated list of content. The output file may contain a Table of Contents (TOC) page and subdocuments. The TOC page lists labels associated with the subdocuments and may contain links to the subdocuments. The formatter 126 then transmits the TOC page, a headline, an image, or other content specified by a database at the application server 124 to the client 102 over at least one of the networks 110, 111. Details of the operation of the formatter 126 are discussed in more detail below. [0036]
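  • As a rough illustration of the output described above, the sketch below builds a TOC page and one file per subdocument. It is an assumption-laden example: the function name `build_output_files`, the file names (`toc.html`, `sub1.html`, and so on), and the use of plain HTML rather than a device-specific markup such as WML are all hypothetical choices made for readability.

```python
from html import escape
from typing import Dict, List, Tuple


def build_output_files(subdocs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map hypothetical file names to markup for a TOC page and its subdocuments.

    subdocs is an ordered list of (label, content) pairs.
    """
    files: Dict[str, str] = {}
    toc_items = []
    for index, (label, content) in enumerate(subdocs, start=1):
        name = f"sub{index}.html"  # hypothetical naming scheme
        files[name] = f"<html><body><p>{escape(content)}</p></body></html>"
        # Each TOC entry shows the label and links to the subdocument.
        toc_items.append(f'<li><a href="{name}">{escape(label)}</a></li>')
    files["toc.html"] = (
        "<html><body><ul>" + "".join(toc_items) + "</ul></body></html>"
    )
    return files


files = build_output_files([
    ("Top story", "XYZ & ABC agree to merge"),
    ("Links", "Home | News | Sports"),
])
```

  • Only `toc.html` would need to be sent to the client at first; a subdocument is fetched when the user selects its label, which keeps each download small.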
  • FIG. 2 illustrates details of the formatter 126 of FIG. 1 according to one embodiment of the invention. As shown, the formatter 126 includes a mapper 202 and a control module 206, which may comprise software written in C++ or other suitable programming language. The mapper 202 receives the requested document and reformats the document as a list of document content 204. The control module 206 then generates an output file using the list of document content 204. Additional details regarding the mapper 202, the list of document content 204, and the control module 206 are discussed below. [0037]
  • FIG. 3 illustrates details of the mapper 202 of FIG. 2 according to one embodiment of the invention. The mapper 202 includes a number of software modules stored in a computer readable medium. In particular, the mapper 202 includes a network interface 302, a parser 304, a label engine 306, a data structure converter 308, and a ranking engine 310. The network interface 302 receives the document requested from the network. As mentioned above, the document requested may comprise a web page, such as an HTML document, an XML document, or the like. [0038]
  • The parser 304 parses and decomposes the document into a tree data structure. FIG. 4 illustrates an example tree data structure 400, which may comprise a structural representation of a document, such as an HTML web page. As shown, the tree data structure 400 includes a root node 402 associated with the document. The parser 304 (FIG. 3) divides the document into multiple blocks and represents each block of the document as a table node 404 in the tree data structure 400. Each table node 404 has at least one row node 406 as a child node. Individual row nodes 406 each have at least one column node 408 as a child node. The column nodes 408 may then have additional table nodes as children. At this point, the tree data structure 400 may be recursive. [0039]
  • Thus, the document is divided into blocks, which may be defined by the structure of the document. The primary content for each of the blocks, or tables, is stored in the column nodes 408 and the remaining structure of the various blocks is represented in the other portions of the tree data structure 400. [0040]
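  • One way to picture the tree data structure 400 is with a few small record types. The sketch below is illustrative only: the class names mirror the node names used above, but the field names (`content`, `tables`, `rows`, `columns`, `label`, `classifier`) and the helper `count_columns` are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnNode:
    content: str = ""  # primary content of the block
    # Nested tables make the structure recursive.
    tables: List["TableNode"] = field(default_factory=list)


@dataclass
class RowNode:
    columns: List[ColumnNode] = field(default_factory=list)


@dataclass
class TableNode:
    rows: List[RowNode] = field(default_factory=list)
    label: str = ""       # filled in later by a labeling step
    classifier: str = ""  # filled in later by a classification step


@dataclass
class RootNode:
    tables: List[TableNode] = field(default_factory=list)


def count_columns(table: TableNode) -> int:
    """Count column nodes in a table, descending into nested tables."""
    total = 0
    for row in table.rows:
        for column in row.columns:
            total += 1
            total += sum(count_columns(nested) for nested in column.tables)
    return total


inner = TableNode(rows=[RowNode(columns=[ColumnNode("Nested story text")])])
outer = TableNode(rows=[RowNode(columns=[
    ColumnNode("Headline", tables=[inner]),
    ColumnNode("Home | News | Sports"),
])])
root = RootNode(tables=[outer])
```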
  • Referring again to FIG. 3, the label engine 306 then assigns labels to individual blocks and may assign a classification to each block according to the contents of the block. In one embodiment, the label engine 306 assigns a classification to each block based on the block contents. For example, if the document is a web page, the web page may include links, text, forms, and pictures, as well as other classes of content. [0041]
  • The label engine 306 optionally analyzes individual blocks and assigns a classification to the block indicating the type, or class, of content in the block. Hence, a block that contains primarily links may be assigned a “navigation” classification, a block that contains primarily text may be assigned a “story” classification, a block that contains primarily pictures may be assigned an “image” classification, and a block that contains form information like an address may be assigned a “form” classification. The label engine 306 inserts a classifier associated with the assigned classification for each block into the table node of each block. [0042]
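  • A simple way to approximate "contains primarily" is a majority vote over the kinds of items in a block. The sketch below assumes each block has already been reduced to a list of item kinds (`"link"`, `"text"`, `"image"`, `"form"`); that representation, the function name `classify_block`, and the `"unknown"` fallback are hypothetical.

```python
from collections import Counter
from typing import List

# Mapping from item kind to the classifications named in the text.
KIND_TO_CLASS = {
    "link": "navigation",
    "text": "story",
    "image": "image",
    "form": "form",
}


def classify_block(item_kinds: List[str]) -> str:
    """Return the classification of the most frequent recognized item kind."""
    counts = Counter(kind for kind in item_kinds if kind in KIND_TO_CLASS)
    if not counts:
        return "unknown"  # assumption: no recognized content in the block
    most_common_kind, _ = counts.most_common(1)[0]
    return KIND_TO_CLASS[most_common_kind]


sidebar = classify_block(["link", "link", "link", "text"])
article = classify_block(["text", "text", "text", "image"])
```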
  • After classifying the blocks, the label engine 306 optionally merges, or combines, column nodes of each block that have the same classification. For example, if a given block has multiple column nodes having the classification of “story,” the label engine 306 may merge, or combine, the content of these column nodes. Likewise, if a given block has multiple columns having the classification of “navigation,” the label engine 306 may merge, or combine, the content of these column nodes. [0043]
  • In one embodiment, the label engine 306 may merge, or combine, column nodes in accordance with predetermined merging rules stored at the label engine 306. An example merging rule is that a large “story” node is not merged with another large “story” node. Another example merging rule is that a small “story” node may get merged with a “navigation” node. Thus, according to these rules, a large story, which is likely to be substantial enough to be viewed in isolation, will not be combined with another large story. However, a small story would not be isolated. Rather, the user experience may be improved by merging it with other nodes, such as a small “navigation” node or a small “story” node. The specifics of these merging rules may vary and may be customized according to particular applications. The classifying and merging are optional according to some embodiments of the invention. [0044]
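  • The example merging rules can be sketched as a predicate plus a single pass over adjacent nodes. Several details here are assumptions made for illustration: the 200-word threshold for a "large" story, the choice to keep a large story isolated from every other node, and the names `Node`, `can_merge`, and `merge_adjacent`.

```python
from dataclasses import dataclass
from typing import List

LARGE_STORY_WORDS = 200  # hypothetical threshold for a "large" story


@dataclass
class Node:
    classification: str
    text: str

    @property
    def words(self) -> int:
        return len(self.text.split())


def is_large_story(node: Node) -> bool:
    return node.classification == "story" and node.words >= LARGE_STORY_WORDS


def can_merge(a: Node, b: Node) -> bool:
    # A large story is viewed in isolation (assumption: isolated from all nodes,
    # which also covers the rule that two large stories are never merged).
    if is_large_story(a) or is_large_story(b):
        return False
    # Nodes with the same classification may be merged.
    if a.classification == b.classification:
        return True
    # A small story may be merged with a navigation node.
    return {a.classification, b.classification} == {"story", "navigation"}


def merge_adjacent(nodes: List[Node]) -> List[Node]:
    """Merge each node into its predecessor when the rules allow it."""
    merged: List[Node] = []
    for node in nodes:
        if merged and can_merge(merged[-1], node):
            prev = merged[-1]
            kinds = (prev.classification, node.classification)
            classification = "story" if "story" in kinds else prev.classification
            merged[-1] = Node(classification, prev.text + " " + node.text)
        else:
            merged.append(node)
    return merged


nodes = [
    Node("story", "short intro"),
    Node("navigation", "Home News"),
    Node("story", "word " * 250),
    Node("story", "word " * 250),
]
result = merge_adjacent(nodes)
```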
  • The label engine 306 also assigns a label to each block according to the block contents. In one embodiment, the label engine 306 uses the first several words of text of a block including text as the label for that block. In another embodiment, the label engine 306 assigns a label to a block based on the classification of the block. The label engine 306 then adds the assigned label to the table node of the associated block. [0045]
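  • Both labeling approaches fit in a few lines. In this sketch the five-word cutoff, the trailing "..." marker, the "Untitled" fallback, and the function name `make_label` are assumptions chosen for the example.

```python
def make_label(text: str, classification: str = "", max_words: int = 5) -> str:
    """Label a block with its first few words, else with its classification."""
    words = text.split()
    if words:
        label = " ".join(words[:max_words])
        # Mark labels that were truncated from longer text.
        return label + ("..." if len(words) > max_words else "")
    # No text in the block: fall back to the classification, if any.
    return classification.capitalize() if classification else "Untitled"


story_label = make_label("XYZ Corp announced a merger with ABC today", "story")
image_label = make_label("", "image")
```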
  • With continued reference to FIG. 3, a data structure converter 308 of the mapper 202 next “flattens” the tree data structure by converting the tree data structure into a linear, one-dimensional list containing the content of the column nodes 408. The table nodes 404 and the row nodes 406 are not included in the one-dimensional list. Individual entries in the one-dimensional list include the content of an associated column node 408. [0046]
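  • Flattening amounts to a depth-first walk that keeps only column content. The sketch below uses plain dictionaries with hypothetical keys (`rows`, `columns`, `content`, `tables`) to stay self-contained; the traversal order is an assumption, since the text does not specify one.

```python
from typing import Any, Dict, List


def flatten(table: Dict[str, Any]) -> List[str]:
    """Collect column content depth-first; table and row nodes are dropped."""
    entries: List[str] = []
    for row in table.get("rows", []):
        for column in row.get("columns", []):
            if column.get("content"):
                entries.append(column["content"])
            # Descend into any tables nested inside this column.
            for nested in column.get("tables", []):
                entries.extend(flatten(nested))
    return entries


tree = {
    "rows": [
        {"columns": [
            {"content": "Headline",
             "tables": [{"rows": [{"columns": [{"content": "Nested story"}]}]}]},
            {"content": "Home | News"},
        ]},
    ],
}
flat = flatten(tree)
```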
  • A ranking engine 310 then ranks the entries in the one-dimensional list according to the content of the individual entries. In one embodiment, the ranking engine 310 analyzes characteristics of each entry and assigns a “weight” value to each entry. The weight assigned to each entry may be based on a variety of parameters. These parameters may include, for example, the size of the font used in the entry, whether the text in the entry is boldface, the color of the text, whether the text is flashing, whether the text is underlined, and the position of the item in the document. [0047]
  • The ranking engine 310 may also generate a size value indicative of an amount of text in each of the plurality of subdocuments. Pursuant to this embodiment, the size value may be larger for subdocuments comprising large amounts of text and the size value may be smaller for subdocuments comprising smaller amounts of text. Ranking the entries in the table of contents, at least in part, according to the size value tends to make entries associated with larger amounts of text appear higher on the list of entries in the table of contents (i.e., as more important or more relevant). [0048]
  • In one embodiment, the weight assigned to each entry may also depend on the content of the link leading to the document, the text of the previous document, the text of the subdocument associated with the entry, the text of the label associated with the entry, or a combination of these. Additional details regarding this embodiment are described below with reference to FIGS. 7 and 8. Based on parameters such as these, the ranking engine 310 assigns a weight to individual entries in the one-dimensional list and then re-orders the one-dimensional list according to the weighted rankings. [0049]
  • In one embodiment, the ranking engine 310 reorders the list in an order of decreasing weight values such that the first entry in the re-ordered list is the entry having the largest weight value and the last entry in the list is the entry having the smallest weight value. The re-ordered list is then stored as the list of document content 204 (FIG. 2). Thus, in some embodiments, entries having large or bold text may be ranked before entries having smaller or plain text. Also, entries having a graphic may be ranked higher than entries having primarily links. [0050]
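  • The weight-then-sort step might look like the following sketch. The particular formula (font size plus bonuses for bold and underlined text, minus a small penalty for later position) and its coefficients are invented for illustration; the text lists the parameters but not how they are combined. The names `Entry`, `weight`, and `rank` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    label: str
    font_size: int = 12
    bold: bool = False
    underlined: bool = False
    position: int = 0  # 0 means first in the source document


def weight(entry: Entry) -> float:
    """Combine presentation cues into one weight (hypothetical coefficients)."""
    value = float(entry.font_size)
    if entry.bold:
        value += 4.0
    if entry.underlined:
        value += 2.0
    value -= 0.5 * entry.position  # earlier items weigh slightly more
    return value


def rank(entries: List[Entry]) -> List[Entry]:
    """Order entries by decreasing weight, largest first."""
    return sorted(entries, key=weight, reverse=True)


entries = [
    Entry("Footer links", font_size=10, position=2),
    Entry("Top story", font_size=18, bold=True, position=1),
    Entry("Sidebar", font_size=12, position=0),
]
ranked_labels = [entry.label for entry in rank(entries)]
```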
  • FIG. 5 illustrates details of the control module 206 of FIG. 2 in accordance with one embodiment of the present invention. In general, the control module 206 receives the list of document content 204 and creates a new document structure according to a navigation rules database 502 and the list of document content 204. [0051]
  • The navigation rules database 502 contains a tree data structure for one or more documents. In one embodiment, contents of the navigation rules database 502 may be modified by accessing the formatter 126 (FIG. 1) from a client computer, such as the client admin computer 140 (FIG. 1). The database modifier 128 may modify the contents of the navigation rules database 502 described above. [0052]
  • In particular, the client admin computer 140 includes browser 142 and permits a user to access the database modifier 128 and to modify the contents of the navigation rules database 502. To modify the contents of the navigation rules database 502, a user at the client admin computer 140 directs the browser 142 to the database modifier 128. The database modifier 128 then presents the user with a GUI (Graphical User Interface) via the browser 142 that permits the user to view a default tree data structure, as constructed by the mapper 202, for a given document, such as an HTML or XML web page document. The default tree structure may be the structure of the document at issue as determined by parsing the document. [0053]
  • The user may then delete entries in the tree data structure. The user may alternatively move tree data structure entries from one location to another within the tree data structure. Further, the user may change the label or classification assigned to given nodes within the tree data structure. After the user has thus modified, or customized, the tree data structure, the control module 206 stores the modified tree data structure as an entry in the navigation rules database 502 associated with the document. [0054]
  • The [0055] control module 206 also includes a URL (Uniform Resource Locator) checker 504. The URL checker 504 receives the list of document content 204 from the mapper 202 and determines whether the navigation rules database 502 includes a tree data structure associated with the list of document content 204. In one embodiment, the URL checker 504 determines whether the URL associated with the list of document content 204 matches a URL associated with an entry in the navigation rules database 502. If such a match exists, an output file generator 506 retrieves the tree data structure in the navigation rules database 502 associated with the list of document content 204. The output file generator 506 then creates one or more output files 508 based on the retrieved tree data structure using the content of the list of document content 204.
  • The output files [0056] 508, in one embodiment, include a table of contents (TOC) page that lists the labels of the document. The output files 508 also contain one or more subdocuments. Individual sub-pages are associated with individual entries in the TOC. One or more of the labels, or entries, of the TOC may include links to associated subdocuments.
  • If the [0057] URL checker 504 determines that the navigation rules database 502 does not include a tree data structure associated with the list of document content 204, then the output file generator 506 generates output files 508 that include a TOC page that lists the labels of the document. One or more of the labels, or entries, of the TOC may include links to associated subdocuments.
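The two paths taken after the URL check can be sketched roughly as below. This is a simplified illustration, not the disclosed implementation: the navigation rules database is modeled as a plain dictionary keyed by URL, and each stored rule is reduced to an ordered list of (original label, displayed label) pairs rather than a full tree data structure. The name `build_toc` and the example URLs are hypothetical.

```python
def build_toc(rules_db, url, content):
    """Return the TOC labels for a document.

    rules_db: dict mapping a URL to a customized rule (assumed shape:
              ordered list of (original_label, displayed_label) pairs).
    content:  list of (label, text) pairs in ranked order.
    """
    rule = rules_db.get(url)  # exact URL match, as in the URL checker embodiment
    if rule is None:
        # No customized structure stored: list the labels as ranked.
        return [label for label, _ in content]
    available = {label for label, _ in content}
    # Customized structure: apply the stored order and labels and
    # omit any entries that the rule does not mention.
    return [shown for original, shown in rule if original in available]


rules_db = {"http://example.com/a": [("Sports", "Sport"), ("News", "Headlines")]}
content = [("News", "..."), ("Ads", "..."), ("Sports", "...")]
```

Here a request for the first URL yields the customized labels in the stored order, while any other URL falls back to the ranked list.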
  • The [0058] formatter 126 then transmits the TOC page over at least one of the networks 110, 111 to the client 102. Upon receipt of the TOC page at the client 102, the client 102 displays the TOC page at the display 112 of the client 102. The user may then select a link associated with one of the entries of the TOC, which requests an associated subdocument from the output files 508. In response to a request for a subdocument in the output files 508, the formatter transmits the requested subdocument to the client 102 over at least one of the networks 110, 111 for display at the display 112 of the client 102.
  • FIG. 6 illustrates a [0059] flowchart 600, which depicts a method according to one embodiment of the present invention. The method commences at block 602 where application server 124 receives a request for document from the client 102 (FIG. 1), the requested document residing on at least one of the servers 104, 106, 108. The request for document may be directed to the application server 124 directly. Alternatively, the request for document may be directed directly to one of the servers 104, 106, 108, which, in turn, redirects the request for document to the application server 124. The request for document may comprise an HTTP request or other suitable request. Moreover, the requested document may comprise a document in HTML, XML, PDF, or other suitable format.
  • Next, at [0060] block 604, the application server 124 retrieves the requested document from one or more of the servers 104, 106, 108 on which the document resides. This retrieval may be accomplished by the application server 124 transmitting an HTTP request to the server 104, 106, 108 at which the requested document is stored. For example, if the requested document resides at the server 104, the application server 124 requests the document from the server 104 over the network 110 and receives the requested document over the network 110.
  • Then, at [0061] block 606, the formatter 126 of the application server 124 extracts a structure of the retrieved document. In one embodiment, a parser 304 (FIG. 3) parses the retrieved document and generates a tree data structure representing the structure of the retrieved document. An example of such a tree data structure is illustrated in FIG. 4 and is described above.
  • For individual nodes of the tree data structure that include document content, the [0062] formatter 126 next analyzes the content of the nodes and assigns one of a set of predefined classifiers to each of the nodes based on the content of the nodes, pursuant to block 608. As discussed above, for a node having content comprising primarily text, the label engine 306 of the formatter 126 may assign a “story” classifier to the node. The classifier may comprise a text string or other identifier added to the node.
  • At [0063] block 610, the label engine 306 of the formatter 126 assigns labels to individual nodes of the tree data structure that include document content. The label engine 306 may assign a label based on the content of the node, the assigned classification of the node, or both. In one embodiment, the label engine 306 uses the first several words of nodes having text content as the label for the associated node. The label may indicate the content of the node being labeled.
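As a minimal sketch of the labeling embodiment that uses the first several words of a node's text, assuming a cutoff of five words and a trailing ellipsis for truncated text (neither of which is specified in the description), a hypothetical helper `make_label` might look like this:

```python
def make_label(text, max_words=5):
    # Assumption: "the first several words" is taken to mean the first five words.
    words = text.split()
    label = " ".join(words[:max_words])
    if len(words) > max_words:
        label += "..."  # mark that the label is a truncation of longer text
    return label
```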
  • At [0064] block 612, the label engine 306 merges nodes having content according to their classification. For example, if a pair of nodes having content both have the classification “navigation,” then the label engine 306 merges the content of these nodes to form a single node that includes the content of the merged nodes. Block 612 may alternatively, or additionally, be performed after block 616. In one embodiment, the merging is performed before and after ranking.
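The merging step might be sketched as follows, under the assumption that only certain classifications (here just "navigation") are merged and that merged content is joined with a newline. The function name `merge_by_classification` and the dictionary-based node representation are hypothetical.

```python
def merge_by_classification(nodes, mergeable=("navigation",)):
    """Merge content nodes that share a mergeable classification.

    nodes: list of dicts with 'classification' and 'content' keys.
    Returns a new list; the input nodes are left unmodified.
    """
    merged = []
    first_index = {}  # classification -> position of the merged node
    for node in nodes:
        cls = node["classification"]
        if cls in mergeable and cls in first_index:
            # Append this node's content to the earlier node of the same class.
            merged[first_index[cls]]["content"] += "\n" + node["content"]
        else:
            merged.append(dict(node))  # shallow copy so the input is untouched
            if cls in mergeable:
                first_index[cls] = len(merged) - 1
    return merged
```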
  • At [0065] block 614, the data structure converter 308 of the mapper 202 converts the tree data structure to a list. The data structure converter 308 extracts the nodes of the tree data structure that include content and generates a list comprising the nodes of the tree data structure that include content, without the other associated nodes, such as table and row nodes, which do not include content.
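The conversion from a tree to a one-dimensional list can be illustrated with a depth-first traversal that keeps only nodes carrying content. The `Node` class and `flatten` function below are hypothetical stand-ins for the tree data structure and the data structure converter 308, and depth-first document order is an assumption.

```python
class Node:
    """Hypothetical tree node: structural nodes (table, row) have no content."""

    def __init__(self, tag, content=None, children=None):
        self.tag = tag
        self.content = content
        self.children = children or []


def flatten(node):
    """Return the content-bearing nodes of the tree in document order."""
    result = []
    if node.content:
        result.append(node)
    for child in node.children:
        result.extend(flatten(child))
    return result


tree = Node("table", children=[
    Node("row", children=[Node("cell", "Headlines"), Node("cell", "Story text")]),
    Node("row", children=[Node("cell", "Footer links")]),
])
flat_contents = [n.content for n in flatten(tree)]
```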
  • Next, at [0066] block 616, the ranking engine 310 (FIG. 3) of the mapper 202 reorders the entries of the list generated at block 614. In one embodiment, the ranking engine 310 assigns a weight value to each of the entries in the list according to certain parameters of the content of the entries, the classification of the list entry, or a combination thereof. Then, the ranking engine 310 reorders the list according to the weight value of the list entries. For example, the ranking engine 310 may order the list entries in order of decreasing weight value. The ranking engine 310 then stores the re-ordered list as the list of document content 204 (FIG. 2).
  • The control module [0067] 206 (FIG. 5) then determines whether the navigation rules database 502 includes an entry associated with the list of document content 204, pursuant to block 618. In one embodiment, the URL checker 504 of the control module 206 determines whether a URL associated with the list of document content 204 matches a URL associated with an entry in the navigation rules database 502. The URL checker 504 determines that the navigation rules database 502 contains an entry associated with the list of document content if such a match exists and execution proceeds to block 620, else execution proceeds to block 622.
  • At [0068] block 620, the output file generator 506 creates a new data tree structure using the list of document content 204 and the associated entry of the navigation rules database 502. The entry of the navigation rules database 502 may specify labels to be assigned to the various nodes, the location of the various nodes within the new data tree structure, and whether certain nodes are included in the new data tree structure. The output file generator 506 then creates a new data tree structure according to the entry in the navigation rules database 502 and inserts the associated content from the list of document content 204 to form a new data tree, which may be stored as the output files 508.
  • At [0069] block 622, the output file generator 506 stores the new data tree structure as the output files 508 if the navigation rules database 502 contains an entry associated with the list of document content 204. Otherwise, the output file generator 506 stores the list of document content as the output files 508 or processes the list of document content from memory. Moreover, the output file generator 506 may generate device-specific output.
  • The output files [0070] 508 include a table of contents (TOC) page that lists the labels of the nodes having content and subdocuments that include the content of blocks associated with the labels. Each of the subdocuments is associated with one of the links so that a user at the client 102 may request a subdocument by selecting the link associated therewith.
  • Lastly, pursuant to block [0071] 624, the formatter 126 transmits the TOC page to the client 102.
  • FIGS. 7 and 8 illustrate details of one embodiment of the operation of the [0072] ranking engine 310 described above and illustrated in FIG. 3. Since, according to some embodiments, each document is analyzed individually and independently, when a body of text is followed from one document to another, tracking the body of text is a consideration for the ease of reading the body of text and navigating a set of documents. Indeed it is common for a story to begin on a first document and extend to a second document. Hence, in some applications, it may be desirable to facilitate identification of the continuing portion of the story within the second document, which may be divided into multiple subdocuments.
  • With reference to FIG. 8, the [0073] display 112 of the client device 102 (FIG. 1) displays a subdocument 802 containing text 806 and one or more links 804. The link 804 is a selectable connection (e.g., a hyperlink) from a word, a set of words, or other information object, to another. One implementation of the link 804 is a highlighted set of words, or text, that can be selected by a user, such as with a mouse or by touch-screen control, resulting in the immediate delivery and view of another file. The highlighted text may be referred to as an anchor.
  • FIG. 7 is a flowchart illustrating a method in accordance with one embodiment of the present invention. FIG. 8 illustrates an example sequence of material displayed at the display [0074] 112 (see FIG. 1). In general, user selection of the link 804 causes the client 102 to transmit a request for an associated file, such as a document, from the application server 124. As discussed above with reference to FIG. 6, when a document is thus requested, the application server 124 generates a table of contents page 810, including a list of labels, with each label being associated with a subdocument.
  • It is desirable in some applications that the label associated with the selected [0075] link 804 be at or near the top of the list of labels in the table of contents page 810 to facilitate navigation and to permit the user to easily locate the label associated with the selected link. Thus, it is desirable that the label 812 of the table of contents page 810 be associated with the selected link to permit the user to quickly and easily identify the subdocument associated with the selected link 804. The user may then select the label 812, which comprises a link to the subdocument 820 containing the text 822.
  • Referring to FIGS. 7 and 8, the user at a client [0076] 102 (FIG. 1) views a subdocument 802 at a display 112 of the client 102. As shown in FIG. 8, the subdocument 802 includes text 806 and one or more links 804. Pursuant to block 701 of FIG. 7, the user selects one of the links 804 of the subdocument 802.
  • The user selection of the [0077] link 804 pursuant to block 701 causes the client 102 to transmit a request for a document associated with the link 804 selected by the user. Pursuant to block 702, the application server 124 receives the request for document from the client 102 (FIG. 1), the requested document residing on at least one of the servers 104, 106, 108. The request for document may be directed to the application server 124 directly or to one of the servers 104, 106, 108, which, in turn, redirects the request for document to the application server 124.
  • Next, at block [0078] 704, the application server 124 retrieves the requested document from one or more of the servers 104, 106, 108 on which the document resides. This retrieval may be accomplished as described above. At block 706, the formatter 126 of the application server 124 extracts a structure of the retrieved document as described above.
  • For individual nodes of the tree data structure that include document content, the [0079] formatter 126 next analyzes the content of the nodes and assigns one of a set of predefined classifiers to each of the nodes based on the content of the nodes, pursuant to block 708 as discussed above. At block 710, the label engine 306 of the formatter 126 assigns labels to individual nodes of the tree data structure that include document content as discussed above. At block 712, the label engine 306 merges nodes having content according to their classification and, at block 714, the data structure converter 308 of the mapper 202 converts the tree data structure to a list, as discussed above.
  • At [0080] block 716, the ranking engine 310 (FIG. 3) compares the text 806 of the previous subdocument 802 to each of the subdocuments of the requested document using conventional document, or text, matching techniques to determine the extent to which the previous subdocument is associated with each of the subdocuments of the requested document. The ranking engine 310 may employ an n-dimensional vector matching technique for comparing the text of the previous subdocument 802 to each of the subdocuments of the requested document. Modern Information Retrieval, by R. Baeza-Yates, et al, published by Addison-Wesley Pub Co; 1999, ISBN: 020139829X, discloses related techniques and is incorporated herein by reference.
  • In comparing the [0081] text 806 of the previous subdocument 802 to each of the subdocuments of the requested document, the ranking engine 310 generates a document/document value for each of the subdocuments of the requested document. The document/document value indicates the degree to which there is an association between the text 806 of the previous subdocument 802 and each of the subdocuments of the requested document. For example, if the text 806 of the subdocument 802 included terms such as “XYZ,” “merger,” “corporate,” “shareholders” and the like, the ranking engine 310 would assign a higher degree of association, and thus either a higher or lower document/document value, to subdocuments in the requested page that include the same or similar terms.
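One conventional vector matching technique is cosine similarity over term-frequency vectors. The sketch below uses it as an assumed stand-in for the n-dimensional vector matching mentioned above; the description does not state which technique is actually used. All function names are hypothetical, tokenization is a simple lowercase alphanumeric split, and a higher value is taken to mean a stronger association.

```python
import math
import re
from collections import Counter


def term_vector(text):
    # Assumption: terms are lowercase alphanumeric runs, weighted by raw frequency.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def document_document_values(previous_text, subdocuments):
    """One value per subdocument; higher means more terms shared with previous_text."""
    prev = term_vector(previous_text)
    return [cosine_similarity(prev, term_vector(s)) for s in subdocuments]


values = document_document_values(
    "XYZ merger approved by corporate shareholders",
    ["Shareholders of XYZ vote on the merger", "Weather forecast: sunny and warm"],
)
```

In this example the first subdocument shares three terms with the previous text and so receives the larger value, while the second shares none and receives zero.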
  • At [0082] block 718, the ranking engine 310 (FIG. 3) compares the text of the selected link 804 to each of the subdocuments of the requested document. For example, if the selected link 804 comprised the text “XYZ merger” the ranking engine 310 would determine the degree to which the text “XYZ merger” is present in each of the subdocuments of the requested document. The ranking engine 310 generates a link/document value for each of the subdocuments of the requested document. The link/document value indicates the degree to which the text of the selected link is present in each of the subdocuments of the requested document.
  • At [0083] block 720, the ranking engine 310 compares the text of the selected link 804 to each of the labels of the requested document. For example, if the selected link 804 comprised the text “XYZ merger” the ranking engine 310 would determine the degree to which the text “XYZ merger” is present in each of the labels assigned to the requested document and would generate a link/label value for each of the subdocuments of the requested document. The link/label value indicates the degree to which the text of the selected link and the subdocuments of the requested document are related.
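Both the link/document value and the link/label value measure how much of the link text is present in some target text, so a single helper can illustrate the two comparisons. The sketch below assumes the "degree to which the text is present" is the fraction of distinct link terms found in the target; the function name `link_match_value` and this scoring rule are hypothetical.

```python
import re


def tokens(text):
    # Assumption: terms are lowercase alphanumeric runs; duplicates are ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def link_match_value(link_text, target_text):
    """Fraction of the link's terms that appear in the target (0.0 to 1.0)."""
    link_terms = tokens(link_text)
    if not link_terms:
        return 0.0
    return len(link_terms & tokens(target_text)) / len(link_terms)
```

Passing a subdocument's text as the target gives a link/document value, and passing its label gives a link/label value.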
  • The [0084] ranking engine 310 may also use additional factors in reordering the list entries. For example, the ranking engine may generate a size value indicative of an amount of text in each of the plurality of subdocuments. Pursuant to this embodiment, the size value may be larger for subdocuments comprising large amounts of text and the size value may be smaller for subdocuments comprising smaller amounts of text. Ranking the entries in the table of contents, at least in part, according to the size value tends to make entries associated with larger amounts of text appear higher on the list of entries in the table of contents.
  • At [0085] block 722, the ranking engine 310 reorders the list entries according to the document/document value, the link/document value, the link/label value or a combination of these values. In one example embodiment, the ranking engine 310 assigns a weight to each of the document/document, link/document, and link/label values and then combines the weighted values to determine the reordering of the list entries. Pursuant to another embodiment, the ranking engine 310 reorders the list entries according to one or more of the document/document, link/document, and link/label values and other factors, including, for example, amount of content in the subdocument, the size of the font used in the subdocument, whether the text in the subdocument is boldface, the color of the text in the subdocument, whether the text of the subdocument is flashing, and the position of the item in the document.
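The weighted combination described in the example embodiment can be sketched as a simple weighted sum. The weight values, the key names, and the functions `combined_score` and `reorder` are illustrative assumptions; the description does not state how the weights are chosen or combined.

```python
def combined_score(values, weights):
    # Weighted sum of whichever values are present; missing values count as 0.0.
    return sum(weights[name] * values.get(name, 0.0) for name in weights)


def reorder(entries, weights):
    """entries: list of dicts with 'label' and 'values'; highest score first."""
    return sorted(entries,
                  key=lambda e: combined_score(e["values"], weights),
                  reverse=True)


# Hypothetical weights for the three comparison values.
weights = {"doc_doc": 0.5, "link_doc": 0.3, "link_label": 0.2}
entries = [
    {"label": "Weather", "values": {"doc_doc": 0.0, "link_doc": 0.0, "link_label": 0.0}},
    {"label": "XYZ merger", "values": {"doc_doc": 0.6, "link_doc": 1.0, "link_label": 1.0}},
    {"label": "Markets", "values": {"doc_doc": 0.4, "link_doc": 0.5, "link_label": 0.0}},
]
ordered_labels = [e["label"] for e in reorder(entries, weights)]
```

Additional factors such as the size value or font attributes could be added as further named values with their own weights.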
  • After the [0086] ranking engine 310 has reordered the list entries, execution returns to block 618 of the flowchart of FIG. 6 as described above. Performing one or more of the blocks 716, 718, 720 together with the block 722 improves user navigation. In particular, this functionality increases the probability that the label listed at or near the top of the table of contents 810 will be associated with the selected link, the text of the subdocument including the selected link, or both.
  • The above-described embodiments of the present invention are meant to be merely illustrative and not limiting. Thus, those skilled in the art will appreciate that various changes and modifications may be made without departing from this invention in its broader aspects. Therefore, the appended claims encompass such changes and modifications as fall within the scope of this invention. [0087]

Claims (20)

What is claimed is:
1. A method of ranking entries in a table of contents, the method comprising:
transmitting a first document to a client device, the first subdocument including a text and a link;
receiving from the client device a request for a second document associated with the link;
dividing the second document into subdocuments, each of the subdocuments including text;
assigning a label to each of a plurality of the subdocuments;
comparing the text of the first document with the text of each of the subdocuments to generate a document-document value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, on the document-document values.
2. The method of claim 1, further comprising transmitting a table of contents page listing the labels associated with the plurality of subdocuments in the ranked order.
3. The method of claim 1, wherein the link comprises text, the method further comprising:
comparing the text of the link with the text of each of the subdocuments to generate a link-text value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, on the link-text values.
4. The method of claim 1, wherein the link comprises text, the method further comprising:
comparing the text of the link with the label assigned to each of the plurality of subdocuments to generate a link-label value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, on the link-label values.
5. The method of claim 1, further comprising;
displaying the first document at the client device;
detecting user selection of the link;
transmitting the request for a second document associated with the link in response to the detecting user selection of the link.
6. The method of claim 1, further comprising:
generating a size value indicative of an amount of text in each of the plurality of subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, by the size value.
7. The method of claim 3, further comprising:
generating a size value indicative of an amount of text in each of the plurality of subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, by the size value.
8. The method of claim 4, further comprising:
generating a size value indicative of an amount of text in each of the plurality of subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, by the size value.
9. The method of claim 3, wherein the link comprises text, the method further comprising:
comparing the text of the link with the label assigned to each of the plurality of subdocuments to generate a link-label value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, on the link-label values.
10. The method of claim 1, further comprising modifying the ranked order according to a predefined order defined by a data structure.
11. A method of ranking entries in a table of contents, the method comprising:
transmitting a first document to a client device, the first subdocument including a text and a link;
receiving from the client device a request for a second document associated with the link;
dividing the second document into subdocuments, each of the subdocuments including text;
assigning a label to each of a plurality of the subdocuments;
comparing the text of the link with the text of each of the subdocuments to generate a link-text value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, on the link-text values.
12. The method of claim 11, wherein the link comprises text, the method further comprising:
comparing the text of the link with the label assigned to each of the plurality of subdocuments to generate a link-label value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, on the link-label values.
13. The method of claim 11, further comprising;
displaying the first document at the client device;
detecting user selection of the link;
transmitting the request for a second document associated with the link in response to the detecting user selection of the link.
14. The method of claim 11, further comprising:
generating a size value indicative of an amount of text in each of the plurality of subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, by the size value.
15. A method of ranking entries in a table of contents, the method comprising:
transmitting a first document to a client device, the first subdocument including a text and a link;
receiving from the client device a request for a second document associated with the link;
dividing the second document into subdocuments, each of the subdocuments including text;
assigning a label to each of a plurality of the subdocuments;
comparing the text of the link with the label assigned to each of the plurality of subdocuments to generate a link-label value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, on the link-label values.
16. The method of claim 15, further comprising;
displaying the first document at the client device;
detecting user selection of the link;
transmitting the request for a second document associated with the link in response to the detecting user selection of the link.
17. The method of claim 16, further comprising:
generating a size value indicative of an amount of text in each of the plurality of subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, by the size value.
18. A method of ranking entries in a table of contents, the method comprising:
transmitting a first document to a client device, the first subdocument including a text and a link;
receiving from the client device a request for a second document associated with the link;
dividing the second document into subdocuments, each of the subdocuments including text;
comparing the text of the first document with the text of each of the subdocuments to generate a document-document value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
comparing the text of the link with the label assigned to each of the plurality of subdocuments to generate a link-label value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
comparing the text of the link with the text of each of the subdocuments to generate a link-text value for each of the plurality of subdocuments according to the degree of association between the first document and the corresponding one of the subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, on the document-document, link-label, and link-text values.
19. The method of claim 18, further comprising:
generating a size value indicative of an amount of text in each of the plurality of subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, by the size value.
20. A method of ranking entries in a table of contents, the method comprising:
transmitting a first document to a client device, the first subdocument including a text and a link;
receiving from the client device a request for a second document associated with the link;
dividing the second document into subdocuments, each of the subdocuments including text;
assigning a label to each of a plurality of the subdocuments; generating a size value indicative of an amount of text in each of the plurality of subdocuments;
ranking in ranked order the plurality of subdocuments based, at least in part, by the size value.
US10/142,393 2001-02-16 2002-05-08 Automatic display of web content to smaller display devices: improved summarization and navigation Abandoned US20040003028A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/142,393 US20040003028A1 (en) 2002-05-08 2002-05-08 Automatic display of web content to smaller display devices: improved summarization and navigation
US11/510,467 US8983949B2 (en) 2001-02-16 2006-08-24 Automatic display of web content to smaller display devices: improved summarization and navigation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/142,393 US20040003028A1 (en) 2002-05-08 2002-05-08 Automatic display of web content to smaller display devices: improved summarization and navigation

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/076,786 Continuation US20020129006A1 (en) 2001-02-16 2002-02-14 System and method for modifying a document format

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/510,467 Continuation US8983949B2 (en) 2001-02-16 2006-08-24 Automatic display of web content to smaller display devices: improved summarization and navigation

Publications (1)

Publication Number Publication Date
US20040003028A1 true US20040003028A1 (en) 2004-01-01

Family

ID=29778486

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/142,393 Abandoned US20040003028A1 (en) 2001-02-16 2002-05-08 Automatic display of web content to smaller display devices: improved summarization and navigation
US11/510,467 Expired - Lifetime US8983949B2 (en) 2001-02-16 2006-08-24 Automatic display of web content to smaller display devices: improved summarization and navigation

Family Applications After (1)

Application Number Title Priority Date Filing Date
US11/510,467 Expired - Lifetime US8983949B2 (en) 2001-02-16 2006-08-24 Automatic display of web content to smaller display devices: improved summarization and navigation

Country Status (1)

Country Link
US (2) US20040003028A1 (en)

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030158855A1 (en) * 2002-02-20 2003-08-21 Farnham Shelly D. Computer system architecture for automatic context associations
US20030167324A1 (en) * 2002-02-20 2003-09-04 Farnham Shelly D. Social mapping of contacts from computer communication information
US20040243936A1 (en) * 2003-05-30 2004-12-02 International Business Machines Corporation Information processing apparatus, program, and recording medium
US20060069982A1 (en) * 2004-09-30 2006-03-30 Microsoft Corporation Click distance determination
US20060074903A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation System and method for ranking search results using click distance
US20060074871A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation System and method for incorporating anchor text into ranking search results
US20060123000A1 (en) * 2004-12-03 2006-06-08 Jonathan Baxter Machine learning system for extracting structured records from web pages and other text sources
US20060136411A1 (en) * 2004-12-21 2006-06-22 Microsoft Corporation Ranking search results using feature extraction
US20060155703A1 (en) * 2005-01-10 2006-07-13 Xerox Corporation Method and apparatus for detecting a table of contents and reference determination
US20060200460A1 (en) * 2005-03-03 2006-09-07 Microsoft Corporation System and method for ranking search results using file types
US20060294100A1 (en) * 2005-03-03 2006-12-28 Microsoft Corporation Ranking search results using language types
US20070061415A1 (en) * 2001-02-16 2007-03-15 David Emmett Automatic display of web content to smaller display devices: improved summarization and navigation
US20070093241A1 (en) * 2005-10-21 2007-04-26 Lg Electronics Inc. Mobile communication terminal for providing contents and method thereof
US20070198912A1 (en) * 2006-02-23 2007-08-23 Xerox Corporation Rapid similarity links computation for table of contents determination
EP1826684A1 (en) * 2006-02-23 2007-08-29 Xerox Corporation Table of contents extraction with improved robustness
US20080065671A1 (en) * 2006-09-07 2008-03-13 Xerox Corporation Methods and apparatuses for detecting and labeling organizational tables in a document
US20080244381A1 (en) * 2007-03-30 2008-10-02 Alex Nicolaou Document processing for mobile devices
US20080288859A1 (en) * 2002-10-31 2008-11-20 Jianwei Yuan Methods and apparatus for summarizing document content for mobile communication devices
US20090106235A1 (en) * 2007-10-18 2009-04-23 Microsoft Corporation Document Length as a Static Relevance Feature for Ranking Search Results
US20090106223A1 (en) * 2007-10-18 2009-04-23 Microsoft Corporation Enterprise relevancy ranking using a neural network
US20090106221A1 (en) * 2007-10-18 2009-04-23 Microsoft Corporation Ranking and Providing Search Results Based In Part On A Number Of Click-Through Features
US20090110268A1 (en) * 2007-10-25 2009-04-30 Xerox Corporation Table of contents extraction based on textual similarity and formal aspects
US20090259651A1 (en) * 2008-04-11 2009-10-15 Microsoft Corporation Search results ranking using editing distance and document information
US20100017403A1 (en) * 2004-09-27 2010-01-21 Microsoft Corporation System and method for scoping searches using index keys
US8302002B2 (en) 2005-04-27 2012-10-30 Xerox Corporation Structuring document based on table of contents
US8316315B2 (en) 2005-02-28 2012-11-20 Microsoft Corporation Automatically generated highlight view of electronic interactions
JP2013003694A (en) * 2011-06-14 2013-01-07 Kddi Corp Id assigning device, method and program
US20130262983A1 (en) * 2012-03-30 2013-10-03 Bmenu As System, method, software arrangement and computer-accessible medium for a generator that automatically identifies regions of interest in electronic documents for transcoding
US8738635B2 (en) 2010-06-01 2014-05-27 Microsoft Corporation Detection of junk in search result ranking
US9495462B2 (en) 2012-01-27 2016-11-15 Microsoft Technology Licensing, Llc Re-ranking search results
US10725645B2 (en) * 2013-05-20 2020-07-28 Rakuten, Inc. Information processing device for controlling display of web pages using main display area and sub display area
US11100070B2 (en) 2005-04-29 2021-08-24 Robert T. and Virginia T. Jenkins Manipulation and/or analysis of hierarchical data
US11204906B2 (en) 2004-02-09 2021-12-21 Robert T. And Virginia T. Jenkins As Trustees Of The Jenkins Family Trust Dated Feb. 8, 2002 Manipulating sets of hierarchical data
US11243975B2 (en) 2005-02-28 2022-02-08 Robert T. and Virginia T. Jenkins Method and/or system for transforming between trees and strings
US11281646B2 (en) 2004-12-30 2022-03-22 Robert T. and Virginia T. Jenkins Enumeration of rooted partial subtrees
US11314766B2 (en) 2004-10-29 2022-04-26 Robert T. and Virginia T. Jenkins Method and/or system for manipulating tree expressions
US11314709B2 (en) 2004-10-29 2022-04-26 Robert T. and Virginia T. Jenkins Method and/or system for tagging trees
US11418315B2 (en) 2004-11-30 2022-08-16 Robert T. and Virginia T. Jenkins Method and/or system for transmitting and/or receiving data
US11615065B2 (en) 2004-11-30 2023-03-28 Lower48 Ip Llc Enumeration of trees from finite number of nodes
US11663238B2 (en) 2005-01-31 2023-05-30 Lower48 Ip Llc Method and/or system for tree transformation

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085740A1 (en) * 2004-10-20 2006-04-20 Microsoft Corporation Parsing hierarchical lists and outlines
US8509563B2 (en) * 2006-02-02 2013-08-13 Microsoft Corporation Generation of documents from images
US7793216B2 (en) * 2006-03-28 2010-09-07 Microsoft Corporation Document processor and re-aggregator
US8196030B1 (en) * 2008-06-02 2012-06-05 Pricewaterhousecoopers Llp System and method for comparing and reviewing documents
US8984395B2 (en) * 2008-06-19 2015-03-17 Opera Software Asa Methods, systems and devices for transcoding and displaying electronic documents
CN101944104A (en) * 2010-08-19 2011-01-12 百度在线网络技术(北京)有限公司 Evaluation method and equipment for importance of webpage sub-blocks
KR101974867B1 (en) * 2012-08-24 2019-08-23 삼성전자주식회사 Apparatus and method for auto storage of URL to calculate contents of stay value in an electronic device
US8504827B1 (en) 2013-02-27 2013-08-06 WebFilings LLC Document server and client device document viewer and editor
US10042924B2 (en) 2016-02-09 2018-08-07 Oath Inc. Scalable and effective document summarization framework

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5737442A (en) * 1995-10-20 1998-04-07 Bcl Computers Processor based method for extracting tables from printed documents
US5983267A (en) * 1997-09-23 1999-11-09 Information Architects Corporation System for indexing and displaying requested data having heterogeneous content and representation
US6104500A (en) * 1998-04-29 2000-08-15 Bcl, Computer Inc. Networked fax routing via email
US6298357B1 (en) * 1997-06-03 2001-10-02 Adobe Systems Incorporated Structure extraction on electronic documents
US6336124B1 (en) * 1998-10-01 2002-01-01 Bcl Computers, Inc. Conversion data representing a document to other formats for manipulation and display
US20020059367A1 (en) * 2000-09-27 2002-05-16 Romero Richard D. Segmenting electronic documents for use on a device of limited capability
US20020062325A1 (en) * 2000-09-27 2002-05-23 Berger Adam L. Configurable transformation of electronic documents
US20020073235A1 (en) * 2000-12-11 2002-06-13 Chen Steve X. System and method for content distillation
US6438575B1 (en) * 2000-06-07 2002-08-20 Clickmarks, Inc. System, method, and article of manufacture for wireless enablement of the world wide web using a wireless gateway
US6457030B1 (en) * 1999-01-29 2002-09-24 International Business Machines Corporation Systems, methods and computer program products for modifying web content for display via pervasive computing devices
US20050055420A1 (en) * 2000-02-01 2005-03-10 Infogin Ltd. Methods and apparatus for analyzing, processing and formatting network information such as web-pages

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6070133A (en) * 1997-07-21 2000-05-30 Battelle Memorial Institute Information retrieval system utilizing wavelet transform
US5907840A (en) * 1997-07-25 1999-05-25 Claritech Corporation Overlapping subdocuments in a vector space search process
US5995962A (en) * 1997-07-25 1999-11-30 Claritech Corporation Sort system for merging database entries
US5999925A (en) * 1997-07-25 1999-12-07 Claritech Corporation Information retrieval based on use of sub-documents
US6278990B1 (en) * 1997-07-25 2001-08-21 Claritech Corporation Sort system for text retrieval
US5893094A (en) * 1997-07-25 1999-04-06 Claritech Corporation Method and apparatus using run length encoding to evaluate a database
US5999939A (en) * 1997-12-21 1999-12-07 Interactive Search, Inc. System and method for displaying and entering interactively modified stream data into a structured form
US6857102B1 (en) * 1998-04-07 2005-02-15 Fuji Xerox Co., Ltd. Document re-authoring systems and methods for providing device-independent access to the world wide web
US6128655A (en) * 1998-07-10 2000-10-03 International Business Machines Corporation Distribution mechanism for filtering, formatting and reuse of web based content
US6421656B1 (en) * 1998-10-08 2002-07-16 International Business Machines Corporation Method and apparatus for creating structure indexes for a data base extender
US6125391A (en) * 1998-10-16 2000-09-26 Commerce One, Inc. Market makers using documents for commerce in trading partner networks
US6377957B1 (en) * 1998-12-29 2002-04-23 Sun Microsystems, Inc. Propogating updates efficiently in hierarchically structured data
US6473755B2 (en) * 1999-01-04 2002-10-29 Claritech Corporation Overlapping subdocuments in a vector space search process
US6549916B1 (en) * 1999-08-05 2003-04-15 Oracle Corporation Event notification system tied to a file system
US6812941B1 (en) * 1999-12-09 2004-11-02 International Business Machines Corp. User interface management through view depth
US6549221B1 (en) * 1999-12-09 2003-04-15 International Business Machines Corp. User interface management through branch isolation
AU2001229942A1 (en) * 2000-01-31 2001-08-14 Mobileq Canada Inc. Method and system for building internet-based applications
US7660819B1 (en) * 2000-07-31 2010-02-09 Alion Science And Technology Corporation System for similar document detection
US20020099739A1 (en) * 2001-01-03 2002-07-25 Herman Fischer Transformation and processing of Web form documents and data for small footprint devices
US20040003028A1 (en) * 2002-05-08 2004-01-01 David Emmett Automatic display of web content to smaller display devices: improved summarization and navigation
US7565605B2 (en) * 2001-05-08 2009-07-21 Nokia, Inc. Reorganizing content of an electronic document
US20040133635A1 (en) * 2002-11-26 2004-07-08 Axel Spriestersbach Transformation of web description documents
GB2401215A (en) * 2003-05-02 2004-11-03 David Nicholas Rousseau Digital Library System
US20070198952A1 (en) * 2006-02-21 2007-08-23 Pittenger Robert A Methods and systems for authoring of a compound document following a hierarchical structure

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5737442A (en) * 1995-10-20 1998-04-07 Bcl Computers Processor based method for extracting tables from printed documents
US5956422A (en) * 1995-10-20 1999-09-21 Bcl Computers, Inc. Processor based method for extracting tablets from printed documents
US6298357B1 (en) * 1997-06-03 2001-10-02 Adobe Systems Incorporated Structure extraction on electronic documents
US5983267A (en) * 1997-09-23 1999-11-09 Information Architects Corporation System for indexing and displaying requested data having heterogeneous content and representation
US6253239B1 (en) * 1997-09-23 2001-06-26 Information Architects Corporation System for indexing and display requested data having heterogeneous content and representation
US6104500A (en) * 1998-04-29 2000-08-15 Bcl, Computer Inc. Networked fax routing via email
US6336124B1 (en) * 1998-10-01 2002-01-01 Bcl Computers, Inc. Conversion data representing a document to other formats for manipulation and display
US6457030B1 (en) * 1999-01-29 2002-09-24 International Business Machines Corporation Systems, methods and computer program products for modifying web content for display via pervasive computing devices
US20050055420A1 (en) * 2000-02-01 2005-03-10 Infogin Ltd. Methods and apparatus for analyzing, processing and formatting network information such as web-pages
US6438575B1 (en) * 2000-06-07 2002-08-20 Clickmarks, Inc. System, method, and article of manufacture for wireless enablement of the world wide web using a wireless gateway
US20020059367A1 (en) * 2000-09-27 2002-05-16 Romero Richard D. Segmenting electronic documents for use on a device of limited capability
US20020062325A1 (en) * 2000-09-27 2002-05-23 Berger Adam L. Configurable transformation of electronic documents
US20020073235A1 (en) * 2000-12-11 2002-06-13 Chen Steve X. System and method for content distillation

Cited By (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8983949B2 (en) * 2001-02-16 2015-03-17 David Emmett Automatic display of web content to smaller display devices: improved summarization and navigation
US20070061415A1 (en) * 2001-02-16 2007-03-15 David Emmett Automatic display of web content to smaller display devices: improved summarization and navigation
US7167910B2 (en) 2002-02-20 2007-01-23 Microsoft Corporation Social mapping of contacts from computer communication information
US20030167324A1 (en) * 2002-02-20 2003-09-04 Farnham Shelly D. Social mapping of contacts from computer communication information
US8069186B2 (en) 2002-02-20 2011-11-29 Microsoft Corporation Computer system architecture for automatic context associations
US7761549B2 (en) 2002-02-20 2010-07-20 Microsoft Corporation Social mapping of contacts from computer communication information
US20080222170A1 (en) * 2002-02-20 2008-09-11 Microsoft Corporation Computer system architecture for automatic context associations
US7343365B2 (en) * 2002-02-20 2008-03-11 Microsoft Corporation Computer system architecture for automatic context associations
US20030158855A1 (en) * 2002-02-20 2003-08-21 Farnham Shelly D. Computer system architecture for automatic context associations
US20080288859A1 (en) * 2002-10-31 2008-11-20 Jianwei Yuan Methods and apparatus for summarizing document content for mobile communication devices
US8572482B2 (en) * 2002-10-31 2013-10-29 Blackberry Limited Methods and apparatus for summarizing document content for mobile communication devices
US7383496B2 (en) * 2003-05-30 2008-06-03 International Business Machines Corporation Information processing apparatus, program, and recording medium
US20040243936A1 (en) * 2003-05-30 2004-12-02 International Business Machines Corporation Information processing apparatus, program, and recording medium
US11204906B2 (en) 2004-02-09 2021-12-21 Robert T. And Virginia T. Jenkins As Trustees Of The Jenkins Family Trust Dated Feb. 8, 2002 Manipulating sets of hierarchical data
US8843486B2 (en) 2004-09-27 2014-09-23 Microsoft Corporation System and method for scoping searches using index keys
US20100017403A1 (en) * 2004-09-27 2010-01-21 Microsoft Corporation System and method for scoping searches using index keys
US7827181B2 (en) 2004-09-30 2010-11-02 Microsoft Corporation Click distance determination
US7761448B2 (en) 2004-09-30 2010-07-20 Microsoft Corporation System and method for ranking search results using click distance
US20060074903A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation System and method for ranking search results using click distance
US7739277B2 (en) 2004-09-30 2010-06-15 Microsoft Corporation System and method for incorporating anchor text into ranking search results
US20060069982A1 (en) * 2004-09-30 2006-03-30 Microsoft Corporation Click distance determination
US20060074871A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation System and method for incorporating anchor text into ranking search results
US8082246B2 (en) 2004-09-30 2011-12-20 Microsoft Corporation System and method for ranking search results using click distance
US11314709B2 (en) 2004-10-29 2022-04-26 Robert T. and Virginia T. Jenkins Method and/or system for tagging trees
US11314766B2 (en) 2004-10-29 2022-04-26 Robert T. and Virginia T. Jenkins Method and/or system for manipulating tree expressions
US11418315B2 (en) 2004-11-30 2022-08-16 Robert T. and Virginia T. Jenkins Method and/or system for transmitting and/or receiving data
US11615065B2 (en) 2004-11-30 2023-03-28 Lower48 Ip Llc Enumeration of trees from finite number of nodes
US20060123000A1 (en) * 2004-12-03 2006-06-08 Jonathan Baxter Machine learning system for extracting structured records from web pages and other text sources
US7716198B2 (en) * 2004-12-21 2010-05-11 Microsoft Corporation Ranking search results using feature extraction
US20060136411A1 (en) * 2004-12-21 2006-06-22 Microsoft Corporation Ranking search results using feature extraction
US11281646B2 (en) 2004-12-30 2022-03-22 Robert T. and Virginia T. Jenkins Enumeration of rooted partial subtrees
US8706475B2 (en) 2005-01-10 2014-04-22 Xerox Corporation Method and apparatus for detecting a table of contents and reference determination
US20060155703A1 (en) * 2005-01-10 2006-07-13 Xerox Corporation Method and apparatus for detecting a table of contents and reference determination
US11663238B2 (en) 2005-01-31 2023-05-30 Lower48 Ip Llc Method and/or system for tree transformation
US11243975B2 (en) 2005-02-28 2022-02-08 Robert T. and Virginia T. Jenkins Method and/or system for transforming between trees and strings
US8316315B2 (en) 2005-02-28 2012-11-20 Microsoft Corporation Automatically generated highlight view of electronic interactions
US20060294100A1 (en) * 2005-03-03 2006-12-28 Microsoft Corporation Ranking search results using language types
US7792833B2 (en) 2005-03-03 2010-09-07 Microsoft Corporation Ranking search results using language types
US20060200460A1 (en) * 2005-03-03 2006-09-07 Microsoft Corporation System and method for ranking search results using file types
US8302002B2 (en) 2005-04-27 2012-10-30 Xerox Corporation Structuring document based on table of contents
US11100070B2 (en) 2005-04-29 2021-08-24 Robert T. and Virginia T. Jenkins Manipulation and/or analysis of hierarchical data
US11194777B2 (en) * 2005-04-29 2021-12-07 Robert T. And Virginia T. Jenkins As Trustees Of The Jenkins Family Trust Dated Feb. 8, 2002 Manipulation and/or analysis of hierarchical data
US20070093241A1 (en) * 2005-10-21 2007-04-26 Lg Electronics Inc. Mobile communication terminal for providing contents and method thereof
US8073437B2 (en) * 2005-10-21 2011-12-06 Lg Electronics Inc. Mobile communication terminal for providing contents and method thereof
EP1826684A1 (en) * 2006-02-23 2007-08-29 Xerox Corporation Table of contents extraction with improved robustness
US7743327B2 (en) 2006-02-23 2010-06-22 Xerox Corporation Table of contents extraction with improved robustness
US20070198912A1 (en) * 2006-02-23 2007-08-23 Xerox Corporation Rapid similarity links computation for table of contents determination
JP2007226797A (en) * 2006-02-23 2007-09-06 Xerox Corp Rapid similarity links computation for table of contents determination
US7890859B2 (en) * 2006-02-23 2011-02-15 Xerox Corporation Rapid similarity links computation for table of contents determination
US20080065671A1 (en) * 2006-09-07 2008-03-13 Xerox Corporation Methods and apparatuses for detecting and labeling organizational tables in a document
US20080244381A1 (en) * 2007-03-30 2008-10-02 Alex Nicolaou Document processing for mobile devices
US20090106223A1 (en) * 2007-10-18 2009-04-23 Microsoft Corporation Enterprise relevancy ranking using a neural network
US9348912B2 (en) 2007-10-18 2016-05-24 Microsoft Technology Licensing, Llc Document length as a static relevance feature for ranking search results
US7840569B2 (en) 2007-10-18 2010-11-23 Microsoft Corporation Enterprise relevancy ranking using a neural network
US20090106235A1 (en) * 2007-10-18 2009-04-23 Microsoft Corporation Document Length as a Static Relevance Feature for Ranking Search Results
US20090106221A1 (en) * 2007-10-18 2009-04-23 Microsoft Corporation Ranking and Providing Search Results Based In Part On A Number Of Click-Through Features
US20090110268A1 (en) * 2007-10-25 2009-04-30 Xerox Corporation Table of contents extraction based on textual similarity and formal aspects
US9224041B2 (en) 2007-10-25 2015-12-29 Xerox Corporation Table of contents extraction based on textual similarity and formal aspects
US20090259651A1 (en) * 2008-04-11 2009-10-15 Microsoft Corporation Search results ranking using editing distance and document information
US8812493B2 (en) 2008-04-11 2014-08-19 Microsoft Corporation Search results ranking using editing distance and document information
US8738635B2 (en) 2010-06-01 2014-05-27 Microsoft Corporation Detection of junk in search result ranking
JP2013003694A (en) * 2011-06-14 2013-01-07 Kddi Corp Id assigning device, method and program
US9495462B2 (en) 2012-01-27 2016-11-15 Microsoft Technology Licensing, Llc Re-ranking search results
US20130262983A1 (en) * 2012-03-30 2013-10-03 Bmenu As System, method, software arrangement and computer-accessible medium for a generator that automatically identifies regions of interest in electronic documents for transcoding
US9535888B2 (en) * 2012-03-30 2017-01-03 Bmenu As System, method, software arrangement and computer-accessible medium for a generator that automatically identifies regions of interest in electronic documents for transcoding
US10725645B2 (en) * 2013-05-20 2020-07-28 Rakuten, Inc. Information processing device for controlling display of web pages using main display area and sub display area

Also Published As

Publication number Publication date
US20070061415A1 (en) 2007-03-15
US8983949B2 (en) 2015-03-17

Similar Documents

Publication Publication Date Title
US8983949B2 (en) Automatic display of web content to smaller display devices: improved summarization and navigation
US20020129006A1 (en) System and method for modifying a document format
US6857102B1 (en) Document re-authoring systems and methods for providing device-independent access to the world wide web
US6789075B1 (en) Method and system for prioritized downloading of embedded web objects
US6674453B1 (en) Service portal for links separated from Web content
US7574486B1 (en) Web page content translator
US6670968B1 (en) System and method for displaying and navigating links
JP3437929B2 (en) Method for organizing data in a data processing system, communication network, method for organizing electronic documents, and electronic mail system
KR101342067B1 (en) Displaying information on a mobile device
US7176931B2 (en) Modifying hyperlink display characteristics
KR100461019B1 (en) web contents transcoding system and method for small display devices
US9524353B2 (en) Method and system for providing portions of information content to a client device
US20030004984A1 (en) Methods for transcoding webpage and creating personal profile
US20030029911A1 (en) System and method for converting digital content
WO2001029636A2 (en) Intelligent harvesting and navigation system and method
US20020016801A1 (en) Adaptive profile-based mobile document integration
US20040205620A1 (en) Information distributing program, computer-readable recording medium recorded with information distributing program, information distributing apparatus and information distributing method
US20080153467A1 (en) Methods and apparatus for enabling use of web content on various types of devices
US20020069296A1 (en) Internet content reformatting apparatus and method
US20020047856A1 (en) Web based stacked images
EP1412867A1 (en) System and method for converting an attachment in an e-mail for delivery to a device of limited rendering capability
WO2004040481A1 (en) A system and method for providing and displaying information content
US7797447B1 (en) Data detector for creating links from web content for mobile devices
US20020026472A1 (en) Service request method and system using input sensitive specifications on wired and wireless networks
WO2001073560A1 (en) Contents providing system

Legal Events

Date Code Title Description
AS Assignment

Owner name: TIMBERLINE VENTURE PARTNERS, L.P., WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WIREDPOCKET;REEL/FRAME:014415/0439

Effective date: 20021122

Owner name: UNISITE SOFTWARE, INC., WASHINGTON

Free format text: ASSET PURCHASE AGREEMENT;ASSIGNOR:TIMBERLINE VENTURE PARTNERS, L.P.;REEL/FRAME:015195/0776

Effective date: 20030613

Owner name: STIRLING BRIDGE, INC., CALIFORNIA

Free format text: ASSET PURCHASE AGREEMENT;ASSIGNOR:UNISITE SOFTWARE, INC.;REEL/FRAME:014415/0332

Effective date: 20030613

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION