CN103294791A - Extensible markup language pattern matching method - Google Patents

Publication number
CN103294791A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310192029XA
Other languages
Chinese (zh)
Inventor
霍红卫
郭海涛
高培
张懿璞
于强
孙春晓
郭鸿志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201310192029XA
Publication of CN103294791A
Legal status: Pending

Abstract

The invention discloses an Extensible Markup Language (XML) schema matching method that addresses shortcomings of the prior art in schema representation, discovery of complex matches, and matching efficiency. The method comprises the following steps: inputting the XML schemas; building schema trees; constructing the sequence structures; matching the names, data types and cardinality constraints of all elements to obtain the linguistic similarity value of every element pair; matching the child, leaf, sibling and ancestor similarity of complex elements to obtain the structural and overall similarity values of complex element pairs, then filtering to find the matched complex element pairs; for each matched complex element pair, finding the atomic element sets corresponding to the pair; applying the non-complex element structure matching method to the elements in those atomic sets to calculate their structural and overall similarity values, filtering to find the matched element pairs, and outputting all matched element pairs. The whole process is automated, and matching efficiency is improved while matching quality is maintained.

Description

An Extensible Markup Language schema matching method
Technical field
The invention belongs to the field of communication technology, and further relates to an Extensible Markup Language (XML) schema matching method in the field of data processing. Based on the name and structure information of the schemas, the invention automatically matches two input XML schema documents and finds the mappings between all similar elements in the two documents, which can be used to determine the similarity between different XML documents.
Background art
With the development of the Internet, XML emerged and became a standard for data representation, data analysis and data exchange on the network. Because of the flexibility of XML descriptions and the steadily growing number and size of XML documents, efficiently managing large-scale XML data and integrating large numbers of XML resources has become very important. XML schema matching, which identifies correspondences between the elements of XML schemas, has therefore become a research focus.
XML schema matching takes two XML schemas as input and uses various similarity computation methods to obtain a mapping between them. It plays an important role in data-sharing applications: in data integration, it can identify and characterize the inter-schema relationships among multiple schemas; in data warehousing, it can map a data source to the warehouse schema; in e-commerce, it can map messages between different XML formats; in the Semantic Web, it can establish semantic correspondences between the ontology concepts of different websites; in data migration, it can migrate legacy data from multiple sources into new data; in data transformation, it can map a source object to a target object; in XML data clustering, it can determine the semantic similarity between different XML documents.
Early schema matching was usually done by hand; manually specifying matches is time-consuming, error-prone and expensive. Many automatic schema matching algorithms and systems have since been proposed, such as LSD (Learning Source Descriptions), Cupid, COMA (COmbination of MAtching algorithms), Similarity Flooding, AgreementMaker, ASMOV (Automated Semantic Matching of Ontologies with Verification) and OII Harmony. Although these algorithms and systems achieve semi-automatic or fully automatic matching with fairly high quality, automatic XML schema matching still has many defects. First, most matching algorithms find only simple (1:1) matches; few methods find complex matches. Second, most matching algorithms mainly consider the overall similarity between schemas and ignore the similarity between individual elements, although element-level similarity can effectively support semi-automatic, labor-intensive activities such as XML schema integration. Finally, and most importantly, most matching systems focus only on matching quality and ignore matching efficiency, so matching large-scale data is extremely slow. For example, using an external dictionary (such as WordNet) for semantic matching of element names improves the accuracy of name matching, but the frequent dictionary lookups greatly increase matching time.
The patent application filed by Nankai University, "XML document structure and semantic similarity computation method based on an extended adjacency matrix" (application number 201010118060.5, publication number CN101799825A), discloses a method for computing the structural and semantic similarity of XML documents based on an extended adjacency matrix. The steps are: first, input the XML documents and encode the XML document trees; second, for the two encoded documents, generate a schema document node list and a data source document node list; third, from the two node lists, generate a schema extended adjacency matrix and a data source extended adjacency matrix; fourth, use the cosine formula to calculate the distance between the two adjacency matrices and obtain the similarity value of the two XML documents. The deficiencies of this application are as follows. First, the method measures similarity only at the document level and does not reach the finer granularity of document elements, so it cannot be used by applications that rely on mappings between XML schema elements. Second, the method measures node similarity using only limited information (node labels, node level information, node encoding information and parent node information), which may produce large errors in the similarity calculation.
The patent application filed by MITRE CORP [US], "TOOLS AND METHODS FOR SEMI-AUTOMATIC SCHEMA MATCHING" (application number US20060491167, publication number US2008021912A1), discloses a semi-automatic XML schema matching tool and method. The steps are: first, input the source and target XML schemas to be matched; second, display the source and target schemas graphically; third, ask whether the user wishes to specify some matches manually; if so, let the user specify them on the displayed schema graphs; otherwise, perform the fourth to seventh steps; fourth, apply linguistic preprocessing to the source and target schemas and have a set of match voters score them; fifth, merge all scores from the fourth step with a vote merger to generate a match matrix; sixth, add structural information to further adjust the scores; seventh, display the matching result graphically; eighth, repeat the third to seventh steps until all match scores have been calculated. The deficiency of this application is as follows: although manual intervention lets semi-automatic matching improve matching quality to some extent, it suits only small-scale data. For larger XML schema documents, specifying matches by hand is tedious, time-consuming and error-prone, which may limit the scale of XML schema data the method can handle.
The patent application filed by MICROSOFT CORP [US], "METHODS AND SYSTEMS FOR MODEL MATCHING" (application number US20010028912, publication number US2003120651A1), discloses a method and system for model or schema matching. The steps are: first, input the source and target XML schemas to be matched; second, parse the two input XML schemas with the Document Object Model; third, convert the generated Document Object Models into a common object model; fourth, perform root attribute matching and structure matching; fifth, return the matching result. The deficiencies of this application are as follows. First, structure matching relies mainly on the similarity of leaf nodes and does not fully consider other structural context of a node, such as its children and siblings, which may lower the quality of structure matching. Second, to improve the structure matching result, subtrees must be traversed repeatedly and node similarity values updated over multiple passes; this improves matching accuracy to some extent, but when large XML schemas are processed it may incur a very large system overhead and thus reduce matching efficiency.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing an XML schema matching method that uses the Consolidated Prüfer Sequence as the intermediate representation of an XML schema and makes full use of linguistic and structural information to find the mappings between all similar elements in two documents. The method is fully automatic and improves matching efficiency while maintaining matching quality, solving the problems that existing schema matching encounters in schema representation, discovery of complex matches, and matching efficiency.
To achieve the above object, the invention comprises the following steps:
(1) Input the two XML schema documents to be matched.
(2) Build the schema trees:
Parse the two XML schema documents to be matched with the Document Object Model to generate the schema trees of the two XML schema files.
(3) Construct the sequence structures:
Apply Prüfer sequence construction to each of the two schema trees to obtain two Consolidated Prüfer Sequences, each consisting of a Number Prüfer Sequence and a Label Prüfer Sequence.
(4) Linguistic matching:
4a) Choose an arbitrary element s and an arbitrary element t from the Label Prüfer Sequences of the two Consolidated Prüfer Sequences, respectively;
4b) Use the name similarity computation method to obtain the name similarity value of element s and element t;
4c) Use the data type similarity computation method to obtain the data type similarity value of element s and element t;
4d) Use the cardinality constraint similarity computation method to obtain the cardinality constraint similarity value of element s and element t;
4e) Take the weighted average of the name similarity value, data type similarity value and cardinality constraint similarity value of element s and element t as their linguistic similarity value;
4f) Repeat steps 4a) to 4e) until the linguistic similarity values between all pairs of elements in the two Label Prüfer Sequences have been obtained.
(5) Complex element structure matching:
5a) Sort all nodes of the Number Prüfer Sequence in each of the two Consolidated Prüfer Sequences in ascending order of their postorder numbers in the schema tree;
5b) Choose an arbitrary element i and an arbitrary element j from the two sorted Number Prüfer Sequences, respectively;
5c) Use the child similarity computation method to obtain the child similarity value of element i and element j;
5d) Use the leaf similarity computation method to obtain the leaf similarity value of element i and element j;
5e) Use the sibling similarity computation method to obtain the sibling similarity value of element i and element j;
5f) Use the ancestor similarity computation method to obtain the ancestor similarity value of element i and element j;
5g) Take the weighted average of the child, leaf, sibling and ancestor similarity values of element i and element j as their structural similarity value;
5h) Take the weighted average of the structural similarity value of element i and element j and the linguistic similarity value obtained in step (4) as their overall similarity value;
5i) Repeat steps 5c) to 5h) until the overall similarity values between all pairs of elements in the two sorted Number Prüfer Sequences have been obtained;
5j) Filter the overall similarity values between all pairs of elements in the two sorted Number Prüfer Sequences with a threshold to obtain all matched complex node pairs, which form the set of matched complex node pairs.
(6) Non-complex element structure matching:
6a) Take any element pair from the matched element pairs obtained by complex node structure matching, and denote its two elements as element e and element f;
6b) Search the Consolidated Prüfer Sequences containing element e and element f, find all atomic elements of element e and of element f, and form the atomic sets of element e and element f;
6c) Take any element c from the atomic set of element e and use the non-complex element structure matching method to obtain the structural similarity values between element c and all elements in the atomic set of element f;
6d) Determine whether the atomic set of element e still contains unprocessed elements; if so, return to step 6c); otherwise, the overall similarity values between all pairs of elements in the atomic sets of element e and element f are considered obtained, and step 6e) is performed;
6e) Repeat steps 6a) to 6d) until the overall similarity values between all pairs of elements in the atomic sets corresponding to every matched element pair obtained by complex node structure matching have been obtained;
6f) Filter the overall similarity values of all resulting element pairs with a threshold to obtain the matched non-complex node pairs, which form the set of matched non-complex node pairs.
(7) Output the matching result:
Output the union of the set of matched complex node pairs obtained in step (5) and the set of matched non-complex node pairs obtained in step (6).
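Steps 5h), 5j) and 6f) combine two scores and then filter by a threshold. The following is a minimal Python sketch of that combination, not the patented implementation: the function names, the equal weighting and the threshold of 0.5 are assumptions chosen for illustration, since the text above does not state these values.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def overall_similarity(linguistic: float, structural: float,
                       w_structural: float = 0.5) -> float:
    """Weighted average of structural and linguistic similarity (step 5h).

    w_structural = 0.5 is an assumed weight, not a value from the patent.
    """
    return w_structural * structural + (1.0 - w_structural) * linguistic


def filter_matches(overall: Dict[Pair, float],
                   threshold: float = 0.5) -> Dict[Pair, float]:
    """Keep only element pairs whose overall similarity reaches the threshold.

    The threshold value is an assumption made for this sketch.
    """
    return {pair: sim for pair, sim in overall.items() if sim >= threshold}


# Illustrative scores only; they are not results from the patent.
scores = {
    ("Staff", "Employee"): overall_similarity(linguistic=0.4, structural=0.8),
    ("Staff", "Address"): overall_similarity(linguistic=0.2, structural=0.3),
}
matches = filter_matches(scores)
```

The final output of step (7) would then be the union of the dictionaries produced by filtering the complex pairs and the non-complex pairs.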
Compared with the prior art, the invention has the following advantages:
First, the invention performs XML schema matching at the level of XML document elements. This overcomes the prior-art deficiency of measuring schema similarity only at the document level without reaching the finer granularity of document elements, so the invention is better suited to semi-automatic, labor-intensive tasks such as XML schema integration.
Second, the invention makes full use of name, data type and cardinality constraint information in linguistic matching, and fully considers the structural context of a node (its children, siblings, leaves and ancestors) in structure matching. This overcomes the prior-art deficiency of computing XML schema similarity from very little node-related linguistic and structural information, and improves the quality of XML schema matching.
Third, the invention represents XML schemas with an efficient sequence structure, the Consolidated Prüfer Sequence, and in name matching, the most critical part of linguistic matching, applies the decision-tree principle to combine several string matching algorithms and thereby improve matching efficiency. In addition, in structure matching, the structure matching method is applied only to the atomic elements of matched complex element pairs rather than to all atomic elements; this makes complex matches easy to find while preserving matching efficiency. This overcomes the prior-art deficiency of focusing only on matching quality while ignoring matching efficiency; the invention strikes a balance between matching quality and performance and is therefore widely applicable.
Description of drawings
Fig. 1 is a flow chart of the present invention.
Embodiment
The present invention is described in further detail below with reference to Fig. 1.
Step 1: input the two XML schema documents to be matched.
The two XML schema documents to be matched are input from a terminal.
Step 2: build the schema trees.
Parse each of the two XML schema documents to be matched with the Document Object Model to generate the schema trees of the two XML schema files.
A schema tree can be expressed as a triple T = {N_T, E_T, Lab_{N_T}}, where N_T = {n_1, n_2, ..., n_n} is the node set, and each node in the set uniquely represents an object in the schema; E_T = {(n_i, n_j) | n_i, n_j ∈ N_T} is the edge set, where n_i is the parent of n_j and each edge represents the parent-child relationship between two nodes; Lab_{N_T} is the node label set, where a label is a string describing the properties of a node. Nodes in the schema tree are of two kinds: atomic nodes and complex nodes. An atomic node is a leaf node with no outgoing edges and represents a simple element or attribute in the schema; a complex node is an internal node of the schema tree and represents a complex element.
The structure of an XML schema is very complex, mainly in two respects: the occurrence counts of elements and attributes are often complex regular expressions, so cardinality constraints are hard to determine; and shared global declarations of elements, attributes and complex types create cycles in the XML schema. Therefore, the XML schema should be simplified before the schema tree is built. The simplification steps are, in order: create copies to resolve sharing in the schema; limit the recursion depth to resolve infinite recursion in the schema; and simplify the cardinality constraints of elements and attributes using rules. The schema tree is a directed labeled tree that reflects the hierarchical structure of the XML schema document.
Step 3: construct the sequence structures.
The Prüfer sequence generation method is applied to each of the two schema trees to generate the corresponding Consolidated Prüfer Sequence (CPS). A CPS consists of a Number Prüfer Sequence (NPS) and a Label Prüfer Sequence (LPS), that is, CPS = {NPS, LPS}. The two sequences capture different information in the schema tree: the NPS represents its structural information and the LPS represents its semantic information, so a CPS uniquely represents a schema tree. The CPS has its own distinctive advantages, namely a node property and relationship properties. The node property is: let n_i be a node in the schema tree whose postorder number is k; then n_i is an atomic node if and only if k does not occur in the NPS, and n_i is a complex node if and only if k occurs in the NPS. The relationship properties comprise the parent-child property and the sibling property. The parent-child property is: let NPS_i be the element at index i of the NPS and LPS_i the element at index i of the LPS; then node NPS_i is the parent of node LPS_i, and LPS_i is a direct child of NPS_i. The sibling property is: let NPS_i and NPS_j be the elements at indices i and j of the NPS, and LPS_i and LPS_j the elements at indices i and j of the LPS; then LPS_i and LPS_j are siblings if and only if NPS_i = NPS_j.
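The node, parent-child and sibling properties can be read directly off the two sequences. The Python sketch below illustrates them on a small hand-made example; the tree, its labels and the function names are hypothetical and serve only to show how the properties translate into lookups.

```python
from typing import List

# Hypothetical six-node schema tree, numbered in postorder:
#   Company(6) -> Staff(3), Address(4), Phone(5);  Staff(3) -> Name(1), Age(2)
# NPS[i] is the postorder number of the parent of the node stored in LPS[i].
NPS: List[int] = [3, 3, 6, 6, 6]
LPS: List[str] = ["Name", "Age", "Staff", "Address", "Phone"]


def is_complex(k: int, nps: List[int]) -> bool:
    """Node property: postorder number k is a complex node iff k occurs in NPS."""
    return k in nps


def children_of(k: int, nps: List[int], lps: List[str]) -> List[str]:
    """Parent-child property: LPS[i] is a direct child of node NPS[i]."""
    return [lps[i] for i, parent in enumerate(nps) if parent == k]


def are_siblings(i: int, j: int, nps: List[int]) -> bool:
    """Sibling property: LPS[i] and LPS[j] are siblings iff NPS[i] == NPS[j]."""
    return i != j and nps[i] == nps[j]
```

In this example, `children_of(6, NPS, LPS)` collects the direct children of the root, and `is_complex(4, NPS)` is false because postorder number 4 (Address) never appears as a parent.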
The Prüfer sequence construction method builds the Consolidated Prüfer Sequence corresponding to the schema tree of each XML schema file to be matched, as follows:
Step 1: search the schema tree and find the leaf node with the smallest postorder traversal number;
Step 2: store the leaf node found in the Label Prüfer Sequence, and store its parent node in the Number Prüfer Sequence;
Step 3: delete the leaf node found from the schema tree;
Step 4: determine whether the schema tree is empty; if it is not, return to step 1; otherwise, the construction of the Prüfer sequences is complete.
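The four construction steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the `Node` class and `build_cps` function are hypothetical names, and the root is assumed not to be emitted because it has no parent to record. Because nodes are numbered in postorder, the remaining leaf with the smallest postorder number is always the next node in postorder, so the repeated search-and-delete loop reduces to a single pass over the postorder list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical schema-tree node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def build_cps(root: Node) -> Tuple[List[int], List[str]]:
    """Return (NPS, LPS) for the schema tree rooted at `root`."""
    number: Dict[int, int] = {}   # id(node) -> postorder number
    parent: Dict[int, Node] = {}  # id(node) -> parent node
    postorder: List[Node] = []

    def visit(node: Node) -> None:
        for child in node.children:
            parent[id(child)] = node
            visit(child)
        postorder.append(node)
        number[id(node)] = len(postorder)  # postorder numbers start at 1

    visit(root)

    nps: List[int] = []
    lps: List[str] = []
    # Deleting the smallest-numbered leaf each time visits nodes in postorder;
    # the root is skipped because it has no parent (assumption of this sketch).
    for node in postorder[:-1]:
        lps.append(node.label)
        nps.append(number[id(parent[id(node)])])
    return nps, lps


tree = Node("Company", [
    Node("Staff", [Node("Name"), Node("Age")]),
    Node("Address"),
    Node("Phone"),
])
nps, lps = build_cps(tree)
```

The recursive traversal is adequate for a sketch; a very deep schema tree would call for an explicit stack instead.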
Step 4: linguistic matching.
Linguistic matching is based on the similarity of three properties: node name, node data type and node cardinality constraint. It is implemented as follows:
4a) Choose an arbitrary element s and an arbitrary element t from the Label Prüfer Sequences of the two Consolidated Prüfer Sequences, respectively.
4b) Use the name similarity computation method to obtain the name similarity value of element s and element t.
When data instances are not considered, the node name is an important source of matching information. Node names may be semantically similar, such as People and Staff, or structurally similar, such as Staff and TechnicalStaff. The structural similarity of names can be computed with string matching methods, which calculate the similarity value of two name strings. Semantic similarity requires an external dictionary, and frequent dictionary lookups increase matching time; therefore, for the sake of matching efficiency and full automation, name matching covers only the structural similarity of names. The name matching method is implemented as follows:
4b1) Split the names of element s and element t according to the tokenization rule to obtain token set 1 and token set 2.
In XML schemas, some node names are long, some nodes represent the same information but carry different numeric suffixes to distinguish them, and some node names contain special symbols. To make node names better suited to string matching algorithms, each node name is first normalized into a set of substrings called the token set; each substring is called a token. The tokenization rule is: using characters such as "_", spaces, digits and uppercase letters as separators, split the node name into tokens, and delete the non-alphabetic tokens (such as digit and special-symbol tokens) from the token set.
4b2) Take any element a from token set 1 and use the decision-tree method to obtain the string similarity values between element a and all elements in token set 2. This step is implemented as follows:
Step 1: take any element b from token set 2.
Step 2: compare the string values of element a and element b; if they are identical, the string similarity value of element a and element b is 1; otherwise, calculate the edit distance similarity value of element a and element b.
Step 3: determine whether the edit distance similarity value is greater than or equal to the threshold 0.58; if so, the string similarity value of element a and element b is the calculated edit distance similarity value; otherwise, use the Jaro-Winkler algorithm to calculate one string similarity value of element a and element b, use the 3-gram algorithm to calculate another, and take the weighted average of the two as the string similarity value of element a and element b.
Step 4: determine whether token set 2 still contains unprocessed elements; if so, return to step 1; otherwise, the string similarity values between element a and all elements in token set 2 are considered obtained, and step 4b3) is performed.
4b3) Find the maximum of the similarity values obtained in step 4b2) and take it as the string similarity value of element a and token set 2.
4b4) Determine whether token set 1 still contains unprocessed elements; if so, return to step 4b2); otherwise, the string similarity values between every element of token set 1 and token set 2 are considered obtained, and step 4b5) is performed.
4b5) Add up the string similarity values between every element of token set 1 and token set 2 to obtain a partial sum.
4b6) Take any element b from token set 2 and use the decision-tree method to calculate the string similarity values between element b and all elements in token set 1.
4b7) Find the maximum of the similarity values obtained in step 4b6) and take it as the string similarity value of element b and token set 1.
4b8) Determine whether token set 2 still contains unprocessed elements; if so, return to step 4b6); otherwise, the string similarity values between every element of token set 2 and token set 1 are considered obtained, and step 4b9) is performed.
4b9) Add up the string similarity values between every element of token set 2 and token set 1 to obtain a partial sum.
4b10) Add the partial sums obtained in steps 4b5) and 4b9) to obtain the total.
4b11) Divide the total by the total number of elements in the two token sets; the quotient is the name similarity value of element s and element t.
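Steps 4b1) to 4b11) can be sketched in Python as below. This is a hedged illustration, not the patented implementation. The source does not define the normalization of the edit distance, the form of the 3-gram score, or the weights of the fallback average, so the sketch assumes 1 - d / max(len), a Dice coefficient over character trigrams, and equal weights; the tokenizer regex is one possible reading of the tokenization rule, and all function names are hypothetical.

```python
import re
from typing import List


def tokenize(name: str) -> List[str]:
    """Split on '_', spaces, digits and case changes; keep alphabetic tokens only."""
    return [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)]


def edit_similarity(a: str, b: str) -> float:
    """Levenshtein distance normalized as 1 - d / max(len) (assumed normalization)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Standard Jaro-Winkler similarity with prefix scale p."""
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    used = [False] * len(b)
    matched_a = []
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not used[j] and b[j] == ca:
                used[j] = True
                matched_a.append(ca)
                break
    m = len(matched_a)
    if m == 0:
        return 0.0
    matched_b = [b[j] for j in range(len(b)) if used[j]]
    t = sum(x != y for x, y in zip(matched_a, matched_b)) / 2  # transpositions
    jaro = (m / len(a) + m / len(b) + (m - t) / m) / 3
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def trigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over character 3-grams (assumed form of the 3-gram score)."""
    ga = {a[i:i + 3] for i in range(len(a) - 2)}
    gb = {b[i:i + 3] for i in range(len(b) - 2)}
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def token_similarity(a: str, b: str, threshold: float = 0.58) -> float:
    """Decision tree of step 4b2): exact match, then edit distance, then fallback."""
    if a == b:
        return 1.0
    sim = edit_similarity(a, b)
    if sim >= threshold:
        return sim
    # Equal weights are an assumption; the source only says "weighted average".
    return 0.5 * jaro_winkler(a, b) + 0.5 * trigram_similarity(a, b)


def name_similarity(s: str, t: str) -> float:
    """Steps 4b3) to 4b11): symmetric average of best token matches."""
    t1, t2 = tokenize(s), tokenize(t)
    if not t1 or not t2:
        return 0.0
    total = sum(max(token_similarity(a, b) for b in t2) for a in t1)
    total += sum(max(token_similarity(b, a) for a in t1) for b in t2)
    return total / (len(t1) + len(t2))
```

The cheap exact-match and edit-distance tests run first, so the two costlier measures are computed only for token pairs that the edit distance cannot settle, which is the efficiency argument behind the decision tree.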
4c) Use the data type similarity computation method to obtain the data type similarity value of element s and element t.
Although the node name is important information for linguistic matching, the element mappings obtained by name matching still contain many false matches. To improve matching quality, the data type serves as a second source of schema information in linguistic matching. XML schema data types are of two kinds: built-in types and user-defined types. Built-in types include string, int, bool and so on; user-defined types include complex types and simple types, and simple types ultimately reduce to built-in types. The similarity of two built-in-type nodes is greater than that of a built-in-type node and a complex-type node, and the type similarity of two complex-type nodes is determined by the structure of the nodes. The data type matching method is implemented as follows:
Determine whether the data types of element s and element t are both built-in types; if so, look up the type similarity table to find the data type similarity value of element s and element t. Otherwise, determine whether the data types of element s and element t are both complex types; if so, their data type similarity value is 1; otherwise, it is 0.
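The data type rule can be sketched in Python as follows. The type similarity table is not reproduced in this section, so the entries in `TYPE_TABLE` are placeholder values invented for illustration, and `BUILT_IN` lists only a few built-in type names. The sketch also assumes that user-defined simple types have already been resolved to their built-in base types, so any name outside `BUILT_IN` is treated as a complex type.

```python
from typing import Dict, FrozenSet

# A few built-in type names; the real list is longer.
BUILT_IN = {"string", "int", "integer", "decimal", "float", "boolean", "date"}

# Hypothetical excerpt of a type similarity table; values are placeholders.
TYPE_TABLE: Dict[FrozenSet[str], float] = {
    frozenset({"int", "integer"}): 0.9,
    frozenset({"int", "decimal"}): 0.8,
    frozenset({"string", "int"}): 0.4,
}


def type_similarity(type_s: str, type_t: str) -> float:
    """Data type similarity of two elements following the rule of step 4c)."""
    s_builtin = type_s in BUILT_IN
    t_builtin = type_t in BUILT_IN
    if s_builtin and t_builtin:
        if type_s == type_t:
            return 1.0
        # Unlisted built-in pairs default to 0.0 in this sketch (assumption).
        return TYPE_TABLE.get(frozenset({type_s, type_t}), 0.0)
    if not s_builtin and not t_builtin:
        return 1.0  # both complex types
    return 0.0      # one built-in type, one complex type
```

Keying the table on a `frozenset` makes the lookup symmetric, so `("int", "decimal")` and `("decimal", "int")` share one entry.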
4d) Use the cardinality constraint similarity computation method to obtain the cardinality constraint similarity value of element s and element t.
The cardinality constraint of a node is another important piece of information available to linguistic matching. An XML schema defines the occurrence count of an element or attribute with "minOccurs" and "maxOccurs". In a Document Type Definition (DTD), node cardinality is expressed with four basic cardinality constraint values: "*", "?", "+" and "none". In XML Schema Definition (XSD), these four basic values correspond to the following: "none" means minOccurs=1 and maxOccurs=1; "?" means minOccurs=0 and maxOccurs=1; "*" means minOccurs=0 and maxOccurs=unbounded; "+" means minOccurs=1 and maxOccurs=unbounded. If the cardinality constraints of two nodes can both be expressed as these four basic values, their cardinality constraint similarity value is simply looked up in the cardinality constraint similarity table; otherwise, the following steps calculate the cardinality constraint similarity value of element s and element t:
4d1) Use the following formula to calculate the minimum cardinality constraint similarity value of element s and element t:
$$u = 1 - \frac{|x - y|}{x + y}$$
where u is the minimum cardinality constraint similarity value of element s and element t, x is the minimum occurrence count of element s, and y is the minimum occurrence count of element t.
4d2) Use the following formula to calculate the maximum cardinality constraint similarity value of element s and element t:
$$v = 1 - \frac{|m - n|}{m + n}$$
where v is the maximum cardinality constraint similarity value of element s and element t, m is the maximum occurrence count of element s, and n is the maximum occurrence count of element t.
4d3) Calculate the mean of the minimum and maximum cardinality constraint similarity values of element s and element t; this mean is the cardinality constraint similarity value of element s and element t.
4e) Take the weighted mean of the name similarity value, the data type similarity value and the cardinality constraint similarity value of element s and element t as the linguistic similarity value of element s and element t.
4f) Repeat steps 4a) to 4e) until the linguistic similarity values between every pair of elements in the two label Prüfer sequences have been obtained.
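The minOccurs/maxOccurs formulas of steps 4d1) to 4d3) and the weighted mean of step 4e) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the example weights are hypothetical, the lookup table for the four basic cardinality values is omitted, and the handling of equal bounds (including 0 and 0) and of "unbounded" is an assumption, since the text does not specify these cases.

```python
import math


def bound_similarity(a: float, b: float) -> float:
    """Similarity of two occurrence bounds: 1 - |a - b| / (a + b)."""
    if a == b:
        # Assumption: equal bounds (also 0 vs 0, or both unbounded) score 1.
        return 1.0
    if math.isinf(a) or math.isinf(b):
        # Assumption: one unbounded and one finite bound score 0.
        return 0.0
    return 1.0 - abs(a - b) / (a + b)


def cardinality_similarity(min_s, max_s, min_t, max_t) -> float:
    """Mean of the minimum-bound and maximum-bound similarities (step 4d3)."""
    u = bound_similarity(min_s, min_t)  # step 4d1
    v = bound_similarity(max_s, max_t)  # step 4d2
    return (u + v) / 2.0


def linguistic_similarity(name_sim, type_sim, card_sim,
                          weights=(0.5, 0.25, 0.25)) -> float:
    """Weighted mean of the three similarities (step 4e).

    The default weights are placeholders, not values from the patent.
    """
    w_name, w_type, w_card = weights
    total = w_name + w_type + w_card
    return (w_name * name_sim + w_type * type_sim + w_card * card_sim) / total


# Element s occurs 1..3 times, element t occurs 1..5 times.
card = cardinality_similarity(1, 3, 1, 5)
overall = linguistic_similarity(1.0, 1.0, card)
```

With these inputs the minimum bounds are equal (u = 1) and the maximum bounds give v = 1 - 2/8 = 0.75, so the cardinality constraint similarity is 0.875.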
Step 5, complex element structure matching.
The structure matching method for complex nodes is based on four structural aspects of a node: children, leaves, siblings and ancestors. This step is implemented as follows:
5a) Sort all nodes of the numbering Prüfer sequence in each of the two enhanced Prüfer sequences in ascending order of the nodes' postorder numbers in the schema tree.
5b) Take any element i and any element j from the two sorted numbering Prüfer sequences, respectively.
5c) Use the child similarity computation method to obtain the child similarity value of element i and element j.
As the most important part of the structural similarity value of an element, the child similarity value directly reflects the basic structure of the element. This step is implemented as follows:
5c1) Using the parent-child relationships contained in the enhanced Prüfer sequence, search the enhanced Prüfer sequence of element i, find all children of element i, and form the child set of element i; search the enhanced Prüfer sequence of element j, find all children of element j, and form the child set of element j.
5c2) Take any element p from the child set of element i and obtain the overall similarity values between element p and all elements in the child set of element j. The overall similarity value of two elements is the weighted mean of their linguistic similarity value and their structural similarity value. Because structural similarity values are computed in ascending order of the nodes' postorder numbers in the schema tree, the structural similarity values between element p and the elements in the child set of element j have already been computed at this point, and their linguistic similarity values were computed in Step 4.
5c3) Find the maximum of the similarity values obtained in step 5c2) and take it as the similarity value between element p and the child set of element j.
5c4) Determine whether the child set of element i still contains unprocessed elements. If so, return to step 5c2); otherwise, the similarity values between every element in the child set of element i and the child set of element j have been obtained; go to step 5c5).
5c5) Sum the similarity values between every element in the child set of element i and the child set of element j.
5c6) Divide the sum obtained in step 5c5) by the number of elements in the larger of the two child sets; the quotient is the child similarity value of element i and element j.
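The best-match averaging of steps 5c2) to 5c6) can be sketched in Python as follows. This is a minimal illustration, assuming the pairwise similarity values are already available through a caller-supplied function `sim` (a hypothetical name); returning 0 for an empty set is also an assumption, because the text does not cover that case.

```python
from typing import Callable, Hashable, Sequence


def best_match_average(set_i: Sequence[Hashable],
                       set_j: Sequence[Hashable],
                       sim: Callable[[Hashable, Hashable], float]) -> float:
    """For every element of set_i keep its best score against set_j,
    sum those best scores, and divide by the size of the larger set."""
    if not set_i or not set_j:
        # Assumption: a missing child set contributes no similarity.
        return 0.0
    total = sum(max(sim(p, q) for q in set_j) for p in set_i)
    return total / max(len(set_i), len(set_j))


# Hypothetical precomputed overall similarity values between children.
table = {
    ("a", "x"): 0.9, ("a", "y"): 0.2, ("a", "z"): 0.1,
    ("b", "x"): 0.3, ("b", "y"): 0.6, ("b", "z"): 0.4,
}
child_sim = best_match_average(["a", "b"], ["x", "y", "z"],
                               lambda p, q: table[(p, q)])
```

Here the best scores are 0.9 and 0.6, and the larger set has three elements, so the child similarity is 1.5 / 3 = 0.5. Dividing by the larger set size penalizes element pairs whose child sets differ in size.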
5d) Use the leaf similarity computation method to obtain the leaf similarity value of element i and element j.
This step is implemented as follows:
5d1) Using the parent-child and sibling relationships contained in the enhanced Prüfer sequence, search the enhanced Prüfer sequence of element i, find all leaves of element i, and form the leaf set of element i; search the enhanced Prüfer sequence of element j, find all leaves of element j, and form the leaf set of element j.
5d2) Build the numeric vectors of element i and element j. Each component of an element's numeric vector is the difference between the postorder number of the element in the schema tree and the postorder number of one leaf node in the element's leaf set.
5d3) Use the cosine measure to compute the similarity of the numeric vectors of element i and element j; this similarity is the leaf similarity value of element i and element j.
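Steps 5d2) and 5d3) can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and padding the shorter vector with zeros when the two leaf sets differ in size is an assumption, since the text does not say how vectors of unequal length are compared.

```python
import math
from typing import List, Sequence


def leaf_vector(element_postorder: int,
                leaf_postorders: Sequence[int]) -> List[int]:
    """One component per leaf: element postorder number minus leaf postorder number."""
    return [element_postorder - leaf for leaf in leaf_postorders]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two numeric vectors."""
    n = max(len(a), len(b))
    # Assumption: the shorter vector is padded with zeros.
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an element without leaves has leaf similarity 0.
        return 0.0
    return dot / (norm_a * norm_b)


# Element i has postorder number 5 and leaves numbered 1, 2 and 4;
# element j has postorder number 7 and leaves numbered 3 and 6.
vec_i = leaf_vector(5, [1, 2, 4])
vec_j = leaf_vector(7, [3, 6])
leaf_sim = cosine_similarity(vec_i, vec_j)
```

Because a postorder traversal numbers descendants before their ancestors, every component of a leaf vector is positive, so the cosine value lies between 0 and 1.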
5e) Use the sibling similarity computation method to obtain the sibling similarity value of element i and element j.
This step is implemented as follows:
5e1) Using the sibling relationships contained in the enhanced Prüfer sequence, search the enhanced Prüfer sequence of element i, find all siblings of element i, and form the sibling set of element i; search the enhanced Prüfer sequence of element j, find all siblings of element j, and form the sibling set of element j.
5e2) Take any element q from the sibling set of element i and obtain the linguistic similarity values between element q and all elements in the sibling set of element j.
5e3) Find the maximum of the similarity values obtained in step 5e2) and take it as the similarity value between element q and the sibling set of element j.
5e4) Determine whether the sibling set of element i still contains unprocessed elements. If so, return to step 5e2); otherwise, the linguistic similarity values between every element in the sibling set of element i and the sibling set of element j have been obtained; go to step 5e5).
5e5) Sum the similarity values between every element in the sibling set of element i and the sibling set of element j.
5e6) Divide the sum obtained in step 5e5) by the number of elements in the larger of the two sibling sets; the quotient is the sibling similarity value of element i and element j.
5f) Use the ancestor similarity computation method to obtain the ancestor similarity value of element i and element j.
This step is implemented as follows:
5f1) Using the parent-child relationships contained in the enhanced Prüfer sequence, search the enhanced Prüfer sequence of element i, find all ancestors of element i, and join them in the order in which they were found to form the ancestor path of element i; search the enhanced Prüfer sequence of element j, find all ancestors of element j, and join them in the order in which they were found to form the ancestor path of element j.
5f2) Treat each ancestor path as a sequence of strings, with every node name in the path taken as a whole. Use the linguistic matching method to compute the linguistic similarity value of each node and, based on the linguistic similarity values of all nodes in the ancestor paths, compute the edit distance between the ancestor path of element i and the ancestor path of element j. When computing the edit distance, node names only need to be linguistically similar; they are not required to be identical. For example, suppose the ancestor paths of two nodes are PO/Orders/shipTo and PO/POrders/buyer. PO and PO are identical, Orders and POrders are linguistically similar, and shipTo and buyer are linguistically dissimilar, so the edit distance between the two ancestor paths is 1.
5f3) Divide the edit distance obtained in step 5f2) by the larger of the two ancestor path lengths (the number of ancestor nodes each path contains) to obtain a quotient.
5f4) Subtract the quotient obtained in step 5f3) from 1; the result is the ancestor similarity value of element i and element j.
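Steps 5f2) to 5f4) can be sketched in Python as follows, using the PO/Orders/shipTo and PO/POrders/buyer example from step 5f2). The sketch assumes unit costs for insertion, deletion and substitution, and takes the "linguistically similar" decision from a caller-supplied predicate `similar` (a hypothetical name). The substring test used in the example is only a stand-in for the linguistic matcher of Step 4.

```python
from typing import Callable, Sequence


def path_edit_distance(path_a: Sequence[str], path_b: Sequence[str],
                       similar: Callable[[str, str], bool]) -> int:
    """Edit distance over whole node names; similar names substitute at cost 0."""
    m, n = len(path_a), len(path_b)
    prev = list(range(n + 1))
    for r in range(1, m + 1):
        cur = [r] + [0] * n
        for c in range(1, n + 1):
            cost = 0 if similar(path_a[r - 1], path_b[c - 1]) else 1
            cur[c] = min(prev[c] + 1,         # delete a node from path_a
                         cur[c - 1] + 1,      # insert a node from path_b
                         prev[c - 1] + cost)  # keep or substitute
        prev = cur
    return prev[n]


def ancestor_similarity(path_a: Sequence[str], path_b: Sequence[str],
                        similar: Callable[[str, str], bool]) -> float:
    """1 minus the edit distance divided by the longer path length."""
    longest = max(len(path_a), len(path_b))
    if longest == 0:
        # Assumption: two root elements (no ancestors) are fully similar.
        return 1.0
    return 1.0 - path_edit_distance(path_a, path_b, similar) / longest


def toy_similar(a: str, b: str) -> bool:
    """Stand-in predicate: one lower-cased name contains the other."""
    a, b = a.lower(), b.lower()
    return a in b or b in a


path_i = "PO/Orders/shipTo".split("/")
path_j = "PO/POrders/buyer".split("/")
distance = path_edit_distance(path_i, path_j, toy_similar)
anc_sim = ancestor_similarity(path_i, path_j, toy_similar)
```

With this stand-in predicate the distance is 1, matching the example in step 5f2), and the ancestor similarity is 1 - 1/3.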
5g) Take the weighted mean of the child similarity value, leaf similarity value, sibling similarity value and ancestor similarity value of element i and element j as the structural similarity value of element i and element j.
5h) Take the weighted mean of the structural similarity value and the linguistic similarity value of element i and element j as the overall similarity value of element i and element j.
5i) Repeat steps 5c) to 5h) until the overall similarity values between every pair of elements in the two sorted numbering Prüfer sequences have been obtained.
5j) Filter the overall similarity values between all pairs of elements in the two sorted numbering Prüfer sequences with a threshold to obtain all matched complex node pairs, which form the matched complex node pair set.
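Steps 5g), 5h) and 5j) can be sketched in Python as follows. The weights and the threshold are hypothetical placeholders, since the text gives no numeric values, and treating a score equal to the threshold as a match is an assumption.

```python
from typing import Dict, Sequence, Set, Tuple


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of the given values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def overall_similarity(child: float, leaf: float, sibling: float,
                       ancestor: float, linguistic: float,
                       struct_weights=(0.25, 0.25, 0.25, 0.25),
                       w_struct: float = 0.5) -> float:
    """Step 5g: structural similarity; step 5h: blend with linguistic similarity."""
    structural = weighted_mean((child, leaf, sibling, ancestor), struct_weights)
    return w_struct * structural + (1.0 - w_struct) * linguistic


def filter_matches(scores: Dict[Tuple[str, str], float],
                   threshold: float) -> Set[Tuple[str, str]]:
    """Step 5j: keep element pairs whose overall similarity reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


score = overall_similarity(1.0, 0.5, 0.5, 1.0, 0.5)
matches = filter_matches({("a", "x"): 0.8, ("a", "y"): 0.3}, threshold=0.6)
```

In this example the structural similarity is 0.75 and the overall similarity is 0.625; only the pair ("a", "x") passes the 0.6 threshold.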
Step 6, non-complex element structure matching.
For each matched element pair obtained by complex node structure matching, compute the structural similarity values between all nodes in the atom sets corresponding to the element pair. Besides improving matching efficiency, this matching approach can also identify complex matches. This step is implemented as follows:
6a) Take any element pair from the matched element pairs obtained by complex node structure matching, and denote the two elements of the pair as element e and element f.
6b) Search the enhanced Prüfer sequences containing element e and element f, respectively, find all atomic elements of element e and element f, and form the atom sets of element e and element f.
6c) Take any element c from the atom set of element e and obtain the structural similarity values between element c and all elements in the atom set of element f, as follows:
6c1) Take an element d from the atom set of element f.
6c2) Use the sibling similarity computation method described in 5e) to obtain the sibling similarity value of element c and element d.
6c3) Use the ancestor similarity computation method described in 5f) to obtain the ancestor similarity value of element c and element d.
6c4) Take the weighted mean of the sibling similarity value and the ancestor similarity value of element c and element d as the structural similarity value of element c and element d.
6c5) Add the structural similarity value and the linguistic similarity value of element c and element d, and take the sum as the overall similarity value of element c and element d.
6c6) Determine whether the atom set of element f still contains unprocessed elements. If so, return to step 6c1); otherwise, the overall similarity values between element c and all elements in the atom set of element f have been obtained; go to step 6d).
6d) Determine whether the atom set of element e still contains unprocessed elements. If so, go to step 6a); otherwise, the overall similarity values between every pair of elements in the atom sets of element e and element f have been obtained; go to step 6e).
6e) Repeat steps 6a), 6b), 6c) and 6d) until the overall similarity values between every pair of elements in the atom sets corresponding to all matched element pairs obtained by complex node structure matching have been obtained.
6f) Filter the overall similarity values of all resulting element pairs with a threshold to obtain the matched non-complex node pairs, which form the matched non-complex node pair set.
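The scoring and filtering of steps 6c4), 6c5) and 6f) can be sketched in Python as follows. The function names, the weight and the threshold are hypothetical. Note that step 6c5) adds the structural and linguistic values rather than averaging them, so under that reading the overall value ranges from 0 to 2 and the threshold has to be chosen on that scale.

```python
from typing import Callable, Iterable, Set, Tuple


def atomic_overall_similarity(sibling: float, ancestor: float,
                              linguistic: float,
                              w_sibling: float = 0.5) -> float:
    """Step 6c4: weighted mean of sibling and ancestor similarity;
    step 6c5: add the linguistic similarity."""
    structural = w_sibling * sibling + (1.0 - w_sibling) * ancestor
    return structural + linguistic


def match_atomic_elements(atoms_e: Iterable[str], atoms_f: Iterable[str],
                          score: Callable[[str, str], float],
                          threshold: float) -> Set[Tuple[str, str]]:
    """Score every pair drawn from the two atom sets and keep pairs
    whose overall similarity reaches the threshold (step 6f)."""
    atoms_f = list(atoms_f)
    return {(c, d) for c in atoms_e for d in atoms_f
            if score(c, d) >= threshold}


# Hypothetical overall scores for the atoms of one matched complex pair (e, f).
scores = {("street", "addr"): 1.5, ("street", "zip"): 0.4,
          ("city", "addr"): 0.7, ("city", "zip"): 0.2}
pairs = match_atomic_elements(["street", "city"], ["addr", "zip"],
                              lambda c, d: scores[(c, d)], threshold=1.0)
example = atomic_overall_similarity(0.5, 1.0, 0.75)
```

Restricting the comparison to the atom sets of already matched complex pairs is what keeps the number of scored pairs small.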
Step 7, output the matching result.
Output the union of the matched complex node pair set obtained in Step 5 and the matched non-complex node pair set obtained in Step 6.

Claims (10)

1. An XML schema matching method, comprising the following steps:
(1) Input two XML schema documents to be matched;
(2) Build the schema trees:
Parse the two XML schema documents to be matched with the Document Object Model (DOM) to generate the schema trees of the two XML schema documents;
(3) Construct the sequence structure:
Perform Prüfer sequence construction on each of the two schema trees to obtain two enhanced Prüfer sequences, each composed of a numbering Prüfer sequence and a label Prüfer sequence;
(4) Linguistic matching:
4a) Take any element s and any element t from the label Prüfer sequences of the two enhanced Prüfer sequences, respectively;
4b) Use the name similarity computation method to obtain the name similarity value of element s and element t;
4c) Use the data type similarity computation method to obtain the data type similarity value of element s and element t;
4d) Use the cardinality constraint similarity computation method to obtain the cardinality constraint similarity value of element s and element t;
4e) Take the weighted mean of the name similarity value, the data type similarity value and the cardinality constraint similarity value of element s and element t as the linguistic similarity value of element s and element t;
4f) Repeat steps 4a) to 4e) until the linguistic similarity values between every pair of elements in the two label Prüfer sequences have been obtained;
(5) Complex element structure matching:
5a) Sort all nodes of the numbering Prüfer sequence in each of the two enhanced Prüfer sequences in ascending order of the nodes' postorder numbers in the schema tree;
5b) Take any element i and any element j from the two sorted numbering Prüfer sequences, respectively;
5c) Use the child similarity computation method to obtain the child similarity value of element i and element j;
5d) Use the leaf similarity computation method to obtain the leaf similarity value of element i and element j;
5e) Use the sibling similarity computation method to obtain the sibling similarity value of element i and element j;
5f) Use the ancestor similarity computation method to obtain the ancestor similarity value of element i and element j;
5g) Take the weighted mean of the child similarity value, leaf similarity value, sibling similarity value and ancestor similarity value of element i and element j as the structural similarity value of element i and element j;
5h) Take the weighted mean of the structural similarity value of element i and element j and the linguistic similarity value obtained in step (4) as the overall similarity value of element i and element j;
5i) Repeat steps 5c) to 5h) until the overall similarity values between every pair of elements in the two sorted numbering Prüfer sequences have been obtained;
5j) Filter the overall similarity values between all pairs of elements in the two sorted numbering Prüfer sequences with a threshold to obtain all matched complex node pairs, which form the matched complex node pair set;
(6) Non-complex element structure matching:
6a) Take any element pair from the matched element pairs obtained by complex node structure matching, and denote the two elements of the pair as element e and element f;
6b) Search the enhanced Prüfer sequences containing element e and element f, respectively, find all atomic elements of element e and element f, and form the atom sets of element e and element f;
6c) Take any element c from the atom set of element e and use the non-complex element structure matching method to obtain the structural similarity values between element c and all elements in the atom set of element f;
6d) Determine whether the atom set of element e still contains unprocessed elements; if so, go to step 6a); otherwise, the overall similarity values between every pair of elements in the atom sets of element e and element f have been obtained; go to step 6e);
6e) Repeat steps 6a), 6b), 6c) and 6d) until the overall similarity values between every pair of elements in the atom sets corresponding to all matched element pairs obtained by complex node structure matching have been obtained;
6f) Filter the overall similarity values of all resulting element pairs with a threshold to obtain the matched non-complex node pairs, which form the matched non-complex node pair set;
(7) Output the matching result:
Output the union of the matched complex node pair set obtained in step (5) and the matched non-complex node pair set obtained in step (6).
2. The XML schema matching method according to claim 1, characterized in that the Prüfer sequence construction described in step (3) comprises the following steps:
Step 1, search the schema tree of each of the two XML schema documents to be matched and find the leaf node with the smallest postorder traversal number;
Step 2, store the leaf node found in the label Prüfer sequence and, at the same time, store the parent node of this leaf node in the numbering Prüfer sequence;
Step 3, delete the leaf node found from the schema tree;
Step 4, determine whether the schema tree still contains nodes; if so, return to step 1; otherwise, the construction of the Prüfer sequence is complete.
3. The XML schema matching method according to claim 1, characterized in that the name similarity computation method described in step 4b) comprises the following steps:
Step 1, split the names of element s and element t according to the name tokenization rules to obtain token set 1 and token set 2;
Step 2, take any element a from token set 1 and use the decision tree method to obtain the string similarity values between element a and all elements of token set 2;
Step 3, find the maximum of the similarity values obtained in step 2 and take it as the string similarity value between element a and token set 2;
Step 4, determine whether token set 1 still contains unprocessed elements; if so, return to step 2; otherwise, the string similarity values between all elements of token set 1 and token set 2 have been obtained; go to step 5;
Step 5, sum the string similarity values between all elements of token set 1 and token set 2;
Step 6, take any element b from token set 2 and use the decision tree method to compute the string similarity values between element b and all elements of token set 1;
Step 7, find the maximum of the similarity values obtained in step 6 and take it as the string similarity value between element b and token set 1;
Step 8, determine whether token set 2 still contains unprocessed elements; if so, return to step 6; otherwise, the string similarity values between all elements of token set 2 and token set 1 have been obtained; go to step 9;
Step 9, sum the string similarity values between all elements of token set 2 and token set 1;
Step 10, add the sums obtained in step 5 and step 9 to obtain a total;
Step 11, divide the total by the total number of elements in the two token sets; the quotient is the language similarity value of element s and element t.
4. The XML schema matching method according to claim 1, characterized in that the data type similarity computation method described in step 4c) is as follows: determine whether the data types of element s and element t are built-in types; if so, find the data type similarity value of element s and element t in the type similarity table; otherwise, determine whether the data types of element s and element t are complex types; if so, the data type similarity value of element s and element t is 1; otherwise, the data type similarity value of element s and element t is 0.
5. The XML schema matching method according to claim 1, characterized in that the cardinality constraint similarity computation method described in step 4d) comprises the following steps:
Step 1, according to the values of the cardinality constraints of element s and element t, determine whether the cardinality constraints of element s and element t are basic cardinality constraint values; if so, look up the cardinality constraint similarity table to obtain the cardinality constraint similarity value of element s and element t; otherwise, perform steps 2 to 4 to compute the cardinality constraint similarity value of element s and element t;
Step 2, compute the minimum cardinality constraint similarity value of element s and element t with the following formula:
$$u = 1 - \frac{|x - y|}{x + y}$$
where u is the minimum cardinality constraint similarity value of element s and element t, x is the minimum number of occurrences of element s, and y is the minimum number of occurrences of element t;
Step 3, compute the maximum cardinality constraint similarity value of element s and element t with the following formula:
$$v = 1 - \frac{|m - n|}{m + n}$$
where v is the maximum cardinality constraint similarity value of element s and element t, m is the maximum number of occurrences of element s, and n is the maximum number of occurrences of element t;
Step 4, compute the mean of the minimum and maximum cardinality constraint similarity values of element s and element t; this mean is the cardinality constraint similarity value of element s and element t.
6. The XML schema matching method according to claim 1, characterized in that the child similarity computation method described in step 5c) comprises the following steps:
Step 1, using the parent-child relationships contained in the enhanced Prüfer sequence, search the enhanced Prüfer sequence of element i, find all children of element i, and form the child set of element i; search the enhanced Prüfer sequence of element j, find all children of element j, and form the child set of element j;
Step 2, take any element p from the child set of element i and obtain the overall similarity values between element p and all elements in the child set of element j;
Step 3, find the maximum of the similarity values obtained in step 2 and take it as the similarity value between element p and the child set of element j;
Step 4, determine whether the child set of element i still contains unprocessed elements; if so, return to step 2; otherwise, the similarity values between every element in the child set of element i and the child set of element j have been obtained; go to step 5;
Step 5, sum the similarity values between every element in the child set of element i and the child set of element j;
Step 6, divide the sum obtained in step 5 by the number of elements in the larger of the two child sets; the quotient is the child similarity value of element i and element j.
7. The XML schema matching method according to claim 1, characterized in that the leaf similarity computation method described in step 5d) comprises the following steps:
Step 1, using the parent-child and sibling relationships contained in the enhanced Prüfer sequence, search the enhanced Prüfer sequence of element i, find all leaves of element i, and form the leaf set of element i; search the enhanced Prüfer sequence of element j, find all leaves of element j, and form the leaf set of element j;
Step 2, build the numeric vectors of element i and element j, where each component of an element's numeric vector is the difference between the postorder number of the element in the schema tree and the postorder number of one leaf node in the element's leaf set;
Step 3, use the cosine measure to compute the similarity of the numeric vectors of element i and element j; this similarity is the leaf similarity value of element i and element j.
8. The XML schema matching method according to claim 1, characterized in that the sibling similarity computation method described in step 5e) comprises the following steps:
Step 1, using the sibling relationships contained in the enhanced Prüfer sequence, search the enhanced Prüfer sequence of element i, find all siblings of element i, and form the sibling set of element i; search the enhanced Prüfer sequence of element j, find all siblings of element j, and form the sibling set of element j;
Step 2, take any element q from the sibling set of element i and obtain the linguistic similarity values between element q and all elements in the sibling set of element j;
Step 3, find the maximum of the similarity values obtained in step 2 and take it as the similarity value between element q and the sibling set of element j;
Step 4, determine whether the sibling set of element i still contains unprocessed elements; if so, return to step 2; otherwise, the linguistic similarity values between every element in the sibling set of element i and the sibling set of element j have been obtained; go to step 5;
Step 5, sum the similarity values between every element in the sibling set of element i and the sibling set of element j;
Step 6, divide the sum obtained in step 5 by the number of elements in the larger of the two sibling sets; the quotient is the sibling similarity value of element i and element j.
9. The XML schema matching method according to claim 1, characterized in that the ancestor similarity computation method described in step 5f) comprises the following steps:
Step 1, using the parent-child relationships contained in the enhanced Prüfer sequence, search the enhanced Prüfer sequence of element i, find all ancestors of element i, and join them in the order in which they were found to form the ancestor path of element i; search the enhanced Prüfer sequence of element j, find all ancestors of element j, and join them in the order in which they were found to form the ancestor path of element j;
Step 2, treat each ancestor path as a sequence of strings, with every node name in the path taken as a whole, and use the linguistic similarity values obtained by the linguistic matching method to compute the edit distance between the ancestor path of element i and the ancestor path of element j;
Step 3, divide the edit distance obtained in step 2 by the larger of the two ancestor path lengths (the number of ancestor nodes each path contains) to obtain a quotient;
Step 4, subtract the quotient obtained in step 3 from 1; the result is the ancestor similarity value of element i and element j.
10. The XML schema matching method according to claim 1, characterized in that the non-complex element structure matching described in step 6c) comprises the following steps:
Step 1, take an element d from the atom set of element f;
Step 2, use the sibling similarity computation method of claim 8 to obtain the sibling similarity value of element c and element d;
Step 3, use the ancestor similarity computation method of claim 9 to obtain the ancestor similarity value of element c and element d;
Step 4, take the weighted mean of the sibling similarity value and the ancestor similarity value of element c and element d as the structural similarity value of element c and element d;
Step 5, add the structural similarity value and the linguistic similarity value of element c and element d, and take the sum as the overall similarity value of element c and element d.
Application Number: CN201310192029XA; Priority Date: 2013-05-13; Filing Date: 2013-05-13; Title: Extensible markup language pattern matching method; Status: Pending; Publication: CN103294791A (en)


Publications (1)

Publication Number Publication Date
CN103294791A (en) 2013-09-11

Family

ID=49095653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310192029XA Pending CN103294791A (en) 2013-05-13 2013-05-13 Extensible markup language pattern matching method

Country Status (1)

Country Link
CN (1) CN103294791A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104133673A (en) * 2014-07-04 2014-11-05 清华大学 Ontology example matching system and method based on user customization
CN105867995A (en) * 2016-04-29 2016-08-17 无锡天脉聚源传媒科技有限公司 Editing method and device for XML (extensible markup language) file
CN109918397A (en) * 2019-01-23 2019-06-21 中国银行股份有限公司 A kind of data matching method, device and storage medium
CN110912794A (en) * 2019-11-15 2020-03-24 国网安徽省电力有限公司安庆供电公司 Approximate matching strategy based on token set
CN111967607A (en) * 2020-07-31 2020-11-20 中国科学院深圳先进技术研究院 Model training method and device, electronic equipment and machine-readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030120651A1 (en) * 2001-12-20 2003-06-26 Microsoft Corporation Methods and systems for model matching
CN101799825A (en) * 2010-03-05 2010-08-11 南开大学 XML (Extensible Markup Language) document structure based on extended adjacent matrix and semantic similarity calculation method
CN102760173A (en) * 2012-07-02 2012-10-31 河海大学 Bottom-up XML (eXtensible Markup Language) twig pattern matching method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030120651A1 (en) * 2001-12-20 2003-06-26 Microsoft Corporation Methods and systems for model matching
CN101799825A (en) * 2010-03-05 2010-08-11 南开大学 XML (Extensible Markup Language) document structure based on extended adjacent matrix and semantic similarity calculation method
CN102760173A (en) * 2012-07-02 2012-10-31 河海大学 Bottom-up XML (eXtensible Markup Language) twig pattern matching method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
高培: "XML模式匹配算法的研究", 《中国优秀硕士学位论文全文数据库信息科技辑》, no. 3, 15 March 2013 (2013-03-15), pages 138 - 707 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104133673A (en) * 2014-07-04 2014-11-05 清华大学 Ontology example matching system and method based on user customization
CN104133673B (en) * 2014-07-04 2017-09-26 清华大学 Ontology example matching system and method based on user customization
CN105867995A (en) * 2016-04-29 2016-08-17 无锡天脉聚源传媒科技有限公司 Editing method and device for XML (extensible markup language) file
CN109918397A (en) * 2019-01-23 2019-06-21 中国银行股份有限公司 Data matching method, device and storage medium
CN109918397B (en) * 2019-01-23 2021-04-27 中国银行股份有限公司 Data matching method, device and storage medium
CN110912794A (en) * 2019-11-15 2020-03-24 国网安徽省电力有限公司安庆供电公司 Approximate matching strategy based on token set
CN110912794B (en) * 2019-11-15 2021-07-16 国网安徽省电力有限公司安庆供电公司 Approximate matching strategy based on token set
CN111967607A (en) * 2020-07-31 2020-11-20 中国科学院深圳先进技术研究院 Model training method and device, electronic equipment and machine-readable storage medium
CN111967607B (en) * 2020-07-31 2023-09-01 中国科学院深圳先进技术研究院 Model training method and device, electronic equipment and machine-readable storage medium

Similar Documents

Publication Title
CN102395965B (en) Method for searching objects in a database
WO2017211051A1 (en) Mining method and server for social network account of target subject, and storage medium
CN108710663B (en) Data matching method and system based on ontology model
CN104239513A (en) Semantic retrieval method oriented to field data
CN103646032A (en) Database query method based on body and restricted natural language processing
CN101183385B (en) XML query method based on multi-modal index structure
CN102156726A (en) Geographic element querying and extending method based on semantic similarity
CN103294791A (en) Extensible markup language pattern matching method
CN104699786A (en) Semantic intelligent search communication network complaint system
CN102609465A (en) Information recommendation method based on potential communities
CN106354844A (en) Service combination package recommendation system and method based on text mining
CN102664915A (en) Service selection method based on resource constraint in cloud manufacturing environment
CN103678436A (en) Information processing system and information processing method
CN102508971B (en) Method for establishing product function model in concept design stage
CN103246731A (en) Web service semantic annotation method based on associated data
CN104484337B (en) Storage method for XML documents
Bai et al. Querying fuzzy spatiotemporal data using XQuery
CN104156431B (en) RDF keyword query method based on sterogram community structure
CN102262658A (en) Method for extracting web data from bottom to top based on entity
CN104765763B (en) Semantic matching method for heterogeneous spatial information service classification based on concept lattice
CN109885694B (en) Document selection and learning sequence determination method
CN105447104A (en) Knowledge map generating method and apparatus
CN103020283A (en) Semantic search method based on dynamic reconfiguration of background knowledge
CN107391690B (en) Method for processing document information
CN103365960A (en) Off-line searching method of structured data of electric power multistage dispatching management

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2013-09-11