CN103020044A - Machine-aided webpage translation method and system thereof - Google Patents

Machine-aided webpage translation method and system thereof

Info

Publication number
CN103020044A
Authority
CN
China
Prior art keywords
translation
web page
webpage
term
page module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105056324A
Other languages
Chinese (zh)
Inventor
宗竞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU LEMAIDAO NETWORK TECHNOLOGY Co Ltd
Original Assignee
JIANGSU LEMAIDAO NETWORK TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANGSU LEMAIDAO NETWORK TECHNOLOGY Co Ltd
Priority to CN2012105056324A
Publication of CN103020044A
Legal status: Pending

Abstract

The invention discloses a machine-aided webpage translation system. The machine-aided webpage translation system comprises a webpage receiving module, a webpage reading module and a webpage translating module, wherein the webpage receiving module parses a webpage through a parser to obtain a document object model; the webpage reading module reads the document object model; and the webpage translating module translates the webpage, builds a database, performs terminological management and performs bidirectional translation and layout. The system can effectively eliminate repeated work of a translator, thus improving the working efficiency.

Description

Machine-aided webpage translation method and system thereof
Technical field
The present invention relates to a machine-aided webpage translation method and a system thereof.
Background art
The accuracy of webpage translation systems has long hovered around 70%, and the readability of the translations, the systems' coverage of linguistic phenomena, and especially their robustness on open-domain text all remain unsatisfactory. Society urgently needs large-scale processing of real text (especially the mass of online text), and webpage translation systems fall far short of that expectation. The idea of machine-aided translation (Computer Aided Translation, CAT for short) arose against this background. Compared with a fully automatic machine translation system, a machine-aided translation system is a human-machine interactive system. In this mode of translation the computer assists the translator: it not only supplies knowledge about the translation of words, terms and phrases, but also searches previously translated text for translations of identical or similar sentences, so that the translator avoids unnecessary repeated work and translates efficiently. The central idea of computer-aided translation (including translation-memory-based and example-based techniques) is to search a translation memory (a bilingual aligned corpus) and an example base for identical or similar sentences or phrases and to offer them as reference translations.
The translator makes full use of existing translation resources and avoids repeated work as far as possible. This aided-translation mechanism is particularly suited to long texts with much repeated wording, such as scientific monographs, scientific and technical literature, product descriptions, service manuals and United Nations documents. It helps the translator eliminate repeated translation work and concentrate on translating new content.
Machine-aided translation software based on translation memory rests on a simple fact: the volume of translated material in the technical translation field is enormous, while its scope is relatively narrow and concentrated in one or a few specialties. Fields such as politics, economics, military affairs, aerospace, computing and communications each have their own technical translation companies or departments. This inevitably leads to varying degrees of repetition in the translated material. According to statistics, the repetition rate of such material ranges from 20% to 70% across industries and departments. This means that at least 20% of a translator's work is needless repetition. Translation memory technology starts precisely from this point: it first aims to eliminate the translator's repeated work and thereby improve working efficiency.
The webpage translation function refers to translating the text of a webpage displayed in the browser into the language the user needs, without changing the format of the webpage. At present, most common webpage translation techniques translate webpages written in HyperText Markup Language (HTML). The principle is to first obtain the content of the webpage's source file (that is, the HTML file), then find the text in the webpage that needs translating (that is, the text between HTML tags) and translate it, then substitute the translation result for the original text to generate a new webpage, and finally instruct the browser to display the newly generated webpage.
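The conventional flow described above (read the HTML source, translate only the text between tags, substitute it, emit a new page) can be sketched with Python's standard `html.parser` module. This is an illustrative sketch only, not the patent's implementation: the class name `TextReplacingParser`, the function `translate_page` and the `translate` callback are hypothetical names introduced here, and the translation step itself is left to the caller.

```python
from html import escape
from html.parser import HTMLParser


class TextReplacingParser(HTMLParser):
    """Re-emit a page unchanged except for the text between tags."""

    SKIPPED = {"script", "style"}  # content that must not be translated

    def __init__(self, translate):
        super().__init__()  # default convert_charrefs=True: entities arrive decoded
        self.translate = translate  # hypothetical callback: str -> str
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag, verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing tag, e.g. <br/>

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_data(self, data):
        core = data.strip()
        if self.skip_depth or not core:
            self.out.append(data)  # script/style content or pure whitespace
            return
        # Keep the surrounding whitespace so the layout is not disturbed.
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        self.out.append(lead + escape(self.translate(core), quote=False) + trail)


def translate_page(html_text, translate):
    """Return a new page whose text nodes have been passed through translate."""
    parser = TextReplacingParser(translate)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For a quick check, `translate_page('<p class="a">Hello <b>world</b></p>', str.upper)` uses upper-casing as a stand-in for translation; the tags and attributes are emitted as they were read, and only the text between them changes.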
Summary of the invention
To overcome the shortcomings of the above background art, the invention provides a machine-aided webpage translation system comprising a webpage receiving module, a webpage reading module and a webpage translating module, wherein the webpage translating module operates through the following steps:
Step 1, translation: when a new sentence is translated, the translation memory is searched, the sentence is compared and matched against the translation units in the memory, and the translation unit closest to the source text is selected and offered as a reference translation;
Step 2, automatic memory building: the source text and the translation are automatically analyzed and matched and put into one-to-one correspondence sentence by sentence, after which a standard translation memory file is generated automatically, so that all of the user's existing material can be reused through this tool;
Step 3, term management: all terms are standardized by setting up one or more standard term lists once; when translating with the translation memory system, the corresponding term list is opened in the term management tool, which automatically identifies which words in the current sentence are defined terms and gives the standard translation of each term;
Step 4, two-way translation between multiple languages;
Step 5, automatic typesetting: the translation automatically takes on the format of the source text.
A machine-aided webpage translation system adopting the above method comprises a webpage receiving module, a webpage reading module and a webpage translating module. The webpage receiving module parses the webpage with a parser to obtain a document object model; the webpage reading module reads the document object model; and the webpage translating module translates the webpage, builds the translation memory, manages terms, and performs two-way translation and typesetting.
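The division of labour between the receiving module (parse to a document object model) and the reading module (read the text out of that model) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Python's standard `xml.dom.minidom`, which only accepts well-formed XML/XHTML, and the function names `iter_text_nodes` and `read_source_text` are hypothetical and do not come from the patent.

```python
from xml.dom import minidom


def iter_text_nodes(node):
    """Yield every non-blank text node below node, in document order."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip():
                yield child
        else:
            # Elements are descended into; comments and similar nodes
            # have no children, so nothing is yielded for them.
            yield from iter_text_nodes(child)


def read_source_text(markup):
    """Parse well-formed markup and return (dom, list of source strings)."""
    dom = minidom.parseString(markup)  # the "receiving" step: build the DOM
    texts = [n.data.strip() for n in iter_text_nodes(dom)]  # the "reading" step
    return dom, texts
```

Calling `read_source_text` on a small XHTML fragment returns the parsed document together with the strings that would be handed to the translating module; the markup itself stays in the DOM.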
Description of the drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows the webpage translation flow according to the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
According to one embodiment of the invention, as shown in Fig. 1, the machine-aided webpage translation system comprises a webpage receiving module, a webpage reading module and a webpage translating module. The webpage receiving module parses the webpage with a parser to obtain a document object model; the webpage reading module reads the document object model; and the webpage translating module translates the webpage, builds the translation memory, manages terms, and performs two-way translation and typesetting. After a webpage is received, it is parsed by the parser to obtain the document object model, which is stored in the receiving module. In this embodiment the parser is similar to the parser built into common browsers (such as Microsoft's MSXML). The reading module reads the first-language text in the text nodes of the document object model and outputs it to the translating module. The reading module reads the information in the document object model by means of a script or program, for example in a language such as JavaScript, VBScript or PHP. The webpage translation is carried out through the following steps: translation, automatic memory building, term management, multilingual two-way translation and automatic typesetting:
Step 1, translation: when a new sentence is translated, the translation memory is searched, the sentence is compared and matched against the translation units in the memory, and the translation unit closest to the source text is selected and offered as a reference translation;
Step 2, automatic memory building: the source text and the translation are automatically analyzed and matched and put into one-to-one correspondence sentence by sentence, after which a standard translation memory file is generated automatically, so that all of the user's existing material can be reused through this tool;
Step 3, term management: all terms are standardized by setting up one or more standard term lists once; when translating with the translation memory system, the corresponding term list is opened in the term management tool, which automatically identifies which words in the current sentence are defined terms and gives the standard translation of each term;
Step 4, two-way translation between multiple languages;
Step 5, automatic typesetting: the translation automatically takes on the format of the source text.
In detail:
A translation memory product automatically "remembers" every sentence the user translates. When a new sentence is translated, it searches the translation memory, compares and matches the sentence against the translation units in the memory, selects the translation unit closest to the source text, and offers a reference translation. The user can accept this translation or modify it; the modified translation is automatically stored in the memory for later use. Because the vocabulary and phrasing of a professional field are relatively fixed, once the user has accumulated several memories of a certain size, repeated sentences are encountered more and more often and translation becomes easier and easier.
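The lookup-and-store cycle described above can be sketched as a small in-memory class. The patent does not specify how the "closest" translation unit is measured; this sketch assumes the character-level similarity ratio from Python's standard `difflib.SequenceMatcher` and an arbitrary threshold of 0.7. The class name `TranslationMemory` and its methods are hypothetical.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: source sentence -> stored translation."""

    def __init__(self):
        self.units = {}

    def add(self, source, target):
        # Called when the translator accepts or edits a translation,
        # so the pair is available for later sentences.
        self.units[source] = target

    def lookup(self, sentence, threshold=0.7):
        """Return (source, target, score) of the closest unit, or None."""
        best, best_score = None, 0.0
        for source, target in self.units.items():
            # Assumed similarity measure; real products use their own metrics.
            score = SequenceMatcher(None, sentence, source).ratio()
            if score > best_score:
                best, best_score = (source, target), score
        if best is None or best_score < threshold:
            return None
        return best[0], best[1], best_score
```

An exact repeat scores 1.0 and can be accepted as is; a near repeat (for example a sentence differing in one word) comes back with a lower score as a reference for the translator to edit and store again with `add`.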
Translation memory products generally also support sharing the memory over a network. That is, when several people translate at the same time, they can share one translation memory over a local area network, and each online translator can call on the others' results in real time.
For users who accumulated a large amount of translated material before adopting a translation memory product, the product can provide an automatic memory-building tool [Microsoft FoxPro]. This tool automatically analyzes and matches the source text and the translation, putting them into one-to-one correspondence sentence by sentence. After the user has made some adjustments and proofread the result, the tool automatically generates a standard translation memory file. All of the user's existing material can be reused through this tool, so translation memories are built efficiently and quickly. These memories can be further supplemented and refined through continued use.
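The memory-building step can be sketched as sentence splitting, pairing and export. Several things here are assumptions rather than the patent's method: sentences are split naively on end punctuation, source and target are paired purely by position (real aligners are more tolerant), and the output is a simplified TMX-style XML skeleton chosen for illustration, since the patent does not name the "standard" file format. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def split_sentences(text):
    """Naive splitter: break after Western or Chinese end punctuation."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text.strip())
    return [p for p in parts if p]


def build_memory(source_text, target_text):
    """Pair source and target sentences one-to-one by position."""
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        # Corresponds to the manual adjustment step described in the text.
        raise ValueError("sentence counts differ; manual adjustment needed")
    return list(zip(src, tgt))


def to_tmx(pairs, src_lang="en", tgt_lang="zh"):
    """Serialize pairs as a simplified TMX-style document (not a full TMX file)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", srclang=src_lang)
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per sentence pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Raising an error when the sentence counts differ is a deliberate choice: it mirrors the point in the text that the user adjusts and proofreads the alignment before the file is generated.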
Translation memory products generally also provide another very important function: term management. In specialized technical fields almost every document carries a large number of technical terms, and consistency of term translation is always one of the main concerns of proofreading. This work is time-consuming and laborious, and omissions are hard to rule out. A translation memory product standardizes all terms through a term management tool (generally an electronic dictionary). The user only needs to set up one or more standard term lists once (each containing the source terms and their translations); when translating with the translation memory system, the user opens the corresponding term list in the term management tool, which automatically identifies which words in the current sentence are defined terms and gives the standard translation of each.
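Term recognition against a standard term list can be sketched as a dictionary scan over the current sentence. This is only an illustration: the function name `find_terms` is hypothetical, the term list is modelled as a plain dictionary, and matching uses case-insensitive word boundaries, which assumes a space-delimited source language such as English.

```python
import re


def find_terms(sentence, term_list):
    """Return (term, standard translation) for each defined term in sentence.

    term_list maps a source term to its standard translation. Results are
    ordered by where the term first appears in the sentence.
    """
    hits = []
    for term, translation in term_list.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        match = re.search(pattern, sentence, re.IGNORECASE)
        if match:
            hits.append((match.start(), term, translation))
    return [(term, translation) for _, term, translation in sorted(hits)]
```

Given a list containing "translation memory" and "term list", a sentence that mentions both yields both standard translations, which a tool could then show next to the segment being translated.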
Because translation memory works by comparing and matching source text and translation, it has an inherent advantage: it supports two-way translation between multiple languages. Take the German translation memory software vendor TRADOS as an example: its product supports 55 Unicode-based languages and covers Windows 95/98/NT in nearly all language versions. In other words, a single product can provide two-way translation between all of these languages, which is unthinkable in machine translation.
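Why stored pairs work in either direction can be shown with a very small sketch: if each translation unit holds the same sentence in several languages, any language can serve as the lookup key. The data layout (one dictionary per unit, keyed by language code) and the function name `lookup` are assumptions made for illustration.

```python
def lookup(units, text, from_lang, to_lang):
    """Return the to_lang side of the unit whose from_lang side equals text."""
    for unit in units:
        if unit.get(from_lang) == text and to_lang in unit:
            return unit[to_lang]
    return None


# One aligned unit stored once, usable in every direction.
units = [
    {"en": "Save the file.", "zh": "保存文件。", "de": "Speichern Sie die Datei."},
]
```

The same unit answers an English-to-Chinese query and a Chinese-to-German query without any direction-specific data, because the memory only matches stored text and returns its aligned counterpart.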
Repeated work is not the only thing translators dislike. Typesetting electronic documents is another chore that gives translators headaches. The localization industry in particular has very strict formatting requirements for translations, which must match the format of the source document. Here too, translation memory products lead the way. Current translation memory products generally provide a variety of format-handling tools and support popular document formats such as DOC, RTF, HTML, SGML and PPT. The translation automatically takes on the format of the source text, so the translator need not bother with typesetting and can concentrate on translating.
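For markup formats, one way the translation can take on the source format is to write each translated string back into the text node it came from, leaving elements and attributes untouched. The sketch below assumes well-formed XHTML and Python's standard `xml.dom.minidom`; the function name `apply_translation` and the `translate` callback are hypothetical, and binary formats such as DOC or PPT would need dedicated tooling that is not shown here.

```python
from xml.dom import minidom


def apply_translation(markup, translate):
    """Replace the text of every non-blank text node, keeping all markup."""
    dom = minidom.parseString(markup)
    stack = [dom.documentElement]
    while stack:
        node = stack.pop()
        for child in node.childNodes:
            if child.nodeType == child.TEXT_NODE:
                core = child.data.strip()
                if core:
                    # Swap only the text itself; surrounding whitespace stays.
                    child.data = child.data.replace(core, translate(core), 1)
            else:
                stack.append(child)
    return dom.documentElement.toxml()
```

With upper-casing standing in for translation, a paragraph with a `style` attribute and a nested `<b>` element comes back with the same tags and attributes around the changed text, so no separate typesetting pass is needed.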
It should be noted that the above embodiments are merely exemplary descriptions of the technical solutions of the invention and do not limit the invention. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that, without departing from the spirit of the invention and the scope of protection defined by the claims, the technical solutions recorded in the above embodiments may be modified, or some of their technical features may be replaced by equivalents, and all such modifications and replacements fall within the scope of protection of the invention.

Claims (2)

1. A machine-aided webpage translation method, characterized by comprising the following steps:
Step 1, translation: when a new sentence is translated, the translation memory is searched, the sentence is compared and matched against the translation units in the memory, and the translation unit closest to the source text is selected and offered as a reference translation;
Step 2, automatic memory building: the source text and the translation are automatically analyzed and matched and put into one-to-one correspondence sentence by sentence, after which a standard translation memory file is generated automatically, so that all of the user's existing material can be reused through this tool;
Step 3, term management: all terms are standardized by setting up one or more standard term lists once; when translating with the translation memory system, the corresponding term list is opened in the term management tool, which automatically identifies which words in the current sentence are defined terms and gives the standard translation of each term;
Step 4, two-way translation between multiple languages;
Step 5, automatic typesetting: the translation automatically takes on the format of the source text.
2. A machine-aided webpage translation system adopting the method of claim 1, characterized by comprising a webpage receiving module, a webpage reading module and a webpage translating module, wherein the webpage receiving module parses a webpage with a parser to obtain a document object model, the webpage reading module reads the document object model, and the webpage translating module translates the webpage, builds the translation memory, manages terms, and performs two-way translation and typesetting.
CN2012105056324A 2012-12-03 2012-12-03 Machine-aided webpage translation method and system thereof Pending CN103020044A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012105056324A CN103020044A (en) 2012-12-03 2012-12-03 Machine-aided webpage translation method and system thereof

Publications (1)

Publication Number Publication Date
CN103020044A true CN103020044A (en) 2013-04-03

Family

ID=47968661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012105056324A Pending CN103020044A (en) 2012-12-03 2012-12-03 Machine-aided webpage translation method and system thereof

Country Status (1)

Country Link
CN (1) CN103020044A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030154071A1 (en) * 2002-02-11 2003-08-14 Shreve Gregory M. Process for the document management and computer-assisted translation of documents utilizing document corpora constructed by intelligent agents
CN1687925A (en) * 2005-05-10 2005-10-26 贺方升 Method for realizing bilingual web page searching
CN101470705A (en) * 2007-12-29 2009-07-01 英业达股份有限公司 Dynamic web page translation system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张国霞等: "浅议计算机辅助翻译软件", 《中国现代教育装备》 *
柏晓静等: "面向中文学术专著的机器辅助翻译研究", 《中国翻译》 *
许钧等: "《翻译学概论》", 31 October 2009, 译林出版社 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103235775B (en) * 2013-04-25 2016-06-29 中国科学院自动化研究所 A kind of statistical machine translation method merging translation memory and phrase translation model
CN103235775A (en) * 2013-04-25 2013-08-07 中国科学院自动化研究所 Statistics machine translation method integrating translation memory and phrase translation model
CN103885942B (en) * 2014-03-18 2017-09-05 成都优译信息技术股份有限公司 A kind of rapid translation device and method
CN103885942A (en) * 2014-03-18 2014-06-25 成都优译信息技术有限公司 Rapid translation device and method
CN104331399A (en) * 2014-07-25 2015-02-04 一朵云(北京)科技有限公司 Dictionary tree translation method
CN104881406A (en) * 2015-06-15 2015-09-02 携程计算机技术(上海)有限公司 Web page translation method and system
CN104881406B (en) * 2015-06-15 2018-05-04 上海携程商务有限公司 Web page translation method and system
CN106557478A (en) * 2015-09-25 2017-04-05 四川省科技交流中心 Distributed across languages searching systems and its search method based on bridge language
CN106557466A (en) * 2015-09-25 2017-04-05 四川省科技交流中心 Distributed across languages searching systems and its search method based on centralized translation
CN106126508A (en) * 2016-06-22 2016-11-16 上海者信息科技有限公司 A kind of language material management method
CN106844354A (en) * 2017-01-11 2017-06-13 中国科学院合肥物质科学研究院 A kind of webpage takes word Chinese interpretation method and its device
CN107066454A (en) * 2017-03-27 2017-08-18 成都优译信息技术股份有限公司 Number and sequence number replacement method and system for machine translation
CN107329958A (en) * 2017-06-08 2017-11-07 努比亚技术有限公司 Language transfer method and device based on webpage
CN107329958B (en) * 2017-06-08 2021-03-26 努比亚技术有限公司 Language conversion method and device based on webpage
CN109783826A (en) * 2019-01-15 2019-05-21 四川译讯信息科技有限公司 A kind of document automatic translating method
CN109783826B (en) * 2019-01-15 2023-11-21 四川译讯信息科技有限公司 Automatic document translation method
CN111563387A (en) * 2019-02-12 2020-08-21 阿里巴巴集团控股有限公司 Sentence similarity determining method and device and sentence translation method and device
CN111563387B (en) * 2019-02-12 2023-05-02 阿里巴巴集团控股有限公司 Sentence similarity determining method and device, sentence translating method and device
CN110083845A (en) * 2019-04-25 2019-08-02 数译(成都)信息技术有限公司 Web page translation method and system
CN110083845B (en) * 2019-04-25 2023-06-16 四川语言桥信息技术有限公司 Webpage translation method and system
CN110889296A (en) * 2019-11-27 2020-03-17 福建亿榕信息技术有限公司 Real-time translation method and device combined with crawler technology
CN113792558A (en) * 2021-11-16 2021-12-14 北京百度网讯科技有限公司 Self-learning translation method and device based on machine translation and post-translation editing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130403