CN104410424B - Fast lossless compression method for memory data of embedded devices


Info

Publication number
CN104410424B
Authority
CN
China
Prior art keywords
length
character
byte
record
fresh character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410696377.5A
Other languages
Chinese (zh)
Other versions
CN104410424A (en)
Inventor
宋彬
李慧玲
秦浩
裴远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201410696377.5A priority Critical patent/CN104410424B/en
Publication of CN104410424A publication Critical patent/CN104410424A/en
Application granted granted Critical
Publication of CN104410424B publication Critical patent/CN104410424B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a fast lossless compression method for the memory data of embedded devices, in the technical field of data processing. It mainly addresses the low speed of existing compression methods when compressing memory pages. Its main feature is the design of two compression formats suited to memory page data. In the first, the first byte records the character repeat length, the offset distance and the new-character length, and the bytes from the second byte onward record, in order, the remaining new-character length, the new characters and the remaining offset distance. In the second, the first byte records the compression-format flag, the offset distance and the new-character length, and the bytes from the second byte onward record, in order, the remaining new-character length, the new characters, the character repeat length and the remaining offset distance. Compared with the current LZO lossless compression method, the invention improves the compression and decompression speed of memory page data and obtains a better compression ratio, thereby improving the memory data storage capacity and utilization of embedded devices. It can be used in embedded devices with limited storage.

Description

Fast lossless compression method for memory data of embedded devices
Technical field
The invention belongs to the technical field of data processing and relates to a data compression method for the memory data of embedded devices. Based on the characteristics of memory data under compression, the invention uses new data compression formats to increase compression speed, and can be used in embedded devices with limited storage.
Background
Memory is one of the key components of a computer; it is the bridge to the CPU. All programs in a computer run in memory. Memory performance strongly affects the computer as a whole, and in portable embedded devices with limited volume and memory capacity its effect on device performance and user experience is especially pronounced. In recent years, with the development of the mobile Internet, portable embedded devices such as mobile phones and tablet computers have become indispensable means of communication. Compressing memory data to increase memory storage capacity and utilization can therefore greatly improve overall device performance. As society develops and the volume of information keeps growing, people place higher demands on the system performance of embedded devices, such as higher speed, lower power consumption, smaller size and access to more information. Various improvements have been proposed to meet these requirements. Compared with costly breakthroughs in hardware technology, lossless data compression is one of the quicker and more effective approaches. If lossless data compression is used in an embedded device, more data can be stored in the same hardware memory space, which raises memory utilization, reduces cost, and improves device performance and user experience. Given these advantages, lossless data compression is worth studying as a simple and inexpensive way to improve embedded system performance.
In 1977, Lempel and Ziv proposed an efficient lossless compression technique, the LZ77 lossless data compression algorithm. Its main principle is to represent a repeated string that occurred earlier by a shorter token of the form (repeat length, offset distance). For example, abcdekabcdeha can be encoded as abcdek(5,6)ha. Overall, shorter information replaces longer information, which achieves compression. In 1982, James Storer and Thomas Szymanski improved on LZ77 and proposed the LZSS algorithm, which raised compression efficiency. Later, Markus Oberhumer improved on LZSS and proposed the LZO (Lempel-Ziv-Oberhumer) algorithm, which greatly increased compression coding speed. LZO is a dictionary-based lossless data compression algorithm characterized by fast compression and real-time operation. It defines five data compression formats for different repeat lengths and offset distances. The encoder selects one of the formats according to the size of the pair (repeat length, offset distance), the decoder distinguishes the five formats by the value of the first byte of each format, and the maximum offset distance can reach 48K.
The shortcoming of this method is as follows. Ever since memory paging was introduced, the default memory page size has been set to 4096 bytes, i.e. 4 KB. Although a computer's memory page size is configurable in principle, most operating systems still use the default 4 KB page in practice. For ease of memory data management, memory data should be compressed page by page. LZO, however, was originally designed to compress data of arbitrary length; when compressing memory data it obtains only a low compression ratio, cannot effectively raise memory utilization, and is slow in both compression and decompression. The compression approach and compression formats of current LZO are therefore not well suited to memory data.
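The LZ77 example in the background, abcdek(5,6)ha, can be checked with a short sketch. The function name `lz77_decode` and the token representation (strings for literal runs, tuples for back-references) are illustrative assumptions; they are not the byte format of LZ77, LZSS or LZO.

```python
def lz77_decode(tokens):
    """Expand literal strings and (repeat_length, offset_distance) tokens."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            # Literal run: copy the characters unchanged.
            out.extend(tok)
        else:
            length, distance = tok
            start = len(out) - distance
            # Copy one symbol at a time so overlapping matches also work.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


decoded = lz77_decode(["abcdek", (5, 6), "ha"])
print(decoded)  # abcdekabcdeha
```

The token (5,6) copies 5 characters starting 6 positions back from the current output end, which reproduces the second "abcde".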
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art described above by providing a fast lossless compression method for the memory data of embedded devices that compresses and decompresses memory data faster, and thereby effectively increases memory storage capacity and utilization.
The technical solution of the invention is as follows: according to the data characteristics of memory pages, compression formats suited to memory pages are designed and used to compress and encode memory page data. The specific steps are:
(1) Read the memory pages of the memory data in the embedded device, i.e. read the memory pages one page at a time with a page size of 4 KB;
(2) Judge whether the data read from the page are new data: if the data read are not recorded in the dictionary, they are judged to be new data and the position of the new data is entered in the dictionary; continue reading memory page data until no new data appear;
The dictionary is a hash table accessed directly by key, and the key is computed by a hash function;
(3) For data read that are already recorded in the dictionary, encode them with one of two compression formats selected according to the character repeat length and the offset distance, where the offset distance is the distance between the current character position and the position recorded in the hash table:
For memory page data whose character repeat length is less than 8 and whose offset distance is less than or equal to 2 KB, the first byte records the character repeat length L, the offset distance D and the new-character length S; the bytes from the second byte onward record, in order, the remaining new-character length M, the new characters C and the remaining offset distance N;
For memory page data that do not satisfy the condition that the character repeat length is less than 8 and the offset distance is less than or equal to 2 KB, the first byte records the compression-format flag T, the offset distance D and the new-character length S; the bytes from the second byte onward record, in order, the remaining new-character length M, the new characters C, the character repeat length L and the remaining offset distance N;
(4) Judge whether the coding position is the end of the memory page currently read in; if so, output the compressed data and the data length, record the end mark, and perform step (5); otherwise return to step (2) and continue reading new data;
(5) Judge whether the current page is the last memory page of the memory data package; if so, encoding ends; otherwise return to step (1) and read in the next memory page.
Because the compression formats used are simple, the invention increases the compression and decompression speed of memory page data and obtains a better compression ratio, and can substantially increase the memory data storage capacity and utilization of embedded devices.
Test results show that, compared with the current LZO lossless compression method, the invention improves compression time by 14.52%, decompression time by 98.84% and compression ratio by 1.1%.
Brief description of the drawings
Fig. 1 is the compression flowchart of the invention;
Fig. 2 shows the compression formats used in the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings.
Referring to Fig. 1, the invention is implemented in the following steps:
Step 1: Read one memory page from the memory data package of the embedded device, i.e. read memory pages one page at a time with a page size of 4 KB.
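Page-by-page reading as in step 1 can be sketched as follows. The name `iter_pages` and the in-memory `bytes` buffer standing in for the memory data package are assumptions made for illustration; the text does not say how the package is stored or accessed.

```python
PAGE_SIZE = 4096  # 4 KB page size, as stated in the text


def iter_pages(package, page_size=PAGE_SIZE):
    """Yield consecutive pages of the data package, one page at a time."""
    for start in range(0, len(package), page_size):
        # The last slice may be shorter if the package is not page-aligned.
        yield package[start:start + page_size]


package = bytes(10000)  # hypothetical 10000-byte data package
page_lengths = [len(page) for page in iter_pages(package)]
print(page_lengths)  # [4096, 4096, 1808]
```

Each page would then be compressed independently, so a page can later be decompressed without touching its neighbours.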
Step 2: Read four characters from the memory page that was read in and perform the first hash operation, i.e. compute the key with the first hash function, which is the first hash function of the current LZO lossless compression method.
Step 3: Judge from the key obtained in step 2 whether the character position is legal. If it is legal, go to step 4; if it is not, update the hash table and return to step 2. The hash table is a data structure accessed directly by key.
Here "legal" means that each storage position in the hash table can be accessed by only one key.
Step 4: Judge whether the characters at the current hash table storage position are identical to the characters read in. If they are, go to step 7; if not, go to step 5.
The current hash table storage position is the storage position in the hash table that is accessed directly by the key from step 2.
Step 5: Perform the second hash operation on the key obtained in step 2, i.e. compute the second key with the second hash function, which is the second hash function of the current LZO lossless compression method. Then judge from the second key whether the character position is legal. If it is legal, go to step 6; if not, update the hash table and return to step 2.
Step 6: Judge whether the characters at that hash table storage address are identical to the characters read in. If they are, go to step 7; if not, the characters read in are judged to be new characters C, the hash table is updated, and the process returns to step 2.
The hash table storage address is the storage address in the hash table that is accessed directly by the second key from step 5.
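The two-probe dictionary lookup of steps 2 to 6 can be sketched roughly as below. The functions `hash1` and `hash2` are hypothetical stand-ins and are not LZO's hash functions, which the text names but does not reproduce. Treating an empty slot as the "illegal" case, and recording the current position on a hit, are also assumptions of this sketch.

```python
HASH_BITS = 12                 # assumed table size of 4096 slots
TABLE_SIZE = 1 << HASH_BITS


def hash1(four):
    """Stand-in first hash: multiplicative hash of up to four bytes."""
    v = int.from_bytes(four, "little")
    return ((v * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


def hash2(key1):
    """Stand-in second hash, derived from the first key as in step 5."""
    return (key1 ^ 0x5A5) & (TABLE_SIZE - 1)


def find_match(page, pos, table):
    """Return an earlier position holding the same 4 bytes, or None (new data)."""
    cur = page[pos:pos + 4]
    k1 = hash1(cur)
    k2 = hash2(k1)
    for key in (k1, k2):
        cand = table[key]
        if cand < 0:
            # Assumed "illegal" case: empty slot, so record the position.
            table[key] = pos
            return None
        if page[cand:cand + 4] == cur:
            table[key] = pos   # assumption: keep the most recent position
            return cand
    table[k2] = pos            # both probes held other data: new characters
    return None


page = b"abcdXYZabcd"
table = [-1] * TABLE_SIZE
first = find_match(page, 0, table)    # nothing recorded yet
second = find_match(page, 7, table)   # same four bytes as position 0
print(first, second)  # None 0
```

In this example the offset distance D of step 7 would be 7 - 0 = 7, the distance between the current position and the recorded one.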
Step 7: Compute the new-character length S, the character repeat length L and the offset distance D, where D is the distance between the current character position and the position recorded in the hash table.
The computation is the same as in the current LZO lossless compression method.
Step 8: Judge whether the character repeat length is less than 8 and the offset distance is less than or equal to 2 KB. If so, perform step 9; if not, perform step 10.
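The decision in step 8 reduces to a single predicate. In the sketch below the function name `choose_format` is hypothetical, and 2 KB is taken to mean 2048 bytes.

```python
TWO_KB = 2048  # assumption: "2 KB" means 2048 bytes


def choose_format(repeat_length, offset_distance):
    """Return 1 for compression format 1 (step 9), 2 for format 2 (step 10)."""
    if repeat_length < 8 and offset_distance <= TWO_KB:
        return 1
    return 2


print(choose_format(5, 100))   # 1
print(choose_format(8, 100))   # 2
print(choose_format(5, 3000))  # 2
```

Short, nearby matches take the compact format 1; everything else falls through to format 2, which spends extra bytes on the repeat length.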
Step 9: Encode the characters according to the recording rules of compression format 1.
Referring to Fig. 2(a), encoding according to the recording rules of compression format 1 proceeds as follows:
9.1) The first 3 bits of the first byte record the character repeat length L; the 4th, 5th and 6th bits of the first byte record the last 3 bits of the offset distance D, one bit each;
9.2) Judge whether the new-character length S is greater than 3. If not, the last 2 bits of the first byte record S, and the new characters C are recorded starting from the second byte. If so, the last 2 bits of the first byte are recorded as 0 as a flag, and 3 is subtracted from S to obtain the remaining new-character length M. Then judge whether M is greater than 255. If not, record M. If so, record one byte of 0 and subtract 255 from M, repeating until the remaining new-character length is less than 255, and then record that remaining length. Finally record the new characters C;
9.3) After the new characters have been recorded, record the remaining offset distance N, which is the first 8 bits of the offset distance D; then go to step 11.
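A minimal sketch of the format 1 recording rules in steps 9.1 to 9.3 follows. The names `extend_length` and `encode_format1` are hypothetical. The sketch assumes that the "first" bits are the most significant ones and that D is stored as-is; because 3 + 8 bits only cover 0 to 2047, it rejects D = 2048, since the text does not say how that boundary value is stored. The text also does not say how a decoder tells S = 0 from the flag value 0, so the sketch simply follows the rule as written.

```python
def extend_length(m):
    """Record m as zero bytes (each worth 255) followed by the remainder."""
    out = bytearray()
    while m > 255:
        out.append(0)
        m -= 255
    out.append(m)
    return bytes(out)


def encode_format1(L, D, new_chars):
    """Encode one format 1 record from L, D and the new characters C."""
    if not (0 <= L < 8 and 0 < D < 2048):
        raise ValueError("format 1 sketch needs L < 8 and 0 < D < 2048")
    S = len(new_chars)
    # First byte, MSB first (assumed): LLL | low 3 bits of D | SS
    first = (L << 5) | ((D & 0x7) << 2)
    out = bytearray()
    if S <= 3:
        out.append(first | S)
    else:
        out.append(first)              # last 2 bits 0 act as the flag
        out += extend_length(S - 3)    # remaining new-character length M
    out += new_chars                   # new characters C
    out.append(D >> 3)                 # remaining offset N: high 8 bits of D
    return bytes(out)


record = encode_format1(5, 6, b"ab")
print(list(record))  # [186, 97, 98, 0]
```

Here 186 is binary 101 110 10: L = 5, the low 3 bits of D = 6, and S = 2.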
Step 10: Encode the characters according to the recording rules of compression format 2.
Referring to Fig. 2(b), encoding according to the recording rules of compression format 2 proceeds as follows:
10.1) The first 2 bits of the first byte are recorded as 01 and serve as the compression-format flag T, i.e. the 1st bit is recorded as 0 and the 2nd bit as 1; the 3rd, 4th, 5th and 6th bits of the first byte record the last 4 bits of the offset distance D, one bit each;
10.2) Judge whether the new-character length S is greater than 3. If not, the last 2 bits of the first byte record S, one bit each, and the new characters are recorded starting from the second byte. If so, the last 2 bits of the first byte are recorded as 0 as a flag, and 3 is subtracted from S to obtain the remaining new-character length M. Then judge whether M is greater than 255. If not, record M. If so, record one byte of 0 and subtract 255 from M, repeating until the remaining new-character length is less than 255, and then record that remaining length. Finally record the new characters C;
10.3) After the new characters have been recorded, judge whether the character repeat length L is greater than 255. If not, record L. If so, record one byte of 0 and subtract 255 from L, repeating until the character repeat length is less than 255, and then record that length;
10.4) After the character repeat length has been recorded, record the remaining offset distance N, which is the first 8 bits of the offset distance D.
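The format 2 rules in steps 10.1 to 10.4 can be sketched in the same way. The names `extend_length` and `encode_format2` are hypothetical, and the sketch again assumes that the "first" bits are the most significant ones. D is limited to 12 bits (4 in the first byte plus 8 in N), which covers any offset inside a 4 KB page.

```python
def extend_length(m):
    """Record m as zero bytes (each worth 255) followed by the remainder."""
    out = bytearray()
    while m > 255:
        out.append(0)
        m -= 255
    out.append(m)
    return bytes(out)


def encode_format2(L, D, new_chars):
    """Encode one format 2 record from L, D and the new characters C."""
    if not (L >= 1 and 0 < D < 4096):
        raise ValueError("format 2 sketch needs L >= 1 and 0 < D < 4096")
    S = len(new_chars)
    # First byte, MSB first (assumed): T = 01 | low 4 bits of D | SS
    first = (0b01 << 6) | ((D & 0xF) << 2)
    out = bytearray()
    if S <= 3:
        out.append(first | S)
    else:
        out.append(first)              # last 2 bits 0 act as the flag
        out += extend_length(S - 3)    # remaining new-character length M
    out += new_chars                   # new characters C
    out += extend_length(L)            # character repeat length L
    out.append(D >> 4)                 # remaining offset N: high 8 bits of D
    return bytes(out)


record = encode_format2(300, 3000, b"xyz")
print(list(record))  # [99, 120, 121, 122, 0, 45, 187]
```

In this record 99 is binary 01 1000 11 (flag T, low 4 bits of 3000, S = 3), the pair 0, 45 stands for L = 255 + 45 = 300, and 187 is 3000 >> 4.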
Step 11: Judge whether the coding position coincides with the end position of the memory page currently read in. If it does, output the encoded data and the data length, record the end mark, and go to step 12; if not, return to step 2.
The end mark consists of three bytes: the 1st byte is recorded as 17, and the 2nd and 3rd bytes are both recorded as 0.
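On the decoding side, the zero-byte length extension used in both formats and the three-byte end mark might be read back as sketched below. The text describes only the encoder, so `decode_length` and `has_end_mark` are hypothetical helpers, and the decoding rule (each 0 byte adds 255, the first non-zero byte ends the length) is inferred from the recording rule.

```python
END_MARK = bytes([17, 0, 0])  # end mark stated in the text


def encode_length(m):
    """Recording rule from the text: zero bytes worth 255 each, then the rest."""
    out = bytearray()
    while m > 255:
        out.append(0)
        m -= 255
    out.append(m)
    return bytes(out)


def decode_length(buf, i):
    """Inverse of encode_length; returns (value, index after the length)."""
    total = 0
    while buf[i] == 0:
        total += 255
        i += 1
    return total + buf[i], i + 1


def has_end_mark(stream):
    """True if the compressed page ends with the three-byte end mark."""
    return stream.endswith(END_MARK)


value, next_index = decode_length(encode_length(600), 0)
print(value, next_index)  # 600 3
```

The inverse only works because the final length byte is never 0, which holds whenever the recorded value is at least 1.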
Step 12: Judge whether the current memory page is the last memory page of the memory data package, i.e. whether all data in the memory data package of the embedded device have been read. If so, encoding ends; if not, return to step 1.
The effect of the invention is further described below with reference to an experiment:
In this experiment the compression method proposed by the invention was written in C. The advantage of the method in compression and decompression speed is illustrated by comparing its compression of memory page data with that of the current LZO lossless compression method. LZO is currently the best lossless compression method. The memory data used in the experiment are representative 4 KB memory pages from mobile devices, supplied as a memory page data package of 453 MB. In the VS2010 development environment, the memory page data were compressed with the invention and with the current LZO lossless compression method; the results are shown in Table 1:
Table 1
The times in Table 1 are the compression time and decompression time for all memory pages in the whole data package, and the figures in the table are averages over 1000 runs. As Table 1 shows, the invention improves the compression ratio by 1.1% and improves compression time and decompression time by 14.52% and 98.84% respectively, thereby increasing the memory data storage capacity and utilization of embedded devices.

Claims (5)

1. A fast lossless compression method for memory data of embedded devices, comprising the following steps:
(1) Read the memory pages of the memory data in the embedded device, i.e. read the memory pages one page at a time with a page size of 4 KB;
(2) Judge whether the data read from the page are new data: if the data read are not recorded in the dictionary, they are judged to be new data and the position of the new data is entered in the dictionary; continue reading memory page data until no new data appear;
The dictionary is a hash table accessed directly by key, and the key is computed by a hash function;
(3) For data read that are already recorded in the dictionary, encode them with one of two compression formats selected according to the character repeat length and the offset distance, where the offset distance is the distance between the current character position and the position recorded in the hash table:
For memory page data whose character repeat length is less than 8 and whose offset distance is less than or equal to 2 KB, the first byte records the character repeat length L, the offset distance D and the new-character length S; the bytes from the second byte onward record, in order, the remaining new-character length M, the new characters C and the remaining offset distance N;
For memory page data that do not satisfy the condition that the character repeat length is less than 8 and the offset distance is less than or equal to 2 KB, the first byte records the compression-format flag T, the offset distance D and the new-character length S; the bytes from the second byte onward record, in order, the remaining new-character length M, the new characters C, the character repeat length L and the remaining offset distance N;
(4) Judge whether the coding position is the end of the memory page currently read in; if so, output the compressed data and the data length, record the end mark, and perform step (5); otherwise return to step (2) and continue reading new data;
(5) Judge whether the current page is the last memory page of the memory data package; if so, encoding ends; otherwise return to step (1) and read in the next memory page.
2. The fast lossless compression method for memory data of embedded devices according to claim 1, characterized in that in step (3) the first byte records the character repeat length L, the offset distance D and the new-character length S according to the following rules:
The first 3 bits of the first byte record the character repeat length L, where L<8;
The 4th, 5th and 6th bits of the first byte record the last 3 bits of the offset distance D, one bit each;
Judge whether the new-character length S is greater than 3. If not, the last 2 bits of the first byte record S, one bit each; if so, the last 2 bits of the first byte are recorded as 0 as a flag, and 3 is subtracted from S to obtain the remaining new-character length M.
3. The fast lossless compression method for memory data of embedded devices according to claim 1, characterized in that in step (3) the remaining new-character length M, the new characters C and the remaining offset distance N are recorded in order from the second byte onward according to the following rules:
Judge whether the remaining new-character length M is greater than 255. If not, record M. If so, record one byte of 0 and subtract 255 from M, repeating until the remaining new-character length is less than 255, and then record that remaining length. Then record the new characters C;
After the new characters have been recorded, record the remaining offset distance N, which is the first 8 bits of the offset distance D.
4. The fast lossless compression method for memory data of embedded devices according to claim 1, characterized in that in step (3) the first byte records the compression-format flag T, the offset distance D and the new-character length S according to the following rules:
The first 2 bits of the first byte are recorded as 01 and serve as the compression-format flag T, i.e. the 1st bit is recorded as 0 and the 2nd bit as 1;
The 3rd, 4th, 5th and 6th bits of the first byte record the last 4 bits of the offset distance D, one bit each;
Judge whether the new-character length S is greater than 3. If not, the last 2 bits of the first byte record S, one bit each; if so, the last 2 bits of the first byte are recorded as 0 as a flag, and 3 is subtracted from S to obtain the remaining new-character length M.
5. The fast lossless compression method for memory data of embedded devices according to claim 1, characterized in that in step (3) the remaining new-character length M, the new characters C, the character repeat length L and the remaining offset distance N are recorded in order from the second byte onward according to the following rules:
Judge whether the remaining new-character length M is greater than 255. If not, record M. If so, record one byte of 0 and subtract 255 from M, repeating until the remaining new-character length is less than 255, and then record that remaining length. Then record the new characters C;
After the new characters have been recorded, judge whether the character repeat length L is greater than 255. If not, record L. If so, record one byte of 0 and subtract 255 from L, repeating until the character repeat length is less than 255, and then record that length;
After the character repeat length has been recorded, record the remaining offset distance N, which is the first 8 bits of the offset distance D.
CN201410696377.5A 2014-11-26 2014-11-26 Fast lossless compression method for memory data of embedded devices Active CN104410424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410696377.5A CN104410424B (en) 2014-11-26 2014-11-26 Fast lossless compression method for memory data of embedded devices

Publications (2)

Publication Number Publication Date
CN104410424A CN104410424A (en) 2015-03-11
CN104410424B true CN104410424B (en) 2017-06-16

Family

ID=52648025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410696377.5A Active CN104410424B (en) 2014-11-26 2014-11-26 Fast lossless compression method for memory data of embedded devices

Country Status (1)

Country Link
CN (1) CN104410424B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106385260B (en) * 2016-09-28 2019-05-21 中电莱斯信息系统有限公司 A kind of FPGA realization system of the LZ lossless compression algorithm based on low delay
CN107967296B (en) * 2017-10-31 2020-06-09 西安空间无线电技术研究所 Improved LZO compression method with rapid low resource overhead
CN110417923B (en) * 2018-04-26 2021-10-29 阿里巴巴集团控股有限公司 DNS message processing method, device and equipment
CN112230032A (en) * 2020-08-03 2021-01-15 青岛鼎信通讯股份有限公司 Electric energy meter data compression and decompression method
CN114244373B (en) * 2022-02-24 2022-05-20 麒麟软件有限公司 LZ series compression algorithm coding and decoding speed optimization method
CN114598329B (en) * 2022-03-18 2023-04-25 电子科技大学 Lightweight lossless compression method for rapid decompression application

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5455576A (en) * 1992-12-23 1995-10-03 Hewlett Packard Corporation Apparatus and methods for Lempel Ziv data compression with improved management of multiple dictionaries in content addressable memory
US7190284B1 (en) * 1994-11-16 2007-03-13 Dye Thomas A Selective lossless, lossy, or no compression of data based on address range, data type, and/or requesting agent
CN103138764A (en) * 2011-11-22 2013-06-05 上海麦杰科技股份有限公司 Method and system for lossless compression of real-time data
CN104125458A (en) * 2013-04-27 2014-10-29 展讯通信(上海)有限公司 Lossless stored data compression method and device
CN103236847A (en) * 2013-05-06 2013-08-07 西安电子科技大学 Multilayer Hash structure and run coding-based lossless compression method for data
CN103258030A (en) * 2013-05-09 2013-08-21 西安电子科技大学 Mobile device memory compression method based on dictionary encoding and run-length encoding
CN103618554A (en) * 2013-12-01 2014-03-05 西安电子科技大学 Internal storage page compression method based on dictionary
CN104050269A (en) * 2014-06-23 2014-09-17 上海帝联信息科技股份有限公司 Log compression method and device and log decompression method and device

Also Published As

Publication number Publication date
CN104410424A (en) 2015-03-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant