python压缩字符串_python如何读取通过gzip压缩的字符串

‘壹’ pythonzip函数引入什么包

zip() 函数是Python内置函数之一,它可以将多个序列(列表、元组、字典、集合、字符串以及 range() 区间构成的列表)“压缩”成一个 zip 对象。

‘贰’ Python中的常用内置函数有哪些呢

abs() divmod() input() open() staticmethod()
all() enumerate() int() ord() str()
any() eval() isinstance() pow() sum()
basestring() execfile() issubclass() print() super()
bin() file() iter() property() tuple()
bool() filter() len() range() type()
bytearray() float() list() raw_input() unichr()
callable() format() locals() rece() unicode()
chr() frozenset() long() reload() vars()
classmethod() getattr() map() repr() xrange()
cmp() globals() max() reverse() zip()
compile() hasattr() memoryview() round() __import__()
complex() hash() min() set()
delattr() help() next() setattr()
dict() hex() object() slice()
dir() id() oct() sorted()

‘叁’ python zip函数

zip()函数用于将携竖可迭代的对象作为参考，将对象中对应的元素打包成一个个远哗隐凯足，然后返回有这些元祖组成的乱唤列表。

zip（[iterabale,....]）

‘肆’ 利用python编程，在多个打包压缩的文件中搜索指定字符串。有很多xml文件

ziprar.py

__author__='williezh'
#!/usr/bin/envpython3

importos
importsys
importtime
importshutil
importzipfile
fromzipfileimportZIP_DEFLATED


#Zip文件处理类
classZFile(object):
def__init__(self,fname,mode='r',basedir=''):
self.fname=fname
self.mode=mode
ifself.modein('w','a'):
self.zfile=zipfile.ZipFile(fname,mode,compression=ZIP_DEFLATED)
else:
self.zfile=zipfile.ZipFile(fname,self.mode)
self.basedir=basedir
ifnotself.basedir:
self.basedir=os.path.dirname(fname)

defaddfile(self,path,arcname=None):
path=path.replace('//','/')
ifnotarcname:
ifpath.startswith(self.basedir):
arcname=path[len(self.basedir):]
else:
arcname=''
self.zfile.write(path,arcname)

defaddfiles(self,paths):
forpathinpaths:
ifisinstance(path,tuple):
self.addfile(*path)
else:
self.addfile(path)

defclose(self):
self.zfile.close()

defextract_to(self,path):
forpinself.zfile.namelist():
self.extract(p,path)

defextract(self,fname,path):
ifnotfname.endswith('/'):
fn=os.path.join(path,fname)
ds=os.path.dirname(fn)
ifnotos.path.exists(ds):
os.makedirs(ds)
withopen(fn,'wb')asf:
f.write(self.zfile.read(fname))


#创建Zip文件
defcreateZip(zfile,files):
z=ZFile(zfile,'w')
z.addfiles(files)
z.close()


#解压缩Zip到指定文件夹
defextractZip(zfile,path):
z=ZFile(zfile)
z.extract_to(path)
z.close()


#解压缩rar到指定文件夹
defextractRar(zfile,path):
rar_command1="WinRAR.exex-ibck%s%s"%(zfile,path)
rar_command2=r'"C:WinRAR.exe"x-ibck%s%s'%(zfile,path)
try:
res=os.system(rar_command1)
ifres==0:
print("PathOK.")
except:
try:
res=os.system(rar_command2)
ifres==0:
print("Successtounrarthefile{}.".format(path))
except:
print('Error:cannotunrarthefile{}'.format(path))


#解压多个压缩文件到一个临时文件夹
defextract_files(file_list):
newdir=str(int(time.time()))
forfninfile_list:
subdir=os.path.join(newdir,fn)
ifnotos.path.exists(subdir):
os.makedirs(subdir)
iffn.endswith('.zip'):
extractZip(fn,subdir)
eliffn.endswith('.rar'):
extractRar(fn,subdir)
returnnewdir


#查找一个文件夹中的某些文件,返回文件内容包含findstr_list中所有字符串的文件
deffindstr_at(basedir,file_list,findstr_list):
files=[]
forr,ds,fsinos.walk(basedir):
forfninfs:
iffninfile_list:
withopen(os.path.join(r,fn))asf:
s=f.read()
ifall(iinsforiinfindstr_list):
files.append(os.path.join(r,fn))
returnfiles


if__name__=='__main__':
files=[iforiinsys.argv[1:]ifnoti.startswith('-')]
unzipfiles=[iforiinfilesifi.endswith('.zip')ori.endswith('.rar')]
xmlfiles=[iforiinfilesifi.endswith('.xml')]
save_unzipdir=Trueif'-s'insys.argvelseFalse
findstr=[i.split('=')[-1]foriinsys.argvifi.startswith('--find=')]
findstring=','.join(['`{}`'.format(i)foriinfindstr])
newdir=extract_files(unzipfiles)
result=findstr_at(newdir,xmlfiles,findstr)
ifnotresult:
msg='Noneofthefile(s)containthegivenstring{}.'
print(msg.format(findstring))
else:
msg='{}file(s)containthegivenstring{}:'
print(msg.format(len(result),findstring))
print('
'.join([i.replace(newdir+os.sep,'')foriinsorted(result)]))

ifnotsave_unzipdir:
shutil.rmtree(newdir)

$python3ziprar.pyaaa.zipaaa2.zipaaa3.zipaaa.xmlaaa1.xmlaaa2.xml--find="Itwas"--find="when"
Noneofthefile(s)containthegivenstring`Itwas`,`when`.
$python3ziprar.pyaaa.zipaaa2.zipaaa3.zipaaa.xmlaaa1.xmlaaa2.xml--find="Itwas"--find="I"
2file(s)containthegivenstring`Itwas`,`I`:
aaa.zip/aaa2.xml
aaa2.zip/aaa2.xml
$python3ziprar.pyaaa.zipaaa2.zipaaa3.zipaaa.xmlaaa1.xmlaaa2.xml--find="Itwas"
2file(s)containthegivenstring`Itwas`:
aaa.zip/aaa2.xml
aaa2.zip/aaa2.xml

‘伍’ 怎么在Python里使用UTF-8编码

概述

在python代码即.py文件的头部声明即可

解析

py文件中的编码

Python 默认脚本文件都是 ANSCII 编码的，当文件中有非 ANSCII 编码范围内的字符的时候就要使用"编码指示"来修正一个 mole 的定义中，如果.py文件中包含中文字符（严格的说是含有非anscii字符），则需要在第一行或第二行指定编码声明：

# -*- coding=utf-8 -*-
#coding=utf-8
# 以上两种选其一即可

其他的编码如：gbk、gb2312也可以；否则会出现:

SyntaxError: Non-ASCII character 'xe4' in file test.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

python中的编码与解码

先说一下python中的字符串类型，在python中有两种字符串类型，分别是 str 和 unicode，他们都是basestring的派生类；

str类型是一个包含Characters represent (at least) 8-bit bytes的序列；

unicode 的每个 unit 是一个 unicode obj;

在str的文档中有这样的一句话：

The string data type is also used to represent arrays of bytes, e.g., to hold data read from a file.

也就是说在读取一个文件的内容，或者从网络上读取到内容时，保持的对象为str类型；如果想把一个str转换成特定编码类型，需要把str转为Unicode,然后从unicode转为特定的编码类型如：utf-8、gb2312等。

拓展内容

utf-8编码

UTF-8（8-bit Unicode Transformation Format）是一种针对Unicode的可变长度字符编码，也是一种前缀码。它可以用来表示Unicode标准中的任何字符，且其编码中的第一个字节仍与ASCII兼容，这使得原来处理ASCII字符的软件无须或只须做少部分修改，即可继续使用。因此，它逐渐成为电子邮件、网页及其他存储或发送文字的应用中，优先采用的编码。

UTF-8使用一至六个字节为每个字符编码（尽管如此，2003年11月UTF-8被RFC 3629重新规范，只能使用原来Unicode定义的区域，U+0000到U+10FFFF，也就是说最多四个字节）：

1、128个US-ASCII字符只需一个字节编码（Unicode范围由U+0000至U+007F）。

2、带有附加符号的拉丁文、希腊文、西里尔字母、亚美尼亚语、希伯来文、阿拉伯文、叙利亚文及它拿字母则需要两个字节编码（Unicode范围由U+0080至U+07FF）。

3、其他基本多文种平面（BMP）中的字符（这包含了大部分常用字，如大部分的汉字）使用三个字节编码（Unicode范围由U+0800至U+FFFF）。

4、其他极少使用的Unicode辅助平面的字符使用四至六字节编码（Unicode范围由U+10000至U+1FFFFF使用四字节，Unicode范围由U+200000至U+3FFFFFF使用五字节，Unicode范围由U+4000000至U+7FFFFFFF使用六字节）。

对上述提及的第四种字符而言，UTF-8使用四至六个字节来编码似乎太耗费资源了。但UTF-8对所有常用的字符都可以用三个字节表示，而且它的另一种选择，UTF-16编码，对前述的第四种字符同样需要四个字节来编码，所以要决定UTF-8或UTF-16哪种编码比较有效率，还要视所使用的字符的分布范围而定。不过，如果使用一些传统的压缩系统，比如DEFLATE，则这些不同编码系统间的的差异就变得微不足道了。若顾及传统压缩算法在压缩较短文字上的效果不大，可以考虑使用Unicode标准压缩格式（SCSU）。

互联网工程工作小组（IETF）要求所有互联网协议都必须支持UTF-8编码。互联网邮件联盟（IMC）建议所有电子邮件软件都支持UTF-8编码。

‘陆’ python怎样压缩和解压缩ZIP文件

有时我们需要在 Python 中使用 zip 文件，而在1.6版中，Python 就已经提供了 zipfile 模块可以进行这样的操作。不过 Python 中的 zipfile 模块不能处理多卷的情况，不过这种情况并不多见，因此在通常情况下已经足够使用了。下面我只是对一些基本的 zipfile 操作进行了记录，足以应付大部分的情况了。
zipfile 模块可以让你打开或写入一个 zip 文件。比如：
import zipfile
z = zipfile.ZipFile('zipfilename', mode='r')
这样就打开了一个 zip 文件，如果mode为'w'或'a'则表示要写入一个 zip 文件。如果是写入，则还可以跟上第三个参数：
compression=zipfile.ZIP_DEFLATED 或
compression=zipfile.ZIP_STORED ZIP_DEFLATED是压缩标志，如果使用它需要编译了zlib模块。而后一个只是用zip进行打包，并不压缩。
在打开了zip文件之后就可以根据需要是读出zip文件的内容还是将内容保存到 zip 文件中。
读出zip中的内容
很简单，zipfile 对象提供了一个read(name)的方法。name为 zip文件中的一个文件入口，执行完成之后，将返回读出的内容，你把它保存到想到的文件中即可。
写入zip文件
有两种方式，一种是直接写入一个已经存在的文件，另一种是写入一个字符串。
对于第一种使用 zipfile 对象的 write(filename, arcname, compress_type)，后两个参数是可以忽略的。第一个参数是文件名，第二个参数是表示在 zip 文件中的名字，如果没有给出，表示使用与filename一样的名字。compress_type是压缩标志，它可以覆盖创建 zipfile 时的参数。第二种是使用 zipfile 对象的 writestr(zinfo_or_arcname, bytes)，第一个参数是zipinfo 对象或写到压缩文件中的压缩名，第二个参数是字符串。使用这个方法可以动态的组织文件的内容。
需要注意的是在读出时，因为只能读出内容，因此如果想实现按目录结构展开 zip 文件的话，这些操作需要自已来完成，比如创建目录，创建文件并写入。而写入时，则可以根据需要动态组织在 zip 文件中的目录结构，这样可以不按照原来的目录结构来生成 zip 文件。
于是我为了方便使用，创建了自已的一个 ZFile 类，主要是实现象 winrar 的右键菜单中的压缩到的功能--即将一个zip文件压缩到指定目录，自动创建相应的子目录。再有就是方便生成 zip 文件。类源码为：
# coding:cp936
# Zfile.py
# xxteach.com
import zipfile
import os.path
import os
class ZFile(object):
def __init__(self, filename, mode='r', basedir=''):
self.filename = filename
self.mode = mode
if self.mode in ('w', 'a'):
self.zfile = zipfile.ZipFile(filename, self.mode, compression=zipfile.ZIP_DEFLATED)
else:
self.zfile = zipfile.ZipFile(filename, self.mode)
self.basedir = basedir
if not self.basedir:
self.basedir = os.path.dirname(filename)
def addfile(self, path, arcname=None):
path = path.replace('//', '/')
if not arcname:
if path.startswith(self.basedir):
arcname = path[len(self.basedir):]
else:
arcname = ''
self.zfile.write(path, arcname)
def addfiles(self, paths):
for path in paths:
if isinstance(path, tuple):
self.addfile(*path)
else:
self.addfile(path)
def close(self):
self.zfile.close()
def extract_to(self, path):
for p in self.zfile.namelist():
self.extract(p, path)
def extract(self, filename, path):
if not filename.endswith('/'):
f = os.path.join(path, filename)
dir = os.path.dirname(f)
if not os.path.exists(dir):
os.makedirs(dir)
file(f, 'wb').write(self.zfile.read(filename))
def create(zfile, files):
z = ZFile(zfile, 'w')
z.addfiles(files)
z.close()
def extract(zfile, path):
z = ZFile(zfile)
z.extract_to(path)
z.close()

‘柒’ python怎样压缩和解压缩ZIP文件

1、python使用zipfile模块压缩和解压ZIP文件
2、读取zip文件
首先，通过zipfile模块打开指定zip文灶简件，如：
zpfd = zipfile.ZipFile(path, mode='r')
对于zipfile，其标志与open所用的打开文件标志有所不同，不能识别 'rb'。
然后，读取zip文件中的内容，zipfile对象提供一个read(name)的方法，name为zip文件中的一个文件入口，执行完成之后，将返回读出的内容，如：
for filename in zpfd.namelist():
tmpcont = zpfd.read(filename)
print 'len(tmpcont)', 'tmpcont'
需要注意的是，读取zip文件时，只能读取内容
3、写入zip文件
首先，需要zipfile模块写打开或创建zip文件，如：
zpfd = zipfile.ZipFile(path, mode='w')
写打开是标志可以为'w'或'a'('a'表示写入一个zip文件)，或者传入第三个参数cmopression压缩标志
compression=zipfile.ZIP_DEFLATED 需要导入zlib模块
compression=zipfile.ZIP_STORED则表示只对文件进行打包，并不压缩
写入有两种方式，一种是直接写入一个已经存在的文件，可使用zipfile对象中write(filename, arcname, compress_type)第一个参数为文件名，第二个参数指写入zip文件中的文件名，默认与filename一致，第三个参数压缩标志可以覆盖打开zipfile时的使用参数；另一种是写入一个字符串缓辩兄，可使用zipfile对象中的writestr(zinfo_or_arcname, bytes)，第一个参数是zipinfo对象或写到zip文件中的压缩名，第二个参数是待写入的字符串
4、最后，对于打开的zipfile对象需要进行关扰袭闭，从而使得写入内容真正写入磁盘，即：
zpfd.close()

‘捌’ 用python编写一个字符串压缩程序(要求为自适应模型替代法)

你好，下面是LZ777自适应压缩算法的一个简单实现，你可以看看
import math
from bitarray import bitarray
class LZ77Compressor:
"""
A simplified implementation of the LZ77 Compression Algorithm
"""
MAX_WINDOW_SIZE = 400
def __init__(self, window_size=20):
self.window_size = min(window_size, self.MAX_WINDOW_SIZE)
self.lookahead_buffer_size = 15 # length of match is at most 4 bits
def compress(self, input_file_path, output_file_path=None, verbose=False):
"""
Given the path of an input file, its content is compressed by applying a simple
LZ77 compression algorithm.
The compressed format is:
0 bit followed by 8 bits (1 byte character) when there are no previous matches
within window
1 bit followed by 12 bits pointer (distance to the start of the match from the
current position) and 4 bits (length of the match)

If a path to the output file is provided, the compressed data is written into
a binary file. Otherwise, it is returned as a bitarray
if verbose is enabled, the compression description is printed to standard output
"""
data = None
i = 0
output_buffer = bitarray(endian='big')
# read the input file
try:
with open(input_file_path, 'rb') as input_file:
data = input_file.read()
except IOError:
print 'Could not open input file ...'
raise
while i < len(data):
#print i
match = self.findLongestMatch(data, i)
if match:
# Add 1 bit flag, followed by 12 bit for distance, and 4 bit for the length
# of the match
(bestMatchDistance, bestMatchLength) = match
output_buffer.append(True)
output_buffer.frombytes(chr(bestMatchDistance >> 4))
output_buffer.frombytes(chr(((bestMatchDistance & 0xf) << 4) | bestMatchLength))
if verbose:
print "<1, %i, %i>" % (bestMatchDistance, bestMatchLength),
i += bestMatchLength
else:
# No useful match was found. Add 0 bit flag, followed by 8 bit for the character
output_buffer.append(False)
output_buffer.frombytes(data[i])

if verbose:
print "<0, %s>" % data[i],
i += 1
# fill the buffer with zeros if the number of bits is not a multiple of 8
output_buffer.fill()
# write the compressed data into a binary file if a path is provided
if output_file_path:
try:
with open(output_file_path, 'wb') as output_file:
output_file.write(output_buffer.tobytes())
print "File was compressed successfully and saved to output path ..."
return None
except IOError:
print 'Could not write to output file path. Please check if the path is correct ...'
raise
# an output file path was not provided, return the compressed data
return output_buffer

def decompress(self, input_file_path, output_file_path=None):
"""
Given a string of the compressed file path, the data is decompressed back to its
original form, and written into the output file path if provided. If no output
file path is provided, the decompressed data is returned as a string
"""
data = bitarray(endian='big')
output_buffer = []
# read the input file
try:
with open(input_file_path, 'rb') as input_file:
data.fromfile(input_file)
except IOError:
print 'Could not open input file ...'
raise
while len(data) >= 9:
flag = data.pop(0)
if not flag:
byte = data[0:8].tobytes()
output_buffer.append(byte)
del data[0:8]
else:
byte1 = ord(data[0:8].tobytes())
byte2 = ord(data[8:16].tobytes())
del data[0:16]
distance = (byte1 << 4) | (byte2 >> 4)
length = (byte2 & 0xf)
for i in range(length):
output_buffer.append(output_buffer[-distance])
out_data = ''.join(output_buffer)
if output_file_path:
try:
with open(output_file_path, 'wb') as output_file:
output_file.write(out_data)
print 'File was decompressed successfully and saved to output path ...'
return None
except IOError:
print 'Could not write to output file path. Please check if the path is correct ...'
raise
return out_data

def findLongestMatch(self, data, current_position):
"""
Finds the longest match to a substring starting at the current_position
in the lookahead buffer from the history window
"""
end_of_buffer = min(current_position + self.lookahead_buffer_size, len(data) + 1)
best_match_distance = -1
best_match_length = -1
# Optimization: Only consider substrings of length 2 and greater, and just
# output any substring of length 1 (8 bits uncompressed is better than 13 bits
# for the flag, distance, and length)
for j in range(current_position + 2, end_of_buffer):
start_index = max(0, current_position - self.window_size)
substring = data[current_position:j]
for i in range(start_index, current_position):
repetitions = len(substring) / (current_position - i)
last = len(substring) % (current_position - i)
matched_string = data[i:current_position] * repetitions + data[i:i+last]
if matched_string == substring and len(substring) > best_match_length:
best_match_distance = current_position - i
best_match_length = len(substring)
if best_match_distance > 0 and best_match_length > 0:
return (best_match_distance, best_match_length)
return None

‘玖’ python怎么把两个字符为一组纵向显示

Python可以使用内置的zip函数来把两个字符串组合成一组纵向显示。zip函数的语法如下：
zip(iter1 [,iter2 [...]])
其中iter1，iter2为可迭代的对象，可以是字符串，列表，元组等。zip函数会搭郑燃将iter1，iter2中的元素一一对应组合成一组元素，并将这些组合结果以列表形式返回。
比如，将两个字符串"Hello"和"World"组合成一组纵向显示，可以使用以下代码：
# 定义字符串
str1 = "Hello"
str2 = "World"
# 使用zip函数将两个字符串组合
result = zip(str1, str2)
# 输出结果
print(list(result))
输出结果如下：
[('H', '丛乎W'), ('e', 'o'), ('l', 'r'), ('l', 'l'), ('o', 'd')]
从输出结果知虚可以看到，两个字符串中的字符已经按照纵向的方式组合在一起。

‘拾’ python如何读取通过gzip压缩的字符串

import osimport gzip # 那是因为你调用了read方法，而这个方法会把文件一股脑儿读取出来的# 为了便于你迭代，你可以在这里使用一个生成器def read_gz_file(path): if os.path.exists(path): with gzip.open(path, 'rt') as pf: for line in pf: yield line else: print('the path [{}] is not exist!'.format(path)) con = read_gz_file('abc.gz')if getattr(con, '__iter__', None): for line in con: print(line, end = '')

导航:首页 > 文件处理 > python压缩字符串

python压缩字符串

概述

解析

拓展内容

与python压缩字符串相关的资料