python壓縮字元串_python如何讀取通過gzip壓縮的字元串

『壹』 pythonzip函數引入什麼包

zip() 函數是Python內置函數之一,它可以將多個序列(列表、元組、字典、集合、字元串以及 range() 區間構成的列表)「壓縮」成一個 zip 對象。

『貳』 Python中的常用內置函數有哪些呢

abs() divmod() input() open() staticmethod()
all() enumerate() int() ord() str()
any() eval() isinstance() pow() sum()
basestring() execfile() issubclass() print() super()
bin() file() iter() property() tuple()
bool() filter() len() range() type()
bytearray() float() list() raw_input() unichr()
callable() format() locals() rece() unicode()
chr() frozenset() long() reload() vars()
classmethod() getattr() map() repr() xrange()
cmp() globals() max() reverse() zip()
compile() hasattr() memoryview() round() __import__()
complex() hash() min() set()
delattr() help() next() setattr()
dict() hex() object() slice()
dir() id() oct() sorted()

『叄』 python zip函數

zip()函數用於將攜豎可迭代的對象作為參考，將對象中對應的元素打包成一個個遠嘩隱凱足，然後返回有這些元祖組成的亂喚列表。

zip（[iterabale,....]）

『肆』利用python編程，在多個打包壓縮的文件中搜索指定字元串。有很多xml文件

ziprar.py

__author__='williezh'
#!/usr/bin/envpython3

importos
importsys
importtime
importshutil
importzipfile
fromzipfileimportZIP_DEFLATED


#Zip文件處理類
classZFile(object):
def__init__(self,fname,mode='r',basedir=''):
self.fname=fname
self.mode=mode
ifself.modein('w','a'):
self.zfile=zipfile.ZipFile(fname,mode,compression=ZIP_DEFLATED)
else:
self.zfile=zipfile.ZipFile(fname,self.mode)
self.basedir=basedir
ifnotself.basedir:
self.basedir=os.path.dirname(fname)

defaddfile(self,path,arcname=None):
path=path.replace('//','/')
ifnotarcname:
ifpath.startswith(self.basedir):
arcname=path[len(self.basedir):]
else:
arcname=''
self.zfile.write(path,arcname)

defaddfiles(self,paths):
forpathinpaths:
ifisinstance(path,tuple):
self.addfile(*path)
else:
self.addfile(path)

defclose(self):
self.zfile.close()

defextract_to(self,path):
forpinself.zfile.namelist():
self.extract(p,path)

defextract(self,fname,path):
ifnotfname.endswith('/'):
fn=os.path.join(path,fname)
ds=os.path.dirname(fn)
ifnotos.path.exists(ds):
os.makedirs(ds)
withopen(fn,'wb')asf:
f.write(self.zfile.read(fname))


#創建Zip文件
defcreateZip(zfile,files):
z=ZFile(zfile,'w')
z.addfiles(files)
z.close()


#解壓縮Zip到指定文件夾
defextractZip(zfile,path):
z=ZFile(zfile)
z.extract_to(path)
z.close()


#解壓縮rar到指定文件夾
defextractRar(zfile,path):
rar_command1="WinRAR.exex-ibck%s%s"%(zfile,path)
rar_command2=r'"C:WinRAR.exe"x-ibck%s%s'%(zfile,path)
try:
res=os.system(rar_command1)
ifres==0:
print("PathOK.")
except:
try:
res=os.system(rar_command2)
ifres==0:
print("Successtounrarthefile{}.".format(path))
except:
print('Error:cannotunrarthefile{}'.format(path))


#解壓多個壓縮文件到一個臨時文件夾
defextract_files(file_list):
newdir=str(int(time.time()))
forfninfile_list:
subdir=os.path.join(newdir,fn)
ifnotos.path.exists(subdir):
os.makedirs(subdir)
iffn.endswith('.zip'):
extractZip(fn,subdir)
eliffn.endswith('.rar'):
extractRar(fn,subdir)
returnnewdir


#查找一個文件夾中的某些文件,返迴文件內容包含findstr_list中所有字元串的文件
deffindstr_at(basedir,file_list,findstr_list):
files=[]
forr,ds,fsinos.walk(basedir):
forfninfs:
iffninfile_list:
withopen(os.path.join(r,fn))asf:
s=f.read()
ifall(iinsforiinfindstr_list):
files.append(os.path.join(r,fn))
returnfiles


if__name__=='__main__':
files=[iforiinsys.argv[1:]ifnoti.startswith('-')]
unzipfiles=[iforiinfilesifi.endswith('.zip')ori.endswith('.rar')]
xmlfiles=[iforiinfilesifi.endswith('.xml')]
save_unzipdir=Trueif'-s'insys.argvelseFalse
findstr=[i.split('=')[-1]foriinsys.argvifi.startswith('--find=')]
findstring=','.join(['`{}`'.format(i)foriinfindstr])
newdir=extract_files(unzipfiles)
result=findstr_at(newdir,xmlfiles,findstr)
ifnotresult:
msg='Noneofthefile(s)containthegivenstring{}.'
print(msg.format(findstring))
else:
msg='{}file(s)containthegivenstring{}:'
print(msg.format(len(result),findstring))
print('
'.join([i.replace(newdir+os.sep,'')foriinsorted(result)]))

ifnotsave_unzipdir:
shutil.rmtree(newdir)

$python3ziprar.pyaaa.zipaaa2.zipaaa3.zipaaa.xmlaaa1.xmlaaa2.xml--find="Itwas"--find="when"
Noneofthefile(s)containthegivenstring`Itwas`,`when`.
$python3ziprar.pyaaa.zipaaa2.zipaaa3.zipaaa.xmlaaa1.xmlaaa2.xml--find="Itwas"--find="I"
2file(s)containthegivenstring`Itwas`,`I`:
aaa.zip/aaa2.xml
aaa2.zip/aaa2.xml
$python3ziprar.pyaaa.zipaaa2.zipaaa3.zipaaa.xmlaaa1.xmlaaa2.xml--find="Itwas"
2file(s)containthegivenstring`Itwas`:
aaa.zip/aaa2.xml
aaa2.zip/aaa2.xml

『伍』怎麼在Python里使用UTF-8編碼

概述

在python代碼即.py文件的頭部聲明即可

解析

py文件中的編碼

Python 默認腳本文件都是 ANSCII 編碼的，當文件中有非 ANSCII 編碼范圍內的字元的時候就要使用"編碼指示"來修正一個 mole 的定義中，如果.py文件中包含中文字元（嚴格的說是含有非anscii字元），則需要在第一行或第二行指定編碼聲明：

# -*- coding=utf-8 -*-
#coding=utf-8
# 以上兩種選其一即可

其他的編碼如：gbk、gb2312也可以；否則會出現:

SyntaxError: Non-ASCII character 'xe4' in file test.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

python中的編碼與解碼

先說一下python中的字元串類型，在python中有兩種字元串類型，分別是 str 和 unicode，他們都是basestring的派生類；

str類型是一個包含Characters represent (at least) 8-bit bytes的序列；

unicode 的每個 unit 是一個 unicode obj;

在str的文檔中有這樣的一句話：

The string data type is also used to represent arrays of bytes, e.g., to hold data read from a file.

也就是說在讀取一個文件的內容，或者從網路上讀取到內容時，保持的對象為str類型；如果想把一個str轉換成特定編碼類型，需要把str轉為Unicode,然後從unicode轉為特定的編碼類型如：utf-8、gb2312等。

拓展內容

utf-8編碼

UTF-8（8-bit Unicode Transformation Format）是一種針對Unicode的可變長度字元編碼，也是一種前綴碼。它可以用來表示Unicode標准中的任何字元，且其編碼中的第一個位元組仍與ASCII兼容，這使得原來處理ASCII字元的軟體無須或只須做少部分修改，即可繼續使用。因此，它逐漸成為電子郵件、網頁及其他存儲或發送文字的應用中，優先採用的編碼。

UTF-8使用一至六個位元組為每個字元編碼（盡管如此，2003年11月UTF-8被RFC 3629重新規范，只能使用原來Unicode定義的區域，U+0000到U+10FFFF，也就是說最多四個位元組）：

1、128個US-ASCII字元只需一個位元組編碼（Unicode范圍由U+0000至U+007F）。

2、帶有附加符號的拉丁文、希臘文、西里爾字母、亞美尼亞語、希伯來文、阿拉伯文、敘利亞文及它拿字母則需要兩個位元組編碼（Unicode范圍由U+0080至U+07FF）。

3、其他基本多文種平面（BMP）中的字元（這包含了大部分常用字，如大部分的漢字）使用三個位元組編碼（Unicode范圍由U+0800至U+FFFF）。

4、其他極少使用的Unicode輔助平面的字元使用四至六位元組編碼（Unicode范圍由U+10000至U+1FFFFF使用四位元組，Unicode范圍由U+200000至U+3FFFFFF使用五位元組，Unicode范圍由U+4000000至U+7FFFFFFF使用六位元組）。

對上述提及的第四種字元而言，UTF-8使用四至六個位元組來編碼似乎太耗費資源了。但UTF-8對所有常用的字元都可以用三個位元組表示，而且它的另一種選擇，UTF-16編碼，對前述的第四種字元同樣需要四個位元組來編碼，所以要決定UTF-8或UTF-16哪種編碼比較有效率，還要視所使用的字元的分布范圍而定。不過，如果使用一些傳統的壓縮系統，比如DEFLATE，則這些不同編碼系統間的的差異就變得微不足道了。若顧及傳統壓縮演算法在壓縮較短文字上的效果不大，可以考慮使用Unicode標准壓縮格式（SCSU）。

互聯網工程工作小組（IETF）要求所有互聯網協議都必須支持UTF-8編碼。互聯網郵件聯盟（IMC）建議所有電子郵件軟體都支持UTF-8編碼。

『陸』 python怎樣壓縮和解壓縮ZIP文件

有時我們需要在 Python 中使用 zip 文件，而在1.6版中，Python 就已經提供了 zipfile 模塊可以進行這樣的操作。不過 Python 中的 zipfile 模塊不能處理多卷的情況，不過這種情況並不多見，因此在通常情況下已經足夠使用了。下面我只是對一些基本的 zipfile 操作進行了記錄，足以應付大部分的情況了。
zipfile 模塊可以讓你打開或寫入一個 zip 文件。比如：
import zipfile
z = zipfile.ZipFile('zipfilename', mode='r')
這樣就打開了一個 zip 文件，如果mode為'w'或'a'則表示要寫入一個 zip 文件。如果是寫入，則還可以跟上第三個參數：
compression=zipfile.ZIP_DEFLATED 或
compression=zipfile.ZIP_STORED ZIP_DEFLATED是壓縮標志，如果使用它需要編譯了zlib模塊。而後一個只是用zip進行打包，並不壓縮。
在打開了zip文件之後就可以根據需要是讀出zip文件的內容還是將內容保存到 zip 文件中。
讀出zip中的內容
很簡單，zipfile 對象提供了一個read(name)的方法。name為 zip文件中的一個文件入口，執行完成之後，將返回讀出的內容，你把它保存到想到的文件中即可。
寫入zip文件
有兩種方式，一種是直接寫入一個已經存在的文件，另一種是寫入一個字元串。
對於第一種使用 zipfile 對象的 write(filename, arcname, compress_type)，後兩個參數是可以忽略的。第一個參數是文件名，第二個參數是表示在 zip 文件中的名字，如果沒有給出，表示使用與filename一樣的名字。compress_type是壓縮標志，它可以覆蓋創建 zipfile 時的參數。第二種是使用 zipfile 對象的 writestr(zinfo_or_arcname, bytes)，第一個參數是zipinfo 對象或寫到壓縮文件中的壓縮名，第二個參數是字元串。使用這個方法可以動態的組織文件的內容。
需要注意的是在讀出時，因為只能讀出內容，因此如果想實現按目錄結構展開 zip 文件的話，這些操作需要自已來完成，比如創建目錄，創建文件並寫入。而寫入時，則可以根據需要動態組織在 zip 文件中的目錄結構，這樣可以不按照原來的目錄結構來生成 zip 文件。
於是我為了方便使用，創建了自已的一個 ZFile 類，主要是實現象 winrar 的右鍵菜單中的壓縮到的功能--即將一個zip文件壓縮到指定目錄，自動創建相應的子目錄。再有就是方便生成 zip 文件。類源碼為：
# coding:cp936
# Zfile.py
# xxteach.com
import zipfile
import os.path
import os
class ZFile(object):
def __init__(self, filename, mode='r', basedir=''):
self.filename = filename
self.mode = mode
if self.mode in ('w', 'a'):
self.zfile = zipfile.ZipFile(filename, self.mode, compression=zipfile.ZIP_DEFLATED)
else:
self.zfile = zipfile.ZipFile(filename, self.mode)
self.basedir = basedir
if not self.basedir:
self.basedir = os.path.dirname(filename)
def addfile(self, path, arcname=None):
path = path.replace('//', '/')
if not arcname:
if path.startswith(self.basedir):
arcname = path[len(self.basedir):]
else:
arcname = ''
self.zfile.write(path, arcname)
def addfiles(self, paths):
for path in paths:
if isinstance(path, tuple):
self.addfile(*path)
else:
self.addfile(path)
def close(self):
self.zfile.close()
def extract_to(self, path):
for p in self.zfile.namelist():
self.extract(p, path)
def extract(self, filename, path):
if not filename.endswith('/'):
f = os.path.join(path, filename)
dir = os.path.dirname(f)
if not os.path.exists(dir):
os.makedirs(dir)
file(f, 'wb').write(self.zfile.read(filename))
def create(zfile, files):
z = ZFile(zfile, 'w')
z.addfiles(files)
z.close()
def extract(zfile, path):
z = ZFile(zfile)
z.extract_to(path)
z.close()

『柒』 python怎樣壓縮和解壓縮ZIP文件

1、python使用zipfile模塊壓縮和解壓ZIP文件
2、讀取zip文件
首先，通過zipfile模塊打開指定zip文灶簡件，如：
zpfd = zipfile.ZipFile(path, mode='r')
對於zipfile，其標志與open所用的打開文件標志有所不同，不能識別 'rb'。
然後，讀取zip文件中的內容，zipfile對象提供一個read(name)的方法，name為zip文件中的一個文件入口，執行完成之後，將返回讀出的內容，如：
for filename in zpfd.namelist():
tmpcont = zpfd.read(filename)
print 'len(tmpcont)', 'tmpcont'
需要注意的是，讀取zip文件時，只能讀取內容
3、寫入zip文件
首先，需要zipfile模塊寫打開或創建zip文件，如：
zpfd = zipfile.ZipFile(path, mode='w')
寫打開是標志可以為'w'或'a'('a'表示寫入一個zip文件)，或者傳入第三個參數cmopression壓縮標志
compression=zipfile.ZIP_DEFLATED 需要導入zlib模塊
compression=zipfile.ZIP_STORED則表示只對文件進行打包，並不壓縮
寫入有兩種方式，一種是直接寫入一個已經存在的文件，可使用zipfile對象中write(filename, arcname, compress_type)第一個參數為文件名，第二個參數指寫入zip文件中的文件名，默認與filename一致，第三個參數壓縮標志可以覆蓋打開zipfile時的使用參數；另一種是寫入一個字元串緩辯兄，可使用zipfile對象中的writestr(zinfo_or_arcname, bytes)，第一個參數是zipinfo對象或寫到zip文件中的壓縮名，第二個參數是待寫入的字元串
4、最後，對於打開的zipfile對象需要進行關擾襲閉，從而使得寫入內容真正寫入磁碟，即：
zpfd.close()

『捌』用python編寫一個字元串壓縮程序(要求為自適應模型替代法)

你好，下面是LZ777自適應壓縮演算法的一個簡單實現，你可以看看
import math
from bitarray import bitarray
class LZ77Compressor:
"""
A simplified implementation of the LZ77 Compression Algorithm
"""
MAX_WINDOW_SIZE = 400
def __init__(self, window_size=20):
self.window_size = min(window_size, self.MAX_WINDOW_SIZE)
self.lookahead_buffer_size = 15 # length of match is at most 4 bits
def compress(self, input_file_path, output_file_path=None, verbose=False):
"""
Given the path of an input file, its content is compressed by applying a simple
LZ77 compression algorithm.
The compressed format is:
0 bit followed by 8 bits (1 byte character) when there are no previous matches
within window
1 bit followed by 12 bits pointer (distance to the start of the match from the
current position) and 4 bits (length of the match)

If a path to the output file is provided, the compressed data is written into
a binary file. Otherwise, it is returned as a bitarray
if verbose is enabled, the compression description is printed to standard output
"""
data = None
i = 0
output_buffer = bitarray(endian='big')
# read the input file
try:
with open(input_file_path, 'rb') as input_file:
data = input_file.read()
except IOError:
print 'Could not open input file ...'
raise
while i < len(data):
#print i
match = self.findLongestMatch(data, i)
if match:
# Add 1 bit flag, followed by 12 bit for distance, and 4 bit for the length
# of the match
(bestMatchDistance, bestMatchLength) = match
output_buffer.append(True)
output_buffer.frombytes(chr(bestMatchDistance >> 4))
output_buffer.frombytes(chr(((bestMatchDistance & 0xf) << 4) | bestMatchLength))
if verbose:
print "<1, %i, %i>" % (bestMatchDistance, bestMatchLength),
i += bestMatchLength
else:
# No useful match was found. Add 0 bit flag, followed by 8 bit for the character
output_buffer.append(False)
output_buffer.frombytes(data[i])

if verbose:
print "<0, %s>" % data[i],
i += 1
# fill the buffer with zeros if the number of bits is not a multiple of 8
output_buffer.fill()
# write the compressed data into a binary file if a path is provided
if output_file_path:
try:
with open(output_file_path, 'wb') as output_file:
output_file.write(output_buffer.tobytes())
print "File was compressed successfully and saved to output path ..."
return None
except IOError:
print 'Could not write to output file path. Please check if the path is correct ...'
raise
# an output file path was not provided, return the compressed data
return output_buffer

def decompress(self, input_file_path, output_file_path=None):
"""
Given a string of the compressed file path, the data is decompressed back to its
original form, and written into the output file path if provided. If no output
file path is provided, the decompressed data is returned as a string
"""
data = bitarray(endian='big')
output_buffer = []
# read the input file
try:
with open(input_file_path, 'rb') as input_file:
data.fromfile(input_file)
except IOError:
print 'Could not open input file ...'
raise
while len(data) >= 9:
flag = data.pop(0)
if not flag:
byte = data[0:8].tobytes()
output_buffer.append(byte)
del data[0:8]
else:
byte1 = ord(data[0:8].tobytes())
byte2 = ord(data[8:16].tobytes())
del data[0:16]
distance = (byte1 << 4) | (byte2 >> 4)
length = (byte2 & 0xf)
for i in range(length):
output_buffer.append(output_buffer[-distance])
out_data = ''.join(output_buffer)
if output_file_path:
try:
with open(output_file_path, 'wb') as output_file:
output_file.write(out_data)
print 'File was decompressed successfully and saved to output path ...'
return None
except IOError:
print 'Could not write to output file path. Please check if the path is correct ...'
raise
return out_data

def findLongestMatch(self, data, current_position):
"""
Finds the longest match to a substring starting at the current_position
in the lookahead buffer from the history window
"""
end_of_buffer = min(current_position + self.lookahead_buffer_size, len(data) + 1)
best_match_distance = -1
best_match_length = -1
# Optimization: Only consider substrings of length 2 and greater, and just
# output any substring of length 1 (8 bits uncompressed is better than 13 bits
# for the flag, distance, and length)
for j in range(current_position + 2, end_of_buffer):
start_index = max(0, current_position - self.window_size)
substring = data[current_position:j]
for i in range(start_index, current_position):
repetitions = len(substring) / (current_position - i)
last = len(substring) % (current_position - i)
matched_string = data[i:current_position] * repetitions + data[i:i+last]
if matched_string == substring and len(substring) > best_match_length:
best_match_distance = current_position - i
best_match_length = len(substring)
if best_match_distance > 0 and best_match_length > 0:
return (best_match_distance, best_match_length)
return None

『玖』 python怎麼把兩個字元為一組縱向顯示

Python可以使用內置的zip函數來把兩個字元串組合成一組縱向顯示。zip函數的語法如下：
zip(iter1 [,iter2 [...]])
其中iter1，iter2為可迭代的對象，可以是字元串，列表，元組等。zip函數會搭鄭燃將iter1，iter2中的元素一一對應組合成一組元素，並將這些組合結果以列表形式返回。
比如，將兩個字元串"Hello"和"World"組合成一組縱向顯示，可以使用以下代碼：
# 定義字元串
str1 = "Hello"
str2 = "World"
# 使用zip函數將兩個字元串組合
result = zip(str1, str2)
# 輸出結果
print(list(result))
輸出結果如下：
[('H', '叢乎W'), ('e', 'o'), ('l', 'r'), ('l', 'l'), ('o', 'd')]
從輸出結果知虛可以看到，兩個字元串中的字元已經按照縱向的方式組合在一起。

『拾』 python如何讀取通過gzip壓縮的字元串

import osimport gzip # 那是因為你調用了read方法，而這個方法會把文件一股腦兒讀取出來的# 為了便於你迭代，你可以在這里使用一個生成器def read_gz_file(path): if os.path.exists(path): with gzip.open(path, 'rt') as pf: for line in pf: yield line else: print('the path [{}] is not exist!'.format(path)) con = read_gz_file('abc.gz')if getattr(con, '__iter__', None): for line in con: print(line, end = '')

導航:首頁 > 文件處理 > python壓縮字元串

python壓縮字元串

概述

解析

拓展內容

與python壓縮字元串相關的資料