Parsing doc with Python: what is __doc__ in Python

㈠ How to open the Python doc files

In Python IDLE, press F1. Otherwise, go to the Doc folder under the Python installation directory and open the file there.

㈡ How to read Word file information with Python on Linux

Step 1: get the XML file that holds the document body

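import zipfile

def get_word_xml(docx_filename):
    # A .docx file is a ZIP archive; the body text lives in word/document.xml.
    # ZipFile opens the path in binary mode itself.
    with zipfile.ZipFile(docx_filename) as archive:
        xml_content = archive.read('word/document.xml')
    return xml_content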

Step 2: parse the XML into a tree data structure

from lxml import etree

def get_xml_tree(xml_string):
    return etree.fromstring(xml_string)

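Step 3: read the Word content (both functions are methods of the same class):

def _itertext(self, my_etree):
    """Iterator to go through xml tree's text nodes"""
    for node in my_etree.iter(tag=etree.Element):
        if self._check_element_is(node, 't'):
            yield (node, node.text)

def _check_element_is(self, element, type_char):
    # Namespace of WordprocessingML elements such as w:t
    word_schema = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    return element.tag == '{%s}%s' % (word_schema, type_char)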
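The three steps above can also be sketched with the standard library alone. This is a minimal sketch, not the original author's code: it assumes xml.etree.ElementTree in place of lxml, and the names read_docx_paragraphs and make_sample_docx are hypothetical helpers introduced for illustration. The sample archive contains only word/document.xml, so it is enough for the demonstration but is not a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def read_docx_paragraphs(docx_file):
    """Return the text of each paragraph (w:p) in a .docx file.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read('word/document.xml'))
    paragraphs = []
    for p in root.iter('{%s}p' % W_NS):
        # Concatenate every text run (w:t) inside the paragraph
        runs = [t.text or '' for t in p.iter('{%s}t' % W_NS)]
        paragraphs.append(''.join(runs))
    return paragraphs


def make_sample_docx(lines):
    """Hypothetical demo helper: build an in-memory archive that holds
    only word/document.xml with one paragraph per input line."""
    body = ''.join('<w:p><w:r><w:t>%s</w:t></w:r></w:p>' % line
                   for line in lines)
    xml = ('<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>'
           % (W_NS, body))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as archive:
        archive.writestr('word/document.xml', xml)
    buf.seek(0)
    return buf


if __name__ == '__main__':
    print(read_docx_paragraphs(make_sample_docx(['Hello', 'World'])))
```

Grouping the text runs by paragraph keeps line structure, which the node-by-node iterator in step 3 leaves to the caller.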

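㈢ How to properly understand Python source files

This task recursively walks all subdirectories and compiles every Python module. Rather than hard-coding the src directory at each point of use, the build script defines a property called src.dir, which can then be referenced as ${src.dir} wherever the directory name is needed. To run the build script, open it in Eclipse. Eclipse has built-in support for editing and browsing Ant build scripts, and the Outline view shows the structure of the script. In the Navigator view, select the build script, right-click it, and choose "Run Ant...". Select the compile target and click "Run". The output produced while the build script runs should appear in the Console view, indicating that the run succeeded.

Looking at the pydoc target above: line 7 declares the target name and states that it depends on the init and compile targets. This means that before running the pydoc target, Ant must make sure init and compile have already run; if they have not, it runs them first. The init target that pydoc depends on is defined on lines 3 to 5. It only creates a directory to hold the PyDoc API documentation files. As noted earlier, a property named pydoc.dir is defined for the location where the generated documentation is saved.

The py-doc task begins on line 8. As noted earlier, you pass in the PYTHONPATH to use while generating pydoc. The destdir attribute tells the py-doc task where to write the generated HTML documentation. Lines 9 to 11 define which Python source files should be processed while generating the documentation. File sets are a common construct in Ant scripts and define a group of files to operate on. This is a powerful feature: it lets you select files by name pattern, Boolean logic, and file attributes. The Ant documentation describes it in full. This example recursively selects all files under the "src" directory.

Python has a standard unit testing framework (as of Python 2.3; in Python 2.2 it was only an optional module) that is very similar to the Java jUnit framework. Test cases are structured the same way as in jUnit. Each class or module under test usually has its own test class. Test classes contain test fixtures, which are initialized in the setUp function. Each test is written as a separate test function in the test class. The unittest framework loops over the test functions, calling setUp first, then the test function, then tearDown to clean up. See the sample in Listing 4.

    import unittest
    from pprint import pprint
    import feedparser

    class FeedparserTest(unittest.TestCase):
        """
        A test class for the feedparser module.
        """

        def setUp(self):
            """
            set up data used in the tests.
            setUp is called before each test function execution.
            """
            self.developerWorksUrl = "testData/developerworks.rss"

        def testParse09Rss(self):
            """
            Test a successful run of the parse function for a
            0.91 RSS feed.
            """
            print("FeedparserTest.testParse09RSS()")
            result = feedparser.parse(self.developerWorksUrl)
            pprint(result)
            self.assertEqual(0, result['bozo'])
            self.assertTrue(result is not None)
            channel = result['channel']
            self.assertTrue(channel is not None)
            chanDesc = channel['description']
            self.assertEqual(u'The latest content from IBM developerWorks', chanDesc)
            items = result['items']
            self.assertTrue(items is not None)
            self.assertTrue(len(items) > 3)
            firstItem = items[0]
            title = firstItem['title']
            self.assertEqual(u'Build installation packages with solution installation and deployment technologies', title)

        def tearDown(self):
            """
            tear down any data used in tests
            tearDown is called after each test function execution.
            """
            pass

    if __name__ == '__main__':
        unittest.main()

The listing above is a test class that implements basic tests for the feedparser module. The complete test class is in src/feedparserTest/FeedparserTest.py under the feedParserTest project. The setUp function prepares the fixtures used throughout the test, which in this example is only the path of the test RSS file that the test function will parse. testParse09Rss is the actual test function. It calls feedparser.parse with the test RSS file, prints the parsed result, and performs basic checks through the assert functions of the TestCase class. If any assert evaluates to false, or any exception is thrown during execution, unittest reports a test failure or error. The last two lines run the tests in this test class when the module is run directly.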

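The call order described above (setUp, then the test function, then tearDown, repeated for every test) can be observed with a small self-contained sketch. The names OrderTest, calls, and run_order_test are hypothetical and introduced only for illustration; they are not part of the feedparser test project.

```python
import unittest

calls = []  # records the order in which the framework invokes each hook


class OrderTest(unittest.TestCase):
    def setUp(self):
        calls.append('setUp')

    def test_a(self):
        calls.append('test_a')
        self.assertEqual(1 + 1, 2)

    def test_b(self):
        calls.append('test_b')
        self.assertTrue(isinstance(calls, list))

    def tearDown(self):
        calls.append('tearDown')


def run_order_test():
    """Run OrderTest programmatically and return (result, recorded calls)."""
    del calls[:]  # start from an empty record on every run
    suite = unittest.TestLoader().loadTestsFromTestCase(OrderTest)
    result = unittest.TestResult()
    suite.run(result)
    return result, list(calls)


if __name__ == '__main__':
    result, order = run_order_test()
    print(result.wasSuccessful(), order)
```

Running the suite through a TestLoader instead of unittest.main() keeps the process alive afterwards, so the recorded order can be inspected.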
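㈣ What is _doc_ in Python

It is the documentation string (docstring). Note that the name is __doc__, with two underscores on each side.

In general it is a short description of what a function, method, or module does. When it is read from a specific object, however, it shows the docstring of the type that object belongs to. (See a.__doc__ in the example below.)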

>>> str.__doc__
"str(string[, encoding[, errors]]) -> str\n\nCreate a new string object from the given encoded string.\nencoding defaults to the current default string encoding.\nerrors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'."

>>> import math
>>> math.__doc__
'This module is always available. It provides access to the\nmathematical functions defined by the C standard.'

>>> a = [1]

>>> a.count.__doc__
'L.count(value) -> integer -- return number of occurrences of value'
>>> a.__doc__
"list() -> new empty list\nlist(iterable) -> new list initialized from iterable's items"

An example of giving your own function a __doc__:

>>> def func():
...     """Here's a doc string"""
...     pass
...
>>> func.__doc__
"Here's a doc string"

For more detail, see Python Tutorial 4.7.6 Documentation Strings.
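The same behaviour can be checked for a user-defined class. This is a minimal sketch: Counter, tick, and undocumented are hypothetical names made up for the example. It also shows inspect.getdoc from the standard library, which returns the docstring with its indentation cleaned up.

```python
import inspect


class Counter(object):
    """Count how many times tick() has been called."""

    def __init__(self):
        self.n = 0

    def tick(self):
        """Increase the counter by one and return the new value."""
        self.n += 1
        return self.n


def undocumented():
    pass


c = Counter()
# An instance has no docstring of its own, so the lookup falls back to its class
instance_doc = c.__doc__
method_doc = c.tick.__doc__
missing_doc = undocumented.__doc__        # None when no docstring is written
clean_doc = inspect.getdoc(Counter.tick)  # docstring with indentation cleaned up
```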

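㈤ How to batch-convert doc files to docx with Python

I have a batch of PDF files, several hundred of them, and for each one only pages 2 and 3 need to be printed, double-sided.
After some searching online, the approach is as follows:
Install Ghostscript and GhostView, then use the gsprint command to print the PDF files.
gsprint command parameters:
"-dQUIET", quiet mode: write as little log output as possible while running. (Can also be shortened to "-q".)
"-dNOSAFER", turns off SAFER mode, so Ghostscript's file-access restrictions are not applied
"-dBATCH", exit after the last page has been processed
"-dNOPAUSE", do not pause between pages
"-dNOPROMPT", do not show prompts
"-dFirstPage=1", the page to start from
"-dLastPage=5", the page to stop at
"-sDEVICE=pngalpha", the output device, which determines the output file type; the default is x11alpha
"-g720x1280", image size in pixels (-g<width>x<height>); usually left unset so the default output size is used
"-r300", image resolution (300 dpi here); the default seems to be 72 (not tested)
"-sOutputFile=/opt/shanhy/error1png/%d.png", image output path; use %d or %ld to insert the page number
For example, to print pages 2 and 3 of c.pdf, the command is:
gsprint -dFirstPage=2 -dLastPage=3 c.pdf
Most of the PDFs need only pages 2 and 3, printed double-sided, so Python is used to batch-print page 2 of every PDF, pause and prompt to flip the paper, and then batch-print page 3.
The complete code is as follows: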
# -*- coding: utf-8 -*-
import os
import time

def print_pdf(pdf_file_name, page):
    """Silently print a pdf
    :param pdf_file_name: the PDF file to print
    :param page: which page to print
    :return:
    """
    cmd = 'gsprint -dFirstPage=%s -dLastPage=%s %s' % (page, page, pdf_file_name)
    print(cmd)
    p = os.popen(cmd)
    time.sleep(3)
    print(p.read())

if __name__ == '__main__':
    curr_path = os.getcwd()
    fl = os.listdir(curr_path)
    # First pass prints page 2 of every PDF, second pass prints page 3
    for i in range(2, 4):
        print(i)
        for f in fl:
            if 'pdf' in f.lower():
                print_pdf(f, i)
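The description above mentions pausing between the two passes so the paper can be flipped. One way to sketch that is shown below. This is a hedged variant, not the original author's code: the function names select_pdfs, build_gsprint_cmd, plan_print_jobs, and run_print_jobs are hypothetical, the gsprint options are only the ones already used above, and files are matched by their .pdf extension instead of by the substring 'pdf'.

```python
import os
import subprocess


def select_pdfs(names):
    """Return the names that end in .pdf (case-insensitive), sorted."""
    return sorted(n for n in names if n.lower().endswith('.pdf'))


def build_gsprint_cmd(pdf_file_name, page):
    """Build the gsprint argument list that prints a single page."""
    return ['gsprint',
            '-dFirstPage=%d' % page,
            '-dLastPage=%d' % page,
            pdf_file_name]


def plan_print_jobs(names, pages=(2, 3)):
    """Return (page, command) pairs, grouped by page and then by file."""
    return [(page, build_gsprint_cmd(name, page))
            for page in pages
            for name in select_pdfs(names)]


def run_print_jobs(directory, pages=(2, 3)):
    """Run the planned jobs, pausing whenever the page number changes."""
    last_page = None
    for page, cmd in plan_print_jobs(os.listdir(directory), pages):
        if last_page is not None and page != last_page:
            input('Flip the printed stack, then press Enter to print page %d: ' % page)
        # An argument list avoids shell quoting problems with spaces in names;
        # gsprint must be installed and on PATH for this call to work
        subprocess.call(cmd, cwd=directory)
        last_page = page
```

Separating the planning step from the step that actually calls gsprint makes it possible to inspect the commands before any paper is used.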

㈥ Reading the contents of a Word document with Python

import fnmatch, os, sys, win32com.client

readpath = r'D:\123'

# Start Word through COM (Windows only; needs pywin32 and an installed Word)
wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")
try:
    for path, dirs, files in os.walk(readpath):
        for filename in files:
            if not fnmatch.fnmatch(filename, '*.docx'):
                continue
            doc = os.path.abspath(os.path.join(path, filename))
            print('processing %s...' % doc)
            wordapp.Documents.Open(doc)
            # 'name.docx' -> 'name.txt'
            docastext = doc[:-4] + 'txt'
            wordapp.ActiveDocument.SaveAs(docastext, FileFormat=win32com.client.constants.wdFormatText)
            wordapp.ActiveDocument.Close()
finally:
    wordapp.Quit()
print('end')

# Read back one exported text file (GBK-encoded on this system)
with open(r'd:\123\test.txt', 'r', encoding='gbk') as f:
    for line in f:
        print(line)

㈦ How to read a Word file with Python

>>> def PrintAllParagraphs(doc):
...     count = doc.Paragraphs.Count
...     for i in range(count - 1, -1, -1):
...         pr = doc.Paragraphs[i].Range
...         print(pr.Text)
...
>>> app = my.Office.Word.GetInstance()
>>> doc = app.Documents[0]
>>> PrintAllParagraphs(doc)

1. What is a field

Field basics

>>>

    @staticmethod
    def GetInstance():
        u'''Get the Application object of the Word application'''
        import win32com.client
        return win32com.client.Dispatch('Word.Application')

my.Office.Word.GetInstance is implemented as shown above; it is a wrapper around using win32com to drive Word's COM interface.
Every Paragraph (paragraph object) exposes its text through Paragraph.Range.Text.

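㈧ How to get the word count of a docx or doc document using Python, R, C, or DOS commands

On Windows you can call win32com.client to read the doc file, export the text into a variable, and then count the words in it. The result will certainly differ from the word count that Word itself reports.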
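Once the text is in a variable, the counting step could look like the sketch below. The counting rule is an assumption made for this example, not Word's own algorithm: each CJK ideograph counts as one word, and each run of Latin letters or digits counts as one word. The name count_words is hypothetical.

```python
import re

# One token per CJK ideograph, or per run of Latin letters/digits
# (an apostrophe or hyphen is allowed inside a word, as in "it's").
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+(?:['\-][A-Za-z0-9]+)*")


def count_words(text):
    """Count words in mixed Chinese/English text under the rule above."""
    return len(_TOKEN.findall(text))


if __name__ == '__main__':
    print(count_words("Python 解析 doc 文件"))
```

Because punctuation, footnotes, and text boxes are treated differently by Word, the numbers from a rule like this should only be expected to be close to Word's, not identical.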

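㈨ How to parse the F12 doc with Python

This site is unlike other websites, where you move to the next page or another page through a button and can clearly see the URL of every page. Here you have to find the page links yourself, either with the developer tools built into the Chrome browser or by capturing traffic with Fiddler. Press F12 to open the tools, press F5 to refresh, then scroll down to "load more" and click it.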
