Python 3 lxml tutorial: how to install lxml in Python

1. How to import the lxml module in Python

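lxml is a third-party module, so it must be installed before it can be imported.

Install: in a terminal, run pip install lxml (if the installer reports that another library is required, install that library first, then install lxml).
If pip fails, download the source archive from the PyPI website, extract it, change into the extracted directory in a terminal (the one that contains setup.py), and run python setup.py install.

Import: import lxml is enough to confirm that the package is available. To work with XML or HTML you will usually import a submodule, for example from lxml import etree.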
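
A quick way to confirm that the install worked is to ask Python whether it can locate the package. The sketch below is only an illustration; the helper name lxml_is_installed is made up for this example and is not part of lxml.

```python
import importlib.util


def lxml_is_installed() -> bool:
    """Return True if the lxml package can be found on the import path."""
    return importlib.util.find_spec("lxml") is not None


if lxml_is_installed():
    print("lxml is importable")
else:
    print("lxml is not installed; try: pip install lxml")
```

Running this in the same interpreter (or virtual environment) that you installed into matters, since pip may have installed lxml for a different Python.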

2. How to install lxml in Python

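First you need Python: "You need Python 2.3 or later." (This is the requirement of the old lxml release described here; current releases need a much newer Python.)
Next you need the C libraries: "You need libxml2 and libxslt", in particular their development packages.
Install libxml2 with: $ sudo apt-get install libxml2 libxml2-dev
Install libxslt with: $ sudo apt-get install libxslt1.1 libxslt1-dev (package names as on Debian/Ubuntu; they may differ on other releases)
Install python-libxml2 and python-libxslt with: $ sudo apt-get install python-libxml2 python-libxslt
After that, install the latest lxml with: $ sudo easy_install lxml. The version I installed was lxml 2.2beta1.
On Cygwin the procedure is the same: select the libxml2, libxml2-devel, libxslt, libxslt-devel, python-libxml2, and python-libxslt packages in the installer, then run $ sudo easy_install lxml.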
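
Once the install finishes, it can be useful to see which versions of libxml2 and libxslt lxml is actually using. The following is a minimal sketch; the function name linked_library_versions is hypothetical, while the three version constants are attributes of lxml.etree.

```python
def linked_library_versions():
    """Return version tuples for lxml and its C libraries, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2": etree.LIBXML_VERSION,
        "libxslt": etree.LIBXSLT_VERSION,
    }


versions = linked_library_versions()
print(versions if versions else "lxml is not importable")
```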

3. How to use lxml etree in Python

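lxml is the most feature-rich and easiest-to-use Python library for processing XML and HTML.

lxml is a Pythonic binding for the C libraries libxml2 and libxslt. What makes it unique is that it combines the speed and feature completeness of these libraries with the simplicity of a native Python API. It is compatible with the ElementTree API but goes beyond it.

Programming with libxml2 is like the thrilling embrace of an exotic stranger: it seems able to fulfil your wildest dreams, but a voice deep inside keeps warning you that you are about to get burned in the worst way. That is why lxml exists.

This is a tutorial on processing XML with lxml.etree. It briefly outlines the main concepts of the ElementTree API, along with a few simple enhancements that make a programmer's life easier.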

First, import lxml.etree:

from lxml import etree

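To help with code portability, the examples in this tutorial make it clear which parts of the API are lxml.etree extensions on top of the ElementTree API (as defined by Fredrik Lundh's ElementTree library).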
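
Because the core API is shared, portable code can fall back to the standard library's ElementTree when lxml is not available. This is a minimal sketch of that pattern; the variable name backend is only for illustration, and the fallback works only for calls that both libraries provide.

```python
try:
    from lxml import etree
    backend = "lxml.etree"
except ImportError:
    # The standard library implements the same core ElementTree API.
    import xml.etree.ElementTree as etree
    backend = "xml.etree.ElementTree"

# Only ElementTree-compatible calls are used below, so either backend works.
root = etree.Element("root")
etree.SubElement(root, "child")
print(backend, root.tag, [child.tag for child in root])
```

Anything marked "lxml.etree only" in this tutorial, such as getparent() or pretty_print, is not available after the fallback.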

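Element is the main container class of the ElementTree API; most of the XML tree functionality is accessed through this class. Creating an Element is easy:

>>> root = etree.Element("root")

The XML tag name of an element is available through its tag attribute:

>>> print(root.tag)
root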

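Elements are organised into an XML tree structure. To create a child element and add it to a parent element, use the append method:

>>> root.append(etree.Element("child1"))

There is also a shorter and more efficient way to do this: the SubElement factory. It accepts the same arguments as Element, but additionally requires the parent element as its first argument:

>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")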

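You can serialise the tree you have built. Passing encoding="unicode" makes tostring return a str rather than bytes, so print shows the markup directly under Python 3:

>>> print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root>
  <child1/>
  <child2/>
  <child3/>
</root>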
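
The steps above (create a root, attach children, serialise) can be wrapped in a small helper. This is a sketch only; build_tree is a hypothetical name, and pretty_print is an lxml.etree feature.

```python
from lxml import etree


def build_tree(tag, child_tags):
    """Create an element named `tag` with one empty child per name in `child_tags`."""
    root = etree.Element(tag)
    for name in child_tags:
        etree.SubElement(root, name)
    return root


root = build_tree("root", ["child1", "child2", "child3"])
# encoding="unicode" makes tostring return str instead of bytes.
xml_text = etree.tostring(root, pretty_print=True, encoding="unicode")
print(xml_text)
```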

To make access to these child nodes easy and intuitive, elements mimic the behaviour of normal Python lists:

>>> child = root[0]
>>> print(child.tag)
child1
>>> print(len(root))
3
>>> root.index(root[1])  # lxml.etree only!
1
>>> children = list(root)
>>> for child in root:
...     print(child.tag)
child1
child2
child3
>>> root.insert(0, etree.Element("child0"))
>>> start = root[:1]
>>> end = root[-1:]
>>> print(start[0].tag)
child0
>>> print(end[0].tag)
child3

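It used to be possible to test the truth value of an element to see whether it has child nodes, but this is no longer supported:

if root:  # this no longer works!
    print("The root element has children")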

Using len(element) is more explicit and less error-prone:

>>> print(etree.iselement(root))  # test if it's some kind of Element
True
>>> if len(root):  # test if it has children
...     print("The root element has children")
The root element has children
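
The same distinction matters when searching: an element can exist and still have no children, so "was it found" and "does it have children" are separate questions. A minimal sketch, using made-up tag names:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "leaf")

leaf = root.find("leaf")        # present, but has no children
missing = root.find("absent")   # find() returns None when nothing matches

leaf_exists = leaf is not None      # existence check
leaf_has_children = len(leaf) > 0   # child check
print(leaf_exists, leaf_has_children, missing is None)
```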

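There is one more important behaviour in which lxml elements differ from lists: an element has exactly one parent, so assigning it to a new position moves it instead of copying it. The example makes this clear:

>>> for child in root:
...     print(child.tag)
child0
child1
child2
child3
>>> root[0] = root[-1]  # this moves the element!
>>> for child in root:
...     print(child.tag)
child3
child1
child2
>>> l = [0, 1, 2, 3]
>>> l[0] = l[-1]
>>> l
[3, 1, 2, 3]
>>> root is root[0].getparent()  # lxml.etree only!
True

If you want to copy an element to a different position in lxml.etree, create an independent deep copy using the copy module from Python's standard library:

>>> from copy import deepcopy
>>> element = etree.Element("neu")
>>> element.append(deepcopy(root[1]))
>>> print(element[0].tag)
child1
>>> print([c.tag for c in root])
['child3', 'child1', 'child2']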
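
The same move-versus-copy behaviour applies across two separate trees. The sketch below uses made-up element names (src, dst, item) to show both cases side by side.

```python
from copy import deepcopy

from lxml import etree

src = etree.Element("src")
item = etree.SubElement(src, "item")
dst = etree.Element("dst")

dst.append(deepcopy(item))           # copy: src keeps its child
after_copy = (len(src), len(dst))

dst.append(item)                     # move: item leaves src for dst
after_move = (len(src), len(dst))

print(after_copy, after_move, item.getparent() is dst)
```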

XML elements support attributes. You can create them directly in the Element factory:

>>> root = etree.Element("root", interesting="totally")
>>> etree.tostring(root)
b'<root interesting="totally"/>'

Attributes are unordered name-value pairs, so a convenient way to work with them is through the dictionary-like interface of elements:

>>> print(root.get("interesting"))
totally
>>> print(root.get("hello"))
None
>>> root.set("hello", "Huhu")
>>> print(root.get("hello"))
Huhu
>>> etree.tostring(root)
b'<root interesting="totally" hello="Huhu"/>'
>>> sorted(root.keys())
['hello', 'interesting']
>>> for name, value in sorted(root.items()):
...     print('%s = %r' % (name, value))
hello = 'Huhu'
interesting = 'totally'

If you need a real dictionary-like object, for example to pass it around, use the attrib property:

>>> attributes = root.attrib
>>> print(attributes["interesting"])
totally
>>> print(attributes.get("no-such-attribute"))
None
>>> attributes["hello"] = "Guten Tag"
>>> print(attributes["hello"])
Guten Tag
>>> print(root.get("hello"))
Guten Tag

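Because attrib is a dict-like object backed by the element itself, any change to the element is reflected in attrib and vice versa. It also means that the XML tree stays in memory for as long as the attrib of any of its elements is in use. To get an independent snapshot of the attributes that does not depend on the XML tree, copy it into a dict:

>>> d = dict(root.attrib)
>>> sorted(d.items())
[('hello', 'Guten Tag'), ('interesting', 'totally')]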
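
The difference between the live attrib view and a snapshot shows up as soon as the element changes afterwards. A minimal sketch (the variable names live_view and snapshot are only for illustration):

```python
from lxml import etree

root = etree.Element("root", interesting="totally")

live_view = root.attrib          # backed by the element
snapshot = dict(root.attrib)     # plain dict, detached from the tree

root.set("hello", "Huhu")        # change the element after taking both

print(live_view.get("hello"))    # the live view sees the change
print(snapshot.get("hello"))     # the snapshot does not
```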

4. How to use the lxml.html module in Python 3

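The lxml module is not built in, so it must be installed before use. Building lxml depends on python-devel, libxml2-devel, and libxslt-devel. Once these are installed, download http://codespeak.net/lxml/lxml-2.2.8.tgz, unpack it with tar zxvf lxml-2.2.8.tgz, and then run python setup.py install inside the unpacked directory.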
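
With lxml installed, a first use of lxml.html might look like the sketch below. The HTML string and the example.com URL are made up for illustration; fromstring, text_content, iter, and get are part of the lxml API.

```python
from lxml import html

# Hypothetical input; any HTML string would do.
fragment = '<p>Hello <a href="https://example.com/">world</a>!</p>'

doc = html.fromstring(fragment)      # a single-element fragment yields that element
text = doc.text_content()            # all text with the tags stripped
links = [a.get("href") for a in doc.iter("a")]
print(doc.tag, text, links)
```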

5. How to install the lxml library for Python

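lxml is the most feature-rich and easiest-to-use Python library for working with XML and HTML. It does not ship with Python; it is a Pythonic binding for the libxml2 and libxslt libraries. What sets it apart is that it combines the speed and feature completeness of those libraries with the simplicity of a native Python API, and it is compatible with the well-known ElementTree API while going beyond it. Installing lxml can be awkward, though, because of these dependencies: when no prebuilt package matches your platform, easy_install or pip has to compile lxml from source, and the build fails with a gcc error if the libxml2 and libxslt development files are missing. The installation steps for Windows and Linux are listed below: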
[Windows]
First make sure that Python is installed, that its environment variables are configured, and that easy_install and pip are available.
1. Run pip install virtualenv
C:\>pip install virtualenv
Requirement already satisfied (use --upgrade to upgrade): virtualenv in c:\python27\lib\site-packages\virtualenv-12.0.4-py2.7.egg
2. Download the lxml file that matches your system and Python version from the official site:
http://pypi.python.org/pypi/lxml/2.3/
NOTE:
For example, my machine runs Python 2.7.4 on a 64-bit operating system, so I can download
lxml-2.3-py2.7-win-amd64.egg (md5) # Python Egg
or
lxml-2.3.win-amd64-py2.7.exe (md5) # MS Windows installer
3. Run easy_install lxml-2.3-py2.7-win-amd64.egg
D:\Downloads>easy_install lxml-2.3-py2.7-win-amd64.egg # run this from the directory that contains the file
Processing lxml-2.3-py2.7-win-amd64.egg
creating c:\python27\lib\site-packages\lxml-2.3-py2.7-win-amd64.egg
Extracting lxml-2.3-py2.7-win-amd64.egg to c:\python27\lib\site-packages
Adding lxml 2.3 to easy-install.pth file
Installed c:\python27\lib\site-packages\lxml-2.3-py2.7-win-amd64.egg
Processing dependencies for lxml==2.3
Finished processing dependencies for lxml==2.3
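NOTE:
1. The exe installer is the simpler route: just run it.
2. You can install with easy_install or with pip.
# Try the import once more; if it returns without an error, the installation succeeded.
>>> import lxml
>>>
3. If you install with pip, the common commands are:
pip install simplejson # install a Python package
pip install --upgrade simplejson # upgrade a Python package
pip uninstall simplejson # uninstall a Python package
4. If you develop with Eclipse + PyDev, remove the old package and reload it once, otherwise imports will report errors:
Window --> Preferences --> PyDev --> Interpreter - Python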
[Linux]
lxml depends on the following packages:
libxml2, libxml2-devel, libxslt, libxslt-devel, python-libxml2, python-libxslt
so the installation steps are as follows (package names as on Debian/Ubuntu; they may differ on other releases):
Step 1: install libxml2
$ sudo apt-get install libxml2 libxml2-dev
Step 2: install libxslt
$ sudo apt-get install libxslt1.1 libxslt1-dev
Step 3: install python-libxml2 and python-libxslt
$ sudo apt-get install python-libxml2 python-libxslt
Step 4: install lxml
$ sudo easy_install lxml

6. Basic lxml usage for Python web scraping

Install command in a Python 3 environment: pip install lxml

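To parse HTML with lxml, pass the string to etree.HTML, which parses it into an HTML document tree. During parsing, some missing nodes are repaired automatically, and the body and html nodes are added if they are absent.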
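
The automatic repair is easy to observe with a deliberately broken snippet. This is a minimal sketch with made-up input:

```python
from lxml import etree

broken = "<ul><li>one<li>two</ul>"   # the <li> tags are never closed

root = etree.HTML(broken)
items = [li.text for li in root.iter("li")]

print(root.tag, root[0].tag, items)
# Serialising the tree shows the repaired markup, including html and body.
print(etree.tostring(root, encoding="unicode"))
```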

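Use / to select the direct children of an element and // to select its descendants at any depth.
For example, the XPath that selects all a elements that are direct children of li elements is: //li/a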

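Select elements by attribute value: tag[@attribute="value"]

Get the text of an element: /text()

Get the value of an attribute: /@attribute

Match an attribute that contains a given value (useful when it holds several values): [contains(@attribute, "value")]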
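
The XPath patterns above can be tried out on a small document. The HTML below, including the class names and link targets, is invented for this sketch; xpath() is the lxml method that evaluates the expressions.

```python
from lxml import etree

page = """
<ul>
  <li class="item-0"><a href="link1.html">first</a></li>
  <li class="item-1 active"><a href="link2.html">second</a></li>
</ul>
"""
root = etree.HTML(page)

all_links = root.xpath("//li/a")                                   # child a of every li
by_attr = root.xpath('//li[@class="item-0"]/a/text()')             # exact attribute match
hrefs = root.xpath("//li/a/@href")                                 # attribute values
active = root.xpath('//li[contains(@class, "active")]/a/text()')   # partial match

print(len(all_links), by_attr, hrefs, active)
```

Note that the exact match on class="item-1" would miss the second li, because its class attribute holds two values; contains() is the usual workaround.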
