Batch src download with Python / How to install Python on CentOS

⑴ How to get src addresses with Python

Python multithreaded example: scraping the src attribute of a site's images (original post)
pergoods, 2017-05-16 11:18:51

# coding=utf-8
'''
Created on 2017-05-16
@author: chenkai
Multithreaded Python crawler for the image addresses on a certain site's "boring pictures" board
(requests + BeautifulSoup + threading + queue modules)
'''

import queue
import threading
import time

import requests
from bs4 import BeautifulSoup


class Spider_Test(threading.Thread):
    def __init__(self, url_queue):
        threading.Thread.__init__(self)
        self.__queue = url_queue

    def run(self):
        while True:
            try:
                # Take a url from the queue without blocking, so a thread cannot
                # hang if another thread emptied the queue first
                page_url = self.__queue.get(block=False)
            except queue.Empty:
                break
            print(page_url)
            self.spider(page_url)

    def spider(self, url):
        r = requests.get(url)  # request the url
        soup = BeautifulSoup(r.content, 'lxml')  # r.content is the response body; parse it with lxml
        imgs = soup.find_all(name='img', attrs={})  # find every img tag (returned as a list-like result)
        for img in imgs:
            if 'onload' in str(img):  # img tags carrying an onload attribute are animated .gif images
                print('http:' + img['org_src'])
            else:
                print('http:' + img['src'])


def main():
    url_queue = queue.Queue()
    url_start = 'http://jandan.net/pic/page-'
    for i in range(293, 295):
        url = url_start + str(i) + '#comment'
        url_queue.put(url)  # put each assembled url into the queue

    threads = []
    thread_count = 2  # default number of threads (adjust as needed)
    for i in range(thread_count):
        threads.append(Spider_Test(url_queue))
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == '__main__':  # this block runs only when the .py file is executed directly, not when it is imported
    time_start = time.time()
    main()  # call main
    print(time.time() - time_start)

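# Background
'''
q = queue.Queue(maxsize=10)    (the module is named Queue in Python 2, i.e. Queue.Queue)
The Queue class is a synchronized queue implementation. The queue length can be unbounded or bounded, and is set through the optional maxsize argument of the constructor. A maxsize below 1 means the queue is unbounded.
Putting a value into the queue:
q.put(10)
put() inserts an item at the tail of the queue. Its first parameter, item, is required and is the value to insert; its second parameter, block, is optional and defaults to True. If the queue is full and block is True, put() suspends the calling thread until a slot frees up. If the queue is full and block is False, put() raises the Full exception.
Taking a value out of the queue:
q.get()
get() removes and returns the item at the head of the queue. Its optional parameter block defaults to True. If the queue is empty and block is True, get() suspends the calling thread until an item is available. If the queue is empty and block is False, get() raises the Empty exception.

'''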
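
The blocking rules described above can be checked with a small sketch. It assumes Python 3, where the module is named queue; the queue size of 2 and the helper name demo_queue are illustrative only and do not come from the original post.

```python
import queue


def demo_queue():
    """Show how a bounded queue behaves once it is full or empty."""
    q = queue.Queue(maxsize=2)  # bounded queue holding at most 2 items
    q.put(10)
    q.put(20)

    events = []
    try:
        q.put(30, block=False)  # queue is full and block is False
    except queue.Full:
        events.append('Full')

    events.append(q.get())  # items come out in FIFO order
    events.append(q.get())

    try:
        q.get(block=False)  # queue is empty and block is False
    except queue.Empty:
        events.append('Empty')
    return events


if __name__ == '__main__':
    print(demo_queue())
```

With block left at its default of True, the third put() and the last get() would wait instead of raising, which is why a crawler thread should not call a blocking get() on a queue that another thread may already have emptied.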

To download the images as well, add
import urllib.request

and replace the spider method with the following:

    def spider(self, url):
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'lxml')
        imgs = soup.find_all(name='img', attrs={})
        urls = []
        for img in imgs:
            if 'onload' in str(img):
                print('http:' + img['org_src'])
                urls.append('http:' + img['org_src'])
            else:
                print('http:' + img['src'])
                urls.append('http:' + img['src'])
        # download the images (only the .jpg ones are saved)
        k = 0
        for urlitem in urls:
            k += 1
            if '.jpg' in urlitem:
                urllib.request.urlretrieve(url=urlitem, filename='F:\\image\\' + str(k) + '.jpg')

----------- Example: multithreaded network access
# coding: utf-8
import sys
import threading
import time

import requests

url = 'https://www..com'


def get_():
    # time a single GET request and report its status code
    global url
    time_start = time.time()
    r = requests.get(url=url)
    times = time.time() - time_start
    sys.stdout.write('status:%s time:%s current_time:%s\n' % (r.status_code, times, time.strftime('%H:%M:%S')))


def main():
    threads = []
    thread_count = 10
    for i in range(thread_count):
        t = threading.Thread(target=get_, args=())
        threads.append(t)
    for i in range(thread_count):
        threads[i].start()

    for i in range(thread_count):
        threads[i].join()


if __name__ == '__main__':
    main()

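⑵ How to save a large number of image URLs locally with Python

If you only want to save the image URLs, just write imgsrc to a local file. urllib.request.urlretrieve(imgsrc) does something different: it downloads the image itself instead of saving its URL. Downloading images from a site in bulk also means you have to take the site's anti-crawler measures into account.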
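
Writing the URLs to a local file could look like the sketch below. The function name save_urls and the one-URL-per-line layout are assumptions made for illustration; the answer above only says to write imgsrc to a file.

```python
def save_urls(urls, path):
    """Write one URL per line to a text file and return how many were written."""
    seen = set()
    count = 0
    with open(path, 'w', encoding='utf-8') as fp:
        for url in urls:
            if url in seen:  # skip duplicates so each image is listed once
                continue
            seen.add(url)
            fp.write(url + '\n')
            count += 1
    return count
```

For example, save_urls(img_urls, 'img_urls.txt') would store every distinct entry of a list img_urls collected by the crawler; both names are placeholders.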

⑶ Python crawler: replace the image links in an online HTML page with local links and save the HTML file locally

import os
import re

from bs4 import BeautifulSoup


def check_flag(flag):
    # True when the src starts with "images/"
    regex = re.compile(r'images\/')
    return bool(regex.match(flag))


# soup = BeautifulSoup(open('index.html'))
html_content = '''
<a href="https://xxx.com">Test 01</a>
<a href="https://yyy.com/123">Test 02</a>
<a href="https://xxx.com">Test 01</a>
<a href="https://xxx.com">Test 01</a>
'''
with open(r'favour-en.html', 'r', encoding="UTF-8") as file:
    soup = BeautifulSoup(file, 'html.parser')
for element in soup.find_all('img'):
    if 'src' in element.attrs:
        print(element.attrs['src'])
        if check_flag(element.attrs['src']):
            # if element.attrs['src'].find("png"):
            element.attrs['src'] = "michenxxxxxxxxxxxx" + '/' + element.attrs['src']

print("##################################")
with open('index.html', 'w', encoding="UTF-8") as fp:
    fp.write(soup.prettify())  # prettify() formats the soup so the output is readable
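
The code above only prefixes src values that already start with images/. When the src is a full online URL, one possible way to turn it into a local link is sketched below, using only the standard library. The function name to_local_src and the default directory name images are hypothetical, not part of the original answer.

```python
import posixpath
from urllib.parse import urlsplit


def to_local_src(src, local_dir='images'):
    """Map a remote image URL to a relative path inside local_dir."""
    path = urlsplit(src).path            # drops scheme, host and query string
    filename = posixpath.basename(path)  # last path segment, e.g. 'logo.png'
    if not filename:                     # URL ends with '/', so there is no file name to reuse
        return src
    return posixpath.join(local_dir, filename)
```

Inside the loop over soup.find_all('img'), the assignment would then become element.attrs['src'] = to_local_src(element.attrs['src']). The image files themselves still have to be downloaded into that directory separately.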

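⑷ Python has scraped the src links; how do I download them?

Put the img links into a list and download them one at a time in a for loop; download methods are easy to find online.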
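
The for loop could be sketched as follows with the standard library's urllib.request.urlretrieve. The function name download_all, the numbered file names and the .jpg fallback extension are assumptions made for illustration.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def download_all(urls, dest_dir):
    """Download each URL into dest_dir as 1.jpg, 2.png, ... and return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        # Reuse the extension found in the URL path; fall back to .jpg (an assumption)
        ext = os.path.splitext(urlsplit(url).path)[1] or '.jpg'
        target = os.path.join(dest_dir, str(index) + ext)
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as exc:
            # URLError and HTTPError are subclasses of OSError; ValueError covers malformed URLs
            print('skipped %s: %s' % (url, exc))
            continue
        saved.append(target)
    return saved
```

A failed download is reported and skipped, so one broken link does not stop the rest of the batch. Sites with anti-crawler measures may still reject requests that lack browser-like headers.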

⑸ How to install Python on CentOS

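When upgrading Python, never delete the old version! The old and new versions can coexist, and many basic commands and packages, yum for example, depend on the preinstalled old Python.
[root@localhost ~]# wget Python-2.7.11.tgz
[root@localhost ~]# tar -zxvf Python-2.7.11.tgz
[root@localhost ~]# cd Python-2.7.11
[root@localhost Python-2.7.11]# ./configure
[root@localhost Python-2.7.11]# make
[root@localhost Python-2.7.11]# make install    # installs to /usr/local/lib/python2.7 by default
[root@localhost Python-2.7.11]# /usr/local/bin/python2.7 -c "from distutils.sysconfig import get_python_lib; print (get_python_lib())"
/usr/local/lib/python2.7/site-packages
[root@localhost Python-2.7.11]# mv /usr/bin/python /usr/bin/python_old    # rename the old python to python_old
[root@localhost Python-2.7.11]# ln -s /usr/local/bin/python2.7 /usr/bin/python    # symlink the system's default python command to the new version
Note: once the default python points to the new version, yum no longer works properly and has to be adjusted. Because /usr/bin/yum is itself a Python script, the usual fix is to change its first line so that it keeps using the old interpreter (here /usr/bin/python_old).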
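
To confirm which interpreter the python command now resolves to, a short script such as the sketch below can be run with it. The function name interpreter_info is hypothetical; sysconfig is used here in place of distutils because distutils has been removed from recent Python releases.

```python
import sys
import sysconfig


def interpreter_info():
    """Return the path, version and site-packages directory of the running interpreter."""
    return {
        'executable': sys.executable,
        'version': '%d.%d.%d' % sys.version_info[:3],
        'site_packages': sysconfig.get_paths()['purelib'],
    }


if __name__ == '__main__':
    for key, value in interpreter_info().items():
        print('%s: %s' % (key, value))
```

Running the same script with the renamed old interpreter shows the two installations side by side.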

⑹ In Python, how do I extract the image links shown in the screenshot and download the images they point to?

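You have already extracted the links, haven't you?
Just add a download step and save the result.
req = requests.get(img.get('src'))    # img is the <img> tag you already found; needs "import requests"
picture = req.content
path = r'D:\ProgramData\picture.png'
with open(path, 'wb') as f:
    f.write(picture)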

⑺ How to install Python 3.6.0 on Ubuntu and Linux Mint

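Python 3.6.0 was the latest stable release at the time this tutorial was written, and it is available for download and installation. This article will help you install Python 3.6.0 on the Ubuntu and Linux Mint operating systems. For more information about this release, visit the official Python website.
Step 1 – Install the required packages
Before installing Python, install its prerequisites with the following commands.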
$ sudo apt-get install build-essential checkinstall
$ sudo apt-get install libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev

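Step 2 – Download Python 3.6.0
Download Python from the official Python website with the commands below. You can also download the latest version in place of the one specified here.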
$ cd /usr/src
$ wget

Now extract the downloaded package.
$ sudo tar xzf Python-3.6.0.tgz

Step 3 – Compile the Python source
Now compile the Python source on your system with the commands below, using altinstall.
$ cd Python-3.6.0
$ sudo ./configure
$ sudo make altinstall

make altinstall is used so that the default python binary /usr/bin/python is not replaced.

Step 4 – Check the Python version
Python 3.6 is now installed on your system. Check the installed version with the following command:
# python3.6 -V

Python 3.6.0
