Python multiprocessing and multithreading examples: combining processes and threads in Python

① Processes in Python: practical part

For background on processes, first read the companion article "Processes in Python: theory part".

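In CPython, multithreading cannot take advantage of multiple cores: the GIL lets only one thread execute Python bytecode at a time. To make full use of a multi-core CPU (os.cpu_count() reports the number of cores), you usually need multiple processes, and Python provides the multiprocessing module for this.
The multiprocessing module starts child processes and runs our own tasks (for example, functions) in them. Its programming interface is similar to that of the threading module.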
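A minimal sketch of both points, assuming a hypothetical function work: os.cpu_count() reports the core count, and the same target/args keywords are accepted by threading.Thread and multiprocessing.Process. The thread prints the parent's pid, while the process prints a pid of its own.

```python
import os
import threading
import multiprocessing


def work(label):
    """Hypothetical task; the same function can be handed to a thread or a process."""
    print('%s running in process %d' % (label, os.getpid()))


if __name__ == '__main__':
    print('CPU cores:', os.cpu_count())

    # threading.Thread and multiprocessing.Process accept the same target/args keywords
    t = threading.Thread(target=work, args=('thread',))
    p = multiprocessing.Process(target=work, args=('process',))
    for worker in (t, p):
        worker.start()
    for worker in (t, p):
        worker.join()
```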

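The multiprocessing module offers a lot: it supports child processes, communication and data sharing between them, and several forms of synchronization, and it provides components such as Process, Queue, Pipe and Lock.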
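As a small illustration of shared data plus synchronization, here is a sketch in which four processes increment one integer held in shared memory (multiprocessing.Value) and a multiprocessing.Lock serializes each read-modify-write. The function add_many and the counts are hypothetical choices for the example.

```python
import multiprocessing


def add_many(counter, lock, times):
    """Increment a shared counter; the Lock makes each read-modify-write atomic."""
    for _ in range(times):
        with lock:
            counter.value += 1


if __name__ == '__main__':
    lock = multiprocessing.Lock()
    # shared C int in shared memory; lock=False because we pass our own Lock explicitly
    counter = multiprocessing.Value('i', 0, lock=False)
    workers = [multiprocessing.Process(target=add_many, args=(counter, lock, 1000))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 4000
```

Without the lock, two processes can read the same old value and one increment is lost, so the final count could come out below 4000.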

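One point bears repeating: unlike threads, processes share no state by default. Data a process modifies is changed only inside that process; sharing requires explicit tools such as Queue, Pipe or shared Value objects.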
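A minimal sketch of this isolation: the child changes a module-level global, but the parent's copy stays untouched. The global counter and the function increment are hypothetical names for the example.

```python
import multiprocessing

counter = 0  # module-level global; each child process works on its own copy


def increment():
    global counter
    counter += 100
    print('child sees counter =', counter)  # 100


if __name__ == '__main__':
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    print('parent sees counter =', counter)  # still 0: the change stayed in the child
```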

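The class for creating processes: multiprocessing.Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None). An instance represents a task to be run in a child process (not yet started); pass the arguments as keywords.

Parameters:

group is unused; its value should always be None (it exists only for compatibility with threading.Thread).

target is the callable to invoke, i.e. the task the child process runs.

args is the tuple of positional arguments for the callable, e.g. args=(1, 2, 'tiga',). A one-element tuple needs the trailing comma: args=('tiga',).

kwargs is the dictionary of keyword arguments for the callable, e.g. kwargs={'name': 'tiga', 'age': 18}.

name is the name of the child process.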
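A short sketch that uses target, args, kwargs and name together; the function introduce and its parameters are hypothetical.

```python
import multiprocessing


def introduce(greeting, punctuation, name, age):
    """Hypothetical task: positional args come from args=, keyword args from kwargs=."""
    print('%s, I am %s, %d years old%s (process name: %s)'
          % (greeting, name, age, punctuation, multiprocessing.current_process().name))


if __name__ == '__main__':
    p = multiprocessing.Process(
        target=introduce,
        args=('hello', '!'),                 # positional arguments, always a tuple
        kwargs={'name': 'tiga', 'age': 18},  # keyword arguments
        name='intro-process',                # name of the child process
    )
    p.start()
    p.join()
    # child prints: hello, I am tiga, 18 years old! (process name: intro-process)
```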

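Methods:

p.start(): starts the process; the new child process then calls p.run().
p.run(): the method executed when the process starts; it is what calls the function given as target. When you subclass Process, this is the method to override.

p.terminate(): forcibly terminates p without any cleanup. If p has created child processes of its own, they are not terminated and become orphans, so use this method with care. If p holds a lock, the lock is never released, which can lead to deadlock.
p.is_alive(): returns True if p is still running.

p.join([timeout]): the calling thread (usually the main thread) waits until p terminates; the caller is the one waiting, while p keeps running. timeout is an optional limit in seconds. Note that p.join can only wait for a process started with start(), not for a process whose run() was called directly.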
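A sketch of the start()/run() distinction, assuming a hypothetical function show_pid: calling p.run() directly is an ordinary method call in the current process, and join() refuses a process that was never started (CPython raises AssertionError here).

```python
import multiprocessing
import os


def show_pid():
    print('task running in pid', os.getpid())


if __name__ == '__main__':
    print('main process pid', os.getpid())

    p = multiprocessing.Process(target=show_pid)
    p.run()  # plain method call: show_pid runs in the main process, no child is created
    try:
        p.join()  # only a process started with start() can be joined
    except AssertionError as exc:
        print('join failed:', exc)

    p.start()          # now a child process runs show_pid and prints a different pid
    p.join(timeout=5)  # wait at most 5 seconds
    print('alive after join:', p.is_alive())
```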

Attributes:
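A sketch of the commonly used Process attributes daemon, name, pid and exitcode, as documented for multiprocessing.Process; the function nap and the name 'napper' are hypothetical. daemon must be set before start(), and a daemonic child is terminated when its parent exits.

```python
import multiprocessing
import time


def nap():
    time.sleep(0.2)


if __name__ == '__main__':
    p = multiprocessing.Process(target=nap, name='napper')
    p.daemon = True  # must be set before start()
    print(p.name, p.pid, p.exitcode)  # napper None None: no pid or exit code before start()
    p.start()
    print('pid after start:', p.pid)  # an integer assigned by the OS
    p.join()
    print('exitcode:', p.exitcode)  # 0 means the target returned normally
```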

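Note: on Windows, Process() must be created under if __name__ == '__main__':. Windows starts children with the spawn method, which re-imports the main module in the child; without the guard, the child would run the process-creating code again and multiprocessing aborts with a RuntimeError. macOS has also defaulted to spawn since Python 3.8.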

Two ways to create and start a child process

Method 1 (pass a callable as target):
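A minimal sketch of method 1, assuming a hypothetical function task: the callable and its arguments go to the Process constructor, and start() returns immediately while the child runs.

```python
import multiprocessing
import time


def task(name):
    """Hypothetical task run in the child process."""
    print('%s is running' % name)
    time.sleep(0.5)
    print('%s is done' % name)


if __name__ == '__main__':  # required on Windows
    p = multiprocessing.Process(target=task, args=('child 1',))
    p.start()  # asks the OS to start the child and returns immediately
    print('main process continues')
    p.join()
```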

Method 2 (subclass Process and override run()):
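A minimal sketch of method 2, with a hypothetical subclass Worker: the task lives in run(), and start() calls it in the new process. super().__init__() must run before anything else so the Process machinery is set up.

```python
import multiprocessing
import time


class Worker(multiprocessing.Process):
    """Hypothetical subclass: the task lives in run(), which start() invokes in the child."""

    def __init__(self, name):
        super().__init__()  # initialise the Process machinery first
        self.name = name    # Process.name is a settable property

    def run(self):
        print('%s is running' % self.name)
        time.sleep(0.5)
        print('%s is done' % self.name)


if __name__ == '__main__':
    w = Worker('child 2')
    w.start()  # calls Worker.run() in a new process
    print('main process continues')
    w.join()
```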

Doesn't join make the program serial?
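Not if every process is started before any of them is joined: join only blocks the caller, and the children keep running in the meantime. A sketch with a hypothetical task of 1, 2 and 3 seconds; the total is about 3 seconds (the longest task plus startup overhead), not 6.

```python
import multiprocessing
import time


def task(seconds):
    time.sleep(seconds)


if __name__ == '__main__':
    start = time.time()
    processes = [multiprocessing.Process(target=task, args=(s,)) for s in (1, 2, 3)]
    for p in processes:
        p.start()  # all three children are now running concurrently
    for p in processes:
        p.join()   # waiting on one child does not pause the others
    print('elapsed: %.1f s' % (time.time() - start))
```

Writing p.start() and p.join() in the same loop body, by contrast, would make the program serial, because each child would finish before the next one starts.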

terminate and is_alive
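A sketch showing that terminate() only sends the termination request: right afterwards is_alive() may still report True, and only after join() is the child guaranteed to be gone. The function long_task is hypothetical.

```python
import multiprocessing
import time


def long_task():
    time.sleep(10)


if __name__ == '__main__':
    p = multiprocessing.Process(target=long_task)
    p.start()
    p.terminate()        # sends the request (SIGTERM on POSIX) and returns immediately
    print(p.is_alive())  # may still print True: the child has not exited yet
    p.join()             # wait until the child is really gone
    print(p.is_alive())  # False
    print(p.exitcode)    # typically -15: a negative value means killed by that signal (15 = SIGTERM)
```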

name and pid
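A sketch of reading a process's name and pid from both sides: inside the child through multiprocessing.current_process() and os.getpid()/os.getppid(), and from the parent through p.name and p.pid. The function show_identity is hypothetical; without name=, a process gets a default name such as 'Process-1'.

```python
import multiprocessing
import os


def show_identity():
    proc = multiprocessing.current_process()
    # os.getppid() is the parent's pid, i.e. the main process
    print('name=%s pid=%d ppid=%d' % (proc.name, os.getpid(), os.getppid()))


if __name__ == '__main__':
    p = multiprocessing.Process(target=show_identity)  # default name like 'Process-1'
    p.start()
    print('child name from parent:', p.name, 'child pid from parent:', p.pid)
    p.join()
    print('main process pid:', os.getpid())
```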

② Combining multiple processes and threads in Python

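Because of the GIL (global interpreter lock), Python threads cannot use multiple cores, so on today's multi-core computers they cannot make full use of the hardware. Python processes, however, can run on different CPUs. So I tried combining multiple processes with multiple threads for one task: downloading many rpm packages from the USTC (University of Science and Technology of China) mirror.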
```python
#!/usr/bin/env python3
import re
import subprocess
import threading
import time
import multiprocessing


def download_image(url):
    """Download one rpm package with wget (the Python 2 original used commands.getoutput)."""
    print('*****the %s rpm begin to download *******' % url)
    subprocess.run(['wget', '-q', url])


def get_rpm_url_list(url):
    """Fetch the mirror's directory listing and build the full URL of every rpm in it."""
    # -O makes repeated runs overwrite index.html instead of creating index.html.1
    subprocess.run(['wget', '-q', '-O', 'index.html', url])
    with open('index.html') as f:
        rpm_info_str = f.read()

    regu_mate = '(?<=<a href=")(.*?)(?=">)'
    # keep only .rpm links, skipping entries such as "../"
    rpm_list = [name for name in re.findall(regu_mate, rpm_info_str) if name.endswith('.rpm')]

    rpm_url_list = [url + rpm_name for rpm_name in rpm_list]
    print('the count of rpm list is:', len(rpm_url_list))
    return rpm_url_list


def multi_thread(rpm_url_list):
    """Download a list of URLs, running the threads in batches of thread_num."""
    threads = []
    for rpm_url in rpm_url_list:
        print('rpm_url is:', rpm_url)
        threads.append(threading.Thread(target=download_image, args=(rpm_url,)))

    thread_num = 5  # at most 5 downloads run at the same time
    while threads:
        count = min(thread_num, len(threads))
        print('**********count*********', count)
        batch = [threads.pop() for _ in range(count)]
        for t in batch:
            t.start()
        for t in batch:
            t.join()  # wait for the whole batch before starting the next one


def multi_process(rpm_url_list):
    """Split the URLs into 4 groups and hand each group to its own process."""
    process_num = 4  # hard-coded number of processes
    # group i gets the URLs whose index % 4 == i
    rpm_url_groups = [rpm_url_list[i::process_num] for i in range(process_num)]
    processes = [multiprocessing.Process(target=multi_thread, args=(group,))
                 for group in rpm_url_groups]
    for p in processes:
        p.start()
    for p in processes:
        p.join()


def main():
    url = 'https://mirrors.ustc.e.cn/centos/7/os/x86_64/Packages/'
    url_paas = 'http://mirrors.ustc.e.cn/centos/7.3.1611/paas/x86_64/openshift-origin/'
    url_paas2 = 'http://mirrors.ustc.e.cn/fedora/development/26/Server/x86_64/os/Packages/u/'

    start_time = time.time()
    rpm_list = get_rpm_url_list(url_paas)
    multi_process(rpm_list)
    # multi_thread(rpm_list)  # threads only, in a single process, for comparison
    end_time = time.time()
    print('the download time is:', end_time - start_time)


if __name__ == '__main__':
    main()
```

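What the code does:
main() calls get_rpm_url_list(base_url) to obtain the full URL of each rpm package to download. base_url is the address of a directory on the USTC mirror, for example http://mirrors.ustc.e.cn/centos/7.3.1611/paas/x86_64/openshift-origin/, which contains dozens of rpm packages; get_rpm_url_list builds and returns the URL of every package in it.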
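The URL-building step can be tried offline. This sketch applies the same look-behind/look-ahead pattern to a hypothetical listing fragment (the base URL and package names are invented for the example) and keeps only links ending in .rpm.

```python
import re

# same pattern as get_rpm_url_list: the text between <a href=" and ">
HREF_PATTERN = r'(?<=<a href=")(.*?)(?=">)'


def build_rpm_urls(base_url, index_html):
    """Return base_url + file name for every .rpm link in a directory-listing page."""
    names = re.findall(HREF_PATTERN, index_html)
    return [base_url + name for name in names if name.endswith('.rpm')]


# hypothetical listing fragment; real mirror pages contain many more entries
sample_index = (
    '<a href="../">../</a>\n'
    '<a href="pkg-a-1.0-1.el7.x86_64.rpm">pkg-a-1.0-1.el7.x86_64.rpm</a>\n'
    '<a href="pkg-b-2.3-4.el7.noarch.rpm">pkg-b-2.3-4.el7.noarch.rpm</a>\n'
)
print(build_rpm_urls('http://example.com/packages/', sample_index))
# ['http://example.com/packages/pkg-a-1.0-1.el7.x86_64.rpm', 'http://example.com/packages/pkg-b-2.3-4.el7.noarch.rpm']
```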
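multi_process(rpm_url_list) is the multiprocessing entry point, and it calls the multithreading function. It splits the URLs obtained above into 4 groups and starts 4 processes, one per group; inside each process, the packages of that group are downloaded by separate threads. This is how multiple processes and multiple threads work together here.
The code still needs improvement: the number of processes and the grouping of the URLs are hard-coded. After all, the number of processes that a machine can usefully run at the same time differs from machine to machine.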
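One possible improvement, sketched under assumptions: take the process count as a parameter that defaults to os.cpu_count(), and replace the hand-made thread batches with concurrent.futures pools (ProcessPoolExecutor outside, ThreadPoolExecutor inside each process). This is a swapped-in technique, not the author's code; fake_download is a hypothetical stand-in that returns the file name instead of running wget, so the sketch runs without network access.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for download_image: returns the file name instead of running wget."""
    return url.rsplit('/', 1)[-1]


def download_group(urls, threads_per_process=5):
    """Inside one process: download a group of URLs with a pool of threads."""
    with ThreadPoolExecutor(max_workers=threads_per_process) as pool:
        return list(pool.map(fake_download, urls))


def multi_process(urls, process_num=None):
    """Split urls round-robin into process_num groups (default: os.cpu_count())."""
    process_num = process_num or os.cpu_count() or 1
    groups = [urls[i::process_num] for i in range(process_num)]
    groups = [g for g in groups if g]  # fewer URLs than processes leaves some groups empty
    with ProcessPoolExecutor(max_workers=process_num) as pool:
        results = list(pool.map(download_group, groups))
    return [name for group in results for name in group]


if __name__ == '__main__':
    urls = ['http://example.com/packages/pkg-%d.rpm' % i for i in range(10)]
    print(multi_process(urls, process_num=4))
```

Pools also remove the batch barrier of the original multi_thread: a new download starts as soon as any thread is free, instead of waiting for the slowest download in each batch of five.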

③ Advanced Python: IO-bound tasks, CPU-bound tasks, multithreading and multiprocessing

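The two common ways to run work concurrently in Python are multithreading and multiprocessing. Multithreading suits IO-bound tasks, while multiprocessing suits CPU-bound (compute-intensive) tasks.

In Python, multithreading means starting several threads inside a single process. Because of the global interpreter lock (GIL), however, only one thread executes Python bytecode at any moment, so Python threads take turns rather than running truly in parallel. A thread that is waiting on IO releases the GIL, which is why threads still help with IO-bound work; for CPU-bound tasks, multithreading is a poor fit.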

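By contrast, multiple processes can make full use of the CPUs, which matters most for CPU-bound tasks. Python's multiprocessing module supports creating processes and passing data between them; processes exchange data through pipes or queues.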
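A minimal sketch of both channels. The function names are hypothetical, and None is used as an assumed "no more data" marker on the queue. The parent drains the queue before calling join(), because joining a process that still has queued data waiting to be flushed can deadlock.

```python
import multiprocessing


def producer(q, items):
    """Put items on a Queue, then None as a 'no more data' marker."""
    for item in items:
        q.put(item)
    q.put(None)


def pipe_child(conn):
    """Receive a number through one end of a Pipe and send back its square."""
    n = conn.recv()
    conn.send(n * n)
    conn.close()


if __name__ == '__main__':
    # Queue: a channel any number of processes can put to and get from
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(q, [1, 2, 3]))
    p.start()
    received = []
    while (item := q.get()) is not None:  # read everything before join()
        received.append(item)
    p.join()
    print('from queue:', received)  # [1, 2, 3]

    # Pipe: two connected endpoints, one kept by each side
    parent_conn, child_conn = multiprocessing.Pipe()
    c = multiprocessing.Process(target=pipe_child, args=(child_conn,))
    c.start()
    parent_conn.send(7)
    print('from pipe:', parent_conn.recv())  # 49
    c.join()
```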

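To show which approach suits which workload, the experiment first defines a queue and a function that fills it. It then implements an IO-bound task and a CPU-bound task, both taking their inputs from the queue. Comparing how long each concurrency approach takes shows that multithreading suits IO-bound tasks, while multiprocessing performs better on CPU-bound tasks.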
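The experiment can be sketched as follows. This is not the code from the GitHub repository; every name (init_queue, io_task, cpu_task, worker, timed_run), the task sizes and the worker count are assumptions. time.sleep stands in for IO, and a pure-Python sum of squares stands in for computation. Absolute timings depend on the machine and include process startup.

```python
import multiprocessing
import queue
import threading
import time

N_TASKS = 8


def init_queue(q, n_workers):
    """Fill the queue with task inputs, then one None stop marker per worker."""
    for i in range(N_TASKS):
        q.put(i)
    for _ in range(n_workers):
        q.put(None)


def io_task(_):
    time.sleep(0.2)  # stands in for waiting on disk or network; the GIL is released


def cpu_task(_):
    return sum(i * i for i in range(2_000_000))  # pure Python arithmetic; holds the GIL


def worker(q, func):
    """Take task inputs from the queue until a stop marker arrives."""
    while (item := q.get()) is not None:
        func(item)


def timed_run(func, use_processes, n_workers=4):
    """Run N_TASKS calls of func on n_workers threads or processes; return seconds taken."""
    q = multiprocessing.Queue() if use_processes else queue.Queue()
    init_queue(q, n_workers)
    worker_cls = multiprocessing.Process if use_processes else threading.Thread
    workers = [worker_cls(target=worker, args=(q, func)) for _ in range(n_workers)]
    start = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - start


if __name__ == '__main__':
    for func in (io_task, cpu_task):
        for use_processes in (False, True):
            kind = 'processes' if use_processes else 'threads'
            print('%-8s with %-9s: %.2f s' % (func.__name__, kind, timed_run(func, use_processes)))
```

On a multi-core machine, the threaded and multiprocess runs of io_task should take similar time, while cpu_task should finish noticeably faster with processes.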

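In practice, running the example code and comparing how long multithreading and multiprocessing take on the same task confirms that multiprocessing markedly improves efficiency on CPU-bound tasks.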

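The example code and detailed experimental results are on GitHub: xianhu/LearnPython. Questions or suggestions about multithreading and multiprocessing in Python are welcome in the discussion on that GitHub page.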
