scrapylinux: with Python 2.7 and Python 3.5 both installed, how to run the scrapy command on Python 2.7

① How to deploy Scrapy on a cloud server so that it keeps running

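As a Linux server administrator, you often log in to remote Linux machines over SSH to run time-consuming operations.
You may have logged in to a Linux host over telnet or SSH and started a program that needs several hours to run. If the network or the client machine fails along the way, the connection to the remote server is dropped, and any command that has not finished on the server is killed.
Likewise, if you SSH to a host and start a batch of scp commands, the scp processes die when that SSH session drops. Or a long job is still running on the remote server when it is time to leave work, and logging out would interrupt it. What can you do?
The screen command solves this problem: programs keep running on the server after the SSH connection is closed.
So what is screen?
Screen is a full-screen window manager that gives you several virtual terminals on one physical terminal.
What Screen does:
In short, Screen multiplexes one physical terminal among several processes, so a single terminal window can run several terminal applications. Screen has the concept of a session: within one screen session you can create several screen windows, and each window behaves like a real telnet/SSH connection.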
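Screen command syntax:
screen [-AmRvx -ls -wipe][-d <session name>][-h <lines>][-r <session name>][-s <shell>][-S <session name>]
Screen command options:
-A -[rR] Resize all windows to the size of the current terminal.
-c filename Use filename instead of the default configuration file '.screenrc'.
-d [pid.tty.host] Detach a screen session (the session must be in the Attached state, that is, a user is connected to it). Sessions are normally named in the form pid.tty.host (screen -list shows their state).
-D [pid.tty.host] Same as -d, except that on success it also logs out the user who was attached.
-h <lines> Set the number of lines in the window's scrollback buffer.
-ls or -list List all current screen sessions.
-m Force creation of a new screen session even when run from inside an existing one.
-p number or name Preselect a window.
-r [pid.tty.host] Reattach a detached screen session; if several sessions are detached, specify [pid.tty.host].
-R Try to reattach a detached session first; if none is found, create a new session.
-s shell Set the shell to run in newly created windows.
-S <session name> Name the screen session (the name can be used in place of the pid.tty.host form, which simplifies later commands).
-v Show version information.
-wipe Check all screen sessions and remove those that are no longer usable.
-x Attach to a session that is still attached elsewhere, without detaching it (multi-display mode).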
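Common screen usage:
screen -d -r: attach to a screen session; if it is currently attached, detach the remote user first and then attach.
screen -D -r: attach to a screen session; if it is currently attached, detach the remote user, log them out, and then attach.
screen -ls or screen -list: list existing screen sessions; used often.
screen -m: inside a screen session, the shortcut Ctrl+a c or typing screen opens a new window, whereas screen -m starts a new screen session.
screen -dm: start a new session in detached mode, that is, without attaching to it once it is created.
screen -p number or name: preselect a window.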
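Simple steps for running a program in the background with Screen:
1> Before starting the task, create a Screen session:
Code:
[linux@user~]$ screen -S test1
2> Work inside it as usual. If you have to leave before the task finishes, detach and keep the session:
Code:
[linux@user~]$ Ctrl+a d    # press Ctrl+a, then d, to keep the session
[detached]    # this message confirms that the session has been kept
When your work is finished, simply type:
Code:
[linux@user~]$ exit    # leaves the session for good
[screen is terminating]
3> If you kept a session earlier, list it with:
Code:
[linux@user~]$ screen -ls
There is a screen on:
9649.test1 (Detached)
To reattach, use:
Code:
[linux@user~]$ screen -r test1 (or 9649)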
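The create and detach steps above can also be done in one go: screen -dmS starts a named session that is already detached and runs a command inside it. The sketch below only assembles that command line from Python. The session name test1 comes from the example above; the spider name myspider and both function names are placeholders made up for illustration.

```python
import shlex
import subprocess


def build_screen_command(session_name, command):
    """Return the argv that starts `command` in a detached, named screen session."""
    # -d -m : start the session detached; -S : name it so `screen -r NAME` works
    return ["screen", "-dmS", session_name] + shlex.split(command)


def start_detached(session_name, command):
    """Launch the command; screen forks into the background and returns at once."""
    subprocess.check_call(build_screen_command(session_name, command))


if __name__ == "__main__":
    # "myspider" is a hypothetical spider name, not taken from the text above.
    argv = build_screen_command("test1", "scrapy crawl myspider")
    print(" ".join(argv))
```

After start_detached("test1", "scrapy crawl myspider") returns, the SSH connection can be closed, and screen -r test1 reattaches later, as in step 3.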
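Shortcut keys used inside Screen
Ctrl+a c : create a window
Ctrl+a w : list windows
Ctrl+a n : next window
Ctrl+a p : previous window
Ctrl+a 0-9 : switch to window number 0 through 9
Ctrl+a k : kill the current window and switch to the next one (when the last window is closed, the session ends and you are back in the original shell)
exit : close the current window and switch to the next one (when the last window is closed, the session ends and you are back in the original shell)
Ctrl+a d : detach from the current session and return to the shell from which screen was started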
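Multiple windows
Like many window managers, screen supports multiple windows. This is useful for handling several tasks without opening new sessions. As a system administrator I often have four or five SSH sessions open at once, and in each shell I may be working on two or three tasks. Without screen that would take 15 SSH sessions, 15 logins, 15 windows, and so on. With screen, each system gets its own session, and I manage the different jobs on that system through screen.
To open a new window, press "Ctrl-A" "c". The new window shows a default command prompt. For example, I can run top and then open a new window to do other work, and top keeps running. Try it yourself: start screen and run top. (Note: the screens below are truncated to save space.)
Start top
Code: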
Mem: 506028K av, 500596K used, 5432K free,
0K shrd, 11752K buff
Swap: 1020116K av, 53320K used, 966796K free
393660K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %ME

6538 root 25 0 1892 1892 596 R 49.1 0.3
6614 root 16 0 1544 1544 668 S 28.3 0.3
7198 admin 15 0 1108 1104 828 R 5.6 0.2
Now open a new window with "Ctrl-A" "c":
Code:
[admin@ensim admin]$
To get back to top, use "Ctrl-A" "n":
Mem: 506028K av, 500588K used, 5440K free,
0K shrd, 11960K buff
Swap: 1020116K av, 53320K used, 966796K free
392220K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %ME

6538 root 25 0 1892 1892 596 R 48.3 0.3
6614 root 15 0 1544 1544 668 S 30.7 0.3
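You can create several windows and move to the next one with "Ctrl-A" "n" or back to the previous one with "Ctrl-A" "p". While you work in one window, the programs in the other windows keep running.
Leaving screen
There are two ways to leave screen. The first is like logging out of a shell: end a window with "Ctrl-A" "k" or with exit. The current window is closed; if you have other windows open you are moved to one of them, and if it was the only window you leave screen altogether.
The other way is to detach. Detaching hides the windows while their processes keep running. If you have a long-running process and need to close your SSH client, press "Ctrl-A" "d" to detach. You are returned to the shell, all screen windows stay where they are, and you can reattach to them later. (Translator's note: this is much like minimizing a window or running a program in the background.)
Reattaching to a session
Suppose you have spent a long time compiling a program inside screen and your connection suddenly drops. There is no need to worry: the compilation keeps running inside screen. Log in again and list the sessions that are running:
Code: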
[root@gigan root]# screen -ls
There are screens on:
31619.ttyp2.gigan (Detached)
4731.ttyp2.gigan (Detached)
2 Sockets in /tmp/screens/S-root.
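Here I have two separate screen sessions. To reattach to one of them, use the reattach command:
Code:
[root@gigan root]# screen -r 31619.ttyp2.gigan
Just pass -r followed by the session name and you are back at the screen you left. Better still, you can reattach from anywhere: from the office or from any other client, you can start a job in screen, leave, and pick it up again later.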

② How to uninstall Scrapy on Linux

I. Installing Scrapy
pip install Scrapy
Scrapy has many dependencies, so the following problems may come up during installation:
1. ImportError: No module named w3lib.http
Fix: pip install w3lib
2. ImportError: No module named twisted
Fix: pip install twisted
3. ImportError: No module named lxml.html
Fix: pip install lxml
4. error: libxml/xmlversion.h: No such file or directory
Fix: apt-get install libxml2-dev libxslt-dev
apt-get install python-lxml
5. ImportError: No module named cssselect
Fix: pip install cssselect
6. ImportError: No module named OpenSSL
Fix: pip install pyOpenSSL
This covers most of the dependency problems that can occur during installation; any that were missed will be added as they are found.
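Instead of waiting for each ImportError in turn, the modules named in the list above can be probed up front. The following is a minimal sketch under that assumption: the module-to-package mapping is copied from the fixes above, and the function name missing_packages is made up for illustration. It does not cover item 4, which is a missing C header rather than a Python module.

```python
import importlib


# Top-level module names behind the ImportErrors listed above,
# mapped to the pip package that provides each one.
REQUIRED = {
    "w3lib": "w3lib",
    "twisted": "twisted",
    "lxml": "lxml",
    "cssselect": "cssselect",
    "OpenSSL": "pyOpenSSL",
}


def missing_packages(required):
    """Return the pip package names whose module cannot be imported."""
    missing = []
    for module_name, package_name in required.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(package_name)
    return missing


if __name__ == "__main__":
    # Print one suggested install command per missing dependency.
    for package in missing_packages(REQUIRED):
        print("pip install " + package)
```

Run it with the same interpreter that will run Scrapy, since each Python installation has its own set of installed packages.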

Run scrapy --version; if it prints version information, the installation succeeded.

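③ How to install Scrapy on Ubuntu

Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing. Official site: http://www.scrapy.org/
1. Install the following packages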

sudo apt-get install build-essential;
sudo apt-get install python-dev;
sudo apt-get install libxml2-dev;
sudo apt-get install libxslt1-dev;
sudo apt-get install python-setuptools;
2. Install Scrapy

sudo easy_install Scrapy;
wang@ubuntu:/usr/local/lib/python2.7/dist-packages$ sudo easy_install Scrapy
Searching for Scrapy
Best match: Scrapy 0.16.1
Processing Scrapy-0.16.1-py2.7.egg
Scrapy 0.16.1 is already the active version in easy-install.pth
Installing scrapy script to /usr/local/bin

Using /usr/local/lib/python2.7/dist-packages/Scrapy-0.16.1-py2.7.egg
Processing dependencies for Scrapy
Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 3.0.1
Downloading http://pypi.python.org/packages/source/l/lxml/lxml-3.0.1.tar.gz#md5=
Processing lxml-3.0.1.tar.gz
Running lxml-3.0.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-qibAzL/lxml-3.0.1/egg-dist-tmp-mSvUVN
Building lxml version 3.0.1.
Building without Cython.
Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/lib/x86_64-linux-gnu
warning: no files found matching '*.txt' under directory 'src/lxml/tests'
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree__getFilenameForFile’:
src/lxml/lxml.etree.c:26310:7: warning: variable ‘__pyx_clineno’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c:26309:15: warning: variable ‘__pyx_filename’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c:26308:7: warning: variable ‘__pyx_lineno’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c: In function ‘__pyx_pf_4lxml_5etree_4XSLT_18__call__’:
src/lxml/lxml.etree.c:132608:81: warning: passing argument 1 of ‘__pyx_f_4lxml_5etree_12_XSLTContext__’ from incompatible pointer type [enabled by default]
src/lxml/lxml.etree.c:130569:52: note: expected ‘struct __pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct __pyx_obj_4lxml_5etree__BaseContext *’
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree__XSLT’:
src/lxml/lxml.etree.c:133997:79: warning: passing argument 1 of ‘__pyx_f_4lxml_5etree_12_XSLTContext__’ from incompatible pointer type [enabled by default]
src/lxml/lxml.etree.c:130569:52: note: expected ‘struct __pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct __pyx_obj_4lxml_5etree__BaseContext *’
src/lxml/lxml.etree.c: At top level:
src/lxml/lxml.etree.c:12128:13: warning: ‘__pyx_f_4lxml_5etree_displayNode’ defined but not used [-Wunused-function]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFile’:
src/lxml/lxml.etree.c:86715:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDoc’:
src/lxml/lxml.etree.c:86403:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseUnicodeDoc’:
src/lxml/lxml.etree.c:86093:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFilelike’:
src/lxml/lxml.etree.c:86925:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
Adding lxml 3.0.1 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/lxml-3.0.1-py2.7-linux-x86_64.egg
Searching for w3lib>=1.2
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ZAXTgy/w3lib-1.2/egg-dist-tmp-aU3vpc
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/w3lib-1.2-py2.7.egg
Searching for Twisted>=8.0
Reading http://pypi.python.org/simple/Twisted/
Reading http://www.twistedmatrix.com
Reading http://twistedmatrix.com/products/download
Reading http://twistedmatrix.com/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/9.0/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/10.0/
Reading http://twistedmatrix.com/projects/core/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/8.2/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/8.1/
Best match: Twisted 12.2.0
Downloading http://pypi.python.org/packages/source/T/Twisted/Twisted-12.2.0.tar.bz2#md5=
Processing Twisted-12.2.0.tar.bz2
Running Twisted-12.2.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-kw897y/Twisted-12.2.0/egg-dist-tmp-sZWFYb
In file included from /usr/include/python2.7/Python.h:8:0,
from twisted/internet/_sigchld.c:9:
/usr/include/python2.7/pyconfig.h:1161:0: warning: "_POSIX_C_SOURCE" redefined [enabled by default]
/usr/include/features.h:215:0: note: this is the location of the previous definition
twisted/internet/_sigchld.c: In function ‘got_signal’:
twisted/internet/_sigchld.c:15:13: warning: variable ‘ignored_result’ set but not used [-Wunused-but-set-variable]
Adding Twisted 12.2.0 to easy-install.pth file
Installing mailmail script to /usr/local/bin
Installing conch script to /usr/local/bin
Installing pyhtmlizer script to /usr/local/bin
Installing twistd script to /usr/local/bin
Installing lore script to /usr/local/bin
Installing tkconch script to /usr/local/bin
Installing tapconvert script to /usr/local/bin
Installing ckeygen script to /usr/local/bin
Installing tap2rpm script to /usr/local/bin
Installing manhole script to /usr/local/bin
Installing trial script to /usr/local/bin
Installing cftp script to /usr/local/bin
Installing tap2deb script to /usr/local/bin

Installed /usr/local/lib/python2.7/dist-packages/Twisted-12.2.0-py2.7-linux-x86_64.egg
Finished processing dependencies for Scrapy
This output indicates that the installation succeeded.

3. Test

scrapy shell http://ziki.cn
Get all a tags:

hxs.select('//a').extract()
References

http://doc.scrapy.org/en/latest/intro/install.html
http://doc.scrapy.org/en/latest/intro/tutorial.html

④ How to monitor a running Scrapy program on Linux

#!/bin/sh

#------------------------------------------------------------------------------
# Function: CheckProcess
# Purpose:  check whether a process exists
# Argument: $1 --- name of the process to check
# Returns:  0 if exactly one matching process exists, otherwise 1
#------------------------------------------------------------------------------
CheckProcess()
{
    # Reject an empty argument
    if [ "$1" = "" ];
    then
        return 1
    fi

    # PROCESS_NUM is the number of processes matching the name. Exactly 1 means
    # healthy (return 0); any other count means trouble and triggers a restart.
    PROCESS_NUM=`ps -ef | grep "$1" | grep -v "grep" | wc -l`
    if [ "$PROCESS_NUM" -eq 1 ];
    then
        return 0
    else
        return 1
    fi
}

# Keep checking whether the test instance is running
while true ; do
    CheckProcess "test"
    CheckQQ_RET=$?
    if [ $CheckQQ_RET -eq 1 ];
    then
        # Kill every test process and start a fresh one;
        # replace these two lines with whatever action you need
        killall -9 test
        ./test &
    fi
    sleep 1
done
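The same watchdog idea can be sketched in Python, which may be convenient when the rest of the project is already Python. This is an illustration rather than a translation of the script above: it matches only the command column of ps -ef instead of grepping the whole line, and it restarts only when no matching process is found, whereas the shell version restarts whenever the count is not exactly one. The names count_processes and watch are made up for this example.

```python
import subprocess
import time


def count_processes(ps_output, name):
    """Count lines of `ps -ef` output whose command column mentions `name`."""
    count = 0
    for line in ps_output.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 7)              # the 8th field is the command line
        if len(fields) == 8 and name in fields[7]:
            count += 1
    return count


def watch(name, start_argv, interval=1.0):
    """Start `start_argv` again whenever no process matching `name` is found."""
    while True:
        ps_output = subprocess.check_output(["ps", "-ef"]).decode("utf-8", "replace")
        if count_processes(ps_output, name) == 0:
            subprocess.Popen(start_argv)
        time.sleep(interval)
```

A call such as watch("scrapy crawl", ["scrapy", "crawl", "myspider"], interval=5.0) would keep a crawl alive; myspider is a hypothetical spider name. Make sure the watcher's own command line does not contain the name being searched for, or it will count itself.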

⑤ How to upgrade Scrapy on Linux

The terminal shows that the installed Scrapy version is 1.0.3. Upgrading to the latest version takes a single command.

sudo pip install --upgrade scrapy

After the upgrade finishes, check the version:

☁ Code scrapy --version
Scrapy 1.3.0 - no active project

Usage:
scrapy <command> [options] [args]

Available commands:
bench Run quick benchmark test
commands
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy

[ more ] More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

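⑥ How to capture packets on the Linux eth0 interface from 172.16.2.10 on port 80

tcpdump -i eth0 -nn 'port 80 and host 172.16.2.10'
Use the tcpdump command to capture the packets, followed by the appropriate options. The host keyword matches traffic in both directions; write src host 172.16.2.10 to capture only packets sent from that address.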
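The difference between matching either direction and matching only packets sent by the host is easy to get wrong, so here is a small sketch that builds both variants of the command. The function name build_tcpdump_argv and its source_only parameter are made up for illustration; only the tcpdump options and filter keywords themselves are real.

```python
def build_tcpdump_argv(interface, host, port, source_only=False):
    """Return a tcpdump argv for traffic involving `host` on `port`."""
    if source_only:
        # Only packets sent by `host` from `port`.
        expression = "src host {0} and src port {1}".format(host, port)
    else:
        # Either direction, as in the command above.
        expression = "host {0} and port {1}".format(host, port)
    # -i : capture interface; -nn : do not resolve host names or port names
    return ["tcpdump", "-i", interface, "-nn", expression]
```

The returned list can be passed to subprocess.call; capturing on an interface normally requires root privileges.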

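⑦ How to install Scrapy with Python 3 support on Linux

window) history is now managed by tmux, so what the console/terminal's own Shift+PgUp/PgDn shows is not the history of the current window. Press C-b [ to enter copy-mode first; PgUp/PgDn, the arrow keys, Ctrl-S, and similar keys then move around within copy-mode.
To scroll the window contents with the mouse wheel, press C-b : and enter
setw mode-mouse on
That is all. To enable it for all windows:
setw -g mode-mouse on
(The mode-mouse option exists only in tmux releases before 2.1; later releases use set -g mouse on instead.)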

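⑧ How to debug Scrapy with PyCharm on Linux (Linux blog)

Method/steps: First open PyCharm and check that git is installed. Run git version on the command line; if it prints a result, git is installed, and you can then download the code with git. After the code is imported into PyCharm, a notice appears in the upper-right corner, meaning that the path to git cannot be found and it cannot resolve ...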

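⑨ With both Python 2.7 and Python 3.5 installed, how do I run the scrapy command on Python 2.7?

The scrapy command in a terminal actually runs scrapy.exe in the Scripts subfolder of the Python installation (Windows) or the scrapy script in the installation's bin directory (Linux).

So if typing scrapy in the terminal runs the Python 3.5 one, your default Python is 3.5.

To run the Python 2.7 one instead, there are several options:

1. Change the PATH environment variable so that Python 2.7 is the default Python (too much trouble, not recommended).
2. When running a scrapy command (say scrapy startproject projectname), type the absolute path to scrapy instead of the bare name.
Windows: C:\Install\Anaconda2\Scripts\scrapy startproject projectname
Linux works the same way.
3. Add the folder that holds the Python 2.7 scrapy.exe (C:\Install\Anaconda2\Scripts here) to the Path environment variable and rename scrapy.exe to scrapy2.exe (the same applies on Linux).
Then run scrapy2 startproject projectname in the terminal.

4. Use virtualenv to create two isolated Python environments and run scrapy in each one separately.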
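On Linux the scrapy command is a small Python script whose first line (the shebang) names the interpreter it runs under, so reading that line tells you which Python a given scrapy belongs to before you pick one of the options above. A minimal sketch, assuming the path to the script is already known (for example from which scrapy); the function name interpreter_of is made up for illustration.

```python
import io


def interpreter_of(script_path):
    """Return the interpreter named on the script's shebang line, or None."""
    with io.open(script_path, "r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline().strip()
    if not first_line.startswith("#!"):
        return None
    # Drop the "#!" marker and any spaces that follow it.
    return first_line[2:].strip()
```

If the result points at a Python 3.5 interpreter, the Python 2.7 copy of the script lives under the Python 2.7 installation and can be called by its full path, as in option 2.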

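⑩ How to install scapy pyx on Linux (Ubuntu)

I have been learning web scraping lately. I had long heard that writing crawlers in Python is a pleasure (Python users seem to say that about everything, but it is true that Python's libraries are so rich that you rarely reinvent the wheel), and that there is a powerful framework called Scrapy, so I decided to try it.
The first step in using Scrapy is, of course, installing it. I tried both Windows and Ubuntu; this article covers Ubuntu, where installation is far simpler than on Windows. When I find time I will also describe the Windows installation in detail.
According to the official documentation, a series of dependencies must be installed before Scrapy:
* Python 2.7: Scrapy is a Python framework, so Python has to be installed first. For now Scrapy supports only Python 2.7, so make sure that is the version you have.
* lxml: most Linux distributions ship with lxml.
* OpenSSL: provided on every system except Windows.
* Python packages: pip and setuptools. pip now depends on setuptools, so installing pip installs setuptools automatically.
As the list shows, installing Scrapy's dependencies on a non-Windows system is fairly simple: you only need to install pip, and Scrapy itself is then installed with pip.
Checking whether Scrapy's dependencies are installed
If you are not sure that the dependencies described above as already present are really on your machine, you can check as follows. This article uses Ubuntu 14.04.
Checking the Python version
$ python --version
If the command prints a version, Python is installed. Mine prints Python 2.7.6, which meets the 2.7 requirement. If no version is printed, search online and install Python yourself; this article does not cover installing Python (guides are easy to find).

Checking whether lxml and OpenSSL are installed
Assuming Python is installed, type python in the console to enter the interactive interpreter.

Then enter import lxml and import OpenSSL in turn. If neither raises an error, both dependencies are installed.

Installing python-dev and libevent
python-dev is an important package for Python development on Linux. You need it when:
* you install a Python library from outside the distribution's repositories and that library contains C/C++ files that call the Python API and must be compiled;
* a program you wrote must link against libpythonXX.(a|so) when it is compiled.
libevent is a high-performance, event-driven networking library that many frameworks use underneath.
Both must be installed, or later steps will fail. Install them with:
$ sudo apt-get install python-dev
$ sudo apt-get install libevent-dev
Installing pip
Scrapy is easy to install with pip, so install pip first:
$ sudo apt-get install python-pip
Installing Scrapy with pip
Install Scrapy with:
$ sudo pip install scrapy
Remember to run this with root privileges; otherwise the installation fails with an error.

That completes the Scrapy installation. Check that it succeeded with:
$ scrapy version
If the version is printed, the installation succeeded. The version installed here is 1.02.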
