① php判断来访是搜索引擎蜘蛛还是普通用户的代码小结
1、推荐的一种方法:php判断搜索引擎蜘蛛爬虫还是人为访问代码,摘自Discuz x3.2
<?php
function checkrobot($useragent=''){
static $kw_spiders = array('bot', 'crawl', 'spider' ,'slurp', 'sohu-search', 'lycos', 'robozilla');
static $kw_browsers = array('msie', 'netscape', 'opera', 'konqueror', 'mozilla');
$useragent = strtolower(empty($useragent) ? $_SERVER['HTTP_USER_AGENT'] : $useragent);
if(strpos($useragent, 'http://') === false && dstrpos($useragent, $kw_browsers)) return false;
if(dstrpos($useragent, $kw_spiders)) return true;
return false;
}
function dstrpos($string, $arr, $returnvalue = false) {
if(empty($string)) return false;
foreach((array)$arr as $v) {
if(strpos($string, $v) !== false) {
$return = $returnvalue ? $v : true;
return $return;
}
}
return false;
}
if(checkrobot()){
echo '机器人爬虫';
}else{
echo '人';
}
?>
实际应用中可以这样判断,直接不是搜索引擎才执行操作
<?php
if(!checkrobot()){
//do something
}
?>
2、第二种方法:
使用PHP实现蜘蛛访问日志统计
$useragent = addslashes(strtolower($_SERVER['HTTP_USER_AGENT']));
if (strpos($useragent, 'googlebot')!== false){$bot = 'Google';}
elseif (strpos($useragent,'mediapartners-google') !== false){$bot = 'Google Adsense';}
elseif (strpos($useragent,'spider') !== false){$bot = 'Bai';}
elseif (strpos($useragent,'sogou spider') !== false){$bot = 'Sogou';}
elseif (strpos($useragent,'sogou web') !== false){$bot = 'Sogou web';}
elseif (strpos($useragent,'sosospider') !== false){$bot = 'SOSO';}
elseif (strpos($useragent,'360spider') !== false){$bot = '360Spider';}
elseif (strpos($useragent,'yahoo') !== false){$bot = 'Yahoo';}
elseif (strpos($useragent,'msn') !== false){$bot = 'MSN';}
elseif (strpos($useragent,'msnbot') !== false){$bot = 'msnbot';}
elseif (strpos($useragent,'sohu') !== false){$bot = 'Sohu';}
elseif (strpos($useragent,'yoBot') !== false){$bot = 'Yo';}
elseif (strpos($useragent,'twiceler') !== false){$bot = 'Twiceler';}
elseif (strpos($useragent,'ia_archiver') !== false){$bot = 'Alexa_';}
elseif (strpos($useragent,'iaarchiver') !== false){$bot = 'Alexa';}
elseif (strpos($useragent,'slurp') !== false){$bot = '雅虎';}
elseif (strpos($useragent,'bot') !== false){$bot = '其它蜘蛛';}
if(isset($bot)){
$fp = @fopen('bot.txt','a');
fwrite($fp,date('Y-m-d H:i:s')."\t".$_SERVER["REMOTE_ADDR"]."\t".$bot."\t".'http://'.$_SERVER['SERVER_NAME'].$_SERVER["REQUEST_URI"]."\r\n");
fclose($fp);
}
第三种方法:
我们可以通过HTTP_USER_AGENT来判断是否是蜘蛛,搜索引擎的蜘蛛都有自己的独特标志,下面列取了一部分。
function is_crawler() {
$userAgent = strtolower($_SERVER['HTTP_USER_AGENT']);
$spiders = array(
'Googlebot', // Google 爬虫
'Baispider', // 网络爬虫
'Yahoo! Slurp', // 雅虎爬虫
'YoBot', // 有道爬虫
'msnbot' // Bing爬虫
// 更多爬虫关键字
);
foreach ($spiders as $spider) {
$spider = strtolower($spider);
if (strpos($userAgent, $spider) !== false) {
return true;
}
}
return false;
}
下面的php代码附带了更多的蜘蛛标识
function isCrawler() {
echo $agent= strtolower($_SERVER['HTTP_USER_AGENT']);
if (!empty($agent)) {
$spiderSite= array(
"TencentTraveler",
"Baispider+",
"BaiGame",
"Googlebot",
"msnbot",
"Sosospider+",
"Sogou web spider",
"ia_archiver",
"Yahoo! Slurp",
"YouBot",
"Yahoo Slurp",
"MSNBot",
"Java (Often spam bot)",
"BaiDuSpider",
"Voila",
"Yandex bot",
"BSpider",
"twiceler",
"Sogou Spider",
"Speedy Spider",
"Google AdSense",
"Heritrix",
"Python-urllib",
"Alexa (IA Archiver)",
"Ask",
"Exabot",
"Custo",
"OutfoxBot/YoBot",
"yacy",
"SurveyBot",
"legs",
"lwp-trivial",
"Nutch",
"StackRambler",
"The web archive (IA Archiver)",
"Perl tool",
"MJ12bot",
"Netcraft",
"MSIECrawler",
"WGet tools",
"larbin",
"Fish search",
);
foreach($spiderSite as $val) {
$str = strtolower($val);
if (strpos($agent, $str) !== false) {
return true;
}
}
} else {
return false;
}
}
if (isCrawler()){
echo "你好蜘蛛精!";
}
else{
echo "你不是蜘蛛精啊!";
② 如何利用php来判断是百度蜘蛛还是普通用户,是普通用户进入a.htm,百度蜘蛛进入b.htm
function checkrobot($useragent = '') {
static $kw_spiders = 'Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla';
static $kw_browsers = 'MSIE|Netscape|Opera|Konqueror|Mozilla';
$useragent = empty($useragent) ? $_SERVER['HTTP_USER_AGENT'] : $useragent;
if(!strexists($useragent, 'http://') && preg_match("/($kw_browsers)/i", $useragent)) {
return false;
} elseif(preg_match("/($kw_spiders)/i", $useragent)) {
return true;
} else {
return false;
}
}
③ php 如何区分 useragent是不是伪造的
PHP通过内置全局变量$_SERVER['HTTP_USER_AGENT']来获取用户信息,包括浏览器信息,操作系统等;判断是否是手机还是电脑终端访问,只需判断他的$_SERVER['HTTP_USER_AGENT']信息是否存在手机终端类型即可。示例如下:
④ php获取网页源码内容有哪些办法
可以参考以下几种方法:
方法一: file_get_contents获取
<span style="white-space:pre"></span>$url="http://www..com/";
<span style="white-space:pre"></span>$fh= file_get_contents
('http://www.hxfzzx.com/news/fzfj/');<span style="white-space:pre"></span>echo $fh;
拓展资料
PHP(外文名:PHP: Hypertext Preprocessor,中文名:“超文本预处理器”)是一种通用开源脚本语言。语法吸收了C语言、Java和Perl的特点,利于学习,使用广泛,主要适用于Web开发领域。PHP 独特的语法混合了C、Java、Perl以及PHP自创的语法。它可以比CGI或者Perl更快速地执行动态网页。
用PHP做出的动态页面与其他的编程语言相比,PHP是将程序嵌入到HTML(标准通用标记语言下的一个应用)文档中去执行,执行效率比完全生成HTML标记的CGI要高许多;PHP还可以执行编译后代码,编译可以达到加密和优化代码运行,使代码运行更快。
⑤ 如何用PHP判断搜索引擎蜘蛛来路急!
一下是DZ代码中的实现细节,你可以参考一下: 其实PHP有个很简单的方式去实现,通过_SERVER这个预定义变量中的_SERVER['HTTP_USER_AGENT']可以取得访问者的属性,具体可以看下Diiscuz!是如何判断搜索引擎的,函数代码如下:
function getrobot() {
if(!defined('IS_ROBOT')) {
kw_spiders = 'Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla';
kw_browsers = 'MSIE|Netscape|Opera|Konqueror|Mozilla';
if(preg_match("/(kw_browsers)/", _SERVER['HTTP_USER_AGENT'])) {
define('IS_ROBOT', FALSE);
} elseif(preg_match("/(kw_spiders)/", _SERVER['HTTP_USER_AGENT'])) {
define('IS_ROBOT', TRUE);
} else {
define('IS_ROBOT', FALSE);
}
}
return IS_ROBOT;
}
根据上面还可以精简如下:
if(preg_match("/(Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla)/i", _SERVER['HTTP_USER_AGENT'])) {
echo 'robot';
}
如果你需要返回详细的搜索引擎名称,而不是是否是搜索引擎机器人的话,请看下面的代码:
function get_naps_bot() {
useragent = strtolower(_SERVER['HTTP_USER_AGENT']);
if (strpos(useragent, 'googlebot') !== false){
return 'Googlebot';
}
if (strpos(useragent, 'msnbot') !== false){
return 'MSNbot';
}
if (strpos(useragent, 'slurp') !== false){
return 'Yahoobot';
}
if (strpos(useragent, 'spider') !== false){
return 'Baispider';
}
if (strpos(useragent, 'sohu-search') !== false){
return 'Sohubot';
}
if (strpos(useragent, 'lycos') !== false){
return 'Lycos';
}
if (strpos(useragent, 'robozilla') !== false){
return 'Robozilla';
}
return false;
}
⑥ PHP判断普通用户或蜘蛛,调用不同代码
定义一个函数 get_naps_bot()
如果是 BOT 则返回字符串, 如果不是 BOT 返回 false
function get_naps_bot()
{
$useragent = strtolower($_SERVER['HTTP_USER_AGENT']);
if (strpos($useragent, 'googlebot') !== false){
return 'Googlebot';
}
if (strpos($useragent, 'msnbot') !== false){
return 'MSNbot';
}
if (strpos($useragent, 'slurp') !== false){
return 'Yahoobot';
}
if (strpos($useragent, 'spider') !== false){
return 'Baispider';
}
if (strpos($useragent, 'sohu-search') !== false){
return 'Sohubot';
}
if (strpos($useragent, 'lycos') !== false){
return 'Lycos';
}
if (strpos($useragent, 'robozilla') !== false){
return 'Robozilla';
}
return false;
}
$botName = get_naps_bot();
if( empty($botName ) )
{
include( "11.php" );// 用户访问
}
else
{
include( "22.php" ); // 蜘蛛访问
}
⑦ 在php中如何模拟HTTP_USER_AGENT,一个网站需要验证这个,请问下怎么模拟
<?php
//client
$ch=curl_init();
curl_setopt_array($ch,
array(
CURLOPT_URL=>'http://localhost/ua.php',
CURLOPT_USERAGENT=>"YeRenChai_v1.0",
CURLOPT_RETURNTRANSFER=>True,
CURLOPT_FOLLOWLOCATION=>True,
)
);
$response=curl_exec($ch);
if(!$response)exit(curl_error($ch));
var_mp($response);
?>
⑧ php 判断是否是手机还是电脑
<?php
error_reporting(E_ALL||~E_NOTICE);
header("Content-Type:text/html;charset=utf8");
functionisMobile(){
$useragent=isset($_SERVER['HTTP_USER_AGENT'])?$_SERVER['HTTP_USER_AGENT']:'';
$useragent_commentsblock=preg_match('|(.*?)|',$useragent,$matches)>0?$matches[0]:'';
functionCheckSubstrs($substrs,$text){
foreach($substrsas$substr)
if(false!==strpos($text,$substr)){
returntrue;
}
returnfalse;
}
$mobile_os_list=array('GoogleWirelessTranscoder','WindowsCE','WindowsCE','Symbian','Android','armv6l','armv5','Mobile','CentOS','mowser','AvantGo','OperaMobi','J2ME/MIDP','Smartphone','Go.Web','Palm','iPAQ');
$mobile_token_list=array('Profile/MIDP','Configuration/CLDC-','160×160','176×220','240×240','240×320','320×240','UP.Browser','UP.Link','SymbianOS','PalmOS','PocketPC','SonyEricsson','Nokia','BlackBerry','Vodafone','BenQ','Novarra-Vision','Iris','NetFront','HTC_','Xda_','SAMSUNG-SGH','Wapaka','DoCoMo','iPhone','iPod');
$found_mobile=CheckSubstrs($mobile_os_list,$useragent_commentsblock)||
CheckSubstrs($mobile_token_list,$useragent);
if($found_mobile){
returntrue;
}else{
returnfalse;
}
}
if(isMobile()){
echo"是在手机端";
exit();
}else{
echo"是在PC端";
exit();
}
?>
⑨ 在PHP中如何模拟HTTP_USER_AGENT
在curl里可以设置UA
<?php
//client
$ch=curl_init();
curl_setopt_array($ch,
array(
CURLOPT_URL=>'http://localhost/ua.php',
CURLOPT_USERAGENT=>"YeRenChai_v1.0",
CURLOPT_RETURNTRANSFER=>True,
CURLOPT_FOLLOWLOCATION=>True,
)
);
$response=curl_exec($ch);
if(!$response)exit(curl_error($ch));
var_mp($response);
?>
<?php//server
echo$_SERVER['HTTP_USER_AGENT'];
?>
⑩ PHP如何判断网页是否有搜索引擎机器人在访问浏览
使用PHP技术搭建,因此我们用php去判断是否是搜索引擎,PHP有个很简单的方式去实现,通过_SERVER这个预定义变量中的_SERVER['HTTP_USER_AGENT']可以取得访问者的属性,具体可以看下Diiscuz!是如何判断搜索引擎的,函数代码如下:
function getrobot() {
if(!defined('IS_ROBOT')) {
kw_spiders = 'Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla';
kw_browsers = 'MSIE|Netscape|Opera|Konqueror|Mozilla';
if(preg_match("/(kw_browsers)/", $_SERVER['HTTP_USER_AGENT'])) {
define('IS_ROBOT', FALSE);
} elseif(preg_match("/(kw_spiders)/", $_SERVER['HTTP_USER_AGENT'])) {
define('IS_ROBOT', TRUE);
} else {
define('IS_ROBOT', FALSE);
}
}
return IS_ROBOT;
}
根据上面还可以精简如下(最终我们采用了这个方案):
if(preg_match("/(Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla)/i", $_SERVER['HTTP_USER_AGENT'])) {
echo 'robot';
}
如果你需要返回详细的搜索引擎名称,而不是是否是搜索引擎机器人的话,请看下面的代码:
function get_naps_bot() {
useragent = strtolower($_SERVER['HTTP_USER_AGENT']);
if (strpos(useragent, 'googlebot') !== false){
return 'Googlebot';
}
if (strpos(useragent, 'msnbot') !== false){
return 'MSNbot';
}
if (strpos(useragent, 'slurp') !== false){
return 'Yahoobot';
}
if (strpos(useragent, 'spider') !== false){
return 'Baispider';
}
if (strpos(useragent, 'sohu-search') !== false){
return 'Sohubot';
}
if (strpos(useragent, 'lycos') !== false){
return 'Lycos';
}
if (strpos(useragent, 'robozilla') !== false){
return 'Robozilla';
}
return false;
}