Detecting Chinese text in Python: checking whether a character is Chinese

1. Python: checking whether text is Chinese

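Method 1:

In Python 2, isinstance(s, str) checks whether s is an ordinary (byte) string.

isinstance(s, unicode) checks whether s is a unicode string.

Or:

# Decode s to unicode unless it already is unicode
if not isinstance(s, unicode):
    s = unicode(s, "utf-8")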
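
In Python 3 the unicode type no longer exists: str is always Unicode text and raw data is bytes. A minimal sketch of the same normalization step under that assumption (the helper name to_text is hypothetical, not from the original):

```python
def to_text(value, encoding="utf-8"):
    """Return value as text (str), decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# b"\xe4\xb8\xad\xe6\x96\x87" is the UTF-8 encoding of two Chinese characters.
print(len(to_text(b"\xe4\xb8\xad\xe6\x96\x87")))
print(to_text("already text"))
```

Raising TypeError for anything else is a design choice of this sketch; it keeps accidental None or int values from being silently stringified.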

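Method 2:

Detecting character encodings with Python chardet
chardet makes it easy to detect the encoding of a string or a file. This matters especially for Chinese web pages: some use GBK/GB2312 and others use UTF-8. If you need to crawl pages, knowing each page's encoding is important, and although an HTML page can declare a charset, that declaration is sometimes wrong. This is where chardet helps.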

chardet example
>>> import urllib
>>> rawdata = urllib.urlopen('http://www.google.cn/').read()
>>> import chardet
>>> chardet.detect(rawdata)
{'confidence': 0.98999999999999999, 'encoding': 'GB2312'}
>>>

The detect function can be called directly on the data to detect its encoding. It returns a dictionary with two items: the confidence of the detection and the detected encoding.
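
When chardet is not available, a much cruder standard-library fallback is to try a short list of candidate encodings in order. This is a sketch of trial decoding, not of what chardet does internally; the function name guess_encoding and the candidate list are assumptions made for illustration:

```python
def guess_encoding(raw, candidates=("utf-8", "gb18030")):
    """Return the first candidate encoding that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# GBK bytes are not valid UTF-8, so the second candidate is reported.
print(guess_encoding("中文网页".encode("gbk")))
print(guess_encoding("中文网页".encode("utf-8")))
```

UTF-8 is tried first because it is strict: GBK-encoded Chinese text almost never decodes as valid UTF-8, whereas the reverse order would accept many UTF-8 byte sequences as GB18030. Unlike chardet, this gives no confidence value.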

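Installing chardet
After downloading chardet, unpack the archive and put the chardet folder in your application's directory; you can then start using it with import chardet.

Alternatively, install it with its setup.py, which copies chardet into Python's system directories so that every Python program can simply use import chardet.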

2. Python: checking whether a string contains Han characters

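#! /usr/bin/python
# -*- coding: utf-8 -*-
import re

# Matches one or more consecutive characters in the range U+4E00 to U+9FA5
zhPattern = re.compile(u'[\u4e00-\u9fa5]+')
# A small example: check whether a piece of text contains Simplified Chinese
contents = u'一个小应用，判断一段文本中是否包含简体中：'
match = zhPattern.search(contents)

if match:
    print u'Contains Chinese: %s' % (match.group(0),)
else:
    print u'No Chinese found'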
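
The search call above stops at the first run of Chinese characters. A minimal Python 3 sketch that collects every run with the same character range (the name find_chinese_runs is hypothetical):

```python
import re

# Same range as the pattern above: U+4E00 to U+9FA5.
ZH_PATTERN = re.compile("[\u4e00-\u9fa5]+")


def find_chinese_runs(text):
    """Return every maximal run of consecutive Chinese characters in text."""
    return ZH_PATTERN.findall(text)


runs = find_chinese_runs("abc中文def汉字123")
print(len(runs))
```

In Python 3 every str literal is Unicode, so the u prefix is no longer needed.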

3. Python: checking for Chinese characters

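According to the GB2312-80 standard, the internal code of each Chinese character consists of two bytes, and the highest bit of each byte is 1.
A program can therefore test for this: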

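#include <stdio.h>
#include <string.h>

int main(void)
{
    int i, j = 0;                          /* j counts bytes whose high bit is set */
    unsigned char s[100];

    if (fgets((char *)s, sizeof s, stdin) == NULL)   /* bounded read instead of gets */
        return 1;
    s[strcspn((char *)s, "\n")] = '\0';    /* strip the trailing newline */

    for (i = 0; s[i]; i++)
        if (s[i] >= 0x80)
            j++;

    if (i > 0 && j == i)
        printf("\"%s\" consists entirely of Chinese characters\n", s);
    else if (j == 0)
        printf("\"%s\" contains no Chinese characters\n", s);
    else
        printf("\"%s\" contains some Chinese characters\n", s);
    return 0;
}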
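
The same high-bit test can be sketched in Python 3 on a bytes object. The function name classify_high_bytes and its return labels are hypothetical. The heuristic is only meaningful when the input is known to be GB2312-encoded, since any non-ASCII byte in UTF-8 or Latin-1 data also has its high bit set:

```python
def classify_high_bytes(data):
    """Classify bytes as 'all', 'some' or 'none' by counting bytes >= 0x80."""
    high = sum(1 for b in data if b >= 0x80)
    if high == 0:
        return "none"
    return "all" if high == len(data) else "some"


print(classify_high_bytes("中文".encode("gb2312")))
print(classify_high_bytes("a中".encode("gb2312")))
print(classify_high_bytes(b"abc"))
```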

4. Python: checking whether a file contains Chinese

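To test for Chinese in Python, match against the pattern u'[\u4e00-\u9fa5]+'. Note that if the regular expression pattern is a unicode string, the string being matched must also be converted to unicode; otherwise it will not match.

zhPattern = re.compile(u'[\u4e00-\u9fa5]+')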

Example code:

# -*- coding: utf-8 -*-
import re

zhPattern = re.compile(u'[\u4e00-\u9fa5]+')
contents = u'判断一段文本中是否包含简体中：'
match = zhPattern.search(contents)
if match:
    print u'Contains Chinese: %s' % (match.group(0),)
else:
    print u'No Chinese found'
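
The example above tests a string literal rather than a file. A minimal Python 3 sketch that applies the same pattern to a text file line by line, assuming the file is UTF-8 encoded (the function name file_contains_chinese and the temporary demo file are hypothetical):

```python
import os
import re
import tempfile

ZH_PATTERN = re.compile("[\u4e00-\u9fa5]")


def file_contains_chinese(path, encoding="utf-8"):
    """Return True if any line of the text file at path contains a Chinese character."""
    with open(path, encoding=encoding) as handle:
        return any(ZH_PATTERN.search(line) for line in handle)


# Usage: write a small UTF-8 file, scan it, then clean up.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n你好\n")
result = file_contains_chinese(tmp.name)
os.remove(tmp.name)
print(result)
```

Opening the file with an explicit encoding makes open() return already-decoded text, which satisfies the requirement that the pattern and the text both be Unicode.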

5. Python: checking whether a string contains Chinese characters

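First, Python represents strings internally as unicode, so unicode normally serves as the intermediate form when converting between encodings.
decode converts a string in some other encoding into unicode; for example, a.decode('utf-8') converts a UTF-8 encoded string into unicode.
encode converts a unicode string into a string in some other encoding; for example, b.encode('utf-8') converts a unicode string into a UTF-8 encoded string.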
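
As an illustration of using Unicode as the intermediate form, here is a minimal Python 3 sketch that converts GBK bytes to UTF-8 bytes (the helper name transcode is hypothetical; in Python 3, decode exists on bytes and encode on str):

```python
def transcode(data, source_encoding, target_encoding):
    """Convert bytes between encodings, using str (Unicode) as the intermediate."""
    text = data.decode(source_encoding)   # bytes -> Unicode text
    return text.encode(target_encoding)   # Unicode text -> bytes


gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(len(gbk_bytes), len(utf8_bytes))  # prints: 4 6
```

The two Chinese characters take two bytes each in GBK and three bytes each in UTF-8, which is why the byte lengths differ while the text is the same.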

Checking whether a string contains Chinese characters:
With that background, the problem is easy to solve. Here is the code:

# -*- coding: utf-8 -*-

def check_contain_chinese(check_str):
    # Decode UTF-8 byte strings; unicode input is used as-is
    if isinstance(check_str, str):
        check_str = check_str.decode('utf-8')
    for ch in check_str:
        if u'\u4e00' <= ch <= u'\u9fff':
            return True
    return False

if __name__ == "__main__":
    print check_contain_chinese('中国')
    print check_contain_chinese('xxx')
    print check_contain_chinese('xx中国')

Output:
True
False
True
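
A Python 3 sketch of the same per-character range test follows. The names CJK_RANGES and contains_chinese are hypothetical, and including CJK Extension A (U+3400 to U+4DBF) is an assumption that goes beyond the code above, which only tests U+4E00 to U+9FFF:

```python
# Inclusive (start, end) code point ranges.
CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs, the range tested above
    (0x3400, 0x4DBF),  # CJK Extension A: an assumption, not in the original
)


def contains_chinese(text):
    """Return True if text (str, or UTF-8 encoded bytes) has a character in CJK_RANGES."""
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return any(start <= ord(ch) <= end for ch in text for start, end in CJK_RANGES)


print(contains_chinese("中国"))
print(contains_chinese("xxx"))
print(contains_chinese("xx中国"))
```

Keeping the ranges in a tuple makes it easy to drop the extension row if only the original range is wanted.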

6. Python: checking for digits, English letters and Chinese characters

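Here uchar stands for any single unicode character.
1. Check whether a unicode character is a Chinese character
def is_chinese(uchar):
    return u'\u4e00' <= uchar <= u'\u9fa5'
2. Check whether a unicode character is a digit
def is_number(uchar):
    return u'\u0030' <= uchar <= u'\u0039'
3. Check whether a unicode character is an English letter
def is_alphabet(uchar):
    return (u'\u0041' <= uchar <= u'\u005a') or (u'\u0061' <= uchar <= u'\u007a')
4. Check whether a character is none of Chinese, digit or English letter
def is_other(uchar):
    return not (is_chinese(uchar) or is_number(uchar) or is_alphabet(uchar))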
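
These range comparisons only make sense one character at a time, so checking a whole string means looping over it. A minimal Python 3 sketch under that reading (the names classify_char and kinds_present, and the four labels, are hypothetical):

```python
def classify_char(ch):
    """Return 'chinese', 'digit', 'alpha' or 'other' for a single character."""
    if "\u4e00" <= ch <= "\u9fa5":
        return "chinese"
    if "0" <= ch <= "9":
        return "digit"
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return "alpha"
    return "other"


def kinds_present(text):
    """Return the set of character kinds that occur in text."""
    return {classify_char(ch) for ch in text}


print(sorted(kinds_present("abc123中文!")))
```

Testing "does the string contain a digit" then becomes "digit" in kinds_present(text).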

7. Python: checking whether a string contains Han characters

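1. Check whether a string contains Chinese characters.

import unicodedata

def has_hz(text):
    hz_yes = False
    for ch in text:
        # Only unicode characters are examined; byte characters are skipped
        if isinstance(ch, unicode):
            if unicodedata.east_asian_width(ch) != 'Na':
                hz_yes = True
                break
        else:
            continue
    return hz_yes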

Unit tests:

assert not has_hz("")
assert not has_hz(" ")
assert not has_hz("123")
assert not has_hz(u"123abc")
assert has_hz(u"123abc汉字")
assert has_hz(u"汉字")
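
One caveat: east_asian_width returns something other than 'Na' for many characters that are not Chinese, for example 'F' for full-width punctuation and 'W' for Japanese kana, so has_hz reports those as well. A stricter Python 3 sketch, swapping in a different technique based on Unicode character names (the function name has_hz_by_name is hypothetical):

```python
import unicodedata


def has_hz_by_name(text):
    """Return True if any character's Unicode name marks it as a CJK unified ideograph."""
    return any(
        unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")
        for ch in text
    )


print(has_hz_by_name("123abc汉字"))
print(has_hz_by_name("，"))   # full-width comma: wide, but not an ideograph
```

Passing "" as the default to unicodedata.name avoids a ValueError for characters that have no name, such as control characters.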

8. Python: checking for Chinese characters and string length

#coding=utf-8

test_str = u'提问123'
print len(test_str)  # prints 5

Or:

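#coding=utf-8

test_str = '提问123'
test_str_unicode = test_str.decode('utf-8')
print len(test_str_unicode)  # prints 5

Finding this kind of length comes down to taking the length of the decoded (unicode) string. If you get a UnicodeDecodeError, you probably called test_str.encode('utf-8') directly, which encodes rather than decodes; in Python 2, calling encode on a byte string first decodes it implicitly as ASCII, and that fails on Chinese text.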
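
To make the difference between character length and byte length concrete, here is a small Python 3 sketch (the helper name lengths is hypothetical):

```python
def lengths(text):
    """Return (character count, UTF-8 byte count, GBK byte count) for text."""
    return len(text), len(text.encode("utf-8")), len(text.encode("gbk"))


print(lengths("提问123"))  # prints: (5, 9, 7)
```

Each of the two Chinese characters takes three bytes in UTF-8 and two in GBK, while each ASCII digit takes one byte in both, which gives 9 and 7 bytes for the same 5 characters.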
