Python regex capturing: Python regular expressions (*)

⑴ How to use a Python regex to grab the href attribute and the tag content from <a></a>

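import re

# Group 1 captures the href value, group 2 the text inside the tag.
# (.+?) is lazy so it stops at the first closing quote of the href.
pattern = r'<a.*?href="(.+?)".*?>(.*?)</a>'
with open("test.html", "r") as fp:
    for line in fp:
        ret = re.search(pattern, line)
        if ret:
            for x in ret.groups():
                print(x)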

I don't know what your exact format looks like, so this is only a simple example.

groups() returns what each ( ) in the regex pattern matched, as a tuple.
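To see what groups() returns without needing a file, here is a minimal sketch that runs the same kind of pattern against a single string. The sample line and its URL are made up for illustration; real HTML is often messier than a regex like this can handle.

```python
import re

# Hypothetical sample line; the code above reads lines from test.html instead.
line = '<li><a class="nav" href="https://example.com/docs">Docs</a></li>'

# Group 1: the href value, group 2: the text between <a ...> and </a>.
pattern = r'<a.*?href="(.+?)".*?>(.*?)</a>'

ret = re.search(pattern, line)
if ret:
    href, text = ret.groups()  # one tuple entry per capture group
    print(href)
    print(text)
```

groups() always has one entry per pair of capturing parentheses, in the order the opening parentheses appear in the pattern.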

⑵ Extracting web page content with a Python regex

import re

# Example of the text being matched: id=45717
common_log_format_regex = re.compile(r'id=\d+')

# Read the whole input file into one string.
with open("aaa.txt", 'r', encoding='utf-8') as files:
    txt = files.read()

data = common_log_format_regex.findall(txt)

# Write one match per line.
with open("id.txt", 'w', encoding='utf-8') as writer:
    writer.write('\n'.join(data))

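⑶ Python regular expressions (.*)

groups() returns a tuple of all capture groups. Your regex contains exactly one capture group, (.*?), and the ? here means non-greedy matching: match as few characters as possible while still letting the whole regex succeed. The smallest possibility here is to match nothing at all, so the whole regex matches the Py in Python, and the capture group is naturally an empty string.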
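The question's exact pattern is not shown, so the sketch below assumes it was the literal Py followed by the lazy group, matched against "Python". It also shows two ways to make the group capture the rest of the word.

```python
import re

# Assumed pattern: nothing follows the lazy group, so it can stop immediately.
lazy = re.match(r"Py(.*?)", "Python")
print(lazy.group())   # the whole match
print(lazy.groups())  # tuple holding the single (empty) capture

# A greedy group takes as much as it can.
greedy = re.match(r"Py(.*)", "Python")
print(greedy.groups())

# A lazy group still has to grow when something after it must match.
anchored = re.match(r"Py(.*?)$", "Python")
print(anchored.groups())
```

A lazy quantifier only means "try the shortest option first"; whatever follows it in the pattern decides how far it finally extends.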

⑷ Python regex: how to match text that starts with one string and ends with another

A regex that matches text starting with one string and ending with another: ^abc.*?qwe$
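A quick check of that pattern against a few made-up strings (the sample strings are assumptions, not from the question). Note that . does not match a newline unless re.DOTALL is passed.

```python
import re

pattern = re.compile(r"^abc.*?qwe$")

# Hypothetical test strings.
samples = ["abc123qwe", "abcqwe", "xabc123qwe", "abc123qwe!"]
results = {s: bool(pattern.search(s)) for s in samples}
print(results)

# Without re.DOTALL the middle part cannot span lines.
multiline = "abc\nqwe"
plain = bool(re.search(r"^abc.*?qwe$", multiline))
dotall = bool(re.search(r"^abc.*?qwe$", multiline, re.DOTALL))
print(plain, dotall)
```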

Common ways to match with Python regular expressions:

1. Test whether the regex matches all or part of a string

regex = r""  # the regular expression
if re.search(regex, subject):
    do_something()
else:
    do_anotherthing()

2. Test whether the regex matches the entire string

regex = r"\Z"  # the regular expression, ending with \Z
if re.match(regex, subject):
    do_something()
else:
    do_anotherthing()

3. Create a match object with details about how the regex matches (part of) a string

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

4. Get the part of a string matched by the regex

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group()
else:
    result = ""

5. Get the part of a string matched by a capturing group

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group(1)
else:
    result = ""

6. Get the part of a string matched by a named group

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group("groupname")
else:
    result = ""

7. Get a list of all regex matches in a string

result = re.findall(regex, subject)

8. Iterate over all matches in a string

for match in re.finditer(r"<(.*?)\s*.*?/\1>", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass

9. Create a regex object from a regex string, to use the same regex for many operations

reobj = re.compile(regex)

10. Regex object version of usage 1 (if/else branch on whether (part of) a string can be matched)

reobj = re.compile(regex)
if reobj.search(subject):
    do_something()
else:
    do_anotherthing()

11. Regex object version of usage 2 (if/else branch on whether a string can be matched entirely)

reobj = re.compile(r"\Z")  # the regular expression, ending with \Z
if reobj.match(subject):
    do_something()
else:
    do_anotherthing()

12. Create a regex object, then get details about how it matches (part of) a string

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

13. Use a regex object to get the part of a string matched by the regex

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group()
else:
    result = ""

14. Use a regex object to get the part of a string matched by a capturing group

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""

15. Use a regex object to get the part of a string matched by a named group

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group("groupname")
else:
    result = ""

16. Use a regex object to get a list of all regex matches in a string

reobj = re.compile(regex)
result = reobj.findall(subject)

17. Use a regex object to iterate over all matches in a string

reobj = re.compile(regex)
for match in reobj.finditer(subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass
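The templates above leave regex and subject empty. Here is a small runnable sketch that fills them in with an assumed date pattern and sample text, both invented for illustration, and exercises search, whole-string matching, numbered and named groups, findall, and finditer on one compiled object.

```python
import re

# Hypothetical inputs standing in for `regex` and `subject` in the templates.
subject = "2024-05-17 and 2023-12-01"
regex = r"(?P<year>\d{4})-(\d{2})-(\d{2})"
reobj = re.compile(regex)

# Usages 3-6 / 12-15: first match and its details.
match = reobj.search(subject)
first = match.group()               # whole matched text
month = match.group(2)              # numbered capture group
year = match.group("year")          # named capture group
span = (match.start(), match.end()) # end index is exclusive

# Usages 2 / 11: the whole string must match.
whole = re.match(regex + r"\Z", subject)        # trailing text, so no match
whole_ok = re.match(regex + r"\Z", "2024-05-17")
same = re.fullmatch(regex, "2024-05-17")        # equivalent shortcut

# Usages 7 / 16: with capture groups, findall returns tuples of groups.
all_groups = reobj.findall(subject)

# Usages 8 / 17: finditer yields match objects, so full text stays available.
all_text = [m.group() for m in reobj.finditer(subject)]

print(first, month, year, span)
print(all_groups)
print(all_text)
```

One thing the templates do not show: when the pattern contains capture groups, findall returns the groups rather than the full matches, so finditer is usually the better choice when both are needed.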

⑸ How to urlopen the hyperlinks captured with a Python regex

re.findall returns a list. Iterate over the list and pass each value to urlopen as the argument, along these lines:
for url in re.findall(XXXXXX):
    print(urlopen(url).read())
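A slightly fuller sketch for Python 3 follows. The href pattern, the helper names extract_links and fetch_all, and the sample HTML are all assumptions made for illustration; only absolute http(s) links are kept, since urlopen cannot open a relative path on its own.

```python
import re
from urllib.request import urlopen

def extract_links(html):
    """Return every double-quoted absolute http(s) href value found in html."""
    return re.findall(r'href="(https?://[^"]+)"', html)

def fetch_all(urls, timeout=10):
    """Yield (url, body bytes) for each URL. This performs real network requests."""
    for url in urls:
        with urlopen(url, timeout=timeout) as response:
            yield url, response.read()

# Hypothetical page content.
html = (
    '<a href="https://example.com/a">A</a> '
    '<a href="/local">B</a> '
    '<a href="http://example.org/b">C</a>'
)
links = extract_links(html)
print(links)
```

Looping over fetch_all(links) would then download each page in turn; it is left uncalled here so the example runs without network access.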

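⑹ What does \1 mean in a Python regular expression

\1 can mean two things:
1. If a capture group, that is a sub-expression wrapped in (), appears before \1, then \1 refers back to whatever the first capture group matched. For example, ([A-Z])567\1 matches 567 with the same uppercase letter on both sides.
2. If no capture group appears before \1, there is nothing to refer to. Python's re module rejects such a pattern with an "invalid group reference" error; to match the character with octal value 1, write \01 instead. Also note that in a non-raw string literal, Python itself turns "\1" into that character before the regex engine ever sees it.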
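Both cases can be checked directly. This is a minimal sketch; the test strings are made up.

```python
import re

# Case 1: \1 repeats whatever group 1 captured.
pattern = re.compile(r"([A-Z])567\1")
hit = pattern.search("xA567Ay")   # same letter on both sides
miss = pattern.search("A567B")    # different letters

# Case 2: no group 1 exists, so the pattern does not compile.
try:
    re.compile(r"\1")
    no_group_error = None
except re.error as exc:
    no_group_error = str(exc)

# A leading 0 makes it an octal escape for the character with value 1.
octal_one = re.fullmatch(r"\01", "\x01")

print(hit.group(), miss, no_group_error, octal_one is not None)
```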

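⑺ How to extract the content in the middle of a string with a Python regex

Sample code

Start ipython and import the re module first.

The re module is typically used in these steps:

1. Use the compile function to compile the string form of the regex into a Pattern object.
2. Use the methods of the Pattern object to search the text and obtain a match result (a Match object).
3. Finally, use the attributes and methods of the Match object to get information and carry out any further operations as needed.

The findall method has the following form:

findall(string[, pos[, endpos]])

Here string is the string to search, and pos and endpos are optional arguments giving the start and end positions within the string; they default to 0 and len (the length of the string).

findall returns all matching substrings as a list; if nothing matches, it returns an empty list.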
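As a sketch of these steps, the example below pulls out the text between square brackets. The sample text and the bracket delimiters are assumptions; a capture group is used so that findall returns only the middle part rather than the delimiters.

```python
import re

# Hypothetical input; the goal is the content between [ and ].
text = "name=[alice] role=[admin] team=[core]"

# Step 1: compile. The capture group marks the part we want to keep.
pattern = re.compile(r"\[(.*?)\]")

# Steps 2-3: findall scans the string and returns the captured middles.
everything = pattern.findall(text)

# pos / endpos restrict the scan to text[pos:endpos].
start = text.index("role")
end = text.index("team")
from_role = pattern.findall(text, start)
only_role = pattern.findall(text, start, end)

# No match gives an empty list, not None.
nothing = pattern.findall("no brackets here")

print(everything, from_role, only_role, nothing)
```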

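⑻ What is the principle behind non-greedy matching in Python regular expressions

In regex matching, text that has already been consumed by one match is not used again in later match attempts.
The second k was already consumed by the first match, so matching then continues within 45k67k.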
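The original question is not shown, so the sketch below assumes the input was k123k45k67k and the pattern was k.*?k, which is consistent with the remainder 45k67k mentioned above. It also shows a lookahead variant for the case where overlapping results are wanted.

```python
import re

s = "k123k45k67k"  # assumed input, reconstructed from the answer above

# Lazy: the first match ends at the second k, and that k is then used up.
lazy = re.findall(r"k.*?k", s)

# Greedy: one match from the first k to the last.
greedy = re.findall(r"k.*k", s)

# A lookahead consumes nothing, so matches may share their boundary k.
overlapping = re.findall(r"(?=(k.*?k))", s)

print(lazy)
print(greedy)
print(overlapping)
```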

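⑼ Python regular expressions (part 2)

In the previous part we covered the basic characters of Python regular expressions and how to use them.

Today we continue with some of Python's extension notations and a few special sequences.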

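(?...) : This extension notation starts with a ? inside the parentheses, and the first character after the ? determines which syntax applies.

(?aiLmsux) : Put one or more of 'a', 'i', 'L', 'm', 's', 'u', 'x' after the ?, followed by the matching rules.

These letters set the corresponding flags for the regular expression, so there is no need to pass a flag argument.

Note: as inline flags, 'a', 'L', and 'u' are mutually exclusive and cannot be combined.

(?:...) : The non-capturing version of a parenthesized group. The substring matched by the group cannot be retrieved after the match or referenced later in the pattern.

It can be combined with | and {m}.

(?P<name>...) : Gives the group a name in addition to its number.

Each group name may be defined only once within a regular expression.

(?P=name) : A backreference to a named group.

It matches whatever string was matched by the earlier group named name.

(?#...) : A comment; its content is ignored.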

Haha, didn't quite follow that? No problem, here is an example.
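A minimal sketch covering inline flags, non-capturing groups, named groups with a named backreference, and comments. All sample strings and the group name word are made up for illustration.

```python
import re

# (?i) inline flag: same effect as passing re.IGNORECASE.
inline = re.findall(r"(?i)python", "Python PYTHON python")

# (?:...) groups "ab" for {2} without creating a capture group,
# so findall returns only the one real group, the digit.
non_capturing = re.findall(r"(?:ab){2}(\d)", "abab1 ab2")

# (?P<word>...) names a group; (?P=word) must match the same text again.
doubled = re.search(r"(?P<word>\b\w+)\s+(?P=word)\b", "it is is fine")

# (?#...) is ignored by the regex engine.
commented = re.search(r"\d{4}(?#year)-\d{2}(?#month)", "released 2024-05")

print(inline)
print(non_capturing)
print(doubled.group(), doubled.group("word"))
print(commented.group())
```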

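See, it all becomes clear at once, doesn't it?

Haha, can't follow this one either?

Think about it: since there are assertions based on the characters that follow, assertions based on the characters that come before are just as reasonable.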
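The four assertion forms in Python's re module are (?=...) and (?!...) for what follows, and (?<=...) and (?<!...) for what comes before. A small sketch, with a made-up sample string:

```python
import re

text = "price: $30, cost: 45 USD, code: A99"  # hypothetical sample

# (?=...) positive lookahead: digits that are followed by " USD".
before_usd = re.findall(r"\d+(?= USD)", text)

# (?!...) negative lookahead: whole numbers NOT followed by " USD".
not_usd = re.findall(r"\b\d+\b(?! USD)", text)

# (?<=...) positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", text)

# (?<!...) negative lookbehind: digits not preceded by a word char or $.
standalone = re.findall(r"(?<![\w$])\d+", text)

print(before_usd, not_usd, after_dollar, standalone)
```

The assertions only check the surrounding text; none of it becomes part of the match, which is why each result contains digits alone.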

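(?(id/name)yes-pattern|no-pattern) : If the group with the given id or name exists, it tries to match yes-pattern; otherwise it tries no-pattern. no-pattern is optional and can be omitted.

It is a bit like an if/else ternary operation, where id is a group number and name is a group name.

As usual, here is an example.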
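One possible example uses the e-mail pattern given in the Python re documentation; the address user@host.com is a placeholder. Group 1 is the optional <, and the conditional requires a closing > only when that group matched.

```python
import re

# (<)?          optional opening bracket, captured as group 1
# (\w+@...)     the address itself, group 2
# (?(1)>|$)     if group 1 matched, require ">", otherwise require end of string
pattern = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

samples = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
matched = {s: bool(pattern.match(s)) for s in samples}
print(matched)

# findall scans from every position, so it can skip the unmatched "<"
# and succeed from the next character with group 1 empty.
found = pattern.findall("<user@host.com")
print(found)
```

match only tries at the start of the string, which is why the unbalanced inputs fail there, while findall and search may still find a balanced match starting one character later.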

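Did the example leave you a bit confused? Let's break this regular expression down.

What it matches is <[email protected]> and [email protected] .

It does not match <[email protected] ' or <[email protected]

But why is the third result above different?

Because findall is allowed to return empty matches. When a ? is present, it tries the match in two ways: with the optional part matched and with it left empty.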

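That covers some of the extension notations for today. They are not that hard; read more examples and practice more.

The next part will cover how to use the functions of the re module. Stay tuned......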

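⑽ Getting matched content from text with a Python regex

Regex: (?<=\d+\.)[\s\S]+?(?=\d+|$)

Here is an example in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AEF {
    public static void main(String[] args) {
        // Numbered items, each followed by a few lines of text.
        String s = "12.ewq\n"
                + "example\n"
                + "fdsfdf\n"
                + "fd中文\n"
                + "13.wer\n"
                + "fdsfd\n"
                + "例子\n"
                + "14.qrew\n"
                + "发的萨芬的\n"
                + "fdsfs\n"
                + "15.fwewq\n"
                + "范德萨范德萨";
        // Text after "<digits>." up to the next run of digits or the end.
        String regex = "(?<=\\d+\\.)[\\s\\S]+?(?=\\d+|$)";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}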

Output:

ewq
example
fdsfdf
fd中文

wer
fdsfd
例子

qrew
发的萨芬的
fdsfs

fwewq
范德萨范德萨
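Since the question asks about Python, note that Python's re module does not accept the lookbehind (?<=\d+\.) because it is not fixed-width. A sketch of one workaround, assuming the same sample text as the Java example: match the number prefix normally and capture only the part after it.

```python
import re

s = (
    "12.ewq\nexample\nfdsfdf\nfd中文\n"
    "13.wer\nfdsfd\n例子\n"
    "14.qrew\n发的萨芬的\nfdsfs\n"
    "15.fwewq\n范德萨范德萨"
)

# The Java pattern fails to compile in Python: lookbehind must be fixed-width.
try:
    re.compile(r"(?<=\d+\.)[\s\S]+?(?=\d+|$)")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

# Workaround: consume "<digits>." and capture what follows, up to the
# next run of digits or the end of the string.
parts = [p.strip() for p in re.findall(r"\d+\.([\s\S]+?)(?=\d+|$)", s)]

for part in parts:
    print(part)
    print()
```

As in the Java version, any digit inside an item's text would end that item early, so this only suits text where digits appear in the numbering alone.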
