Matching image URLs in a web page with Python's re module

Webmaster Resources 2024/12/28 Anonymous


铁雪资源网 Design By www.gsvan.com

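I recently wrote a Python program that grabs the background image of the Bing homepage (http://cn.bing.com/) and sets it as my desktop wallpaper. While matching the image URL with a regular expression, I ran into a problem: the match kept failing.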

The image URL to be extracted sits in the page source, inside a g_img={url: "..."} assignment.

At first I used this pattern:

reg = re.compile('.*g_img={url: "(http.*?)"')

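No matter what I tried, it never matched. So I saved the page source, opened it in Notepad++, and searched it with Notepad++'s regular-expression find, which matched it easily.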

Then I wrote some test code that stores the line containing the image URL in a string, and that matched right away. In the code below, data does not match, but line does.

import re

# bing.html is the saved source of the Bing homepage
with open('bing.html', 'r', encoding='utf-8', errors='ignore') as f:
    data = f.read()

# The single source line that contains the image URL
line = r'''Bnp.Internal.Close(0,0,60056); } });;g_img={url: "https://az12410.vo.msecnd.net/homepage/app/2016hw/BingHalloween_BkgImg.jpg",id:'bgDiv',d:'200',cN'''

# Group 1 captures the URL up to the first closing quote
reg = re.compile('.*g_img={url: "(http.*?)"')

# re.match only tries to match at the start of the string
m1 = reg.match(data)
if m1:
    print(m1.group(1))
else:
    print("data: no match")

print(20 * '-')

m2 = reg.match(line)
if m2:
    print(m2.group(1))
else:
    print("line: no match")
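To reproduce the effect without a saved bing.html, here is a minimal sketch that builds a hypothetical multi-line data string (the surrounding HTML is made up for illustration) around the article's one-line snippet and runs the same check on both. It assumes the capture group is closed as (http.*?); first_url is a hypothetical helper name.

```python
import re

# Assumption: the capture group is (http.*?), i.e. up to the first closing quote.
reg = re.compile('.*g_img={url: "(http.*?)"')

# The one-line snippet from the article.
line = ("Bnp.Internal.Close(0,0,60056); } });;g_img={url: "
        '"https://az12410.vo.msecnd.net/homepage/app/2016hw/BingHalloween_BkgImg.jpg"'
        ",id:'bgDiv',d:'200',cN")

# Hypothetical multi-line page: the same snippet with other HTML lines before it.
data = "<!DOCTYPE html>\n<html><head>\n<script>" + line + "</script>\n</head></html>"


def first_url(text):
    """Return the captured URL, or None when reg.match() fails."""
    m = reg.match(text)
    return m.group(1) if m else None


print("data:", first_url(data))   # data: None
print("line:", first_url(line))   # prints the BingHalloween_BkgImg.jpg URL
```

The only difference between the two inputs is the line breaks in front of the snippet, which is exactly the situation described next.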

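So line and data are different. What is the difference? data spans many lines and contains newline characters, while line is a single line with none. When I added a newline to line, it no longer matched either.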
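A short sketch of the newline effect, using a hypothetical one-line snippet with an example.com URL and the capture group written as (http.*?). Where the newline lands matters: one in front of g_img breaks the match, while one after the URL does not.

```python
import re

reg = re.compile('.*g_img={url: "(http.*?)"')
# Hypothetical one-line snippet in the same shape as the Bing source.
line = 'foo();g_img={url: "https://example.com/bg.jpg",id:\'bgDiv\'}'

# By default '.' matches any character except a newline.
print(re.match('.', 'a') is not None)    # True
print(re.match('.', '\n') is None)       # True

# A newline before g_img breaks the match: .* cannot step over it.
before = line.replace(';g_img', ';\ng_img')
print(reg.match(line) is not None)       # True
print(reg.match(before) is None)         # True

# A newline after the URL is harmless: the match never needs to cross it.
after = line + '\nbar();'
print(reg.match(after).group(1))         # https://example.com/bg.jpg
```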

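At this point the cause was clear. It lies in this statement:

reg = re.compile('.*g_img={url: "(http.*?)"')

By default '.' matches any character except a newline, and re.match() only tries to match at the very start of the string. The leading .* therefore cannot get past the first line break, so whenever g_img sits on a later line, the match fails.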
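The two ingredients can be seen separately in this sketch on a hypothetical two-line string (example.com URL, capture group assumed to be (http.*?)). re.search is shown only for comparison; it is not the approach this article takes.

```python
import re

# Hypothetical two-line text with the assignment on the second line.
text = 'line one\nvar g_img={url: "https://example.com/bg.jpg"};'
pattern = 'g_img={url: "(http.*?)"'

# re.match only tries position 0, so the bare pattern fails straight away.
print(re.match(pattern, text))                  # None

# A leading .* lets re.match reach later text, but only up to the first newline.
print(re.match('.*' + pattern, text))           # None

# For comparison, re.search scans every position, so no leading .* is needed.
print(re.search(pattern, text).group(1))        # https://example.com/bg.jpg
```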

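Reading through the Python documentation, I found that re.compile() takes an optional second argument, flags. Its value is one of the following constants defined in re (several can be combined with |):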
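A small sketch of how the flags are passed, on a hypothetical two-line string with an example.com URL. The short and long names are interchangeable, several flags combine with |, and the same flags can be written inline as (?is) at the start of the pattern.

```python
import re

# The short and long names are the same flag.
print(re.S == re.DOTALL, re.I == re.IGNORECASE)   # True True

text = 'line one\nvar g_img={url: "https://example.com/bg.jpg"};'

# Flags are bit flags, so several can be combined with |.
reg = re.compile('.*G_IMG={url: "(http.*?)"', re.I | re.S)
print(reg.match(text).group(1))                    # https://example.com/bg.jpg

# The same flags can also be set inline at the very start of the pattern.
inline = re.compile('(?is).*G_IMG={url: "(http.*?)"')
print(inline.match(text).group(1))                 # https://example.com/bg.jpg
```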

re.DEBUG: Display debug information about compiled expression.

re.I / re.IGNORECASE: Perform case-insensitive matching; expressions like [A-Z] will match lowercase letters, too. This is not affected by the current locale.

re.L / re.LOCALE: Make \w, \W, \b, \B, \s and \S dependent on the current locale.

re.M / re.MULTILINE: When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

re.S / re.DOTALL: Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

re.U / re.UNICODE: Make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character properties database. New in version 2.0.

re.X / re.VERBOSE: This flag allows you to write regular expressions that look nicer and are more readable by allowing you to visually separate logical sections of the pattern and add comments. Whitespace within the pattern is ignored, except when in a character class or when preceded by an unescaped backslash. When a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.
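re.M and re.S are easy to mix up, since both concern newlines. This sketch, on a hypothetical two-line string with an example.com URL, shows that only re.S changes what '.' matches, and also uses re.X to annotate the g_img pattern (capture group assumed to be (http.*?)).

```python
import re

text = 'first line\ng_img={url: "https://example.com/bg.jpg"}'

# re.M changes what ^ and $ match, not '.', so .* still stops at the newline.
print(re.match('.*g_img', text, re.M))                # None
# With re.M, ^ also matches just after each newline (search, since match only tries position 0).
print(re.search('^g_img', text, re.M) is not None)    # True
# re.S is what lets '.' match the newline itself.
print(re.match('.*g_img', text, re.S) is not None)    # True

# re.X ignores whitespace and # comments, so the pattern can be annotated.
verbose = re.compile(r'''
    .*                # everything before the assignment (newlines too, via re.S)
    g_img=\{url:\ "   # literal prefix; the space must be escaped under re.X
    (http.*?)         # the image URL, up to the first closing quote
    "
''', re.S | re.X)
print(verbose.match(text).group(1))                   # https://example.com/bg.jpg
```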

What we need here is re.S, which makes '.' match every character, including newlines. Change the regular expression to:

reg = re.compile('.*g_img={url: "(http.*?)"', re.S)

That solves the problem.
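A minimal sketch of the fixed pattern wrapped in a reusable function. The helper name extract_bg_url and the surrounding sample HTML are hypothetical; the capture group is assumed to be (http.*?).

```python
import re

# re.S lets the leading .* run across newlines to reach g_img on any line.
BG_URL_RE = re.compile('.*g_img={url: "(http.*?)"', re.S)


def extract_bg_url(html):
    """Return the background-image URL in html, or None (hypothetical helper)."""
    m = BG_URL_RE.match(html)
    return m.group(1) if m else None


page = ('<!DOCTYPE html>\n<html><head>\n<script>'
        'Bnp.Internal.Close(0,0,60056); } });;g_img={url: '
        '"https://az12410.vo.msecnd.net/homepage/app/2016hw/BingHalloween_BkgImg.jpg"'
        ",id:'bgDiv',d:'200',cN</script>\n</head></html>")

print(extract_bg_url(page))    # prints the BingHalloween_BkgImg.jpg URL
```

One side effect of re.S with a greedy leading .*: it first runs to the end of the whole page and then backtracks, so if g_img={url: appeared more than once, the last occurrence would be captured.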


Tags: python, re, regex matching, url