The target web page contains multiple links:
Locate one of the a tags among them:
<a hotrep="doc.overview.modules.path.0.0.1" href="https://cloud.tencent.com/document/product/1093/35681" title="产品优势">
产品优势
</a>
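The two attributes that matter in this tag are title and href. As a minimal sketch, the snippet below pulls them out using only the standard library's html.parser, so it runs without extra packages; the LinkCollector class is a hypothetical name introduced here for illustration, and the generated script further down does the same job with BeautifulSoup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (title, href) pairs from every <a> tag that has both attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        title, href = attr_map.get("title"), attr_map.get("href")
        if title and href:
            self.links.append((title, href))


# The sample tag from the page, reproduced as a string
html = (
    '<a hotrep="doc.overview.modules.path.0.0.1" '
    'href="https://cloud.tencent.com/document/product/1093/35681" '
    'title="产品优势">\n产品优势\n</a>'
)

collector = LinkCollector()
collector.feed(html)
for title, href in collector.links:
    print(title, href)
```

The title becomes the file name and the href becomes the download address, which is exactly what the prompt asks for.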
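Enter the following prompt into DeepSeek:
You are a Python programming expert. Write a Python script that scrapes a Baidu search page. The specific tasks are as follows:
Parse the web page: https://cloud.tencent.com/document/product/1093
Locate the div tag with class="rno-learning-path-wrap";
Then locate all a tags inside that div, extract each title attribute value as the file name and each href attribute value as the download URL, download the page, and save it to the folder: F:\aivideo\腾讯云语音识别
Notes:
Print information to the screen at every step;
After each page is downloaded, pause for a random 3-6 seconds;
Set the request headers: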
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cache-Control: max-age=0
Priority: u=0, i
Referer: https://cloud.tencent.com/product/asr?from_column=20421&from=20421
Sec-Ch-Ua: " Not A;Brand";v="99", "Chromium";v="127", "Google Chrome";v="127"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
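Header text copied from the browser's developer tools can also be converted into a Python dict programmatically instead of being retyped by hand. The sketch below is one possible way to do that, assuming each header appears either as "Name: value" on one line or as a "Name:" line followed by a value line; parse_raw_headers is a hypothetical helper, not part of requests or any other library.

```python
def parse_raw_headers(raw: str) -> dict:
    """Convert pasted header text into a dict (hypothetical helper)."""
    headers = {}
    pending_name = None  # header name still waiting for its value line
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if pending_name is not None:
            # The previous line was "Name:", so this whole line is the value
            headers[pending_name] = line
            pending_name = None
        elif line.endswith(":"):
            pending_name = line[:-1]
        else:
            # Single-line form: split on the first colon only
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
    return headers


raw = """Accept-Language:
zh-CN,zh;q=0.9
Sec-Ch-Ua-Mobile: ?0
Referer:
https://cloud.tencent.com/product/asr?from_column=20421&from=20421
"""

headers = parse_raw_headers(raw)
for name, value in headers.items():
    print(f"{name} = {value}")
```

The resulting dict can be passed directly as the headers argument of requests.get.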
Source code generated by DeepSeek:
import os
import random
import time

import requests
from bs4 import BeautifulSoup

# Request headers copied from the browser
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    # Only gzip and deflate are requested here: requests cannot decode br or zstd
    # responses unless the optional brotli/zstandard packages are installed
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "Cache-Control": "max-age=0",
    "Priority": "u=0, i",
    "Referer": "https://cloud.tencent.com/product/asr?from_column=20421&from=20421",
    "Sec-Ch-Ua": '" Not A;Brand";v="99", "Chromium";v="127", "Google Chrome";v="127"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
}

# Create the output folder if it does not exist yet
output_folder = 'F:\\aivideo\\腾讯云语音识别'
os.makedirs(output_folder, exist_ok=True)

# Request the index page
url = "https://cloud.tencent.com/document/product/1093"
print(f"Requesting {url}")
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
response.encoding = 'utf-8'

# Parse the page
soup = BeautifulSoup(response.text, 'html.parser')

# Locate the div tag with class="rno-learning-path-wrap"
div_tag = soup.find('div', class_='rno-learning-path-wrap')
if div_tag is None:
    raise SystemExit('No div with class="rno-learning-path-wrap" was found')

# Locate all a tags inside the div
a_tags = div_tag.find_all('a')
print(f"Found {len(a_tags)} a tags")

# Use each title as the file name and each href as the download URL
for a_tag in a_tags:
    title = a_tag.get('title')
    href = a_tag.get('href')
    if title and href:
        # Print the link to the screen
        print(f"Found link: {title} - {href}")
        # Download the page
        try:
            page = requests.get(href, headers=headers, timeout=30)
            page.raise_for_status()
            page.encoding = 'utf-8'
            file_path = os.path.join(output_folder, f"{title}.html")
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(page.text)
            print(f"Downloaded {href} to {file_path}")
        except Exception as e:
            print(f"Failed to download {href}: {e}")
        # Pause for a random 3-6 seconds after each page
        time.sleep(random.uniform(3, 6))
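The script uses each title directly as a file name and each href directly as a URL. A title containing characters such as / or ? cannot be saved on Windows, and a relative href cannot be requested as-is. The sketch below shows one possible way to guard against both cases; safe_filename and build_target are hypothetical helpers introduced here for illustration, and they are not part of the generated script.

```python
import os
import re
from urllib.parse import urljoin

# Characters that Windows rejects in file names
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title: str, extension: str = ".html") -> str:
    """Turn a link title into a file name that is valid on Windows."""
    cleaned = _INVALID_CHARS.sub("_", title).strip(" .")
    return (cleaned or "untitled") + extension


def build_target(base_url: str, title: str, href: str, folder: str):
    """Return (absolute URL, output path) for one link."""
    # urljoin leaves absolute hrefs untouched and resolves relative ones
    return urljoin(base_url, href), os.path.join(folder, safe_filename(title))


base = "https://cloud.tencent.com/document/product/1093"
page_url, file_path = build_target(
    base, "产品优势", "/document/product/1093/35681", "downloads"
)
print(page_url)
print(file_path)
```

Inside the download loop, the returned URL would replace href in the requests.get call and the returned path would replace file_path.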