# Introduction to BeautifulSoup

> Original article: <https://www.pythonforbeginners.com/beautifulsoup/python-beautifulsoup-basic>

### What is BeautifulSoup?

BeautifulSoup is a Python library from [www.crummy.com](http://www.crummy.com/software/BeautifulSoup/ "crummy") for pulling data out of HTML and XML documents.

### What it can do

On its website, the project says: "Beautiful Soup parses anything you give it, and does the tree traversal stuff for you."

**You can tell it to:**

- "Find all the links"
- "Find all the links of class externalLink"
- "Find all the links whose urls match 'foo.com'"
- "Find the table heading that's got bold text, then give me that text."

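The four requests quoted above can be sketched in code. This is a minimal illustration, assuming Python 3 and the `bs4` package (Beautiful Soup 4, the successor to the `BeautifulSoup` module used later in this article), where `findAll` is spelled `find_all`. The `HTML` sample page is invented for the example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for this illustration.
HTML = """
<html><body>
  <a href="http://foo.com/docs">Docs</a>
  <a class="externalLink" href="http://example.org/">Example</a>
  <a href="/about">About</a>
  <table>
    <tr><th>Plain heading</th><th><b>Bold heading</b></th></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "Find all the links"
all_links = soup.find_all("a")

# "Find all the links of class externalLink"
external_links = soup.find_all("a", class_="externalLink")

# "Find all the links whose urls match foo.com"
foo_links = soup.find_all("a", href=re.compile(r"foo\.com"))

# "Find the table heading that's got bold text, then give me that text."
bold_heading = next(th for th in soup.find_all("th") if th.find("b"))
bold_text = bold_heading.b.get_text()

print(len(all_links), len(external_links), len(foo_links), bold_text)
```

Each query is a single call on the parsed `soup` object, which is the "tree traversal stuff" the quote refers to.
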
### BeautifulSoup example

In this example, we will try to find a link (an `a` tag) in a webpage.

Before we start, we have to import two modules: BeautifulSoup and urllib2.

urllib2 is used to open the URL we want.

We will use the `soup.findAll` method to search through the soup object for text and HTML tags within the page.

```py
# Written for Python 2 and BeautifulSoup 3.
from BeautifulSoup import BeautifulSoup
import urllib2

url = urllib2.urlopen("http://www.python.org")
content = url.read()
soup = BeautifulSoup(content)
links = soup.findAll("a")

# Print every <a> tag that was found.
for link in links:
    print link
```

##### Output

That will print out all the elements on python.org with an "a" tag.

(The "a" tag defines a hyperlink, which is used to link from one page to another.)

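The example above targets Python 2, where `urllib2` and the `BeautifulSoup` 3 module are available. A rough Python 3 equivalent might look like the sketch below, assuming the `bs4` package is installed. The helper names `fetch_html` and `find_links` and the `sample` markup are made up for illustration, and the demonstration runs on an inline string so that it does not need network access.

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def find_links(html):
    """Return every <a> tag found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("a")  # bs4 spelling of the article's findAll


# Offline demonstration on a tiny invented page; for the live site one
# could call find_links(fetch_html("http://www.python.org")) instead.
sample = '<p><a href="/downloads/">Downloads</a> <a name="top">Top</a></p>'
links = find_links(sample)
for link in links:
    print(link)
```

Splitting the download from the parsing keeps the link extraction easy to try on any HTML string.
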
### BeautifulSoup example 2

To make it a bit more useful, we can specify the URLs we want to return.

```py
from BeautifulSoup import BeautifulSoup
import urllib2
import re

url = urllib2.urlopen("http://www.python.org")
content = url.read()
soup = BeautifulSoup(content)

# Only consider <a> tags that actually have an href attribute.
for a in soup.findAll('a', href=True):
    # Keep the links whose URL contains the string "python".
    if re.findall('python', a['href']):
        print "Found the URL:", a['href']
```

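For readers on Python 3, the same filter can be sketched with `bs4` (an assumption, since the article itself uses BeautifulSoup 3). The `sample` markup below is invented and stands in for the downloaded page. The sketch also shows a one-step variant that passes a compiled pattern directly as the `href` filter.

```python
import re
from bs4 import BeautifulSoup

# Invented sample markup standing in for the downloaded page.
sample = """
<a href="http://www.python.org/doc/">Docs</a>
<a href="http://www.example.com/">Elsewhere</a>
<a name="no-href">Anchor without a URL</a>
<a href="/psf/python-brochure/">Brochure</a>
"""

soup = BeautifulSoup(sample, "html.parser")

# Same two-step filter as the article: require an href, then test it.
found = [a["href"] for a in soup.find_all("a", href=True)
         if re.search("python", a["href"])]

# One-step variant: pass the compiled pattern as the href filter.
found_direct = [a["href"]
                for a in soup.find_all("a", href=re.compile("python"))]

for href in found:
    print("Found the URL:", href)
```

Both forms select the same links here; `href=True` also guards against `a` tags that have no `href` attribute at all.
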
##### Further reading

I recommend that you head over to [http://www.crummy.com](http://www.crummy.com/software/BeautifulSoup/ "Crummy") to read more about what you can do with this awesome module.