Web Scraping
Data Scraping Using Beautiful Soup
Import Beautiful Soup
Make a GET request to fetch page data
Parse HTML
Filter relevant parts
Step-1: Fetch Page Data in HTML Form
!pip install beautifulsoup4
from urllib.request import urlopen
url = "https://en.wikipedia.org/wiki/Bengal"
data = urlopen(url)
print(type(data))
dhtml = data.read()
print(dhtml)
Step-2: Filter Page Data
from bs4 import BeautifulSoup as soup
dsoup = soup(dhtml, 'html.parser')
print(type(dsoup))
dsoup.find_all('h1')
# find_all returns the matches as a list (findAll is the older name for the same method)
# Note: a multi-word class string only matches when the tag's class attribute is exactly this string
table = dsoup.find_all('table', {'class': 'sortable wikitable'})
print(len(table))
# Print the first table, in case several tables have the class "sortable wikitable".
# Here there is only one such table, so table[0] is that table.
dtable = table[0]
print(dtable)
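Once a table has been selected, its rows and cells can be pulled out as plain text. The following is a minimal sketch, not necessarily the approach used originally: `sample_html` is a made-up placeholder table so that the example runs without a network connection, and `table_rows` is a name chosen for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical placeholder markup standing in for the table fetched above.
sample_html = """
<table class="sortable wikitable">
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""

sample_table = BeautifulSoup(sample_html, "html.parser").find("table")

# Build one list per table row, holding the text of each header/data cell.
table_rows = []
for tr in sample_table.find_all("tr"):
    cells = tr.find_all(["th", "td"])
    table_rows.append([cell.get_text(strip=True) for cell in cells])

print(table_rows)
```

The same loop should work on `dtable`, assuming the real table uses ordinary `tr`, `th` and `td` tags.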
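Understanding idx, row in enumerate(row_data):
enumerate pairs each item of a sequence with its index, starting at 0. Printing idx and row on separate lines for the words I, love, python gives: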
0
I
1
love
2
python
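A loop of the following shape would produce the output above. The contents of `row_data` are inferred from that output, so treat the list as an assumption.

```python
# Assumed input: the three words are inferred from the printed output.
row_data = ["I", "love", "python"]

# enumerate yields (index, item) pairs; unpack them into idx and row.
for idx, row in enumerate(row_data):
    print(idx)
    print(row)
```

When scraping a table, `row_data` would typically be the list of rows, and `idx` makes it easy to treat the header row (index 0) differently from the rest.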
Copying the Data to a CSV File:
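One way to do this is with Python's built-in csv module. The sketch below assumes the scraped table has already been turned into a list of rows, each row being a list of cell strings; `write_rows_to_csv` and `example_rows` are hypothetical names and placeholder data used only for illustration.

```python
import csv

def write_rows_to_csv(rows, csv_path):
    """Write a list of row lists to csv_path, one CSV line per row."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

# Hypothetical placeholder rows standing in for scraped table data.
example_rows = [["Name", "Value"], ["alpha", "1"], ["beta", "2"]]
```

Calling `write_rows_to_csv(example_rows, "table.csv")` would create table.csv in the current working directory, with the first row acting as the header.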
Downloading Files from the Web Using Python
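A small file such as an image can be fetched in a single read and written to disk in binary mode. This is a minimal sketch that uses urlopen, as in the scraping steps earlier; `download_file` is a hypothetical helper, and the URL and destination path are supplied by the caller.

```python
from urllib.request import urlopen

def download_file(file_url, dest_path):
    """Fetch file_url in one read, save the bytes to dest_path, return the size."""
    with urlopen(file_url) as response:
        content = response.read()  # the whole body is held in memory
    # "wb" because images and other downloads are binary data, not text.
    with open(dest_path, "wb") as f:
        f.write(content)
    return len(content)
```

For example, `download_file(image_url, "image.png")` would save the image next to the notebook, where `image_url` is whatever address you want to fetch.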
Note: The downloaded image can be viewed in the Jupyter notebook running on localhost.
Downloading Large Files:
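Reading a large file in one call keeps the whole body in memory, so it is usually safer to copy it to disk in chunks. The sketch below is one way to do that with urlopen; `download_large_file` is a hypothetical helper, and the 1 MiB chunk size is an arbitrary choice.

```python
from urllib.request import urlopen

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary, tunable choice

def download_large_file(file_url, dest_path, chunk_size=CHUNK_SIZE):
    """Stream file_url to dest_path chunk by chunk; return bytes written."""
    total = 0
    with urlopen(file_url) as response, open(dest_path, "wb") as f:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read means the body is finished
                break
            f.write(chunk)
            total += len(chunk)
    return total
```

Only one chunk is held in memory at a time, so memory use stays roughly constant regardless of the file size.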
Downloading Videos:
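Downloading videos usually starts by finding the links on a page that point at video files. The following sketch assumes the href values have already been collected, for instance with `[a['href'] for a in dsoup.find_all('a', href=True)]`; `video_links` and `VIDEO_EXTENSIONS` are hypothetical names, and the extension list is illustrative rather than exhaustive.

```python
from urllib.parse import urljoin, urlparse

VIDEO_EXTENSIONS = (".mp4", ".webm", ".mkv")  # illustrative, not exhaustive

def video_links(page_url, hrefs):
    """Return absolute URLs for the hrefs that point at video files."""
    links = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        # Check only the path, so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(VIDEO_EXTENSIONS):
            links.append(absolute)
    return links
```

Each returned URL can then be saved like any other large file, ideally by streaming it to disk in chunks.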