Web Scraping

Data Scraping using Beautiful Soup

  • Import Beautiful Soup

  • Make a GET request to fetch page data

  • Parse HTML

  • Filter relevant parts

Step 1: Fetch page data in HTML form

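!pip install beautifulsoup4
from urllib.request import urlopen

url = "https://en.wikipedia.org/wiki/Bengal"
data = urlopen(url)     # GET request; returns an HTTP response object
print(type(data))       # <class 'http.client.HTTPResponse'>
dhtml = data.read()     # raw page content as bytes
print(dhtml)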
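data.read() hands back bytes, and urlopen sends Python's default User-Agent. A minimal sketch of a slightly more robust fetch, assuming you want text rather than bytes and an explicit User-Agent header (some sites, Wikipedia included, may reject requests without a descriptive one). The USER_AGENT string and the helper names build_request and fetch_html are hypothetical; the decoding uses the charset the server declares and falls back to UTF-8.

```python
from urllib.request import Request, urlopen

# Hypothetical User-Agent string: replace it with one that identifies your script.
USER_AGENT = "my-scraper/0.1 (contact@example.com)"


def build_request(url: str) -> Request:
    """Create a GET request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as text instead of bytes."""
    with urlopen(build_request(url), timeout=timeout) as response:
        raw = response.read()  # bytes, same as dhtml above
        # Use the charset the server declares; fall back to UTF-8 if none is given.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Usage: html = fetch_html("https://en.wikipedia.org/wiki/Bengal"), then pass html to BeautifulSoup exactly as dhtml is passed in Step 2.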

Step 2: Filter Page Data

from bs4 import BeautifulSoup as soup

dsoup = soup(dhtml, 'html.parser')   # parse the raw bytes into a searchable tree
print(type(dsoup))                   # <class 'bs4.BeautifulSoup'>
dsoup.find_all('h1')                 # every <h1> tag on the page
# find_all (older alias: findAll) returns the matches as a list
table = dsoup.find_all('table', {'class': 'sortable wikitable'})
print(len(table))

# Take the 1st table, in case there are multiple tables with class "sortable wikitable"
# Here there is only one such table, so table[0] is that single table
dtable = table[0]
print(dtable)
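How the class filter above matches is worth a quick check. In Beautiful Soup, {'class': 'sortable wikitable'} compares against the class attribute as an exact string, so a table written as class="wikitable sortable" would be missed, while class_="wikitable" or the CSS selector table.sortable.wikitable match regardless of order. The sketch below runs all three on a small hypothetical HTML sample instead of the live page; select() relies on the soupsieve package, which is installed alongside beautifulsoup4.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML sample standing in for the downloaded page.
SAMPLE_HTML = """
<h1>Bengal</h1>
<table class="wikitable sortable">
  <tr><th>Region</th><th>Area</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
<table class="infobox"><tr><td>ignored</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact-string match: only tables whose class attribute is literally "sortable wikitable".
exact = soup.find_all("table", {"class": "sortable wikitable"})

# Single-class match: any table whose class list contains "wikitable".
by_class = soup.find_all("table", class_="wikitable")

# CSS selector: tables carrying both classes, in any order.
by_css = soup.select("table.sortable.wikitable")

print(len(exact), len(by_class), len(by_css))
```

If print(len(table)) in Step 2 ever shows 0 on a page that clearly has a sortable table, the class order is a likely suspect.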

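Understanding the concept of for idx, row in enumerate(row_data):

enumerate() pairs each item with its position, counting from 0. For row_data = ["I", "love", "python"], the loop prints:

0 I
1 love
2 python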
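A minimal runnable version of that loop, assuming row_data is the three-word list shown in the output. In the scraping context row_data would typically hold the rows found in dtable; the words here are only for illustration.

```python
row_data = ["I", "love", "python"]

# enumerate() yields (index, item) pairs; the index starts at 0 by default.
for idx, row in enumerate(row_data):
    print(idx, row)

# The same pairs collected into a list.
pairs = list(enumerate(row_data))

# start= changes the first index, e.g. to number rows from 1.
numbered = [f"{idx}. {row}" for idx, row in enumerate(row_data, start=1)]
```

The loop prints 0 I, 1 love, 2 python on separate lines, matching the output above.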

Copying the data to a CSV file:
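The code for this step is not included here. A minimal sketch, assuming the goal is to turn the <tr>/<th>/<td> cells of dtable into CSV rows with Python's csv module. The sample HTML, the helper names table_to_rows and write_csv, and the output file name table.csv are all hypothetical; on the real page, pass the dtable from Step 2 instead.

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped table.
SAMPLE_HTML = """
<table class="sortable wikitable">
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Alpha</td><td>1</td></tr>
  <tr><td>Beta</td><td>2</td></tr>
</table>
"""


def table_to_rows(table):
    """Turn a bs4 <table> tag into a list of rows, each a list of cell strings."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])  # header and data cells alike
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


def write_csv(rows, path):
    """Write rows to a CSV file; newline="" prevents blank lines on Windows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
dtable = soup.find_all("table", {"class": "sortable wikitable"})[0]
rows = table_to_rows(dtable)
write_csv(rows, "table.csv")  # hypothetical output file name
```

Real Wikipedia tables can contain footnote markers, merged cells (rowspan/colspan), or nested tables, so the extracted rows may need extra cleanup before they line up neatly.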

Downloading files from the web using Python
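A minimal sketch using only urllib, assuming a small file such as an image that comfortably fits in memory. download_file is a hypothetical helper name, and the URL in the usage line is a placeholder. The file is opened in binary mode because images are not text.

```python
from urllib.request import urlopen


def download_file(url: str, dest: str) -> int:
    """Download url into dest in one go and return the number of bytes written."""
    with urlopen(url) as response:
        data = response.read()  # the whole file is held in memory
    with open(dest, "wb") as f:  # binary mode: keep the bytes unchanged
        f.write(data)
    return len(data)
```

Usage: download_file("https://example.com/picture.png", "picture.png") saves the image next to the notebook. urllib.request.urlretrieve(url, filename) does the same job but is documented as a legacy interface.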

Note: The downloaded image can be viewed in the Jupyter file browser on localhost.

Downloading large files:
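Reading a whole large file with a single read() can exhaust memory. A minimal sketch that streams the response to disk in fixed-size chunks instead, using only urllib; download_large_file and the 1 MiB CHUNK_SIZE are hypothetical choices, not values taken from the original notes.

```python
from urllib.request import urlopen

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def download_large_file(url: str, dest: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Stream url to dest chunk by chunk so the whole file never sits in memory."""
    total = 0
    with urlopen(url) as response, open(dest, "wb") as f:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the download is complete
                break
            f.write(chunk)
            total += len(chunk)
    return total
```

Memory use stays at roughly one chunk regardless of file size; total can also drive a simple progress printout.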

Downloading Videos:
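Videos are just large binary files, so the same streaming idea applies. A minimal sketch, assuming the video is served as a direct file URL (pages that embed players or stream in segments need other tools). It names the local file after the last segment of the URL path; filename_from_url, download_video, and the default name download.bin are hypothetical. shutil.copyfileobj copies in fixed-size chunks, so the video is never loaded into memory at once.

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Use the last path segment of the URL (percent-decoded) as the file name."""
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or default


def download_video(url: str, dest_dir: str = ".") -> str:
    """Stream a video into dest_dir, named after the URL, and return the saved path."""
    dest = os.path.join(dest_dir, filename_from_url(url))
    with urlopen(url) as response, open(dest, "wb") as f:
        # copyfileobj reads and writes in chunks, so large videos stream to disk.
        shutil.copyfileobj(response, f)
    return dest
```

For example, filename_from_url("https://example.com/media/clip%201.mp4?x=1") gives "clip 1.mp4"; the query string is ignored.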
