Scraping a website to get its HTML is something you may need to do occasionally or often, depending on your job. It is frequently used to collect competitors’ offers and pricing.
If and when you have a legal need to scrape a website, Python and the Beautiful Soup package make it easy to download a page as HTML from a URL and parse it. From there, the content can be written to a file or imported into a database.
import urllib.request
from bs4 import BeautifulSoup

page = urllib.request.urlopen("https://splash.gallery/portfolio/").read()
html = BeautifulSoup(page, "html.parser")

print(html.title.string)
print(html.find_all(["h3", "p"]))
The print(html.title.string) line prints the text of the page title.
The print(html.find_all(["h3", "p"])) line prints a list of all h3 heading and paragraph elements, with their tags included.
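Because find_all returns whole elements, the printed output still contains markup. If you only want the text, one option is to call get_text() on each element. The sketch below runs against a small inline HTML string rather than a live site; the sample markup and the rows variable are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a downloaded page.
sample = """
<html><head><title>Sample Portfolio</title></head>
<body>
  <h3>Landscapes</h3>
  <p>Prints from $40.</p>
  <h3>Portraits</h3>
  <p>Sessions from $120.</p>
</body></html>
"""

html = BeautifulSoup(sample, "html.parser")

# Pair each tag name with its text content, dropping the markup.
rows = [(el.name, el.get_text(strip=True)) for el in html.find_all(["h3", "p"])]

for tag, text in rows:
    print(f"{tag}: {text}")
```

The elements come back in document order, so each heading is followed by the paragraph beneath it.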
macOS only: if you get an SSL: CERTIFICATE_VERIFY_FAILED error for https URLs, run Install Certificates.command from the /Applications/Python 3.7 folder (or the folder for your current version).
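As for importing scraped content into a file, here is a minimal sketch using the standard csv module. It assumes the text has already been extracted as (tag, text) pairs; the save_rows helper, the sample rows, and the scraped.csv filename are hypothetical names chosen for this example.

```python
import csv


def save_rows(rows, path):
    """Write (tag, text) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["tag", "text"])
        writer.writerows(rows)


# Hypothetical rows, e.g. built from the result of html.find_all(["h3", "p"]).
rows = [("h3", "Landscapes"), ("p", "Prints from $40.")]
save_rows(rows, "scraped.csv")
```

A CSV file like this can be opened in a spreadsheet or loaded into most databases with their own import tools.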