Python

For scraping with Python, we will be using the following libraries: httpx (for sending HTTP requests) and BeautifulSoup (for parsing HTML)

These can be installed with pip:

pip install httpx beautifulsoup4

A pre-configured Replit is also available; it can be forked and edited, which is a simple way to get started without having to install any special software

To use httpx and BeautifulSoup, import them as follows:

import httpx
from bs4 import BeautifulSoup

After importing the libraries we can get started with the very basics

To start making requests with httpx, we first need to define a client (using a client is best practice, but it isn't strictly necessary)

To set up the client, type the following code:

url = "https://learnscrapingbydoing.github.io/scrapable/1.html"
with httpx.Client() as client:
    URIData = client.get(url)

This code snippet defines a variable called url that holds the address of a page containing JSON with text in it

We then create an httpx.Client using a with statement, which creates the instance and closes it again automatically

After the client is created, we send a GET request to the URL stored in the url variable we defined earlier

Adding the following line inside the with block will then allow us to print out the results of our scraping attempt

    print(URIData.text)

We finish off with the whole file looking like this:

import httpx
from bs4 import BeautifulSoup
url = "https://learnscrapingbydoing.github.io/scrapable/1.html"
with httpx.Client() as client:
    URIData = client.get(url)
    print(URIData.text)

Running this file will print out the HTML of the page we just scraped, including the JSON inside its pre tag!

It should look something like this:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>A very scrapable page</title>
</head>
<body>
    <pre>
[
    {
            "text": "id, libero. Donec consectetuer mauris id sapien. Cras dolor dolor,"
    },
    {
            "text": "eleifend nec, malesuada ut, sem. Nulla interdum. Curabitur dictum. Phasellus"
    },
    {
            "text": "felis, adipiscing fringilla, porttitor vulputate, posuere vulputate, lacus. Cras interdum."
    },
    {
            "text": "sit amet diam eu dolor egestas rhoncus. Proin nisl sem,"
    },
    {
            "text": "eget, ipsum. Donec sollicitudin adipiscing ligula. Aenean gravida nunc sed"
    }
]
</pre>
</body>
</html>
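As a possible next step, the JSON inside the pre tag can be pulled out with BeautifulSoup and the standard json module. This is only a sketch: to keep it runnable without a network connection, it uses a shortened copy of the output above (two of the five entries) stored in a variable called html, where a real script would pass URIData.text instead:

```python
import json

from bs4 import BeautifulSoup

# Shortened copy of the page printed above (assumption: same structure as the real page).
html = """<!DOCTYPE html>
<html lang="en">
<body>
    <pre>
[
    {
            "text": "id, libero. Donec consectetuer mauris id sapien. Cras dolor dolor,"
    },
    {
            "text": "sit amet diam eu dolor egestas rhoncus. Proin nisl sem,"
    }
]
</pre>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")
pre = soup.find("pre")                 # the first <pre> element on the page
records = json.loads(pre.get_text())   # a list of {"text": ...} dictionaries

for record in records:
    print(record["text"])
```

Each entry is then an ordinary Python dictionary, so the text can be filtered, counted, or saved like any other data.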
