
Web scraping has become an indispensable tool for collecting data from the Internet, enabling data analysts, tech enthusiasts, and businesses to make informed decisions. But extracting the data is only the first step. To unlock its full potential, you need to export it efficiently into the right format, whether that's a CSV file for spreadsheets, JSON for APIs, or a database for large-scale storage and analysis.
This blog will walk you through the essentials of exporting web-scraped data. You'll learn, step by step, how to work with CSV and JSON files, how to integrate web-scraped data with databases, and how to get the most out of your data-management practices.
Before diving into the script, let’s understand the dataset and workflow that we’ll use to demonstrate the data-saving process.
We’ll be scraping data from the website Books to Scrape, which provides a list of books along with their titles, prices, and availability.
This website is designed for practice purposes, making it an ideal choice for showcasing web scraping techniques.
Here’s the process we’ll follow:
1. Scrape: use the requests and BeautifulSoup libraries to extract the book details from the website.
2. Store: load the extracted records into a Pandas DataFrame.
3. Save: export the DataFrame to CSV, JSON, and a SQLite database.

To run the script, you'll need the following Python libraries: requests, beautifulsoup4, and pandas. Install them using pip by running the following command in your terminal:
pip install requests beautifulsoup4 pandas
Here’s the Python script to scrape the data from the website and store it in a Pandas DataFrame:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Scrape data from the website
def scrape_books():
    url = "https://books.toscrape.com/"
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception("Failed to load page")

    soup = BeautifulSoup(response.content, "html.parser")
    books = []

    # Extract book data
    for article in soup.find_all("article", class_="product_pod"):
        title = article.h3.a["title"]
        price = article.find("p", class_="price_color").text.strip()
        availability = article.find("p", class_="instock availability").text.strip()
        books.append({"Title": title, "Price": price, "Availability": availability})

    # Convert to DataFrame
    books_df = pd.DataFrame(books)
    return books_df

# Main execution
if __name__ == "__main__":
    print("Scraping data...")
    books_df = scrape_books()
    print("Data scraped successfully!")
    print(books_df)
The table we will use to demonstrate the data-saving process is structured as follows:
| Title | Price | Availability |
| --- | --- | --- |
| A Light in the Attic | £51.77 | In stock |
| Tipping the Velvet | £53.74 | In stock |
| Soumission | £50.10 | In stock |
| Sharp Objects | £47.82 | In stock |
| Sapiens: A Brief History of Humankind | £54.23 | NA |
| The Requiem Red | £22.65 | In stock |
| ... | ... | ... |
To save the data as a CSV file, use the to_csv method from Pandas:
def save_to_csv(dataframe, filename="books.csv"):
    dataframe.to_csv(filename, index=False)
    print(f"Data saved to {filename}")
Code Explanation:
- filename: specifies the name of the output file.
- index=False: ensures the index column is not included in the CSV file.
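To see it in action, here is a minimal usage sketch. It assumes the books_df DataFrame produced by scrape_books() and the save_to_csv function defined above, then reads the file back with pandas to confirm the export:

import pandas as pd

# Minimal usage sketch: assumes books_df comes from scrape_books()
# and save_to_csv is defined as shown above.
save_to_csv(books_df, "books.csv")

# Read the file back to confirm the export worked as expected
check_df = pd.read_csv("books.csv")
print(check_df.head())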
To save the data as JSON, use the to_json method from Pandas:

def save_to_json(dataframe, filename="books.json"):
    dataframe.to_json(filename, orient="records", indent=4)
    print(f"Data saved to {filename}")
Code Explanation:
- orient="records": each row in the DataFrame is converted into a JSON object.
- indent=4: formats the JSON for better readability.
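As a quick illustration, here is a minimal sketch, again assuming the books_df DataFrame from the scraper above. It writes the JSON file and then inspects the first record with the standard json module, so you can see the effect of orient="records":

import json

# Minimal usage sketch: assumes books_df comes from scrape_books()
# and save_to_json is defined as shown above.
save_to_json(books_df, "books.json")

# Reload the file to inspect a single record
with open("books.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Each record is a JSON object, e.g. {"Title": "...", "Price": "...", "Availability": "..."}
print(records[0])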
To store the data in a database, use the to_sql method from Pandas with SQLite:

import sqlite3

def save_to_database(dataframe, database_name="books.db"):
    conn = sqlite3.connect(database_name)
    dataframe.to_sql("books", conn, if_exists="replace", index=False)
    conn.close()
    print(f"Data saved to {database_name} database")
Code Explanation:
- sqlite3.connect(database_name): connects to the SQLite database (and creates it if it doesn't exist).
- to_sql("books", conn, if_exists="replace", index=False): writes the DataFrame to a table named books, replacing the table if it already exists and leaving out the index column.

While formats like CSV or JSON work well for smaller projects, databases offer superior performance, query optimization, and data integrity when handling larger datasets. The seamless integration of Pandas with SQLite makes it simple to store, retrieve, and manipulate data efficiently. Whether you're building a data pipeline or a complete application, understanding how to leverage databases will greatly enhance your ability to work with data effectively. Start using these tools today to streamline your data workflows and unlock new possibilities!
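As a closing sanity check, here is a minimal sketch of reading the data back out of the database with pandas.read_sql_query. It assumes the books.db file and the books table created by save_to_database above, and the example filter on Availability is purely illustrative:

import sqlite3
import pandas as pd

# Minimal read-back sketch: assumes books.db was created by save_to_database().
conn = sqlite3.connect("books.db")

# Query the "books" table, e.g. only titles that are in stock
in_stock_df = pd.read_sql_query(
    "SELECT Title, Price FROM books WHERE Availability = 'In stock'", conn
)
conn.close()

print(in_stock_df.head())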