Building a Full-Stack Web Scraping Dashboard with Python and React

Scraping projects are a great way to flex your full-stack web development muscles. This article runs you through a quick example of how you can build out a simple scraping dashboard. Those of you who subscribe to my Ghost blogs, follow me on Twitter or saw the article on Medium might recognize this code: it’s from my blog on structuring React apps. If you want to know how I structure all this code, you can check out that article, either on my blog or on Medium.

Want to just see the code? You can check it out on GitHub here.

Front-end UI

The front-end is built in React using the structure I outlined in my React project structure blog. The TL;DR is that the code is broken up into Services, Types, Assets, Components and Styles, or STACS for an easy-to-remember acronym. The Services, Types, Assets and Styles usually belong to a component, meaning each Component has its own services/types/assets/styles files and/or folders. You can take a look at the tree output below to get a feel for the structure.

└── scrapeboard_fe
    ├── package.json
    ├── package-lock.json
    ├── README.md
    ├── src
    │   ├── App.tsx
    │   ├── Components
    │   │   ├── Header
    │   │   │   ├── Header.tsx
    │   │   │   └── style.js
    │   │   ├── ScrapeTable
    │   │   │   ├── ScrapeTable.tsx
    │   │   │   ├── Services
    │   │   │   │   └── tableservice.ts
    │   │   │   ├── style.js
    │   │   │   └── Types
    │   │   │       └── Scrape.ts
    │   │   └── Stats
    │   │       ├── services
    │   │       │   └── scrapeservice.js
    │   │       ├── Stats.tsx
    │   │       └── style.js
    │   ├── index.tsx
    │   └── react-app-env.d.ts
    └── tsconfig.json
11 directories, 20 files

The Header.tsx component contains a dead simple header that you can customize with logos, a custom name, etc. It’s the least interesting piece of the pie, so I’m going to leave the code out and you can just peek at it on GitHub.

The meat and potatoes of the whole shebang is in the Stats and ScrapeTable components. The Stats component contains a simple counter that tracks the number of times data has been scraped. It also contains a button that uses the scrapeservice to actually hit the backend, grab the new data and update the scrape count state, which triggers an update of the scrape table.

The ScrapeTable component uses the scrape count variable as a trigger to update the table with new data. The logic is pretty simple: if the scrape count changes, chances are there’s new data to grab, right?

So, naturally, I implemented this in an incredibly inefficient way: hitting the scrape button triggers a call to the backend to grab new data, which updates the count, which causes the scrape table to… make another call to the backend to grab the same new data. I know, this would be a super easy fix, but I’ve got… a life, and this is intentionally a simplistic project, so I’m not going to fix that right now.

(On that same note, you’ll notice there’s a “Delete” column that isn’t implemented. And you know what? It won’t be implemented. Sue me.)

So, let’s peek a bit deeper, shall we? Not too deeply, though, because this structure is explained in my blog on organizing React code.

//Stats.tsx
import React from 'react'
import Style from './style'
import ScrapeService from './services/scrapeservice'

export default function Stats(props : {scrapecount : number, setter : Function}) {
  const [ss, setSS] = React.useState(new ScrapeService())

  // Pull the current scrape count from the backend whenever it changes
  React.useEffect(() => {
    async function setter() {
      props.setter(await ss.getScrapes())
    }
    setter()
  },[props.scrapecount])

  // Button handler: tell the backend to scrape again, then update the count
  async function clicker(){
    async function setter() {
      props.setter(await ss.incScrapes())
    }
    setter()
  }

  return (
    <div style={Style.container as React.CSSProperties}>
      <p>Scrapes: {props.scrapecount}</p><br/>
      <button onClick={clicker}>Scrape again!</button>
    </div>
  )
}

// -----
// services/scrapeservice.js
import axios from 'axios'

class ScrapeService {
    async getScrapes(){
        let res = await axios.get('http://127.0.0.1:8000/get_scrapes')
        return res.data.scrapes
    }

    async incScrapes(){
        let res = await axios.post('http://127.0.0.1:8000/inc_scrapes')
        return res.data.scrapes;
    }
}

export default ScrapeService

// -------
// style.js
const style = {
    container : {
        marginLeft:"auto",
        marginRight:"auto",
        fontSize: "24px",
        textAlign:"center"
    }
}

export default style;

The Stats component utilizes a servicer class that actually reaches out to the backend to increment the scrape count. We’ll run through the backend here in a second, but this servicer is what does the actual data fetching and manipulation, while the component itself just tracks state, calls the right servicer functions on state updates and button clicks, and returns the UI. The styles are all stored in Javascript files as well, and if there are types to be used, they’re implemented in type files.
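Neither component does anything by itself, of course; App.tsx (it’s in the tree up top, but not pasted here) is what wires the pieces together. The wiring amounts to something like the sketch below; grab the repo if you want the exact file.

// App.tsx (rough sketch; see the repo for the real file)
import React from 'react'
import Header from './Components/Header/Header'
import Stats from './Components/Stats/Stats'
import ScrapeTable from './Components/ScrapeTable/ScrapeTable'

export default function App() {
  // The scrape count is the only shared state: Stats updates it,
  // ScrapeTable watches it to know when to refetch.
  const [scrapecount, setScrapecount] = React.useState(0)

  return (
    <div>
      <Header/>
      <Stats scrapecount={scrapecount} setter={setScrapecount}/>
      <ScrapeTable scrapecount={scrapecount}/>
    </div>
  )
}

The scrape count being the only piece of shared state is exactly why it works as the refresh trigger.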

//ScrapeTable.tsx

// ------
// Types/Scrape.ts
type Scrape = {
    time:number,
    temp: number,
    desc: string
}

export default Scrape;

// -------
// Services/tableservice.ts
import axios from "axios";
import Scrape from "../Types/Scrape"

class TableService {
    async getScrapes(){
        let Scrapes : Array<Scrape> = [];
        let res = await axios.get('http://127.0.0.1:8000/get_scrapes')
        console.log('Res data: ', res.data)
        // The backend returns the array of scrapes under the 'scrapelist' key
        let resarr : Array<{time : number, temp : number, desc : string}> = res.data['scrapelist'];
        resarr.forEach((ele) => {
            Scrapes.push({time:ele.time, temp:ele.temp, desc:ele.desc})
        })
        console.log('Number of scrapes: ', Scrapes.length)
        return Scrapes;
    }
}

export default TableService

// -------
// style.js
const Style = {
    table:{
        width:"80%",
        border:"1px solid black",
        borderCollapse: "collapse",
        textAlign:"center",
        marginLeft:"auto",
        marginRight:"auto",
        marginTop:"30px"
    },
    tableRow:{
        border:"1px solid black"
    }
}

export default Style
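The one file the listing above skips is ScrapeTable.tsx itself; the real version is in the repo, but it boils down to something like this sketch: a table that refetches through the TableService whenever the scrapecount prop ticks up. Treat the column layout here as an approximation rather than the exact file.

// ScrapeTable.tsx (rough sketch; the real file is in the repo)
import React from 'react'
import Style from './style'
import TableService from './Services/tableservice'
import Scrape from './Types/Scrape'

export default function ScrapeTable(props : {scrapecount : number}) {
  const [scrapes, setScrapes] = React.useState<Array<Scrape>>([])
  const [ts] = React.useState(new TableService())

  // Refetch the scrape list whenever the scrape count changes
  React.useEffect(() => {
    ts.getScrapes().then(setScrapes)
  }, [props.scrapecount])

  return (
    <table style={Style.table as React.CSSProperties}>
      <thead>
        <tr style={Style.tableRow as React.CSSProperties}>
          <th>Time</th><th>Temp</th><th>Description</th><th>Delete</th>
        </tr>
      </thead>
      <tbody>
        {scrapes.map((s) => (
          <tr key={s.time} style={Style.tableRow as React.CSSProperties}>
            <td>{new Date(s.time * 1000).toLocaleString()}</td>
            <td>{s.temp}</td>
            <td>{s.desc}</td>
            <td>{/* the famously unimplemented Delete button */}</td>
          </tr>
        ))}
      </tbody>
    </table>
  )
}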

The ScrapeTable component follows pretty much the same format, with a simple type file to boot. Again, I’m not going to rehash the organizing methodology that I laid out in the other blog, but at the end of it all you get a bare-bones HTML table with the scraped weather data loaded in.

Ugly, right? Well, luckily the backend is prettier.

Back-end API and “Database”

The backend is just as simple as the front.

The API is built in FastAPI, my new favorite backend framework, which I wrote an intro blog for here and a benchmark against Flask here. TL;DR: it’s rad, faster than Flask and much prettier to write, but you should read those blogs because they explain the basics of FastAPI in more depth than I’ll go into here.

The API reaches out to the OpenWeatherMap API, which is a really easy-to-use (and free) API for fetching weather data. I’m only making use of the general weather description, time and temperature data, but it actually gives you a lot of really granular data. Also, I chose the weather for Chicago for literally no reason, so the latitude and longitude of Chicago are hardcoded in.

First, the code. There’s not much of it.

import time
import requests
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
import json
from dotenv import dotenv_values

app = FastAPI()
config = dotenv_values(".env")
API = config['API']

origins = [
    "*"
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_methods=["*"]  # let the frontend's POST through, not just GETs
)

def getJS():
    # Read the whole "database" (assumes data.json exists, e.g. seeded with {"scrapes": 0, "scrapelist": []})
    f = open('./data.json','r')
    js = json.loads(f.read())
    f.close()
    return js

def setJS(js):
    f = open('./data.json','w')
    f.write(json.dumps(js))
    f.close()
    return js

@app.get('/get_scrapes')
def get_scrapes():
    js = getJS()
    scrapes = js['scrapes']
    print(f'[-] Scrapes: {scrapes}')
    return {'status':'success', 'scrapes':scrapes, 'scrapelist':js['scrapelist']}

@app.post('/inc_scrapes')
def incScrapes():
    t = time.time()
    js = getJS()
    js['scrapes'] = js['scrapes']+1
    scrapelist = js['scrapelist']
    # Chicago; note the longitude is negative (west of the prime meridian)
    lat = 41.8781
    lon = -87.6298
    url = f'https://api.openweathermap.org/data/3.0/onecall?lat={lat}&lon={lon}&appid={API}&units=imperial'
    print(url)
    res = requests.get(url)
    with open('./tempdata.json','w') as f:
        f.write(json.dumps(res.json()))
    res = res.json()
    temp = res['current']['temp']
    desc = res['current']['weather'][0]['description']
    scrapelist.append({'time':t, 'temp':temp, 'desc':desc})
    js['scrapelist'] = scrapelist
    setJS(js)
    return {'status':'success', 'scrapes':js['scrapes']}

This simple app depends on some of our favorite libraries:

  • dotenv — Importing the OpenWeatherMap API key in a way that’s supposed to keep me from pushing the key to GitHub… which would have worked if I had remembered to add .env to my .gitignore. Don’t worry, I was able to kill the API key and deleted the .env file from the public GitHub repo.
  • JSON — Parsing JSON data to dump it to our “database” which is just a JSON file.
  • Time — Getting timestamps
  • Requests — Making web requests to the OpenWeatherMap API

Now for the routes!

  • /get_scrapes — This route returns the scrape count (js['scrapes']) and the actual array of scrapes (js['scrapelist']), both pulled out of our data.json file using the getJS() function. (The full response shape is sketched just after this list.)
  • /inc_scrapes — This route will actually trigger a call to the OpenWeatherMap API and store the data in the data.json file using the setJS() function.
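For reference, the JSON that /get_scrapes hands back (and that the frontend services pick apart) looks roughly like this, written out as a TypeScript type. The GetScrapesResponse name is just for illustration and doesn’t exist anywhere in the code.

// Shape of the /get_scrapes JSON response, expressed as a hypothetical TS type
type GetScrapesResponse = {
    status: string,            // 'success'
    scrapes: number,           // running scrape count, used by Stats
    scrapelist: Array<Scrape>  // the Scrape type from earlier: {time, temp, desc}, used by ScrapeTable
}

/inc_scrapes returns just the status and scrapes fields.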

That’s… really it. There’s some stuff at the top that deals with CORS for us, but the backend is intentionally simple. It doesn’t even use a database! It just pretends to with a JSON file.

Like I said, this is a massively oversimplified “scraping dashboard” but it’s a good way to demonstrate how quickly you can produce a “full stack” web application as a proof-of-concept, or just to practice and experiment with different techniques.

I hope this blog helped you learn a bit about building full stack web applications. If it did, hit me with a follow on Twitter or subscribe to my Ghost blog or Substack to get emails when my blogs go live or my monthly research newsletter goes out.