What is this?
It's a spin-off of the final project for an intro to Python class I took. We were able to choose from a few different types of projects to expand upon, and I chose my web scraping project. The final project that I actually turned in is a command-line utility that generates an HTML report based on the locations and date range a user chooses. That part of the project will find its way into a different post.
This part is the backbone of the project. It is the result of lots of testing to make sure the data made it from Weather Underground's website into a pandas DataFrame as seamlessly as possible. Enjoy!
Visualizing Scraped Weather Underground Data
Using matplotlib & pandas and written in Python 3.5
by Jacob Paul
Scraping and parsing scripts by Randy Olson and FiveThirtyEight, modified by me
GitHub for scraper and parser: https://github.com/fivethirtyeight/data/blob/master/us-weather-history/
import matplotlib.pyplot as plt
import pandas as pd
# Modified versions of the FiveThirtyEight parser and scraper scripts
import wunderground_parser
import wunderground_scraper
# Template: date_range = {'fromYear': int, 'fromMonth': int, 'fromDay': int, 'toYear': int, 'toMonth': int, 'toDay': int}
date_range = {'fromYear': 2014, 'fromMonth': 1, 'fromDay': 1, 'toYear': 2014, 'toMonth': 12, 'toDay': 31}
# Uncomment to download the raw KOAK (Oakland) data before parsing it
# wunderground_scraper.scrape_station("KOAK", date_range)
wunderground_parser.parse_station("KOAK", date_range)
# Load the parsed CSV, converting the 'date' column to datetime values
station1_df = pd.read_csv("KOAK.csv", parse_dates=['date'])
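The scraper and parser both take the date range as a dictionary with six integer keys. If you would rather start from Python `date` objects, a small helper can build that dictionary. This is a sketch: `make_date_range` is a hypothetical name of my own, and it assumes the six keys in the template comment are the only ones the scripts read.

```python
from datetime import date


def make_date_range(start: date, end: date) -> dict:
    """Build the date_range dict passed to the scraper/parser (hypothetical helper).

    Assumes the scripts only need these six integer keys.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return {
        'fromYear': start.year, 'fromMonth': start.month, 'fromDay': start.day,
        'toYear': end.year, 'toMonth': end.month, 'toDay': end.day,
    }


# Rebuilds the same 2014 range used in this post
date_range = make_date_range(date(2014, 1, 1), date(2014, 12, 31))
print(date_range)
```

Validating `start > end` up front means a swapped range fails immediately instead of producing an empty or confusing scrape.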
A chart like the one below displays the actual high temperature and the average high temperature for each day in Oakland, CA in 2014. We can see that there were a few times during the year when the high temperature was significantly above or below average.
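# A wide figure keeps a full year of daily values readable
plt.figure(figsize=(20, 10))
# Solid line: actual daily high; dashed line: average high for that date
plt.plot(station1_df['date'], station1_df['actual_max_temp'], color="Orange", label="KOAK High Temp")
plt.plot(station1_df['date'], station1_df['average_max_temp'], color="Orange", linestyle='--', label="KOAK Avg. High Temp")
# Anchor the legend's upper-left corner just outside the right edge of the axes
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xlabel("Date")
plt.ylabel("Temperature in Degrees F")
plt.show()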
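To pick out the days where the high was far from average, instead of spotting them by eye, you can compute the deviation for each row and filter on it. This is a sketch: `unusual_days` is a hypothetical helper, the 10-degree threshold is an arbitrary choice, and the demo rows are made up (not real KOAK data). It assumes the `date`, `actual_max_temp`, and `average_max_temp` columns used in the chart above.

```python
import pandas as pd


def unusual_days(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Return the days whose actual high differs from the average high by
    more than `threshold` degrees F. The 10-degree default is arbitrary."""
    out = df[['date', 'actual_max_temp', 'average_max_temp']].copy()
    # Positive deviation = warmer than average, negative = cooler
    out['deviation'] = out['actual_max_temp'] - out['average_max_temp']
    return out[out['deviation'].abs() > threshold].reset_index(drop=True)


# Made-up demo rows (not real KOAK data) with deviations of +13, +1 and -12
demo = pd.DataFrame({
    'date': pd.to_datetime(['2014-01-01', '2014-01-02', '2014-01-03']),
    'actual_max_temp': [70, 58, 45],
    'average_max_temp': [57, 57, 57],
})
print(unusual_days(demo))
```

On the real data, `unusual_days(station1_df)` should list the Oakland days behind the largest gaps between the two lines. The threshold is worth tuning to taste.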
Another use for this data is visually comparing metrics between two cities. Here I'll do that for Colorado Springs and Oakland:
#First we must scrape and parse Colorado Springs weather data to include it in the chart
date_range = {'fromYear': 2014, 'fromMonth': 1, 'fromDay': 1, 'toYear': 2014, 'toMonth': 12, 'toDay': 31}
wunderground_scraper.scrape_station("KCOS", date_range)
wunderground_parser.parse_station("KCOS", date_range)
Output (one timestamp per month, from 2014-01-01 00:00:00 through 2014-12-01 00:00:00)
station2_df = pd.read_csv("KCOS.csv", parse_dates=['date'])
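Before plotting two stations on the same axes, it can help to line their rows up by date so every x value is shared by both series. One way is an inner merge on `date`. This is a sketch: `align_stations` and the `_koak`/`_kcos` suffixes are hypothetical names of my own, and it assumes both CSVs have a `date` column and the same metric column names. The demo frames use made-up values.

```python
import pandas as pd


def align_stations(df1, df2, suffixes=('_koak', '_kcos')):
    """Inner-join two station frames on 'date' so each row holds both
    stations' values for the same day; days missing from either are dropped.
    Shared column names other than 'date' get the given suffixes."""
    return pd.merge(df1, df2, on='date', how='inner', suffixes=suffixes)


# Made-up demo frames that overlap on a single day
a = pd.DataFrame({'date': pd.to_datetime(['2014-01-01', '2014-01-02']),
                  'average_max_temp': [57, 57]})
b = pd.DataFrame({'date': pd.to_datetime(['2014-01-02', '2014-01-03']),
                  'average_max_temp': [43, 43]})
both = align_stations(a, b)
print(both)
```

With the real data, `combined = align_stations(station1_df, station2_df)` gives one frame whose `combined['date']` can serve as the x values for both `average_max_temp_koak` and `average_max_temp_kcos`.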
While it is fun to look at average high temperatures against actual high temperatures, the real power in this data comes when we compare two very different locations. In this chart, we compare the average high temperatures of Colorado Springs, CO and Oakland, CA.
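plt.figure(figsize=(20, 10))
# Plot each station against its own date column so each line's x and y
# values come from the same file
plt.plot(station2_df['date'], station2_df['average_max_temp'], color="Orange", label="KCOS Avg. High Temp")
plt.plot(station1_df['date'], station1_df['average_max_temp'], color="Red", label="KOAK Avg. High Temp")
# Anchor the legend's upper-left corner just outside the right edge of the axes
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xlabel("Date")
plt.ylabel("Temperature in Degrees F")
plt.show()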
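To compare the two cities with numbers as well as by eye, the daily averages can be collapsed into one value per calendar month. This is a sketch: `monthly_average_highs` is a hypothetical helper, and the demo frames use made-up temperatures purely to show the shape of the output. It assumes each frame has a datetime `date` column, which `parse_dates=['date']` provides.

```python
import pandas as pd


def monthly_average_highs(stations):
    """Mean of 'average_max_temp' per calendar month for each station.

    `stations` maps a label (e.g. 'KOAK') to a DataFrame with a datetime
    'date' column; the result has one column per station, indexed by month.
    """
    return pd.DataFrame({
        name: df.groupby(df['date'].dt.month)['average_max_temp'].mean()
        for name, df in stations.items()
    }).rename_axis('month')


# Made-up demo data covering two January days and one February day
dates = pd.to_datetime(['2014-01-01', '2014-01-02', '2014-02-01'])
demo_oak = pd.DataFrame({'date': dates, 'average_max_temp': [56.0, 58.0, 60.0]})
demo_cos = pd.DataFrame({'date': dates, 'average_max_temp': [42.0, 44.0, 46.0]})
table = monthly_average_highs({'KOAK': demo_oak, 'KCOS': demo_cos})
# Positive values mean Oakland's average high is warmer that month
table['difference'] = table['KOAK'] - table['KCOS']
print(table)
```

Grouping by `dt.month` rather than resampling keeps the month numbers as a plain index, so the table lines up directly with calendar months. On the real data you would call `monthly_average_highs({'KOAK': station1_df, 'KCOS': station2_df})`.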