Webscraping Concert Data

As a side project over my winter break from school, I decided to write a script that would take the legwork out of a chore I do every week: figuring out what DC-area concerts should be on my radar. Checking the websites for the different venues is a task perfect for automation given the repetition of a set number of steps: check the artist, check the date, check the price, see if it looks interesting.

The script I wrote checks four venues: DC9, the 9:30 Club, Black Cat, and Songbyrd. For each show, it scrapes the main act, opener, price, door time, and date. Each main act is then searched on Spotify (thanks to the API wrapper package spotipy), and data on genre(s) and related artists is pulled down. From there, I wrote a couple of functions that clean up the data and let me query it, narrowing the list of concerts to those that meet my criteria for shows I want to go to. Finally, I export the full list and my preferred shows to .csv files to share with friends.
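The enrich, filter, and export steps described above could be sketched roughly as follows. This is not the script itself, just a minimal illustration under stated assumptions: the `Show` fields, the function names, and the matching rule (any liked genre or any liked related artist) are all hypothetical, and `sp` is assumed to behave like a `spotipy.Spotify` client exposing `search` and `artist_related_artists`.

```python
import csv
import io
from dataclasses import dataclass, field, asdict


@dataclass
class Show:
    # Hypothetical record layout for one scraped listing.
    venue: str
    main_act: str
    opener: str
    price: str
    doors: str
    date: str
    genres: list = field(default_factory=list)
    related_artists: list = field(default_factory=list)


def enrich_with_spotify(show, sp):
    """Attach genres and related artists for the main act, if Spotify has a match.

    `sp` is assumed to be a spotipy.Spotify-like client.
    """
    result = sp.search(q=f"artist:{show.main_act}", type="artist", limit=1)
    items = result["artists"]["items"]
    if not items:
        return show  # no Spotify match: genre fields stay empty
    artist = items[0]
    show.genres = artist.get("genres", [])
    related = sp.artist_related_artists(artist["id"])
    show.related_artists = [a["name"] for a in related["artists"]]
    return show


def preferred(shows, liked_genres, liked_artists):
    """Keep shows whose genres or related artists overlap with my lists."""
    liked_genres = {g.lower() for g in liked_genres}
    liked_artists = {a.lower() for a in liked_artists}
    picks = []
    for s in shows:
        genre_hit = any(g.lower() in liked_genres for g in s.genres)
        artist_hit = any(a.lower() in liked_artists for a in s.related_artists)
        if genre_hit or artist_hit:
            picks.append(s)
    return picks


def to_csv(shows):
    """Serialize shows to CSV text; list fields are joined with '; '."""
    fields = ["venue", "main_act", "opener", "price", "doors", "date",
              "genres", "related_artists"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for s in shows:
        row = asdict(s)
        row["genres"] = "; ".join(row["genres"])
        row["related_artists"] = "; ".join(row["related_artists"])
        writer.writerow(row)
    return buf.getvalue()
```

In real use, `sp` would be an authenticated spotipy client, and the text returned by `to_csv` would be written to a file. Shows with no Spotify match keep empty genre fields, so they appear in the full export but can never pass the filter.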

There are a few limitations here. Genre and artist information comes only from what is available on Spotify. Ideally, for cases where this data isn’t available, I’d have another source for at least identifying genre. One option would be to use selenium with a webdriver to Google the artist and try to parse genre information from the search results, a feature I can add in the future. Additionally, the design of some venues’ sites makes accurate data collection more challenging. For example, some header classes serve multiple purposes, containing information about opening acts, venue changes, or ticket availability. Without some NLP element, it would be difficult to reliably tell which kind of information a given header holds.
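To make the multi-purpose header problem concrete, here is a rough sketch of the kind of keyword-based fallback one might try short of real NLP. The labels and patterns are guesses for illustration only, not taken from any venue's actual markup, and any header that doesn't match a rule falls through to "unknown", which is exactly where this approach gets brittle.

```python
import re

# Hypothetical keyword rules; a real venue page would need its own patterns.
# Order matters: the first matching rule wins.
RULES = [
    ("ticket_status", re.compile(r"\b(sold out|low tickets|cancell?ed|postponed)\b", re.I)),
    ("venue_change", re.compile(r"\b(moved to|venue change|now at)\b", re.I)),
    ("opener", re.compile(r"^(with|w/)\s+", re.I)),
]


def classify_header(text):
    """Guess what kind of information a header string holds."""
    text = text.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unknown"
```

A header like "w/ Some Opener" would be labeled as an opener, but an opener listed without a leading "with" or "w/" would be missed entirely, which is why a smarter text-classification step would be needed for consistent results.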

That being said, the script runs in under ten minutes (Songbyrd doesn’t keep price data on their site, so accessing it externally for each show takes more time). It has also clued me into shows I might otherwise have missed, and it does so faster than emails from music apps or the venues themselves can alert me.

Link to script on GitHub
