Data
data science, modeling, deep learning
I love working with data. Here are some of my favorite projects from various classes and workshops.
The Modal Voter: GenData 2020 Final Project
For my final project as part of the Generation Data Intro to Progressive Data training, I applied statistical methods I learned in one of my favorite Planet Money episodes (“The Modal American”) to our mock campaign data.
I wrote SQL queries to break voters into demographic “buckets” based on generation, party, sex, and race, and then used Tableau to visualize patterns of voter behavior.
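Here is a minimal sketch of the bucketing step. It assumes a single `voters` table with one column per attribute. The table name, column names, and toy rows are hypothetical because the campaign schema isn’t described here, and Python’s built-in `sqlite3` stands in for whatever database the training actually used. Grouping by all four attributes turns every combination into one bucket, and sorting by size puts the modal voter’s bucket first.

```python
import sqlite3

# In-memory database with a hypothetical voters table (schema is assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE voters ("
    "voter_id INTEGER PRIMARY KEY, generation TEXT, party TEXT, sex TEXT, race TEXT)"
)

# Toy rows for illustration only; not real campaign data.
toy_voters = [
    ("Millennial", "D", "F", "White"),
    ("Millennial", "D", "F", "White"),
    ("Boomer", "R", "M", "White"),
    ("Millennial", "D", "F", "Black"),
    ("Gen X", "U", "M", "Latino"),
]
conn.executemany(
    "INSERT INTO voters (generation, party, sex, race) VALUES (?, ?, ?, ?)",
    toy_voters,
)

# Every distinct (generation, party, sex, race) combination is one bucket;
# ordering by size puts the modal voter's bucket first.
BUCKET_QUERY = """
SELECT generation, party, sex, race, COUNT(*) AS n_voters
FROM voters
GROUP BY generation, party, sex, race
ORDER BY n_voters DESC
"""

buckets = conn.execute(BUCKET_QUERY).fetchall()
modal_bucket = buckets[0]
print(modal_bucket)  # ('Millennial', 'D', 'F', 'White', 2)
```

Dropping a column from the `GROUP BY` merges buckets. This gives a quick way to see how the modal group changes when fewer attributes are considered.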
This presentation was selected for the final showcase (one of 6 out of 20).
Generation Data does amazing work training data folks for progressive campaigns. You can donate to support them here.
Classifying Building Types: Image Classification for Colloquial Architecture
My final project for the Deep Learning course at Harvard in fall 2020 was an image classifier for common American house styles, as defined by Virginia Savage McAlester in A Field Guide to American Houses.
I trained a VGG16 image classification model in Keras on a custom dataset. Using transfer learning, image augmentation, dataset balancing, and fine-tuning, the model classifies 10 common house styles with 86% accuracy.
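The write-up doesn’t say how the dataset was balanced. One common option, sketched here as an assumption, is to weight each class inversely to its frequency. This way rare styles contribute as much to the loss as common ones. The formula below is the same heuristic scikit-learn uses for `class_weight="balanced"`. Keras’s `Model.fit` accepts such a mapping through its `class_weight` argument, keyed by integer class index. The labels are toy values that stand in for three hypothetical house styles.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class index to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels: 0, 1, 2 stand in for three hypothetical house styles
# with 60, 30, and 10 training images respectively.
labels = [0] * 60 + [1] * 30 + [2] * 10
weights = balanced_class_weights(labels)

# Rare classes get larger weights: roughly {0: 0.56, 1: 1.11, 2: 3.33}.
print(weights)
```

With these weights, each class’s total contribution (weight × count) comes out equal. The dict could then be passed as `model.fit(..., class_weight=weights)` on a compiled Keras model. Oversampling rare classes or undersampling common ones are equally valid alternatives.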
Flipper: Real Estate Classifier
Flipper was a final group project for a course I took with Zona Kostic in summer 2020. My team had access to a subset of Greater Boston MLS (Multiple Listing Service) real estate data from the past 10 years. Our goal was to determine whether a property had flip potential without using listing photos or descriptions, and to discern which predictors contributed most to a given listing’s flippability. This was a classic data science problem: no deep learning required.
I owned the data science portion of the project. I used a random forest classifier trained on past flips we identified in the listings. To get a more balanced dataset, I augmented the training data with SMOTE (Synthetic Minority Over-sampling Technique). You can browse the full notebook here.
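SMOTE generates synthetic minority-class rows in three steps. It picks a minority sample, chooses one of that sample’s k nearest minority-class neighbors, and then interpolates a random point on the line segment between them. The pure-Python sketch below shows that core idea on toy two-feature points. The feature values are hypothetical, and the write-up doesn’t say which implementation the project used. In practice, a library implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn is the usual choice.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their nearest minority-class neighbors (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Random point on the segment from base toward the chosen neighbor
        gap = rng.random()
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbor)))
    return synthetic


# Toy minority-class rows (hypothetical, already-scaled listing features).
flips = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5), (3.0, 3.0)]
new_flips = smote(flips, n_synthetic=6, k=2)
print(len(new_flips))  # 6
```

Oversampling like this should be applied only to the training split. Otherwise, synthetic rows derived from held-out listings leak into evaluation and inflate the scores.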
Getting the model to make accurate predictions from listing metadata alone was a challenge, but we were able to classify non-flippable properties with ~98% accuracy and flippable properties with just under 70% accuracy. Our team was the runner-up for best final project with the Flipper model and site.
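Reporting accuracy separately for each class matters with imbalanced data. If flips are rare, a model that labels every listing non-flippable can still score high overall accuracy. The sketch below assumes the figures above are per-class accuracy, meaning the fraction of each true class predicted correctly (that class’s recall), and shows one way to compute it. The labels are toy values, not the project’s results.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that the model labeled correctly (per-class recall)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: correct[cls] / n for cls, n in totals.items()}


# Toy labels, not the project's results: 1 = flippable, 0 = not flippable.
y_true = [0] * 8 + [1] * 4
y_pred = [0] * 7 + [1] + [1, 1, 1, 0]
print(per_class_accuracy(y_true, y_pred))  # {0: 0.875, 1: 0.75}
```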
Collaborators: @gregfrasco and @goudete.