For my MEng dissertation at the University of York I came up with a question I wanted to answer: does the mood of Twitter on a given day tell you anything useful about whether Tesla's stock will close higher or lower the next day? The idea was to test whether plugging social media sentiment into a sequence model alongside historical price data could actually improve prediction accuracy.

Tesla was the obvious choice given how volatile it is and how loudly its investor community talks about it online. I collected 9,435 tweets referencing Tesla's stock via the Twitter API across an 11-month window from January to November 2018. Before anything else, each tweet was cleaned to remove hashtags, usernames, and URLs.
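A minimal sketch of that cleaning step (the regex patterns here are my own illustration, not the dissertation's exact code):

```python
import re

def clean_tweet(text: str) -> str:
    """Strip URLs, usernames, and hashtags from a tweet."""
    text = re.sub(r"https?://\S+", "", text)  # URLs
    text = re.sub(r"@\w+", "", text)          # usernames
    text = re.sub(r"#\w+", "", text)          # hashtags
    return " ".join(text.split())             # collapse leftover whitespace

print(clean_tweet("$TSLA to the moon! https://t.co/abc @elonmusk #Tesla"))
# → "$TSLA to the moon!"
```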
Historical daily closing prices for TSLA came from Yahoo Finance. Rather than feeding in raw prices, I used the day-over-day difference, which keeps the series stationary and better suited to an LSTM. Gaps from weekends and market closures were interpolated, and both series were normalised before being fed into the network.
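That preprocessing could look something like this in pandas (the dates and prices below are made up purely for illustration):

```python
import pandas as pd

# Hypothetical daily closes; the weekend (Jan 6-7) is missing.
closes = pd.Series(
    [310.0, 315.5, 312.0, 320.0],
    index=pd.to_datetime(["2018-01-05", "2018-01-08", "2018-01-09", "2018-01-10"]),
)

# Reindex onto a full calendar and linearly interpolate the gaps.
full = closes.reindex(pd.date_range(closes.index[0], closes.index[-1], freq="D"))
full = full.interpolate()

# Day-over-day difference keeps the series roughly stationary.
diffs = full.diff().dropna()

# Min-max normalise to [0, 1] before feeding the LSTM.
norm = (diffs - diffs.min()) / (diffs.max() - diffs.min())
```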
A Naive Bayes binary classifier was built using the Natural Language Toolkit (NLTK) and trained on a Twitter corpus of real positive and negative tweets. It reached 79.5% accuracy on the test set, which is pretty good considering humans only agree on tweet sentiment around 70-79% of the time. Each tweet was classified as positive or negative, and the percentage of positive tweets each day became a sentiment time series sitting alongside the price data.
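The shape of such a classifier in NLTK looks roughly like this; the tiny inline corpus and the `features` helper are my own stand-ins for the real labelled tweet data:

```python
import nltk

def features(tweet: str) -> dict:
    """Bag-of-words presence features, the form NLTK's classifier expects."""
    return {word.lower(): True for word in tweet.split()}

# Tiny inline corpus standing in for the real positive/negative tweet data.
train = [
    (features("great earnings huge win"), "pos"),
    (features("love this stock amazing"), "pos"),
    (features("terrible quarter big loss"), "neg"),
    (features("awful news selling everything"), "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(features("amazing win")))  # → "pos"
```

Classifying each day's tweets this way and taking the share labelled positive gives the daily sentiment series.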
A problem that came up was a strong negative skew in the sentiment data. On 88% of days, more than half of the tweets were classified as negative, leaving the daily values with very little variance. That probably limited how much useful information the sentiment series could actually give to the LSTM.
LSTMs (Long Short-Term Memory networks) are a type of neural network designed to work with sequences. Unlike a standard network that processes each input on its own, an LSTM carries a memory state forward through time, which makes it a natural fit for something like stock prices where the order and history of values matters.
For each architecture tested, two LSTM networks were built. A baseline network took only the historical price series as input, and a Twitter network took both the price series and the daily sentiment series. Both were trying to predict whether the next day's closing price would be higher or lower than the current day's.
In total, 12 architectures were tested, with one or two hidden layers and 20, 30, or 50 LSTM neurons per layer. Each was also run with time lags of 3, 4, and 5, controlling how many consecutive days the network looks back at. Because weights are randomly initialised, each network was trained and tested 10 times and the mean accuracy recorded. All models used an 80/20 train/test split, the Adam optimiser, binary cross-entropy loss, a batch size of 32, 1,000 epochs, and 20% dropout on the LSTM layers to help with overfitting.
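One such configuration can be sketched in Keras, here the Twitter network with a lag of 4 and 30 units; the random data and the `make_windows` helper are my own stand-ins, not the dissertation's code:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

LAG = 4  # how many consecutive days each input window covers

def make_windows(features, labels, lag):
    """Slice aligned daily series into (samples, lag, n_features) windows,
    each paired with the next day's up/down label."""
    X = np.array([features[i : i + lag] for i in range(len(features) - lag)])
    return X, labels[lag:]

# Stand-in for ~11 months of data: column 0 is the normalised price diff,
# column 1 the daily share of positive tweets (baseline nets drop column 1).
rng = np.random.default_rng(0)
features = rng.random((230, 2))
labels = rng.integers(0, 2, size=230)  # 1 = next close higher, 0 = lower

X, y = make_windows(features, labels, LAG)

model = Sequential([
    Input(shape=(LAG, 2)),
    LSTM(30, dropout=0.2),           # 20% dropout, as in the write-up
    Dense(1, activation="sigmoid"),  # probability that tomorrow closes higher
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# The real runs used 1,000 epochs and an 80/20 train/test split; 2 epochs
# here just keeps the sketch quick.
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```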
The best accuracy across all experiments was 60.9%, from a baseline network with a time lag of 4. The best Twitter network at the same settings reached 60.3%. The gap between them was too small to mean much.
The more interesting pattern was at a time lag of 5, where most baseline networks fell below 50% accuracy, basically no better than guessing. The Twitter networks held up reasonably well under the same conditions. So sentiment seemed to act more like a safety net than a boost. It didn't reliably help networks that were already working, but it seemed to stop the worst failures from happening.
Standard deviations were high across the board, which makes sense given that 11 months of daily data isn't a lot to train a sequence model on. The training set also had a slight skew towards down days (6.5%), which probably explains why the networks tended to predict down more than they should. Across all the configurations tested, nothing was consistent enough to draw solid conclusions about sentiment.