10-K MD&A Section May Not Be Indicative of Stock Performance

The 10-K is an annual report that U.S. public companies must file with the U.S. Securities and Exchange Commission (SEC) to detail various aspects of their financial performance.

The Management’s Discussion and Analysis (MD&A) section in the 10-K is of particular interest because it allows company executives to analyze performance and express their thoughts and opinions that are not found in other sections of the 10-K (Hargrave, 2020).

Using natural language processing (NLP) vectorization and sentiment analysis, it was found that the MD&A section of the previous year's 10-K had almost no correlation with the company's stock return in the following year.

The MD&A sections were extracted from the 10-K filings of S&P 500 companies. Because each filing was identified only by its Central Index Key (CIK) number, each entry was looked up in a CIK-to-ticker key to match it with its stock ticker.
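The matching step could be sketched as a plain dictionary lookup. This is a minimal illustration, not the code used in the study: the key contents, the `filings` records, and the helper names `normalize_cik` and `attach_tickers` are all hypothetical, and the article does not say where its key came from.

```python
# Hypothetical CIK-to-ticker key with two illustrative entries.
CIK_TO_TICKER = {
    "0000320193": "AAPL",
    "0000789019": "MSFT",
}

def normalize_cik(cik):
    """Return the CIK as a 10-digit, zero-padded string."""
    return str(int(cik)).zfill(10)

def attach_tickers(filings, key=CIK_TO_TICKER):
    """Split filings into matched records (ticker added) and unmatched CIKs."""
    matched, unmatched = [], []
    for filing in filings:
        cik = normalize_cik(filing["cik"])
        ticker = key.get(cik)
        if ticker is None:
            unmatched.append(cik)
        else:
            matched.append({**filing, "cik": cik, "ticker": ticker})
    return matched, unmatched

# Made-up filing records; "mdna" stands in for the extracted MD&A text.
filings = [
    {"cik": 320193, "mdna": "..."},
    {"cik": "789019", "mdna": "..."},
    {"cik": "1", "mdna": "..."},
]
matched, unmatched = attach_tickers(filings)
```

Normalizing the CIK before the lookup avoids silent misses when one source stores it as an integer and another as a zero-padded string, and collecting the unmatched CIKs makes it easy to see how many filings were dropped.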

Then, using the Yahoo Finance API, the annual return of each entry was calculated as the stock's closing price on December 31st of the next year minus its closing price on January 1st of that same year.
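A minimal sketch of that calculation follows, under a few assumptions that are not in the article. U.S. markets are closed on January 1st, so the sketch uses the first and last trading days available in the year; the prices are made up rather than fetched from Yahoo Finance; and the rule that a positive change means “Profitable” is a guess at how the labels were derived. The function name `annual_change` is hypothetical.

```python
from datetime import date

def annual_change(closes, year):
    """Price change and percentage return over one calendar year.

    `closes` maps datetime.date -> closing price. The first and last
    available trading days of the year are used as the endpoints.
    """
    days = sorted(d for d in closes if d.year == year)
    if len(days) < 2:
        raise ValueError(f"not enough price data for {year}")
    first, last = closes[days[0]], closes[days[-1]]
    change = last - first
    return change, change / first

# Made-up closing prices for illustration only.
closes = {
    date(2019, 1, 2): 100.0,
    date(2019, 6, 28): 120.0,
    date(2019, 12, 31): 110.0,
    date(2020, 1, 2): 111.0,
}
change, pct = annual_change(closes, 2019)
# Assumed labeling rule: a positive change counts as "Profitable".
label = "Profitable" if change > 0 else "Unprofitable"
```

The raw price difference matches the description above, while the percentage return is included because it is comparable across stocks trading at different price levels.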

First, the TfidfTransformer and TfidfVectorizer from scikit-learn were used to extract the principal features from the MD&A sections. Each MD&A section was split into a list of individual words, the stop words were removed, and each remaining word was stemmed. Then, each MD&A section was converted back into a single string and vectorized.
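To make the weighting concrete, the sketch below computes TF-IDF by hand in plain Python instead of calling scikit-learn. It assumes the smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which is the variant scikit-learn documents as its default. Stemming is left out because it needs an external stemmer, and the stop-word list and sample sentences are made up.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "our", "was", "were"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming here)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term count times smoothed idf: ln((1 + n) / (1 + df)) + 1
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors

# Made-up sentences standing in for MD&A sections.
docs = [
    "Revenue increased due to higher product sales.",
    "Revenue decreased due to restructuring costs.",
    "Operating costs increased in the fourth quarter.",
]
vectors = tfidf(docs)
```

In the first sample document, a word that appears in only one document (“higher”) receives a larger weight than a word shared with another document (“revenue”), which is the property that makes TF-IDF useful for separating filings.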

The dataset was split 80% for training and 20% for testing.

A Naïve Bayes classifier, which is based on conditional probability with an additional “naïve” assumption of conditional independence between features (Pedregosa et al., n.d.), was then trained to predict whether a company would be profitable or unprofitable in the following year from the features of the previous year's MD&A. On the testing set, it reached 69.5% accuracy and an F1 score of 0.820.

However, the trained model was not able to predict any “Unprofitable” labels.
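The two reported scores are what a model that labels every company “Profitable” would produce. If every prediction is “Profitable”, accuracy equals the share of profitable companies in the testing set (69.5%), recall is 1, and precision is 0.695, which gives an F1 score of about 0.820. The sketch below checks this arithmetic; it assumes “Profitable” is the positive class, and the testing-set size of 1,000 is hypothetical.

```python
def scores(y_true, y_pred, positive="Profitable"):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical testing set of 1,000 companies, 69.5% of them profitable.
y_true = ["Profitable"] * 695 + ["Unprofitable"] * 305
y_pred = ["Profitable"] * 1000  # a model that never predicts "Unprofitable"

accuracy, precision, recall, f1 = scores(y_true, y_pred)
print(f"accuracy={accuracy:.3f} f1={f1:.3f}")  # accuracy=0.695 f1=0.820
```

Read this way, the 69.5% accuracy reflects the class imbalance in the data rather than any signal learned from the MD&A text.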

To fix the issue, an attempt was made to remove the words that were common to all of the word lists. However, after stemming and stop-word removal, no word appeared in every MD&A entry; the last few results are shown below.
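One way to read this step is as a running intersection: the set of words shared by all entries seen so far shrinks as more entries are processed, until nothing is left. The sketch below illustrates that reading. It is an assumption about the procedure, the function name `running_common_words` is hypothetical, and the stemmed tokens are made up rather than taken from the filings.

```python
def running_common_words(token_lists):
    """Yield the set of words shared by all entries seen so far."""
    common = None
    for tokens in token_lists:
        words = set(tokens)
        common = words if common is None else common & words
        yield set(common)

# Made-up stemmed tokens, not taken from the actual filings.
entries = [
    ["result", "oper", "the", "relat", "new"],
    ["oper", "result", "the", "relat"],
    ["result", "oper", "market"],
    ["sale", "cost", "year"],
]
history = list(running_common_words(entries))
words_in_every_entry = history[-1]
```

A single entry that lacks a word is enough to remove it from the intersection, which is why an empty result becomes likely as the number of filings grows.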

Therefore, the 1,000 most frequent words were removed from all MD&A entries to make the vectorized features more distinctive. However, the new model was still unable to predict “Unprofitable” labels.


{'the', 'sale', 'due', 'cost', 'includ', 'In', 'product', 'oper', 'term', 'primarili', 'chang', 'relat', 'market', 'year', 'provid', 'signific', 'expens', 'addit', 'new', 'result'}


{'addit', 'In', 'primarili', 'the', 'oper', 'relat', 'market', 'new', 'provid', 'signific', 'cost', 'result', 'includ'}


{'addit', 'In', 'primarili', 'the', 'oper', 'relat', 'market', 'new', 'provid', 'result', 'includ'}


{'the', 'oper', 'new', 'relat', 'market', 'result'}


{'the', 'oper', 'new', 'relat', 'result'}


{'result', 'the', 'oper', 'relat'}


{'result', 'the', 'oper'}


{'result', 'oper'}
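The follow-up step of dropping the most frequent words could be sketched with a counter over the whole corpus. This assumes frequency means total occurrences across all entries; the article does not say whether total counts or document counts were used. The function name `drop_most_frequent` and the sample tokens are hypothetical.

```python
from collections import Counter

def drop_most_frequent(token_lists, n):
    """Remove the n most frequent words (by total count) from every entry."""
    totals = Counter(word for tokens in token_lists for word in tokens)
    frequent = {word for word, _ in totals.most_common(n)}
    return [[w for w in tokens if w not in frequent] for tokens in token_lists]

# Made-up stemmed tokens; the study removed 1,000 words, this example removes 2.
entries = [
    ["result", "oper", "result", "litig"],
    ["oper", "result", "growth"],
    ["result", "oper", "impair"],
]
filtered = drop_most_frequent(entries, n=2)
```

In this toy example only the rarer words remain, which mirrors the intent of making each entry's features more distinctive.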






Subsequently, sentiment analysis was conducted to determine if the sentiment of the MD&A sections could be utilized to predict stock returns of the next year.

Using TextBlob, each MD&A entry was assigned a sentiment score, with -1 being the most negative and 1 being the most positive sentiment.

Then, four classifiers were trained on the sentiment scores: k-nearest neighbors, decision tree, support vector machine (SVM), and neural network classifiers. Some of these classifiers were able to predict “Unprofitable” labels, but they performed worse than the Naïve Bayes model in terms of accuracy and F1 score.

To further understand how to improve the model, the relationship between the sentiment scores and annual returns was analyzed. By plotting 100 subsamples, 1,000 subsamples, and all of the samples, it was discovered that the sentiment scores from the MD&A section had almost no correlation (R-squared of 6.085 × 10⁻⁵) with the annual returns.
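For a straight-line fit with a single predictor, R-squared equals the squared Pearson correlation between the two variables. The sketch below computes it that way, assuming the reported value came from such a simple linear fit of annual return on sentiment score. The function name `r_squared` is hypothetical and the data points are made up.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R-squared of a one-variable linear fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when one variable is constant")
    return sxy * sxy / (sxx * syy)

# Made-up sentiment scores in [-1, 1] and annual returns.
sentiment = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]
returns = [0.05, -0.20, 0.30, 0.02, -0.10, 0.12]
print(r_squared(sentiment, returns))
```

An R-squared on the order of 10⁻⁵ means that a linear fit on the sentiment score explains a negligible fraction (about 0.006%) of the variance in annual returns.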

In essence, neither natural language processing (NLP) vectorization nor sentiment analysis of the MD&A section produced robust predictions of the following year's returns. This may stem from the subjective and unaudited nature of the MD&A as well as the lack of correlation between MD&A sentiment and annual returns.

Consequently, the content of the MD&A in the 10-K filings may not be as indicative of stock performance as previously imagined.

Written by Calvin Ma

Edited by Alexander Fleiss, Gihyen Eom, Jared Nussbaum, Kevin Ma, Rohan Mehta, Serena Yu & Michael Ding


Hargrave, M. (2020, July 2). Why You Should Read a 10-K's Management Discussion and Analysis (MD&A). Investopedia.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, É. (n.d.). Naive Bayes. scikit-learn.
