Hi, my name is Hailey Muñiz and this is Stockwell.
00:03
It's a project where I am using machine learning to measure the financial health of S&P 500 companies.
00:08
Now, this is my first machine learning project, and I want to explore whether we can predict a company's near-future financial stability using a simple, transparent metric.
00:17
Now the problem is that people like investors and the public often struggle to understand if a company is financially healthy just by looking at raw stock data.
00:28
A lot of existing scoring systems are private or too complex.
00:32
So my question is, can we use financial indicators and machine learning to create a clear standard for scoring the financial health of these companies?
00:40
This uses indicators like EBITDA, revenue growth, market capitalization, and current stock price.
00:49
Now going into the methodology and some documentation for the data sourcing, I used the S&P 500 dataset from Kaggle, made by Larxel.
00:59
It's a clean, structured dataset, and it contains financial variables like EBITDA, revenue growth, market cap, and current stock price.
01:06
Right away I noticed that some of the variables were on very different scales and that they were skewed, with a lot of them having outliers.
01:16
To add to this, when I look at the data, EBITDA has a large mean compared to the median because the data is right-skewed.
01:25
This is caused by companies like Apple, Microsoft, and Amazon, since they have much larger EBITDA numbers that pull up the average.
01:31
Most S&P 500 companies have an EBITDA that's closer to the median of 3 billion dollars.
01:38
For the revenue growth, similar to EBITDA, the data is also right-skewed: the average company grows about 5%, whereas the mode tells us that most companies usually grow at 1.3%.
01:49
Now for the current price, the mean is almost double the median: my mean is 217 dollars, whereas the median is 118 dollars, which shows me that some stocks are so expensive that they pull the mean to nearly double the median.
02:04
And then for the market cap, there are several mode values because the market cap is continuous, so the mode isn't as meaningful here. The large gap between the mean and the median shows how some really big companies influence the S&P 500 as well.
02:20
The median is a more accurate representation of a typical company's market cap.
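The mean-versus-median gap described here can be reproduced on a small sample. This is a minimal sketch using only Python's standard library; the values in `ebitda_billions` are made up for illustration and are not taken from the Kaggle dataset.

```python
import statistics

# Hypothetical EBITDA values in billions of dollars:
# mostly small companies plus two very large ones.
ebitda_billions = [1.5, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 90.0, 130.0]

mean_value = statistics.mean(ebitda_billions)
median_value = statistics.median(ebitda_billions)

# In right-skewed data, a few very large values pull the mean above the median.
is_right_skewed = mean_value > median_value
print(f"mean={mean_value:.2f} median={median_value:.2f} right_skewed={is_right_skewed}")
```

The two large values barely move the median, which is why the median is the better summary of a typical company here.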
02:27
Now I'm going to go into the cleaning and features. I filled in missing values with the median for EBITDA, revenue growth, and full-time employees.
02:38
I also made sure to drop the columns long name, short name, city, country, exchange, and state, since they aren't what I need for the model. As well, I made sure to convert the sector and industry to numeric values before modeling, using one-hot encoding.
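The two cleaning steps, median imputation and one-hot encoding, can be sketched without any external libraries. The rows, the column names `sector` and `ebitda`, and the helper functions `fill_with_median` and `one_hot` are hypothetical and only illustrate the idea; the project itself may have used a data-frame library instead.

```python
import statistics

# Hypothetical rows; None marks a missing value. Column names are illustrative.
rows = [
    {"sector": "Technology", "ebitda": 10.0},
    {"sector": "Energy", "ebitda": None},
    {"sector": "Technology", "ebitda": 2.0},
    {"sector": "Utilities", "ebitda": 4.0},
]

def fill_with_median(rows, column):
    """Replace missing values in `column` with the median of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(known)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

cleaned = one_hot(fill_with_median(rows, "ebitda"), "sector")
print(cleaned[1])
```

The median is used for imputation rather than the mean because, with skewed data like this, the mean would fill gaps with values that are too large.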
02:58
Now the next thing I made is the financial health score, which is inspired by the Altman Z-score, a classic metric for predicting bankruptcy.
03:07
Now I designed this to make it a single easy metric that I could use.
03:13
Now the features that I use are EBITDA, which measures the overall profitability; revenue growth, to show the growth potential;
03:21
current price, which is the valuation; and market cap, which is the size and stability.
03:26
Now based on the data I looked at earlier, given the skew and variation as well as the mean and mode, it was really important for me to standardize each of these using its standard deviation.
03:37
And I combined all of the features I used (EBITDA, revenue growth, current price, and market cap), which is what gives me my financial health score.
03:46
Now a higher financial health score means that a company is doing well financially, and vice versa. This score serves as the target for my random forest regressor model, which I'll go into more as well.
04:04
This works because it provides a comparative measure across companies and simplifies analysis by capturing multiple dimensions of financial health.
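The talk does not spell out the exact formula for the score, so the following is only a sketch under a stated assumption: each feature is standardized to a z-score, and the four z-scores are added with equal weights. The feature values, the helper `z_scores`, and the equal weighting are all hypothetical; the actual formula may weight the features differently.

```python
import statistics

def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical feature values for three companies (not from the dataset).
features = {
    "ebitda": [1.0, 3.0, 5.0],
    "revenue_growth": [0.01, 0.05, 0.09],
    "current_price": [50.0, 100.0, 150.0],
    "market_cap": [10.0, 20.0, 30.0],
}

# ASSUMPTION: equal weights and a plain sum; the real formula may differ.
standardized = {name: z_scores(vals) for name, vals in features.items()}
health_scores = [sum(col[i] for col in standardized.values()) for i in range(3)]
print(health_scores)
```

Standardizing first matters because market cap and revenue growth sit on very different scales; without it, the largest-scale feature would decide the score on its own.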
04:16
Next, for my model I chose a random forest regressor.
04:19
Now I picked this model because it handles nonlinear relationships.
04:24
As well, it's really great for handling messy financial data.
04:27
As well, it's resistant to overfitting on smaller datasets.
04:29
I am working with the S&P 500 dataset from Kaggle.
04:35
I'm looking at four features.
04:37
So for me I made sure to split the data set into training and test sets.
04:40
The features I'm looking at are EBITDA, revenue growth, current price, and market cap, and I made sure that they are standardized as well.
04:46
The target is the financial health score from the formula that I showed on the previous slide.
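The train/test split mentioned here can be sketched in a few lines. This is a minimal standard-library version; the helper name `train_test_split`, the 20% test fraction, and the seed are assumptions for illustration, since the talk does not state the split ratio or the tooling used.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for ten company rows; each row would hold the four
# standardized features plus the financial health score target.
rows = list(range(10))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Holding out the test rows before fitting is what makes the later error metrics an estimate of performance on companies the model has not seen.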
04:52
Now the evaluation metrics are the mean absolute error,
04:56
which is the average prediction error; the root mean squared error, which penalizes large errors; and the R-squared score, which is the proportion of variance explained by the model.
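For reference, the three metrics can be written out directly from their definitions. This is a small standard-library sketch; `y_true` and `y_pred` are made-up numbers chosen so that one large error shows how RMSE reacts more strongly than MAE.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared error; large errors count more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical targets and predictions: three exact hits and one large miss.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(mae, rmse, r2)
```

A gap between RMSE and MAE, as in this toy example, is a sign that a few predictions are far off rather than all predictions being slightly off.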
05:07
Now going on to some of the model implementation analysis.
05:10
Now across the three random forest models, all of them performed similarly, with the MAE between 0.87 and 0.92, the RMSE around 2.15, and the R-squared around 0.55.
05:23
Now the best model was the simplest one with 100 trees that had a max depth of 4.
05:28
Now this shows that, for this dataset, added complexity didn't improve performance. As for key findings, the biggest takeaway is that market capitalization is by far the most important predictor, with over 90% of the feature
05:42
importance coming from it.
05:44
This suggests that company size is a strong indicator of stability, while short-term measures like revenue growth, EBITDA, and stock price
05:52
matter a bit less when we're looking at the financial health score
05:55
of a company long term.
05:58
Now looking at some recommendations and limitations: based on the model's results, organizations, analysts, or anyone using the financial health score approach should prioritize company size, like market capitalization, when assessing financial stability.
06:11
Because the market cap was the strongest predictor of a company's long term standing, it will give a more reliable estimation of future health compared to short term financial movements like daily price changes.
06:21
Now one limitation is that the dataset is small, which limits the model's capability to assess financial performance. As well, the features are limited: you could also look at things like debt ratios, liquidity ratios, cash flow history, and macroeconomic conditions.
06:39
These were not included in the model.
06:41
As well, market cap dominates the model, so predictions may lean too heavily on company size. Also, the dataset only included S&P 500 companies, so the model is based on large firms.
06:54
Now these findings may not apply to smaller emerging companies whose financial health patterns can definitely behave differently.
07:01
Now users should be careful not to overgeneralize the results or assume that company size alone will ensure financial stability in the market.
07:15
Now for next steps, future work could add a wider set of financial metrics to improve prediction accuracy,
07:23
especially liquidity, profitability, and historical financial trends.
07:29
Having more historical data will help the model capture variability.
07:33
Another step is to compare different models, like gradient boosting, neural networks, or additional models, to see if the results remain consistent.
07:41
Out of all the models, the one I'd definitely like to add is a gradient boosting model, or XGBoost.
07:48
Lastly, I'd explore scenario-based predictions, like how market downturns, interest rate changes, or shifts in revenue growth could affect future health score predictions.