NFL Longevity Predictor

Longevity Predictor for players in the NFL based on Combine Data

Final Project Checkpoint

We have cleaned all of our data and run neural network, support vector machine (SVM), random forest, and logistic regression models on it, all with the goal of predicting how many games each player will play in their NFL career. We chose these models because research indicates that they are well suited to sports performance prediction (Şimşek, M., & Kesilmiş, 2022).

One modification we made to our classification labels after the midterm report is that each label now covers a range of NFL games played. Instead of using each individual’s exact number of games played, we assigned all players who played 0 to 50 games to one label, 51-100 games to another label, 101-150 to another label, and so on. We made this change because our models had poor accuracy when they tried to predict the exact number of games a player would play, which is too precise a target. A general range was much easier for the models to predict accurately. We simply state that playing 0-50 games is a short career, 51-100 is a mid-length career, and so on.

The data we are working with includes all players drafted from 2000-2012. We use this range so that the players we run our models on entered the NFL long enough ago that we can see their whole career, not just its start. This gives us a better understanding of exactly how long each player lasted in the NFL, allowing us to make more accurate predictions about new draftees.

When cleaning the data, we focused on a few things. First, we removed players who had two or fewer stats reported, meaning they participated in at most two events at the combine. We took these players out because they do not provide enough data to make accurate predictions.
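
The range labels and the two-event filter described above can be sketched roughly as follows. This is a minimal illustration, not our actual notebook code: the function names, the dictionary layout, and the sample players are all hypothetical, and missing event results are assumed to be stored as `None`.

```python
BUCKET_SIZE = 50  # games per label, matching the 0-50 / 51-100 / 101-150 ranges


def games_to_label(games_played: int) -> int:
    """Map a career games total to a range label: 0-50 -> 0, 51-100 -> 1, ..."""
    if games_played < 0:
        raise ValueError("games_played must be non-negative")
    # Subtracting 1 keeps the upper edge (50, 100, ...) in the lower bucket.
    return max(games_played - 1, 0) // BUCKET_SIZE


def has_enough_events(results: dict, min_events: int = 3) -> bool:
    """Keep a player only if more than two combine events were recorded."""
    recorded = sum(1 for value in results.values() if value is not None)
    return recorded >= min_events


# Hypothetical rows; names and values are invented for illustration only.
players = [
    {"name": "Player A", "games": 0,
     "events": {"forty": 4.5, "bench": None, "vertical": None}},
    {"name": "Player B", "games": 51,
     "events": {"forty": 4.6, "bench": 20, "vertical": 33.0}},
    {"name": "Player C", "games": 150,
     "events": {"forty": 5.1, "bench": 28, "vertical": 27.5, "broad": None}},
]

# Drop players with two or fewer recorded events, then label the rest.
kept = [p for p in players if has_enough_events(p["events"])]
labels = {p["name"]: games_to_label(p["games"]) for p in kept}
print(labels)
```

Here Player A is dropped because only one event was recorded, and the remaining players receive an integer label for their games-played range.
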
If players participated in more than two events but were still missing an event or two, we filled in the missing results using the position average for each test. For example, if a running back was missing the 40-yard dash, we assigned them the average 40-yard dash time of the running back group. This is how we dealt with missing features. We also ran PCA on our features to reduce the dimensionality of the data. We did this to keep only the components that capture most of the variance in our features, so that our models ran faster and were easier to visualize on a graph.

Second, we added a position attribute to each player. We did this because each position has different combine statistics on average, and some positions do not perform certain events. For example, quarterbacks do not bench press. Therefore, we grouped quarterbacks together so that we could ensure that the bench press did not have an effect on their predicted time in the NFL. Similarly, a slower 40-yard dash time should not affect the output for a lineman as much as it should for a wide receiver, because linemen do not have to be as fast. Labeling by position helped with that.

Once our data was cleaned, we ran our models on it using Google Colab. The models reached the following accuracies:

SVM: 45%
Random forest classifier: 38%
Logistic regression: 44%
Neural network: 77%

The neural network outperformed all of the other models. The data was quite complicated and noisy: it was difficult to predict with regression and was certainly not linearly separable. The neural network was better at generalizing from the data and predicted the games-played range with fewer errors. This was partially due to us having enough training data to train an effective neural network. While SVM and random forest typically work better with less input data, neural networks tend to outperform the two when enough data is present, which is what happened in our case.
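
The position-average imputation described earlier in this section can be sketched as below. This is an illustrative version under stated assumptions, not our actual cleaning code: the `impute_by_position` helper, the field names, and the sample rows are hypothetical, and missing results are assumed to be stored as `None`. Leaving a value empty when nobody at that position has a recorded result (such as the bench press for quarterbacks) is also an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean


def impute_by_position(players, events):
    """Fill missing (None) event results with the mean for the player's position."""
    # Collect the recorded values for each (position, event) pair.
    observed = defaultdict(list)
    for player in players:
        for event in events:
            value = player.get(event)
            if value is not None:
                observed[(player["position"], event)].append(value)

    position_means = {key: mean(values) for key, values in observed.items()}

    filled = []
    for player in players:
        row = dict(player)  # copy so the input rows are left unchanged
        for event in events:
            if row.get(event) is None:
                # Stays None when no one at this position has a recorded value.
                row[event] = position_means.get((row["position"], event))
        filled.append(row)
    return filled


# Hypothetical rows; values are invented for illustration only.
players = [
    {"name": "RB 1", "position": "RB", "forty": 4.40, "bench": 20},
    {"name": "RB 2", "position": "RB", "forty": 4.60, "bench": 24},
    {"name": "RB 3", "position": "RB", "forty": None, "bench": 22},
    {"name": "QB 1", "position": "QB", "forty": 4.80, "bench": None},
]

filled = impute_by_position(players, ["forty", "bench"])
print(filled[2]["forty"])  # average of the two recorded RB times
print(filled[3]["bench"])  # no QB bench results exist, so this stays None
```
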

Using this platform, we will be able to predict how long the next draft class will last in the NFL. The metrics that we will use to show this are how many games they will play and how long they will play. We can then use these predictions to decide which players we would want to draft based on what the stats show. To apply this today, we can look at how recently drafted players, such as George Pickens, Kenny Pickett, and Breece Hall, will fare in the current NFL given how well they did at the combine. Looking into the future, we should be able to quantify how many seasons a player like Jeff Sims will last once he enters the draft and see whether he looks like someone worth drafting.

Just for fun, we also did some of the combine events ourselves and ran the models on our data. We determined that all of us will be in the 0-50 game bucket, but probably more towards the zero side. Considering only one of us could get a single rep of 225 pounds on the bench press, we should probably stick to school.

Contribution Table

Ahmed: Ran NN models and Random Forest on Jupyter Notebook and tuned parameters
Shonjoy: Ran SVM models on Jupyter Notebook and wrote final report
Joe: Analysed and cleaned data to be used for ML projects and made final presentation video
Shantanu: Scraped data from the web from mentioned sources and ran linear regression model

Midterm Project Checkpoint

Currently, we have finished cleaning all of our data and have run neural network, support vector machine (SVM), and deep learning models on it, all with the goal of predicting how many games these players will play in their NFL career. We chose these models because research indicates that they are well suited to sports performance prediction (Şimşek, M., & Kesilmiş, 2022).

The data we are working with includes all players drafted from 2000-2012. We use this range so that the players we run our models on entered the NFL long enough ago that we can see their whole career, not just its start. This gives us a better understanding of exactly how long each player lasted in the NFL, allowing us to make more accurate predictions about new draftees.

When cleaning the data, we focused on a few things. First, we removed players who had two or fewer stats reported, meaning they participated in at most two events at the combine. We took these players out because they do not provide enough data to make accurate predictions. If players participated in more than two events but were still missing an event or two, we filled in the missing results using the position average for each test. For example, if a running back was missing the 40-yard dash, we assigned them the average 40-yard dash time of the running back group. This is how we dealt with missing features. We also ran PCA on our features to reduce the dimensionality of the data. We did this to keep only the components that capture most of the variance in our features, so that our models ran faster and were easier to visualize on a graph.

Second, we added a position attribute to each player. We did this because each position has different combine statistics on average, and some positions do not perform certain events. For example, quarterbacks do not bench press. Therefore, we grouped quarterbacks together so that we could ensure that the bench press did not have an effect on their predicted time in the NFL. Similarly, a slower 40-yard dash time should not affect the output for a lineman as much as it should for a wide receiver, because linemen do not have to be as fast. Labeling by position helped with that.

Once our data was cleaned, we ran our models on it using Google Colab. Currently, our neural network reaches an R2 score of 72% on its predictions for our test data. Our SVM and deep learning models are not there yet, so we are still testing them and doing feature reduction to improve their scores.
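
For readers unfamiliar with the R2 score mentioned above, the sketch below computes the standard coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares. It is an illustration of the metric only: the `r2_score` function here is a hand-written stand-in, the sample numbers are invented, and this checkpoint does not state which library routine produced the reported score.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the true values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Invented games-played values for four hypothetical players.
actual_games = [10, 60, 110, 160]
predicted_games = [20, 50, 120, 150]

score = r2_score(actual_games, predicted_games)
print(round(score, 3))
```

A score of 1 means the predictions match the true values exactly, while a score of 0 means the model does no better than always predicting the average.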

Once we have this relationship, we should be able to give an estimate for a new player in the draft. The metrics that we will use to show this are how many games they will play and how long they will play. We can then use these predictions to decide which players we would want to draft based on what the stats show. To apply this today, we can look at how recently drafted players, such as George Pickens, Kenny Pickett, and Breece Hall, will fare in the current NFL given how well they did at the combine. Looking into the future, we should be able to quantify how many seasons a player like Jeff Sims will last once he enters the draft and see whether he looks like someone worth drafting.

Contribution Table

Ahmed: Ran models on Jupyter Notebook and tuned parameters
Shonjoy: Ran models on Jupyter Notebook and wrote midterm report
Joe: Analysed and cleaned data to be used for ML projects
Shantanu: Scraped data from the web from mentioned sources

Project Proposal

Our project is to make a model that can predict how long a player can play in the NFL using data from the NFL Combine. Prior research has concluded that it is possible to predict player performance and career longevity using past results from the NFL Combine (Asprey et al., 2020; Vincent et al., 2019). The NFL Combine is one of the most significant tools used by scouts today to help determine who they should draft for their team. Each player goes through a series of physical tests, and scouts use the results to build a scouting report that helps determine whether the player is qualified for his position (Hedlund, 2018). As the NFL has grown, there have been questions about whether a player’s performance in the Combine correlates with how well they succeed and how long they last in the league (LaPlaca, 2020). If that correlation does exist, is it possible to use players’ Combine reports to determine how new players coming into the NFL will fare (Pollock, 2021)?

Our goal is to create a Combine predictor to determine the future of the new NFL class. We want to be able to look at these young players and predict how their careers will pan out, and hopefully quantify their fitness to assist with NFL scouting. This can help us pick players we believe will stay in the league for a long time and do well, so that teams get a good return on investment.

The data set we are working with tracks every player’s Combine stats, such as 40-yard dash time and vertical jump, over the last 35 years. We then compare that data set with the same players’ NFL stats, focusing on those that indicate sustainability and success, such as snap count, years played, games played, and Pro Bowls. The methods we are considering include SVM (Support Vector Machines) for classification and regression, since research indicates that this model is suitable for sports performance prediction (Şimşek, M., & Kesilmiş, 2022).

One result we are looking for is a relationship between players’ scouting reports and their Pro Bowls and longevity. Once we have this relationship, we should be able to give an estimate for a new player in the draft. The metrics that we will use to show this are the number of Pro Bowls that a new player will probably receive, how many years they will play, and how much time they will play. We can then use these predictions to decide which players we would want to draft based on what the stats show. To apply this today, we can look at how recently drafted players, such as George Pickens, Kenny Pickett, and Breece Hall, will fare in the current NFL given how well they did at the Combine. Looking into the future, we should be able to quantify how many seasons a player like Jeff Sims will last once he enters the draft and see whether he looks like someone worth drafting.

References

Asprey W L, Foley B M, Makovicka J L. A 10-year evaluation of the NFL combine. Do combine results correlate with career longevity for NFL offensive players? Phys Ther Rehabil. 2020;7:8. doi: 10.7243/2055-2386-7-8
Vincent LM, Blissmer BJ, Hatfield DL. National Scouting Combine Scores as Performance Predictors in the National Football League. J Strength Cond Res. 2019 Jan;33(1):104-111. doi: 10.1519/JSC.0000000000002937. PMID: 30358695.
Şimşek, M., & Kesilmiş, İ. (2022, January 1). Predicting athletic performance from physiological parameters using machine learning: Example of Bocce Ball. Journal of Sports Analytics. Retrieved October 8, 2022, from https://content.iospress.com/articles/journal-of-sports-analytics/jsa200617
Pollock, Jordan Riley et al. “Can NFL Combine Results be Used to Estimate NFL Defensive Players Longevity?.” Sports medicine international open vol. 5,2 E59-E64. 10 Aug. 2021, doi:10.1055/a-1485-0031
LaPlaca, David A, and Bryan A McCullick. “National Football League Scouting Combine Tests Correlated to National Football League Player Performance.” Journal of strength and conditioning research vol. 34,5 (2020): 1317-1329. doi:10.1519/JSC.0000000000003479
Hedlund, David P. “Performance of Future Elite Players at the National Football League Scouting Combine.” Journal of strength and conditioning research vol. 32,11 (2018): 3112-3118. doi:10.1519/JSC.0000000000002252

Project Timeline Chart and Task Assignment

https://docs.google.com/spreadsheets/d/12l_n6g7ghG1ML6eQuEH1AFo9VHdABnvW6aZxb_c0lxE/edit?usp=sharing

Contribution Table

Ahmed: Created Github Page and did literature review
Shonjoy: Created Proposal Video
Rohan: Researched Potential Datasets for Model and worked on Proposal Write Up in collaboration with Shantanu
Joe: Created Gantt Chart to organize Project deadlines
Shantanu: Researched Potential Datasets for Model and worked on Proposal Write Up in collaboration with Rohan

Project Proposal Video

https://youtu.be/eD4rLBb4FOw

Dataset(s)

https://nflcombineresults.com/nflcombinedata.php?year=1987&pos=&college=