
From The Hardball Times: A Decision Tree Approach to Pitch Prediction

“A Decision Tree Approach to Pitch Prediction” is an article about using decision trees to predict pitch type from PITCHf/x data. The problem here is that the bar for producing useful results is set very high. For any pitcher, the obvious first step in predicting pitch type is to look at pitch percentages: if a pitcher throws fastballs (two- or four-seamers, not cutters) 65% of the time, then always guessing fastball is already right 65% of the time, and that is the rate any prediction method must beat to be useful. There are some good comments at the end of the article that emphasize this point. I left the following comment:
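To make that baseline concrete, here is a minimal sketch in Python with pandas; the DataFrame and the pitch-type codes are hypothetical stand-ins for a real PITCHf/x extract:

```python
import pandas as pd

# Hypothetical pitch log: one row per pitch, labeled with a pitch-type code
# (FF = four-seam fastball, FT = two-seam fastball, SL = slider, CH = change-up).
pitches = pd.DataFrame({
    "pitch_type": ["FF", "FF", "SL", "FF", "CH", "FF", "FT", "SL", "FF", "FF"]
})

# Lump two- and four-seamers together as "fastball".
is_fastball = pitches["pitch_type"].isin(["FF", "FT"])

# Always guessing the majority class is right this often; any model
# has to beat this number to be worth using.
baseline = max(is_fastball.mean(), 1 - is_fastball.mean())
print(f"Majority-class baseline: {baseline:.1%}")
```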

‘This is a very interesting and well written article. It shows that there is a lot of value in PITCHf/x data. I have a few suggestions for refining your research.

1. I second Matt’s suggestion: “On another note, one possible way to improve the error rate of the model would be to default to fastball unless the alternative had a large enough sample and high enough percentage of non-fastball use. While it looks like you already did that to a certain degree, raising the minimum requirements for a non-fastball guess could improve the accuracy (even if it means the model predicts fastballs 80% of the time even when we know the pitcher only throws fastballs 60% of the time).” Such a threshold will also help with the overfitting problems.

2. You should also consider using a random forest algorithm, which is an ensemble of decision trees. There are free packages for this algorithm available in various programming languages. Also, Wise.io (http://www.wise.io/) has a web-based implementation (note that it is not free).

3. I suspect that your modeling of batters faced is causing problems. You may want to use something simpler such as batter’s place in the order (top third, middle third, bottom third), a category of pinch hitters, and discard pitchers from the data.

4. For situations in which your model does not meet the threshold mentioned in #1, you should consider a second random forest trained on this data with shallower trees and less refined attributes. The goal here is to generate predictions that are statistically meaningful yet still better than raw pitch percentages.’
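To illustrate suggestions #1 and #2 above, here is a rough sketch using scikit-learn’s RandomForestClassifier. The features, labels, and the 0.60 confidence cutoff are all invented for illustration; real work would use encoded PITCHf/x situational data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in features (count, outs, runners, previous pitch, etc.) and labels:
# 1 = fastball, 0 = off-speed. Random data substitutes for real PITCHf/x rows.
rng = np.random.default_rng(0)
X = rng.random((2000, 8))
y = (rng.random(2000) < 0.65).astype(int)  # roughly 65% fastballs

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Suggestion #1: only guess off-speed when the forest is confident enough;
# otherwise default to fastball. The 0.60 cutoff is arbitrary and would be
# tuned so off-speed guesses stay both frequent enough and accurate.
proba_offspeed = forest.predict_proba(X)[:, 0]  # classes_ == [0, 1]
guesses = np.where(proba_offspeed > 0.60, 0, 1)
print(f"Off-speed guesses: {(guesses == 0).mean():.1%} of pitches")
```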

For those who know both machine learning algorithms and baseball, there are some obvious methods that may be more promising, as well as ways to improve the author’s analysis. I have a project using machine learning for baseball stats that has unfortunately been relegated to the back burner by other machine learning projects that consume most of my time. However, this article gives me some new ideas to pursue.

Below is an excerpt from the article.

The examples I’ve mentioned suggest that the patterns are there, and that if we look hard enough, we’ll find them.

We also have a ton of information that gives us the appropriate context to identify the patterns we’re looking for. The PITCHf/x database contains records for pitch selection, situational information, events preceding a pitch, and pitch outcomes. We can’t really ask for much more than that.

All of this information initially can seem overwhelming. Using it to identify patterns by hand would be time-consuming, and we would end up missing things.

Sometimes the patterns we see are obvious, as was the case with Greinke. But what if a pitcher is extremely predictable in a situation we aren’t prone to notice? How often do you think Greinke throws fastballs in two-strike counts when he’s just thrown a breaking ball and there’s a runner on third?

Instead of doing things by hand, we can use a model to do our pattern recognizing for us. This model should be flexible, allowing us to throw in many different bits of information, and it would use the most important factors we provide it with to make predictions. Once we have this model in place, we’ll show it a bunch of data specific to one pitcher. The model will arrange the data in the way that best predicts the next pitch to be thrown.

After asking around*, I decided to work with a decision tree. Decision trees are great at taking a bunch of data, picking up on trends, and displaying the data in a way that allows its viewer to follow these trends.

One clear benefit to a decision tree, as opposed to other machine learning techniques, is that its mechanics are pretty easy to understand. The data start at the top of the tree and get filtered through the tree’s branches. At each level, the tree sorts the data through various yes/no questions as it refines its prediction. The most important questions are asked at the top of the tree, and the questions asked toward the bottom refine the tree’s initial guesses.
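As a sketch of that mechanic, a small tree can be trained and its yes/no questions printed directly. The situations, features, and labels below are invented; scikit-learn’s export_text shows the top-down filtering the author describes:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy situations: [balls, strikes, batter_is_righty, runner_on_base],
# each labeled with the pitch that followed (invented data).
X = [[0, 0, 1, 0], [3, 2, 1, 1], [0, 2, 0, 0], [2, 0, 1, 1],
     [1, 2, 1, 0], [3, 0, 0, 1], [0, 1, 1, 0], [2, 2, 0, 1]]
y = ["FF", "FF", "SL", "FF", "SL", "FF", "FF", "CH"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The printed rules read top-down: the most informative question first,
# with later questions refining the initial guess.
print(export_text(tree, feature_names=["balls", "strikes", "righty", "runner_on"]))
```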

A quick example: Let’s say the first branch of Greinke’s tree is the handedness of the hitter he’s facing. If the hitter is a righty, Greinke’s overall pitch distribution changes a bit. He doesn’t throw his change-up much (about 4.5 percent of the time, vs. 12 percent overall**), and he throws his slider more often. Against lefties, the opposite is true. After filtering the data through this first branch, our guess improves as we move from his overall distribution to his handedness-specific distribution.
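That first branch is just a conditional pitch distribution, which is easy to verify directly from the data. Here is a sketch with pandas over an invented pitch log (stand is the PITCHf/x code for the batter’s side of the plate):

```python
import pandas as pd

# Invented pitch log: pitch type plus the batter's handedness.
df = pd.DataFrame({
    "pitch_type": ["FF", "CH", "SL", "FF", "CH", "FF", "SL", "FF", "CH", "FF"],
    "stand":      ["R",  "L",  "R",  "R",  "L",  "L",  "R",  "R",  "L",  "L"],
})

# Pitch mix conditioned on handedness: the tree's first split performs
# exactly this kind of conditioning, automatically.
print(df.groupby("stand")["pitch_type"].value_counts(normalize=True))
```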

Another advantage to the decision tree is that it doesn’t allow useless information to skew our results. For instance, if I included jersey color as a variable, the model’s suggestions wouldn’t change, and the important patterns still would be doing the predicting.
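This claim can be checked by training a tree on a deliberately useless feature. In the sketch below, a noise column standing in for jersey color earns essentially zero importance because it never produces a useful split (data and features invented):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 1000
strikes = rng.integers(0, 3, n)           # informative: drives the pitch call
jersey = rng.integers(0, 2, n)            # pure noise, like jersey color
y = np.where(strikes == 2, "FF", "SL")    # fastball in two-strike counts

X = np.column_stack([strikes, jersey])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The noise feature gets ~zero importance; the signal does the predicting.
print(dict(zip(["strikes", "jersey_color"], tree.feature_importances_)))
```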

If the model were reliable, you could put it to use right away. A big league coach, with a single sheet of paper in his hand, could follow the game and signal in pitch guesses when the situation called for it. Pretty cool, right?

The entire article can be read here.

