PITCHf/x Introduction
PITCHf/x is an increasingly used tool in MLB, but daily gamers may not be sure how to apply it to evaluating pitchers and identifying breakouts. There are a number of sources for PITCHf/x information, many of them free, and knowing where to find reliable data is a critical starting point. Baseball Prospectus, Brooks Baseball, FanGraphs, and Texas Leaguers all provide free PITCHf/x data. Of that quartet, the only site I don’t visit regularly is Texas Leaguers. That’s not to say there’s anything wrong with the site (there isn’t); it’s worth having bookmarked as a point of comparison for the PITCHf/x data found elsewhere.
Depending on the source of PITCHf/x data, pitch type classifications can differ. Texas Leaguers, as its site notes, takes its data directly from the Gameday data published by MLB Advanced Media (MLBAM) and leaves it unchanged. Brooks Baseball explains its data in depth on its “About” page. To summarize, its dataset starts with the PITCHf/x data made publicly available by MLBAM and Sportvision, but Brooks Baseball reviews that information manually and more rigorously, and makes changes when necessary. As a result, MLBAM may classify a particular breaking ball as a slider while Brooks Baseball changes that classification to a curveball, for instance. Those changes can add up over the course of a season.
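To make the idea concrete, here is a minimal Python sketch of layering manual corrections on top of automated labels. The pitch IDs, the two-letter type codes, and the override table are all hypothetical; this illustrates the general concept, not Brooks Baseball’s actual workflow.

```python
from collections import Counter

# Hypothetical automated labels: (pitch_id, pitch_type_code).
auto_labels = [(1, "SL"), (2, "SL"), (3, "FF"), (4, "SI"), (5, "SL")]

# Hypothetical manual corrections keyed by pitch_id, e.g. after reviewing video.
manual_overrides = {2: "CU", 3: "SI"}


def apply_overrides(labels, overrides):
    """Return (pitch_id, type) pairs, using the reviewed type wherever one exists."""
    return [(pid, overrides.get(pid, ptype)) for pid, ptype in labels]


reviewed = apply_overrides(auto_labels, manual_overrides)
before = Counter(ptype for _, ptype in auto_labels)  # counts of the automated labels
after = Counter(ptype for _, ptype in reviewed)      # counts after manual review
```

In this toy sample, two corrections move one slider to the curveball column and one fourseam fastball to the sinker column. Over a full season of pitches, per-pitch changes like these accumulate into season-level gaps between sources.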
Using National League Cy Young Award winner Jake Arrieta as an example, here’s his Texas Leaguers landing page with the PITCHf/x date range set to last season, and here is his Brooks Baseball landing page for the same time frame. Texas Leaguers credited Arrieta with throwing 1,335 sinkers, while Brooks Baseball credited him with 1,574, a gap of over 200 pitches. Conversely, Brooks Baseball credited him with only 268 fourseam fastballs, while Texas Leaguers classified 581 of his pitches as fourseam fastballs (as an aside, Texas Leaguers classified 13 pitches simply as “fastball”). The difference in his slider totals is less dramatic (1,085 at Texas Leaguers and 1,065 at Brooks Baseball), but discrepancies extend to every pitch in the righty’s repertoire. One of the reasons I prefer Brooks Baseball over Texas Leaguers is its use of video and other tools to manually adjust pitch classifications. For this lesson, I won’t reference Texas Leaguers again, but it’s good for gamers to know the site is available for PITCHf/x data.
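Lining up two sources side by side is simple arithmetic, but a short Python sketch makes the comparison repeatable for any pitcher. The counts are the figures quoted in this section (so the rest of Arrieta’s repertoire is left out), and the helper name `classification_gaps` is hypothetical.

```python
# Arrieta's season pitch-type counts as quoted in this section (partial repertoire only).
texas_leaguers = {"sinker": 1335, "fourseam": 581, "fastball": 13, "slider": 1085}
brooks_baseball = {"sinker": 1574, "fourseam": 268, "slider": 1065}


def classification_gaps(a: dict, b: dict) -> dict:
    """Return {pitch_type: b_count - a_count}; a type missing from one source counts as 0."""
    return {t: b.get(t, 0) - a.get(t, 0) for t in sorted(set(a) | set(b))}


gaps = classification_gaps(texas_leaguers, brooks_baseball)
for pitch_type, gap in gaps.items():
    print(f"{pitch_type:>9}: {gap:+d}")
```

Positive numbers mean Brooks Baseball counted more. Running it shows +239 for sinkers, -313 for fourseam fastballs, -20 for sliders, and -13 for the Texas Leaguers-only “fastball” bucket.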
Baseball Prospectus keeps much of its work behind a subscriber paywall, but the good news is that its PITCHf/x leaderboards are freely available, even to non-paying users. The leaderboards let users sort and parse a ton of information to their heart’s content. The data can be displayed for pitches thrown to both lefties and righties, to lefties only, or to righties only. It can be broken down for all years or a specific year, and narrowed even further by selecting the month of pitches you want displayed. The default leaderboard shows fourseam fastballs, but PITCHf/x data is available for other pitch types as well.
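If you export pitch-level data and want to reproduce these splits yourself, the leaderboard filters amount to simple predicates. This is a minimal Python sketch assuming a hypothetical `Pitch` record with made-up field names; real PITCHf/x exports use their own column names. Here `None` stands in for the leaderboard’s “all” option.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Pitch:
    # Hypothetical fields for illustration; not a real PITCHf/x schema.
    pitcher: str
    pitch_type: str   # e.g. "FF" for a fourseam fastball
    batter_hand: str  # "L" or "R"
    year: int
    month: int


def filter_pitches(
    pitches: Iterable[Pitch],
    pitch_type: str = "FF",             # default mirrors the fourseam-first leaderboard
    batter_hand: Optional[str] = None,  # None = both lefties and righties
    year: Optional[int] = None,         # None = all years
    month: Optional[int] = None,        # None = all months
) -> List[Pitch]:
    """Keep only the pitches that match every filter that was set."""
    return [
        p
        for p in pitches
        if p.pitch_type == pitch_type
        and (batter_hand is None or p.batter_hand == batter_hand)
        and (year is None or p.year == year)
        and (month is None or p.month == month)
    ]


sample = [
    Pitch("Pitcher A", "FF", "L", 2015, 4),
    Pitch("Pitcher A", "FF", "R", 2015, 5),
    Pitch("Pitcher A", "SL", "R", 2015, 5),
    Pitch("Pitcher A", "FF", "R", 2014, 5),
]

all_fastballs = filter_pitches(sample)                 # 3 pitches
vs_righties = filter_pitches(sample, batter_hand="R")  # 2 pitches
may_2015 = filter_pitches(sample, year=2015, month=5)  # 1 pitch
```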
For the sake of daily gaming, the option to sort the leaderboards by starting pitchers, relief pitchers, or both probably doesn’t seem important, but it’s worth noting that if a pitcher worked as both a starter and a reliever, selecting starting pitchers shows only that pitcher’s work as a starter and excludes pitches thrown in relief. That’s notable. For example, Kris Medlen made 15 appearances for the Royals in 2015, and only eight were starts. Lumping all of his pitches together without differentiating between those thrown as a starter and those thrown as a reliever can lead to misleading conclusions. Baseball Prospectus also lets users set pitch minimums ranging from as low as zero to as high as 3,000, which helps weed out small sample sizes. Within the table, there are quite a few sortable columns, but the ones I primarily use are velocity, whiffs per swing (Whf/Sw), and ground balls per ball in play (GB/BIP). Those are the categories I’ll reference in this lesson when highlighting the pitchers I expect to collapse, break out, and surge in strikeout rate in 2016.
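Whf/Sw and GB/BIP are ratios (whiffs divided by swings, ground balls divided by balls in play), and the starter filter matters because it changes which pitches feed those ratios. The Python sketch below assumes a hypothetical pitch record with a `role` field marking whether a pitch came in a start or in relief; Baseball Prospectus’s own data fields and implementation aren’t described here, so treat this purely as an illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pitch:
    pitcher: str
    role: str                          # hypothetical: "SP" if thrown in a start, "RP" if in relief
    swing: bool
    whiff: bool                        # swing and miss (only True when swing is True)
    batted_ball: Optional[str] = None  # "GB", "LD", "FB", or "PU" if put in play, else None


def leaderboard(
    pitches: List[Pitch], role: Optional[str] = "SP", min_pitches: int = 0
) -> Dict[str, dict]:
    """Per-pitcher Whf/Sw and GB/BIP using only pitches thrown in the chosen role."""
    groups: Dict[str, List[Pitch]] = defaultdict(list)
    for p in pitches:
        if role is None or p.role == role:  # role=None combines starts and relief
            groups[p.pitcher].append(p)

    board = {}
    for name, ps in groups.items():
        if len(ps) < min_pitches:  # weed out small samples
            continue
        swings = sum(p.swing for p in ps)
        whiffs = sum(p.whiff for p in ps)
        in_play = [p for p in ps if p.batted_ball is not None]
        grounders = sum(p.batted_ball == "GB" for p in in_play)
        board[name] = {
            "pitches": len(ps),
            "whf_sw": whiffs / swings if swings else None,
            "gb_bip": grounders / len(in_play) if in_play else None,
        }
    return board


sample = [
    Pitch("Pitcher A", "SP", swing=True, whiff=True),
    Pitch("Pitcher A", "SP", swing=True, whiff=False, batted_ball="GB"),
    Pitch("Pitcher A", "SP", swing=False, whiff=False),
    Pitch("Pitcher A", "SP", swing=True, whiff=False, batted_ball="FB"),
    Pitch("Pitcher A", "RP", swing=True, whiff=True),
]

starts_only = leaderboard(sample)               # 4 pitches, Whf/Sw = 1/3
combined = leaderboard(sample, role=None)       # 5 pitches, Whf/Sw = 2/4
too_small = leaderboard(sample, min_pitches=5)  # {} (only 4 pitches as a starter)
```

The single relief pitch nudges the combined Whf/Sw from about .333 to .500, which is the kind of distortion the starter-only filter avoids.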
FanGraphs has its own PITCHf/x leaderboards and also provides PITCHf/x data on each player’s individual player card. At FanGraphs, I look at PITCHf/x data covering a pitcher’s entire body of work rather than data broken down by pitch type. Some of the key stats I’ll analyze and reference in this lesson include O-Swing% (the percentage of pitches outside the strike zone that hitters swing at; I may also call it chase rate, since it measures how often hitters chase pitches out of the zone), Z-Contact% (the percentage of swings at pitches inside the strike zone that make contact), Zone% (the percentage of pitches thrown in the strike zone), F-Strike% (first-pitch strike percentage), and SwStr% (swinging strikes as a percentage of total pitches). I’ll also use the batted ball data provided by FanGraphs. Again, the FanGraphs data referenced will be for all pitches thrown by a pitcher, not broken down by specific pitch type.
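Each of these plate discipline stats is a simple ratio. Here is a minimal Python sketch computing them from hypothetical pitch-level flags. It treats every swing without contact as a swinging strike (ignoring edge cases such as foul tips) and assumes any first-pitch strike, including a ball put in play, counts toward F-Strike%. FanGraphs’ published numbers come from its own data sources and zone definitions, so results from this sketch won’t match them exactly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pitch:
    # Hypothetical pitch-level flags for illustration.
    in_zone: bool      # located inside the strike zone
    swing: bool
    contact: bool      # foul or ball in play; only meaningful when swing is True
    first_pitch: bool  # first pitch of the plate appearance
    is_strike: bool    # called strike, swinging strike, foul, or ball in play


def pct(num: int, den: int) -> Optional[float]:
    """Percentage, or None when the denominator is zero."""
    return 100.0 * num / den if den else None


def plate_discipline(pitches: List[Pitch]) -> dict:
    outside = [p for p in pitches if not p.in_zone]
    inside = [p for p in pitches if p.in_zone]
    zone_swings = [p for p in inside if p.swing]
    firsts = [p for p in pitches if p.first_pitch]
    # Simplification: every swing without contact is treated as a swinging strike.
    swinging_strikes = sum(p.swing and not p.contact for p in pitches)
    return {
        "O-Swing%": pct(sum(p.swing for p in outside), len(outside)),
        "Z-Contact%": pct(sum(p.contact for p in zone_swings), len(zone_swings)),
        "Zone%": pct(len(inside), len(pitches)),
        "F-Strike%": pct(sum(p.is_strike for p in firsts), len(firsts)),
        "SwStr%": pct(swinging_strikes, len(pitches)),
    }


sample = [
    Pitch(in_zone=True, swing=True, contact=True, first_pitch=True, is_strike=True),
    Pitch(in_zone=False, swing=True, contact=False, first_pitch=False, is_strike=True),
    Pitch(in_zone=False, swing=False, contact=False, first_pitch=True, is_strike=False),
    Pitch(in_zone=True, swing=True, contact=False, first_pitch=False, is_strike=True),
    Pitch(in_zone=True, swing=False, contact=False, first_pitch=True, is_strike=True),
]
stats = plate_discipline(sample)
```

With this five-pitch sample, O-Swing% and Z-Contact% both come out to 50.0, Zone% to 60.0, F-Strike% to about 66.7, and SwStr% to 40.0.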
Bonus: I won’t specifically discuss Coors Field. However, I would advise bookmarking a Baseball ProGUESTus piece written by Dan Rozenson in April 2013. His article compares the outcomes of pitches thrown at Coors Field with those thrown at the other MLB ballparks, and the information can be helpful when trying to determine which pitch mixes, and thus which pitchers, will play best and worst in Colorado.