Clean Play by Play Data

clean_pbp(pbp)

Arguments

pbp

Data frame of play-by-play data scraped using fast_scraper.

Value

The input data frame 'pbp' with the following columns added:

success

Binary indicator of whether epa > 0 on the given play.

passer

Name of the dropback player (scrambles included), including on plays with penalties.

rusher

Name of the rusher (no scrambles), including on plays with penalties.

receiver

Name of the receiver, including on plays with penalties.

pass

Binary indicator if the play was a pass play (sacks and scrambles included).

rush

Binary indicator if the play was a rushing play.

special

Binary indicator if the play was a special teams play.

first_down

Binary indicator if the play ended in a first down.

play

Binary indicator: 1 if the play was a 'normal' play (including penalties), 0 otherwise.

passer_id

ID of the player in the 'passer' column (NOTE: IDs differ between pre-2011 and post-2011 data).

rusher_id

ID of the player in the 'rusher' column (NOTE: IDs differ between pre-2011 and post-2011 data).

receiver_id

ID of the player in the 'receiver' column (NOTE: IDs differ between pre-2011 and post-2011 data).

name

Name of the 'passer' if it is not 'NA', or name of the 'rusher' otherwise.

id

ID of the player in the 'name' column.

qb_epa

Gives the QB credit for EPA up to the point where a receiver lost a fumble after a completed catch, so that EPA works more like passing yards on plays with fumbles.
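The simpler derived columns can be illustrated on a toy data frame. The following is a minimal sketch of the 'success', 'name' and 'id' definitions as described above, not the package's actual implementation; the rows and ID values are invented, and deriving 'id' from 'passer_id' and 'rusher_id' is an assumption based on the column descriptions.

```r
# Toy play-by-play rows; column names follow the descriptions above,
# all values are invented for illustration.
pbp <- data.frame(
  epa       = c(0.8, -0.3, 0.0, 1.2),
  passer    = c("T.Brady", NA, NA, "R.Wilson"),
  rusher    = c(NA, "M.Lynch", "D.Henry", NA),
  passer_id = c("P1", NA, NA, "P2"),
  rusher_id = c(NA, "R1", "R2", NA),
  stringsAsFactors = FALSE
)

# success: 1 when epa is strictly positive, 0 otherwise
pbp$success <- as.integer(pbp$epa > 0)

# name / id: use the passer when present, otherwise fall back to the rusher
# (assumption: 'id' mirrors the same fallback using the *_id columns)
has_passer <- !is.na(pbp$passer)
pbp$name <- ifelse(has_passer, pbp$passer, pbp$rusher)
pbp$id   <- ifelse(has_passer, pbp$passer_id, pbp$rusher_id)

print(pbp[, c("epa", "success", "name", "id")])
```

Note that a play with epa exactly equal to 0 is not counted as a success under this definition.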

Details

Builds columns that capture what happens on all plays, including penalties, using string extraction from the play description. Loosely based on Ben's nflfastR guide (https://mrcaseb.github.io/nflfastR/articles/beginners_guide.html), but updated to work with the RS data, which uses a different player format in the play description (e.g. 24-M.Lynch instead of M.Lynch). The function also standardizes team abbreviations so that, for example, the Chargers are always represented by 'LAC' regardless of the season.
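The two ideas in this paragraph, pulling a player name out of the play description in either format and mapping old team abbreviations to a single one, might be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by clean_pbp: the helper names extract_first_player and standardize_team, the regular expression, the sample descriptions, and the one-entry lookup table are all hypothetical.

```r
# Hypothetical helper: return the first player name found in each description,
# accepting both "24-M.Lynch" (jersey prefix) and "M.Lynch" formats.
extract_first_player <- function(desc) {
  # optional jersey prefix, then initial(s), a dot, and a surname
  pattern <- "([0-9]{1,2}-)?[A-Z][a-z]*\\.[A-Z][A-Za-z'-]+"
  m <- regexpr(pattern, desc)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(desc))
  out[hit] <- regmatches(desc, m)
  # drop the jersey prefix so both formats yield the same name
  sub("^[0-9]{1,2}-", "", out)
}

# Hypothetical helper: map an old abbreviation to the current one.
# The lookup holds only the Chargers example from the text ('SD' -> 'LAC').
standardize_team <- function(team) {
  lookup <- c(SD = "LAC")
  replaced <- unname(lookup[team])
  ifelse(is.na(replaced), team, replaced)
}

# Invented sample descriptions in the new and old formats
desc <- c(
  "(12:05) 24-M.Lynch up the middle for 4 yards.",
  "(12:05) M.Lynch up the middle for 4 yards.",
  "Timeout #1 by SEA at 02:00."
)
players <- extract_first_player(desc)
teams   <- standardize_team(c("SD", "LAC", "SEA"))

print(players)
print(teams)
```

A description with no player name, such as a timeout, yields NA rather than an error, which mirrors how the name columns can be missing on some plays.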