I'm gonna start a thread on what I hope will be helpful R tips to wrangle this huge NFL Big Data Bowl data. If you're an advanced R programmer, this is probably not for you, but feel free to correct me if I made a mistake or offer better alternatives.
#1

slice_sample() if you want to quickly preview what your result might look like using a random sampling of rows in your data
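
Here's a tiny sketch of what that looks like. The `tracking` tibble and its columns are made up for illustration, they are not the real Big Data Bowl schema.

```r
library(dplyr)

# Toy stand-in for one week of tracking data (hypothetical columns)
tracking <- tibble(
  frame_id = 1:1000,
  x = runif(1000, 0, 120),
  y = runif(1000, 0, 53.3)
)

set.seed(42)  # makes the random preview reproducible

# Peek at 5 random rows instead of the first 5
tracking %>% slice_sample(n = 5)

# Or keep a random 1% of the rows to test a pipeline quickly
preview <- tracking %>% slice_sample(prop = 0.01)
```

Running a slow pipeline on `preview` first is a cheap way to catch mistakes before you throw the full dataset at it.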
#2

janitor::clean_names() if variable names with random capitalization, spaces and other undesired characters make you sick

with the defaults you can turn gameTimeEastern (😒) into game_time_eastern (😙👌)
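
A quick sketch with a toy data frame. Only `gameTimeEastern` comes from the tip above, the other column names are invented to show a couple more cases.

```r
library(janitor)

# Toy data frame with messy names (hypothetical columns)
games <- data.frame(
  gameId = 1L,
  gameTimeEastern = "13:00:00",
  `Home Team Abbr` = "KC",
  check.names = FALSE  # keep the messy names exactly as typed
)

# Default behaviour converts everything to snake_case
games_clean <- clean_names(games)
names(games_clean)
```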
#3

lubridate::mdy() to convert a variable into a Date

data %>% mutate(game_date = mdy(game_date))
#4

lubridate::parse_date_time() for inconsistent date formats

players %>%
  mutate(birth_date = lubridate::parse_date_time(birth_date,
                                                 orders = c("y-m-d", "m/d/y")))
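
One thing worth knowing: parse_date_time() gives you back a date-time (POSIXct), not a Date. If you only care about the day, you can wrap it in lubridate::as_date(). A small sketch with made-up birth dates in two formats:

```r
library(dplyr)
library(lubridate)

# Toy players table with two date formats mixed together (invented values)
players <- tibble(birth_date = c("1995-03-27", "03/27/1995"))

players <- players %>%
  mutate(
    birth_date = parse_date_time(birth_date, orders = c("ymd", "mdy")),
    birth_date = as_date(birth_date)  # drop the time part, keep a Date
  )

class(players$birth_date)
```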
#5

dplyr::case_when() for vectorised if / else-if logic without nesting a bunch of ifelse() calls
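
A minimal sketch of the idea. The `plays` tibble, the `yards_gained` column and the cut-offs are all made up for illustration.

```r
library(dplyr)

# Toy plays table (hypothetical column and values)
plays <- tibble(yards_gained = c(-3, 0, 4, 12, 45))

plays <- plays %>%
  mutate(
    result = case_when(
      yards_gained < 0  ~ "loss",
      yards_gained == 0 ~ "no gain",
      yards_gained < 10 ~ "short gain",
      TRUE              ~ "big gain"   # fallback for everything else
    )
  )
```

Conditions are checked top to bottom and the first match wins, so order them from most to least specific.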
#6

janitor::get_dupes()

(watch out with your joins, there are 5 players with the same name)
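
Roughly how that check could look. The names and the `nfl_id` column below are placeholders, not real players or necessarily the real column names.

```r
library(janitor)

# Toy players table where two different players share a name (invented data)
players <- data.frame(
  nfl_id = c(1L, 2L, 3L, 4L),
  display_name = c("Sam Example", "Alex Sample", "Sam Example", "Jo Placeholder")
)

# Returns only the rows whose display_name appears more than once,
# plus a dupe_count column
dupes <- get_dupes(players, display_name)
dupes
```

If a check like this returns rows, joining on the name alone will duplicate or mismatch records, so a unique ID (assumed here to be something like `nfl_id`) is the safer join key.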
#7

if you're going to bind all 17 weeks of data into one dataset, save it to disk as a parquet file via {arrow}. from my very unscientific testing with different file formats (rda, fst, feather, rds, tsv.gz), parquet was the fastest to read

More on {arrow}: https://arrow.apache.org/docs/r/
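
A sketch of the round trip, using three tiny fake "weeks" instead of the real 17 and a temp file as the (hypothetical) destination:

```r
library(arrow)
library(dplyr)

# Toy stand-ins for the per-week data frames (invented columns)
weeks <- lapply(1:3, function(w) tibble(week = w, frame_id = 1:5))

# Stack them into one data frame
all_weeks <- bind_rows(weeks)

# Write once, then read the parquet file back whenever you need it
path <- file.path(tempdir(), "all_weeks.parquet")
write_parquet(all_weeks, path)
all_weeks_back <- read_parquet(path)

# You can also read just the columns you need
only_week <- read_parquet(path, col_select = "week")
```

Reading only a few columns with `col_select` can save a lot of time and memory on wide tables, though I haven't benchmarked that here.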
#8

tidyr::separate() to split one column into several
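
A small sketch, assuming a height column stored as "feet-inches" text. The data and column names are invented for the example.

```r
library(dplyr)
library(tidyr)

# Toy players table (hypothetical values)
players <- tibble(
  display_name = c("Sam Example", "Alex Sample"),
  height = c("6-2", "5-11")
)

players_split <- players %>%
  separate(height, into = c("feet", "inches"), sep = "-", convert = TRUE) %>%
  mutate(height_in = feet * 12L + inches)  # total height in inches
```

`convert = TRUE` turns the new columns into integers so you can do math on them right away. Newer versions of tidyr also offer separate_wider_delim() for the same job.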
#9

Another use of tidyr::separate()
#10

stringr::str_extract() to extract information using regex from strings
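
For example, pulling the yardage out of a play description. The descriptions below are invented and only mimic the general shape of play-by-play text.

```r
library(stringr)

# Toy play descriptions (made up)
play_description <- c(
  "(12:05) A.Passer pass short right to B.Receiver for 8 yards",
  "(3:41) A.Passer pass incomplete deep left"
)

# Grab the digits that come right before " yards"; NA when there is no match
yards <- as.integer(str_extract(play_description, "\\d+(?= yards)"))
yards
```

str_extract() only returns the first match per string. If you need all of them, str_extract_all() is the one to reach for.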
#11

tidyr::replace_na() to quickly replace NAs
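
It works on a whole data frame (with a named list) or on a single column inside mutate(). A sketch with made-up penalty columns:

```r
library(dplyr)
library(tidyr)

# Toy plays table with missing values (hypothetical columns)
plays <- tibble(
  penalty_yards = c(NA, 5, NA),
  penalty_code = c(NA, "OH", NA)
)

# Data frame form: one replacement value per column
plays_filled <- plays %>%
  replace_na(list(penalty_yards = 0, penalty_code = "none"))

# Vector form: handy inside mutate()
plays_yards <- plays %>%
  mutate(penalty_yards = replace_na(penalty_yards, 0))
```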
#12

tidyr::pivot_longer() for long, tidy as fuck data
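
A minimal sketch going from one row per game to one row per game and side. The `scores_wide` tibble and its columns are invented for the example.

```r
library(dplyr)
library(tidyr)

# Toy wide table: one row per game (hypothetical columns)
scores_wide <- tibble(
  game_id = c(1, 2),
  home_score = c(24, 17),
  visitor_score = c(20, 31)
)

# One row per game and side, with the score in its own column
scores_long <- scores_wide %>%
  pivot_longer(
    cols = c(home_score, visitor_score),
    names_to = "side",
    values_to = "score"
  )
```

The long format is usually what ggplot2 and group_by() / summarise() want.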
You can follow @asmae_toumi.