Hello.... it's me....I was wondering if after all these years you'd like to meet -
[Adele is awesome :) ]
but I'll try that again...
Just been revising some R skills in DataCamp (DATACAMP IS AMAAZING!), and thought I'd bring some clarity to the na.rm argument in functions!
So when you actively set na.rm to TRUE, you're basically flipping a switch which tells R to start ignoring values in a data set that are missing (hence NA, short for "not available").
1 + 1 = 2 [HAPPY FACE :D]
But 1 + NA = [VERY SAD FACE :( ]
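If you want to see that sad face for yourself, here's a tiny sketch you can paste into the R console. The vector `x` is just a made-up example, not part of the cravings data further down:

```r
# Any arithmetic that touches NA comes back as NA
1 + 1    # 2
1 + NA   # NA

# NA is R's marker for "missing", and is.na() is how you test for it
x <- c(1, NA, 3)   # hypothetical example vector
is.na(x)           # FALSE TRUE FALSE
sum(is.na(x))      # 1, i.e. one missing value
```

Counting the NAs with sum(is.na(x)) is a handy first check before deciding whether skipping them is a good idea.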
So, for example consider the following code:
    # First, establish some data sets.
    # How about number of times you crave chocolate
    # and icecream per day, over a 7 day period
    chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)
    ice_cream_cravings <- c(17, NA, 5, 16, 8, 13, 14)

    # Now let's try and get the average of chocolate_cravings
    mean(chocolate_cravings)
    # [1] NA
So R clearly can't handle finding the mean (average) of chocolate_cravings ('CAUSE IT'S A HEALTH ENTHUSIAST!!!... just kidding), and that's because one of the values is NA. R has no idea what the missing number is, so any calculation that includes it comes back as NA too.
So instead of doing that - we can tell R to automatically skip these missing numbers by setting the na.rm argument to TRUE like so:
(NOTE: if you don't include na.rm in your function call, it will be set to FALSE by default)

    mean(chocolate_cravings, na.rm = TRUE)
    # [1] 12.33333
HORAAA! Great News! IT WORKED!....bad news.... that's a lot of cravings.....
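For the curious, here's a rough sketch of what that switch is doing. Dropping the NAs by hand with is.na() gives the same answer, and the same na.rm argument is available on other base summary functions like sum(), max() and median():

```r
chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)

# na.rm = TRUE gives the same result as removing the NAs yourself first
mean(chocolate_cravings[!is.na(chocolate_cravings)])   # 12.33333

# The same switch works on other summary functions
sum(chocolate_cravings, na.rm = TRUE)      # 74
max(chocolate_cravings, na.rm = TRUE)      # 17
median(chocolate_cravings, na.rm = TRUE)   # 13.5
```

Note that the mean is taken over the 6 values that are left, not over all 7 days.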
So that's all pretty straightforward, but what happens when we try to find the sum of both our data-sets added together?
Will R add all remaining numbers?
Will it leave some out?
FIND OUT ON THE NEXT EPISODE OF DRAGON BALL-
Yes, anyway, as I was about to say: when R adds two data-sets together, it works element by element, adding the first value of one to the first value of the other, and so on (see below) -

    chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)
    ice_cream_cravings <- c(17, NA, 5, 16, 8, 13, 14)
    chocolate_cravings + ice_cream_cravings
    # [1] 33 NA 18 21 NA 30 28
and because of this, it becomes a bit like a 4 year old eating vegetables - it gets extremely picky!
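So if it's adding TWO data-sets together, and it comes across one element in a data-set that is missing (NA), the result for that pair is NA, even if the corresponding element in the other data-set DOES have a number. Setting na.rm to TRUE afterwards throws the whole pair away, so it grumpily won't count that perfectly good number.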
See the code below:
    mean(chocolate_cravings, na.rm = TRUE)
    # [1] 12.33333
    sum(chocolate_cravings + ice_cream_cravings, na.rm = TRUE)
    # [1] 130
    sum(chocolate_cravings + ice_cream_cravings)
    # [1] NA

    # Just to demonstrate that only pairs of values are counted
    sum(16+9+13+5+17+14+17+5+16+8+13+14)
    # [1] 147
    sum(16+13+5+17+14+17+5+16+13+14)
    # [1] 130
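If you actually want the 147 (every craving that was recorded, paired or not), one possible way is to sum each data-set on its own and add the totals afterwards. This is only a sketch of that idea; the names `pairs_total` and `all_total` are made up for illustration:

```r
chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)
ice_cream_cravings <- c(17, NA, 5, 16, 8, 13, 14)

# Option 1: add first, then drop NAs -> only complete pairs survive
pairs_total <- sum(chocolate_cravings + ice_cream_cravings, na.rm = TRUE)
pairs_total   # 130

# Option 2: drop NAs inside each data-set, then add the two totals
all_total <- sum(chocolate_cravings, na.rm = TRUE) +
  sum(ice_cream_cravings, na.rm = TRUE)
all_total     # 147

# Which days have a value in BOTH data-sets?
complete.cases(chocolate_cravings, ice_cream_cravings)
# TRUE FALSE TRUE TRUE FALSE TRUE TRUE
```

Which option is "right" depends on the question you're asking: total cravings overall, or total cravings on days where both were recorded.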
So as you can imagine, na.rm can be very useful in the right context, but it is no substitute for cleaning a dataset with many missing values - especially when you're working with multiple sets.
Why do I suddenly feel like chocolate.....
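As a starting point for that cleaning, here's a minimal sketch of how you might check how many values are missing before leaning on na.rm. It assumes the two data-sets are put into a data frame, which I've called `cravings` purely for illustration:

```r
# Hypothetical data frame holding both data-sets side by side
cravings <- data.frame(
  chocolate = c(16, 9, 13, 5, NA, 17, 14),
  ice_cream = c(17, NA, 5, 16, 8, 13, 14)
)

# How many values are missing in each column?
colSums(is.na(cravings))    # chocolate: 1, ice_cream: 1

# Keep only the rows (days) where nothing is missing
complete_days <- na.omit(cravings)
nrow(complete_days)         # 5
```

If those counts are large compared to the size of your data, that's a hint to go and investigate the missing values rather than just skipping them.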