Hello.... it's me....I was wondering if after all these years you'd like to meet -

[Adele is awesome :) ]

but I'll try that again...

Hello!

Just been revising some R skills on DataCamp (DATACAMP IS AMAZING!), and thought I'd bring some clarity to the na.rm argument in R functions!

(Please note: to try this out you'll need R installed, and preferably RStudio too. Both are free.)

So when you actively set *na.rm* to **TRUE**, you're basically flipping a switch that tells R to start **ignoring** values in a data set that are **missing** (hence *NA*, short for "Not Available").

1 + 1 = 2 [HAPPY FACE :D]

But 1 + NA = [VERY SAD FACE :( ]
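In R itself, the two lines above look roughly like this. A minimal sketch: any arithmetic involving `NA` gives back `NA`, and `na.rm = TRUE` is what tells a summary function such as `sum()` to skip it.

```r
# Ordinary arithmetic works as expected
1 + 1
#> [1] 2

# Any arithmetic involving a missing value returns NA ("Not Available")
1 + NA
#> [1] NA

# Summary functions pass the NA along too...
sum(c(1, NA))
#> [1] NA

# ...unless na.rm = TRUE tells them to skip it
sum(c(1, NA), na.rm = TRUE)
#> [1] 1
```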

So, for example, consider the following code:

    # First, establish some data sets.
    # How about the number of times you crave chocolate
    # and ice cream per day, over a 7-day period?
    chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)
    ice_cream_cravings <- c(17, NA, 5, 16, 8, 13, 14)

    # Now let's try to get the average of chocolate_cravings
    mean(chocolate_cravings)
    #> [1] NA

So R clearly can't find the mean (average) of chocolate_cravings ('CAUSE IT'S A HEALTH ENTHUSIAST!!!... just kidding). The real reason is that R tried to add up a bunch of numbers with a missing value, and in R any arithmetic involving NA gives NA.

## FAIL.
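Quick aside before fixing it: that NA is not a piece of text, it's R's placeholder for a missing number. A small sketch using base R's `class()`, `is.na()` and `which()` on the same vector shows this:

```r
chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)

# The vector is still numeric; NA just marks a missing number
class(chocolate_cravings)
#> [1] "numeric"

# is.na() flags which elements are missing
is.na(chocolate_cravings)
#> [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE

# which() turns that into positions (day 5 here)
which(is.na(chocolate_cravings))
#> [1] 5

# Compare with a real string: "NA" in quotes is text, not a missing value
is.na("NA")
#> [1] FALSE
```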

So instead of doing that, we can tell R to automatically skip these missing numbers by setting the na.rm argument to **TRUE**, like so:

(NOTE: if you don't include na.rm in your function call, it is set to FALSE by default.)

    mean(chocolate_cravings, na.rm = TRUE)
    #> [1] 12.33333

HORAAA! Great News! IT WORKED!....bad news.... that's a lot of cravings.....
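For mean() at least, na.rm = TRUE gives the same answer as removing the NAs yourself and averaging what's left. Here's a sketch of that equivalence, plus a check that leaving na.rm out behaves exactly like writing `na.rm = FALSE` (the variable name `observed` is just made up for this example):

```r
chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)

# Keep only the non-missing values: 16, 9, 13, 5, 17, 14
observed <- chocolate_cravings[!is.na(chocolate_cravings)]

# Average them by hand: 74 / 6
sum(observed) / length(observed)
#> [1] 12.33333

# Same result as letting na.rm do the work
all.equal(mean(observed), mean(chocolate_cravings, na.rm = TRUE))
#> [1] TRUE

# Leaving na.rm out is the same as na.rm = FALSE, so both give NA
identical(mean(chocolate_cravings), mean(chocolate_cravings, na.rm = FALSE))
#> [1] TRUE
```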

So that's all pretty straightforward, but what happens when we add our two data sets together and then try to find the total?

Will R add all remaining numbers?

## Will it leave some out?

# FIND OUT ON THE NEXT EPISODE OF DRAGON BALL-

Yes, anyway, as I was about to say: when R adds two data sets together, it adds them element by element, pairing the first value of one with the first value of the other, and so on (see below):

    chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)
    ice_cream_cravings <- c(17, NA, 5, 16, 8, 13, 14)
    chocolate_cravings + ice_cream_cravings
    #> [1] 33 NA 18 21 NA 30 28
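The pairing rule can be checked directly. A minimal sketch: a position in the sum is NA exactly when either input is NA at that position, which here means day 2 (no ice cream count) and day 5 (no chocolate count). The name `total_cravings` is just for illustration.

```r
chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)
ice_cream_cravings <- c(17, NA, 5, 16, 8, 13, 14)

total_cravings <- chocolate_cravings + ice_cream_cravings

# Positions where the element-wise sum is missing
which(is.na(total_cravings))
#> [1] 2 5

# That matches "missing in either vector"
identical(is.na(total_cravings),
          is.na(chocolate_cravings) | is.na(ice_cream_cravings))
#> [1] TRUE
```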

and because of this, it becomes a bit like a 4-year-old eating vegetables: it gets extremely picky!

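So when it's adding TWO data sets together and comes across a missing element (NA) in one of them, the result for that pair is NA, even if the corresponding element in the other data set DOES have a number. With na.rm = TRUE, R then skips that whole pair, so the perfectly good number grumpily doesn't get counted either.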

See the code below:

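    mean(chocolate_cravings, na.rm = TRUE)
    #> [1] 12.33333

    # Pairs with a missing value on either side are dropped as a whole
    sum(chocolate_cravings + ice_cream_cravings, na.rm = TRUE)
    #> [1] 130

    # Without na.rm, those NA pairs make the whole total NA
    sum(chocolate_cravings + ice_cream_cravings)
    #> [1] NA

    # Just to demonstrate that only complete pairs are counted:
    # all 12 recorded values add up to 147...
    sum(c(16, 9, 13, 5, 17, 14, 17, 5, 16, 8, 13, 14))
    #> [1] 147
    # ...but leaving out 9 (day 2) and 8 (day 5), whose partners are NA, gives 130
    sum(c(16, 13, 5, 17, 14, 17, 5, 16, 13, 14))
    #> [1] 130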
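Two follow-up checks, sketched with base R's `complete.cases()` (which accepts plain vectors as well as data frames). The first confirms that the 130 comes only from the five days where both counts exist. The second shows one way to count all 12 recorded values instead: pass both vectors to `sum()` as separate arguments, so na.rm = TRUE drops each NA on its own rather than taking its partner down with it. The name `both_recorded` is just for illustration.

```r
chocolate_cravings <- c(16, 9, 13, 5, NA, 17, 14)
ice_cream_cravings <- c(17, NA, 5, 16, 8, 13, 14)

# TRUE on days where BOTH counts are present (days 1, 3, 4, 6, 7)
both_recorded <- complete.cases(chocolate_cravings, ice_cream_cravings)

# Summing only the complete pairs reproduces the 130
sum(chocolate_cravings[both_recorded] + ice_cream_cravings[both_recorded])
#> [1] 130

# Summing the vectors as separate arguments keeps every non-missing value
sum(chocolate_cravings, ice_cream_cravings, na.rm = TRUE)
#> [1] 147
```

Which total is "right" depends on the question: 130 answers "total cravings on days where we know both", 147 answers "total of everything we recorded".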

So, as you can imagine, na.rm can be very useful in the right context, but it is no substitute for cleaning a data set with many missing values, especially when you're working with multiple sets.
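What "cleaning" looks like depends on the data, but a sensible first step is usually just seeing where the gaps are. A minimal sketch, assuming the two vectors are stored together in a data frame (the name `cravings` and its column names are made up for this example):

```r
# Hypothetical data frame holding both vectors, one row per day
cravings <- data.frame(
  day       = 1:7,
  chocolate = c(16, 9, 13, 5, NA, 17, 14),
  ice_cream = c(17, NA, 5, 16, 8, 13, 14)
)

# Count the missing values in each column
# (0 for day, 1 for chocolate, 1 for ice_cream)
colSums(is.na(cravings))

# Keep only the days with no missing values at all
complete_days <- cravings[complete.cases(cravings), ]
nrow(complete_days)
#> [1] 5
```

Dropping incomplete rows is only one option; whether to drop, fill in, or go back and re-collect the missing days is a judgement call about your data.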

...hmmm

why do I suddenly feel like chocolate.....