A Little About Dates in R
Before we launch into any analysis that contains dates, we should know a few important nuggets about how R handles date-like objects.
There are three date/time classes built into R:
- Date
- POSIXct
- POSIXlt
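To make the three classes concrete, here is a minimal base-R sketch showing how each one stores a day. The date, the 09:30 time and the UTC time zone are illustrative choices and do not come from the examples below.

- Date counts days since 1970-01-01.
- POSIXct counts seconds since 1970-01-01 00:00 UTC.
- POSIXlt keeps a list of calendar components.

```r
# One calendar day represented in each of R's three built-in classes.
d  <- as.Date("2018-06-14")                          # Date: day precision, no time zone
ct <- as.POSIXct("2018-06-14 09:30:00", tz = "UTC")  # POSIXct: a single number of seconds
lt <- as.POSIXlt(ct)                                 # POSIXlt: a list of components (keeps UTC)

class(d)   # "Date"
class(ct)  # "POSIXct" "POSIXt"
class(lt)  # "POSIXlt" "POSIXt"

# The underlying storage differs.
as.numeric(d)   # 17696 days since 1970-01-01
as.numeric(ct)  # 1528968600 seconds since 1970-01-01 00:00:00 UTC
lt$hour         # 9, read straight from the list
```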
Base R
First, base R can read a string of text and convert it to a date class. To help it read the date, you must tell R what format your character string is in. Below are several examples. You can see all the possible format codes by running ?strptime in your R console.
strptime("October 16, 1984", format = "%B %e, %Y")
## [1] "1984-10-16 EDT"
strptime("16 October, 1984", format = "%e %B, %Y")
## [1] "1984-10-16 EDT"
strptime("16-- October, 1984", format = "%e-- %B, %Y")
## [1] "1984-10-16 EDT"
class(strptime("16-- October, 1984", format = "%e-- %B, %Y"))
## [1] "POSIXlt" "POSIXt"
birthday = strptime("16-- October, 1984", format = "%e-- %B, %Y")
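As you can see, strptime returns an object of class POSIXlt, which also inherits from the parent class POSIXt. The EDT in the printed results is the session's local time zone. strptime uses that zone unless you pass a tz argument.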
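The same format codes work in both directions. This sketch shows three things:

- strptime() can also parse a time of day.
- as.Date() uses the same codes to build a Date instead of a POSIXlt.
- format() turns a date back into text.

The 14:05 time, the America/New_York zone and the dd/mm/yyyy output format are illustrative assumptions. Month names such as "October" assume an English locale.

```r
# Parse a date and a time; tz pins the zone instead of using the session default.
meeting <- strptime("16 October, 1984 14:05",
                    format = "%d %B, %Y %H:%M", tz = "America/New_York")
meeting$hour  # 14
meeting$min   # 5

# as.Date() takes the same codes but returns a Date (no time of day, no zone).
bday <- as.Date("October 16, 1984", format = "%B %d, %Y")
class(bday)   # "Date"

# format() reverses the process: date -> character, using the same codes.
format(bday, "%d/%m/%Y")  # "16/10/1984"
```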
lubridate
A second, easier way to have R recognize dates is to use the lubridate package. Thanks again, Hadley!
library(lubridate)
Using lubridate also allows R to read character strings as dates. However, you don't have to spell out the exact format of your string (which can be difficult). Instead, lubridate tries many formats until one fits. You simply choose the function whose name gives the order of the year, month, and day in your string: ymd(), mdy(), dmy(), or any other combination of y, m, and d.
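Here is a minimal sketch of the naming rule. The three calls read the same day from strings that use different orders and separators, and the _hms suffix extends the idea to times. The input strings and the time zone are made up for illustration.

```r
library(lubridate)

# The function name spells the order of year (y), month (m) and day (d).
x_ymd <- ymd("2018-06-14")
x_mdy <- mdy("06/14/2018")
x_dmy <- dmy("14.06.2018")
x_ymd == x_mdy & x_mdy == x_dmy  # TRUE: all three are the Date 2018-06-14

# Adding _hms parses a time too; tz sets the time zone of the result.
t1 <- ymd_hms("2018-06-14 09:30:00", tz = "America/New_York")
class(t1)  # "POSIXct" "POSIXt"
```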
mdy("June 14, 2018")
## [1] "2018-06-14"
dmy("14 June, 2018")
## [1] "2018-06-14"
dmy("14-- June, 2018")
## [1] "2018-06-14"
class(dmy("14-- June, 2018"))
## [1] "Date"
You'll notice that lubridate returns an object of class Date. To convert it to POSIXlt (a subclass of POSIXt), wrap the call in as.POSIXlt():
class(as.POSIXlt(mdy("June 14, 2018")))
## [1] "POSIXlt" "POSIXt"
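Conversions also work in the other directions. This base-R sketch uses an illustrative date. It shows that a Date converted to POSIXlt or POSIXct lands on midnight UTC, and that as.Date() brings it back.

```r
d  <- as.Date("2018-06-14")
lt <- as.POSIXlt(d)   # midnight UTC, stored as a list of components
ct <- as.POSIXct(lt)  # the same instant, stored as seconds since the epoch

lt$hour         # 0
as.numeric(ct)  # 1528934400, i.e. 17696 days * 86400 seconds
as.Date(ct)     # "2018-06-14"
```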
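We also need to make sure our date is in the correct time zone. Converting a Date with as.POSIXlt() yields midnight UTC. We therefore use lubridate's force_tz(), which keeps the same clock time but assigns the America/New_York zone. This matters even more when the date includes a time of day.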
date = as.POSIXlt(dmy("14 June, 2018"))
date
## [1] "2018-06-14 UTC"
date = force_tz(date, tzone = "America/New_York")
date
## [1] "2018-06-14 EDT"
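force_tz() has a sibling, with_tz(), and the difference is easy to miss:

- force_tz() keeps the clock reading, so the instant moves.
- with_tz() keeps the instant and only changes how it is displayed.

This sketch starts from noon UTC, which is an illustrative assumption.

```r
library(lubridate)

utc_noon <- ymd_hms("2018-06-14 12:00:00", tz = "UTC")

# force_tz(): same clock time (12:00), new zone -> a different moment in time.
ny_forced <- force_tz(utc_noon, tzone = "America/New_York")

# with_tz(): same moment in time, displayed in the new zone.
ny_viewed <- with_tz(utc_noon, tzone = "America/New_York")

format(ny_forced, "%H:%M %Z")                   # "12:00 EDT"
format(ny_viewed, "%H:%M %Z")                   # "08:00 EDT"
difftime(ny_forced, utc_noon, units = "hours")  # Time difference of 4 hours
```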
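When a date vector is of class POSIXlt, its components are stored as a named list, and you can extract any of them directly. Note that mon counts months from 0 and year counts years since 1900. That is why date$mon and month(date) below disagree. lubridate's month() and year() return the familiar values.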
date
## [1] "2018-06-14 EDT"
unlist(date)
## sec min hour mday mon year wday yday
## "0" "0" "0" "14" "5" "118" "4" "164"
## isdst zone gmtoff
## "1" "EDT" "-14400"
date$mon
## [1] 5
month(date)
## [1] 6
date$year
## [1] 118
year(date)
## [1] 2018
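Because the raw POSIXlt fields use offsets, converting them by hand takes a small adjustment. This base-R sketch recovers the familiar values for the same 14 June 2018 date. It uses UTC, an assumption made so the result does not depend on your time zone.

```r
lt <- as.POSIXlt("2018-06-14", tz = "UTC")

lt$mon + 1      # 6: mon runs from 0 (January) to 11
lt$year + 1900  # 2018: year counts years since 1900
lt$mday         # 14: mday is already 1-based
lt$wday         # 4: 0 is Sunday, so 4 is Thursday
lt$yday + 1     # 165: yday counts from 0 (1 January)
```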
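You can also do arithmetic with these date vectors. Subtracting two dates gives a time difference, and lubridate's hours(), days(), months(), and years() add periods of time.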
date - birthday
## Time difference of 12294 days
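Subtraction picks the units for you. If you want a specific unit, difftime() takes a units argument, and as.numeric() strips the class. For brevity, this sketch rebuilds the two dates as Date objects under new names so the originals stay untouched. Both originals fall at midnight Eastern time, so the day count is the same.

```r
birthday_d <- as.Date("1984-10-16")
date_d     <- as.Date("2018-06-14")

gap_days  <- difftime(date_d, birthday_d, units = "days")
gap_weeks <- difftime(date_d, birthday_d, units = "weeks")

as.numeric(gap_days)   # 12294
as.numeric(gap_weeks)  # 1756.286 (12294 / 7)
```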
birthday + hours(4)
## [1] "1984-10-16 04:00:00 EDT"
birthday + days(4)
## [1] "1984-10-20 EDT"
date + years(4) + months(9)
## [1] "2023-03-14 EDT"
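hours(), days(), months() and years() create periods, which follow the calendar. Two edge cases are worth knowing:

- Adding a month to 31 January gives NA, because 31 February does not exist. The %m+% operator rolls back to the last valid day instead.
- ddays() adds an exact 86,400-second duration. Across a daylight saving change this differs from days(). New York switched to EDT on 11 March 2018.

The dates in this sketch are illustrative.

```r
library(lubridate)

jan31 <- ymd("2018-01-31")
jan31 + months(1)     # NA: there is no 31 February
jan31 %m+% months(1)  # "2018-02-28": rolls back to the month's last day

# Noon on the day before US daylight saving time began in 2018.
before_dst <- ymd_hms("2018-03-10 12:00:00", tz = "America/New_York")
before_dst + days(1)   # "2018-03-11 12:00:00 EDT": same clock time (period)
before_dst + ddays(1)  # "2018-03-11 13:00:00 EDT": exactly 24 hours later (duration)
```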