0

I have a data frame that looks like this:

data <- data.frame(PROG_START = c("Tuesday, October 1, 2019", "1682294400"))
> print(data)
                PROG_START
1 Tuesday, October 1, 2019
2               1682294400

As you can see, the PROG_START column has dates in 2 different formats (UTX timestamp and day of week, month, day, year). I would like to standardize the values in this column so that the format for all dates is day-month-year.

I've looked through Stack and haven't had much success in finding a solution. I did find one helpful post that led me to develop the following:

data1 <- data %>% 
  mutate(Dates = case_when(str_detect(PROG_START, '\\d{10}\\.\\d{3}') ~  PROG_START, TRUE ~ NA_character_  )) %>% 
  mutate(Dates = as.POSIXct(as.numeric(PROG_START), origin = '1970-01-01', tz = 'UTC')) %>% 
  mutate(PROG_START = anytime(PROG_START)) %>% mutate(PROG_START = coalesce(PROG_START, Dates)) %>% select(-Dates)

This successfully converted most of the UTX timestamps, however, a minority had the wrong year. For example, 1682294400 was ascribed the year 1682 instead of 2023. It also turned all other formatted dates (i.e., Tuesday, October 1, 2019) into NAs.

I then updated the above code to this:

data1 <- data %>% 
  mutate(PROG_START = case_when(
    str_detect(PROG_START, '\\d{10}\\.\\d{3}') ~ as.POSIXct(as.numeric(PROG_START), origin = '1970-01-01', tz = 'UTC'),
    TRUE ~ dmy(PROG_START)
  )) %>% 
  mutate(PROG_START = format(PROG_START, "%d-%m-%Y"))

However, this turned everything into NAs. Not sure how to proceed at the moment. I appreciate the help! Thank you!

2 Answers 2

1
fn <- function(dd){
   a <- lubridate::as_datetime(suppressWarnings(as.numeric(dd)))
   a[idx] <- lubridate::parse_date_time(dd[idx <- is.na(a)], "ABdY")
   a
 }

fn(data$PROG_START)
[1] "2019-10-01 UTC" "2023-04-24 UTC"
1

I recently had to do something similar, here's a solution that checks for values that are not NA and do not contain a comma, so it should only affect the UTX timestamps.

Since the column is characters, we need to convert the value to int, to be able to convert to date, and finally character.

There's definitely a solution with better performance, but it works.

for (i in 1:nrow(data)) {
    if (!is.na(data$PROG_START[i]) && !grepl(",", data$PROG_START[i])) {
       data$PROG_START[i] <- format(as.POSIXct(as.numeric(data$PROG_START[i]), origin = '1970-01-01', tz = 'UTC'), "%A, %B %d, %Y")
    }
}

Edit: Changed the contents of format() to fit your example.

1
  • Thanks, Matt. Your response somewhat worked - the years were still a bit off in the output. In this case, Onyambu's solution worked flawlessly. Thanks for the assistance! Commented Apr 19 at 20:35

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.