How to remove consecutive duplicate characters


I'm new to R and have no background in programming. I have data like this: [image]

In the "path" column, I want to remove all consecutive duplicates, like the red-framed ones in the image above. For example, "Organic > Paid Search > Paid Search > Paid Search > Direct > Direct > Organic" should become "Organic > Paid Search > Direct > Organic".

Thank you for your support.


Answers:


Here is an option based on strsplit and rle:

x <- c("Organic > Paid Search > Paid Search > Paid Search > Direct > Direct > Organic")
x <- trimws(strsplit(x, ">")[[1]], "both")
paste(rle(x)$values, collapse = " > ")
# output
[1] "Organic > Paid Search > Direct > Organic"
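
To see why this works, it may help to look at what rle (run-length encoding) returns for the split vector. The sketch below hard-codes the trimmed pieces from the example above; the variable name r is only for illustration.

```r
# The pieces of the example path after splitting on ">" and trimming
x <- c("Organic", "Paid Search", "Paid Search", "Paid Search",
       "Direct", "Direct", "Organic")

r <- rle(x)   # groups consecutive equal elements into runs
r$lengths     # how long each run is: 1 3 2 1
r$values      # one entry per run: "Organic" "Paid Search" "Direct" "Organic"
```

Only consecutive repeats are merged, so the second "Organic" survives because it is not adjacent to the first one.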

For a data.frame column, apply the same idea to each row:

df <- data.frame(Path = c("Organic > Paid Search > Paid Search > Paid Search > Direct > Direct > Organic",
                          "Organic > Paid Search >  Paid Search > Direct > Organic > Direct",
                          "Organic > Organic > Paid Search > Paid Search > Direct > Direct"),
                 conversions = c(6, 5, 3), stringsAsFactors = FALSE)
# Solution: split each path on ">", trim whitespace, then collapse runs with rle
df$Path2 <- sapply(strsplit(df$Path, ">"),
                   function(x) paste(rle(trimws(x))$values, collapse = " > "))
df # output
                                                                           Path conversions                                             Path2
1 Organic > Paid Search > Paid Search > Paid Search > Direct > Direct > Organic           6          Organic > Paid Search > Direct > Organic
2              Organic > Paid Search >  Paid Search > Direct > Organic > Direct           5 Organic > Paid Search > Direct > Organic > Direct
3               Organic > Organic > Paid Search > Paid Search > Direct > Direct           3                    Organic > Paid Search > Direct
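
If you need this in more than one place, the same steps could be wrapped in a small helper. This is only a sketch: the function name collapse_runs and its sep argument are made up for illustration, and it assumes the separator is a literal string such as ">". It uses vapply instead of sapply so the result is always a character vector.

```r
# Hypothetical helper: collapse consecutive duplicate steps in each path.
# Assumes `path` is a character vector and `sep` is a literal separator.
collapse_runs <- function(path, sep = ">") {
  parts <- strsplit(path, sep, fixed = TRUE)   # one vector of pieces per path
  vapply(parts, function(p) {
    # trim surrounding whitespace, keep one value per run, glue back together
    paste(rle(trimws(p))$values, collapse = paste0(" ", sep, " "))
  }, character(1))
}

paths <- c("Organic > Paid Search >  Paid Search > Direct",
           "Direct > Direct")
collapse_runs(paths)
# [1] "Organic > Paid Search > Direct" "Direct"
```

With the data frame from the answer above, this would be used as df$Path2 <- collapse_runs(df$Path).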

Hope this helps!