A short note on the startsWith function

The startsWith function is part of base R and determines whether each element of a character vector starts with a given prefix. (The endsWith function does the same for suffixes.) The following code checks whether each of “ant”, “banana” and “balloon” starts with “a”:

startsWith(c("ant", "banana", "balloon"), "a")
# [1]  TRUE FALSE FALSE
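
For comparison, here is a small sketch of the endsWith counterpart mentioned above, applied to the same three words. The variable names words and ends_in_a are only illustrative:

```r
# Same three words as above, but testing the last character instead of the first.
words <- c("ant", "banana", "balloon")
ends_in_a <- endsWith(words, "a")
ends_in_a
# [1] FALSE  TRUE FALSE
```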

The second argument (the prefix to check) can also be a vector. The code below checks if “ant” starts with “a” and if “ant” starts with “b”:

startsWith("ant", c("a", "b"))
# [1]  TRUE FALSE

Things get less intuitive when both arguments are vectors of length greater than 1. Why do you think the line of code below returns the result it does?

startsWith(c("ant", "banana", "balloon"), c("a", "b"))
# [1]  TRUE  TRUE FALSE

This makes sense when we look at the documentation for startsWith’s return value:

A logical vector, of “common length” of x and prefix (or suffix), i.e., of the longer of the two lengths unless one of them is zero when the result is also of zero length. A shorter input is recycled to the output length.

startsWith(x, prefix) checks whether x[i] starts with prefix[i] for each i. In the line of code above, the function checks whether “ant” starts with “a” and whether “banana” starts with “b”. Since x is longer than prefix, prefix is “recycled”: the function then checks whether “balloon” starts with “a”, which is FALSE.
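
One way to see the recycling is to spell it out by hand. The sketch below uses rep_len to stretch prefix to the length of x, and it assumes this mirrors what startsWith does internally. The name recycled is only illustrative. The last call shows the zero-length case from the documentation:

```r
x <- c("ant", "banana", "balloon")
prefix <- c("a", "b")

# rep_len() repeats prefix until it has the same length as x,
# which is what recycling does implicitly.
recycled <- rep_len(prefix, length(x))
recycled
# [1] "a" "b" "a"

# The explicit and implicit versions give the same result.
identical(startsWith(x, prefix), startsWith(x, recycled))
# [1] TRUE

# A zero-length prefix gives a zero-length result.
startsWith(x, character(0))
# logical(0)
```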

To check whether each x[i] starts with any prefix[j] (with j possibly different from i), we could do the following:

x <- c("ant", "banana", "balloon")
prefix <- c("a", "b")
has_prefix <- sapply(prefix, function(p) startsWith(x, p))
has_prefix
#          a     b
# [1,]  TRUE FALSE
# [2,] FALSE  TRUE
# [3,] FALSE  TRUE

apply(has_prefix, 1, any)
# [1] TRUE TRUE TRUE
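
One caveat with the sapply approach: when x has length 1, sapply simplifies its result to a vector rather than a matrix, so the apply call fails. A possible alternative is sketched below. The helper starts_with_any is a hypothetical name, not part of base R. It combines the per-prefix results with Reduce, so it does not rely on a matrix being returned:

```r
# starts_with_any() is a hypothetical helper, not part of base R.
starts_with_any <- function(x, prefixes) {
  # One logical vector per prefix, each the same length as x.
  hits <- lapply(prefixes, function(p) startsWith(x, p))
  # Combine them element-wise with OR, starting from all FALSE.
  Reduce(`|`, hits, rep(FALSE, length(x)))
}

starts_with_any(c("ant", "banana", "balloon"), c("a", "b"))
# [1] TRUE TRUE TRUE

starts_with_any("cat", c("a", "b"))
# [1] FALSE
```

Starting Reduce from an all-FALSE vector also means an empty set of prefixes gives FALSE for every element of x.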