I have a list of strings like this:

mystr <- c("16.142.8",
           "52.135.1",
           "40.114.4",
           "83.068.8",
           "83.456.3",
           "55.181.5",
           "76.870.2",
           "96.910.2",
           "17.171.9",
           "49.617.4",
           "38.176.1",
           "50.717.7",
           "19.919.6")

I know that the first dot (.) is just a thousands separator, while the second one is the decimal separator.

I want to convert the strings to numbers, so the first one should become 16142.8, the second 52135.1, and so on.

I suspect it can be done with regular expressions, but I'm not sure how. Any ideas?

2 Answers

You need a lookahead-based PCRE regex with gsub, wrapped in as.numeric to get numbers rather than strings:

as.numeric(gsub("\\.(?=[^.]*\\.)", "", mystr, perl=TRUE))

Details

  • \\. - a dot
  • (?=[^.]*\\.) - a positive lookahead requiring that the dot be followed by 0 or more characters other than . (matched with [^.]*) and then a literal dot (\\.). A lookahead checks that its pattern appears immediately to the right of the current location, but the text it checks is not added to the match value and the regex index is not advanced. As a result, every dot except the last one in each string is removed.

  • Alternatively, I guess a negative lookahead works: gsub("[.](?!\\d+$)", "", mystr, perl=TRUE)
    – Frank, commented Aug 9, 2017 at 15:51
  • @Frank: Yes, it will match any dot that is not followed by 1 or more digits and then the end of the string. Commented Aug 9, 2017 at 15:55
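A minimal sketch that checks the lookahead approach end to end. gsub itself returns character strings, so as.numeric is applied for the final conversion. It also runs Frank's negative-lookahead variant to confirm both give the same numbers, and uses a hypothetical input with two thousands separators ("1.234.567.8", not from the question) to show that every dot except the last one is removed.

```r
mystr <- c("16.142.8", "52.135.1", "40.114.4", "83.068.8", "83.456.3",
           "55.181.5", "76.870.2", "96.910.2", "17.171.9", "49.617.4",
           "38.176.1", "50.717.7", "19.919.6")

# Positive lookahead: remove every dot that has another dot somewhere to its
# right (i.e. all dots except the last), then convert to numeric.
lookahead <- as.numeric(gsub("\\.(?=[^.]*\\.)", "", mystr, perl = TRUE))

# Negative lookahead (Frank's comment): remove every dot that is NOT followed
# by digits running to the end of the string.
neg_lookahead <- as.numeric(gsub("[.](?!\\d+$)", "", mystr, perl = TRUE))

print(lookahead[1:2])
# [1] 16142.8 52135.1
print(identical(lookahead, neg_lookahead))
# [1] TRUE

# Hypothetical input with two thousands separators: only the last dot survives.
multi <- gsub("\\.(?=[^.]*\\.)", "", "1.234.567.8", perl = TRUE)
print(multi)
# [1] "1234567.8"
```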

A simple sub can achieve the same, as it only replaces the first match. This works here because every string contains exactly one thousands separator. Example:

as.numeric(sub("\\.", "", mystr))
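A short sketch of the sub approach with the numeric conversion added (sub returns strings, so as.numeric is applied). It uses three values from the question, then two hypothetical inputs (not from the question) to show where sub and the lookahead regex diverge: a number with no thousands separator and one with two separators.

```r
mystr <- c("16.142.8", "52.135.1", "19.919.6")

# sub() replaces only the first match, so exactly one dot is removed per string.
as_num <- as.numeric(sub("\\.", "", mystr))
print(as_num)
# [1] 16142.8 52135.1 19919.6

# Hypothetical edge cases: no thousands separator, and two thousands separators.
edge <- c("123.4", "1.234.567.8")

# sub() drops the decimal point in the first case and leaves a
# separator behind in the second: gives "1234" and "1234.567.8".
print(sub("\\.", "", edge))

# The lookahead regex keeps only the last dot: gives "123.4" and "1234567.8".
print(gsub("\\.(?=[^.]*\\.)", "", edge, perl = TRUE))
```

In short, sub is enough when the input format is fixed at exactly two dots per string; the lookahead version is more robust if that assumption might not hold.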
