Saturday, February 12, 2011

Using regex to replace all spaces NOT in quotes in Ruby

I'm trying to write a regex that removes all spaces that are not inside quotes, so that something like this:

a = 4, b = 2, c = "space here"

would return this:

a=4,b=2,c="space here"

I spent some time searching this site and found a similar Q&A ( http://stackoverflow.com/questions/79968/split-a-string-by-spaces-in-python#80449 ) that replaces the spaces inside quotes with a token, wipes all the other spaces, and then substitutes the token back. I was hoping there was a cleaner way of doing it.

  • I consider this very clean:

    mystring.scan(/((".*?")|([^ ]))/).map { |x| x[0] }.join
    

    I doubt gsub could do any better (assuming you want a pure regex approach).
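A minimal sketch of the scan approach above, wrapped in a method so it can be tried on the question's input. The method name `strip_unquoted_spaces_scan` is made up for illustration, and the pattern is simplified to two capture groups; it assumes only double quotes matter and that they are balanced.

```ruby
# Hypothetical helper: keep each double-quoted chunk whole, keep every other
# non-space character, and silently skip unquoted spaces.
def strip_unquoted_spaces_scan(str)
  # With capture groups, scan yields one [quoted, char] pair per match;
  # exactly one of the two is non-nil.
  str.scan(/(".*?")|([^ ])/).map { |quoted, char| quoted || char }.join
end

puts strip_unquoted_spaces_scan('a = 4, b = 2, c = "space here"')
# prints: a=4,b=2,c="space here"
```

Because scan only returns what matched, the unquoted spaces never appear in the result and nothing has to be substituted back in.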

  • This seems to work:

    result = string.gsub(/( |(".*?"))/, "\\2")
    
    Gene T : If you need to handle both single- and double-quoted strings, each opening quote mark has to be matched with the same kind of closing quote mark.
    From Borgar
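One way to act on Gene T's comment, as a rough sketch rather than a definitive fix: capture the opening quote and use a backreference so the closing quote must be of the same kind. The constant and method names are invented for this example, and it assumes quotes are balanced and contain no escaped quotes.

```ruby
# Hypothetical pattern: either a quoted run closed by the same quote
# character that opened it (backreference \1), or a single space.
QUOTED_OR_SPACE = /(["']).*?\1| /

def strip_unquoted_spaces(str)
  # The block receives the whole match: drop bare spaces, keep quoted runs.
  str.gsub(QUOTED_OR_SPACE) { |m| m == " " ? "" : m }
end

puts strip_unquoted_spaces(%q{a = 4, b = 'x y', c = "it's here"})
# prints: a=4,b='x y',c="it's here"
```

The block form of gsub avoids relying on an empty capture group in the replacement string, which makes the intent a little easier to read.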
  • Try this one. It also matches strings in single or double quotes (allowing backslash-escaped characters inside them), so you need to filter those matches out if you only want the spaces:

    /( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
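The "filtering" mentioned above could look roughly like this: pass the pattern to gsub with a block and delete only the matches that are a bare space. This is a sketch under the assumption that backslash is the escape character; `SPACE_OR_QUOTED` and `strip_spaces_outside_quotes` are names invented here.

```ruby
# The pattern from the answer: a space, a double-quoted string, or a
# single-quoted string, where \\. consumes any backslash-escaped character.
SPACE_OR_QUOTED = /( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/

def strip_spaces_outside_quotes(str)
  # Quoted strings are returned untouched; bare spaces are removed.
  str.gsub(SPACE_OR_QUOTED) { |match| match == " " ? "" : match }
end

# In a single-quoted Ruby literal, \" stays as a backslash followed by a quote.
input = 'a = 4, c = "say \"hi there\" now"'
puts strip_spaces_outside_quotes(input)
# prints: a=4,c="say \"hi there\" now"
```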
    
  • It's worth noting that any regular expression solution will fail in cases like the following:

    a = 4, b = 2, c = "space" here"
    

    While it is true that you could construct a regexp to handle the three-quote case specifically, you cannot solve the problem in the general sense. This is a mathematically provable limitation of simple DFAs, of which regexps are a direct representation. To perform any serious brace/quote matching, you will need the more powerful pushdown automaton, usually in the form of a text parser library (ANTLR, Bison, Parsec).

    With that said, it sounds like regular expressions should be sufficient for your needs. Just be aware of the limitations.

    rjmunro : What is the 'correct' solution for this case?
  • Daniel,

    The space between the second double quote and 'here' is NOT in quotes in your example.
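To make that reading concrete, here is a small character-by-character sketch that toggles an "inside quotes" flag on every double quote. It is only one possible interpretation of the three-quote input, not an agreed 'correct' answer; the method name `strip_spaces_by_toggle` is hypothetical and escapes are ignored.

```ruby
# Hypothetical scanner: every double quote flips the in_quotes flag, and a
# space is kept only while the flag is set.
def strip_spaces_by_toggle(str)
  in_quotes = false
  str.each_char.with_object(+"") do |ch, out|
    in_quotes = !in_quotes if ch == '"'
    out << ch unless ch == " " && !in_quotes
  end
end

puts strip_spaces_by_toggle('a = 4, b = 2, c = "space" here"')
# prints: a=4,b=2,c="space"here"
```

Under this reading the space before `here` is removed, and the trailing quote simply opens a quoted run that never closes.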
