Entry
Regular expression problem with quotes
Jul 5th, 2000 09:59
Nathan Wallace, unknown unknown, Hans Nowak, Snippet 67, Gordon McMillan
"""
Packages: text.regular_expressions
"""
"""
> I am trying to write code to parse Forth-ish strings. It should
> support quotes, too. The following string
>
> 2 3 dup "hello world" . blah foo 44
>
> should parse to
>
> ['2', '3', 'dup', '"hello world"', '.', 'blah', 'foo', '44']
[snip]
> "hello world"x should be an error. Q: How can I
> trap this?
>
> 2) I would like to include double quotes in strings, ala C, using
> \", thus allowing things like "\"Hi!\" he said". Q: Can this be done
> using regular expressions?
Since I called Python-DX 16 bit DOS (instead of 32 bit DOS executing in a
16 bit DOS box), I'll give you this:
"""
import re
s1 = '2 3 dup "hello world" . blah foo 44'
s2 = '2 3 dup "hello world"x . blah foo 44'
s2a = r'"begin this" x y "end this"'
s2b = r'x "\"You cannot touch this\" he said" y'
s3 = r'2 3 dup "\"Hi!\" he said" . blah foo 44'
pat = re.compile(r'([^ "]+|"([^\\"]|\\"[^"])+")( |$)')
err = re.compile(r'"([^\\"]|\\"[^"])+"[^ ]')
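def parse2(s):
    """Split s into tokens; raise RuntimeError on text glued to a closing quote."""
    pos = 0
    rslt = []
    while pos < len(s):
        # A quoted string followed directly by a non-space is an error.
        mo = err.match(s, pos)
        if mo:
            raise RuntimeError("Error at %s" % s[mo.start(0):])
        mo = pat.match(s, pos)
        if mo:
            # group(0) includes the separator, hence the trailing space.
            rslt.append(mo.group(0))
            # print("found " + mo.group(0))
            pos = mo.end(0)
        else:
            print("No match at " + s[pos:])
            pos = pos + 1
    return rslt

print(parse2(s1))
print(parse2(s2a))
print(parse2(s2b))
print(parse2(s3))
print(parse2(s2))  # raises RuntimeError: "hello world"x is malformed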
'''
Produces:
['2 ', '3 ', 'dup ', '"hello world" ', '. ', 'blah ', 'foo ', '44']
['2 ', '3 ', 'dup ', '"\\"Hi!\\" he said" ', '. ', 'blah ', 'foo ', '44']
Traceback (innermost last):
[snip]
RuntimeError: Error at "hello world"x . blah foo 44
'''
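
The tokens above keep their trailing space because the whole match, group(0), is appended. The posted pattern also needs a non-quote character after each escaped quote, so a string ending in \"" is not matched. The following is a sketch of one possible variation, not the original author's code. The names TOKEN and tokenize are hypothetical. It assumes a backslash may escape any character, and that returning the bare token (group 1) is what is wanted.

```python
import re

# Hypothetical names for this sketch; not part of the original post.
TOKEN = re.compile(r'''
    \s*                        # skip whitespace before the token
    (                          # group 1: the token without separators
        "(?:[^"\\]|\\.)*"      # quoted string; backslash escapes any character
      | [^\s"]+                # bare word: no whitespace and no quotes
    )
    (?=\s|$)                   # must be followed by whitespace or end of input
''', re.VERBOSE)

def tokenize(s):
    """Return the list of tokens in s; raise ValueError on a malformed token."""
    s = s.rstrip()             # trailing whitespace would otherwise look like a bad token
    tokens = []
    pos = 0
    while pos < len(s):
        mo = TOKEN.match(s, pos)
        if mo is None:
            raise ValueError("Bad token at %r" % s[pos:].lstrip())
        tokens.append(mo.group(1))
        pos = mo.end()
    return tokens

print(tokenize('2 3 dup "hello world" . blah foo 44'))
print(tokenize(r'2 3 dup "\"Hi!\" he said" . blah foo 44'))
```

Because the lookahead insists on whitespace or end of input after every token, an input such as "hello world"x fails to match and raises ValueError, so no separate error pattern is needed in this sketch.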