Regular expression problem with quotes

Jul 5th, 2000 09:59
Nathan Wallace, unknown unknown, Hans Nowak, Snippet 67, Gordon McMillan


"""
Packages: text.regular_expressions
"""

"""
> I am trying to write code to parse Forth-ish strings. It should
> support quotes, too. The following string
> 
> 2 3 dup "hello world" . blah foo 44
> 
> should parse to
> 
> ['2', '3', 'dup', '"hello world"', '.', 'blah', 'foo', '44']
[snip]
> "hello world"x should be an error. Q: How can I
> trap this?
> 
> 2) I would like to include double quotes in strings, à la C, using
> \", thus allowing things like "\"Hi!\" he said". Q: Can this be done
> using regular expressions?

Since I called Python-DX '16 bit DOS' (instead of '32 bit DOS executing in a
16 bit DOS box'), I'll give you this:
"""

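import re

s1 = '2 3 dup "hello world" . blah foo 44'
s2 = '2 3 dup "hello world"x . blah foo 44'
s2a = r'"begin this" x y "end this"'
s2b = r'x "\"You cannot touch this\" he said" y'
s3 = r'2 3 dup "\"Hi!\" he said" . blah foo 44'

# A token is a run of non-space, non-quote characters, or a double-quoted
# string whose body holds any character except a quote or backslash, plus
# the escape \" when it is followed by a non-quote character. The token must
# end at a space or at the end of the string.
pat = re.compile(r'([^ "]+|"([^\\"]|\\"[^"])+")( |$)')
# A quoted string immediately followed by a non-space character is an error.
err = re.compile(r'"([^\\"]|\\"[^"])+"[^ ]')

def parse2(s):
    pos = 0
    rslt = []
    while pos < len(s):
        # Trap a quoted string glued to trailing text, e.g. "hello world"x
        mo = err.match(s, pos)
        if mo:
            raise RuntimeError("Error at %s" % s[mo.start(0):])
        mo = pat.match(s, pos)
        if mo:
            # group(1) is the token without its trailing separator
            rslt.append(mo.group(1))
            pos = mo.end(0)
        else:
            print("No match at", s[pos:])
            pos = pos + 1
    return rslt

print(parse2(s1))
print(parse2(s2a))
print(parse2(s2b))
print(parse2(s3))
print(parse2(s2))

'''
Produces:
['2', '3', 'dup', '"hello world"', '.', 'blah', 'foo', '44']
['"begin this"', 'x', 'y', '"end this"']
['x', '"\\"You cannot touch this\\" he said"', 'y']
['2', '3', 'dup', '"\\"Hi!\\" he said"', '.', 'blah', 'foo', '44']
Traceback (most recent call last):
[snip]
RuntimeError: Error at "hello world"x . blah foo 44
'''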
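The `pat` regex above accepts `\"` only when a non-quote character follows it. As a result, a string that ends in an escaped quote, such as `"he said \"hi\""`, is not recognized. A backslash can also appear inside quotes only as part of `\"`. The sketch below is one possible alternative. It assumes that tokens are separated by any whitespace and that, inside quotes, a backslash escapes whatever single character follows it. It swaps in the common `"(?:[^"\\]|\\.)*"` idiom and traps `"hello world"x` with a lookahead instead of a separate `err` pattern. `tokenize` and `unquote` are hypothetical names. The sketch raises `ValueError` rather than `RuntimeError`, and it is not the original poster's method.

```python
import re

# Sketch only: assumes tokens are separated by whitespace, and that inside a
# quoted token a backslash escapes whichever single character follows it.
TOKEN = re.compile(r'''
    \s*                       # skip whitespace before the token
    (
        "(?:[^"\\]|\\.)*"     # quoted string: plain chars or backslash escapes
      | [^\s"]+               # bare word: no whitespace and no quote
    )
    (?=\s|$)                  # token must end at whitespace or end of input
''', re.VERBOSE)


def tokenize(s):
    """Split a Forth-ish line into tokens (hypothetical helper).

    Quoted tokens keep their quotes and escapes, as parse2 does.
    Raises ValueError at the first spot that is not a valid token,
    for example a quoted string glued to trailing text.
    """
    s = s.rstrip()  # so trailing whitespace is not mistaken for a bad token
    tokens = []
    pos = 0
    while pos < len(s):
        mo = TOKEN.match(s, pos)
        if mo is None:
            raise ValueError("Error at %s" % s[pos:].lstrip())
        tokens.append(mo.group(1))
        pos = mo.end()
    return tokens


def unquote(token):
    """Turn a quoted token into its string value (hypothetical helper)."""
    # Drop the surrounding quotes, then replace each backslash escape
    # (backslash plus one character) with that character.
    return re.sub(r'\\(.)', r'\1', token[1:-1])


print(tokenize(r'say "he said \"hi\""'))  # prints ['say', '"he said \\"hi\\""']
print(unquote(r'"\"Hi!\" he said"'))      # prints "Hi!" he said
```

The lookahead `(?=\s|$)` is part of the token pattern. Any token that runs straight into other text therefore fails to match and is reported, and this also catches an unterminated string such as `a "oops`. The quoted branch uses `*` rather than `+`, so the empty string `""` is accepted, whereas the original pattern rejects it.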