Entry
Perl regex code to Python
Jul 5th, 2000 09:59
Nathan Wallace, unknown unknown, Hans Nowak, Snippet 18, Mike Fletcher
"""
Packages: text.regular_expressions
"""
"""
> I have been learning Perl in order to write some CGI scripts and various
> parsing scripts. I was wondering how the equivalent code would look in
> Python?
>
> Given an HTML file with content data delimited via HTML comments. For
> example:
>
> <HTML>
> <BODY>
> <!--version-->6.4<!--/version-->
> <B><!--product-->OpenGL<!--/product--></B>some more stuff...
> <!--description-->line1
> line2
> line3
> line4<!--/description-->
> </BODY>
> </HTML>
>
> In Perl, I use the following regex to parse the content from between the
> HTML comments:
>
> sub kbExtractContent ($text, "description") {
> @_[0] =~ /<!--@_[1]-->(.+)<!--\/@_[1]-->/s;
> return $1;
> }
>
> where $text contains the entire contents of an HTML file and "description"
> is the comment pattern that I am looking for. The regex in the above case
> will look for any text between <!--description--> and <!--/description-->.
> The regex will span multiple lines via the 's' option. The content text is
> returned via the $1.
>
> How would you do the equivalent in Python?
"""
import re
TAGPATTERN = '<!--%s-->(.*?)<!--/%s-->'
#TAGPATTERN = '<%s.*?>(.*?)</%s.*?>'
def findtag( instuff, tagname, tagpattern=TAGPATTERN ):
    '''
    Finds the contents of a tag matching
    tagpattern, which must have two string
    substitution "slots" into which to
    place copies of tagname.
    Returns the matched contents, or None
    if the tag is not found.
    '''
    reg = re.compile( tagpattern %(tagname, tagname),
                      re.IGNORECASE |re.DOTALL )
    result = reg.search( instuff )
    # the result is either a match object or the None object
    if result: # is a match object
        return result.group( 1)
    else: # is the None object
        return None # different result from empty content
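"""
Usage: a minimal sketch of calling findtag on the sample document from
the question. SAMPLE and the result variable names are illustrative and
not part of the original snippet; findtag is repeated in condensed form
only so that the example runs on its own.
"""

```python
import re

# Condensed copy of findtag so this sketch is self-contained.
TAGPATTERN = '<!--%s-->(.*?)<!--/%s-->'

def findtag(instuff, tagname, tagpattern=TAGPATTERN):
    reg = re.compile(tagpattern % (tagname, tagname),
                     re.IGNORECASE | re.DOTALL)
    result = reg.search(instuff)
    return result.group(1) if result else None

# The sample HTML from the question (illustrative constant name).
SAMPLE = """<HTML>
<BODY>
<!--version-->6.4<!--/version-->
<B><!--product-->OpenGL<!--/product--></B>some more stuff...
<!--description-->line1
line2
line3
line4<!--/description-->
</BODY>
</HTML>
"""

version = findtag(SAMPLE, 'version')          # single-line content
product = findtag(SAMPLE, 'PRODUCT')          # found despite case (IGNORECASE)
description = findtag(SAMPLE, 'description')  # spans four lines (DOTALL)
missing = findtag(SAMPLE, 'author')           # no such comment pair: None

print(version)
print(product)
print(description.splitlines())
print(missing)
```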
"""
Note: I've used a non-greedy match on the contents (which is normally
what you want unless you're allowing nested comments of the same GI,
i.e. the same tag name). The DOTALL (or S) flag makes the . character
match newlines as well, so the contents can span multiple lines;
IGNORECASE is just another "wouldn't you want this".
As you will notice, this is far more verbose than the Perl, but I find
it easier to understand at a glance. Even though I know what the Perl
does, it still doesn't quite resolve for me how it's being done.
"""