Nov 30th, 2004 10:30
Chris Burkhardt, Joe Bloggs, Magnus Lyckå, Matthew Schinckel, Paul Allopenna,
If you want to (quickly) strip all HTML tags from a string of data, try this:
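import re

with open(filename, 'r') as f:
    data = f.read()
# Remove comments first, or a '>' inside a comment will be
# interpreted as the end of the (comment) tag.
# re.DOTALL lets '.' match newlines, so multi-line comments and tags are caught.
text = re.sub(r'<!--.*?-->', '', data, flags=re.DOTALL)
text = re.sub(r'<.*?>', '', text, flags=re.DOTALL)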
If you want to know how it works, read the 're' chapter in the library
reference, which discusses the usefulness of 'non-greedy' regular expressions.
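
To see why the non-greedy '.*?' matters, here is a small sketch comparing it with the greedy '.*'. The sample string and the variable names are made up for illustration, and this is only a quick demonstration, not a robust HTML parser:

```python
import re

# Hypothetical sample input, invented for this demonstration.
html = '<p>Hello <b>world</b><!-- a > b --></p>'

# Greedy: '.*' runs to the LAST '>', so the whole string is swallowed.
greedy = re.sub(r'<.*>', '', html)

# Non-greedy, but without removing comments first: the '>' inside the
# comment is taken as the end of a tag, and part of the comment leaks out.
naive = re.sub(r'<.*?>', '', html, flags=re.DOTALL)

# Non-greedy with comments removed first: each tag is stripped separately.
no_comments = re.sub(r'<!--.*?-->', '', html, flags=re.DOTALL)
lazy = re.sub(r'<.*?>', '', no_comments, flags=re.DOTALL)

print(repr(greedy))  # ''
print(repr(naive))   # 'Hello world b -->'
print(repr(lazy))    # 'Hello world'
```

For anything beyond quick clean-up (scripts, attributes containing '>', malformed markup), a real parser such as the standard library's html.parser is probably the safer choice.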