How can I use perl at the command line to do a Global Search & Replace within a directory? I want to remove <font WHATEVER> ... </font> but not ...

Mar 24th, 2009 21:18
chat alarab, Anthony Boyd, Per M Knutsen, mike gifford,


Per M Knutsen wrote on February 12, 2001:
Simple search-and-replace is easy with Perl. The one-liner I use is:
perl -pi -e 's/search/replace/g' filename
where every match of /search/ is replaced by the text "replace". To
substitute the pattern in all files within a directory, simply replace
the filename with the wildcard *, like this:
perl -pi -e 's/search/replace/g' *
The s/ operator means substitute, and the /g modifier means global
matching (i.e. substitute ALL instances of /search/ in the files
indicated, not just the first one on each line).
For a more challenging search-and-replace you will need to learn how to 
use regular expressions. For example, if you want to replace the 
pattern <FONT ...>Something</font> with Something, you could alter the 
search-and-replace string like this:
perl -pi -e 's/<FONT.*>(.*)<\/FONT>/$1/gi' filename
For example, the following:
<font size=2>Something</font>
<FONT size=5 Color="blue">Something else</font>
<font></font>
becomes:
Something
Something else
Note that the third line is left empty: both tags are removed, and only
the line's newline remains. The /i modifier makes the search
case-insensitive. To understand how this substitution works, you will
need to know a bit about Perl's regular expression syntax. For the
keen, I highly recommend Jeffrey Friedl's excellent book Mastering
Regular Expressions (O'Reilly). The most important thing to note here
is that the $1 variable refers back to what was matched within the
parentheses in the search string. You can use this feature to refer
back to several sub-patterns in your search pattern, each enclosed in
its own pair of parentheses. Use $1, $2, etc. to do this.
Anthony Boyd wrote on February 10, 2003:
Please note that you will LOSE DATA if you try the Perl one-liner above.
First, it won't match font tags that occur over multiple lines.  So if
the open tag is on line 1, and the close tag is on line 2, you have no
match.  Thus, after running that Perl one-liner, you might still have
font tags in your HTML.  Second, the pattern ".*" matches almost
EVERYTHING.  The dot means "any character except a newline" and the
star means "zero or more of them, as many as you can."  So this line of
HTML:
<font size=2>Hi there.  <b>Hey!  I'm bold!</b>  Plain again.</font>
Will get hacked down to this:
  Plain again.
Why did we lose so much of the text?  Because <FONT.*> means "match a
less-than symbol, followed by the letters FONT, then match as much as
you possibly can until you find the last possible greater-than symbol."
 So <FONT.*> matches all the way to the closing bold tag.  Ugh.  You
don't want to lose text, and you do want it to find FONT tags that span
multiple lines.  So you need to stop using ".*" and add a switch
(-0777) which makes Perl read each file in as one single string instead
of line by line.  Like this:
perl -0777 -pi -e 's/<\/?FONT[^>]*>//gi' filename
That means, "find a less-than sign (<) followed (optionally) by a
closing slash character, followed by the letters FONT in upper or lower
case, followed by any number of characters that are NOT a greater-than
sign, followed by the greater-than sign." In other words, only the
opening & closing <FONT> tags themselves, never the text between them.
I believe that will perfectly strip out 99.99% of the font tags in
existence.