TSE: File: Search/Replace: Regular expression: NOT: How to find NOT lines?

Sep 24th, 2003 15:06
Knud van Eeden,


------------------------------------------------------------------------
--- Knud van Eeden - 11 October 2001 - 10:07 PM ------------------------
Method 1: Complement-delete method: search for the complement of NOT,
and delete the lines found:
A very general method is to remove, from the total set of lines,
only the lines that fulfill the opposite of the NOT search
criterion.
With this method you avoid writing possibly complicated NOT
regular expressions for a given NOT search: you search for the
expression that fulfills the opposite criterion, and then simply
delete those lines.
---
You are then left with the lines that do NOT
match the search criterion (the idea is that in a
bag full of apples and bananas, if you eat all the
apples you are left with only the bananas).
---
The principle is a general truth (a law in set theory, also used in
probability, where the probability of an event and the probability of
its complement always sum to 1): a property and its complement
together make up the total set.
Or thus
 (all elements with a property) + (all elements with NOT that property)
 = (all elements of that whole)
---
So if you remove all elements with that property you are
left with the complement (=NOT that property).
Or thus from the above:
 (all elements with NOT that property) =
 (all elements of that whole) - (all elements with a property)
Here the minus operator stands for removing.
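---
The identity above can be checked on a handful of lines. This is a
minimal Python sketch (not TSE macro code); the helper name
split_by_property and the sample lines are made up for illustration.

```python
def split_by_property(lines, has_property):
    """Partition lines into those with the property and those without it."""
    with_property = [line for line in lines if has_property(line)]
    without_property = [line for line in lines if not has_property(line)]
    return with_property, without_property


lines = ["f(x)", "plain text", "g(y) + 1", "no brackets here"]
matching, not_matching = split_by_property(lines, lambda line: "(" in line)

# Both parts together make up the whole.
assert sorted(matching + not_matching) == sorted(lines)

# Removing the matching lines from the whole leaves the NOT lines.
remaining = [line for line in lines if line not in matching]
assert remaining == not_matching
print(remaining)
```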
---
This principle is so general that anything you can do with your
search expression, you can also do with the 'NOT' search expression.
---
In fact, with this approach the total number of NOT expressions is at
least equal to the total number of ordinary (non-NOT) expressions,
because every search expression has at least 1 corresponding
NOT expression.
---
For example:
Using complement-delete searching with TSE v4.x, is there a way to
find lines of text that do NOT contain one or more certain characters
or words?
---
e.g. find all lines not containing the character '('
---
The easiest way is to search for all lines containing '(', and then
to delete these lines.
What is left are the lines NOT containing '('.
---
So, for example (make sure you have a backup first):
load your file, then
in the TSE menu choose:
 'Search'
  'Find and Do'
    Search for: enter '('
     Options: enter 'ng'
      After Find, Do: choose 'Delete Line'
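---
For comparison, the same backup-then-delete procedure can be sketched
outside the editor. This is a Python illustration under assumptions,
not a TSE macro: the function name delete_lines_containing, the '.bak'
suffix, and the sample file are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def delete_lines_containing(path, text):
    """Back up `path`, then rewrite it without the lines that contain `text`."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")  # hypothetical backup name
    shutil.copyfile(path, backup)  # keep a copy of the original first
    lines = path.read_text().splitlines(keepends=True)
    kept = [line for line in lines if text not in line]
    path.write_text("".join(kept))
    return backup


# Demonstration on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("call(a)\nplain\nx = (1)\nlast line\n")
    backup = delete_lines_containing(sample, "(")
    result = sample.read_text()
    backup_text = backup.read_text()

print(result)
```

The backup is written before the file is changed, so the deleted lines
can still be recovered from it.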
---
---
Method 2: Regular expression method (using a character class and '~'):
---
o Searching for lines that do NOT contain 1 specific character:
---
This method only works for single characters.
---
To find a line that does NOT contain a specific character use the
following:
 ^[~<yourcharacter>]+$
or similar:
 ^[~<yourcharacter>]#$
---
                       +-----------------------+
                       v +-------------------+ ^
 ->-(begin of line)->--+-+any other character+-+->--(end of line)-
                         +-------------------+
---
This means:
 -first <begin of line>
 -then <anything but that character> repeated 1 or more times
 -then <end of line>
---
e.g. searching with regular expressions switched on (options 'ix')
^[~(]#$
or similar
^[~(]+$
finds all lines that contain no '('.
This means:
 -first <begin of line>
 -then <anything but (> repeated 1 or more times
 -then <end of line>
                       +-----------------------+
                       v +-------------------+ ^
 ->-(begin of line)->--+-+any character but (+-+->--(end of line)-
                         +-------------------+
---
e.g. searching with regular expressions switched on (options 'ix')
 ^[~)]#$
or similar:
 ^[~)]+$
finds all lines that contain no ')'.
This means:
 -first <begin of line>
 -then <anything but )> repeated 1 or more times
 -then <end of line>
                       +-----------------------+
                       v +-------------------+ ^
 ->-(begin of line)->--+-+any character but )+-+->--(end of line)-
                         +-------------------+
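---
To try the same idea outside TSE, the pattern can be rewritten for
Python's re module. This is a sketch under assumptions: Python writes
the negated class as [^(] where the text above uses [~(], and both TSE
variants ('#' and '+') are written with '+' here, on the assumption
that they select the same lines when the pattern is anchored at both
ends. The sample lines are made up.

```python
import re

# Python spelling of ^[~(]+$ : [^(] means "any character but (".
NO_OPEN_PAREN = re.compile(r"^[^(]+$")

# Variant with * (zero or more), which also accepts empty lines.
NO_OPEN_PAREN_OR_EMPTY = re.compile(r"^[^(]*$")

lines = ["f(x)", "plain text", "", "a ) only"]

found = [line for line in lines if NO_OPEN_PAREN.match(line)]
found_with_empty = [line for line in lines if NO_OPEN_PAREN_OR_EMPTY.match(line)]

print(found)
print(found_with_empty)
```

Because '+' requires at least one character, the empty line is not
reported by the first pattern in Python, although it contains no '('
either.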
---
o Searching for lines that do NOT contain a specific word (that is 2 or
more characters):
For example, find all the lines NOT containing the word 'test':
I checked the records at news.semware.com and found the following
answer from Semware.com
(see question 'Re: NOT operator in regular expression'):
"Our current regular expression operators do not support this feature.
A macro could be written to that implemented it, but it would be a bit
of work.  I will add the suggestion to my "suggestions" pile."
---
Conclusion:
-If you want to find all lines NOT containing a word, you will have to
use the complement method (method 1, that is, remove all lines which
fulfill the opposite search criterion), as described above, when using
the current versions of TSE (currently TSE v4.x).
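---
For readers who can run the search outside TSE: some other regular
expression engines do provide a NOT-like operator. The Python sketch
below uses negative lookahead, a feature of Python's re module and not
of TSE, and checks that it selects the same lines as the complement
method. The sample lines are made up, and 'test' is matched as a
substring, not as a whole word.

```python
import re

# Negative lookahead: at the start of the line, require that
# "test" does not occur anywhere later in the line.
NOT_TEST = re.compile(r"^(?!.*test).*$")

lines = ["run the test", "nothing here", "testing 1 2 3", "contest", "all fine"]

by_lookahead = [line for line in lines if NOT_TEST.match(line)]

# Complement method: drop every line that does contain "test".
by_complement = [line for line in lines if "test" not in line]

assert by_lookahead == by_complement
print(by_lookahead)
```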
-----------------------------------------------------------------------