faqts : Computers : Programming : Languages : Tse : Parser

+ Search
Add Entry AlertManage Folder Edit Entry Add page to http://del.icio.us/
Did You Find This Entry Useful?

0 of 1 people (0%) answered Yes
Recently 0 of 1 people (0%) answered Yes

Entry

TSE: Search/Replace:Regular Expression:Backus Naur Form:What is possible BNF for regular expression?

Oct 25th, 2003 15:24
Knud van Eeden,


----------------------------------------------------------------------
--- Knud van Eeden --- 21 October 2003 - 03:22 - 05:51 am ------------

TSE: Search/Replace:Regular Expression:Backus Naur Form:What is 
possible BNF for regular expression?

---

As a Backus Naur Form:

---

 1. expression = term

                 term | expression

 2. term = factor

           factor term

 3. factor = atom

             atom metacharacter

 4. atom = character

        .

        { expression }

        [ characterclass ]

        [ ~ characterclass ]

 5. characterclass = characterrange

                    characterrange characterclass

 6. characterrange = begincharacter

                     begincharacter - endcharacter

 7. begincharacter = character

 8. endcharacter = character

 9. characters =

                anycharacterexceptmetacharacters

                \ anycharacters

10. metacharacter =

                ? {=0 or 1 times}

                * {=0 or more, non-greedy}

                @ {=0 or more, greedy}

                + {=1 or more, non-greedy}

                # {=1 or more, greedy}

                ^ {=begin of line character}

                $ {=end of line character}

                \a {=alert or 'beep' character, ASCII 7}

                \b {=backspace character, ASCII 8}

                \c {=first placement of cursor}

                \f {=formfeed character, ASCII 12}

                \n {=line feed character, ASCII 10}

                \r {=return character, ASCII 13}

                \t {=tab character, ASCII 9}

                \v {=vertical tab character, ASCII 11}

                \dDD {=decimal value DD}

                \oOO {=octal value OO}

                \xHH {=hexadecimal value HH}

11. integer = digit

              digit integer

12. anycharacters = ! " # $ % & ' ( ) * + , - . / :
                    ; < = > ? @ [ \ ] ^ _ ` { | } ~
                    0 1 2 3 4 5 6 7 8 9
                    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
                    a b c d e f g h i j k l m n o p q r s t u v w x y z

---
---

In a Backus Naur diagram:


'expression' = -->-+->--[term]---->--------------------+
                   |                                   |
                   |                                   |
                   +->--[term]->-[|]->-[expression]-->-+->--


'term' = -->-+->--[factor]-->----------+
             |                         |
             |                         |
             +->--[factor]->-[term]-->-+->--


'factor' = -->-+->--[atom]-->-------------------+
               |                                |
               |                                |
               +->--[atom]->-[metacharacters]->-+->--


'atom' = -->-+->--[characters]-->---------------------------------+
             |                                                    |
             +->--[.]---------->----------------------------------+
             |                                                    |
             +->-[{]-->--[expression]-->--[}]-->------------------+->-
             |                                                    |
             +->-[[]-->--[characterclass]-->--[]]-->--------------+
             |                                                    |
             +->-[[]-->--[~]-->--[characterclass]-->--[]]-->------+


'characterclass' = -->-+->-[characterrange]->--------------------+
                       |                                         |
                       |                                         |
                       +->-[characterrange]->-[characterclass]->-+->--


'characterrange' = ->-+->-[begincharacter]->----------------------+
                      |                                           |
                      |                                           |
                      +->-[begincharacter]->-[-]->-[endcharacter]-+->-


'begincharacter' = ->-[character]->-


'endcharacter' = ->-[character]->-


'characters' =

         +-----------[anycharacterexceptmetacharacters]-->--+
         |                                                  |
    -->--+                                                  +-->--
         |                                                  |
         +->-[\]-->--[anycharacters]----------->------------+



'anycharacters' =

         +->---------[metacharacters]-------------------->--+
         |                                                  |
    -->--+                                                  +-->--
         |                                                  |
         +->-[anycharacterexceptmetacharacters]-------- ->--+



'metacharacters' = ->-+--[#$*+?@^]----------------------->--+
                      |                                     |
                      +->-[\]-->-+-->--[abcfnrtv]-------->--+-->--
                                 |                          |
                                 +-->--[dox]->-[integer]->--+


'anycharacterexceptmetacharacters' =


              +->-[!"%'(),-/:;<=>[\]_`{|}~]------->--+
              |                                      |
              +->-[0123456789]-------------------->--+
         -->--+                                      +-->--
              +->-[ABCDEFGHIJKLMNOPQRSTUVWXYZ]---->--+
              |                                      |
              +->-[abcdefghijklmnopqrstuvwxyz]---->--+



'integer' = -->--+->--[digit]----->--+
                 |                   |
                 |                   +-->--
                 |                   |
                 +-->--[integer]-->--+

---
---

In words:

---

 1. An expression is a term, or

                     a term OR an expression

 2. A term is a factor, or

              a factor followed by a term

 3. A factor is an atom, or

                an atom followed by a metacharacter

 4. An atom is a character, or

               a dot, or

               an open curly parenthesis followed by an expression
               followed by closed curly parenthesis, or

               an open square parenthesis followed by an characterclass
               followed by closed square parenthesis, or

               an open square parenthesis followed by a tilde, followed
               an characterclass followed by closed square parenthesis

 5. A characterclass is a characterrange, or

                        a characterrange followed by a characterclass

 6. A characterrange is a begincharacter, or

                        a begincharacter followed by a hyphen followed
                        by an endcharacter

 7. A begincharacter is a character

 8. An endcharacter is a character

 9. A character is anycharacterexceptmetacharacters, or

               a backslash followed by anycharacter

10. A metacharacter is a questionmarksign, or

                       a multiplicationsign, or

                       an atsign, or

                       a plussign, or

                       a hashsign, or

                       a caretsign, or

                       a dollarsign, or

                       a backslash followed by the 'a' character, or

                       a backslash followed by the 'b' character, or

                       a backslash followed by the 'c' character, or

                       a backslash followed by the 'f' character, or

                       a backslash followed by the 'n' character, or

                       a backslash followed by the 'r' character, or

                       a backslash followed by the 't' character, or

                       a backslash followed by the 'v' character, or

                       a backslash followed by the 'd' character,
                         followed by an integer, or

                       a backslash followed by the 'h' character,
                         followed by an integer, or

                       a backslash followed by the 'x' character,
                         followed by an integer

11. An integer is a digit, or

                  a digit followed by an integer

---
---


The goal is now, given a particular regular expression, to check if it
is a valid regular expression, using the above rules.

You could use any of two methods for this:

1. Bottom up

2. Top down

---
---

For example:

is

  { a b c }

a valid regular expression?

---

Method: Using bottom up:

---

Using the bottom up rewriting method, it will be a valid regular
expression, if this given { a b c } can be reduced to 1 word, that is
'expression'.

---

According to rule 9. A character is anycharacterexceptmetacharacters,

         so using the character 'c', you get:

         so { a b c } can be written as { a b character }

According to rule 4. An atom is a character

         so { a b character } can be written as { a b atom }

According to rule 3. A factor is an atom

         so { a b atom } can be written as { a b factor }

According to rule 9. A character is anycharacterexceptmetacharacters,

         so using the character 'b', you get:

         so { a b factor } can be written as { a character factor }

According to rule 4. An atom is a character

         so { a character factor } can be written as { a atom factor }

According to rule 3. A factor is an atom

         so { a atom factor } can be written as { a factor factor }

According to rule 2. A term is a factor

         so { a factor factor } can be written as { a factor term }

According to rule 2. A term is a factor followed by a term

         so { a factor term } can be written as { a term }

According to rule 9. A character is anycharacterexceptmetacharacters,

         so using the character 'a', you get:

         so { a term } can be written as { character term }

According to rule 4. An atom is a character

         so { character term } can be written as { atom term }

According to rule 3. A factor is an atom

         so { atom term } can be written as { factor term }

According to rule 2. A term is a factor

         so { factor term } can be written as { term }

According to rule 1. An expression is a term

         so { term } can be written as { expression }

According to rule 4. An atom is a curly parenthesis, followed by an
         expression, followed by a curly parenthesis

         so { expression } can be written as atom

According to rule 3. A factor is an atom

         so atom can be written as factor

According to rule 2. A term is a factor

         so factor can be written as term

According to rule 1. An expression is a term

         so term can be written as expression

So yes, the ( a b c ) has been reduced to 1 word 'expression'.

So ( a b c ) is a valid regular expression.

---

[Internet: see also: Appendix: Extended Backus-Naur Form (EBNF): 
http://www.xml.com/pub/a/98/10/guide5.html]

---
---

For example:

is

  { a b c }

a valid regular expression?

---

Method: Using top down:

---

The goal is now, given a particular regular expression, to reduce this
to an empty string, that is you have taken off all the characters (one
by one, and from left to right) from the given regular expression,
until you are left with an empty string.

---

Start with the first character, until you get a match.

According to rule 1. An expression is a term

According to rule 2. A term is a factor

According to rule 3. A factor is an atom

According to rule 4. An atom is
               an open curly parenthesis followed by an expression
               followed by closed curly parenthesis

So the first character '{' matches the first part of the description.

Now take the next character, 'a'.

And continue from where you are.

That is, you continue with an expression.

                                         followed by an expression
               followed by closed curly parenthesis

According to rule 1. An expression is a term

According to rule 2. A term is a factor

According to rule 3. A factor is an atom

According to rule 4. An atom is a character

According to rule 9. A character is anycharacterexceptmetacharacters

Now 'a' matches 'anycharacterexceptmetacharacters'.

Now take the next character, 'b'.

Go back to 'factor', and check if the next possibility,
'atom followed by a metacharacter' is fulfilled.

It is not, because 'b' is not a metacharacter

Go back to 'term', and check if the next possibility,
'factor followed by a term' is fulfilled.

You have already been at 'factor', so now check if there is a term.

Go to term.

According to rule 2. A term is a factor

According to rule 3. A factor is an atom

According to rule 4. An atom is a character

According to rule 9. A character is anycharacterexceptmetacharacters

Now 'b' matches 'anycharacterexceptmetacharacters'.

Thus 'factor followed by a term' is fulfilled.

Take the next character.

Go back to 'expression', and check if the next possibility,
'term or an expression' is fulfilled.

... to be continued later.

---
---

[book: see also: Bucknall, Julian - the Tomes of Delphi: Algorithms 
and Datastructures - p. 37 - 'Using regular expressions' - 
http://www.amazon.com/exec/obidos/tg/detail/-
/1556227361/qid=1065748783/sr=1-1/ref=sr_1_1/002-0122962-7851254?
v=glance&s=books]

---
---

Internet: see also:

---

Compiler: Grammar: Expression: Regular: Which grammar defines set of 
all regular expressions? [BNF]
http://www.faqts.com/knowledge_base/view.phtml/aid/25950/fid/1263

---

PERL:Search/Replace:Regular Expression:Backus Naur Form:What is 
possible BNF for regular expression?
http://www.faqts.com/knowledge_base/view.phtml/aid/25718/fid/200

---

Delphi: Search: Regular expression: Create: How to create a regular 
expression parser in Delphi?
http://www.faqts.com/knowledge_base/view.phtml/aid/25645/fid/175

---

Delphi: Search: Regular expression: How to add regular expression 
searching to Delphi? [Systools]
http://www.faqts.com/knowledge_base/view.phtml/aid/25295/fid/175

----------------------------------------------------------------------