faqts : Computers : Programming : Language processing : Compiler : Syntax Analyzer : Grammar

+ Search
Add Entry AlertManage Folder Edit Entry Add page to http://del.icio.us/
Did You Find This Entry Useful?

15 of 23 people (65%) answered Yes
Recently 8 of 10 people (80%) answered Yes

Entry

Compiler: Grammar: Expression: Regular: Which grammar defines set of all regular expressions? [BNF]

Jan 22nd, 2004 13:07
Knud van Eeden,


----------------------------------------------------------------------
--- Knud van Eeden --- 26 October 2003 - 00:00 am --------------------

Compiler: Grammar: Expression: Regular: Which grammar defines set of 
all regular expressions? [BNF]

---

The set of all regular expressions

is a 'context free' grammar

containing three rules:

---

Here the BNF for the set of all possible regular expressions:

---

1.  <expression> ::= <term> | <term> OR <expression>

2.  <term> ::= <factor> | <factor> <term>

3.  <factor> ::= ( <expression> )* | ( <expression> ) | anycharacter | 
anycharacter*

---

[book: source: Sedgewick, Robert - Algorithms (2nd edition) - p. 306 -
 'Context-free grammars' - 
http://www.amazon.com/exec/obidos/ASIN/0201066734/qid%3D1066520724/sr%
3D11-1/ref%3Dsr%5F11%5F1/103-7895116-2553433]

---
---

Thus for example the regular expressions used in:

 1. Perl

 2. TSE

 3. Java

 4. JavaScript

 5. .NET

 ...

are just special cases of the above grammar.

---
---

The language accepted by a finite automaton can ALWAYS be written as a
regular expression, that is one obtained by applying the following 3
rules (the above BNF with its 3 rules is just another way to express
the below 3 rules):

 1. if X and Y are regular expressions, so is their SUM X + Y

    (thus strings are produced which are either of X or of Y)

 2. if X and Y are regular expressions, so is their CONCATENATION XY

    (thus strings are produced where the prefixes are from the regular
     expression X, and the suffixes are from the regular expression Y)

 3. if X is a regular expression, so is its ITERATION X*

    (thus strings are produced which are zero or more repetitions
     of a given regular expression)

 and further:

 4. a character is a regular expression

---
---

Note:

e.g.

To show that r+ (or thus rr*) is a regular expression:

1. r is any regular expression

2. an iteration is a regular expression (using rule 3)

3. r+ (that is, one or more times) can also be written as rr*
   (that is, (one time) and (zero or more times))

4. or thus as a regular expression followed by an iteration

5. or thus a concatenation, which is a regular expression (using rule 
2)

---
---

Moreover, for every regular expression there is a finite automaton
which accepts the language symbolized by that expression.

---

But also, for every regular expression there are actually an infinite
number of finite automata which accept that language.

In other words you will not find only 1 unique finite automata for your
regular expression (e.g. in Perl or TSE, ...), but you can always find
another finite automata which ALSO accepts your regular language).

---

Finite automata are the SIMPLEST cases of the 4 computing models
widely studied.

---

[book: source: Dewdney, Alexander Keewatin - the Turing omnibus, 61 
excursions in computer science - p. 12 - 'Finite automata' - 
http://www.amazon.com/exec/obidos/tg/detail/-
/0805071660/qid=1066986136/sr=1-4/ref=sr_1_4/103-7895116-2553433?
v=glance&s=books]

---
---

For example, you will need no extra storage (like a stack, queue,
array, ...) to store your intermediate results while scanning text
with your regular expression.

Because you can always go ahead, to a new state, and you are in
this case not obliged to store any information on a stack or similar.

So this is a language of 0 stacks.

That is one of the reasons why you can keep it simple if you use this
method.

---

Because they are the simplest case, they have limitations, which may
not be readily apparent.

For example checking if parentheses are well balanced

 e.g.

  check if '{' has always its corresponding '}'

is in general not possible when using a regular expression language
only, because you need some extra intermediate storage (like a
stack or a queue, ...) to store positions of say the beginning
parentheses, if you want to find this out.

---

Internet: see also:

---

TSE: Search/Replace:Regular Expression:Backus Naur Form:What is 
possible BNF for regular expression?
http://www.faqts.com/knowledge_base/view.phtml/aid/25714/fid/1236

---

PERL:Search/Replace:Regular Expression:Backus Naur Form:What is 
possible BNF for regular expression?
http://www.faqts.com/knowledge_base/view.phtml/aid/25718/fid/200

---

Compiler: Language: Overview: Hierarchy: Chomsky: Model: What is the 
Chomsky hierarchy?
http://www.faqts.com/knowledge_base/view.phtml/aid/25862/fid/1258

----------------------------------------------------------------------