TSE: Search/Replace: Regular expression: Library: Email: Which regular expression to extract e-mail?

Sep 17th, 2004 17:25
Knud van Eeden, special thanks to Carlo Hogeveen and Sjoerd Rienstra


----------------------------------------------------------------------
--- Knud van Eeden --- 20 October 2003 - 07:43 pm --------------------


{[A-Za-z0-9_\-.]#}\@{[A-Za-z0-9_\-.]#}\.{[A-Za-z][A-Za-z][A-Za-z][A-Za-z]}|{[A-Za-z][A-Za-z][A-Za-z]}|{[A-Za-z][A-Za-z]}\c

---

This expression extracts all e-mail addresses whose extension (for
example 'bz', 'biz' or 'info') is 2, 3 or 4 letters long, and not longer.

---

Note:
The whole expression must be entered on 1 line, and you have to use
the search option 'x' (regular expression).

---

For example, it will extract and highlight any of the following:

 my.first.name-my.department@mycompany.bz

 my.first.name-my.department@mycompany.biz

 my.first.name-my.department@mycompany.info
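
For readers without TSE at hand, here is a minimal sketch of the same
idea in Python. It is a translation, not TSE syntax, and rests on these
assumptions: TSE's '#' (one or more) becomes '+', TSE's '{...}' grouping
becomes '(?:...)', the three extension alternatives are meant to be
grouped together after the dot, and the '\c' cursor marker has no Python
counterpart and is dropped. The names EMAIL_RE and extract_emails are
hypothetical.

```python
import re

# Python translation of the TSE expression (an assumption, not TSE syntax).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9_\-.]+"   # local part: letters, digits, '_', '-', '.'
    r"@"
    r"[A-Za-z0-9_\-.]+"   # domain name: same character class
    r"\."                 # literal dot before the extension
    r"(?:[A-Za-z]{4}|[A-Za-z]{3}|[A-Za-z]{2})"  # extension: 4, 3 or 2 letters
)


def extract_emails(text):
    """Return every substring of text that matches EMAIL_RE."""
    return [match.group(0) for match in EMAIL_RE.finditer(text)]


sample = (
    "my.first.name-my.department@mycompany.bz, "
    "my.first.name-my.department@mycompany.biz and "
    "my.first.name-my.department@mycompany.info"
)
for address in extract_emails(sample):
    print(address)
```

The sketch prints the three example addresses, one per line. The
alternatives are tried longest first (4, then 3, then 2 letters), in the
same order as in the TSE expression.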

---

Note:

Here is the Backus-Naur style syntax diagram used to create the above
regular expression:


   +-------------<------+         +-------------<------+
   |                    |         |                    |
->-+->-[A-Za-z0-9_-.]->-+->-[@]->-+->-[A-Za-z0-9_-.]->-+--+
                                                          |
+----------------------------<----------------------------+
|
|
|  +->-[A-Za-z]-[A-Za-z]------------------->-+
|  |                                         |
|  |                                         |
+>-+->-[A-Za-z]-[A-Za-z]-[A-Za-z]---------->-+->-
   |                                         |
   |                                         |
   +->-[A-Za-z]-[A-Za-z]-[A-Za-z]-[A-Za-z]->-+

---
---

Note: You could try an even shorter regular expression:

{[a-z0-9_\-\.]#}\@{[a-z0-9_\-\.]#}\.{[a-z][a-z][a-z][a-z]}|{[a-z][a-z][a-z]}|{[a-z][a-z]}

but this is not going to work with the options 'ix': it turns out that
the 'i' (ignore case) option has no influence on the characters inside
a class, so the [] must explicitly contain the capital letters as well.
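
The effect of leaving the capitals out of the class can be sketched in
Python. This is only an illustration under assumptions: the patterns are
Python translations of the TSE expressions ('#' becomes '+', '{...}'
becomes '(?:...)'), the address is made up, and the names LOWER_ONLY,
BOTH_CASES and matches are hypothetical. Note also that Python behaves
differently from what is described here for TSE: its re.IGNORECASE flag
does apply inside a class, so the sketch simply omits the flag to show
what a lowercase-only class does on its own.

```python
import re

# Lowercase-only classes, compiled WITHOUT any ignore-case flag.
LOWER_ONLY = re.compile(
    r"[a-z0-9_\-.]+@[a-z0-9_\-.]+\.(?:[a-z]{4}|[a-z]{3}|[a-z]{2})"
)

# Classes that list both cases explicitly, as in the working expression.
BOTH_CASES = re.compile(
    r"[A-Za-z0-9_\-.]+@[A-Za-z0-9_\-.]+\.(?:[A-Za-z]{4}|[A-Za-z]{3}|[A-Za-z]{2})"
)


def matches(pattern, address):
    """Return True if the whole address matches the compiled pattern."""
    return pattern.fullmatch(address) is not None


address = "John.Doe@Example.COM"  # made-up address with capital letters

print(matches(LOWER_ONLY, address))  # False: 'J' is not in [a-z0-9_\-.]
print(matches(BOTH_CASES, address))  # True
```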

---

[help: TSE: see also: index: 'Regular expression operators': Class: 
[]: 'i' has no influence in class]

---
---

Internet: see also:

---

The format of a valid e-mail address is specified in an RFC
(= 'R'equest 'F'or 'C'omments):

RFC 822 - Standard for the format of ARPA Internet text messages
http://www.faqs.org/rfcs/rfc822.html

---

TSE: Search/Replace: Regular expression: Link: Can you give overview 
links regular expressions?
http://www.faqts.com/knowledge_base/view.phtml/aid/31433/fid/865

----------------------------------------------------------------------