TSE: Search/Replace: Regular expression: Library: Internet: Which regular expression to extract URL?
Feb 12th, 2005 07:40
Knud van Eeden
----------------------------------------------------------------------
--- Knud van Eeden --- 20 October 2003 - 09:56 pm --------------------
---
 *{www\.}|{{http}{s}?\:\/\/{www\.}?}[A-Za-z0-9_\-.]#\.[A-Za-z0-9_\-./\~?&=:+#/%,()]#\c
---
Used with the search options 'ix' (regular expression, ignore case),
the expression above will extract most URLs.
---
Note:
The expression must be entered on a single line, with the search
options 'ix'.
(Note: the expression begins with a space. Include that leading space
when you enter it.)
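---
As a rough cross-check outside TSE, the same idea can be sketched in
Python. This is only an approximation under stated assumptions: the TSE
operator '#' is taken to mean a greedy "one or more", '{...}' is treated
as plain grouping, and the 'i' option is mapped to re.IGNORECASE. The
leading ' *' and the trailing '\c' (cursor placement) are left out
because they do not change which text counts as a URL. The names
URL_PATTERN and extract_urls are made up for this illustration.

```python
import re

# Approximate Python counterpart of the TSE expression (assumption: TSE '#'
# is a greedy "one or more" and '{...}' only groups).
URL_PATTERN = re.compile(
    r"(?:www\.|https?://(?:www\.)?)"    # 'www.' or 'http(s)://' plus optional 'www.'
    r"[A-Za-z0-9_\-.]+"                 # host name characters
    r"\."                               # the host must contain a literal dot
    r"[A-Za-z0-9_\-./~?&=:+#%,()]+",    # rest: domain suffix, path, query, fragment
    re.IGNORECASE,                      # stands in for the 'i' search option
)


def extract_urls(text):
    """Return all non-overlapping URL-like matches in text, left to right."""
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


sample = (
    "See http://www.photonics.ru/eng/sazhnikov/Models.htm and "
    "www.engr.mun.ca/~cdaley/1000/Design.html for details"
)
for url in extract_urls(sample):
    print(url)
```

Like the TSE expression, this sketch will also swallow punctuation that
directly follows a URL (for example a closing full stop), because '.'
and ',' are part of the final character class.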
---
For example, it will extract and highlight any of the following URLs.
Note: a URL is matched only if it is on a single line, so rejoin any
URL that is wrapped across lines before searching.
http://condor.admin.ccny.cuny.edu/~iy4533/
http://www.engin.umich.edu/~problemsolving/closed/unstuck/unstuck.htm
http://www.photonics.ru/eng/sazhnikov/Models.htm
http://216.239.59.104/search?q=cache:ZjGFHZFXgW4J:www.photonics.ru/eng/sazhnikov/Thinking.htm+what+is+working+backwards+problem+solving+method&hl=en&ie=UTF-8
http://www.engr.mun.ca/~cdaley/1000/Design.html
http://216.239.59.104/search?q=cache:tPA2BA5paIwJ:isg.cs.tcd.ie/giangt/Tut_9.pdf+what+is+working+backwards+problem+solving+method&hl=en&ie=UTF-8
http://www.it.bton.ac.uk/staff/rng/teaching/notes/ProbSolvMethods.html#Anderson
www.engin.umich.edu/~problemsolving/closed/unstuck/unstuck.htm
www.photonics.ru/eng/sazhnikov/Models.htm
http://216.239.59.104/search?q=cache:ZjGFHZFXgW4J:www.photonics.ru/eng/sazhnikov/Thinking.htm+what+is+working+backwards+problem+solving+method&hl=en&ie=UTF-8
www.engr.mun.ca/~cdaley/1000/Design.html
http://216.239.59.104/search?q=cache:tPA2BA5paIwJ:isg.cs.tcd.ie/giangt/Tut_9.pdf+what+is+working+backwards+problem+solving+method&hl=en&ie=UTF-8
www.it.bton.ac.uk/staff/rng/teaching/notes/ProbSolvMethods.html#Anderson
---
Note:
The following syntax (railroad) diagram was used to create the above
regular expression:
  +--------<------+
  |               |
 -+->--[space]-->-+->-+
  |               |   |
  +-------->------+   |
                      |
                      |
                      |
++---------<----------+
||
||
||
||          +----->-----+           +--------->--------+
||          |           |           |                  |
|+->-[http]-+->--[s]-->-+->-[://]->-+->-[www]->-[.]-->-+->-+
|                                                          |
|                                                          |
|                                                          +-->--+
|                                                          |     |
|                                                          |     |
+-->------------------------------------[www]->-[.]-->-----+     |
                                                                 |
                                                                 |
                                                                 |
+----------------------------------------------------------------+
|
|
|
|
|   +-------------<------+          +-------------<---------------+
|   |                    |          |                             |
+->-+->-[A-Za-z0-9_-.]->-+->-[.]->--+->-[A-Za-z0-9_-./~?&=:+#\]->-+->-
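---
To make the reading of the diagram concrete, here is a minimal Python
sketch that assembles a pattern stage by stage in the same order as the
diagram. It is an illustration, not the TSE expression itself: all
constant and function names are hypothetical, the two character classes
are taken from the expression rather than from the shorter class drawn
in the diagram, and each loop in the diagram is assumed to correspond
to a greedy '*' or '+' in Python.

```python
import re

# Each constant mirrors one stage of the diagram (names are illustrative).
SPACES = r" *"                           # loop and bypass around [space]
SCHEME = r"https?://(?:www\.)?"          # [http], optional [s], [://], optional [www][.]
BARE_WWW = r"www\."                      # lower branch: [www] then [.]
HOST = r"[A-Za-z0-9_\-.]+"               # first repeated character class
DOT = r"\."                              # the mandatory [.]
REST = r"[A-Za-z0-9_\-./~?&=:+#%,()]+"   # second repeated character class

# Concatenate the stages; the two branches become one alternation.
DIAGRAM_PATTERN = re.compile(
    SPACES + "(?P<url>(?:" + SCHEME + "|" + BARE_WWW + ")" + HOST + DOT + REST + ")",
    re.IGNORECASE,
)


def first_url(line):
    """Return the first URL-like match in line, or None if there is none."""
    match = DIAGRAM_PATTERN.search(line)
    return match.group("url") if match else None


print(first_url("  www.photonics.ru/eng/sazhnikov/Models.htm"))
print(first_url("plain text"))
```

Keeping each stage in its own constant makes it easy to compare the
pattern against the diagram box by box when a URL is not matched.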
---
Internet: see also:
---
TSE: Search/Replace: Regular expression: Link: Can you give overview
links regular expressions?
http://www.faqts.com/knowledge_base/view.phtml/aid/31433/fid/865
----------------------------------------------------------------------