Search: Algorithm: Language: Natural: Article: Indefinite: Algorithm to insert 'a' or 'an' in text?

Nov 24th, 2003 07:24
Knud van Eeden,


----------------------------------------------------------------------
--- Knud van Eeden --- 11 November 2003 - 05:49 pm -------------------


To insert the indefinite article 'a' or 'an' in front of a noun by
means of an algorithm:


1. It is the pronunciation of the noun, not its spelling, that decides.

1.1 If the noun is pronounced beginning with a vowel sound
    (roughly the sounds of a e i o u)
    then the indefinite article to use is 'an'.

1.2 If the noun is pronounced beginning with a consonant sound
    (roughly the sounds of b c d f g h j k l m n p q r s t v w x y z)
    then the indefinite article to use is 'a'.

---

Note:

As a consequence, an algorithm that inserts 'a' or 'an' before a noun
cannot in general work by looking only at the spelling of the words in
a text file, as such an approach is bound to fail for some words.

---

To be completely correct you might have to create or obtain a two-column
table of nouns with their phonetic pronunciation, e.g.

---

1. one = 'wan'

2. suv = 'es you vee'

3. uniform = 'youniform'

...

and so on.

---

So, reading each word of your text:
 Check if it is a noun, by comparing it
 with the table of all nouns.
  If it is a noun,
   search for this noun in the phonetic table.
    When found, take the value in the right column of this table,
     then check if its first character is a vowel.
      If yes, then insert 'an',
      if no, then insert 'a' as the indefinite article.
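
The steps above can be sketched in Python as follows. This is only a
minimal sketch: the names PHONETIC and indefinite_article are
hypothetical, the table holds just the three example entries given
earlier, and the check whether a word is a noun is reduced to whether
it appears in that table.

```python
# Hypothetical two-column table: noun -> phonetic spelling.
# Only the three example entries from the text are filled in;
# a real table would have to cover every noun.
PHONETIC = {
    "one": "wan",
    "suv": "es you vee",
    "uniform": "youniform",
}

VOWELS = "aeiou"


def indefinite_article(word):
    """Return 'a' or 'an' for a word in the table, or None if unknown."""
    spoken = PHONETIC.get(word.lower())
    if spoken is None:
        return None  # not in the table, so no decision is possible
    # The first character of the phonetic spelling decides.
    return "an" if spoken[0] in VOWELS else "a"


for noun in ("one", "SUV", "uniform", "apple"):
    print(noun, "->", indefinite_article(noun))
```

Returning None for unknown words keeps the sketch from guessing; a
caller could fall back to a spelling-based approximation in that case.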

---

Note

---

Adjectives which belong to that noun and directly follow 'a' or 'an'
are handled similarly. That is, the article is decided by the
pronunciation of the word that immediately follows it. If that
pronunciation starts with a vowel it is 'an', otherwise 'a'.

---

e.g.

an extreme summer

---

Note

---

But you could possibly use the following table as an approximation,
which might be correct in the majority of cases.

---

You simply take the first character of the noun and look it up in the
left column of a table that lists each character next to the way that
character is pronounced:

---

You distinguish between single character nouns (as in acronyms that are
spelled out letter by letter) and multicharacter nouns, as the first
character is naturally pronounced differently in the two cases.

---

If it is a single character noun, it will be pronounced as follows:
(so you could search in the left column for this character, then take
the first character of the right column. If vowel then 'an', else 'a')

a = 'aye'

b = 'bee'

c = 'see'

d = 'dee'

e = 'ee'

f = 'ef'

g = 'jee'

h = 'aytch'

i = 'eye'

j = 'jay'

k = 'kay'

l = 'el'

m = 'em'

n = 'en'

o = 'oh'

p = 'pee'

q = 'queue'

r = 'ar'

s = 'es'

t = 'tee'

u = 'you'

v = 'vee'

w = 'double you'

x = 'ex'

y = 'why'

z = 'zee' (or 'zed' in British English)
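
As a small sketch, the single-character lookup can be written as a
dictionary. The names LETTER_NAMES and article_for_letter are
hypothetical, and the spellings assume the usual English letter names
(for instance 'em' and 'en' for m and n, 'jee' and 'jay' for g and j).

```python
# Assumed English letter names, spelled roughly as they sound.
LETTER_NAMES = {
    "a": "aye", "b": "bee", "c": "see", "d": "dee", "e": "ee",
    "f": "ef", "g": "jee", "h": "aytch", "i": "eye", "j": "jay",
    "k": "kay", "l": "el", "m": "em", "n": "en", "o": "oh",
    "p": "pee", "q": "queue", "r": "ar", "s": "es", "t": "tee",
    "u": "you", "v": "vee", "w": "double you", "x": "ex",
    "y": "why", "z": "zee",
}


def article_for_letter(letter):
    """Pick 'a' or 'an' for a single letter that is read by its name."""
    name = LETTER_NAMES[letter.lower()]
    return "an" if name[0] in "aeiou" else "a"


# Letters whose name starts with a vowel, and which therefore take 'an'.
print("".join(c for c in sorted(LETTER_NAMES)
              if article_for_letter(c) == "an"))
```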


---

But if it is a non-single character noun (thus containing 2 or more
characters), it will be pronounced as follows.
(so you could search in the left column for this character, then take
 the first character of the right column. If vowel then 'an', else 'a')

a = 'a' (as in the 'A'ntelope)

b = 'be' (as in the 'B'ridge)

c = 'k' (as in the 'C'oral)

d = 'de' (as in the 'D'isk)

e = 'e' (as in the 'E'ntry)

f = 'fe' (as in the 'F'orum)

g = 'ge' (as in the 'G'olf)

h = 'he' (as in the 'H'ouse)

i = 'i' (as in the 'I'mmaturity)

j = 'je' (as in the 'J'ogger)

k = 'ke' (as in the 'K'ilo)

l = 'le' (as in the 'L'oad)

m = 'm' (as in the 'M'oon)

n = 'n' (as in the 'N'oun)

o = 'o' (as in the 'O'smosis), but also 'w' (as in the 'O'ne)

p = 'pe' (as in the 'P'ound)

q = 'k' (as in the 'Q'ueue)

r = 're' (as in the 'R'oad)

s = 'se' (as in the 'S'ound)

t = 'te' (as in the 'T'otal)

u = 'you' (as in the 'U'niform), but also 'u' (as in the 'U'mbrella)

v = 've' (as in the 'V'ictory)

w = 'we' (as in the 'W'hiskey)

x = 'ze' (as in the 'X'enophobia)

y = 'ye' (as in the 'Y'oghurt)

z = 'ze' (as in the 'Z'eppelin)

---

So by combining the information of the 2 tables, and seeing where
a character differs between them, you could make a single table,
which should be correct for the majority of cases:

So check which character the first character is:

a = always pronounced with a vowel as first character, so use 'an'

b = always pronounced with a consonant as first character, so use 'a'

c = always pronounced with a consonant as first character, so use 'a'

d = always pronounced with a consonant as first character, so use 'a'

e = always pronounced with a vowel as first character, so use 'an'

f = if single character noun (e.g. in acronym) then vowel, so use 'an'
    if non-single character then consonant, so use 'a'

g = always pronounced with a consonant as first character, so use 'a'

h = if single character noun (e.g. in acronym) then vowel, so use 'an'
    if non-single character then consonant, so use 'a'

i = always pronounced with a vowel as first character, so use 'an'

j = always pronounced with a consonant as first character, so use 'a'

k = always pronounced with a consonant as first character, so use 'a'

l = if single character noun (e.g. in acronym) then vowel, so use 'an'
    if non-single character then consonant, so use 'a'

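m = if single character noun (e.g. in acronym) then vowel, so use 'an'
    if non-single character then consonant, so use 'a'

n = if single character noun (e.g. in acronym) then vowel, so use 'an'
    if non-single character then consonant, so use 'a'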

o = if single character noun (e.g. in acronym) then vowel, so use 'an'
    otherwise leave it to the user to decide

p = always pronounced with a consonant as first character, so use 'a'

q = always pronounced with a consonant as first character, so use 'a'

r = if single character noun (e.g. in acronym) then vowel, so use 'an'
    if non-single character then consonant, so use 'a'

s = if single character noun (e.g. in acronym) then vowel, so use 'an'
    if non-single character then consonant, so use 'a'

t = always pronounced with a consonant as first character, so use 'a'

u = if single character noun (e.g. in acronym) then consonant, so use 'a'
    otherwise leave it to the user to decide

v = always pronounced with a consonant as first character, so use 'a'

w = always pronounced with a consonant as first character, so use 'a'

x = if single character noun (e.g. in acronym) then vowel, so use 'an'
    if non-single character then consonant, so use 'a'

y = always pronounced with a consonant as first character, so use 'a'

z = always pronounced with a consonant as first character, so use 'a'
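
A minimal sketch of such a combined first-character check could look
like this. The name guess_article and the three letter sets are
hypothetical. Treating m and n as vowel-initial when they stand alone,
and leaving longer words in 'u' undecided, are assumptions based on the
usual letter names and on the context-dependent characters listed under
Refinements.

```python
# Letters whose *name* starts with a vowel sound (single-letter case).
AN_AS_SINGLE_LETTER = set("aefhilmnorsx")
# Letters that usually start a vowel sound at the start of a longer word.
AN_AS_WORD_START = set("aei")
# Letters left to the user to decide when they start a longer word.
UNDECIDED = set("ou")


def guess_article(word):
    """Approximate 'a'/'an' from the first character; None if undecided."""
    first = word[0].lower()
    if len(word) == 1:
        return "an" if first in AN_AS_SINGLE_LETTER else "a"
    if first in UNDECIDED:
        return None  # e.g. 'one' versus 'onion': spelling is not enough
    return "an" if first in AN_AS_WORD_START else "a"


for w in ("x", "xylophone", "entry", "house", "one", "m"):
    print(w, "->", guess_article(w))
```

This is only an approximation: words such as 'hour' or 'European'
would still get the wrong article.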

---
---

Refinements:

So characters which might change pronunciation, depending on the
context are:

f

h

l

m

n

o

r

s

u

x

---

A next step would be to ask yourself whether there is some
pattern or rule connecting

the pronunciation of the first character

and

the one or more characters following this first character in a
given word.

Maybe a certain combination of vowels and consonants after the
first character will always be pronounced the same way.
So by looking at these combinations, you might be able to
extract some general rules. The pronunciation is then perhaps
a matter of combinatorics.

For example, 'uniform' is pronounced as 'youniform',
so possibly if a 'u' is followed by an 'n' and an 'i', or
more generally by a consonant and a vowel, it might *always*
be pronounced as 'you' at the beginning.

So what about some other combinations:

How do you pronounce 'una'? (like in 'unanimous'). That gives 'you'
How do you pronounce 'una'? (like in 'unable'). That gives 'un'
So here lookahead at more characters of the word is necessary.

How do you pronounce 'uni'? (like in 'uniform'). That gives 'you'
How do you pronounce 'uni'? (like in 'uninformed'). That gives 'un'
So here lookahead at more characters of the word is necessary.

How do you pronounce 'une'? (like in 'unexposed'). That gives 'un'
How do you pronounce 'une'? (like in 'unexpected'). That gives 'un'
It looks like this is *always* pronounced as 'un'.

How do you pronounce 'uno'? (like in 'unoccupied'). That gives 'un'
How do you pronounce 'uno'? (like in 'unofficial'). That gives 'un'
It looks like this is *always* pronounced as 'un'.
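
One way to express such lookahead is a table of prefixes in which the
longest matching prefix wins. The sketch below is only an illustration:
the table U_PREFIX_SOUND and both function names are hypothetical, the
prefixes are chosen by hand from the example words above, and
'uninformed' is assumed as the 'un' counterpart of 'uniform'.

```python
# Hypothetical prefix rules for words starting with 'u'.
U_PREFIX_SOUND = {
    "unani": "you",  # unanimous
    "unab": "un",    # unable
    "unif": "you",   # uniform
    "unin": "un",    # uninformed
    "une": "un",     # unexposed, unexpected
    "uno": "un",     # unoccupied, unofficial
}


def u_sound(word):
    """Return 'you' or 'un' for a known prefix, else None."""
    w = word.lower()
    # Longer prefixes are more specific, so they are tried first.
    for prefix in sorted(U_PREFIX_SOUND, key=len, reverse=True):
        if w.startswith(prefix):
            return U_PREFIX_SOUND[prefix]
    return None


def article_for_u_word(word):
    """'you' starts with a consonant sound, 'un' with a vowel sound."""
    sound = u_sound(word)
    if sound is None:
        return None
    return "a" if sound == "you" else "an"


for w in ("unanimous", "unable", "uniform", "uninformed", "unexpected"):
    print(article_for_u_word(w), w)
```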

---

I got an idea about this by taking an English text (with 600000 lines)
and searching for ' une' (that is, a space as a word marker followed by
the characters 'une'), and then looking at the words which were found.
That showed surprisingly few words, and all were 'un' + something
else (like 'un-expected', 'un-exposed', ...).
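
A search of this kind could be sketched as below. The function name
words_with_prefix and the sample sentence are made up for illustration;
the sketch uses the word boundary '\b' instead of a leading space, so
that words at the start of a line are found as well.

```python
import re
from collections import Counter


def words_with_prefix(text, prefix):
    """Count the distinct words in text that start with prefix."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"[a-z]*",
                         re.IGNORECASE)
    return Counter(m.group(0).lower() for m in pattern.finditer(text))


# Made-up sample; in practice you would read a large text file instead.
sample = ("An unexpected and unequal result; "
          "the unexposed film was unexpected.")

for word, count in words_with_prefix(sample, "une").most_common():
    print(count, word)
```

Looking through the resulting word list by hand then shows whether a
prefix is always pronounced the same way.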

You could make it more precise by searching with regular expressions,
for example for all words starting with 'use':

---

3 character words (=starting with a space followed by 'use')

' use'

(for example ' use')

---

4 character words (=starting with a space followed by 'use' followed by
1 character)

' use[a-z]'

(for example ' used')

---

5 character words (=starting with a space followed by 'use' followed by
2 characters)

' use[a-z][a-z]'

(for example ' users')

---

6 character words (=starting with a space followed by 'use' followed by
3 characters)

' use[a-z][a-z][a-z]'
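
Written as above, each pattern also matches the beginning of longer
words (' use' matches inside ' used'). A sketch that pins down the
exact length could add word boundaries on both sides. The function name
words_of_length and the sample text are hypothetical.

```python
import re


def words_of_length(text, prefix, extra):
    """Find words made of prefix plus exactly `extra` more letters."""
    # \b on both sides keeps longer or shorter words from matching.
    pattern = re.compile(r"\b%s[a-z]{%d}\b" % (re.escape(prefix), extra))
    return pattern.findall(text)


sample = "we use a used car; users like useful tools"

for extra in range(4):
    print(3 + extra, "characters:", words_of_length(sample, "use", extra))
```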

---

You would only have to do this for the characters above
(=f h l m n o r s u x), which might be pronounced differently depending
on the context (as the other characters of the alphabet are always
pronounced the same way, you should not have to consider them).

---

Some of the combinations of vowels followed by consonants, and vice
versa, simply do not exist in the English language, so you might
not have to consider them at all.

---
---

Internet: see also:

---

TSE: Text: Noun: Article: Algorithm: Which algorithm to use 'a' 
or 'an' as indefinite article?
http://www.faqts.com/knowledge_base/index.phtml/fid/865

----------------------------------------------------------------------