
I need to parse a string and it's getting hairy!

Apr 3rd, 2009 03:51
engatoo engatoo, jsWalter, Jean-Bernard Valentaten,


I am 99% complete on a date format parser.
The last item is how to extract a substring bracketed by single quotes.
For example, I have this var:
    var format = "EEEE (D - F - w - W), MMMM dd, yyyy ''G'' 'at' hh [KK|HH|kk]:mm:ss a z";
I can walk down this string and pull out "tokens" (runs of the same
character, 1 or more in length, e.g. EEEE, hh, z):
   while ( i < intFormatLen )
   {
      var token = "";
      curChar = strFormat.charAt(i);
      while (( strFormat.charAt(i) == curChar ) && ( i < intFormatLen ))
         token += strFormat.charAt(i++);
      result += formatOptions ( token );
   }
What I can't figure out is how to find and pull out something like this:
     'walter'
Now, a doubled single quote ('') needs to be grabbed and passed on as a
single token, but not three: it needs to grab two at a time.
If it finds one single quote, it needs to grab all characters until it
sees another single quote. And yes, it needs to be able to handle an
embedded pair of single quotes.
So, if I have this...
     EEEE 'is walter''s birthday'
This should give me 3 tokens:
  1) EEEE
  2) a SPACE
  3) is walter's birthday
I have been banging my head against this one for 3 days; I just can't
see it.
Help?
Walter
---
OK, I see what you want to achieve. The reason you don't get #3 is that
the nested while loop only returns tokens composed of identical
characters. The line
    while ((strFormat.charAt(i) == curChar)...
means "while you keep finding the same character and the end of the
string has not been reached, append it to token". Since 'walter ...'
does not consist of repeated identical characters, it is never returned
as a single token.
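To make this concrete, here is the loop from the question wrapped in a small function that collects the tokens in an array instead of calling formatOptions. The wrapper and the name runTokens are mine, for demonstration only; the loop body is the one from the question.

```javascript
// Demo wrapper (hypothetical name) around the run-grouping loop from the
// question: it returns the tokens instead of formatting them.
function runTokens(strFormat) {
  var tokens = [];
  var i = 0;
  var intFormatLen = strFormat.length;
  while (i < intFormatLen) {
    var token = "";
    var curChar = strFormat.charAt(i);
    // Collect a run of identical characters, exactly as in the question
    while (strFormat.charAt(i) == curChar && i < intFormatLen)
      token += strFormat.charAt(i++);
    tokens.push(token);
  }
  return tokens;
}

// The quoted name falls apart into one token per character run:
// runTokens("EEEE 'walter'")
//   → ["EEEE", " ", "'", "w", "a", "l", "t", "e", "r", "'"]
```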
Your approach is very naive and does not use JS's most powerful tool:
regular expressions. With those, you'll find it easy to parse your
string. The methods to use are search(RegExp) and match(RegExp), where
RegExp stands for a regular expression. search returns the position of
the first match (or -1), and match returns an array containing the
matched text (or null if nothing matches).
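A minimal sketch of the regular-expression approach, assuming an engine with Array.prototype.map (ES5). The pattern and the name tokenize are illustrative, not taken from any library. With the g flag, match returns every full match in order, so one alternation can recognise escaped quotes, quoted literals and runs of a repeated letter (via the \1 back-reference).

```javascript
// Sketch only: tokenize() and its pattern are illustrative assumptions.
function tokenize(strFormat) {
  // Alternatives are tried left to right at each position:
  //   ''              an escaped quote outside a literal
  //   '(?:[^']|'')*'  a quoted literal, possibly containing '' pairs
  //   ([A-Za-z])\1*   a run of one repeated pattern letter (EEEE, hh)
  //   [\s\S]          any other single character (space, comma, ...)
  var pattern = /''|'(?:[^']|'')*'|([A-Za-z])\1*|[\s\S]/g;
  var matches = strFormat.match(pattern) || []; // match returns null on no match
  return matches.map(function (token) {
    if (token == "''") return "'";
    if (token.length > 1 && token.charAt(0) == "'") {
      // Drop the surrounding quotes and collapse each '' to a single '
      return token.slice(1, -1).replace(/''/g, "'");
    }
    return token; // pattern letters and single characters pass through
  });
}

// tokenize("EEEE 'is walter''s birthday'")
//   → ["EEEE", " ", "is walter's birthday"]
```

Each returned token can then be handed to formatOptions as before.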
HTH,
Jean
---
I understand your comment about RegExp, but I am trying to *not* to 
use 
RegExp, since they are not backward compatible with older browsers.
I am looking for another way to solve this.
Walter
---
Ok, so here's what I would do then:
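while ( i < intFormatLen )
{
  var token = "";
  curChar = strFormat.charAt(i);
  if (curChar == "'")
  {
    if (strFormat.charAt(i + 1) == "'")
    {
      // '' outside a literal is one escaped quote
      token = "'";
      i += 2;
    }
    else
    {
      // Quoted literal: copy characters up to the closing quote,
      // turning each embedded '' into a single '
      var pendingQuote = false;  // true right after an unpaired quote
      i++;                       // skip the opening quote
      while (i < intFormatLen)
      {
        var isQuote = (strFormat.charAt(i) == "'");
        if (pendingQuote && !isQuote)
          break;                 // the previous quote closed the literal
        if (!isQuote)
          token += strFormat.charAt(i);
        else if (pendingQuote)
          token += "'";          // second half of an embedded '' pair
        pendingQuote = xor(pendingQuote, isQuote); // each quote flips the state
        i++;
      }
    }
  }
  else if (curChar == "[")
  {
    // Copy everything between [ and ], then step past the ]
    while ((++i < intFormatLen) && (strFormat.charAt(i) != "]"))
      token += strFormat.charAt(i);
    i++;
  }
  else
  {
    // Run of identical characters, e.g. EEEE or hh
    while ((strFormat.charAt(i) == curChar) && (i < intFormatLen))
      token += strFormat.charAt(i++);
  }
  result += formatOptions ( token );
}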
function xor(bool1, bool2)
{
  return (bool1 != bool2);
}
As you might notice, I had to use the xor logical operator, which
doesn't exist in JS (who the heck knows why *g*), so the xor() function
emulates it with !=. This is because either the predecessor or the
successor may be a quote.
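A side note on the xor helper: JavaScript indeed has no logical xor operator, but for two booleans != behaves exactly like one, which is why xor() works here. If the arguments might not be booleans, a slightly more defensive sketch coerces them first (logicalXor is an illustrative name, and the Boolean() coercion is an addition, not part of the code above):

```javascript
// Logical xor for arbitrary values: coerce both sides to booleans first,
// then compare. logicalXor is an illustrative name, not a built-in.
function logicalXor(a, b) {
  return Boolean(a) !== Boolean(b);
}

logicalXor(true, false); // true
logicalXor("a", 1);      // false: both values are truthy
// By contrast, a plain != gives true for ("a", true), because ==
// converts true to 1 and "a" to NaN before comparing.

// Bitwise ^ also accepts booleans, but it returns a number:
// (true ^ false) is 1 and (true ^ true) is 0.
```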
Basically, I would say that using the same character as both delimiter
and escape character is not a good idea; I'd use a backslash for escaping.
Your string would then look like this:
EEEE 'is walter\'s birthday'
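With a backslash as the escape character, the scanner never has to guess what a quote means: a backslash always takes the next character literally, and an unescaped quote always closes the literal. A minimal sketch, using a hypothetical readLiteral() helper that is called with the index of the opening quote and returns the literal text plus the index just past the closing quote:

```javascript
// Sketch of the backslash-escaping alternative (readLiteral is a
// hypothetical helper). Inside '...', \' stands for a quote and \\ for
// a backslash; an unescaped ' ends the literal.
function readLiteral(strFormat, start) {
  // start must point at the opening quote
  var text = "";
  var i = start + 1;
  while (i < strFormat.length) {
    var ch = strFormat.charAt(i);
    if (ch == "\\" && i + 1 < strFormat.length) {
      text += strFormat.charAt(i + 1); // escaped character, taken as-is
      i += 2;
    } else if (ch == "'") {
      return { text: text, next: i + 1 }; // closing quote
    } else {
      text += ch;
      i++;
    }
  }
  return { text: text, next: i }; // unterminated literal: take the rest
}
```

Note that in JavaScript source the format above has to be written as "EEEE 'is walter\\'s birthday'" so that the backslash survives into the string.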
I guess you got the idea from Visual Basic, VBScript or ASP, which
interpret a doubled quote inside a string as one escaped quote (i.e.
"" == \"), but you should keep in mind that VB doesn't use a simple
string parser but a grammar-recognition automaton (basically a stack
automaton) that can be taught rules for such purposes. Programming an
automaton in JS is something I wouldn't try since, in my view, JS is
poorly suited to the complex data structures it needs, which makes such
a thing very hard (though not impossible) to program.
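For what it's worth, the quoting rules in this thread don't need a stack: quoted literals cannot nest, so a small finite-state scanner is enough. A minimal sketch, with illustrative state names and a hypothetical splitLiterals() helper that separates quoted literals from pattern text, assuming the SimpleDateFormat convention that an empty pair '' outside a literal means one quote:

```javascript
// State names are illustrative; any three distinct values would do.
var TEXT = 0;             // outside quotes
var IN_LITERAL = 1;       // inside '...'
var QUOTE_IN_LITERAL = 2; // just saw a quote while inside '...'

function splitLiterals(strFormat) {
  var pieces = [];
  var state = TEXT;
  var buf = "";

  // Push the buffered text as a piece of the given type, then clear it.
  function emit(type) {
    // An empty quoted section ('' outside a literal) stands for one quote.
    if (type == "literal" && buf == "") buf = "'";
    if (buf != "") pieces.push({ type: type, text: buf });
    buf = "";
  }

  for (var i = 0; i < strFormat.length; i++) {
    var ch = strFormat.charAt(i);
    if (state == TEXT) {
      if (ch == "'") { emit("pattern"); state = IN_LITERAL; }
      else buf += ch;
    } else if (state == IN_LITERAL) {
      if (ch == "'") state = QUOTE_IN_LITERAL;
      else buf += ch;
    } else { // QUOTE_IN_LITERAL
      if (ch == "'") { buf += "'"; state = IN_LITERAL; } // '' inside a literal
      else { emit("literal"); buf = ch; state = TEXT; }  // the quote closed it
    }
  }
  // Flush whatever is left at the end of the string.
  if (state == TEXT) emit("pattern");
  else emit("literal"); // closed or unterminated literal
  return pieces;
}
```

The "pattern" pieces can then be split into letter runs by the original loop, while "literal" pieces are copied to the output unchanged.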
Aside from this, I don't think this question is interesting to all the
folks reading this knowledge base, since it is too specific, so if you
have further questions about this subject, I'd be happy to answer your
emails :)
HTH,
Jean
---
Jean, thanks for the effort on this...
> As you might notice, I had to use the xor logical operator,
> which doesn't exist in js (who the heck knows why *g*).
> This is because either the predecessor or the successor
> may be a quote.
Since JS does not support 'xor', how is this supposed to work?
> I would say that using the same character as delimiter and 
> escaper is not a good idea. I'd use a backslash for escaping
> purposes.
>
> I guess you got the idea from VisualBasic,... <snip>
No, not anything Microsoft...
http://java.sun.com/products/jdk/1.1/docs/api/java.text.SimpleDateFormat.html
This is the "C" and Java standard.
> so if you have further questions about this subject <snip>
No further questions. But thanks for taking the time and effort on this.
You have made me think of some things that helped me solve my problem.
Walter
BTW: Here is how I solved it...
   // Loop through the format string
   while ( i < intFormatLen )
   {
      // clear token var
      token = "";
      // Retrieve individual character
      curChar = strFormat.charAt(i);
      // Build the format tokens
      while (( strFormat.charAt(i) == curChar ) && ( i < intFormatLen ))
      {
         // Add current character to token string
         token += strFormat.charAt(i);
         // Increment to retrieve next char
         i++;
         // See if we have a lone single quote (not the first half of a '' pair)
         if ( ( token == "'" ) && ( strFormat.charAt(i) != "'" ) )
         {
            // clear token var
            token = "";
            // Loop through format string until we see the closing single quote
            while ( i < intFormatLen )
            {
               // Pull out character
               cChr = strFormat.charAt(i);
               if ( cChr == "'" )
               {
                  // A quote not followed by another quote closes the literal
                  if ( strFormat.charAt(i + 1) != "'" )
                     break;
                  // An embedded '' pair: skip the first quote, keep the second
                  i++;
               }
               // Add the character to the token string
               token += cChr;
               // Increment to retrieve next char
               i++;
            }
            // Increment past the closing single quote
            i++;
         }
      }
      // Look at the individual token, one or more characters in length:
      // pull the corresponding value from the format collection,
      // otherwise just pass the token through
      result += formatOptions ( token );
   }
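The formatOptions function itself isn't shown in the thread. A minimal sketch of what the "format collection" lookup might look like, assuming a hypothetical makeFormatOptions(date) factory and only a handful of SimpleDateFormat letters (yyyy, MM, dd, HH, mm, ss); the real collection would cover many more:

```javascript
// Hypothetical formatOptions: maps a few SimpleDateFormat-style tokens to
// parts of one Date and passes anything else through unchanged.
function pad2(n) {
  return (n < 10 ? "0" : "") + n;
}

function makeFormatOptions(date) {
  var table = {
    yyyy: String(date.getFullYear()),
    MM: pad2(date.getMonth() + 1), // getMonth() is 0-based
    dd: pad2(date.getDate()),
    HH: pad2(date.getHours()),     // 0-23
    mm: pad2(date.getMinutes()),
    ss: pad2(date.getSeconds())
  };
  return function formatOptions(token) {
    return table.hasOwnProperty(token) ? table[token] : token;
  };
}
```

One consequence of this design: quoted literals are passed through formatOptions as well, so a literal that happens to spell a table key (say 'dd') would be replaced. Handling literal tokens separately from pattern tokens avoids that.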