Editor: TSE: Language: Natural: Unicode: Convert: How to paste Unicode languages to TSE? [UTF-8]

Jan 29th, 2006 09:24
Knud van Eeden,


----------------------------------------------------------------------
--- Knud van Eeden --- 16 October 2004 - 03:10 pm --------------------

---

Steps: Overview:

 1. -If the application does not let you save the Unicode text
     directly, copy the Unicode text to the Windows clipboard
     if possible

 2. -Run e.g. Microsoft Notepad

 3. -Make sure the font can display those characters

     1. -Select from menu option 'Format'

         1. -Select from list 'Font...'

             1. Select e.g. 'Arial Unicode MS'

 4. -Paste the Unicode text into this file

 5. -Save the file

     1. -Select from menu option 'File'

         1. -Select from list 'Save As'

            1. -Select from list 'Encoding'

               1. -Select from list 'Unicode' (or UTF-8)

                  1. -Choose a filename

 6. -Load this file in your favorite text editor

     1. -Load this file in TSE

     2. -TSE then automatically converts the Unicode
         characters to ASCII
         (TSE v4.2 or higher may show the message
          'not all unicode characters could be converted to ASCII'.
          This happens because current TSE only converts
          a certain range of Unicode characters: "Simple UNICODE
          support added. By simple, we refer to the ASCII subset of
          UNICODE, which is now supported")

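     3. -For Greek or Russian text saved as UTF-8 you usually get
         2 bytes per character

     4. -The first bytes of the file are a marker (byte order mark)
         and can be removed: 3 bytes if using UTF-8, 2 bytes if
         using 'Unicode' (UTF-16). In TSE the UTF-8 marker shows as

          ∩╗┐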

 7. -Note:

     1. To build a table of characters (e.g. Greek, Russian, ...)

        1. Paste the original text in Notepad

        2. Save this text as e.g. UTF-8

        3. Load this file in TSE

        4. Copy the converted text from TSE to the clipboard

        5. Paste it back in Notepad

        6. By comparing the original and the converted text
           you can then see which characters correspond
           to each other, and so build a table for the different
           natural languages (e.g. Greek, Russian, ...).
           With global search/replace macros you can then
           easily convert between the different languages
           (e.g. replacing each pair of 2 characters by its
                 phonetic transcription)

            e.g.

             -----------------------------------------
             | ORIGINAL | CONVERTED TO 2 UTF-8 BYTES |
             -----------------------------------------
             | <alpha>  | ╬▒                         |
             -----------------------------------------
             | <pi>     | ╧Ç                         |
             -----------------------------------------
             ...
             -----------------------------------------
             | <omega>  | ╧ë                         |
             -----------------------------------------

         7. To check whether the conversion went OK

            1. If you save this table and load it in Notepad,
               it will show the ASCII characters next to
               the UTF-8 characters, so you can quickly
               see whether the conversion is correct
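
The marker bytes and the bytes-per-character count mentioned in step 6 can also be checked outside the editor. The following is only a minimal sketch in Python (not a TSE macro). The helper name strip_bom is made up for this illustration, and it assumes the file was saved by Notepad with a leading byte order mark.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 or UTF-16 byte order mark.

    Hypothetical helper for illustration; it only knows these three markers.
    """
    for bom in (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom):
            return data[len(bom):]
    return data


# Assumed file content: the Greek letter alpha saved as UTF-8 with a marker.
saved = codecs.BOM_UTF8 + "α".encode("utf-8")

print(len(codecs.BOM_UTF8))      # 3 marker bytes for UTF-8
print(len(codecs.BOM_UTF16_LE))  # 2 marker bytes for Notepad's 'Unicode'
print(strip_bom(saved).hex())    # ceb1, i.e. 2 bytes for one Greek letter
```

Data without a marker is returned unchanged, so the helper can be applied to any file before further processing.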

---
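
The table from step 7 can also be generated instead of built by hand. This is a sketch in Python, assuming that TSE shows each UTF-8 byte as one character of the OEM code page 437 (this matches the pairs in the table above, but other code pages give other pairs). All function names and the tiny PHONETIC table are hypothetical and only serve as illustration.

```python
def as_seen_in_oem_editor(text: str) -> str:
    """Show each UTF-8 byte of text as one code page 437 character."""
    return text.encode("utf-8").decode("cp437")


def back_to_unicode(shown: str) -> str:
    """Reverse the conversion above."""
    return shown.encode("cp437").decode("utf-8")


# Hypothetical and incomplete phonetic transcription, for illustration only.
PHONETIC = {"α": "a", "π": "p", "ω": "o"}


def build_replace_table(letters: dict) -> dict:
    """Map each 2-character pair as seen in the editor to its Latin letter."""
    return {as_seen_in_oem_editor(k): v for k, v in letters.items()}


def transliterate(shown: str, table: dict) -> str:
    """Global search/replace of every 2-character pair found in table."""
    for pair, latin in table.items():
        shown = shown.replace(pair, latin)
    return shown


for letter in PHONETIC:
    print(letter, as_seen_in_oem_editor(letter))  # α ╬▒ / π ╧Ç / ω ╧ë

table = build_replace_table(PHONETIC)
print(transliterate(as_seen_in_oem_editor("πω α"), table))  # po a
```

Because the first and second byte of a 2-byte UTF-8 character come from separate byte ranges, a plain search/replace on the pairs cannot match across two neighbouring characters.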
---

Internet: see also:

---

Unicode: Can you give an overview of links?
http://www.faqts.com/knowledge_base/view.phtml/aid/38864/fid/1852

----------------------------------------------------------------------