Editor: TSE: Language: Natural: Unicode: Convert: How to paste Unicode languages to TSE? [UTF-8]
Jan 29th, 2006 09:24
Knud van Eeden
----------------------------------------------------------------------
--- Knud van Eeden --- 16 October 2004 - 03:10 pm --------------------
---
Steps: Overview:
1. -If the application does not allow you to save the Unicode text
directly, copy the Unicode text to the Windows clipboard
if possible
2. -Run e.g. Microsoft Notepad
3. -Make sure the font can display the characters
1. -Select from menu option 'Format'
1. -Select from list 'Font...'
1. Select e.g. 'Arial Unicode MS'
4. -Paste the unicode in this file
5. -Save the file
1. -Select from menu option 'File'
1. -Select from list 'Save As'
1. -Select from list 'Encoding'
1. -Select from list 'Unicode' (or UTF-8)
1. -Choose a filename
6. -Load this file in your favorite word processor or editor
1. Load this file in TSE
2. -TSE will then automatically try to convert the Unicode
characters to ASCII
(possibly TSE v4.2 or higher gives the message
'not all unicode characters could be converted to ASCII'.
This message appears because the current TSE only converts
a certain range of Unicode characters: "Simple UNICODE
support added. By simple, we refer to the ASCII subset of
UNICODE, which is now supported")
3. -Usually you get 2 bytes per character
(e.g. for Greek or Russian letters in UTF-8)
4. -The first bytes of the file are a marker (byte order mark)
and can be removed: 3 bytes (hex EF BB BF) if using UTF-8,
2 bytes (hex FF FE) if using Notepad's 'Unicode' (UTF-16)
e.g. the UTF-8 marker is shown as
∩╗┐
7. -Note:
1. To get a table of characters (e.g. Greek, Russian, ...)
1. By pasting the original text in Notepad
2. Saving this text as e.g. UTF-8
3. Loading this file in TSE
4. Copying this converted text from TSE to the clipboard
5. Pasting it back in Notepad
6. You can then compare, by looking at the original
and converted text, which characters correspond
to each other, and so build a table for the different
natural languages (e.g. Greek, Russian, ...).
By writing global search/replace macros you can then
easily convert between the different languages
(e.g. replacing each pair of 2 characters with a phonetic
transcription)
e.g.
-----------------------------------------
| ORIGINAL | CONVERTED TO 2 UTF-8 BYTES |
-----------------------------------------
| <alpha>  | ╬▒                         |
-----------------------------------------
| <pi>     | ╧Ç                         |
-----------------------------------------
...
-----------------------------------------
| <omega>  | ╧ë                         |
-----------------------------------------
7. To check whether the conversion was done correctly
1. If you save this table and load it in Notepad,
it will show the ASCII characters next to
the UTF-8 characters, so you can quickly
see whether each one was converted correctly
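---
Example: Python: removing the marker bytes from step 6
---
The marker bytes mentioned in step 6 can also be detected and removed outside the editor. The following is only a minimal sketch, not part of TSE: the function name split_bom is hypothetical, and falling back to UTF-8 when no marker is found is an assumption. It only covers the markers that Notepad writes (UTF-8 and the two UTF-16 variants).

```python
import codecs

# Byte order marks that Notepad may write at the start of a file.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF (3 bytes), 'UTF-8'
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE (2 bytes), 'Unicode'
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF (2 bytes), 'Unicode big endian'
]


def split_bom(data: bytes):
    """Return (encoding, payload) with any leading marker removed.

    Assumption: when no marker is present the data is treated as UTF-8.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return "utf-8", data


if __name__ == "__main__":
    # Simulate a small file saved by Notepad as UTF-8.
    sample = codecs.BOM_UTF8 + "αβγ".encode("utf-8")
    encoding, payload = split_bom(sample)
    print(encoding, len(payload), payload.decode(encoding))
```

To use it on a real file, read the file in binary mode (open(name, "rb").read()) and pass the bytes to split_bom.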
---
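Example: Python: building the conversion table from note 7
---
The table from note 7 can also be generated instead of compared by hand. This is only a sketch under a loud assumption: that TSE shows each UTF-8 byte as one character of the OEM code page 437 (this reproduces the alpha row of the table above, but your TSE font or code page may differ). The function names as_seen_in_tse, back_to_unicode and build_table are hypothetical.

```python
def as_seen_in_tse(text: str, codepage: str = "cp437") -> str:
    """Show UTF-8 bytes as if every byte were one OEM character.

    Assumption: the editor displays bytes using code page 437.
    """
    return text.encode("utf-8").decode(codepage)


def back_to_unicode(shown: str, codepage: str = "cp437") -> str:
    """Reverse direction: OEM characters -> raw bytes -> UTF-8 text."""
    return shown.encode(codepage).decode("utf-8")


def build_table(chars: str) -> dict:
    """Map each original character to the characters shown for its bytes."""
    return {ch: as_seen_in_tse(ch, "cp437") for ch in chars}


if __name__ == "__main__":
    # Print one table row per Greek letter.
    for original, shown in build_table("αβπω").items():
        print(f"| {original} | {shown} |")
```

The same dictionary, reversed, gives the search/replace pairs for converting the text back. Passing another code page name (e.g. "cp1252") as the second argument of as_seen_in_tse or back_to_unicode shows what a Windows font would display, although not every byte value is defined in every code page.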
---
Internet: see also:
---
Unicode: Can you give an overview of links?
http://www.faqts.com/knowledge_base/view.phtml/aid/38864/fid/1852
----------------------------------------------------------------------