	Damn Small Linux :: Damn Small Linux Board The DSL Forums

Topic: Damn Small Characters for Interlanguage Interchang, DSCII

newby

Group: Members
Posts: 171
Joined: June 2006

Posted: Mar. 11 2008,14:04

Damn Small Characters for Interlingual Interchange (DaSCII)

-- a proposal --

THE PROBLEM

The majority languages, spoken by over 50% of people worldwide, are (in order): Chinese (962 million), English (322m), Spanish (266m), Bengali (189m), Hindi (182m), Russian (170m), Portuguese (170m), Japanese (125m) & German (98m). Neither 7-bit nor 8-bit systems provide enough characters to directly cover these languages.

THE SOLUTION

By examining the transliteration systems for each of the languages, the number of unique characters can be drastically reduced. The Indo-Aryan languages (Bengali, Hindi, et cetera) present the greatest difficulty due to the great number of diacritical and other marks. Therefore:

1. Start with the IBM 8-bit character set.

2. Insert characters 177 - 250 from the Indian Script Code for Information Interchange.

3. Add the following characters:

158 - € (the Euro symbol)
166 - z (with a tail underneath, used in Arabic transliteration)
167 - t (with a tail underneath, used in Arabic transliteration)
169 - e rising tone (the other rising-tone characters are covered)
170 - a falling-rising tone
235 - e falling-rising tone
236 - i falling-rising tone
237 - o falling-rising tone
238 - u falling-rising tone
251 - s (with a tail underneath, used in Arabic transliteration)
252 - d (with a tail underneath, used in Arabic transliteration)
253 - o (with a shallow u-shaped mark above, used in Korean transliteration)
254 - u (with a shallow u-shaped mark above, used in Korean transliteration)
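The three steps above can be sketched as a table build. This is only a rough illustration, not a definitive DaSCII definition: it assumes "the IBM 8-bit character set" means code page 437, it leaves the ISCII slots as placeholders (Python ships no ISCII codec), and the Unicode equivalents in `EXTRA` are guesses at what the descriptions mean (dot below for the "tail underneath", acute for rising tone, caron for falling-rising tone, breve for the "shallow u-shaped mark"). The name `build_dascii_table` is made up for this example.

```python
# Hypothetical sketch of the DaSCII table described in steps 1-3.

# Step 3 additions. The Unicode equivalents are ASSUMPTIONS, not part of the proposal.
EXTRA = {
    158: "\u20ac",  # Euro sign
    166: "\u1e93",  # z with dot below (assumed "tail underneath")
    167: "\u1e6d",  # t with dot below
    169: "\u00e9",  # e with acute (assumed rising tone)
    170: "\u01ce",  # a with caron (assumed falling-rising tone)
    235: "\u011b",  # e with caron
    236: "\u01d0",  # i with caron
    237: "\u01d2",  # o with caron
    238: "\u01d4",  # u with caron
    251: "\u1e63",  # s with dot below
    252: "\u1e0d",  # d with dot below
    253: "\u014f",  # o with breve (assumed Korean transliteration mark)
    254: "\u016d",  # u with breve
}


def build_dascii_table():
    """Return a dict mapping each code 0-255 to a character or placeholder."""
    # Step 1: start from the IBM set (assumed here to be code page 437).
    table = {code: bytes([code]).decode("cp437") for code in range(256)}
    # Step 2: reserve 177-250 for the ISCII characters (placeholders only).
    for code in range(177, 251):
        table[code] = f"<ISCII {code}>"
    # Step 3: apply the listed additions; 235-238 override the ISCII slots.
    table.update(EXTRA)
    return table


table = build_dascii_table()
```

Applying the steps in the order listed means codes 235 to 238 end up holding the tone-marked vowels rather than ISCII characters, and every one of the 256 slots stays filled.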

USAGE

1. Where a character is found in DaSCII, use it.
2. Where a character is not found in DaSCII, use transliterations, for example:

German - Use diphthongs suggested in the BGN/PCGN 2000 Agreement
Russian - Use diphthongs suggested in the BGN/PCGN 1947 Agreement
Romaji - Use the umlauted character for the high tone.
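The two usage rules amount to "look the character up, otherwise substitute letters that are in the set". A minimal sketch, assuming a tiny stand-in table (printable ASCII plus the Euro sign at 158) and a three-entry fallback map; the function name `encode_dascii` and the fallback entries are illustrative only and are not copied from the BGN/PCGN agreements:

```python
# Hypothetical sketch of the two usage rules: direct lookup, then transliteration.

# Stand-in for the real table: printable ASCII plus the Euro sign at 158.
CHAR_TO_CODE = {chr(c): c for c in range(32, 127)}
CHAR_TO_CODE["\u20ac"] = 158

# Illustrative fallback entries only (Cyrillic zhe, u, ka).
FALLBACK = {"\u0436": "zh", "\u0443": "u", "\u043a": "k"}


def encode_dascii(text, char_to_code=CHAR_TO_CODE, fallback=FALLBACK):
    """Encode text to bytes, transliterating characters missing from the table."""
    out = bytearray()
    for ch in text:
        if ch in char_to_code:
            # Rule 1: the character is in the set, so use it directly.
            out.append(char_to_code[ch])
        elif ch in fallback:
            # Rule 2: substitute a letter sequence that is in the set.
            for sub in fallback[ch]:
                out.append(char_to_code[sub])
        else:
            raise ValueError(f"no DaSCII code or transliteration for {ch!r}")
    return bytes(out)


encoded = encode_dascii("\u0436\u0443\u043a")
```

Raising on an unknown character, rather than silently dropping it, keeps gaps in the fallback map visible while the map is still being filled in.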

== This will allow transliterations of over 50% of human languages with _one_ font ==

humpty

Group: Members
Posts: 655
Joined: Sep. 2005

Posted: Mar. 12 2008,03:41

what about the other 50% ?

why can't everyone communicate with just one language ?

why add yet another system ?

newby

Group: Members
Posts: 171
Joined: June 2006

Posted: Mar. 12 2008,16:38

Quote (humpty @ Mar. 11 2008,22:41)

what about the other 50% ?

why can't everyone communicate with just one language ?

why add yet another system ?

Answering your questions in order:

1. Actually, the 50% figure came from looking at population statistics. Looking at the actual transliteration systems, the figure will be far greater than 50%.

2. Social evolution.

3. It's _not_ "another system." What I am proposing is to maximise the usefulness of the system we have, the 8-bit character set.

Remember, I am _not_ proposing a _language_ font, but a _transliteration_ font. Its purpose is to make Damn Small Linux accessible to the greatest number of people and, therefore, maximally successful world-wide.

I've looked at the Anglo/American systems, the United Nations systems and the ex-Soviet Bloc systems. The difficulty comes from the addition of marks to the roman consonants - the increase is explosive! Use of diphthongs decreases the necessity for extra consonants.

The political choice is between providing a so-so solution that includes all the vowel variants and leaving South-Asia in the lurch versus including the South-Asian characters and using dipthongs to cover the rest of the systems.

Ultimately, there is a physical limit = 256 characters.

lucky13

Group: Members
Posts: 1478
Joined: Feb. 2007

Posted: Mar. 12 2008,17:29

It *IS* another system. How many people who don't know English or use a Latin or Cyrillic alphabet use transliterated Latin characters to communicate?

Those already familiar with this particular alphabet most likely already speak the (pardon me) lingua franca of the Internet and of most programming, English. I don't see what's so bloody important about this subject that it requires at least two polls and yet another thread.

"Social evolution" isn't tied to transliteration but literacy and actual translation. If you want more people who don't speak English (or use Latin/Cyrillic alphabets) to use DSL, perhaps you can help add the characters they actually know and use.

--------------
"It felt kind of like having a pitbull terrier on my rear end."
-- meo (copyright(c)2008, all rights reserved)

newby

Group: Members
Posts: 171
Joined: June 2006

Posted: Mar. 12 2008,18:08

Quote (lucky13 @ Mar. 12 2008,12:29)

Actually, a lot of people learn transliterations, for the purpose of access to computers as a means to communicate with others. For example, the following is a true statement about myself: "Wo shi zhung wen xue shung." It is meaningless without the vowel marks. With the vowel marks, it is understandable to millions that "I am a Chinese language student."

Your use of "lingua franca" illustrates the issue: French was promoted as an "international" language when France was an economic power. Loss of economic power reduced linguistic dominance to the historic relic of lingua franca.

The US has been declining as a percentage of the gross international product since 1970. Soon, lingua yankee may be all that is left of our claim to linguistic dominance.

Ultimately, the issue is neither nations nor languages, but the limits of the 8-bit byte, which looks to be more durable than nations or languages.

It would have been much easier if we had gone with the PDP-8 and 12-bit bytes. 4096 characters could have phonetically covered the world.

At this point, this is becoming one of those "lighter-more filling" arguments that DSL seems to inspire. I come down on the "lighter" side (8-bit font) for initial access to DSL. Once one has discovered that DSL is really useful, one can switch to the "more filling" camp and use Unicode.

10 replies since Mar. 11 2008,14:04
