Java

 
Question: Java strings
mike7411, 4-Dec-23 16:12
Answer: Re: Java strings
Richard MacCutchan, 4-Dec-23 22:17
General: Re: Java strings
Andre Oosthuizen, 5-Dec-23 8:47
General: Re: Java strings
Richard MacCutchan, 5-Dec-23 9:26
Answer: Re: Java strings
jschell, 5-Dec-23 5:26
General: Re: Java strings
trønderen, 5-Dec-23 11:29
General: Re: Java strings
jschell, 6-Dec-23 6:33
General: Re: Java strings
trønderen, 6-Dec-23 9:53
jschell wrote:
Then it would in fact be international.
You may notice that English is a well known and much used language in England. US trade with England is certainly international.

English is also commonly used in business between countries with different national languages, as a common language that both (or all) parties understand.

If you really meant to say "If you are from the US and do not intend to have anything to do with anyone outside the US, or with anyone inside the US who speaks another language or who names themselves or their business according to non-English conventions" - then abbreviating it to "English" is going much too far. The reader will think you are talking about the language, not a subset of trade within a single country.

You should be aware that there are lots of communities within the US of A where French or Spanish are commonly used languages. Their naming conventions often follow French/Spanish conventions as well.

Is there a federal law requiring all street names in the US to be written using A-Z only, with no strange accents? Does it apply e.g. within native reservations as well? Some of the native languages use completely different scripts, but when transcribed to Latin characters, you can see personal names using additional characters/accents. I do not know if that also goes for local place names within native reservations.

I have heard (no URL available) that in the old days, immigrants to the US of A carrying "strange" names that didn't fall naturally into the English language were forced to change their name. In some cases they were simply assigned a name with no relationship to their original name.

As far as I know, this is no longer the case. If you move to the US of A, you are allowed to keep your name - even if you settle in an English-speaking area. Even if your name contains characters outside of A-Z.
jschell wrote:
First be aware that I have been delivering internationalized products for decades.
Then I fail to understand why you suggest spending effort on special treatment for speakers of one language within one country. Why not handle everyone, everywhere, the same way? Where is the gain in offering 7-bit ASCII and nothing else to a subgroup of potential customers - especially considering that 7-bit ASCII is a true subset of UTF-8, so UTF-8 costs no extra space when your text is in fact limited to ASCII?
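To make the "true subset" point concrete, here is a minimal sketch. The class and method names are my own, made up for illustration. It encodes the same text as US-ASCII and as UTF-8 and compares the bytes: for pure ASCII text the two encodings are byte for byte identical, and only characters outside ASCII cost extra bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiUtf8Demo {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if encoding as US-ASCII and as UTF-8 yields exactly the same bytes.
    // For text containing non-ASCII characters the ASCII encoder substitutes '?',
    // so the two byte arrays differ.
    static boolean sameBytesInAsciiAndUtf8(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        String plain = "Main Street 12";      // ASCII only
        String accented = "tr\u00f8nderen";   // contains U+00F8 (o with stroke)

        System.out.println(utf8Length(plain));                 // 14: one byte per character
        System.out.println(sameBytesInAsciiAndUtf8(plain));    // true
        System.out.println(utf8Length(accented));              // 10: nine characters, one needs two bytes
        System.out.println(sameBytesInAsciiAndUtf8(accented)); // false
    }
}
```

So choosing UTF-8 everywhere costs an ASCII-only customer nothing, while the ASCII-only choice silently damages everyone else's text.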
jschell wrote (quoting trønderen: "UTF-8 always could handle the entire Unicode range"):
So? That has nothing to do with what I said. And nothing to do with this thread.
I allow myself to add to what you said. Just repeating your statements is meaningless.

This thread certainly has to do with which characters can be represented in various formats. Now that it is clear that a single Java char can not represent every Unicode character, it is relevant to point out that UTF-8 can encode them all. I'd think that is more relevant than bringing in, as you do, that communicating in English within a single country can use even simpler codings.
jschell wrote:
Do you know that Java uses UTF-16 not UTF-8?
That has been stated several times in this thread. If you want to be pedantic, it needs a qualification: a Java char is a single 16-bit UTF-16 code unit, so it can not hold a character outside the BMP on its own. Such characters appear in a String as surrogate pairs, and code that treats one char as one character will mishandle them (and "variation selectors").

Some sources referenced in this discussion claim that UTF-16 at some point in time was "extended" to include surrogates. That is not true: UTF-16 was always designed to encode the entire Unicode character set (just like UTF-8). For a number of years, it seemed as if 65536 characters (the Basic Multilingual Plane, BMP) would be enough for everybody, so lots of coders (including the Java developers) gambled on surrogates never being required, treating Unicode as a 16 bit character set, period. I remember well when the first "supplementary" characters were introduced, causing a big discussion about which would be simpler: going to an internal 32 bit representation, or being prepared for surrogates. What happened "at some point in time" is that a lot of programmers suddenly realized that they could no longer ignore the concept of surrogates.
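For readers who have never met a surrogate pair, this is roughly what the gamble looks like from Java code. It is a sketch only; the class and helper names are mine. A supplementary character such as U+1F600 occupies two char values in a String, so length() counts code units, not characters, and you need the code point methods of String and Character to see it as one character.

```java
class SurrogateDemo {
    // Build a String holding exactly one code point, BMP or supplementary.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of 16-bit UTF-16 code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = fromCodePoint(0x00F8);    // U+00F8, inside the BMP
        String astral = fromCodePoint(0x1F600); // U+1F600, outside the BMP

        System.out.println(codeUnits(bmp) + " unit(s), " + codePoints(bmp) + " code point(s)");
        System.out.println(codeUnits(astral) + " unit(s), " + codePoints(astral) + " code point(s)");

        // The two code units are a high surrogate followed by a low surrogate.
        System.out.println(Character.isHighSurrogate(astral.charAt(0)));
        System.out.println(Character.isLowSurrogate(astral.charAt(1)));
    }
}
```

Code that loops with charAt(i) and assumes every char is a complete character works for the first string and breaks on the second.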

I suspect - correct me if I am wrong - that the concept of "variation selector" was introduced into Unicode at a later point, long after UTF-16 including surrogates. If you can handle surrogates properly, then the step to handle variation selectors isn't that long. (Actually, it is longer if you went for a 32 bit internal working format!)
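A rough illustration of why a 32 bit format does not save you here: a variation selector is a code point of its own that follows the character it modifies, so "one code point" and "one character as the user sees it" still differ. The sketch below uses names of my own choosing and a deliberately naive count; it is not proper grapheme segmentation. It just recognizes selectors by their Unicode block and leaves them out of the count.

```java
class VariationSelectorDemo {
    // True if the code point lies in one of the two variation selector blocks.
    static boolean isVariationSelector(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.VARIATION_SELECTORS
            || block == Character.UnicodeBlock.VARIATION_SELECTORS_SUPPLEMENT;
    }

    // Naive count: code points that are not variation selectors.
    // Assumption for this sketch only; real text also needs combining marks etc.
    static int countIgnoringSelectors(String s) {
        return (int) s.codePoints().filter(cp -> !isVariationSelector(cp)).count();
    }

    public static void main(String[] args) {
        // U+2764 (heavy black heart) followed by U+FE0F (variation selector-16)
        String heart = "\u2764\uFE0F";

        System.out.println(heart.codePointCount(0, heart.length())); // 2 code points
        System.out.println(countIgnoringSelectors(heart));           // 1 visible character
    }
}
```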

Note that there is a distinction between internal working format and external storage format. You certainly know that, but sometimes it seems as if you do not (want to) keep them strictly separate. Java character representation is an internal working format, used while a string is being temporarily processed in memory (just like the internal working format in Windows). When you write
"Just a general easy decision to make everything UTF in a database is not necessarily a good idea and there can be unexpected implications of making that decision"
then you (note: you, not me) are moving out of the temporary internal working format domain. My comment about UTF-8 was given in that context. I am strongly in favor of storing all text in a database as UTF-8; certainly not as UTF-16.
jschell wrote:
Even though I have in fact delivered solutions on slower modems than that, it still has nothing to do with most business programming now.
Consider it background information to explain to youngsters why a majority of Internet protocols (those based on RFC 822, 5322 today, which includes the majority of protocols seen by a user, such as SMTP, FTP, HTTP, SNMP for oldtimers ...) cannot handle even ISO 8859 - a fixed-size, 8 bit, extension of ASCII, the standard character set in 16-bit Windows. I bet that there are still *nix tools that stall if you give them ISO 8859 text!

I maintain that giving special attention to English when used in a limited context, the way you suggest, is a poor idea. Write all your software to always be prepared for non-ASCII characters, in any context - even when communicating in English with native English speakers within the US of A!
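As a small sketch of what "always be prepared" can mean in Java: name the charset explicitly whenever text crosses the boundary between the internal working format and stored bytes. The store/load helper names below are hypothetical, invented for this example. A round trip through UTF-8 preserves any name, while a loader that assumes ASCII only preserves names that happen to be pure ASCII.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Internal String -> external bytes, always UTF-8.
    static byte[] store(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // External bytes -> internal String, decoding as UTF-8.
    static String load(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    // What a loader written under an "English only" assumption would do.
    static String loadAssumingAscii(byte[] stored) {
        return new String(stored, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String english = "Main Street 12";
        String name = "Bj\u00f8rn's Caf\u00e9";

        // UTF-8 round trip keeps both strings intact.
        System.out.println(load(store(english)).equals(english));
        System.out.println(load(store(name)).equals(name));

        // The ASCII-only loader keeps the first but corrupts the second.
        System.out.println(loadAssumingAscii(store(english)).equals(english));
        System.out.println(loadAssumingAscii(store(name)).equals(name));
    }
}
```

The ASCII-only path buys nothing for the English text and loses data for everything else, which is the whole argument in four lines of output.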
General: Re: Java strings
jschell, 7-Dec-23 4:43
Answer: Re: Java strings
Gerry Schmitz, 6-Dec-23 7:50
Question: Problems enlarging Eclipse icons
BrunoV2022, 19-Nov-23 10:57
Answer: Re: Problems enlarging Eclipse icons
jschell, 5-Dec-23 5:27
Question: How to convert string to double with trailing zeros after decimal
Member 16135433, 8-Nov-23 1:46
Answer: Re: How to convert string to double with trailing zeros after decimal
Richard MacCutchan, 8-Nov-23 2:07
General: Re: How to convert string to double with trailing zeros after decimal
Member 16135433, 8-Nov-23 2:23
General: Re: How to convert string to double with trailing zeros after decimal
Richard MacCutchan, 8-Nov-23 2:35
General: Re: How to convert string to double with trailing zeros after decimal
Dave Kreskowiak, 8-Nov-23 3:20
General: Re: How to convert string to double with trailing zeros after decimal
jschell, 8-Nov-23 6:12
Answer: Re: How to convert string to double with trailing zeros after decimal
Ralf Meier, 8-Nov-23 8:02
Answer: Re: How to convert string to double with trailing zeros after decimal
Dave Kreskowiak, 8-Nov-23 2:23
Question: How to use JNI without setting Environment Variables
Valentinor, 16-Oct-23 0:54
Answer: Re: How to use JNI without setting Environment Variables
Valentinor, 16-Oct-23 1:42
Answer: Re: How to use JNI without setting Environment Variables
jschell, 16-Oct-23 4:58
General: Re: How to use JNI without setting Environment Variables
Valentinor, 16-Oct-23 5:41
General: Re: How to use JNI without setting Environment Variables
jschell, 17-Oct-23 5:47
