Author Topic: Unicode string literals  (Read 3729 times)

  • Guest
Unicode string literals
« on: October 08, 2018, 08:08:25 PM »
As API functions have been declared without parameters, e.g.

Code: [Select]
! LoadLibraryW

there is no way to call the "W" functions using string literals.

Therefore, I can't use

Code: [Select]
sys hLib = LoadLibraryW("NtDll.dll")

but have to use

Code: [Select]
dim wszLib as asciiz2 * 260 = "NtDll.dll"
sys hLib = LoadLibraryW(wszLib)

which is a pain in the ass for somebody like me who, like Windows, only uses Unicode.

Is there already a solution that I don't know about?

If not, may I suggest using the "L" prefix, as in C?

Code: [Select]
sys hLib = LoadLibraryW(L"NtDll.dll")
« Last Edit: October 08, 2018, 09:32:26 PM by José Roca »
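
For reference, here is a rough sketch (in Python, not O2) of what a C-style L"NtDll.dll" literal has to produce for the Windows "W" functions: the same characters as UTF-16LE code units, followed by a 16-bit terminator. The helper name wide_literal is made up for illustration, and nothing here says how o2 itself would implement the prefix.

```python
def wide_literal(text: str) -> bytes:
    """Bytes a Windows C compiler emits for L"text": UTF-16LE plus a 16-bit NUL."""
    return text.encode("utf-16-le") + b"\x00\x00"

# A plain "..." literal: one byte per character plus a 1-byte terminator.
narrow = "NtDll.dll".encode("ascii") + b"\x00"
# The wide form that LoadLibraryW and friends expect.
wide = wide_literal("NtDll.dll")

print(len(narrow))  # 10
print(len(wide))    # 20
print(wide[:4])     # b'N\x00t\x00'
```

The doubling in size is the whole difference for ASCII-range text, which is why a compile-time prefix is cheaper than declaring a wide buffer by hand.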

José Roca

  • Guest
Re: Unicode string literals
« Reply #1 on: October 08, 2018, 08:18:37 PM »
BTW any plans to implement an iif function?

José Roca

  • Guest
Re: Unicode string literals
« Reply #2 on: October 08, 2018, 08:31:54 PM »
and this syntax

Code: [Select]
function = (VER_PLATFORM_WIN32_NT = osvi.dwPlatformId)

that should return a boolean value isn't accepted.

I'm pointing this out because such expressions are common in other Basic dialects and people are very used to them.

Mike Lobanovsky

  • Guest
Re: Unicode string literals
« Reply #3 on: October 08, 2018, 09:19:13 PM »
Quote
If not, may I suggest using the "L" prefix, as in C?

Code: [Select]
sys hLib = LoadLibraryW(L"NtDll.dll")

I second José's suggestion. It's consistent with the modern C syntax O2 is so good at emulating.

José Roca

  • Guest
Re: Unicode string literals
« Reply #4 on: October 08, 2018, 10:06:54 PM »
It works using sys hLib = LoadLibraryW(wstring("NtDll.dll")). Anyway, it would be nice to have "L".

JRS

  • Guest
Re: Unicode string literals
« Reply #5 on: October 08, 2018, 10:52:55 PM »
Quote
The choice is not between ASCII and UTF-8. ASCII is a 7-bit encoding, and UTF-8 supersedes it - any valid ASCII text is also valid UTF-8. The problems arise when you use non-ASCII characters; for these you have to pick between UTF-8, UTF-16, UTF-32, and various 8-bit encodings (ISO-xxxx, etc.).

The best solution is to stick with a strict ASCII charset, that is, just don't use any non-ASCII characters in your code. Most programming languages provide ways to express non-ASCII characters using ASCII characters, e.g. "\u1234" to indicate the Unicode code point at 1234. Especially, avoid using non-ASCII characters for identifiers. Even if they work correctly, people who use a different keyboard layout are going to curse you for making them type these characters.

If you can't avoid non-ASCII characters, UTF-8 is your best bet. Unlike UTF-16 and UTF-32, it is a superset of ASCII, which means anyone who opens it with the wrong encoding gets at least most of it right; and unlike 8-bit codepages, it can encode about every character you'll ever need, unambiguously, and it's available on every system, regardless of locale.

And then you have the encoding that your code processes; this doesn't have to be the same as the encoding of your source file. For example, I can easily write PHP in UTF-8, but set its internal multibyte-encoding to, say, Latin-1; because the PHP parser does not concern itself with encodings at all, but rather just reads byte sequences, my UTF-8 string literals will be misinterpreted as Latin-1. If I output these strings on a UTF-8 terminal, you won't see any differences, but string lengths and other multibyte operations (e.g. substr) will produce wrong results.

My rule of thumb is to use UTF-8 for everything; only if you absolutely have to deal with other encodings, convert to UTF-8 as early as possible and from UTF-8 as late as possible.

I like the \u method of defining Unicode-encoded literals.
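
The \u notation and the length mismatch described in the quoted text can be tried out in a few lines. This is a small Python sketch with an arbitrary sample word: the same UTF-8 bytes, read back as Latin-1, yield more characters than were written, while pure ASCII text is unaffected.

```python
# "café", written with a \u escape so the source file itself stays pure ASCII.
text = "caf\u00e9"

# In UTF-8 the é becomes two bytes (0xC3 0xA9).
utf8_bytes = text.encode("utf-8")

# Reading those bytes as Latin-1 treats every byte as one character.
misread = utf8_bytes.decode("latin-1")

print(len(text))        # 4
print(len(utf8_bytes))  # 5
print(len(misread))     # 5

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
print("ascii only".encode("utf-8") == "ascii only".encode("ascii"))  # True
```

Undoing the mistake is possible here (encode the misread text as Latin-1 again, then decode as UTF-8), but any length or substring operation done in between works on the wrong character count.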

libunistring

This will be an easy extension to add for SB to enable Unicode.

Maybe Charles can just #include these C header files and use the functions directly.
« Last Edit: October 09, 2018, 12:02:41 AM by John »

  • Guest
Re: Unicode string literals
« Reply #6 on: October 08, 2018, 11:13:11 PM »
Windows doesn't speak UTF-8 but UTF-16.

JRS

  • Guest
Re: Unicode string literals
« Reply #7 on: October 09, 2018, 12:20:59 AM »
Quote
Windows doesn't speak UTF-8 but UTF-16.

I really haven't seen that as a problem with the APIs and libraries I use with Script BASIC under Windows. I'm able to communicate with VB6 OCXs via COM/OLE with binary and asciiz strings.

Charles Pegge

  • Guest
Re: Unicode string literals
« Reply #8 on: October 09, 2018, 04:31:32 AM »
Let's go for the L"..." system. o2 can also check for BOM codes to distinguish Unicode scripts from ANSI scripts, and adjust string literals up or down accordingly.
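
A minimal sketch of such a BOM check, in Python rather than o2 and only as an illustration of the idea. The function name sniff_encoding and the fallback label "ansi" are assumptions, not o2's actual behaviour.

```python
import codecs

# Longer marks are tested first: the UTF-32LE BOM starts with the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data: bytes, default: str = "ansi") -> str:
    """Return the encoding named by a leading BOM, or `default` if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default

unicode_script = codecs.BOM_UTF16_LE + "sys hLib".encode("utf-16-le")
ansi_script = b"sys hLib"

print(sniff_encoding(unicode_script))  # utf-16-le
print(sniff_encoding(ansi_script))     # ansi
```

A script without a BOM cannot be told apart from ANSI this way, so the default matters; a compiler could then widen or narrow string literals according to the result.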

José Roca

  • Guest
Re: Unicode string literals
« Reply #9 on: October 09, 2018, 07:46:27 AM »
Quote
Windows doesn't speak UTF-8 but UTF-16.

I really haven't seen that as a problem with the APIs and libraries I use with Script BASIC under Windows. I'm able to communicate with VB6 OCXs via COM/OLE with binary and asciiz strings.

But we are working with O2, remember? You should try it one day...

JRS

  • Guest
Re: Unicode string literals
« Reply #10 on: October 09, 2018, 08:45:05 AM »
Quote
But we are working with O2, remember? You should try it one day.

"How are we to use a language without documentation?"
--JR--

José Roca

  • Guest
Re: Unicode string literals
« Reply #11 on: October 09, 2018, 08:59:57 AM »
"If a new user of Oxygen Basic were to look at and compile the examples provided in the distribution, there should be no reason one couldn't be productive with the BASIC. Waiting for professionally written docs to materialize anytime soon isn't being realistic."

--John--

JRS

  • Guest
Re: Unicode string literals
« Reply #12 on: October 09, 2018, 09:04:50 AM »
I wasn't counting on you coming through with all the bad press O2 was getting. You continue to amaze me with your unlimited skill base.

José Roca

  • Guest
Re: Unicode string literals
« Reply #13 on: October 09, 2018, 09:31:36 AM »
Well, Charles seems to agree with my suggestions regarding Unicode and classes (sorry, Aurel). This changes the perspective. And I have learned some particularities of the language while writing some documentation. Yesterday I was irritated because I could not get something as simple as subclassing to work, only because of a missing "callback" attribute. How could I have known? Through infused science?


JRS

  • Guest
Re: Unicode string literals
« Reply #14 on: October 09, 2018, 09:52:01 AM »
Quote
Yesterday I was irritated because I could not get something as simple as subclassing to work, only because of a missing "callback" attribute. How could I have known? Through infused science?

O2 keeps us humble.

Me asking you to consider an O2 marriage while you are toting two old ladies on your arms seems like a stretch, but you're a codeaholic with no known cure.
« Last Edit: October 09, 2018, 03:52:57 PM by John »