Unicode support?
LuaMacro is powerful, and I'm sure I could do a lot with it, but I have to give him one big blame:
It does not support Unicode.
Unicode and especially UTF-8 are now so common that there is no excuse for not supporting it (see this article from 2003, the idea and the problem can be applied everywhere).
Well, I want conceded that modifying "lmc_send_keys()" will be too complicated and impossible with all the systems of Keystroke sequence, Key Names, Special characters (which don't have an escape, so if you want to write'(', that's no).
So, it would be best to create a new function "lmc_write_text()". In addition to being Unicode compatible, this function would only be used to write/paste a plain text, so no special characters (other the classics '\n', '\r', '\t'...).
Thus, we will be able to separate the execution of the keys "lmc_send_keys()" from the writing of a text "lmc_write_text()" (One is a verbose and has its own vocabulary, the other is readable, simple and that would solve any accentuated character problem).
Thank for reading and this software.
Re: Unicode support?
Yes, the lmc_send_keys implementation is quite old and has bugs.
If you need more control, use lmc_send_input (viewtopic.php?f=12&t=475), which is more complicated to use (more code is required) but has more features and should support Unicode.
I don't plan to extend lmc_send_keys currently.
Petr Medek
LUAmacros author
Re: Unicode support?
Unfortunately, this does not work.
The problem is that you want to simulate keyboard typing, and I think that's not the right solution.
What happens if the keyboard has been remapped (not the default layout)? Or if we want to write characters that are not present on the keyboard?
There are so many variables and unknowns that the result is unpredictable, with a lot of bugs.
Really, I think that creating a new function dedicated to writing text would be "simpler" and would solve all these problems more easily.
lmc_write_text() would not be an extension of lmc_send_keys(); it would paste the given text directly (like Ctrl+V). No simulation of keyboard typing.
(you could probably try using the clipboard)
Re: Unicode support?
I don't see a technical way to "paste" or "inject" text into another application.
And no, I don't plan to investigate the possibilities and extend LuaMacros with this functionality.
Petr Medek
LUAmacros author
Re: Unicode support?
Okay, I found a workaround.
A large number of characters can be written using "Alt codes".
Not all of Unicode, but enough that it stops being a big problem.
Code:
function write_altcode(altcode)
    lmc_send_input(18, 0, 0); -- press ALT (virtual key 18)
    lmc_send_keys(altcode, 10); -- type the digits of the Alt code
    lmc_sleep(string.len(altcode) * 10); -- wait until all characters of the Alt code have been sent
    lmc_send_input(18, 0, 2); -- release ALT
end;
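To see exactly which calls write_altcode() issues, and in what order, it can be dry-run outside LuaMacros. The sketch below is only an illustration: the three lmc_* functions are replaced by recording stubs (their argument lists are assumed from the calls in the post above, not taken from LuaMacros documentation), call_log and record are hypothetical names, and write_altcode() is repeated so that the block runs on its own in a plain Lua interpreter. "0233" is just a sample digit string.

```lua
-- Dry run: record every call instead of sending real input.
call_log = {}

-- Build a stub that appends "name(arg1, arg2, ...)" to call_log.
function record(name)
  return function(...)
    call_log[#call_log + 1] = name .. "(" .. table.concat({...}, ", ") .. ")"
  end
end

-- Stand-ins for the LuaMacros built-ins (assumed signatures).
lmc_send_input = record("lmc_send_input")
lmc_send_keys  = record("lmc_send_keys")
lmc_sleep      = record("lmc_sleep")

-- Same function as in the post, repeated to keep the sketch self-contained.
function write_altcode(altcode)
  lmc_send_input(18, 0, 0)             -- press ALT
  lmc_send_keys(altcode, 10)           -- type the digits
  lmc_sleep(string.len(altcode) * 10)  -- wait for the digits to be sent
  lmc_send_input(18, 0, 2)             -- release ALT
end

write_altcode("0233")
for _, line in ipairs(call_log) do
  print(line)
end
```

The log makes the sequence visible: ALT down, the digits, the wait, ALT up. Whether the target application then interprets the digits as an Alt code depends on the real functions and cannot be checked by this dry run.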
Re: Unicode support?
Please post a complete code example that outputs arbitrary Unicode characters.
Re: Unicode support?
I posted the reply in the other thread you created: http://hidmacros.eu/forum/viewtopic.php ... 5055#p5055
Petr Medek
LUAmacros author
Re: Unicode support?
I found something better than the Alt codes!
The write_text() function can write any Unicode character string.
write_text() writes each character from its code point, obtained in a table via utf8_explode().
I found the function utf8_explode() here (ustring.lua). Most of the other functions there are of no use to me, so I extracted only the one I need.
OK, there is a bug: we can only write the characters of the BMP (Basic Multilingual Plane)... which covers the 65,536 code points from U+0000 to U+FFFF, including the most "common" characters! (so no emoji, sorry)
Apparently, this comes from lmc_send_input(), which does not accept values greater than 65535 and wraps around to 0 (integer overflow).
PS: Don't forget to save your Lua script as "UTF-8 (no BOM)"; I recommend Notepad++ to do this easily.
Code:
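function write_text(text)
    if (text == nil) then text = "" end;
    local tbl = utf8_explode(tostring(text));
    -- utf8_explode() returns nil when the input is not valid UTF-8
    if (tbl ~= nil and tbl.len > 0) then
        -- ipairs() guarantees that the code points are visited in order
        for i, c in ipairs(tbl.codepoints) do
            lmc_send_input(0, c, 4) -- press (flag 4: the value is a Unicode character)
            lmc_send_input(0, c, 6) -- release (4 + 2: Unicode character + key up)
        end;
    end;
end;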
--[[ utf8_explode / unicode compatibility
extract from ustring.lua
https://github.com/wikimedia/mediawiki-extensions-Scribunto/blob/master/includes/engines/LuaCommon/lualib/ustring/ustring.lua
A private helper that splits a string into codepoints, and also collects the
starting position of each character and the total length in codepoints.
@param s string utf8-encoded string to decode
@return table { .len, .codepoints, .bytepos}
]]
function utf8_explode( s )
    local rslt = {
        len = 0,
        codepoints = {},
        bytepos = {},
    }
    local i = 1
    local l = string.len( s )
    local cp, b, b2, trail
    local min
    while i <= l do
        b = string.byte( s, i )
        if b < 0x80 then
            -- 1-byte code point, 00-7F
            cp = b
            trail = 0
            min = 0
        elseif b < 0xc2 then
            -- Either a non-initial code point (invalid here) or
            -- an overlong encoding for a 1-byte code point
            return nil
        elseif b < 0xe0 then
            -- 2-byte code point, C2-DF
            trail = 1
            cp = b - 0xc0
            min = 0x80
        elseif b < 0xf0 then
            -- 3-byte code point, E0-EF
            trail = 2
            cp = b - 0xe0
            min = 0x800
        elseif b < 0xf4 then
            -- 4-byte code point, F0-F3
            trail = 3
            cp = b - 0xf0
            min = 0x10000
        elseif b == 0xf4 then
            -- 4-byte code point, F4
            -- Make sure it doesn't decode to over U+10FFFF
            if string.byte( s, i + 1 ) > 0x8f then
                return nil
            end
            trail = 3
            cp = 4
            min = 0x100000
        else
            -- Code point over U+10FFFF, or invalid byte
            return nil
        end
        -- Check subsequent bytes for multibyte code points
        for j = i + 1, i + trail do
            b = string.byte( s, j )
            if not b or b < 0x80 or b > 0xbf then
                return nil
            end
            cp = cp * 0x40 + b - 0x80
        end
        if cp < min then
            -- Overlong encoding
            return nil
        end
        rslt.codepoints[#rslt.codepoints + 1] = cp
        rslt.bytepos[#rslt.bytepos + 1] = i
        rslt.len = rslt.len + 1
        i = i + 1 + trail
    end
    -- Two past the end (for sub with empty string)
    rslt.bytepos[#rslt.bytepos + 1] = l + 1
    rslt.bytepos[#rslt.bytepos + 1] = l + 1
    return rslt;
end;
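The BMP limit described in this post can be checked before anything is sent. What follows is a minimal sketch under stated assumptions: decode_utf8 is a hypothetical, non-validating stand-in for utf8_explode() (it assumes well-formed UTF-8 and skips all the error checks), and fits_in_bmp is likewise an illustrative name. The sample string is written with byte escapes so the example does not depend on how the file is saved.

```lua
-- Simplified decoder: assumes the input is well-formed UTF-8.
function decode_utf8(s)
  local cps, i = {}, 1
  while i <= #s do
    local b = s:byte(i)
    local cp, trail
    if b < 0x80 then cp, trail = b, 0            -- 1-byte sequence
    elseif b < 0xe0 then cp, trail = b - 0xc0, 1 -- 2-byte sequence
    elseif b < 0xf0 then cp, trail = b - 0xe0, 2 -- 3-byte sequence
    else cp, trail = b - 0xf0, 3 end             -- 4-byte sequence
    for j = i + 1, i + trail do
      cp = cp * 0x40 + (s:byte(j) - 0x80)        -- append 6 bits per trailing byte
    end
    cps[#cps + 1] = cp
    i = i + 1 + trail
  end
  return cps
end

-- A single 16-bit value can only carry U+0000 to U+FFFF.
function fits_in_bmp(cp)
  return cp <= 0xFFFF
end

-- "é", "€" and the emoji U+1F600, as raw UTF-8 bytes.
local sample = "\195\169\226\130\172\240\159\152\128"
for _, cp in ipairs(decode_utf8(sample)) do
  print(string.format("U+%04X", cp), fits_in_bmp(cp) and "BMP" or "outside BMP")
end
-- U+00E9   BMP
-- U+20AC   BMP
-- U+1F600  outside BMP
```

A caller could use such a check to skip or report characters that a single 16-bit value cannot carry, instead of letting them wrap around silently.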
Re: Unicode support?
un_pogaz wrote: ↑12 Mar 2019, 11:46
Ok, there is a bug : we can only write the characters of the BMP (Basic Multilingual Plane)... which includes the 65535 most "common" characters! (so no emoji, sorry)
Apparently, this comes from lmc_send_input() which does not accept values greater than 65535, and returns to 0 (integer overflow)

The 65535 limit comes from the Windows API function, which accepts only a WORD (16-bit) parameter for the character (see MSDN page).
After a bit of googling I found this answer, which says such characters should be sent using 2 subsequent SendInput calls with a surrogate pair. As lmc_send_input is just a wrapper around SendInput, you may try subsequent calls of lmc_send_input.
Petr Medek
LUAmacros author
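The surrogate pair mentioned in the reply above follows from fixed arithmetic defined by UTF-16: subtract 0x10000 from the code point, then place the upper 10 bits of the result in the high surrogate (base 0xD800) and the lower 10 bits in the low surrogate (base 0xDC00). A minimal sketch, where to_surrogates is a hypothetical helper name and nothing is passed to lmc_send_input:

```lua
-- Split a code point above U+FFFF into its UTF-16 surrogate pair.
function to_surrogates(cp)
  if cp < 0x10000 or cp > 0x10FFFF then
    error("code point must be in the range U+10000..U+10FFFF")
  end
  local v = cp - 0x10000                      -- 20-bit value
  local hi = 0xD800 + math.floor(v / 0x400)   -- upper 10 bits
  local lo = 0xDC00 + v % 0x400               -- lower 10 bits
  return hi, lo
end

print(string.format("%04X %04X", to_surrogates(0x1F600)))  --> D83D DE00
```

Each of the two returned values fits in 16 bits, which is why the reply suggests sending them as two separate inputs.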
Re: Unicode support?
In other words:
lmc_send_input() sends/writes one UTF-16 "character" (code unit)...
Goddammit, it's going to be long and complicated before we get a fully Unicode-compatible write_text() function.
But I am beginning to see the end of it, and it clearly doesn't seem impossible.
Thank you for your answers.
EDIT: Ugh, it doesn't seem to work.
I found on this page how to "create" the surrogate pairs, but during execution the function writes 2 characters. The surrogate values are good, so I'm probably missing something else (execution order or dwFlags value).
Code:
function mp_write_text(text)
    if (text == nil) then text = "" end;
    local tbl = utf8_explode(tostring(text));
    -- utf8_explode() returns nil when the input is not valid UTF-8
    if (tbl ~= nil and tbl.len > 0) then
        -- ipairs() guarantees that the code points are visited in order
        for i, c in ipairs(tbl.codepoints) do
            mp_unicode_write(c);
        end;
    end;
end;
function mp_unicode_write(codepoint)
    if (codepoint == nil) then codepoint = "" end;
    codepoint = tonumber(codepoint);
    -- reject anything that is not a number, is negative, is a lone surrogate, or is above U+10FFFF
    if (codepoint == nil or codepoint < 0 or (codepoint >= 0xd800 and codepoint <= 0xdfff) or codepoint > 0x10ffff) then return end;
    if (codepoint < 0x10000) then
        lmc_send_input(0, codepoint, 4); -- press
        lmc_send_input(0, codepoint, 6); -- release
    else
        -- build the UTF-16 surrogate pair from the 32-bit string of the code point
        local utf32 = toBits(codepoint, 32)
        print(utf32) -- debug output
        print("")
        local w = toBits(tonumber(string.sub(utf32, 1, 16), 2) - 1, 4); -- upper 16 bits minus 1, on 4 bits
        local x = string.sub(utf32, 17, 22); -- next 6 bits
        local y = string.sub(utf32, 23, 32); -- lowest 10 bits
        print("110110" .. w .. x) -- high surrogate (debug output)
        print("110111" .. y) -- low surrogate (debug output)
        lmc_send_input(0, tonumber("110110" .. w .. x, 2), 4) -- press high surrogate
        lmc_send_input(0, tonumber("110111" .. y, 2), 4) -- press low surrogate
        lmc_send_input(0, tonumber("110110" .. w .. x, 2), 6) -- release high surrogate
        lmc_send_input(0, tonumber("110111" .. y, 2), 6) -- release low surrogate
    end;
end;
function toBits(num, bits)
    -- returns a string of bits, most significant first.
    bits = bits or math.max(1, select(2, math.frexp(num)))
    local t = {} -- will contain the bits
    for b = bits, 1, -1 do
        t[b] = math.fmod(num, 2)
        num = math.floor((num - t[b]) / 2)
    end
    return table.concat(t)
end
--[[ utf8_explode / extract from ustring.lua
https://github.com/wikimedia/mediawiki-extensions-Scribunto/blob/master/includes/engines/LuaCommon/lualib/ustring/ustring.lua
A private helper that splits a string into codepoints, and also collects the
starting position of each character and the total length in codepoints.
@param s string utf8-encoded to decode
@return table { .len, .codepoints, .bytepos}
]]
function utf8_explode( s )
    local rslt = {
        len = 0,
        codepoints = {},
        bytepos = {},
    };
    local i = 1;
    local l = string.len( s );
    local cp, b, b2, trail;
    local min;
    while i <= l do
        b = string.byte( s, i );
        if b < 0x80 then
            -- 1-byte code point, 00-7F
            cp = b;
            trail = 0;
            min = 0;
        elseif b < 0xc2 then
            -- Either a non-initial code point (invalid here) or
            -- an overlong encoding for a 1-byte code point
            return nil;
        elseif b < 0xe0 then
            -- 2-byte code point, C2-DF
            trail = 1;
            cp = b - 0xc0;
            min = 0x80;
        elseif b < 0xf0 then
            -- 3-byte code point, E0-EF
            trail = 2;
            cp = b - 0xe0;
            min = 0x800;
        elseif b < 0xf4 then
            -- 4-byte code point, F0-F3
            trail = 3;
            cp = b - 0xf0;
            min = 0x10000;
        elseif b == 0xf4 then
            -- 4-byte code point, F4
            -- Make sure it doesn't decode to over U+10FFFF
            if string.byte( s, i + 1 ) > 0x8f then
                return nil;
            end
            trail = 3;
            cp = 4;
            min = 0x100000;
        else
            -- Code point over U+10FFFF, or invalid byte
            return nil;
        end
        -- Check subsequent bytes for multibyte code points
        for j = i + 1, i + trail do
            b = string.byte( s, j );
            if not b or b < 0x80 or b > 0xbf then
                return nil;
            end;
            cp = cp * 0x40 + b - 0x80;
        end;
        if cp < min then
            -- Overlong encoding
            return nil;
        end;
        rslt.codepoints[#rslt.codepoints + 1] = cp;
        rslt.bytepos[#rslt.bytepos + 1] = i;
        rslt.len = rslt.len + 1;
        i = i + 1 + trail;
    end;
    -- Two past the end (for sub with empty string)
    rslt.bytepos[#rslt.bytepos + 1] = l + 1;
    rslt.bytepos[#rslt.bytepos + 1] = l + 1;
    return rslt;
end;
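Since the surrogate values look right and the remaining suspects are the call order and the flags, it may help to print the exact event sequence without sending anything. This is a dry-run sketch under assumptions: lmc_send_input is replaced by a recording stub, send_pair and its order argument are hypothetical, the pair D83D/DE00 (U+1F600) is hard-coded as a sample, and neither ordering is confirmed anywhere in this thread to fix the two-character output.

```lua
events = {}

-- Stub: records "scan/flags" instead of calling the real LuaMacros function.
function lmc_send_input(vk, scan, flags)
  events[#events + 1] = string.format("%04X/%d", scan, flags)
end

-- order == "sequential":  down hi, up hi, down lo, up lo (an alternative to try)
-- anything else:          down hi, down lo, up hi, up lo (as in the posted script)
function send_pair(hi, lo, order)
  if order == "sequential" then
    lmc_send_input(0, hi, 4)
    lmc_send_input(0, hi, 6)
    lmc_send_input(0, lo, 4)
    lmc_send_input(0, lo, 6)
  else
    lmc_send_input(0, hi, 4)
    lmc_send_input(0, lo, 4)
    lmc_send_input(0, hi, 6)
    lmc_send_input(0, lo, 6)
  end
end

send_pair(0xD83D, 0xDE00, "interleaved")
print(table.concat(events, " "))  --> D83D/4 DE00/4 D83D/6 DE00/6
```

Swapping the third argument to "sequential" gives the other sequence to compare; removing the stub and running inside LuaMacros would then show whether the target application reacts differently.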