Author Topic: _utfdecode in NetWeb.CLW (Read 4627 times)

bshields · « **on:** January 17, 2023, 11:21:18 PM »

Hi Bruce,

There is a bug in the _utfdecode function. It's in NT10 and also in the latest NT12.

NetWebServer._utfdecode Procedure (String p_text)
ReturnValue  long
  code
  case len(p_text)
  of 1
    returnvalue = band(val(p_text[1]),1111111b)
  of 2
    returnvalue = bshift(band(val(p_text[1]),11111b),6) + band(val(p_text[2]),111111b)
  of 3
    returnvalue = bshift(band(val(p_text[1]),1111b),12) + bshift(band(val(p_text[2]),11111b),6) + band(val(p_text[3]),111111b)
  of 4
    returnvalue = bshift(band(val(p_text[1]),1111b),18) + bshift(band(val(p_text[2]),111111b),12) + bshift(band(val(p_text[3]),111111b),6) + band(val(p_text[4]),111111b)
  end
  return returnValue

should read

NetWebServer._utfdecode Procedure (String p_text)
ReturnValue  long
  code
  case len(p_text)
  of 1
    returnvalue = band(val(p_text[1]),1111111b)
  of 2
    returnvalue = bshift(band(val(p_text[1]),11111b),6) + band(val(p_text[2]),111111b)
  of 3
    returnvalue = bshift(band(val(p_text[1]),1111b),12) + bshift(band(val(p_text[2]),111111b),6) + band(val(p_text[3]),111111b)  ! <--- this line
  of 4
    returnvalue = bshift(band(val(p_text[1]),1111b),18) + bshift(band(val(p_text[2]),111111b),12) + bshift(band(val(p_text[3]),111111b),6) + band(val(p_text[4]),111111b)
  end
  return returnValue

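The second byte of a three-byte UTF-8 sequence should be masked with 111111b, not 11111b. Every continuation byte (10xxxxxx) carries six payload bits, so the five-bit mask drops the top one.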

Also, the first byte of a four-byte UTF-8 sequence should be masked with 111b, not 1111b, but since that byte has the form 11110xxx the 4th bit is always 0, so this doesn't cause an issue.

This caused certain Chinese, Korean and Japanese characters to get mangled.
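
For anyone who wants to check the masks outside Clarion, here is a minimal Python sketch of the same decode logic. It is not NetTalk code: the function name utf_decode and the second_mask_3byte parameter are made up for illustration, and the parameter exists only so the old five-bit mask can be swapped in for comparison. It uses 111b for the four-byte lead byte, as suggested above.

```python
def utf_decode(seq: bytes, second_mask_3byte: int = 0b111111) -> int:
    """Decode one UTF-8 sequence (1 to 4 bytes) into a code point.

    second_mask_3byte is a hypothetical knob used only to reproduce
    the reported bug: pass 0b11111 to get the old behaviour.
    No validation of lead or continuation bytes is done here.
    """
    n = len(seq)
    if n == 1:
        return seq[0] & 0b1111111
    if n == 2:
        return ((seq[0] & 0b11111) << 6) + (seq[1] & 0b111111)
    if n == 3:
        return (((seq[0] & 0b1111) << 12)
                + ((seq[1] & second_mask_3byte) << 6)
                + (seq[2] & 0b111111))
    if n == 4:
        return (((seq[0] & 0b111) << 18)
                + ((seq[1] & 0b111111) << 12)
                + ((seq[2] & 0b111111) << 6)
                + (seq[3] & 0b111111))
    raise ValueError("expected 1 to 4 bytes")


BUGGY_MASK = 0b11111  # the five-bit mask from the original code

affected = "中".encode("utf-8")    # E4 B8 AD: second byte has bit 100000b set
unaffected = "日".encode("utf-8")  # E6 97 A5: second byte has that bit clear

for seq in (affected, unaffected):
    good = utf_decode(seq)
    bad = utf_decode(seq, BUGGY_MASK)
    print(f"{seq.hex(' ')}: fixed=U+{good:04X} buggy=U+{bad:04X}")
```

Run as is, this prints U+4E2D (fixed) against U+462D (buggy) for the first character and U+65E5 for both on the second. So only three-byte characters whose second byte has the 100000b bit set are affected, and they come out exactly 2048 (0x800) too low, which is why some CJK characters were fine and others were not.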

Regards
Bill

Bruce · « **Reply #1 on:** January 18, 2023, 09:03:25 AM »

Noted, thanks Bill.

This looks like old code that needs to be updated to use StringTheory anyway, but I'll deal with that separately.

Cheers
Bruce

NetTalk Central
