Hi Bruce,
There is a bug in the _utfdecode function. It's in NT10 and also in the latest NT12.
NetWebServer._utfdecode Procedure (String p_text)
ReturnValue  long
  code
  case len(p_text)
  of 1
    returnvalue = band(val(p_text[1]),1111111b)
  of 2
    returnvalue = bshift(band(val(p_text[1]),11111b),6) + band(val(p_text[2]),111111b)
  of 3
    returnvalue = bshift(band(val(p_text[1]),1111b),12) + bshift(band(val(p_text[2]),11111b),6) + band(val(p_text[3]),111111b)
  of 4
    returnvalue = bshift(band(val(p_text[1]),1111b),18) + bshift(band(val(p_text[2]),111111b),12) + bshift(band(val(p_text[3]),111111b),6) + band(val(p_text[4]),111111b)
  end
  return returnValue
It should read:
NetWebServer._utfdecode Procedure (String p_text)
ReturnValue  long
  code
  case len(p_text)
  of 1
    returnvalue = band(val(p_text[1]),1111111b)
  of 2
    returnvalue = bshift(band(val(p_text[1]),11111b),6) + band(val(p_text[2]),111111b)
  of 3
    returnvalue = bshift(band(val(p_text[1]),1111b),12) + bshift(band(val(p_text[2]),111111b),6) + band(val(p_text[3]),111111b)  ! <--- this line
  of 4
    returnvalue = bshift(band(val(p_text[1]),1111b),18) + bshift(band(val(p_text[2]),111111b),12) + bshift(band(val(p_text[3]),111111b),6) + band(val(p_text[4]),111111b)
  end
  return returnValue
The second byte of a three-byte UTF-8 sequence should be masked with 111111b, not 11111b. A continuation byte carries 6 payload bits, so the 5-bit mask drops the top one.
Also, the first byte of a four-byte UTF-8 sequence should be masked with 111b, not 1111b, but as the 4th bit of that byte is always 0, this doesn't cause an issue.
This caused certain Chinese, Korean and Japanese characters to get mangled.
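To show why only certain characters are affected, here is a rough sketch of the same decoding logic in Python (not Clarion, just so it is easy to run). The function name utf_decode and the second_mask_3byte parameter are made up for this illustration and are not part of NetTalk. Passing 0b11111 reproduces the buggy mask; the default 0b111111 is the corrected one.

```python
def utf_decode(seq: bytes, second_mask_3byte: int = 0b111111) -> int:
    """Decode one UTF-8 sequence (1 to 4 bytes) into its code point.

    second_mask_3byte is a hypothetical knob for this demo only: it is the
    mask applied to the second byte of a three-byte sequence.
    """
    n = len(seq)
    if n == 1:
        return seq[0] & 0b1111111
    if n == 2:
        return ((seq[0] & 0b11111) << 6) | (seq[1] & 0b111111)
    if n == 3:
        return (((seq[0] & 0b1111) << 12)
                | ((seq[1] & second_mask_3byte) << 6)
                | (seq[2] & 0b111111))
    if n == 4:
        # Lead byte is 11110xxx, so only 3 payload bits are kept here.
        return (((seq[0] & 0b111) << 18)
                | ((seq[1] & 0b111111) << 12)
                | ((seq[2] & 0b111111) << 6)
                | (seq[3] & 0b111111))
    raise ValueError("expected 1 to 4 bytes")


zhong = b"\xe4\xb8\xad"   # U+4E2D, second byte 0xB8 has its 6th payload bit set
hira_a = b"\xe3\x81\x82"  # U+3042, second byte 0x81 does not

print(hex(utf_decode(zhong)))             # corrected mask
print(hex(utf_decode(zhong, 0b11111)))    # buggy mask
print(hex(utf_decode(hira_a, 0b11111)))   # buggy mask, character unaffected
```

With the 5-bit mask, U+4E2D comes out as U+462D (off by 0x800), while U+3042 still decodes correctly. So, as far as I can tell, only three-byte characters whose code point has bit 11 set get mangled.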
Regards
Bill