NetTalk Central

Topic: _utfdecode in NetWeb.CLW

bshields

  • Sr. Member
  • Posts: 392
_utfdecode in NetWeb.CLW
« on: January 17, 2023, 11:21:18 PM »
Hi Bruce,

There is a bug in the _utfdecode function. It's in NT10 and also in the latest NT12.

NetWebServer._utfdecode             Procedure (String p_text)
ReturnValue  long
  code
  case len(p_text)
  of 1
    returnvalue = band(val(p_text[1]),1111111b)
  of 2
    returnvalue = bshift(band(val(p_text[1]),11111b),6) + band(val(p_text[2]),111111b)
  of 3
    returnvalue = bshift(band(val(p_text[1]),1111b),12) + bshift(band(val(p_text[2]),11111b),6) + band(val(p_text[3]),111111b)  ! bug: 11111b keeps only five of the second byte's six payload bits
  of 4
    returnvalue = bshift(band(val(p_text[1]),1111b),18) + bshift(band(val(p_text[2]),111111b),12) + bshift(band(val(p_text[3]),111111b),6) + band(val(p_text[4]),111111b)
  end
  return returnValue


It should read:

NetWebServer._utfdecode             Procedure (String p_text)
ReturnValue  long
  code
  case len(p_text)
  of 1
    returnvalue = band(val(p_text[1]),1111111b)
  of 2
    returnvalue = bshift(band(val(p_text[1]),11111b),6) + band(val(p_text[2]),111111b)
  of 3
    returnvalue = bshift(band(val(p_text[1]),1111b),12) + bshift(band(val(p_text[2]),111111b),6) + band(val(p_text[3]),111111b)  ! <--- this line
  of 4
    returnvalue = bshift(band(val(p_text[1]),1111b),18) + bshift(band(val(p_text[2]),111111b),12) + bshift(band(val(p_text[3]),111111b),6) + band(val(p_text[4]),111111b)
  end
  return returnValue
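For anyone who wants to check the corrected CASE outside Clarion, a rough Python transliteration can be compared against a reference decoder. This is only a sketch, not NetTalk code: `utf_decode` is a hypothetical name, it assumes the input is exactly one well-formed UTF-8 sequence (the Clarion procedure does no validation either), and raising ValueError for other lengths is this sketch's own choice. The fields are combined with `|`, which gives the same result as the Clarion `+` because the shifted fields never overlap.

```python
def utf_decode(seq: bytes) -> int:
    """Decode one UTF-8 sequence (1 to 4 bytes) into a code point.

    Mirrors the corrected Clarion CASE: every continuation byte keeps its
    low six bits (0b111111). No validation is done, as in the original.
    """
    n = len(seq)
    if n == 1:
        return seq[0] & 0b1111111
    if n == 2:
        return ((seq[0] & 0b11111) << 6) | (seq[1] & 0b111111)
    if n == 3:
        return ((seq[0] & 0b1111) << 12) | ((seq[1] & 0b111111) << 6) | (seq[2] & 0b111111)
    if n == 4:
        # Lead-byte mask kept at 0b1111 as in the post; 0b111 gives the same result.
        return (((seq[0] & 0b1111) << 18) | ((seq[1] & 0b111111) << 12)
                | ((seq[2] & 0b111111) << 6) | (seq[3] & 0b111111))
    # Assumption of this sketch: reject anything that is not 1 to 4 bytes.
    raise ValueError(f"expected 1 to 4 bytes, got {n}")


# Cross-check against Python's own UTF-8 encoder for a few characters.
for ch in "Aé中한😀":
    assert utf_decode(ch.encode("utf-8")) == ord(ch)
print(hex(utf_decode("中".encode("utf-8"))))  # 0x4e2d
```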


The second byte of a three-byte UTF-8 sequence should be masked with 111111b, not 11111b.
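Every UTF-8 continuation byte has the form 10xxxxxx and carries six payload bits. Masking the second byte with 11111b throws away the highest of those bits, and after the shift by 6 that bit is worth 0x800 in the decoded value. A small Python sketch makes this concrete for U+4E2D (中), whose UTF-8 bytes are E4 B8 AD; `decode3` is a hypothetical helper that takes the second-byte mask as a parameter so both versions of the line can be compared.

```python
# The three bytes of U+4E2D as UTF-8: b'\xe4\xb8\xad'.
seq = "中".encode("utf-8")


def decode3(seq: bytes, second_mask: int) -> int:
    """Three-byte decode with a configurable mask on the second byte."""
    return ((seq[0] & 0b1111) << 12) | ((seq[1] & second_mask) << 6) | (seq[2] & 0b111111)


fixed = decode3(seq, 0b111111)  # six payload bits, as in the corrected line
buggy = decode3(seq, 0b11111)   # five payload bits, as in the shipped line

print(hex(fixed), hex(buggy))   # 0x4e2d 0x462d
```

The second byte B8 is 10111000; its six payload bits are 111000, but the five-bit mask keeps only 11000, so the result comes out 0x800 too low.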

Also, the first byte of a four-byte UTF-8 sequence should be masked with 111b, not 1111b, but as the 4th bit of that byte is always 0, this doesn't cause an issue.
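The point about the four-byte lead byte can be checked exhaustively, since such a byte always has the form 11110xxx. The Python sketch below (`lead_masks_agree` is a hypothetical helper used only for illustration) masks every byte of that form both ways and compares the results.

```python
def lead_masks_agree() -> bool:
    # Every four-byte lead byte has the form 11110xxx (0xF0-0xF7; only
    # 0xF0-0xF4 appear in valid UTF-8), so bit 3 (value 8) is always 0
    # and masking with 0b1111 keeps exactly the same bits as 0b111.
    return all((lead & 0b1111) == (lead & 0b111) for lead in range(0xF0, 0xF8))


print(lead_masks_agree())  # True
```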

This caused certain Chinese, Korean and Japanese characters to get mangled.
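To see which characters the shipped three-byte branch damages, one can run it over the whole three-byte range U+0800 to U+FFFF and compare with the true code points. In the Python sketch below, `buggy_decode3` is a hypothetical helper reproducing the original line with the 11111b mask. It shows that any code point in that range with the 0x800 bit set comes back exactly 0x800 too low, which is exactly half of the non-surrogate three-byte range. That explains "certain": 中 (U+4E2D) and 가 (U+AC00) are hit, while Hiragana such as あ (U+3042) decodes correctly, and scripts outside CJK in that range (Devanagari, for example) are affected too.

```python
def buggy_decode3(seq: bytes) -> int:
    # Reproduces the shipped three-byte branch: second byte masked with 0b11111.
    return ((seq[0] & 0b1111) << 12) | ((seq[1] & 0b11111) << 6) | (seq[2] & 0b111111)


mangled = []
for cp in range(0x0800, 0x10000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogate code points have no UTF-8 encoding
    if buggy_decode3(chr(cp).encode("utf-8")) != cp:
        mangled.append(cp)

# Every damaged code point has the 0x800 bit set and comes back exactly 0x800 low.
assert all(cp & 0x800 for cp in mangled)
assert all(buggy_decode3(chr(cp).encode("utf-8")) == cp - 0x800 for cp in mangled)

print(len(mangled))  # 30720
print(0x4E2D in mangled, 0xAC00 in mangled, 0x3042 in mangled)  # True True False
```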

Regards
Bill




Bruce

  • Global Moderator
  • Hero Member
  • Posts: 11250
Re: _utfdecode in NetWeb.CLW
« Reply #1 on: January 18, 2023, 09:03:25 AM »
Noted, thanks Bill.

This looks like old code that needs to be updated to use StringTheory anyway, but I'll deal with that separately.

Cheers
Bruce