NetTalk Central

Author Topic: Special INTL CHARS - best way to handle it?  (Read 3506 times)

JohanR

Special INTL CHARS - best way to handle it?
« on: June 27, 2023, 04:38:32 AM »
Hi,


Displays in other software  "description": "Straßberg, Germany"

am receiving it in the string as  "description" : "Stra\u00dfberg, Germany"

So with manual data input, the user would type Strassberg, as the special chars are not available.

Is there a method to translate these?
eg. the \u00df to 'ss'
I could build a small table with these examples and do a search and replace when storing address details, but just wondering if there is a better way?
Not even sure how many other common ones I might encounter?

thanks

Johan

Bruce

Re: Special INTL CHARS - best way to handle it?
« Reply #1 on: June 29, 2023, 09:25:00 PM »
Hi Johan,

Before answering this question I need to know what your "Store Data As" setting, and "charset" settings are set to.
(See Settings / General tab in WebServer extension).

Since your program deals with international names and addresses, it would behoove you to be using Unicode for your data. However since this data overlaps with a desktop program I'm guessing you're storing as ANSI. Which then leads me to my second question;

How are you dealing with this in the desktop program? That will inform how we go about dealing with it here.

Bruce

JohanR

Re: Special INTL CHARS - best way to handle it?
« Reply #2 on: June 30, 2023, 12:04:41 AM »
Hi Bruce

Short answer
In the desktop system we capture 'æ' as ae and "ø" as o.
This might change in the future as we are opening up data capture with an NTWS and possibly other data input streams.



Long answer
Historically we have been dealing with data coming from all sorts of sources, paper, email, external invoices etc,
This would then be captured in CW desktop system.
eg.
J P Smithsvej, Nykøbing Sjælland, Denmark
would normally arrive as
J P Smithsvej, Nykobing Sjaelland, Denmark

We submit customs clearance requests via EDI format to customs in SA,
pretty sure they don't handle these characters, and we would have to print accompanying customs paperwork.

Current project is also to accept online instructions for deliveries and this includes a facility to do Google or Here search for the address.
This has now created the external input without the manual 'conversion' and going forward this will probably become the main source of data.

So first step would be to store the data as we would have in the past.
Convert the 'æ' to ae, store and use as in the past.
Are there existing functions that can do this for the main ones?


Not sure what else I can do, or whether there is any benefit in storing the codes in a side table for lookup when it becomes possible to display them correctly.
In the browser possibly; not even sure if this is doable, as NTWS is also built with Clarion.

One option: create a table to store the correct version alongside the converted one.
fld1 = "Nykobing Sjaelland, Denmark"
as
fld2 = "Nyk\u00f8bing Sj\u00e6lland, Denmark"
So sometime in the future we would still be able to rectify the data, or display and use it correctly when possible.
Seems extreme and very cumbersome.


thanks

Johan

JohanR

Re: Special INTL CHARS - best way to handle it?
« Reply #3 on: July 01, 2023, 03:59:45 AM »
Hi Bruce

This is not a proper solution but a quick fix, and I'm not even sure it is a good quick fix.

Look forward to ideas/solution.
Sure there are better ways to do this, but for current requirements seems to be working.

This code is in the ThisWebClient.PageReceived PROCEDURE

  ThisWebClient.TextOnly()
 
  ThisWebClient.Thispage.replace('\u00e6','ae')
  ThisWebClient.Thispage.replace('\u00f8','o')
  ThisWebClient.Thispage.replace('\u00df','ss')

  ! log any \u00xx escapes still left after the replaces, so they can be added to the replace code
  pos# = 0
  loop 10 times
     pos# = ThisWebClient.Thispage.FindChars('\u00',pos# + 1)   ! search past the previous hit
     if pos# = 0
        break                                                   ! nothing left to report
     end
     ds_OutputDebugString('TVC Google UNICODE not found:' & ThisWebClient.Thispage.Slice(pos#,pos#+5))
  end
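
For comparison, the same replace-and-report idea can be sketched outside Clarion. The Python below is only an illustration, not NetTalk or StringTheory code: the TRANSLIT table and the transliterate function are made-up names, and it assumes the text still contains raw \uXXXX escapes like the JSON shown earlier in the thread. It swaps the characters it knows about and collects the escapes it does not, so the table can be extended as new ones turn up.

```python
import re

# Hypothetical replacement table (an assumption); extend it as new characters appear.
TRANSLIT = {
    "\u00e6": "ae",  # ae ligature
    "\u00f8": "o",   # o with stroke
    "\u00df": "ss",  # sharp s
}

# Matches a literal backslash, "u", and four hex digits.
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def transliterate(raw):
    # Returns (converted_text, unmapped_escapes).
    unmapped = []

    def swap(match):
        ch = chr(int(match.group(1), 16))
        if ch in TRANSLIT:
            return TRANSLIT[ch]
        if ord(ch) < 128:
            return ch  # plain ASCII written as an escape
        unmapped.append(match.group(0))
        return match.group(0)  # leave unknown escapes in place for review

    return ESCAPE.sub(swap, raw), unmapped


text, missing = transliterate(r"Nyk\u00f8bing Sj\u00e6lland, Denmark")
print(text)     # converted address
print(missing)  # escapes that still need a table entry
```

This only looks at the escapes themselves; it does not parse the JSON, so surrogate pairs and escaped backslashes would need extra care.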

  Any feedback most welcome and if anyone has a list of common ones, will be appreciated.

  thanks

  Johan

Bruce

Re: Special INTL CHARS - best way to handle it?
« Reply #4 on: July 04, 2023, 11:41:39 PM »
>> We submit customs clearance requests via EDI format to customs in SA, pretty sure they don't handle these characters,

I think maybe you should verify this.
Specifically find out what formats the EDI (file?) supports. Does it support only ASCII, or does it support UTF-8?

I think this is the key to picking a good solution to your situation.

(and to be fair, even then I'd look at storing unicode, and just converting on Export if necessary.)
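
A rough sketch of the "store Unicode, convert on export" idea, in Python purely for illustration (it is not NetTalk code). The function name and the SPECIAL table are hypothetical, and it assumes the export target accepts only ASCII, which is exactly the thing still to be verified. The stored value stays untouched; only the exported copy is folded down.

```python
import unicodedata

# Characters with no accent decomposition need explicit entries (hypothetical table).
SPECIAL = {
    "\u00df": "ss",  # sharp s
    "\u00e6": "ae",  # ae ligature
    "\u00c6": "AE",
    "\u00f8": "o",   # o with stroke
    "\u00d8": "O",
}


def to_ascii_for_export(text):
    out = []
    for ch in text:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # NFKD splits an accented letter into base letter + combining mark;
        # encoding to ASCII with "ignore" then drops the combining mark.
        base = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
        out.append(base if base else "?")  # "?" marks anything that could not be folded
    return "".join(out)


stored = "Nyk\u00f8bing Sj\u00e6lland, Denmark"  # kept as Unicode in the database
print(to_ascii_for_export(stored))               # ASCII-only copy for the export file
```

The point of doing it this way round is that the conversion is lossy, so it is applied as late as possible and the original spelling remains available for screens, reports and PDFs that can show it.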

Cheers
Bruce

JohanR

Re: Special INTL CHARS - best way to handle it?
« Reply #5 on: July 06, 2023, 06:03:24 AM »
Hi Bruce

Best I can find in the Customs EDI manual.

A data element is the smallest unit of information in a segment. Two or more data elements may be grouped together to form a composite data element. Like segments, data elements can have a status of either mandatory or conditional. The type of a data element is represented by either an "a" (alpha characters only), "an" (alphanumeric) or "n" (numeric). The number that follows the type represents the maximum number of characters allowed and the ".." between the type and size means that the data element is of variable length. Should the ".." (dot) not be present, it is a fixed length data element. Examples are "an..35", "n..18", "a1" etc.
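
The notation in that paragraph can be read mechanically. Here is a minimal Python sketch (illustrative only; parse_spec and fits_length are made-up names) that splits a spec such as "an..35" into its type, whether it is variable length, and its size, and then checks a value's length against it. It deliberately does not check which characters are allowed, since the manual extract does not say, and it assumes the size is counted in characters rather than bytes.

```python
import re

# type ("a", "an" or "n"), optional ".." for variable length, then the size
SPEC = re.compile(r"^(an|a|n)(\.\.)?(\d+)$")


def parse_spec(spec):
    match = SPEC.match(spec)
    if match is None:
        raise ValueError("unrecognised data element spec: %r" % spec)
    kind, dots, size = match.groups()
    return kind, dots is not None, int(size)


def fits_length(spec, value):
    _kind, variable, size = parse_spec(spec)
    if variable:
        return len(value) <= size  # ".." present: size is a maximum
    return len(value) == size      # no "..": fixed length


print(parse_spec("an..35"))
print(fits_length("an..35", "Strassberg, Germany"))
print(fits_length("a1", "AB"))
```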

Will try to send a query to one of the SARS tech guys.

However, my current solution is working, albeit kludgy and lame.

But it would be interesting to pursue the storing of the characters and then translate back when creating output, either exchanging electronic data or reports.
eg. Is there a possibility to print the correct character when creating a PDF doc?


thanks

Johan