NetTalk Central

Author Topic: jFiles : improving parsing speed?  (Read 5262 times)

AtoB

  • Jr. Member
  • **
  • Posts: 74
    • View Profile
    • Email
jFiles : improving parsing speed?
« on: September 14, 2016, 12:22:18 AM »
Hi Bruce,

I'm using jFiles/NetTalk web services quite intensively and am now looking at speed. I have to deal with tremendous numbers of rather large requests, and I have optimised my application and backend interaction as much as possible.

As jFiles ships with source I can look for improvements there too (and you've probably guessed it :) so I did... I've attached my modified 1.26 version; I hope you will consider implementing these adjustments and the suggestions below.

Some requests and/or suggestions (not applied in the attached source <g>):

1 - something I haven't tried, but only thought of: would it be a good idea to create one top-level object as a container for most of the jsonclass properties in one place? Each created jsonclass object would then need a reference to this top-level object, but this would probably create less overhead than all the cloning and cascading (up/down) of the properties for each object. The objects themselves can be smaller, so constructing them runs faster. Again, I don't grasp the concept of jFiles completely and I'm only thinking out loud :-)

2 - the general loading/parsing of the JSON file contains quite a lot of calls to the .HandleChar procedure, and since version 1.25 (?) you've added a third parameter, pPlace, that is filled with a call to substr() for each call. I haven't changed these in the attached version, but stripping them out gives the LoadString method another 4 to 5 percent speed improvement. Maybe pass the pToParse string by reference and only substr() it when an error occurs (which should rarely be the case...)?
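A rough sketch of what I mean (the signature and the Error flag here are just illustrative, not the actual jFiles declarations):

```clarion
! Instead of building the pPlace string on every call, e.g.
!   SELF.HandleChar(c, x, SUB(pToParse, x, 20))
! pass the parse position plus the whole string by reference,
! and only take the substring on the (rare) error path:
JSONClass.HandleChar  PROCEDURE(STRING pChar, LONG pPos, *STRING pToParse)
  CODE
  ! ... normal per-character parsing, no string copies made here ...
  IF SELF.Error
    ! pay for the substring only when something actually went wrong
    SELF.Trace('Error near: ' & SUB(pToParse, pPos, 20))
  END
```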

3 - would you consider removing the calls to SELF.Trace (or making them conditional on a configurable object property, with the downside of cascading...) in at least the .FillStructure method and the .GetObject method (when no object is found)? I prefer to ship all my code with the debug option on, so these calls are effectively made in the production system and do slow the system down (on the 5000-record example this saves me 0.3 secs per request).


Now for the applied improvement (see the attached modified jFiles.clw/inc):

When loading JSON into a queue I noticed the field labels are processed for each queue record/JSON object. I've created a "preprocessor" for this, with little or no overhead, that reduces the total processing time for 5000 records from 2.24 secs to 1.95 secs (this is total "service time", so it includes API validation, fetching the 5000 records from the backend, producing a 5000-record result queue, and turning that into JSON too).

The following needs to be changed:

- in jFiles.inc: add the ColNameQType declaration
- in jFiles.inc: modify the prototype of the first FillStructure method to: FillStructure procedure(*GROUP pGroup, <ColNameQType pColNameQ>),Long,Proc,VIRTUAL

- in jFiles.clw: in the FillStructure method (first occurrence) the following is added (see around line 1539):

      if ~OMITTED(pColNameQ)
        GET(pColNameQ, c)
        nIndex = SELF.Position(pColNameQ.label)
!?       self.trace(all(' ',indent) & 'Trying to fill ' & clip(pColNameQ.label) & ' nIndex = ' & nIndex  )               
      else
        PropertyName.SetValue(WHO(pGroup,c))
        self.AdjustFieldName(PropertyName,self.TagCase)
        nIndex = SELF.Position(PropertyName.GetValue())
!?       self.trace(all(' ',indent) & 'Trying to fill ' & clip(PropertyName.GetValue()) & ' nIndex = ' & nIndex  )
      end

- in jFiles.clw the method "JSONClass.Load Procedure(QUEUE pQueue)" is modified: this does the actual preprocessing and passes the local queue to the .FillStructure method.
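To give an idea of the preprocessing (a simplified sketch; the attached source contains the real declarations):

```clarion
ColNameQType    QUEUE,TYPE
label             STRING(252)          ! the adjusted json tag for one queue field
                END

! In JSONClass.Load(QUEUE pQueue), before looping over the json
! objects, resolve every field label exactly once:
x               LONG
PropertyName    StringTheory
ColNameQ        ColNameQType
  CODE
  LOOP x = 1 TO 999
    IF WHO(pQueue, x) = '' THEN BREAK.            ! past the last queue field
    PropertyName.SetValue(WHO(pQueue, x))
    SELF.AdjustFieldName(PropertyName, SELF.TagCase)
    ColNameQ.label = PropertyName.GetValue()
    ADD(ColNameQ)                                 ! one entry per queue field
  END
  ! ColNameQ is then passed to FillStructure for every record, so the
  ! WHO()/AdjustFieldName work is no longer done once per record.
```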


There is also another FillStructure method that's called with a queue as a parameter (the second occurrence), which I haven't touched (as I don't use it), but it might also benefit from this!


I hope you have time to implement at least some of this. Let me know if I can be of any help; I'm willing to invest some time in it too!

regards,
Ton

Bruce

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 11250
    • View Profile
Re: jFiles : improving parsing speed?
« Reply #1 on: September 14, 2016, 11:32:25 PM »
Hi Ton,

Thanks for all the suggestions.

For the biggest speed improvement in a web-server context (i.e. a multi-threaded context), apply FastMem to the app:
http://www.capesoft.com/accessories/fastmemsp.htm
The biggest gain (by far) with this tool is in the case where objects are being constructed on multiple threads - which turns out to be _exactly_ the situation WebServiceMethods end up in. So if you are building a JSON-based web-services server, using FastMem is very much encouraged.

That said, that applies to multiple requests happening at the same time - it doesn't apply as much to a single request, which I guess is what you are testing with at the moment. I am all for speeding up jFiles in any way possible. As with StringTheory, we first make it right, then optimize as we go along to make it faster.

>> 1. something I haven't tried, but only thought of: is it a good idea to create one toplevel object as a container for most of jsonclass properties in one one place? Then each created jsonclass object will need a reference to this toplevel object. But this would probably create less overhead than all the cloning and cascading (up/down) of the properties for each object. The objects themselves can be smaller, so construction them runs faster. Again I don't grasp the concept of jFiles completely and I'm only thinking out loud :-)

In hindsight, and understanding more about how classes are used, it might have been a good idea to split the class in a "properties" part and a "node" part. At this point however it would be difficult to do that, and would make backwards compatibility a real problem. So while it might result in speed increases, I'm not sure it's a practical choice at this time.

>> 2 - the general loading/parsing of the json file contains quite a lot of calls to the .HandleChar procedure, but since version 1.25 (?) you've added a third parameter pPlace that is filled with a call to substr() for each call. I haven't changed these the attached version, but stripping them out improves the call to the LoadString method with another 4 to 5 percent speed improvement. Maybe passing the pToParse String by reference and only substr it when an error occurs (which should be rarely the case ...)?

Yes, certainly worth looking closer at this.

>> 3 - would you consider removing (or making it a configurable object property (with the downside of cascading ...)?) the calls to SELF.Trace in at least both the .FillStructure method and the .GetObject (when no object is found) method?. I prefer to ship all my code with debug option on, so these calls are effectively made in the production system and do slowdown the system (on the 5000 record example this saves me .3 secs per request)

The GetObject one only happens on a failure, so it's probably not significant. The one in FillStructure is enormously useful when writing the app, but once the app is written perhaps there's a way to turn it off. I'll look into that.
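Something along these lines might work (just a sketch; Logging here is a hypothetical property, not something currently in jFiles):

```clarion
! In jFiles.inc, a switch on the class:
!   Logging    LONG    ! 1 = trace output on (default), 0 = off

JSONClass.Trace  PROCEDURE(STRING pMessage)
  CODE
  IF SELF.Logging = 0 THEN RETURN.   ! production: skip the debug output entirely
  ! ... existing debug-output code ...
```

That way shipped apps could leave the debug build on but set Logging to 0 at runtime.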

>> when loading json into a queue I noticed the fieldlabels are processed for each queue record/json object. I've created a "preprocessor" for this, with little or no overhead that reduces a total processing time

Good catch. I suspect there may be other code that could be hoisted out of loops in this way.

Incidentally, are you using Profiler to profile the code at all?
http://www.capesoft.com/accessories/Profilersp.htm
It's often possible to improve things by looking at code and making some guesses, but the biggest gains usually come from profiling the code under a specific load or task and seeing what happens there.
If you are not profiling, then please feel free to send me a sample JSON file and the associated data structures (queues etc).
I can then build a test app to load and save those structures, profile it, and see where real savings might be made.

jFiles is fairly new, and I've certainly not started optimizing it yet, so I suspect there are several gains that can yet be made.

cheers
Bruce


Bruce

Re: jFiles : improving parsing speed?
« Reply #2 on: September 16, 2016, 03:32:20 AM »
Hi Ton,

I've made a few optimizations to jFiles, and by extension in StringTheory as well.
Grab build 2.50 of StringTheory and 1.27 of jFiles.

Your structure-name idea is one of the optimizations, although I used a somewhat different approach than you did. There are also a variety of other optimizations, some of which have a large effect and some a very small one.

Specifically in the area of loading and saving a queue I'm seeing it about 25% faster on a load and about 15% faster on a save.

There are probably more optimizations that can be made, so suggestions are always welcome.

Cheers
Bruce

AtoB

Re: jFiles : improving parsing speed?
« Reply #3 on: September 16, 2016, 04:29:56 AM »
Hi Bruce,

If my wife lets me, I'll try the new versions this weekend; otherwise it will be early next week :-)

Thanks for the great support; I'll let you know the results.

If you read this in time: have a nice weekend!

regards
Ton

AtoB

Re: jFiles : improving parsing speed?
« Reply #4 on: September 18, 2016, 03:18:00 AM »
Hi Bruce,

I just ran my first test with the updated jFiles and StringTheory.

Last week my method with "my" (optimised) version of jFiles ran a 5000-record JSON file in approx. 1.63 secs.

Then I applied the most recent jFiles and StringTheory and did a couple of runs: 1.66 secs. So roughly the same.

But!

Now I simply commented out all the self.Trace calls in the .FillStructure method (jFiles) and did another set of runs with the same data: 1.43 secs!!!

So effectively you did a better job than I did (I suspect the StringTheory .Cat method improves things greatly, which I didn't have in last week's version either).

Some questions/remarks:

1 - do you also see such a remarkable speed improvement when commenting out the self.Trace calls (on a somewhat larger input set)?

- I did find out why I had so many calls to the self.Trace method via the .GetObject method:
I was calling

jsonParameter &= SELF.rJSON.GetByName('apiVersion')

and when a property is not present it processes all objects (remarkably fast b.t.w.), but calls the self.Trace method for each data element... Now I'm issuing a

jsonParameter &= SELF.rJSON.GetByName('apiVersion',1)

(note the second parameter...) and it only searches the top-level objects for this property.

2 - a suggestion related to the above: would it be possible to simply get a couple of (simple) element values from the JSON prior to completely parsing the file? Sometimes I cannot control the incoming JSON and need some "strategic" info from the file that helps me decide how to process the thing (and whether I should process it at all). Currently I've created a procedure that seeks the first occurrence of a label and reads after that keyword until it decides it has the corresponding value, but it's not a clean implementation by any means. It would be much better if this were included in jFiles :-)

I've only tested parsing and haven't tried FastMem with the web service yet. I first have some internal optimisations to look into; afterwards I'll see what effect FastMem has on the whole thing. I'll keep you posted.

Thanks so far, great improvements!

(Less than two weeks ago this set took 2.24 secs to run, and now 1.43. If we can keep up this pace, we'll be under 0.50 seconds by the end of the year :-) )

regards
Ton




Bruce

Re: jFiles : improving parsing speed?
« Reply #5 on: September 18, 2016, 11:29:23 PM »
Hi Ton,

>> So effectively you did a better job than I did (I suspect the stringtheory .cat method improves things greatly, which I didn't have in last weeks version either).

there were a bunch of optimizations in both the load and save, which all add together. Cat was part of it although that improves the Save only, not the load. Optimising JsonEncode and JsonDecode in StringTheory also plays a big part. Optimising the "place" parameter (as you suggested) also made a big impact on .Load.

Bear in mind that it's all fairly "structure dependent". I tested with a mixed structure, strings and numbers, with about 15 fields. If you process other structures you'll get various different results.

>> 1 - do you also see such a remarkably speed improvement when commenting out the self.Trace calls (on a somewhat larger input set)?

probably not as much (percentage wise.)

>> 2 - suggestion related to the above: is it possible to simply get a couple of (simple) element values from the json prior to the complete parsing of the file?

If you need to "test" the string to see if you want to parse it, or to see if it contains something specific, then use the StringTheory Between method (or Instring). It's not possible to parse only "some" of the file; I don't think that would make much sense.
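For example (a sketch only; adjust the delimiters to the actual shape of your JSON, and note that quoting/whitespace handling here is deliberately naive):

```clarion
json    StringTheory
apiVer  StringTheory
  CODE
  json.LoadFile('request.json')        ! or json.SetValue(pIncomingJson)
  ! grab the raw text between the label and the next comma,
  ! without parsing the rest of the document
  apiVer.SetValue(json.Between('"apiVersion":', ','))
  apiVer.Remove('"')                   ! strip the quotes
  apiVer.Trim()                        ! and any surrounding whitespace
```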

cheers
Bruce




AtoB

Re: jFiles : improving parsing speed?
« Reply #6 on: September 19, 2016, 12:35:53 PM »
Hi Bruce,

For the time being (hoping for a jFiles adjustment in the end, of course :-)) I'll comment out the FillStructure .Trace call. (B.t.w. I think the StringTheory .Cat method is also called quite often when loading/parsing JSON, see .HandleChar...) But really, don't make it a priority; I'm more than happy with the improvements made so far.

regards,
Ton