Hi Ton,
I spent some time on this over the weekend. The results were most interesting.
I used the WebStrain example (examples\NetTalk\WebClient\WebStrain) as the test client. This lets you repeat a test consistently, across multiple threads, to see what's happening. Obviously the tests are very server-dependent, so the numbers are not comparable from one machine to another, but it's useful for testing different scenarios.
Basically the major time consumption is in three places:
a) loading the XML file from the disk, and parsing it into a queue. The file is large (around 2 megs) and contains about 42000 records, so this takes a bit of time to do. On my machine this takes roughly 1.8 seconds.
b) converting the queue to JSON objects - about 1.2 seconds (*)
c) serializing the JSON objects to a string for saving out to disk - about 0.3 seconds.
Total time is about 3.4 seconds for a single request.
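For anyone wanting to take the same kind of per-phase measurement, here is a rough sketch of the idea in Python. It is not the Clarion code from the test, and it does not use xFiles or jFiles. The record layout, the `phase` helper and the `make_sample_xml` generator are all made up for illustration, so the timings it prints will not match the figures above.

```python
import json
import time
import xml.etree.ElementTree as ET
from contextlib import contextmanager

timings = {}  # phase name -> elapsed seconds


@contextmanager
def phase(name):
    """Record the wall-clock duration of one phase under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def make_sample_xml(count):
    """Build a stand-in XML document (hypothetical layout, not the real test file)."""
    rows = "".join(
        f"<record><id>{i}</id><name>item{i}</name></record>" for i in range(count)
    )
    return f"<records>{rows}</records>"


def run_request(xml_text):
    # (a) parse the XML into a flat list, standing in for the queue
    with phase("parse_xml"):
        root = ET.fromstring(xml_text)
        queue = [(rec.findtext("id"), rec.findtext("name")) for rec in root]
    # (b) build one object per record, standing in for the JSON node tree
    with phase("build_tree"):
        tree = [{"id": int(rec_id), "name": name} for rec_id, name in queue]
    # (c) serialize the objects to a single string
    with phase("serialize"):
        text = json.dumps(tree)
    return text


output = run_request(make_sample_xml(42000))
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s")
print(f"total: {sum(timings.values()):.3f}s")
```

Timing each phase separately, rather than only the whole request, is what makes it possible to see later which phase stops scaling when several requests run at once.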
(*) This is after the optimizations I made. This step was originally a bit slower, so I made some small tweaks to the class, and to the StringTheory class, to speed it up a bit.
Memory usage was quite high - you're making 42000 instances of the object (in JsonClass each node is a separate instance). Again I made a number of tweaks to both StringTheory and jFiles to reduce memory usage. The test was using about 270 megs - this is now down to 60 megs. (Which still sounds like a lot, but the object does not last long.)
Incidentally, because of the jFiles approach to JSON (where the nodes are parsed and stored as a collection of classes), the speed of processing and the memory used are very different from xFiles. xFiles is built for raw speed, which makes it very fast (and it also uses very little RAM), but jFiles is MUCH easier to work with when complex structures are encountered. It's an interesting trade-off, and it's interesting to see it contrasted so well here.
Then the really interesting stuff started to happen. I set WebStrain to use 4 threads, and make 4 requests simultaneously. Given that a single request takes 3.4 seconds, I would have expected the test to take about 13.6 seconds (4 * 3.4) at most, and in fact (knowing it can leverage multiple cores) perhaps a bit less than that.
Interestingly it didn't do this. Processing the incoming XML multi-tasked well (it took 3.8 seconds instead of 1.8, but it did 4 files, not 1). The final step of converting the JSON tree to a serialized string also multi-tasked well. But the middle step of constructing the tree took about 52 seconds (to do all 4).
Clearly allowing the OS to switch the threads while constructing the tree is massively inefficient. I don't fully understand why, but I'm guessing it has to do with the allocation of RAM. (This isn't simply a contention issue, because then it would just take 4 * 3.4 seconds at worst.)
I then added a critical section around the construction of the tree - in other words forcing one to complete before the next one could begin. I only wrapped the tree-construction phase leaving the other two phases to multi-task as before.
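The shape of that change can be sketched in Python as below. This is only an illustration of the pattern, not the jFiles code: a `threading.Lock` stands in for the critical section, and `build_tree`, `handle_request` and the counters are hypothetical names. Only the tree-construction phase is inside the lock, and the other two phases run freely on each thread. Note that CPython's global interpreter lock means this sketch will not reproduce the slowdown or the timings described here, it just shows which part is guarded.

```python
import json
import threading

tree_lock = threading.Lock()   # plays the role of the critical section
stats_lock = threading.Lock()  # protects the bookkeeping below
active_builders = 0            # threads currently inside build_tree
max_active_builders = 0        # highest value active_builders ever reached
results = {}                   # request id -> serialized output


def build_tree(queue):
    """Phase (b): allocate one small object per record."""
    return [{"id": rec_id, "name": name} for rec_id, name in queue]


def handle_request(request_id, record_count):
    global active_builders, max_active_builders

    # Phase (a) stand-in: runs unlocked, so threads can overlap here.
    queue = [(i, f"item{i}") for i in range(record_count)]

    # Phase (b): only one thread at a time may construct its tree.
    with tree_lock:
        with stats_lock:
            active_builders += 1
            max_active_builders = max(max_active_builders, active_builders)
        tree = build_tree(queue)
        with stats_lock:
            active_builders -= 1

    # Phase (c): runs unlocked again.
    text = json.dumps(tree)
    with stats_lock:
        results[request_id] = text


threads = [
    threading.Thread(target=handle_request, args=(n, 10000)) for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), max_active_builders)
```

All 4 requests complete, and `max_active_builders` stays at 1, which confirms that no two threads were ever building a tree at the same moment. Keeping the lock as narrow as possible is the point of the design: the phases that already multi-task well are left alone.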
Now the test (running 4 requests on 4 threads at the same time) takes 9 seconds to complete. Which is more in line with what one might expect. 
The solution definitely seems to be serializing the creation of the tree (in other words, letting only one thread build a tree at a time), but at this point I'm not 100% sure if I want to do that in jFiles itself, or in the NetTalk Method class. I'll be checking into that as the week progresses.
Update: The critical section has been built into jFiles itself and released in build 1.17, so please download that and try your test again.
Thanks for the report though, and the example, it's been fascinating to work through this.
The StringTheory and jFiles improvements for memory usage are on the web site and available for download.
cheers
Bruce