Patch for ticket 2596 (Constant crashes under high load/many concurrent requests)

Jackie Ng
Hi All,

I've attached a patch for ticket 2596

https://trac.osgeo.org/mapguide/ticket/2596

This patch should fix the instability in mgserver under high load.

From the ticket description, mgserver crashes under high load from many concurrent QUERYMAPFEATURES requests to an Oracle layer (with the King Oracle provider).

I was able to reproduce this against an Oracle XE 11g instance with a MapGuide Server on Windows using an Apache/Tomcat web tier configuration. The setup had one layer pointing to an Oracle table, and I hammered the containing map with 200 concurrent sample QUERYMAPFEATURES requests using ApacheBench.

Intense debugging revealed corrupted STL strings being passed to FDO feature queries. Under the VS debugger these manifest as access violations, and they produce log entries similar to those in the ticket description (the King Oracle provider fails to find a class definition because a garbage class name string is being passed to it). Other errors being logged included std::length_error exceptions.

Having been reminded of this thread from many years back (http://osgeo-org.1560.x6.nabble.com/std-string-not-thread-safe-on-Linux-td4210940.html), I scanned the MapGuide codebase for cases where STL strings were being assigned to class members as-is instead of assigning their c_str() result. It turns out the whole MdfModel project is full of classes where this was the case (raw STL string assignment to class members).

This patch modifies all affected MdfModel classes so that any STL string input is assigned to a class member via its c_str() result.
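To illustrate the kind of change described above, here is a minimal sketch. The class and member names are hypothetical and are not taken from the patch; the assumption is that the affected members are std::wstring. Under a reference-counted (copy-on-write) string implementation, plain assignment may share the caller's buffer, whereas assigning through c_str() builds the member from the raw characters so that it owns its own buffer.

```cpp
#include <string>

// Hypothetical stand-in for an MdfModel-style class (not from the patch).
class LayerDefinitionSketch
{
public:
    // Plain assignment: with a copy-on-write std::wstring this may share
    // the caller's underlying buffer and its reference count.
    void SetFeatureNameShared(const std::wstring& name)
    {
        m_featureName = name;
    }

    // Assignment via c_str(): the member is rebuilt from the raw characters,
    // so it never shares a buffer with the caller's string.
    // Caveat: anything after an embedded null character is dropped.
    void SetFeatureNameCopied(const std::wstring& name)
    {
        m_featureName = name.c_str();
    }

    const std::wstring& GetFeatureName() const
    {
        return m_featureName;
    }

private:
    std::wstring m_featureName;
};
```

Standard libraries that follow C++11 no longer use copy-on-write strings, so on such toolchains both setters already produce independent buffers and the c_str() form mainly costs an extra length scan.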

The patch also modifies MgUtil so that some previously static const STL string members are replaced with raw #define'd string literals. I suspected ParseQualifiedClassName()/FormatQualifiedClassName() would be called from multiple threads, and I didn't want to risk the possibility of STL string corruption from multiple threads trying to concatenate from the same static STL string member.
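A rough sketch of that idea follows. The macro name, the function names and their signatures are hypothetical (the real MgUtil declarations are not shown in this post), and the ":" separator is assumed from the usual Schema:Class form of qualified class names. The point is only that the separator is a literal in static storage, so no shared std::wstring object is touched when several threads format or parse names at once.

```cpp
#include <string>

// Hypothetical macro name; stands in for a former static const string member.
#define QUALIFIED_CLASS_NAME_SEPARATOR L":"

// Builds "Schema:Class", or just "Class" when no schema name is given.
std::wstring FormatQualifiedClassNameSketch(const std::wstring& schemaName,
                                            const std::wstring& className)
{
    if (schemaName.empty())
        return std::wstring(className.c_str());

    std::wstring result(schemaName.c_str());
    result += QUALIFIED_CLASS_NAME_SEPARATOR; // appended from the literal
    result += className.c_str();
    return result;
}

// Splits "Schema:Class" at the first separator. If there is no separator,
// the schema name is left empty and the whole input is the class name.
void ParseQualifiedClassNameSketch(const std::wstring& qualifiedName,
                                   std::wstring& schemaName,
                                   std::wstring& className)
{
    const std::wstring::size_type pos =
        qualifiedName.find(QUALIFIED_CLASS_NAME_SEPARATOR);

    if (pos == std::wstring::npos)
    {
        schemaName.clear();
        className = qualifiedName.c_str();
    }
    else
    {
        schemaName = qualifiedName.substr(0, pos);
        // "+ 1" assumes the separator is a single character.
        className = qualifiedName.substr(pos + 1);
    }
}
```

Every string in these functions is either a parameter or a local, so the only data shared between callers is the read-only literal.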

I have yet to see mgserver crash under the same load tests after this patch was applied.

Please review. Thanks.

- Jackie

Re: Patch for ticket 2596 (Constant crashes under high load/many concurrent requests)

Gabriele Monfardini
Hi Jackie,

I want to share an error that we're seeing under load that may be
related to the STL string mess you're trying to fix.
We're using OGRProvider with a recompiled libgdal and PostgreSQL.

Error: A length exception occurred.
       basic_string::_S_create
StackTrace:
 - MgOperationThread.ProcessOperation() line 437 file OperationThread.cpp

This error seems to crash an Apache child process, leading to sporadic
request failures but without affecting server stability.
I haven't opened a bug for this since it is difficult to reproduce and the
stack trace is too terse to be useful.
I hope your patch also helps with this kind of error, which is quite
difficult to debug.
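For context on the "length exception" above: std::length_error is what the standard library throws when a string is asked to hold more than max_size() characters, which is one way a garbage length value can surface. The text basic_string::_S_create looks like the message GCC's libstdc++ attaches to that exception in its older reference-counted string implementation. The following is a minimal sketch of the mechanism only, with a hypothetical helper name; it does not reproduce the MapGuide failure.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if asking a std::string for the given
// capacity is rejected with std::length_error.
bool ThrowsLengthError(std::size_t requestedSize)
{
    try
    {
        std::string s;
        s.reserve(requestedSize); // throws if requestedSize > s.max_size()
    }
    catch (const std::length_error&)
    {
        return true;
    }
    return false;
}
```

Passing std::string::npos as the requested size exceeds max_size() on common implementations and takes the catch branch, while a small size such as 16 does not.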

Best regards,

Gabriele Monfardini


On Wed, Jun 22, 2016 at 2:30 PM, Jackie Ng <[hidden email]> wrote:

> [...]

Re: Patch for ticket 2596 (Constant crashes under high load/many concurrent requests)

Jackie Ng
In reply to this post by Jackie Ng
Disregard this patch. I was able to bring down my mgserver with "heavier" QUERYMAPFEATURES requests.

- Jackie