Proxy Sync call stuck waiting on semaphore.
|
12-07-2012, 05:00 AM
Post: #1
|
|||
|
|||
Proxy Sync call stuck waiting on semaphore.
I am making a Sync call from a generated proxy. The call follows two other Sync calls which always complete successfully. If I switch the order of calls, it seems that the third one always gets stuck waiting for semaphore. The call trace looks like this:
Quote:#0 0x312ca068 in __psynch_cvwait () The following log with LogVerbose enabled on the Invoke method suggests that the action's response is successfully received from by my Upnp Control Point: Quote:Socket::Send, sending When the execution hangs, waiting on the semaphore, if I pause execution in the debugger and resume, the Sync call completes. Any ideas why? Is there another trace log I can enable to find out more? Is there something peculiar about the way semaphores work on iOS? Surely I can make consecutive Sync calls through different Proxies? |
|||
12-07-2012, 08:11 AM
(This post was last modified: 12-07-2012 08:14 AM by simonc.)
Post: #2
|
|||
|
|||
RE: Proxy Sync call stuck waiting on semaphore.
You can make consecutive (or overlapping) calls through any number of different Proxies. I'd guess that you're seeing a bug in the iOS simulator but its possible this is an ohNet problem too...
To start narrowing the cause down, can you add some logging please? In InvocationUpnp::ReadResponse(), after the initialisation of actionName, add the lines Code: Log::Print("Response to "); Rebuild with DEFINE_TRACE defined and re-run your tests. You should see a logging statement for each action invocation that receives a full response. It'd be interesting to know if you see a statement for the apparantly stalled action. If you don't immediately see a statement, do you get it after you pause/resume the debugger? (This tests out the reliability of networking support in the iOS simulator. If you see logging for all actions while your code is still blocked, networking is behaving correctly. If you don't see logging for the third action until you pause/resume the simulator, its likely this is a problem with the simulator.) |
|||
12-07-2012, 10:10 AM
Post: #3
|
|||
|
|||
RE: Proxy Sync call stuck waiting on semaphore.
Code: Log::Print("Response to "); I have worked around the problem with third call, by using an evented property. However, I am seeing a similar trace to the semaphore wait when invoking a different action. This time it causes a crash (bad memory access) with the following complete call trace for that thread: Quote:#0 0x0006c98a in OpenHome::Swx::Write(OpenHome::Brx const&) at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Stream.cpp:188 I am able to reproduce this crash at will. In this case, the Log::Print(actionName) does not appear. Quote:(This tests out the reliability of networking support in the iOS simulator. If you see logging for all actions while your code is still blocked, networking is behaving correctly. If you don't see logging for the third action until you pause/resume the simulator, its likely this is a problem with the simulator.)I am running on the iPhone device (arm7), not the iOS simulator. The OS should behave the same, but of course the underlying hardware for the Simulator is the Mac, it's Intel CPU, and multiple network interfaces.[/quote]. I hope that the area of the crash will shed some light on the endless semaphore wait.?? |
|||
12-07-2012, 04:28 PM
Post: #4
|
|||
|
|||
RE: Proxy Sync call stuck waiting on semaphore.
This could either be a bug in ohNet or client code. I'd tend to guess its in client code at this stage; you'd probably disagree :-).
I think I need to investigate this. Can you create a simple test case, something I can compile and run myself, that demonstrates a hang or crash? Or, failing that, can you send me the proxies you're using? |
|||
17-07-2012, 08:09 AM
Post: #5
|
|||
|
|||
RE: Proxy Sync call stuck waiting on semaphore.
Hi Simon,
Apologies for the delay in my response. I have been having great difficulty creating a test case which reliably reproduces a crash or hang. I do actually have a test case which can trigger the hang at least once (sometimes more often). Unfortunately, I have not managed to simplify it down to something you could run at your end. Besides, I think it would be difficult to replicate the environment I have here (i.e. the STB running the Intel Upnp Device Microstack). Today I have been trying a more statistical analysis of the issues since there seems to be multiple issues here and the call trace to a crash is not always the same. I am now able to ask some more questions instead of being completely lost... When the hang occurs after invoking an action (any action as far as I can see), I pause execution and... 1. OsNetworkReceiveFrom is running as the entry point to two separate threads. Each one is waiting at the select function in Os.c : 741. Code: int32_t selectErr = TEMP_FAILURE_RETRY(select(nfds(handle), &read, NULL, &error, NULL)); In this function, does aBytes represent the size of the given buffer? In one thread the value is 256 and the other it is 4294967295 (0xFFFFFFFF). That seems like a large size for a buffer!? Also (manually) tracing OsNetworkReceiveFrom back up the call chain, I get to Socket::Receive, which has two different variants. How do I know which variant was called? 2. On rarer occasions, I get the following error log: Quote:------------------------------ and on another occasion: Quote:------------------------------ I should point out, that the response time to an action from the Upnp Device is not consistent and on occasions I have seen it take 55 seconds before responding to an action invocation!! When this scenario does occur though, I would expect a Sync call from the Control Point (CP) side to either wait until it finally gets the response or time out. When it hangs though, the CP has definitely received a valid Upnp response, as indicated by the prints you recommended. When the response is received, Pause-Resume execution will always complete the Upnp transaction. I wonder if Pause-Resume has some affect on thread priorities, overcoming the apparent deadlock? I notice there are log messages carefully placed throughout the OhNet source. Can you advise how to enable/disable the different log levels in general? Can you advise on any particular logs I should enable? Are there any tests I could run to run through the shell environment to help diagnose the issue(s)? I appreciate your support, Doug |
|||
17-07-2012, 01:13 PM
(This post was last modified: 17-07-2012 01:14 PM by simonc.)
Post: #6
|
|||
|
|||
RE: Proxy Sync call stuck waiting on semaphore.
The various problems sound very much like memory corruption.
Tracking this down by logging on iOS is possible but will likely be slow. Would it be possible for you to create a simple control point program that runs on Windows or Linux? If you can get something running on Linux, try using valgrind to see if that gives you any hints about the source of the memory overwrite. I'd also be interested to see your control point source to see if I can spot any problems (either current or Windows/Linux would be fine). In case none of that helps, answers to your questions follow: (17-07-2012 08:09 AM)dough Wrote: When the hang occurs after invoking an action (any action as far as I can see), I pause execution and...0xFFFFFFFF is an invalid size for a buffer so probably indicates either memory over-writes or use of freed memory. The 256 byte buffer is also suspect - that seems awfully small so may well also be invalid. (17-07-2012 08:09 AM)dough Wrote: Also (manually) tracing OsNetworkReceiveFrom back up the call chain, I get to Socket::Receive, which has two different variants. How do I know which variant was called?I can only see one function - Socket::ReceiveFrom - which ends up calling OsNetworkReceiveFrom. Are you sure none of the interesting looking threads are in calls to OsNetworkReceive instead? (17-07-2012 08:09 AM)dough Wrote: 2. On rarer occasions, I get the following error log:The ErrorCode: 2 line shows that the tcp connect at the start of action invocation timed out. ohNet uses a 3 second timeout by default; you can change this by calling OhNetInitParamsSetTcpConnectTimeout (note that this needs to be called before OhNetLibraryInitialise). You could also consider treating this as a bug on your STB; there may well be some bugs in that if it is unable to accept a small number of parallel action invocations. (17-07-2012 08:09 AM)dough Wrote: I should point out, that the response time to an action from the Upnp Device is not consistent and on occasions I have seen it take 55 seconds before responding to an action invocation!!You're correct that each action invocation should either complete or timeout. The hangs you're seeing are definitely a bug. I wonder whether they're yet another instance of memory corruption. I'd be tempted to look into the crashes first and see if fixing them also fixes the hangs... (17-07-2012 08:09 AM)dough Wrote: When the response is received, Pause-Resume execution will always complete the Upnp transaction. I wonder if Pause-Resume has some affect on thread priorities, overcoming the apparent deadlock?Can you explain how you're doing the pause-resume please? Are you running under a source-level debugger? If you are, is this on a simulator or on an iPhone? (17-07-2012 08:09 AM)dough Wrote: I notice there are log messages carefully placed throughout the OhNet source. Can you advise how to enable/disable the different log levels in general?Use Code: Debug::SetLevel(Debug::kSomeLevel | Debug::kAnotherLevel); (17-07-2012 08:09 AM)dough Wrote: Can you advise on any particular logs I should enable? Are there any tests I could run to run through the shell environment to help diagnose the issue(s)?As above, I'd recommend creating a control point that can run on a platform with a better debugger. If you have access to Linux boxes, that'd be ideal. Also, I'm happy to have a look at any control point code to see if I can spot any bugs. |
|||
17-07-2012, 01:39 PM
(This post was last modified: 18-07-2012 01:01 PM by simonc.)
Post: #7
|
|||
|
|||
RE: Proxy Sync call stuck waiting on semaphore.
Can I check whether you're using subscriptions and their associated property change callbacks? If you are, be aware that these callbacks run in threads created by ohNet; the advice in https://developer.apple.com/library/mac/...i-CH15-SW2 will be relevant.
Later: The link above leads to a fairly long post. The "Using POSIX Threads in a Cocoa Application" section is most relevant. |
|||
01-08-2012, 02:09 AM
Post: #8
|
|||
|
|||
RE: Proxy Sync call stuck waiting on semaphore.
The ohNet tests (particularly TestProxyC) were not passing when run on the iPhone. The hang occurred when running these tests.
The hanging issue has now been resolved. Just check out the latest release of ohNet. Many thanks! |
|||
« Next Oldest | Next Newest »
|