Thread Rating:
  • 0 Votes - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Proxy Sync call stuck waiting on semaphore.
12-07-2012, 05:00 AM
Post: #1
Proxy Sync call stuck waiting on semaphore.
I am making a Sync call from a generated proxy. The call follows two other Sync calls which always complete successfully. If I switch the order of calls, it seems that the third one always gets stuck waiting for semaphore. The call trace looks like this:

Quote:#0 0x312ca068 in __psynch_cvwait ()
#2 0x339887c8 in pthread_cond_wait ()
#3 0x00101f6a in OsSemaphoreWait at /Users/developer/ohNet/openhome-ohNet-b56bff8/Os/Posix/Os.c:294
#4 0x000f86d6 in OpenHome::Os::SemaphoreWait(void*) [inlined] at /Users/developer/ohNet/openhome-ohNet-b56bff8/Build/Include/OpenHome/OsWrapper.inl:29
#5 0x000f86c0 in OpenHome::Semaphore::Wait() at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Thread.cpp:30
#6 0x000c980a in OpenHome::Net::SyncProxyAction::Wait() at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Net/ControlPoint/AsyncPrivate.cpp:17
#7 0x000c378a in CpProxySchemas_upnp_orgEventServices1C::SyncMisc_getStbTimeUtc(bool&, unsigned int&) at <path to source>/CpSchemas-upnp-orgEventServices1C.cpp:385
#8 0x000c42f4 in CpProxySchemas_upnp_orgEventServices1SyncMisc_getStbTimeUtc at <path to source>/CpSchemas-upnp-orgEventServices1C.cpp:627

The following log with LogVerbose enabled on the Invoke method suggests that the action's response is successfully received from by my Upnp Control Point:

Quote:Socket::Send, sending
POST /EventServices/control HTTP/1.0

Host: 192.168.15.234:46339

Content-Length: 289

Content-Type: text/xml; charset="utf-8"

SOAPACTION: "urn:schemas-upnp-org:service:EventServices:1#Misc_getStbTimeUtc"



<?xml version="1.0"?>

<s:Envelope xmlnsConfused="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

<s:Body>

<u:Misc_getStbTimeUtc xmlns:u="urn:schemas-upnp-org:service:EventServices:1">

</u:Misc_getStbTimeUtc>

</s:Body>

</s:Envelope>


Socket::Receive, got
HTTP/1.1 200 OK
Socket::Receive, got


EXT:

CONTENT-TYPE: text/xml; charset="utf-8"

SERVER: POSIX, UPnP/1.0, MicroStack/1.0.4384



<?xml version="1.0" encoding="utf-8"?>

<s:Envelope s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlnsConfused="http://schemas.xmlsoap.org/soap/envelope/"><s:Body><u:Misc_getStbTimeUtcResponse xmlns:u="urn:schemas-upnp-org:service:EventServices:1"><result>0</result><time>1337789348</time></u:Misc_getStbTimeUtcResponse></s:Body></s:Envelope>
Socket::Receive, got

When the execution hangs, waiting on the semaphore, if I pause execution in the debugger and resume, the Sync call completes. Any ideas why?
Is there another trace log I can enable to find out more?
Is there something peculiar about the way semaphores work on iOS?
Surely I can make consecutive Sync calls through different Proxies?
Find all posts by this user
12-07-2012, 08:11 AM (This post was last modified: 12-07-2012 08:14 AM by simonc.)
Post: #2
RE: Proxy Sync call stuck waiting on semaphore.
You can make consecutive (or overlapping) calls through any number of different Proxies. I'd guess that you're seeing a bug in the iOS simulator but its possible this is an ohNet problem too...

To start narrowing the cause down, can you add some logging please? In InvocationUpnp::ReadResponse(), after the initialisation of actionName, add the lines
Code:
Log::Print("Response to ");
Log::Print(actionName);
Log::Print("\n");

Rebuild with DEFINE_TRACE defined and re-run your tests. You should see a logging statement for each action invocation that receives a full response. It'd be interesting to know if you see a statement for the apparantly stalled action. If you don't immediately see a statement, do you get it after you pause/resume the debugger?

(This tests out the reliability of networking support in the iOS simulator. If you see logging for all actions while your code is still blocked, networking is behaving correctly. If you don't see logging for the third action until you pause/resume the simulator, its likely this is a problem with the simulator.)
Find all posts by this user
12-07-2012, 10:10 AM
Post: #3
RE: Proxy Sync call stuck waiting on semaphore.
Code:
Log::Print("Response to ");
Log::Print(actionName);
Log::Print("\n");
Did this. This should print the actionName with no dump of the actual response, right?

I have worked around the problem with third call, by using an evented property. However, I am seeing a similar trace to the semaphore wait when invoking a different action. This time it causes a crash (bad memory access) with the following complete call trace for that thread:

Quote:#0 0x0006c98a in OpenHome::Swx::Write(OpenHome::Brx const&) at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Stream.cpp:188
#1 0x00065f10 in OpenHome::Net::InvocationBodyWriter::Write(OpenHome::Net::Invocation const&) at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Net/ControlPoint/Upnp/ProtocolUpnp.cpp:241
#2 0x0006625a in OpenHome::Net::InvocationBodyWriter::Write(OpenHome::Net::Invocation const&, OpenHome::Bwh&) at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Net/ControlPoint/Upnp/ProtocolUpnp.cpp:200
#3 0x00066778 in OpenHome::Net::InvocationUpnp::WriteRequest(OpenHome::Uri const&) at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Net/ControlPoint/Upnp/ProtocolUpnp.cpp:69
#4 0x00066aba in OpenHome::Net::InvocationUpnp::Invoke(OpenHome::Uri const&) at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Net/ControlPoint/Upnp/ProtocolUpnp.cpp:37
#5 0x00047836 in OpenHome::Net::CpiDeviceUpnp::Invocable::InvokeAction(OpenHome::Net::Invocation&) at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Net/ControlPoint/Upnp/CpiDeviceUpnp.cpp:309
#6 0x0004b914 in OpenHome::Net::Invoker::Run() at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Net/ControlPoint/CpiService.cpp:653
#7 0x0006ddfc in OpenHome::Thread::EntryPoint(void*) at /Users/developer/ohNet/openhome-ohNet-b56bff8/OpenHome/Thread.cpp:197
#8 0x00077240 in threadEntrypoint at /Users/developer/ohNet/openhome-ohNet-b56bff8/Os/Posix/Os.c:440

I am able to reproduce this crash at will. In this case, the Log::Print(actionName) does not appear.

Quote:(This tests out the reliability of networking support in the iOS simulator. If you see logging for all actions while your code is still blocked, networking is behaving correctly. If you don't see logging for the third action until you pause/resume the simulator, its likely this is a problem with the simulator.)
I am running on the iPhone device (arm7), not the iOS simulator. The OS should behave the same, but of course the underlying hardware for the Simulator is the Mac, it's Intel CPU, and multiple network interfaces.[/quote].

I hope that the area of the crash will shed some light on the endless semaphore wait.??
Find all posts by this user
12-07-2012, 04:28 PM
Post: #4
RE: Proxy Sync call stuck waiting on semaphore.
This could either be a bug in ohNet or client code. I'd tend to guess its in client code at this stage; you'd probably disagree :-).

I think I need to investigate this. Can you create a simple test case, something I can compile and run myself, that demonstrates a hang or crash? Or, failing that, can you send me the proxies you're using?
Find all posts by this user
17-07-2012, 08:09 AM
Post: #5
RE: Proxy Sync call stuck waiting on semaphore.
Hi Simon,

Apologies for the delay in my response. I have been having great difficulty creating a test case which reliably reproduces a crash or hang. I do actually have a test case which can trigger the hang at least once (sometimes more often). Unfortunately, I have not managed to simplify it down to something you could run at your end. Besides, I think it would be difficult to replicate the environment I have here (i.e. the STB running the Intel Upnp Device Microstack). Today I have been trying a more statistical analysis of the issues since there seems to be multiple issues here and the call trace to a crash is not always the same. I am now able to ask some more questions instead of being completely lost...

When the hang occurs after invoking an action (any action as far as I can see), I pause execution and...
1. OsNetworkReceiveFrom is running as the entry point to two separate threads. Each one is waiting at the select function in Os.c : 741.

Code:
int32_t selectErr = TEMP_FAILURE_RETRY(select(nfds(handle), &read, NULL, &error, NULL));

In this function, does aBytes represent the size of the given buffer? In one thread the value is 256 and the other it is 4294967295 (0xFFFFFFFF). That seems like a large size for a buffer!?
Also (manually) tracing OsNetworkReceiveFrom back up the call chain, I get to Socket::Receive, which has two different variants. How do I know which variant was called?

2. On rarer occasions, I get the following error log:

Quote:------------------------------
SequenceNumber: 3
Service: urn:schemas-upnp-org:service:ChannelServices:1
Action: Channel_set_channel
InputArg: ChHandle (4026531841)
ErrorLevel: Socket
ErrorCode: 2
ErrorDescription: Connection timed out
------------------------------

and on another occasion:

Quote:------------------------------
SequenceNumber: 4
Service: urn:schemas-upnp-org:service:ChannelServices:1
Action: Channel_set_channel
InputArg: ChHandle (4026531842)
ErrorLevel: Socket
ErrorCode: 2
ErrorDescription: Connection timed out
------------------------------

I should point out, that the response time to an action from the Upnp Device is not consistent and on occasions I have seen it take 55 seconds before responding to an action invocation!!
When this scenario does occur though, I would expect a Sync call from the Control Point (CP) side to either wait until it finally gets the response or time out. When it hangs though, the CP has definitely received a valid Upnp response, as indicated by the prints you recommended.

When the response is received, Pause-Resume execution will always complete the Upnp transaction. I wonder if Pause-Resume has some affect on thread priorities, overcoming the apparent deadlock?

I notice there are log messages carefully placed throughout the OhNet source. Can you advise how to enable/disable the different log levels in general? Can you advise on any particular logs I should enable? Are there any tests I could run to run through the shell environment to help diagnose the issue(s)?

I appreciate your support,

Doug
Find all posts by this user
17-07-2012, 01:13 PM (This post was last modified: 17-07-2012 01:14 PM by simonc.)
Post: #6
RE: Proxy Sync call stuck waiting on semaphore.
The various problems sound very much like memory corruption.

Tracking this down by logging on iOS is possible but will likely be slow. Would it be possible for you to create a simple control point program that runs on Windows or Linux? If you can get something running on Linux, try using valgrind to see if that gives you any hints about the source of the memory overwrite.

I'd also be interested to see your control point source to see if I can spot any problems (either current or Windows/Linux would be fine).

In case none of that helps, answers to your questions follow:

(17-07-2012 08:09 AM)dough Wrote:  When the hang occurs after invoking an action (any action as far as I can see), I pause execution and...
1. OsNetworkReceiveFrom is running as the entry point to two separate threads. Each one is waiting at the select function in Os.c : 741.

Code:
int32_t selectErr = TEMP_FAILURE_RETRY(select(nfds(handle), &read, NULL, &error, NULL));

In this function, does aBytes represent the size of the given buffer? In one thread the value is 256 and the other it is 4294967295 (0xFFFFFFFF). That seems like a large size for a buffer!?
0xFFFFFFFF is an invalid size for a buffer so probably indicates either memory over-writes or use of freed memory. The 256 byte buffer is also suspect - that seems awfully small so may well also be invalid.

(17-07-2012 08:09 AM)dough Wrote:  Also (manually) tracing OsNetworkReceiveFrom back up the call chain, I get to Socket::Receive, which has two different variants. How do I know which variant was called?
I can only see one function - Socket::ReceiveFrom - which ends up calling OsNetworkReceiveFrom. Are you sure none of the interesting looking threads are in calls to OsNetworkReceive instead?

(17-07-2012 08:09 AM)dough Wrote:  2. On rarer occasions, I get the following error log:

Quote:------------------------------
SequenceNumber: 3
Service: urn:schemas-upnp-org:service:ChannelServices:1
Action: Channel_set_channel
InputArg: ChHandle (4026531841)
ErrorLevel: Socket
ErrorCode: 2
ErrorDescription: Connection timed out
------------------------------

and on another occasion:

Quote:------------------------------
SequenceNumber: 4
Service: urn:schemas-upnp-org:service:ChannelServices:1
Action: Channel_set_channel
InputArg: ChHandle (4026531842)
ErrorLevel: Socket
ErrorCode: 2
ErrorDescription: Connection timed out
------------------------------
The ErrorCode: 2 line shows that the tcp connect at the start of action invocation timed out. ohNet uses a 3 second timeout by default; you can change this by calling OhNetInitParamsSetTcpConnectTimeout (note that this needs to be called before OhNetLibraryInitialise). You could also consider treating this as a bug on your STB; there may well be some bugs in that if it is unable to accept a small number of parallel action invocations.

(17-07-2012 08:09 AM)dough Wrote:  I should point out, that the response time to an action from the Upnp Device is not consistent and on occasions I have seen it take 55 seconds before responding to an action invocation!!
When this scenario does occur though, I would expect a Sync call from the Control Point (CP) side to either wait until it finally gets the response or time out. When it hangs though, the CP has definitely received a valid Upnp response, as indicated by the prints you recommended.
You're correct that each action invocation should either complete or timeout. The hangs you're seeing are definitely a bug. I wonder whether they're yet another instance of memory corruption. I'd be tempted to look into the crashes first and see if fixing them also fixes the hangs...

(17-07-2012 08:09 AM)dough Wrote:  When the response is received, Pause-Resume execution will always complete the Upnp transaction. I wonder if Pause-Resume has some affect on thread priorities, overcoming the apparent deadlock?
Can you explain how you're doing the pause-resume please? Are you running under a source-level debugger? If you are, is this on a simulator or on an iPhone?

(17-07-2012 08:09 AM)dough Wrote:  I notice there are log messages carefully placed throughout the OhNet source. Can you advise how to enable/disable the different log levels in general?
Use
Code:
Debug::SetLevel(Debug::kSomeLevel | Debug::kAnotherLevel);
where the levels are defined in Debug.h. You can call this at any point after initialising ohNet. The purpose of most of the debug levels should be pretty obvious. I'd caution against using the kNetwork level however - there are so many threads making networking calls that its very hard to make sense of the output.

(17-07-2012 08:09 AM)dough Wrote:  Can you advise on any particular logs I should enable? Are there any tests I could run to run through the shell environment to help diagnose the issue(s)?
As above, I'd recommend creating a control point that can run on a platform with a better debugger. If you have access to Linux boxes, that'd be ideal.

Also, I'm happy to have a look at any control point code to see if I can spot any bugs.
Find all posts by this user
17-07-2012, 01:39 PM (This post was last modified: 18-07-2012 01:01 PM by simonc.)
Post: #7
RE: Proxy Sync call stuck waiting on semaphore.
Can I check whether you're using subscriptions and their associated property change callbacks? If you are, be aware that these callbacks run in threads created by ohNet; the advice in https://developer.apple.com/library/mac/...i-CH15-SW2 will be relevant.

Later: The link above leads to a fairly long post. The "Using POSIX Threads in a Cocoa Application" section is most relevant.
Find all posts by this user
01-08-2012, 02:09 AM
Post: #8
RE: Proxy Sync call stuck waiting on semaphore.
The ohNet tests (particularly TestProxyC) were not passing when run on the iPhone. The hang occurred when running these tests.

The hanging issue has now been resolved. Just check out the latest release of ohNet.

Many thanks!
Find all posts by this user


Forum Jump: