Asterisk stops processing calls under heavy load

Get help with installing, upgrading and running Asterisk.

Moderators: muppetmaster, Moderator, Support

Asterisk stops processing calls under heavy load

Postby illizit » Thu Jul 02, 2015 8:56 pm

Hi,

I am having an issue with Asterisk 11.18.0 where after a certain amount of queue calls (I would say about 40), the system stops responding to inbound calls. It doesn't actually ever crash but I can see many locks (Core show locks) in the system. Some are related to mysql. I have an external system that is querying the database and it never stops functioning therefore I don't believe it is a mysql issue.

The only errors I can see in the log are:
Code: Select all
exceptionally long voice queue length (this error starts showing up under heavy load).


htop and top show only 20% cpu usage during the occurrence of the issue and RAM does not seem to be an issue either.

Running Centos 6.8 32bit. 8GB RAM (PAE). Quad Core processor with HT x5550 2.67Ghz.

Any suggestions? What else can I use to troubleshoot?
illizit
Newsterisk
 
Posts: 15
Joined: Mon Oct 11, 2010 6:04 pm

Re: Asterisk stops processing calls under heavy load

Postby david55 » Fri Jul 03, 2015 1:39 am

https://wiki.asterisk.org/wiki/display/ ... rADeadlock gives information on how to debug deadlocks.
david55
Moves Like Spencer
 
Posts: 12570
Joined: Fri Sep 26, 2008 5:03 am

Re: Asterisk stops processing calls under heavy load

Postby illizit » Fri Jul 03, 2015 2:13 am

core show locks:

http://pastebin.com/iUC2KUq3



gdp threads:

http://pastebin.com/jYCWHjZX


Not sure how to read these exactly, any assistance would be appreciated.
illizit
Newsterisk
 
Posts: 15
Joined: Mon Oct 11, 2010 6:04 pm

Re: Asterisk stops processing calls under heavy load

Postby david55 » Fri Jul 03, 2015 3:04 am

You are looking for circles of two or more locks where threads are holding one and waiting on the other, or, in this case, signs that a lock is being held indefinitely for other reasons, namely the odbc lock at 0xab3e488 in this case.

It looks to me as though your ODBC connection for call logging has stalled and everything is waiting for it.

That's likely to be a third party problem and is one of the reasons why I would want to keep call logging as simple as possible.
Last edited by david55 on Fri Jul 03, 2015 5:33 am, edited 1 time in total.
david55
Moves Like Spencer
 
Posts: 12570
Joined: Fri Sep 26, 2008 5:03 am

Re: Asterisk stops processing calls under heavy load

Postby jcolp » Fri Jul 03, 2015 4:44 am

I agree, it looks as though something is up with the underlying database connection.

This is personally why I don't like tightly coupling Asterisk to databases. It's handy at times, but if your database acts up then Asterisk reflects that.
Joshua Colp
Digium, Inc. | Senior Software Developer
jcolp
Oldsterisk
 
Posts: 248
Joined: Tue May 19, 2015 6:59 am

Re: Asterisk stops processing calls under heavy load

Postby illizit » Sat Jul 04, 2015 10:54 am

I have now moved the mysql database and put it on it's own dedicated server. Let's see how it functions today.

Thanks
illizit
Newsterisk
 
Posts: 15
Joined: Mon Oct 11, 2010 6:04 pm

Re: Asterisk stops processing calls under heavy load

Postby illizit » Sun Jul 12, 2015 12:22 pm

Hi, we had the same issue occur almost immediately. Have not had much time to come back here because we had chaos with the system going down during busy hours (100 concurrent calls). We had to move them to their old Asterisk server which does not suffer from this issue.

Things we have tried:

Installed Asterisk 11 on brand new hardware (HP Server, Dual Xeon [32 cores total], 96GB RAM, RAID 10 10k Hard Drives). I know, overkill. This was all we had available at the time. Same issue occurs with this new server. I can successfully reproduce the issue with sipp on a simple dialplan:

Code: Select all
[from-sip-test]
exten => 696969,1,Answer
exten => 696969,n,Musiconhold(default)
exten => 696969,4,Hangup


After about 30 calls, (10 calls per second), sipp starts receivinv transmit errors (no response from host).

command:
netstat -anNp | grep 5060


Shows thousands of packets queued.

Asterisk only shows about 10 calls.
See this in the error log:

Code: Select all
res_musiconhold.c: Failed to write frame to 'SIP/sip-test-00000024': Resource temporarily unavailable


Htop shows over 80GB of RAM free, CPU spikes to almost 95% during this time (all 32 cores).

Backtrace:
http://pastebin.com/qY6X5mqm

Core show locks:
http://pastebin.com/cUdFm5mB

I tried different Asterisk versions to 11.12, 11.16, same results. This was running on 11.18. Old server which does not suffer from this issue (as far as I can tell) is running 11.12 x64 on debian. This test was done on Ubuntu 14, 64 bit.

Not sure what else to try.
illizit
Newsterisk
 
Posts: 15
Joined: Mon Oct 11, 2010 6:04 pm

Re: Asterisk stops processing calls under heavy load

Postby illizit » Sun Jul 12, 2015 12:36 pm

Oh, and I removed CDR logging via odbc/mysql. It now logs directly to filesystem with csv files.
illizit
Newsterisk
 
Posts: 15
Joined: Mon Oct 11, 2010 6:04 pm

Re: Asterisk stops processing calls under heavy load

Postby david55 » Tue Jul 14, 2015 11:16 am

At the moment, the forum server's encryption is unsafe as far as Debian Iceweasel is concerned, which means that it is not easy for me to access it from home, where I have some time. Consequently you are going to have to do more of the leg work and actually work out the cycle of locks, or the locks that seem to be being held for too long, and the threads that seem to be running when you would expect them to be waiting.
david55
Moves Like Spencer
 
Posts: 12570
Joined: Fri Sep 26, 2008 5:03 am

Re: Asterisk stops processing calls under heavy load

Postby malcolmd » Tue Jul 14, 2015 11:54 am

@david55, I've opened a ticket for this already with Digium's IT folks.
Malcolm Davenport
Digium, Inc. | Senior Product Manager
malcolmd
Moves Like Spencer
 
Posts: 3019
Joined: Wed Aug 03, 2005 3:53 pm
Location: Huntsville, AL, US

Re: Asterisk stops processing calls under heavy load

Postby david55 » Thu Jul 16, 2015 4:26 pm

There are a very large number of threads waiting for /var/lib/asterisk/bin/record_runafter.pl

That has either stalled or you have overwhelmed the system with them. As it isn't part of Asterisk, you will need to debug that.

If it, in turn, uses AMI, or asterisk -r, I suppose you could have a deadlock that is not explicitly visible.

There also seem to be a lot of locks associated with TLS sessions. Are you using the HTTP AMI interface, by any chance?
david55
Moves Like Spencer
 
Posts: 12570
Joined: Fri Sep 26, 2008 5:03 am

Re: Asterisk stops processing calls under heavy load

Postby illizit » Fri Jul 17, 2015 12:14 am

The issue occurs regardless of running that script. No, I do not use the http AMI interface. The simple dialplan I showed in my previous post, which does not run that script, still has calls failing after 30 concurrent calls. I opened a ticket with digium and they mentioned this seems like a bug. After downgrading to Asterisk 11.7, I can now sustain about 150 concurrent calls at a rate of 15 calls per second. Still not as many as the old server, but much better.
illizit
Newsterisk
 
Posts: 15
Joined: Mon Oct 11, 2010 6:04 pm

Re: Asterisk stops processing calls under heavy load

Postby david55 » Fri Jul 17, 2015 2:52 am

The number of scripts running still needs to be explained, as there is clearly some resource those scripts need but are not getting. I suppose it could be CPU, if Asterisk is running at elevated priority. Running lots of perl scripts is likely to be expensive because of the set up time needed for each one.

The other big factor was in AMI. I was guessing HTTP because they seemed to be holding locks whilst reading network input, which I wouldn’t normally have expected, but I;d need to go down into the code to find the exact context. That lock wasn't part of a classic deadlock, so the missing resource has to be something else, e.g. network input itself.

In my view AMI should not be holding locks at such points. It already has a problem that network output stalling can cause it to hold critical locks. and freeze the system. However I can't see that happening here. We modified it to buffer all the output in a file, to decouple network output from internal locks, although that will increase processor and memory load.
david55
Moves Like Spencer
 
Posts: 12570
Joined: Fri Sep 26, 2008 5:03 am

Re: Asterisk stops processing calls under heavy load

Postby david55 » Sat Jul 18, 2015 5:55 am

I can see no lock cycles. Although you have a large number of locks, that looks more like the result of the system being close to overload than any classical deadlock, as the configuration changes between the show locks and the backtrace.

All the Asterisk threads appear to be in waits, although I can't be sure that some of these aren't spin locks, without ps -l output with the show individual threads option.

What I do see is about 56 threads apparently stalled in the runafter script, so, even if that is not the only factor you need to either explain why it is stalling or remove it from the equation, and re-run the analysis to find the next level of problem.

Is this script transferring recordings over the network, but it may well be that you have network problems. That might also explain why you had problems with the database.

I didn't fully understand the bit about Digium and bug reports. Digium only handle bugs in commercially supported derivatives of Asterisk or relating to support to the hardware they sell. Were you referring to the community issues tracker at issues.asterisk.org. If so you should cross link the report and this thread.

Regarding overloads. A system, like Asterisk, that uses locks heavily can rapidly increase its CPU usage as it approaches the point of overload, because locks that are taken are much more expensive than ones that are not. I'm not sure where the tipping point is, but, for example, you may find CPU uses goes up fairly smoothly to 80%, but only a little more load is needed to exceed the processing power of the system.
david55
Moves Like Spencer
 
Posts: 12570
Joined: Fri Sep 26, 2008 5:03 am

Re: Asterisk stops processing calls under heavy load

Postby illizit » Sat Jul 18, 2015 10:19 am

What I was referring to when I mentioned Digium support is I currently have a support agreement for Open Source Asterisk. I submitted a ticket and their final response was "it may be a bug". I had posted here simultaneously to receive different opinions and also, once I found the solution, to post it as well.

They were quick to blame of what appeared to be, the last process in the "waiting" state. First it was the mysql connection. I then removed mysql was same results. Then it was "record_run_after.pl" script. Removed that as well with same results. Finally, they agreed that there was some issue with the core and suggested to downgrade the version of Asterisk. I tried the latest 11 certified version.

Simply downgrading Asterisk and changing NOTHING at all, the issue almost completely disappeared and I can now handle 150 concurrent calls. Prior to the downgrade, I could only handle about 30 concurrent calls.

I may not have provided the debug information for the other test scenario, but I can assure you, I removed any scripts and had that simple dialplan I showed in my previous post with the same exact issue. I will change the Asterisk version once again, during after hours, to reproduce the debug information with the simple dialplanI I posted previously (basic music on hold). Perhaps it can assist troubleshooting the source of the bug/issue.

Thanks for taking the time and looking into this. I understand it is complex and hopefully this can benefit others in the process.
illizit
Newsterisk
 
Posts: 15
Joined: Mon Oct 11, 2010 6:04 pm

Re: Asterisk stops processing calls under heavy load

Postby leri » Wed Nov 18, 2015 4:47 am

Thanks Illizit for being so clear in your analysis.

I just wanted to remark we have the same problem... once I get to 30-35 concurrent calls using an predictive dialer application, Asterisk 11.17 stops responding. CPU, IO, Troughput and Memory looks perfectly stable (its a IBM Xeon with 8 core and 4 MB ram)... we have downgraded to 11.10 and see what happens.. then I'll let you know. I hope we could find something to clarify wich is the problem in order to look for a solution.
leri
Newsterisk
 
Posts: 1
Joined: Wed Nov 18, 2015 4:39 am

Re: Asterisk stops processing calls under heavy load

Postby illizit » Wed Nov 18, 2015 12:03 pm

leri wrote:Thanks Illizit for being so clear in your analysis.

I just wanted to remark we have the same problem... once I get to 30-35 concurrent calls using an predictive dialer application, Asterisk 11.17 stops responding. CPU, IO, Troughput and Memory looks perfectly stable (its a IBM Xeon with 8 core and 4 MB ram)... we have downgraded to 11.10 and see what happens.. then I'll let you know. I hope we could find something to clarify wich is the problem in order to look for a solution.



Did you see any errors in the logs? I would try using the certified version of asterisk unless you have a specific need for a different version.
illizit
Newsterisk
 
Posts: 15
Joined: Mon Oct 11, 2010 6:04 pm

Re: Asterisk stops processing calls under heavy load

Postby illius » Wed Jan 27, 2016 1:17 pm

Illizit, have you found any solutions since, besides downgrading?

I'm having much the same problem. I can't queue more than, say, 10 calls before things start to go sour. The excessive locking in app_queue.c drives system load in a quad-cpu system beyond 4. The actual cpu usage isn't pegged, the load is just driven up because all the futex locking clogs up the threads. If I queue more, say 20 or more, eventually all of the rtp streams start stuttering and chopping audio, even though there may even be idle cpu available. I guess futex locks don't show up in top as real cpu usage.

I can't really downgrade to the certified 11 version because I'm already heavily invested in Asterisk 13 and its new way of doing things, and I need the latest versions fixes because the older asterisk 13 app_queue.c segfaults sometimes.
illius
Newsterisk
 
Posts: 1
Joined: Wed Jan 27, 2016 12:56 pm


Return to Asterisk Support

Who is online

Users browsing this forum: Exabot [Bot] and 1 guest

cron