Configuring maxSemaphores for a Zuul server - netflix-zuul

I am trying to load test Zuul version 1.1.2.
However, I keep getting the following error after a few minutes of running the load test.
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: book could not acquire a semaphore for execution and no fallback available.
at com.netflix.hystrix.AbstractCommand$21.call(AbstractCommand.java:783) ~[hystrix-core-1.5.3.jar:1.5.3]
My question is: how can I increase maxSemaphores via configuration? This is what I have tried so far:
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds= 20000000
zuul.hystrix.command.default.execution.isolation.strategy= SEMAPHORE
zuul.hystrix.command.default.execution.isolation.semaphore.maxConcurrentRequests= 10
zuul.hystrix.command.default.fallback.isolation.semaphore.maxConcurrentRequests= 10
zuul.semaphore.maxSemaphores=3000
zuul.eureka.book.semaphore.maxSemaphore=30000
I have tried many options found on the Internet, but none of them works for me.
Please advise.

It turns out I was using an old version. In later versions, semaphores can be set at the Zuul level. Below is an example that sets maxSemaphores to 3000 as the default for routing to every proxied service:
zuul.semaphore.maxSemaphores=3000

The actual property is max-semaphores (this would be with yaml config):
zuul:
  semaphore:
    #com.netflix.hystrix.exception.HystrixRuntimeException: "microservice" could not acquire a semaphore for execution and no fallback available.
    max-semaphores: 2000


Quartz jobs failing after MySQL db errors

On a working Grails 2.2.5 system, we're occasionally losing connection to the MySQL database, for reasons that are not relevant here. The majority of the system recovers perfectly well from the outage. But any Quartz jobs (using Quartz plugin 0.4.2) are typically failing to run again after such an outage. This is a typical message which appears in the log at the point the job should run:
2015-02-26 16:30:45,304 [quartzScheduler_Worker-9] ERROR core.ErrorLogger - Unable to notify JobListener(s) of Job to be executed: (Job will NOT be executed!). trigger= GRAILS_JOBS.quickQuoteCleanupJob job= GRAILS_JOBS.com.aire.QuickQuoteCleanupJob
org.quartz.SchedulerException: JobListener 'sessionBinderListener' threw exception: Already value [org.springframework.orm.hibernate3.SessionHolder@593a9498] for key [org.hibernate.impl.SessionFactoryImpl@c8488d7] bound to thread [quartzScheduler_Worker-9] [See nested exception: java.lang.IllegalStateException: Already value [org.springframework.orm.hibernate3.SessionHolder@593a9498] for key [org.hibernate.impl.SessionFactoryImpl@c8488d7] bound to thread [quartzScheduler_Worker-9]]
    at org.quartz.core.QuartzScheduler.notifyJobListenersToBeExecuted(QuartzScheduler.java:1868)
    at org.quartz.core.JobRunShell.notifyListenersBeginning(JobRunShell.java:338)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:176)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
Caused by: java.lang.IllegalStateException: Already value [org.springframework.orm.hibernate3.SessionHolder@593a9498] for key [org.hibernate.impl.SessionFactoryImpl@c8488d7] bound to thread [quartzScheduler_Worker-9]
    at org.quartz.core.QuartzScheduler.notifyJobListenersToBeExecuted(QuartzScheduler.java:1866)
    ... 3 more
What do I need to do to make things more robust, so that the Quartz jobs recover as well?
By default, a Quartz job gets a Hibernate session bound to it. Disable that session binding and let your service handle the transaction and session. That's what we do, and when our DB connections come back up, the jobs still work.
To disable session binding in your job, add:
def sessionRequired = false

Isolating cause of Erlang and RabbitMQ crashes

We have been trying to make use of the RabbitMQ Service Bus (v3.3.4), but the central bus keeps crashing. At the moment we are not using any clustering, and it's hosted on Windows Server 2008 R2. We'd like to isolate the root cause, but the error below is the only one we can find. Can anyone shed some light on what, if anything, we can do to find the root cause of this?
Note: There are roughly 20 consumers with roughly the same number of Topic subscriptions. Also, all the clients are .NET 4.5 using the 3.3.4 Rabbit client libraries.
Version=1
EventType=APPCRASH
EventTime=130658038736577295
ReportType=2
Consent=1
ReportIdentifier=7f93ccd8-9cbe-11e4-ae00-000c29c08139
IntegratorReportIdentifier=7f93ccd7-9cbe-11e4-ae00-000c29c08139
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=erl.exe
Sig[1].Name=Application Version
Sig[1].Value=0.0.0.0
Sig[2].Name=Application Timestamp
Sig[2].Value=5343035d
Sig[3].Name=Fault Module Name
Sig[3].Value=MSVCR100.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=10.0.30319.1
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=4ba220dc
Sig[6].Name=Exception Code
Sig[6].Value=40000015
Sig[7].Name=Exception Offset
Sig[7].Value=00000000000760d9
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=6.1.7600.2.0.0.272.7
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=1033
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=8d79
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=8d79a00078e92d9c3d5d79d4324254fe
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=9af5
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=9af5b20633c279dbf44b04a614c6a1f6
UI[2]=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe
UI[5]=Check online for a solution (recommended)
UI[6]=Check for a solution later (recommended)
UI[7]=Close
UI[8]=erl.exe stopped working and was closed
UI[9]=A problem caused the application to stop working correctly. Windows will notify you if a solution is available.
UI[10]=&Close
LoadedModule[0]=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe
LoadedModule[1]=C:\Windows\SYSTEM32\ntdll.dll
LoadedModule[2]=C:\Windows\system32\kernel32.dll
LoadedModule[3]=C:\Windows\system32\KERNELBASE.dll
LoadedModule[4]=C:\Windows\system32\MSVCR100.dll
LoadedModule[5]=C:\Program Files\erl6.0\erts-6.0\bin\erlexec.dll
LoadedModule[6]=C:\Windows\system32\USER32.dll
LoadedModule[7]=C:\Windows\system32\GDI32.dll
LoadedModule[8]=C:\Windows\system32\LPK.dll
LoadedModule[9]=C:\Windows\system32\USP10.dll
LoadedModule[10]=C:\Windows\system32\msvcrt.dll
LoadedModule[11]=C:\Windows\system32\IMM32.DLL
LoadedModule[12]=C:\Windows\system32\MSCTF.dll
LoadedModule[13]=C:\Windows\system32\apphelp.dll
LoadedModule[14]=C:\Program Files\erl6.0\erts-6.0\bin\beam.dll
LoadedModule[15]=C:\Windows\system32\ADVAPI32.dll
LoadedModule[16]=C:\Windows\SYSTEM32\sechost.dll
LoadedModule[17]=C:\Windows\system32\RPCRT4.dll
LoadedModule[18]=C:\Windows\WinSxS\amd64_microsoft.windows.common-controls_6595b64144ccf1df_6.0.7600.16661_none_fa62ad231704eab7\COMCTL32.dll
LoadedModule[19]=C:\Windows\system32\SHLWAPI.dll
LoadedModule[20]=C:\Windows\system32\COMDLG32.dll
LoadedModule[21]=C:\Windows\system32\SHELL32.dll
LoadedModule[22]=C:\Windows\system32\WS2_32.dll
LoadedModule[23]=C:\Windows\system32\NSI.dll
LoadedModule[24]=C:\Windows\system32\IPHLPAPI.DLL
LoadedModule[25]=C:\Windows\system32\WINNSI.DLL
LoadedModule[26]=C:\Windows\system32\mswsock.dll
LoadedModule[27]=C:\Windows\System32\wshtcpip.dll
LoadedModule[28]=C:\Windows\system32\NLAapi.dll
LoadedModule[29]=C:\Windows\system32\DNSAPI.dll
LoadedModule[30]=C:\Windows\System32\winrnr.dll
LoadedModule[31]=C:\Windows\system32\napinsp.dll
LoadedModule[32]=C:\Windows\System32\wship6.dll
FriendlyEventName=Stopped working
ConsentKey=APPCRASH
AppName=erl.exe
AppPath=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe

Python FTP hang

Say I want to use FTP in Python using the ftplib. I begin with this:
from ftplib import FTP
ftp = FTP('10.10.10.151')
If the FTP server is not online, however, it will hang right there indefinitely. The only thing that can kick it out is a keyboard interrupt as far as I know. I've tried this:
ftp.connect('10.10.10.151','21', 5)
With the five being a five second timeout. But the problem here is that I do not know of any way to use that line without first assigning ftp something. But if the server is offline, then the "ftp =" line will hang. So what use is ftp.connect()'s timeout function?!?
Does anybody know a workaround or anything? Is there a way to time out the "ftp = FTP(xxx)" command that I haven't found? Thanks.
I'm using Python 2.7 on Linux Mint.
Your call to connect() is redundant since FTP() method documentation states:
When host is given, the method call connect(host) is made.
Also, since Python 2.6, FTP() does have a timeout parameter:
class ftplib.FTP([host[, user[, passwd[, acct[, timeout]]]]])
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if is not specified, the global default timeout setting will be used).
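As a rough sketch of both points (written for Python 3; the helper name open_ftp and its defaults are made up for illustration and are not part of ftplib): creating FTP() with no host does not connect, so the timeout can be set on the object before connect() is attempted.

```python
import ftplib


def open_ftp(host, port=21, timeout=5.0):
    """Return a connected FTP object, or None if the server cannot be reached.

    open_ftp is a hypothetical helper, not an ftplib function.
    """
    # No host is passed here, so the constructor does not try to connect yet.
    ftp = ftplib.FTP(timeout=timeout)
    try:
        # connect() uses the timeout stored on the object by the constructor.
        ftp.connect(host, port)
    except ftplib.all_errors as exc:
        # Covers refused connections, timeouts and malformed server replies.
        print("FTP connection failed: %s" % exc)
        return None
    return ftp
```

With this, ftp = open_ftp('10.10.10.151') should give up after roughly the timeout and return None when the server is offline, rather than hanging. Passing the timeout directly, as in FTP('10.10.10.151', timeout=5), should behave the same way but raises the exception instead of returning None.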

Quartz.net not firing on remote server

I've implemented Quartz.NET in a Windows service to run tasks, and everything works fine on my local workstation. But once it's deployed to a remote Windows Server host, it just hangs after initialization.
ISchedulerFactory schedFact = new StdSchedulerFactory();
// get a scheduler
var _scheduler = schedFact.GetScheduler();
// Configuration of triggers and jobs
var trigger = (ICronTrigger)TriggerBuilder.Create()
    .WithIdentity("trigger1", "group1")
    .WithCronSchedule(job.Value)
    .Build();
var jobDetail = JobBuilder.Create(Type.GetType(job.Key)).StoreDurably(true)
    .WithIdentity("job1", "group1").Build();
var ft = _scheduler.ScheduleJob(jobDetail, trigger);
Everything seems to be standard. I keep a private static reference to the scheduler. Logging stops right after the jobs are initialized and added to the scheduler; nothing else happens after that.
I'd appreciate any advice.
Thanks.
PS:
I found some strange events in Event Viewer that may be related to Quartz.NET:
Restart Manager - Starting session 2 - 2012-07-09T15:14:15.729569700Z.
Restart Manager - Ending session 2 started 2012-07-09T15:14:15.729569700Z.
Based on your question and the additional info you gave in comments, I would guess there is something going wrong in the onStart method of your service.
Here are some things you can do to help figure out and solve the problem:
Place the code in your onStart method in a try/catch block, and try to install and start the service. Then check windows logs to see if it was installed correctly, started correctly, etc.
The fact that restart manager is running leads me to believe that your service may be dependent on a process which is already in use. Make sure that any dependencies of your service are closed before installing it.
This problem can also be caused by putting data-intense or long running operations in your onStart method. Make sure that you keep this kind of code out of onStart.
I had a similar problem to this and it was caused by having dots/periods in the assembly name e.g. Project.Update.Service. When I changed it to ProjectUpdateService it worked fine.
Strangely it always worked on the development machine. Just never on the remote machine.
UPDATE: It may have been the length of the service name that caused this issue. By removing the dots I shortened the service name. It looks like the maximum length is 25 characters.

ActiveResource timeout not functioning [duplicate]

This question already has an answer here:
Overriding/Modifying Rails Class (ActiveResource)
(1 answer)
Closed 3 years ago.
I'm trying to contact a REST API using ActiveResource on Rails 2.3.2.
I'm attempting to use the timeout functionality so that if the resource I'm contacting is down I can fail quickly - I'm doing this with the following:
class WorkspaceResource < ActiveResource::Base
  self.timeout = 5
  self.site = "http://mysite.com/restAPI"
end
However, when I try to contact the service when I know it isn't available, the class only times out after the default 60 seconds. I can see from the error stack that the timeout error does indeed come from an ActiveResource class in my gem folder that has the proper functions to allow timeout settings, but my set timeout never seems to work.
Any thoughts?
So apparently the issue is not that timeout is not functioning. I can run a server locally, make it not return a response within the timeout limit, and see that timeout works.
The issue is in fact that if the server does not accept the connection, timeout does not function as I expected it to - it doesn't function at all. It appears as though timeout only works when the server accepts the connection but takes too long to respond.
To me, this seems like an issue - shouldn't timeout also work when the server I'm contacting is down? If not, there should be another mechanism to stop a bunch of requests from hanging...anyone know of a quick way to do this?
The problem
If you're running on Ruby 1.8.x then the problem is its lack of real system threads.
As you can read first here and then here, there are systemic problems with timeouts in Ruby. It's an interesting discussion, but for you in particular, some comments suggest that the timeout is effectively ignored and defaults to 60 seconds, which is exactly what you are seeing.
Solutions ...
I have a similar issue with our own product when trying to send emails - if the email server is down the thread blocks. For me the solution was to spin the request off on a separate thread and therefore my main request-processing thread doesn't block.
There are non-blocking libraries out there for Ruby but perhaps you could take a look first at this System Timeout Gem.
An option open to anyone using Rails behind a proxy like nginx would be to set the upstream timeout to a lower number - that way you'll get notified if the server is taking too long. I'd only do this if I were really stuck for a solution.
Last but not least, it's possible that running Rails 2.3.2 on top of Ruby 1.9.1 will fix the issue.
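The separate-thread idea above is not specific to Ruby. Here is a minimal sketch of the pattern in Python rather than Ruby, only to show the shape of it; call_with_deadline and slow_lookup are hypothetical names, and slow_lookup merely stands in for a request to a server that is down.

```python
import concurrent.futures
import time


def call_with_deadline(func, timeout, fallback=None):
    """Run func() on a worker thread; return fallback if it exceeds timeout seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return fallback
    finally:
        # Do not wait for the worker: the blocked call keeps running in the
        # background and is simply abandoned by the caller.
        executor.shutdown(wait=False)


def slow_lookup():
    # Hypothetical stand-in for a request to an unresponsive server.
    time.sleep(0.5)
    return "response"


result = call_with_deadline(slow_lookup, timeout=0.05, fallback="unavailable")
```

The caller gets control back once the deadline passes, but the abandoned call still occupies a thread until it finishes on its own, so this limits how long a request blocks rather than cancelling the underlying work.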
Alternatively, you could catch these connection errors and retry once (after a certain period of time), just to make sure the connection is really down.
retried = false
begin
  @businesses = Business.find(:all, :params => { :shop_domain => @shop.domain })
  retried = false
rescue ActiveResource::TimeoutError => ex
  # raise ex
rescue ActiveResource::ConnectionError, ActiveResource::ServerError, ActiveResource::ClientError => ex
  unless retried
    # Wait for the Retry-After value if the response carries one, otherwise 5 seconds
    sleep(((ex.respond_to?(:response) && ex.response['Retry-After']) || 5).to_i)
    retried = true
    retry
  else
    # raise ex
  end
end
Inspired by this solution from Shopify for paginating a large number of records. https://ecommerce.shopify.com/c/shopify-apis-and-technology/t/paginate-api-results-113066
