nginx restart issues - ruby-on-rails

I have a peculiar, consistent problem in production. I'm running Rails 3 + nginx with the latest DataMapper and Ruby Enterprise Edition.
Every time I deploy a new version (touch restart.txt) I get a bunch of errors (during different requests) just after the deploy has happened. The errors are not always the same:
DataObjects::SQLError: Lost connection to MySQL server during query
ArgumentError: Field-count mismatch. Expected 1 fields, but the query yielded 10
ArgumentError: Field-count mismatch. Expected 10 fields, but the query yielded 1
DataObjects::SQLError: Lost connection to MySQL server during query
I'm running another Rails app (2.3 + Apache + Ruby Enterprise) with ActiveRecord and have never had any problems during restarts.
Does anyone have advice on why this happens and how to get rid of it?
Thanks
Anders

Do you get the same errors when you do a sudo kill -HUP nginx_pid? (do a sudo ps aux|grep nginx to get the pid).
It is indeed a very strange set of errors you're getting. Perchance you still have a session open to your db while you're restarting, causing problems with your db pool? Rails db access is usually intermittent, but I can imagine issues happening if you have a long running db query going and you attempt to restart Rails.
The fact that the errors keep changing would lead me to believe that the errors are related to resource access, rather than problems with your config.
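One possible explanation (an assumption; nothing in this thread confirms it): after a restart the app server preloads the app and forks workers, and the workers inherit the parent's MySQL socket. Several processes then read each other's replies, which could show up as field-count mismatches and lost connections. Below is a minimal Ruby sketch of the usual remedy, reopening the connection when the process id changes. ForkSafeConnection, pid_source and the factory block are hypothetical names for illustration, not DataMapper API.

```ruby
# Wraps a connection factory and opens a fresh connection whenever the
# current process id differs from the one that opened the cached connection
# (which is what happens in a child process after a fork).
class ForkSafeConnection
  # pid_source is injectable only so the behaviour can be exercised
  # without actually forking; by default it reads the real process id.
  def initialize(pid_source: -> { Process.pid }, &factory)
    @factory = factory
    @pid_source = pid_source
    @connection = nil
    @owner_pid = nil
  end

  # Returns the cached connection, or builds a new one if none exists yet
  # or if the cached one was opened by a different process.
  def connection
    current_pid = @pid_source.call
    if @connection.nil? || @owner_pid != current_pid
      @connection = @factory.call
      @owner_pid = current_pid
    end
    @connection
  end
end
```

With Passenger, the documented place to trigger this kind of reconnect is the :starting_worker_process event. How to reopen a DataMapper connection depends on the adapter version and is not shown here.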

Related

Postgresql, Rails - could not fork autovacuum worker process: Resource temporarily unavailable

This is happening to me while in my local environment, Mac OSX, every time I start my server - puma - and workers - resque.
The logs don't say anything helpful, just a repeated, "could not fork autovacuum worker process: Resource temporarily unavailable."
Until I Ctrl-C out of the server, it locks up my entire computer. When I try to visit a site in the browser it just hangs, and when I open a new tab in the terminal it says 'pipe broken' and closes it. The Mac console isn't spitting out anything helpful, at least from what I can tell.
Does anyone have any thoughts as to why this is?
I've restarted Postgres multiple times to no avail.
EDIT:
Log just started spitting out, 'LOG: could not fork new process for connection: Resource temporarily unavailable'
Puma thread count:
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 10 }.to_i
DB: pool: 100
EDIT2:
Tried to increase max_connections from 100 to 200, still nothing. Ran into the duplicate postmaster.pid error. Removed the file and restarted Postgres; that cleared the postmaster.pid error, but the same issue remains.
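Each Postgres connection is served by its own backend process, so the "could not fork" message means the OS refused to create another process. Whether the connection count is actually the trigger here is not established in this thread. As a rough sanity check, this sketch computes an upper bound on the connections the app could open. The worker counts are hypothetical, and it assumes one connection per Resque worker.

```ruby
# Upper bound on simultaneous DB connections the app processes can open.
# Each Puma worker process keeps its own pool of up to pool_size
# connections; each Resque worker is assumed to hold one connection.
def max_app_connections(puma_workers:, pool_size:, resque_workers:)
  (puma_workers * pool_size) + resque_workers
end

# Hypothetical setup: 2 Puma workers, 4 Resque workers.
with_pool_100 = max_app_connections(puma_workers: 2, pool_size: 100, resque_workers: 4)
with_pool_10  = max_app_connections(puma_workers: 2, pool_size: 10, resque_workers: 4)

puts "pool 100: up to #{with_pool_100} connections"
puts "pool 10:  up to #{with_pool_10} connections"
```

With 10 Puma threads per process, a pool of 10 is enough for every thread to hold a connection, so a pool of 100 only raises the ceiling on backend processes.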
I followed these two articles, and so far this seems to work. Will update if something changes.
https://github.com/sociam/indx/wiki/Increasing-max-connections-under-os-x
http://big-elephants.com/2012-12/tuning-postgres-on-macos/
Edit: From what I have experienced, this hasn't helped me. Removing postmaster.pid doesn't seem to do much either, though it seems to do more than the steps above. If anyone stumbles upon this and figures it out, it would be great if you could post about it. I'll update if anything else changes. FWIW, when this happens, Redis sometimes falls over too and says that it can no longer save to disk.
I was having the same issue and tried the fixes that @jack-rothrock had proposed in his answer, to no avail.
I noticed that when I tried to start postgres from the command line using homebrew services, I would get a message that postgres was already running, which reminded me that I had installed the "Postgres App" (the postgres application you can download from http://postgresapp.com/). I relaunched that application and now everything works.
Nice way to start the year!

"mapping values are not allowed in this context at line xx" when running DelayedJob

One day on my production server, running a Rails 3.2.13 app, DelayedJob stopped working and there was no way to start it again. I hadn't made any changes on the server beforehand. When trying to run rake jobs:work I saw this error:
mapping values are not allowed in this context at line xx
This error always comes from parsing a YAML file.
While searching for the problem I:
restarted my app
checked for YAML problems
checked for system problems
and everything seemed to be fine.
Where could the problem be?
Finally I tried running the first job from the Rails console with Delayed::Job.find(x).invoke_job, and the problem turned out to be one specific job and its handler description. I removed that job and then delayed_job started without a problem. So if you hit this kind of problem, start by checking the first job in the queue.
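Instead of invoking jobs one at a time, the handlers can be checked for YAML syntax errors in bulk. This is a minimal sketch, assuming the handler strings have already been pulled out of the jobs table. broken_handlers is a hypothetical helper, not part of delayed_job. It uses Psych.parse, which only builds the syntax tree and does not instantiate any job objects.

```ruby
require "yaml"

# Returns { id => error message } for every handler string that is not
# syntactically valid YAML. Handlers that parse cleanly are left out.
def broken_handlers(handlers_by_id)
  handlers_by_id.each_with_object({}) do |(id, handler), broken|
    begin
      Psych.parse(handler)
    rescue Psych::SyntaxError => e
      broken[id] = e.message
    end
  end
end

# Illustrative data only: the second handler has a stray "key: value: more".
sample = {
  1 => "--- !ruby/object:SomeJob\nname: ok\n",
  2 => "foo: bar: baz"
}
broken_handlers(sample).each { |id, message| puts "job #{id}: #{message}" }
```

In a Rails console the input hash could be built with something like Hash[Delayed::Job.all.map { |j| [j.id, j.handler] }], assuming the default delayed_job ActiveRecord backend.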

can't stop Neo4j server - ERROR: Neo4j Server not running

My Neo4j server won't stop. Whenever my database is loaded and I type:
./bin/neo4j stop
I get
ERROR: Neo4j Server not running
rm: remove write-protected regular file `/home/sa20/neo4j-enterprise-2.0.0/data/neo4j-service.pid'?
I then kill -9 the process and it corrupts my DB :( Does anyone have any idea why this might happen? I don't get the problem with a fresh, empty DB.
Thanks Michael, you led me down the right road. It seems that my data load script was restarting Neo4j as root rather than as my application user account. That left the .pid file owned by root, so my user could not remove it. Lesson learned, like the old adage: "don't do anything as root".
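A quick way to confirm this kind of ownership mismatch is to compare the pid file's owner with the user running the stop command. This is a minimal Ruby sketch; pid_file_ownership is a hypothetical helper, and the path is whatever your install uses.

```ruby
require "etc"

# Reports who owns the file at path and whether that is the user
# the current process is running as.
def pid_file_ownership(path)
  stat = File.stat(path)
  {
    # Etc.getpwuid can return nil if the uid has no passwd entry,
    # hence the safe navigation.
    owner: Etc.getpwuid(stat.uid)&.name,
    current_user: Etc.getpwuid(Process.euid)&.name,
    owned_by_current_user: stat.uid == Process.euid
  }
end
```

Called on the neo4j-service.pid file from the error message, owned_by_current_user would be false if root had created it.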

Thinking Sphinx delta indexing fails in production

Here's what I've determined:
Delta indexing works fine in development
Delta indexing does not work when I push to the production server, and no action is logged in searchd.log
I'm running Phusion Passenger, and, as recommended in the basic troubleshooting guide, have confirmed that:
www-data has permission to run indexing rake tasks (ran them from command line manually)
the path to indexer and searchd are correct (/usr/local/bin)
there are no errors in production.log
What on earth could I possibly be missing? I'm running Ruby Enterprise 1.8.6, Rails 2.3.4, Sphinx 0.9.8.1, and Thinking Sphinx 1.2.11.
Thanks!
Last night as I slept it hit me. Unsurprisingly, it was a stupid issue involving bad configuration, though I am rather surprised that it produced the results it did. I guess I don't know much about Thinking Sphinx internals.
Recently I migrated servers. sphinx.yml looked like this:
production:
  bin_path: '/usr/local/bin'
  host: mysql.mysite.com
On the new server, MySQL was just a local service, but I had forgotten to remove that line. Interestingly, manual rake reindexing still worked just fine. I'm intrigued that Thinking Sphinx didn't throw an error when trying to reload the deltas, since mysql.mysite.com no longer exists, even though that was clearly the source of the issue.
Thanks for all your help, and sorry to have brought up such a silly problem.
Are there any clues in Apache/Nginx's error log?
Here's the next troubleshooting step I would take. Open up the file for the delta indexing strategy you are using (presumably lib/thinking_sphinx/deltas/default_delta.rb). Find the line where it actually generates the indexing command. In mine (v1.1.6) it's line 20:
output = `#{config.bin_path}indexer --config #{config.config_file} #{rotate} #{delta_index_name model}`
Change that so you can log the command itself, and maybe log the output as well:
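# Build the command as a plain string first so it can be logged before it runs
command = "#{config.bin_path}indexer --config #{config.config_file} #{rotate} #{delta_index_name model}"
RAILS_DEFAULT_LOGGER.info(command)
# Backticks execute the command and capture its output
output = `#{command}`
RAILS_DEFAULT_LOGGER.info(output)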
Deploy that to production and tail the log while modifying a delta-indexed model. Hopefully that will actually show you the problem. Of course maybe the problem is elsewhere in the code and you won't even get to this point, but this is where I would start.
I was having this problem and found the "bin_path" solution mentioned above. When it didn't seem to work, it took me a while to realize that I'd pasted in the example code for "production" while I was testing in the "staging" environment. Problem solved!
This was after making sure that the rake tasks that configure, index, and start Sphinx all run as the same user as your Passenger instance. If you log into the server as root to run these tasks, they will work in the console but not via Passenger.
I had the same problem: it works on the command line, but not in the app.
Turns out that we still had a slave database that we were using for the indexing, but the slave wasn't getting updated.
As above, we hit the same issue on two machines. On the first, the cause was a MySQL problem that showed up in the apache2 log; it only seemed to affect the local OS X machine.
The second time, when we deployed to an Ubuntu server, we had the same issue. rails c production was fine, no errors, and so on.
It ended up being a permissions problem. I couldn't figure this out because nothing failed at startup, although I guess I was starting things as root.
Using capistrano and passenger, we did this:
Created a passenger user and added it to the www-data group
Changed user in deploy.rb to be passenger
Manually changed all the /current files to be owned by above
Logged in as passenger user.
Ran rake ts:rebuild RAILS_ENV="production"
Worked a treat for us...
Good luck

Oracle problems in Rails with rake, but not with site

I'm working on a Rails site that connects to an Oracle database, and though I didn't build the site from scratch, I'm doing maintenance work. The site uses the delayed_jobs plugin to handle some background tasks and I'd like to be able to run rake jobs:work on the development server to periodically process all jobs in the queue (due to the server's configuration, running a daemonized version of the script on the development server isn't an option). However, whenever I try running the command, I get the following classic Oracle error:
error while trying to retrieve text for error ora-12154
Ordinarily, I'd think this would be an authentication problem (e.g. incorrect credentials in database.yml), but the site is up and running fine (and doing lots of database stuff). I've tried adding RAILS_ENV=production as a parameter to rake to force it to run in the production environment, but got the same error (there are two separate Rails installations for the production and development versions of the site, and I've set the "development" and "production" credentials in development's db config file to be identical).
I'm not sure what could be causing this error, and I don't have a ton of experience using Oracle with rails. Any suggestions?
Thanks a lot!
Justin
EDIT (10/26/09): Still can't figure out what's causing the problem here. The app continues to run (and talk to the database) without a problem, but rake keeps throwing DB errors. So does script/console, which shows a prompt but first complains with the same Oracle error message. I'm going to keep looking, but I'm running out of ideas...
EDIT(10/26/09, later): Following the advice of this link, I set both ORACLE_HOME and TNS_ADMIN to point to the directory where tnsnames.ora lives. Just setting ORACLE_HOME had no obvious effect, but now that TNS_ADMIN points to the right place, I've started getting segmentation faults whenever I try to open the console or run rake:
/usr/local/lib/ruby/site_ruby/1.8/oci8.rb:184: [BUG] Segmentation Fault
and get booted unceremoniously back to the prompt. Any further ideas?
Finally got it...turns out that ORACLE_HOME wasn't being correctly set as an environment variable for my user account. Now rake, script/console, etc. are humming along happily.
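Since the fix came down to environment variables that were set in one context but not another, a small check run from the same shell as rake or script/console can make the difference visible. This is a minimal Ruby sketch; oracle_env_problems is a hypothetical helper. It assumes, as in the edit above, that tnsnames.ora should sit directly in the TNS_ADMIN directory.

```ruby
# Returns a list of human-readable problems with the Oracle-related
# environment variables; an empty list means nothing obvious is wrong.
# env defaults to the real environment but accepts any hash-like object.
def oracle_env_problems(env = ENV)
  problems = []

  %w[ORACLE_HOME TNS_ADMIN].each do |name|
    value = env[name]
    if value.nil? || value.empty?
      problems << "#{name} is not set"
    elsif !File.directory?(value)
      problems << "#{name} points to a missing directory: #{value}"
    end
  end

  # Only look for tnsnames.ora when TNS_ADMIN is a real directory.
  tns_admin = env["TNS_ADMIN"]
  if tns_admin && File.directory?(tns_admin) &&
     !File.file?(File.join(tns_admin, "tnsnames.ora"))
    problems << "no tnsnames.ora in #{tns_admin}"
  end

  problems
end

oracle_env_problems.each { |problem| puts problem }
```

Running it once from the shell that fails and once from the environment the web app runs in should show which variable differs between the two.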
The oracle error says the following:
ORA-12154 is generated by the oracle network layer. TNS error message is thrown during the logon process to a database. This error indicates that the communication software in Oracle ( SQL *Net or Net8 ) did not recognize the host/service name specified in the connection parameters. This error almost always indicates a misconfiguration of the oracle tns entries.
Can you connect to your oracle instance using sqlplus or another db tool?
It is odd that the app runs fine though.
Is there an $ORACLE_SID laying around somewhere that could be pointing to a db that doesn't exist?
In SQL Server I would probably run Profiler to see what is actually being sent versus what I think I have set up. I'm sure Oracle also has some kind of profiling utility. I would try that; you may find it isn't using the credentials you think it is.
Well, as Mike mentioned, ORA-12154 means TNS couldn't resolve the database identifier (TNS is Oracle's name-to-database mapping, sorta-kinda like DNS). If you can find your tnsnames.ora file, you can see what databases are configured there and compare that to the database.yml file. The fact that it works in the app but not from the command line is a bit odd, though, and makes me think that perhaps there are some environment variables being set in one context that aren't in the other.
If neither of those pan out, there's a long list of troubleshooting suggestions at http://ora-12154.ora-code.com that are specific to that error code.
