What is a good Rails logging solution? - ruby-on-rails

I am looking for a solution that will allow me to do advanced logging:
Unlimited log size
Ability to filter by priorities (debug/info/error)
Ability to filter by model or custom tag
Ability to filter by user-sessions (see only errors for a specific session)
Should be able to work on Heroku
*Optional: Set rules to email/sms on certain high-priority errors
Either a tool that works with files and can easily dissect them, or DB-backed log storage.
Any suggestions are most welcome

Try Log4r first; if it doesn't do exactly what you need, it's pretty tweakable.
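To make the filtering requirements concrete, here is a minimal sketch that uses Ruby's standard-library Logger rather than Log4r (a deliberate swap, so that nothing about Log4r's API is assumed). It writes one JSON object per line. The `tag` and `session_id` fields, and the helper names `build_structured_logger` and `filter_log`, are hypothetical and chosen only for illustration.

```ruby
require 'logger'
require 'json'
require 'time'

# Build a logger that emits one JSON object per line, so that lines can later
# be filtered by severity, tag, or session id.
def build_structured_logger(io, level: Logger::DEBUG)
  logger = Logger.new(io)
  logger.level = level
  logger.formatter = proc do |severity, time, _progname, msg|
    # Accept either a Hash of fields or a plain message.
    payload = msg.is_a?(Hash) ? msg : { message: msg.to_s }
    JSON.generate({ severity: severity, time: time.getutc.iso8601 }.merge(payload)) + "\n"
  end
  logger
end

# Select parsed log entries matching the given criteria (nil means "any").
def filter_log(lines, severity: nil, tag: nil, session_id: nil)
  lines.map { |line| JSON.parse(line) }.select do |entry|
    (severity.nil? || entry['severity'] == severity) &&
      (tag.nil? || entry['tag'] == tag) &&
      (session_id.nil? || entry['session_id'] == session_id)
  end
end
```

Calling `build_structured_logger($stdout)` sends the lines to standard output, which is where Heroku collects application logs. A log-search tool or a small script built around something like `filter_log` can then narrow the stream to one session's errors.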

Related

Serilog | How to control logging level based upon a custom filter

I would like to be able to dynamically control the logging level for Serilog based upon a custom query. At minimum, I would like to be able to filter log messages by the start of their SourceContext (ie, the namespace/full class name) - this could be optimized with a compiled regex statement/state machine to handle the matching.
Ideally, in addition to the above, I could create SourceContext filters that are applicable to only certain users - such as via a UserId field on the event - that would take priority over filters that don't have a UserId specified.
I would like to be able to dynamically modify the set of such filters via an Admin website, allowing myself or other admins to increase/decrease the verbosity level of specific parts of the code and for specific users as the need arises.
I was wondering if anyone here knows the best way to go about implementing such a feature with Serilog? I've googled this, but I haven't found any filtering capabilities out of the box for Serilog that look like they'd be up to this task. In which case, I might go about it by means of an intermediate, in-process Sink.
I found an answer to a related but distinct issue that looks like a good solution: "How to change the LogLevel of specific log events in Serilog?"
It looks like Serilog does provide a mechanism for wrapping sinks in intermediate sinks, and it shouldn't be too difficult to apply the filtration logic I'm seeking by doing so.
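The rule resolution described above (prefix match on SourceContext, with user-specific rules taking priority) can be sketched independently of Serilog. The following is a language-agnostic illustration written in Ruby, not Serilog's API. `Rule`, `LevelFilter`, and `replace_rules` are hypothetical names, and choosing the longest matching prefix among equally specific rules is an assumption the question does not state.

```ruby
# Severity order, using Serilog's level names.
LEVELS = { verbose: 0, debug: 1, information: 2, warning: 3, error: 4, fatal: 5 }.freeze

# A rule applies to every source context that starts with `prefix`.
# `user_id` is nil for rules that apply to all users.
Rule = Struct.new(:prefix, :min_level, :user_id)

class LevelFilter
  def initialize(default_level: :information)
    @default_level = default_level
    @rules = []
    @mutex = Mutex.new
  end

  # Swap in a new rule set atomically, e.g. from an admin page.
  def replace_rules(rules)
    @mutex.synchronize { @rules = rules.dup.freeze }
  end

  # True if an event at `level` from `source_context` should be logged.
  def enabled?(source_context:, level:, user_id: nil)
    rules = @mutex.synchronize { @rules }
    matching = rules.select do |r|
      source_context.start_with?(r.prefix) && (r.user_id.nil? || r.user_id == user_id)
    end
    # User-specific rules beat general ones; then the longest prefix wins.
    best = matching.max_by { |r| [r.user_id ? 1 : 0, r.prefix.length] }
    min_level = best ? best.min_level : @default_level
    LEVELS.fetch(level) >= LEVELS.fetch(min_level)
  end
end
```

In a wrapping sink, a check like `enabled?` would run before the event is forwarded to the inner sink. A linear scan is fine for a handful of rules; a trie or compiled matcher would be the optimization the question mentions.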

Grails - Limit an IP Address' Upload Rate

I am creating a Grails application and I'm trying to figure out the best way to prevent a user from spamming posts on my server. I have an infinite number of forms where they can leave comments. I don't want them to have the ability to send a million comments. I know there exists a way to mock "server lag" so that the data rate slows down. Within the grails framework, is there a good way to set the maximum post size limit/rate?
I tried looking into any possibility of setting this via the tomcat properties but I wasn't having too much luck there with my own research.
Thanks much!!!
If I understand the question correctly, you want to restrict the size of a post which is provided by the user. If yes, then you can add the maxSize constraint in the domain class (or any command object if it is used).
If you are looking to prevent form re-submission then you can use formTokens to prevent a duplicate submission.
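The idea behind one-time form tokens can be sketched as follows. This is an illustration in Ruby of the general technique, not Grails' implementation; `FormTokenStore`, `issue`, and `consume` are hypothetical names, and a real application would keep the tokens in the user's session rather than in a single in-memory set.

```ruby
require 'securerandom'
require 'set'

# Issues single-use tokens and rejects any token presented a second time.
class FormTokenStore
  def initialize
    @tokens = Set.new
  end

  # Generate a token to embed in the form as a hidden field.
  def issue
    token = SecureRandom.hex(16)
    @tokens.add(token)
    token
  end

  # True the first time a valid token is submitted, false for repeats
  # and for tokens that were never issued.
  def consume(token)
    !@tokens.delete?(token).nil?
  end
end
```

A form handler would call `consume` with the submitted token and ignore the post when it returns false.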

Using mixpanel to build custom analytics dashboard for users

I love graphs.
I'd love to get my hands on some data and make it look pretty. But alas, I'm a little lost on what would be considered best practice.
I've selected mixpanel (only as an example) as it seems wonderfully easy to track custom events with, and it doesn't have any subdomain limitation like Google Analytics.
Say I had 100-1000+ users who have an account (which is publicly facing), and I'm currently tracking the public interactions their pages get. With mixpanel, I can see the data which is lovely, and I've segmented it to individual accounts. So far, so good!
But then, I want to show my users this information. And here my head begins to hurt. Do I schedule a cron job, pulling in the data from mixpanel and writing it to their respective accounts? Or is there a better way? I've looked into mixpanel's API (I'm using Ruby), but they keep telling me I should use the JavaScript API. But in using JS, how does one prevent others from getting the data (i.e., what's stopping someone from faking mixpanel API posts in the console, or viewing my private key)?
What would you consider a practical solution in such a case?
You can achieve this by storing each user's events with a $bucket property attached, set to a value unique to that user, as explained in the Mixpanel docs. If you still want to use Ruby to serve the events, have a look at Mixpanel's recommended Ruby client libraries.
mixpanel_client looks like the better-maintained of the two options mentioned there. If you go with it, you can serve user-specific events as shown in the example below (which is also in the gem's README):
# `client` is assumed to be an already-configured Mixpanel::Client instance.
data = client.request do
  # Available options
  resource 'events/properties'
  event '["test-event"]'
  name 'hello'
  values '["uno", "dos"]'
  timezone '-8'
  type 'general'
  unit 'hour'
  interval 24
  limit 5
  bucket 'contents'
  from_date '2011-08-11'
  to_date '2011-08-12'
  on 'properties["product_id"]'
  where '1 in properties["product_id"]'
  buckets '5'
end
You could try a service like Keen IO, which lets you generate encrypted, scoped read and write API keys. Keen IO is built for customizable and programmatic analytics features such as exposing analytics to your customers, whereas Mixpanel is more for exploring your data in their UI. The idea with an encrypted scoped key is that customers will never be able to access your account, only the data you want them to see. You could easily tag your events with a customer ID and then use the Scoped Keys to ensure that you only ever show customers their own data.
https://keen.io/docs/security/#scoped-key
Also, Keen IO has an "importer" that lets you import your Mixpanel events into your Keen IO database.

Prevent bot from crawling certain areas of site

I don't know much about SEO and how web spiders work, so forgive my ignorance here. I'm creating a site (using ASP.NET-MVC) which has areas that display information retrieved from the database. The data is unique to the user, so there's no real server-side output caching going on. However, since the data can contain things the user may not wish to have displayed in search engine results, I'd like to prevent any spiders from accessing the search results page. Are there any special actions I should take to ensure that the search result directory isn't crawled? Also, would a spider even crawl a page that's dynamically generated, and would any actions preventing certain directories from being searched mess up my search engine rankings?
edit: I should add, I'm reading up on robots.txt protocol, but it relies on co-operation from the web crawler. However, I'd also like to prevent any data-mining users who will ignore the robots.txt file.
I appreciate any help!
You can prevent some malicious clients from hitting your server too heavily by implementing throttling on the server. "Sorry, your IP has made too many requests to this server in the past few minutes. Please try again later." In practice, though, assume that you can't stop a truly malicious user from bypassing any throttling mechanisms that you put in place.
Given that, here's the more important question:
Are you comfortable with the information that you're making available for all the world to see? Are your users comfortable with this?
If the answer to those questions is no, then you should be ensuring that only authorized users are able to see the sensitive information. If the information isn't particularly sensitive but you don't want clients crawling it, throttling is probably a good alternative. Is it even likely that you're going to be crawled anyway? If not, robots.txt should be just fine.
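The throttling mentioned above could look roughly like the following sliding-window limiter. It is a minimal in-memory sketch in Ruby, not tied to ASP.NET; `RequestThrottle` and its parameters are hypothetical, and a deployment with several server processes would need a shared store instead of a local hash.

```ruby
# Allows at most `limit` requests per client within the last `window_seconds`.
class RequestThrottle
  def initialize(limit:, window_seconds:)
    @limit = limit
    @window = window_seconds
    @hits = Hash.new { |hash, ip| hash[ip] = [] } # ip => request timestamps
  end

  # Records the request and returns true if it is allowed, false if the
  # client has already used up its quota for the current window.
  def allow?(ip, now: Time.now.to_f)
    timestamps = @hits[ip]
    # Forget requests that have fallen out of the window.
    timestamps.shift while timestamps.any? && timestamps.first <= now - @window
    return false if timestamps.size >= @limit

    timestamps << now
    true
  end
end
```

When `allow?` returns false, the server would respond with a message like the "too many requests" text quoted above. As the answer notes, a determined client can sidestep this by rotating IP addresses.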
It seems like you have 2 issues.
The first is a concern about certain data appearing in search results. The second is about malicious or unscrupulous users harvesting user-related data.
The first issue will be covered by appropriate use of a robots.txt file as all the big search engines honour this.
The second issue seems more to do with data privacy. The first question which immediately springs to mind is: If there is user information which people may not want displayed, why are you making it available at all?
What is the privacy policy for such data?
Do users have the ability to control what information is made available?
If the information is potentially sensitive but important to the system could it be restricted so it is only available to logged in users?
Check out the Robots exclusion standard. It's a text file that you put on your site that tells a bot what it can and can't index. You will also want to address what happens if a bot doesn't honour the robots.txt file.
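As a small illustration, a robots.txt file placed at the site root that asks all crawlers to stay out of a search area might look like this. The `/search/` path is a hypothetical stand-in for wherever the search results actually live.

```text
User-agent: *
Disallow: /search/
```

This only asks crawlers to comply; it does nothing against clients that ignore the file.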
Use a robots.txt file, as mentioned. If that is not enough then you can:
Block unknown user agents - hard to maintain, and easy for a bot to forge a browser's user agent (although most legitimate bots won't)
Block unknown IP addresses - not useful for a public site
Require logins
Throttle user connections - tricky to tune, you will still be disclosing information.
A combination of these may work best. Either way it is a trade-off: if the public can browse to it, so can a bot. Be sure you don't block and alienate people in your attempts to block bots.
A few options:
force the user to login to view the content
add a CAPTCHA page before the content
embed content in Flash
load dynamically with JavaScript

Stopping Session Sharing between malicious users in Rails

What's the best way to keep users from sharing session cookies in Rails?
I think I have a good way to do it, but I'd like to run it by the Stack Overflow crowd to see if there's a simpler way first.
Basically I'd like to detect if someone tries to share a paid membership with others. Users are already screened at the point of login for logging in from too many different subnets, but some have tried to work around this by sharing session cookies. What's the best way to do this without tying sessions to IPs (lots of legitimate people use rotating proxies).
The best heuristic I've found is the # of Class B subnets / Time (some ISPs use rotating proxies on different Class Cs). This has generated the fewest # of false positives for us so I'd like to stick with this method.
Right now I'm thinking of applying a before filter for each request that keeps track of which Subnets and session_ids a user has used in memcached and applies the heuristic to that to determine if the cookie is being shared.
Any simpler / easier to implement ideas? Any existing plugins that do this?
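The heuristic described in the question (number of distinct Class B subnets per time window) can be sketched as below. This is a minimal in-memory illustration: `SubnetTracker`, its thresholds, and the plain hash standing in for memcached are all assumptions, and only IPv4 addresses are considered.

```ruby
require 'ipaddr'

# Tracks which /16 (Class B sized) subnets each user has been seen on recently.
class SubnetTracker
  def initialize(max_subnets:, window_seconds:)
    @max_subnets = max_subnets
    @window = window_seconds
    @seen = Hash.new { |hash, user| hash[user] = [] } # user => [[time, subnet], ...]
  end

  # Call from a before filter on every request.
  def record(user_id, ip, now: Time.now.to_f)
    subnet = IPAddr.new(ip).mask(16).to_s # e.g. "203.0.113.7" -> "203.0.0.0"
    entries = @seen[user_id]
    entries.reject! { |time, _subnet| time <= now - @window } # drop stale sightings
    entries << [now, subnet]
  end

  # True if the user appeared on more distinct subnets than allowed
  # among the sightings currently retained.
  def suspicious?(user_id)
    @seen.fetch(user_id, []).map(&:last).uniq.size > @max_subnets
  end
end
```

In the setup the question describes, `record` would run in the before filter and `suspicious?` would decide whether to flag the session.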
You could tie the session information to browser information. If people are coming in from 3 or 4 different browser types within a certain time period, you can infer that something suspicious may be going on.
An alternative answer relies on a bit of social-engineering. If you have some heuristic that you trust, you can warn users (at the top of the page) that you suspect they are sharing their account and that they are being watched closely. A "contact us" link in the warning would allow legitimate users to explain themselves (and thus be permanently de-flagged). This may minimize the problem enough to take it off your radar.
One way I can think of would be to set the same random value in both the session and a cookie with every page refresh. Check the two to make sure they are the same. If someone shares their session, the cookie and session will get out of sync.
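The rotating-value idea in this answer can be sketched as follows. It simulates the server side only, in plain Ruby; `RotatingTokenSession` is a hypothetical name, and in Rails the stored value would live in the session while the returned value would be written to a cookie on each response.

```ruby
require 'securerandom'

# Holds the server-side copy of a value that changes on every request.
class RotatingTokenSession
  def initialize
    @token = nil
  end

  # Called at login: returns the first value to place in the client's cookie.
  def start
    @token = SecureRandom.hex(16)
  end

  # Called on every request with the cookie value the client sent.
  # Returns the new cookie value, or nil if the cookie was missing or stale.
  def handle_request(cookie_token)
    return nil if @token.nil? || cookie_token != @token

    @token = SecureRandom.hex(16)
  end
end
```

If a second person copies the cookie, whoever makes the next request rotates the value and the other copy goes stale, so `handle_request` returns nil for it. One caveat of this approach is that a legitimate user with several tabs or parallel requests in flight may also fall out of sync.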
