Anybody using detrusion.com, a web application firewall for Ruby on Rails?

PS: I was doing some random searching and came across detrusion.com.
What is this web application firewall?
How does it work?
Is there any performance hit? If yes, how much?
Should I use detrusion.com, or is there anything better available?
Anybody?

I quickly glanced at the code and it doesn't appear to be doing all that much. Basically it maintains a whitelist and a blacklist of IPs. While it cannot be much of a performance hit, you'd probably be better off doing this kind of request analysis in a Rack middleware, that is, before the request even gets to the Rails request handling.
That being said, I don't like the fact that it will re-sync every 5 minutes DURING the processing of a given request. That is, it will block the current request while it re-syncs its ruleset and lists, which means that you're at the mercy of the Detrusion.com team to keep their site/API up. So when they go down, you go down.
While it's not as real-timey, I'd feel more comfortable having the updating process be out of band. Maybe you store the rules/lists in a flat file or a local DB (Redis would be perfect) which you load on app start. Then you have a frequent cron job which reloads the ruleset from Detrusion and writes it locally.
Something like that. Just anything to de-couple your request handling from a Detrusion API check.
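As a rough illustration of that decoupling, here is a minimal sketch of a Rack-style middleware that only ever reads a local file. Everything in it is an assumption made for illustration: the class name LocalIpBlocklist, the one-IP-per-line file format, and the idea that a separate cron job writes the file. It does not use or represent Detrusion's actual code or API.

```ruby
require "set"

# Hypothetical middleware: rejects requests from IPs listed in a local file.
# The file is assumed to be rewritten by a separate process (e.g. a cron job
# that pulls the ruleset), so no request ever waits on a remote API.
class LocalIpBlocklist
  def initialize(app, path, reload_every = 60)
    @app = app
    @path = path
    @reload_every = reload_every # seconds between local re-reads
    @mutex = Mutex.new
    @blocked = Set.new
    @loaded_at = nil
    reload!
  end

  def call(env)
    reload! if now - @loaded_at >= @reload_every
    if @blocked.include?(env["REMOTE_ADDR"])
      [403, { "Content-Type" => "text/plain" }, ["Forbidden\n"]]
    else
      @app.call(env)
    end
  end

  # Re-read the local file. If the file is missing, keep the previous list.
  def reload!
    ips = File.readlines(@path, chomp: true).map(&:strip).reject(&:empty?)
    @mutex.synchronize do
      @blocked = ips.to_set
      @loaded_at = now
    end
  rescue Errno::ENOENT
    @mutex.synchronize { @loaded_at = now }
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

In a Rails app this could be mounted with something like config.middleware.use LocalIpBlocklist, "tmp/blocked_ips.txt" (the path is a placeholder). If the remote service is down, the cron job simply fails to refresh the file and requests keep being served against the last known list.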

Related

Frequently refreshing web page during long-running process

I've been hunting around my issue for a while, and probably the best I've come up with is another Stack Overflow question: How should I perform a long-running task in ASP.NET 4?
I'm in a similar place in that I'm wanting to understand what my options are, but I don't feel I know enough specifically about MVC to come to a view. I'm using MVC 5 but with the 4.8 framework, plus I note that technologies such as SignalR have become available since this question was asked. I was wondering if any experienced MVC'ers could give me a view?
I too have a long running process. More specifically, the user is importing a file. The file is delimited so the import happens line by line. The file might be thousands of lines long. Each line will be parsed and imported in a fraction of a second but the whole operation might take several minutes.
I don't particularly need behaviour to be asynchronous, but because of the length of the entire process I want to regularly update the user on progress. I'm wondering what options I have?
I've got a vague recollection that I might have looked at this problem 20-odd years ago (Classic ASP), and solved it with regular flushes, sending a bit more of the page to the client every few seconds. But I'm also trying to use a _Layout page now, so I've sent the page back already. So I don't think I have that option, even assuming such a mechanism still exists. A bit more recently, but still a while ago, I might have used JavaScript to poll, but everything I'm reading now seems to point me to newer technologies which I'm not sure I fully understand yet.
I'm just wondering how would you solve this problem?
I would not be performing any of the file parsing on the web server, especially if it's thousands of rows long. I would delegate this to a background service of sorts, whether that be a Lambda service in the cloud or a Windows service or a scheduled task. You could then call your SignalR hub from the background task (whatever that might be) to update the progress of the import.

How to handle SAP Kapsel Offline app OData conflicts properly?

I built an app that is able to store OData offline by using the SAP Kapsel plugins.
More or less it's the same as what is generated by Web IDE, or similar to the apps in this example: https://blogs.sap.com/2017/01/24/getting-started-with-kapsel-part-10-offline-odatasp13/
Now I am at the point of checking the error resolution options. I created a sync conflict (changing data on the server after the offline database was stored, then changing something in the app and starting a flush).
As mentioned in the documentation I can see the error in ErrorArchive and could also see some details. But what I am missing is the information of the "current" data on the database.
In the error details I can just see the data on the device but not the data changed on the server.
For example:
Device is loading some names into offline store
Device is offline
User A is changing some names
User B is changing one of these names directly online
User A is online again and starts a sync
User A is now informed about the entity that was changed BUT:
not the content user B entered
I just see the "offline" data.
Is there a solution to see the "current" and the "offline" one in a kind of compare view?
Please also note that the server communication is done by the Kapsel Plugin and not with normal AJAX calls. This could be an alternative but I am wondering if there is no smarter way supported by the API?
Meanwhile I figured out how to load the online data (manually).
This could be done by switching the HTTP handler back to the normal one.
sap.OData.removeHttpClient();
sap.OData.applyHttpClient();
Anyhow this does not look like a proper solution and I also have the issue with the conflict log itself. It must be deleted before any refresh could be applied.
I could not find any proper documentation for that. Also ETag handling is hardly described in SAPUI5 and SAP Kapsel documentation.
This question is a really tricky one, due to its implications. I understand that you are simulating a synchronization error due to concurrent modification, and want to know if there is a way for the client to obtain the "current" server state in order to give the user a means to compare the local and server state.
First, let me give you the short answer: No, there is no way for the client to see the current server state "for reference" via the Offline APIs when there are synchronization errors. Doing an online query as outlined above might work, but it certainly is a bad idea.
Now for the longer answer, which explains why this is not necessarily a defect and why I said there are quite some implications to the answer.
Types of Synchronization Errors
We distinguish a number of synchronization errors, and in this context, we are clearly dealing with business-related issues. There are two subtypes here: Those that the user can correct, e.g. validation errors, and those that are issues in the business process itself.
If the user violates the input range, e.g. by putting a negative price for a product, the server would reply with the corresponding message: "-1 is not a valid input value for 'Price'". You, as a developer, can display such messages to the user from the error archive, and the ensuing fix is indeed a very easy one.
Now when we talk about concurrent modification, things get really, really nasty. In fact, I like to say that in this case there is an issue with the business process, because on one hand, we allow data to get out of sync. On the other hand, the process allows multiple users to manipulate the same piece of information. How all relevant users should now be notified and synchronized is no longer just a technical detail, but in fact a new business process. There just is no way to generically decide how to handle this case. In most cases, it would involve back-office experts who need to decide how the changes should be merged.
A Better Solution
Angstrom pointed out that there is no way to manipulate ETags on the client side, and you should in fact not even think about it. ETags work like version numbers in optimistic locking scenarios, and changing the ETag basically means "Just overwrite what's on the server". This is a no-go in serious scenarios.
An acceptable workaround would be the following:
Make sure the server returns verbose error messages so that the user can see what happened and what caused the conflict.
If that does not help, refresh the data. This will get you an updated ETag, and merge the local changes into the "current" server state, but only locally. "Merging" really means that local changes always overwrite remote changes.
The user now has another opportunity to review the data and can submit it again.
A Good Solution
Better is not necessarily good, so here is what you should really do: Never let concurrent modification happen, because it is really expensive to handle. This implies that it is not the developer who should address this issue; rather, the business needs to change the process.
The right question to ask is, "When you replicate data in a distributed system, why do you allow it to be modified concurrently at all?" Typically stakeholders will not like this kind of question, and the appropriate reaction is to work out a conflict resolution process together with them. Only then will they realize how expensive fixing that kind of desynchronization is, and more often than not they will see that adjusting the process is way cheaper than insisting on yet another back-office process to fix the issues it causes. Even if they insist that there is a need for this concurrent modification, they will now understand that it is not your task to sort this out and that they need to invest in a conflict resolution process.
TL;DR
There is no way to compare the client state to the server state on the client, but you can do a refresh to retain the local changes and get an updated ETag. The real solution, however, is to rework the business process, because this is no longer a purely technical issue.
By default, SMP or HCPms detects errors via ETags. On the client side there is no API to manipulate ETags in case of conflicts. A potential way to implement a kind of diff view on the device would work like this:
Show errors
Cache errors (maybe only in memory?)
delete the errors
do a refresh of the database
build a diff view with current data and cached errors
The idea with
sap.OData.removeHttpClient();
sap.OData.applyHttpClient();
could also work but could be very tricky and may introduce side effects.
Maybe some requests are triggered against the "wrong" backend.

Rails app: Troubleshooting frequent RequestTimeOut errors

I have a large web app on which I have recently been working hard to reduce load times. I have two controllers, Generator (some 20,000 items) and Product (some 1,500 items), that have been slow for a while, but I have worked with indexes and smart queries. On my dev app the response time is about 500 ms.
From time to time I still get RequestTimeOut on the app and I need help troubleshooting this error. I understand what it means (a request has taken too much time) and I have installed the 'rack-timeout' gem and set it to 15 seconds (which works fine).
I have gone through the entire app (and especially the two slowest: Generator & Product) in search for time to save. I have had some issues with caching that I am currently trying to fix (caching would help quite a bit).
It seems that these timeouts happen mostly when bots (Yandex.ru especially) spider through my site and go through one generator after another. The pages may not be very slow any more, but loading so many one after another causes a lot of requests.
Now I am out of ideas and need some help to know what to look at and how to continue my troubleshooting:
Is there anything else besides response time that can cause this error? E.g. a memory leak or something? Or is it just a matter of lots of requests on slow controllers?
I haven't been able to test it on my development platform. Is there a way to benchmark and see how the app would handle requests like those from the bots? I seem to remember there was an "Apache-thing" one could use to simulate traffic like this.
Are there any other ways of looking at the problem or troubleshooting this issue from a high-level point of view? Any ideas and thoughts are welcome!

opening and closing streaming clients for specific durations

I'd like to infrequently open a Twitter streaming connection with TweetStream and listen for new statuses for about an hour.
How should I go about opening the connection, keeping it open for an hour, and then closing it gracefully?
Normally for background processes I would use Resque or Sidekiq, but from my understanding those are for completing tasks as quickly as possible, not chilling and keeping a connection open.
I thought about using a global variable like $twitter_client but that wouldn't horizontally scale.
I also thought about building a second application that runs on one box to handle this functionality, but that seems excessive if it can be integrated into the main app somehow.
To clarify, I have no trouble starting a process, capturing tweets, and using them appropriately. I'm just not sure what I should be starting. A new app? A daemon of some sort?
I've never encountered a problem like this, and am completely lost. Any direction would be much appreciated!
Although not a direct fix, this is what I would look at:
Time
You're working with time, so I'd look at what time-centric processes could be used to induce the connection for an hour
Specifically, I'd look at running some sort of job on the server, which you could fire at specific times (programmatically if required), to open and close the connection. I only have experience with Resque, but as you say, it's probably not up to the job. If I find any better solutions, I'll certainly update the answer.
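To make the "open, listen for a fixed window, then close" idea concrete, here is a minimal sketch in plain Ruby. It deliberately does not call TweetStream: the stream argument is a hypothetical stand-in object that responds to each (blocking, yielding statuses as they arrive) and close, and listen_for is an illustrative name. How the real client is stopped should be checked against its own documentation.

```ruby
# Consumes `stream` in a worker thread for at most `seconds`, then closes it.
# `stream` is a hypothetical object with a blocking `each` and a `close`
# method; it stands in for whatever streaming client is actually used.
def listen_for(stream, seconds, grace: 5)
  statuses = Queue.new
  worker = Thread.new { stream.each { |status| statuses << status } }

  worker.join(seconds)                  # returns nil if the window elapsed
  stream.close                          # ask the client to shut down
  worker.kill unless worker.join(grace) # last resort if it does not stop

  Array.new(statuses.size) { statuses.pop }
end

# Fake stream for trying the sketch without a network connection.
class FakeStream
  def initialize(items)
    @items = items
    @closed = false
  end

  def each
    @items.each do |item|
      return if @closed
      yield item
    end
    sleep 0.01 until @closed # simulate a connection that stays open
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end
```

Something like this could be started from a scheduled task (for example a rake task run by cron) on a single box, with seconds set to 3600 for the hour-long window. Running the consumer in its own thread means the deadline still applies when no statuses arrive.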
Storage
Once you've connected to TweetStream, you'll want to look at how you can capture the tweets for that time period. It seems a waste to create a data table just for the job, so I'd be inclined to use something like Redis to store the tweets that you need
This can then be used to output the tweets you need, allowing you to simulate storing / capturing them, but then delete them after the hour-window has passed
Delivery
I don't know what context you're using this feature in, so I'll just give you as generic a process idea as possible
To display the tweets, I'd personally create some sort of record in the DB to show the time you're pinging TweetStream that day (if it changes; if it's constant, just set a constant in an initializer), and then just include some logic to try and get the tweets from Redis. If you're able to collect them, show them as you wish, else don't print anything
Hope that gives you a broader spectrum of ideas?

How to fail gracefully and get notified if screen scraping fails in ruby on rails

I am working on a Rails 3 project that relies heavily on screen scraping to collect data, mainly using Nokogiri. I'm aggregating essentially all the same data, but I'm grabbing it from many different sources, and as time goes on I will be adding more and more. However, I am acutely aware that screen scraping can be notoriously unreliable.
As such I am interested in how other people have handled the problem of verifying the data and then also getting notified if it is failing.
My current plan is as follows.
I am going to have validation on my model for most of the fields. If they fail I won't get bad data into my system. Logging this failure in a meaningful way is still a problem, though.
I was thinking of some kind of counter where, after so many failures from a particular source, I somehow turn it off. Not sure how to keep track of that. I guess the only way is to have a field on my Source model that counts it and can be reset.
Logging is the 800-pound gorilla I'm not sure how to deal with. I could just do standard writing to logs, but if something fails I'd like to store the entire HTML so I can figure it out. Also I need to notify myself somehow so I can address the issues. I thought of maybe just creating a model for all this and storing it in the database. If I did this I'd probably have to store the HTML on S3 or something. I'm running this on Heroku, so that influences what I can do.
Set up begin and rescue blocks around every field. I was trying to figure out a way to code this in a nicer Ruby way so I don't just have a page of them. Although I do have some fields that are just straight up doc.at_css("#whatever"), there are quite a number that require various formatting or calculations, so I think it makes sense to rescue those so I can then log what went wrong. The other option is to let the exception bubble up and catch it when I try to create the model.
Anyway I'm sure I'm not even thinking of everything but that is why I'm trying to figure out how other people have handled this problem.
Our team does something similar to this, so here are some ideas:
We use a really high-level begin/rescue around a transaction to make sure we don't get into weird half-loaded states:
begin
  ActiveRecord::Base.transaction do
    # ...try to load a data source...
  end
rescue
  # ...error handling...
end
Email/page yourself when certain errors occur. We use exception_notifier, but if you're sitting on Heroku the Exceptional plugin also seems like a good option. I've also heard of people having success with Hoptoad.
Capturing state is VERY important for troubleshooting issues. Something that's worked quite well for us is GMail. Our loaders effectively have two phases:
capture data and send it to our gmail account
log into gmail, download latest data and parse it
The second phase is the complex one, and if it fails a developer can simply log into the gmail account and easily inspect the failed message. This process has some limitations (per email and per mailbox storage limits, two phase pipeline, etc.) and we started out doing it because we had no other option, but it's proven shockingly resilient and convenient. Keep email in mind as a cheap/easy way to store noncritical state. We didn't start out thinking of using it that way and are now really glad we do. Logging into GMail feels better than digging through log files.
Build a dashboard UI. We have a simple dashboard with a grid of sources by day. Each box is colored either red or green based on whether the load for that source on that day succeeded. You can go one step further and set up a monitor on this UI (mon.itor.us or equivalent) that raises an alarm if some error threshold is met.
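Pulling a few of these ideas together (one high-level rescue per load, keeping the raw HTML of a failed load, and the per-source failure counter from the question), a minimal sketch could look like the following. All names here are hypothetical: Source is an in-memory stand-in for the real model, MAX_FAILURES is an assumed threshold, and the fetch, parse and on_failure callables are placeholders for the real scraping code and notifier.

```ruby
# In-memory stand-in for a Source record; in a Rails app failure_count and
# enabled would be columns on the Source model.
Source = Struct.new(:name, :failure_count, :enabled)

MAX_FAILURES = 3 # assumed threshold before a source is switched off

# Runs one load for `source`.
#   fetch      - callable returning the raw HTML
#   parse      - callable turning the HTML into data (may raise)
#   on_failure - callable receiving (source, error, html) so the failed page
#                can be stored or mailed for later inspection
# Returns the parsed data, or nil if the source is disabled or the load failed.
def load_source(source, fetch:, parse:, on_failure:)
  return nil unless source.enabled

  html = nil
  begin
    html = fetch.call
    data = parse.call(html)
    source.failure_count = 0 # a successful load resets the counter
    data
  rescue StandardError => e
    source.failure_count += 1
    source.enabled = false if source.failure_count >= MAX_FAILURES
    on_failure.call(source, e, html)
    nil
  end
end
```

The on_failure hook is where the notification and state capture would go, for example sending the HTML by email as described above or writing it to external storage. Field-level parsing errors simply bubble up to the single rescue, which avoids a begin/rescue around every field.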
