I am looking at Spring-security 3.0 for this, spring's ACL filtering happens as post(api call) operation. There are 2 issues with that :-
it will break paginated query
Even if i take pagination out on layer above the api fetching results( i am using spring-hibernate here) , the db query each time is wasteful as it fetches and populates all results even if most of them are destined to be filtered out at java level
I have seen solutions where each query is appended with the acl queries which does the filtering at the db level , but that looks ugly as it pollutes business logic with authorization concern, are there any ways/frameworks that does db-level acl filtering transparently ? I like spring-securities overall approach of enforcing security declaratively through config/annotations thus sparing the code from security related logic directly, but i think it loses out on this on performance concerns
For the issues that you mentioned , only the #1 one is the real issue to me .
For the #2 issue if I understand correctly , it is the query that return a result list without any pagination behaviour. Because of that , it is supposed to assume that size of the result is finite and will not grow to the point that it will become very slow to return the result . Otherwise you need to make this query to be pageable and go back to the #1 issue. Given the finite result list , I doubt that filtering at the application level using #PostFilter will become noticeably slower than filtering at the database level.
I have seen solutions where each query is appended with the acl
queries which does the filtering at the db level , but that looks ugly
as it pollutes business logic with authorization concern, are there
any ways/frameworks that does db-level acl filtering transparently ? I
like spring-securities overall approach of enforcing security
declaratively through config/annotations thus sparing the code from
security related logic directly,
So for the #1 issue , if you are using Hibernate , you can check out #Filter which allows you to declaratively define a where clause that will be appended to the select SQL when querying certain entity. The filter is by default turned off and required to be enabled per transaction .The where clause can also be parameterised .
That means you can simply use Spring AOP to define an annotation to annotate the query method that you want to enable the authorization .Then in the advice that backed by this annotation , turn on this filter and configure the parameters for the where clause based on the current user information if necessary. For the query method that is not annotated with this annotation , the filter is turned off and not aware of the authorization concern.
Basically it is the same as appending the authorization logic to the query , but with the help of AOP and the nature of the #Filter , the business logic is not aware of any authorization logic.
If Hibernate filter is not suitable for your requirements, you can then look into which data access technologies allow you to modify the query easily by adding the authorization logic to it. For example , using JPA Criteria API is also possible as it provides the object model to represent a query ,and hence adding the authorization logic to the query is just equivalent to tweaking the query object.
The idea is that you need to have a proper design of the data access layer such that you can use AOP to configure the underlying technology to apply the authorization concern easily and in a consistent way. And use AOP to separate the authorization logic from the business logic.
Related
I have some complicated access restrictions in my application which basically require looking at a combination of the user's roles, as well as some deep properties of the domain objects, to make access decisions.
For some of my methods (specifically things like getItem(Integer id) and updateItem(Integer id, FormBean form)), I can't really know ahead of time if they're allowed to access that item, as I don't have it yet, so I had been using #PostAuthorize.
However, that latter example, updateItem(id, form) presents a challenge. I only want to allow them to update the form in certain specific cases. Right now, I do see the #PostAuthorize causing them to get an HTTP 403 response when they do something they shouldn't, but the database changes aren't rolled back.
Is it possible to get #PreAuthorize, #Transactional and #PostAuthorize to all play nicely together in this case? (I think maybe by adjusting the order of some advice on them... but I'm not totally clear on how that ordering should be done).
Or, has my system gotten complex enough that I should really bite the bullet on Domain ACLs? Unfortunately, the documentation on those feels rather thin...
Spring framework used Ordered to define the ordering of beans.
The easiest way to explicit define the order for #Transactional and #PostAuthorize are through the annotations:
#EnableTransactionManagement(order = 0)
#EnableGlobalMethodSecurity(prePostEnabled = true, order = 1)
Why not use #PreAuthorize("hasPermission(#id, 'read')") and #PreAuthorize("hasPermission(#id, 'update')") and implement your own version of PermissionEvaluator?
But if you really do wish to use #PostAuthorize then you could try fiddling with the ordering of security and transactional aspects, as is described here.
My guess would be:
// perform security check first, will throw exception if bad
#EnableGlobalMethodSecurity(prePostEnabled = true, order = 0)
// apply tm around security check and update, allowing for rollback
#EnableTransactionManagement(order = 1)
Have all a structure for creating complex queries and obtaining data on the client side using Breeze and webapi IQueryable<T>.
I would use this structure on the client side to call another webapi controller, intercept the result of the query, and use this to make an Excel file returned by HttpResponseMessage.
See: Returning binary file from controller in ASP.NET Web API
How can I use the executeQuery without getting return data in standard Breeze JSON and without interfering with the data on the client side cache to have the 'octet-stream'.
The goal is to create an 'Export to Excel' without existing frontend paging for a large volume of data.
If you don't want to track changes, call EntityQuery.noTracking() before calling executeQuery(). This will return raw javascript objects without breeze tracking capabilities.
You can't make executeQuery() return binary 'octet-stream' data. But you can use breeze ajax implementation:
var ajaxImpl = breeze.config.getAdapterInstance("ajax");
ajaxImpl.ajax() // by default it is a wrapper to jQuery.ajax
Look http://www.breezejs.com/documentation/customizing-ajax
Create a custom "data service adapter" and specify it when you create an EntityManager for this special purpose. It can be quite simple because you disable the most difficult parts to implement, the metadata and saveChanges methods.
You don't want to cache the results. Therefore, you make sure the manager's metadata store is empty and you should add the " no caching" QueryOption. [exact names escape me as I write this on my phone].
Make sure these steps are really adding tangible value
Specialized server operations often can be performed more simply with native AJAX components.
p.s. I just saw #didar 's answer which is consistent with mine. Blend these thoughts into your solution.
The Problem
I'm just trying to figure out exactly how much of my own security I need to implement on the server side when saving changes in Breeze. In particular, I'm thinking about how a malicious user could manually hack the SaveChanges request, or hack the javascript in the client, to bypass my normal business rules - for example, to maliciously alter foreign key IDs on my entities.
I want to understand exactly where I need to focus my security efforts; I don't want to waste time implementing layers of security that are not required.
I'm using Breeze with .net and Entity Framework on the server side.
Example
Here's a trivial example. ObjectA has a reference to an ObjectB, and ObjectA is owned by a particular User. So, my database looks like this:
ObjectA:
Id ObjectB_Id SomeField User_Id
1 1 Alice's ObjectA 1
2 2 Bob's ObjectA 2
ObjectB:
Id SomeOtherField
1 Foo
2 Bar
User:
Id Name
1 Alice
2 Bob
From this model, the security concerns I have are:
I don't want unauthenticated users to be changing any data
I don't want Bob to be able to make any changes to Alice's ObjectA
I don't want Alice to try to point her ObjectA at Bob's ObjectB.
I don't want Bob to try to change the User_Id on his ObjectA to be Alice.
The solution for (1) is trivial; I'll ensure that my SaveChanges method has an [Authorize] attribute.
I can easily use Fiddler to build a SaveChanges request to reproduce issues 2 to 4 - for example, I can build a request which changes Alice's ObjectA to point to Bob's ObjectB. This is what the message content might look like:
"entities":
[
{
"Id":1,
"ObjectB_Id":2,
"SomeField":"Alice's ObjectA",
"User_Id":1,
"entityAspect":
{
"entityTypeName":"ObjectA:#MyNamespace",
"defaultResourceName":"ObjectAs",
"entityState":"Modified",
"originalValuesMap":
{
"ObjectB_Id":"1"
},
"autoGeneratedKey":
{
"propertyName":"Id",
"autoGeneratedKeyType":"Identity"
}
}
}
],
As I'd expect, when no security is implemented on the server side, this persists the updated value for ObjectB_Id into the database.
However, I've also confirmed that if there is no entry for ObjectB_Id in the originalValuesMap, then even if I change the value for ObjectB_Id in the main body of the message it is NOT updated in the database.
General Rules?
So, I think this means that the general security rules I need to follow on the server are:
[Edited 4 July 2013 - rewritten for clarity]
In general:
Nothing in the message can be trusted: neither values in the originalValuesMap nor supposedly "unchanged" values
The only exception is the identity of the entity, which we can assume is correct.
Supposedly "unchanged" properties may have been tampered with even if they are not in the originalValuesMap
For "Unchanged" properties (properties which are not also on the originalValuesMap):
When "using" any "unchanged" property, we must NOT use the value from the message; we must retrieve the object from the database and use the value from that.
for example, when checking owenership of an object to ensure that the user is allowed to change it, we cannot trust a UserId on the message; we must retrieve the entity from the database and use the UserId value from that
For any other "unchanged" property, which we are not using in any way, we don't need to worry if it has been tampered with because, even if it has, the tampered value will not be persisted to the database
For changed properties (properties which are also on the originalValuesMap):
Business rules may prevent particular properties being changed. If this is the case, we should implement a check for each such rule.
If a value is allowed to be changed, and it is a foreign key, we should probably perform a security check to ensure that the new value is allowed to be used by the session identity
We must not use any of the original values in the originalValuesMap, as these may have been tampered with
[End of edit]
Implementing the Rules
Assuming that these rules are correct, I guess there are a couple of options to implement security around the changed foreign keys:
If the business rules do not allow changes to a particular field, I will reject the SaveChanges request
If the business rules DO allow changes to a particular field, I will check that the new value is allowed. In doing this, CANNOT use the originalValuesMap; I'll need to go to the database (or other trusted source, eg session Cookie)
Applying these rules to the security concerns that I gave above,
security concern (2). I'll need to check the user identity on the session against the User_ID on the ObjectA that is currently in the database. This is because I cannot trust the User_ID on the request, even if it is not in the originalValuesMap.
security concern (3). If the business rules allow a change of ObjectB, I will need to check who owns the new value of ObjectB_Id; I'll do this by retrieving the specified ObjectB from the database. If this ObjectB is not owned by ObjectA's owner, I probably want to reject the changes.
security concern (4). If the business rules allow a change of User, this is already covered by (2).
Questions
So, really, I'm looking for confirmation that I'm thinking along the right lines.
Are my general rules correct?
Does my implementation of the rules sound reasonable?
Am I missing anything?
Am I over complicating things?
Phil ... you are absolutely on the right track here. You've done a nice job of laying out the issues and the threats and the general approach to mitigating those threats. It is almost as if you had written the introduction to the Breeze security chapter ... which we haven't gotten to yet.
I do not think that you are "over complicating things"
Someone reading this might think "wow ... that's a lot of work ... that Breeze stuff must be insecure".
Well it is a lot of work. But it isn't Breeze that is making it difficult. This is the necessary thinking for every web application in existence. Authentication is only the first step ... the easiest step ... in securing an application.
You shouldn't trust any client request ... even if the client is authenticated. That means making sure the client is authorized to make the request and that the content entering and exiting the server is consistent with what the client is both claiming to do and is allowed to do. These are general principles that apply to all web applications, not just Breeze applications. Adhering to these principles is no more difficult in Breeze than in any other technology.
One Breeze technicality you may have overlooked. The EFContextProvider.Context should only hold the entities to save; don't use it to retrieve original entities.You'll need a separate DbContext to retrieve the original entities to compare with the change-set entities from the client.
We are working on samples that demonstrate ways to handle the issues you described. For example, we're recommending (and demo'ing) a "validation rules engine" that plugs into the BeforeSaveEntitiesDelegate; this "engine" approach makes it easier to write bunches of server-side rules and have them applied automatically.
Our samples and guidance aren't quite ready for publication. But they are coming along.
Meanwhile, follow your instincts as you've described them here. Blog about your progress. Tell us about it ... and we'll be thrilled to highlight your posts.
I've been looking for guidance on the same matter and I am very happy to find your brilliant analysis. In my opinion the answer to our problem is different though, assuming that we are talking about applications which are to be composed of more than a few modules and are to live longer than a year.
If rules become too complicated it means that we might be using inappropriate approach. I'm sure many brilliant developers would cope following these rules but the sad truth is that most of our peers would either get it wrong or would forget about some of them under pressure.
I'd say that we need to go back to Fowler's, Evans' and Nilssons' publications and repeat after them that in larger applications (and these have strong security requirements) the entity model is not something that should be exposed to the client at all (for other reasons than security too - e.g. maintainability).
On the other hand it is worth looking at revisions to these original ideas proposed later by Greg Young and Udi Dahan. These in essence say that model for reading does not have to and often is not the same as model for writing 'data'.
To sum this up I'd say that the base rule should be DON'T use Breeze for writing and DO use it for reading (with DTOs/Projections), provided you don't query the 'real' model but the model built specially for reading (e.g. Views not Tables).
All this quite naturally emerges if you follow your domain and use cases and above all if you follow Test-Driven approach. Would you really end up with BeforeSaveEntities solution for business rules while following Test-Driven-Development?
I'm currently in the middle of a reasonably large question / answer based application (kind of like stackoverflow / answerbag.com)
We're using SQL (Azure) and nHibernate for data access and MVC for the UI app.
So far, the schema is roughly along the lines of the stackoverflow db in the sense that we have a single Post table (contains both questions / answers)
Probably going to use something along the lines of the following repository interface:
public interface IPostRepository
{
void PutPost(Post post);
void PutPosts(IEnumerable<Post> posts);
void ChangePostStatus(string postID, PostStatus status);
void DeleteArtefact(string postId, string artefactKey);
void AddArtefact(string postId, string artefactKey);
void AddTag(string postId, string tagValue);
void RemoveTag(string postId, string tagValue);
void MarkPostAsAccepted(string id);
void UnmarkPostAsAccepted(string id);
IQueryable<Post> FindAll();
IQueryable<Post> FindPostsByStatus(PostStatus postStatus);
IQueryable<Post> FindPostsByPostType(PostType postType);
IQueryable<Post> FindPostsByStatusAndPostType(PostStatus postStatus, PostType postType);
IQueryable<Post> FindPostsByNumberOfReplies(int numberOfReplies);
IQueryable<Post> FindPostsByTag(string tag);
}
My question is:
Where / how would i fit solr into this for better querying of these "Posts"
(I'll be using solrnet for the actual communication with Solr)
Ideally, I'd be using the SQL db as merely a persistant store-
The bulk of the above IQueryable operations would move into some kind of SolrFinder class (or something like that)
The Body property is the one that causes the problems currently - it's fairly large, and slows down queries on sql.
My main problem is, for example, if someone "updates" a post - adds a new tag, for example, then that whole post will need re-indexing.
Obviously, doing this will require a query like this:
"SELECT * FROM POST WHERE ID = xyz"
This will of course, be very slow.
Solrnet has an nHibernate facility- but i believe this will be the same result as above?
I thought of a way around this, which I'd like your views on:
Adding the ID to a queue (amazon sqs or something - i like the ease of use with this)
Having a service (or bunch of services) somewhere that do the above mentioned query, construct the document, and re-add it to solr.
Another problem I'm having with my design:
Where should the "re-indexing" method(s) be called from?
The MVC controller? or should i have a "PostService" type class, that wraps the instance of IPostRepository?
Any pointers are greatly received on this one!
On the e-commerce site that I work for, we use Solr to provide fast faceting and searching of the product catalog. (In non-Solr geek terms, this means the "ATI Cards (34), NVIDIA (23), Intel (5)" style of navigation links that you can use to drill-down through product catalogs on sites like Zappos, Amazon, NewEgg, and Lowe's.)
This is because Solr is designed to do this kind of thing fast and well, and trying to do this kind of thing efficiently in a traditional relational database is, well, not going to happen, unless you want to start adding and removing indexes on the fly and go full EAV, which is just cough Magento cough stupid. So our SQL Server database is the "authoritative" data store, and the Solr indexes are read-only "projections" of that data.
You're with me so far because it sounds like you are in a similar situation. The next step is determining whether or not it is OK that the data in the Solr index may be slightly stale. You've probably accepted the fact that it will be somewhat stale, but the next decisions are
How stale is too stale?
When do I value speed or querying features over staleness?
For example, I have what I call the "Worker", which is a Windows service that uses Quartz.NET to execute C# IJob implementations periodically. Every 3 hours, one of these jobs that gets executed is the RefreshSolrIndexesJob, and all that job does is ping an HttpWebRequest over to http://solr.example.com/dataimport?command=full-import. This is because we use Solr's built-in DataImportHandler to actually suck in the data from the SQL database; the job just has to "touch" that URL periodically to make the sync work. Because the DataImportHandler commits the changes periodically, this is all effectively running in the background, transparent to the users of the Web site.
This does mean that information in the product catalog can be up to 3 hours stale. A user might click a link for "Medium In Stock (3)" on the catalog page (since this kind of faceted data is generated by querying SOLR) but then see on the product detail page that no mediums are in stock (since on this page, the quantity information is one of the few things not cached and queried directly against the database). This is annoying, but generally rare in our particularly scenario (we are a reasonably small business and not that high traffic), and it will be fixed up in 3 hours anyway when we rebuild the whole index again from scratch, so we have accepted this as a reasonable trade-off.
If you can accept this degree of "staleness", then this background worker process is a good way to go. You could take the "rebuild the whole thing every few hours" approach, or your repository could insert the ID into a table, say, dbo.IdentitiesOfStuffThatNeedsUpdatingInSolr, and then a background process can periodically scan through that table and update only those documents in Solr if rebuilding the entire index from scratch periodically is not reasonable given the size or complexity of your data set.
A third approach is to have your repository spawn a background thread that updates the Solr index in regards to that current document more or less at the same time, so the data is only stale for a few seconds:
class MyRepository
{
void Save(Post post)
{
// the following method runs on the current thread
SaveThePostInTheSqlDatabaseSynchronously(post);
// the following method spawns a new thread, task,
// queueuserworkitem, whatevever floats our boat this week,
// and so returns immediately
UpdateTheDocumentInTheSolrIndexAsynchronously(post);
}
}
But if this explodes for some reason, you might miss updates in Solr, so it's still a good idea to have Solr do a periodic "blow it all away and refresh", or have a reaper background Worker-type service that checks for out-of-date data in Solr everyone once in a blue moon.
As for querying this data from Solr, there are a few approaches you could take. One is to hide the fact that Solr exists entirely via the methods of the Repository. I personally don't recommend this because chances are your Solr schema is going to be shamelessly tailored to the UI that will be accessing that data; we've already made the decision to use Solr to provide easy faceting, sorting, and fast display of information, so we might as well use it to its fullest extent. This means making it explicit in code when we mean to access Solr and when we mean to access the up-to-date, non-cached database object.
In my case, I end up using NHibernate to do the CRUD access (loading an ItemGroup, futzing with its pricing rules, and then saving it back), forgoing the repository pattern because I don't typically see its value when NHibernate and its mappings are already abstracting the database. (This is a personal choice.)
But when querying on the data, I know pretty well if I'm using it for catalog-oriented purposes (I care about speed and querying) or for displaying in a table on a back-end administrative application (I care about currency). For querying on the Web site, I have an interface called ICatalogSearchQuery. It has a Search() method that accepts a SearchRequest where I define some parameters--selected facets, search terms, page number, number of items per page, etc.--and gives back a SearchResult--remaining facets, number of results, the results on this page, etc. Pretty boring stuff.
Where it gets interesting is that the implementation of that ICatalogSearchQuery is using a list of ICatalogSearchStrategys underneath. The default strategy, the SolrCatalogSearchStrategy, hits SOLR directly via a plain old-fashioned HttpWebRequest and parsing the XML in the HttpWebResponse (which is much easier to use, IMHO, than some of the SOLR client libraries, though they may have gotten better since I last looked at them over a year ago). If that strategy throws an exception or vomits for some reason, then the DatabaseCatalogSearchStrategy hits the SQL database directly--although it ignores some parameters of the SearchRequest, like faceting or advanced text searching, since that is inefficient to do there and is the whole reason we are using Solr in the first place. The idea is that usually SOLR is answering my search requests quickly in full-featured glory, but if something blows up and SOLR goes down, then the catalog pages of the site can still function in "reduced-functionality mode" by hitting the database with a limited feature set directly. (Since we have made explicit in code that this is a search, that strategy can take some liberties in ignoring some of the search parameters without worrying about affecting clients too severely.)
Key takeaway: What is important is that the decision to perform a query against a possibly-stale data store versus the authoritative data store has been made explicit--if I want fast, possibly stale data with advanced search features, I use ICatalogSearchQuery. If I want slow, up-to-date data with the insert/update/delete capability, I use NHibernate's named queries (or a repository in your case). And if I make a change in the SQL database, I know that the out-of-process Worker service will update Solr eventually, making things eventually consistent. (And if something was really important, I could broadcast an event or ping the SOLR store directly, telling it to update, possibly in a background thread if I had to.)
Hope that gives you some insight.
We use solr to query a large product database.
Around 1 million products, and 30 stores.
What we did is we used triggers on the product table and stock tables on our Sql server.
Each time a row is changed it flags the product to be reindexed. And we have a windows service that grabs these products and post them to Solr every 10 seconds. (With a limit of 100 products per batch).
It's super efficient, almost real time info for the stock.
If you have a big text field (your 'body' field), then yes, re-index in background. The solutions you mentioned (queue or periodic background service) will do.
MVC controllers should be oblivious of this process.
I noticed you have IQueryables in your repository interface. SolrNet does not currently have a LINQ provider. Anyway, if those operations are all you're going to do with Solr (i.e. no faceting), you might want to consider using Lucene.Net instead, which does have a LINQ provider.
I'm currently using an attribute based approach to nhibernate session management, which means that the session is open for the duration of the Action method, but is closed once control is passed to the View.
This seems like good practice to me, however I'm running in to problems with lazy loaded collections. (This is complicated by the fact that some collections seem to be lazy loading even though they have Not.LazyLoad() set in the fluent mapping).
As I see it, my options are:
Change my ISession management strategy and Keep the session open in the View
Make better use of ViewModels (I'm currently not using them everywhere).
Eager load all of the collections that are causing problems (maybe paged) (fluent problem not withstanding)
1 seems a bit wrong to be - but may be the 'easiest' solution. 2 is probably the proper way to go, but in some cases ViewModels seem slightly redundant, and I'm loathed to introduce more classes just to deal with this issue. 3 seems to be a bit of a dirty fix.
What do you think?
The best way to handle this (in my opinion anyways) is to introduce a service layer in between your UI and your repositories; It should take care of loading everything needed by the view and pass off a flattened (and fully populated) dto to the view.
Often, I go one step further and map the dtos returned from the service layer to view models, which often need to contain data that is very view-specific, and not appropriate for inclusion into the dtos coming from your service layer. Remember, Automapper is your friend when it comes to situations like this.
Using an open-session-in-view pattern is still perfectly acceptable, just don't have your views invoking lazy loading on entity models - this is almost always a horrible idea.
consider your current usage as making implicit database operations. The object is sent to the View but the object contains proxies which when touched will try to return the data, and that requires a database operation.
Now,
extending the ISession life including the View, its not wrong, as long as you are not doing explicit database calls...
i wouldn't know about that
This is actually the proper way regardless of the session EOL: you should try to do as less queries as possible per request and nhibernate gives you that ability via lazyless loading, futures, multihql/criteria etc.
note: although you may have mapped a collection as not lazy loaded it matters also How you query and get your desired result set. eg if you are using HQL then use a fetch join
I don't think there's anything wrong about the first approach, and it will be the easiest to implement.
Session per request is a well known session management pattern for NHibernate.