EF4 QueryView or DefiningQuery?

I am in the middle of trying to complete a design for a project and have come to a fork in the road. I have made up my mind that I want to use EF4 as my data persistence layer, but my existing database is causing me some pain. Changing or augmenting the database is not an option. I have a single table that really serves multiple purposes and contains 120 columns (I didn't design this table!!! - it is a DB2 carryover after a SQL Server conversion long ago). I have designed a class diagram that creates five entities from this table, at varying levels of aggregation. In my research of what to do in these situations, I have narrowed it down to either using a "QueryView" in my MSL layer or a "DefiningQuery" in my SSDL layer to create the entities I need from this monolithic table. The resulting data only needs to be read-only. I'd prefer getting back a proper entity, but anonymous types or DbDataRecord would be okay.
I have attempted to use a QueryView in the MSL with my entity defined in the CSDL, but the MSL keeps getting regenerated and my changes are lost when I compile. Why?
Can anyone provide input as to what I should do here? Is a DefiningQuery or a QueryView preferable in this situation? Any input on keeping these changes after updating my model from the database or compiling would also be very much appreciated.

A QueryView should not be regenerated. I'm not sure how a QueryView behaves when you run Update Model from Database, but I am sure that a DefiningQuery will be deleted, because a DefiningQuery is defined in the SSDL, which is completely rebuilt during Update Model from Database. I have a workaround for custom DefiningQueries that uses two different EDMX files - one just for queries and a second for entities updated from the database. The general concept is described here.
The difference between QueryView and DefiningQuery is the layer in which each construct lives. A QueryView is an MSL element built as a custom Entity SQL query on top of an existing entity, so your 120-column entity must exist in the EDMX. For some unknown reason, QueryView has no support for aggregations. A DefiningQuery is an SSDL element built as a custom native SQL query; it is what the designer uses by default for database views (which are, by the way, probably the best choice for you).
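As a rough sketch, the SQL you would hand-write inside one DefiningQuery element might look like the following; the EDMX wrapper is omitted, and the table and column names are hypothetical stand-ins for the real 120-column table. Since the query lives entirely in the SSDL, it carves a read-only entity out of the monolith without touching the database itself.

-- Hypothetical SQL body for a hand-written DefiningQuery in the SSDL.
-- LEGACY_MASTER stands in for the 120-column table; the projection
-- defines one of the five read-only entities.
SELECT CUST_ID   AS CustomerId,
       CUST_NAME AS Name,
       CUST_CITY AS City
FROM   dbo.LEGACY_MASTER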

Related

Support of expand command in breeze with the Mongo Library

Can someone tell me when the expand command in Breeze will be available in combination with MongoDB?
The EntityQuery 'expand' function is not likely to be implemented for MongoDB, because 'expand' conceptually requires a 'join', which is a feature that Mongo does not implement.
However, the idea within MongoDB is that an object's children (or relations, if you are coming from a relational background) are actually stored and returned with the parent. From a Breeze perspective, this means that we treat all of these related child objects as complex objects that are automatically returned when you query the parent. In other words, all of the "expands" that you are likely to want are automatically part of the results of your queries.
The only problem occurs when you actually try to use MongoDB in a relational manner, i.e. where you store the ID of an object in one collection as a property of an object in another collection. From a MongoDB (and Breeze) perspective, this means you would need to perform another query to get the related data.
We did think about translating Breeze 'expand's into a series of nested queries, but it really does go against the "MongoDB" mindset, and the performance of such queries can be terrible... and we weren't sure that it would be that useful or desirable to the majority of MongoDB developers.
In general, if this occurs a lot in your data, then MongoDB is probably not the right database to use, because you will end up manually "joining" your data, which is a very tedious process in Mongo. This is one of the cases where a relational database really is a better choice.

Find changes quickly in a larger SQL database?

There is a Java Swing application which uses an Informix database. I have user rights granted for the Swing application (i.e. no source code), and read-only access to a mirror of the database.
Sometimes I need to find the database column which is backing a GUI element (TextBox, TableField, Label...). What would be the best approach to find out which database column and table hold the data shown, e.g., in a TextBox?
My general approach is to capture the state of the database, commit a change using the GUI, then capture the state of the database again and examine the difference. I've already tried:
Using the nrows field of systables: didn't work, because the number in nrows does not seem to be a real-time representation of the row count.
Creating a script with SELECT COUNT(*) ... for all tables: didn't work because there are too many tables (> 5000). I also tried to optimize by skipping empty tables, but there are still too many left.
Is there a simple solution that I'm missing?
Please look at the Change Data Capture API and check whether it suits your needs.
There probably isn't a simple solution.
You probably need to build yourself a map of the database, or a data dictionary for it. It sounds as though you can eliminate many of the tables from consideration since they're empty - at least for a preliminary pass. If you're dealing with information in a text box, the chances are it is some sort of character data; you can analyze which (non-empty) tables contain longer character strings, and they'd be the primary targets of your searches. If the schema is badly designed, with lots of VARCHAR(255) columns even though the columns normally only hold short strings, life is more difficult. Over time, you can begin to classify tables and columns so that you end up knowing where to look for parts of the application.
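As a starting point for that classification, a catalog query along these lines lists the non-empty tables that contain longer character columns (Informix dialect; the length cutoff is an arbitrary assumption, and the VARCHAR length encoding is simplified). Bear in mind the caveat from the question: nrows is only as fresh as the last UPDATE STATISTICS.

-- Sketch: candidate tables/columns that might back text fields in the GUI.
-- MOD(coltype, 256) strips the NOT NULL flag; 0 = CHAR, 13 = VARCHAR.
SELECT t.tabname, c.colname, c.collength
FROM   systables t, syscolumns c
WHERE  c.tabid = t.tabid
AND    t.tabtype = 'T'                 -- base tables only
AND    t.nrows > 0                     -- statistics-based, not real-time
AND    MOD(c.coltype, 256) IN (0, 13)  -- CHAR and VARCHAR
AND    c.collength >= 20               -- arbitrary "long enough" cutoff
ORDER  BY t.tabname, c.colname;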
One problem to beware of: the tabid in informix.systables isn't necessarily as stable as you'd like. Your data dictionary needs to record its own dd_tabid for the table it describes, and can store the last known tabid from informix.systables, but it needs to be ready to find a new tabid value on occasion. You should probably only mark data in your dictionary for logical deletion.
To some extent, this assumes you can create a database in which to record this information. If you can't create an Informix database, you may have to use something else (MySQL, or SQLite, perhaps) to store the data dictionary. Alternatively, go to your DBA team and ask them for the information. Unless you're trying something self-evidently untoward, they're likely to help (but politics can get in the way — I've no idea how collegial your teams are).
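A minimal version of such a dictionary table, following the advice above about recording your own dd_tabid, the last known tabid, and logical deletion (column names are illustrative; it could live in Informix, MySQL, or SQLite):

-- Sketch of a minimal data-dictionary table; names are illustrative.
CREATE TABLE dd_tables
(
    dd_tabid    INTEGER PRIMARY KEY,    -- our own stable identifier
    tabname     VARCHAR(128) NOT NULL,  -- table being described
    last_tabid  INTEGER,                -- last known informix.systables.tabid
    notes       VARCHAR(255),           -- what the table appears to hold
    is_deleted  CHAR(1) DEFAULT 'N'     -- logical deletion only, as above
);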

Moving lookup / reference tables to a new schema

We are building ASP.NET MVC3 web applications using Visual Studio, SQL Server 2008 R2 & EF Code First 4.1.
Quite often we have smaller tables - what we call "lookup" tables. For example, a "Status" table contains an "Id" and a "Name". As the application grows these tables become quite numerous, and I would like to know the best way to "group" these less important tables away from the crux of the application.
It has been suggested to me to add a prefix like "LkStatus" to help, but what about moving all the lookup tables out of dbo and into their own schema?
Can anyone see any drawbacks in this method?
No drawbacks with this method. I'm a fan of schemas personally. I'd use Lookup as the schema name, though.
To move a table to another schema, you have two options:
ALTER SCHEMA Lookup TRANSFER dbo.SomeTable
or
ALTER AUTHORIZATION ON dbo.SomeTable TO Lookup
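Both statements assume the target schema already exists; if it doesn't, create it first:

-- The Lookup schema must exist before tables can be transferred into it.
CREATE SCHEMA Lookup AUTHORIZATION dbo;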
This is going to come down to preference. There really isn't a "gotcha" either way. I prefer a table prefix but wouldn't be bothered either way. We use LU_*. As long as either option is enforced consistently, maintenance down the line will be easy.
Since the tables are small, what about grouping them together into a single table? Instead of using the table name as a pseudo-key, use a real key. For example, you could have a table called Lookup, with Id, Type, Name and Value columns, where Type = 'Status' for your status values. Setting the clustered index to (Type, Name) would physically group all rows of the same type together, which would make it fast to read them all as a group, if needed.
If your Names can have different data types, add an extra column for each required type: one for integers, one for strings, one for floats, etc. You can do something similar using an XML column; the T-SQL takes just a little more effort.
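A T-SQL sketch of that consolidated table (names and sizes are illustrative): the Id stays available as a nonclustered primary key while the clustered index on (Type, Name) does the physical grouping.

-- Hypothetical consolidated lookup table; names and sizes are illustrative.
CREATE TABLE dbo.Lookup
(
    Id    int          IDENTITY(1,1) NOT NULL,
    Type  varchar(50)  NOT NULL,   -- e.g. 'Status', 'Priority'
    Name  varchar(100) NOT NULL,
    Value int          NULL,
    CONSTRAINT PK_Lookup PRIMARY KEY NONCLUSTERED (Id)
);

-- Clustering on (Type, Name) stores all rows of one lookup type together,
-- so reading a whole list is a single range scan.
CREATE UNIQUE CLUSTERED INDEX CIX_Lookup_Type_Name
    ON dbo.Lookup (Type, Name);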

Can someone help me understand why an auto-identity (int) is bad when using NHibernate?

I've been seeing a lot of commentary (from an NHibernate perspective) about using a Guid as opposed to an int (and presumably auto-identity in the database), with the conclusion that using auto-identity breaks the UoW pattern.
This post has a short description of the issue, but it doesn't really tell me "why" it breaks the pattern (unless I'm misunderstanding, which is likely the case).
Can someone enlighten me?
There are a few major reasons.
Using a Guid gives you the ability to identify a single entity across many databases, including six relational databases with the same schema but different data, a document database, etc. This becomes important any time you have more than one single place where data goes - and that means your case too: you have a dev database and a prod database, right?
Using a Guid gives NHibernate the ability to batch more statements together, perform more database work at the very end of the unit of work / transaction, and reduce the total number of roundtrips to the database, increasing performance as well as conferring other benefits.
Comment:
Random Guids do not create poor indexes per se - they create poor clustered indexes. There are two solutions.
Use a partially sequential Guid. With NHibernate, this means using the guid.comb id generator rather than the guid id generator. guid.comb is partially sequential for good performance, but retains a very high degree of randomness.
Have your Guid primary key be a nonclustered index, and put a clustered index on another auto-incrementing column. You may decide to map this column, in which case you lose the benefit of better batching and fewer roundtrips, but you regain all the benefits of short numbers that fit easily in a URL. Or you may decide not to map this column and have it remain completely within the database, in which case you gain better performance for Guids as primary keys as well as better performance for NHibernate doing fewer roundtrips.
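In T-SQL, that second option looks roughly like this (table and column names are hypothetical):

-- Hypothetical: the Guid primary key is kept nonclustered; an identity
-- column that need not be mapped in NHibernate carries the clustered
-- index, keeping inserts append-only rather than randomly scattered.
CREATE TABLE dbo.Customer
(
    Id        uniqueidentifier NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED,
    ClusterId int IDENTITY(1,1) NOT NULL,
    Name      nvarchar(100) NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX CIX_Customer_ClusterId
    ON dbo.Customer (ClusterId);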
My take would be that the key breaking factor is that getting the auto-incremented value requires an actual write to the database, which NHibernate would otherwise have deferred or possibly never performed.
When using identity in a parent-child scenario, NHibernate has to round-trip to the database to get the ID of the parent so that it can associate the child correctly. This means that the parent has to be written to the database at that point. Should there be a problem with the child, you would then need to delete the parent in order to exit the UoW cleanly.

SPROC to update record: how to handle unchanged values

I'm calling an update SPROC from my DAL, passing in all(!) fields of the table as parameters. For the biggest table this is a total of 78.
I pass all these parameters even if only one value has changed.
This seems rather inefficient to me, and I wondered how to do it better.
I could define all parameters as optional and only pass the ones that changed (see the sketch after this question), but my DAL does not know which values changed, because I'm just passing it the model object.
I could do a SELECT on the table before updating and compare the values to find out which ones changed, but that is probably far too much overhead as well(?).
I'm kind of stuck here... I'm very interested in what you think of this.
Edit: I forgot to mention that I'm using C# (Express Edition) with SQL Server 2008 (also Express). I wrote the DAL "myself" (using this article).
It's maybe not the latest state-of-the-art way of doing it (since it's from 2006 - "pre-LINQ", so to speak, though LINQ works only against local SQL instances in Express anyway), but my main goal was learning C#, so I guess this isn't too bad.
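For reference, the optional-parameter idea from the question could be sketched like this in T-SQL (procedure, table, and column names are hypothetical). Note the caveat: a COALESCE-style update cannot deliberately set a column back to NULL.

-- Hypothetical optional-parameter update: unsupplied parameters default
-- to NULL and COALESCE keeps the stored value for them.
CREATE PROCEDURE dbo.UpdateCustomer
    @Id    int,
    @Name  nvarchar(100) = NULL,
    @Email nvarchar(200) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.Customer
    SET    Name  = COALESCE(@Name,  Name),
           Email = COALESCE(@Email, Email)
    WHERE  Id = @Id;
END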
If you can change the DAL (without your changes being discarded once the layer is regenerated from the new schema), I would recommend passing one structure containing the columns being changed with their values, and another structure containing the key columns and values for the update.
This can be done using hashtables, and if the schema is known it should be fairly easy to handle in the "new" update function.
If this is an automatically generated DAL, these are some of the drawbacks of using DALs.
You could implement journalized change tracking in your model objects. This way you could keep track of any changes in your objects by saving the previous value of a property every time a new value is set. This information could be stored in one of two ways:
As part of each object's own private state
Centrally in a "manager" class.
In the first solution, you could easily implement this functionality in a base class and have it run in all model objects through inheritance.
In the second solution, you need to create some kind of container class that will keep a reference and a unique identifier for every model object that is created, and record all changes to their state in a central store. This is similar to the way many ORM (Object-Relational Mapping) frameworks achieve this kind of functionality.
There are off-the-shelf ORMs that support these kinds of scenarios relatively well. Writing your own ORM will leave you without many features like this.
I find the "object.Save()" pattern leads to this kind of behavior, but there is no reason you need to follow that pattern (while I'm not personally a fan of object.Save(), I feel like I'm in the minority).
There are multiple ways your data layer can know what changed, and most of them are supported by off-the-shelf ORMs. You could also potentially make the UI and/or business layers smart enough to pass that knowledge into the data layer.
Two options that I prefer:
Generating or hand-coding update methods that only take the set of parameters that tend to change.
Generating the update statements completely on the fly.
