What is the best practice for storing the Lookup no-match output in SSIS: an OLE DB Destination or the file system? I have been asked to keep track of the no-match records, and I need to decide which medium would be best for storing them.
Thanks in advance for sharing your valuable experience with me!
Storing it in a SQL table will be your best solution, as it is much easier to keep track of all the historical mismatches. Make sure you create some kind of process ID and timestamp so that you can relate each mismatch to the point in time at which it occurred. If you are looking for a simple example try this.
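To make the process ID and timestamp idea concrete, here is a minimal sketch of what such a table could look like. It uses Python's built-in sqlite3 module only so that it runs anywhere; in SSIS the rows would arrive through the OLE DB Destination instead. The table name lookup_no_match, its columns, and the helper functions are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def create_no_match_table(conn):
    """Create a hypothetical audit table for rows that found no lookup match."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS lookup_no_match (
            id         INTEGER PRIMARY KEY,
            process_id INTEGER NOT NULL,  -- identifies one package run
            logged_at  TEXT    NOT NULL,  -- UTC timestamp of the mismatch
            source_key TEXT    NOT NULL,  -- key that failed the lookup
            payload    TEXT               -- optional copy of the source row
        )
        """
    )


def log_no_match(conn, process_id, source_key, payload=None):
    """Store one unmatched row together with its run ID and timestamp."""
    conn.execute(
        "INSERT INTO lookup_no_match (process_id, logged_at, source_key, payload) "
        "VALUES (?, ?, ?, ?)",
        (process_id, datetime.now(timezone.utc).isoformat(), source_key, payload),
    )


def no_match_counts(conn):
    """Return (process_id, number_of_mismatches) pairs, one per run."""
    return conn.execute(
        "SELECT process_id, COUNT(*) FROM lookup_no_match "
        "GROUP BY process_id ORDER BY process_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_no_match_table(conn)
log_no_match(conn, 1, "CUST-001", "raw row as text")
log_no_match(conn, 1, "CUST-007")
log_no_match(conn, 2, "CUST-001")
print(no_match_counts(conn))  # [(1, 2), (2, 1)]
```

With the run ID stored on every row, questions like "which keys keep failing across runs?" become a plain GROUP BY, which is much harder to answer from a pile of flat files.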
In my Neo4j project I have Role and Permission entities which represent user roles and permissions. Each User in the system has relationships to appropriate sets of roles and permissions.
I think Role and Permission are some kind of supernodes that can become a major headache from a performance point of view in future.
What is the best practice for this case? How can I reimplement Role and Permission to avoid possible issues with supernodes?
Do you plan to make some aggregate/mass queries based on Roles (i.e. count number of people of certain role, list them)?
If not, and you just want to check whether a specific user has a certain Role, then in my humble opinion it should not cause significant or hard-to-maintain performance issues (you will traverse only certain relationships of the graph, ignoring the vast majority of the relationships on your "supernodes"). I would keep the simple design ("premature optimization is the root of all evil" ;) ). If problems do show up (internally, relationships are stored in a linked-list-like structure, so finding the right one may take time on a supernode, even if you restrict the search to a certain relationship type), splitting Role nodes using the meta-node approach should do the job (it's described in Learning Neo4j).
If yes, you have a problem. That's probably a field in which RDBMSs are better... Using meta nodes probably won't help, as you will still have to process all of them to list/count all users... So caching that data in a separate store may simply be the best idea...
I'm going to assume that you're just using Neo4j as a permissions lookup data source (like hasPermission(current_user, 'permission_string')) and not tied into any queries to other entities. That can be fine, especially if you have a hierarchical access schema. If that's not true then this might not apply and it would be good to have a clearer idea of what your entities look like.
Since you're likely using permissions throughout your application, and if they're going to grow in size and scope, it could make sense for performance to use some form of caching, such as an in-memory store or Redis.
It might even make sense to generate a denormalized cache of every permission state for every user. So you would evaluate your rules which might be based on hierarchical roles/permissions and come out with a list of "User X has permission Y". Then whenever you change a user or a permission you'd regenerate the cache for that entity, and if you changed a role you would regenerate the cache for all of the associated users and permissions.
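As a rough sketch of that denormalized cache, here is a small in-memory version. Everything in it is an assumption for illustration: the sample roles, the single-parent role hierarchy, and the function names. In a real application the three dictionaries would be loaded from Neo4j and the cache might live in Redis rather than in a Python dict.

```python
# Hypothetical sample data; in a real system this would be read from the graph.
ROLE_PARENT = {"admin": "editor", "editor": "viewer", "viewer": None}
ROLE_PERMISSIONS = {"viewer": {"read"}, "editor": {"write"}, "admin": {"delete"}}
USER_ROLES = {"alice": {"admin"}, "bob": {"viewer"}}


def role_chain(role):
    """Yield the role itself and every role it inherits from."""
    while role is not None:
        yield role
        role = ROLE_PARENT.get(role)


def rebuild_user(cache, user):
    """Recompute the flat 'user has permission' set for one user."""
    perms = set()
    for role in USER_ROLES.get(user, ()):
        for inherited in role_chain(role):
            perms |= ROLE_PERMISSIONS.get(inherited, set())
    cache[user] = frozenset(perms)


def rebuild_role(cache, changed_role):
    """Recompute every user whose roles include or inherit from the changed role."""
    for user, roles in USER_ROLES.items():
        if any(changed_role in role_chain(r) for r in roles):
            rebuild_user(cache, user)


def build_cache():
    """Evaluate the rules once for every user."""
    cache = {}
    for user in USER_ROLES:
        rebuild_user(cache, user)
    return cache


def has_permission(cache, user, permission):
    """Plain set lookup; no graph traversal at request time."""
    return permission in cache.get(user, frozenset())


cache = build_cache()
print(has_permission(cache, "alice", "read"))  # True (inherited via editor -> viewer)
print(has_permission(cache, "bob", "write"))   # False
```

The trade-off is the usual one for denormalization: reads become a set lookup, while every change to a user or role has to trigger the matching rebuild, otherwise the cache goes stale.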
Also I don't know if I would apply this advice to just Neo4j. If you're talking about a simple key/value lookup then a lot of general purpose databases would be overkill in performance critical situations.
I have developed a Grails application that stores a great deal of information. Currently, when I wish to run analysis on the large sets of data, it can be pretty time-consuming. To help speed things up, I have decided to move all the calculated and aggregated data off into a "data mart". This way, a process can be run, maybe by a cron job, to work with all the records, pull out all the requested information, and store the calculated and requested data in separate tables.
My questions are: First, does this seem like the best way to tackle the problem? If so, I'm trying to figure out the best way to manage the new domain classes. Should I keep them in the same domain project folder, or is it possible to create a new folder? My domain classes just seem to be getting very cluttered, and I would rather have a way to separate the relational tables from the data-mart tables. Any suggestions for best structuring would be great.
I am using Groovy on Grails with a MySQL database.
thanks
jason
It sounds like what you are doing is a pretty good idea. You will notice that the stack set of apps also have data aggregation processes that run every day (rankings), every few days (badge calculations), etc.
You could create a new package for all the 'datamart' classes required, to keep it separate.
If you don't need anything from your current app, you can even create a new project. Keep in mind that if you need to pull all of the data out of your tables, Hibernate might not be the best solution. If possible, leverage your database to do the calculations for you.
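To illustrate letting the database do the calculations, here is a minimal sketch in which a single INSERT ... SELECT with GROUP BY fills a summary table, so no rows are pulled into application objects. It uses Python's sqlite3 so that it is runnable as-is; the same statement shape works in MySQL. The orders and mart_daily_sales tables and their columns are hypothetical.

```python
import sqlite3


def create_mart_schema(conn):
    """Create a hypothetical transactional table and its data-mart summary table."""
    conn.executescript(
        """
        CREATE TABLE orders (
            id         INTEGER PRIMARY KEY,
            order_date TEXT NOT NULL,
            amount     REAL NOT NULL
        );
        CREATE TABLE mart_daily_sales (
            order_date   TEXT PRIMARY KEY,
            order_count  INTEGER NOT NULL,
            total_amount REAL NOT NULL
        );
        """
    )


def refresh_daily_sales(conn):
    """Rebuild the summary inside one transaction; the database does the aggregation."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM mart_daily_sales")
        conn.execute(
            """
            INSERT INTO mart_daily_sales (order_date, order_count, total_amount)
            SELECT order_date, COUNT(*), SUM(amount)
            FROM orders
            GROUP BY order_date
            """
        )


conn = sqlite3.connect(":memory:")
create_mart_schema(conn)
conn.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 5.5), ("2024-01-02", 20.0)],
)
refresh_daily_sales(conn)
print(conn.execute("SELECT * FROM mart_daily_sales ORDER BY order_date").fetchall())
# [('2024-01-01', 2, 15.5), ('2024-01-02', 1, 20.0)]
```

A cron job would only need to call something like refresh_daily_sales; a full rebuild is the simplest approach, and an incremental refresh can come later if the rebuild gets too slow.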
I need to explain the practical problems that might be encountered when transforming transactional (and other) data from diverse sources into the Data Warehouse. According to my knowledge this is about cleansing and scrubbing data. If anyone knows of any practical problems, please help me. Thanks for your help.
That's a broad topic, but I'll offer a few good starting points.
For starters, think about history. If a transaction updates some data point, do you need to apply that retroactively, or do you need to remember what the value was at any given point in time? For example, suppose you have a monthly report of customers by city, and one of your customers moves. How should the DW reflect that?
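One common way to remember "what the value was at any given point in time" is to keep one row per version with a validity range, which Kimball's books call a Type 2 slowly changing dimension. Here is a minimal sketch of the customer-moves example; the row layout, the sample dates, and the city_as_of helper are invented for illustration.

```python
from datetime import date

# Hypothetical history rows: one row per period a customer lived in a city.
# valid_to is None for the row that is still current.
customer_city_history = [
    {"cust_id": 1, "city": "Boston",
     "valid_from": date(2023, 1, 1), "valid_to": date(2023, 6, 15)},
    {"cust_id": 1, "city": "Denver",
     "valid_from": date(2023, 6, 15), "valid_to": None},
]


def city_as_of(history, cust_id, as_of):
    """Return the city that was valid for the customer on the given date, if any."""
    for row in history:
        if row["cust_id"] != cust_id:
            continue
        started = row["valid_from"] <= as_of
        not_ended = row["valid_to"] is None or as_of < row["valid_to"]
        if started and not_ended:
            return row["city"]
    return None


print(city_as_of(customer_city_history, 1, date(2023, 3, 31)))  # Boston
print(city_as_of(customer_city_history, 1, date(2023, 6, 30)))  # Denver
```

With this layout the March report still counts the customer under Boston after the move. If the business wants the move applied retroactively, overwriting a single row is enough and the history rows are unnecessary.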
Think about data acceptance. Is every input row a good input? For example, if you're dealing with web data, there are crawlers and spammers that you might not want to count the same as you count user traffic.
Think about data synchronization. Do all your inputs use the same keys? Do you know how to translate between them? Does Team A mean the same thing by "cust_id" as Team B does? A project glossary is very helpful here.
Think about localization. Are your inputs all in the same time zone? Do they all use the same calendar system? Do you need to handle Unicode?
Think about reporting. Are the data you're capturing able to answer the questions people will ask of the DW? If not, how can you capture data that can?
Think about presentation. Should you be showing customers the same data you're using for internal reporting? Does finance need to see a different slice of the data than marketing?
This really only scratches the surface of the issues that come up on a major DW project. I would refer you to Ralph Kimball's assorted books on data warehousing for a more in-depth discussion of problems and solutions. Hope this helps you get started.
You give the answer in your question.
According to my knowledge this is about cleansing and scrubbing data.
And you are correct. Cleansing data means that you have a company-wide list of clean element attributes, and a mapping that changes the unclean elements into clean elements.
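In code, the "list of clean element attributes plus a mapping" idea can be as small as the sketch below. The country codes, the raw spellings, and the function names are made-up sample data, not anything from a real system.

```python
# Hypothetical company-wide list of clean values for one element.
CLEAN_COUNTRY_CODES = {"US", "GB", "DE"}

# Hypothetical mapping from unclean spellings seen in source systems to clean values.
RAW_TO_CLEAN = {
    "usa": "US",
    "united states": "US",
    "uk": "GB",
    "u.k.": "GB",
    "germany": "DE",
    "deutschland": "DE",
}


def cleanse_country(raw):
    """Return the clean code for a raw value, or None if it cannot be mapped."""
    value = raw.strip()
    if value.upper() in CLEAN_COUNTRY_CODES:
        return value.upper()
    return RAW_TO_CLEAN.get(value.lower())


def cleanse_rows(rows):
    """Split rows into cleansed rows and rejects that need a human decision."""
    clean, rejected = [], []
    for row in rows:
        code = cleanse_country(row["country"])
        if code is None:
            rejected.append(row)
        else:
            clean.append({**row, "country": code})
    return clean, rejected


clean, rejected = cleanse_rows(
    [{"id": 1, "country": " USA "}, {"id": 2, "country": "de"}, {"id": 3, "country": "Atlantis"}]
)
print(clean)     # [{'id': 1, 'country': 'US'}, {'id': 2, 'country': 'DE'}]
print(rejected)  # [{'id': 3, 'country': 'Atlantis'}]
```

The function is trivial; agreeing on what goes into the two dictionaries is where the real work lies.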
Processing the data against the clean element attributes is a piece of cake compared to creating the company-wide list of clean element attributes.
You have to get people from different departments to agree on what data to warehouse, and to agree on what each element means. This is a difficult sociological problem. It's not a terribly hard technical problem.
Good luck getting your company-wide list of clean element attributes.
I am writing some learning tests (e.g. "what's the answer for..."; "choose the correct options..."). Now my question is how I should store them. A SQL database seems like overkill, but I really don't know what would be the best choice if I wanted to select a random subset of questions, etc. Perhaps some simple XML files?
Thanks for the advice.
An RDBMS could be a good option for you, since it sounds like you want to collect some sort of result set based on the questions you're asking. This way you'll be able to tie the questions, answers and users together in some logical way.
You could easily store the questions in an XML file and that would work, however it makes it a little more tricky to tie your overall data together.
One thing you could do is draw or write out a real plan of how you'd like your data to interact with itself and other data. This will probably give you a better idea of what needs to be done and how to go about doing it.
The easiest way is to go for a relational database solution. If you don't need the heavier databases out there like MS SQL Server, I would have a look at MS SQL Server Express or SQLite. SQLite in particular doesn't need any database server running to work; it is just a file-based database and is easily moved around.
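For the "random subset of questions" part, a file-based database makes this a one-line query. Here is a minimal sketch using SQLite through Python's built-in sqlite3 module; the questions table, its columns, and the sample rows are assumptions for illustration.

```python
import sqlite3


def create_question_table(conn):
    """Create a hypothetical table holding one test question per row."""
    conn.execute(
        """
        CREATE TABLE questions (
            id     INTEGER PRIMARY KEY,
            text   TEXT NOT NULL,
            answer TEXT NOT NULL
        )
        """
    )


def random_questions(conn, n):
    """Return up to n distinct questions in random order."""
    return conn.execute(
        "SELECT id, text, answer FROM questions ORDER BY RANDOM() LIMIT ?",
        (n,),
    ).fetchall()


# Use a file name such as "tests.db" instead of ":memory:" to keep the data on disk.
conn = sqlite3.connect(":memory:")
create_question_table(conn)
conn.executemany(
    "INSERT INTO questions (text, answer) VALUES (?, ?)",
    [(f"What is {i} + {i}?", str(2 * i)) for i in range(1, 11)],
)
for question_id, text, answer in random_questions(conn, 3):
    print(question_id, text, answer)
```

ORDER BY RANDOM() sorts the whole table on every call, which is fine for a few thousand questions. With XML files you would have to load and sample the questions yourself.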
This question is somewhat related to this.
I want to have document storage along with some complex metadata. I am not using SharePoint. I have a very simple directory structure that goes 2 levels deep (one folder and documents underneath). I want to store metadata associated with each file: tags, popularity (number of times accessed), creator name, etc.
What is the best way to achieve this? I am leaning towards the relational database with the link to the file but I have to think this problem has been solved before.
Your approach sounds just fine to me. Just store the folder and filename as column values and all the other metadata in other columns.
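A minimal sketch of that layout, assuming SQLite via Python's sqlite3 and made-up table and column names. Tags go in a second table because a file can have several of them; everything else fits in columns next to the folder and filename.

```python
import sqlite3


def create_document_schema(conn):
    """Create hypothetical metadata tables; the files themselves stay on disk."""
    conn.executescript(
        """
        CREATE TABLE documents (
            id           INTEGER PRIMARY KEY,
            folder       TEXT NOT NULL,
            filename     TEXT NOT NULL,
            creator      TEXT NOT NULL,
            access_count INTEGER NOT NULL DEFAULT 0,
            UNIQUE (folder, filename)
        );
        CREATE TABLE document_tags (
            document_id INTEGER NOT NULL REFERENCES documents(id),
            tag         TEXT NOT NULL,
            PRIMARY KEY (document_id, tag)
        );
        """
    )


def add_document(conn, folder, filename, creator, tags=()):
    """Register a file and its tags; returns the new document id."""
    cur = conn.execute(
        "INSERT INTO documents (folder, filename, creator) VALUES (?, ?, ?)",
        (folder, filename, creator),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO document_tags (document_id, tag) VALUES (?, ?)",
        [(doc_id, tag) for tag in tags],
    )
    return doc_id


def record_access(conn, doc_id):
    """Bump the popularity counter each time the file is opened."""
    conn.execute(
        "UPDATE documents SET access_count = access_count + 1 WHERE id = ?",
        (doc_id,),
    )


def find_by_tag(conn, tag):
    """Return (folder, filename, access_count) for a tag, most popular first."""
    return conn.execute(
        """
        SELECT d.folder, d.filename, d.access_count
        FROM documents d
        JOIN document_tags t ON t.document_id = d.id
        WHERE t.tag = ?
        ORDER BY d.access_count DESC, d.filename
        """,
        (tag,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_document_schema(conn)
report = add_document(conn, "finance", "q1.pdf", "jane", ["report", "2024"])
add_document(conn, "finance", "q2.pdf", "raj", ["report"])
record_access(conn, report)
print(find_by_tag(conn, "report"))
# [('finance', 'q1.pdf', 1), ('finance', 'q2.pdf', 0)]
```

The UNIQUE (folder, filename) constraint keeps the table in step with the two-level directory structure: registering the same path twice fails instead of creating a duplicate row.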
Do you have any concerns about this approach? Or perhaps a specific part of this approach that you're not sure of how to implement?