Datwarehousing queries - data-warehouse

I have an interview coming up for the role of a data engineer.
Can someone please help me with some resources for complex queries which are used in datawarehouse
eg: querying two fact tables and other complex queries like this
Thanks in advance

Good start point is http://www.kimballgroup.com/
if this work will be with MS SQL Server - good point https://technet.microsoft.com/en-us/virtuallabs

Related

Is there any concept of stored procedures in Cassandra?

For database management, my team right now is using a RDBMS based solution (MSSQL to be exact), but we expect to move to Cassandra soon as we're expecting a huge bump in traffic.
The application logic right now is decoupled from insertion logic, as the application only calls the specific procedures in SQL which calls some data validations and makes corresponding insertions.
I want to do something similar in Cassandra. However, I am unable to find anything that could aid me in doing so. UDFs are not useful as they are mostly used in SELECT query. I'd appreciate the community's help/advice on this, thanks!
The closest feature to a stored procedure will be a batch as it will allow you to "bundle" different DML statements associated to an insert, update or delete.
If you are moving from RDBMS to Cassandra, one of the biggest challenges is to adjust to the data modeling required, and more specific, to denormalization of data. The data model is the key factor of success (and failure) of any Cassandra implementation, and because of that, you may find several resources in the web (to mention the basics eBay blog, Datastax academy's Data model course)
Good luck with your implementation!

neo4j data integrity?

I have some questions about Neo4j and data integrity!
As under Neo4j data integrity is ensured? And since all ACID properties are supported, as are implemented. So atomicity, consistency, isolation, durability?
Maybe one or information sources on this?
Thank you in advance for
best regards
start here:
http://docs.neo4j.org/chunked/milestone/introduction-highlights.html
since i don't understand exactly your question, could you please post your goal - what are you about to do with neo4j, and if any what is your current technology you are using for your goal now? than maybe i could answer whether it's possible or not.

Document-Oriented or Graph databases

It's a RoR project.
We want to store user activities, like uploaded a photo, voted for somebody, followed somebody, etc. When listing the activities, we need to list your friends activities as well. So, what is better to use in this case: a document-oriented database (couchdb, mongo db), a graph database (neo4js), or maybe some other approach?
Thank you for helping in advance guys :)
Yeah, I think Neo4j is a good choice, the Rails 3 support is excellent, see https://github.com/andreasronge/neo4j, see even the social examples with cypher like in http://docs.neo4j.org/chunked/snapshot/data-modeling-examples.html, and for activity streams, there are various cool approaches like Graphity, see http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/
Depending on the scale of your application, and volume of activity, I'd recommend a combination of Couchbase (not CouchDB) for actual activity data which is extremely scalable and fast, and Neo4J for the graph discovery (both databases at the same time). I've used the combination very effectively in my application that was both social and real-time.
If you want more info from me, please feel free to contact me directly and I can help with architectural decisions or implementation help.
Take a look at Infinitegraph. It is scalable unlike neo4j. I think they have a free download for 1 million nodes.
Consider using Sqlite. It is a flat file mimicing database and can be used as an embedded database.

Asp.net mvc small amount data storage

I am writing some learning tests (i.e. what's the answer for...; choose correct options...). Now my question is, how should I store them. SQL db seems quite an overkill, but I really don't know what would be the best choice if I wanted to select random subset of questions etc. Perhaps some simple xml files?
Thanks for advice.
A RDBMS could be a good option for you since it sounds like you're wanting to collect some sort of result set based on the questions your asking. This way you'll be able to tie the questions, answers and users together is some logical way.
You could easily store the questions in an XML file and that would work, however it makes it a little more tricky to tie your overall data together.
One thing you could do is draw or write out a real plan of how you'd like your data to interact with itself and other data. This will probably give you a better idea of what needs to be done and how to go about doing it.
The easiest way is to go for a relation database solution. If you don't need support by the heavier db's out there like ms sql server i should have a look at ms sql express or sql lite. Thees database don't need any database server running to work, they are just file based databased and are easily moved around....

If using LINQ to SQL is there any good reason to learn SQL queries/syntax anymore?

I do understand SQL querying and syntax because of previous work using ASP.NET web forms and stored procedures, but I would not call myself an "expert" in it.
Since I have been using ASP.NET MVC and LinqToSql it seems that so much of the heavy lifting is done for me and encapsulated away at the SQL end that I'm questioning whether there is any benefit in continuing to top-up my knowledge of SQL queries or whether I'm better off focusing my "learning time" on other things.
Your thoughts?
You should absolutely know SQL and keep your knowledge up-to-date. ORM is designed to ease the pain of doing something tedious that you know how to do, much like a graphing calculator is designed to do something that you can do by hand (and should know how).
The minute you start letting your ORM do things in the database that you don't fully understand is the minute you've lost control over your model.
In my opinion, knowing SQL is more valuable than any vendor specific technology. There will always be cases when those nice prepackaged frameworks will not be able to solve a particular situation and knowledge of advanced SQL will be required.
It is still important to learn SQL queries/syntax. The reason is you need to at least understand how Linq to SQL translate to the database behind the scenes.
This will help you when you find problems, for example something not updating correctly. Or a query performance needs to increase.
It is the same that you need to understand what assembly language is and how it eventually becomes machine language. However in all you don't have to be an expert, but at least be able to write in it and understand it.
It is still important to know SQL and the paradigm (set-based) behind it to be able to create efficient SQL statements, even if your using LinqToSql or any other OR/M.
There will always be situations where you will want to write the query in native SQL because it is not possible to write it in LinqToSql / HQL / whatever, or LinqToSql is just not able to generate a performant query for it.
There will always be situations where you will want to execute an ad-hoc query on a database using native sql, etc...
I think LinqToSQL (or other Linq to SQL providers) should not prevent you of knowing SQL.
When your query is not returning what you expect, or when it takes 30 minutes to run on the production database, you'd better be able to understand what LTS has generated, and why it is failing.
I know, it's a rehashed topic, and it might not be applicable to what you do ("small" database that will never hit that kind of problem etc), but it pays not to get too oblivious of abstraction layers sometimes.
The other reason is, Linq does not the whole range of what you can do in SQL, so you might have to resort to writing "raw" SQL, even if the result is materialised as objects.
It depends what you're working on, and from what you said it might make more sense to focus on other areas.
Having said that I find knowing SQL allows the following:
The ability to write queries to extract data from systems easily.
For adhoc queries, or for checking things.
The ability to write complex stored procedures, which allows me to group complex data processing in one place, where it should be, in the database.
The ability to fine tune LinqToSql by adding indexes, and understanding the SQL/query plan's it procedures.
Most of these are more of a help on more complex systems, so if you're not working on those it might not be as much of a help.
It may help in your situation to list the technologies which might be of use, and then prioritise them.
In order words make a development plan for yourself, which may encompass more then just learning technical knowledge but allow a more broad focus like design patterns, communication skills and other areas.
SQL is a tool. Linq to SQL is also a tool. Having more tools in your belt is a good thing. It'll give you more perspectives when attacking a problem.
Consider a scenario where you may want to do multiple queries or multiple updates to the db in one operation. If you can write TSQL you can potentially save yourself a lot of roundtrips to the database.
I would say you definately need to know your SQL in depth, because you need to know what code your Linq-expression generates and what effects the code will have if you want high performing queries. Sure you might get the job done in most cases, but sometimes there is a huge difference in performance in very subtle difference in Linq-syntax.
I ran into this this morning actually, where I had done .Any(d => d.Id == (...).First().Id) instead of doing where (...).Any(i => i.Id == d.Id). This resulted in the query executing five times slower.
Sometimes you need to analyze the actual Sql-query to realise the mistakes you make.
Its always a good think to learn the underlying language for stuff like Linq To SQL. SQL is pretty much standardized and it will help you understand a new paradigm in programming.
You may not always be working in .NET.
Doesn't hurt to know the underlying concepts.
LINQ to SQL is not being maintained anymore in favor of the Entity Framework
Sooner or later you will run into problems that need at leat a working knowledge of SQL to solve. And sooner or later you will run into requirements that are best realised in the DB (whether in SP-s or in triggers or views or whaterver).
LINQ To SQL will only work with .NET. IF you happen get another job where you are not working with .NET, then you will have to go back to writing Stored Procs.
Knowing SQL will also give you a better understanding of how the server operates as well as possibly making you a better database designer.

Resources