How to run complex queries in Tarantool - lua

I've always worked with relational DBs and recently decided to migrate a performance-critial service from SQL Server to Tarantool with a hope to take advantage of the fast in-memory search and processing. I've got a couple of questions while planning for the migration.
I've got a table with about one million records containing pricing information which means I'm dealing mostly with numbers and uuids. First, I need to run a select containing multiple conditions to get a subset of the data, like
SELECT * FROM rates WHERE SupplierId = #SupplierId AND ProductId = #ProductId AND (LocalDistributionZoneId = #LocalDistributionZoneId OR LocalDistributionZoneId IS NULL)
Q1: What is the strategy of running such a query in Lua? Do I create an index for each field in the predicate or I can go along with one secondary composite index?
Q2: Will it be more covenient to run such a query in SQL (box.sql.execute) rather than in pure Lua? Will it be considerably slower than running the same query in pure Lua?
Q3: If I use SQL, is it possible to review the execusion plan to make sure that the query I run really uses the indexes I've defined in the space?
Ok, after I've get the results from the first query I need to analyse the data and then based on the results of analysis, run one more query on the dataset returned by the first query.
Q4: Can Tarantool help me in dealing with the intermediate dataset? More specifically, may I somehow run more queries against the intermediate subset of tuples leveraging the indexes created in the space? Or, I would need to implement alternative strategies like re-add the intrim results to a temporary space with pre-defined indexes and then do another select, or implement further search myself?
Thank you!

Don't. Use SQL, it's faster: it doesn't create garbage collected objects for intermediate execution results.
Yes, please use our SQL features for that.
Use EXPLAIN statement.
I don't know what you exactly mean by "help". You could try to whatever strategy works best: create a more complex query, save the original query in a view to use in the resulting query, create a temporary table and work with it. To give more details let's look if the execution plan Tarantool chooses is good enough or you have to manually optimize it.

Related

Is there a way in SumoLogic to store some data and use it in queries?

I have a list of IPs that I want to filter out of many queries that I have in sumo logic. Is there a way to store that list of IPs somewhere so it can be referenced, instead of copy pasting it in every query?
For example, in a perfect world it would be nice to define a list of things like:
things=foo,bar,baz
And then in another query reference it:
where mything IN things
Right now I'm just copying/pasting. I think there may be a way to do this by setting up a custom data source and putting the IPs in there, but that seems like a very round-about way of doing it, and wouldn't help to re-use parts of a query that aren't data (eg re-use statements). Also their template feature is about parameterizing a query, not re-use across many queries.
Yes. There's a notion of Lookup Tables in Sumo Logic. Consult:
https://help.sumologic.com/docs/search/lookup-tables/create-lookup-table/
for details.
It allows to store some values (either manually once, or in a scheduled way as as a result of some query) with | save operator.
And then you can refer to these values using | lookup which is conceptually similar to SQL's JOIN.
Disclaimer: I am currently employed by Sumo Logic.

One single Azure SQL query is consuming almost all query_stats.total_worker_time and query_stats.execution_count

I'm running a production website for 4 years with azure SQL.
With help of 'Top Slow Request' query from alexsorokoletov on github I have 1 super slow query according to Azure query stats.
The one on top is the one that uses a lot of CPU.
When looking at the linq query and the execution plans / live stats, I can't find the bottleneck yet.
And the live stats
The join from results to project is not directly, there is a projectsession table in between, not visible in the query, but maybe under the hood of entity framework.
Might I be affected by parameter sniffing? Can I reset a hash? Maybe the optimized query plan was used in 2014 and now result table is about 4Million rows and the query is far from optimal?
If I run this query in Management Studio its very fast!
Is it just the stats that are wrong?
Regards
Vincent - The Netherlands.
I would suggest you try adding option(hash join) at the end of the query, if possible. Once you start getting into large arity, loops join is not particularly efficient. That would prove out if there is a more efficient plan (likely yes).
Without seeing more of the details (your screenshots are helpful but cut off whether auto-param or forced parameterization has kicked in and auto-parameterized your query), it is hard to confirm/deny this explicitly. You can read more about parameter sniffing in a blog post I wrote a bit longer ago than I care to admit ;) :
https://blogs.msdn.microsoft.com/queryoptteam/2006/03/31/i-smell-a-parameter/
Ultimately, if you update stats, dbcc freeproccache, or otherwise cause this plan to recompile, your odds of getting a faster plan in the cache are higher if you have this particular query + parameter values being executed often enough to sniff that during plan compilation. Your other option is to add optimize for unknown hints which will disable sniffing and direct the optimizer to use an average value for the frequency of any filters over parameter values. This will likely encourage more hash or merge joins instead of loops joins since the cardinality estimates of the operators in the tree will likely increase.

Uniqueness in BatchInserter of Neo4J

I am using a "BatchInserter" to build a graph (in a single thread). I want to make sure nodes (and possibly relationships) are unique. My current solution is to check whether the node exists in the following manner:
String name = (String) nodeProperties.get(IndexKeys.CATEGORY_KEY);
if(index.get(IndexKeys.CATEGORY_KEY, name).size() > 0)
return index.get(IndexKeys.CATEGORY_KEY, name).getSingle();
Long nodeID = inserter.createNode( nodeProperties,categoryLabel );
index.add(nodeID, nodeProperties);
index.flush();
It seems to be working fine but as you can see it is IO expensive (flushing on every new addition - which i believe is a lucene "commit" command). This is slowing down my code considerably.
I am aware of put if absent and uniqueFactory. As documented:
By using put-if-absent functionality, entity uniqueness can be guaranteed using an index.
Here the index acts as the lock and will only lock the smallest part
needed to guaranteed uniqueness across threads and transactions. To
get the more high-level get-or-create functionality make use of
UniqueFactory
However, these are for transaction based interactions with the graph. What I would like to do is to ensure uniqueness of nodes and possibly relationships in a batch insertion semantics, that is faster than my current setup.
Any pointers would be much appreciated.
Thank you
You should investigate the MERGE keyword in cypher. I believe this will permit you to exploit your autoindexes without requiring you to use them yourself. More broadly, you might want to see if you can formulate your bulk load in a way that is conducive to piping large volumes of cypher queries through the neo4j-shell.
Finally, as general pointers and background, you should check out this information on bulk loading
When I encountered this problem, I just decided to go tyrant and force index values in my own. Can't you do the same? I mean, ensure uniqueness before you do the insertions?

Write native SQL in Core Data

I need to write a native SQL Query while I'm using Core Data in my project. I really need to do that, since I'm using NSPredicate right now and it's not efficient enough (in just one single case). I just need to write a couple of subqueries and joins to fetch a big number of rows and sort them by a special field. In particular, I need to sort it by the sum of values of their child-entites. Right now I'm fetching everything using NSPredicate and then I'm sorting my result (array) manually, but this just takes too long since there are many thousands of results.
Please correct me if I'm wrong, but I'm pretty sure this can't be a huge challenge, since there's a way of using sqlite in iOS applications.
It would be awesome if someone could guide me into the right direction.
Thanks in advance.
EDIT:
Let me explain what I'm doing.
Here's my Coredata model:
And here's how my result looks on the iPad:
I'm showing a table with one row per customer, where every customer has an amount of sales he made from January to June 2012 (Last) AND 2013 (Curr). Next to the Curr there's the variance between those two values. The same thing for gross margin and coverage ratio.
Every customer is saved in the Kunde table and every Kunde has a couple of PbsRows. PbsRow actually holds the sum of sales amounts per month.
So what I'm doing in order to show these results, is to fetch all the PbsRows between January and June 2013 and then do this:
self.kunden = [NSMutableOrderedSet orderedSetWithArray:[pbsRows valueForKeyPath:#"kunde"]];
Now I have all customers (Kunde) which have records between January and June 2013.
Then I'm using a for loop to calculate the sum for each single customer.
The idea is to get the amounts of sales of the current year and compare them to the last year.
The bad thing is that there are a lot of customers and the for-loop just takes very long :-(
This is a bit of a hack, but... The SQLite library is capable of opening more than one database file at a given time. It would be quite feasible to open the Core Data DB file (read/only usage) directly with SQLite and open a second file in conjunction with this (reporting/temporary tables). One could then execute direct SQL queries on the data in the Core Data DB and persist them into a second file (if persistence is needed).
I have done this sort of thing a few times. There are features available in the SQLite library (example: full-text search engine) that are not exposed through Core Data.
If you want to use Core Data there is no supported way to do a SQL query. You can fetch specific values and use [NSExpression expressionForFunction:arguments:] with a sum: function.
To see what SQL commands Core Data executes add -com.apple.CoreData.SQLDebug 1 to "Arguments Passed on Launch". Note that this should not tempt you to use the SQL commands youself, it's just for debugging purposes.
Short answer: you can't do this.
Long answer: Core Data is not a database per se - it's not guaranteed to have anything relational backing it, let alone a specific version of SQLite that you can query against. Furthermore, going mucking around in Core Data's persistent store files is a recipe for disaster, especially if Apple decides to change the format of that file in some way. You should instead try to find better ways to optimize your usage of NSPredicate or start caching the values you care about yourself.
Have you considered using the KVC collection operators? For example, if you have an entity Foo each with a bunch of children Bar, and those Bars have a Baz integer value, I think you can get the sum of those for each Foo by doing something like:
foo.bars.#sum.baz
Not sure if these are applicable to predicates, but it's worth looking into.

Is a full list returned first and then filtered when using linq to sql to filter data from a database or just the filtered list?

This is probably a very simple question that I am working through in an MVC project. Here's an example of what I am talking about.
I have an rdml file linked to a database with a table called Users that has 500,000 rows. But I only want to find the Users who were entered on 5/7/2010. So let's say I do this in my UserRepository:
from u in db.GetUsers() where u.CreatedDate = "5/7/2010" select u
(doing this from memory so don't kill me if my syntax is a little off, it's the concept I am looking for)
Does this statement first return all 500,000 rows and then filter it or does it only bring back the filtered list?
It filters in the database since your building your expression atop of an ITable returning a IQueryable<T> data source.
Linq to SQL translates your query into SQL before sending it to the database, so only the filtered list is returned.
When the query is executed it will create SQL to return the filtered set only.
One thing to be aware of is that if you do nothing with the results of that query nothing will be queried at all.
The query will be deferred until you enumerate the result set.
These folks are right and one recommendation I would have is to monitor the queries that LinqToSql is creating. LinqToSql is a great tool but it's not perfect. I've noticed a number of little inefficiencies by monitoring the queries that it creates and tweaking it a bit where needed.
The DataContext has a "Log" property that you can work with to view the queries created. I created a simple HttpModule that outputs the DataContext's Log (formatted for sweetness) to my output window. That way I can see the SQL it used and adjust if need be. It's been worth its weight in gold.
Side note - I don't mean to be negative about the SQL that LinqToSql creates as it's very good and efficient almost every time. Another good side effect of monitoring the queries is you can show your friends that are die-hard ADO.NET - Stored Proc people how efficient LinqToSql really is.

Resources