Mnesia Errors case_clause in QLC query without a case clause - erlang

I have the following function for a hacky project:
% The Record variable is some known record with an associated table.
Query = qlc:q([Existing ||
Existing <- mnesia:table(Table),
ExistingFields = record_to_fields(Existing),
RecordFields = record_to_fields(Record),
ExistingFields == RecordFields
]).
The function record_to_fields/1 simply drops the record name and ID from the tuple so that I can compare the fields themselves. If anyone wants context, it's because I pre-generate a unique ID for a record before attempting to insert it into Mnesia, and I want to make sure that a record with identical fields (but different ID) does not exist.
This results in the following (redacted for clarity) stack trace:
{aborted, {{case_clause, {stuff}},
[{db, '-my_func/2-fun-1-',8, ...
It points to the line where I declare Query; however, there is no case clause in sight. What is causing this error?
(I will answer this myself, but I would appreciate a comment explaining how I could achieve what I want.)
EDIT: this wouldn't be necessary if I could simply mark certain fields as unique, and Mnesia had a dedicated insert/1 or create/1 function.

For your example, I think your solution is clearer anyway (although it seems you can pull the record_to_fields(Record) portion outside the comprehension so it isn't calculated over and over).
Yes, list comprehensions can only have generators and filters. But you can cheat a little by writing an assignment as a one-element generator. For instance, you can rewrite your expression as follows:
RecordFields = record_to_fields(Record),
Query = qlc:q([Existing ||
Existing <- mnesia:table(Table),
ExistingFields <- [record_to_fields(Existing)],
ExistingFields == RecordFields
]).
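Python's list comprehensions have the same generator/filter structure as Erlang comprehensions and QLC. That makes it possible to sketch both ideas there: hoisting the loop-invariant record_to_fields(Record) call, and binding a per-row value with a one-element generator. This is an analogy, not QLC code. record_to_fields below is a hypothetical stand-in that drops the first two tuple elements (record name and ID).

```python
def record_to_fields(record):
    """Hypothetical stand-in: drop the record name and ID (first two elements)."""
    return record[2:]


def find_duplicates(table, record):
    # Loop-invariant value: computed once, outside the comprehension.
    record_fields = record_to_fields(record)
    return [
        existing
        for existing in table
        # One-element generator: binds existing_fields once per row.
        for existing_fields in [record_to_fields(existing)]
        # Filter: keep rows whose fields equal the new record's fields.
        if existing_fields == record_fields
    ]


table = [
    ("user", 1, "alice", "a@example.com"),
    ("user", 2, "bob", "b@example.com"),
]
new_record = ("user", 3, "alice", "a@example.com")
print(find_duplicates(table, new_record))  # [('user', 1, 'alice', 'a@example.com')]
```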

As it turns out, the QLC DSL does not allow assignments, only generators and filters, as per the documentation:
Syntactically QLCs have the same parts as ordinary list comprehensions:
[Expression || Qualifier1, Qualifier2, ...]
Expression (the template) is any Erlang expression. Qualifiers are either filters or generators. Filters are Erlang expressions returning boolean(). Generators have the form Pattern <- ListExpression, where ListExpression is an expression evaluating to a query handle or a list.
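This means we cannot use variable assignments within a QLC query. Presumably, the match ExistingFields = record_to_fields(Existing) is therefore treated as a filter. It evaluates to the field tuple rather than to true or false, and the code generated by the qlc parse transform branches on that value with a case expression. That would explain where the case_clause comes from.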
Thus my only option, insofar as I know, is to simply write out the query as:
Query = qlc:q([Existing ||
Existing <- mnesia:table(Table),
record_to_fields(Existing) == record_to_fields(Record)
]).

Related

Ecto's fragment allowing SQL injection

When Ecto queries get more complex and require clauses like CASE...WHEN...ELSE...END, we tend to depend on Ecto's fragment to solve it.
e.g. query = from t in <Model>, select: fragment("SUM(CASE WHEN status = ? THEN 1 ELSE 0 END)", 2)
In fact, the most popular Stack Overflow post on this topic suggests creating a macro like this:
defmacro case_when(condition, do: then_expr, else: else_expr) do
quote do
fragment(
"CASE WHEN ? THEN ? ELSE ? END",
unquote(condition),
unquote(then_expr),
unquote(else_expr)
)
end
end
so you can use it this way in your Ecto queries:
query = from t in <Model>,
select: case_when t.status == 2
do 1
else 0
end
At the same time, in another post, I found this:
(Ecto.Query.CompileError) to prevent SQL injection attacks, fragment(...) does not allow strings to be interpolated as the first argument via the `^` operator, got: `"exists (\n SELECT 1\n FROM #{other_table} o\n WHERE o.column_name = ?)"
Well, it seems Ecto's team figured out that people are using fragment to solve complex queries without realizing it can lead to SQL injection, so they don't allow string interpolation there as a way to protect developers.
Then comes another guy who says "don't worry, use macros."
I'm not an Elixir expert, but that seems like a workaround to DO USE string interpolation, bypassing the fragment protection.
Is there a way to use fragment and be sure the query was parameterized?
SQL injection here would result from using string interpolation with external data. Imagine where: fragment("column = '#{value}'") instead of the correct where: fragment("column = ?", value). If value comes from your params (the usual name of the second argument of a Phoenix action, which holds the parameters extracted from the HTTP request), then yes, this could result in SQL injection.
But the problem with prepared statements is that you can't substitute a parameter (the ? in the fragment/1 string) with some dynamic SQL part (for example, something as simple as an operator), so you don't really have a choice. Say you would like to write fragment("column #{operator} ?", value) because operator is dynamic and depends on conditions. As long as operator doesn't come from the user (it is hardcoded somewhere in your code), it is safe.
I don't know if you are familiar with PHP (PDO in the following examples), but this is exactly the same as $bdd->query("... WHERE column = '{$_POST['value']}'") (injecting a value by string interpolation), as opposed to $stmt = $bdd->prepare('... WHERE column = ?') followed by $stmt->execute([$_POST['value']]); (a correct prepared statement).
Coming back to the dynamic operator: as stated earlier, you can't bind an arbitrary SQL fragment. With > as the operator and 'foo' as the value, the DBMS would interpret "WHERE column ? ?" roughly as WHERE column '>' 'foo', which is not syntactically correct. So the easiest way to make the operator dynamic is to write "WHERE column {$operator} ?", injecting the operator (and only the operator) by string interpolation or concatenation. If $operator is defined by your own code (e.g. $operator = some_condition ? '>' : '=';), it's fine. If, on the other hand, it involves a superglobal that comes from the client, like $_POST or $_GET, this creates a security hole (SQL injection).
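To see the difference concretely outside Ecto, here is a minimal sketch using Python's built-in sqlite3 module. The customers table and the payload are made up for illustration. Interpolating the value lets a crafted string rewrite the WHERE clause, while binding it as a parameter treats the same string as plain data.

```python
import sqlite3


def make_db():
    # Hypothetical demo table with two rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (lastname TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?)", [("Smith",), ("Jones",)])
    return conn


def unsafe_lookup(conn, value):
    # DON'T: the value is spliced into the SQL text, so quotes in it become SQL.
    sql = f"SELECT lastname FROM customers WHERE lastname = '{value}' ORDER BY lastname"
    return [row[0] for row in conn.execute(sql)]


def safe_lookup(conn, value):
    # DO: the value is sent as a bound parameter and never parsed as SQL.
    sql = "SELECT lastname FROM customers WHERE lastname = ? ORDER BY lastname"
    return [row[0] for row in conn.execute(sql, (value,))]


conn = make_db()
payload = "x' OR '1'='1"
print(unsafe_lookup(conn, payload))  # ['Jones', 'Smith']: the WHERE clause was rewritten
print(safe_lookup(conn, payload))    # []: no customer has that literal name
```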
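The dynamic-operator pattern can be sketched the same way, again with Python's sqlite3 standing in for Ecto or PDO. The table, the column and the ALLOWED_OPERATORS set are hypothetical. Only an operator from a fixed whitelist is interpolated into the SQL text, and the value is still bound. Trying to bind the operator itself as a parameter is rejected by the SQL parser.

```python
import sqlite3

# Operators we are willing to splice into SQL text; anything else is rejected.
ALLOWED_OPERATORS = {"=", "<>", "<", ">", "<=", ">=", "LIKE"}


def filter_by_lastname(conn, operator, value):
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"operator not allowed: {operator!r}")
    # Only the whitelisted operator is interpolated; the value is still bound.
    sql = f"SELECT lastname FROM customers WHERE lastname {operator} ? ORDER BY lastname"
    return [row[0] for row in conn.execute(sql, (value,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (lastname TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("Smith",), ("Jones",)])

print(filter_by_lastname(conn, "LIKE", "S%"))  # ['Smith']

# A placeholder can only stand for a value, not an operator: this fails to parse.
try:
    conn.execute("SELECT lastname FROM customers WHERE lastname ? ?", ("LIKE", "S%"))
except sqlite3.OperationalError as exc:
    print("rejected by SQLite:", exc)
```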
TL;DR
Then comes another guy who says "don't worry, use macros."
The answer by Aleksei Matiushkin in the mentioned post is just a workaround for the string interpolation that fragment/1 forbids, used to dynamically inject a known operator. If you reuse this trick (and you can't really do otherwise), you'll be fine as long as you don't blindly inject arbitrary values coming from the user.
UPDATE:
It seems, after all, that fragment/1 (whose source I didn't inspect) doesn't imply a prepared statement: the ? are not placeholders of a true prepared statement. I tried a simple, deliberately naive query like the following:
from(
Customer,
where: fragment("lastname ? ?", "LIKE", "%")
)
|> Repo.all()
At least with PostgreSQL/postgrex, the generated query in console appears to be in fact:
SELECT ... FROM "customers" AS c0 WHERE (lastname 'LIKE' '%') []
Note the [] (empty list of parameters) at the end and the absence of $1 in the query. So it seems to act like PDO's emulated prepared statements in PHP: Ecto (or postgrex?) escapes the values and injects them directly into the query. Still, as said above, LIKE became a string (note the ' surrounding it), not an operator, so the query fails with a syntax error.

Query against a Postgres array column type

TL;DR I'm wondering what the pros and cons are (or whether they are even equivalent) of @> '{as_champion, whatever}' versus IN ('as_champion', 'whatever'). Details below:
I'm working with Rails and Postgres' array column type, but I have to use raw SQL for my query because the Rails finder methods don't play nicely with it. I found a way that works, but I'm wondering what the preferred method is:
The roles column on the Memberships table is my array column. It was added via Rails like so:
add_column :memberships, :roles, :text, array: true
When I examine the table, it shows the type as text[] (not sure if that is truly how Postgres represents an array column or if that is Rails shenanigans).
To query against it I do something like:
Membership.where("roles @> ?", '{as_champion, whatever}')
From the fine Array Operators manual:
Operator: @>
Description: contains
Example: ARRAY[1,4,3] @> ARRAY[3,1]
Result: t (AKA true)
So @> treats its operand arrays as sets and checks whether the right side is a subset of the left side.
IN is a little different and is used with subqueries:
9.22.2. IN
expression IN (subquery)
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the case where the subquery returns no rows).
or with literal lists:
9.23.1. IN
expression IN (value [, ...])
The right-hand side is a parenthesized list of scalar expressions. The result is "true" if the left-hand expression's result is equal to any of the right-hand expressions. This is a shorthand notation for
expression = value1
OR
expression = value2
OR
...
So a IN b more or less means:
Is the value a equal to any of the values in the list b (which can be a query producing single element rows or a literal list).
Of course, you can say things like:
array[1] in (select some_array from ...)
array[1] in (array[1], array[2,3])
but the arrays in those cases are still treated like single values (that just happen to have some internal structure).
If you want to check whether an array contains any of a list of values, then @> isn't what you want. Consider this:
array[1,2] @> array[2,4]
4 isn't in array[1,2], so array[2,4] is not a subset of array[1,2].
If you want to check whether someone has both roles, then:
roles @> array['as_champion', 'whatever']
is the right expression, but if you want to check whether roles contains any of those values, then you want the overlaps operator (&&):
roles && array['as_champion', 'whatever']
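The difference between the contains operator (spelled @> in PostgreSQL), overlaps (&&) and IN can be modelled with Python sets. This is a simplified illustration that ignores NULLs and multidimensional arrays, not PostgreSQL's implementation. The examples mirror the ones in this answer.

```python
def contains(left, right):
    """Models `left @> right`: every element of right appears in left."""
    return set(right) <= set(left)


def overlaps(left, right):
    """Models `left && right`: the arrays share at least one element."""
    return bool(set(left) & set(right))


def in_list(value, candidates):
    """Models `value IN (c1, c2, ...)`: value equals one candidate as a whole."""
    return any(value == c for c in candidates)


roles = ["as_champion", "admin"]
print(contains([1, 4, 3], [3, 1]))                   # True
print(contains([1, 2], [2, 4]))                      # False: 4 is missing
print(contains(roles, ["as_champion", "whatever"]))  # False: needs both roles
print(overlaps(roles, ["as_champion", "whatever"]))  # True: shares as_champion
print(in_list("as_champion", ["as_champion", "whatever"]))  # True: scalar IN
print(in_list([1], [[1], [2, 3]]))                   # True: arrays compared as whole values
```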
Note that I'm using the "array constructor" syntax for the arrays everywhere, that's because it is much more convenient for working with a tool (such as ActiveRecord) that knows to expand an array into a comma delimited list when replacing a placeholder but doesn't fully understand SQL arrays.
Given all that, we can say things like:
Membership.where('roles @> array[?]', %w[as_champion whatever])
Membership.where('roles @> array[:roles]', :roles => some_ruby_array_of_strings)
and everything will work as expected. You're still working with little SQL snippets (as ActiveRecord doesn't have a full understanding of SQL arrays or any way of representing the @> operator), but at least you won't have to worry about quoting problems. You could probably go through AREL to manually add @> support, but I find that AREL quickly devolves into an incomprehensible and unreadable mess for all but the most trivial uses.

multiple line where clauses

I've got a search page with multiple inputs (text fields). These inputs may or may not be empty - depending on what the user is searching for.
In order to accommodate this I create a base searchQuery object that pulls in all the correct relationships, and then for each non-empty input I modify the query using the searchQuery.Where function.
If I place multiple conditions in the WHERE clause I get the following error:
Cannot compare elements of type 'System.Collections.Generic.ICollection`1'. Only primitive types, enumeration types and entity types are supported.
searchQuery = searchQuery.Where(Function(m) (
(absoluteMinimumDate < m.ClassDates.OrderBy(Function(d) d.Value).FirstOrDefault.Value) _
OrElse (Nothing Is m.ClassDates)
)
)
I know that code looks funky, but I was trying to format it so you didn't have to scroll horizontally to see it all.
Now, if I remove the ORELSE clause, everything works (but of course I don't get the results I need).
searchQuery = searchQuery.Where(Function(m) (
(absoluteMinimumDate < m.ClassDates.OrderBy(Function(d) d.Value).FirstOrDefault.Value)
)
)
This one works fine
So, what am I doing wrong? How can I make a multi-condition where clause?
Multiple conditions in the Where aren't the problem. m.ClassDates Is Nothing will never be true and doesn't make sense in SQL terms: you can't translate "is the set of ClassDates associated with this record NULL?" into SQL. What you actually mean is: are there zero of them?
If there are no attached ClassDate records, m.ClassDates will be an empty list. You want m.ClassDates.Count = 0 OrElse...
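A small in-memory Python model of the intended predicate may help. The Member class and the dates are hypothetical, and this is not LINQ-to-Entities code. An entity with no class dates has an empty collection rather than a null one, so the check is on the count. Placing that check first means the short-circuiting `or` never asks for the earliest date of an empty list.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Member:
    name: str
    # No attached dates means an empty list, never None.
    class_dates: list = field(default_factory=list)


def matches(member, absolute_minimum_date):
    # Count = 0 OrElse ...: the emptiness test comes first, and `or`
    # short-circuits, so min() is never called on an empty list.
    return (len(member.class_dates) == 0
            or absolute_minimum_date < min(member.class_dates))


members = [
    Member("no dates"),
    Member("later", [date(2020, 1, 5)]),
    Member("earlier", [date(2019, 12, 1), date(2020, 3, 1)]),
]
cutoff = date(2020, 1, 1)
print([m.name for m in members if matches(m, cutoff)])  # ['no dates', 'later']
```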

Getting lots of data from Mnesia - fastest way

I have a record:
-record(bigdata, {mykey,some1,some2}).
Is doing a
mnesia:match_object({bigdata, mykey, some1,'_'})
the fastest way fetching more than 5000 rows?
Clarification:
Creating "custom" keys is an option (so I can do a read), but is doing 5000 reads faster than match_object on one single key?
I'm curious as to the problem you are solving, how many rows are in the table, etc., without that information this might not be a relevant answer, but...
If you have a bag, then it might be better to use read/2 on the key and then traverse the list of records being returned. It would be best, if possible, to structure your data to avoid selects and match.
In general, select/2 is preferred to match_object, as it tends to better avoid full table scans. Also, dirty_select is going to be faster than select/2, assuming you do not need transactional support. And, if you can live with the constraints, Mnesia allows you to go against the underlying ETS table directly, which is very fast. Look at the documentation first, though, as that approach is appropriate only in very rarefied situations.
Mnesia is more of a key-value storage system, and a match on a non-key field has to traverse all its records to find the matches.
To fetch quickly, you should design the storage structure to support the query directly: make some1 the key, or add an index on it, then fetch records with read or index_read.
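The cost difference can be sketched in Python as a language-neutral model (not Mnesia's actual implementation). Matching on a non-key field inspects every record. A secondary index maps the field value straight to the matching keys, which is roughly what making some1 the key or adding an Mnesia index gives you. The record layout mirrors -record(bigdata, {mykey,some1,some2}), and the data is synthetic.

```python
from collections import defaultdict

# Synthetic records shaped like {mykey, some1, some2}.
records = [(f"key{i}", i % 10, i) for i in range(50_000)]

# Keyed storage, roughly like a set table keyed on mykey.
table = {rec[0]: rec for rec in records}


def match_some1(value):
    # Like match_object on a non-key field: every record is inspected.
    return [rec for rec in table.values() if rec[1] == value]


# Secondary index some1 -> keys, roughly what an index on some1 maintains.
some1_index = defaultdict(list)
for key, rec in table.items():
    some1_index[rec[1]].append(key)


def index_read_some1(value):
    # Like index_read: jump straight to the matching keys, then read each one.
    return [table[key] for key in some1_index.get(value, [])]


print(len(index_read_some1(3)))  # 5000
```

Both functions return the same records. The indexed version only touches the 5000 matching entries instead of all 50000, at the price of keeping some1_index up to date on every write.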
What the fastest way to return more than 5000 rows is depends on the problem in question. What is the database structure? What do we want? What is the record structure? After that, it boils down to how you write your read functions. If we are sure about the primary key, we use mnesia:read/1 or mnesia:read/2. If not, it's better (and more beautiful) to use query list comprehensions (QLC). They are more flexible for searching nested records and for complex conditional queries. See the usage below:
-include_lib("stdlib/include/qlc.hrl").
-record(bigdata, {mykey,some1,some2}).
%% query list comprehensions
select(Q)->
%% to prevent against nested transactions
%% to ensure it also works whether table
%% is fragmented or not, we will use
%% mnesia:activity/4
case mnesia:is_transaction() of
false ->
F = fun(QH)-> qlc:e(QH) end,
mnesia:activity(transaction,F,[Q],mnesia_frag);
true -> qlc:e(Q)
end.
%% to read by a given field or even several
%% you use a list comprehension and pass the guards
%% to filter those records accordingly
read_by_field(some2,Value)->
QueryHandle = qlc:q([X || X <- mnesia:table(bigdata),
X#bigdata.some2 == Value]),
select(QueryHandle).
%% selecting by several conditions
read_by_several()->
%% you can pass as many guard expressions
QueryHandle = qlc:q([X || X <- mnesia:table(bigdata),
X#bigdata.some2 =< 300,
X#bigdata.some1 > 50
]),
select(QueryHandle).
%% Its possible to pass a 'fun' which will do the
%% record selection in the query list comprehension
auto_reader(ValidatorFun)->
QueryHandle = qlc:q([X || X <- mnesia:table(bigdata),
ValidatorFun(X) == true]),
select(QueryHandle).
read_using_auto()->
F = fun({bigdata,_SomeKey,_,_Some2}) -> true;
(_) -> false
end,
auto_reader(F).
So I think that to find the fastest way, we need more clarification and detail about the problem. Speed depends on many factors, my dear!

How to match ets:match against a record in Erlang?

I have heard that specifying records through tuples in the code is a bad practice: I should always use record fields (#record_name{record_field = something}) instead of plain tuples {record_name, value1, value2, something}.
But how do I match the record against an ETS table? If I have a table with records, I can only match with the following:
ets:match(Table, {'$1','$2','$3',something})
It is obvious that once I add some new fields to the record definition this pattern match will stop working.
Instead, I would like to use something like this:
ets:match(Table, #record_name{record_field=something})
Unfortunately, it returns an empty list.
The cause of your problem is what the unspecified fields are set to when you do a #record_name{record_field=something}. This is the syntax for creating a record; here you are creating a record/tuple which ETS will interpret as a pattern. When you create a record, all the unspecified fields get their default values: either the ones defined in the record definition or the default default value undefined.
So if you want to give fields specific values then you must explicitly do this in the record, for example #record_name{f1='$1',f2='$2',record_field=something}. Often when using records and ets you want to set all the unspecified fields to '_', the "don't care variable" for ets matching. There is a special syntax for this using the special, and otherwise illegal, field name _. For example #record_name{record_field=something,_='_'}.
Note that in your example you have set the record name element in the tuple to '$1'. The tuple representing a record always has the record name as the first element. This means that when you create the ets table you should set the key position with {keypos,Pos} to something other than the default 1. Otherwise there won't be any indexing and, worse, if you have a table of type 'set' or 'ordered_set' you will only get 1 element in the table. To get the index of a record field you can use the syntax #Record.Field, in your example #record_name.record_field.
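The effect of the default field values can be reproduced with a toy Python model of ets:match/2 pattern semantics. In this model, atoms are represented as strings and only flat tuples are handled, so it is not the real ETS implementation. '_' matches anything, '$1', '$2', ... match anything and are returned, and every other element must be equal. The field names f1 and f2 are hypothetical.

```python
def is_var(term):
    # '$1', '$2', ... are match variables whose values are returned.
    return isinstance(term, str) and term.startswith("$") and term[1:].isdigit()


def match_one(pattern, obj):
    if len(pattern) != len(obj):
        return None
    bindings = {}
    for p, o in zip(pattern, obj):
        if p == "_":
            continue  # don't-care element
        if is_var(p):
            if p in bindings and bindings[p] != o:
                return None
            bindings[p] = o
        elif p != o:
            return None  # literal element must be equal
    # Like ets:match, return bound values ordered by variable number.
    return [bindings[v] for v in sorted(bindings, key=lambda v: int(v[1:]))]


def ets_match(table, pattern):
    return [b for b in (match_one(pattern, obj) for obj in table) if b is not None]


# -record(record_name, {f1, f2, record_field}) instances stored as tuples.
table = [("record_name", 1, 2, "something"), ("record_name", 3, 4, "other")]

# #record_name{record_field=something}: unspecified fields become 'undefined'.
default_pattern = ("record_name", "undefined", "undefined", "something")
# #record_name{record_field=something, _='_'}: unspecified fields become '_'.
wildcard_pattern = ("record_name", "_", "_", "something")
# #record_name{f1='$1', record_field=something, _='_'}
binding_pattern = ("record_name", "$1", "_", "something")

print(ets_match(table, default_pattern))   # []
print(ets_match(table, wildcard_pattern))  # [[]]
print(ets_match(table, binding_pattern))   # [[1]]
```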
Try using
ets:match(Table, #record_name{record_field=something, _='_'})
See this for explanation.
Format you are looking for is #record_name{record_field=something, _ = '_'}
http://www.erlang.org/doc/man/ets.html#match-2
http://www.erlang.org/doc/programming_examples/records.html (see 1.3 Creating a record)
