reading mnesia tables where key is a tuple and the search criteria contains '_' underscores - erlang

I have a 3rd or 4th normal form mnesia database, and the table in question should be a key/value hash; however, the architect put both the keys and the values in the key portion of the record.
It looks something like:
-record(idx,{key,value}).
...
Invoice = 1,
Date = now(),
K1 = {?NORMAL_TYPE1,Invoice,Date},
mnesia:write(#idx{key=K1}).
...
So the question is: if I only know the Invoice number, can I still get the data from the DB in the same amount of time as if the Invoice were the only key instead of the tuple?
K2 = {?NORMAL_TYPE1,Invoice,'_'},
Rec = mnesia:read(#idx{key=K2}).

The short answer: No.
The longer answer: You may have a chance if the table is an ordered_set or something similar, but I would not count on it.
The aside: mnesia is usually not very effective with normalized data. It is usually better to use a standard RDBMS for that.

You can use mnesia:match_object/1,3 or mnesia:select/1,2,3,4, where you provide a pattern that can contain '_' as a don't-care variable. I assume that is what you meant.
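For reference, a rough sketch of such a lookup, using the #idx{} record and the ?NORMAL_TYPE1 macro from the question (untested); note that on a set table this still walks the whole table, because the key is only partially bound:
lookup_by_invoice(Invoice) ->
    %% '_' matches any date in the composite key
    Pattern = #idx{key = {?NORMAL_TYPE1, Invoice, '_'}, value = '_'},
    mnesia:transaction(fun() -> mnesia:match_object(Pattern) end).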

Related

Is there an efficient means of using ActiveRecord#Pluck for my situation?

I need to insert a lot of data into a new database. Like, a lot of data, so even nanoseconds count in the context of this query. I'm using activerecord-import to bulk-insert into Postgres, but that doesn't really matter for the scope of this question. Here's what I need:
I need an array that looks like this for each record in the existing DB:
[uuid, timestamp, value, unit, different_timestamp]
The issue is that the uuid is stored on the parent object that I'm looping through to get to this object, so #pluck works for each component aside from that. More annoying is that it is stored as an actual uuid, not a string, and needs to be stored as a uuid (not a string) in the new db as well. I'm not sure but I think using a SELECT inside of #pluck will return a string.
But perhaps the bigger issue is that I need to perform a conversion on the value of value before it is inserted again. It's a simple conversion, in effect just value / 28 or something, but I'm finding it hard to work that into #pluck without also tacking on #each_with_object or something (which slows this down considerably)
Here's the query as it is right now. It seems really silly to me to load the entire record based on the blockage outlined above. I hope there's an alternative.
rows = [] # accumulates the tuples for the bulk insert
Klass.find_each do |klass|
  Data.where(token: klass.token).find_each do |data|
    rows << [
      klass.uuid,
      data.added_at,
      data.value / conversion,
      data.unit,
      data.created_at
    ]
  end
end
And no, the parent and Data are not associated right now and it's not an option, so I can't eager-load or just call Klass.data (they will be linked after this transition).
So ideally this is what I'm looking for:
Data.where(token: klass.token).pluck(:added_at, :value, :unit, :created_at)
But with the parameters outlined above.
I wonder if you can combine a SQL JOIN with pluck:
Klass
.joins('INNER JOIN datas ON datas.token = klasses.token')
.pluck('klasses.uuid', 'datas.added_at', "datas.value / #{conversion.to_f}", 'datas.unit', 'datas.created_at')

Get record by value Erlang/Mnesia

How does one get records -- by value -- in a more efficient way?
Currently I’m doing this:
Coupon = [P || P <- kvs:all(company_coupon), P#company_coupon.company_id == C#company.id],
My question is aimed at kvs:all(...). In databases it is usually pretty expensive to fetch all entries first and then match them.
Is there a better way?
PS: "lists:keyfind" also needs to be provided with ALL records first, to then run them through the loop.
How are you guys doing it?
Cheers!
One can use kvs:index(table, field, value) if one has set the field as a key before:
#table{name=user,fields=record_info(fields,user), keys = [field]}
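With that in place, the lookup from the question could be written roughly like this (a sketch against the kvs:index/3 call named above, assuming company_id is declared as a key for the company_coupon table; untested):
%% assumes keys = [company_id] in the company_coupon table definition
Coupons = kvs:index(company_coupon, company_id, C#company.id).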
When you use a functional language like Erlang or Lisp, traversing the data is unavoidable in most cases, whereas SQL does not need it. So if you are storing the data in a database like Postgres that supports SQL, it is better to do this kind of lookup in SQL; but if you do not need to persist the data, you are on the right track already.

Mnesia: how to use indexed operations correctly when selecting rows based on criteria involving multiple, indexed columns

Problem:
How to select records efficiently from a table where the select is based on criteria involving two indexed columns.
Example
I have a record,
-record(rec, {key, value, type, last_update, other_stuff}).
I have indexes on key (default), type and last_update columns
type is typically an atom or string
last_update is an integer (unix-style milliseconds since 1970)
I want, for example, all records whose type = Type and which have been updated since a specific timestamp.
I do the following (wrapped in a non-dirty transaction):
lookup_by_type(Type, Since) ->
    MatchHead = #rec{type = Type, last_update = '$1', _ = '_'},
    Guard = {'>', '$1', Since},
    Result = '$_',
    case mnesia:select(rec, [{MatchHead, [Guard], [Result]}]) of
        [] -> {error, not_found};
        Rslts -> {ok, Rslts}
    end.
Question
Is the lookup_by_type function even using the underlying indexes?
Is there a better way to utilize indexes in this case?
Is there an entirely different approach I should be taking?
Thank you all
One way, which will probably help you, is to look at QLC queries. These are more SQL-like and declarative, and IIRC they will use indexes by themselves where possible.
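For reference, a sketch of what such a QLC query could look like for the record above (assumes -include_lib("stdlib/include/qlc.hrl") in the module; untested):
lookup_by_type_qlc(Type, Since) ->
    Q = qlc:q([R || R <- mnesia:table(rec),
                    R#rec.type =:= Type,
                    R#rec.last_update > Since]),
    mnesia:transaction(fun() -> qlc:e(Q) end).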
But the main problem is that indexes in mnesia are hashes and thus do not support range queries. So at the moment you can only index efficiently on the type field, not on the last_update field.
One way around that is to make the table an ordered_set and move last_update into the primary key. The key field can then get a secondary index if you need fast access to it. One storage possibility is something like {{last_update, key}, key, type, ...}. You can then answer the query quickly because last_update is orderable.
Another way around it is to store last_update separately: keep a table of {last_update, key} as an ordered_set and use it to limit how much of the larger table a query has to scan.
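As a rough sketch of that second idea (the table name rec_by_time and its layout are illustrative assumptions, not from the post; untested):
%% Side table: an ordered_set whose key is {LastUpdate, Key} and whose value
%% is Key, kept in step with every write to rec. It could be created with e.g.
%%   mnesia:create_table(rec_by_time,
%%       [{type, ordered_set}, {attributes, [time_key, key]}]).
%% Because it is an ordered_set, results come back sorted by update time.
keys_updated_since(Since) ->
    MatchHead = {rec_by_time, {'$1', '_'}, '$2'},
    Guard = {'>', '$1', Since},
    mnesia:transaction(fun() ->
        mnesia:select(rec_by_time, [{MatchHead, [Guard], ['$2']}])
    end).
%% The returned keys can then be read from the main table with
%% mnesia:read(rec, Key) and filtered on type.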
Remember that mnesia is best used as a small in-memory database, so scans are not necessarily a problem: the data is in memory, and scanning it is fast. Its main strength, though, is the ability to do dirty key/value lookups for quick queries.

Rails - insert new data, or increment existing value with update

In my rails app, I have a "terms" model, that stores a term (a keyword), and the frequency with which it appears in a particular document set (an integer). Whenever a new document gets added to the set, I parse out the words, and then I need to either insert new terms, and their frequency, into the terms table, or I need to update the frequency of an existing term.
The easiest way to do this would be to do a find, then if it's empty do an insert, or if it's not empty, increment the frequency of the existing record by the correct amount. That's two queries per word, however, and documents with high word counts will result in a ludicrously long list of queries. Is there a more efficient way to do this?
You can do this really efficiently, actually. Well, if you're not afraid to tweak Rails's default table layout a bit, and if you're not afraid to generate your own raw SQL...
I'm going to assume you're using MySQL for your database (I'm not sure what other DBs support this): you can use INSERT ... ON DUPLICATE KEY UPDATE to do this.
You'll have to tweak your count table to get it to work, though - "on duplicate key" fires on the primary key (or a UNIQUE index), and Rails's default id, which is just an arbitrary number, won't help you. You'll need a key that identifies what makes each record unique - in your case, I'd say PRIMARY KEY(word, document_set_id). This might not be supported by Rails by default, but there's at least one plugin, and probably a couple more, if you don't like that one.
Once your database is set up, you can build one giant insert statement and throw it at MySQL, letting the "on duplicate key" part of the query take care of the nasty existence-checking for you (NOTE: there are plugins to do batch inserts too, but I don't know how they work - particularly with regard to "on duplicate key"):
counts = {}
# This is just demo code! Untested, and it'll leave in punctuation...
@document.text.split(' ').each do |word|
  counts[word] ||= 0
  counts[word] += 1
end
values = []
counts.each_pair do |word, count|
  values << ActiveRecord::Base.send(:sanitize_sql_array, [
    '(?, ?, ?)',
    word,
    @document.set_id,
    count
  ])
end
# Massive line - sorry...
ActiveRecord::Base.connection.execute("INSERT INTO word_counts (word, document_set_id, occurences) VALUES #{values.join(', ')} ON DUPLICATE KEY UPDATE occurences = occurences + VALUES(occurences)")
And that'll do it - one SQL query for the entire new document. Should be much faster, half because you're only running a single query, and half because you've sidestepped ActiveRecord's sluggish query building.
Hope that helps!

How best to map this structure onto amazon SimpleDB

So simpledb has a kind of spreadsheet data model.
I have an app that simply needs to store keys against values. Except that a single key can have multiple values.
There will be multiple clients. Each client has an id with its own set of keys.
I'd like to stick with a single domain if I can at this stage.
How can I map this onto simpleDB?
I was thinking
domain = mydomain
item = clientid
attribute.n.name = key_1 ... key_n
attribute.n.value = val1 ... valn
That would satisfy the ability to store multiple values for the same key.
But then I found that I need to either get ALL attributes in my select or know exactly how many attributes I have, and I will not know this up front.
Also, I allow deleting a specific value from a key (or attribute), so I will have to search for it first. It seems that in the select there is no attributeName() function, just the itemName() function.
Would it perhaps be better to make the item name a combination of id + key + _n ?
e.g. if the id is 'myid' and the key is 'boots' then the item name would be
'myidboots_1'
And then have a single attribute per item called say 'keyval'.
and I can do a
select 'keyval' where itemName like 'myidboots_%' ?
Still kind of cumbersome compared to a normal SQL database.
Maybe I should try encoding the values like a comma separated list?
Except that it's probably more cumbersome and also I've read that there is a 1000 character limit.
Any other suggestions?
I'm not sure I totally follow your question, but I think it might be helpful to point out that SimpleDB lets you do classic SQL style queries like:
select * from foo where bar = '1'
This will return all the attributes/values for the resulting records.
