How to remove an entire Redis hash in minimum time/memory

To delete fields from a hash in Redis we use
HDEL key field [field ...]
where field indicates a field we want to delete from the hash.
The above operation takes O(N) time, where N is the number of fields being removed.
Can't we just remove the hash's reference, i.e. the key in the above-given command?
Would that be the correct way to remove an entire hash in Redis? If not, why?

You can use DEL key; it works with any data type, including hashes.
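As a minimal sketch with the redis-py client (the key and field names here are made up):

import redis  # assumes the redis-py package is installed

r = redis.Redis(decode_responses=True)

r.hset("user:42", mapping={"name": "alice", "visits": "7"})  # build a sample hash
r.delete("user:42")  # removes the whole hash, all fields at once

# For very large hashes, UNLINK (Redis >= 4.0) returns immediately and
# reclaims the memory in a background thread:
r.unlink("user:42")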

Related

What would cause Postgres to lose track of the next ID, and how could I fix it?

I mysteriously got an error in my Rails app locally:
PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "users_pkey"
DETAIL: Key (id)=(45) already exists.
The strange thing is that I didn't specify 45 as the ID. This number came from Postgres itself, which also then complained about it. I know this because when I tried again I got the error with 46. The brute-force fix I used was to just repeat the insertion until it worked, thereby bringing Postgres' idea of the table's next available ID into line with reality.
500.times { User.create({employee_id: 1010101010101, blah_blah: "blah"}) rescue nil }
Since employee_id has a unique constraint, any attempts to create the user after the first successful one fail. And any attempts before the first successful one failed because Postgres tried to use an ID (the table's primary key) that was already taken.
So the brute-force approach works, but it's inelegant and it leaves me wondering what could have caused the database to get into this state. It also leaves me wondering how to check to see whether the production database is similarly inconsistent, and how to fix it (short of repeating the brute-force "fix").
Finding your Sequence
The first step to updating your sequence object is to figure out what the name of your sequence is. To find this, you can use the pg_get_serial_sequence() function.
SELECT pg_get_serial_sequence('table_name','id');
This will output something like public.person_id_seq, which is the relation name (regclass).
In Postgres 10+ there is also a pg_sequences view that you can use to find all sorts of information related to your sequences. The last_value column will show you the current value of the sequence:
SELECT * FROM pg_sequences;
Updating your Sequence
Once you have the sequence name, there are a few ways you can reset the sequence value:
1 - Use setval()
SELECT setval('public.person_id_seq',1020); -- Next value will be 1021
SELECT setval('public.person_id_seq',1020, False); -- Next value will be 1020
2 - Use ALTER SEQUENCE (RESTART WITH)
ALTER SEQUENCE person_id_seq RESTART WITH 1030;
In this case, the value you provide (e.g. 1030) will be the next value returned, so technically the sequence is being reset to <YOUR VALUE> - 1.
3 - Use ALTER SEQUENCE (START WITH, RESTART)
ALTER SEQUENCE person_id_seq START WITH 1030;
ALTER SEQUENCE person_id_seq RESTART;
Using this method is preferred if you need to repeatedly restart to a specific value. Subsequent calls to RESTART will reset the sequence to 1030 in this example.
This sort of thing happens when rows with explicitly specified IDs are inserted into the table. Since the IDs are supplied, Postgres doesn't advance its sequence when inserting, and the sequence then falls out of step with the data in the table. This can happen when rows are inserted manually, copied in from a CSV file, replicated in from another database, etc.
To avoid the issue, simply let Postgres handle the IDs and never specify the ID yourself. However, if the sequence is already out of step, you can fix it with setval() or the ALTER SEQUENCE command (using RESTART WITH or RESTART), as shown above.
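Putting it together, here is a minimal sketch that resyncs the sequence with the table's actual contents, assuming Python with psycopg2 and a users table whose serial column is id:

import psycopg2  # assumes the psycopg2 driver is installed

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # Look up the sequence backing users.id.
    cur.execute("SELECT pg_get_serial_sequence('users', 'id')")
    seq_name = cur.fetchone()[0]
    # setval(seq, max(id), true) makes the next nextval() return max(id) + 1;
    # on an empty table, setval(seq, 1, false) restarts the sequence at 1.
    cur.execute(
        "SELECT setval(%s, COALESCE(MAX(id), 1), MAX(id) IS NOT NULL) FROM users",
        (seq_name,),
    )

The same two statements can of course be run directly in psql.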

Redis get keys for sorted sets where at least one member has score less than N

I have multiple sorted sets in my Redis. Their keys follow this pattern:
user:{userId}:data
where userId is the actual value.
Each member of the corresponding set has a score equal to the timestamp when it was added.
Now I'm trying to figure out how I can retrieve the Redis keys of those sorted sets where at least one member was added before a certain timestamp (meaning that at least one member has a score less than the given timestamp in millis).
I can retrieve all the keys by pattern:
KEYS 'user:*:data'
Actually, I can check the required condition for a single key using the command:
ZRANGEBYSCORE user:{userId}:data -inf {timestamp}
and then checking the size of the returned data.
But is there any way (either a one-liner, piped commands, or a Lua script) to get only those keys of sorted sets where at least one element has a lower score than a given value?
I'd avoid KEYS, and even SCAN, if this query is important (i.e. needs to return fast). Instead, keep another sorted set holding the earliest timestamp each user has, and query that.
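A minimal sketch of that approach with redis-py (the index key name user:data:earliest is made up; the lt option of ZADD assumes Redis >= 6.2 and a recent redis-py):

import time

import redis  # assumes the redis-py package is installed

r = redis.Redis(decode_responses=True)
INDEX_KEY = "user:data:earliest"  # hypothetical index: userId -> earliest score

def add_data(user_id, member, ts=None):
    ts = int(time.time() * 1000) if ts is None else ts
    pipe = r.pipeline()
    pipe.zadd(f"user:{user_id}:data", {member: ts})
    # lt=True only lowers an existing score, so the index always keeps the
    # earliest timestamp seen for each user (new users are simply added).
    pipe.zadd(INDEX_KEY, {user_id: ts}, lt=True)
    pipe.execute()

def users_with_data_before(timestamp_ms):
    # One O(log N + M) query instead of walking every user:*:data key.
    return r.zrangebyscore(INDEX_KEY, "-inf", timestamp_ms)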

Retrieving a statement in Jena by its unique ID

I'm building a REST API which will serve information about statements stored in my Jena TDB.
It would be great if each statement had its own unique ID, so I could use that ID in a GET request to retrieve information about a particular statement. Is there something like that in Jena?
I know I can retrieve statement(s) by providing the appropriate subject/predicate/object identifiers to the model.listStatements method, but it would be quite ugly to add these parameters to API GET requests.
In RDF, a triple is defined by its subject, predicate and object. If you have two triples with the same S/P/O, it is really the same triple (value equality, not instance equality). An RDF graph is a set of triples; if you add a triple twice, the set has only one instance of it. There is no triple-id concept in RDF, and there isn't one internally in TDB.
So you could assign unique identifiers, say strings of length 4, to every S, every P and every O, and save them all as key/value (id/resource, id/property) pairs. A string of length 12 then uniquely identifies a statement.
Even if a statement is deleted and added again (which would produce a different id if you tagged each statement with its own id), this method yields the same identifier for the same statement every time.
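Jena itself is Java, but the idea is language-agnostic; here is a minimal sketch in Python of the same trick using a hash instead of fixed-length per-term ids (the 16-character truncation is an arbitrary choice):

import hashlib

def statement_id(subject, predicate, obj):
    # Deterministic: the same S/P/O always yields the same id, mirroring
    # RDF's value-equality for triples. The separator avoids ambiguous joins.
    canonical = "\x00".join((subject, predicate, obj))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

A REST layer can then keep an id -> (s, p, o) mapping and resolve GET /statements/<id> by calling model.listStatements with those three values.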

How to match nodes by value only (without specifying a property)?

I want to search for nodes by value only, where the value can be in any of the node's properties. I know this is an expensive operation, but the candidates will be cut down by some relationship conditions.
I want something like this:
MATCH (n: {*:"Search value"})
RETURN n
where * implies "any property".
Is there a way to do this?
Interesting tidbits on this topic, and on why it might not be implemented, can be found in this design document:
https://docs.google.com/document/d/1FPfGkgzhcRXVkleBLBsA92U94Mx4yafu3nO-Xf-NzsE/edit#heading=h.pyvdg2rbofq
Semantics of dynamic key expressions
Using a dynamic key expression like <mapExpr>[<keyExpr>] requires that <mapExpr> evaluates to a map or an entity, and that <keyExpr> evaluates to a string. If this is not the case, a type error is produced either at compile time or at runtime.
If this is given, evaluating <mapExpr>[<keyExpr>] first evaluates keyExpr to a string value (the key), and then evaluates <mapExpr> to a map-like value (the map). Finally the result of <mapExpr>[<keyExpr>] is computed by performing a lookup of the key in the map. If the key is found, the associated value becomes the result. If the key is not found, <mapExpr>[<keyExpr>] evaluates to NULL.
Thus the result of evaluating <mapExpr>[<keyExpr>] can be any value (including NULL).
Caveats
Dynamic property lookup might entice users to encode information in property key names. This is bad practice as it interferes with planning, leads to unnatural data models, and might lead to exhausting the available property key id space. This is addressed by issuing a warning when a query uses a dynamic property lookup with a dynamic property key name.
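Building on those semantics, one workaround is to iterate over keys(n) and use a dynamic lookup. A minimal sketch, assuming the official neo4j Python driver and a local instance (note that this scans every node's properties, so it is slow on large graphs):

from neo4j import GraphDatabase  # assumes the neo4j driver package

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # made-up credentials

# any() tests each property key; n[k] is the dynamic lookup described above.
QUERY = """
MATCH (n)
WHERE any(k IN keys(n) WHERE n[k] = $value)
RETURN n
"""

with driver.session() as session:
    for record in session.run(QUERY, value="Search value"):
        print(record["n"])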
To my knowledge, no. It seems to me that what you are really trying to do would be better achieved by creating a search index over the graph using something like Elasticsearch or Solr. This would give you the ability to search across all properties, and your choice of analyzer at indexing time would give you the option of exact or partial value matches.
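As a rough illustration of that route, a sketch with the elasticsearch Python client (8.x API; the index name, document shape and URL are all made up):

from elasticsearch import Elasticsearch  # assumes the elasticsearch package

es = Elasticsearch("http://localhost:9200")

# Mirror each node's properties into a document keyed by the node's id.
es.index(index="nodes", id="42", document={"name": "Search value", "age": 30})
es.indices.refresh(index="nodes")

# query_string searches all fields by default, giving the "any property" match.
hits = es.search(index="nodes", query={"query_string": {"query": '"Search value"'}})
for hit in hits["hits"]["hits"]:
    print(hit["_id"], hit["_source"])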

Can one rely on the auto-incrementing primary key in your database?

In my present Rails application, I am resolving scheduling conflicts by sorting the models by the "created_at" field. However, I realized that when inserting multiple models from a form that allows this, all of the created_at times are exactly the same!
This is more a question of best programming practice: can your application rely on the ID column in your database to increase with each INSERT, reflecting order of creation? To put it another way, can I sort a group of rows pulled from my database by their ID column and be assured this is an accurate sort based on creation order? And is this good practice in my application?
The generated identification numbers will be unique, regardless of whether you use sequences, as in PostgreSQL and Oracle, or another mechanism such as MySQL's auto-increment.
However, sequence values are often acquired in batches of, for example, 20 numbers.
So with PostgreSQL you cannot determine from the id which row was inserted first, and there may even be gaps in the ids of inserted records.
Therefore you shouldn't use a generated id field for a task like that; it would rely on database implementation details.
Generating a created or updated timestamp as part of the command is much better for sorting by creation or update time later on.
For example:
INSERT INTO A (data, created) VALUES ('something', NOW());
UPDATE A SET data = 'something', updated = NOW() WHERE id = 42;
That depends on your database vendor.
MySQL, I believe, absolutely orders auto-increment keys. SQL Server I don't know for sure, but I believe it does as well.
Where you'll run into problems is with databases that don't support this functionality, most notably Oracle, which uses sequences that are roughly, but not absolutely, ordered.
An alternative might be to go for created time and then ID.
I believe the answer to your question is yes... if I read between the lines, I think you are concerned that the system may re-use ID numbers that are 'missing' from the sequence. If you had used 1, 2, 3, 5, 6, 7 as ID numbers, then in all the implementations I know of, the next ID number will always be 8 (or possibly higher), and I don't know of any DB that would notice that record ID 4 is missing and attempt to re-use it.
Though I am most familiar with SQL Server, I don't know why any vendor would try to fill the gaps in a sequence: think of the overhead of keeping a list of unused IDs, as opposed to just tracking the last number used and adding 1.
I'd say you can safely rely on the next assigned ID always being higher than the last, not just unique.
Yes, the id will be unique, and no, you cannot and should not rely on it for sorting; it is there to guarantee row uniqueness only. The best approach is, as emktas indicated, to use a separate "updated" or "created" field for just this information.
For setting the creation time, you can just use a default value like this:
CREATE TABLE foo (
    id INTEGER UNSIGNED AUTO_INCREMENT NOT NULL,
    created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated TIMESTAMP NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB; ## whatever :P
Now, that takes care of creation time. For update time I would suggest a BEFORE UPDATE trigger like this one (it has to be BEFORE rather than AFTER, because NEW values can only be modified in BEFORE triggers; of course you could do it in a separate query, but the trigger, in my opinion, is a better solution because it is more transparent):
DELIMITER $$
CREATE TRIGGER foo_b_upd BEFORE UPDATE ON foo
FOR EACH ROW BEGIN
    SET NEW.updated = NOW();
END$$
DELIMITER ;
And that should do it.
EDIT:
Woe is me. Foolishly I did not specify that this is for MySQL; in other databases there may be some differences in function names (namely NOW) and other subtle itty-bitty details.
One caveat to EJB's answer:
SQL does not give any guarantee of ordering if you don't specify an ORDER BY clause. E.g. if you delete some early rows and then insert new ones, the new rows may end up stored in the same place in the database file as the old ones (albeit with new IDs), and that physical order is what the database may use as its default sort.
FWIW, I typically use ORDER BY id as an effective version of ORDER BY created_at. It's cheaper in that it doesn't require an index on a datetime field (which is bigger, and therefore slower, than a simple integer primary-key index), it is guaranteed to differ between rows, and I don't really care if a few rows that were added at about the same time sort in a slightly different order.
This is probably DB-engine dependent. I would check how your DB implements sequences; if there are no documented problems, then I would decide to rely on the ID.
E.g. a PostgreSQL sequence is OK unless you play with the sequence's cache parameters.
There is a possibility that another programmer will manually create records, or copy records from a different DB, with the wrong ID column. However, I would simplify the problem: do not bother with low-probability cases where someone manually destroys data integrity. You cannot protect against everything.
My advice is to rely on sequence generated IDs and move your project forward.
In theory, yes, the highest id number is the most recently created. Remember, though, that databases have the ability to temporarily turn off the insertion of auto-generated values, insert some records manually, and then turn it back on. Such inserts are not typical on a production system but can happen occasionally when moving a large chunk of data from another system.
