I am inserting a measurement with a guid id as one of the tag like this,
insert cpu,id=acaf4a7d-9f2f-49d6-b87c-1101e76dc56d,asset=a1 value=10
Let say with same id, same asset=a1 and same value=10 I am inserting one more time same series,
insert cpu,id=acaf4a7d-9f2f-49d6-b87c-1101e76dc56d,asset=a1 value=10
Then I have duplicate data for the measurement with only different in time (of course)
Question is, since I have same id, can we prevent duplicate insert?
Related
I have a single table called SPARES
The table has 3 fields, Stock_No, Store_Id, Description
In the data set there are duplicate Stock_Nos and in some cases the descriptions vary
I make one query to extract the duplicate stock Nos then tried a 2nd query based on the first but don’t know how to extract records with duplicate Stock_Nos but with different descriptions
Can anyone help?
Can it be done in one query?
What is the most efficient way to duplicate a row in an Sqlite3 database exactly except with an updated PrimaryKey?
Use an insert .. select where both the insert and select reference the same relation. Then the entire operation will occur as a single statement within SQLite code itself which cannot be beaten for "efficiency".
It will be easier with an auto-PK (just don't select the PK column), although an appropriate natural key can be assigned as well if such can be derived.
See SQLite INSERT SELECT Query Results into Existing Table?
I have a Join table in Rails which is just a 2 column table with ids.
In order to mass insert into this table, I use
ActiveRecord::Base.connection.execute("INSERT INTO myjointable (first_id,second_id) VALUES #{values})
Unfortunately this gives me errors when there are duplicates. I don't need to update any values, simply move on to the next insert if a duplicate exists.
How would I do this?
As an fyi I have searched stackoverflow and most the answers are a bit advanced for me to understand. I've also checked the postgresql documents and played around in the rails console but still to no avail. I can't figure this one out so i'm hoping someone else can help tell me what I'm doing wrong.
The closest statement I've tried is:
INSERT INTO myjointable (first_id,second_id) SELECT 1,2
WHERE NOT EXISTS (
SELECT first_id FROM myjointable
WHERE first_id = 1 AND second_id IN (...))
Part of the problem with this statement is that I am only inserting 1 value at a time whereas I want a statement that mass inserts. Also the second_id IN (...) section of the statement can include up to 100 different values so I'm not sure how slow that will be.
Note that for the most part there should not be many duplicates so I am not sure if mass inserting to a temporary table and finding distinct values is a good idea.
Edit to add context:
The reason I need a mass insert is because I have a many to many relationship between 2 models where 1 of the models is never populated by a form. I have stocks, and stock price histories. The stock price histories are never created in a form, but rather mass inserted themselves by pulling the data from YahooFinance with their yahoo finance API. I use the activerecord-import gem to mass insert for stock price histories (i.e. Model.import columns,values) but I can't type jointable.import columns,values because I get the jointable is an undefined local variable
I ended up using the WITH clause to select my values and give it a name. Then I inserted those values and used WHERE NOT EXISTS to effectively skip any items that are already in my database.
So far it looks like it is working...
WITH withqueryname(first_id,second_id) AS (VALUES(1,2),(3,4),(5,6)...etc)
INSERT INTO jointablename (first_id,second_id)
SELECT * FROM withqueryname
WHERE NOT EXISTS(
SELECT first_id FROM jointablename WHERE
first_id = 1 AND
second_id IN (1,2,3,4,5,6..etc))
You can interchange the Values with a variable. Mine was VALUES#{values}
You can also interchange the second_id IN with a variable. Mine was second_id IN #{variable}.
Here's how I'd tackle it: Create a temp table and populate it with your new values. Then lock the old join values table to prevent concurrent modification (important) and insert all value pairs that appear in the new table but not the old one.
One way to do this is by doing a left outer join of the old values onto the new ones and filtering for rows where the old join table values are null. Another approach is to use an EXISTS subquery. The two are highly likely to result in the same query plan once the query optimiser is done with them anyway.
Example, untested (since you didn't provide an SQLFiddle or sample data) but should work:
BEGIN;
CREATE TEMPORARY TABLE newjoinvalues(
first_id integer,
second_id integer,
primary key(first_id,second_id)
);
-- Now populate `newjoinvalues` with multi-valued inserts or COPY
COPY newjoinvalues(first_id, second_id) FROM stdin;
LOCK TABLE myjoinvalues IN EXCLUSIVE MODE;
INSERT INTO myjoinvalues
SELECT n.first_id, n.second_id
FROM newjoinvalues n
LEFT OUTER JOIN myjoinvalues m ON (n.first_id = m.first_id AND n.second_id = m.second_id)
WHERE m.first_id IS NULL AND m.second_id IS NULL;
COMMIT;
This won't update existing values, but you can do that fairly easily too by using with a second query that does an UPDATE ... FROM while still holding the write table lock.
Note that the lock mode specified above will not block SELECTs, only writes like INSERT, UPDATE and DELETE, so queries can continue to be made to the table while the process is ongoing, you just can't update it.
If you can't accept that an alternative is to run the update in SERIALIZABLE isolation (only works properly for this purpose in Pg 9.1 and above). This will result in the query failing whenever a concurrent write occurs so you have to be prepared to retry it over and over and over again. For that reason it's likely to be better to just live with locking the table for a while.
We've got a data warehouse design with four dimension tables and one fact table:
dimUser id, email, firstName, lastName
dimAddress id, city
dimLanguage id, language
dimDate id, startDate, endDate
factStatistic id, dimUserId, dimAddressId, dimLanguageId, dimDate, loginCount, pageCalledCount
Our problem is: We want to build the fact table which includes calculating the statistics (depending on userId, date range) and filling the foreign keys.
But we don't know how, because we don't understand how to use natural keys (which seems to be the solution to our problem according to the literature we read).
I believe a natural key would be the userId, which is needed in all ETL jobs which calculate the dimension data.
But there are many difficulties:
in the ETL jobs load(), we do bulk inserts with INSERT IGNORE INTO to remove duplicates => we don't know the surrogate keys which were generated
if we create meta data (including a set of dimension_name, surrogate_key, natural_key) this will not work because of the duplicate elimination
The problem seems to be the duplicate elimination strategy. Is there a better approach?
We are using MySQL 5.1, if it makes any difference.
If your fact table is tracking logins and page calls per user, then you should have set of source tables which track these things, which is where you'll load your fact table data from. I would probably build the fact table at the grain of one row per user / login date - or even lower to persist atomic data if at all possible.
Here you would then have a fact table with two dimensions - User and Date. You can persist address and language as dimensions on the fact as well, but these are really just attributes of user.
Your dimensions should have surrogate keys, but also should have the source "business" or "natural" key available - either as an attribute on the dimension itself, or through a mapping table as your colleague suggested. It's not "wrong" to use a mapping table - it does make things easier when there are multiple sources.
If you store the business keys on a mapping table, or in the dimension as an attribue, then for each row to load in the fact, it's a simple lookup (usually via a join) against the dim or mapping table to get the surrogate key for the user (and then from the user to get the user's "current" address / language to persist on the fact). The date dimension usually hase a surrogate key stored in a YYYYMMDD or other "natural" format - you can just generate this from the date information on your source record that you're loading into the fact.
do not force for single query, try to load the data in separated queries and mix the data in some provider...
I have an Order object which can be in an unpaid or paid state. When an order is paid, I want to set an order_number which should be an incrementing number.
Sounds easy enough, but I'm worried about collisions. I can imagine one order having an order_number stored in memory, about to save and then another order saves itself, using that number, now the one in memory should be recalculated, but how?
You can create a database table that just contains an AUTO_INCREMENT primary key. When you need to to get a new order_number just do an insert into this table and get the the value of the primary key for the created row.
There are a lot of approaches. Essentially you need a lock to ensure that each request to the counter always return a different value.
Memcache, Redis and some key-value stores have this kind of counter feature. eg, each time you want to get a new order_number, just call the incr command of Redis, it will increment the counter and return the new value.
A more complex solution can be implemented via the trigger/stored procedure/sequence features of RMDBS(like mysql). For mysql, create a new table contains an AUTO_INCREMENT primary key. When you want to get a new order_number, make an insert into this table and get last_insert_id(). If you want ACID, just wrap the procedure in a transaction.