I am looking at using Postgres's json column type through ActiveRecord's JSON handling. I am wondering how I would give the column a default value upon table creation, something like {name: '', other_name: ''} and so on ...
I would also like to understand how, if I create a default JSON value for a column like the example above and later fill in the values, I could reset it back to the "default" at some other time, and what that would look like.
It's just like any other default, once you fix up the json syntax:
CREATE TABLE mytable (
someothercol integer,
somecol json DEFAULT '{"name": "", "other_name": ""}'
);
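If the table already exists, the same default can be added or changed later with ALTER TABLE; this is a minimal sketch against the mytable definition above:
ALTER TABLE mytable
  ALTER COLUMN somecol SET DEFAULT '{"name": "", "other_name": ""}';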
If you set the column to DEFAULT, it does just that:
regress=> INSERT INTO mytable(someothercol, somecol) VALUES (42, '{"nondefault": 1}');
INSERT 0 1
regress=> SELECT * FROM mytable;
someothercol | somecol
--------------+-------------------
42 | {"nondefault": 1}
(1 row)
regress=> UPDATE mytable SET somecol = DEFAULT WHERE someothercol = 42;
UPDATE 1
regress=> SELECT * FROM mytable;
someothercol | somecol
--------------+--------------------------------
42 | {"name": "", "other_name": ""}
(1 row)
How is it possible to mark a row in a ksqlDB table for deletion via the REST API, or at least as a statement in the ksqlDB CLI?
CREATE TABLE movies (
title VARCHAR PRIMARY KEY,
id INT,
release_year INT
) WITH (
KAFKA_TOPIC='movies',
PARTITIONS=1,
VALUE_FORMAT = 'JSON'
);
INSERT INTO MOVIES (ID, TITLE, RELEASE_YEAR) VALUES (48, 'Aliens', 1986);
The following doesn't work for obvious reasons, and a DELETE statement doesn't exist in ksqlDB:
INSERT INTO MOVIES (ID, TITLE, RELEASE_YEAR) VALUES (48, null, null);
Is there a recommended way to produce a tombstone (null) value, or do I need to write it directly to the underlying topic?
There is a way to do this that's a bit of a workaround. The trick is to use the KAFKA value format to write a tombstone to the underlying topic.
Here's an example, using your original DDL.
-- Insert a second row of data
INSERT INTO MOVIES (ID, TITLE, RELEASE_YEAR) VALUES (42, 'Life of Brian', 1986);
-- Query table
ksql> SET 'auto.offset.reset' = 'earliest';
ksql> select * from movies emit changes limit 2;
+--------------------------------+--------------------------------+--------------------------------+
|TITLE |ID |RELEASE_YEAR |
+--------------------------------+--------------------------------+--------------------------------+
|Life of Brian |42 |1986 |
|Aliens |48 |1986 |
Limit Reached
Query terminated
Now declare a new stream that will write to the same Kafka topic using the same key:
CREATE STREAM MOVIES_DELETED (title VARCHAR KEY, DUMMY VARCHAR)
WITH (KAFKA_TOPIC='movies',
VALUE_FORMAT='KAFKA');
Insert a tombstone message:
INSERT INTO MOVIES_DELETED (TITLE,DUMMY) VALUES ('Aliens',CAST(NULL AS VARCHAR));
Query the table again:
ksql> select * from movies emit changes limit 2;
+--------------------------------+--------------------------------+--------------------------------+
|TITLE |ID |RELEASE_YEAR |
+--------------------------------+--------------------------------+--------------------------------+
|Life of Brian |42 |1986 |
Examine the underlying topic:
ksql> print movies;
Key format: KAFKA_STRING
Value format: JSON or KAFKA_STRING
rowtime: 2021/02/22 11:01:05.966 Z, key: Aliens, value: {"ID":48,"RELEASE_YEAR":1986}, partition: 0
rowtime: 2021/02/22 11:02:00.194 Z, key: Life of Brian, value: {"ID":42,"RELEASE_YEAR":1986}, partition: 0
rowtime: 2021/02/22 11:04:52.569 Z, key: Aliens, value: <null>, partition: 0
I am able to run a query against an InfluxDB instance and select all the fields/tags:
select * from http_reqs where time > now() - 4d and "status" =~ /^4/
which returns a list of matching values. The first row looks like this:
time error error_code method name proto scenario status tls_version type url value
But when I try to select only a subset of these fields/tags (according to the documentation), I get no result at all:
select "time","name" from http_reqs where time > now() - 4d and "status" =~ /^4
No matter what I try to select. The documentation seems to be wrong or incorrect!
How am I be able to select the fields/tags I want?
It seems you first have to figure out which keys are "fields" and which are "tags". So you need to run:
show field keys from http_reqs;
in my case this returns
name: http_reqs
fieldKey fieldType
-------- ---------
url string
value float
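To see which keys are tags rather than fields, there is a companion statement (a sketch against the same measurement):
show tag keys from http_reqs;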
As per the documentation, your query has to include at least one field key (for whatever reason).
Then you can query whatever else you need, in addition to that field key, whether you actually want the field or not:
select "url","error" from http_reqs where time > now() - 4d and "status" =~ /^4/
I have a stream in ksql called turnstile_stream. When I query all entries for a particular column value (station_id) in that stream, I get the result below:
ksql> select * from turnstile_stream where station_id = 40820 emit changes;
+----------------------------------------------------+----------------------------------------------------+----------------------------------------------------+----------------------------------------------------+----------------------------------------------------+
|ROWTIME |ROWKEY |STATION_ID |STATION_NAME |LINE |
+----------------------------------------------------+----------------------------------------------------+----------------------------------------------------+----------------------------------------------------+----------------------------------------------------+
|1580720442456 |�Ը�
|40820 |Rosemont |blue |
|1580720442456 |�Ը�
|40820 |Rosemont |blue |
This means there are only two entries in the stream for that station_id, which is correct, since I pushed only two events to the topic the stream is created from. Now, I have a table, which I created with the query shown in the DESCRIBE EXTENDED output below. The query groups by station_id and counts the events in the turnstile_stream stream.
ksql> describe extended turnstile_summary;
Name : TURNSTILE_SUMMARY
Type : TABLE
Key field : STATION_ID
Key format : STRING
Timestamp field : Not set - using <ROWTIME>
Value format : AVRO
Kafka topic : turnstile_summary_1 (partitions: 2, replication: 1)
Field | Type
----------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
STATION_ID | INTEGER
COUNT | BIGINT
----------------------------------------
Queries that write from this TABLE
-----------------------------------
CTAS_TURNSTILE_SUMMARY_6 : CREATE TABLE TURNSTILE_SUMMARY WITH (KAFKA_TOPIC='turnstile_summary_1', PARTITIONS=2, REPLICAS=1, VALUE_FORMAT='AVRO') AS SELECT
TURNSTILE_STREAM.STATION_ID "STATION_ID",
COUNT(*) "COUNT"
FROM TURNSTILE_STREAM TURNSTILE_STREAM
GROUP BY TURNSTILE_STREAM.STATION_ID
EMIT CHANGES;
Now, the problem is, when I query this turnstile_summary table, I get the result below, which doesn't make sense.
ksql> select * from turnstile_summary where station_id = 40820 emit changes;
+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+
|ROWTIME |ROWKEY |STATION_ID |COUNT |
+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+
|1580720442562 |�Ը�
|40820 |9 |
|1580720442562 |�Ը�
|40820 |10 |
As you can see, the count is 9 and 10, which cannot be right, since there are only two rows in the stream for that station_id. I am scratching my head, but to no avail. Any help is highly appreciated.
I made two changes to make this work.
First, the weird characters in the ROWKEY column of the stream and the table were due to the long type in the key Avro schema. I changed the key schema from
{
"type": "record",
"name": "arrival.key",
"fields": [
{
"name": "timestamp",
"type": "long"
}
]
}
to
{
"namespace": "com.udacity",
"type": "record",
"name": "arrival.key",
"fields": [
{
"name": "timestamp",
"type": "string" <<-----------
}
]
}
Second, when declaring the stream, I was giving it a KEY declaration that I should not have. So, I changed my stream definition from
CREATE STREAM turnstile_stream (
station_id INT,
station_name VARCHAR,
line VARCHAR
) WITH (
KAFKA_TOPIC='app.entity.turnstile',
VALUE_FORMAT='AVRO',
KEY='station_id'
);
to
CREATE STREAM turnstile_stream (
station_id INT,
station_name VARCHAR,
line VARCHAR
) WITH (
KAFKA_TOPIC='app.entity.turnstile',
VALUE_FORMAT='AVRO'
);
After making these changes, my aggregate was working properly.
I'm using PostgreSQL 9.4. I've got a resources table which has the following columns:
id
name
provider
description
category
Let's say none of these columns is required (save for the id). I want resources to have a completion level, meaning that resources with NULL values for each column will be at 0% completion level.
Now, each column has a percentage weight. Let's say:
name: 40%
provider: 30%
description: 20%
category: 10%
So if a resource has a provider and a category, its completion level is at 40%.
These weight percentages could change at any time, so having a completion_level column which always stores the current completion level will not work out (there could be millions of resources). For example, at any moment, the percentage weight of description could decrease from 20% to 10% and category's could increase from 10% to 20%. Maybe even other columns could be created and have their own weight.
The final objective is to be able to order resources by their completion levels.
I'm not sure how to approach this. I'm currently using Rails so almost all interaction with the database has been through the ORM, which I believe is not going to be much help in this case.
The only query I've found that somewhat resembles a solution (and not really) is to do something like the following:
SELECT * from resources
ORDER BY CASE WHEN name IS NOT NULL AND
              provider IS NOT NULL AND
              description IS NOT NULL AND
              category IS NOT NULL THEN 100
         WHEN name IS NULL AND provider IS NOT NULL...
However, that way I have to enumerate every possible permutation of columns, and that's pretty bad.
Add a weights table as in this SQL Fiddle:
PostgreSQL 9.6 Schema Setup:
CREATE TABLE resource_weights
( id int primary key check(id = 1)
, name numeric
, provider numeric
, description numeric
, category numeric);
INSERT INTO resource_weights
(id, name, provider, description, category)
VALUES
(1, .4, .3, .2, .1);
CREATE TABLE resources
( id int
, name varchar(50)
, provider varchar(50)
, description varchar(50)
, category varchar(50));
INSERT INTO resources
(id, name, provider, description, category)
VALUES
(1, 'abc', 'abc', 'abc', 'abc'),
(2, NULL, 'abc', 'abc', 'abc'),
(3, NULL, NULL, 'abc', 'abc'),
(4, NULL, 'abc', NULL, NULL);
Then calculate your weights at runtime like this:
Query 1:
select r.*
, case when r.name is null then 0 else w.name end
+ case when r.provider is null then 0 else w.provider end
+ case when r.description is null then 0 else w.description end
+ case when r.category is null then 0 else w.category end weight
from resources r
cross join resource_weights w
order by weight desc
Results:
| id | name | provider | description | category | weight |
|----|--------|----------|-------------|----------|--------|
| 1 | abc | abc | abc | abc | 1 |
| 2 | (null) | abc | abc | abc | 0.6 |
| 3 | (null) | (null) | abc | abc | 0.3 |
| 4 | (null) | abc | (null) | (null) | 0.3 |
SQL's ORDER BY can order things by pretty much any expression; in particular, you can order by a sum. CASE is also fairly versatile (if somewhat verbose), and since it is an expression you can say things like:
case when name is not null then 40 else 0 end
which is more or less equivalent to name.nil? ? 0 : 40 in Ruby.
Putting those together:
order by case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
Somewhat verbose but it'll do the right thing. Translating that into ActiveRecord is fairly easy:
query.order(Arel.sql(%q{
case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
}))
or in the other direction:
query.order(Arel.sql(%q{
case when name is not null then 40 else 0 end
+ case when provider is not null then 30 else 0 end
+ case when description is not null then 20 else 0 end
+ case when category is not null then 10 else 0 end
desc
}))
You'll need the Arel.sql call to avoid deprecation warnings in Rails 5.2+: Rails doesn't want you calling order(some_string) anymore, it wants you ordering by attributes, unless you jump through a small hoop to say that you really mean it.
Sum up weights like this:
SELECT * FROM resources
ORDER BY (CASE WHEN name IS NULL THEN 0 ELSE 40 END
+ CASE WHEN provider IS NULL THEN 0 ELSE 30 END
+ CASE WHEN description IS NULL THEN 0 ELSE 20 END
+ CASE WHEN category IS NULL THEN 0 ELSE 10 END) DESC;
This is how I would do it.
First: Weights
Since you say that the weights can change from time to time, you have to create a structure to handle the changes. It could be a simple table. For this solution, it will be called weights.
-- Table: weights
CREATE TABLE weights(id serial, table_name text, column_name text, weight numeric(5,2));
id | table_name | column_name | weight
---+------------+--------------+--------
1 | resources | name | 40.00
2 | resources | provider | 30.00
3 | resources | description | 20.00
4 | resources | category | 10.00
So, when you need to change category from 10 to 20 and/or description from 20 to 10, you update this structure.
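For example, swapping those two weights could look like this (a sketch against the weights table above):
UPDATE weights SET weight = 20.00 WHERE table_name = 'resources' AND column_name = 'category';
UPDATE weights SET weight = 10.00 WHERE table_name = 'resources' AND column_name = 'description';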
Second: completion_level
Since you say that you could have millions of rows, it is OK to have a completion_level column in the resources table, for efficiency purposes.
Computing the completion_level with a query works, and you could put it in a view. But when you need the data fast and simple and you have MILLIONS of rows, it is better to store the value in a column or in another table.
With a view, every time you run it, it recomputes the data. When the value is already in the table, it's fast and you don't have to recompute anything, just query the data.
But how can you keep completion_level up to date? TRIGGERS.
You would have to create a trigger for resources table. So, whenever you update or insert data, it will create the completion level.
First you add the column to the resources table
ALTER TABLE resources ADD COLUMN completion_level numeric(5,2);
And then you create the trigger:
CREATE OR REPLACE FUNCTION update_completion_level() RETURNS trigger AS $$
BEGIN
NEW.completion_level := (
CASE WHEN NEW.name IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='name') END
+ CASE WHEN NEW.provider IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='provider') END
+ CASE WHEN NEW.description IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='description') END
+ CASE WHEN NEW.category IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='category') END
);
RETURN NEW;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER resources_completion_level
BEFORE INSERT OR UPDATE
ON resources
FOR EACH ROW
EXECUTE PROCEDURE update_completion_level();
NOTE: table weights has a column called table_name; it's just in case you want to expand this functionality to other tables. In that case, you should update the trigger and add AND table_name='resources' in the query.
With this trigger, every time you update or insert, completion_level is kept ready for you, so getting the data is a simple query on the resources table ;)
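For instance, the ordering the question asks for then becomes a plain query (a minimal sketch):
SELECT id, name, completion_level
FROM resources
ORDER BY completion_level DESC;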
Third: What about old data and updates on weights?
Since the trigger only fires on inserts and updates, what about the old data? Or what if I change the weights of the columns?
Well, for those cases you could use a function to recompute completion_level for every row.
CREATE OR REPLACE FUNCTION update_resources_completion_level() RETURNS void AS $$
BEGIN
UPDATE resources set completion_level = (
CASE WHEN name IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='name') END
+ CASE WHEN provider IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='provider') END
+ CASE WHEN description IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='description') END
+ CASE WHEN category IS NULL THEN 0
ELSE (SELECT weight FROM weights WHERE column_name='category') END
);
END $$ LANGUAGE plpgsql;
So every time you update the weights, or need to refresh the old data, you just run the function:
SELECT update_resources_completion_level();
Finally: What if I add columns?
Well, you would have to insert a row for the new column into the weights table and update both functions (the trigger function and update_resources_completion_level()). Once everything is set, you run update_resources_completion_level() to recompute all completion levels according to the changes :D
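As a sketch, the first two steps for a hypothetical new subcategory column (both the column and its weight are made up for illustration) could look like this; the trigger function and update_resources_completion_level() would still need a matching CASE branch added by hand:
ALTER TABLE resources ADD COLUMN subcategory varchar(50);
INSERT INTO weights (table_name, column_name, weight)
VALUES ('resources', 'subcategory', 5.00);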
I have a model for stocks and a model for stock_price_history.
I want to mass insert with this
sqlstatement = "INSERT INTO stock_histories SELECT datapoint1 AS id,
datapoint2 AS `date` ...UNION SELECT datapoint9,10,11,12,13,14,15,16,
UNION SELECT datapoint 17... etc"
ActiveRecord::Base.connection.execute sqlstatement
However, I don't actually want to use datapoint1 AS id. If I leave it blank I get an error that my model has 10 fields and I'm inserting only 9 and that it is missing the primary key.
Is there a way to force an auto increment on the id when inserting by SQL?
Edit: Bonus question because I'm a noob. I am developing on SQLite3 and deploying to Postgres (i.e. Heroku). Will I need to modify the above mass insert statement so it works on a Postgres database?
2nd edit: my initial question used Assets and AssetHistory instead of Stocks and Stock_Histories. I changed it to Stocks / Stock price histories because I thought it was more intuitive, which is why some answers refer to asset histories.
You can change your SQL and be more explicit about which fields you're inserting, and leave id out of the list:
insert into asset_histories ("date") select datapoint2 as "date" ...etc
Here's a long real example:
jim=# create table test1 (id serial not null, date date not null, name text not null);
NOTICE: CREATE TABLE will create implicit sequence "test1_id_seq" for serial column "test1.id"
CREATE TABLE
jim=# create table test2 (id serial not null, date date not null, name text not null);
NOTICE: CREATE TABLE will create implicit sequence "test2_id_seq" for serial column "test2.id"
CREATE TABLE
jim=# insert into test1 (date, name) values (now(), 'jim');
INSERT 0 1
jim=# insert into test1 (date, name) values (now(), 'joe');
INSERT 0 1
jim=# insert into test1 (date, name) values (now(), 'bob');
INSERT 0 1
jim=# select * from test1;
id | date | name
----+------------+------
1 | 2013-03-14 | jim
2 | 2013-03-14 | joe
3 | 2013-03-14 | bob
(3 rows)
jim=# insert into test2 (date, name) select date, name from test1 where name <> 'jim';
INSERT 0 2
jim=# select * from test2;
id | date | name
----+------------+------
1 | 2013-03-14 | joe
2 | 2013-03-14 | bob
(2 rows)
As you can see, only the selected rows were inserted, and they were assigned new id values in table test2. You'll have to be explicit about all the fields you want to insert, and ensure that the ordering of the insert and the select match.
Having said all that, you might want to look into the activerecord-import gem, which makes this sort of thing a lot more Railsy. Assuming you have a bunch of new AssetHistory objects (not persisted yet), you could insert them all with:
asset_histories = []
asset_histories << AssetHistory.new date: some_date
asset_histories << AssetHistory.new date: some_other_date
AssetHistory.import asset_histories
That will generate a single efficient insert into the table, and handle the id for you. You'll still need to query some data and construct the objects, which may not be faster than doing it all with raw SQL, but may be a better alternative if you've already got the data in Ruby objects.
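Under the hood, an import like that boils down to a single multi-row INSERT, roughly of this shape (a sketch with made-up dates; the exact SQL activerecord-import emits may differ):
INSERT INTO asset_histories ("date")
VALUES ('2013-03-14'), ('2013-03-15');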