What is the syntax for DSE Cassandra solr_query for collection types?

I am using DSE Cassandra and want to use solr_query on collection-type columns (map, list, set, etc.), so I am looking for the solr_query syntax for these.
The sample table schema is as follows
CREATE TABLE user_properties (
    id UUID,
    user_id INT,
    properties MAP<text, text>,
    PRIMARY KEY (id)
);
Here, how do I write a solr_query against the 'properties' column?

You will need the schema set up to handle this as a dynamic field. The official DataStax doc is here. Once the setup is done you can follow this doc to set up the query.
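For illustration, a rough sketch of what the query can look like for the table above once the search core and dynamic field are in place. The dynamic-field pattern and the map key 'properties_region' are assumptions made up for the example; check the linked docs for the exact field-naming rules for map columns:
-- Hedged sketch: querying a MAP<text, text> column through DSE Search.
-- Assumes the Solr schema declares a dynamic field covering the column,
-- e.g. <dynamicField name="properties_*" type="StrField" ... />,
-- so each map entry (here the hypothetical key 'properties_region')
-- is indexed as its own Solr field.
SELECT id, user_id, properties
FROM   user_properties
WHERE  solr_query = 'properties_region:emea';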

Related

How to join a KSQL table and a stream on a non row key column

I am using the Community Edition of Confluent Platform, version 5.4.1. I did not find any CLI command to print the KSQL server version, but what I see when I start KSQL can be found in the attached screenshot.
I have a geofence table -
CREATE TABLE GEOFENCE (GEOFENCEID INT,
                       FLEETID VARCHAR,
                       GEOFENCECOORDINATES VARCHAR)
WITH (KAFKA_TOPIC='MONGODB-GEOFENCE',
      VALUE_FORMAT='JSON',
      KEY='GEOFENCEID');
The data comes into the GEOFENCE KSQL table from a Kafka MongoDB source connector whenever an insert or update is performed on the geofence MongoDB collection by a web application backed by a REST API. The idea behind making GEOFENCE a table is that tables are mutable, so it will hold the updated geofence information. Inserts and updates will not be very frequent, and whenever the geofence MongoDB collection changes, the GEOFENCE KSQL table will be updated, since the key here is GEOFENCEID.
I have a live stream of vehicle position -
CREATE STREAM VEHICLE_POSITION (VEHICLEID INT,
                                FLEETID VARCHAR,
                                LATITUDE DOUBLE,
                                LONGITUDE DOUBLE)
WITH (KAFKA_TOPIC='VEHICLE_POSITION',
      VALUE_FORMAT='JSON');
I want to join table and stream like this -
CREATE STREAM VEHICLE_DISTANCE_FROM_GEOFENCE AS
SELECT GF.GEOFENCEID,
GF.FLEETID,
VP.VEHICLEID,
GEOFENCE_UDF(GF.GEOFENCECOORDINATES, VP.LATITUDE, VP.LONGITUDE)
FROM GEOFENCE GF
LEFT JOIN VEHICLE_POSITION VP
ON GF.FLEETID = VP.FLEETID;
But KSQL will not allow me to do this because I am joining on FLEETID, which is not the row key column. This would have been possible in SQL, but how do I achieve it in KSQL?
Note: According to my application's business logic Fleet Id is used to combine Geofences and Vehicles belonging to a fleet.
Sample data for table -
INSERT INTO GEOFENCE (GEOFENCEID, FLEETID, GEOFENCECOORDINATES)
VALUES (10, '123abc', '52.4497_13.3096');
Sample data for stream -
INSERT INTO VEHICLE_POSITION (VEHICLEID, FLEETID, LATITUDE, LONGITUDE)
VALUES (1289, '125abc', 57.7774, 12.7811);
To solve your problem, what you need is a table of FLEETID to GEOFENCECOORDINATES. You could use such a table to join to your VEHICLE_POSITION stream to get the result you need.
So, how do you get a table of FLEETID to GEOFENCECOORDINATES?
The simple answer is that you can't with your current table definition! You declare the table as having only GEOFENCEID as the primary key. Yet a fleetId can have many fences. To be able to model this, both GEOFENCEID and FLEETID would need to be part of the primary key of the table.
Consider the example:
INSERT INTO GEOFENCE VALUES (10, 'fleet-1', 'coords-1');
INSERT INTO GEOFENCE VALUES (10, 'fleet-2', 'coords-2');
After running these two inserts the table would contain only a single row, with key 10 and value 'fleet-2', 'coords-2'.
Even if we could somehow capture the above information in a table, consider what happens if there is a tombstone in the topic, because the first row had been deleted from the source Mongo table. A tombstone is the key, (10), and a null value. ksqlDB would then remove the row from its table with key 10, leaving an empty table.
This is the crux of your problem!
First, you'll need to configure the source connector to get both the fence id and fleet id into the key of the messages.
Next, you'll need to access this in ksqlDB. Unfortunately, ksqlDB as of version 0.10.0 / CP 6.0.0 doesn't support multiple key columns, though support for this is being worked on.
In the meantime, if your key is a JSON document containing the two key fields, e.g.
{
  "GEOFENCEID": 10,
  "FLEETID": "fleet-1"
}
Then you can import it into ksqlDB as a STRING:
-- 5.4.1 syntax:
-- ROWKEY will contain the JSON document, containing GEOFENCEID and FLEETID
CREATE TABLE GEOFENCE (
GEOFENCECOORDINATES VARCHAR
)
WITH (
KAFKA_TOPIC='MONGODB-GEOFENCE',
VALUE_FORMAT='JSON'
);
-- 6.0.0 syntax:
CREATE TABLE GEOFENCE (
JSONKEY STRING PRIMARY KEY,
GEOFENCECOORDINATES VARCHAR
)
WITH (
KAFKA_TOPIC='MONGODB-GEOFENCE',
VALUE_FORMAT='JSON'
);
With the table now correctly defined you can use EXTRACTJSONFIELD to access the data in the JSON key and collect all the fence coordinates using COLLECT_SET. I'm not 100% sure this will work on 5.4.1 (see how you get on), but it will on 6.0.0.
-- 6.0.0 syntax
CREATE TABLE FLEET_COORDS AS
SELECT
EXTRACTJSONFIELD(JSONKEY, '$.FLEETID') AS FLEETID,
COLLECT_SET(GEOFENCECOORDINATES)
FROM GEOFENCE
GROUP BY EXTRACTJSONFIELD(JSONKEY, '$.FLEETID');
This will give you a table of fleetId to a set of fence coordinates. You can use this to join to your vehicle position stream. Of course, your GEOFENCE_UDF will need to accept an ARRAY<STRING> for the fence coordinates, as there may be many.
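For completeness, a hedged sketch of that final join in 6.0.0-style syntax. The FENCES alias is hypothetical and would need to be added to the COLLECT_SET column in FLEET_COORDS, and the DISTANCE alias is also made up for the example:
-- Hedged sketch: join the vehicle position stream to the re-keyed fleet table.
CREATE STREAM VEHICLE_DISTANCE_FROM_GEOFENCE AS
  SELECT
    VP.VEHICLEID,
    FC.FLEETID,
    GEOFENCE_UDF(FC.FENCES, VP.LATITUDE, VP.LONGITUDE) AS DISTANCE
  FROM VEHICLE_POSITION VP
  JOIN FLEET_COORDS FC
    ON VP.FLEETID = FC.FLEETID;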
Good luck!

Add uniqueness constraint to Postgres text array contents [duplicate]

I'm trying to come up with a PostgreSQL schema for host data that's currently in an LDAP store. Part of that data is the list of hostnames a machine can have, and that attribute is generally the key that most people use to find the host records.
One thing I'd like to get out of moving this data to an RDBMS is the ability to set a uniqueness constraint on the hostname column so that duplicate hostnames can't be assigned. This would be easy if hosts could only have one name, but since they can have more than one it's more complicated.
I realize that the fully-normalized way to do this would be to have a hostnames table with a foreign key pointing back to the hosts table, but I'd like to avoid having everybody need to do joins for even the simplest query:
select hostnames.name,hosts.*
from hostnames,hosts
where hostnames.name = 'foobar'
and hostnames.host_id = hosts.id;
I figured using PostgreSQL arrays could work for this, and they certainly make the simple queries simple:
select * from hosts where names #> '{foobar}';
When I set a uniqueness constraint on the hostnames attribute, though, it of course treats the entire list of names as the unique value instead of each name. Is there a way to make each name unique across every row instead?
If not, does anyone know of another data-modeling approach that would make more sense?
The righteous path
You might want to reconsider normalizing your schema. It is not necessary for everyone to "join for even the simplest query". Create a VIEW for that.
Table could look like this:
CREATE TABLE hostname (
hostname_id serial PRIMARY KEY
, host_id int REFERENCES host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
, hostname text UNIQUE
);
The surrogate primary key hostname_id is optional. I prefer to have one. In your case hostname could be the primary key. But many operations are faster with a simple, small integer key. Create a foreign key constraint to link to the table host.
Create a view like this:
CREATE VIEW v_host AS
SELECT h.*
, array_agg(hn.hostname) AS hostnames
-- , string_agg(hn.hostname, ', ') AS hostnames -- text instead of array
FROM host h
JOIN hostname hn USING (host_id)
GROUP BY h.host_id; -- works in v9.1+
Starting with pg 9.1, the primary key in the GROUP BY covers all columns of that table in the SELECT list. From the release notes for version 9.1: "Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause".
Queries can use the view like a table. Searching for a hostname will be much faster this way:
SELECT *
FROM host h
JOIN hostname hn USING (host_id)
WHERE hn.hostname = 'foobar';
Provided you have an index on host(host_id), which should be the case as it should be the primary key. Plus, the UNIQUE constraint on hostname(hostname) implements the other needed index automatically.
In Postgres 9.2+ a multicolumn index would be even better if you can get an index-only scan out of it:
CREATE INDEX hn_multi_idx ON hostname (hostname, host_id);
Starting with Postgres 9.3, you could use a MATERIALIZED VIEW, circumstances permitting. Especially if you read much more often than you write to the table.
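A minimal sketch of that variant, reusing the host/hostname tables and the aggregation from v_host above (mv_host is just a name for the example):
CREATE MATERIALIZED VIEW mv_host AS
SELECT h.*
     , array_agg(hn.hostname) AS hostnames
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;

-- re-populate after the underlying tables change
REFRESH MATERIALIZED VIEW mv_host;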
The dark side (what you actually asked)
If I can't convince you of the righteous path, here is some assistance for the dark side:
Here is a demo how to enforce uniqueness of hostnames. I use a table hostname to collect hostnames and a trigger on the table host to keep it up to date. Unique violations raise an exception and abort the operation.
CREATE TABLE host(hostnames text[]);
CREATE TABLE hostname(hostname text PRIMARY KEY); -- pk enforces uniqueness
Trigger function:
CREATE OR REPLACE FUNCTION trg_host_insupdelbef()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- split UPDATE into DELETE & INSERT
   IF TG_OP = 'UPDATE' THEN
      IF OLD.hostnames IS DISTINCT FROM NEW.hostnames THEN  -- keep going
      ELSE
         RETURN NEW;  -- exit, nothing to do
      END IF;
   END IF;

   IF TG_OP IN ('DELETE', 'UPDATE') THEN
      DELETE FROM hostname h
      USING  unnest(OLD.hostnames) d(x)
      WHERE  h.hostname = d.x;

      IF TG_OP = 'DELETE' THEN
         RETURN OLD;  -- exit, we are done
      END IF;
   END IF;

   -- control only reaches here for INSERT or UPDATE (with actual changes)
   INSERT INTO hostname(hostname)
   SELECT h
   FROM   unnest(NEW.hostnames) h;

   RETURN NEW;
END
$func$;
Trigger:
CREATE TRIGGER host_insupdelbef
BEFORE INSERT OR DELETE OR UPDATE OF hostnames ON host
FOR EACH ROW EXECUTE FUNCTION trg_host_insupdelbef();
SQL Fiddle with test run.
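Roughly what the test run demonstrates (the hostnames are made up; the second insert fails because 'db1' was already registered in hostname by the first):
INSERT INTO host(hostnames) VALUES ('{web1,db1}');  -- OK, registers web1 and db1
INSERT INTO host(hostnames) VALUES ('{db1}');       -- raises a unique / PK violation on hostname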
Use a GIN index on the array column host.hostnames and array operators to work with it:
Why isn't my PostgreSQL array index getting used (Rails 4)?
Check if any of a given array of values are present in a Postgres array
In case anyone still needs what was in the original question:
CREATE TABLE testtable (
  id   serial PRIMARY KEY,
  refs integer[],
  EXCLUDE USING gist (refs WITH &&)
);
INSERT INTO testtable( refs ) VALUES( ARRAY[100,200] );
INSERT INTO testtable( refs ) VALUES( ARRAY[200,300] );
and this would give you:
ERROR: conflicting key value violates exclusion constraint "testtable_refs_excl"
DETAIL: Key (refs)=({200,300}) conflicts with existing key (refs)=({100,200}).
Checked in Postgres 9.5 on Windows.
Note that this creates an index using the && operator. So when you are working with testtable, it can be many times faster to check ARRAY[x] && refs than x = ANY(refs).
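To illustrate the claim (assuming the table and data from above; actual plans depend on table size and statistics):
EXPLAIN SELECT * FROM testtable WHERE refs && ARRAY[200];  -- can use the GiST index behind the exclusion constraint
EXPLAIN SELECT * FROM testtable WHERE 200 = ANY(refs);     -- cannot use it; typically a sequential scan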
P.S. Generally I agree with the answer above. In 99% of cases you'd prefer a normalized schema. Please try to avoid "hacky" stuff in production.

Backendless - Composition of table id

I have created a table in Backendless and I would like to know if I can compose the table's id from 2 fields, like a composite key in a SQL table. Is that possible?
Backendless automatically assigns a unique identifier to every inserted object; it is stored in a system column named "objectId". Additionally, you can assign constraints to your table's columns, which can be:
Indexed
Not null
Unique

DBExpress: How to find a primary key field?

I have a TSimpleDataSet based on a dynamically created SQL query. I need to know which field is the primary key.
SimpleDataSet1.DataSet.SetSchemaInfo(stIndexes, 'myTable' ,'');
This code tells me that I have a primary key named 'someName', but how can I know which field (column) this index works with?
A Primary Key/Index can belong to several columns (not just one).
The schema stIndexes dataset will return the PK name INDEX_NAME and the columns that construct that PK/Index (COLUMN_NAME). INDEX_TYPE will tell you which index types you have (eSQLNonUnique/eSQLUnique/eSQLPrimaryKey).
I have never worked with TSimpleDataSet, but check whether the index information is stored in IndexDefs[TIndexDef].Name/Fields/Options: if ixPrimary is in Options then this is your PK, and Fields lists the columns belonging to that index.
Take a look at the source at SqlExpr.pas: TCustomSQLDataSet.AddIndexDefs.
Note how TCustomSQLDataSet derives the TableName (and then the index information) from the command text:
...
if FCommandType = ctTable then
TableName := FCommandText
else
TableName := GetTableNameFromSQL(CommandText);
DataSet := FSQLConnection.OpenSchemaTable(stIndexes, TableName, '', '', '');
...
I think the simple dataset does not provide that information.
However, I am sure there are components for that. For an Oracle database, check Devart's ODAC.
Basically, it involves only one extra query to the database.
However, it is not something components offer by default, because that extra query leads to slower response times.
For an Oracle database, query user_indexes.
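For example, a hedged sketch using the Oracle data-dictionary views (going through USER_CONSTRAINTS / USER_CONS_COLUMNS rather than USER_INDEXES directly; 'MYTABLE' is a placeholder for your table name):
SELECT cols.column_name, cols.position
FROM   user_constraints  cons
JOIN   user_cons_columns cols ON cols.constraint_name = cons.constraint_name
WHERE  cons.constraint_type = 'P'        -- primary key constraints only
AND    cons.table_name = 'MYTABLE'
ORDER  BY cols.position;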

Rails won't use sequence for primary key?

For one reason or another the pre-existing Postgres schema I'm using with my Rails app doesn't have a default sequence set for a table's primary key, so I am required to query for it every time I want to create a new row.
I have set_sequence_name "seq_people_id" in my model, but whenever I call Person.new Postgres complains to me because Rails is executing the insert query without the ID (which is marked as NOT NULL in the schema).
How do I tell Rails to always use the sequence when creating new records?
Postgres 8.1.4
ActiveRecord 3.0.3
Rails 2.3.10
Here's what I get when I run psql and \d foo:
Table "public.foo"
 Column |  Type   |                    Modifiers
--------+---------+--------------------------------------------------
 id     | integer | not null default nextval('foo_id_seq'::regclass)
(etc.)
I'd check the following:
Verify the actual sequence name is the same as what you reference (people_id_seq vs. seq_people_id)
Verify the table's default is similar to what I have above
(Just checking) Is the primary key's field named "id"?
Did you create the table using a migration or by hand? If the latter, try creating a table with a migration, specifying the same fields as in your people table. Does it work properly? Compare the tables.
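If the default really is missing on the legacy table, a minimal sketch of adding it on the Postgres side (the table name people is assumed from the model; the sequence name comes from the question):
-- make inserts that omit id pull the next value from the existing sequence
ALTER TABLE people
  ALTER COLUMN id SET DEFAULT nextval('seq_people_id');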
