I have data coming in this format:
{"ROWTIME":1557825832927,"ROWKEY":"null","respondent_id":"noon","machine_data":{"resolution":"1920x1080","region":860}}
When I create a stream called COMPLEX like this:
CREATE STREAM complex WITH (KAFKA_TOPIC='test-topic-complex-2', VALUE_FORMAT='AVRO');
And then run:
SELECT MACHINE_DATA FROM COMPLEX;
it works fine.
Running this:
SELECT MACHINE_DATA->RESOLUTION FROM COMPLEX;
doesn't work, saying RESOLUTION is not a field in MACHINE_DATA. But it clearly is.
I dropped the COMPLEX stream, then recreated it and explicitly declared resolution as a field by creating the stream with this syntax:
CREATE STREAM COMPLEX (respondent_id VARCHAR, machine_data struct<resolution VARCHAR, region INT>) WITH (KAFKA_TOPIC='test-topic-complex-2', VALUE_FORMAT='AVRO');
After this I can run SELECT MACHINE_DATA->RESOLUTION FROM COMPLEX; but I get null as the output for resolution.
Everything works fine when using JSON as the value format. What gives? Could anyone point out what I am doing wrong?
I have set up PipelineDB and it works great! I would like to know if it's possible to stream data out of a continuous view after a value in the view has been updated, that is, to have some external process act on changes to the view.
I wish to stream metrics generated from the views into a dashboard, and I do not want to poll the DB to achieve this.
As of 0.9.5, continuous triggers have been removed in favour of output streams and continuous transforms (first suggested by DidacticTactic). The output of a continuous view is essentially a stream, which means you can create further continuous views or transforms based on it.
Simple Example:
First create a stream and continuous view.
CREATE STREAM s (
  x int
);

CREATE CONTINUOUS VIEW hourly_cv AS
  SELECT
    hour(arrival_timestamp) AS ts,
    SUM(x) AS sum
  FROM s
  GROUP BY ts;
Every continuous view now has an output stream. You can create a transform based on the output of the view using output_of. In the transform you have access to the tuples old and new, which represent the old values and new values respectively (0.9.7 adds a third, delta). So you can create a transform that uses the output of hourly_cv like so:
CREATE CONTINUOUS TRANSFORM hourly_ct AS
  SELECT
    (new).sum
  FROM output_of('hourly_cv')
  THEN EXECUTE PROCEDURE update();
In this example I'm calling update(), which we still need to define. It needs to be a function that returns a trigger:
CREATE OR REPLACE FUNCTION update()
RETURNS trigger AS
$$
BEGIN
  -- Do anything you want here.
  RETURN NEW;
END;
$$
LANGUAGE plpgsql;
I found the 0.9.5 release notes blog post helpful for understanding output streams and why continuous triggers are no more.
Check out the sections in our technical docs on output streams and continuous transforms for help on how to do this, and feel free to ping us in our Gitter channel if you need help beyond what you find in the docs.
I feel like a bit of an idiot, but I have been trying to figure out what the answer could be using the tools Didactic provided, and maybe I am blind, but I have still not found a way. I found the 0.9.3 version of the DB, which included continuous triggers, but the feature has since been removed and I don't wish to switch to an older version of the DB.
This is a bit sad, but I suppose it was moved out of the open-source version of the project to accommodate the real-time analytics dashboard product that the same company provides.
Either way, I solved this issue by using a stored procedure. It's probably slightly inefficient compared to what a built-in function would provide, but I am hitting the DB a few thousand times a minute and my VM's CPU and RAM just yawn at me.
CREATE OR REPLACE FUNCTION all_insert(text, text)
RETURNS void AS
$BODY$
DECLARE
  result text;
BEGIN
  INSERT INTO all_in (streamid, generalinput) VALUES ($1, $2);
  SELECT array_to_json(array_agg(json_build_object('streamId', streamid, 'total', count)))::text
    INTO result
    FROM totals;
  PERFORM pg_notify('totals', result);
END;
$BODY$
LANGUAGE plpgsql;
So my insert and notify are done by calling this single stored procedure. Then my application simply has to listen for PostgreSQL NOTIFY events and handle them appropriately. In the example above, the application would receive a JSON array of objects, each with a particular stream id and the total associated with it.
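In case it helps anyone, the listening side can be a small sketch like the following, assuming a Python client using psycopg2 (the DSN is illustrative; the channel name matches the pg_notify call above, and any driver that exposes LISTEN/NOTIFY works the same way):

import json
import select

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=pipeline user=app")  # illustrative DSN
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN totals;")  # same channel as pg_notify('totals', ...)

while True:
    # Block until the connection's socket becomes readable, then drain
    # any pending notifications.
    if select.select([conn], [], [], 60) == ([], [], []):
        continue  # timed out; keep waiting
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        totals = json.loads(notify.payload)  # [{"streamId": ..., "total": ...}, ...]
        print(totals)  # push these to the dashboard from here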
I am stuck on this small section of my Lua program.
Currently I have created a table named GPUtable; the keys are GPU names and the values are shader core counts. I have used io.write() to create a prompt where the user can input the name of a GPU. I would like to use this input (read with choice = io.read()) to search the table and print the shader core count.
For example, if the user types HD 7950, I would like print(GPUtable[choice]) to print the shader core count and not nil (an error).
Any help is appreciated.
If your table keys are indeed the GPU names, such as 'HD 7950', then indexing the table with square brackets will give you what you are looking for. Without any extra code it is hard to diagnose your issue.
GPUTable = { ['HD 7950'] = 1792, ['GTX 1080'] = 2560 }
print(GPUTable[io.read()])
Try running this in your emulator; it should work flawlessly. Your issue could be somewhere between receiving the input and indexing. Make sure that the key is a string and is not being changed before it is used as an index. Check that your variable names are correct and that you are not accidentally reading from a local variable. It may help to print the choice variable just before indexing to check for that.
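One common gotcha: if the input arrives with stray whitespace (for example a trailing carriage return), the lookup will silently return nil even though the name looks right. A minimal defensive sketch, reusing the table above:

local GPUTable = { ['HD 7950'] = 1792, ['GTX 1080'] = 2560 }

io.write('Enter a GPU name: ')
local choice = io.read()

-- Trim leading/trailing whitespace so 'HD 7950 ' still matches the key.
choice = choice:match('^%s*(.-)%s*$')

print(GPUTable[choice] or ('no entry for "' .. choice .. '"'))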
Using Delphi XE, I have a JvCsvDataset component that is loading a CSV file which has 27 fields.
When the component tries to load the file I get the following error:
Too many fields, or too many long string fields in this record. You must increase the internal record size of the CsvDataSet.
When I try it with a CSV file that has only 24 fields, it works fine.
How do I increase the internal record size of the CsvDataSet?
I've tried to reach Warren Postma who wrote the component but did not hear back from him.
Either specify the length of your fields so that you stay under the default limit, or set TextBufferSize to a bigger value before setting Active to True.
From the last answer on
http://issuetracker.delphi-jedi.org/view.php?id=4768
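For example, something along these lines should work, assuming the component is named JvCsvDataSet1 (the name and the buffer size are illustrative; the size just needs to be large enough for all 27 fields):

// Enlarge the internal record buffer before opening the dataset.
JvCsvDataSet1.Active := False;
JvCsvDataSet1.TextBufferSize := 4096; // illustrative value, larger than the default
JvCsvDataSet1.Active := True;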
It is really important for my application to always emit a "window finished" message, even if the window was empty, and I cannot figure out how to do this. My initial idea was to output an int for each record processed and use Sum.integersGlobally, giving me a singleton per window; I could then emit one summary record per window, with 0 if the window was empty. Of course, this fails: you have to use withoutDefaults, which then emits nothing if the window was empty.
Cloud Dataflow is built around the notion of processing data that is likely to be highly sparse. By design, it does not conjure up data to fill in those gaps of sparseness, since this would be cost-prohibitive for many cases. For a use case like yours, where non-sparsity is practical (creating non-sparse results for a single global key), the workaround is to join your main PCollection with a heartbeat PCollection consisting of empty values. So for the example of Sum.integersGlobally, you would Flatten your main PCollection<Integer> with a secondary PCollection<Integer> that contains exactly one value of zero per window. This assumes you're using an enumerable type of window (e.g. FixedWindows or SlidingWindows; Sessions are by definition non-enumerable).
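For the Sum.integersGlobally case, the Flatten step could look roughly like this (a sketch against the pre-Beam Dataflow Java SDK, so the classes come from com.google.cloud.dataflow.sdk.* and org.joda.time; mainInput and heartbeat are illustrative names, and heartbeat is assumed to already carry one timestamped zero per window):

// Merge the real data with the heartbeat so every window has at least one
// element, then window and sum; withoutDefaults is now safe because no
// window is ever empty.
PCollection<Integer> counts =
    PCollectionList.of(mainInput).and(heartbeat)
        .apply(Flatten.<Integer>pCollections())
        .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(Sum.integersGlobally().withoutDefaults());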
Currently, the only way to do this would be to write a data generator program that injects the necessary stream of zeroes into Pub/Sub with timestamps appropriate for the type of windows you will be using. If you write to the same Pub/Sub topic as your main input, you won't even need to add a Flatten to your code. The downside is that you have to run this as a separate job somewhere.
In the future (once our Custom Source API is available), we should be able to provide a PSource that accepts an enumerable WindowFn plus a default value and generates an appropriate unbounded PCollection.
Is there a limit to either the number of params or the overall size of the params in a TStoredProc ExecProc call?
We are currently running a system that still uses the BDE to connect to Oracle, and a recent change to the number of parameters of a package procedure has started producing access violations. The param count is now up to 291, and the AV is raised in the ExecProc call of TStoredProc.
If we remove a single param from the list (any param; it does not have to be a specific one), the ExecProc call works fine.
I have debugged through the code and the access violation is thrown in the TStoredProc.BindParams procedure within DBTables.pas. I have several watches set up, one of which is SizeOf(FRecordBuffer); as I step through this procedure, its value is 65535, which is MaxWord (Windows.pas). I don't see any specified limits within the DBTables code.
The call stack is TStoredProc.ExecProc -> TStoredProc.CreateCursor -> TStoredProc.GetCursor -> TStoredProc.BindParams, and the access violation is thrown in the for-loop that iterates through FParams.
Thanks in advance; we need to find something we can pinpoint so we can steer clear of it.
I'm not at all versed in Oracle SQL, but since you're maintaining the thing, I would see if I could change the call with all those parameters into a single insert into a new dedicated table (with that many columns plus an autonumber primary key), and change the stored procedure to take this key as input and read the values from that new record to do its job. This may well be quicker than finding out what the maximum number of parameters is and trying to find a fix there. (Though it's a bit of a strange number, as in not a power of 2, so the limit may well be 291...)
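Roughly, the idea would look like this in Oracle SQL (every name here is hypothetical):

-- Stage the values in a dedicated table instead of passing 291 params.
CREATE SEQUENCE proc_args_seq;

CREATE TABLE proc_args (
  id NUMBER PRIMARY KEY,
  p1 VARCHAR2(100),
  p2 NUMBER
  -- ... one column per former parameter, up to p291
);

-- The client performs a single parameterized insert ...
INSERT INTO proc_args (id, p1, p2 /* , ... */)
VALUES (proc_args_seq.NEXTVAL, :p1, :p2 /* , ... */);

-- ... and the package procedure takes only the key and reads the row:
-- PROCEDURE do_work(p_args_id IN proc_args.id%TYPE);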