Missing data using GROUP BY with InfluxDB subqueries

Here is some seed data containing 30 events. For each of the two possible user_id values there are 5 event values, and each of those combinations appears at 3 possible times.
> SELECT time,user_id,event,score_value FROM user_scores
name: user_scores
time user_id event score_value
---- ------- ----- -----------
1517616000000000000 456 card_comment_created 10
1517616000000000000 123 card_comment_created 5
1517616000000000000 123 card_created 5
1517616000000000000 456 card_created 10
1517616000000000000 456 card_liked 10
1517616000000000000 123 card_liked 5
1517616000000000000 123 card_marked_as_complete 5
1517616000000000000 456 card_marked_as_complete 10
1517616000000000000 123 card_viewed 5
1517616000000000000 456 card_viewed 10
1517702400000000000 456 card_comment_created 10
1517702400000000000 123 card_comment_created 5
1517702400000000000 123 card_created 5
1517702400000000000 456 card_created 10
1517702400000000000 456 card_liked 10
1517702400000000000 123 card_liked 5
1517702400000000000 456 card_marked_as_complete 10
1517702400000000000 123 card_marked_as_complete 5
1517702400000000000 123 card_viewed 5
1517702400000000000 456 card_viewed 10
1517788800000000000 456 card_comment_created 10
1517788800000000000 123 card_comment_created 5
1517788800000000000 123 card_created 5
1517788800000000000 456 card_created 10
1517788800000000000 456 card_liked 10
1517788800000000000 123 card_liked 5
1517788800000000000 456 card_marked_as_complete 10
1517788800000000000 123 card_marked_as_complete 5
1517788800000000000 123 card_viewed 5
1517788800000000000 456 card_viewed 10
>
I am downsampling the data into a daily aggregation using the following query:
SELECT \
user_id,total_user_score,smartbites_commented_count,\
smartbites_completed_count,smartbites_consumed_count,\
smartbites_liked_count \
INTO user_scores_daily \
FROM ( \
SELECT SUM(score_value) AS total_user_score \
FROM user_scores \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_commented_count \
FROM user_scores \
WHERE event='card_comment_created' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_completed_count \
FROM user_scores \
WHERE event='card_marked_as_complete' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_consumed_count \
FROM user_scores \
WHERE event='card_viewed' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_liked_count \
FROM user_scores \
WHERE event='card_liked' \
GROUP BY time(1d),user_id \
)
Notice how each of the subqueries is grouping by time(1d) and user_id. I need a row in the results for each user/day combination.
These are my results:
> SELECT * FROM user_scores_daily
name: user_scores_daily
time smartbites_commented_count smartbites_completed_count smartbites_consumed_count smartbites_liked_count total_user_score user_id
---- -------------------------- -------------------------- ------------------------- ---------------------- ---------------- -------
1517616000000000000 1 1 1 1 50 456
1517702400000000000 1 1 1 1 50 456
1517788800000000000 1 1 1 1 50 456
The data for one of the users looks perfect. But what about the second user? There should be six rows in total, but there are only three: the three rows where user_id=123 are missing.
Edit in response to a comment:
> SHOW TAG KEYS FROM "user_scores"
name: user_scores
tagKey
------
actor_id
analytics_version
event
owner_id
role
user_id
> SHOW FIELD KEYS FROM "user_scores"
name: user_scores
fieldKey fieldType
-------- ---------
score_value integer
>

What I ended up doing was adding GROUP BY time(1d),user_id to my top-level query (after the subqueries) and changing the fields selected by my outermost SELECT into aggregations.
These aggregations are redundant, since each group holds exactly one point, but InfluxQL requires every selected field in a GROUP BY time() query to be wrapped in an aggregate function, so I need them.
The code looks like this:
SELECT \
MEAN(user_id) as user_id,\
MEAN(total_user_score) as total_user_score,\
MEAN(smartbites_commented_count) as smartbites_commented_count,\
MEAN(smartbites_completed_count) as smartbites_completed_count,\
MEAN(smartbites_consumed_count) as smartbites_consumed_count,\
MEAN(smartbites_liked_count) as smartbites_liked_count \
INTO user_scores_daily \
FROM ( \
SELECT SUM(score_value) AS total_user_score \
FROM user_scores \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_commented_count \
FROM user_scores \
WHERE event='card_comment_created' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_completed_count \
FROM user_scores \
WHERE event='card_marked_as_complete' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_consumed_count \
FROM user_scores \
WHERE event='card_viewed' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_liked_count \
FROM user_scores \
WHERE event='card_liked' \
GROUP BY time(1d),user_id \
) \
GROUP BY time(1d),user_id
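As a sanity check, the downsampled measurement should now contain six rows, one per user/day: from the seed data, user 456's daily total_user_score is 5 events × 10 = 50 and user 123's is 5 events × 5 = 25, with each count equal to 1. A minimal verification sketch using the 1.x influx CLI (the database name mydb is an assumption, substitute your own):
# Query the downsampled measurement, split out by the user_id tag
influx -database 'mydb' -execute 'SELECT * FROM user_scores_daily GROUP BY user_id'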
That's one MEAN query, if I may say so myself (I'll show myself out).

Related

different behavior of parallel when input is from STDIN

I am using the GNU parallel tool. I have an input file in.txt that looks like this:
export MY_ENV=$1 && echo hi: $MY_ENV
export MY_ENV=$1 && echo hi: $MY_ENV
export MY_ENV=$1 && echo hi: $MY_ENV
export MY_ENV=$1 && echo hi: $MY_ENV
export MY_ENV=$1 && echo hi: $MY_ENV
export MY_ENV=$1 && echo hi: $MY_ENV
I use this command (case 1) to invoke parallel:
parallel -j 4 -a in.txt --link ::: 11 22 33 44
which (as expected) results in this output:
hi: 11
hi: 22
hi: 33
hi: 44
hi: 11
hi: 22
However, when I try to send the input via STDIN using the command below (case 2), I get different behavior. In other words, this command:
cat in.txt | parallel -j 4 --link ::: 11 22 33 44
results in this error message:
/bin/bash: 11: command not found
/bin/bash: 22: command not found
/bin/bash: 33: command not found
/bin/bash: 44: command not found
Shouldn't the behavior be identical? How can I invoke the parallel program so that when the input is via STDIN I get the same output as in case 1 above?
Name STDIN explicitly as an input source:
cat in.txt | parallel -j 4 -a - --link ::: 11 22 33 44
or
cat in.txt | parallel -j 4 --link :::: - ::: 11 22 33 44
or
cat in.txt | parallel -j 4 :::: - :::+ 11 22 33 44
See details on https://doi.org/10.5281/zenodo.1146014 (section 4.2).
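The underlying issue is that once ::: appears on the command line, parallel stops reading its command list from STDIN, so STDIN has to be named explicitly as an input source; -a - and :::: - both do that. With --link the two sources are then paired element-wise, recycling the shorter one, so for example the second form should reproduce the case 1 output:
cat in.txt | parallel -j 4 --link :::: - ::: 11 22 33 44
# Expected output, identical to case 1:
# hi: 11
# hi: 22
# hi: 33
# hi: 44
# hi: 11
# hi: 22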

Why is WLP installUtility not able to obtain assets from the feature repo?

I am running docker build with the following Dockerfile; the main idea is to use a feature repo as described at https://github.com/WASdev/ci.docker#installing-liberty-features-from-local-repository-19008:
FROM websphere-liberty-kernel-ubi-min:19.0.0.9
COPY usr/ /opt/ibm/wlp/usr/
USER root
ARG FEATURE_REPO_URL=http://xyz.openshift.local/19.0.0.9/repo.zip
ARG VERBOSE=true
RUN configure.sh
RUN chown -R 1001:0 /tmp \
&& chmod -R g+rw /tmp \
&& chown -R 1001:0 /opt/ibm/wlp/output \
&& chmod -R g+rw /opt/ibm/wlp/output \
&& chown -R 1001:0 /opt/ibm/wlp/usr/servers/defaultServer \
&& chmod -R g+rw /opt/ibm/wlp/usr/servers/defaultServer \
&& chown -R 1001:0 /opt/ibm/wlp/usr/shared/resources \
&& chmod -R g+rw /opt/ibm/wlp/usr/shared/resources
USER 1001
Docker build output shows that repo.zip is downloaded and missing features are detected:
+ '[' http://xyz.openshift.local/19.0.0.9/repo.zip ']'
+ curl -k --fail http://xyz.openshift.local/19.0.0.9/repo.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  311M  100  311M    0     0  53.4M      0  0:00:05  0:00:05 --:--:-- 52.7M
+ installUtility install --acceptLicense defaultServer --from=/tmp/repo.zip
Checking for missing features required by the server ...
The server requires the following additional features: mpconfig-1.3 transportsecurity-1.0 cdi-2.0 mpopenapi-1.0 jaxws-2.2 jsonp-1.1 jpa-2.2 mprestclient-1.3 mphealth-2.1 wssecurity-1.1 jaxrs-2.1. Installing features from the repository ...
Successfully connected to the configured repository.
but then installation of all features fails:
Preparing assets for installation. This process might take several minutes to complete.
CWWKF1259E: Unable to obtain the following assets: mpconfig-1.3 transportsecurity-1.0 cdi-2.0 mpopenapi-1.0 jaxws-2.2 jsonp-1.1 jpa-2.2 mprestclient-1.3 mphealth-2.1 wssecurity-1.1 jaxrs-2.1. Ensure that the specified assets are valid. To find the IDs of applicable assets, run the installUtility find command.
I have taken a look in the downloaded repo.zip and I can find the following file (which should match one of the missing features):
So, what is the reason for the error?
Using a different image (ibmcom/websphere-liberty:some21.0.0.3version) and pointing FEATURE_REPO_URL to where the 21.0.0.3 repo is hosted works.
So whoever prepared the 19.0.0.9 image for me and told me which feature repo to use pointed me at incompatible artefacts.
What is interesting is that when I combined different versions of image and repo, configure.sh was nicely verbose (it explained that there is an incompatibility), but the error CWWKF1259E: Unable to obtain the following assets is really unhelpful.
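When debugging this kind of image/repo mismatch, it helps to inspect the archive directly and to query it with installUtility itself (the find command that the CWWKF1259E message recommends). A sketch, assuming the repository was downloaded to /tmp/repo.zip as in the build output above:
# List the assets bundled in the downloaded repository archive
unzip -l /tmp/repo.zip | grep -i -E 'mpconfig|jaxrs'
# Ask installUtility which applicable assets it can resolve
/opt/ibm/wlp/bin/installUtility find jaxrs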

Not getting results from docker api list tasks with filters

I am trying to get a list of tasks using the Docker API. I am running the command below:
curl -v --connect-timeout 5 --max-time 10 --retry 5 'https://${DOCKER_URL}/tasks?filters={%22service%22:{%22test1%22}}' | jq ..
When I run this I get:
curl: (3) [globbing] nested brace in column 51
I have also tried another way, as below:
curl -s -q --connect-timeout 5 --max-time 10 --retry 5 --data-urlencode 'filters={"service":["test1"]}' https://${DOCKER_URL}/tasks
{"message":"page not found"}
You need to add -g as a parameter to curl (see also Passing a URL with brackets to curl) and use the URL without encoding, like:
curl -g -v --connect-timeout 5 --max-time 10 --retry 5 \
'http://127.0.0.1:2375/tasks?filters={"service":["test1"]}' | jq .
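As a side note, the --data-urlencode attempt failed for a different reason: without -G, curl sends the data as a POST body, and POST /tasks is not a valid Docker API route, hence the page not found response. A sketch of a GET with proper URL encoding (same endpoint assumed):
# -G appends the url-encoded data to the URL as a query string
curl -s -G --connect-timeout 5 --max-time 10 --retry 5 \
--data-urlencode 'filters={"service":["test1"]}' \
"https://${DOCKER_URL}/tasks" | jq .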

join and paste command

I have 4 files.
bash-3.2$ cat result2.txt
HOSTNAME=host4 2
HOSTNAME=host1 2
HOSTNAME=host6 1
HOSTNAME=host3 1
HOSTNAME=host2 1
bash-3.2$ cat result1.txt
HOSTNAME=host1 2
HOSTNAME=host2 1
bash-3.2$ cat result.txt
HOSTNAME=host1 2
HOSTNAME=host2 1
HOSTNAME=host3 1
bash-3.2$ cat result3.txt
HOSTNAME=host4 3
HOSTNAME=host1 4
HOSTNAME=host3 7
HOSTNAME=host2 8
HOSTNAME=host6 6
bash-3.2$ join -1 1 -2 1 -a 1 -a 1 result2.txt result1.txt
HOSTNAME=host4 2
HOSTNAME=host1 2
HOSTNAME=host6 1
HOSTNAME=host3 1
HOSTNAME=host2 1
I would like to join the files even when the order and the values of the first column are not the same in each file.
I want the output to be
hostname result result1 result2 result3
HOSTNAME=host1 2 2 2 4
HOSTNAME=host2 1 1 1 8
HOSTNAME=host3 1 0 1 7
HOSTNAME=host4 0 0 2 3
HOSTNAME=host6 0 0 1 6
Even the paste command does not work, as it assumes the first column of both files is the same. Is there any other command in bash that I can use to get this output?
Update: You changed the question significantly after I had already answered it. Now you say that you have 4 files instead of just 2.
However, the basic logic stays the same; we just need to join again with the result of the previous join operation (here r0.txt, r1.txt, r2.txt and r3.txt stand for result.txt, result1.txt, result2.txt and result3.txt):
join -o auto -j1 -a1 -a2 -e0 \
<(join -o auto -j1 -a 1 -a 2 -e 0 \
<(join -o auto -j 1 -a 1 -a 2 -e 0 \
<(sort r1.txt) <(sort r0.txt)) <(sort r2.txt)) <(sort r3.txt)
Output:
HOSTNAME=host1 2 2 2 4
HOSTNAME=host2 1 1 1 8
HOSTNAME=host3 0 1 1 7
HOSTNAME=host4 0 0 2 3
HOSTNAME=host6 0 0 1 6
You are looking for the following command:
join -o '1.1 1.2 2.2' -j 1 -a 1 -a 2 -e 0 <(sort r2.txt) <(sort r1.txt)
Output:
HOSTNAME=host1 2 2
HOSTNAME=host2 1 1
HOSTNAME=host3 1 0
HOSTNAME=host4 2 0
HOSTNAME=host6 1 0
Explanation:
-j 1 is the same as -1 1 -2 1 (which you had). It means "join by field 1 in both files"
-a 1 -a 2 prints un-joinable lines from file1 and file2
-e 0 uses 0 as the default value for empty columns
<(sort file) is a so-called process substitution; join needs its inputs sorted on the join field
-o '1.1 1.2 2.2' tells join that you want to output field 1 from file1 and field2 from file1 and file2. If one of the files is missing field2, a 0 will be used because of -e 0.
This is a solution to the first version of the requirement, with just two files. For the solution covering multiple files, check hek2mgl's answer!
What about using awk for this? It is just a matter of storing the data from the second file (result1.txt) in an array and then printing accordingly when reading the first one (result2.txt):
$ awk 'FNR==NR {data[$1]=$2; next} {print $0, ($1 in data) ? data[$1] : 0}' f2 f1
HOSTNAME=host4 2 0
HOSTNAME=host1 2 2
HOSTNAME=host6 1 0
HOSTNAME=host3 1 0
HOSTNAME=host2 1 1
If you need this to be sorted, pipe to sort: awk '...' f2 f1 | sort or say awk '...' f2 <(sort f1).
How does this work?
awk 'things' f2 f1
reads the file f2 and then the file f1.
FNR==NR {data[$1]=$2; next}
FNR is the record number within the current input file, while NR is the cumulative record number across all files, so the two are equal only while the first file is being read. This way, saying FNR==NR allows you to do something only when reading the first file. Here, that consists of storing the data in an array: data[first field] = second field. Then, next skips to the next line without doing anything else. You can read more about this technique in Idiomatic awk.
{print $0, ($1 in data) ? data[$1] : 0}
Now we are reading the second file. Here, we check whether the first field is present in the array. If so, we append the corresponding value stored from the first file read; otherwise, we append a 0.
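For completeness, the same awk idea extends to all four files from the question; a sketch (filenames as in the question, output piped through sort to order the hosts):
# Remember each host's value per input file, then print one row per
# host, filling in 0 where a file had no line for that host.
awk '
FNR == 1 { nf++ }                   # entering the next input file
{ val[$1, nf] = $2; hosts[$1] = 1 } # store value per host per file
END {
    for (h in hosts) {
        line = h
        for (i = 1; i <= nf; i++)
            line = line " " ((h, i) in val ? val[h, i] : 0)
        print line
    }
}' result.txt result1.txt result2.txt result3.txt | sort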

list grep results in one line?

A quick question: ls . | grep -E "^[0-9]" gives me the results in the following format:
1
2
3
4
5
How can I let it be simply displayed as 1 2 3 4 5?
Try
ls . | grep -E "^[0-9]" | tr '\n' ' ' ; echo
try this with tr:
your cmd ... | tr '\n' ' '
try ls . | grep -E "^[0-9]" | tr '\n' ' '
Using awk
ls . | awk '/^[0-9]/ {printf "%s ",$0}'
Or, more cleanly:
ls . | awk '/^[0-9]/ {printf "%s ",$0} END {print ""}'
If it is available, you can use the column command from bsdmainutils:
ls | grep '^[0-9]' | column
Output:
1 2 3 4 5
Another test:
seq 50 | column
Example output:
1 6 11 16 21 26 31 36 41 46
2 7 12 17 22 27 32 37 42 47
3 8 13 18 23 28 33 38 43 48
4 9 14 19 24 29 34 39 44 49
5 10 15 20 25 30 35 40 45 50
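If none of the names contain characters that need quoting, xargs (whose default command is echo) also joins its input into a single line:
ls . | grep -E "^[0-9]" | xargs
Output:
1 2 3 4 5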
