Getting "Subquery failed with error: Field _messagetime not found, please check the spelling and try again." When attempting to use timeslice - sumologic

This is my query.
_sourceCategory=contactlist-prod
[subquery:_sourceCategory=contactlist-prod "recycle"
| count by campaign
| compose campaign keywords]
| parse "Handling export of*contacts" as message
| replace(message, /([^0-9])/, "") as contacts
| count_distinct(contacts) by contacts
| avg(contacts) as avgcontacts
| timeslice 1m
| count by _timeslice
For some reason, I'm getting this error when I try to use timeslice in my Sumo query.
Subquery failed with error: Field _messagetime not found, please check the spelling and try again.
I assume this is because the subquery also needs a timeslice, but I can't see a way to include one there. Is there any way around this?
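For reference, this is a simplified rearrangement I have been experimenting with, on the assumption that timeslice needs the raw _messagetime field and therefore has to run before aggregation operators like count and avg discard it (I haven't confirmed this is the actual cause):
_sourceCategory=contactlist-prod
[subquery:_sourceCategory=contactlist-prod "recycle"
| count by campaign | compose campaign keywords]
| parse "Handling export of*contacts" as message
| replace(message, /([^0-9])/, "") as contacts
| timeslice 1m
| count by _timeslice, contacts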

Related

Swift & Firebase - Cloud firestore scalable?

I'm really new to Cloud Firestore, so structuring the database feels a bit strange to me.
I would like to save my workouts. If I were on the Realtime Database I would do something like this:
WorkoutResults
|
+--AutoID
| |
| +--date
| +--userID
| +--result
AND
UserWorkoutResult
|
+--UserID
| |
| +--WodResultGeneratedID
|
That way, I only have to fetch one node for a specific user.
But if I understand correctly, on Cloud Firestore it's not possible to query subcollections.
So my question is: do you think this structure is good enough to scale?
WorkoutResults
|
+--AutoID
| |
| +--date
| +--userID
| +--result
By doing something like:
.whereField("userID", isEqualTo: "userIDString").whereField("date", isEqualTo: theDateIWant) ?
Your query looks fine to me. And as Firestore promises, its performance is only related to the number of matching WorkoutResults, not to the size of that collection.
But you could get the exact same result by querying collection("Users").doc("userIDString").collection("WorkoutResults").whereField("date", isEqualTo: theDateIWant) in your first data structure. The only thing that isn't possible there is to query across the WorkoutResults for multiple users, since querying across multiple collections isn't possible.
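As a rough Swift sketch of both options (assuming the standard FirebaseFirestore SDK; theDateIWant and the collection names are placeholders taken from the question, not a verified schema):
import FirebaseFirestore

let db = Firestore.firestore()

// Option 1: top-level WorkoutResults collection, filtered by user and date
db.collection("WorkoutResults")
    .whereField("userID", isEqualTo: "userIDString")
    .whereField("date", isEqualTo: theDateIWant)
    .getDocuments { snapshot, error in
        // handle snapshot?.documents or error here
    }

// Option 2: the same results nested under each user document
db.collection("Users").document("userIDString")
    .collection("WorkoutResults")
    .whereField("date", isEqualTo: theDateIWant)
    .getDocuments { snapshot, error in
        // handle snapshot?.documents or error here
    }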

Restrict Sumo Logic search to one timeslice bucket

I have logs pushed to Sumo Logic once a day, but other co-workers can force a push to update the statistics. This causes some Sumo Logic searches to return double (or more) the expected count, because they find more than one message within the allotted time range.
I am wondering whether there is some way to use timeslice so that I only look at the last set of results within a 24-hour period.
My search that works when there is only one log in 24h:
| json field=_raw "Policy"
| count by policy
| sort by _count
What I am trying to achieve:
| json field=_raw "Policy"
| timeslice 1m
| where last(_timeslice)
| count by policy
| sort by _count
Found a solution, not sure if optimal.
| json field=_raw "Policy"
| timeslice 1m
| count by policy , _timeslice
| filter _timeslice in (sort by _timeslice desc | limit 1)
| sort by _count
| fields policy, _count
If I'm understanding your question right, I think you could try something with the accum operator:
*
| json field=_raw "Policy"
| timeslice 1m
| count by _timeslice, policy
| 1 as rank
| accum rank by _timeslice
| where _accum = 1
This would be similar to doing a window partition in SQL to get rid of duplicates.
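In SQL the same idea would look roughly like this (the table and column names are made up purely for illustration):
-- hypothetical table policy_counts(time_slice, policy, slice_count)
SELECT time_slice, policy, slice_count
FROM (
    SELECT time_slice, policy, slice_count,
           ROW_NUMBER() OVER (PARTITION BY time_slice ORDER BY policy) AS rn
    FROM policy_counts
) ranked
WHERE rn = 1;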

Getting "undefined method" when attempting to order the result of a Rails query

I'm using Ruby on Rails 5. I'm having trouble ordering results returned from this query:
@crypto_currencies = CryptoIndexCurrency
                       .all
                       .pluck("crypto_currency_id")
                       .order(:name)
The name field is a valid column of the table I want to order by.
cindex=# \d crypto_currencies;
Table "public.crypto_currencies"
Column | Type | Modifiers
--------------------------+-----------------------------+----------------------------------------------------------------
id | integer | not null default nextval('crypto_currencies_id_seq'::regclass)
name | character varying |
symbol | character varying |
latest_market_cap_in_usd | bigint |
latest_total_supply | bigint |
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
Unfortunately, when I run the above I get the error:
undefined method `order' for #<Array:0x007fa3b2b27bd8>
How do I correct this?
That's right: pluck returns an Array. Here is a note from the documentation:
Pluck returns an Array of attribute values type-casted to match the plucked column names
where, order, and other methods like that return a Relation so that you can chain them; pluck should come at the end of the chain.
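So, roughly, something like this should work, assuming the name you want to order by lives on an associated CryptoCurrency model reachable through a crypto_currency association (that association name is a guess about your schema):
@crypto_currencies = CryptoIndexCurrency
                       .joins(:crypto_currency)          # hypothetical association name
                       .order("crypto_currencies.name")  # order while it is still a Relation
                       .pluck(:crypto_currency_id)       # pluck last; this returns an Array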
Say you're trying to cook eggs for yourself -- you're trying to scramble an egg carton instead of the egg inside the carton.
name is a column of CryptoCurrency, but .pluck returns a plain Ruby Array of the plucked crypto_currency_id values -- and an Array has no order method.
You have to work with the Array directly, either by iterating over it (for example, with .each do) or by accessing an element by index (@crypto_currencies[0]).
For more on how to access the elements of an Array, either by iteration or by index, please consult the official documentation.
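For example, roughly:
# iterate over the plucked values one by one
@crypto_currencies.each do |id|
  puts id
end

# or grab a single element by index
first_id = @crypto_currencies[0]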

Removing duplicates in InfluxDB

I would like to run a query to remove duplicates. What I call a duplicate here is a measurement with more than one data point at the same timestamp. The points have different tags, so they are not overwritten by default, but I would like to remove everything except the most recently inserted point, regardless of the tags.
So for example, measurement of logins (it doesn't really make sense but it's to avoid using abstract entities):
Email   | Name    | TS        | Login Time
--------+---------+-----------+-----------
a@a.com | Alice   | xxxxx1000 | 2017-05-19
a@a.com | Alice   | xxxxx1000 | 2017-05-18
a@a.com | Alice   | xxxxx1000 | 2017-05-17
b@b.com | Bob     | xxxxx1000 | 2017-05-18
c@c.com | Charlie | xxxxx1200 | 2017-05-19
I would like to remove the second and third lines: they have the same timestamp as the first, belong to the same measurement, and differ only in login time, and I only want to keep the latest one.
I know I could work around this with a query, but the requirement is more complex than that (visualization in Grafana of awkward KPI data) and I need to remove actual duplicates (data generated and loaded twice).
Thank you.
You can fetch all login user names using group by and then order by time, so that the latest login time comes up first; then you can delete the remaining ones.
Also, you might need to copy your latest items into another measurement, since you can't delete individual rows in InfluxDB.
For this you might use limit 1 offset 0, so that only the latest login time comes out of the query.
Let me know if I've understood this correctly.
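Roughly, something along these lines; the measurement, tag, and field names ("logins", "email", "login_time") are placeholders, not taken from your data, and I haven't verified every clause combination:
-- most recent login per user: newest point first in each group
SELECT "login_time" FROM "logins" GROUP BY "email" ORDER BY time DESC LIMIT 1 OFFSET 0

-- copy the surviving points into a separate measurement (here using LAST()
-- as an alternative way to pick the newest point per group), since rows
-- can't be deleted in place
SELECT LAST("login_time") AS "login_time" INTO "logins_latest" FROM "logins" GROUP BY "email"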

Cassandra cql kind of multiget

I want to make a query over two column families at once... I'm using the cassandra-cql gem for Rails and my column families are:
users
following
followers
user_count
message_count
messages
Now I want to get all messages from the people a user is following. Is there a kind of multiget with cassandra-cql, or is there another possibility, perhaps by changing the data model, to get this kind of data?
I would call your current data model a traditional entity/relational design. This would make sense to use with an SQL database. When you have a relational database you rely on joins to build your views that span multiple entities.
Cassandra does not have any ability to perform joins. So instead of modeling your data based on your entities and relations, you should model it based on how you intend to query it. For your example of 'all messages from the people a user is following' you might have a column family where the rowkey is the userid and the columns are all the messages from the people that user follows (where the column name is a timestamp+userid and the value is the message):
RowKey Columns
-------------------------------------------------------------------
| | TimeStamp0:UserA | TimeStamp1:UserB | TimeStamp2:UserA |
| UserID |------------------|------------------|------------------|
| | Message | Message | Message |
-------------------------------------------------------------------
You would probably also want a column family with all the messages a specific user has written (I'm assuming that the message is broadcast to all users instead of being addressed to one particular user):
RowKey Columns
--------------------------------------------------------
| | TimeStamp0 | TimeStamp1 | TimeStamp2 |
| UserID |------------|------------|-------------------|
| | Message | Message | Message |
--------------------------------------------------------
Now when you create a new message you will need to insert it in multiple places. But when you need to list all messages from people a user is following, you only need to fetch from one row (which is fast).
Obviously if you support updating or deleting messages you will need to do that everywhere that there is a copy of the message. You will also need to consider what should happen when a user follows or unfollows someone. There are multiple solutions to this problem and your solution will depend on how you want your application to behave.
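In modern CQL terms, the timeline column family above would look roughly like this (table and column names are illustrative, not something your existing schema implies):
-- one partition per follower; messages clustered newest-first
CREATE TABLE timeline (
    user_id   uuid,
    msg_time  timeuuid,
    author_id uuid,
    message   text,
    PRIMARY KEY (user_id, msg_time)
) WITH CLUSTERING ORDER BY (msg_time DESC);

-- fan-out on write: insert the message once for every follower of the author
INSERT INTO timeline (user_id, msg_time, author_id, message)
VALUES (?, now(), ?, ?);

-- reading a user's feed then only touches a single partition
SELECT author_id, message FROM timeline WHERE user_id = ? LIMIT 50;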
