I've set up a Druid cluster to ingest real-time data from Kafka.
Question
Does Druid support fetching data sorted by timestamp? For example, let's say I need to retrieve the latest 10 entries from a datasource X. Can I do this by using a LimitSpec (in the query JSON) that includes the timestamp field? Or is there a better option supported by Druid?
Thanks in advance.
Get unaggregated rows
To get unaggregated rows, you can do a query with "queryType": "select".
Select queries are also useful when pagination is needed - they let you set a page size, and automatically return a paging identifier for use in future queries.
In this example, if we just want the top 10 rows, we can pass in "pagingSpec": { "pagingIdentifiers": {}, "threshold": 10 }.
Order by timestamp
To order these rows by "timestamp", you can pass in "descending": true.
Looks like most Druid query types support the descending property.
Example Query:
{
"queryType": "select",
"dataSource": "my_data_source",
"granularity": "all",
"intervals": [ "2017-01-01T00:00:00.000Z/2017-12-30T00:00:00.000Z" ],
"descending": "true",
"pagingSpec": { "pageIdentifiers": {}, "threshold": 10 }
}
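When you need the next page, pass the pagingIdentifiers returned in the previous response back into the pagingSpec. A sketch of such a follow-up query (the segment identifier and offset here are made up; use whatever the previous response returned):
{
"queryType": "select",
"dataSource": "my_data_source",
"granularity": "all",
"intervals": [ "2017-01-01T00:00:00.000Z/2017-12-30T00:00:00.000Z" ],
"descending": true,
"pagingSpec": { "pagingIdentifiers": { "my_data_source_2017-01-01T00:00:00.000Z_2017-01-02T00:00:00.000Z_2017-01-01T00:00:00.000Z": 10 }, "threshold": 10 }
}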
Docs on "select" type queries
You can use a groupBy query to do this: group by __time with an extraction function, set granularity to all, and use the limitSpec to sort and limit; that will work. If you want to use a timeseries query instead, getting the latest 10 is trickier. One way is to set the granularity to the desired one, say hour, and set the interval to the 10 hours ending at the most recent point in time. That is easier said than done, so I would go the first way unless you run into a major performance issue.
{
"queryType": "groupBy",
"dataSource": "wikiticker",
"granularity": "all",
"dimensions": [
{
"type": "extraction",
"dimension": "__time",
"outputName": "extract_time",
"extractionFn": {
"type": "timeFormat"
}
}
],
"limitSpec": {
"type": "default",
"limit": 10,
"columns": [
{
"dimension": "extract_time",
"direction": "descending"
}
]
},
"aggregations": [
{
"type": "count",
"name": "$f2"
},
{
"type": "longMax",
"name": "$f3",
"fieldName": "added"
}
],
"intervals": [
"1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"
]
}
Is it possible to get the labels and priority from a Microsoft Planner task with the Microsoft Graph API?
See the screenshot below to get an idea.
Using the following endpoint: https://graph.microsoft.com/v1.0/planner/plans/<plan-id>/tasks I get the following data:
{
"#odata.etag": "W/\"JzEtVGFzayAgQEBAQEBAQEBAQEBAQEBBWCc=\"",
"planId": "r4g58er4grregrg7848",
"bucketId": "64g8df54hhktohk487",
"title": "Title of a task",
"orderHint": "545457845775LM",
"assigneePriority": "",
"percentComplete": 0,
"startDateTime": null,
"createdDateTime": "2022-01-07T13:58:14.5355148Z",
"dueDateTime": null,
"hasDescription": true,
"previewType": "description",
"completedDateTime": null,
"completedBy": null,
"referenceCount": 0,
"checklistItemCount": 1,
"activeChecklistItemCount": 3,
"conversationThreadId": null,
"id": "grejgopreg645647",
"createdBy": {
"user": {
"displayName": null,
"id": "74463467-d67d-4512-9086-c9e279dde6ae"
}
},
"appliedCategories": {
"category5": true
},
"assignments": {}
}
I have the following comments on this JSON:
What is assigneePriority? Even when a priority is filled in, it is always an empty string.
What is appliedCategories? Can these categories be used for the labels? And what is category5?
While it's not the most straightforward answer, you can figure out what labels are assigned to a task. You'll need both the planid and taskid to get it.
The appliedCategories are actually the labels applied to a particular task. Their identifiers are just category##. To find the corresponding label name, you'll need to make a call to get the plan details.
Graph API URL: https://graph.microsoft.com/beta/planner/plans/{planid}/details
This will return a JSON object containing each of the categories and their descriptions. You can find more info here about the plannerPlanDetails type. Note: the v1.0 graph endpoint only returns the first 6 categories, while the beta version will return 25.
"categoryDescriptions": {
"category1": "Some name",
"category2": "Some other name",
"category3": "Another",
"category4": null,
...
"category25": null
}
Within the task details, the appliedCategories object will contain any labels assigned to that task.
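To turn those into human-readable names, join a task's appliedCategories with the plan's categoryDescriptions. A minimal sketch in Swift (the function name is illustrative):
// Resolve label names for a task by joining its appliedCategories
// ("categoryN": true) with the plan's categoryDescriptions.
func labelNames(appliedCategories: [String: Bool],
                categoryDescriptions: [String: String?]) -> [String] {
    return appliedCategories
        .filter { $0.value }  // only categories actually applied to the task
        .map { entry in
            // Fall back to the raw "categoryN" key if the plan gives it no name.
            categoryDescriptions[entry.key].flatMap { $0 } ?? entry.key
        }
}
For the task above, labelNames(appliedCategories: ["category5": true], categoryDescriptions: ["category5": "Some name"]) returns ["Some name"].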
For the priority, you will find a priority property on the task object when using the beta version of the endpoint. It's an integer, and the values correspond to the following priority titles:
9 - Low
5 - Medium
3 - Important
1 - Urgent
You'll have to do some correlation on your own to match them up, but this is how you can get the information you're looking for.
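A small helper reflecting that mapping (a sketch; Planner documents priority as a 0-10 range where lower numbers mean higher priority, so intermediate values are bucketed here):
func priorityTitle(_ priority: Int) -> String {
    switch priority {
    case 0...1: return "Urgent"
    case 2...4: return "Important"
    case 5...7: return "Medium"
    default: return "Low"  // 8...10
    }
}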
I have Druid timeseries query:
{
"queryType": "timeseries",
"dataSource": {
"type": "union",
"dataSources": [
"ds1",
"ds2"
]
},
"dimensions":["dim1"],
"aggregations": [
{
"name": "y1",
"type": "doubleMax",
"fieldName": "value1"
}
],
"granularity": {
"period": "PT10S",
"type": "period"
},
"postAggregations": [],
"intervals": "2017-06-09T13:05:46.000Z/2017-06-09T13:06:46.000Z"
}
And I want the values of the dimensions to be returned as well, not just the aggregations; currently the results look like this:
{
"timestamp": "2017-06-09T13:05:40.000Z",
"result": {
"y1": 28.724306106567383
}
},
{
"timestamp": "2017-06-09T13:05:50.000Z",
"result": {
"y1": 28.724306106567383
}
},
How do I have to change the query? Thanks in advance!
If your requirement is to use a dimension column in a timeseries query, that means you want aggregated data together with a non-aggregated column; this requirement leads to the use of a topN or groupBy query.
The groupBy query is probably one of the most powerful queries Druid currently supports, but it also has the poorest performance; instead, you can use a topN query for your purpose.
The topN documentation and an example can be found here:
http://druid.io/docs/latest/querying/topnquery.html
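For example, the timeseries query above could be rewritten as a topN; a sketch (the threshold of 10 and the choice of "y1" as the sort metric are assumptions):
{
"queryType": "topN",
"dataSource": { "type": "union", "dataSources": [ "ds1", "ds2" ] },
"dimension": "dim1",
"metric": "y1",
"threshold": 10,
"granularity": { "type": "period", "period": "PT10S" },
"aggregations": [ { "name": "y1", "type": "doubleMax", "fieldName": "value1" } ],
"intervals": "2017-06-09T13:05:46.000Z/2017-06-09T13:06:46.000Z"
}
Each time bucket in the result will then carry dim1 values alongside the aggregated y1.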
Does the timeseries() query not support dimensions?
I tried it in my project but it is not working.
Here is the error:
TypeError: queryRep.dataSource(...).dimension is not a function
2|DSP-api | at dimensionData (/home/ec2-user/reports/dsp_reports/controllers/ReportController.js:228:22)
Let me know if anyone has a solution for this.
TY.
I have the following JSON data:
{
"Display_Selected List":
[
{
"product_name": "Product1",
"items":
[
{
"item_name": "SubItem1",
"specifications":
[
{
"list": [
{
"name": "Sp1"
},
{
"name": "Sp2"
}
],
"specification_name": "Specification Group 1"
},
{
"list": [
{
"name": "Sp3"
},
{
"name": "Sp4"
}
],
"specification_name": "Specification Group 2"
}
]
},
{
"item_name": "Sub Item2",
"specifications":
[
{
"list": [
{
"name": "Sp2"
}
],
"specification_name": "Specification Group 1"
},
{
"list": [
{
"name": "Sp3"
}
],
"specification_name": "Specification Group 2"
}
]
}
]
},
{
"product_name": "Product2",
"items":
[
{
"item_name": "Item1",
"specifications":
[
{
"list": [
{
"name": "Sp3"
},
{
"name": "Sp4"
}
],
"specification_name": "Specification Group 2"
}
]
}
]
}
]
}
As per the design requirements, I have to display this whole data in a single UITableView, as follows.
I have created a rough design, as shown in the image below.
I can achieve this via a UITableView inside a UITableViewCell, but Apple does not recommend adding table views as subviews of other scrollable objects.
Now my question is: how can I achieve the following design with a single UITableView, given that, as per my JSON, all the content is dynamic?
Has anyone seen something like this around? Any reference would be helpful.
If you don't wish to use a tableView inside a tableViewCell, you could possibly go with the following approach.
Create 3 different cells: the first for showing the item name, the second for showing the specification group name, and the third for showing the specification items (e.g. Sp1, Sp2, ...).
numberOfRowsInSection will have the correct count to show data using the above cells. So numberOfRows should return the total count, i.e. rowsInSection =
count of items + count of specification groups in each item + count of list entries in each specification group of each item
Change your data source accordingly and add condition checks so that you display the item name cell first, followed by the cell for the specification group name, then the specification items inside each group, then the next item name, and so on (see the sketch below).
I hope this approach will help you achieve the result.
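A minimal sketch of that flattening in Swift, assuming model types that mirror the JSON above (all names are illustrative):
enum Row {
    case itemName(String)
    case specificationGroupName(String)
    case specificationName(String)
}
struct Specification { let name: String }
struct SpecificationGroup { let specificationName: String; let list: [Specification] }
struct Item { let itemName: String; let specifications: [SpecificationGroup] }
// Flatten the nested model into a single row list for one table view.
func makeRows(for items: [Item]) -> [Row] {
    var rows: [Row] = []
    for item in items {
        rows.append(.itemName(item.itemName))
        for group in item.specifications {
            rows.append(.specificationGroupName(group.specificationName))
            for specification in group.list {
                rows.append(.specificationName(specification.name))
            }
        }
    }
    return rows
}
numberOfRowsInSection then returns rows.count, and cellForRowAt switches on rows[indexPath.row] to dequeue one of the three cell types.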
It would be easier if you could use a tableView inside the tableViewCell; I have used this approach in many applications and haven't faced any Apple review problems. If you do use a tableView inside a tableViewCell, it is better to disable its scrolling and bounces properties.
This query:
https://www.googleapis.com/youtube/v3/channelSections?part=snippet%2CcontentDetails%2Ctargeting&channelId=UC-9-kyTW8ZkZNDHQJ6FgpwQ&key={YOUR_API_KEY}
returns lots of data like this:
{
"kind": "youtube#channelSection",
"etag": "\"iDqJ1j7zKs4x3o3ZsFlBOwgWAHU/oIyqO89jk-vcfHm5Kuz3sikdUzc\"",
"id": "UC-9-kyTW8ZkZNDHQJ6FgpwQ.lc3PRFGaA4k",
"snippet": {
"type": "singlePlaylist",
"style": "horizontalRow",
"channelId": "UC-9-kyTW8ZkZNDHQJ6FgpwQ",
"position": 0
},
"contentDetails": {
"playlists": [
"PLFgquLnL59alW3xmYiWRaoz0oM3H17Lth"
]
},
"targeting": {
"regions": [
"US"
]
}
},
Is there any way to fetch only items with specific region?
Thanks for any help.
You can't filter these channel sections in your query like that. What you'd have to do is get the list of channelSections from that channel, store it in an object, and check whether the targeted region of each channel section matches what you want (i.e. whether a value in targeting.regions[] equals XX, where XX is the region you're looking for). Then you could store the matching channel sections in an array and return that. If you're concerned about time, you could set up a server that does all of that for you.
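A minimal sketch of that client-side filtering in Swift, assuming the response has already been parsed into a dictionary with JSONSerialization (the function name is illustrative):
import Foundation
// Keep only the channel sections whose targeting.regions contains `region`.
func channelSections(targeting region: String,
                     in response: [String: Any]) -> [[String: Any]] {
    let items = response["items"] as? [[String: Any]] ?? []
    return items.filter { section in
        let targeting = section["targeting"] as? [String: Any]
        let regions = targeting?["regions"] as? [String] ?? []
        return regions.contains(region)
    }
}
For the item shown above, channelSections(targeting: "US", in: parsedResponse) would keep it; any other region would filter it out.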
I am getting JSON data from a web service and am trying to store it in Core Data with MagicalRecord. I read the great post (and only documentation?) "Importing data made easy" by Saul Mora, but I still do not really understand what I need to do to get all the data into my entities.
Here is the JSON the web service returns:
{
"ApiVersion": 4,
"AvailableFileSystemLibraries": [
{
"Id": 10,
"Name": "Movie Shares",
"Version": "0.5.4.0"
},
{
"Id": 11,
"Name": "Picture Shares",
"Version": "0.5.4.0"
},
{
"Id": 5,
"Name": "Shares",
"Version": "0.5.4.0"
},
{
"Id": 9,
"Name": "Music Shares",
"Version": "0.5.4.0"
}
],
"AvailableMovieLibraries": [
{
"Id": 3,
"Name": "Moving Pictures",
"Version": "0.5.4.0"
},
{
"Id": 7,
"Name": "MyVideo",
"Version": "0.5.4.0"
}
],
"AvailableMusicLibraries": [
{
"Id": 4,
"Name": "MyMusic",
"Version": "0.5.4.0"
}
],
"AvailablePictureLibraries": [
{
"Id": 8,
"Name": "Picture Shares",
"Version": "0.5.4.0"
}
],
"AvailableTvShowLibraries": [
{
"Id": 6,
"Name": "MP-TVSeries",
"Version": "0.5.4.0"
}
],
"DefaultFileSystemLibrary": 5,
"DefaultMovieLibrary": 3,
"DefaultMusicLibrary": 4,
"DefaultPictureLibrary": 0,
"DefaultTvShowLibrary": 6,
"ServiceVersion": "0.5.4"
}
The entities I want to store that data in look like this:
There is also a Server entity with a 1:1 relationship to ServerInfo.
What I want to do:
Store basic data (ApiVersion, ...) in ServerInfo. I already got this to work.
Store each object in AvailableXYLibraries in BackendLibrary (1:n relationship from ServerInfo).
Set type based on the XY part of AvailableXYLibraries, for example "movie" for AvailableMovieLibraries.
Set defaultLibrary to true if this library is referenced by DefaultXYLibrary.
Set providerId to servername + LibraryId as there are multiple servers that can have BackendLibraries with the same numeric ID.
Is this possible with MagicalRecord? I guess I need to implement some of the import hooks and set some user info keys, but nothing I have read really tells me which user info key to set where, or which method to implement where and how.
I hope this made sense and that you can give me some hints :) Thanks!
The structure of this data is quite a bit different from your Core Data model. What you'll most likely have to do is iterate a bit over the dictionary. That is, there are various collections of library data, e.g. AvailableFileSystemLibraries, AvailableMovieLibraries, etc. You'll have to get the array out of each of those keys, and then map your entities as I described in the article. To launch the process, you'll have to call
[BackendLibrary importFromArray:arrayFromDownloadedDictionary];
where arrayFromDownloadedDictionary is each array in the example dictionary you've posted. Once you give an array to MagicalRecord, provided you have set up the proper field mapping, MagicalRecord will import and create all the entities for you.
Make sure you map "Id" to BackendLibrary.id, "Name" to BackendLibrary.name, and "Version" to BackendLibrary.version.
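Putting that together, a rough sketch of the import loop (downloadedDictionary stands in for your parsed JSON; deriving type, defaultLibrary, and providerId would still need custom logic, e.g. in an import hook, which this sketch does not cover):
// Import each AvailableXYLibraries array from the parsed response.
NSArray *libraryKeys = @[ @"AvailableFileSystemLibraries",
                          @"AvailableMovieLibraries",
                          @"AvailableMusicLibraries",
                          @"AvailablePictureLibraries",
                          @"AvailableTvShowLibraries" ];
for (NSString *key in libraryKeys) {
    // Each key holds an array of { Id, Name, Version } dictionaries.
    NSArray *libraries = downloadedDictionary[key];
    [BackendLibrary importFromArray:libraries];
}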