Requirement:
I want to create an InfluxDB database to store time-series data from multiple sensors reporting temperatures from different locations.
Problem:
When I write points to the database with the same timestamp but different tag values (example: location) and field values (temperature), InfluxDB overwrites both the tag and field values with those of the latest write.
I followed the documentation available on their website; they show a sample DB matching the above requirement, but I am not able to find the schema used.
Example Table with duplicate timestamps
Additional Information :
Sample Input :
json_body_1 = [
    {
        "measurement": "cpu_load_short",
        "tags": {
            "host": "server02",
            "region": "us-west"
        },
        "time": "2009-11-10T23:00:00Z",
        "fields": {
            "Float_value": 0.7,
            "Int_value": 6,
            "String_value": "Text",
            "Bool_value": False
        }
    },
    {
        "measurement": "cpu_load_short",
        "tags": {
            "host": "server01",
            "region": "us-west"
        },
        "time": "2009-11-10T23:00:00Z",
        "fields": {
            "Float_value": 1.0,
            "Int_value": 2,
            "String_value": "Text",
            "Bool_value": False
        }
    }
]
I used the example given in the official documentation, but instead of 2 records I get only one. Please note the host tag is different, which should make each point unique.
Documentation
Today I faced the same issue too, and now I have come to the solution. :)
from influxdb import InfluxDBClient

client = InfluxDBClient(host='host name', port=8086, database='test_db',
                        username='writer', password=Config.INFLUXDB_WRITE_PWD)

# Same timestamp, but a different host tag -> two distinct points.
points = [
    {
        "measurement": "cpu_load_short",
        "tags": {
            "host": "server02",
            "region": "us-west"
        },
        "time": "2009-11-10T23:00:00Z",
        "fields": {
            "Float_value": 0.7,
            "Int_value": 6,
            "String_value": "Text",
            "Bool_value": False
        }
    },
    {
        "measurement": "cpu_load_short",
        "tags": {
            "host": "server01",
            "region": "us-west"
        },
        "time": "2009-11-10T23:00:00Z",
        "fields": {
            "Float_value": 1.0,
            "Int_value": 2,
            "String_value": "Text",
            "Bool_value": False
        }
    }
]

status = client.write_points(points, database='test_db', batch_size=10000, protocol='json')
Here is the output
> select * from cpu_load_short;
name: cpu_load_short
time                 Bool_value  Float_value  Int_value  String_value  host      region
----                 ----------  -----------  ---------  ------------  ----      ------
1257894000000000000  false       1            2          Text          server01  us-west
1257894000000000000  false       0.7          6          Text          server02  us-west
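The reason this works: InfluxDB identifies a point by its measurement, tag set, and timestamp. Two writes that share all three are treated as the same point, and the later write's field values win, which is exactly the overwrite described in the question. A minimal sketch of that overwrite case, reusing the client from above:
# Both points share measurement, tag set, AND timestamp, so InfluxDB
# treats them as one point; the second write's fields win.
duplicate_points = [
    {
        "measurement": "cpu_load_short",
        "tags": {"host": "server02", "region": "us-west"},
        "time": "2009-11-10T23:00:00Z",
        "fields": {"Float_value": 0.7}
    },
    {
        "measurement": "cpu_load_short",
        "tags": {"host": "server02", "region": "us-west"},  # identical tags
        "time": "2009-11-10T23:00:00Z",                     # identical time
        "fields": {"Float_value": 1.0}                      # overwrites 0.7
    }
]
client.write_points(duplicate_points, database='test_db')
# SELECT now returns a single point with Float_value = 1.0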
Related
I'm pulling data into Google Data Studio from a Google Sheet with dates stored in yyyy-mm-dd format. The dates look correct and calculate correctly with formulas and adjustments everywhere except in a Gantt chart using the Vega-Lite Community Visualization, which shows the date in a long-number format (e.g. 20210520) and is unable to display the data when using "type": "temporal" or "timeUnit": "utcyearmonthdatehours".
I've run various tests, including:
1. Changing the date format for the date columns to plain text, yyyyddmm, yymmdd, and yyyy/mm/dd formats.
2. Replacing the current date columns with new columns using the alternate formats in point 1 (above).
3. Changing the date field formats directly in Google Data Studio to the formats in point 1 (above).
4. Creating a secondary set of date columns in plain text using an ArrayFormula and the Text() function to reformat the actual dates as plain text.
So far, options 2 and 4 are the only ways I've been able to get the Gantt chart to render correctly, reading the data in date format. But option 2 renders the other charts in GDS unusable, as they cannot translate the plain text into usable dates.
Option 4 does work, but isn't ideal given the redundant data. I'd prefer to have just one column for the Start Date and another for the End Date, rather than two columns for each. It feels like I may be missing something obvious here. Is there a way to format the dates in Google Sheets so they work with both the GDS date fields and Vega-Lite, or is there a way to properly parse the date data in Vega-Lite without needing a second set of plain-text columns?
Report replicating the issue: Project Tracking (debug report)
Edit: below is the code for the Vega-Lite visualizations using the date fields from Google Sheets, which Vega-Lite is not interpreting as dates.
Without timeUnit or temporal field type:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "A bar chart with highlighting on hover and selecting on click. (Inspired by Tableau's interaction style.)",
  "config": {
    "background": null,
    "view": {
      "stroke": "transparent"
    }
  },
  "layer": [
    {
      "layer": [
        {
          "params": [
            {
              "name": "grid",
              "select": "interval",
              "bind": "scales"
            }
          ],
          "mark": {
            "type": "bar",
            "cursor": "pointer",
            "tooltip": true,
            "point": true,
            "cornerRadiusEnd": 5,
            "opacity": 0.8
          },
          "encoding": {
            "color": {
              "field": "$dimension3",
              "title": "$dimension3.name"
            }
          }
        }
      ],
      "encoding": {
        "x": {
          "field": "$dimension0",
          "axis": {
            "title": null,
            "grid": true
          }
        },
        "y": {
          "field": "$dimension1",
          "title": "$dimension1.name",
          "type": "nominal",
          "sort": "x",
          "axis": {
            "title": null,
            "grid": true,
            "tickBand": "extent"
          }
        },
        "x2": {
          "field": "$dimension2"
        },
        "yOffset": {
          "field": "$dimension3"
        }
      }
    }
  ]
}
With timeUnit and field type temporal:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "A bar chart with highlighting on hover and selecting on click. (Inspired by Tableau's interaction style.)",
  "config": {
    "background": null,
    "view": {
      "stroke": "transparent"
    }
  },
  "layer": [
    {
      "layer": [
        {
          "params": [
            {
              "name": "grid",
              "select": "interval",
              "bind": "scales"
            }
          ],
          "mark": {
            "type": "bar",
            "cursor": "pointer",
            "tooltip": true,
            "point": true,
            "cornerRadiusEnd": 5,
            "opacity": 0.8
          },
          "encoding": {
            "color": {
              "field": "$dimension3",
              "title": "$dimension3.name"
            }
          }
        }
      ],
      "encoding": {
        "x": {
          "field": "$dimension0",
          "type": "temporal",
          "timeUnit": "utcyearmonthdatehours",
          "axis": {
            "title": null,
            "grid": true
          }
        },
        "y": {
          "field": "$dimension1",
          "title": "$dimension1.name",
          "type": "nominal",
          "sort": "x",
          "axis": {
            "title": null,
            "grid": true,
            "tickBand": "extent"
          }
        },
        "x2": {
          "field": "$dimension2"
        },
        "yOffset": {
          "field": "$dimension3"
        }
      }
    }
  ]
}
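One direction I've considered but not verified: coercing the numeric value inside the spec with a calculate transform, using the standard Vega expression functions toString and utcParse. How the $dimension placeholders substitute inside expression strings is an assumption on my part:
"transform": [
  {
    "calculate": "utcParse(toString(datum['$dimension0']), '%Y%m%d')",
    "as": "start_parsed"
  },
  {
    "calculate": "utcParse(toString(datum['$dimension2']), '%Y%m%d')",
    "as": "end_parsed"
  }
]
with the x and x2 encodings then pointed at start_parsed and end_parsed with "type": "temporal".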
I am looking to convert JSON to Avro without altering the shape of the data.
A nested field in the JSON contains a variable number of keys, which are never known in advance. The record that branches off each of these unknown nodes, however, is of a known, well-defined shape.
An example of this input data is shown below:
{
  "customers": {
    "paul_ince": {
      "name": "Mr Paul Ince",
      "age": 54
    },
    "kim_kardashian": {
      "name": "Ms Kim Kardashian",
      "age": 41
    },
    "elon_musk": {
      "name": "Elon Musk, Technoking of Tesla",
      "age": 50
    }
  }
}
Now, it would of course be more Avro-friendly to have customers lead to an array of the form
{
  "customers": [
    {
      "customer_name": "paul_ince",
      "name": "Mr Paul Ince",
      "age": 54
    },
    {
      ...
    }
  ]
}
But this would violate my constraint that the shape of the input data be unchanged.
This problem seems to manifest itself frequently if I rely on data from external sources or scrapes, or preexisting data that was never created for Avro.
In my head, the schema would look something like the below:
{
  "fields": [
    {
      "name": "customers",
      "type": {
        "type": "record",
        "name": "customers",
        "fields": [
          {
            "name": $customer_name,
            "type": {
              "type": "record",
              "name": $customer_name,
              "fields": [
                {
                  "name": "name",
                  "type": "string"
                },
                {
                  "name": "age",
                  "type": "int"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
where $customer_name is an assignment value, defined on read. Just asking the question, it feels like this violates fundamental Avro principles, but I must use Avro and I strongly want to maintain the input shape of the data. Modifying it would be highly impractical, not least given how frequently this problem appears and how large and varied the data I need to transfer from JSON to Avro is.
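For what it's worth, the closest standard fit I'm aware of is Avro's map type, which models exactly this "unknown keys, known value shape" pattern and, in Avro's JSON encoding, keeps the object shape intact. A sketch, assuming the fastavro library:
# Sketch: model the unknown customer keys with an Avro map whose values
# are the well-defined Customer record.
from fastavro import parse_schema, writer

schema = {
    "type": "record",
    "name": "Customers",
    "fields": [
        {
            "name": "customers",
            "type": {
                "type": "map",  # string keys, unknown in advance
                "values": {
                    "type": "record",
                    "name": "Customer",
                    "fields": [
                        {"name": "name", "type": "string"},
                        {"name": "age", "type": "int"}
                    ]
                }
            }
        }
    ]
}

record = {
    "customers": {
        "paul_ince": {"name": "Mr Paul Ince", "age": 54},
        "kim_kardashian": {"name": "Ms Kim Kardashian", "age": 41},
        "elon_musk": {"name": "Elon Musk, Technoking of Tesla", "age": 50}
    }
}

with open("customers.avro", "wb") as out:
    writer(out, parse_schema(schema), [record])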
We have a Microsoft Search instance crawling one custom app: https://learn.microsoft.com/en-us/microsoftsearch/connectors-overview
Query & display are working as expected, but aggregation provides wrong results.
Query JSON (POST to https://graph.microsoft.com/v1.0/search/query), selecting title + submitter with an aggregation on submitter:
"fields": [
"title",
"submitter"
],
"aggregations": [
{
"field": "submitter",
"size": 1,
"bucketDefinition": {
"sortBy": "keyAsString",
"isDescending": true,
"minimumCount": 0
}
}
]
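For completeness, a sketch of how the full request could be sent from Python with the requests library (the connection id, query string, and token handling are placeholders; the entityTypes/contentSources values follow the connectors documentation):
import requests

ACCESS_TOKEN = "..."  # assumption: acquired elsewhere

# Sketch: full search request wrapping the fields/aggregations fragment above.
body = {
    "requests": [
        {
            "entityTypes": ["externalItem"],
            "contentSources": ["/external/connections/ConnectionId"],
            "query": {"queryString": "business model design"},
            "fields": ["title", "submitter"],
            "aggregations": [
                {
                    "field": "submitter",
                    "size": 1,
                    "bucketDefinition": {
                        "sortBy": "keyAsString",
                        "isDescending": True,
                        "minimumCount": 0
                    }
                }
            ]
        }
    ]
}

resp = requests.post("https://graph.microsoft.com/v1.0/search/query",
                     json=body,
                     headers={"Authorization": "Bearer " + ACCESS_TOKEN})
print(resp.json())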
JSON response: the submitter property is correctly returned as "Firstname Lastname" in the hit, but the aggregation key is lowercased with the middle space trimmed: "firstnamelastname".
"hitsContainers": [
{
"total": 1,
"moreResultsAvailable": false,
"hits": [
{
"hitId": "xxxx",
"contentSource": "ConnectionId",
"rank": 1,
"summary": "New service / <c0>business</c0> <c0>model</c0> <c0>design</c0> <ddd/>",
"resource": {
"#odata.type": "#microsoft.graph.externalConnectors.externalItem",
"properties": {
"title": "New service / business model design",
"submitter": "Firstname Lastname"
}
}
}
],
"aggregations": [
{
"field": "submitter",
"buckets": [
{
"key": "firstnamelastname",
"count": 1,
"aggregationFilterToken": "\"ǂǂ696c736573706f656c73747261\""
}
]
}
]
}
]
This is reproducible in Microsoft Graph Explorer (a bit obfuscated): the result shows the value with the space, while the aggregation is concatenated in lowercase.
The root cause has been identified: the submitter property wasn't created with the refinable flag.
{
  "name": "submitter",
  "type": "String",
  "isSearchable": "true",
  "isQueryable": "true",
  "isRetrievable": "true",
  "isRefinable": "false"
}
As a consequence, the output was incorrect.
Testing with refinable = true provides the correct aggregation value (1 = non-refinable, 2 = refinable).
Small note: refinable properties can't be searchable.
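For reference, a sketch of the corrected property definition to use when registering the connection schema (shown as a Python dict; the surrounding schema-registration call follows the connectors documentation and is omitted here):
# Corrected schema property: isRefinable must be true for aggregations to
# return the raw value, and a refinable property cannot also be searchable.
submitter_property = {
    "name": "submitter",
    "type": "String",
    "isQueryable": True,
    "isRetrievable": True,
    "isRefinable": True,
    "isSearchable": False
}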
I am trying to modify the quick sample provided here.
I tried to add a few custom sensor data types, but it failed. Then I tried a few data types mentioned in the documentation, which also failed.
I am getting the error below:
Creating Sensor: {
"DataType": "Noise",
"DeviceId": "some-device-id",
"HardwareId": "SAMPLE_SENSOR_NOISE"
}
Request: POST https://******.*******.azuresmartspaces.net/management/api/v1.0/sensors
Response Status: 404, NotFound, {"error": {"code":"404.600.000.001","message":"There is no SensorDataType of the given name."}}
Can we add a custom sensor data type?
If not, what are the built-in data types? Or if yes, then what went wrong here?
You need to post the DataType when creating the Sensor object. Use “None” if you want to change it later. The Swagger docs show the “Model”, which you can expand to see the required fields.
If the DataType is not in api/v1/system/types, you will need to enable it or create a new DataType. To create a new DataType, POST to the Types endpoint with the required information; the minimum is the TypeName and the SpaceID to nest the type under (see the sketch after the Swagger link below). My typical pattern is to create a root space and append any custom twin objects, like types, to this space.
I believe these names are case sensitive as well.
https://{servicename}.{region}.azuresmartspaces.net/management/swagger/ui/index#/Types
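A sketch of that creation call, assuming Python's requests library and a bearer token; the payload field names follow the Types model in the Swagger UI above, so verify them there before use:
import requests

ACCESS_TOKEN = "..."  # assumption: acquired via Azure AD
BASE = "https://{servicename}.{region}.azuresmartspaces.net/management/api/v1.0"

# Sketch: create a custom SensorDataType by POSTing to the Types endpoint.
# {servicename}, {region}, the space id, and the token are placeholders.
payload = {
    "name": "Noise",               # type names appear to be case sensitive
    "category": "SensorDataType",  # category consulted when the sensor is created
    "spaceId": "<root-space-id>"   # the space to nest the type under
}

resp = requests.post(BASE + "/types",
                     json=payload,
                     headers={"Authorization": "Bearer " + ACCESS_TOKEN})
resp.raise_for_status()
print(resp.json())  # id of the newly created type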
EDIT:
Check your Ontologies with:
https://{servicename}.{region}.azuresmartspaces.net/management/api/v1.0/ontologies
Select these by ID and POST to set loaded to true to make all the built-in types available:
[
  {
    "id": 1,
    "name": "Required",
    "loaded": true
  },
  {
    "id": 2,
    "name": "Default",
    "loaded": true
  },
  {
    "id": 3,
    "name": "BACnet",
    "loaded": true
  },
  {
    "id": 4,
    "name": "Advanced",
    "loaded": true
  }
]
Then you can query all the given types:
https://{servicename}.{region}.azuresmartspaces.net/management/api/v1.0/types?includes=Description,FullPath,Ontologies,Space
You should receive something like:
[
  {
    "id": 1,
    "category": "DeviceSubtype",
    "name": "None",
    "disabled": false,
    "logicalOrder": 0,
    "fullName": "None",
    "spacePaths": [
      "/system"
    ],
    "ontologies": [
      {
        "id": 1,
        "name": "Required",
        "loaded": true
      }
    ]
  },
  {
    "id": 2,
    "category": "DeviceType",
    "name": "None",
    "disabled": false,
    "logicalOrder": 0,
    "fullName": "None",
    "spacePaths": [
      "/system"
    ],
    "ontologies": [
      {
        "id": 1,
        "name": "Required",
        "loaded": true
      }
    ]
  },
  {
    "id": 3,
    "category": "DeviceBlobSubtype",
    "name": "None",
    "disabled": false,
    "logicalOrder": 0,
    "fullName": "None",
    "spacePaths": [
      "/system"
    ],
    "ontologies": [
      {
        "id": 1,
        "name": "Required",
        "loaded": true
      }
    ]
  },
  ...Objects,
]
For the Discovery REST API, the "return" argument/parameter controls which fields are returned.
So if I pass these arguments to the API
qopts = {
    "query": named_sector,
    "count": "10",
    "filter": filter_dates,
    "aggregation": "term(docSentiment.type,count:3)"
}
my_query = discovery.query(my_disc_environment_id, my_disc_collection_id, qopts)
print(json.dumps(my_query, indent=2))
I get the following:
{
  "matching_results": 14779,
  "aggregations": [
    {
      "type": "term",
      "field": "docSentiment.type",
      "count": 3,
      "results": [
        {
          "key": "positive",
          "matching_results": 4212
        },
        {
          "key": "negative",
          "matching_results": 3259
        },
        {
          "key": "neutral",
          "matching_results": 152
        }
      ]
    }
  ],
  "results": [
    {
      "id": "6389715fe7e7f711e0bc09d4f1236639",
      "score": 1.3689895,
      "yyyymm": "201704",
      "url": "https://seekingalpha.com/article/4060446-valuation-dashboard-consumer-discretionary-update",
      "enrichedTitle": null,
      "host": "seekingalpha.com",
      "text": "Valuation Dashboard: Consumer Discretionary - Update\n\nSummary\n\nValuation metrics in Consumer Discretionary.\n\nEvolution since last month.\n\nA list of stocks loo ....
and thousands more lines. How do I restrict the output to the aggregations section? Is this an issue of me better handling the returned JSON structure?
Thanks.
If you change the count argument to 0, the returned JSON will only contain the aggregations.
Also, if you're using the Discovery web tooling, you can enter 0 for the "Number of results to return (Count)" field.
More details and an example can be found here: https://www.ibm.com/watson/developercloud/doc/discovery/using.html#building-aggregations
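A minimal sketch of that change, reusing the qopts from the question:
# count=0 suppresses the per-document results, so the response contains
# only matching_results and the aggregations block.
qopts = {
    "query": named_sector,
    "count": 0,  # aggregations only, no documents
    "filter": filter_dates,
    "aggregation": "term(docSentiment.type,count:3)"
}
my_query = discovery.query(my_disc_environment_id, my_disc_collection_id, qopts)
print(json.dumps(my_query["aggregations"], indent=2))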