Generating an avro schema with optional values - flume

I am trying to write a very simple Avro schema (simple because I am just isolating my current issue) to write an Avro data file based on data stored in JSON format. The trick is that one field is optional, and either avro-tools or I am not doing it right.
The goal is not to write my own serialiser; the end goal is to have this working in Flume, and I am in the early stages.
The data (works), in a file named so.log:
{
  "valid": {"boolean": true},
  "source": {"bytes": "live"}
}
The schema, in a file named so.avsc:
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "valid", "type": ["null", "boolean"], "default": null},
    {"name": "source", "type": ["null", "bytes"], "default": null}
  ]
}
I can easily generate an avro file with the following command:
java -jar avro-tools-1.7.6.jar fromjson --schema-file so.avsc so.log
So far so good. The thing is that "source" is optional, so I would expect the following data to be valid as well:
{
  "valid": {"boolean": true}
}
But running the same command gives me the error:
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got END_OBJECT
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)
I did try a lot of variations in the schema, even things that do not follow the avro spec. The schema I show here is, as far as I know, what the spec says it should be.
Would anybody know what I am doing wrong, and how I can actually have optional elements without writing my own serialiser?
Thanks,

According to the documentation of the Java API:
using a builder requires setting all fields, even if they are null
The python API, on the other hand, seems to allow null fields to be really optional:
Since the field favorite_color has type ["string", "null"], we are not required to specify this field
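A rough, untested sketch of that with the 1.7-era Python avro package (file names are only illustrative):

import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# Parse the same schema used with avro-tools (API as shown in the
# Avro getting-started guide for the 1.7.x Python package).
schema = avro.schema.parse(open("so.avsc", "rb").read())

with open("so.avro", "wb") as out:
    writer = DataFileWriter(out, DatumWriter(), schema)
    # "source" is omitted entirely; the Python writer treats the missing
    # key as the null branch of the ["null", "bytes"] union.
    writer.append({"valid": True})
    writer.close()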
In short, since most tools are written in Java, null fields must usually be given explicitly.
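Concretely, with avro-tools the optional field still has to be present in the JSON input, just with an explicit null when there is no value; a sketch of the second record that the same fromjson command should then accept:

{
  "valid": {"boolean": true},
  "source": null
}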

Related

Phonograph2:ReadOnlyTables error in Slate writeback query

I'm working with a sample dataset of airports as I continue exploring Slate features for my team. I copied the default airport dataset into my own files, so this is a version that I fully own (presumably no permission issues there). The dataset is properly available in my Slate application, since I'm also using it to display and filter data via Phonograph2 queries.
Based on the Phonograph2 docs, I created a new query to add a new airport to the dataset. I'm using the "Table Storage Service" and the "Post Event" endpoint. As a test, I configured my tableEditedEventPostRequest request as:
{
  "primaryKey": {
    "airport": "ABC"
  },
  "payload": {
    "type": "rowAdded",
    "rowAdded": {
      "columns": {
        "display_name": "[ABC] My New Airport"
      }
    }
  }
}
(Once I get this working I'd switch the values out with dynamic values from widgets.)
When I run a test of this query, I get this error response:
{
  "errorCode": "INVALID_ARGUMENT",
  "errorName": "Phonograph2:ReadOnlyTables",
  "errorInstanceId": "17ec990d-5d58-479d-a1b6-5ad033c8c808",
  "parameters": {
    "tableRids": "[ri.phonograph2.main.table.f3f33f6e-801a-4454-98e9-f2df5f170559]",
    "dataInputLocatorRids": "[ri.foundry.main.dataset.6add7c46-d3c9-4056-89b6-a19dbe461ed4]"
  }
}
I'm not finding anything about this error, or anything in the docs (so far) about the target dataset being configured as read-only. There aren't any settings I can find on the dataset to make it more permissive, and I'm already the owner of the dataset. I'd appreciate any insights or tips to get past this roadblock.
For a Phonograph table to be "editable", it needs to be associated with a writeback dataset. If you had created the sync through the Ontology (which it seems you did not), you would do this on the "Datasources" configuration tab.
Since it sounds like you created the sync directly from the Dataset Details view (or maybe through the Slate Datasets tab), you should have an option in that configuration to create a new dataset for writeback. All you should need to do is provide a dataset name and folder location.

Puppet3 | read values from different yaml file

So I'm using Puppet 3 and I have X.yaml and Y.yaml. X.yaml has profiles::resolv_conf::nameservers: [ '1.1.1.1', '8.8.8.8', '2.2.2.2' ] in it. I want to use that [ '1.1.1.1', '8.8.8.8', '2.2.2.2' ] as the value of the servers: key in Y.yaml:
'dns_test':
  plugin_type: 'dns_query'
  options:
    'servers': ['1.1.1.1', '8.8.8.8', "2.2.2.2"]
    'domains': ['google.com']
    'record_type': 'A'
    'timeout': 5
  tags:
    'input_source': 'dns_query'
By doing this I want to make sure that when someone changes the values in profiles::resolv_conf::nameservers:, the value is changed in this telegraf plugin too.
I tried multiple solutions, but the one that came closest was:
'dns_test':
  plugin_type: 'dns_query'
  options:
    'servers': "%{hiera('profiles::resolv_conf::nameservers')}"
    'domains': ['google.com']
    'record_type': 'A'
    'timeout': 5
  tags:
    'input_source': 'dns_query'
but the problem is that Puppet was adding extra quotes to the value, and the final value in the plugin conf was:
"["1.1.1.1", "2.2.2.2", "8.8.8.8"]" instead of ["1.1.1.1", "2.2.2.2", "8.8.8.8"]
TL;DR: You can't.
From the current docs and the Puppet documentation archive, I confirm that no version of the %{hiera} interpolation function or its replacement, %{lookup}, ever supported interpolating values other than strings. That's expressed in the current docs like so:
The lookup and hiera interpolation functions look up a key and return
the resulting value. The result of the lookup must be a string; any
other result causes an error.
(Emphasis added)
What you're looking for would be supported by Hiera 5's %{alias} function, provided that the data are available somewhere else in the same hierarchy (which is also a requirement for %{hiera}). Since you're stuck on Puppet 3, however, you're probably on Hiera 2, and certainly not later than Hiera 3.
"But wait!" You may say. "I'm getting a successful interpolation, but the data are just munged". Specifically, you wrote:
the problem is that Puppet was adding extra quotes to the value, and the final value
Since %{hiera()} interpolates only strings, it is not surprising that you got a string value, given that you got a value at all. I do find it a bit surprising that Puppet did not throw an error, but I'm not prepared to comment further on that without a minimum reproducible example that demonstrates the behavior.
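For reference, the Hiera 5 version mentioned above would look something like the sketch below; this requires Puppet 4.9+ with a version-5 hiera.yaml, so it is not an option on Puppet 3:

'dns_test':
  plugin_type: 'dns_query'
  options:
    # %{alias(...)} must be the entire string value; it returns the
    # looked-up value with its original data type, so 'servers' stays an array.
    'servers': "%{alias('profiles::resolv_conf::nameservers')}"
    'domains': ['google.com']
    'record_type': 'A'
    'timeout': 5
  tags:
    'input_source': 'dns_query'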

Defining a ProblemMatcher in VSCode tasks -- schema disagrees with docs?

In VSCode I'm trying to create a ProblemMatcher to parse errors from a custom script that I run (markdown file -> pandoc -> PDFs, if you're interested).
The pretty good VSCode ProblemMatcher documentation has an example task which appears (to me) to run a command ("command": "gcc") and define a problem matcher ("problemMatcher": {...}).
When I try this in my tasks.json file with both a command and a problem matcher, I get a 'the description can't be converted into a problem matcher' error, which isn't terribly helpful. I checked the tasks.json schema, and it clearly says:
The problem matcher to be used if a global command is executed (e.g. no tasks are defined). A tasks.json file can either contain a global problemMatcher property or a tasks property but not both.
Is the schema wrong? In which case I'll raise an issue.
Or is my code wrong? In which case, please point me in the right direction. Code in full (minus comments):
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "md2pdf",
      "type": "shell",
      "command": "md2pdf",
      "group": {
        "kind": "build",
        "isDefault": true
      },
      "presentation": {
        "reveal": "always",
        "panel": "shared",
        "showReuseMessage": false
      },
      "problemMatcher": {
        "owner": "Markdown",
        "fileLocation": ["absolute", "/tmp/md2pdf.log"],
        "pattern": [
          {
            // Regular expression to match filename (on earlier line than actual warnings)
            "regexp": "^Converting:\\s+(.*)$",
            "kind": "location",
            "file": 1
          },
          {
            // Regular expression to match: "l.45 \msg_fatal:nn {fontspec} {cannot-use-pdftex}" with a preceding line giving "Converting <filename>:"
            "regexp": "l.(\\d+)\\s(.*):(.*)$",
            "line": 1,
            "severity": 2,
            "message": 3
          }
        ]
      }
    }
  ]
}
I've since spent more time figuring this out and have corresponded with the VSCode team, which has led to improvements in the documentation.
The two changes needed to get something simple working were:
You need to have "command": "/full/path/to/executable", not just the executable name.
The "fileLocation" isn't about the location of the file to be matched, but about how to treat file paths mentioned in the task output. The file to be matched can't be specified, as it's implicitly the file or folder open in the editor at the time of the task. The setting wasn't important in my case.
If you, like me, have come here due to the description can't be converted into a problem matcher, here is what I learned:
If your problem matcher says something like "base": "$gcc", then I assume you are using the Microsoft C/C++ plugin. If you are using some other base which is not listed on the official docs webpage (search Tasks in Visual Studio Code), then assume that it is probably supplied by a plugin.
So, this error could mean that you are missing a plugin. In my case I was trying to run this task remotely in WSL/Ubuntu using VS Code's awesome WSL integration. I installed the C/C++ plugin inside WSL and the error was fixed (go to the extension panel, click Install in WSL: <Distro name>).
Just a hunch, but I bet your fileLocation is wrong. Try something like
"fileLocation": "absolute",

dask - read_json into dataframe ValueError

A minimal example: I have a JSON file xaa.json whose contents look like this (two rows from the Stack Overflow archive):
[
{"Id": 11, "Body": "<p>Given a specific <code>DateTime</code> value", "Title": "Calculate relative time in C#", "Comments": "There is the .net package https://github.com/NickStrupat/TimeAgo which pretty much does what is being asked."},
{"Id": 7888, "Body": "<p>You need to use an <code>ifstream</code> if you just want to read (use an <code>ofstream</code> to write, or an <code>fstream</code> for both).</p>
<p>To open a file in text mode, do the following:</p>
<pre><code>ifstream in(\\"filename.ext\\", ios_base::in); // the in flag is optional
</code></pre>
<p>To open a file in binary mode, you just need to add the \\"binary\\" flag.</p>
<pre><code>ifstream in2(\\"filename2.ext\\", ios_base::in | ios_base::binary );
</code></pre>
<p>Use the <code>ifstream.read()</code> function to read a block of characters (in binary or text mode). Use the <code>getline()</code> function (it's global) to read an entire line.</p>
", "Title": null, "Comments": "+1 for noting that the global getline() function is to be used instead of the member function."}
]
I want to load such json files into a dask dataframe. I use:
so_posts_df = dd.read_json('./xaa.json', orient='columns').compute()
I get this error:
ValueError: Unexpected character found when decoding object value
After looking into the contents, I figured that the \\" escapes were causing it. When I removed them (the editor, IntelliJ, said it was clean and nice-looking JSON) and ran the same read_json, it was able to read the data into a df and display it nicely.
So, I have two questions: (a) what are the possible values for the read_json argument "errors"? (b) How can I properly preprocess the JSON file before reading it into a dask dataframe? The presence of double quotes and the double escaping seems to be causing the issue.
[This may not be a dask issue at all...]...
This also fails with pandas.read_json. I recommend first trying to get things to work well with Pandas, and then try the same workload with dask dataframe. You will likely get much better support when asking Pandas questions.
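A minimal sketch of that approach, assuming the cleaned-up file (with the stray escapes removed) is at ./xaa.json:

import pandas as pd
import dask.dataframe as dd

# Try plain pandas first; dask.dataframe.read_json hands the actual parsing
# off to pandas, so any parse error should reproduce here too.
pdf = pd.read_json("./xaa.json", orient="records")
print(pdf.head())

# The file is one JSON array rather than line-delimited records, so pass
# lines=False explicitly; once pandas succeeds, dask should as well.
ddf = dd.read_json("./xaa.json", orient="records", lines=False)
print(ddf.compute().shape)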

Jmeter Parameter CSV not recognizing variable

I am confused about a JMeter variable not getting picked up by the CSV Data Set Config. I have a Thread Group with an HTTP Request, CSV Data Set Config, HTTP Header Manager, and Results Tree. Everything seems to work fine, but there is just one variable that is not recognized...
Here is the Request Body after running the test:
{
  "W_ID": "${W_ID}",
  "b": "b",
  "c": "c",
  "d": "d"
}
For some reason the W_ID variable is not being recognized, but the other variables are. All rows have the correct value assigned to them except W_ID. I tried deleting the W_ID column from my file (in case there was weird formatting or whitespace), saving, and re-running the test, but got the same results.
Any ideas? Thanks for your help! Please let me know if I can provide more information or clarity.
Edit1:
I noticed that the object name shows up in the body of the service... could that have an impact? This is the body (inv_adj is the object name):
{
  "inv_adj": {
    "W_ID": "string",
    "a": "string",
    "b": "string",
    "c": "string"
  }
}
Edit2:
CSV variables were requested:
Row 1: W_ID, b, c, d
Row 2: a, b, c, d
In JMeter, variables are referenced as follows:
${VARIABLE}
If an undefined function or variable is referenced, JMeter does not report/log an error - the reference is returned unchanged. For example, if UNDEF is not defined as a variable, then the value of ${UNDEF} is ${UNDEF}.
So, double-check how you have defined the variable name for each column in your CSV Data Set Config. Is it WarehouseID or W_ID there? If it is defined as WarehouseID in the CSV Data Set Config, then you should reference it as {"W_ID": "${WarehouseID}"} in your HTTP Sampler's body.
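For example, here is a minimal setup that should work (file name and values are invented for illustration). If the Variable Names field in the CSV Data Set Config is left empty, JMeter reads the first line of the file as the variable names, and those names must match the ${...} references in the body:

test-data.csv:
W_ID,b,c,d
WH-001,beta,charlie,delta

HTTP Request body:
{
  "W_ID": "${W_ID}",
  "b": "${b}",
  "c": "${c}",
  "d": "${d}"
}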
Edit:
Here is a step-by-step example (screenshots: the CSV data set, the CSV Data Set Config, the request body before the test, and the request body after the test in the Results Tree).
I tried to reproduce your issue locally on my JMeter instance, but I could not reproduce the error you are facing. Without your entire data file and JMeter test plan, it is difficult to understand the issue. My test plan and sampler configuration are shown in the screenshots; when I replay this, I can see that the values are being substituted properly.
