Kedro catalog fails when overwriting a GeoJSON dataset even though the driver is supported

I have the following catalog item in my kedro project
suggested_routes_table#geopandas:
  type: geopandas.GeoJSONDataSet
  filepath: data/04_feature/routes_suggestions_table.geojson
  load_args:
    driver: "GeoJSON"
    mode: "a"
The keyword argument mode: "a" stands for append, meaning that every time the node is run it should append new rows to the GeoJSON file instead of overwriting the file at that path.
As stated in the Kedro GeoJSONDataSet documentation and in the GeoPandas to_file() documentation, append mode can be used with selected drivers.
When I check the drivers supported by Fiona in my environment, I get the following output:
import fiona
fiona.supported_drivers

{'ARCGEN': 'r', 'DXF': 'rw', 'CSV': 'raw', 'OpenFileGDB': 'r', 'ESRIJSON': 'r',
 'ESRI Shapefile': 'raw', 'FlatGeobuf': 'rw', 'GeoJSON': 'raw', 'GeoJSONSeq': 'rw',
 'GPKG': 'raw', 'GML': 'rw', 'OGR_GMT': 'rw', 'GPX': 'rw', 'GPSTrackMaker': 'rw',
 'Idrisi': 'r', 'MapInfo File': 'raw', 'DGN': 'raw', 'PCIDSK': 'rw', 'OGR_PDS': 'r',
 'S57': 'r', 'SQLite': 'raw', 'TopoJSON': 'r'}
Meaning that GeoJSON, the driver I'm specifying in Kedro, supports append mode (the 'raw' flags stand for read, append, and write).
I have tried different drivers, always ending up with the following DataSetError when calling catalog.load("suggested_routes_table#geopandas"):
DataSetError: Failed while loading data from data set GeoJSONDataSet(filepath=/Users/nicolasbetancourt/Documents/GitHub/jama-coaque-routes/jama-coaque-routes/data/04_feature/routes_suggestions_table.geojson, load_args={'driver': GeoJSON, 'mode': a}, protocol=file, save_args={'driver': GeoJSON}). No such file or directory /vsimem/ec1c814e95f9446290be2efde0c02145.json
As soon as I delete the line mode: "a" from the data catalog, the GeoJSON dataset loads perfectly. That's why I assume the error is due to this keyword argument, but I don't understand why, since I'm using it with a supported driver.
Also, when I save a GeoDataFrame to the same path using the keyword mode="a", it works perfectly. That's why I assume the error comes from the interaction between Kedro and Fiona.
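For reference, here is a minimal sketch of the direct GeoPandas save that works for me (the toy GeoDataFrame and its contents are hypothetical stand-ins for the rows my node produces; assumes a GeoPandas version whose to_file() accepts mode):

import geopandas as gpd
from shapely.geometry import Point

# Toy GeoDataFrame standing in for the rows the node produces (hypothetical data)
new_rows = gpd.GeoDataFrame(
    {"name": ["stop_1"]},
    geometry=[Point(-80.0, 0.0)],
    crs="EPSG:4326",
)

# Appending directly through GeoPandas/Fiona with the same driver succeeds:
new_rows.to_file(
    "data/04_feature/routes_suggestions_table.geojson",
    driver="GeoJSON",
    mode="a",
)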


Getting explainable predictions from Vertex AI for custom trained model

I've created a custom docker container to deploy my model on Vertex AI (the model uses LightGBM, so I can't use the pre-built containers provided by Vertex AI for TF/SKL/XGBoost).
I'm getting errors while trying to get explainable predictions from the model (the model is deployed and normal predictions work just fine). I have gone through the official Vertex AI guides for getting predictions/explanations, and also tried different ways of configuring the explanation parameters and metadata, but it's still not working. The errors are not very informative; this is what they look like:
400 {"error": "Unable to explain the requested instance(s) because: Invalid response from prediction server - the response field predictions is missing. Response: {'error': '400 Bad Request: The browser (or proxy) sent a request that this server could not understand.'}"}
This notebook provided by Google has some examples of how to configure the explanation parameters and metadata for models trained with different frameworks; I'm trying something similar:
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage4/get_started_with_vertex_xai.ipynb
The model is a classifier that takes tabular input with 5 features (2 string, 3 numeric), and the output of model.predict() is 0/1 for each input instance. My custom container returns predictions in this format:
# Input for prediction (inside the Flask route handler; `request` and
# `jsonify` come from flask, `pd` is pandas)
raw_input = request.get_json()
instances = raw_input["instances"]
df = pd.DataFrame(instances, columns=["A", "B", "C", "D", "E"])

# Prediction from the model
predictions = model.predict(df).tolist()
response = jsonify({"predictions": predictions})
return response
This is how I am configuring the explanation parameters and metadata for the model:
# Explanation parameters
PARAMETERS = {"sampled_shapley_attribution": {"path_count": 10}}
exp_parameters = aip.explain.ExplanationParameters(PARAMETERS)

# Explanation metadata (this is probably the part that is causing the errors)
COLUMNS = ["A", "B", "C", "D", "E"]
exp_metadata = aip.explain.ExplanationMetadata(
    inputs={
        "features": {"index_feature_mapping": COLUMNS, "encoding": "BAG_OF_FEATURES"}
    },
    outputs={"predictions": {}},
)
For getting predictions/explanations, I tried the format below, among others:
instance_1 = {"A": <value>, "B": <>, "C": <>, "D": <>, "E": <>}
instance_2 = {"A": <value>, "B": <>, "C": <>, "D": <>, "E": <>}
inputs = [instance_1, instance_2]
predictions = endpoint.predict(instances=inputs) # Works fine
explanations = endpoint.explain(instances=inputs) # Returns error
Could you please suggest how to correctly configure the explanation metadata, or how to provide input in the right format to the explain API, so I can get explanations from Vertex AI? I have tried many different formats, but nothing has worked so far. :(

highcharter: Synchronize point and line colors and legend

I'm using the highcharter package to create an interactive chart.
My chart has lines on it corresponding to different series. In addition, I would like to have shapes at certain points on the lines.
It's very easy to get the points in the right place. However, I would like to map each point's color to the line it is associated with. And when the user clicks on the legend entry for a line, I'd like the associated points to be toggled as well.
The code looks like this:
highchart() %>%
  hc_add_series(
    type = "line",
    marker = list(enabled = FALSE),
    data = input_data,
    mapping = hcaes(x = x, y = y, group = series_name)
  ) %>%
  hc_add_series(
    type = "point",
    data = input_data %>% filter(!is.na(marker)),
    mapping = hcaes(x = x, y = y, color = series_name, fill = series_name,
                    group = series_name, shape = marker)
  )
The result gets the points in the right place, but the points use a different color mapping from the lines. Clicking on a line's legend entry toggles only the line; the points show up as separate legend entries by series_name.
What can I do so that:
- The points and lines share the same color mapping
- The points and lines can be toggled together by clicking on the line in the legend
- The points show up separately in the legend based on their shape rather than their color?
Thanks!
Generally, this can be achieved in at least a few different ways. It all depends on your data, which you haven't provided (I created sample data).
Additionally, I will provide the examples in jsFiddle (JavaScript) because it is faster to explain things that way with a quick online example.
The final answer will contain R code (maybe with some custom JavaScript if needed, but all of it will be reproducible in R).
First of all, your assumption that you need a separate series is wrong and causes problems. If you want markers on your line in the same color, and you want them to toggle together on legend click, then you don't need separate series - one series with markers enabled on selected points is enough; see this example: https://jsfiddle.net/BlackLabel/s24rk9x7/
In this case, the R data needs to be defined properly.
If you'd rather not keep it simple as described above, you can keep the lines and markers as separate series, as in your original question.
In this case, you can use the series.linkedTo property to connect your "point" series to the line series. (By the way, Highcharts has no "point" series type; the correct type is "scatter", which is another reason your code doesn't work and may be why you got downvoted.) However, there is a problem with linkedTo in highcharter - it doesn't work. This looks like a bug and should be reported on the highcharter GitHub repo.
This is a JavaScript version which works fine: https://jsfiddle.net/BlackLabel/3mtdfqLo/
In this example, if you want to keep the markers and the line series in the same color, you can define the colors manually, or write some custom code (like I did) that changes the color for you automatically.
And here is the equivalent R version, which should work but does not:
library(highcharter)

highchart() %>%
  hc_add_series(
    data = list(4, 3, 5, 6, 2, 3)
  ) %>%
  hc_add_series(
    data = list(14, 13, 15, 16, 12, 13),
    id = "first"
  ) %>%
  hc_add_series(
    data = list(10, 8, 6, 2, 5, 12),
    id = "second"
  ) %>%
  hc_add_series(
    type = "scatter",
    linkedTo = "first",
    data = list(list(1, 3), list(2, 5))
  ) %>%
  hc_add_series(
    type = "scatter",
    linkedTo = "second",
    data = list(list(1, 13), list(2, 15), list(3, 16))
  ) %>%
  hc_plotOptions(
    line = list(marker = list(enabled = FALSE))
  )
There is probably something wrong with the hc_add_series function.
As a workaround, you can write it all as a custom JavaScript code, which (again) works fine:
library(highcharter)

highchart() %>%
  hc_plotOptions(
    line = list(marker = list(enabled = FALSE))
  ) %>%
  hc_chart(
    events = list(load = JS("function() {
      this.addSeries({
        data: [4, 3, 5, 6, 2, 3],
        id: 'first'
      });
      this.addSeries({
        data: [14, 13, 15, 16, 12, 13],
        id: 'second'
      });
      this.addSeries({
        data: [10, 8, 6, 2, 5, 12]
      });
      this.addSeries({
        type: 'scatter',
        linkedTo: 'first',
        data: [[1, 3], [2, 5]]
      });
      this.addSeries({
        type: 'scatter',
        linkedTo: 'second',
        data: [[1, 13], [2, 15], [3, 16]]
      });
    }"))
  )
Of course, the last examples don't contain the color-synchronizing functionality - you can copy it from the jsFiddle above.

Doc2vec: Only 10 docvecs in gensim doc2vec model?

I used gensim to fit a doc2vec model, with tagged documents (corpus length > 10) as training data. The goal is to get the doc vectors of all training docs, but only 10 vectors can be found in model.docvecs.
An example of the training data (length > 10):
docs = ['This is a sentence', 'This is another sentence', ....]
with some preprocessing:
doc_ = [d.strip().split(" ") for d in doc]
doc_tagged = []
for i in range(len(doc_)):
    tagd = TaggedDocument(doc[i], str(i))
    doc_tagged.append(tagd)
The tagged docs look like this:
TaggedDocument(words=array(['a', 'b', 'c', ...], dtype='<U32'), tags='117')
Fitting a doc2vec model:
model = Doc2Vec(min_count=1, window=10, size=100, sample=1e-4, negative=5, workers=8)
model.build_vocab(doc_tagged)
model.train(doc_tagged, total_examples= model.corpus_count, epochs= model.iter)
Then I inspect the final model:
len(model.docvecs)
the result is 10...
I tried other datasets (length > 100 and > 1000) and got the same result for len(model.docvecs).
So, my question is:
How can I use model.docvecs to get the vectors of all training docs (without using model.infer_vector)?
Is model.docvecs designed to provide all training docvecs?
The bug is in this line:
tagd = TaggedDocument(doc[i],str(i))
Gensim's TaggedDocument accepts a sequence of tags as its second argument. When you pass a string like '123', it's treated as a sequence and turned into ['1', '2', '3']. As a result, all of your documents are tagged with just the 10 digit tags ['0', ..., '9'], in various combinations.
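A quick illustration in plain Python, using the tag from your example:

# Iterating over a string yields its characters, so a string tag
# decomposes into single digits:
list('117')   # -> ['1', '1', '7']
# Across the whole corpus, the only distinct tags that can ever occur
# are the digits '0' through '9' - exactly 10 of them.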
Another issue: you define doc_ but never actually use it, so your documents are not split into words correctly either.
Here's the proper solution:
docs = [doc.strip().split(' ') for doc in docs]
tagged_docs = [doc2vec.TaggedDocument(doc, [str(i)]) for i, doc in enumerate(docs)]
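With both fixes applied, an end-to-end sketch (assuming gensim 3.x, where size, model.iter, and model.docvecs are still the current names):

from gensim.models import doc2vec

docs = ['This is a sentence', 'This is another sentence']  # your corpus here
tokenized = [d.strip().split(' ') for d in docs]
tagged_docs = [doc2vec.TaggedDocument(d, [str(i)]) for i, d in enumerate(tokenized)]

model = doc2vec.Doc2Vec(min_count=1, window=10, size=100, sample=1e-4,
                        negative=5, workers=8)
model.build_vocab(tagged_docs)
model.train(tagged_docs, total_examples=model.corpus_count, epochs=model.iter)

len(model.docvecs)   # now equals len(docs): one vector per training document
model.docvecs['0']   # vector of the first document, looked up by its tag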

Placeholder tensors require a value in ml-engine predict but not local predict

I've been developing a model for use with the cloud ML engine's online prediction service. My model contains a placeholder_with_default tensor that I use to hold a threshold for prediction significance.
threshold = tf.placeholder_with_default(0.01, shape=(), name="threshold")
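For context, here is roughly how that placeholder sits in my serving input function (a sketch in TF 1.x; the other feature names, dtypes, and shapes are illustrative assumptions, not my exact export code):

import tensorflow as tf

def serving_input_fn():
    # Hypothetical serving inputs matching the JSON instances below;
    # dtypes and shapes are assumptions for illustration.
    receiver_tensors = {
        "features": tf.placeholder(tf.string, shape=[None], name="features"),
        "values": tf.placeholder(tf.float32, shape=[None], name="values"),
        # Optional at graph level because it carries a default:
        "threshold": tf.placeholder_with_default(0.01, shape=(), name="threshold"),
    }
    return tf.estimator.export.ServingInputReceiver(receiver_tensors, receiver_tensors)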
I've noticed that when using local predict:
gcloud ml-engine local predict --json-instances=data.json --model-dir=/my/model/dir
I don't need to supply a value for this tensor; e.g., this is a valid input:
{"features": ["a", "b"], "values": [10, 5]}
However when using online predict:
gcloud ml-engine predict --model my_model --version v1 --json-instances data.json
If I use the above JSON I get an error:
{
"error": "Prediction failed: Exception during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"input size does not match signature\")"
}
However, if I include threshold, the request succeeds, e.g.:
{"features": ["a", "b"], "values": [10, 5], "threshold": 0.01}
Is there a way to have "threshold" be an optional input?
Thanks
Matthew
It looks like this currently isn't possible in Cloud ML. If you're getting predictions from a JSON file, you need to add the default values explicitly (as you did with "threshold": 0.01).
In Python, I just dynamically add the required attributes before making the API request:
def add_empty_fields(instance):
    placeholder_defaults = {"str_placeholder": "", "float_placeholder": -1.0}
    for ph, default_val in placeholder_defaults.items():
        if ph not in instance:
            instance[ph] = default_val
This mutates the instance dict that maps placeholder names to placeholder values. For a model with many optional placeholders, it is a bit nicer than manually setting the missing placeholder values for each instance.
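For example (a sketch; the placeholder names are the hypothetical ones from the snippet above):

instances = [
    {"str_placeholder": "abc", "float_placeholder": 0.5},
    {"float_placeholder": 0.5},   # missing the string placeholder
    {},                           # missing both placeholders
]
for instance in instances:
    add_empty_fields(instance)

# Every instance now carries a value for each placeholder, so the request
# body can be sent as {"instances": instances}.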

OpenCV CV_FOURCC('F','L','V','1') not working?

I want to write a .flv file from OpenCV and have spent so much time on it...
The OpenCV 2.3 documentation says we can create an FLV file with this codec:
CV_FOURCC('F','L','V','1')
but I am always getting this error:
[flv @ 0x9bf5000] Tag FLV1/0x31564c46 incompatible with output codec id '22'
Please help....
Currently I am using OpenCV 2.3 on Ubuntu 10.10.
I know this is quite old, but I'll add my experience to this wall in case future people have this problem.
I encountered this using the PIM1 FourCC for output - my problem was solved when I changed from
video_output = cvCreateVideoWriter("disparity_output.mov", CV_FOURCC('P', 'I', 'M', '1'), 32, size, 0);
to:
video_output = cvCreateVideoWriter("disparity_output.mkv", CV_FOURCC('P', 'I', 'M', '1'), 32, size, 0);
changing the output path to a file extension that is acceptable for the codec (or so I assume). Not sure if it helps, but it worked for me.
