I am working in roscpp and publishing odometry data to a ROS node. Two classes are responsible for publishing this data. For safety reasons, it is sometimes necessary to call the destructor on the primary class and to publish still data (an odometry pose that repeats the same values, to let ROS know the robot is standing still) from the other class. I have already confirmed that the secondary class posts the correct pose on the correct anchor.
Most of the time after the switch to the secondary class, I run into a check failure stemming from the data time not being greater than the previous data's time.
The error message I get is this:
F0402 08:12:25.187301 9044 map_by_time.h:43] Check failed: data.time > std::prev(trajectory.end())->first (636898039451158060 vs. 636898039451158060)
As you can see, it considers the data time equal to the previous data time. I have confirmed that the timestamp published from my code is around 1554227809034068 (microseconds since the epoch), which matches the time I collected the data. After the switch, the timestamp is still correct: a value close to, and slightly higher than, the previous one.
I am trying to figure out why this error message reports such a large number for the time, and why it does not match the published timestamp.
Related
I am verifying a very small model, but I receive a memory exhausted message. I have changed the model several times but keep getting the same problem.
I thought the problem might be due to using a user-defined function or using the select option to get a random number. I then changed the model so that it neither calls the function nor uses the select option, but still...
I am wondering whether it is an UPPAAL issue or something in my model. There is no error other than memory exhaustion. Once the values of "r1" and "r2" are changed, the CTL property no longer works.
CTL works for all values of r1 and r2 before the increment.
The model increments several variables (r1, r2 and cntr): if there is no upper bound on them (and it seems there is not, although I cannot see all the functions), then the state space is going to be huge (all their values multiplied by the number of locations, times the clock zones) and will thus exhaust all the memory.
Either make those variables bounded (do not allow increments past some value), or declare them as meta (don't do this if you do not understand the consequences).
I have an existing Beam pipeline that handles data ingested from a Google Pub/Sub topic via two routes. The 'hot' path does some basic transformation and stores the data in Datastore, while the 'cold' path performs fixed hourly windowing for deeper analysis before storage.
So far the pipeline had been running fine, until I started doing some local buffering of the data before publishing to Pub/Sub (so data may arrive at Pub/Sub a few hours 'late'). The error that gets thrown is as below:
java.lang.IllegalArgumentException: Cannot output with timestamp 2018-06-19T14:00:56.862Z. Output timestamps must be no earlier than the timestamp of the current input (2018-06-19T14:01:01.862Z) minus the allowed skew (0 milliseconds). See the DoFn#getAllowedTimestampSkew() Javadoc for details on changing the allowed skew.
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.checkTimestamp(SimpleDoFnRunner.java:463)
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.outputWithTimestamp(SimpleDoFnRunner.java:429)
at org.apache.beam.sdk.transforms.WithTimestamps$AddTimestampsDoFn.processElement(WithTimestamps.java:138)
It seems to reference the section of my code (the WithTimestamps call) that performs the hourly windowing, shown below:
Window<KV<String, Data>> window = Window.<KV<String, Data>>into(FixedWindows.of(Duration.standardHours(1)))
.triggering(Repeatedly.forever(pastEndOfWindow()))
.withAllowedLateness(Duration.standardSeconds(10))
.discardingFiredPanes();
PCollection<KV<String, List<Data>>> keyToDataList = eData.apply("Add Event Timestamp", WithTimestamps.of(new EventTimestampFunction()))
.apply("Windowing", window)
.apply("Group by Key", GroupByKey.create())
.apply("Sort by date", ParDo.of(new SortDataFn()));
I'm not sure I understand exactly what I've done wrong here. Is the error thrown because the data is arriving late? As I understand it, if data arrives later than the allowed lateness it should be discarded, not throw an error like the one I'm seeing.
I'm wondering if setting an unlimited timestamp skew would resolve this? The late data can be exempt from analysis; I just need to ensure that no errors are thrown that would choke the pipeline. There is also nowhere else where I add or change the timestamps of the data, so I'm not sure why the errors are thrown.
It looks like your DoFn is using outputWithTimestamp and you are trying to set a timestamp which is older than the input element's timestamp. Timestamps of output elements are typically derived from the inputs; this is important to ensure the correctness of the watermark computation.
You may be able to work around this by increasing both the timestamp skew and the windowing allowed lateness; however, some data may be lost, and it is up to you to determine whether such loss is acceptable in your scenario.
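For that first workaround, here is a minimal sketch against the code in the question (the 3-hour skew is only an assumed upper bound on your buffering delay, and the allowed-timestamp-skew APIs are marked deprecated in recent Beam releases):
PCollection<KV<String, Data>> timestamped = eData.apply(
    "Add Event Timestamp",
    WithTimestamps.of(new EventTimestampFunction())
        // Allow event timestamps up to 3 hours older than the Pub/Sub ingest timestamp.
        // Elements that fall behind the watermark can still be dropped as late data
        // once the 10 s allowed lateness has passed.
        .withAllowedTimestampSkew(Duration.standardHours(3)));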
Another alternative is not to output with a timestamp at all, and instead use the Pub/Sub message timestamp to process each message. Then output each element as a KV, where the RealTimestamp is computed the same way you currently compute the timestamp (just don't use it in WithTimestamps), GroupByKey, and write the KVs to Datastore.
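A rough sketch of that second alternative, reusing the Data and EventTimestampFunction types from the question (the getKey() accessor is a placeholder, and EventTimestampFunction is assumed to be the SerializableFunction<Data, Instant> already passed to WithTimestamps):
// Keep the Pub/Sub ingest timestamp for windowing and the watermark, and carry the
// event time as ordinary data instead of calling outputWithTimestamp().
PCollection<KV<String, KV<Instant, Data>>> keyedWithEventTime = eData.apply(
    "Attach event time as data",
    ParDo.of(new DoFn<Data, KV<String, KV<Instant, Data>>>() {
      private final EventTimestampFunction eventTime = new EventTimestampFunction();

      @ProcessElement
      public void processElement(ProcessContext c) {
        Data d = c.element();
        c.output(KV.of(d.getKey(), KV.of(eventTime.apply(d), d)));
      }
    }));
From there you can GroupByKey and write to Datastore as you do today, using the carried Instant instead of the element timestamp.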
Other questions you can ask yourself are:
Why are the input elements associated with a more recent timestamp than the output elements?
Do you really need to buffer that much data before publishing to Pub/Sub?
I have a dataset with potentially corrupted/malicious data. The data is timestamped, and I am rating it with a heuristic function. After a period of time I learn that all new data items coming in with certain IDs need to be discarded, and they represent a significant portion of the data (up to 40%).
Right now I have two batch pipelines:
The first one just runs the rating over the data.
The second one first filters out the corrupted data and then runs the analysis.
I would like to switch from batch mode (say, running every day) to online processing (hoping for a delay of under 10 minutes).
The second pipeline uses a global window, which makes processing easy: when a corrupted data key is detected, all other records for it are simply discarded (and using the discarded keys from previous days as a pre-filter is easy too). It also makes it easier to make decisions about the output data, since all historic data for a given key is available during processing.
The main question is: can I create a loop in a Dataflow DAG? Let's say I would like to accumulate the quality rates given to each session window I process, and if the rate sum goes over X, a filter function in an earlier stage of the pipeline should filter out the malicious keys.
I know about side inputs; I don't know whether they can change during runtime.
I'm aware that a DAG by definition cannot have a cycle, but how can I achieve the same result without one?
An idea that comes to mind is to use a side output to mark an ID as malicious and to make a fake unbounded output/input: the output would dump the data to some storage, and the input would load it every hour and stream it so it can be joined.
Side inputs in the Beam programming model are windowed.
So you were on the right path: it seems reasonable to structure the pipeline in two parts: 1) computing a detection model for the malicious data, and 2) taking the model as a side input and the data as the main input, and filtering the data according to the model. The second part of the pipeline will get the model for the matching window, which seems to be exactly what you want.
In fact, this is one of the main examples in the MillWheel paper (page 2), on which Dataflow's streaming runner is based.
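A hedged sketch of that structure (Record, getKey() and the DetectMaliciousKeys composite are placeholders; both parts are assumed to share the same windowing):
// Part 1: compute the per-window "model", here simply the list of malicious keys,
// and expose it as a windowed side input.
PCollectionView<List<String>> maliciousKeysView = records
    .apply("Detect malicious keys", new DetectMaliciousKeys())   // yields PCollection<String>
    .apply("Model as side input", View.asList());

// Part 2: filter the main input against the model; for each element, Beam supplies
// the side-input contents of the matching window.
PCollection<Record> clean = records.apply(
    "Drop malicious records",
    ParDo.of(new DoFn<Record, Record>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        if (!c.sideInput(maliciousKeysView).contains(c.element().getKey())) {
          c.output(c.element());
        }
      }
    }).withSideInputs(maliciousKeysView));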
I am looking for a method that redraws all the features stored in a layer (equivalent to the "redraw" method in OL2).
The "changed" method of the ol.layer.Vector class "refreshes" only the features visible on the map (for instance, in the zoomed part)
and thus doesn't affect the features outside it.
The treatment applied to those data is to periodically delete old features.
How can I achieve this?
Another question is: how can I be notified of the end of this specific deletion?
Thanks in advance,
Jean-Marie
First, thanks for your answers.
My question indeed requires more information:
The browser client receives points through a real-time WebSocket connection.
Every second, an array of new features collected from those points is added to the vector layer in this way:
vectorLayer.getSource().addFeatures(features);
The duration of the source buffer is, for instance, one hour; to manage a one-hour temporal sliding window, old features are removed every minute:
map.once('postrender',removeOldFeatures);
vectorLayer.changed(); // or map.renderSync();
This removal is only done correctly for visible features.
But as soon as some features are not visible, for instance because of a zoom onto a portion of the map where those features are not displayed, the removal treatment (removeOldFeatures) is not executed for those features, whatever method is used (vectorLayer.changed() or map.render()).
As a consequence, the number of features keeps increasing...
Jean-Marie
I had the same problem with a TileVector source and the GeoJSON format. In the end I used the provided TileUrlFunction and, to redraw the layer, I just set the source again with the layer.setSource(yourdefinedSource) method. Dube is right: most of the time (if the source is updated too often) it is useful to send a unique param (like a Unix timestamp) as a cache buster.
I am using the NWS REST API as the weather service for an app I am making. I was initially reluctant to use NWS because of its poor documentation, but I couldn't resist since it is offered completely free.
Now that I am trying to use it, I am running into some difficulty. When making a request for multiple days, the minimum temperature appears as nil for several days.
(EDIT: As I have been testing the API more, I have found that it is not always the minimum temperatures that are nil. It can be a max temp or a precipitation value; it seems completely random. If you would like to make test calls using their web interface, you can do so here: http://graphical.weather.gov/xml/sample_products/browser_interface/ndfdBrowserByDay.htm
and here: http://graphical.weather.gov/xml/sample_products/browser_interface/ndfdXML.htm)
Here is an example of a request where the minimum temperatures are empty: http://graphical.weather.gov/xml/sample_products/browser_interface/ndfdBrowserClientByDay.php?listLatLon=40.863235,-73.714780&format=24%20hourly&numDays=7
Surprisingly, on their website, the minimum temperatures are available:
http://forecast.weather.gov/MapClick.php?textField1=40.83&textField2=-73.70
You'll see that the minimum temperatures section contains about 5 (sometimes fewer, it is inconsistent) blank fields that say <value xsi:nil="true"/>.
If anybody can help me it would be greatly appreciated; using the NWS API can be a little overwhelming at times.
Thanks,
The nil values, from what I can understand of the documentation, here and here, simply indicate that the data is unavailable.
Without making assumptions about NOAA's data architecture, it's conceivable that the information available via the API may differ from what their website displays.
Missing values are represented by an empty element and xsi:nil="true" (R2.2.1).
The nil values being returned seem to be related to the time period. Notice the difference between the time-layout keys (see section 5.3.2) in these two requests:
k-p24h-n7-1
k-p24h-n6-1
The data times are different.
<layout-key> element
The key is derived using the following convention:
“k” stands for key.
“p24h” implies a data period length of 24 hours.
“n7” means that the number of data times is 7.
“1” is a sequential number used to keep the layout keys unique.
Here, startDate is the factor. Leaving it off includes more time, and might account for some of the requested data not yet being available.
Per documentation:
The beginning day for which you want NDFD data. If the string is empty, the start date is assumed to be the earliest available day in the database. This input is only needed if one wants to shorten the time window data is to be retrieved for (less than entire 7 days worth), e.g. if user wants data for days 2-5.
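For what it's worth, here is a minimal Java sketch (Java 11+ HttpClient; the startDate value is only an example) that pins startDate so the 24-hourly layout only covers days for which data should already exist:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NdfdByDayRequest {
  public static void main(String[] args) throws Exception {
    String url = "http://graphical.weather.gov/xml/sample_products/browser_interface/"
        + "ndfdBrowserClientByDay.php"
        + "?listLatLon=40.863235,-73.714780"
        + "&format=24%20hourly"
        + "&numDays=6"
        + "&startDate=2015-09-01";   // example date; use the current NDFD day

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(HttpRequest.newBuilder(URI.create(url)).GET().build(),
              HttpResponse.BodyHandlers.ofString());

    // Inspect the <layout-key> (e.g. k-p24h-n6-1) and check whether any
    // <value xsi:nil="true"/> entries remain for the temperature elements.
    System.out.println(response.body());
  }
}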
I'm not experiencing the randomness you mention. The folks on NOAA's Yahoo! Groups forum might be able to tell you more.