AnyLogic Histogram of Options in OptionsList - histogram

In my AnyLogic model there is a source where the parameter agent.type is one of Options from an OptionList called Types.
I want to create a Histogram that shows how many agents there are with each of the different possible Types.
I can do it by setting up a variable for each Type that increments the count() using a longwinded function, but I would prefer to use a dataset or histogram_data optionsHistogram using the OptionsList as the Horizontal axis value, and the count of the number of agents with that type as the Vertical axis value.
Is this possible, and what would you recommend as the best way to achieve this?
Thanks

A histogram is used to plot the spread of one type of data.
If you want to plot the number of agents by type (where the type is defined by an OptionList), you should use a simple bar chart and a statistic on your agent population, as below:
You can then plot it in a bar chart using this setup:
PS: There is a lot of info about how those agent-pop statistics work in the help, worth a read.


Problems plotting time-series interactively with Altair

Description of the problem
My goal is quite basic: to plot time series in an interactive plot. After some research I decided to give a try to Altair.
There are already QGIS plugins for time-series visualisation, but as far as I'm aware, none for plotting time-series at vector-level, interactively clicking on a map and selecting a Polygon. So that's why I decided to go for a self-made solution using Altair, maybe combining it with Folium to add functionalities later on.
I'm totally new to the Altair library (as well as Vega and Vega-lite), and quite new in datascience and data visualisation as well... so apologies in advance for my ignorance!
There are already well explained tutorials on how to plot time series with Altair (for example here, or in the official website). However, my study case has some particularities that, as far as I've seen, have not yet been approached altogether.
The data is produced using the Python API for Google Earth Engine and preprocessed with Python and the pandas/geopandas libraries:
In Google Earth Engine, a vegetation index (NDVI in the current case) is computed at pixel-level for a certain region of interest (ROI). Then the function image.reduceRegions() is mapped across the ImageCollection to compute the mean of the ndvi in every polygon of a FeatureCollection element, which represent agricultural parcels. The resulting vector file is exported.
Under a Jupyter-lab environment, the data is loaded into a geopandas GeoDataFrame object and preprocessed, transposing the DataFrame and creating a datetime column, among others, in order to have the data well-shaped for time-series representation with Altair.
Data overview after preprocessing:
My "final" goal would be to show, in the same graphic, an interactive line plot with a set of lines, each representing an agricultural parcel, with parcels categorized by crop type in different colours, e.g. corn in green, wheat in yellow, pear trees in brown... (the information containing the crop type of each parcel can be added to the DataFrame by joining it with another DataFrame).
I am thinking of something looking more or less like the following example, with legend's years being the parcels coloured by crop types:
But so far I haven't managed to make my data look this way... at all.
As you can see there are many nulls in the data (this is due to the application of a cloud masking function and to the fact that there are several Sentinel-2 orbits intersecting the ROI). I would like to just omit the null values for each column/parcel, but I don't know if this data configuration can pose problems (any advice on that?).
So far I got:
Generating the preceding graphic, for a single parcel, already takes around 23 seconds, which is something that maybe should/could be improved (how?)
And more importantly, the expected line representing the item/polygon/parcel's values (NDVI) is not even shown in the plot (note that I chose a parcel containing rather few non-null values).
For sure I am doing many things wrong. Would be great to get some advice to solve (some of) them.
Sample of the data and code to reproduce the issue
Here's a text sample of the data in JSON format, and the code used to reproduce the issue is the following:
import pandas as pd
import geopandas as gpd
import altair as alt
df= pd.read_json(r"path\to\json\file.json")
df['date']= pd.to_datetime(df['date'])
print(df.dtypes)
df
Output:
lines=alt.Chart(df).mark_line().encode(
x='date:O',
y='17811:Q',
color=alt.Color(
'17811:Q', scale=alt.Scale(scheme='redyellowgreen', domain=(-1, 1)))
)
lines.properties(width=700, height=600).interactive()
Output:
Thanks in advance for your help!
If I understand correctly, it is mostly the format of your dataframe that needs to be changed from wide to long, which you can do either via .melt in pandas or .transform_fold in Altair. With melt, the default names are 'variable' (the previous columns name) and 'value' (the value for each column) for the melted columns:
alt.Chart(df.melt(id_vars='date'), width=500).mark_line().encode(
x='date:T',
y='value',
color=alt.Color('variable')
)
The gaps come from the NaNs; if you want Altair to interpolate across the missing values, you can drop the NaNs:
alt.Chart(df.melt(id_vars='date').dropna(), width=500).mark_line().encode(
x='date:T',
y='value',
color=alt.Color('variable')
)
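To make the reshaping step concrete, here is a minimal pandas-only sketch of what melt and dropna do to a table shaped like the one in the question. The parcel ids, dates and NDVI values are made up for illustration; only the column layout (a date column plus one column per parcel) is taken from the question.

```python
import pandas as pd

# Hypothetical wide table: one column per parcel, NaN where a date was cloud-masked.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-06", "2020-01-11"]),
    "17811": [0.21, None, 0.35],
    "17812": [None, 0.40, 0.42],
})

# Wide -> long: one row per (date, parcel) pair, in columns 'variable' and 'value'.
long_df = wide.melt(id_vars="date")

# Dropping the NaN rows leaves only real observations for the line mark to connect.
clean = long_df.dropna()

print(long_df.shape)  # (6, 3)
print(clean.shape)    # (4, 3)
```

The long table is what gets passed to alt.Chart in the snippets above.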
If you want to do it all in Altair, the following is equivalent to the last pandas example above (the transform uses 'key' instead of 'variable' as the name for the former columns). I also use an ordinal instead of a nominal type for the color encoding to show how to make the colors more similar to your example:
alt.Chart(df, width=500).mark_line().encode(
x='date:T',
y='value:Q',
color=alt.Color('key:O')
).transform_fold(
df.drop(columns='date').columns.tolist()
).transform_filter(
'isValid(datum.value)'
)
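For the crop-type colouring mentioned in the question, one possible approach is to join a parcel-to-crop lookup table onto the melted data before plotting. The sketch below assumes a hypothetical lookup DataFrame with columns parcel and crop; those names and all values are invented for illustration and are not from the question's data.

```python
import pandas as pd

# Long-format NDVI data, as produced by df.melt(id_vars='date').dropna().
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-11", "2020-01-06", "2020-01-11"]),
    "variable": ["17811", "17811", "17812", "17812"],
    "value": [0.21, 0.35, 0.40, 0.42],
})

# Hypothetical lookup table: one row per parcel with its crop type.
crops = pd.DataFrame({"parcel": ["17811", "17812"], "crop": ["corn", "wheat"]})

# A left join keeps every observation and attaches the crop label to it.
labelled = (
    long_df.merge(crops, left_on="variable", right_on="parcel", how="left")
    .drop(columns="parcel")
)

print(labelled["crop"].tolist())  # ['corn', 'corn', 'wheat', 'wheat']
```

With such a table, encoding color='crop:N' together with detail='variable:N' in the Altair chart should colour by crop while still drawing one line per parcel.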

How to merge zero values (vector(0)) with metric values in PromQL

I'm using flexlm_exporter to export my license usage to Prometheus, and from Prometheus to a custom service (not Grafana).
As you know Prometheus hides missing values.
However, I need those missing values in my metric values, therefore I added to my prom query or vector(0)
For example:
flexlm_feature_used_users{app="vendor_lic-server01",name="Temp"} or vector(0)
This query adds an empty metric with zero values.
My question is if there's a way to merge the zero vector with each metric values?
Edit:
I need grouping, at least for a user and name labels, so vector(0) is probably not the best option here?
I tried multiple solutions in different StackOverflow threads, however, nothing works.
Please assist.
It would help to use absent() with labels and convert its value from 1 to 0 with clamp_max:
(Metrics{label="a"} or clamp_max(absent(notExists{label="a"}), 0))
+
(Metrics2{label="a"} or clamp_max(absent(notExists{label="a"}), 0))
vector(0) has no labels.
clamp_max(absent(notExists{label="a"}), 0) is 0 with the label attached.
If you do sum(flexlm_feature_used_users{app="vendor_lic-server01",name="Temp"} or vector(0)) you should get what you're looking for, but you'll lose the possibility to group by, since vector(0) doesn't have any labels.
I needed a similar thing, and ended up flattening the options. What worked for me was something like:
(sum by (xyz) (flexlm_feature_used_users{app="vendor_lic-server01",name="Temp1"}) + sum by (xyz) (flexlm_feature_used_users{app="vendor_lic-server01",name="Temp2"})) or
sum by (xyz) (flexlm_feature_used_users{app="vendor_lic-server01",name="Temp1"}) or
sum by (xyz) (flexlm_feature_used_users{app="vendor_lic-server01",name="Temp2"})
There is no easy generic way to fill gaps in returned time series with zeroes in Prometheus. But this can be easily done via the default operator in VictoriaMetrics:
flexlm_feature_used_users{app="vendor_lic-server01",name="Temp"} default 0
The q default N fills gaps with the given default value N per each time series returned from q. See more details in MetricsQL docs.
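Since the data ends up in a custom service anyway, the same per-series idea could also be applied on the client side after the query returns. The sketch below is only an illustration of that idea, not how VictoriaMetrics implements default; the fill_gaps helper, the label tuples and the sample values are all hypothetical.

```python
def fill_gaps(series, timestamps, default=0.0):
    """Return {labels: [value per timestamp]}, using `default` where a sample is missing."""
    return {
        labels: [samples.get(ts, default) for ts in timestamps]
        for labels, samples in series.items()
    }

# Hypothetical query result: (name, user) label tuple -> {timestamp: value}.
series = {
    ("Temp", "alice"): {0: 1.0, 60: 1.0},
    ("Temp", "bob"): {60: 2.0, 120: 2.0},
}

# Every series keeps its labels; only the missing points are replaced by 0.
filled = fill_gaps(series, timestamps=[0, 60, 120])

print(filled[("Temp", "alice")])  # [1.0, 1.0, 0.0]
print(filled[("Temp", "bob")])    # [0.0, 2.0, 2.0]
```

Note that this only fills gaps in series that appear at least once in the result; a series that is absent for the whole range still has to be handled separately.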

Is it possible to split a dataset in Google Dataprep? If so, how?

I've been looking into Google Dataprep as an ETL solution to perform some basic data transformation before feeding it to a machine learning platform. I'm wondering if it's possible to use the Dataprep/Dataflow tools to split a dataset into train, test, and validation sets. Ideally I'm looking to do a stratified split on a target column, but for starters I'd settle for a simple uniform random split by percent of whole (e.g. 50% train, 30% validation, 20% test).
So far I haven't been able to find anything about whether this is even possible with Dataprep, so I'm wondering if anyone knows definitively if this is possible and, if so, how to accomplish it.
EDIT 1
Thanks @jakub-janoštík for getting me going in the right direction! I modified your answer slightly and came up with the following (in wrangle form):
case condition: customConditions cases: [false,0] default: rand() as: 'split_condition'
case condition: customConditions cases: [split_condition < 0.6,'train'],[split_condition >= 0.8,'test'] default: 'validation' as: 'dataset_type'
drop col: split_condition action: Drop
By assigning random values in a separate step, I got the guaranteed percentage split I was looking for. The flow ended up looking like this:
Image: final flow diagram with dataset splitting
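For readers who want to check the threshold logic outside Dataprep, here is a rough Python equivalent of the two case steps above. The assign_split function name, the seed and the sample size are assumptions for illustration. With these thresholds the expected shares are roughly 60% train, 20% validation and 20% test, and because each row is drawn independently the realised proportions are approximate rather than exact.

```python
import random

def assign_split(u):
    """Map a uniform random value in [0, 1) to a dataset label using the recipe's thresholds."""
    if u < 0.6:
        return "train"
    if u >= 0.8:
        return "test"
    return "validation"

# Hypothetical run: draw one random value per row, then label it in a second step.
rng = random.Random(42)
labels = [assign_split(rng.random()) for _ in range(10_000)]

counts = {name: labels.count(name) for name in ("train", "validation", "test")}
print(counts)
```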
EDIT 2
I just figured out how to do the stratified split too, so I thought I'd add it in case anyone else is trying to do this. Here are the rough steps:
1. Split your dataset based on whatever subpopulations you're targeting (e.g. target0, target1)
2. For each subpopulation, do the uniform random split described above (e.g. now you have target0-train, target0-test, target0-validation, target1-train, etc.)
3. For each set type (i.e. train, test, validation):
   a. Create a new recipe from one of the sets
   b. Edit the recipe, and use the Union transform to merge it with other datasets of the same type (e.g. target0-train union with target1-train). The union button is in the middle of the toolbar on the Edit Recipe page.
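The steps above can be sketched in plain Python as follows. This mirrors the flow only conceptually: the stratified_split function, the row dictionaries and the target key are hypothetical names chosen for illustration, and the thresholds are the same 0.6 / 0.8 cut-offs used in the uniform split.

```python
import random
from collections import defaultdict

def stratified_split(rows, key, rng):
    """Split rows into train/validation/test separately within each value of `key`."""
    # Step 1: split the dataset into subpopulations.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Steps 2 and 3: random split inside each subpopulation, then union by set type.
    out = {"train": [], "validation": [], "test": []}
    for members in groups.values():
        for row in members:
            u = rng.random()
            name = "train" if u < 0.6 else "test" if u >= 0.8 else "validation"
            out[name].append(row)
    return out

# Hypothetical data: 1000 rows with a binary target column.
rows = [{"id": i, "target": i % 2} for i in range(1000)]
splits = stratified_split(rows, "target", random.Random(0))

print({name: len(part) for name, part in splits.items()})
```

If exact per-subpopulation proportions matter, shuffling each group and slicing it at fixed fractions would be one alternative to drawing a random value per row.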
I hope that's helpful to someone!
I'm looking at the same problem and I was able to partially solve it using the "case on custom condition" and "Random" functions. What I do is create a new column named target and apply the following logic:
After applying this you'll have a new column with these 3 labels, and you can generate 3 new datasets by applying row-filtering rules based on those values. One thing to keep in mind is that each time you run the job you'll get a different validation set. So if you want to keep it fixed, you need to use the dataset created in the first run as input for future runs (and randomise only the train and test sets).
If you need more control over the distribution of labels in your datasets, there is the ROWNUMBER window function, which could potentially be used. But I haven't been able to make it work yet.

Can SPSS regard ordinal measures as producing continuous data?

In SPSS, when defining the measure of a variable, the usual options are "Scale", "Ordinal", and "Nominal" (see image).
However, when using actual dialog boxes to do analyses, SPSS will often ask us to describe whether the data are "Continuous" or "Categorical". E.g., I was watching this video by James Gaskin (a great YouTube teacher by the way), and saw this dialog box (image below).
My Question: In the second image, you can see that the narrator put some "Ordinal" variables in the "Continuous" box. Is it okay to do that? How come?
For most procedures, the treatment of a variable is determined by how you use it. The measurement level is just a reminder, so you can treat a variable however it makes sense.
There are some procedures that automatically determine how to treat a variable based on the measurement level, including CTABLES, the Chart Builder, and TREE, but you can change the level temporarily in the dialog box or in syntax or change it persistently via VARIABLE LEVEL or in the Data Editor. Also, most of the statistical extension commands use the declared measurement level to determine whether a variable is continuous or a factor.

Add data series to highlight cases on a box plot (Excel, SPSS or R)

First-time user of this forum, so guidance on how to provide enough information is much appreciated. I am trying to replicate a presentation of data used in the medical education field. This will help improve the quality of examiners' marking of trainees in a clinical exam. What I would like to communicate is similar to what the College of General Practitioners already communicates about one of its own exams; please see www.gp10.com.au/slides/thursday/slide29.pdf to help understand what I want to present.
I have access to Excel, SPSS and R, so help with any of these would be great. As a first attempt I have used SPSS and created 3 variables: a dummy variable, a "station score" (ST) and a "global rating score" (GRS). The station score is a non-integer value between 0 and 10 and is on the y-axis, similar to the "Candidate Final Marks" in the pdf. The x-axis is the global rating score, an integer from 1 to 6, represented in the pdf as the "Overall Performance Scale". When I use SPSS's boxplot I get a boxplot as depicted.
What I would like to do is overlay a single examiner's own scoring of X number of examinees. For example, one examiner (examiner A) provided the following marks:
ST: 5.53,7.38,7.38,7.44,6.81
GRS: 3,4,4,5,3
(this is transposed into two columns).
Whether it be SPSS, Excel or R how would I be able to overlay the box and whisker plots with the individual data points provided by the one examiner? This would help show the degree to which the examiners' marking styles are in concordance with the expected distribution of ST scores across GRS. Any help greatly appreciated! I like Excel graphics but I have found it very difficult to work with when choosing the examiners' data as a separate series - somehow the examiners' GRS scores do not line up nicely on the x-axis. I am very new to R but am also very interested in R, and would expend time to get a good result in R if a good result is viable. I understand JMP may be preferable for this type of thing but access to this may not be possible.
