ClearML get max value from logged values - devops

I use ClearML to track my tensorboard logs (from PyTorch Lightning) during training.
Later on I start another script which connects to the existing task and does some testing.
Unfortunately I don't have all the information available in the second script, so I want to query it from the values logged on the ClearML server.
How would I do this?
I thought about something like this, but haven't found anything in the documentation:
task = Task.init(project_name="Project", task_name="name", reuse_last_task_id="Task_id", continue_last_task=True)
x_value, y_value = task.get_value(key="val/acc", mode="max")
x_value2, y_value2 = task.get_value(key="epoch", mode="x", x=x_value)
x_value would be my epoch or global step,
y_value the maximum value of the "val/acc" plot,
x_value2 would again be the epoch or global step, and
y_value2 the value of the "epoch" plot at x_value.

Disclaimer: I'm part of the ClearML (formerly Trains) team.
To get an existing Task object for a running (or completed/failed) experiment, assuming we know the Task ID:
another_task = Task.get_task(task_id='aabbcc')
If we only know the Task project/name:
another_task = Task.get_task(project_name='the project', task_name='the name')
Notice that if you have multiple tasks under the same name, it will return the most recently updated one.
Once we have the Task object, we can do:
latest_scalar_values_dict = another_task.get_last_scalar_metrics()
This returns the min/max/last values of all the scalars, for example:
latest_scalar_values_dict = {
    'title': {
        'series': {
            'last': 0.5,
            'min': 0.1,
            'max': 0.9
        }
    }
}
This is described in the documentation.
If you need the entire graphs, you can use task.get_reported_scalars() (also covered in the docs).
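For the specific use case in the question (the step at which "val/acc" peaks, plus the value of another series at that step), a minimal sketch along these lines should work on top of get_reported_scalars(); note that the title/series keys used below ("val"/"acc" and "epoch"/"epoch") are assumptions about how the TensorBoard tags get split, so verify them against what the web UI shows:

from clearml import Task

another_task = Task.get_task(project_name="Project", task_name="name")
scalars = another_task.get_reported_scalars()  # {title: {series: {"x": [...], "y": [...]}}}

# Assumed keys: title "val", series "acc" for the "val/acc" plot; adjust as needed
acc = scalars["val"]["acc"]
best_idx = acc["y"].index(max(acc["y"]))
best_x, best_y = acc["x"][best_idx], acc["y"][best_idx]

# Read the "epoch" plot at the same x position
# (assumed title/series "epoch"/"epoch", and that both plots share the same x axis)
epoch_series = scalars["epoch"]["epoch"]
epoch_at_best = epoch_series["y"][epoch_series["x"].index(best_x)]

If you only need the maximum itself, get_last_scalar_metrics() is enough; the full graphs are only required to recover the x position and to read other series at that point.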

Related

tensorflow federated takes long time to start training

I'm facing a slightly annoying problem. The tensorflow-federated training (initialize and next) takes a long time to start (I'm not talking about the time to finish, it's just that the starting time takes a while).
I'm not sure whether this is due to either: 1) using with eager_mode():,
or 2) using shuffling, as below:
with eager_mode():
    def preprocess(new_dataset):
        def map_fn(elem):
            return collections.OrderedDict([('x', tf.reshape(elem['In'], [-1])),
                                            ('y', tf.reshape(elem['Out'], [1]))])
        DS2 = new_dataset.map(map_fn)
        if Use_shuffle:
            return DS2.repeat(SNN_epoch).shuffle(shuffle_buffer).batch(SNN_batch_size)
        else:
            return DS2.repeat(SNN_epoch).batch(SNN_batch_size)
...
...
...
This is what I do:
trainer_Itr_Process = tff.learning.build_federated_averaging_process(
    model_fn_Federated,
    server_optimizer_fn=(lambda: tf.keras.optimizers.SGD(learning_rate=learn_rate)),
    client_weight_fn=None)
FLstate = trainer_Itr_Process.initialize()

# Track loss of different ...... of federated iteration
for round_num in range(Fed_iter_min, Fed_iter_max):
    FLstate, FLoutputs = trainer_Itr_Process.next(FLstate, federated_train_data)
......
......
......
This is the warning I'm getting:
W0616 11:30:00.217065 139843447875392 deprecation_wrapper.py:118] From /usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/api/_v1/estimator/__init__.py:10: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.
W0616 11:30:02.400945 139843447875392 deprecation_wrapper.py:118] From /usr/local/lib/python3.6/dist-packages/tensorflow_federated/python/core/impl/tensorflow_serialization.py:296: The name tf.initializers.variables is deprecated. Please use tf.compat.v1.initializers.variables instead.

Breaking down statistics at API level or Flow level in K6?

I was taking a quick look at k6 from Load Impact.
The graphs I have so far show TPS, response time and error rate at the global level, which is not too useful.
When I load test, I'd rather have those stats not only at the global level, but also at the flow level or at the API level. That way, if I see high latency for example, I can tell right away whether it is caused by a single API or whether all APIs are slow.
Or I can tell whether a given API is returning, say, HTTP 500, or several different APIs are.
Can k6 show stats like TPS, response time and HTTP status at the API level, the flow level and the global level?
Thanks
Yes, it can, and you have 3 options here in terms of result presentation (all involve using custom metrics to some degree):
1) End of test summary printed to stdout.
2) You output the result data to InfluxDB+Grafana.
3) You output the result data to Load Impact Insights.
You get global stats with all three of the above, and per API endpoint stats out of the box with 2) and 3), but to get stats at the flow level you'd need to create custom metrics, which work with all three options. So something like this:
import http from "k6/http";
import { Trend, Rate } from "k6/metrics";
import { group, sleep } from "k6";

export let options = {
    stages: [
        { target: 10, duration: "30s" }
    ]
};

var flow1RespTime = new Trend("flow_1_resp_time");
var flow1TPS = new Rate("flow_1_tps");
var flow1FailureRate = new Rate("flow_1_failure_rate");

export default function() {
    group("Flow 1", function() {
        let res = http.get("https://test.loadimpact.com/");
        flow1RespTime.add(res.timings.duration);
        flow1TPS.add(1);
        flow1FailureRate.add(res.status == 0 || res.status > 399);
    });
    sleep(3);
};
This would expand the end-of-test summary stats printed to stdout to include the custom metrics.

New Relic alert when application stop

I have an application deployed on PCF with a New Relic service bound to it. In New Relic I want to get an alert when my application is stopped. I don't know whether that is possible or not. If it is, can someone tell me how?
Edit: I don't have access to New Relic Infrastructure
Although an 'app not reporting' alert condition is not built into New Relic Alerts, it's possible to rig one using NRQL alerts. Here are the steps:
Go to New Relic Alerts and begin creating a NRQL alert condition (see the NRQL alert conditions documentation).
Query your app with:
SELECT count(*) FROM Transaction WHERE appName = 'foo'
Set your threshold to:
Static
sum of query results is below x
at least once in y minutes
The query runs once per minute. If the app stops reporting, count() turns the null values into 0 and we then sum them. When that number goes below your threshold, you get a notification. I recommend using the preview graph to determine how low you want your transaction count to get before receiving a notification. Here's some good information:
Relic Solution: NRQL alerting with “sum of the query results”
Basically, you need to create a New Relic alert with conditions that check whether the application is available. Specifically, you can use the Host not reporting alert condition.
The Host not reporting event triggers when data from the Infrastructure agent does not reach the New Relic collector within the time frame you specify.
You could do something like this:
// ...
aggregation_method = "cadence" // Use cadence for process monitoring otherwise it might not alert
// ...
nrql {
  // Limitation: only works for processes with ONE instance; otherwise use just uniqueCount() and set a LoS (loss of signal)
  query = "SELECT filter(uniqueCount(hostname), WHERE processDisplayName LIKE 'cdpmgr') OR -1 FROM ProcessSample WHERE GENERIC_CONDITIONS FACET hostname, entityGuid as 'entity.guid'"
}
critical {
  operator = "below"
  threshold = 0
  threshold_duration = 5*60
  threshold_occurrences = "ALL"
}
Previous solution (it turned out not to be that robust):
// ...
critical {
  operator = "below"
  threshold = 0.0001
  threshold_duration = 600
  threshold_occurrences = "ALL"
}
nrql {
  query = "SELECT percentage(uniqueCount(entityAndPid), WHERE commandLine LIKE 'yourExecutable.exe') FROM ProcessSample FACET hostname"
}
This calculates the percentage that your process represents among all other processes.
If the process is not running, the percentage drops to 0. If the system runs a vast number of processes it could fall below 0.0001, but this is very improbable.
The advantage here is that the alert stays active even if the process slips out of your alert time window after it stopped. This way you prevent the alert from auto-recovering (compared to just filtering with WHERE).

Gatling: Can ramping up of individual scenarios be done just like the users?

Consider an example of testing APIs with Gatling. Due to a weird requirement, I had to create a scenario for each user:
var scenarioList // This is of type mutable list
I have plenty of scenarios added to this list, as my request body has to differ for each user or the request won't be processed. Each of these scenarios currently has the following Gatling injection configured:
Ex: scenarioList += scenario1.inject(rampUsers(1) over (1 minutes))
scenarioList += scenario2.inject(rampUsers(1) over (1 minutes))
scenarioList += scenario3.inject(rampUsers(1) over (1 minutes))
.
.
.
so on
Now all these scenarios are called in the global setUp as below:
setUp(scenarioList: _*).assertions(
  forAll.successfulRequests.percent.gte(90)
)
Suppose I have 1000 users (scenarioList size is 1000). The problem is that all 1000 users would start at the same time, but I want to ramp them up. So the question becomes how to ramp up the scenarios instead of running them all in parallel.
Is this possible? If not, is there any other approach to follow?
I don't have the luxury of running the same scenario with multiple users, as the body of the requests changes. Please let me know.
I was able to solve this problem by using feeders within the scenario, so I don't need to create multiple scenarios.
With feeders, Gatling provides the option to parameterize the request body of any HTTP request.
Code Example:
// req and randomStringGenerator are defined elsewhere (not shown)
var randomSession = Iterator.continually(Map("randsession" -> (req.replace("0000000000", randomStringGenerator.randomString(10)))))

val httpConf = http
  .baseURL("http://localhost:5000")
  .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
  .userAgentHeader("Mozilla/4.0(compatible;IE;GACv10. 0. 0. 1)")

val scn = scenario("Activate")
  .feed(randomSession)
  .exec(http("activate request")
    .post("/login/activate")
    .body(StringBody("""${randsession}"""))
    .check(status.is(200)))
  .pause(5)

setUp(
  scn.inject(atOnceUsers(5))
).protocols(httpConf)

zipline backtesting using non-US (European) intraday data

I'm trying to get zipline working with non-US intraday data that I've loaded into a pandas DataFrame:
BARC HSBA LLOY STAN
Date
2014-07-01 08:30:00 321.250 894.55 112.105 1777.25
2014-07-01 08:32:00 321.150 894.70 112.095 1777.00
2014-07-01 08:34:00 321.075 894.80 112.140 1776.50
2014-07-01 08:36:00 321.725 894.80 112.255 1777.00
2014-07-01 08:38:00 321.675 894.70 112.290 1777.00
I've followed the moving-averages tutorial here, replacing "AAPL" with my own symbol code and the history calls with "1m" data instead of "1d".
Then I do the final call using algo_obj.run(DataFrameSource(mydf)), where mydf is the DataFrame above.
However, all sorts of problems arise related to TradingEnvironment. According to the source code:
# This module maintains a global variable, environment, which is
# subsequently referenced directly by zipline financial
# components. To set the environment, you can set the property on
# the module directly:
# from zipline.finance import trading
# trading.environment = TradingEnvironment()
#
# or if you want to switch the environment for a limited context
# you can use a TradingEnvironment in a with clause:
# lse = TradingEnvironment(bm_index="^FTSE", exchange_tz="Europe/London")
# with lse:
# the code here will have lse as the global trading.environment
# algo.run(start, end)
However, using the context doesn't seem to fully work. I still get errors, for example stating that my timestamps are before the market open (and indeed, looking at trading.environment.open_and_close, the times are for the US market).
My question is, has anybody managed to use zipline with non-US, intra-day data? Could you point me to a resource and ideally example code on how to do this?
n.b. I've seen there are some tests on GitHub that seem related to the trading calendars (tradingcalendar_lse.py, tradingcalendar_tse.py, etc.), but these appear to only handle data at the daily level. I would need to fix:
open/close times
reference data for the benchmark
and probably more ...
I've got this working after fiddling around with the tutorial notebook. Code sample below. It's using the DF mid, as described in the original question. A few points bear mentioning:
Trading Calendar I create one manually and assign it to trading.environment, using non_working_days from tradingcalendar_lse.py. Alternatively you could create one that fits your data exactly (however, this could be a problem for out-of-sample data). There are two fields that you need to define: trading_days and open_and_closes.
sim_params There is a problem with the default start/end values because they aren't timezone aware. So you must create a sim_params object and pass start/end parameters with a timezone.
Also, run() must be called with the argument overwrite_sim_params=False as calculate_first_open/close raise timestamp errors.
I should mention that it's also possible to pass pandas Panel data, with the fields open, high, low, close, price and volume in the minor_axis. In that case those fields are mandatory, otherwise errors are raised.
Note that this code only produces a daily summary of the performance. I'm sure there must be a way to get the result at a minute resolution (I thought this was set by emission_rate, but apparently it's not). If anybody knows please comment and I'll update the code.
Also, I'm not sure what the API call is to invoke 'analyze' (when using the %%zipline magic in IPython, as in the tutorial, the analyze() method gets called automatically; how do I do this manually?).
import pytz
import pandas as pd
from datetime import datetime
from zipline.algorithm import TradingAlgorithm
from zipline.utils import tradingcalendar
from zipline.utils import tradingcalendar_lse
from zipline.finance.trading import TradingEnvironment
from zipline.api import order_target, record, symbol, history, add_history
from zipline.finance import trading

def initialize(context):
    # Register 2 histories that track minute prices,
    # one with a 10-bar window and one with a 30-bar window
    add_history(10, '1m', 'price')
    add_history(30, '1m', 'price')
    context.i = 0

def handle_data(context, data):
    # Skip first 30 mins to get full windows
    context.i += 1
    if context.i < 30:
        return
    # Compute averages
    # history() has to be called with the same params
    # from above and returns a pandas dataframe.
    short_mavg = history(10, '1m', 'price').mean()
    long_mavg = history(30, '1m', 'price').mean()
    sym = symbol('BARC')
    # Trading logic
    if short_mavg[sym] > long_mavg[sym]:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(sym, 100)
    elif short_mavg[sym] < long_mavg[sym]:
        order_target(sym, 0)
    # Save values for later inspection
    record(BARC=data[sym].price,
           short_mavg=short_mavg[sym],
           long_mavg=long_mavg[sym])

def analyze(context, perf):
    perf["pnl"].plot(title="Strategy P&L")

# Create algorithm object passing in initialize and
# handle_data functions

# This is needed to handle the correct calendar. Assume that market data has the right index for tradeable days.
# Passing in env_trading_calendar=tradingcalendar_lse doesn't appear to work, as it doesn't implement open_and_closes
trading.environment = TradingEnvironment(bm_symbol='^FTSE', exchange_tz='Europe/London')
#trading.environment.trading_days = mid.index.normalize().unique()
trading.environment.trading_days = pd.date_range(
    start=mid.index.normalize()[0],
    end=mid.index.normalize()[-1],
    freq=pd.tseries.offsets.CDay(holidays=tradingcalendar_lse.non_trading_days))
trading.environment.open_and_closes = pd.DataFrame(
    index=trading.environment.trading_days, columns=["market_open", "market_close"])
trading.environment.open_and_closes.market_open = (
    trading.environment.open_and_closes.index + pd.to_timedelta(60*7, unit="T")).to_pydatetime()
trading.environment.open_and_closes.market_close = (
    trading.environment.open_and_closes.index + pd.to_timedelta(60*15+30, unit="T")).to_pydatetime()

from zipline.utils.factory import create_simulation_parameters
sim_params = create_simulation_parameters(
    # Bug in code doesn't set tz if these are not specified
    # (finance/trading.py:SimulationParameters.calculate_first_open[close])
    start=pd.to_datetime("2014-07-01 08:30:00").tz_localize("Europe/London").tz_convert("UTC"),
    end=pd.to_datetime("2014-07-24 16:30:00").tz_localize("Europe/London").tz_convert("UTC"),
    data_frequency="minute",
    emission_rate="minute",
    sids=["BARC"])

algo_obj = TradingAlgorithm(initialize=initialize,
                            handle_data=handle_data,
                            sim_params=sim_params)

# Run algorithm
perf_manual = algo_obj.run(mid, overwrite_sim_params=False)  # overwrite == True calls calculate_first_open[close] (see above)
#Luciano
You can add analyze(None, perf_manual) at the end of your code to run the analyze process automatically.
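Put together with the sample above, that suggestion amounts to something like the following sketch; it reuses the analyze() and perf_manual already defined there, and since that analyze() ignores its context argument, passing None is fine:

# Run the backtest, then invoke the analyze() function defined earlier by hand
perf_manual = algo_obj.run(mid, overwrite_sim_params=False)
analyze(None, perf_manual)  # plots the strategy P&L from the returned performance DataFrame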
