access logged values during runtime - machine-learning

How can I retrieve a logged value from wandb before the run has finished?
import os
import wandb
wandb.init(project='someproject')
def loss_a():
    # do_stuff and log:
    wandb.log({"loss_a": 1.0})
def loss_b():
    # do_stuff and log:
    wandb.log({"loss_b": 2.0})
for epoch in range(2):
    loss_a()
    loss_b()
    # somehow retrieve loss_a and loss_b and print them here:
    print(f'loss_a={??}, loss_b={??}')
After the run has finished, I can use wandb.Api to get run.history. But before the run has finished, accessing run.history doesn't seem to work.

You can retrieve a logged value from wandb before the run has finished by using wandb.run.summary. It's a dictionary that holds the last value logged for each key name.
import os
import wandb
wandb.init(project='someproject')
def loss_a():
    # do_stuff and log:
    wandb.log({"loss_a": 1.0})
def loss_b():
    # do_stuff and log:
    wandb.log({"loss_b": 2.0})
for epoch in range(2):
    loss_a()
    loss_b()
    # read the last logged values back from the run summary:
    print(f'loss_a={wandb.run.summary["loss_a"]}, loss_b={wandb.run.summary["loss_b"]}')

Looking into the source code, run.history only holds the data momentarily in history._data. The dictionary containing the logs is cleared just after the callback is called (I am guessing this corresponds to sending the data to the server via HTTP).
As a workaround, you could return the losses from loss_a and loss_b and log them at the same time. This actually has the benefit of logging both losses on the same timestep:
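def loss_a():
    # do_stuff and return the loss instead of logging it:
    return 1.0
def loss_b():
    # do_stuff and return the loss instead of logging it:
    return 2.0
for epoch in range(2):
    a = loss_a()
    b = loss_b()
    print(f'loss_a={a}, loss_b={b}')
    # a single log call puts both losses on the same step
    wandb.log({"loss_a": a, "loss_b": b})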
This is also a good way to decouple your code from wandb!
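The decoupling idea can be taken one step further. Below is a minimal sketch, assuming a hypothetical MetricTracker class (not part of wandb) that remembers the most recent value of every metric locally and forwards each log call to whatever backend callable it is given. In real code the backend would presumably be wandb.log; here a plain list stands in for it so the example runs on its own.

```python
class MetricTracker:
    """Hypothetical helper: keep the latest value per metric and forward logs."""

    def __init__(self, backend=None):
        self.latest = {}        # most recent value seen for each key
        self.backend = backend  # any callable taking a dict, e.g. wandb.log

    def log(self, metrics):
        self.latest.update(metrics)
        if self.backend is not None:
            self.backend(metrics)


sent = []  # stand-in backend that just records what would be sent
tracker = MetricTracker(backend=sent.append)

for epoch in range(2):
    tracker.log({"loss_a": 1.0})
    tracker.log({"loss_b": 2.0})
    # values are available immediately, without asking the logging service
    print(f"loss_a={tracker.latest['loss_a']}, loss_b={tracker.latest['loss_b']}")
```

Because the values are read from the local dictionary, this works regardless of whether the run has finished or whether the logging service is reachable.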

Related

Biopython : how to extract only relevant atom and save a pdb file (not locally)?

I am using Biopython and have a list of atom names, rep_atoms = [CA, CB, CD3] (carbon atoms).
I want to keep only these atoms from any given PDB file. I don't want to save the result locally; I want to keep it in memory, because there are lots of iterations.
I have arrived at the code below, but it saves the file locally and is very slow.
So my goal is: for each atom in the PDB, check whether it is present in rep_atoms, and build a new_pdb that stores only those atoms, so that when I use it later in my code it behaves like a PDB file without ever being saved to a local folder on my computer.
How do I append each atom? Printing all atoms is very fast. I could append them to a list, but then it wouldn't be a PDB structure file. What should I do?
from Bio.PDB import .... PDBIO, Select ....
class rep_atom_Select(Select):
    def accept_atom(self, atom):
        if atom.get_name() in rep_atoms:
            return 1
        else:
            return 0
def rep_atoms_pdb(input_pdb):
    io = PDBIO()
    io.set_structure(input_pdb)
    for model in input_pdb:
        for chain in model:
            for residue in chain:
                for atom in residue:
                    if atom.get_name() in rep_atoms:
                        print(atom)
                        # dnr_only = io.save("dnr_only.pdb", rep_atom_Select())
Save after the loop, once, instead of thousands of times inside the loop.
def rep_atoms_pdb(input_pdb):
    my_atoms = list()
    for model in input_pdb:
        for chain in model:
            for residue in chain:
                for atom in residue:
                    if atom.get_name() in rep_atoms:  # or if rep_atom_Select().accept_atom(atom):
                        my_atoms.append(atom)  # or something like this
    # The function returns the list of extracted atoms
    return my_atoms
Your definition of rep_atom_Select() does not seem to be directly compatible with this design, nor am I sure receiving the atoms as a list is actually what you want, but this should at least give you a nudge in the right direction.
A brief reading of the Bio.PDB.PDBIO documentation suggests that you might simply want to return the actual PDBIO object. I think something like this:
class rep_atom_Select(Select):
    def accept_atom(self, atom):
        if atom.get_name() in rep_atoms:
            return 1
        else:
            return 0
def rep_atoms_pdb(input_pdb):
    # set_structure() belongs to PDBIO, not to the Select subclass
    io = PDBIO()
    io.set_structure(input_pdb)
    # pass rep_atom_Select() to io.save(...) when the structure is written out
    return io
This is based on a very cursory reading of the documentation, but at least demonstrates how you would use your overridden class to select only some of the atoms in the input_pdb structure.
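To illustrate the "keep it in memory" part, here is a minimal sketch that does not use Biopython at all: it filters the raw PDB text by atom name and writes the result into an io.StringIO buffer, which behaves like an open file but never touches the disk. It relies only on the fixed-column PDB layout, where columns 13-16 of an ATOM/HETATM record hold the atom name. The function name filter_pdb_text and the sample records (with made-up coordinates) are assumptions for illustration. As far as I can tell from the documentation, PDBIO.save also accepts a file handle in place of a file name, so passing a StringIO there together with rep_atom_Select() should give the same effect with Biopython, but I have not verified that here.

```python
import io

REP_ATOMS = {"CA", "CB", "CD3"}  # atom names from the question


def filter_pdb_text(pdb_text, keep_names=REP_ATOMS):
    """Return an in-memory text buffer with only the selected atom records."""
    buffer = io.StringIO()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atom_name = line[12:16].strip()  # columns 13-16 hold the atom name
            if atom_name in keep_names:
                buffer.write(line + "\n")
    buffer.write("END\n")
    buffer.seek(0)  # rewind so the buffer can be read like a freshly opened file
    return buffer


# Illustrative records only; the coordinates are made up.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "ATOM      3  CB  ALA A   1      12.200   7.400  -4.700  1.00  0.00           C",
    "ATOM      4  O   ALA A   1      13.100   5.300  -4.900  1.00  0.00           O",
])

filtered = filter_pdb_text(SAMPLE_PDB)
print(filtered.getvalue())
```

The returned buffer can be handed to anything that expects a file handle, so each iteration can build and consume its own filtered structure without creating files.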

tensorflow federated takes long time to start training

I'm facing a slightly annoying problem. The tensorflow-federated training (initialize and next) takes a long time to start (I'm not talking about the time to finish; it is the start-up that takes a while).
I suspect that this is due to either 1) using with eager_mode():,
or 2) using shuffling, as below:
with eager_mode():
    def preprocess(new_dataset):
        def map_fn(elem):
            return collections.OrderedDict([('x', tf.reshape(elem['In'], [-1])), ('y', tf.reshape(elem['Out'], [1]))])
        DS2 = new_dataset.map(map_fn)
        if Use_shuffle:
            return DS2.repeat(SNN_epoch).shuffle(shuffle_buffer).batch(SNN_batch_size)
        else:
            return DS2.repeat(SNN_epoch).batch(SNN_batch_size)
...
...
...
This is what I do:
trainer_Itr_Process = tff.learning.build_federated_averaging_process(
    model_fn_Federated,
    server_optimizer_fn=(lambda: tf.keras.optimizers.SGD(learning_rate=learn_rate)),
    client_weight_fn=None)
FLstate = trainer_Itr_Process.initialize()
# Track loss of different ...... of federated iteration
for round_num in range(Fed_iter_min, Fed_iter_max):
    FLstate, FLoutputs = trainer_Itr_Process.next(FLstate, federated_train_data)
    ......
    ......
    ......
This is the warning I'm getting:
W0616 11:30:00.217065 139843447875392 deprecation_wrapper.py:118] From /usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/api/_v1/estimator/__init__.py:10: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.
W0616 11:30:02.400945 139843447875392 deprecation_wrapper.py:118] From /usr/local/lib/python3.6/dist-packages/tensorflow_federated/python/core/impl/tensorflow_serialization.py:296: The name tf.initializers.variables is deprecated. Please use tf.compat.v1.initializers.variables instead.

zipline backtesting using non-US (European) intraday data

I'm trying to get zipline working with non-US, intraday data, that I've loaded into a pandas DataFrame:
                        BARC    HSBA     LLOY     STAN
Date
2014-07-01 08:30:00  321.250  894.55  112.105  1777.25
2014-07-01 08:32:00  321.150  894.70  112.095  1777.00
2014-07-01 08:34:00  321.075  894.80  112.140  1776.50
2014-07-01 08:36:00  321.725  894.80  112.255  1777.00
2014-07-01 08:38:00  321.675  894.70  112.290  1777.00
I've followed the moving-averages tutorial, replacing "AAPL" with my own symbol code, and making the history calls with "1m" data instead of "1d".
Then I do the final call using algo_obj.run(DataFrameSource(mydf)), where mydf is the dataframe above.
However there are all sorts of problems arising related to TradingEnvironment. According to the source code:
# This module maintains a global variable, environment, which is
# subsequently referenced directly by zipline financial
# components. To set the environment, you can set the property on
# the module directly:
#       from zipline.finance import trading
#       trading.environment = TradingEnvironment()
#
# or if you want to switch the environment for a limited context
# you can use a TradingEnvironment in a with clause:
#       lse = TradingEnvironment(bm_index="^FTSE", exchange_tz="Europe/London")
#       with lse:
#           # the code here will have lse as the global trading.environment
#           algo.run(start, end)
However, using the context doesn't seem to fully work. I still get errors, for example stating that my timestamps are before the market open (and indeed, looking at trading.environment.open_and_close, the times are for the US market).
My question is: has anybody managed to use zipline with non-US, intraday data? Could you point me to a resource, and ideally example code, on how to do this?
n.b. I've seen there are some tests on GitHub that seem related to the trading calendars (tradingcalendar_lse.py, tradingcalendar_tse.py, etc.), but these appear to only handle data at the daily level. I would need to fix:
- open/close times
- reference data for the benchmark
- and probably more ...
I've got this working after fiddling around with the tutorial notebook. A code sample is below. It uses the DataFrame mid, as described in the original question. A few points bear mentioning:
- Trading calendar: I create one manually and assign it to trading.environment, using non_trading_days from tradingcalendar_lse.py. Alternatively, you could create one that fits your data exactly (however, that could be a problem for out-of-sample data). There are two fields that you need to define: trading_days and open_and_closes.
- sim_params: There is a problem with the default start/end values because they aren't timezone aware. So you must create a sim_params object and pass start/end parameters with a timezone.
- run() must also be called with the argument overwrite_sim_params=False, because calculate_first_open/close raise timestamp errors.
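The calendar and timezone points above can be sketched with the standard library alone. This is not zipline API: trading_sessions is a hypothetical helper, and the 08:00 to 16:30 London session is an assumption read off the offsets used in the code sample below (07:00 and 15:30 UTC during British Summer Time). It builds one timezone-aware (open, close) pair in UTC per weekday, skipping any supplied holidays.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

LONDON = ZoneInfo("Europe/London")


def trading_sessions(start, end, holidays=(),
                     open_local=time(8, 0), close_local=time(16, 30)):
    """Return (open_utc, close_utc) pairs for each trading day in [start, end]."""
    sessions = []
    day = start
    while day <= end:
        # Monday is 0 and Friday is 4, so weekday() < 5 skips weekends
        if day.weekday() < 5 and day not in holidays:
            open_dt = datetime.combine(day, open_local, tzinfo=LONDON)
            close_dt = datetime.combine(day, close_local, tzinfo=LONDON)
            sessions.append((open_dt.astimezone(timezone.utc),
                             close_dt.astimezone(timezone.utc)))
        day += timedelta(days=1)
    return sessions


sessions = trading_sessions(date(2014, 7, 1), date(2014, 7, 7))
for market_open, market_close in sessions:
    print(market_open.isoformat(), market_close.isoformat())
```

Converting from local time rather than adding a fixed number of minutes means the UTC open and close shift by an hour automatically when daylight saving time ends, which matters for a calendar that spans the whole year.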
I should mention that it's also possible to pass pandas Panel data, with the fields open, high, low, close, price and volume in the minor_axis. But in this case all of these fields are mandatory, otherwise errors are raised.
Note that this code only produces a daily summary of the performance. I'm sure there must be a way to get the result at minute resolution (I thought this was set by emission_rate, but apparently it's not). If anybody knows, please comment and I'll update the code.
Also, I'm not sure what the API call is to run 'analyze' (when using the %%zipline magic in IPython, as in the tutorial, the analyze() method gets called automatically; how do I do this manually?).
import pytz
from datetime import datetime
import pandas as pd
from zipline.algorithm import TradingAlgorithm
from zipline.utils import tradingcalendar
from zipline.utils import tradingcalendar_lse
from zipline.finance.trading import TradingEnvironment
from zipline.api import order_target, record, symbol, history, add_history
from zipline.finance import trading
def initialize(context):
    # Register 2 histories that track minute prices,
    # one with a 10-minute window and one with a 30-minute window
    add_history(10, '1m', 'price')
    add_history(30, '1m', 'price')
    context.i = 0
def handle_data(context, data):
    # Skip first 30 mins to get full windows
    context.i += 1
    if context.i < 30:
        return
    # Compute averages
    # history() has to be called with the same params
    # from above and returns a pandas dataframe.
    short_mavg = history(10, '1m', 'price').mean()
    long_mavg = history(30, '1m', 'price').mean()
    sym = symbol('BARC')
    # Trading logic
    if short_mavg[sym] > long_mavg[sym]:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(sym, 100)
    elif short_mavg[sym] < long_mavg[sym]:
        order_target(sym, 0)
    # Save values for later inspection
    record(BARC=data[sym].price,
           short_mavg=short_mavg[sym],
           long_mavg=long_mavg[sym])
def analyze(context, perf):
    perf["pnl"].plot(title="Strategy P&L")
# Create algorithm object passing in initialize and
# handle_data functions
# This is needed to handle the correct calendar. Assume that market data has the right index for tradeable days.
# Passing in env_trading_calendar=tradingcalendar_lse doesn't appear to work, as it doesn't implement open_and_closes
trading.environment = TradingEnvironment(bm_symbol='^FTSE', exchange_tz='Europe/London')
#trading.environment.trading_days = mid.index.normalize().unique()
trading.environment.trading_days = pd.date_range(start=mid.index.normalize()[0],
                                                 end=mid.index.normalize()[-1],
                                                 freq=pd.tseries.offsets.CDay(holidays=tradingcalendar_lse.non_trading_days))
trading.environment.open_and_closes = pd.DataFrame(index=trading.environment.trading_days, columns=["market_open", "market_close"])
# Offsets are in minutes from midnight: 07:00 open and 15:30 close
trading.environment.open_and_closes.market_open = (trading.environment.open_and_closes.index + pd.to_timedelta(60*7, unit="T")).to_pydatetime()
trading.environment.open_and_closes.market_close = (trading.environment.open_and_closes.index + pd.to_timedelta(60*15+30, unit="T")).to_pydatetime()
from zipline.utils.factory import create_simulation_parameters
sim_params = create_simulation_parameters(
    start=pd.to_datetime("2014-07-01 08:30:00").tz_localize("Europe/London").tz_convert("UTC"),  # Bug in code doesn't set tz if these are not specified (finance/trading.py:SimulationParameters.calculate_first_open[close])
    end=pd.to_datetime("2014-07-24 16:30:00").tz_localize("Europe/London").tz_convert("UTC"),
    data_frequency="minute",
    emission_rate="minute",
    sids=["BARC"])
algo_obj = TradingAlgorithm(initialize=initialize,
                            handle_data=handle_data,
                            sim_params=sim_params)
# Run algorithm
perf_manual = algo_obj.run(mid, overwrite_sim_params=False)  # overwrite == True calls calculate_first_open[close] (see above)
@Luciano
You can add analyze(None, perf_manual) at the end of your code to run the analyze step automatically.

Rails 4 ActiveModel won't update_columns when tested with RSpec

In a normal manual test in the browser, everything works as expected. However, when I use RSpec, I can see this in the log:
D, [2014-08-16T13:48:09.510013 #19418] DEBUG -- : SQL (0.6ms) UPDATE "system_flights_cacheds" SET "client_stuff" = '{"captcha":"656556"}' WHERE "system_flights_cacheds"."guid" = '5647046e-4194-498e-a0d7-512614b147d8'
But, unbelievably, my database record is not actually updated. Previously I used .save, but without success; in fact it creates a SAVEPOINT.
The code in question is basically an API endpoint:
cache = System::Flights::Cached.search_cache options
# update database, when the captcha is present. this way, the worker
# when updating the database can see the changes and act accordingly!
if cache && params[:captcha]
  # remember, anyone can (basically) see the captcha. thus,
  # this is a bit paranoid, only allow captcha update
  # if the user is same! in the json, if not forgotten,
  # captcha is only displayed when the user_id is equal
  server_stuff = cache.server_stuff.with_indifferent_access
  if server_stuff[:user_id] == current_user.id
    cache.time_renewed = 10
    cache.client_stuff_will_change!
    cache.client_stuff ||= {}
    cache.client_stuff[:captcha] = params[:captcha]
    # cache.save!
    cache.update_columns(client_stuff: cache.client_stuff)
  end
else
  # only spawn worker if there is no captcha parameter passed
  spawn_search_worker({user_id: current_user.id, options: options})
end
The client can call this at any time and it will spawn a worker. When a new record is already in the database but is_processed is false, the worker will quit. Thus, calling this multiple times is fine, and it also serves as a way to check whether the work is done or not.
The worker waits for the client to enter a captcha. So we have a class like WaitableLogin that basically does:
max_repeat = 3 # 14
# annul flag, if set to true, the data will not get persisted.
annul = false
while max_repeat > 0
  # interval of 5 secs that worker can check the database
  sleep 5
  max_repeat -= 1
  # break if captcha already entered by client
  # seek from the database if the client has posted
  # the captcha text
  cache = System::Flights::Cached.search_cache options
  client_stuff = nil
  client_stuff = cache.client_stuff.with_indifferent_access if cache && cache.client_stuff
  if client_stuff && client_stuff[:captcha]
    captcha_text = client_stuff[:captcha]
    airline.fill_captcha(captcha_text).finalize_login
    puts "SOMEHOW I AM HERE: #{captcha_text}"
    # remove all server's stuff
    cache.server_stuff_will_change!
    cache.server_stuff.clear
    cache.save!
    annul = airline.in_login_page?
  end
end
So, WaitableLogin checks whether client_stuff has been updated. If it has, we know that the client has submitted the captcha (through the endpoint, which checks whether captcha is a param and updates the database if it is).
Control is then transferred back to the worker. You can see that a lot of code across several files uses cache; it is just a variable name and has nothing to do with caching in Rails or anything else.
When I run this normally in a browser, I don't see any problem. In fact, there is no SAVEPOINT even if I use .save. I thought the SAVEPOINT was causing a bug somewhere, so I decided to try .update_columns instead. But, again, without success.
This is what the test looks like:
before(:each) do
  System::Flights::Cached.delete_all
end
describe "requests" do
  it "should process 2a1c1i" do
    cached = nil
    post("/api/v1/x.json", {
      access_token: CommonFlightData::ACCESS_TOKEN,
      business_token: CommonFlightData::BUSINESS_TOKEN,
      captcha: ""
    }.merge!(CommonFlightData.oneway_1a(from: "8-9-2014")))
    puts "enter the captcha: "
    captcha = STDIN.gets.chomp
    puts "Entered: #{captcha}"
    post("/api/v1/x.json", {
      access_token: CommonFlightData::ACCESS_TOKEN,
      business_token: CommonFlightData::BUSINESS_TOKEN,
      captcha: captcha
    }.merge!(CommonFlightData.oneway_1a(from: "8-9-2014")))
    sleep 10
So what am I missing? I'm tired. No error is raised. When I check .inspect after update_columns, everything seems to be updated. But when I look at the database, nothing is updated.
EDIT: I added lock_version so that I have optimistic locking (enabled by default, I think). And it turned out, as expected, that it was set to 2.
EDIT 2: If I issue an update from a Rails console while the code is waiting for the captcha, it DOES update the data. So why won't the RSpec spec, which calls the API endpoint to submit a captcha, update the row? All the real-life code that doesn't originate from a spec executes fine.

Disable rails class_caching mechanism for Time.now?

I'm currently fighting with the Rails class_caching mechanism, as I need to return a file path that changes in discrete steps over time. It is used to switch the log file path every GRAIN seconds and returns a fully working timestamp:
GRAIN = 30
def self.file_path
  # integer division rounds the current time down to a multiple of GRAIN
  timestamp = (Time.now.to_i / GRAIN) * GRAIN
  return FILE_DIR + "tracking_#{timestamp}.csv"
end
This works really great if the class_caching of rails is set to false. But of course the app is to run with enabled class caching. And as soon as I enable it, either the timestamp variable is cached or the Time.now expression.
I tried to solve this with a proc block, but no success:
def self.file_path
  timestamp = Proc.new { (Time.now.to_i / GRAIN) * GRAIN }
  return FILE_DIR + "tracking_#{timestamp.call}.csv"
end
Is there anything like a cache disabled scope I could use or something like skip_class_caching :file_path? Or any other solutions?
Thank you for your help!
It's not entirely clear where your code is located, but ActiveRecord has an uncached method that suspends the cache for whatever is inside its block.
I found the problem. It was not Time.now that was being cached but a logger instance. It was assigned in another method that calls file_path.
As long as class caching was disabled, the environment forgot the class variable between requests. But as soon as it was enabled, the class variable kept its value across requests: the desired value at first, but it never changed afterwards.
So I had to add a simple condition that checks whether the file_path has changed since the last request. If so, the class variable is reassigned; otherwise it keeps the same desired value.
I changed from:
def self.tracker
  file_path = Tracking.file_path
  @@my_tracker ||= Logger.new(file_path)
end
to:
def self.tracker
  file_path = Tracking.file_path
  @@my_tracker = Logger.new(file_path) if @@my_tracker.nil? or Tracking.shift_log?(file_path)
  @@my_tracker
end
Thank you for your help anyways!
