How to determine which base to use when building a nuro_image? - bazel

As stated in the title, when I add a new push_image, how do I determine how to specify the base in the BUILD file for the nuro_image target? I've seen several different ones in the code base, such as
base = "//learning/utils:tf_base_image",
and
base = "@nuro_image//image",
what are the differences?

This depends on what dependencies the binary/image needs. If a binary needs Tensorflow as a dependency, then you want to use tf_base_image.
The idea is similar to lowest common ancestor.
Imagine we are building binary A into an image. Say A has two dependencies B and C, and both B and C depends on Tensorflow.
If A uses tf_base_image, then when we build B and C, the Tensorflow library they depend on is already included in the base and can be reused. But if A uses the generic base image, then both B and C need to fetch Tensorflow, so we end up including the same library twice. As a result, the build becomes slower and we get a larger image.
For better understanding, the definition of tf_base_image is here:
https://gitent.corp.nuro.team/Nuro-ai/Nuro/blob/develop/learning/utils/BUILD#L84-L98
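As an illustrative sketch (the target names below are made up, and the exact attributes of nuro_image may differ in the repo), the decision plays out in a BUILD file roughly like this:

```
# Hypothetical BUILD snippet -- target names are illustrative.

# Binary with Tensorflow somewhere in its dependency tree:
# start from the TF base image so the shared TF libraries
# come from the base layer and are not duplicated.
nuro_image(
    name = "model_server_image",
    binary = ":model_server",
    base = "//learning/utils:tf_base_image",
)

# Binary with no TF dependency: the generic base is enough.
nuro_image(
    name = "tool_image",
    binary = ":tool",
    base = "@nuro_image//image",
)
```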

Related

Doors, creating link from one baseline to another baseline

I am using Doors 9.6.1 and DXL scripting.
Consider Module A and Module B, I would like to link objects in a baseline of Module A to a baseline of Module B.
I tried the following, consider this Scenario 1:
Baseline 1.0 of Module B is generated, and links are created from the actual (current) Module A to Baseline 1.0 of Module B: the outgoing links from the actual Module A point to Baseline 1.0 of Module B, and the links from Baseline 1.0 of Module B point back to the actual Module A.
I then generate Baseline 1.0 of Module A, and the links remain the same. When I delete the objects in the actual module, the links from Baseline 1.0 of Module B break, because they point to objects in the actual Module A. I was hoping that, when generating Baseline 1.0 of Module A, the incoming links would be updated to point to the objects in Baseline 1.0 of Module A rather than to the objects in the actual Module A.
I also tried the following, consider this Scenario 2:
I generate Baseline 1.0 of Module A and Baseline 1.0 of Module B attempting to create the links afterwards, but it is not possible in Doors to create links between baselines.
Hopefully I managed to explain it properly, if something is not clear please let me know.
My question comes down to this, is it possible in Doors to generate a bidirectional link between Baseline 1.0 of Module A and Baseline 1.0 of Module B? And in that case, how is it possible to achieve this?
First of all, you should not work with bidirectional links in Doors. A typical scenario is that one module contains high-level requirements (let's say "SYS" for system requirements); this module is completed and baselined. Later, a module with low-level requirements (let's say "HR" for hardware requirements) is created, and each requirement in HR corresponds to a requirement in SYS; the relation might be called "refines". This module will probably be baselined much later.
If you have two modules A and B and both are developed at the same time and both link to one another, you might get problems later, e.g. when you want to create a valid requirements report, a dependency or coverage report.
Having said that and if you still want to go on with your approach, I think your solution is "Baseline Sets".
Project --> Properties --> Baseline Set Definitions
1. Create a new Baseline Set Definition (BSD) with a name that describes why the modules belong together.
2. In the new BSD there is an Edit button where you add the modules that belong together (A and B in your example).
3. Now, for the BSD, create a Baseline Set (BS). A BS consists of several baselines: 0 or 1 baselines for each module contained in the BSD. For the BS, define whether the contained baselines shall get a new major or minor number, and define a suffix. The BS will have a number and a suffix; the baselines will also get the suffix and the next free major/minor number for their respective modules.
4. For the new BS, navigate to "Baselines". You will see all modules of the BSD, and for which module a baseline has already been created for this specific BS.
5. With "Add to Set", a new baseline is created for the module(s) you choose.
6. When baselines have been created for all modules defined in the BSD (or when you choose not to include a module for a specific BS and press BS --> Close), you will see that the links in the baselines of the BS point to one another.

transfer learning practice for distinguishing cat and dog

I'm trying to practice transfer learning myself.
I'm trying to count the cat and dog files (12,500 pictures each of cats and dogs, 25,000 pictures in total).
Here is my code.
And here is my path for the picture folder.
I thought this was simple code, but I still can't figure out why I keep getting (0, 0) when it should be (12500 cat files, 12500 dog files).
Use os.path.join() inside glob.glob(). Also, if all your images have a particular extension (say, jpg), you could replace '*.*' with '*.jpg', for example.
Solution
import os, glob
files = glob.glob(os.path.join(path, 'train', '*.*'))
As a matter of fact, you might as well just do the following using the os library alone, since you are not selecting any particular file extension type.
import os
files = os.listdir(os.path.join(path,'train'))
Some Explanation
The method os.path.join() joins multiple path components together to create a path, and it works whether you are on a Windows, Mac, or Linux system. On Windows the path separator is \, while on Mac/Linux it is /, so not using os.path.join() can produce a path the OS cannot resolve. I use glob.glob when I am interested in getting files of some specific type (extension), but glob.glob(path) requires a valid path to work with. In my solution, os.path.join() creates that path from the path components and feeds it into glob.glob().
For more clarity, I suggest you see documentation for os.path.join and glob.glob.
Also, see pathlib module for path manipulation as an alternative to os.path.join().
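Putting the pieces together, here is a minimal self-contained sketch. The train/ layout and the cat.*/dog.* filename convention are assumptions based on the usual Kaggle cats-vs-dogs dataset; adjust the patterns to your data.

```python
import glob
import os
import tempfile

def count_cats_and_dogs(path):
    """Count files under path/train whose names start with cat/dog."""
    cats = glob.glob(os.path.join(path, "train", "cat*"))
    dogs = glob.glob(os.path.join(path, "train", "dog*"))
    return len(cats), len(dogs)

# Quick self-check against a throwaway directory that mimics the layout:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "train"))
for i in range(3):
    open(os.path.join(root, "train", "cat.%d.jpg" % i), "w").close()
for i in range(2):
    open(os.path.join(root, "train", "dog.%d.jpg" % i), "w").close()
print(count_cats_and_dogs(root))  # (3, 2)
```

With your real folder, counting 12,500 files of each class, the same call should return (12500, 12500).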

How to restore weights with different names but same shapes Tensorflow?

I have multiple architectures in Tensorflow. Some of them share the design of certain parts.
I would like to train one of the networks and use the trained weights of the similar layers in another network.
At this point, I am able to save the weights I want and reload them in an architecture that uses an identical naming convention for the variables.
However, when the weights have different names in the two networks, it is not possible to restore. I have this naming convention for the first network:
selector_network/c2w/var1
in the second network I have this:
joint_network/c2w/var1
Apart from that, the variables are similar in terms of shape. Is there a possibility to change the names upon reloading or to tell Tensorflow where to fit those variables?
EDIT: I found this script from @batzner that allows renaming the variables of a Tensorflow checkpoint: tensorflow_rename_variables.
It is not working. I get the following error:
ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory ./joint_pos_tagger_lemmatizer/fi/
tf.train.Saver has built-in support for this via a dictionary for the var_list argument. This dictionary maps the names of the objects in the checkpoint file to the variables you want to restore.
If you want to restore your "joint network" with a checkpoint of your "selector network", you can do it like this:
# var1 is the variable you want to restore
saver = tf.train.Saver(var_list={'selector_network/c2w/var1': var1})
saver.restore(...)
If you want to restore more variables, you simply have to extend the dictionary.
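If many variables differ only by their scope prefix, the var_list dictionary can be built mechanically instead of by hand. Here is a sketch of just the mapping step; build_var_list and the prefixes are illustrative, and in a real script you would pass tf.global_variables() and hand the result to tf.train.Saver:

```python
def build_var_list(variables, ckpt_prefix="selector_network",
                   graph_prefix="joint_network"):
    """Map checkpoint names to graph variables that differ only by prefix.

    `variables` stands in for tf.global_variables(); plain strings are
    accepted here so the mapping logic can be shown without TensorFlow.
    """
    var_list = {}
    for var in variables:
        name = var if isinstance(var, str) else var.op.name
        if name.startswith(graph_prefix + "/"):
            var_list[ckpt_prefix + name[len(graph_prefix):]] = var
    return var_list

print(build_var_list(["joint_network/c2w/var1"]))
# {'selector_network/c2w/var1': 'joint_network/c2w/var1'}
```

The resulting dictionary would then be passed as tf.train.Saver(var_list=build_var_list(tf.global_variables())) before calling restore.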
Still, there is a way even if you did not save the checkpoint with matching names: write a mapping function to choose the right tensor from the first network, and use tensor.assign() to copy its value into the second network's variable.

When are placeholders necessary?

Every TensorFlow example I've seen uses placeholders to feed data into the graph. But my applications work fine without placeholders. According to the documentation, using placeholders is the "best practice", but they seem to make the code unnecessarily complex.
Are there any occasions when placeholders are absolutely necessary?
According to the documentation, using placeholders is the "best practice"
Hold on, this quote is out of context and could be misinterpreted. Placeholders are the best practice when feeding data through feed_dict.
Using a placeholder makes the intent clear: this is an input node that needs feeding. Tensorflow even provides a placeholder_with_default that does not need feeding, but even there the intent of the node is clear. For all practical purposes, a placeholder_with_default behaves like a constant; you can indeed feed a constant to change its value, but then is the intent clear? Would that not be confusing? I doubt it.
There are other ways to input data than feeding and AFAICS all have their uses.
A placeholder is a promise to provide a value later.
A simple example is to define two placeholders a and b, and then an operation on them, like below.
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a + b # + provides a shortcut for tf.add(a, b)
a and b are not initialized and contain no data, because they were defined as placeholders.
Another approach to do the same is to define variables with tf.Variable; in this case you have to provide an initial value when you declare the variable, and then initialize it by running:
tf.global_variables_initializer()
or (now deprecated):
tf.initialize_all_variables()
This solution has two drawbacks:
Performance-wise, you need the extra step of calling the initializer (however, unlike placeholders, these variables are updatable).
In some cases you do not know the initial values for these variables, so you have to define them as placeholders.
Conclusion:
Use tf.Variable for trainable variables such as weights (W) and biases (B) for your model, or in general when initial values are required.
tf.placeholder allows you to create operations and build the computation graph without needing the data. In TensorFlow terminology, we then feed data into the graph through these placeholders.
I really like Ahmed's answer and I upvoted it, but I would like to provide an alternative explanation that might or might not make things a bit clearer.
One of the significant features of Tensorflow is that its operation graphs are compiled and then executed outside of the original environment used to build them. This allows Tensorflow to do all sorts of tricks and optimizations: distributed, platform-independent calculations, graph interoperability, GPU computations, etc. But all of this comes at the price of complexity. Since your graph is being executed inside its own VM of some sort, you have to have a special way of feeding data into it from the outside, for example from your python program.
This is where placeholders come in. One way of feeding data into your model is to supply it via a feed dictionary when you execute a graph op. And to indicate where inside the graph this data is supposed to go you use placeholders. This way, as Ahmed said, placeholder is a sort of a promise for data supplied in the future. It is literally a placeholder for things you will supply later. To use an example similar to Ahmed's
import numpy as np
import tensorflow as tf

# define a graph to do matrix multiplication
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
# this is the actual operation we want to do,
# but since we want to supply x and y at runtime
# we will use placeholders
model = tf.matmul(x, y)

# now let's supply the data and run the graph
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    # generate some data for our graph
    data_x = np.random.randint(0, 10, size=[5, 5])
    data_y = np.random.randint(0, 10, size=[5, 5])
    # do the work
    result = session.run(model, feed_dict={x: data_x, y: data_y})
There are other ways of supplying data to the graph, but arguably placeholders with a feed_dict are the most comprehensible way, and they provide the most flexibility.
If you want to avoid placeholders, the other ways of supplying data are either loading the whole dataset into constants at graph-build time, or moving the whole process of loading and pre-processing the data into the graph by using input pipelines. You can read up on all of this in the TF documentation.
https://www.tensorflow.org/programmers_guide/reading_data

F# - Organisation of algorithms in a file

I cannot find a good way to organize the various algorithms. Today the file looks like this:
1/ Extraction of values from Excel
2/ First algorithm based on these values (extracted from Excel) starting with
"let matriceAlgo1 ="
3/ Second algorithm starting from the same values
"let matriceAlgo2 ="
4/ Synthesis algorithm, doing a weighted average (depending on several values) of the 2/ and 3/ and selecting the result to be shown.
"let matriceSynthesis ="
My question is the following: what should I put before the different parts of this file so that I can call them just by their name? I have seen answers explaining that modules could be the answer, but I don't know how to apply that in my case (or anything else, if that is not the right answer). In the end, I would like to be able to write something like this:
"launch Extraction
launch First Algorithm
launch Second Algorithm
launch Synthesis"
The way I usually organize files is to have some clear visual separator between different sections of a file (see for example Crawler.fsx on GitHub) and then have one "main" section at the end that calls functions declared previously.
I don't really use modules unless I have a large number of functions with clashing names. It would be a good idea to use modules if your algorithms consist of more functions (e.g. Alg1.initialize, Alg1.run, etc.). Then you could easily switch between different algorithms using a module alias:
module Alg = Alg1 // or Alg2
let a = Alg.initialize
Alg.run a
If the file is getting longer, then you could also move sections to separate files and use #load "File.fs" to load algorithms or functions from a file. In that case, you probably need to use modules, but you can always open the module after loading the file.
