Evaluating high-level algorithm fitness for an embedded platform [closed] - image-processing

What process would you use to evaluate whether a high-level algorithm (mainly computer vision algorithms, written in Matlab, Python, etc.) can run in real time on an embedded CPU?
The idea is to have a reliable assessment or calculation at an early stage, when you cannot yet implement or profile the algorithm on the target hardware.
To put things in focus, let's assume that your input is a grayscale QVGA frame, 8 bpp @ 30 fps, and you have to perform full Canny edge detection on each and every input frame. How can we find or estimate the minimum processing power needed to perform this successfully?

A generic assessment isn't really possible, and what you request is tedious manual work. There are, however, a few generic steps you can follow to arrive at a rough idea:
Estimate the run-time complexity of your algorithm in terms of basic math operations such as additions and multiplications (best, average, or worst case: your choice). Do you need floating-point support? Also track higher-level math operations such as saturating add/subtract (why? see point 3).
Devour the ISA of the target processor and focus especially on the math and branching instructions. How many cycles does a multiplication take? Or does your processor dispatch several per cycle?
See if your processor supports features like:
Saturating math. The ARM Cortex-M4 does; the PIC18 microcontroller does not, which incurs additional execution overhead.
Hardware floating-point operations.
Branch prediction.
SIMD. This will provide a significant speed boost if your algorithm can be tailored to it.
Since you explicitly asked for a CPU, see if yours has a GPU attached. Image processing algorithms generally benefit from the presence of one.
Map your operations (from step 1) to what the target processor supports (from step 3) to arrive at an estimate; a rough back-of-the-envelope sketch follows below.
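A rough back-of-the-envelope sketch for the Canny example in the question; the operations-per-pixel figure and the processor numbers are assumptions you would replace with the results of your own steps 1-3:

```python
# Back-of-the-envelope throughput estimate for full Canny edge detection
# on a QVGA grayscale stream at 30 fps. The ops/pixel and processor figures
# are placeholders; substitute the numbers from your own steps 1-3.
width, height, fps = 320, 240, 30
pixels_per_second = width * height * fps              # ~2.3 Mpixel/s

ops_per_pixel = 60        # assumed: Gaussian blur + gradients + NMS + hysteresis
required_mops = pixels_per_second * ops_per_pixel / 1e6

cpu_mhz = 200             # assumed target clock
ops_per_cycle = 1         # assumed: scalar core, no SIMD
available_mops = cpu_mhz * ops_per_cycle

print(f"Required:  {required_mops:.0f} MOPS")          # ~138 MOPS
print(f"Available: {available_mops:.0f} MOPS")
print("Feasible with headroom" if available_mops > 2 * required_mops
      else "Tight or infeasible: consider SIMD, a GPU, or a faster clock")
```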
Other factors (out of a zillion others) that you need to take into account:
Do you plan to run an OS on the target, or is it bare metal?
Is your algorithm bound by I/O bottlenecks?
If your processor has a cache, how efficient is your algorithm at utilizing it?

Related

What does "learning rate warm-up" mean? [closed]

In machine learning, especially deep learning, what does it mean to warm up?
I've heard that in some models warm-up is a phase in training. But honestly, I don't know what it is, because I'm very new to ML. I've never used it or come across it before, but I'd like to understand it because I think it might be useful.
What is learning rate warm-up and when do we need it?
If your data set is highly differentiated, you can suffer from a sort of "early over-fitting". If your shuffled data happens to include a cluster of related, strongly-featured observations, your model's initial training can skew badly toward those features -- or worse, toward incidental features that aren't truly related to the topic at all.
Warm-up is a way to reduce the primacy effect of the early training examples. Without it, you may need to run a few extra epochs to get the convergence desired, as the model un-trains those early superstitions.
Many models afford this as a command-line option. The learning rate is increased linearly over the warm-up period. If the target learning rate is p and the warm-up period is n, then the first batch iteration uses 1*p/n for its learning rate; the second uses 2*p/n, and so on: iteration i uses i*p/n, until we hit the nominal rate at iteration n.
This means that the first iteration gets only 1/n of the primacy effect. This does a reasonable job of balancing that influence.
Note that the ramp-up is commonly on the order of one epoch -- but is occasionally longer for particularly skewed data, or shorter for more homogeneous distributions. You may want to adjust, depending on how functionally extreme your batches can become when the shuffling algorithm is applied to the training set.
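A minimal sketch of that linear ramp, with p and n as defined above (the function name and the printed example values are just for illustration):

```python
def warmup_lr(iteration: int, target_lr: float, warmup_iters: int) -> float:
    """Linear warm-up: iteration i (1-based) uses i * p / n until the nominal rate is reached."""
    i = iteration + 1                      # convert from a 0-based loop index
    if i < warmup_iters:
        return i * target_lr / warmup_iters
    return target_lr

# Example: target rate p = 2e-5 reached after n = 10,000 iterations.
for step in (0, 4_999, 9_999, 20_000):
    print(step, warmup_lr(step, 2e-5, 10_000))
```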
It means that if you specify your learning rate to be say 2e-5, then during training the learning rate will be linearly increased from approximately 0 to 2e-5 within the first say 10,000 steps.
There are actually two strategies for warm-up (a small sketch of the constant variant follows below):
constant: use a learning rate lower than the base learning rate for the initial few steps.
gradual: in the first few steps, the learning rate is set lower than the base learning rate and increased gradually to approach it as the step number increases, as @Prune and @Patel suggested.
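The gradual strategy is the linear ramp sketched in the previous answer; here is a minimal sketch of the constant variant (the 0.1 factor and step counts are illustrative assumptions):

```python
def constant_warmup(step: int, base_lr: float, warmup_steps: int, factor: float = 0.1) -> float:
    """Constant warm-up: a fixed fraction of the base rate for the first
    warmup_steps steps, then the base rate."""
    return base_lr * factor if step < warmup_steps else base_lr

# Example: 10% of the base rate for the first 500 steps.
print(constant_warmup(100, 1e-3, 500))   # 1e-4
print(constant_warmup(600, 1e-3, 500))   # 1e-3
```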

What is the role of probability in machine learning software? [closed]

There are several components and techniques used in learning programs. Machine learning components include ANNs, Bayesian networks, SVMs, PCA, and other probability-based methods. What role do Bayesian-network-based techniques play in machine learning?
It would also be helpful to know how integrating one or more of these components into applications leads to real solutions, and how software deals with limited knowledge yet still produces sufficiently reliable results.
Probability and Learning
Probability plays a role in all learning. If we apply Shannon's information theory, the movement of probability toward one of the extremes, 0.0 or 1.0, is information. In Shannon's terms, the number of bits learned about a hypothesis is the log_2 of the ratio of its probability after the evidence to its probability before. Given the probability of the hypothesis and of its logical inversion, if the probability does not increase for either, no bits of information have been learned.
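A minimal sketch of that idea, assuming the log-ratio reading of the definition above (the function name and example probabilities are illustrative):

```python
import math

def bits_gained(p_before: float, p_after: float) -> float:
    """Information gained about a hypothesis, in bits, when its
    probability moves from p_before to p_after."""
    return math.log2(p_after / p_before)

# Evidence raises the probability of a hypothesis from 0.5 to 0.9.
print(bits_gained(0.5, 0.9))   # ~0.85 bits
# No movement in probability means nothing was learned.
print(bits_gained(0.5, 0.5))   # 0.0 bits
```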
Bayesian Approaches
Bayesian networks are directed graphs that represent causality hypotheses. They are generally represented as nodes with conditions, connected by arrows that represent hypothetical causes and their corresponding effects. Algorithms based on Bayes' theorem have been developed that attempt to statistically analyze causality from data that has been or is being collected.
MINOR SIDE NOTE: There are often usage constraints on the analytic tools. Most Bayesian algorithms require that the directed graph be acyclic, meaning that no chain of arrows anywhere in the graph leads back to its starting node. This is to avoid endless loops; however, algorithms that work with cycles and handle them seamlessly, from both a mathematical-theory and a software-usability perspective, may exist now or appear in the future.
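As an illustrative sketch of that acyclicity constraint (the adjacency-list encoding and helper below are hypothetical, not any particular library's API), here is a depth-first check that rejects network structures containing a directed cycle:

```python
def is_acyclic(graph: dict) -> bool:
    """Return False if the directed graph (node -> list of children) has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current path / finished
    color = {}

    def visit(node) -> bool:
        color[node] = GRAY
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:             # back edge: a cycle exists
                return False
            if state == WHITE and not visit(child):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in graph if color.get(n, WHITE) == WHITE)

# A valid structure: Rain and Sprinkler both cause WetGrass.
print(is_acyclic({"Rain": ["WetGrass"], "Sprinkler": ["WetGrass"], "WetGrass": []}))  # True
print(is_acyclic({"A": ["B"], "B": ["A"]}))                                           # False
```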
Application to Learning
The application to learning is that the probabilities calculated can be used to predict potential control mechanisms. The litmus test for learning is the ability to reliably alter the future through controls. An important application is sorting mail by recognizing handwritten addresses. Both neural nets and naive Bayesian classifiers can be useful for general pattern recognition integrated into routing or manipulation robotics.
Keep in mind here that the term network has a very wide meaning. Neural Nets are not at all the same approach as Bayesian Networks, although they may be applied to similar problem-solution topologies.
Relation to Other Approaches and Mechanisms
How a system designer uses support vector machines, principal component analysis, neural nets, and Bayesian networks in multivariate time series analysis (MTSA) varies from author to author. How they tie together also depends on the problem domain and the statistical qualities of the data set, including size, skew, sparseness, and the number of dimensions.
The list given includes only four of a much larger set of machine learning tools. For instance, fuzzy logic combines weights with production-system (rule-based) approaches.
The year is also a factor. An answer given now might be stale next year. If I were to write software given the same predictive or control goals as I was given ten years ago, I might combine various techniques entirely differently. I would certainly have a plethora of additional libraries and comparative studies to read and analyse before drawing my system topology.
The field is quite active.

Image processing in microcontroller? [closed]

I have a robot project that needs to process images coming from a camera, and I am looking for a microcontroller that can do the image processing on its own, free of any computer or laptop. Does such a microcontroller exist? Which one is it, and how is this done?
I think you're taking the wrong approach to your question. At its core, a microcontroller is pretty much just a computation engine with some variety of peripheral modules. The features that vary generally are meant to fulfill an application where a certain performance metric is needed. So in that respect any generic microcontroller will suffice assuming it meets your performance criteria. I think what you should be asking is:
What computations do you want to perform? All the major controller vendors offer some sort of graphics processing libraries for use on their chips. You can download them and look through their interfaces to see if they offer the operations that you need. If you can't find a library that does everything you need then you might have to roll your own graphics library.
Memory constraints? How big will the images be? Will you process an image in its entirety or will you process chunks of an image at a time? This will affect how much memory you'll require your controller to have.
Timing constraints? Are there certain deadlines that need to be met, such as the robot needing results within a certain period of time after the image is taken? This will affect how fast your processor needs to be, or whether a potential controller needs dedicated computation hardware like barrel shifters or multiply-add units to speed the computations along (see the rough budget sketch after this answer).
What else needs to be controlled? If the controller also needs to control the robot then you need to address what sort of peripherals the chip will need to interface with the robot. If another chip is controlling the robot then you need to address what sort of communications bus is available to interface with the other chip.
Answer these questions first, and then you can go and look at controller vendors and figure out which chip suits your needs best. I work mostly with Microchip PICs these days, so I'd suggest the dsPIC33 line from that family as a starting point. The family is built for DSP applications: its peripheral library includes some image processing routines, and it has the aforementioned barrel-shifter and multiply-add hardware units intended for applications like filters and the like.
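To make the memory and chunking questions above concrete, here is a minimal back-of-the-envelope sketch; the frame size, RAM size, and kernel window are assumptions to replace with your own figures:

```python
# Rough memory budget, assuming a QVGA grayscale camera and a controller
# with 64 KB of RAM (both figures are placeholders).
width, height, bytes_per_pixel = 320, 240, 1
ram_bytes = 64 * 1024

frame_bytes = width * height * bytes_per_pixel
print(f"Full frame:    {frame_bytes} bytes")              # 76,800 bytes
print(f"Fits in RAM:   {frame_bytes <= ram_bytes}")       # False for 64 KB

# If the whole frame does not fit, process it in chunks: a rolling window
# of scanlines is enough for most neighbourhood operations (e.g. 3x3 kernels).
window_lines = 3
window_bytes = window_lines * width * bytes_per_pixel
print(f"3-line window: {window_bytes} bytes")             # 960 bytes
```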
It is impossible to answer your question without knowing what image processing it is you need to do, and how fast. For a robot I presume this is real-time processing where a result needs to be available perhaps at the frame rate?
Often a more efficient solution for image processing tasks is an FPGA rather than a microprocessor, since it allows massive parallelisation and pipelining and implements algorithms directly in logic hardware rather than as sequential software instructions. Very sophisticated image processing can therefore be achieved at relatively low clock rates; an FPGA running at just 50 MHz can easily outperform a desktop-class processor on specialised tasks. Some tasks would be impossible to achieve in any other way.
Also worth considering is a DSP. It will not have the performance of an FPGA, but it is perhaps easier to use and more flexible, and it is designed to move data rapidly and to execute instructions efficiently, often including a degree of instruction-level parallelism.
If you want a conventional microprocessor, you have to throw clock cycles at the problem (brute force); an ARM11, a Renesas SH-4, or even an Intel Atom may be suitable. For lower-end tasks an ARM Cortex-M4, which includes a DSP engine and optional floating-point hardware, may be suited.
The CMUcam3 is the combination of a small camera and an ARM-based microcontroller that is freely programmable. I've programmed image processing code on it before. One caveat, however, is that it only has 64 KB of RAM, so any processing you want to do must be done scanline-by-scanline.
Color object tracking and similar simple image processing can be done with AVRcam. For more intensive processing I would use OpenCV on some ARM Linux board.
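If you do end up with OpenCV on an ARM Linux board, a minimal sketch of the kind of colour-object tracking mentioned above might look like this (the HSV thresholds are assumptions to tune for your target colour):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                       # first attached camera

# Assumed HSV range for a red-ish object; tune for your target colour.
lower = np.array([0, 120, 80])
upper = np.array([10, 255, 255])

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)       # binary mask of matching pixels
    m = cv2.moments(mask)
    if m["m00"] > 0:                            # centroid of the tracked blob
        cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
        print("object at", cx, cy)

cap.release()
```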

How do qubits work and what are their pros and cons? What impact will they have on programming languages? [closed]

What can we do more with qubits than normal bits, and how do they work? I read about them some time ago, and it appears that qubits can store not just 0 or 1, but also 0 and 1 at the same time. I don't really understand how they work. Can someone please explain this to me?
What are their pros and cons, and what impact will they have on programming languages like C after quantum computers are actually invented?
How would we manage memory when a bit (now a quantum object) can take multiple values at once? How can we determine whether something is true or false when there is more than just 1 and 0?
Any "classical" (as it will be called once the technology is in wider use) problem which is solved by "classical" code can be solved using some sort of quantum processor by transforming the problem. For example, to do a database search, instead of using an index-based search/binary search, or a linear search for an unsorted database, you can use Grover's algorithm. Also, to take a step back from the previous poster's mention of BQP problems, problems with a classical "solution" that runs in NP-time can be sped up considerably by Grover's algorithm (a speedup in the time to search through every possible solution). RSA cryptography is also made much more insecure by the advent of Shor's algorithm, since it makes factorising large numbers into their prime factors (the hinge upon which RSA sits) solvable in logarithmic time.
EDIT: Shor's algorithm actually runs in O((log N)^3), which is polynomial-over-logarithmic time.
The conclusion of this sort of thing is that pre-existing programming languages like C will not be able to be used on a quantum computer due to the nature of quantum algorithms (applying certain functions to quantum states), unless someone invents a way to map quantum gates and so forth to logical gates (EDIT: This has apparantly been mostly addressed here), in which case about all we get is a very very fast logical processor when using languages like C.
PS: I'm sure there'll be OpenGL bindings for quantum computing eventually :P
If we can make a working quantum computer (still an open question) then it can efficiently solve certain algorithmic problems that (we think) a classical computer cannot efficiently solve. These are the problems in the complexity class BQP but not in P. One big one is integer factorization. As Will A mentioned, if you can factor enormous integers quickly, you can break a lot of modern ciphers.
The catch is that nobody knows for sure if BQP is actually "bigger" than P — it might be that anything a quantum computer can do quickly, so can a classical computer.
We also don't know if BQP is as big as NP — for instance, nobody has found an efficient way to solve the Traveling Salesman Problem on a quantum computer. This is a common misconception about quantum computers. They might be able to do NP-complete problems quickly, and then again they might not. Nobody knows.
http://scottaaronson.com/blog/?p=208 is good reading on this topic (as is the rest of the blog).
Regarding what can be solved with quantum computers: a quantum computer would break current asymmetric encryption schemes. It is a common misconception that quantum computers can solve most optimization problems. They cannot. See this article for more details on what can and cannot be solved using quantum computers.
A qubit doesn't store 0 and 1 as two separate values; rather, its state is a superposition of 0 and 1. A normal bit represents either 0 or 1 at a time, and three normal bits hold exactly one of the patterns 000, 001, 010, ..., 111. The joint state of three qubits, by contrast, is a superposition over all eight of those basis states at once, so describing n qubits requires 2^n amplitudes (although measuring n qubits still yields only n classical bits).
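A minimal sketch of that bookkeeping, using a plain NumPy state vector (this is a classical simulation, which is exactly why the storage blows up exponentially with n):

```python
import numpy as np

n = 3
state = np.zeros(2 ** n)                 # state vector over the basis 000..111
state[0] = 1.0                           # start in |000>

# A Hadamard gate on every qubit puts the register into an equal
# superposition of all eight basis states.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
full_op = H
for _ in range(n - 1):
    full_op = np.kron(full_op, H)
state = full_op @ state

print(state)                             # eight amplitudes, each 1/sqrt(8)
print(np.sum(state ** 2))                # measurement probabilities sum to 1
```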
One physical realization of a qubit is the spin of an electron, which behaves like a tiny magnetic dipole. Manipulating and measuring the amplitudes and phases of such spin states is the basis of quantum information processing, which is where the future lies!

Is it possible to quantify scalability as a requirement? [closed]

G'day,
I was reading the item Quantify in the book "97 Things Every Software Architect Should Know" (sanitised Amazon link) and it got me wondering how to quantify scalability.
I have designed two systems for a major British broadcasting corporation that are used to:
detect the country of origin for incoming HTTP requests, or
determine the suitable video formats for a mobile phone's screen geometry and current connection type.
Both of the designs were required to provide scalability.
My designs for both systems scale horizontally behind caching load-balancing layers, which handle incoming requests for both of these services and distribute them across several servers that actually provide the service itself. Initial increases in service capacity are made by adding more servers behind the load-balancing layer, hence the term horizontal scalability.
There is a limit to the scalability of this architecture, however, if the load-balancing layer itself starts having difficulty coping with the incoming request traffic.
So, is it possible to quantify scalability? Would it be an estimate of how many additional servers you could add to horizontally scale the solution?
I think this comes down to what scalability means in a given context and therefore the answer would be it depends.
I've seen scalability in requirements for things that simply didn't exist yet. For example, a new loan application tool that specifically called out needing to work on the iPhone and other mobile devices in the future.
I've also seen scalability used to describe potential expansion of more data centers and web servers in different areas of the world to improve performance.
Both examples above are quantifiable if there is a known target for the future. But scalability may not be quantifiable if there is no known target or plan, which makes it a moving target.
I think it is possible in some contexts - for example scalability of a web application could be quantified in terms of numbers of users, numbers of concurrent requests, mean and standard deviation of response time, etc. You can also get into general numbers for bandwidth and storage, transactions per second, and recovery times (for backup and DR).
You can also often give numbers within the application domain. Say the system supports commenting; you can quantify the order of magnitude of the number of comments that it needs to be able to store.
It is however worth bearing in mind that not everything that matters can be measured, and not everything that can be measured matters. :-)
The proper measure of scalability (not the simplest one;-) is a set of curves defining the resources demanded (CPUs, memory, storage, local bandwidth, ...) and the performance (e.g. latency) delivered as the load grows (e.g. in terms of queries per second, though other measures such as total data throughput demanded may also be appropriate for some applications). Decision makers will typically demand that such accurate but complex measures be boiled down to a few key numbers (specific spots on some of the several curves), but I always try to negotiate for more accurate, as opposed to simpler-to-understand, measurements of such key metrics!-)
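A minimal sketch of boiling those curves down to one key number: given measured points of (load, latency, CPU use), find the highest load that still meets an assumed latency SLA and CPU budget (the sample data and thresholds below are made up for illustration):

```python
measurements = [
    # (queries/sec, p95 latency in ms, CPU utilisation)
    (100, 40, 0.20),
    (200, 45, 0.35),
    (400, 60, 0.60),
    (800, 120, 0.85),
    (1600, 450, 0.99),
]

latency_sla_ms = 100
cpu_budget = 0.80

supported = [qps for qps, p95, cpu in measurements
             if p95 <= latency_sla_ms and cpu <= cpu_budget]
print("Max sustainable load:", max(supported), "queries/sec")   # 400
```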
When I think of scalability I think of:
performance - how responsive the app needs to be for a given load
how large a load the app can grow into and at what unit cost (if it's per server, include software, support, etc.)
how fast you can scale the app up and how much buffer you want over peak period usage (we can add 50% more bandwidth in 2-3 hours and require a 30% buffer over planned peak usage)
Redundancy is something else, but should also be included and considered.
"The system shall scale as to maintain a linear relationship of X for cost/user".
Here's one way:
"assume that a single processor can process 100 units of work per second..."
From http://www.information-management.com/issues/19971101/972-1.html
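Following that last example, a minimal sketch of turning such a statement into numbers (the per-user work rate, peak user count, and 30% headroom are illustrative assumptions):

```python
# Assume a single processor can process 100 units of work per second,
# as in the quoted example; the other figures are placeholders.
units_per_processor = 100
units_per_user = 0.5                 # assumed work generated per user per second
peak_users = 20_000
headroom = 0.30                      # buffer over planned peak usage

peak_units = peak_users * units_per_user * (1 + headroom)
processors_needed = -(-peak_units // units_per_processor)   # ceiling division

print(f"Peak load: {peak_units:.0f} units/sec")        # 13,000 units/sec
print(f"Processors needed: {processors_needed:.0f}")   # 130
```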
