I have a robot project that needs to process images coming from a camera. I am looking for a microcontroller that can do the image processing on its own, without any computer or laptop. Does such a microcontroller exist? What is it? And how is it done?
I think you're taking the wrong approach to your question. At its core, a microcontroller is pretty much just a computation engine with some variety of peripheral modules. The features that vary generally are meant to fulfill an application where a certain performance metric is needed. So in that respect any generic microcontroller will suffice assuming it meets your performance criteria. I think what you should be asking is:
What computations do you want to perform? All the major controller vendors offer some sort of graphics processing libraries for use on their chips. You can download them and look through their interfaces to see if they offer the operations that you need. If you can't find a library that does everything you need then you might have to roll your own graphics library.
Memory constraints? How big will the images be? Will you process an image in its entirety or will you process chunks of an image at a time? This will affect how much memory you'll require your controller to have.
Timing constraints? Are there certain deadlines that need to be met like the robot needing results within a certain period of time after the image is taken? This will affect how fast your processor will need to be or whether a potential controller needs dedicated computation hardware like barrel shifters or multiply-add units to speed the computations along.
What else needs to be controlled? If the controller also needs to control the robot then you need to address what sort of peripherals the chip will need to interface with the robot. If another chip is controlling the robot then you need to address what sort of communications bus is available to interface with the other chip.
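As a rough illustration of the memory and timing questions above, here is a minimal Python sketch that works out how many bytes one uncompressed frame (or a strip of it) needs, and how many CPU cycles are available per pixel at a given frame rate. The resolution, pixel depth, frame rate and the 70 MHz clock are assumed example values, not figures for any particular controller.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one uncompressed frame (or strip) in RAM."""
    return width * height * bits_per_pixel // 8


def cycles_per_pixel(clock_hz, width, height, fps):
    """Upper bound on CPU cycles available per pixel if every frame is processed."""
    return clock_hz / (width * height * fps)


# Assumed example: 320x240 greyscale, 8 bits per pixel, 30 frames per second.
print(frame_bytes(320, 240, 8))      # whole frame
print(frame_bytes(320, 8, 8))        # a strip of 8 scanlines
print(cycles_per_pixel(70_000_000, 320, 240, 30))  # hypothetical 70 MHz core
```

If the whole-frame figure exceeds the on-chip RAM, processing in chunks is the only option. If the cycles-per-pixel figure is smaller than what the algorithm needs, a faster part or dedicated hardware is required.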
Answer these questions first and then you can go and look at controller vendors and figure out which chip suits your needs best. I work mostly with Microchip PICs these days, so I'd suggest the dsPIC33 line from that family as a starting point. The family is built for DSP applications: its peripheral library includes some image processing routines, and it has the aforementioned barrel shifters and multiply-add hardware units intended for applications like filters.
It is impossible to answer your question without knowing what image processing it is you need to do, and how fast. For a robot I presume this is real-time processing where a result needs to be available perhaps at the frame rate?
Often a more efficient solution for image processing tasks is to use an FPGA rather than a microprocessor. It allows massive parallelisation and pipelining, and implements algorithms directly in logic hardware rather than as sequential software instructions, so very sophisticated image processing can be achieved at relatively low clock rates. An FPGA running at just 50 MHz can easily outperform a desktop-class processor when performing specialised tasks. Some tasks would be impossible to achieve in any other way.
Also worth considering is a DSP. It will not have the performance of an FPGA, but it is perhaps easier to use and more flexible, and it is designed to move data rapidly and to execute instructions efficiently, often including a degree of instruction-level parallelism.
If you want a conventional microprocessor, you have to throw clock cycles at the problem (brute force); an ARM11, Renesas SH-4, or even an Intel Atom may be suitable. For lower-end tasks an ARM Cortex-M4, which includes DSP instructions and optionally floating-point hardware, may be suited.
The CMUcam3 is the combination of a small camera and an ARM-based microcontroller that is freely programmable. I've programmed image processing code on it before. One caveat, however, is that it only has 64 KB of RAM, so any processing you want to do must be done scanline-by-scanline.
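To illustrate what scanline-by-scanline processing looks like, here is a minimal Python sketch of a Sobel-style edge filter that keeps only three rows in memory at any time. It is not CMUcam3 code: the `rows` iterable stands in for whatever delivers scanlines from the camera, and the function name is hypothetical.

```python
from collections import deque


def sobel_rows(rows, width):
    """Yield edge-strength rows (|gx| + |gy|, clamped to 255) from a stream of
    scanlines, holding only a 3-row window in memory."""
    window = deque(maxlen=3)  # the oldest row is dropped automatically
    for row in rows:
        window.append(row)
        if len(window) < 3:
            continue  # need three rows before the first output row
        top, mid, bot = window
        out = [0] * width  # border pixels are left at 0
        for x in range(1, width - 1):
            gx = (top[x + 1] + 2 * mid[x + 1] + bot[x + 1]) \
               - (top[x - 1] + 2 * mid[x - 1] + bot[x - 1])
            gy = (bot[x - 1] + 2 * bot[x] + bot[x + 1]) \
               - (top[x - 1] + 2 * top[x] + top[x + 1])
            out[x] = min(255, abs(gx) + abs(gy))
        yield out


# Usage: five scanlines containing a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
for line in sobel_rows(iter(image), 6):
    print(line)
```

The working set is three rows plus one output row, so memory use depends on the image width only, not on its height.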
Color object tracking and similar simple image processing can be done with AVRcam. For more intensive processing I would use OpenCV on some ARM Linux board.
What process would you use to evaluate whether a high-level algorithm (mainly computer vision algorithms, written in Matlab, Python, etc.) can run in real time on an embedded CPU?
The idea is to have a reliable assessment at an early stage, when you cannot yet implement or profile it on the target HW.
To put things in focus, let's assume that your input is a grayscale QVGA frame, 8bpp @ 30fps, and you have to perform a full Canny edge detection on each and every input frame. How can we find or estimate the minimum processing power needed to perform this successfully?
A generic assessment isn't quite possible and what you request is tedious manual work.
There are, however, a few generic steps you could follow to arrive at a rough idea:
1. Estimate the run-time complexity of your algorithm in terms of basic math operations like additions and multiplications (best, average, or worst case: your choice). Do you need floating point support? Also track higher-level math operations like saturating add/subtract (why? see point 3).
2. Devour the ISA of the target processor and focus especially on the math and branching instructions. How many cycles does a multiplication take? Or does your processor dispatch several per cycle?
3. See if your processor supports features like:
- Saturating math. ARM Cortex-M4 does. The PIC18 microcontroller does not, incurring additional execution overhead.
- Hardware floating point operations.
- Branch prediction.
- SIMD. It will provide a significant speed boost if your algorithm can be tailored to it.
- Since you explicitly asked for a CPU, see if yours has a GPU attached. Image processing algorithms generally benefit from the presence of one.
4. Map your operations (from step 1) to what the target processor supports (in step 3) to arrive at an estimate.
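The mapping in the last step can be sketched in a few lines of Python. Every number below is an assumption made for illustration: the per-pixel operation counts for each Canny stage, the cycle cost of each operation class, and the overhead factor for memory access and loop control are placeholders that you would replace with counts from your own algorithm and figures from the target's ISA documentation.

```python
# Assumed per-pixel operation counts for each Canny stage (step 1).
# Rough placeholders only; count the operations in your own implementation.
CANNY_OPS = {
    "gaussian_5x5_separable": {"mul": 10, "add": 8},
    "sobel_gradients": {"add": 10},
    "magnitude_and_direction": {"add": 3, "branch": 2},
    "non_max_suppression": {"add": 2, "branch": 2},
    "hysteresis_threshold": {"branch": 2},
}

# Assumed cycle cost per operation class on the target (steps 2 and 3).
CYCLES = {"mul": 1, "add": 1, "branch": 3}


def estimate_mhz(ops, cycles, width, height, fps, overhead=1.0):
    """Clock rate in MHz needed to process every pixel of every frame.

    overhead is a fudge factor (>= 1) for loads, stores and loop control.
    """
    per_pixel = sum(cycles[op] * count
                    for stage in ops.values()
                    for op, count in stage.items())
    return per_pixel * width * height * fps * overhead / 1e6


# QVGA at 30 fps, with an assumed 2x overhead factor.
print(estimate_mhz(CANNY_OPS, CYCLES, 320, 240, 30, overhead=2.0))
```

The result is only as good as the operation counts and cycle costs fed in, so treat it as an order-of-magnitude figure for shortlisting processors rather than a guarantee.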
Other factors (out of a zillion others) that you need to take into account:
- Do you plan to run an OS on the target, or is it bare metal?
- Is your algorithm bound by I/O bottlenecks?
- If your processor has a cache, how efficiently does your algorithm use it?
I want to implement an FPGA-based real-time stereo vision system for long-range (up to 100 m) depth estimation.
I have decided to use IP cameras for this project
(although I still don't know whether another kind of camera would be more suitable for this range).
Is it possible to feed the output of an IP camera into an FPGA and then perform the related image processing? How?
I will be grateful for any information you can provide.
Possible but impractical, and unlikely to work.
Taking input from an IP camera would require your FPGA design to contain a full network stack to make an HTTP request to each camera, download an image, and decode it. This is more of a job for a microcontroller than an FPGA; it will be very time-consuming to implement in hardware.
You are also likely to run into issues because IP cameras tend to be relatively slow, and cannot be synchronized. That is, if you request an image from two cameras at the same time, there is no guarantee that the images you get back will have been taken at the same time.
Don't use IP cameras for this. They're not suited to the purpose. Use camera modules with digital outputs; they're readily available, and likely less expensive than the IP cameras.
I will assume you have a mid-range FPGA. In that case I would say your possible options are:
- You can capture a single frame at a time from the IP camera, if it outputs VGA video with hsync, vsync, etc.
- If you are working on a dev kit, the FPGA will be interfaced with an SDRAM, which gives you the ability to store a couple of frames in it (certainly not a whole video).
- You can run simple image processing algorithms on the DSP slices available in your FPGA. If you are working with Xilinx, check DSP48E1 or DSP48A1.
Maybe you should think about using cameras with an SDI interface. SDI is a common standard video interface and is designed to work up to 120 m over 75 Ohm coaxial cables.
The SMPTE standard ST 425-4 describes the transmission of a stereoscopic camera stream over dual 3G-SDI links in Full HD at 50/60 Hz.
If you are fine with 1080i, then a single 3G-SDI link will be enough (described in ST 425-2).
An SDI interface would be ideal for long-range applications (it is widely used in the television industry). Then, depending on your goal, you can implement ISP modules and/or convert the SDI signals to your desired output protocol (e.g. PCIe) on the FPGA.
I want to make an app that shows pictures from a server, but I found that the pictures load very slowly. Is there any way, or any code, to make the HTTP requests faster?
Approach the problem from a different angle. Think of it as an entropy/economy problem.
There are two distant points in the problem you are describing, and they want to transfer data between them. Let's say this costs 100 units under ideal conditions, and let's assume it is not possible to lower this cost any further. This cost is the point where the energy required for the transfer is at its minimum.
Now assume that transfer rates are not under our control. Here are some theoretical "seeming" improvements, which are actually just different sets of trade-offs.
1. Forward caching / caching: Preload or download all images whenever possible so they are ready when the user requests them. Install everything static at first-time installation.
Trade-off: You spend most of your 100 points on disk space and preprocessing power. This may make your app a little slower overall, but performance will be great once the images are on disk. Effectiveness decreases if the images you want change frequently.
2. Compression / mapping: If your bottleneck is the transfer rate, compress/map the images as much as you can so they are cheap to transfer, at the price of more processor power to decode them once they arrive at the app.
Trade-off: Much more CPU power is used than before, but the transfer itself is fast. Both the compressing side and the decompressing side use more memory and CPU. You will only see the benefit if your images are slow to arrive because they are huge. If you want to try this, install 7z, check the advanced compression settings, and try really huge maps.
3. Drawing algorithms: If your images are drawings rather than bitmaps/real pictures, send only a vector graphics format: convert your pictures (raster images, to be technical) to vector images. This greatly reduces the number of bytes needed to carry an image to the app, but needs more processor power at the receiving end.
Trade-off: Not all pictures can be converted to vector graphics, and you will use more CPU and memory, but there are excellent, highly optimized libraries already available. You can also think of this as "mathematical compression": an infinite line cannot be stored point by point in any computer in the universe, but a one-line mathematical expression such as x = y + 1 describes it completely.
4. Write your own server: If you are sure the bottleneck is the communication time with the service provider (in this case an HTTP server, which is most probably very efficient), write your own server that answers very quickly and starts sending the file. The likelihood of an improvement here is too low to even talk about a trade-off.
5. Never be forced to send repeating information: Design and tag your information and pictures so that you only send non-repeating chunks, and the receiving side stores every chunk it receives to build up its cache.
Trade-off: This is a combination of 1, 2 and 3, and is just another way of distributing the 100 points of cost. You get the point.
6. BitTorrent ideology: If the bottleneck is your server's bandwidth, there is a simple formula to see whether using your users' bandwidth is logical and has a positive effect. Again, it is probably only effective if your data set is very large.
Trade-off: This is an interesting option as far as trade-offs go. You use your users' bandwidth and proximity to compensate for your server's lack of bandwidth. It requires a little more CPU than the conventional way of acquiring data (more TCP connections to maintain).
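The "never be forced to send repeating information" idea can be sketched as a content-addressed chunk cache. This is a minimal, in-memory Python simulation under stated assumptions: the 4096-byte chunk size, the use of SHA-256 as the chunk tag, and the names `ChunkCache` and `transfer` are all illustrative choices, and no real network transfer takes place.

```python
import hashlib


class ChunkCache:
    """Receiver-side store of chunks, keyed by their SHA-256 digest."""

    def __init__(self):
        self._chunks = {}

    def has(self, digest):
        return digest in self._chunks

    def put(self, digest, chunk):
        self._chunks[digest] = chunk


def transfer(data, cache, chunk_size=4096):
    """Simulate sending data to a receiver that owns cache.

    Only chunks the receiver has not seen before are "sent".
    Returns the number of payload bytes that had to be transferred.
    """
    sent = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if not cache.has(digest):
            cache.put(digest, chunk)  # receiver stores the new chunk
            sent += len(chunk)
    return sent


# Usage: the same 16 KiB "image" requested twice.
image = b"".join(i.to_bytes(4, "big") for i in range(4096))
cache = ChunkCache()
print(transfer(image, cache))  # first request: every chunk is new
print(transfer(image, cache))  # second request: nothing needs to be sent
```

In a real system the digests themselves still have to be exchanged, which is part of the cost being moved around rather than removed.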
As a closing note: there cannot be a function call that reduces the cost of transferring information from 100 points to 95 points. At the current level of technology we seem to be really close to efficient transfer. Compression, mapping and various other techniques, including network transfer methods, are pretty mature, but there is always room for improvement. For example, the speed of light is currently taken as the absolute upper limit for sending data. Quantum entanglement is sometimes suggested as a way around this limit, but as far as is known it cannot be used to send information faster than light. Also, if you can develop a better and more efficient way of compression, that will be awesome.
Anyway, since your question does not provide much information to work with, I would strongly recommend thinking like an engineer: create a test environment, pin down the main cause, and attack it with all you've got. If you can define the problem more precisely or point out the bottleneck, we can give a better answer than generic information theory.
One last note: I am not saying information transfer will never get more efficient; it may improve by 1000% tomorrow. I am just saying this field is too mature to expect huge improvements without years of work on the mathematics and theory, like any other field of research.
In the beginning, I would like to describe my current position and the goal that I would like to achieve.
I am a researcher dealing with machine learning. So far I have gone through several theoretical courses covering machine learning algorithms and social network analysis, and have therefore gained some theoretical concepts useful for implementing machine learning algorithms and feeding in real data.
On simple examples the algorithms work well and the running time is acceptable, whereas big data is a problem when I try to run the algorithms on my PC. On the software side, I have enough experience to implement any algorithm from articles or design my own using any language or IDE (so far I have used Matlab, Java with Eclipse, .NET...), but I haven't got much experience with setting up infrastructure. I have started to learn about Hadoop, NoSQL databases, etc., but I am not sure which strategy would be best, taking into consideration the learning time constraints.
The final goal is to be able to set up a working platform for analyzing big data, with a focus on implementing my own machine learning algorithms, and to put it all into production, ready for solving useful questions by processing big data.
As the main focus is on implementing machine learning algorithms, I would like to ask whether there is any existing running platform that offers enough CPU resources to feed in large data, upload my own algorithms and simply process the data without thinking about distributed processing.
Whether or not such a platform exists, I would like to gain a picture big enough to be able to work in a team that could put into production a whole system tailored to specific customer demands. For example, a retailer would like to analyze daily purchases, so all the daily records have to be uploaded to some infrastructure capable of processing the data with custom machine learning algorithms.
To put all of the above into a simple question: how do you design a custom data mining solution for real-life problems, with the main focus on machine learning algorithms, and put it into production, if possible by using existing infrastructure and, if not, by designing a distributed system (using Hadoop or whatever framework)?
I would be very thankful for any advice or suggestions about books or other helpful resources.
First of all, your question needs to define more clearly what you intend by Big Data.
Indeed, Big Data is a buzzword that may refer to problems of various sizes. I tend to define Big Data as the category of problems where the data size or the computation time is big enough for "the hardware abstractions to become broken", which means that a single commodity machine cannot perform the computations without careful management of computation and memory.
The scale threshold beyond which data becomes Big Data is therefore unclear and is sensitive to your implementation. Is your algorithm bounded by hard-drive bandwidth? Does it have to fit into memory? Did you try to avoid unnecessary quadratic costs? Did you make any effort to improve cache efficiency, etc.?
From several years of experience in running medium large-scale machine learning challenge (on up to 250 hundreds commodity machine), I strongly believe that many problems that seem to require distributed infrastructure can actually be run on a single commodity machine if the problem is expressed correctly. For example, you are mentioning large scale data for retailers. I have been working on this exact subject for several years, and I often managed to make all the computations run on a single machine, provided a bit of optimisation. My company has been working on simple custom data format that allows one year of all the data from a very large retailer to be stored within 50GB, which means a single commodity hard-drive could hold 20 years of history. You can have a look for example at : https://github.com/Lokad/lokad-receiptstream
From my experience, it is worth spending time trying to optimize the algorithm and memory use so that you can avoid resorting to a distributed architecture. Indeed, distributed architectures come with a triple cost. First, they require strong knowledge. Second, they add a large complexity overhead to the code. Finally, they come with a significant latency overhead (with the exception of local multi-threaded distribution).
From a practitioner's point of view, being able to run a given data mining or machine learning algorithm in 30 seconds is one of the key factors in efficiency. I have noticed that when some computations, whether sequential or distributed, take 10 minutes, my focus and efficiency tend to drop quickly, as it becomes much more complicated to iterate and test new ideas quickly. The latency overhead introduced by many of the distributed frameworks is such that you will inevitably be in this low-efficiency scenario.
If the scale of the problem is such that even with strong effort you cannot run it on a single machine, then I strongly suggest resorting to off-the-shelf distributed frameworks instead of building your own. One of the best-known frameworks is the MapReduce abstraction, available through Apache Hadoop. Hadoop can be run on clusters of 10 thousand nodes, probably much more than you will ever need. If you do not own the hardware, you can "rent" the use of a Hadoop cluster, for example through Amazon Elastic MapReduce.
Unfortunately, the MapReduce abstraction is not suited to all Machine Learning computations.
As far as Machine Learning is concerned, MapReduce is a rigid framework and numerous cases have proved to be difficult or inefficient to adapt to this framework:
– The MapReduce framework is in itself related to functional programming. The
Map procedure is applied to each data chunk independently. Therefore, the
MapReduce framework is not suited to algorithms where the application of the
Map procedure to some data chunks need the results of the same procedure to
other data chunks as a prerequisite. In other words, the MapReduce framework
is not suited when the computations between the different pieces of data are
not independent and impose a specific chronology.
– MapReduce is designed to provide a single execution of the map and of the
reduce steps and does not directly provide iterative calls. It is therefore not
directly suited for the numerous machine-learning problems implying iterative
processing (Expectation-Maximisation (EM), Belief Propagation, etc.). The
implementation of these algorithms in a MapReduce framework means the
user has to engineer a solution that organizes results retrieval and scheduling
of the multiple iterations so that each map iteration is launched after the reduce
phase of the previous iteration is completed and so each map iteration is fed
with results provided by the reduce phase of the previous iteration.
– Most MapReduce implementations have been designed to address production needs and
robustness. As a result, the primary concern of the framework is to handle
hardware failures and to guarantee the computation results. The MapReduce efficiency
is therefore partly lowered by these reliability constraints. For example, the
serialization on hard-disks of computation results turns out to be rather costly
in some cases.
– MapReduce is not suited to asynchronous algorithms.
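To make the point about iterative algorithms concrete, here is a toy, single-process Python sketch. It is not Hadoop code: `map_reduce` is a hypothetical in-memory stand-in for one MapReduce job, and one-dimensional k-means is used as an assumed example of an iterative algorithm. Note how the driver loop has to launch a fresh map/reduce pass per iteration and feed the reduce output of one pass into the map step of the next.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One MapReduce pass: map each record independently, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_1d(points, centroids, max_iter=20):
    """Iterative k-means driven by repeated map_reduce passes."""
    centroids = list(centroids)
    for _ in range(max_iter):
        current = tuple(centroids)  # output of the previous reduce phase

        def mapper(x):
            # Assign the point to its nearest centroid.
            nearest = min(range(len(current)), key=lambda i: abs(x - current[i]))
            yield nearest, x

        def reducer(_key, values):
            # New centroid is the mean of the points assigned to it.
            return sum(values) / len(values)

        result = map_reduce(points, mapper, reducer)
        updated = [result.get(i, current[i]) for i in range(len(current))]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids


print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0]))
```

In a real cluster each pass of that loop is a separately scheduled job whose results are written to disk, which is where the scheduling and serialization overhead described above comes from.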
The questioning of the MapReduce framework has led to richer distributed frameworks where more control and freedom are left to the framework user, at the price of more complexity for this user. Among these frameworks, GraphLab and Dryad (both based on Directed Acyclic Graphs of computations) are well known.
As a consequence, there is no "one size fits all" framework, just as there is no "one size fits all" data storage solution.
To start with Hadoop, you can have a look at the book Hadoop: The Definitive Guide by Tom White
If you are interested in how large-scale frameworks fit Machine Learning requirements, you may be interested in the second chapter (in English) of my PhD, available here: http://tel.archives-ouvertes.fr/docs/00/74/47/68/ANNEX/texfiles/PhD%20Main/PhD.pdf
If you provide more insight about the specific challenge you want to deal with (type of algorithm, size of the data, time and money constraints, etc.), we probably could provide you a more specific answer.
edit : another reference that could prove to be of interest : Scaling-up Machine Learning
I had to implement a couple of Data Mining algorithms to work with BigData too, and I ended up using Hadoop.
I don't know if you are familiar with Mahout (http://mahout.apache.org/), which already has several algorithms ready to use with Hadoop.
Nevertheless, if you want to implement your own Algorithm, you can still adapt it to Hadoop's MapReduce paradigm and get good results. This is an excellent book on how to adapt Artificial Intelligence algorithms to MapReduce:
Mining of Massive Datasets - http://infolab.stanford.edu/~ullman/mmds.html
This seems to be an old question. However, given your use case, the main frameworks focusing on Machine Learning in the Big Data domain are Mahout, Spark (MLlib), H2O, etc. To run Machine Learning algorithms on Big Data, you have to convert them into parallel programs based on the MapReduce paradigm. This is a nice article giving a brief introduction to the major (not all) Big Data frameworks:
http://www.codophile.com/big-data-frameworks-every-programmer-should-know/
I hope this will help.
G'day,
I was reading the item Quantify in the book "97 Things Every Software Architect Should Know" (sanitised Amazon link) and it got me wondering how to quantify scalability.
I have designed two systems for a major British broadcasting corporation that are used to:
detect the country of origin for incoming HTTP requests, or
determine the suitable video formats for a mobile phone's screen geometry and current connection type.
Both of the designs were required to provide scalability.
My designs for both systems are scalable horizontally behind caching load-balancing layers which are used to handle incoming requests for both of these services and distribute them across several servers which actually provide the service itself. Initial increases in service capacity are made by adding more servers behind the load-balance layer, hence the term horizontal scalability.
There is a limit to the scalability of this architecture, however, if the load-balancing layer starts having difficulty coping with the incoming request traffic.
So, is it possible to quantify scalability? Would it be an estimate of how many additional servers you could add to horizontally scale the solution?
I think this comes down to what scalability means in a given context and therefore the answer would be it depends.
I've seen scalability in requirements for things that simply didn't exist yet. For example, a new loan application tool that specifically called out needing to work on the iPhone and other mobile devices in the future.
I've also seen scalability used to describe potential expansion of more data centers and web servers in different areas of the world to improve performance.
Both examples above can be quantifiable if there is a known target for the future. But scalability may not be quantifiable if there really is no known target or plan which makes it a moving target.
I think it is possible in some contexts - for example scalability of a web application could be quantified in terms of numbers of users, numbers of concurrent requests, mean and standard deviation of response time, etc. You can also get into general numbers for bandwidth and storage, transactions per second, and recovery times (for backup and DR).
You can also often give numbers within the application domain - let's say the system supports commenting, you can quantify what is the order of magnitude of the number of comments that it needs to be able to store.
It is however worth bearing in mind that not everything that matters can be measured, and not everything that can be measured matters. :-)
The proper measure of scalability (not the simplest one;-) is a set of curves defining resource demanded (CPUs, memory, storage, local bandwidth, ...), and performance (e.g. latency) delivered, as the load grows (e.g. in terms of queries per second, but other measures such as total data throughput demanded may also be appropriate for some applications). Decision makers will typically demand that such accurate but complex measures be boiled down to a few key numbers (specific spots on some of the several curves), but I always try to negotiate for more-accurate as against simpler-to-understand measurements of such key metrics!-)
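As a minimal sketch of boiling such a curve down to a few key numbers, the Python function below takes measured (servers, throughput) points and reports how close each point is to ideal linear scaling from the first measurement. The function name and the sample figures are hypothetical; real values would come from load tests of the system in question.

```python
def scaling_efficiency(measurements):
    """Given (servers, throughput) pairs, return (servers, efficiency) pairs.

    Efficiency is measured throughput divided by the throughput that perfect
    linear scaling from the first measurement would give (1.0 = ideal).
    """
    base_servers, base_throughput = measurements[0]
    per_server = base_throughput / base_servers
    return [(n, throughput / (n * per_server)) for n, throughput in measurements]


# Hypothetical load-test results: requests per second at 1, 2 and 4 servers.
samples = [(1, 100), (2, 190), (4, 340)]
for servers, efficiency in scaling_efficiency(samples):
    print(servers, round(efficiency, 2))
```

A falling efficiency as servers are added is the kind of signal that a shared component, such as the load-balancing layer, is becoming the bottleneck.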
When I think of scalability I think of:
performance - how responsive the app needs to be for a given load
how large a load the app can grow into and at what unit cost (if it's per server, include software, support, etc.)
how fast you can scale the app up and how much buffer you want over peak period usage (we can add 50% more bandwidth in 2-3 hours and require a 30% buffer over planned peak usage)
Redundancy is something else, but should also be included and considered.
"The system shall scale as to maintain a linear relationship of X for cost/user".
Here's one way:
"assume that a single processor can process 100 units of work per second..."
From http://www.information-management.com/issues/19971101/972-1.html