I'm trying to estimate a human pose from body joint coordinates.
There are 14 joint coordinates per person (e.g., ankle, knee, hip).
I need to feed the connectivity between those coordinates (e.g., ankle-knee, knee-hip) into a DNN model as input.
I have been using relative coordinates (e.g., x1-x2, y1-y2) to encode the direction between joints, but they limited the prediction performance I could reach.
I'm looking for fresh, creative ideas; if you have any, please share.
Thanks
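For concreteness, here is a sketch of the kind of feature encoding I mean (Python/NumPy; the two-bone skeleton and joint indices are placeholders): each bone is normalized into a unit direction plus a length, with joint angles added so the features become scale-invariant.

```python
import numpy as np

SKELETON = [(0, 1), (1, 2)]  # e.g., ankle-knee, knee-hip (indices assumed)

def bone_features(joints):
    """joints: (14, 2) array of (x, y) coordinates for one person."""
    feats = []
    for i, j in SKELETON:
        v = joints[j] - joints[i]
        length = np.linalg.norm(v) + 1e-8
        feats.extend(v / length)  # unit direction: scale-invariant
        feats.append(length)      # bone length, kept separately
    return np.array(feats)

def joint_angle(a, b, c):
    """Angle at joint b formed by segments b->a and b->c (e.g., the knee)."""
    u, w = a - b, c - b
    cos = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

joints = np.random.rand(14, 2)  # placeholder coordinates
x = np.concatenate([bone_features(joints),
                    [joint_angle(joints[0], joints[1], joints[2])]])
```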
A commonly used convention for selecting frames of reference in robotics applications is the Denavit and Hartenberg (D–H) convention which was introduced by Jacques Denavit and Richard S. Hartenberg. In this convention, coordinate frames are attached to the joints between two links such that one transformation is associated with the joint, [Z], and the second is associated with the link [X].
The coordinate transformations along a serial robot consisting of n links form the kinematics equations of the robot.
You can check out the Wikipedia article here: DH-Parameter
To implement this in Python, there are lots of open-source modules available. One such is uw-biorobotics/IKBT; you can also refer to Python_robotics.
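As a minimal sketch (plain NumPy, with made-up link parameters): each classic D-H tuple (theta, d, a, alpha) maps to one homogeneous transform, and chaining the per-link transforms gives the kinematics equations.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one link from classic D-H parameters."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

# Chain per-link transforms for a hypothetical 3-link planar arm.
links = [(np.pi / 4, 0.0, 1.0, 0.0),
         (np.pi / 6, 0.0, 0.8, 0.0),
         (np.pi / 8, 0.0, 0.5, 0.0)]
T = np.eye(4)
for theta, d, a, alpha in links:
    T = T @ dh_transform(theta, d, a, alpha)
print(T[:3, 3])  # end-effector position
```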
I am interested in using the direct collocation method to generate a walking trajectory for a 2D 7-link biped robot (a torso plus left and right upper legs, lower legs, and feet).
Specifically,
input : torque to each joint
state : position of waist and angle of each joint
I have parameters of each link and equations of motion.
However, I couldn't work out from the API documentation how to write the "system" (and "context").
Is there a good way to build the "system" from this information, or a similar example somewhere?
I'm going to use pydrake.
I have a number of relevant examples in my course notes. I would recommend the compass gait limit cycle exercise from the chapter on planning through contact (which uses a URDF to specify the dynamics), or the SLIP model example in the notebook associated with the "Simple models of legged robots" chapter for an example of writing the equations of motion out manually.
Please understand that DirectCollocation, by itself, is not ideal for planning through collisions. Those chapters describe the "hybrid trajectory optimization" approach that is likely what you will want.
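As a starting point, here is a minimal sketch of wrapping hand-written equations of motion in a Drake system and handing it to DirectCollocation. It uses a torque-driven pendulum as a stand-in for the 7-link biped, and API details (e.g., constructor arguments) may vary across Drake versions:

```python
from pydrake.symbolic import Variable, sin
from pydrake.systems.primitives import SymbolicVectorSystem
from pydrake.systems.trajectory_optimization import DirectCollocation

# State [q, qdot] and input [tau]; replace the dynamics below with
# your own equations of motion.
q = Variable("q")
qd = Variable("qd")
tau = Variable("tau")
plant = SymbolicVectorSystem(
    state=[q, qd],
    input=[tau],
    dynamics=[qd, -9.81 * sin(q) + tau],
    output=[q, qd],
)

# The "context" holds the system's state and parameters; DirectCollocation
# uses it to evaluate the continuous dynamics.
context = plant.CreateDefaultContext()
dircol = DirectCollocation(plant, context,
                           21,    # number of knot points
                           0.05,  # minimum time step
                           0.2)   # maximum time step
# From here, add costs and constraints on dircol.state() / dircol.input()
# and solve, following the underactuated course examples.
```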
Is it possible to show a neighborhood analysis with relative distances given by defined scores in Cytoscape? As an example: 2 = not in the direct neighborhood, 1 = in the direct neighborhood, 0 = random permutation.
Using larger values is a very good idea, and I will implement it in my analysis. Unfortunately, I'm still unclear which method I should use to visualize the relative distances. From what I have read, the "force-directed" paradigm should be the best solution for my analysis. Does someone agree with that?
Thanks in advance and best regards, Joschua
Sure, you could certainly do something like that. The real question is what would you want the network to look like? If you are trying to see some sort of clustering effect by using your relative distances, I would use different numbers -- maybe 100=not in the direct neighborhood, 1=in the direct neighborhood, and blank for random permutations. If you want to use your relative distances for something else (e.g. edge type or color), then the scaling shouldn't matter.
-- scooter
I'm curious which technologies and algorithms the large companies use. I only found that Microsoft uses the PhotoDNA technology, but that covers only how photos are compared. I'm also interested in how they automatically detect pornographic images.
For example, do they use any of these methods: skin detection, ROI detection, bag-of-visual-words?
Seminal work in this field was done in Fleck, Margaret M., David A. Forsyth, and Chris Bregler, "Finding naked people," Computer Vision - ECCV '96, Springer Berlin Heidelberg, 1996, pp. 593-602. The approach detects skin-colored regions and then determines whether or not the regions match predefined human shapes. More on their skin detection algorithm here: http://www.cs.hmc.edu/~fleck/naked-skin.html
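A minimal sketch of the skin-detection step (not Fleck & Forsyth's exact algorithm; the HSV thresholds below are rough, illustrative assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Rough skin-tone range in HSV; real systems tune this carefully.
lower = np.array([0, 40, 60], dtype=np.uint8)
upper = np.array([25, 180, 255], dtype=np.uint8)
mask = cv2.inRange(hsv, lower, upper)

# Clean up the mask, then measure how much of the image is "skin".
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
skin_fraction = cv2.countNonZero(mask) / mask.size
print(f"Skin-colored fraction: {skin_fraction:.2%}")
```

A shape-matching stage would then decide whether the skin regions form human figures.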
More recent papers with summaries of current methods are available:
http://iseclab.org/people/cplatzer/papers/sfcs05-platzer.pdf
http://arxiv.org/abs/1402.5792
You may also take a look at: What is the best way to programatically detect porn images?
Update 2016: use a convnet. They are far better at building high-resolution filters. I wrote about it in more detail here:
http://blog.clarifai.com/what-convolutional-neural-networks-see-at-when-they-see-nudity/
https://www.slideshare.net/mobile/RyanCompton1/what-convnets-look-at-when-they-look-at-nudity
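For the record, a minimal sketch of the convnet approach (assuming PyTorch/torchvision is available; the data paths and class layout are placeholders): fine-tune a pretrained backbone as a binary safe/not-safe classifier.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Expects folders like data/train/safe and data/train/nsfw.
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")  # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)     # two output classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:  # one epoch shown; train longer in practice
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```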
I am developing an application to read the letters and numbers from an image using OpenCV in C++. I first converted the given colour image and colour template to binary images, then called cvMatchTemplate(). This method just highlights the areas where the template matches, and not clearly. I don't just want to see those areas; I need to parse the characters (letters and numbers) out of the image. I am new to OpenCV. Does anybody know another method to get this result?
The image is taken from a camera; the sample image is shown above. I need to get all the text from the LED display (130 and Delft Tanthaf).
Friends, I tried the sample face detection application and it detects faces. The HaarCascade file is provided with OpenCV; I just loaded that file and called cvHaarDetectObjects(). To detect letters, I created an XML file using the letter_recog.cpp application provided with OpenCV. But when I load this file, it shows an error (OpenCV error: Unspecified error > in unknown function, file ........\ocv\opencv\src\cxcore\cxpersistence.cpp, line 4720). I searched the web for this error and found information about the lib files used; I did that, but the error remains. Is the error in my XML file, or in my call to load it ((CvHaarClassifierCascade*)cvLoad("builded xml file name",0,0,0);)? Please help.
Thanks in advance
As of OpenCV 3.0 (currently in active development), you can use the built-in "scene text" detection module:
Reference: http://docs.opencv.org/3.0-beta/modules/text/doc/erfilter.html
Example: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp
The text detection is built on these two papers:
[Neumann12] Neumann L., Matas J.: Real-Time Scene Text Localization and Recognition, CVPR 2012. Available online: http://cmp.felk.cvut.cz/~neumalu1/neumann-cvpr2012.pdf
[Gomez13] Gomez L. and Karatzas D.: Multi-script Text Extraction from Natural Scenes, ICDAR 2013. Available online: http://refbase.cvc.uab.es/files/GoK2013.pdf
Once you've found where the text in the scene is, you can run any sort of standard OCR against those slices (Tesseract OCR is common). And there's now an end-to-end sample in opencv using OpenCV's new interface to Tesseract:
https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp
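The OCR step itself can be sketched in a few lines from Python (assuming pytesseract and the Tesseract engine are installed; the bounding boxes below stand in for whatever the scene-text detector returned):

```python
import cv2
import pytesseract

img = cv2.imread("bus.jpg")
boxes = [(10, 20, 200, 60)]  # (x, y, w, h) placeholders from the detector

for x, y, w, h in boxes:
    roi = img[y:y + h, x:x + w]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    # Binarize; LED text is usually high-contrast.
    _, thresh = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(thresh, config="--psm 7")  # one line
    print(text.strip())
```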
Template matching tends not to be robust for this sort of application because of lighting inconsistencies, orientation changes, scale changes, etc. The typical way of solving this problem is to bring in machine learning. What you are trying to do by training your own boosting classifier is one possible approach. However, I don't think you are doing the training correctly. You mentioned that you gave it one logo as a positive training image and five other images not containing the logo as negative examples? Generally you need training samples on the order of hundreds or thousands or more. You cannot possibly train with six samples and expect it to work.
If you are unfamiliar with machine learning, here is roughly what you should do:
1) You need to collect many positive training samples (from a few hundred upward, but generally the more the merrier) of the object you are trying to detect. If you are trying to detect individual characters in the image, then get cropped images of individual characters. You can start with the MNIST database for this. Better yet, to train the classifier for your particular problem, get many cropped images of the characters on the bus from photos. If you are trying to detect the entire rectangular LED board panel, then use images of panels as your positive training samples.
2) You will need to collect many negative training samples. Their number should be in the same order as the number of positive training samples you have. These could be images of the other objects that appear in the images you will run your detector on. For example, you could crop images of the front of the bus, road surfaces, trees along the road etc. and use them as negative examples. This is to help the classifier rule out these objects in the image you run your detector on. Hence, negative examples are not just any image containing objects you don't want to detect. They should be objects that could be mistaken for the object you are trying to detect in the images you run your detector on (at least for your case).
See the following link on how to train the cascade of classifier and produce the XML model file: http://note.sonots.com/SciSoftware/haartraining.html
Even though you mentioned you only want to detect the individual characters instead of the entire LED panel on the bus, I would recommend first detecting the LED panel so as to localize the region containing the characters of interest. After that, either perform template matching within this smaller region, or run a classifier trained to recognize individual characters on patches of pixels in this region obtained with a sliding-window approach, possibly at multiple scales. (Note: the Haar cascade boosting classifier you mentioned above will detect characters, but it won't tell you which character it detected unless you only train it to detect that particular character...) Detecting characters in this region in a sliding-window manner will give you the order in which the characters appear, so you can string them into words etc.; a sketch follows below.
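A minimal sketch of that sliding-window loop (the window size, step, scales, and the classify_patch stub are all placeholders for your trained model):

```python
import cv2

def classify_patch(patch):
    # Placeholder: plug in your trained classifier here
    # (boosted cascade, SVM on HOG features, etc.).
    return False, None

def sliding_window(image, win=(24, 24), step=8):
    h, w = image.shape[:2]
    for y in range(0, h - win[1] + 1, step):
        for x in range(0, w - win[0] + 1, step):
            yield x, y, image[y:y + win[1], x:x + win[0]]

panel = cv2.imread("led_panel.png", cv2.IMREAD_GRAYSCALE)
detections = []
# Scan at multiple scales by shrinking the image (an image pyramid).
for scale in (1.0, 0.75, 0.5):
    resized = cv2.resize(panel, None, fx=scale, fy=scale)
    for x, y, patch in sliding_window(resized):
        is_char, label = classify_patch(patch)
        if is_char:
            detections.append((int(x / scale), int(y / scale), label))
# Sort left-to-right to string the characters into words.
detections.sort(key=lambda d: d[0])
```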
Hope this helps.
EDIT:
I happened to chance upon this old post of mine after separately discovering the scene text module in OpenCV 3 mentioned by @KaolinFire.
For those who are curious, this is the result of running that detector on the sample image given by the OP. Notice that the detector is able to localize the text region, even though it returns more than one bounding box.
Note that this method is not foolproof (at least this implementation in OpenCV with the default parameters). It tends to generate false-positives, especially when the input image contains many "distractors".
Here are more examples obtained using this OpenCV 3 text detector on the Google Street View dataset:
Notice that it has a tendency to find "text" between parallel lines (e.g., windows, walls etc). Since the OP's input image is likely going to contain outdoor scenes, this will be a problem especially if he/she does not restrict the region of interest to a smaller region around the LED signs.
It seems that if you are able to localize a "rough" region containing just the text (e.g., just the LED sign in the OP's sample image), then running this algorithm can help you get a tighter bounding box. But you will still have to deal with the false positives (perhaps by discarding small regions, or by picking among the overlapping bounding boxes using a heuristic based on how letters appear on the LED signs); a sketch of such filtering follows.
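A minimal sketch of that kind of post-filtering (discard tiny boxes, then greedily suppress overlapping ones; all thresholds are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def filter_boxes(boxes, scores, min_area=100, iou_thresh=0.3):
    keep = []
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    for i in order:
        x1, y1, x2, y2 = boxes[i]
        if (x2 - x1) * (y2 - y1) < min_area:
            continue  # too small to be the LED sign text
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]
```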
Here are more resources (discussion + code + datasets) on text detection.
Code
Extracting text OpenCV
http://libccv.org/doc/doc-swt/
Stroke Width Transform (SWT) implementation (Python)
https://github.com/subokita/Robust-Text-Detection
Datasets
You will find the Google Street View and MSRA datasets here. Although the images in these datasets are not exactly the same as the ones for the LED signs on buses, they may be helpful either for picking the "best" performing algorithm from among several competing algorithms, or for training a machine learning algorithm from scratch.
http://www.iapr-tc11.org/mediawiki/index.php/Datasets_List
See my answer to How to read time from recorded surveillance camera video? You can/should use cvMatchTemplate() to do that.
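For reference, a minimal sketch of template matching through the modern Python API (the equivalent of cvMatchTemplate; filenames and the confidence threshold are placeholders):

```python
import cv2

img = cv2.imread("display.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("digit_3.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Normalized correlation is fairly robust to brightness changes.
result = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > 0.8:  # illustrative confidence threshold
    x, y = max_loc
    print(f"Template found at ({x}, {y}) with score {max_val:.2f}")
```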
If you are working with a fixed set of bus destinations, template matching will do.
However, if you want the system to be more flexible, I would imagine you would need some form of contour/shape analysis for each individual letter.
You can also look at EAST: Efficient Scene Text Detector - https://www.learnopencv.com/deep-learning-based-text-detection-using-opencv-c-python/
At that link you'll find examples in C++ and Python. I used this code to detect bus numbers (after first detecting that a given object is a bus).
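A condensed sketch of running EAST through OpenCV's dnn module (assuming the pretrained frozen_east_text_detection.pb model has been downloaded; the decoding below is the commonly used simplification to axis-aligned boxes):

```python
import cv2
import numpy as np

net = cv2.dnn.readNet("frozen_east_text_detection.pb")
img = cv2.imread("bus.jpg")
# EAST needs input dimensions divisible by 32.
blob = cv2.dnn.blobFromImage(img, 1.0, (320, 320),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                "feature_fusion/concat_3"])

boxes = []
rows, cols = scores.shape[2:4]
for y in range(rows):
    for x in range(cols):
        if scores[0, 0, y, x] < 0.5:
            continue
        ox, oy = x * 4.0, y * 4.0  # each output cell covers 4x4 pixels
        angle = geometry[0, 4, y, x]
        cos, sin = np.cos(angle), np.sin(angle)
        h = geometry[0, 0, y, x] + geometry[0, 2, y, x]
        w = geometry[0, 1, y, x] + geometry[0, 3, y, x]
        ex = int(ox + cos * geometry[0, 1, y, x] + sin * geometry[0, 2, y, x])
        ey = int(oy - sin * geometry[0, 1, y, x] + cos * geometry[0, 2, y, x])
        boxes.append((ex - int(w), ey - int(h), ex, ey))
# Rescale boxes from 320x320 back to the original image size and apply
# non-maximum suppression before running OCR on each region.
```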
I have a basic understanding of image processing and am now studying the "Digital Image Processing" book by Gonzalez in depth.
Given an image, and knowing the approximate form of the object of interest (e.g., circle, triangle), what is the best algorithm/method to find this object in the image?
The object can be slightly deformed, so a brute-force approach will not help.
You may try using Histograms of Oriented Gradients (also called Edge Orientation Histograms). We have used them for detecting road signs. http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients and the papers by Bill Triggs should get you started.
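A minimal sketch of computing HOG features for one detection window (assuming scikit-image is available; OpenCV's cv2.HOGDescriptor works similarly). The resulting fixed-length descriptor would feed a classifier such as a linear SVM:

```python
import cv2
from skimage.feature import hog

img = cv2.imread("candidate.png", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (64, 64))  # fixed window size

features = hog(img,
               orientations=9,         # gradient-direction bins
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))
print(features.shape)  # one fixed-length descriptor per window
```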
I recommend the Hough transform, which allows you to find any pattern described by an equation. What's more, the Hough transform also copes well with deformed objects.
The algorithm and implementation itself is quite simple.
More details can be found here: http://en.wikipedia.org/wiki/Hough_transform ; source code for the algorithm is even included on a referenced page (http://www.rob.cs.tu-bs.de/content/04-teaching/06-interactive/HNF.html).
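A minimal sketch of circle detection with OpenCV's built-in Hough transform (all parameters below are illustrative and need tuning for real images):

```python
import cv2
import numpy as np

img = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)  # the circle transform is noise-sensitive

circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, 1, 30,
                           param1=100,  # Canny high threshold
                           param2=40,   # accumulator threshold
                           minRadius=10, maxRadius=100)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print(f"Circle at ({x}, {y}) with radius {r}")
```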
I hope that helps you.
I would look at your problem in two steps:
first finding your object's outer boundary:
I'm assuming the image has enough contrast that you can easily threshold it to get a binary image of your object. You then need to extract the object's boundary chain code.
then analyzing the boundary's shape to deduce the form (circle, polygon,...):
You can calculate the curvature at each point of the boundary chain and thus determine how many sharp angles (i.e., points of high curvature) there are in your shape. Several sharp angles mean you have a polygon; none means you have a circle (constant curvature).
You can find a description on how to get your object's boundary from the binary image and ways of analysing it in Gonzalez's Digital Image Processing, chapter 11.
I also found this insightful presentation on binary image analysis (PPT) and a MATLAB script that implements some of the techniques that Gonzalez talks about in DIP.
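A minimal sketch of this two-step idea in OpenCV, using polygon approximation as a stand-in for explicit curvature analysis: few vertices after simplification means sharp corners (a polygon), many vertices means a smooth, circle-like boundary. The thresholds are illustrative.

```python
import cv2

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
# Step 1: threshold and extract the outer boundary.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Step 2: analyze each boundary's shape.
for c in contours:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    if len(approx) == 3:
        print("triangle")
    elif len(approx) < 8:
        print(f"polygon with {len(approx)} corners")
    else:
        print("circle-like (no sharp corners)")
```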
I strongly recommend OpenCV; it's a great computer vision library that helps with anything related to computer vision. Their website isn't really attractive, nor helpful, but the API is really powerful.
A book that helped me a lot, since there isn't a load of documentation on the web, is Learning OpenCV. The documentation that comes with the API is good, but not great for learning how to use the library.
Related to your problem, you could use a Canny edge detector to find the borders of your item and then analyse them, or you could proceed with a Hough transform to search for lines and/or circles (a brief sketch follows).
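A brief sketch of that Canny-then-Hough pipeline, here for line segments (see the circle example earlier in the thread; the thresholds are illustrative):

```python
import cv2
import numpy as np

img = cv2.imread("item.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)  # low/high hysteresis thresholds

lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                        minLineLength=40, maxLineGap=5)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        print(f"Line segment from ({x1}, {y1}) to ({x2}, {y2})")
```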
You could also try 'face recognition' in particular, since that is a specific, well-studied topic, and 'face detection' likewise. EmguCV can be useful for you; it is a .NET wrapper for the Intel OpenCV image processing library.
It looks like Professor Jean Rouat from the University of Sherbrooke has found a way to find objects in images by processing them with spiking neural networks. His technology, named RN-SPIKES, seems to be available for licensing.