Automatic people counting + twittering - image-processing

I want to develop a system that accurately counts people passing through a normal 1-2 m wide door, tweets whenever someone goes in or out, and reports how many people remain inside.
Now, the Twitter part is easy, but counting people is difficult. Some counting solutions already exist, but they do not quite fit my needs.
My idea/algorithm:
Should I mount an infrared camera on top of the door, monitor it constantly, divide the camera image into a grid, and track entries and exits per cell?
Can you give me some suggestions and a starting point?

How about having two sensors about 6 inches apart? They could be those little beam sensors (the ones that chime when you walk into some shops), placed on either side of the door jamb. We'll call the sensors S1 and S2.
If they are triggered in the order S1 then S2, a person came in.
If they are triggered in the order S2 then S1, a person left.
---------------------------------------------
|  sensor  |      door jamb      |  sensor  |
---------------------------------------------
     |                                |
     |                                |
     |                                |
     S1                               S2        this is inside the store
     |                                |
     |                                |
     |                                |
---------------------------------------------
|  sensor  |      door jamb      |  sensor  |
---------------------------------------------
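The ordering rule above can be sketched in a few lines of Python. The event-stream format and the pairing behavior (a person breaks both beams in quick succession) are assumptions for illustration, not part of the answer:

```python
def count_people(events):
    """Track occupancy from an ordered stream of "S1"/"S2" beam triggers."""
    inside = 0
    pending = None  # the first beam a person has broken, if any
    for sensor in events:
        if pending is None:
            pending = sensor
        elif pending == "S1" and sensor == "S2":
            inside += 1                   # S1 then S2: someone entered
            pending = None
        elif pending == "S2" and sensor == "S1":
            inside = max(0, inside - 1)   # S2 then S1: someone left
            pending = None
        else:
            pending = sensor              # same beam twice: restart the pairing

    return inside
```

For example, the stream `["S1", "S2", "S1", "S2", "S2", "S1"]` represents two entries followed by one exit, leaving one person inside.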

If you would like to film the people with a camera, you can try to segment them in the image and track them using a particle filter for multi-object tracking.
http://portal.acm.org/citation.cfm?id=1561072&preflayout=flat
This is a paper by one of my professors; maybe you want to have a look at it.
If your camera is mounted and doesn't move, you can use a subtraction method to segment the moving people: basically, subtract two consecutive frames, and what remains is whatever moved. Then apply some morphological operations so that only large regions (people) remain. You could even filter by rectangularity so you only keep "standing" objects.
Then use a particle filter to track the people in the scene automatically, and let each new object increase the counter.
If you want, I could send you a presentation I gave a while ago (unfortunately it's in German, but you can translate it).
Hope that helps.
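The frame-subtraction step can be sketched with NumPy. The threshold value and the toy 4x4 frames are made up for illustration; in practice you would read grayscale frames from the camera and follow up with morphological opening (e.g. via OpenCV):

```python
import numpy as np

def moving_mask(prev_frame, next_frame, threshold=30):
    """Difference two consecutive grayscale frames and keep pixels that
    changed by more than `threshold` (candidate moving regions)."""
    diff = np.abs(next_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Two toy 4x4 "frames": a bright 2x2 blob moves one column to the right.
prev_frame = np.zeros((4, 4), dtype=np.uint8)
prev_frame[1:3, 0:2] = 200
next_frame = np.zeros((4, 4), dtype=np.uint8)
next_frame[1:3, 1:3] = 200

# True where the blob left (column 0) and where it arrived (column 2).
mask = moving_mask(prev_frame, next_frame)
```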


Algorithms for correlation of events/issues

We are working on a system that aims to help development, SRE, and DevOps team members by debugging many of the well-known infrastructure issues (k8s to begin with) on their behalf and generating a detailed report covering the specifics of the issue, possible root causes, and clear next steps for the users facing the problem. In short, instead of you having to open up a terminal and run several commands to arrive at an issue, a system does it for you and shows it in a neat UI. We plan to leverage AI to provide better user experiences.
Questions:
1. There are several potential use cases, like predictive analytics, anomaly detection, forecasting, etc. We will not analyze application logs or metrics (we may include metrics in the future). Unlike application-level logs, platform logs are more uniform. What is a good starting point for applying AI, especially to platform logs?
2. We plan to use AI to analyze issue correlations. We tried Apyori and FP-Growth and got output that looks like the following:
| antecedent                 | consequent         | confidence | lift |
|----------------------------|--------------------|------------|------|
| [Failed, FailedScheduling] | [BackOff]          | 0.75       | 5.43 |
| [NotTriggerScaleUp]        | [FailedScheduling] | 0.64       | 7.29 |
| [Failed]                   | [BackOff]          | 0.52       | 3.82 |
| [FailedCreatePodSandBox]   | [FailedScheduling] | 0.51       | 5.88 |
FP-Growth is a data-mining algorithm, and from its output we can infer patterns of events. One potential use case is to save previous outputs and compare them with the latest output to detect abnormal patterns in the latest one. Can we use this output to infer issue correlations, and are there any other scenarios where the output is useful?
3. Some logs seem irrelevant but are actually connected: for example, if one host has an issue, it will impact the applications running on it, and the time span may be long. How can we discover this kind of relationship?
Any comments and suggestions will be greatly appreciated, thank you in advance.
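The "compare previous output with latest output" idea from question 2 can be sketched as a diff over two rule sets. The rule/confidence data format below (and the `OOMKilled` rule) is hypothetical, chosen to mirror the table in the question:

```python
def diff_rules(previous, latest, confidence_delta=0.2):
    """Each rule set maps (antecedent, consequent) tuples to a confidence.
    Returns rules that are new in `latest`, and rules whose confidence
    moved by at least `confidence_delta`."""
    new_rules, shifted = [], []
    for rule, conf in latest.items():
        if rule not in previous:
            new_rules.append(rule)
        elif abs(conf - previous[rule]) >= confidence_delta:
            shifted.append(rule)
    return new_rules, shifted

previous = {
    (("Failed", "FailedScheduling"), ("BackOff",)): 0.75,
    (("NotTriggerScaleUp",), ("FailedScheduling",)): 0.64,
}
latest = {
    (("Failed", "FailedScheduling"), ("BackOff",)): 0.40,   # confidence dropped
    (("NotTriggerScaleUp",), ("FailedScheduling",)): 0.66,  # roughly stable
    (("OOMKilled",), ("BackOff",)): 0.80,                   # new pattern
}
new_rules, shifted = diff_rules(previous, latest)
```

A new or strongly shifted rule is a candidate "abnormal pattern" worth surfacing in the report.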

How to set up an alert in the Machine Learning Toolkit for historical data

I am working on a Splunk time-series forecasting POC and need to show how Splunk sends an alert when the prediction returns a result above a threshold.
The search is: | inputlookup internet_traffic.csv | timechart span=120min avg("bits_transferred") as bits_transferred | eval bits_transferred=round(bits_transferred). If the predicted bits_transferred exceeds the condition given in the alert, it should send an email to the specified address.
Currently the condition given is per result of the search.
Kindly let me know how to set up the alert, or which condition to use.
I'll give you an example using the Splunk core command predict, but you should be able to apply it to the Machine Learning Toolkit as well.
| inputlookup internet_traffic.csv | timechart span=120min avg("bits_transferred") as bits_transferred | eval bits_transferred=round(bits_transferred) | predict bits_transferred | where bits_transferred > 'upper95(prediction(bits_transferred))'
The Machine Learning Toolkit also has a showcase example, which you can tweak, that illustrates detecting anomalies with MLTK:
/en-US/app/Splunk_ML_Toolkit/detect_numeric_outliers?ml_toolkit.dataset=Employee%20Logins%20(prediction%20errors)

Machine Learning nominal data

I'm working on machine learning with an SVM. I am trying to feed my SVM with data, but my data is nominal and I have no idea how to transform it.
My data looks like:
------------------------------------------------------------
Item | Productname         | Label name  | Packaging | etc...
------------------------------------------------------------
1    | Battery Micro 4     | Batt. Micro | Folding   | ...
2    | Battery Micro 8     | Batt. Micro | Blister   | ...
3    | button cell Battery | btn Batt.   | Blister   | ...
------------------------------------------------------------
I want to train my SVM to identify that "Battery Micro 4" belongs in the column "Productname", "Batt. Micro" in the column "Label name", "Folding" in the column "Packaging", and so on.
Methods like one-hot encoding do not seem to fit my case, because the number of items will increase over time.
Does anyone know a method to transform this data into numerical values with little information loss?
Thanks.
Since your data has no natural ordering, plain integer encoding would be of no use. The next option would have been one-hot encoding, but as you said the number of items can increase, so we can discard this as well. The next option is to get a value count of all the discrete values you have, sort them, and then do an integer encoding from smallest to largest. While doing this, you should also take care of the cardinality of the discrete values: if a discrete value accounts for less than 1% of the data, it is better to create a special category and add all such values to it. That way, any new value that arrives at test time can be assigned to this bucket, since its frequency will certainly be very low.
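A minimal sketch of this frequency-based encoding (the `min_share` cutoff and the sample packaging data are made up for illustration): rare values collapse into one bucket coded 0, and the remaining values are integer-encoded from least to most frequent.

```python
from collections import Counter

def build_encoder(values, min_share=0.01):
    """Return a function mapping a nominal value to an integer code.
    Values rarer than `min_share` (and unseen values) map to 0."""
    counts = Counter(values)
    total = len(values)
    common = [v for v, c in counts.items() if c / total >= min_share]
    common.sort(key=lambda v: counts[v])  # least frequent gets the smallest code
    codes = {v: i + 1 for i, v in enumerate(common)}
    return lambda v: codes.get(v, 0)

# 120 rows: Folding appears 40 times, Blister 60, Box only 20 (rare at 20%).
packaging = ["Folding", "Blister", "Blister", "Blister", "Folding", "Box"] * 20
encode = build_encoder(packaging, min_share=0.2)
```

With this cutoff, "Box" falls below the threshold and shares code 0 with any value that first appears at test time.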

I'm failing to understand how the stack works

I'm building an emulator for the MOS 6502 processor, and at the moment I'm trying to simulate the stack in code, but I'm really failing to understand how the stack works in the context of the 6502.
One of the features of the 6502's stack is that when the stack pointer reaches the end of the stack it wraps around, but I don't get how this feature even works.
Let's say we have a stack with a maximum of 64 values. If we push the values x, y and z onto the stack, we get the structure below, with the stack pointer at address 0x62, because that was the last value pushed onto the stack.
+-------+
| x | 0x64
+-------+
| y | 0x63
+-------+
| z | 0x62 <-SP
+-------+
| | ...
+-------+
All well and good. But now if we pop those three values off the stack, we have an empty stack, with the stack pointer pointing at address 0x64:
+-------+
| | 0x64 <-SP
+-------+
| | 0x63
+-------+
| | 0x62
+-------+
| | ...
+-------+
If we pop the stack a fourth time, the stack pointer wraps around to point at address 0x00, but what's even the point of doing this when there isn't a value at 0x00? There's nothing in the stack, so what's the point of wrapping the stack pointer around?
I can understand this process when pushing values: if the stack is full and a value needs to be pushed, it overwrites the oldest value present on the stack. But this doesn't work for popping.
Can someone please explain this, because it makes no sense to me.
If we pop the stack a fourth time, the stack pointer wraps around to point at address 0x00, but what's even the point of doing this when there isn't a value at 0x00? There's nothing in the stack, so what's the point of wrapping the stack pointer around?
It is not done for a functional reason. The 6502 architecture was designed so that pushing and popping could be done by incrementing or decrementing an 8-bit SP register without any additional checking. Checks for overflow or underflow of the SP register would require more silicon to implement them, more silicon to implement the stack overflow/underflow handling, and would add extra gate delays in a critical path.
The 6502 was designed to be cheap and simple using 1975-era chip technology [1]. Not fast. Not sophisticated. Not easy to program [2].
[1] - According to Wikipedia, the original design had ~3200 or ~3500 transistors. One of the selling points of the 6502 was that it was cheaper than its competitors. Fewer transistors meant smaller dies, better yields, and lower production costs.
[2] - Of course, this is relative. Compared to some ISAs, the 6502 is easy because it is simple and orthogonal, and you have so few options to choose from. But compared to others, the limitations that make it simple actually make it difficult. For example, there are at most 256 bytes in the stack page, which have to be shared by everything. That gets awkward if you are implementing threads or coroutines. Compare this with an ISA where the SP is a 16-bit register or the stack can be anywhere.
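The unchecked-wraparound behavior is easy to see in a small sketch (Python, with the 6502's stack page at 0x0100; the starting SP value is arbitrary for illustration):

```python
class Stack6502:
    """Sketch of the 6502 stack: one 256-byte page at 0x0100, and an 8-bit
    stack pointer that simply wraps on increment/decrement, with no checks."""

    def __init__(self):
        self.memory = bytearray(0x10000)
        self.sp = 0xFF  # SP holds the low byte of the next free slot

    def push(self, value):
        self.memory[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF   # 0x00 wraps to 0xFF, no overflow check

    def pop(self):
        self.sp = (self.sp + 1) & 0xFF   # 0xFF wraps to 0x00, no underflow check
        return self.memory[0x0100 + self.sp]
```

Popping an "empty" stack is not an error: the hardware just increments SP (wrapping 0xFF to 0x00) and reads whatever byte happens to be there, which is exactly the behavior the question describes.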

Can Machine Learning help classify data

I have a data set as below,
Code | Description
AB123 | Cell Phone
B467A | Mobile Phone
12345 | Telephone
WP9876 | Wireless Phone
SP7654 | Satellite Phone
SV7608 | Sedan Vehicle
CC6543 | Car Coupe
I need to create an automated grouping based on the Code and Description. Let's assume I already have a lot of such data classified into groups 0-99. Whenever a new record comes in with a Code and Description, the machine-learning algorithm needs to automatically classify it based on the previously available data.
Code | Description | Group
AB123 | Cell Phone | 1
B467A | Mobile Phone | 1
12345 | Telephone | 1
WP9876 | Wireless Phone | 1
SP7654 | Satellite Phone | 1
SV7608 | Sedan Vehicle | 2
CC6543 | Car Coupe | 3
Can this be achieved with some level of accuracy? Currently this process is entirely manual. If there are any ideas or references for this, please help.
Try reading up on supervised learning. You need to provide labels for your training data so that the algorithms know what the correct answers are and can generate appropriate models for you.
Then you can "predict" the output classes for your new incoming data using the generated model(s).
Finally, you may wish to circle back and check the accuracy of the predicted results. If you then enter labels for the newly received and predicted data, that data can be used for further training of your model(s).
Yes, it's possible with supervised learning. You pick a model, which you "train" with the data you already have. The model/algorithm then "generalizes" from the known data to previously unseen data.
What you specify as a group would be called a class or "label", which needs to be predicted from two input features (code/description). Whether you input these features directly or preprocess them into more abstract features that suit the algorithm better depends on which algorithm you choose.
If you have no experience with machine learning, you might start by learning some basics while testing already-implemented algorithms in tools such as RapidMiner, Weka or Orange.
I don't think machine-learning methods are the most appropriate for this problem, because text-based machine-learning algorithms tend to be quite complicated. From the examples you provided I'm not sure how
I think the simplest way of solving, or attempting to solve, this problem is the following, which can be implemented in many free programming languages, such as Python. Each description can be stored as a string. What you could do is store all the substrings of all the strings that belong to a particular group in a list (e.g. if 'Phone' is your string, the substrings are 'P', 'h', 'o', ..., 'Ph', 'ho', ..., 'Phone'; see this question for how to implement it in Python: Substrings of a string using Python). Then, for each substring, check which ones are unique to a certain group. Select substrings over a certain length (say 3 characters, to get rid of random letter concatenations) as your classification criteria. Then, when you get new data, check whether the description contains a substring unique to a certain group. With this, for instance, you would be able to classify all objects in group 1 based on whether their description contains the word 'phone'.
It's hard to provide concrete code that solves your problem without knowing which languages you are familiar with or are feasible to use. I hope this helps anyway. Yves
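A rough sketch of a similarity-based variant of this idea: character n-grams instead of all substrings, with a nearest-neighbor lookup over the sample data from the question (the "Smart Phone" query is made up for illustration):

```python
def ngrams(text, n=3):
    """Set of lowercase character n-grams of a string."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

# Training data taken from the question: (description, group).
training = [
    ("Cell Phone", 1), ("Mobile Phone", 1), ("Telephone", 1),
    ("Wireless Phone", 1), ("Satellite Phone", 1),
    ("Sedan Vehicle", 2), ("Car Coupe", 3),
]

def predict(description):
    """Assign the group of the training description with the highest
    Jaccard similarity of character trigrams."""
    q = ngrams(description)
    best_group, best_score = None, -1.0
    for desc, group in training:
        g = ngrams(desc)
        union = q | g
        score = len(q & g) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_group = score, group
    return best_group
```

A query like "Smart Phone" lands in group 1 because it shares the "phone" trigrams with that group's descriptions; with enough labeled data, the same features would also feed a proper supervised classifier.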
