Cluster/Bucketize time series based on temporal proximity of measurements - time-series

I have a time series of events. The events carry a timestamp and some other metadata. I want to create clusters/buckets of these events based on their temporal proximity in order to discard some redundant events. For example, if I have a cluster of 10 events which are temporally close to each other - let's define temporal proximity as events being no further than x temporal units away from each other -, I can choose one of them as a representative and discard all others. For example, I have the following events
e0-e1-e2-------e3------------------------e5-e6-e7-e8--------------e9-e10
A sample bucketization would be:
(e0,e1,e2)
e3
(e5,e6,e7,e8)
(e9, e10)
I tried to represent the proximity of the events using the dashes above.
I know that there are some elaborate ways to go about this like DBSCAN or k-means but I was hoping that there would be something more straight forward, without any extensive prior knowledge of ML.
I also thought of a sliding window implementation, but the problem becomes quickly very complicated in terms of how to choose the window size and the shifts of the window...
Is there a standard way of doing this?

Related

Monte Carlo Tree Search - intuition behind child selection function for games of two players with opposite goals

Simple question on hello world example of the MCTS for tic-tac-toe,
Let's assume we are given a board and we want to make an optimal decision. As I undestand the choice of consecutive nodes while simulation (until leaf is met) is determined by a exploration/exploitation trade-off function (as described on wikipedia). I really wonder what is the intuition behind first component (exploitation) of the function here, especially for games between two players with oppposite goals. Then the meaning of "the most promising" changes depending on who makes a move. Shouldn't this function change depeding on who makes the next move (especially its first component)?
Yes, that exploitation part of the equation should be implemented to take into account the evaluations from the perspective of the agent/player who gets to select an action in that node.
For single-agent settings, the implementation is straightforward; simply always maximize.
For zero-sum, turn-based, two-player settings, you'd want to alternate between maximizing or minimizing that exploitation part of the equation (note: always maximize the exploration term!). This can also be implemented by simply multiplying that term by -1 in nodes where the opponent gets to move.
Other settings are possible too, but require slightly more implementation effort (e.g. keeping different average scores for different players in settings which are not zero-sum or have more than two players)

Cluster Analysis for crowds of people

I have location data from a large number of users (hundreds of thousands). I store the current position and a few historical data points (minute data going back one hour).
How would I go about detecting crowds that gather around natural events like birthday parties etc.? Even smaller crowds (let's say starting from 5 people) should be detected.
The algorithm needs to work in almost real time (or at least once a minute) to detect crowds as they happen.
I have looked into many cluster analysis algorithms, but most of them seem like a bad choice. They either take too long (I have seen O(n^3) and O(2^n)) or need to know how many clusters there are beforehand.
Can someone help me? Thank you!
Let each user be it's own cluster. When she gets within distance R to another user form a new cluster and separate again when the person leaves. You have your event when:
Number of people is greater than N
They are in the same place for the timer greater than T
The party is not moving (might indicate a public transport)
It's not located in public service buildings (hospital, school etc.)
(good number of other conditions)
One minute is plenty of time to get it done even on hundreds of thousands of people. In naive implementation it would be O(n^2), but mind there is no point in comparing location of each individual, only those in close neighbourhood. In first approximation you can divide the "world" into sectors, which also makes it easy to make the task parallel - and in turn easily scale. More users? Just add a few more nodes and downscale.
One idea would be to think in terms of 'mass' and centre of gravity. First of all, do not mark something as event until the mass is not greater than e.g. 15 units. Sure, location is imprecise, but in case of events it should average around centre of the event. If your cluster grows in any direction without adding substantial mass, then most likely it isn't right. Look at methods like DBSCAN (density-based clustering), good inspiration can be also taken from physical systems, even Ising model (here you think in terms of temperature and "flipping" someone to join the crowd)ale at time of limited activity.
How to avoid "single-linkage problem" mentioned by author in comments? One idea would be to think in terms of 'mass' and centre of gravity. First of all, do not mark something as event until the mass is not greater than e.g. 15 units. Sure, location is imprecise, but in case of events it should average around centre of the event. If your cluster grows in any direction without adding substantial mass, then most likely it isn't right. Look at methods like DBSCAN (density-based clustering), good inspiration can be also taken from physical systems, even Ising model (here you think in terms of temperature and "flipping" someone to join the crowd). It is not a novel problem and I am sure there are papers that cover it (partially), e.g. Is There a Crowd? Experiences in Using Density-Based Clustering and Outlier Detection.
There is little use in doing a full clustering.
Just uses good database index.
Keep a database of the current positions.
Whenever you get a new coordinate, query the database with the desired radius, say 50 meters. A good index will do this in O(log n) for a small radius. If you get enough results, this may be an event, or someone joining an ongoing event.

Change in two 3D models

I'm trying to think of the best way to conduct some sort of analysis between two 3D models of the same object.
The first scan is of the original item and the second scan is after it has been put under some load x.
An example would be trying to find the difference between two types of metal.
I would like to be able to scan the initial metal cylinder, apply a measured load, scan it again, and then finally apply some sort of algorithm to compare the difference.
Is it possible to do this efficiently (maybe using Mablab) over say 50 - 100 items for an object around 5inch^3?
I am assuming I will need to work out some sort of utility function as the total mass should be the same?
Would machine learning be beneficial in this case?
Any suggestions or direction would be amazing.
Thank you :)
EDIT: The scan files are coming through as '.stl'

Is this a correct implementation of Q-Learning for Checkers?

I am trying to understand Q-Learning,
My current algorithm operates as follows:
1. A lookup table is maintained that maps a state to information about its immediate reward and utility for each action available.
2. At each state, check to see if it is contained in the lookup table and initialise it if not (With a default utility of 0).
3. Choose an action to take with a probability of:
(*ϵ* = 0>ϵ>1 - probability of taking a random action)
1-ϵ = Choosing the state-action pair with the highest utility.
ϵ = Choosing a random move.
ϵ decreases over time.
4. Update the current state's utility based on:
Q(st, at) += a[rt+1, + d.max(Q(st+1, a)) - Q(st,at)]
I am currently playing my agent against a simple heuristic player, who always takes the move that will give it the best immediate reward.
The results - The results are very poor, even after a couple hundred games, the Q-Learning agent is losing a lot more than it is winning. Furthermore, the change in win-rate is almost non-existent, especially after reaching a couple hundred games.
Am I missing something? I have implemented a couple agents:
(Rote-Learning, TD(0), TD(Lambda), Q-Learning)
But they all seem to be yielding similar, disappointing, results.
There are on the order of 10²⁰ different states in checkers, and you need to play a whole game for every update, so it will be a very, very long time until you get meaningful action values this way. Generally, you'd want a simplified state representation, like a neural network, to solve this kind of problem using reinforcement learning.
Also, a couple of caveats:
Ideally, you should update 1 value per game, because the moves in a single game are highly correlated.
You should initialize action values to small random values to avoid large policy changes from small Q updates.

How does retiming work in systolic arrays?

How does retiming work in systolic arrays (used in signal processors)? I read that there is some notion of negative delay which is used, but how can a delay be negative and if that is just an abstraction then how does it help?
The basic model of retiming is that you have wavefronts of registers interconnected by a bunch of combinational logic, and you are improving the timing or area of the resulting circuit by repositioning the registers at different points in the circuit such that every path through the logic still goes through the same number of registers. For a simple example, lets say that you have an AND gate feeding a register, the longest path to the input of the register is 12ns, the longest path from the output of the register is 6ns, the delay of the AND gate is 3ns, and you need to get the clock cycle time down to 10ns. You could achieve this by deleting the register and replacing it with two registers, one at each input of the AND gate, clocked by the same clock as the original register. Now you have reduced the longest input path to 9ns, expanded the output path to 9ns, and met your clock cycle objective. In effect, you have added -3ns to the effective arrival time at the register (and added +3 ns to the effective output time).
A modified version of Leiserson and Saxe's original paper on retiming is available here. Wikipedia has a decent, though short, article on the subject with a few links. If you have access to IEEE Xplore or the ACM Digital Library, a search through the proceedings of the Design Automation Conference or the International Conference on Computer-Aided Design looking for retiming should yield lots of articles - this has been an active research area for years.

Resources