Sound Wave Peak Detection (Cricket Chirping) - signal-processing

I am looking for a way to code or find a program that can record the chirps of crickets either live or through a prerecorded audio file (large ~24 hours) for a lab experiment.
I'm not too sure how to approach this as I'm a web developer, but I have experience with JS and python, along with libraries. My initial idea was to use Matplotlib to produce an audio visualizer, and then count each time a certain range of db is reached which matches the db of a cricket chirp, but I have no idea how to approach it.
I have successfully visualized the chirps on a online spectrum analyzer (Spectrum Visualizer of Audio Chirps), and can see it clearly, however I don't know how I can use code to count each "chirp" and record it along with the date and time for each chirp in a table of values / dataset of some sort.
Any guidance or help would be greatly appreciated!

Really naive solution: For every unit of time (column of pixels in the spectrogram), you can calculate the sum of all the values in a column. For example, if all the pixels in a column are black, the sum for that column will be 0, if some of them are colored, the sum will be >0.
Then loop over the sums: if at a certain step you go from 0 to > 0, and then after a while you go back to 0, you've hit a peak.
If the output is too "noisy", you can use a small threshold value instead of 0. Tweak it until results seem OK. This works well when the input background noise is pretty much the same all the way through, but if it's skewed up and down during the whole recording, you need something more complicated (for example, the threshold constantly changes with the average of the last n sums)
Or you can Google "peak detection algorithm" and implement one of them. Or you can Google "peak detection library" in your favourite language and use one of them.

Related

How to hit the texel cache in WebGL?

What i'm doing is GPGPU on WebGL and I don't know the access pattern which I'd be talking about applies to general graphics and gaming programs. In our code, frequently, we come across data which needs to be summarized or reduced per output texel. A very simple example is matrix multiplication during which, for every output texel, your return a value which is a dot product of a row of one input and a column of the other input.
This has been the sore point of our performance because of not so much the computation but multiplied data access. So I've been trying to find a pattern of reads or data layouts which would expedite this operation and I have been completely unsuccessful.
I will be describing some assumptions and some schemes below. The sample code for all these are under https://github.com/jeffsaremi/webgl-experiments
Unfortunately due to size I wasn't able to use the 'snippet' feature of StackOverflow. NOTE: All examples write to console not the html page.
Base matmul implementation: Example: [2,3]x[3,4]->[2,4] . This produces in a simplistic form 2 textures of (w:3,h:2) and (w:4,h:3). For each output texel I will be reading along the X axis of the left texture but going along the Y axis of the right texture. (see webgl-matmul.html)
Assuming that GPU accesses data similar to CPU -- that is block by block -- if I read along the width of the texture I should be hitting the cache pretty often.
For this, I'd layout both textures in a way that I'd be doing dot products of corresponding rows (along texture width) only. Example: [2,3]x[4,3]->[2,4] . Note that the data for the right texture is now transposed so that for each output texel I'd be doing a dot product of one row from the left and one row from the right. (see webgl-matmul-shared-alongX.html)
To ensure that the above assumption is indeed working, I created a negative test also. In this test I'd be reading along the Y axis of both left and right textures which should have the worst performance ever. Data is pre-transposed so that the results make sense. Example: [3,2]x[3,4]->[2,4]. (see webgl-matmul-shared-alongY.html).
So I ran these -- and I hope you could do as well to see -- and I found no evidence to support existence or non-existence of such caching behavior. You need to run each example a few times to get consistent results for comparison.
Then I came along this paper http://fileadmin.cs.lth.se/cs/Personal/Michael_Doggett/pubs/doggett12-tc.pdf which in short claims that the GPU caches data in blocks (or tiles as I call them).
Based on this promising lead I created a version of matmul (or dot product) which uses blocks of 2x2 to do its calculation. Prior to using this of course I had to rearrange my inputs into such layout. The cost of that re-arrangement is not included in my comparison. Let's say I could do that once and run my matmul many times after. Even this scheme did not contribute anything to the performance if not taking something away. (see webgl-dotprod-tiled.html).
A this point I am completely out of ideas and any hints would be appreciated.
thanks

FSK demodulation with GNU Radio

I'm trying to demodulate a signal using GNU Radio Companion. The signal is FSK (Frequency-shift keying), with mark and space frequencies at 1200 and 2200 Hz, respectively.
The data in the signal text data generated by a device called GeoStamp Audio. The device generates audio of GPS data fed into it in real time, and it can also decode that audio. I have the decoded text version of the audio for reference.
I have set up a flow graph in GNU Radio (see below), and it runs without error, but with all the variations I've tried, I still can't get the data.
The output of the flow graph should be binary (1s and 0s) that I can later convert to normal text, right?
Is it correct to feed in a wav audio file the way I am?
How can I recover the data from the demodulated signal -- am I missing something in my flow graph?
This is a FFT plot of the wav audio file before demodulation:
This is the result of the scope sink after demodulation (maybe looks promising?):
UPDATE (August 2, 2016): I'm still working on this problem (occasionally), and unfortunately still cannot retrieve the data. The result is a promising-looking string of 1's and 0's, but nothing intelligible.
If anyone has suggestions for figuring out the settings on the Polyphase Clock Sync or Clock Recovery MM blocks, or the gain on the Quad Demod block, I would greatly appreciate it.
Here is one version of an updated flow graph based on Marcus's answer (also trying other versions with polyphase clock recovery):
However, I'm still unable to recover data that makes any sense. The result is a long string of 1's and 0's, but not the right ones. I've tried tweaking nearly all the settings in all the blocks. I thought maybe the clock recovery was off, but I've tried a wide range of values with no improvement.
So, at first sight, my approach here would look something like:
What happens here is that we take the input, shift it in frequency domain so that mark and space are at +-500 Hz, and then use quadrature demod.
"Logically", we can then just make a "sign decision". I'll share the configuration of the Xlating FIR here:
Notice that the signal is first shifted so that the center frequency (middle between 2200 and 1200 Hz) ends up at 0Hz, and then filtered by a low pass (gain = 1.0, Stopband starts at 1 kHz, Passband ends at 1 kHz - 400 Hz = 600 Hz). At this point, the actual bandwidth that's still present in the signal is much lower than the sample rate, so you might also just downsample without losses (set decimation to something higher, e.g. 16), but for the sake of analysis, we won't do that.
The time sink should now show better values. Have a look at the edges; they are probably not extremely steep. For clock sync I'd hence recommend to just go and try the polyphase clock recovery instead of Müller & Mueller; chosing about any "somewhat round" pulse shape could work.
For fun and giggles, I clicked together a quick demo demod (GRC here):
which shows:

What should i do to maintain performance of a mobile app which is using database?

I'm building an app using database.
I have a words table and everytime user types something, this app will record and update word the database.
And the frequency field will be auto increase after user enter one matched word.
But the trouble is user type day by day and i afraid the search performance will be reduce after times and also the Int field will reach to the limit (max limit Int) someday.
So, i limit the database to around less than 50.000 records.
I delete less-used records after a certain time.
But i don't know how to deal with frequency Int field of each word?
How to know exactly frequency usage of each word without increasing the field forever?
I recommend that you use a logarithmic scale for the frequency values. That's what is often done in situations like this. See Wikipedia to learn about logarithmic scales.
For example, if you have a word MAN that has a frequency of 15, the value you store in the database would be log(15) ~= 1.17609125906.
If you then find 4 new occurrences of MAN, then you want to add 4 to the field. You cannot add the log values directly because log(x)+log(y)=log(x*y). (See the Logarithm Rules section of this article for more information on log rules.)
Instead -- assuming you use a base 10 logarithm, you would use this formula:
SET frequency = log(10^frequency+4)
Depending on the length of your words, the few bytes for the frequency don't matter. With an unsigned four bytes integer, you can count up to more than two billion, which is way above the number of words what the user can type in in their whole lifespan.
So may want to go for two or three bytes, but the savings may be negligible.
Anyway, there are the following approaches for preventing overflow:
You can detect it, and then undo the operations, scale everything down by some factor of two, and then redo.
You can periodically check all your numbers and do the scaling when approaching the limit.
You can do a probabilistic update like below.
Probabilistic update
Instead of simply incrementing the frequency every time by one, you do it only with a probability which gets lower and lower as the counter grows. For example, you can do the increment with a probability of 1.0 / (oldValue + 1) or 2 ** -oldValue. The latter leads to a logarithmic growth, but, unlike the idea in the other answer, it works.
There are obviously some disadvantages due to the randomness and precision loss, but when all you care about is the relative frequency, it should be good enough.

How to Recognize a Pattern in a Stream of Numbers

I have a stream of numbers (integers for the sake of discussion) being sampled off an analog input (a a/d converter attached to a potentiomeger). I am curious how I would recognize a pattern in the numbers in realtime.
That is to say, if someone quickly twiddles the pot all the way up and back down, how do I recognize that, vs if turn it only half way. Or what if they turn it up and down three times in a row. How can I convert these actions into distinct "events"? This seems especially tricky to me since the time window over which each of these events will occur will be modestly variable.
I can think of a few quick, hacky ways to do this, but nothing that I am confident in. I am also curious how one would expand this out to multiple different inputs (i.e. input off a spectrograph). Does that change things dramatically? I am not even sure what topic area I should be googling.
If you know what you are looking for, correlate the input signal against a replica of what you expect. Basically, implement a matched filter. If you want to see when the input stream is -127, -63, 0, 63, 127, implement a direct form fir filter with these values as the coefficients. Then look for a maximum on the output. The maximum output of a filter with those coefficients occurs when the data in the filter is -127, -63, 0, 63, 127.
Google "Matched Filter Detection" or or "detection theory" maybe even "Feature detection"
If you don't know exactly what you are looking for, or what you are looking for is variable, it gets more complicated. You would then try to implement a filter that's output would give you information about what is going on. The example that I gave above would show the output spike up when that input sequence occurred. If you then saw that spike occurring with regular frequency, you would guess that the input event was occurring with regular frequency.
if you made your filter 0, 63, 127 63 0, which correlates to turning the knob all the way up, and then back down again, and on your output saw the aforementioned spike occur, but having a lower maximum amplitude and wider time over which the correlation occurs, that might tell you that the know was turned all the way up and then back down, but either slower or faster than the speed for which the filter is design to get a maximum response.
To combat this you might implement 3 of these filters in parallel, one designed for a slow knob turn, one for a medium speed knob turn, and one for a fast knob turn. Then looking at the 3 outputs you get 3 different correlations which better help you understand what is occurring
Did you consider taking the running difference of the signal (its differentiation)?

Mahout Recommender: What relative preference values are suitable for a GenericUserBasedRecommender?

In mahout, I'm setting up a GenericUserBasedRecommender, pretty straight forward for now, typical settings.
In generating a "preference" value for an item, we have the following 5 data points:
Positive interest
User converted on item (highest possible sign of interest)
Normal like (user expressed interest, e.g. like buttons)
Indirect expression of interest (clicks, cursor movements, measuring "eyeballs")
Negative interest
Indifference (items the user ignored when active on other items, a vague expression of disinterest)
Active dislike (thumbs down, remove item from my view, etc)
Over what range I should express these different attributes, let's use a 1-100 scale for discussion?
Should I be keeping the 'Active dislike' and 'Indifference' clustered close together, for example, at 1 and 5 respectively, with all the likes clustered in the 90-100 range?
Should 'Indifference' and 'Indirect expressions of interest' by closer to the center? As in 'Indifference' in the 20-35 range and 'Indirect like' in the 60-70 range?
Should 'User conversion' blow the scale away and be heads and tails higher than the others? As in: 'User Conversion' # 100, 'Lesser likes' # ~65, 'Dislikes' clustered in the 1-10 range?
On the scale of 1-100 is 50 effectively "null", or equivalent to no data point at all?
I know the final answer lies in trial and error and in the meaning of our data, but as far as the algorithm goes, I'm trying to understand at what point I need to tip the scales between interest and disinterest for the algorithm to function properly.
The actual range does not matter, not for this implementation. 1-100 is OK, 0-1 is OK, etc. The relative values are all that really matters here.
These values are estimated by a simple (linearly) weighted average. Therefore the response ought to be "linear". It ought to match an intuition that if action X gets a score 2x higher than action Y, then X should be an indicator of twice as much interest in real life.
A decent place to start is to simply size them relative to their frequency. If click-to-conversion rate is 2%, you might make a click worth 2% of a conversion.
I would ignore the "Indifference" signal you propose. It is likely going to be too noisy to be of use.

Resources