I am building an application that uses a CreateML Activity Classifier to make predictions from sensor data. Currently I can get the prediction to work correctly when I tell the app when to start and stop recording live sensor data (e.g. 1. start recording sensor data, 2. do the gesture, 3. stop recording sensor data, 4. predict). This works fine, but I am confused about how to predict on sensor data coming in live, without telling the app when to start and stop recording. If I use a fixed-size counter, such as "make a prediction once you have 300 values", I feel this wouldn't work: what if I didn't start the gesture until the app had already recorded 200 values? That window would capture only the first 100 values of a gesture that takes 300 values to complete, and the following window would only get the last 200 values. That would result in an incorrect prediction, correct? Is there a technique for making continuous predictions? I am new to iOS and ML, so forgive my ignorance on the subject; any help is appreciated.
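For context, here is a minimal sketch (in plain Python, not CreateML-specific) of the overlapping sliding-window idea this question is circling around: keep a rolling buffer and run a prediction every few new samples, so a gesture that starts partway through one window is still fully covered by a later, overlapping window. The window and stride sizes are assumptions, and predict_window / handle_prediction are hypothetical placeholders, not real APIs.

    # Sketch of an overlapping sliding-window scheme (window/stride sizes and
    # the two helper functions are hypothetical).
    from collections import deque

    WINDOW = 300   # samples per prediction, matching the trained window length
    STRIDE = 75    # run a new prediction every 75 samples (75% overlap)

    def predict_window(samples):
        # Placeholder for the actual model call.
        return "unknown"

    def handle_prediction(label):
        # Placeholder for whatever the app does with the result.
        print(label)

    buffer = deque(maxlen=WINDOW)
    samples_since_prediction = 0

    def on_new_sample(sample):
        """Feed one new sensor reading into the rolling buffer."""
        global samples_since_prediction
        buffer.append(sample)
        samples_since_prediction += 1
        if len(buffer) == WINDOW and samples_since_prediction >= STRIDE:
            samples_since_prediction = 0
            handle_prediction(predict_window(list(buffer)))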
I am running a simulation in pydrake where I apply a disturbance to my robot in the form of an externally applied spatial force. My goal is to visualize and record this disturbance, as well as the contact forces drawn by ContactVisualizer, in meshcat. However, currently the recording only shows the arrows as they were at the last time step of the simulation. If I understand the documentation correctly, only published events get recorded, but my attempts at declaring a publish period for my LeafSystem have not yielded the desired results. So my question is: how do I change the behavior of my LeafSystem and the ContactVisualizer so that all the intermediate states of these arrows are recorded and displayed in the meshcat recording?
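For reference, a periodic publish declaration of the kind mentioned above typically looks like the sketch below. The DisturbanceSystem class and the 10 ms period are assumptions for illustration; as the answer that follows explains, this alone does not make meshcat record the intermediate arrows.

    # Sketch of declaring a periodic publish event on a custom LeafSystem
    # (class name and period are hypothetical).
    from pydrake.systems.framework import LeafSystem

    class DisturbanceSystem(LeafSystem):
        def __init__(self):
            super().__init__()
            # Ask the framework to call _publish every 10 ms of simulated time.
            self.DeclarePeriodicPublishEvent(period_sec=0.01,
                                             offset_sec=0.0,
                                             publish=self._publish)

        def _publish(self, context):
            # Update/draw the disturbance arrow in meshcat here.
            pass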
This is a known limitation with the meshcat recording at the moment. I started typing a long response here, but decided to open an issue on Drake with the detailed answer so that we can track the resolution.
I am currently working on developing ROS nodes. One node parses raw lidar data obtained from a .bag file and publishes it to a topic; the other node subscribes to the parsed data from the first node, gathers it for a specific time duration, and publishes the gathered point cloud data.
However, the point cloud data does not seem to reach the second node in its full amount.
When I play the .bag file at a 0.1 rate, the point cloud data does not seem to lose much, but when it plays at normal speed, a lot of data gets lost.
For example:
rosbag play data.bag (normal speed): point cloud # 60000~80000
rosbag play data.bag --rate=0.1 (10 times slower): point cloud # 1000~1100
According to the results above, when the data is played at normal speed, almost 50% of it is lost.
Do you have any idea how to get the full data at normal playback speed?
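A minimal sketch of the gathering node described above is shown here; the topic names, queue sizes, and the one-second batching are assumptions, not part of the original setup. Enlarging queue_size and buff_size on the subscriber is a common first step when messages are dropped at full playback speed.

    #!/usr/bin/env python
    # Sketch of the second node: subscribe to parsed clouds, gather them for a
    # while, then publish the batch. Topic names and parameters are hypothetical.
    import rospy
    from sensor_msgs.msg import PointCloud2

    gathered = []

    def callback(msg):
        gathered.append(msg)

    def main():
        rospy.init_node("cloud_gatherer")
        rospy.Subscriber("/parsed_points", PointCloud2, callback,
                         queue_size=100,               # deep queue so bursts are not dropped
                         buff_size=20 * 1024 * 1024)   # large receive buffer for big messages
        pub = rospy.Publisher("/gathered_points", PointCloud2, queue_size=10)
        rate = rospy.Rate(1.0)                         # publish one batch per second
        while not rospy.is_shutdown():
            if gathered:
                pub.publish(gathered[-1])              # placeholder: real code would merge the clouds
                del gathered[:]
            rate.sleep()

    if __name__ == "__main__":
        main()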
I have a scenario in which the user captures a concert scene with the real-time audio of the performer, while at the same time the device downloads a live stream from the audio broadcaster's device. Later I replace the real-time noisy audio (captured while recording) with the one I streamed and saved on the phone (good-quality audio). Right now I set the audio offset manually, by trial and error, while merging, so that I can sync the audio and video activity at the exact position.
Now what I want to do is automate the synchronisation process. Instead of merging the video with the clear audio at a manually chosen offset, I want to merge the video with the clear audio automatically, with proper sync.
For that I need to find the offset at which I should replace the noisy audio with the clear audio. E.g., when the user starts and stops the recording, I will take that sample of real-time audio, compare it with the live-streamed audio, extract the matching part from it, and sync it at the right time.
Does anyone have any idea how to find the offset by comparing the two audio files and sync it with the video?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms, etc.
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
Edit
As an aside, I think it's worth taking a step back for a second. While math and fancy signal processing like this can give great results, and do some pretty magical stuff, there can be outlying cases where the algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing, there's another way? After some thought, there might be. If you meet all the following conditions:
• You are in control of the server component (audio broadcaster device)
• The broadcaster is aware of the 'real audio' recording latency
• The broadcaster and receiver are communicating in a way that allows accurate time synchronisation
...then the task of calculating the audio offset becomes reasonably trivial. You could use NTP or some other more accurate time synchronisation method so that there is a global point of reference for time. Then it is as simple as calculating the difference between the audio stream time codes, where the time codes are based on the global reference time.
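To make that last step concrete, a tiny sketch of the arithmetic; the timestamp values are made up, and both clocks are assumed to be synchronised (e.g. via NTP) and expressed in seconds.

    # Hypothetical, NTP-synchronised timestamps in seconds (made-up values).
    stream_start_global = 1700000000.250     # broadcaster started the clean stream
    recording_start_global = 1700000001.875  # phone started recording video + noisy audio

    # The recording began this many seconds into the clean stream, so the clean
    # audio should be merged starting at this offset.
    offset_seconds = recording_start_global - stream_start_global
    print(offset_seconds)  # 1.625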
This could prove to be a difficult problem: even though the signals are of the same event, the presence of noise makes the comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction in itself is an extensive, non-trivial topic.
Another problem could be that the signals captured by the two devices differ quite a lot; for example, the good-quality audio (I guess the output from the live mix console?) will be fairly different from the live version (which I guess is coming from the on-stage monitors / FOH system, captured by a phone mic?).
Perhaps the simplest approach to start with would be to use cross-correlation for the time delay analysis.
A peak in the cross-correlation function indicates the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
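A minimal sketch of that analysis in Python, assuming SciPy/NumPy are available and that both clips have already been loaded as mono float arrays at the same sample rate (the loading step is not shown, and the function name is just illustrative):

    # Cross-correlation based delay estimate (sketch). `noisy` and `clean` are
    # assumed to be mono float arrays sampled at the same rate `fs`.
    import numpy as np
    from scipy import signal

    def estimate_offset_seconds(noisy, clean, fs):
        """Estimate the relative delay between the two clips, in seconds."""
        noisy = noisy - np.mean(noisy)            # remove DC offset
        clean = clean - np.mean(clean)
        corr = signal.correlate(noisy, clean, mode="full")
        lags = signal.correlation_lags(len(noisy), len(clean), mode="full")
        lag = lags[np.argmax(corr)]               # lag (in samples) at the correlation peak
        return lag / float(fs)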
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is running both sounds through a speech-to-text library (or an API) and matching the relevant parts. This would of course not be very reliable: sentences frequently repeat in songs, and the concert may be instrumental.
Also, doing audio processing on a mobile device may not work well (because of low performance, high battery drain, or both). I suggest you use a server if you go that way.
Good luck.
I want to let my device (iPhone/Watch) record the movements of the user. It seems Apple does not provide a way to do this in "real time". The only way I have found is to use CMSensorRecorder to get "historical" data from the past. But this sensor recorder samples at 50 Hz, so there are ~50 samples per second. If you want to record the user's activity for, say, two hours or more, you have to process 50 * 60 * 60 * 2 = 360,000 samples. That's a real pain on the Apple Watch because of its processor.
I've seen apps on the App Store that seem to use exactly this sensor recorder to do some analysis based on the movement data. Is there another way to get movement data from the past? Or to set the sample rate of the recorder? I have already put hours of work into my app, but I can't get the data processing fast enough. It's really slow.
I have a DirectShow application written in Delphi 6 using the DSPACK component library and running on Windows XP. At the top of my filter graph is an audio capture filter. The capture filter is assigned to my VOIP phone and has a sample grabber filter immediately downstream. In the sample grabber filter's callback method, I added code to report whenever I get two media samples in a row from the sample grabber filter with identical timestamps (sample times). That condition occurs quite frequently, sometimes nearly every time. Note that the capture filter has a buffer size of 100 milliseconds and a sample rate of 8000 Hz. Logic tells me that I should never get two sample deliveries with identical sample times and that they should always be very close to 100 milliseconds apart. But that is not what's happening.
What does it mean when a DirectShow capture filter sends you two successive media samples with identical sample times? Should I ignore the second delivery that has the same sample time as the previous one? Or is there another problem somewhere that I need to address?
Note, I have no control over the sample times coming in to me. They are being generated by the capture filter.
The real error was a mistake I made in calculating timestamps. The capture filter was not responsible. I'd vote to close my post, except that there's a valuable comment on it about a utility called DumpMediaSample (see the comments section of my original post).