Extracting EEG segments from an MNE Raw object such that the time frames are preserved - mne-python

I have an MNE raw EEG object from which I would like to extract segments given by start time and end time points that are in a csv file that looks like this:
Sleep Stage start end
SLEEP-REM 4770.0 5280.0
SLEEP-REM 5310.0 5760.0
SLEEP-REM 10620.0 12270.0
SLEEP-REM 16440.0 17010.0
SLEEP-REM 17040.0 17670.0
SLEEP-REM 21390.0 21630.0
I just want the REM segments such that the times are preserved exactly as they are. I tried the following:
rem_raw = raw.copy().crop(tmin=rem_df.iloc[0,1], tmax=rem_df.iloc[0,2]) #first rem epoch
for i in range(1,len(rem_df)):
    t_start = rem_df.iloc[i,1] #iterating over start
    t_end = rem_df.iloc[i,2] #iterating over end
    rem_raw.append(raw.copy().crop(tmin=t_start, tmax=t_end))
This does extract the REM stages for me, but the problem in appending this way is that it completely restarts the timepoints from t = 0 and has a continuous data structure, while I want a discontinuous structure.
Is there a way to store all of this in discontinuous epochs?
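One possible direction, sketched under assumptions rather than as a definitive answer: instead of appending cropped copies, convert each (start, end) pair into sample indices and keep every segment separate, so the original time stamps can be carried along. The helper name `segment_sample_ranges`, the 100 Hz sampling rate and the 6-hour duration below are made up for illustration.

```python
def segment_sample_ranges(intervals, sfreq, n_samples):
    """Convert (start_s, end_s) intervals in seconds to (start, stop) sample indices.

    `stop` is exclusive, and both indices are clipped to the recording length.
    """
    ranges = []
    for start_s, end_s in intervals:
        start = max(0, int(round(start_s * sfreq)))
        stop = min(n_samples, int(round(end_s * sfreq)))
        if stop > start:
            ranges.append((start, stop))
    return ranges


# Hypothetical values: a 100 Hz recording lasting 6 hours.
sfreq = 100.0
n_samples = int(6 * 3600 * sfreq)
rem_intervals = [(4770.0, 5280.0), (5310.0, 5760.0)]

ranges = segment_sample_ranges(rem_intervals, sfreq, n_samples)
for start, stop in ranges:
    # Original time (in seconds) of the first and last sample of the segment.
    print(start / sfreq, (stop - 1) / sfreq)
```

With an MNE Raw object, each pair could then be used as `raw.get_data(start=start, stop=stop)` together with `raw.times[start:stop]`, which keeps one data array and one time vector per REM segment. Note that `raw.times` is measured from the first sample of the Raw object, so this assumes the CSV times use the same origin.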


Iterate on (or access directly) xarray chunks

I'm after a way to iterate on xarray chunks, so something similar to dask.array.blocks but that would give me access to xarray chunks with coordinates and dimensions.
For the record, I'm aware that xarray.map_blocks exists, but what I'm doing maps input chunks to output chunks of unknown shape, so I'd like to write something custom by looping directly on the xarray chunks.
I've tried to look into the xarray.map_blocks source code, since I guess something similar to what I need is in there, but I had a hard time understanding what's going on there.
My use case is that I would like, for each xarray chunk, to get an output xarray chunk of variable length along a new dimension (called foo below), and eventually concatenate them along foo.
This is a mocked scenario that should at least clarify what I'm after.
For now I've solved the problem constructing, from each dask chunk of the DataArray, an "xarray" chunk (but this looks quite convoluted), and then using client.map(fn_on_chunk, xarray_chunks).
import numpy as np
import xarray as xr
n = 1000
x_raster = y_raster = np.arange(n)
time = np.arange(10)
vals_raster = np.arange(n*n*10).reshape(n, n, 10)
da_raster = xr.DataArray(vals_raster, dims=("y", "x", "time"), coords={"y": y_raster, "x": x_raster, 'time':time})
da_raster = da_raster.chunk(dict(x=100, y=100))
def fn_on_chunk(da_chunk):
    # Tried to replicate the fact that I can't know in advance
    # the length of one dimension of the output
    len_range = np.random.randint(10)
    outs = []
    for foo in range(len_range):
        # Do some magic that finds needed coordinates
        # on this particular chunk
        x_chunk, y_chunk = fn_magic(foo)
        out = da_chunk.sel(x=x_chunk, y=y_chunk)
        out['foo'] = foo
        outs.append(out)
    return xr.concat(outs, dim='foo')
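A minimal sketch of one way to loop over chunks while keeping coordinates, assuming the chunk layout is available as a mapping from dimension name to chunk lengths (the layout that `DataArray.chunksizes` exposes). The helper `iter_chunk_slices` is a made-up name, not part of xarray, and the tiny layout in the example is invented.

```python
from itertools import product


def iter_chunk_slices(chunksizes):
    """Yield one {dim: slice} mapping per chunk.

    `chunksizes` maps each dimension name to a tuple of chunk lengths.
    """
    dims = list(chunksizes)
    per_dim_slices = []
    for dim in dims:
        slices, start = [], 0
        for size in chunksizes[dim]:
            slices.append(slice(start, start + size))
            start += size
        per_dim_slices.append(slices)
    # Every combination of per-dimension slices addresses exactly one chunk.
    for combo in product(*per_dim_slices):
        yield dict(zip(dims, combo))


# Small made-up layout: two chunks along x, one along y.
example = {"x": (2, 2), "y": (3,)}
chunk_slices = list(iter_chunk_slices(example))
print(chunk_slices)
```

Each mapping could then be passed to `da_raster.isel(selection)` to get a DataArray for that chunk with its dimensions and coordinates intact, and those objects handed to `fn_on_chunk` one by one or through `client.map`.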

Load and merge many files from S3 using Dask

I have about 1m "result" files in an S3 bucket that I want to process. Each result file should be merged with additional columns from an associated "context" file, of which I have about 50k (i.e. each context is associated with about 20 results).
Processing it serially is slow so I am using dask to parallelize some of the work.
In my serial code, I just load everything up-front and merge them, e.g.
contexts_map = {get_context_id(ctx_file): load_context(ctx_file) for ctx_file in ctx_files}
data = []
for result_file in result_files:
    ctx_id, res_id = get_context_and_res_id(result_file)
    ctx = contexts_map[ctx_id]
df = pd.DataFrame(data)
Initially I thought to divide the data and process in batches using dask (i.e. run the above in parallel on several batches) but then I read about dask bag and dask dataframe from_delayed and thought to use it. What I have:
from dask import delayed
import dask.bag as db
import dask.dataframe as ddf
delayed_get_context = delayed(get_context)
# load the contexts
ctx_map = {}
for ctx_file in ctx_files:
    ctx_id = get_context_id(ctx_file)
    ctx_map[ctx_id] = delayed_get_context(ctx_file)
# process the contexts
delayed_get_context_stats = delayed(get_context_stats)
ctx_stat_map = {ctx_id: delayed_get_context_stats(ctx) for ctx_id, ctx in ctx_map.items()}
# the main bag of result files to process
res_bag = db.from_sequence(res_items, npartitions=num_workers * 2)
# prepare a list of corresponding delayed per results
# the order in this list corresponds to order of res_bag
res_context_list = [
    ctx_stat_map[get_context_and_res_id(item)[0]] for item in res_items
]
# then create a bag from that list
ctx_bag = db.from_sequence(res_context_list, npartitions=num_workers * 2)
# create delays for the results
delayed_extract = delayed(extract_stats)
# from what I understand, if one of the arguments is also a bag
# it is distributed in accordance to the "main" bag
results = res_bag.map(delayed_extract, ctx_stats=ctx_bag)
df = ddf.from_delayed(results)
df = df.compute()
This creates a computation graph similar to the following:
When I run this on a subset (as in the image above) it works ok. Running the code on 1m items, I don't see anything happen (maybe didn't wait enough for it to finish building the graph and moving things around?)
With that, does the code above make sense? Should I have done it another way?
One of the things I am "afraid" of with the above implementation is that there's a lot of data movement.
I could potentially spend some time up-front to arrange context+results and then treat that as the "unit-of-work" and maybe get better results?
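To make the "unit-of-work" idea concrete, here is a rough sketch of grouping result files by their context up front. The key layout `results/<ctx_id>/<res_id>.json` and the helper `parse_ids` are assumptions standing in for `get_context_and_res_id`, whose real behaviour is not shown here.

```python
from collections import defaultdict


def parse_ids(result_file):
    """Hypothetical stand-in for get_context_and_res_id().

    Assumes made-up keys of the form 'results/<ctx_id>/<res_id>.json'.
    """
    _, ctx_id, name = result_file.split("/")
    return ctx_id, name.rsplit(".", 1)[0]


def group_results_by_context(result_files):
    """Map each context id to the list of result files that belong to it."""
    groups = defaultdict(list)
    for result_file in result_files:
        ctx_id, _ = parse_ids(result_file)
        groups[ctx_id].append(result_file)
    return dict(groups)


result_files = [
    "results/ctxA/r1.json",
    "results/ctxB/r2.json",
    "results/ctxA/r3.json",
]
units_of_work = group_results_by_context(result_files)
print(units_of_work)
```

Each (context id, result files) pair could then become a single task, for example one `dask.delayed` call or one element of `db.from_sequence(units_of_work.items())`, so that a context is loaded once inside the task that needs it instead of being shipped between workers.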
Any feedback here would be appreciated - is there a better approach?
And another question - what number of partitions should I use? I saw in the docs it will default to about 100, but is there some rule of thumb to use here?

How do you split a rosbag into several files without calling rosbag filter multiple times?

I want to split a 100-GB rosbag into 100 1-GB bags. I tried using rosbag filter but it takes a long time as I have to run each filter manually and each time, it performs a scan of the full bag. Is there a better way to perform this split (either through command line or Python script)?
You could simply use a function like this to split your bag file into chunks:
import rosbag
def extract_chunks(file_in, chunks):
    bagfile = rosbag.Bag(file_in)
    messages = bagfile.get_message_count()
    m_per_chunk = int(round(float(messages) / float(chunks)))
    chunk = 0
    m = 0
    outbag = rosbag.Bag("chunk_%04d.bag" % chunk, 'w')
    for topic, msg, t in bagfile.read_messages():
        m += 1
        if m % m_per_chunk == 0:
            # Close the finished chunk before starting the next one
            outbag.close()
            chunk += 1
            outbag = rosbag.Bag("chunk_%04d.bag" % chunk, 'w')
        outbag.write(topic, msg, t)
    # Close the last chunk and the input bag
    outbag.close()
    bagfile.close()
Be aware that this method uses the number of messages to perform splitting, hence the resulting chunk bag files do not necessarily have the same size.
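To see how the messages end up distributed, the rollover logic of the function above can be replayed on plain counts, with no bag I/O involved. This is only an illustrative sketch: `simulate_chunk_sizes` is a made-up helper, and it assumes there are at least as many messages as requested chunks.

```python
def simulate_chunk_sizes(messages, chunks):
    """Replay the rollover logic on message counts only (no bag I/O)."""
    m_per_chunk = int(round(float(messages) / float(chunks)))
    sizes = [0]
    for m in range(1, messages + 1):
        if m % m_per_chunk == 0:
            sizes.append(0)  # a new chunk file would be opened here
        sizes[-1] += 1
    return sizes


sizes_a = simulate_chunk_sizes(100, 4)
sizes_b = simulate_chunk_sizes(10, 3)
print(sizes_a)
print(sizes_b)
```

For 100 messages and 4 requested chunks this gives chunks of 24, 25, 25, 25 and 1 messages, so the number of output files can be one more than requested, on top of the byte sizes differing.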

Lua random image

I have a Lua script I am using in a tabletop game and basically you have a "token" that represents a creature. When it dies, it overlays an image (which I have indicated in an .xml script) with an image of like a blood splat, or tombstone etc.
How do I make it randomize which image gets overlaid?
The Script is here.
The lines below (178-184) are the main section that tells it "put image X over the token". I want it to randomize between, say, 5 different images.
if not widgetDeathIndicator then
    widgetDeathIndicator = tokenCT.addBitmapWidget("token_dead");
    widgetDeathIndicator.setTooltipText(sName .. " has fallen, as if dead.");
    widgetDeathIndicator.setSize(nWidth-20, nHeight-20);
end
token_dead is the name of the current image being used, which in the .xml directs to a .png
Yes, you can use math.random for this.
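-- placeholder names: list the image names defined in your .xml here
local images = { "token_dead_1", "token_dead_2", "token_dead_3" }
local image = images[math.random(#images)]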
math.random(n) will return a pseudo-random integer between 1 and n, so if you pass in #images (the length of the images table) you will get a valid pseudo-random table index for images.
To get better randomness you should seed the generator with math.randomseed before you call math.random. (In Lua versions before 5.4, if you don't seed it, math.random will return the same sequence of "random" numbers on each run.)

Determine consecutive video clips

I have a long video stream, but unfortunately, it's in the form of 1000 15-second long randomly-named clips. I'd like to reconstruct the original video based on some measure of "similarity" of two such 15s clips, something answering the question of "the activity in clip 2 seems like an extension of clip 1". There are small gaps between clips, a few hundred milliseconds or so each. I can also manually fix up the results if they're sufficiently good, so results needn't be perfect.
A very simplistic approach can be:
(a) Create an automated process to extract the first and last frame of each video-clip in a known image format (e.g. JPG) and name them according to video-clip names, e.g. if you have the video clips:
clipA.avi, clipB.avi, clipC.avi
you may create the following frame-images:
clipA_first.jpg, clipA_last.jpg, clipB_first.jpg, clipB_last.jpg, clipC_first.jpg, clipC_last.jpg
(b) The sorting "algorithm":
1. Create a 'Clips' list of Clip-Records containing each:
   (a) clip-name (string)
   (b) prev-clip-name (string)
   (c) prev-clip-diff (float)
   (d) next-clip-name (string)
   (e) next-clip-diff (float)
2. Apply the following processing:
   for Each ClipX having ClipX.next-clip-name == "" do:
       ClipX.next-clip-diff = <a big enough number>;
       for Each ClipY (other than ClipX) having ClipY.prev-clip-name == "" do:
           float ImageDif = ImageDif(ClipX.last-frame.jpg, ClipY.first_frame.jpg);
           if (ImageDif < ClipX.next-clip-diff)
               ClipX.next-clip-name = ClipY.clip-name;
               ClipX.next-clip-diff = ImageDif;
       if (ClipX.next-clip-name != "")
           Clips[ClipX.next-clip-name].prev-clip-name = ClipX.clip-name;
           Clips[ClipX.next-clip-name].prev-clip-diff = ClipX.next-clip-diff;
3. Scan the Clips list to find the record(s) with no <prev-clip-name> or
   (if all records have a <prev-clip-name>) find the record with the max <prev-clip-diff>.
   This is a good candidate(s) to be the first clip in sequence.
4. Begin from the clip(s) found in step (3) and rename the clip-files by adding
   a 5 digits number (00001, 00002, etc) at the beginning of its filename and going
   from aClip to aClip.next-clip-name and removing the clip from the list.
5. Repeat steps 3,4 until there are no clips in the list.
6. Voila! You have your sorted clips list in the form of sorted video filenames!
   ...or you may end up with more than one sorted lists (if you have enough
   'time-gap' between your video clips).
Very simplistic... but I think it can be effective...
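The steps above could be sketched in Python roughly as follows. This is a variation, not a literal translation: it ranks all (clip, successor) pairs by frame difference and accepts the best ones first, instead of looping clip by clip, and it refuses links that would close a loop. Frames are represented as flat lists of pixel intensities, which is a made-up stand-in for real decoded JPGs, and `max_diff` is an assumed optional cut-off.

```python
def image_diff(frame_a, frame_b):
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


def order_clips(clips, max_diff=None):
    """Greedily chain clips whose last frame resembles another clip's first frame.

    `clips` maps a clip name to a (first_frame, last_frame) pair.
    Returns a list of chains, each chain being a list of clip names in order.
    """
    # Rank every (difference, clip, candidate successor) combination.
    candidates = sorted(
        (image_diff(last, clips[other][0]), name, other)
        for name, (_, last) in clips.items()
        for other in clips
        if other != name
    )
    next_of, has_prev = {}, set()
    for diff, name, other in candidates:
        if max_diff is not None and diff > max_diff:
            break  # remaining pairs are too different to be neighbours
        if name in next_of or other in has_prev:
            continue  # one of the two ends is already linked
        # Skip links that would close a loop (other -> ... -> name -> other).
        tail = other
        while tail in next_of:
            tail = next_of[tail]
        if tail == name:
            continue
        next_of[name] = other
        has_prev.add(other)
    # Walk each chain starting from a clip that has no predecessor.
    chains = []
    for head in clips:
        if head in has_prev:
            continue
        chain, node = [], head
        while node is not None:
            chain.append(node)
            node = next_of.get(node)
        chains.append(chain)
    return chains


# Tiny invented example: clip_y ends where clip_x starts, clip_x ends where clip_z starts.
clips = {
    "clip_x": ([50, 50], [60, 60]),
    "clip_y": ([10, 10], [48, 49]),
    "clip_z": ([61, 62], [90, 90]),
}
chains = order_clips(clips)
print(chains)
```

Without `max_diff` everything ends up in a single chain; with a cut-off, clips separated by a large visual gap stay in separate chains, which matches the "more than one sorted list" outcome mentioned in step 6. Note that this variation compares every pair of clips, so its cost grows quadratically with the number of clips.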
PS1: Regarding the ImageDif() function: You can create a new DifImage, which is the pixel-wise absolute difference of the images ClipX.last-frame.jpg and ClipY.first_frame.jpg, and then sum all pixels of DifImage into a single floating point ImageDif value. You can also optimize the process to abort the difference (or sum) computation if your sum is bigger than some limit: you are actually interested only in small differences. An ImageDif value which is larger than an (experimental) limit means that the 2 images differ so much that the 2 clips cannot be next to each other.
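A minimal sketch of such a difference function with the early abort, assuming frames are already available as flat sequences of pixel intensities (decoding the JPGs is left out). The `limit` parameter and the choice to return `None` for "too different" are illustrative assumptions.

```python
def image_diff(frame_a, frame_b, limit=None):
    """Sum of absolute pixel differences, abandoned once it exceeds `limit`.

    Returns None when the running sum passes `limit`, meaning "too different".
    """
    total = 0
    for a, b in zip(frame_a, frame_b):
        total += abs(a - b)
        if limit is not None and total > limit:
            return None
    return total


# Invented pixel values for two similar frames and two very different ones.
same_scene = image_diff([10, 20, 30, 40], [12, 19, 33, 40], limit=50)
different_scene = image_diff([10, 20, 30, 40], [200, 180, 90, 5], limit=50)
print(same_scene, different_scene)
```

A suitable limit would have to be found experimentally, for instance by looking at the differences between a few clip pairs that are known to be consecutive.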
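PS2: As written, step 2 compares the last frame of each clip with the first frame of every clip that does not have a predecessor yet, so the number of image comparisons grows roughly as O(n^2): for 1000 video clips that is on the order of several hundred thousand comparisons in the worst case (fewer if you abort comparisons early as in PS1 and allow the algorithm to not find a match for some clips).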
