Returning a list of nearest neighbors from KNN - machine-learning

I am trying to use a KNN model to show the closest related brands to brand X. I have read in my data and transposed it so that the format is like this:
        User1  User2  User3  User4  User5
Brand1      1      0      0      0      1
Brand2      0      0      0      1      1
Brand3      0      0      1      1      1
Brand4      1      1      1      0      1
Brand5      0      0      0      1      1
I've then defined my model:
from sklearn.neighbors import NearestNeighbors
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)
Then I use the following code to list the nearest 5 brands to a randomly selected brand:
query_index = np.random.choice(df_mini.shape[0])
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors=6)

for i in range(0, len(distances.flatten())):
    if i == 0:
        print('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
Returning sample results like this:
Recommendations for BRAND_X:
1: BRAND_a, with distance of 1.0:
2: BRAND_b, with distance of 1.0:
3: BRAND_c, with distance of 1.0:
4: BRAND_d, with distance of 1.0:
5: BRAND_e, with distance of 1.0:
All my results show every brand at a distance of 1.0. Where have I gone wrong in my code for this to be the case? I have tried increasing the size of the sample data and the result stays the same, which makes me think it's a code error rather than a data quirk.
EDIT: Here is a fuller sample of my code:
import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv('sample.csv')
print(df.head())

df_mini = df[:5000]
df_mini = df_mini.transpose()
df_mini = df_mini.drop('UserID', axis=0)

model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)

query_index = np.random.choice(df_mini.shape[0])
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors=6)

for i in range(0, len(distances.flatten())):
    if i == 0:
        print('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
Sample data file:
https://drive.google.com/open?id=19KRJDGrsLNpDD0WNAz4be76O66fGmQtJ
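One way to narrow this down is a quick diagnostic (a minimal sketch, assuming df_mini holds the brand-by-user 0/1 matrix shown above): check the dtypes and look at the raw pairwise cosine distances directly. For non-negative binary rows, two brands with no users in common have cosine similarity 0 and therefore distance exactly 1.0, so very sparse data can legitimately produce all-1.0 neighbours; an object dtype left over from the transpose can also cause surprises.

from sklearn.metrics.pairwise import cosine_distances

# Force a numeric dtype in case the transpose left object columns behind
mat = df_mini.astype(float).values
print(df_mini.dtypes.unique())        # should be a numeric dtype
print(cosine_distances(mat)[:5, :5])  # off-diagonal 1.0 means those rows share no users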

Related

Best way to parallelize computation over dask blocks that do not return np arrays?

I'd like to return a dask dataframe from an overlapping dask array computation, where each block's computation returns a pandas dataframe. The example below shows one way to do this, simplified for demonstration purposes. I've found that a combination of da.overlap.overlap and to_delayed().ravel() gets the job done, if I pass in the relevant block key and chunk information.
Edit:
Thanks to @AnnaM, who caught bugs in the original post and then made the solution general! Building off of her comments, I'm including an updated version of the code. Also, in responding to Anna's interest in memory usage, I verified that this does not seem to take up more memory than naively expected.
import numpy as np
import pandas as pd
import dask
import dask.array as da
import dask.dataframe as dd

def extract_features_generalized(chunk, offsets, depth, columns):
    shape = np.asarray(chunk.shape)
    offsets = np.asarray(offsets)
    depth = np.asarray(depth)
    coordinates = np.stack(np.nonzero(chunk)).T
    keep = ((coordinates >= depth) & (coordinates < (shape - depth))).all(axis=1)
    data = coordinates + offsets - depth
    df = pd.DataFrame(data=data, columns=columns)
    return df[keep]

def my_overlap_generalized(data, chunksize, depth, columns, boundary):
    data = data.rechunk(chunksize)
    data_overlapping_chunks = da.overlap.overlap(data, depth=depth, boundary=boundary)
    dfs = []
    for block in data_overlapping_chunks.to_delayed().ravel():
        offsets = np.array(block.key[1:]) * np.array(data.chunksize)
        df_block = dask.delayed(extract_features_generalized)(block, offsets=offsets,
                                                              depth=depth, columns=columns)
        dfs.append(df_block)
    return dd.from_delayed(dfs)

data = np.zeros((2, 4, 8, 16, 16))
data[0, 0, 4, 2, 2] = 1
data[0, 1, 4, 6, 2] = 1
data[1, 2, 4, 8, 2] = 1
data[0, 3, 4, 2, 2] = 1

arr = da.from_array(data)
df = my_overlap_generalized(arr,
                            chunksize=(-1, -1, -1, 8, 8),
                            depth=(0, 0, 0, 2, 2),
                            columns=['r', 'c', 'z', 'y', 'x'],
                            boundary=tuple(['reflect'] * 5))
df.compute().reset_index()
-- Remainder of original post, including original bugs --
My example only does xy overlaps, but it's easy to generalize. Is there anything below that is suboptimal or could be done better? Is anything likely to break because it's relying on low-level information that could change (e.g. block key)?
def my_overlap(data, chunk_xy, depth_xy):
    data = data.rechunk((-1, -1, -1, chunk_xy, chunk_xy))
    data_overlapping_chunks = da.overlap.overlap(data,
                                                 depth=(0, 0, 0, depth_xy, depth_xy),
                                                 boundary={3: 'reflect', 4: 'reflect'})
    dfs = []
    for block in data_overlapping_chunks.to_delayed().ravel():
        offsets = np.array(block.key[1:]) * np.array(data.chunksize)
        df_block = dask.delayed(extract_features)(block, offsets=offsets, depth_xy=depth_xy)
        dfs.append(df_block)

    # All computation is delayed, so downstream computations need to know the format of the data.
    # If the meta information is not specified, a single computation will be done (which could be
    # expensive) at this point to infer the metadata.
    # This empty dataframe has the index, column, and type information we expect in the computation.
    columns = ['r', 'c', 'z', 'y', 'x']

    # The dtypes are float64, except for a small number of columns
    df_meta = pd.DataFrame(columns=columns, dtype=np.float64)
    df_meta = df_meta.astype({'c': np.int64, 'r': np.int64})
    df_meta.index.name = 'feature'

    return dd.from_delayed(dfs, meta=df_meta)

def extract_features(chunk, offsets, depth_xy):
    r, c, z, y, x = np.nonzero(chunk)
    df = pd.DataFrame({'r': r, 'c': c, 'z': z, 'y': y + offsets[3] - depth_xy,
                       'x': x + offsets[4] - depth_xy})
    df = df[(df.y > depth_xy) & (df.y < (chunk.shape[3] - depth_xy)) &
            (df.z > depth_xy) & (df.z < (chunk.shape[4] - depth_xy))]
    return df

data = np.zeros((2, 4, 8, 16, 16))  # round, channel, z, y, x
data[0, 0, 4, 2, 2] = 1
data[0, 1, 4, 6, 2] = 1
data[1, 2, 4, 8, 2] = 1
data[0, 3, 4, 2, 2] = 1

arr = da.from_array(data)
df = my_overlap(arr, chunk_xy=8, depth_xy=2)
df.compute().reset_index()
First of all, thanks for posting your code. I am working on a similar problem and this was really helpful for me.
When testing your code, I discovered a few mistakes in the extract_features function that prevent your code from returning correct indices.
Here is a corrected version:
def extract_features(chunk, offsets, depth_xy):
    r, c, z, y, x = np.nonzero(chunk)
    df = pd.DataFrame({'r': r, 'c': c, 'z': z, 'y': y, 'x': x})
    df = df[(df.y >= depth_xy) & (df.y < (chunk.shape[3] - depth_xy)) &
            (df.x >= depth_xy) & (df.x < (chunk.shape[4] - depth_xy))]
    df['y'] = df['y'] + offsets[3] - depth_xy
    df['x'] = df['x'] + offsets[4] - depth_xy
    return df
The updated code now returns the indices that were set to 1:
   index  r  c  z  y  x
0      0  0  0  4  2  2
1      1  0  1  4  6  2
2      2  0  3  4  2  2
3      1  1  2  4  8  2
For comparison, this is the output of the original version:
   index  r  c  z  y  x
0      1  0  1  4  6  2
1      3  1  2  4  8  2
2      0  0  1  4  6  2
3      1  1  2  4  8  2
It returns rows 2 and 4, two times each.
This happens because of three mistakes in the extract_features function:
1. You first add the offset and subtract the depth, and only then filter out the overlapping parts: the order needs to be swapped.
2. df.y > depth_xy should be replaced with df.y >= depth_xy.
3. df.z should be replaced with df.x, since it is the x dimension that has an overlap.
To optimize this even further, here is a generalized version of the code that works for an arbitrary number of dimensions:
def extract_features_generalized(chunk, offsets, depth, columns):
    coordinates = np.nonzero(chunk)
    df = pd.DataFrame()
    rows_to_keep = np.ones(len(coordinates[0]), dtype=int)
    for i in range(len(columns)):
        df[columns[i]] = coordinates[i]
        rows_to_keep = rows_to_keep * np.array((df[columns[i]] >= depth[i])) * \
                       np.array((df[columns[i]] < (chunk.shape[i] - depth[i])))
        df[columns[i]] = df[columns[i]] + offsets[i] - depth[i]
    del coordinates
    return df[rows_to_keep > 0]

def my_overlap_generalized(data, chunksize, depth, columns):
    data = data.rechunk(chunksize)
    data_overlapping_chunks = da.overlap.overlap(data, depth=depth,
                                                 boundary=tuple(['reflect'] * len(columns)))
    dfs = []
    for block in data_overlapping_chunks.to_delayed().ravel():
        offsets = np.array(block.key[1:]) * np.array(data.chunksize)
        df_block = dask.delayed(extract_features_generalized)(block, offsets=offsets,
                                                              depth=depth, columns=columns)
        dfs.append(df_block)
    return dd.from_delayed(dfs)

data = np.zeros((2, 4, 8, 16, 16))
data[0, 0, 4, 2, 2] = 1
data[0, 1, 4, 6, 2] = 1
data[1, 2, 4, 8, 2] = 1
data[0, 3, 4, 2, 2] = 1

arr = da.from_array(data)
df = my_overlap_generalized(arr, chunksize=(-1, -1, -1, 8, 8),
                            depth=(0, 0, 0, 2, 2), columns=['r', 'c', 'z', 'y', 'x'])
df.compute().reset_index()

sklearn oneclass svm KeyError

My dataset is a set of system calls for both malware and benign software. I preprocessed it, and now it looks like this:
NtQueryPerformanceCounter
NtProtectVirtualMemory
NtProtectVirtualMemory
NtQuerySystemInformation
NtQueryVirtualMemory
NtQueryVirtualMemory
NtProtectVirtualMemory
NtOpenKey
NtOpenKey
NtOpenKey
NtQuerySecurityAttributesToken
NtQuerySecurityAttributesToken
NtQuerySystemInformation
NtQuerySystemInformation
NtAllocateVirtualMemory
NtFreeVirtualMemory
Now I'm using TF-IDF to extract the features, with n-grams to build sequences of them:
from __future__ import print_function
import numpy as np
import pandas as pd
from time import time
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.utils import shuffle
from sklearn.svm import OneClassSVM

nGRAM1 = 8
nGRAM2 = 10
weight = 4

main_corpus_MAL = []
main_corpus_target_MAL = []
main_corpus_BEN = []
main_corpus_target_BEN = []

my_categories = ['benign', 'malware']

# feeding corpus the testing data
print("Loading system call database for categories:")
print(my_categories if my_categories else "all")

import glob
import os

malCOUNT = 0
benCOUNT = 0
for filename in glob.glob(os.path.join('C:\\Users\\alika\\Documents\\testingSVM\\sysMAL', '*.txt')):
    fMAL = open(filename, "r")
    aggregate = ""
    for line in fMAL:
        linea = line[:(len(line) - 1)]
        aggregate += " " + linea
    main_corpus_MAL.append(aggregate)
    main_corpus_target_MAL.append(1)
    malCOUNT += 1

for filename in glob.glob(os.path.join('C:\\Users\\alika\\Documents\\testingSVM\\sysBEN', '*.txt')):
    fBEN = open(filename, "r")
    aggregate = ""
    for line in fBEN:
        linea = line[:(len(line) - 1)]
        aggregate += " " + linea
    main_corpus_BEN.append(aggregate)
    main_corpus_target_BEN.append(0)
    benCOUNT += 1

# weight as determined in the top of the code
train_corpus = main_corpus_BEN[:(weight*len(main_corpus_BEN)//(weight+1))]
train_corpus_target = main_corpus_target_BEN[:(weight*len(main_corpus_BEN)//(weight+1))]
test_corpus = main_corpus_MAL[(len(main_corpus_MAL)-(len(main_corpus_MAL)//(weight+1))):]
test_corpus_target = main_corpus_target_MAL[(len(main_corpus_MAL)-len(main_corpus_MAL)//(weight+1)):]

def size_mb(docs):
    return sum(len(s.encode('utf-8')) for s in docs) / 1e6

# size of datasets
train_corpus_size_mb = size_mb(train_corpus)
test_corpus_size_mb = size_mb(test_corpus)

print("%d documents - %0.3fMB (training set)" % (
    len(train_corpus_target), train_corpus_size_mb))
print("%d documents - %0.3fMB (test set)" % (
    len(test_corpus_target), test_corpus_size_mb))
print("%d categories" % len(my_categories))
print()
print("Benign Traces: " + str(benCOUNT) + " traces")
print("Malicious Traces: " + str(malCOUNT) + " traces")
print()

print("Extracting features from the training data using a sparse vectorizer...")
t0 = time()
vectorizer = TfidfVectorizer(ngram_range=(nGRAM1, nGRAM2), min_df=1, use_idf=True, smooth_idf=True)
analyze = vectorizer.build_analyzer()
X_train = vectorizer.fit_transform(train_corpus)
duration = time() - t0
print("done in %fs at %0.3fMB/s" % (duration, train_corpus_size_mb / duration))
print("n_samples: %d, n_features: %d" % X_train.shape)
print()

print("Extracting features from the test data using the same vectorizer...")
t0 = time()
X_test = vectorizer.transform(test_corpus)
duration = time() - t0
print("done in %fs at %0.3fMB/s" % (duration, test_corpus_size_mb / duration))
print("n_samples: %d, n_features: %d" % X_test.shape)
print()
The output is:
Loading system call database for categories:
['benign', 'malware']
177 documents - 45.926MB (training set)
44 documents - 12.982MB (test set)
2 categories
Benign Traces: 72 traces
Malicious Traces: 150 traces
Extracting features from the training data using a sparse vectorizer...
done in 7.831695s at 5.864MB/s
n_samples: 177, n_features: 603170
Extracting features from the test data using the same vectorizer...
done in 1.624100s at 7.993MB/s
n_samples: 44, n_features: 603170
Now for the learning section I'm trying to use sklearn OneClassSVM:
print("==================\n")
print("Training: ")
classifier = OneClassSVM(kernel='linear', gamma='auto')
classifier.fit(X_test)
fraud_pred = classifier.predict(X_test)
unique, counts = np.unique(fraud_pred, return_counts=True)
print (np.asarray((unique, counts)).T)
fraud_pred = pd.DataFrame(fraud_pred)
fraud_pred= fraud_pred.rename(columns={0: 'prediction'})
main_corpus_target = pd.DataFrame(main_corpus_target)
main_corpus_target= main_corpus_target.rename(columns={0: 'Category'})
This is the output of fraud_pred and main_corpus_target:
prediction
0 1
1 -1
2 1
3 1
4 1
5 -1
6 1
7 -1
...
30 rows * 1 column
====================
Category
0 1
1 1
2 1
3 1
4 1
...
217 0
218 0
219 0
220 0
221 0
222 rows * 1 column
But when I try to calculate TP, TN, FP, and FN:
## Performance check of the model
TP = FN = FP = TN = 0
for j in range(len(main_corpus_target)):
    if main_corpus_target['Category'][j] == 0 and fraud_pred['prediction'][j] == 1:
        TP = TP + 1
    elif main_corpus_target['Category'][j] == 0 and fraud_pred['prediction'][j] == -1:
        FN = FN + 1
    elif main_corpus_target['Category'][j] == 1 and fraud_pred['prediction'][j] == 1:
        FP = FP + 1
    else:
        TN = TN + 1
print(TP, FN, FP, TN)
I get this error:
KeyError Traceback (most recent call last)
<ipython-input-32-1046cc75ba83> in <module>
7 elif main_corpus_target['Category'][j]== 0 and fraud_pred['prediction'][j] == -1:
8 FN = FN+1
----> 9 elif main_corpus_target['Category'][j]== 1 and fraud_pred['prediction'][j] == 1:
10 FP = FP+1
11 else:
c:\users\alika\appdata\local\programs\python\python36\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
1069 key = com.apply_if_callable(key, self)
1070 try:
-> 1071 result = self.index.get_value(self, key)
1072
1073 if not is_scalar(result):
c:\users\alika\appdata\local\programs\python\python36\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4728 k = self._convert_scalar_indexer(k, kind="getitem")
4729 try:
-> 4730 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4731 except KeyError as e1:
4732 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 30
1) I know the error occurs because the code is trying to access an index key that doesn't exist, but I can't just insert some numbers into fraud_pred to handle this issue. Any suggestions?
2) Am I doing anything wrong that makes them not match?
3) I want to compare the results to other one-class classification algorithms. Given my method, which ones would be best to use?
Edit: Before calculating the metrics:
You could change your fit and predict functions to:
fraud_pred = classifier.fit_predict(X_test)
Also, your main_corpus_target and X_test should have the same length. Can you post the code where you create main_corpus_target, please?
It's created right after the benCOUNT += 1:
main_corpus_target = main_corpus_target_MAL
main_corpus_target.extend(main_corpus_target_BEN)
This means that you are creating a main_corpus_target that includes MAL and BEN, and the error you get is:
ValueError: Found input variables with inconsistent numbers of samples: [30, 222]
The number of samples of fraud_pred is 30, so you should evaluate them with an array of 30. main_corpus_target contains 222.
Looking at your code, I see that you want to evaluate X_test, which is derived from test_corpus (X_test = vectorizer.transform(test_corpus)). It would be better to compare your results to test_corpus_target, which is the target variable of your test set and also has a length of 30.
These two lines that you have should output the same length:
test_corpus = main_corpus_MAL[(len(main_corpus_MAL)-(len(main_corpus_MAL)//(weight+1))):]
test_corpus_target = main_corpus_target_MAL[(len(main_corpus_MAL)-len(main_corpus_MAL)//(weight+1)):]
May I ask why you are calculating the TP, TN, etc. by yourself?
You have a faster option:
1. Transform the fraud_pred series, replacing -1 with 0 (a sketch of this step is shown below).
2. Use the confusion_matrix function that sklearn offers.
3. Use ravel to extract the values of the confusion matrix.
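The replacement step itself might look like this (a minimal sketch; it assumes fraud_pred is still the raw array of 1/-1 values returned by predict or fit_predict):

import numpy as np

# OneClassSVM labels inliers as 1 and outliers as -1; map -1 to 0 so the
# predictions line up with the 0/1 targets used elsewhere in the script
fraud_pred = np.where(fraud_pred == -1, 0, 1)

If fraud_pred has already been wrapped in a DataFrame as above, fraud_pred['prediction'].replace(-1, 0) would do the same.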
An example, after transforming the -1 to 0 (note that confusion_matrix expects the ground truth first, then the predictions):
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(main_corpus_target['Category'].values, fraud_pred).ravel()
Also, if you are using the latest pandas version:
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(main_corpus_target['Category'].to_numpy(), fraud_pred).ravel()

How to create a combined ROC curve for 2 classifiers and two different data sets

I have a dataset of 1127 patients. My goal is to classify each patient as 0 or 1.
I have two different classifiers, both with the same purpose: to classify a patient as 0 or 1.
I ran one classifier on 364 patients and the second classifier on the other 763 patients.
For each classifier/group, I generated a ROC curve.
Now I would like to combine the curves. Could someone guide me on how to do it?
I'm thinking of calculating the weighted FPR and TPR, but I'm not sure how to do it.
The number of FPR/TPR pairs differs between the curves (the first ROC curve is based on 312 pairs and the second on 666 pairs).
Thanks!!!
Imports
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
Data generation
# simulate first dataset with 364 obs
df1 = pd.DataFrame(i for i in range(364))
df1['predict_proba_1'] = np.random.normal(0, 1, len(df1))
df1['epsilon'] = np.random.normal(0, 1, len(df1))
df1['true'] = (0.7 * df1['epsilon'] < df1['predict_proba_1']) * 1
df1 = df1.drop(columns=[0, 'epsilon'])

# simulate second dataset with 763 obs
df2 = pd.DataFrame(i for i in range(763))
df2['predict_proba_2'] = np.random.normal(0, 1, len(df2))
df2['epsilon'] = np.random.normal(0, 1, len(df2))
df2['true'] = (0.7 * df2['epsilon'] < df2['predict_proba_2']) * 1
df2 = df2.drop(columns=[0, 'epsilon'])
Quick look at generated data
df1
predict_proba_1 true
0 1.234549 1
1 -0.586544 0
2 -0.229539 1
3 0.132185 1
4 -0.411284 0
.. ... ...
359 -0.218775 0
360 -0.985565 0
361 0.542790 1
362 -0.463667 0
363 1.119244 1
[364 rows x 2 columns]
df2
predict_proba_2 true
0 0.278755 1
1 0.653663 0
2 -0.304216 1
3 0.955658 1
4 -1.341669 0
.. ... ...
758 1.359606 1
759 -0.605894 0
760 0.379738 0
761 1.571615 1
762 -1.102565 0
[763 rows x 2 columns]
Necessary functions
def show_ROCs(scores_list: list, ys_list: list, labels_list: list = None):
    """
    This function plots a couple of ROCs. Corresponding labels are optional.

    Parameters
    ----------
    scores_list : list of array-likes with scorings or predicted probabilities.
    ys_list : list of array-likes with ground truth labels.
    labels_list : list of labels to be displayed in the plotted graph.

    Returns
    ----------
    None
    """
    if len(scores_list) != len(ys_list):
        raise Exception('len(scores_list) != len(ys_list)')

    fpr_dict = dict()
    tpr_dict = dict()
    for x in range(len(scores_list)):
        fpr_dict[x], tpr_dict[x], _ = roc_curve(ys_list[x], scores_list[x])
    for x in range(len(scores_list)):
        try:
            plot_ROC(fpr_dict[x], tpr_dict[x],
                     str(labels_list[x]) + ' AUC:' + str(round(auc(fpr_dict[x], tpr_dict[x]), 3)))
        except:
            plot_ROC(fpr_dict[x], tpr_dict[x],
                     str(x) + ' ' + str(round(auc(fpr_dict[x], tpr_dict[x]), 3)))
    plt.show()

def plot_ROC(fpr, tpr, label):
    """
    This function plots a single ROC. Corresponding label is optional.

    Parameters
    ----------
    fpr : array-like with fpr.
    tpr : array-like with tpr.
    label : label to be displayed in the plotted graph.

    Returns
    ----------
    None
    """
    plt.figure(1)
    plt.plot([0, 1], [0, 1], 'k--')
    plt.plot(fpr, tpr, label=label)
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title('ROC curve')
    plt.legend(loc='best')
Plotting
show_ROCs(
    [df1['predict_proba_1'], df2['predict_proba_2']],
    [df1['true'], df2['true']],
    ['df1 with {} obs'.format(len(df1)), 'df2 with {} obs'.format(len(df2))]
)
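The code above overlays the two curves in one plot. If a single combined curve is the goal, one possible approach (a sketch, and it assumes the two classifiers' scores are on a comparable scale) is to pool the scores and true labels of both groups and compute one ROC over all 1127 patients:

# pool predictions and ground truth from both patient groups
pooled_scores = np.concatenate([df1['predict_proba_1'].values, df2['predict_proba_2'].values])
pooled_truth = np.concatenate([df1['true'].values, df2['true'].values])

fpr, tpr, _ = roc_curve(pooled_truth, pooled_scores)
plot_ROC(fpr, tpr, 'pooled over {} patients AUC:{}'.format(len(pooled_truth), round(auc(fpr, tpr), 3)))
plt.show()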

Torch: Concatenating tensors of different dimensions

I have a x_at_i = torch.Tensor(1,i) that grows at every iteration where i = 0 to n. I would like to concatenate all tensors of different sizes into a matrix and fill the remaining cells with zeroes. What is the most idiomatic way to do this? For example:
x_at_1 = 1
x_at_2 = 1 2
x_at_3 = 1 2 3
x_at_4 = 1 2 3 4
X = torch.cat(x_at_1, x_at_2, x_at_3, x_at_4)
X = [ 1 0 0 0
1 2 0 0
1 2 3 0
1 2 3 4 ]
If you know n, and assuming you have easy access to your x_at_i at each iteration, I would try something like:
X = torch.Tensor(n, n):zero()
for i = 1, n do
    X[i]:narrow(1, 1, i):copy(x_at[i])
end
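If this is PyTorch rather than Lua Torch, a minimal sketch of the same idea in Python, assuming the x_at_i are collected in a list of 1-D tensors, would be torch.nn.utils.rnn.pad_sequence:

import torch
from torch.nn.utils.rnn import pad_sequence

# hypothetical list standing in for x_at_1 .. x_at_4: [1], [1 2], [1 2 3], [1 2 3 4]
xs = [torch.arange(1, i + 1, dtype=torch.float) for i in range(1, 5)]

# pad_sequence stacks the 1-D tensors into an (n, max_len) matrix, zero-filling the gaps
X = pad_sequence(xs, batch_first=True, padding_value=0.0)
print(X)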

Torch tensors swapping dimensions

I came across these two lines (back-to-back) of code in a torch project:
im4[{1,{},{}}] = im3[{3,{},{}}]
im4[{3,{},{}}] = im3[{1,{},{}}]
What do these two lines do? I assumed they did some sort of swapping.
This is covered under Indexing in the Torch Tensor documentation.
Indexing using the empty table {} is shorthand for all indices in that dimension. Below is a demo which uses {} to copy an entire row from one matrix to another:
> a = torch.Tensor(3, 3):fill(0)
0 0 0
0 0 0
0 0 0
> b = torch.Tensor(3, 3)
> for i=1,3 do for j=1,3 do b[i][j] = (i - 1) * 3 + j end end
> b
1 2 3
4 5 6
7 8 9
> a[{1, {}}] = b[{3, {}}]
> a
7 8 9
0 0 0
0 0 0
This assignment is equivalent to: a[1] = b[3].
Your example is similar:
im4[{1,{},{}}] = im3[{3,{},{}}]
im4[{3,{},{}}] = im3[{1,{},{}}]
which is more clearly stated as:
im4[1] = im3[3]
im4[3] = im3[1]
The first line assigns the values from im3's third row (a 2D sub-matrix) to im4's first row and the second line assigns the first row of im3 to the third row of im4.
Note that this is not a swap, as im3 is never written and im4 is never read from.
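For comparison, if an actual swap of the two channels were intended, a hedged PyTorch sketch (using a hypothetical im tensor modified in place) would need to copy one row first so the first assignment doesn't clobber it:

import torch

im = torch.arange(27, dtype=torch.float).reshape(3, 3, 3)  # hypothetical 3-channel image

tmp = im[0].clone()   # copy channel 0 before it is overwritten
im[0] = im[2]
im[2] = tmp           # channels 0 and 2 are now swapped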
