Using graph properties as input to a neural network - machine-learning

I have almost 1000 pandas DataFrames, which I converted into graphs. Now I can access the edges and nodes of those graphs. For one DataFrame it looks like the following:
nx.edges(FG)
Out[59]: EdgeView([('Dy0O7', 'Dy1O6'), ('Dy0O7', 'Dy2O6'), ('Dy0O7', 'Dy3O7'), ('Dy0O7', 'Dy4O6'), ('Dy1O6', 'Dy3O7'), ('Dy1O6', 'Dy5O6'), ('Dy2O6', 'Dy4O6'), ('Dy3O7', 'Dy4O6'), ('Dy3O7', 'Dy5O6')])
nx.nodes(FG)
Out[61]: NodeView(('Dy0O7', 'Dy1O6', 'Dy2O6', 'Dy3O7', 'Dy4O6', 'Dy5O6'))
I can also get the adjacency view, which gives information about the connected nodes with their corresponding weights.
FG.adj
Out[64]: AdjacencyView({'Dy0O7': {'Dy1O6': {'weight': 3.0}, 'Dy2O6': {'weight': 1.0}, 'Dy3O7': {'weight': 2.0}, 'Dy4O6': {'weight': 1.0}}, 'Dy1O6': {'Dy0O7': {'weight': 3.0}, 'Dy3O7': {'weight': 1.0}, 'Dy5O6': {'weight': 1.0}}, 'Dy2O6': {'Dy0O7': {'weight': 1.0}, 'Dy4O6': {'weight': 1.0}}, 'Dy3O7': {'Dy0O7': {'weight': 2.0}, 'Dy1O6': {'weight': 1.0}, 'Dy4O6': {'weight': 3.0}, 'Dy5O6': {'weight': 1.0}}, 'Dy4O6': {'Dy0O7': {'weight': 1.0}, 'Dy2O6': {'weight': 1.0}, 'Dy3O7': {'weight': 3.0}}, 'Dy5O6': {'Dy1O6': {'weight': 1.0}, 'Dy3O7': {'weight': 1.0}}})
I want to use such graph properties as input to a machine learning algorithm such as a neural network. How can I do that?

There are a bunch of algorithms out there to convert a graph into feature vectors. Two famous examples are:
Node2vec from Stanford
DeepWalk from Stony Brook University
Implementations of both are available on GitHub.
The underlying idea behind these methods is almost the same: 1) run random walks on the graph, 2) generate node sequences from those walks, 3) feed the sequences to word2vec (skip-gram) or another embedding method, 4) use the resulting vectors as features in the downstream task.
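A minimal sketch of that pipeline on the graph above, using plain uniform random walks (DeepWalk-style; node2vec adds biased walks) and gensim's skip-gram Word2Vec. The walk parameters and embedding size are arbitrary choices, and gensim >= 4.0 is assumed:
import random
from gensim.models import Word2Vec

def random_walks(G, num_walks=10, walk_length=20):
    # uniform random walks starting from every node, repeated num_walks times
    walks = []
    nodes = list(G.nodes())
    for _ in range(num_walks):
        random.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(n) for n in walk])
    return walks

walks = random_walks(FG)  # FG is the weighted graph built from one DataFrame, as in the question
model = Word2Vec(walks, vector_size=32, window=5, min_count=1, sg=1)  # skip-gram
features = {node: model.wv[node] for node in FG.nodes()}  # one fixed-length vector per node
The resulting per-node vectors (or, for a graph-level task, an aggregate such as their mean) can then be fed to a neural network as ordinary fixed-length features.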

Related

How to properly use AdaptiveAvgPool for global pooling?

I am trying to convert a model that uses Flatten/Linear as the final layers to use global pooling with AdaptiveAvgPool1d/Linear. The output dimensions of the Linear layer after the global pooling are messing up the training epochs. I get the following error:
ValueError: operands could not be broadcast together with shapes (64,4) (64,)
Model with Flatten-->Linear (works)
conv1d --> relu --> maxpool1d --> Flatten --> Linear:
model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=32, kernel_size=128, stride=16, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.LazyLinear(n_classes)
)
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Sequential                               --                        --
├─Conv1d: 1-1                            [64, 32, 505]             4,128
├─ReLU: 1-2                              [64, 32, 505]             --
├─MaxPool1d: 1-3                         [64, 32, 252]             --
├─Flatten: 1-4                           [64, 8064]                --
├─Linear: 1-5                            [64, 4]                   32,260
==========================================================================================
Model with AdaptiveAvgPool1d-->Linear (output dimension wrong)
I want the output of this implementation to match that of the previous one, where the output shape coming out of the Linear layer is [64,4]
model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=32, kernel_size=128, stride=16, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.AdaptiveAvgPool1d(1),
    nn.LazyLinear(n_classes)
)
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Sequential                               --                        --
├─Conv1d: 1-1                            [64, 32, 505]             4,128
├─ReLU: 1-2                              [64, 32, 505]             --
├─MaxPool1d: 1-3                         [64, 32, 252]             --
├─AdaptiveAvgPool1d: 1-4                 [64, 32, 1]               --
├─Linear: 1-5                            [64, 32, 4]               8
==========================================================================================
You can't replace nn.Flatten with nn.AdaptiveAvgPool1d because they don't do the same thing. You still need to add nn.Flatten() after nn.AdaptiveAvgPool1d to have the same output shape.
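For example, a sketch of that fix using the same layers as in the question (n_classes is assumed to be 4, matching the [64, 4] output of the first summary):
import torch.nn as nn

n_classes = 4  # matches the [64, 4] output in the question's first summary

model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=32, kernel_size=128, stride=16, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.AdaptiveAvgPool1d(1),   # [64, 32, 252] -> [64, 32, 1]
    nn.Flatten(),              # [64, 32, 1]   -> [64, 32]
    nn.LazyLinear(n_classes),  # [64, 32]      -> [64, 4]
)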

How to reduce the `dask_ml.xgboost` worker memory consumption?

I've been testing the dask_ml.xgboost regressor on a synthetic 10GB dataset. When training, the memory usage of the workers exceeds the amount available on my local laptop. I am aware that I can try running on an online dask cluster with larger memory, or that I can sample the data (and ignore the rest) before training. But is there a different solution? I tried limiting the number and the depth of the trees generated, subsampling the rows and columns, and changing the tree construction algorithm but the workers still run out of memory.
Given a fixed memory allocation, is there a way to reduce the memory consumption of each worker when training dask_ml.xgboost?
Here is a code snippet:
import dask.dataframe as dd
from dask.distributed import Client
from dask_ml.xgboost import XGBRegressor
client = Client(memory_limit='7GB')
ddf = dd.read_csv('10GB_float.csv')
X = ddf[ddf.columns.difference(['float_1'])].persist()
y = ddf['float_1'].persist()
reg = XGBRegressor(
    objective='reg:squarederror', n_estimators=10, max_depth=2, tree_method='hist',
    subsample=0.001, colsample_bytree=0.5, colsample_bylevel=0.5,
    colsample_bynode=0.5, n_jobs=-1)
reg.fit(X, y)
The synthetic dataset 10GB_float.csv has 50 columns and 26758707 rows containing random floats (float64) ranging from 0 to 1. Below are the cluster details:
Cluster
Workers: 4
Cores: 12
Memory: 28.00 GB
And some information about my local laptop:
Memory: 31.1 GiB
Processor: Intel® Core™ i7-8750H CPU @ 2.20GHz × 12
Additionally, here are the parameters of XGBRegressor (using .get_params()):
{'base_score': 0.5,
'booster': 'gbtree',
'colsample_bylevel': 0.5,
'colsample_bynode': 0.5,
'colsample_bytree': 0.5,
'gamma': 0,
'importance_type': 'gain',
'learning_rate': 0.1,
'max_delta_step': 0,
'max_depth': 2,
'min_child_weight': 1,
'missing': None,
'n_estimators': 10,
'n_jobs': -1,
'nthread': None,
'objective': 'reg:squarederror',
'random_state': 0,
'reg_alpha': 0,
'reg_lambda': 1,
'scale_pos_weight': 1,
'seed': None,
'silent': None,
'subsample': 0.001,
'verbosity': 1,
'tree_method': 'hist'}
Thank you very much for your time!
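For reference, here is a hypothetical sketch of how a synthetic dataset like the one described (50 columns of random float64 values in [0, 1)) could be generated with dask; the column names and chunking are assumptions, not taken from the question:
import dask.array as da
import dask.dataframe as dd

# 26758707 rows x 50 columns of random float64 in [0, 1), built lazily in chunks
arr = da.random.random((26758707, 50), chunks=(1_000_000, 50))
ddf = dd.from_dask_array(arr, columns=[f'float_{i}' for i in range(50)])
ddf.to_csv('10GB_float.csv', single_file=True, index=False)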

Convolutional Autoencoder in Pytorch for Dummies

I am here to ask some more general questions about Pytorch and Convolutional Autoencoders.
If I only use convolutional layers (FCN), do I even have to care about the input shape? And how do I best choose the number of feature maps?
Does a ConvTranspose2d Layer automatically unpool?
Can you spot any errors or unconventional code in my example?
By the way, I want to make a symmetrical Convolutional Autoencoder to colorize black and white images with different image sizes.
self.encoder = nn.Sequential(
    # conv 1
    nn.Conv2d(in_channels=3, out_channels=512, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 1/2
    nn.BatchNorm2d(512),
    # conv 2
    nn.Conv2d(in_channels=512, out_channels=256, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 1/4
    nn.BatchNorm2d(256),
    # conv 3
    nn.Conv2d(in_channels=256, out_channels=128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 1/8
    nn.BatchNorm2d(128),
    # conv 4
    nn.Conv2d(in_channels=128, out_channels=64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 1/16
    nn.BatchNorm2d(64)
)
self.decoder = nn.Sequential(
    # conv 5
    nn.ConvTranspose2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(128),
    # conv 6
    nn.ConvTranspose2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(256),
    # conv 7
    nn.ConvTranspose2d(in_channels=256, out_channels=512, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(512),
    # conv 8
    nn.ConvTranspose2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
    nn.Softmax()
)

def forward(self, x):
    h = x
    h = self.encoder(h)
    h = self.decoder(h)
    return h
No, you don't need to care about input width and height with a fully convolutional model. But you should probably ensure that each downsampling operation in the encoder is matched by a corresponding upsampling operation in the decoder.
I'm not sure what you mean by unpooling. If you mean upsampling (increasing spatial dimensions), then that is what the stride parameter is for. In PyTorch, a transpose convolution with stride=2 will upsample by a factor of two. Note, however, that instead of a transpose convolution, many practitioners prefer bilinear upsampling followed by a regular convolution; one reason is that transpose convolutions tend to produce checkerboard artifacts.
If, on the other hand, you mean actual unpooling, then you should look at the documentation of torch.nn.MaxUnpool2d. You need to collect the indices of the maximal values from the MaxPool2d operation and feed them into MaxUnpool2d.
The general consensus seems to be that you should increase the number of feature maps as you downsample. Your code appears to do the reverse. Consecutive powers of 2 seem like a good place to start; it's hard to suggest a better rule of thumb, so you will probably need to experiment a little.
On another note, I'm not sure why you apply softmax to the decoder output.
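Putting those points together, here is a minimal sketch (my own illustration, not the asker's code) of the suggested structure, assuming a 1-channel grayscale input and a 3-channel colour output for the colorization use case; the channel counts and depth are arbitrary:
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # feature maps grow as the spatial resolution shrinks
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),   # 1/2
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 1/4
            nn.ReLU(),
        )
        # decoder mirrors the encoder: each stride-2 downsampling is undone
        # by a stride-2 transpose convolution
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # x2
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # x2, 3 colour channels
            nn.Sigmoid(),  # keep pixel values in [0, 1]; no softmax
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()  # output matches input size for any H, W divisible by 4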

How to use a multilayer perceptron for anomaly detection (training data preparation)?

I have a task to use machine learning for anomaly detection.
I have data as sales count info like this:
{ 5/*bread*/, 10/*milk*/, 2/*potato*/, .../*other products*/ },
{ 6, 9, 3, ... }, { 5, 12, 1, ... },
{ 10/*bread sales count is anomaly high*/, 10, 2, ... },
{ 4, 8, 3, ... }
I coded a training-set generator based on the following idea: take the sales counts for a given product as an array { 5, 6, 5, 10, 4, ... }, compute the mean value (let's assume it is 5), and convert the array into percentage differences from the mean { 0%, 20%, 0%, 100%, -20%, ... }. If the array contains no value larger than 5% in absolute terms, there is no anomaly in the row. I know I could check this with the simplest hand-written function, but my task requires using machine learning.
My generator produced sequences like { 1%, 3%, -2%, 5%, 1%, ... } and labeled them as good. In this way it created around 1k good sequences.
After that, the generator produced anomalous sequences by modifying the good ones, for example: { 24%, 3%, -2%, 5%, 1%, ... }, { -24%, 3%, -2%, 5%, 1%, ... }, { 1%, 24%, -2%, 5%, 1%, ... }, { 1%, -24%, -2%, 5%, 1%, ... }, ..., { 1%, 3%, -2%, 5%, -100% }
Later I rescaled these percentages into the [0, 1] range and fed them to a multilayer perceptron with 128 neurons in the second layer, 32 in the third, and 2 in the output layer (good or anomaly). After training I got a recognition rate of around 50%, which is terribly bad.
Then I modified my generator to produce 1k good sequences like { 1%, 3%, -2%, 5%, 1%, ... } and 1k bad sequences like { 25%, 50%, -60%, 40%, -80%, ... }. The recognition rate was still around 50%.
How can I generate a training set so that the trained network will say that { 1%, 3%, -2%, 5%, 1%, ... } is good and any sequence like { 1%, -24%, -2%, 5%, 1%, ... } is bad?
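For what it's worth, here is a hypothetical sketch of the generator described above; the sequence length, anomaly magnitudes, and labels are assumptions based on the question, not a tested recipe:
import numpy as np

def make_dataset(n_good=1000, n_bad=1000, seq_len=20, seed=0):
    rng = np.random.default_rng(seed)
    # good rows: every percentage deviation from the mean stays within +/-5%
    good = rng.uniform(-0.05, 0.05, size=(n_good, seq_len))
    # bad rows: start from a good row and push one position far outside the band
    bad = rng.uniform(-0.05, 0.05, size=(n_bad, seq_len))
    pos = rng.integers(0, seq_len, size=n_bad)
    bad[np.arange(n_bad), pos] = rng.choice([-1, 1], size=n_bad) * rng.uniform(0.2, 1.0, size=n_bad)
    X = np.vstack([good, bad])
    y = np.concatenate([np.zeros(n_good), np.ones(n_bad)])  # 0 = good, 1 = anomaly
    return X, y

X, y = make_dataset()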

Grid search for hyper-parameters in Torch/Lua

I am new to Torch/Lua and am trying to evaluate some different optimization algorithms and different parameters for each of them.
Algorithms: optim.sgd, optim.lbfgs
Parameters:
learning_rate: {1e-1, 1e-2, 1e-3}
weight_decay: {1e-1, 1e-2}
What I am trying to achieve is to try every combination of the hyper-parameters and get the optimal parameter set for each algorithm.
So is there something like:
param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
as in http://scikit-learn.org/stable/modules/grid_search.html available in Torch to deal with this?
Any suggestions would be nice!
Try this hyper-optimization library that is being worked on:
https://github.com/nicholas-leonard/hypero
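hypero aside, the core of a grid search is just an exhaustive loop over the Cartesian product of the parameter lists. A sketch of the idea in Python (illustration only, since the question is about Torch/Lua):
import itertools

param_grid = {
    'algo': ['sgd', 'lbfgs'],
    'learning_rate': [1e-1, 1e-2, 1e-3],
    'weight_decay': [1e-1, 1e-2],
}

# every combination of algorithm and hyper-parameters (2 * 3 * 2 = 12 settings)
for algo, lr, wd in itertools.product(*param_grid.values()):
    print(algo, lr, wd)  # train and evaluate with this setting, keep the best score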
