FastAI Question on data loading using TextList - machine-learning

My end-goal to implement ULMFit using FastAI to predict disaster tweets(as a part of this Kaggle competition). What I'm trying to do is read the tweets from a Dataframe. But for reasons unknown to me, I'm stuck at the data loading stage. I'm simply unable to do so using the below method -
from fastai.text.all import *
train= pd.read_csv('../input/nlp-getting-started/train.csv')
dls_lm = (TextList.from_df(path,train,cols='text',is_lm=True)
.split_by_rand_pct(0.1)
#.label_for_lm()
.databunch(bs=64))
This line throws - NameError: name 'TextList' is not defined.
I'm able to work around this problem with the below code -
dls_lm = DataBlock(
blocks=TextBlock.from_df('text', is_lm=True),
get_x=ColReader('text'),
splitter=RandomSplitter(0.1)
# using only 10% of entire comments data for validation inorder to learn more
)
dls_lm = dls_lm.dataloaders(train, bs=64, seq_len=72)
Why does this work and not the previous method?
Notebook Link for reference.

Which version of fastai are you running?
import fastai
print(fastai.__version__)
TextList class is from FastAI v1, but it seems to me your import path is for Fastai v2, and in v2, TextList is changed with https://docs.fast.ai/text.data.html#TextBlock (thats why it's working with the Datablock part wich is the good way to handle this)

Related

Where is MobyLCPSolver?

Where is MobyLCPSolver?
ImportError: cannot import name 'MobyLCPSolver' from 'pydrake.all' (/home/docker/drake/drake-build/install/lib/python3.8/site-packages/pydrake/all.py)
I have the latest Drake and cannot import it.
Can anyone help?
As of pydrake v1.12.0, the MobyLcp C++ API is not bound in Python.
However, if you feed an LCP into Solve() then Drake can choose Moby to solve it. You can take advantage of this to create an instance of MobyLCP:
import numpy as np
from pydrake.all import (
ChooseBestSolver,
MakeSolver,
MathematicalProgram,
)
prog = MathematicalProgram()
x = prog.NewContinuousVariables(2)
prog.AddLinearComplementarityConstraint(np.eye(2), np.array([1, 2]), x)
moby_id = ChooseBestSolver(prog)
moby = MakeSolver(moby_id)
print(moby.SolverName())
# The output is: "Moby LCP".
# The C++ type of the `moby` object is drake::solvers::MobyLCP.
That only allows for calling Moby via the MathematicalProgram interface, however. To call any MobyLCP-specific C++ functions like SolveLcpFastRegularized, those would need to be added to the bindings code specifically before they could be used.
You can file a feature request on the Drake GitHub page when you need access to C++ classes or functions that aren't bound into Python yet, or even better you can open a pull request with the bindings that you need.

Missing -symbol.json error when trying to compile a SageMaker semantic segmentation model (built-in algorithm) with SageMaker Neo

I have trained a SageMaker semantic segmentation model, using the built-in sagemaker semantic segmentation algorithm. This deploys ok to a SageMaker endpoint and I can run inference in the cloud successfully from it.
I would like to use the model on a edge device (AWS Panorama Appliance) which should just mean compiling the model with SageMaker Neo to the specifications of the target device.
However, regardless of what my target device is (the Neo settings), I cant seem to compile the model with Neo as I get the following error:
ClientError: InputConfiguration: No valid Mxnet model file -symbol.json found
The model.tar.gz for semantic segmentation models contains hyperparams.json, model_algo-1, model_best.params. According to the docs, model_algo-1 is the serialized mxnet model. Aren't gluon models supported by Neo?
Incidentally I encountered the exact same problem with another SageMaker built-in algorithm, the k-Nearest Neighbour (k-NN). It too seems to be compiled without a -symbol.json.
Is there some scripts I can run to recreated a -symbol.json file or convert the compiled sagemaker model?
After building my model with an Estimator, I got to compile it in SageMaker Neo with code:
optimized_ic = my_estimator.compile_model(
target_instance_family="ml_c5",
target_platform_os="LINUX",
target_platform_arch="ARM64",
input_shape={"data": [1,3,512,512]},
output_path=s3_optimized_output_location,
framework="mxnet",
framework_version="1.8",
)
I would expect this to compile ok, but that is where I get the error saying the model is missing the *-symbol.json file.
For some reason, AWS has decided to not make its built-in algorithms directly compatible with Neo... However, you can re-engineer the network parameters using the model.tar.gz output file and then compile.
Step 1: Extract model from tar file
import tarfile
#path to local tar file
model = 'ss_model.tar.gz'
#extract tar file
t = tarfile.open(model, 'r:gz')
t.extractall()
This should output two files:
model_algo-1, model_best.params
Load weights into network from gluon model zoo for the architecture that you chose
In this case I used DeepLabv3 with resnet50
import gluoncv
import mxnet as mx
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.segmentation import test_transform
model = model_zoo.DeepLabV3(nclass=2, backbone='resnet50', pretrained_base=False, height=800, width=1280, crop_size=240)
model.load_parameters("model_algo-1")
Check the parameters have loaded correctly by making a prediction with new model
Use an image that was used for training.
#use cpu
ctx = mx.cpu(0)
#decode image bytes of loaded file
img = image.imdecode(imbytes)
#transform image
img = test_transform(img, ctx)
img = img.astype('float32')
print('tranformed image shape: ', img.shape)
#get prediction
output = model.predict(img)
Hybridise model into output required by Sagemaker Neo
Additional check for image shape compatibility
model.hybridize()
model(mx.nd.ones((1,3,800,1280)))
export_block('deeplabv3-res50', model, data_shape=(3,800,1280), preprocess=None, layout='CHW')
Recompile model into tar.gz format
This contains the params and json file which Neo looks for.
tar = tarfile.open("comp_model.tar.gz", "w:gz")
for name in ["deeplabv3-res50-0000.params", "deeplabv3-res50-symbol.json"]:
tar.add(name)
tar.close()
Save tar.gz file to s3 and then compile using Neo GUI

In PyTorch, how to convert the cuda() related codes into CPU version?

I have some existing PyTorch codes with cuda() as below, while net is a MainModel.KitModel object:
net = torch.load(model_path)
net.cuda()
and
im = cv2.imread(image_path)
im = Variable(torch.from_numpy(im).unsqueeze(0).float().cuda())
I want to test the code in a machine without any GPU, so I want to convert the cuda-code into CPU version. I tried to look at some relevant posts regarding the CPU/GPU switch of PyTorch, but they are related to the usage of device and thus doesn't apply to my case.
As pointed out by kHarshit in his comment, you can simply replace .cuda() call with .cpu():
net.cpu()
# ...
im = torch.from_numpy(im).unsqueeze(0).float().cpu()
However, this requires changing the code in multiple places every time you want to move from GPU to CPU and vice versa.
To alleviate this difficulty, pytorch has a more "general" method .to().
You may have a device variable defining where you want pytorch to run, this device can also be the CPU (!).
for instance:
if torch.cuda.is_available():
device = torch.device("cuda")
else:
device = torch.device("cpu")
Once you determined once in your code where you want/can run, simply use .to() to send your model/variables there:
net.to(device)
# ...
im = torch.from_numpy(im).unsqueeze(0).float().to(device)
BTW,
You can use .to() to control the data type (.float()) as well:
im = torch.from_numpy(im).unsqueeze(0).to(device=device, dtype=torch.float)
PS,
Note that the Variable API has been deprecated and is no longer required.
net = torch.load(model_path, map_location=torch.device('cpu'))
Pytorch docs: https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-on-cpu-load-on-gpu

how to properly load pickled machine learning models on server

It is a Python Flask app.
The same code works in my local when I run this app on local. But on a server that I rent from DigitalOcean, it gives me this problem.
I have been trying to load this machine learning model(a classification model I trained using sklearn) at run. But it gives me this error OR sometimes, with some solutions I saw in Stackoverflow added, hangs forever.
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0
I tried every solution in Stackoverflow: like adding encoding="latin-1" or "bytes" on loading the pickle. I tried below and many other combinations with different arguments that people recommended in Stackoverflow.
def load_model(file_path):
script_directory = os.path.split(os.path.abspath(__file__))[0]
abs_filepath = os.path.join(script_directory, file_path)
with open(abs_filepath, 'rb') as f:
classifier = pickle.loads(f.read())
return classifier
def load_model(file_path):
script_directory = os.path.split(os.path.abspath(__file__))[0]
abs_filepath = os.path.join(script_directory, file_path)
with open(abs_filepath, 'r') as f:
# also with 'rb'
classifier = pickle.load(f, encoding="bytes")
# also with "latin-1" "latin1" etc.. and load, loads, f, and f.read()
return classifier
model = load_model("modelname.pickle")
What is wrong with this?
It seems as there is a problem when you pickle the model using python2 and try to load this model using python3.
Did you try to load this model using Python 2?
Do you have the same mistake when you run it with parameter encoding=latin1? If it is a different error maybe you need to run
dill._dill._reverse_typemap["ObjectType"] = object
before loading it.
The problem is described pretty well here:
https://rebeccabilbro.github.io/convert-py2-pickles-to-py3/

Why does xLearn fit function causes kernel crashes in Jupyter?

I'm trying to make CTR (Click through rate) prediction using a python module named 'xlearn'.
It enables me to implement a FFM (field-aware factorisation machine) quite easily.
However, I have a problem with the fit function ( supposed to train the model) which crashes the kernel of my jupyter notebook without any error messages.
Here is the code :
import xlearn as xl
ffm_model = xl.create_ffm()
param = {'task':'binary', 'lr':0.2, 'lambda':0.002, 'metric':'acc'}
ffm_model.setTrain('ffm_train.txt')
ffm_model.fit(param, "./model.out") #this line crashes the kernel
I've already tried to fit the model just after python ffm_model = xl.create_ffm() this also crashes the kernel without any error messages ...
Don't hesitate to share your ideas I'm really stuck here.
I didn't realize the xLearn module was showing error messages in the terminal :
Xlearn Imgae Error Messages

Resources